[01:42:37] boing
[01:43:08] have there been bugs in Semantic MediaWiki JSON output where extraneous data is included in the query?
[01:44:49] because I think I found a bug where this is happening.
[01:45:07] look at the markup for this page: http://wiki.zenoss.org/ZenPack:Rundeck
[01:46:02] when I do an Ask query for [[Release of::ZenPack:Rundeck]] ?Version, I get "1.0.0, 1.0.1". I should only be getting a single value. I'll try adding a space and re-saving the ZenPack page and see if that fixes it...
[01:46:40] ok, here was the issue: ApprovedRevs
[01:47:15] Had to approve the current version of the page. I think the bug may be that if there is no approved version, Semantic MediaWiki sees too many sub-objects and gives multiple results?
[01:51:55] yaron, you there?
[01:53:31] yaron: I found a bug with Semantic MediaWiki and ApprovedRevs. If there is no approved version of a page, it seems like SMW may process *all* unapproved revs. Detected this because of bogus multiple results for sub-objects.
[02:31:24] drobbins: hello! I just saw this now.
[02:34:06] When does this happen? When you resave a page?
[03:01:16] I fixed the problem by approving a page.
[03:01:51] the ZenPack page was unapproved. It was getting bogus internal object data back. A version that should have been either '1.0.0' or '1.0.1' was set to '1.0.0, 1.0.1'
[03:02:10] the JSON Python parsing code was failing because I was getting the version back as a list rather than a string.
[03:55:03] yaron: info above
[04:16:54] drobbins: okay. So what do you mean by "fixed the problem"? It only happened for one page?
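The list-vs-string failure drobbins hit is worth guarding against on the consumer side regardless of the ApprovedRevs fix. A minimal sketch in Python, assuming the third-party requests library and the ask API's usual query/results/printouts JSON layout; the endpoint URL and warning handling are illustrative, not the actual parsing code discussed above:

    import requests  # third-party HTTP library, assumed installed

    API = "http://wiki.zenoss.org/api.php"  # hypothetical endpoint for this wiki

    def get_versions(page):
        """Run the Ask query from above and yield (title, versions) pairs."""
        params = {
            "action": "ask",
            "query": "[[Release of::%s]]|?Version" % page,
            "format": "json",
        }
        data = requests.get(API, params=params).json()
        for title, result in data["query"]["results"].items():
            versions = result["printouts"]["Version"]
            # Printout values arrive as a list; with the unapproved-revision
            # behavior described above, a property expected to be
            # single-valued can carry several entries, so never assume
            # len(versions) == 1.
            if len(versions) != 1:
                print("warning: %s has %d Version values: %r"
                      % (title, len(versions), versions))
            yield title, versions

    for title, versions in get_versions("ZenPack:Rundeck"):
        print(title, versions)

Treating every printout as a list avoids the crash even when a property unexpectedly carries multiple values.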
[10:15:34] New review: Nischayn22; "Strange, I see no reason for that bug. Working fine for me." [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69668
[13:42:02] New patchset: Mwjames; "SMW\UnusedPropertiesCollector" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[13:50:44] New review: Mwjames; "Not unexpected because of @group Database ..." [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[13:57:51] New patchset: Mwjames; "SMW\UnusedPropertiesCollector" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:00:43] New review: Mwjames; "Expect failures because of the setup/teardown of unitest tables but I don't want to make SpecialTest..." [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:14:46] New patchset: Mwjames; "SMW\UnusedPropertiesCollector" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:23:01] New review: Mwjames; "Still failing on Error: 1 no such table: unittest_unittest_smw_object_ids" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:52:21] New review: Mwjames; "I don't know what is going on or why the unittest table doesn't exist. Doing a var_dump now since locally I..." [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:52:50] New patchset: Mwjames; "SMW\UnusedPropertiesCollector" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[14:56:10] New review: Mwjames; "Doing var_dump( $storeIdTableName, $this->dbConnection->tableExists( $storeIdTableName ) );" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[15:07:33] New patchset: Mwjames; "SMW\UnusedPropertiesCollector" [mediawiki/extensions/SemanticMediaWiki] (master) - https://gerrit.wikimedia.org/r/69859
[20:00:20] yaron1, have you or anybody else worked on bi-directionality for the JSON format? So that what gets exported is the same as what gets written to a page? (Like, instead of Semantic Forms writing templates, it writes JSON structures to the page)
[20:12:06] JeroenDeDauw, you helped make the JSON result format, right?
[20:12:48] jgay: that was mostly MWJames
[20:13:11] JeroenDeDauw, oh ok. Have you heard of any work being done on a JSON input format that works with Semantic Forms?
[20:13:49] (like, instead of the form parsing a MW template, it would parse and write out to a JSON object)
[20:16:14] jgay: no plans for this AFAIK
[20:16:28] yaron1's Page Schemas extension goes in that direction however
[20:16:50] Oops, didn't see this until now!
[20:17:24] jgay: why would you want to store data as JSON?
[20:18:03] Yeah, I had no part in the JSON result format.
[20:19:00] Page Schemas doesn't really do that... it stores the data schema in a structured way, not actual data.
[20:19:20] Yaron, so, we have an importer right now that takes Debian package info and pulls it into our wiki
[20:20:28] and we will want to start doing regular updates ... but, the problem is that the Debian package info won't stay in sync over time with what we have on the wiki -- because we capture a lot more info than Debian does
[20:21:31] Well, that's a big problem... storing the data differently on the wiki won't really solve the issue of reconciling two versions of the same data.
[20:21:38] JSON is our intermediary language between the MediaWiki templates we generate and the various Debian package info sources we draw from
[20:22:31] What about using External Data to only display, not import, the Debian info?
[20:23:04] Yaron, well, we aren't just displaying the raw info. We edit what Debian spits out.
[20:23:18] And we are going to be adding more sources of info to pull into the Directory. Other packaging systems.
[20:23:25] What if that info then also gets changed by Debian?
[20:23:46] Yaron, exactly. This is where we are now
[20:24:06] Well, that's... a problem.
[20:24:18] Yes, but a problem by design
[20:24:30] Does Debian know/care about your version/changes to their data?
[20:25:17] Yaron, no, all of that part has been worked out
[20:25:23] Oh.
[20:25:28] it's been like a year of work to get to this point :-)
[20:25:36] So how does the whole thing work?
[20:27:50] So, we have a tool that pulls info from Debian (which is in a few different places) on a given repo. We then can output everything into JSON format. And then we can spit out a Directory wiki page (which is a series of templates that our form works with)
[20:28:59] Cool.
[20:29:15] (If it had been CSV or XML, you could have used Data Transfer, but you probably knew that.)
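A minimal sketch of the JSON-to-wikitext step described at [20:27:50], assuming a single flat template with a hypothetical "Entry" name; the actual Directory pages are described as a series of templates, so this only shows the shape of the conversion:

    import json

    def to_wikitext(package):
        """Render one package record as a template call that a
        Semantic Forms form could edit ("Entry" is a stand-in name)."""
        lines = ["{{Entry"]
        for field, value in sorted(package.items()):
            if isinstance(value, list):
                value = ", ".join(value)  # flatten multi-valued fields
            lines.append("|%s=%s" % (field, value))
        lines.append("}}")
        return "\n".join(lines)

    record = json.loads('{"Name": "hello", "Homepage": "https://www.gnu.org/software/hello/"}')
    print(to_wikitext(record))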
[20:29:53] So, the next phase is making the toolchain a bit more interactive
[20:30:34] Once we pull in the data as a JSON object, we can then edit it by hand before spitting out the wiki templates page.
[20:30:52] it'd be nice to also do merging at this phase
[20:31:21] What do you mean by merging?
[20:32:01] So, some info about a package will be from the Free Software Directory and some info will be what we are getting from Debian
[20:32:37] Yes, but the difficult part is the fields that are in common, no?
[20:33:31] Yaron, right. We are thinking of maybe presenting a user with the standard diff options
[20:34:03] By "user", you mean a behind-the-scenes, admin-type user, I assume.
[20:34:06] which would include being able to open an editor to do a complicated merge
[20:34:11] right
[20:34:56] staff or volunteer or whatnot. Normal users will just edit wiki pages as usual
[20:35:15] Well, I still don't see the benefit of storing the data as JSON - you could just turn the new info from Debian into wikitext format, and do the "merge" on that.
[20:35:51] Yaron, so the main advantage has to do with there being a lot of toolchains that work well with JSON and not a lot that work well with MediaWiki template syntax
[20:36:16] Oh, I see.
[20:36:24] What's an example of a toolchain you'd use?
[20:36:36] Yaron, and the longer-term version isn't just dealing with Debian ... there are a lot of cool sources we can pull from
[20:36:47] Sounds cool.
[20:37:08] Yaron, well, what we have now we are calling dafsoup
[20:37:27] it is a Python program that does what I described above
[20:38:05] but, there is another little tool I am working on which is basically a JavaScript bookmarklet
[20:38:35] So these tools are all stuff that you guys have created?
[20:39:16] Yaron, yeah, the importer is created and has already been used to create a few thousand pages
[20:39:45] Nice.
[20:39:49] and the JavaScript bookmarklet is only a demo-quality thing. A group of students made it at a Google-sponsored 24-hour hackathon
[20:40:37] the bookmarklet targets stuff that isn't already in a distro, or helps collect more detailed info quickly
[20:41:54] So, you go to a SourceForge or GitHub or just some custom project homepage and then collect info by highlighting or clicking data on the page, having it fill up/update a form of info, and then when you click save it writes out to a page on the Free Software Directory
[20:44:12] If this stopped at just keeping the Free Software Directory up to date, then I wouldn't necessarily expect it to take off or other people to want to help hack on tools. But, if I can get a bidirectional thing going ... where updating the Free Software Directory can also update package info, or provide additional package info for some distro, then I think this could get more momentum
[20:46:01] Yeah, sending stuff back is the way to go.
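The merge phase floated at [20:30:52] could start as plain field-level reconciliation, with conflicting fields handed to the admin-type user for a diff or editor decision. A rough sketch under those assumptions (field names are hypothetical):

    def merge(directory, debian):
        """Keep Directory-only fields, take fields that are new from
        Debian, and flag fields where the two sources disagree so an
        admin can resolve them by hand (diff view, editor, etc.)."""
        merged, conflicts = dict(directory), {}
        for field, new in debian.items():
            old = directory.get(field)
            if old is None:
                merged[field] = new            # new info from Debian
            elif old != new:
                conflicts[field] = (old, new)  # needs a manual decision
        return merged, conflicts

    merged, conflicts = merge(
        {"Version": "1.0.0", "License": "GPLv3+"},
        {"Version": "1.0.1"},
    )
    print(conflicts)  # {'Version': ('1.0.0', '1.0.1')}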
[20:50:09] Yaron, so I guess a more specific question is: do you think it'd be a lot of work to have Semantic Forms read in and write out a JSON object rather than templates?
[20:50:34] Yes.
[20:58:11] Yaron, is Page Schemas a similar idea?
[20:58:17] No.
[20:59:50] hmm, so, maybe it might be better/easier to simply do our own parsing of wiki templates
[21:41:54] Yaron, anyhow, thanks for your help :-) I am thinking the best bet might be to just write a separate extension to do the conversion to and from wiki templates to other formats specified by an XML file or something.
[21:42:19] I'll let you know as things progress on that front ... probably when we publish our importer tool in the next week or two
[21:42:25] Cool.
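For the round-trip conversion proposed at [21:41:54], the reverse direction (template call back to JSON) can begin as naive pattern matching; anything beyond flat, unnested templates would want a real wikitext parser such as the mwparserfromhell library. A minimal sketch under that assumption, reusing the hypothetical "Entry" template from above:

    import json
    import re

    # Handles only a flat {{Entry|...}} call with no nested templates;
    # real wikitext needs a proper parser (e.g. mwparserfromhell).
    FIELD = re.compile(r"\|\s*([^=|}]+?)\s*=\s*([^|}]*)")

    def from_wikitext(text):
        """Parse a flat template call back into a dict."""
        return {k: v.strip() for k, v in FIELD.findall(text)}

    print(json.dumps(from_wikitext("{{Entry|Name=hello|Version=2.10}}")))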