[08:27:40] hi
[08:33:03] why does this query seem to hang and never give any results on https://tools.wmflabs.org/autolist ? (female botanists with no en.wiki page): claim[106:(tree[2374149][][279])] AND claim[21:6581072] AND nolink[enwiki]
[10:01:20] Is this the right place to ask a question about the json-data-dumps?
[10:06:47] I'm just wondering whether there are any specific reasons for some item pages to be missing in the json-data-dumps - e.g. https://www.wikidata.org/wiki/Q133 ??
[12:59:32] * Dirtybutt slaps der-onkel around a bit with a large fishbot
[13:03:12] * Dirtybutt slaps Dirtybutt around a bit with a large fishbot
[15:50:41] I'm wondering whether there are any specific reasons for some item pages to be missing in the json-data-dumps - e.g. https://www.wikidata.org/wiki/Q133 ?? I should probably open a bug report for this?
[16:02:03] Torben: Probably, yeah
[16:02:08] hoo will look at it later no doubt
[16:02:35] ugh
[16:02:43] Will indeed look later on
[16:03:20] Torben: Can you maybe give more examples? (Just in case)
[16:07:21] Reedy: Will you be in SF, btw?
[16:07:26] Nope :(
[16:12:48] hoo: When stuff was being booked, I wasn't supposed to be free to go...
[16:12:57] When things got closer, I was able to. And it was too late really
[16:15:18] :/
[16:15:43] Shit happens, unfortunately
[16:16:11] I think I'll try and be in Israel
[16:16:27] Thinking about FOSDEM too
[16:40:14] hoo@snapshot1003:~$ zgrep '{"type":"item","id":"Q133",' /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.gz
[16:40:14] {"type":"item","id":"Q133"
[16:40:16] Torben: ^
[16:40:21] (I cut of the rest)
[16:40:34] * cut off
[16:40:57] do you have more examples? Do you maybe mean an older dump?
[18:14:46] moin :)
[18:18:07] hi aude :)
[18:18:12] In SF, now?
[18:19:07] yep
[18:19:13] when do you arrive?
[18:19:41] Tomorrow at 8pm :S
[18:19:45] :)
[18:20:12] Would have liked to come in a bit earlier, but this made the cheapest flight
[18:20:16] looks like it will rain every day
[18:20:24] just fyi (it's a good thing for sf)
[18:20:35] coming yesterday was cheapest for me
[18:20:38] Wow... that's surprising
[18:20:46] I'm flying via Detroit this time
[18:21:00] ah
[18:21:10] I don't think I have ever seen rain there...
[18:21:34] it's rare
[18:21:56] I checked that they have free wlan... hope that actually works out. I spent 4h sitting in the Istanbul airport in October essentially doing nothing
[18:22:25] :/
[19:31:34] Sorry - was obviously away - yeah, I can find more examples :)
[19:34:57] I can see you're using the file wikidata-20151228-all.json.gz - should it not contain exactly the same as the bz2-file ?
[19:38:30] And I am using the latest 20151228....bz2 file
[19:39:37] I will download and take a look in the corresponding .gz file
[20:02:13] Torben: Some archive applications have trouble unpacking the bz2
[20:02:20] because we pack it using pbzip2
[20:05:07] I'm looking into the bz2 one myself now
[20:05:38] They both contain the same data
[20:05:41] well, they should
[20:05:52] hoo@snapshot1003:~$ zgrep '{"type":"item","id":"Q133",' /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.bz2
[20:05:52] hoo@snapshot1003:~$ echo $?
[20:05:53] 1
[20:05:54] ugh
[20:06:17] but could be that zgrep also fails reading bz2 files with multiple streams
[20:06:23] I'll try with zcat and a pipe
[20:06:32] zcat for sure can read them, I tested that before
[20:07:13] doh, the zcat on jessie can't read bz2 at all?!
[20:09:51] * bzcat
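For anyone hitting the same problem: below is a minimal Python sketch (not from the channel; the local file name is assumed) of the membership check discussed above, done without zgrep or a multi-stream-aware bzip2 binary. Python's bz2 module has read multi-stream files since 3.3, so it keeps going where decompressors that stop after the first stream give up:

    import bz2

    # Assumed local copy of the dump discussed above.
    DUMP = "wikidata-20151228-all.json.bz2"

    def find_item(item_id, path=DUMP):
        """Return the dump line for the given entity, or None if it is absent.

        bz2.open transparently reads through all of the bzip2 streams that
        pbzip2 writes into one file; the dump has one entity per line."""
        prefix = '{"type":"item","id":"%s",' % item_id
        with bz2.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                if line.startswith(prefix):
                    return line
        return None

    match = find_item("Q133")
    print(match[:120] if match else "Q133 not found")

Streaming like this also avoids unpacking the full (roughly 60 GiB) file to disk first.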
[20:26:31] hoo@snapshot1003:~$ bzip2 -cdfk /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.bz2 | grep '{"type":"item","id":"Q133",'
[20:26:31] {"type":"item","id":"Q133","labels":{"en":{"language":"en","value":"mixed-member proportional representation"}[…]
[20:26:33] Torben: ^
[20:26:56] So I guess whatever you used to read the bz2 couldn't make sense of our multi-stream bz2
[20:28:01] I just read on the Wikimania 2016 wiki that there may be a "Watson" person as a Wikidata keynote speaker. Is this Watson the IBM project ?
[20:28:57] I don't know for sure, but probably, yes
[20:29:20] so, they would use Wikidata ?
[20:29:26] nice!
[20:29:51] So, Platypus would not be the only query-answering project using Wikidata
[20:34:49] hoo: just about to unpack now - if the solution is this simple I'll be happy ;)
[20:35:09] Torben: Yeah... but be aware, they are about 62G by now
[20:35:18] GiB, I mean
[20:35:23] yup - takes a while ;)
[21:03:11] There definitely seems to be an issue with the decompressor I use (7-zip on Windows) - just downloaded the bzip2 cmd line tool and am giving that a try
[21:04:06] That should work
[21:04:40] When I chose to use pbzip2 for these, I thought everything would be able to read the files (as the bzip2 standard is being followed)
[21:04:49] but apparently most applications out there fail with bzip2
[21:05:04] I think we need to go back to just using bzip2 to compress the files :S
[21:07:18] or maybe just make a note of it somewhere ;)
[21:08:02] the funny thing is that 7-zip fails on the .gz-file, as well - .gz is not exactly new ;)
[21:09:39] Yeah... but we also use parallelism there
[21:09:48] which is supported by the standard
[21:09:55] but maybe not by overly naive implementations
[21:10:27] ha - they may have forgotten to read the nastier details of the spec
[21:10:56] but I'm learning something new, which is almost always a good outcome :)
[21:11:56] wikidata is, by the way, an awesome and exciting project!
[21:11:57] Compressing the dumps as bz2 takes over 4h w/o pbzip2... and given we grow by 1% per week, I wanted something that can scale
[21:14:37] 1% per week, that quickly adds up...
[21:14:52] Indeed
[21:38:58] How big is the unpacked file? Mine is 62.259.208.192 bytes - does that sound right?
[21:39:31] That sounds about right
[21:39:48] cool - just searching for my favorite item Q133 right now ;)
[21:42:06] and there it is - thank you! You've been a great help :)
[21:47:12] You're welcome :)
[21:55:45] hoo:
[21:55:46] [21:49:29] Could someone help me convert wikibase to the extension registration please. I have tried here https://gerrit.wikimedia.org/r/#/c/229119/ but the tests keep failing.
[21:55:46] [21:49:49] paladox: You'd be best leaving that to the Wikidata fokes
[21:55:46] [21:49:51] *folks
[21:55:46] [21:50:00] Oh ok.
[21:55:47] [21:50:27] Because they use a special meta repo for deployment too
[21:55:49] [21:50:27] https://github.com/wikimedia/mediawiki-extensions-Wikidata
[21:55:51] [21:52:36] Oh.
[21:55:55] lol.
[23:08:53] Reedy: I kind of have that on my list
[23:09:53] :)
[23:10:05] More the fact he had no idea what he was doing, or the added "complexities"
[23:18:36] heh... I didn't review it myself yet, as it would probably take me ages to get it right myself
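Since the .gz dump came up as well (7-zip trips over it for the same reason: the file is written as several concatenated gzip members by the parallel compressor), here is the equivalent sketch for the gzip variant, again not from the channel and again assuming a local copy; Python's gzip module reads concatenated members, so the same streaming check works:

    import gzip

    # Assumed local copy of the gzip variant of the same dump.
    DUMP = "wikidata-20151228-all.json.gz"

    def item_in_dump(item_id, path=DUMP):
        """Stream the dump (one entity per line) and report whether the given
        entity is present; gzip.open reads all concatenated gzip members."""
        prefix = '{"type":"item","id":"%s",' % item_id
        with gzip.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                if line.startswith(prefix):
                    return True
        return False

    print(item_in_dump("Q133"))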