[08:27:40] hi
[08:33:03] why does this query seem to hang and never give any results on https://tools.wmflabs.org/autolist ? (female botanists with no en.wiki page): claim[106:(tree[2374149][][279])] AND claim[21:6581072] AND nolink[enwiki]
[10:01:20] Is this the right place to ask a question about the json-data-dumps?
[10:06:47] I'm just wondering whether there are any specific reasons for some item pages to be missing in the json-data-dumps - e.g. https://www.wikidata.org/wiki/Q133 ??
[12:59:32] * Dirtybutt slaps der-onkel around a bit with a large fishbot
[13:03:12] * Dirtybutt slaps Dirtybutt around a bit with a large fishbot
[15:50:41] I'm wondering whether there are any specific reasons for some item pages to be missing in the json-data-dumps - e.g. https://www.wikidata.org/wiki/Q133 ?? I should probably open a bug report for this?
[16:02:03] Torben: Probably, yeah
[16:02:08] hoo will look at it later no doubt
[16:02:35] ugh
[16:02:43] Will indeed look later on
[16:03:20] Torben: Can you maybe give more examples? (Just in case)
[16:07:21] Reedy: Will you be in SF, btw?
[16:07:26] Nope :(
[16:12:48] hoo: When stuff was being booked, I wasn't supposed to be free to go...
[16:12:57] When things got closer, I was able to. And it was too late really
[16:15:18] :/
[16:15:43] Shit happens, unfortunately
[16:16:11] I think I'll try and be in Israel
[16:16:27] Thinking about FOSDEM too
[16:40:14] hoo@snapshot1003:~$ zgrep '{"type":"item","id":"Q133",' /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.gz
[16:40:14] {"type":"item","id":"Q133"
[16:40:16] Torben: ^
[16:40:21] (I cut of the rest)
[16:40:34] * cut off
[16:40:57] do you have more examples? Do you maybe mean an older dump?
[18:14:46] moin :)
[18:18:07] hi aude :)
[18:18:12] In SF, now?
[18:19:07] yep
[18:19:13] when do you arrive?
[18:19:41] Tomorrow at 8pm :S
[18:19:45] :)
[18:20:12] Would have liked to come in a bit earlier, but this made the cheapest flight
[18:20:16] looks like it will rain every day
[18:20:24] just fyi (it's a good thing for sf)
[18:20:35] coming yesterday was cheapest for me
[18:20:38] Wow... that's surprising
[18:20:46] I'm flying via Detroit this time
[18:21:00] ah
[18:21:10] I don't think I have ever seen rain there...
[18:21:34] it's rare
[18:21:56] I checked that they have free wlan... hope that actually works out. I spent 4h sitting in the Istanbul airport in October essentially doing nothing
[18:22:25] :/
[19:31:34] Sorry - was obviously away - yeah, I can find more examples :)
[19:34:57] I can see you're using the file wikidata-20151228-all.json.gz - should it not contain exactly the same as the bz2-file ?
[19:38:30] And I am using the latest 20151228....bz2 file
[19:39:37] I will download and take a look in the corresponding .gz file
[20:02:13] Torben: Some archive applications have trouble unpacking the bz2
[20:02:20] because we pack it using pbzip2
[20:05:07] I'm looking into the bz2 one myself now
[20:05:38] They both contain the same data
[20:05:41] well, they should
[20:05:52] hoo@snapshot1003:~$ zgrep '{"type":"item","id":"Q133",' /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.bz2
[20:05:52] hoo@snapshot1003:~$ echo $?
[20:05:53] 1
[20:05:54] ugh
[20:06:17] but could be that zgrep also fails reading bz2 files with multiple streams
[20:06:23] I'll try with zcat and a pipe
[20:06:32] zcat for sure can read them, I tested that before
[20:07:13] doh, the zcat on jessie can't read bz2 at all?!
[20:09:51] * bzcat
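For anyone hitting the same problem: below is a minimal Python sketch (not from the channel; the local file name is assumed) of the membership check discussed above, done without zgrep or a multi-stream-aware bzip2 binary. Python's bz2 module has read multi-stream files since 3.3, so it keeps going where decompressors that stop after the first stream give up:

    import bz2

    # Assumed local copy of the dump discussed above.
    DUMP = "wikidata-20151228-all.json.bz2"

    def find_item(item_id, path=DUMP):
        """Return the dump line for the given entity, or None if it is absent.

        bz2.open transparently reads through all of the bzip2 streams that
        pbzip2 writes into one file; the dump has one entity per line."""
        prefix = '{"type":"item","id":"%s",' % item_id
        with bz2.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                if line.startswith(prefix):
                    return line
        return None

    match = find_item("Q133")
    print(match[:120] if match else "Q133 not found")

Streaming like this also avoids unpacking the full (roughly 60 GiB) file to disk first.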
[20:26:31] hoo@snapshot1003:~$ bzip2 -cdfk /mnt/data/xmldatadumps/public/other/wikibase/wikidatawiki/20151228/wikidata-20151228-all.json.bz2 | grep '{"type":"item","id":"Q133",'
[20:26:31] {"type":"item","id":"Q133","labels":{"en":{"language":"en","value":"mixed-member proportional representation"}[…]
[20:26:33] Torben: ^
[20:26:56] So I guess whatever you used to read the bz2 couldn't make sense of our multi-stream bz2
[20:28:01] I just read on the Wikimania 2016 wiki that there may be a "Watson" person as a Wikidata keynote speaker. Is this Watson the IBM project ?
[20:28:57] I don't know for sure, but probably, yes
[20:29:20] so, they would use Wikidata ?
[20:29:26] nice!
[20:29:51] So, Platypus would not be the only query-answering project using Wikidata
[20:34:49] hoo: just about to unpack now - if the solution is this simple I'll be happy ;)
[20:35:09] Torben: Yeah... but be aware, they are about 62G by now
[20:35:18] GiB, I mean
[20:35:23] yup - takes a while ;)
[21:03:11] There definitely seems to be an issue with the decompressor I use (7-zip on Windows) - just downloaded the bzip2 cmd line tool and am giving that a try
[21:04:06] That should work
[21:04:40] When I chose to use pbzip2 for these, I thought everything would be able to read the files (as the bzip2 standard is being followed)
[21:04:49] but apparently most applications out there fail with bzip2
[21:05:04] I think we need to go back to just using bzip2 to compress the files :S
[21:07:18] or maybe just make a note of it somewhere ;)
[21:08:02] the funny thing is that 7-zip fails on the .gz-file, as well - .gz is not exactly new ;)
[21:09:39] Yeah... but we also use parallelism there
[21:09:48] which is supported by the standard
[21:09:55] but maybe not by overly naive implementations
[21:10:27] ha - they may have forgotten to read the nastier details of the spec
[21:10:56] but I'm learning something new, which is almost always a good outcome :)
[21:11:56] wikidata is, by the way, an awesome and exciting project!
[21:11:57] Compressing the dumps as bz2 takes over 4h w/o pbzip2... and given we grow by 1% per week, I wanted something that can scale
[21:14:37] 1% per week, that quickly adds up...
[21:14:52] Indeed
[21:38:58] How big is the unpacked file? Mine is 62.259.208.192 bytes - does that sound right?
[21:39:31] That sounds about right
[21:39:48] cool - just searching for my favorite item Q133 right now ;)
[21:42:06] and there it is - thank you! You've been a great help :)
[21:47:12] You're welcome :)
[21:55:45] hoo:
[21:55:46] [21:49:29] Could someone help me convert wikibase to the extension registration please. I have tried here https://gerrit.wikimedia.org/r/#/c/229119/ but the tests keep failing.
[21:55:46] [21:49:49] paladox: You'd be best leaving that to the Wikidata fokes
[21:55:46] [21:49:51] *folks
[21:55:46] [21:50:00] Oh ok.
[21:55:47] [21:50:27] Because they use a special meta repo for deployment too
[21:55:49] [21:50:27] https://github.com/wikimedia/mediawiki-extensions-Wikidata
[21:55:51] [21:52:36] Oh.
[21:55:55] lol.
[23:08:53] Reedy: I kind of have that on my list
[23:09:53] :)
[23:10:05] More the fact he had no idea what he was doing, or the added "complexities"
[23:18:36] heh... I didn't review it myself yet, as it would probably take me ages to get it right myself
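Since the .gz dump came up as well (7-zip trips over it for the same reason: the file is written as several concatenated gzip members by the parallel compressor), here is the equivalent sketch for the gzip variant, again not from the channel and again assuming a local copy; Python's gzip module reads concatenated members, so the same streaming check works:

    import gzip

    # Assumed local copy of the gzip variant of the same dump.
    DUMP = "wikidata-20151228-all.json.gz"

    def item_in_dump(item_id, path=DUMP):
        """Stream the dump (one entity per line) and report whether the given
        entity is present; gzip.open reads all concatenated gzip members."""
        prefix = '{"type":"item","id":"%s",' % item_id
        with gzip.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                if line.startswith(prefix):
                    return True
        return False

    print(item_in_dump("Q133"))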