[00:56:42] https://phabricator.wikimedia.org/T215970 should've been requested onwiki presumably?
[05:07:36] Help
[08:12:51] reosarevok: it has the advantage that you can just edit a statement to replace the string with the item (no need to migrate references or any other qualifiers)
[08:13:15] but if that method works fine, I would question why we don't do the same for author name string
[08:15:09] Asked
[08:18:32] it looks like author name string is used a lot, but I bet the majority of the uses are those bot-added scientific articles
[08:18:45] I remember fixing a few of those statements... it was rather annoying :P
[08:19:29] since there's lots of statements and I had to scroll up and down to copy the information from one statement to the other
[08:20:11] and being able to just change the value of the statement would have made that a lot easier
[09:13:37] nikki: pretty sure the idea is you just use https://tools.wmflabs.org/author-disambiguator/ and don't touch them by hand
[09:14:17] I can't load that
[09:14:45] How come?
[09:14:57] I get ERR_ADDRESS_UNREACHABLE
[09:15:00] Huh
[09:15:02] Worked here
[09:15:48] it's probably because it doesn't have an ipv6 address
[09:16:12] Oh, some ISP issues?
[09:16:15] although my isp has decided to allow access to most ipv4 things, but I haven't been able to load tools.wmflabs.org for ages
[09:16:22] :(
[09:16:29] lots of isp issues...
[09:16:48] If you end up getting more cases where you would need to fix this by hand, just tell me and I can run the tool maybe
[09:17:55] either way, I wouldn't have any reason to use an external tool if I could just edit the statement :P
[13:56:46] Hello
[13:58:13] I got the json file from yesterday but i get an error when extracting the data (gzip: invalid compressed data--format violated)
[14:04:19] melderick: Hi, from what I read, it sounds like you're trying to extract data from a .json file?
[14:04:21] Is that the case?
[14:05:57] yes
[14:06:20] each week i download the json.gz file
[14:06:43] melderick: Do you check the SHA1 or MD5 sums?
[14:06:44] it works pretty well, except for the one generated yesterday
[14:07:00] I am generating the md5 right now
[14:07:08] Okay :)
[14:07:09] takes time :)
[14:07:46] So I suppose yours should be 10aef3299f32dc7695c6e49559a77a2c
[14:08:02] yes
[14:08:29] Then I don't know what the problem is, sorry :(
[14:08:37] do you have access to the file itself ?
[14:09:01] Not in my disk, I should download it
[14:09:19] oh ok don't bother then :)
[14:09:44] But we have to chance to investigate to prevent future issues like this :)
[14:09:49] *the chance
[14:10:14] sure
[14:10:48] Would you be so kind as to describe the problem on https://phabricator.wikimedia.org/maniphest/task/edit/form/1/ ?
[14:11:24] If you have a Wikidata account, you can login with it
[14:11:36] hmm maybe i will wait for md5 to be generated first. In case it's some corruption on my side
[14:12:38] If the MD5 is correct, 10aef3299f32dc7695c6e49559a77a2c, then the other will be correct too
[14:13:42] Ah, yes; sorry, I thought you said SHA1 :)
[14:13:50] Let's wait, no problem :)
[14:13:52] :)
[14:14:06] yeah generation is soo long
[14:14:29] ahh
[14:14:36] de51813666aac81951b6f30e0a468a26 wikidata-20190211-all.json.gz
[14:14:42] :S
[14:14:44] ok so issue on my side
[14:14:49] \o/
[14:15:11] Those should be smaller files anyway
[14:15:39] smaller ?
[14:15:41] Do you often use just a subset of entities? For example, only people, only classes, etc.?
[14:16:38] Or you need the full dump?
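The checksum comparison discussed above (a local MD5 of de51813666aac81951b6f30e0a468a26 against the expected 10aef3299f32dc7695c6e49559a77a2c) can be sketched in Python. This is a minimal sketch, not what anyone in the channel actually ran: the function names md5_of_file and matches_published_md5 and the 1 MiB chunk size are assumptions made for illustration. Reading in chunks keeps memory use flat on a dump of this size.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare the local digest with the value taken from the checksum file."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling matches_published_md5("wikidata-20190211-all.json.gz", "10aef3299f32dc7695c6e49559a77a2c") on a corrupted download like the one in this log would return False, which points to a problem with the local copy and not with the published dump.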
[14:17:19] i need the full dump as i also search for unclassified items
[14:17:31] Okay :/
[14:17:55] working on Greek/Roman Mythology
[14:18:05] Very interesting :D
[14:18:23] Next time you can access the corresponding subdirectory in https://dumps.wikimedia.org/wikidatawiki/entities/
[14:18:24] quite often i find unclassified items
[14:18:35] And look for the chechsum files
[14:18:41] *checksum
[14:18:53] yep i did that
[14:19:16] Cool
[14:20:32] How do you find the unclassified Items?
[14:23:41] hmm I generate a text file version of the archive, a kind of index with the Q number, one label (english preferred, then french, then any latin label, or any other label), a list of parent classes (from instance of and subclass of)
[14:24:04] and then I grep the index file
[14:24:19] for labels I am missing
[14:24:45] And you can't use SPARQL because the timeouts expire?
[14:26:08] Or haven't you explored that way too much?
[14:27:06] a bit of both and also I have scripts to keep me updated with changes done by others
[14:27:55] i don't see me checking every week 20000+ items with SPARQL
[14:28:27] maybe when i am done :)
[14:29:57] No, definitely I don't see me doing so either :P
[14:30:48] I think there are more people doing the same (tracking changes) with their own scripts
[14:31:58] Maybe if you share yours, or you ask on an open list or on the wiki, people can tell you how you can improve the process
[14:32:53] yeah
[14:36:03] i spent most of my time checking and adding data, the process of searching existing items is not what really takes time
[14:36:37] maybe 10% of my time ^^
[14:42:02] but you are right, i should try to explain my current process somewhere, to see how it could be improved, and maybe give people ideas for new tools :)
[14:42:24] :D
[14:44:12] Even if you use some tools that aren't included in https://www.wikidata.org/wiki/Wikidata:Tools, feel free to include them, unfortunately for most of them we have no record
[15:01:24] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @milimetric & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:01:55] Helping our friends at VIAF again :) https://www.wikidata.org/wiki/Q28146561#P214
[15:14:46] by the way, 2 months ago I sent an edit request on : https://www.wikidata.org/wiki/MediaWiki_talk%3AGadget-Move.js
[15:14:56] got no response yet ^^
[15:15:43] not exactly sure what is the right process for suggesting changes on Gadgets
[15:25:13] You need to transclude {{edit request}}, otherwise it will not trigger something
[15:25:13] [1] https://www.wikidata.org/wiki/Template:edit_request
[15:26:16] (and seems the phab task snowed under...)
[15:31:16] sjoerddebruinsj : I wrote {{tl|edit request}} isn't it the right way ?
[15:31:16] [2] https://www.wikidata.org/wiki/Template:tl
[15:34:08] tl is template link, we do that after the request was completed :)
[15:38:31] ohhh
[15:38:34] damn :)
[15:38:48] so i need to remove the 'tl|' ?
[15:40:27] yes
[15:41:18] done thx
[15:41:26] i could have waited years :D
[15:43:41] Indeed...
[15:51:06] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @milimetric & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[18:04:57] Hiii
[20:10:42] flup
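
The index described at [14:23:41] (Q number, one preferred label, parent classes from instance of and subclass of) could be built roughly as follows. This is a sketch under stated assumptions and not melderick's actual script, which was not shared in the channel. It assumes the usual layout of the JSON dump (one entity per line inside a JSON array, each line ending with a comma), reads "any latin label" as "any label written in Latin script", and uses a tab-separated output format. All function names are hypothetical.

```python
import gzip
import json
import unicodedata


def is_latin_script(text):
    """Assumption: a label counts as 'latin' if all of its letters are Latin-script."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def pick_label(labels):
    """English first, then French, then any Latin-script label, then any label."""
    for lang in ("en", "fr"):
        if lang in labels:
            return labels[lang]["value"]
    values = [entry["value"] for entry in labels.values()]
    for value in values:
        if is_latin_script(value):
            return value
    return values[0] if values else ""


def parent_classes(claims):
    """Collect item IDs from instance of (P31) and subclass of (P279) statements."""
    parents = []
    for prop in ("P31", "P279"):
        for statement in claims.get(prop, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "no value" / "unknown value" statements
            value = snak["datavalue"]["value"]
            parents.append(value.get("id") or "Q%d" % value["numeric-id"])
    return parents


def iter_entities(dump_path):
    """Stream entities from the gzipped dump without loading it into memory."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # array brackets and blank lines carry no entity
            yield json.loads(line)


def write_index(dump_path, index_path):
    """Write one tab-separated line per entity: ID, label, comma-joined parents."""
    with open(index_path, "w", encoding="utf-8") as out:
        for entity in iter_entities(dump_path):
            parents = parent_classes(entity.get("claims", {}))
            label = pick_label(entity.get("labels", {}))
            out.write("\t".join([entity["id"], label, ",".join(parents)]) + "\n")
```

With this layout, unclassified items are the lines whose third field is empty, so they can be found with grep on the index file, as described in the log.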