[09:12:59] Hey, a question regarding the API. First: the results of wbgetentities and query are different
[09:13:00] https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&props=claims
[09:13:07] https://www.wikidata.org/w/api.php?action=query&prop=revisions&titles=Q42&rvprop=content
[09:13:41] these are different (e.g. claims do not have 'datatype' in the mainsnak)
[09:14:36] so I want to have a consistent result but I need to query certain revisions (and not the current revision)
[09:15:32] so is it possible to make query return the same result as wbgetentities? or make wbgetentities return results for revisions too?
[09:21:06] DanielK_WMDE_: ^
[09:21:09] Hey :)
[09:32:34] Amir1: hi!
[09:33:00] Amir1: if the API output is missing the datatype in value snaks, that's a bug. please report it.
[09:33:18] Addshore is currently rewriting the API result generation, please poke him about it
[09:33:50] *readsup*
[09:34:18] Amir1: getting specific revisions from getentities would be useful, and easy enough to do. Please file a feature request (and poke lydia about scheduling it)
[09:34:22] Amir1: we don't really support the whole query thing at all, don't use it
[09:34:26] :P
[09:34:35] Hey all
[09:34:38] DanielK_WMDE_: I think there may already be a ticket somewhere!
[09:34:39] thanks :)
[09:34:55] oh, right, i was thinking of action=wbsearchentities. action=query is completely unsupported.
[09:35:40] yeh, Amir1 query stuff will get you what is in the DB :/ which may look similar to the output of getentities, but sometimes you'll get very old serializations etc
[09:35:48] https://phabricator.wikimedia.org/T40971
[09:35:52] Amir1: is this for calculating diffs? an api module that would return a structural diff would still be the best solution I think.
[09:36:13] DanielK_WMDE_: it's not just that
[09:36:31] there is also a ticket for that ;)
[09:36:33] the system needs the current number of claims, etc.
as features
[09:36:34] Amir1: raw revision content, as returned by action=query, should be treated as an opaque blob. It is not guaranteed to be consistent or to conform to any documented structure.
[09:36:46] I see
[09:36:55] https://phabricator.wikimedia.org/T106306
[09:37:04] (and I assume it is the same for the XML dumps of Wikidata)
[09:37:11] Amir1: yup
[09:37:16] Amir1: yes indeed. use the json dumps
[09:37:53] Amir1: allowing individual revisions to be queried from wbgetentities should not be hard.
[09:38:15] indeed!
[09:38:21] yup
[09:38:22] https://phabricator.wikimedia.org/T40971
[09:38:33] addshore: the result would then be keyed by revision id, not entity id. that
[09:38:41] ...that's a bit odd, but not terrible, i think
[09:38:47] mhhhhm
[09:38:49] There was a patch by legoktm to implement such a feature
[09:38:51] but the two modes should not be mixed
[09:39:00] Amir1: yeh I thought I remembered something like that!
[09:39:04] Amir1: Was looking for you
[09:39:12] multichill: hey :)
[09:39:15] Have you ever done something with word distances?
[09:39:40] do you mean word distances in NLP?
[09:39:54] We have a list of painters that have an item and a list of painters that are not matched yet (https://tools.wmflabs.org/multichill/queries/wikidata/top_unmatched_painters.txt)
[09:40:11] Finding matches between the two of them. All exact matches have already been covered
[09:40:16] I have done similar work several times before
[09:40:58] multichill: before I start
[09:41:23] Do you want to find matches within this list, or between this list and members of another list?
[09:41:29] (I just want to be sure)
[09:43:23] multichill: ^
[09:43:27] List A : http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A5%5D%20AND%20CLAIM%5B106%3A1028181%5D (all the painters we currently have on Wikidata)
[09:43:36] List B : https://tools.wmflabs.org/multichill/queries/wikidata/top_unmatched_painters.txt
[09:43:46] Possible matches between List A & B.
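A minimal sketch of the matching task described above (unmatched names from List B against labels from List A), assuming plain string similarity is an acceptable stand-in for the "word distances" mentioned. The helper names `normalize`, `similarity` and `candidate_matches`, the 0.85 threshold, and the token sorting are illustrative assumptions, not anything agreed in the discussion. It uses only Python's standard `difflib` and `unicodedata` modules.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and sort the tokens.

    Sorting tokens is an assumption made here so that
    "Surname, Firstname" and "Firstname Surname" compare as equal.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(unmatched, known_labels, threshold=0.85):
    """Pair each unmatched name with every known label scoring >= threshold.

    `known_labels` maps a label or alias to an item id (hypothetical shape).
    Returns (unmatched name, label, item id, score) tuples, best first.
    """
    results = []
    for name in unmatched:
        for label, item_id in known_labels.items():
            score = similarity(name, label)
            if score >= threshold:
                results.append((name, label, item_id, round(score, 3)))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

Calling `candidate_matches(list_b_names, list_a_labels)` yields candidate pairs only; as noted in the discussion, the output would still need human verification. Comparing every pair is quadratic in the list sizes, which may be too slow for large label sets without some pre-filtering.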
[09:44:33] Amir1: For list A I could probably do a database query to get all labels and aliases in all languages so more chance of finding something
[09:44:40] Is there a way to download English labels?
[09:44:56] multichill: that would be really good
[09:45:16] I just do a database query and I expect I might have some weird stuff in there
[09:45:51] it's okay
[09:45:54] I will handle that
[09:46:59] Amir1: The query is something like: give me every page that links to painter and give me all the labels and aliases for this page
[09:47:30] It will contain some wrong items (subclass of painter, etc), but that's a small number and it needs human verification anyway
[10:08:55] Amir1: Is this something you could work with?
[10:09:11] addshore: Didn't you have a bot or something to find failed merges like https://www.wikidata.org/w/index.php?title=Q18565822&action=history ?
[10:09:24] Of course
[10:09:38] multichill: nope :/
[10:09:40] I was hoping you'd give me the list so I can start working on it
[10:09:56] I have a script which restores deleted items that were merged and creates a redirect
[10:10:11] Right, that's probably the one I'm mixing it up with
[10:10:51] Will do Amir1. Is the form -