[08:00:44] Hi, I need some help with how to deal in Wikidata with an institution and its related building, for example the Greek National Library, Q1467610
[08:03:05] Is it best practice to create two different WD entries for the institution and the building, or not?
[08:04:31] And is this the right place for my question?
[08:46:13] Anyone here?
[08:47:57] Arch2all: yes, it is possible to create different items for the building and the institution
[08:48:31] (if you have anything meaningful to state about the building itself, like a heritage status for instance)
[08:49:24] Ok, but the related Wikipedia article (in most languages) is about the institution AND the building. How to link?
[08:50:35] intuitively I would expect the institution to be the main topic, so I would put the sitelinks on the institution (but that may not be true all the time)
[08:52:54] Are sitelinks unique, or is it possible to link from different WD entries to the same Wikipedia entry?
[09:08:16] So if I were to try building a docker image of wikibase 1.31, would I be expected to run into a bunch of problems around the move from PHP to pure JavaScript for some packages like wikibase-serialization and wikibase-data-values? Because I am :)
[09:09:36] https://github.com/dbs/wikibase-docker/tree/wikibase-1.31-attempt if you want to see my docker newbie attempts
[09:10:49] dbs: they're git submodules now, so a `git submodule update --init` should in theory do the trick, as it says in https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/README.md
[09:17:19] jakob_WMDE: ahh, helpful - so perhaps the 1.30 docker approach of grabbing the zip file of the release from github won't do the trick for 1.31
[09:19:06] https://stackoverflow.com/questions/34719785/how-to-add-submodule-files-to-a-github-release suggests that indeed that will no longer work, so I guess I'll try creating my own zip file for the time being
[09:19:12] jakob_WMDE++
[09:22:08] dbs: yup, I think before they were installed through `composer install`. I'm not sure what the best way to pull them in would be, if not through the git submodules. Getting the individual releases of the libraries one by one would work, but sounds a bit tedious.
[09:48:16] is anyone around who's familiar with pywikibot?
[09:50:10] I want to add references at the same time as the statement; the wikidata ui can do it, so there's no reason why pywikibot couldn't too... but I can't find a way of doing it, nor can I find a ticket asking for it
[09:55:59] I am trying to POST a query to wikidata, but I keep getting a 405 response. It works when I use GET https://phabricator.wikimedia.org/P7375
[09:57:11] The body of my request is: query=SEL…
[09:59:23] have you url-encoded the query?
[10:05:51] nikki: I have, yes
[10:07:34] nikki: https://phabricator.wikimedia.org/P7376
[10:14:16] hm...
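For comparison, a minimal Python sketch of the same kind of POST, not a confirmed fix for the 405: it sends the query as a form-encoded body to the public endpoint, the same shape as the working jQuery call that follows. The Accept header value is an assumption about what format the server should return.

    # Hedged sketch: POST a SPARQL query to the public Wikidata endpoint.
    import requests

    query = "SELECT * { ?item wdt:P31 ?p31 } LIMIT 1"
    r = requests.post(
        "https://query.wikidata.org/sparql",
        data={"query": query},  # requests URL-encodes the form body for us
        headers={"Accept": "application/sparql-results+json"},
    )
    r.raise_for_status()  # surfaces a 405 (or any other error) immediately
    print(r.json()["results"]["bindings"])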
[10:15:52] using jquery, $.post("https://query.wikidata.org/sparql?format=json", "query="+encodeURIComponent("select * { ?item wdt:P31 ?p31 } limit 1"), function (data) { console.log(data); }); works for me
[10:16:02] so maybe it's something to do with the accept header
[10:16:29] (I have the format in the url instead - might want to try that)
[10:30:51] addshore: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/445411
[10:30:54] nikki: I have a large query, so it exceeds the 2000 character limit :(
[10:31:06] nikki: Thank you for your help though :)
[10:31:26] I think I'll work on a smaller (than 2000 chars) query :P
[10:32:48] nikki: yeah, that's one of the numerous issues of pywikibot… there really isn't a satisfactory python library to interact with Wikibase yet
[10:33:09] if you can use Java instead, Wikidata-Toolkit is great
[10:33:23] it's definitely possible to post more than 2000 characters, since I've been doing that... but I guess I can't help if they leave :P
[10:33:56] it is also possible to use Wikidata-Toolkit from Python via Jython (that's what I do to create properties), but it's quite painful to set up and only works with Python 2.7
[10:33:59] prtksxna: it's definitely possible to post more than 2000 characters, since I've been doing that. I've never tried using accept headers though, I just put format=json in the url
[10:34:49] java would be a step too far for me, python is already foreign enough :/
[10:36:32] yeah, it's a different ecosystem…
[10:37:00] (I assume you can't use OpenRefine for that task, right? ^^)
[10:40:09] I'm not entirely sure what openrefine does, but probably not? I'm adding missing statements and adding or replacing references based on ids I already added
[10:40:48] which works fine, the main issue is just the sheer number of edits needed, so reducing the number would be nice
[10:45:37] nikki: OpenRefine lets you add arbitrary statements and references to items, and does that in one edit per item no matter what the changes are
[10:47:14] it is all explained at or both
[10:47:16] connected to the same vertex.
[10:47:32] oops, sorry, let's do it again:
[10:47:50] it is all explained at https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing
[10:53:37] nikki: you can also get a better idea of what it does by looking at the edit groups made with it, https://tools.wmflabs.org/editgroups/?tool=OR
[12:32:52] how do you write a query for a specific item? i.e. WHERE id = 'Q43'?
[12:33:51] use wd:Q43 instead of a variable
[12:34:36] e.g. select * { wd:Q43 wdt:P31 ?p31 } will select the p31 statements
[12:36:06] you can also use values ?item { wd:Q1 wd:Q2 } to set ?item to a set of values (which can be just one, if you want)
[12:38:41] oh ok
[12:39:29] thanks!
[12:39:44] :)
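A short sketch of the two patterns just described, using wd:Q43 from the log (the query service predefines the wd: and wdt: prefixes, so no PREFIX lines are needed):

    # Sketch: query a specific item directly, or pin ?item with VALUES.
    import requests

    query = """
    SELECT ?item ?p31 WHERE {
      VALUES ?item { wd:Q43 }   # one fixed item; list several to query more
      ?item wdt:P31 ?p31 .
    }
    """
    r = requests.post("https://query.wikidata.org/sparql",
                      data={"query": query},
                      headers={"Accept": "application/sparql-results+json"})
    print(r.json()["results"]["bindings"])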
[12:39:47] Lucas_WMDE: are you busy? was wondering if you can help me
[12:40:51] trying to work out how/where the "id" parameter is generated when adding a new statement, but I have no idea where to look
[12:43:03] it seems like wbcreateclaim lets you just pass the item id, but doesn't allow adding references at the same time, while wbsetclaim allows adding references at the same time but doesn't say how to specify which item to add it to...
[12:53:10] nikki: yes, wbsetclaim lets you add references at the same time, it just expects the JSON representation of the entire statement
[12:53:32] but you also need to provide a fresh statement id for the claim (which contains the item id)
[12:54:58] something like this: "Q55670660$3FB1FC8D-91D7-4676-A988-DDBCA7624083" (qid, dollar sign, statement UUID)
[12:59:37] do you know how the uuid is generated?
[13:01:19] like this: https://en.wikipedia.org/wiki/Universally_unique_identifier
[13:01:35] (it is not related to the content of the statement at all)
[13:01:48] version 4?
[13:02:11] not sure about the version
[13:02:29] I was hoping to find where it does it for the ui, but I can never find the code for anything when it comes to wikidata x_x
[13:03:17] there are libraries for that, you should not have to code it yourself
[13:03:35] in Java it is even part of the standard library
[13:03:53] have you had a look at OpenRefine, by the way?
[13:06:21] (sorry to insist, but I was working on the exact same thing 6 months ago to make this work in Wikidata-Toolkit and OpenRefine… so I have a vague hope that it could be useful to you)
[13:12:47] I mean to verify what it does... it's not the first time I've wanted to know how the ui does something and had trouble finding it
[13:13:56] ok, so how would you get the label of a specific item? i.e. Q42?
[13:14:33] and I looked at the page and it looks interesting, but it seems a bit silly to try it with the thing I'm currently doing when I already have something that's working
[13:15:00] okay
[13:15:50] but there are plenty more datasets out there... maybe for one of the next ones
[13:16:30] one thing I wasn't sure about: can it replace existing references and existing values?
[13:17:32] not yet, indeed
[13:18:42] ok, so it wouldn't actually work for what I'm currently doing anyway, now I feel less bad about not trying it yet :P
[13:20:15] davidwbarratt: you can explicitly select labels using ?item rdfs:label ?label filter (lang(?label) = "en"), or you can use the label service to automatically create label variables (by appending Label to the variable name, e.g. ?item will have ?itemLabel) with service wikibase:label { bd:serviceParam wikibase:language "en" }
[13:20:48] oooo!
[13:20:48] thanks!
[13:21:01] the label service accepts a list of languages, so you could say "en,fr,de" if you wanted, and it falls back to the item ID
[13:21:11] oh nice
[13:21:47] when selecting them explicitly, you will only get results where there is a label in that language; you can wrap it in optional { } if you want the label to be optional (and then you'll get a blank value for the label when there isn't one)
[13:37:37] \o/ I think https://github.com/dbs/wikibase-docker/tree/wikibase-1.31-attempt produces a working wikibase 1.31 image - need to sort out a WDQS redirect to HTTPS that shouldn't be happening, but the basics are there
[13:44:18] pintoch: I managed to persuade pywikibot to add the reference at the same time! thanks for the hint about the ids :)
[13:45:03] it's pretty horrible though
[13:56:03] great! let's hope that one day we'll have good libraries for that
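A sketch of building a fresh statement id as described above (qid, dollar sign, UUID). The UUID version was left open in the discussion; a random version-4 UUID, the common choice in client code, is assumed here.

    # Hedged sketch: compose a new statement id for wbsetclaim.
    # The UUID carries no information about the statement's content.
    import uuid

    def new_statement_id(item_id: str) -> str:
        """Return e.g. 'Q55670660$3fb1fc8d-...' for a brand-new statement."""
        return f"{item_id}${uuid.uuid4()}"

    print(new_statement_id("Q55670660"))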
[15:45:00] anyone here who can help me with my interwiki link question from yesterday, 21:30?
[16:08:27] joker234: the only thing I'm aware of would be to edit the dewiki page "Kokoswasser" to remove the redirect, link it in wikidata, then restore the redirect
[16:10:18] (as you can probably guess from that, linking to redirects or anchors is not officially supported; people have been arguing about whether it should be changed or not for years)
[16:17:16] nikki: oh. I suppose you mean linking Kokoswasser in wikidata and creating a redirect article Kokoswasser → Kokospalme#Kokoswasser.
[16:17:46] yes, except Kokoswasser -> Kokospalme#Kokoswasser already exists
[16:17:55] Damn, I knew there was a way to do this elegantly
[16:20:08] hmm, nope, wikidata says that I can't create the link to Kokoswasser because there is already a link from Q13187 using Kokospalme
[16:20:21] I think the wikidata tool is checking the redirects
[16:20:25] that's why you have to remove the redirect from Kokoswasser first
[16:20:49] ah
[16:20:53] uh, that's a hack ^^
[16:22:00] yep, that's what I meant by the bit I wrote in brackets
[16:22:38] hmm
[16:22:50] but it works, so the people who want links to redirects do it
[16:22:52] I suppose it takes a while to see the link in the english article?
[16:23:52] there's usually a small delay, but I don't know how long exactly. purging the enwiki page should make it update immediately
[16:26:51] okay, it took about 5 minutes
[16:27:09] thanks for your advice
[16:27:30] no problem :)
[19:42:52] so this is the query I have so far: https://query.wikidata.org/#%20SELECT%20%3Fimage%20%3Flabel%20%3FgenreLabel%20WHERE%20%7B%0A%20%20%20wd%3AQ817266%20wdt%3AP18%20%3Fimage%20.%0A%20%20%20wd%3AQ817266%20rdfs%3Alabel%20%3Flabel%20filter%20%28lang%28%3Flabel%29%20%3D%20%22en%22%29%20.%0A%20%20%20wd%3AQ817266%20wdt%3AP136%20%3Fgenre%20.%0A%20%20%20service%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%0A
[19:43:02] how would I concat the genreLabel so it's on a single line?
[19:58:51] by using group_concat() in the select and grouping by all the other variables at the end of the query
[19:59:04] gah, and group_concat seems to not work with the label service
[19:59:19] like this: http://tinyurl.com/y9dwbpfa
[20:01:36] omg, that's amazing!
[20:03:21] :)
[20:06:56] ok, well, that's good to know that it doesn't work with the label service (I can file a bug if you'd like me to).
[20:07:26] another question: if any statement can have more than one value, is there a way to ensure that I only get the first one? like, let's say I just want the first image and I don't care about the other ones?
[20:07:58] nikki: also, are you at Wikimania? if so I'd like to meet you (if I haven't already, haha)
[20:08:48] no, I'm not
[20:09:10] I think there was already a ticket about the label service, but I don't remember if it was the same thing...
[20:09:33] WikidataFacts: ping?
[20:10:13] nikki: the label service thing is known and expected behavior as far as I know
[20:10:27] you have to configure it explicitly in that case
[20:10:35] ah
[20:10:48] like this: http://tinyurl.com/y7gzsqf5
[20:11:32] davidwbarratt: to get *any* value, you can use the same pattern but with SAMPLE instead of GROUP_CONCAT
[20:11:42] but the query service doesn't have any idea what the first one is
[20:11:48] (unless it has “series ordinal” qualifiers or whatever)
[20:11:54] triples aren't ordered in RDF
[20:13:10] WikidataFacts: ooo! cool! thanks!
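A sketch pulling those pieces together, roughly reconstructing the film query from the log with the explicitly configured label service (Q817266, P18 and P136 are from the discussion; the separator and variable names are arbitrary):

    # Sketch: GROUP_CONCAT over genre labels plus SAMPLE to pick an arbitrary
    # image. Under GROUP BY, the label service must be configured explicitly,
    # via the ?genre rdfs:label ?genreLabel mapping inside the SERVICE block.
    import requests

    query = """
    SELECT ?label (SAMPLE(?image) AS ?anyImage)
           (GROUP_CONCAT(?genreLabel; separator=", ") AS ?genres)
    WHERE {
      wd:Q817266 wdt:P18 ?image ;
                 rdfs:label ?label ;
                 wdt:P136 ?genre .
      FILTER(LANG(?label) = "en")
      SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
        ?genre rdfs:label ?genreLabel .
      }
    }
    GROUP BY ?label
    """
    r = requests.post("https://query.wikidata.org/sparql",
                      data={"query": query},
                      headers={"Accept": "application/sparql-results+json"})
    print(r.json()["results"]["bindings"])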
[20:14:39] also, if there are statements with the rank set to preferred, the wdt: prefix will only return those... sometimes the data should be fixed, e.g. if there are two values and one is old, the current one should have the rank set to preferred
[20:15:33] but yeah, in general, sample() sounds like the best option
[20:15:55] ah, cool.
[20:16:18] an item will only have a single label per language, right? like there's no way to have two english labels?
[20:16:33] it can have an en label and an en-gb label
[20:16:35] but not two en labels
[20:16:49] yeah, any extra labels would have to be added as aliases
[20:17:24] * addshore just ran a bad query :P
[20:17:39] which you can query using skos:altLabel instead of rdfs:label
[20:18:04] oh ok, cool
[20:20:28] * nikki wonders what addshore is doing
[20:20:42] or rather, why you're running bad queries :P
[20:20:57] i accidentally ran curl http://wdqs1003.eqiad.wmnet:8888/bigdata/namespace/wdq/sparql?format=json&query= which apparently just dumps all of the triples.... with no timeout...
[20:22:01] I would guess that it *does* time out, but it manages to send *many many triples* to Varnish before it dies, and then Varnish takes more than 60 s to replay them to you?
[20:22:19] but also
[20:22:31] why are you running queries against the internal endpoint directly
[20:22:35] wat r u doing addshore
[20:22:47] being naughty, by the sound of it :P
[20:22:51] evil secrets
[20:23:09] WikidataFacts: to avoid the timeout, of course ;)
[20:23:23] just seeing what time this query actually completes in (not a blank query, I meant to put one there)
[20:23:34] oh right
[20:23:41] no varnish if you're talking directly to that host
[20:23:43] eeeeek
[20:24:07] but running empty queries from the WDQS UI just results in a parse error…
[20:24:30] “encountered "" at line 1, column 1”
[20:39:15] WikidataFacts: that's what I expected when I accidentally made the request
[20:39:23] instead it just started dumping all of the triples in json :P
[20:42:00] WikidataFacts: so this query actually completes super fast, it's the time spent downloading the result that makes it time out
[20:42:25] in fact, running the query within the cluster using curl, it's still downloading json and has been running for 4 mins....
[20:51:56] meh, when I run it on the cluster, it's still too much and I get com.bigdata.rwstore.sector.MemoryManagerOutOfMemory
[21:04:19] WikidataFacts: some way of chunking the results / chunking the downloads would be cool
[21:04:23] mmmhhmmpf
[21:04:37] just stumbled across https://github.com/dbcls/sparql-proxy which looks a bit cool
[21:04:43] you can use LIMIT/OFFSET, but it's tricky to make that work correctly
[21:04:59] because in general that just means it has to run the whole query and then cut out a part of the results afterwards
[21:05:03] WikidataFacts: yes, and as the order isn't set, you might not always get the correct offset
[21:05:21] yup, but the query itself takes a few ms to run, just downloading the results takes an age
[21:05:33] but if you LIMIT+OFFSET a subquery with just the first triple, it would be more reliable
[21:05:39] hm, ok
[21:06:26] and even if there is a "given" order or some inherent order, if our queries then go to different backend instances, then that is confusing too
[21:08:50] right, maybe sleep time
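A sketch of the skos:altLabel point from earlier: querying an item's English aliases alongside its label (Q42 is just an example item, and the OPTIONAL wrapper is an assumption so items without aliases still return a row):

    # Sketch: fetch the en label and en aliases of one item via WDQS.
    import requests

    query = """
    SELECT ?label ?alias WHERE {
      wd:Q42 rdfs:label ?label .
      FILTER(LANG(?label) = "en")
      OPTIONAL {
        wd:Q42 skos:altLabel ?alias .
        FILTER(LANG(?alias) = "en")
      }
    }
    """
    r = requests.post("https://query.wikidata.org/sparql",
                      data={"query": query},
                      headers={"Accept": "application/sparql-results+json"})
    print(r.json()["results"]["bindings"])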
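And a sketch of the subquery LIMIT/OFFSET chunking idea from the end of the log: page through a deterministically ordered inner selection, then join the rest of the pattern outside. The P31/Q5 pattern and the page size are arbitrary choices for illustration, and as noted above the ordering is only as reliable as the backend makes it:

    # Sketch: chunked downloads by paging an ordered subquery.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"

    def fetch_chunk(offset, limit=1000):
        query = f"""
        SELECT ?item ?p31 WHERE {{
          {{
            SELECT ?item WHERE {{ ?item wdt:P31 wd:Q5 . }}
            ORDER BY ?item       # a fixed order makes OFFSET meaningful
            LIMIT {limit} OFFSET {offset}
          }}
          ?item wdt:P31 ?p31 .
        }}
        """
        r = requests.post(ENDPOINT, data={"query": query},
                          headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()
        return r.json()["results"]["bindings"]

    # Page until a chunk comes back empty.
    offset, page = 0, 1000
    while True:
        chunk = fetch_chunk(offset, page)
        if not chunk:
            break
        # ...process chunk here...
        offset += page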