[13:02:41] WikidataFacts: do you know anything about the mw api search in the query service?
[13:03:05] not very much… I’ve used it once or twice, but it was a struggle
[13:03:09] ah
[13:03:14] it seems like srlimit doesn't do anything
[13:03:42] hm
[13:03:45] like http://tinyurl.com/y74hhm2e returns 50 results even though I set srlimit to 10
[13:03:57] and I can't work out if I'm doing something wrong, or if srlimit really is broken
[13:06:52] strange
[13:07:24] if I’m reading the source code correctly, the WDQS’ default for srlimit is "max", so if you don’t specify it at all, I think you should get 500 results, not just 50…
[13:07:56] I limited it to 50 because it timed out when I didn't
[13:08:18] oh, I see
[13:08:20] outer limit
[13:08:27] I didn’t notice that
[13:09:02] ok, I got 10000 results in 20s with srlimit and SPARQL limit commented out
[13:09:26] so I think the srlimit is effectively useless, because WDQS automatically continues the query anyways?
[13:09:37] so you’re just setting a different batch size
[13:10:07] that seems a bit silly
[13:11:15] yeah
[13:15:26] hi all
[13:15:48] WikidataFacts , abian ,
[13:15:52] SELECT DISTINCT ?name WHERE { ?person wdt:P31 wd:Q5. ?person wdt:P735 ?name. } = 28833 results, is this mean we only have 28833 name in wikibidia ?
[13:16:23] i can not belive we ony have 28833 identical name !
[13:17:46] so kindly mai have an advice
[13:21:50] beshoo: it’s actually slightly less, because some items apparently have “unknown value” as the given name
[13:21:56] http://tinyurl.com/ybuwjfzm 28634 given names
[13:26:54] it is very small dataset
[13:27:20] is there any way to get any name , but with gender at least
[13:27:55] It seems Wikedata is not that big source for names
[13:27:59] SELECT (COUNT(DISTINCT ?name) AS ?c) WHERE { ?name wdt:P31/wdt:P279* wd:Q202444 } -> 40368
[13:28:20] these are all given names (used or not)
[13:28:58] my database from USA university names more than that !
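(Context note: the srlimit discussed above is the standard MediaWiki full-text search batch-size parameter; WDQS keeps issuing follow-up requests with increasing offsets until the outer SPARQL LIMIT is satisfied, which is why changing srlimit only changes the batch size. A minimal sketch of the underlying API request — the parameter names are the real MediaWiki ones, but the endpoint usage here is illustrative, not the actual WDQS code:)

```python
from urllib.parse import urlencode

def search_request_url(text, srlimit=10, sroffset=0):
    """Build a MediaWiki full-text search request (list=search).
    srlimit is the per-request batch size; continuation bumps sroffset."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": text,
        "srlimit": srlimit,    # batch size per request
        "sroffset": sroffset,  # continuation offset
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

# WDQS (per the discussion above) effectively loops over sroffset itself,
# so srlimit=10 vs srlimit=50 changes how many such requests are made,
# not how many results the SPARQL query ultimately returns.
url = search_request_url("Douglas Adams", srlimit=10)
```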
[13:30:06] 40k name can not be usefull with AI system
[13:30:31] any other idea to get names with gender :D
[13:32:37] is giving name some time empty !
[13:34:29] https://www.wikidata.org/wiki/Q855 the given name is just Q30467509
[13:34:45] which is No label defined
[13:35:35] so i am thinkin about this , get full name + giving name
[13:35:43] which is avalabile
[13:36:08] if full name avalabile , but giving name is not , so we take full name
[13:36:24] WikidataFacts , what do you think ?
[13:37:13] well, it has no label defined in English
[13:37:21] it does have labels in cyrillic-script languages
[13:38:17] but it has full name
[13:38:32] https://www.wikidata.org/wiki/Q855 in the top of the page
[13:39:01] you got the idea , but i dont know how can i collect bigest data from wiki
[13:39:25] Joseph Stalin (Q855) <----- this is the full name correct ?
[13:39:44] no
[13:40:11] Then what it is ?
[13:40:23] I think you should discuss this with WikiProject Names
[13:40:32] but the closest thing to a full name is probably the “name in native language”
[13:40:46] which for that item is იოსებ ბესარიონის ძე ჯუღაშვილი (Georgian) or Ио́сиф Виссарио́нович Джугашви́ли (Russian)
[13:41:38] how can get all of thease opthins with gender , more data i have more usfull will be
[13:41:52] opthins = options
[13:42:08] i mean any name , but linked to gender
[13:42:27] i dont care what this name related to , but it hase to be for human
[13:43:42] ok hold a sec , the list which has "Joseph Stalin" name
[13:44:04] "In more languages"
[13:44:16] what is this and how can we get it ?
[13:45:23] can i search database and ask for DISTINCT list of all "In more languages"
[13:45:41] those are the labels of the item
[13:45:48] but there are lots of non-human items
[13:45:56] so just a list of labels won’t be useful for you
[13:46:33] but there are lots of non-human items = we are calling wdt:P31 wd:Q5.
[13:46:42] wich is it only human
[13:47:07] correct ?
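(Context note: the fallback proposed above — use the given-name label when it exists, otherwise the “name in native language” — could be applied client-side over already-fetched rows. A rough Python sketch under invented sample data; the “bare Q-id means no label” heuristic mirrors how label-less items render, but is an assumption, not a Wikidata API guarantee:)

```python
import re

def pick_name(row):
    """Prefer the given-name label; fall back to 'name in native language'
    when the given name has no usable label. `row` is one pre-fetched
    result dict (the field names here are hypothetical)."""
    given = row.get("givenNameLabel", "")
    # A label-less item tends to show up as its bare Q-id (e.g. "Q30467509"),
    # so treat that case as a missing label.
    if given and not re.fullmatch(r"Q\d+", given):
        return given
    return row.get("nativeName")

rows = [
    {"givenNameLabel": "Maud", "nativeName": "Maud Wagner"},
    {"givenNameLabel": "Q30467509", "nativeName": "Иосиф Джугашвили"},
]
names = [pick_name(r) for r in rows]
```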
[13:47:11] ok, sure
[13:47:45] so how can i ask for "humanwdt P31 wd:Q5 DISTINCT list of all "In more languages" labels of the item
[13:48:33] i am sure there is a way
[13:49:06] well, sure there is a way
[13:49:10] that doesn’t mean it’s a good idea
[13:49:16] but that would be http://tinyurl.com/ybwbh8hj
[13:49:48] well more data is what i need to feed the AI
[13:52:03] well now we back to how can we download all SELECT DISTINCT ?label ?gender WHERE { ?human wdt:P31 wd:Q5; wdt:P21 ?gender; rdfs:label ?label. }
[13:54:57] :)
[14:03:15] why this is not working ? the offset
[14:03:16] SELECT DISTINCT ?label ?gender WITH { SELECT * WHERE { ?given_name wdt:P31 wd:Q5. } OFFSET 10000 LIMIT 1000 } AS %slice WHERE { INCLUDE %slice. ?human wdt:P21 ?gender. ?human rdfs:label ?label. }
[14:03:54] it take forever :)
[14:04:25] is it correct how i use the offset !
[14:16:16] hello :)
[14:18:23] it seem we can not DISTINCT the rdfs:label ?label.?
[14:24:01] WikidataFacts , first thank you for http://tinyurl.com/ybwbh8hj , it is very good , but what we need is 2 things , 1- show unique name so if there is a duplication will gon , 2- find a way to download all of this set , which i think it is the offset
[14:24:19] but i test it and it is not working
[14:26:36] well the “duplication” is because the labels have different languages
[14:27:04] in RDF syntax that’s e. g. "Maud Wagner"@ca vs. "Maud Wagner"@de
[14:27:21] and the language is an important bit of information IMHO…
[14:27:38] I’m not sure if you’ll get useful results if you just mix all languages together
[14:27:54] it will be winderfull results
[14:28:26] if user type in jap , then i can know the name gender in jap way
[14:29:58] https://i.imgur.com/b4hYHW8.png see the same name !
[14:30:20] AHA
[14:30:34] "Maud Wagner"@ca vs.
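(Context note: the OFFSET/LIMIT inside the named subquery above slices the set of humans, not the final label rows, and without an ORDER BY the slicing is only as stable as the engine's result order — an assumption worth checking before relying on it for a full download. The client-side paging loop it implies can be sketched as:)

```python
def page_slices(total, batch):
    """Yield (offset, limit) pairs covering `total` items in `batch`-sized
    slices, i.e. the values you would substitute into the OFFSET/LIMIT
    of the named subquery, one request per slice."""
    offset = 0
    while offset < total:
        yield offset, min(batch, total - offset)
        offset += batch

# For ~3500 humans in batches of 1000, the second slice corresponds to
# "OFFSET 1000 LIMIT 1000", and the last slice is shortened to fit.
slices = list(page_slices(3500, 1000))
```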
"Maud Wagner"@de [14:31:05] this data is very valuable if i can manage it and download it [14:32:12] WikidataFacts> honistly , i will have useful results if i can download this [14:32:35] regarding duplication if there is no way to sort this out [14:32:43] well, you should be able to run that query against your local query service, right? [14:32:46] without the LIMIT [14:33:01] can we do offsit ? [14:33:13] since i did test it on my local server [14:33:19] and it it not working [14:33:50] it take very long time and memory limit [14:34:07] so if we can do some kind of offsit it will be wonderful [14:35:15] well we can do the same trick as yesterday: http://tinyurl.com/y8xser33 [14:36:04] i did that but it is not loading , Running query..................... [14:36:28] test it and see by your self [14:36:50] I tried that query and it returned results in about three seconds [14:37:13] local server ! [14:37:23] it works 279036 results in 62380 ms [14:38:32] you could try a shorter LIMIT, then [14:38:55] OFFSET 10 [14:38:55] LIMIT 1000 71112 results in 16344 ms [14:39:17] ok small limit speed it [14:39:35] so you think there is no way to remove dublecation [14:41:21] and i have to deal with it via my php to remove any dublecation [14:48:25] well, it is possible, just a lot less efficient [14:48:33] but since you’re using your local service anyways… http://tinyurl.com/y9mfqn6e [14:49:30] or, with lower limits: http://tinyurl.com/yajz6za2 [14:55:15] NICE [16:35:18] hey,.. :) https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440550/ [16:35:47] the wikidata dispatch monitor was checking for "100 to 299 or 600" [16:35:57] fixing it to "not over 599" [16:36:05] hmmm [16:36:15] hoo|away: ^ [16:36:35] the nice part is.. it is now like 3 minutes instead of 10 minutes the other day [16:36:40] and running from new hardware [16:37:04] we will have 2 checks in a moment.. first the one that checks for over 10m.. 
but it will be just WARN
[16:37:18] and a second that will be CRIT but only if it's really serious.. like 1h
[16:37:35] yay for parsing JSON with regex :D
[16:37:40] and the first one will be fixed to actually does what we thought it does :p
[16:38:27] yea, after this quick fix.. "replace check_http --ereg with bash/php/python/jq something and do normal integer comparison"
[16:38:30] i suppose
[16:38:35] ack WikidataFacts ^
[16:38:57] :)
[21:57:24] Is there a template that provides icon links to Wikidata+Wikipedia? E.g. Like https://www.wikidata.org/wiki/Template:Claim but with Wikipedia "W" or Globe icon, instead of sqid. -- I'm sure I've seen something like that somewhere, but cannot find it.
[21:57:46] WikidataFacts: I made https://phabricator.wikimedia.org/T197495 for that issue earlier btw
[21:57:55] I found another problem in the process :P
[21:58:38] "?num wikibase:apiOrdinal true." is supposed to add numbers so you can recover the original order of the results... except it resets for each page of results
[22:05:09] ah :D
[22:05:15] that also sounds like a bug to me
[22:05:30] (I assume you can work around the srlimit thing with a subquery, btw?)
[22:05:38] just created https://phabricator.wikimedia.org/T197496 for it
[22:06:15] it's not a very good workaround because it still wastes a load of time and makes my queries likely to time out :/
[22:06:29] (depending on the text I'm searching for, of course)
[22:10:16] for now I'm just using two normal mw api queries, I can't filter by properties that way though
[22:41:34] boohoohoo, quickstatementsbot is stealing me edits XD
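(Context note: the planned “normal integer comparison” replacement for the regex-based check_http --ereg monitor could be as simple as the sketch below. The thresholds come from the discussion above (“over 10m .. just WARN”, “really serious.. like 1h”), but the JSON field name is hypothetical — this is not the actual Wikimedia check:)

```python
import json

WARN_SECONDS = 10 * 60   # over 10 minutes -> WARN
CRIT_SECONDS = 60 * 60   # over 1 hour -> CRIT

def dispatch_status(payload):
    """Classify dispatch lag with a plain integer comparison instead of
    matching a regex against the raw JSON text. The 'lag' field name
    is an assumption for illustration."""
    lag = int(json.loads(payload)["lag"])
    if lag >= CRIT_SECONDS:
        return "CRIT"
    if lag >= WARN_SECONDS:
        return "WARN"
    return "OK"

status = dispatch_status('{"lag": 720}')  # 12 minutes of lag
```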