[00:52:59] Why is Mix-n-match not working for me? :/
[01:23:09] https://twitter.com/JonatanGlad/status/1007070779983319040
[06:35:32] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.086 second response time
[07:56:34] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1961 bytes in 0.077 second response time
[08:16:35] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.065 second response time
[08:33:52] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1971 bytes in 0.067 second response time
[08:53:26] blergh, it still hits 600 seconds
[08:59:02] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1967 bytes in 0.098 second response time
[09:31:42] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1973 bytes in 0.086 second response time
[09:57:12] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1969 bytes in 0.072 second response time
[10:04:31] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1967 bytes in 0.086 second response time
[11:05:24] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1947 bytes in 0.079 second response time
[11:12:33] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.081 second response time
[11:57:59] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1969 bytes in 0.076 second response time
[12:11:18] dear all
[12:11:45] I installed the Wikidata data, 440 GB
[12:13:16] on my 24-core, 128 GB RAM, SSD server
[12:13:24] and I want to download this:
[12:13:36] SELECT DISTINCT ?given_name (str(?given_nameLabel) as ?label) ?fnameLabel ?genderLabel ?countryLabel WHERE { ?given_name wdt:P31 wd:Q5. OPTIONAL { ?given_name wdt:P21 ?gender. ?gender rdfs:label ?genderLabel . FILTER(lang(?genderLabel) = "en") } OPTIONAL { ?given_name wdt:P735 ?fname. ?fname rdfs:label ?fnameLabel . FILTER(lang(?fnameLabel) = "en") } OPTIONAL { ?given_name wdt:P27 ?country. ?country rdfs:label ?countryLabel . FILTER(lang(?countryLabel) = "en") } ?given_name rdfs:label ?given_nameLabel . }
[12:13:49] how can I download this without any limit?
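For readability, here is the query beshoo pasted, reformatted with the semantics unchanged. Note that the variable name ?given_name is misleading: it is bound to each human (instance of Q5), not to a given-name item.

    SELECT DISTINCT ?given_name (STR(?given_nameLabel) AS ?label) ?fnameLabel ?genderLabel ?countryLabel
    WHERE {
      # ?given_name matches every human, despite its name
      ?given_name wdt:P31 wd:Q5 .
      OPTIONAL {
        ?given_name wdt:P21 ?gender .      # sex or gender
        ?gender rdfs:label ?genderLabel .
        FILTER(LANG(?genderLabel) = "en")
      }
      OPTIONAL {
        ?given_name wdt:P735 ?fname .      # given name
        ?fname rdfs:label ?fnameLabel .
        FILTER(LANG(?fnameLabel) = "en")
      }
      OPTIONAL {
        ?given_name wdt:P27 ?country .     # country of citizenship
        ?country rdfs:label ?countryLabel .
        FILTER(LANG(?countryLabel) = "en")
      }
      ?given_name rdfs:label ?given_nameLabel .
    }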
[12:18:17] Hi, beshoo
[12:18:21] Hello
[12:18:33] when I send this command, the server hangs for a long time
[12:18:36] and breaks
[12:18:51] java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.
[12:19:30] You should be able to download the data if you use QIDs instead of labels
[12:19:31] abian, hello.
[12:19:49] I need the labels!
[12:20:07] I guess they can be retrieved in a more efficient way
[12:20:10] what should I do with QIDs?
[12:20:16] But I should test
[12:20:59] I spent 4 days just to install Wikidata
[12:21:11] just to download this
[12:21:25] with "labels"
[12:21:25] Ahh, so you're running the Query Service locally
[12:21:31] Yes
[12:21:34] I misunderstood you, sorry
[12:21:37] on my own server
[12:22:01] my server is 24-core, 128 GB RAM, with an SSD
[12:22:11] That might be unnecessary with some optimization tips
[12:22:28] Kindly :)
[12:22:31] But, anyway, to remove the limit... perhaps WikidataFacts knows?
[12:22:48] WikidataFacts?
[12:22:55] when I remove the limit
[12:22:55] no idea, I’ve never run a local query server with Wikidata data
[12:23:32] https://i.imgur.com/pdvL9zC.png
[12:24:07] My command: SELECT DISTINCT ?given_name (str(?given_nameLabel) as ?label) ?fnameLabel ?genderLabel ?countryLabel WHERE { ?given_name wdt:P31 wd:Q5. OPTIONAL { ?given_name wdt:P21 ?gender. ?gender rdfs:label ?genderLabel . FILTER(lang(?genderLabel) = "en") } OPTIONAL { ?given_name wdt:P735 ?fname. ?fname rdfs:label ?fnameLabel . FILTER(lang(?fnameLabel) = "en") } OPTIONAL { ?given_name wdt:P27 ?country. ?country rdfs:label ?countryLabel . FILTER(lang(?countryLabel) = "en") } ?given_name rdfs:label ?given_nameLabel . }
[12:24:18] will never work on the website
[12:24:32] https://query.wikidata.org here
[12:25:21] that is why I installed it on my powerful server
[12:25:45] so I can get rid of the timeout
[12:25:48] but even then
[12:27:31] ok, let's say I will run the command in batches
[12:27:48] how can I say: load from 1 to 100000,
[12:28:00] and load from 100000 to 200000
[12:28:09] like LIMIT in MySQL
[12:28:20] I think if you add `-Dwdqs.throttling-filter.enabled=false`, it might work
[12:28:30] that should disable throttling, if I’m reading the code correctly
[12:28:35] (I haven’t tested it)
[12:28:45] add that to the Java command line, I mean
[12:28:58] hold on... let me understand
[12:29:05] in my command line
[12:29:09] I am doing this
[12:29:30] curl 'http://localhost:9999/bigdata/namespace/wdq/sparql?query=SELECT%20DISTINCT%20?given_name%20(str(?given_nameLabel)%20as%20?label)%20?fnameLabel%20?genderLabel%20?countryLabel%20WHERE%20{%20?given_name%20wdt:P31%20wd:Q5.%20OPTIONAL%20{%20?given_name%20wdt:P21%20?gender.%20?gender%20rdfs:label%20?genderLabel%20.%20FILTER(lang(?genderLabel)%20=%20"en")%20}%20OPTIONAL%20{%20?given_name%20wdt:P735%20?fname.%20?fname%20rdfs:label%20?fnameLabel%20.%20FILTER(lang(?fnameLabel)%20=%20"en")%20}%20OPTIONAL%20{%20?given_name%20wdt:P27%20?country.%20?country%20rdfs:label%20?countryLabel%20.%20FILTER(lang(?countryLabel)%20=%20"en")%20}%20?given_name%20rdfs:label%20?given_nameLabel%20.%20}&format=json'
[12:29:51] -Dwdqs.throttling-filter.enabled=false — where should I add this?
[12:29:56] you have to add that to the command line where you’re running the server
[12:30:00] not on an individual request
[12:31:16] if you’re using `runBlazegraph.sh`, then `BLAZEGRAPH_OPTS=-Dwdqs.throttling-filter.enabled=false ./runBlazegraph.sh` might work
[12:31:47] yes, I am using ./runBlazegraph.sh
[12:31:58] or ./runBlazegraph.sh -o '-Dwdqs.throttling-filter.enabled'
[12:32:10] ok, hold on, let me test
[12:33:41] A nearly identical query can also be run via LDF: https://ldfclient.wmflabs.org/#query=%0ASELECT%20DISTINCT%20%3Fgiven_name%20%3Fgiven_nameLabel%20%3FfnameLabel%20%3FgenderLabel%20%3FcountryLabel%20WHERE%20%7B%20%3Fgiven_name%20wdt%3AP31%20wd%3AQ5.%20OPTIONAL%20%7B%20%3Fgiven_name%20wdt%3AP21%20%3Fgender.%20%3Fgender%20rdfs%3Alabel%20%3FgenderLabel%20.%20FILTER(lang(%3FgenderLabel)%20%3D%20%22en%22)%20%7D%20OPTIONAL%20%7B%20%3Fgiven_name%20wdt%3AP735%20%3Ffname.%20%3Ffname%20rdfs%3Alabel%20%3FfnameLabel%20.%20FILTER(lang(%3FfnameLabel)%20%3D%20%22en%22)%20%7D%20OPTIONAL%20%7B%20%3Fgiven_name%20wdt%3AP27%20%0A%3Fcountry.%20%3Fcountry%20rdfs%3Alabel%20%3FcountryLabel%20.%20FILTER(lang(%3FcountryLabel)%20%3D%20%22en%22)%20%7D%20%3Fgiven_name%20rdfs%3Alabel%20%3Fgiven_nameLabel%20.%20%7D%20
[12:34:11] please use a pastebin...
[12:35:00] Nope
[12:35:07] it did not work
[12:35:31] java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.openrdf.query.QueryEvaluationException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.Exception: task=ChunkTask{query=15d40b7a-76d9-4aca-8ae7-e630392dcd3c,bopId=6,partitionId=-1,sinkId=8,altSinkId=null}, cause=java.util.concurren
[12:36:26] pastebin please
[12:36:36] the important part of the exception is generally at the end, and I can’t even see that because it got cut off
[12:36:48] Hold on, please
[12:39:08] https://pastebin.com/8HtRhsYe
[12:39:25] okay, that’s a different error
[12:39:28] out of memory
[12:40:11] 128 GB, and nothing else is running on the server
[12:40:29] I am dedicating the server to getting this download :)
[12:40:56] how can I allocate all the memory to this?
[12:41:32] or, as I said, can we send the command with pagination?
[12:41:58] so I can make a script that will loop to download the pages?
[12:42:23] Perhaps you have to adjust the memory of your JVM (?)
[12:42:40] I really don’t think throwing more resources at the problem is the best way forward here
[12:42:48] try to optimize your query instead, as abian said
[12:43:04] You probably don't need to retrieve labels for ?gender or ?country
[12:43:24] The number of possible values is low and you can retrieve these values later
[12:43:27] what can we optimize? it is straightforward!
[12:43:58] maybe gender, but ?country I need!
[12:44:01] straightforward ≠ efficient :)
[12:44:10] for example, you’re only going to see at most 19 values for the gender: http://tinyurl.com/y7f2fpf4
[12:44:20] so there’s no need to fetch the label of the gender for each of the millions of result rows
[12:44:22] do it afterwards instead
[12:44:24] afk
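A sketch of the two-step split WikidataFacts describes, assuming the same local endpoint: the main query returns bare QIDs for ?gender and ?country, and the handful of distinct gender items is resolved in a separate, cheap lookup. The VALUES list below is illustrative; in practice it would be filled from the first query's output.

    # Step 1: main query, no label joins for gender or country
    SELECT DISTINCT ?person ?personLabel ?fname ?gender ?country WHERE {
      ?person wdt:P31 wd:Q5 .
      OPTIONAL { ?person wdt:P21 ?gender . }
      OPTIONAL { ?person wdt:P735 ?fname . }
      OPTIONAL { ?person wdt:P27 ?country . }
      ?person rdfs:label ?personLabel .
    }

    # Step 2: resolve the few distinct gender QIDs afterwards
    SELECT ?gender ?genderLabel WHERE {
      VALUES ?gender { wd:Q6581097 wd:Q6581072 }   # male, female; add any other values seen in step 1
      ?gender rdfs:label ?genderLabel .
      FILTER(LANG(?genderLabel) = "en")
    }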
[12:45:30] can we page the data?
[12:45:37] like in MySQL?
[12:45:43] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1968 bytes in 0.095 second response time
[12:46:56] in MySQL I can do SELECT * FROM table LIMIT 0,100; this will give me the first 100 records
[12:48:05] and I can paginate the query, so I can download all the pages in a loop
[12:48:18] not asking for all pages at once!
[12:49:49] is there any function to do that?
[12:49:58] beshoo: You can use something like https://etherpad.wikimedia.org/p/un20QuKgUY
[12:50:25] Iterating with OFFSET
[12:51:15] Try to set a LIMIT that lets you get as many results as possible first
[12:51:15] let me read, please
[12:51:36] LIMIT works fine with 50000
[12:51:53] just 20 seconds of waiting
[12:52:40] abian, beshoo: LIMIT+OFFSET isn’t really a viable strategy with SPARQL as far as I understand
[12:52:50] because for the higher OFFSETs, it still has to compute all the earlier solutions
[12:53:43] But it's possible to get all the Q5s, the issue comes with the labels
[12:53:57] yes
[12:55:53] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1957 bytes in 0.077 second response time
[12:57:12] well, the purpose is the labels!
[12:57:20] WikidataFacts: Not sure what you mean exactly :/
[12:57:50] "because for the higher OFFSETs, it still has to compute all the earlier solutions"!
[12:58:06] Not their labels, I suppose
[12:58:10] What, is it a bug in the system?!
[12:58:22] Or does it? :(
[12:58:33] abian, beshoo: paging using OFFSET is inefficient. it should never be done on big data sets, even with SQL.
[12:58:47] abian: well, it would at least have to check whether the label exists…
[12:58:49] paging should always be based on a unique key
[12:59:08] but to be honest, I have no idea how many optimizations Blazegraph even applies to OFFSET at all
[12:59:20] because you really don’t want to use it
[12:59:45] (for simple queries, Linked Data Fragments are a better “paginating” solution AFAIU, see hoo’s comment earlier)
[13:00:01] WikidataFacts: with SQL, you can page based on a unique key. but with SPARQL that's not really possible, is it?
[13:00:38] well, I imagine if you only have a single triple pattern, it should still be possible to page efficiently
[13:00:43] but not for a full query, probably
[13:01:04] But https://etherpad.wikimedia.org/p/un20QuKgUY has just one triple inside
[13:02:52] abian: the optimizer does weird things to subqueries sometimes – a named subquery might work better: http://tinyurl.com/yavus6qg
[13:03:18] with that, an offset of 1M still seems to work
[13:03:25] in MySQL: SELECT * FROM tbl LIMIT 5,10;
[13:03:34] is there nothing like that here?
[13:03:43] and MySQL will not load all the earlier pages
[13:03:55] SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
[13:04:21] SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
[13:04:22] WikidataFacts: Okay, thanks! Named subqueries are still unknown territory for me :)
[13:06:44] Both seem to be working in the same way in this case, though
[13:07:44] really? ok
[13:07:53] (tbh I didn’t even try the non-named form :D)
[13:07:57] but how can that help me if I cannot see what wd:Q4992 is!
[13:08:01] what is this wd:Q4992?!
[13:09:12] You can write `SELECT DISTINCT ?given_name ?fname ?gender ?country ?given_nameLabel ?fnameLabel ?genderLabel ?countryLabel WITH {`
[13:09:30] ... instead of the old first line
[13:09:40] It seems to work equally well
[13:10:31] you mean take out (str(?given_nameLabel) as ?label)?
[13:11:29] WITH?
[13:11:32] http://tinyurl.com/yav8mukq
[13:12:02] Although it would be better with fewer labels
[13:12:46] SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } only gets English
[13:12:53] I need all languages!
[13:13:47] beshoo: then don't use the service.
[13:14:13] just ask for rdfs:label
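Combining the two suggestions above, a named subquery for paging plus plain rdfs:label instead of the label service so labels come back in every language, one page of results might look like this. The WITH { ... } AS %name / INCLUDE %name syntax is Blazegraph-specific; the LIMIT and OFFSET values are illustrative, and the ORDER BY adds cost but keeps the pages stable.

    SELECT ?person ?fname ?fnameLabel (LANG(?fnameLabel) AS ?langCode)
    WITH {
      # page over the people alone; the named subquery stops the optimizer from inlining it
      SELECT ?person WHERE { ?person wdt:P31 wd:Q5 . }
      ORDER BY ?person
      LIMIT 50000 OFFSET 100000
    } AS %page
    WHERE {
      INCLUDE %page .
      OPTIONAL {
        ?person wdt:P735 ?fname .
        ?fname rdfs:label ?fnameLabel .   # no language FILTER: labels in all languages
      }
    }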
= "en") since you've entered this channel :) [13:16:07] No , hold let me show you [13:16:08] We've just put your first triple in a subquery so that you can execute it [13:16:25] http://tinyurl.com/y8lmo8yv [13:16:37] フランソワ・ヴィヨン [13:16:51] (str(?given_nameLabel) as ?label) [13:17:56] Is that what you actually want? That's not the given name but the label of each person in Wikidata [13:18:32] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1962 bytes in 0.075 second response time [13:18:39] Well , i need to get all human names and the translation of the name on other lang. [13:19:03] Your given names are ?fname/?fnameLabel [13:20:05] ?given_name wdt:P735 ?fname. [13:20:10] yes [13:21:31] In that case, you can add again the FILTERs, as DanielK_WMDE suggests, but without filtering LANG(?fname) [13:23:03] please can you show me a demo [13:23:15] i am very new to this ! [13:24:28] My target : load all human name + full name + gender + country of human in all lang. [13:25:05] in all lang. is the translation of the name [13:25:16] https://i.imgur.com/7CKFP8y.png [13:26:59] Please can you help me with a good selelct ! since it i very complecated [13:31:22] beshoo: i don't think you can do that in one select. the result set is too large. [13:31:35] do a select to find all Qs. then split that up, and ask for the labels of each set [13:31:41] +1, now this explodes [13:31:52] you can use the plain web api for that instead of sparql, too [13:32:36] https://etherpad.wikimedia.org/p/un20QuKgUY [13:38:08] what is loading this ? [13:38:22] the XML ? [13:41:24] beshoo: What you want, you can use -H 'Accept: whatever' in your curl command [13:41:50] Where 'whatever' can be, for example, application/json [13:45:26] the one you sent is not show the givin name in other lang. [13:45:51] which is in my case (str(?given_nameLabel) as ?label) [13:46:50] correct ? [13:47:51] beshoo: can you clarify whether you *really* mean the given name proeprty and item, or you just want the person't full name in all languages? [13:48:02] do you need both? [13:49:04] well i only need giving name in all lang. [13:49:34] not the full name? [13:50:08] i'm confused... why do you need all *people* if you are only interested in all names? [13:50:13] can't you just list all names, then? [13:51:02] well i will tell you what i am doing ... i need all names to create an AI system to know the gender and the country of the name [13:51:50] and i need the name in all lang ! [13:52:12] that is the goal ! [13:52:49] Then you don't need to get all people unless you want to get the number of people having each name, and you should use a different (better) query for that anyway [13:53:42] that is correct , and i am new to this , and i dont know how to do it correct way ! [13:53:45] you want to infer gender and country from the given name or from the full name? [13:53:53] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1977 bytes in 0.078 second response time [13:55:21] just let me understand this : [13:55:22] https://i.imgur.com/7CKFP8y.png [13:55:30] what is these translations ? [13:55:37] giving name or full name ? [13:56:49] those are the labels of the item [13:57:02] the item of ? giving name ? 
[13:55:21] just let me understand this:
[13:55:22] https://i.imgur.com/7CKFP8y.png
[13:55:30] what are these translations?
[13:55:37] given name or full name?
[13:56:49] those are the labels of the item
[13:57:02] the item of? the given name?
[13:57:08] which for humans usually corresponds to the full name
[13:57:08] of the full name
[13:57:10] the item of the human
[13:57:37] nice, I need these items of the humans, plus the country & gender
[13:58:19] Are you sure? Labels in multiple languages are usually lots of identical strings or their exact transliterations
[13:58:38] but I don't understand why you want to do this whole process anyway
[13:58:47] the given name item already tells you the gender and the language(s), e.g. https://www.wikidata.org/wiki/Q13553631
[13:58:54] no AI needed :)
[13:59:23] let me show you this:
[13:59:24] wdt:P27 ?country.
[13:59:26] sorry
[13:59:30] https://instaranker.com/panel/gender6.php?
[13:59:41] open it and type a name any way you like
[13:59:51] type a name from your mind
[14:00:33] this will try to work out whether the name is FEMALE or MALE
[14:00:55] it uses 500k FEMALE and MALE names to learn from
[14:01:06] using ngram2
[14:01:44] so if I have a big database of first names of humans in all languages, then I can do a wonderful thing
[14:02:13] and if I have the country of each name, it will be a nice add-on
[14:02:25] that is all!
[14:03:21] Nice tool, beshoo :)
[14:03:33] Thank you
[14:03:50] and I want to give this tool to the community for free
[14:04:03] but I need the database to learn from!
[14:04:50] Then you don't need full names, you just need to count the number of every combination (name, gender, country)
[14:04:52] so if any user types a name in any language, I can tell what the gender is, even if they type a name not available in the dataset
[14:05:43] first of all, please excuse my English :) I am from Syria :)
[14:05:57] I don't follow you with "combination"
[14:06:18] Sorry, probably it's (name, language, gender, country)
[14:06:24] yes
[14:06:33] but the name in all languages
[14:07:08] "excuse my English" → No problem, mine is terrible too :)
[14:07:12] so I don't know how to do that, but wiki is the greatest DB ever made
[14:07:25] Yeah, definitely :D
[14:07:36] and I can make a wonderful free service for everyone to use
[14:07:47] and to download and learn from
[14:08:12] I spent $200 on this server to download wiki
[14:08:19] so no timeout
[14:08:35] so I am looking for a way to do that
[14:09:18] abian: with big respect to you, sir
[14:09:40] so as long as we have this DB, which is a GOD-given gift...
[14:09:59] I need a way to get all (name, language, gender, country)
[14:10:25] (name => in all languages, gender, country)
[14:11:02] so here’s what I would suggest: https://etherpad.wikimedia.org/p/ChzkT4FIzK
[14:11:22] for every given name item, get the language of the name and the gender and country of people with that name
[14:11:44] Please hold, let me test
[14:11:52] https://www.wikidata.org/wiki/Special:Contributions/186.5.238.213
[14:11:57] vandalism ^
[14:12:08] i.e. https://commons.wikimedia.org/wiki/Commons:Village_pump#Anti-Arpitan_vandalism_in_Wikidata
[14:12:48] happened yesterday
[14:14:13] abian: a lot of duplication!
[14:14:34] DISTINCT
[14:15:40] still a lot of identical rows
[14:15:57] aha, I think I know why
[14:16:21] country
[14:16:53] well, you want to have those duplicate rows, no?
[14:17:06] yes, yes, since the country is not the same
[14:17:12] if a name is mostly used for women but occasionally for men, you want to have more female than male rows so that you can know the ratio
[14:18:31] can I select all the Qs like https://www.wikidata.org/wiki/Q7411
[14:18:49] so in PHP I can replace each Q with its label
[14:19:01] and the same for the languages
[14:19:52] language and country labels
[14:28:35] ?
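What beshoo is asking for here is a lookup table mapping country and language QIDs to labels, to be joined back in PHP. Small queries along these lines could build it (a sketch: the class choices are an assumption, since wd:Q6256 "country" misses historical states and not every language item is an instance of wd:Q34770 "language"):

    # country QIDs with their English labels
    SELECT ?country ?countryLabel WHERE {
      ?country wdt:P31 wd:Q6256 ;
               rdfs:label ?countryLabel .
      FILTER(LANG(?countryLabel) = "en")
    }

    # language QIDs with their English labels (e.g. wd:Q7411 "Dutch")
    SELECT ?language ?languageLabel WHERE {
      ?language wdt:P31 wd:Q34770 ;
                rdfs:label ?languageLabel .
      FILTER(LANG(?languageLabel) = "en")
    }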
[14:29:44] since I don't know what each Q is: if I can list all country and language labels, then in PHP I can replace each Q with the correct label
[14:29:52] PROBLEM - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.094 second response time
[14:31:25] ^ turning that into a WARN now
[14:31:32] that will mean no output on IRC
[14:31:41] then we add a second check that turns CRIT after 1 hour
[14:35:37] ACKNOWLEDGEMENT - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1942 bytes in 0.071 second response time daniel_zahn .
[14:36:13] are WARNs still public somewhere else?
[14:36:55] in the web interface of icinga
[14:37:36] login is granted based on an LDAP group, not fully public
[14:37:48] but WMDE people already have icinga contacts
[14:38:16] ok, thanks
[14:38:31] WikidataFacts, any advice?
[14:39:03] how do I list all language labels and their Qs?
[14:39:09] and countries as well
[14:40:02] RECOVERY - wikidata.org dispatch lag is higher than 600s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1971 bytes in 0.071 second response time
[14:40:53] beshoo: Do you want the male-female ratio for each combination or not?
[14:41:26] can we get it?!
[14:42:09] Well, Ngram will do it by itself
[14:42:18] Ah, okay
[14:42:41] but now, if I cannot load labels, I need a way to know what each Q is
[14:43:18] so how can I tell wiki to give me the Qs and labels for every language
[14:43:32] so I can replace them later with PHP
[14:43:51] so Q7411 = Dutch
[14:51:25] do you think, per http://tinyurl.com/yafbg8nw, we only have 179242 names, including duplication?!
[14:52:01] beshoo: label languages are not identified by the language Q-id. they are identified by language codes (e.g. "nl" for Dutch).
[14:55:41] I remember there is LANG() or LANGCODE
[14:55:45] correct?
[14:55:53] to show the language codes
[15:23:30] dispatch lag down to 3.5 minutes :)
[15:23:49] \o/
[15:32:42] SELECT DISTINCT ?name WHERE { ?person wdt:P31 wd:Q5. ?person wdt:P735 ?name. } = 28833 results; does this mean we only have 28833 names in Wikidata?
[19:14:02] you really got me, you really got me, you really got me now understood (originally in garbled German)
[19:14:10] (sorry :D)
[19:17:10] minus the "understood" :P haha, it's bad no matter what anyway
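On that closing question: 28833 would be the number of distinct given-name *items* linked from humans via P735, not the number of name strings. A sketch that makes the distinction explicit by counting both the items and their label strings:

    SELECT (COUNT(DISTINCT ?name) AS ?nameItems) (COUNT(DISTINCT ?label) AS ?labelStrings) WHERE {
      ?person wdt:P31 wd:Q5 ;
              wdt:P735 ?name .
      ?name rdfs:label ?label .
    }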