[07:44:55] PROBLEM - WDQS SPARQL on wdqs1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[07:54:15] RECOVERY - WDQS SPARQL on wdqs1003 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.060 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[08:06:25] PROBLEM - WDQS SPARQL on wdqs1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[08:13:17] RECOVERY - WDQS SPARQL on wdqs1008 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.070 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[14:20:32] Is there any tutorial on how to use the Wikidata replica cluster in Toolforge (wikidatawiki.analytics.db.svc.wikimedia.cloud) to make queries with SQL? All I find is SPARQL. For example, how would you use that cluster to query something simple like "return all the instances of humans"?
[14:23:21] [mattermost] you don’t really query the structured data in SQL
[14:23:45] [mattermost] in SQL, it’s all just JSON blobs in the `text` table, which isn’t even included in the Cloud replica cluster
[14:24:18] [mattermost] that’s why you should use SPARQL instead ^^
[14:32:24] sdesalcala: making a list of humans is not something I'd try to use SQL queries for, as Lucas says. I've only ever used SQL queries when I needed to join with other tables that are only available there, but I'm not even sure that's possible any more.
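(For reference: the "return all instances of humans" question has a one-triple answer in SPARQL. A minimal sketch below builds that query and the corresponding GET URL for the public WDQS endpoint; the endpoint and the P31/Q5 identifiers are the real ones, but the helper function name is illustrative.)

```python
from urllib.parse import urlencode

# The standard "all instances of human (Q5)" query for the
# Wikidata Query Service -- P31 is "instance of", Q5 is "human".
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
}
"""

def wdqs_url(query: str) -> str:
    """Build a GET URL for the public WDQS SPARQL endpoint (JSON results)."""
    params = urlencode({"query": query, "format": "json"})
    return "https://query.wikidata.org/sparql?" + params

print(wdqs_url(QUERY))
```

(Note that this exact query is the classic example of a "simple but expensive" one: it matches millions of results, which is what leads to the timeout discussion below.)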
[14:36:32] [mattermost] strictly speaking, connections between items are available in the `pagelinks` table (the source behind [[Special:WhatLinksHere]])
[14:36:33] [1] https://www.wikidata.org/wiki/Special:WhatLinksHere
[14:36:46] [mattermost] and occasionally, it can be useful to query this or a few other meta tables, as Nemo_bis says
[14:36:56] [mattermost] but not very often
[14:37:22] Hi Lucas, but what if I just want to know the IDs, and I don't need the text? I want to try SQL because SPARQL gives timeout errors for certain queries (many of the simple-but-expensive kind, like returning all instances of humans)
[14:50:12] [mattermost] I don’t follow… the IDs have nothing to do with SQL vs. SPARQL?
[14:51:12] [mattermost] if you want to get all matches for a single triple, you can maybe use LDF: https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2FP31&object=http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ5
[14:59:04] You mentioned that the text contents of the items are not present in the Cloud replica cluster. However, I don't think I need them; I only need the IDs of the items the query returns. So, apart from that, is there any reason why I wouldn't want to query SQL instead of SPARQL? LDF is good, but I think I will also want to run more complex queries than that. Basically I'm looking into what one can do when SPARQL gives a timeout error because the query is too demanding.
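(The LDF link Lucas posted is just a templated GET request: one parameter per triple position, with an empty parameter meaning "match anything". A small sketch reconstructing that exact URL; the parameter names come straight from the posted link.)

```python
from urllib.parse import urlencode

# Linked Data Fragments request for "all subjects with
# P31 (instance of) = Q5 (human)"; an empty subject means "any".
params = {
    "subject": "",
    "predicate": "http://www.wikidata.org/prop/direct/P31",
    "object": "http://www.wikidata.org/entity/Q5",
}
url = "https://query.wikidata.org/bigdata/ldf?" + urlencode(params)
print(url)
```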
[14:59:37] [mattermost] SQL is not an alternative to SPARQL in a Wikidata context
[14:59:48] [mattermost] “text” is the name of the database table where MediaWiki stores the contents of each page
[15:00:01] [mattermost] and in the case of Wikidata/Wikibase, those page contents are blobs of JSON
[15:00:25] [mattermost] so all you have on the SQL level is a bunch of JSON blobs, one per item, with no connections on the SQL level
[15:00:33] [mattermost] you’d need to parse each JSON blob to find the connections to other items
[15:00:54] [mattermost] that’s why the data is also exported to Blazegraph, in RDF format, where you can run SPARQL queries on it
[15:01:13] [mattermost] and then Blazegraph (I assume) maintains the data in a more useful format, with tons of indexes etc., so that those queries actually perform well
[15:02:52] oooh so it's not even possible to make queries with SQL there. I see. Thank you for the clarification.
[15:02:52] However, the Wikidata Query Service times out for certain expensive queries. What can I do then?
[15:03:06] [mattermost] there’s not that much you can do, I’m afraid :/
[15:03:12] [mattermost] try to optimize your query
[15:03:32] [mattermost] there are some tips for that at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization
[15:04:19] [mattermost] if you’re very dedicated, you can also run your own instance of the query service, but you need seriously good hardware for that
[15:04:26] [mattermost] there was some documentation on that, trying to find it now
[15:04:44] [mattermost] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Standalone_service
[15:05:09] [mattermost] https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits/
[15:06:16] [mattermost] but optimizing your query is usually the best bet
[15:08:31] I see, thank you for the explanation.
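(To make the "you'd need to parse each JSON blob" point concrete: this sketch extracts the P31 "instance of" links from one Wikibase-style entity blob. The claims/mainsnak/datavalue nesting follows the published Wikibase JSON data model, but the blob here is hand-trimmed for illustration; real blobs are far larger, which is why doing this per-item instead of querying Blazegraph does not scale.)

```python
import json

# Hand-trimmed sketch of a Wikibase entity blob as stored in the
# `text` table -- real blobs also carry labels, sitelinks, references, etc.
blob = json.dumps({
    "id": "Q42",
    "claims": {
        "P31": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"type": "wikibase-entityid",
                                        "value": {"id": "Q5"}}}}
        ]
    }
})

def instance_of(entity_json: str) -> list:
    """Parse one entity blob and return the item IDs of its P31 values."""
    entity = json.loads(entity_json)
    ids = []
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement["mainsnak"]
        if snak["snaktype"] == "value":  # skip "novalue"/"somevalue" snaks
            ids.append(snak["datavalue"]["value"]["id"])
    return ids

print(instance_of(blob))  # → ['Q5']
```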