[09:03:57] Lucas_WMDE: according to https://tools.wmflabs.org/wikidata-todo/stats.php?reverse there are 37 million Wikipedia references and 77 million others, so 12 million doesn't seem that low for enwiki (there are probably quite high numbers for other Wikipedias too)
[09:04:58] also, what counts as a different reference? if they're the same except for the date retrieved, is that different? :P
[09:20:40] nikki: where do you get the 37M and 77M from? I don’t see them on that page (also # referenced statements ≠ # references)
[09:20:55] there's a tab for a table rather than a graph
[09:21:03] and the way I was counting, to be the same reference it has to have exactly the same snaks (property + value), and only those
[09:21:07] same hash, that is
[09:21:17] it’s still a lot of tables, I don’t know which one to look at :)
[09:21:30] oh, I see, one of the tables is just for Wikipedia references
[09:33:27] I was looking at the top one
[09:34:22] anyway, I suspect enwiki really is the top one; non-Wikipedia references are likely to have other stuff like dates/URLs/IDs
[09:35:16] I guess the only way to find out would be to extract them from a data dump
[09:35:23] could adapt the descriptions thing
[09:36:06] :D
[09:36:15] yeah, I doubt anything beats enwiki
[09:36:31] I thought perhaps VIAF or something, but that’s likely to have qualifiers or something
[09:50:30] yeah, I think those tend to have dates, at least the more recent ones
[10:17:40] is there any way I can set dates in properties (e.g. https://www.wikidata.org/wiki/Q18630113) to display in ISO 8601 format? I currently see them as DD MM YYYY, localized into Korean.
[11:18:08] how can I download all images from Commons based on a query?
[11:19:34] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1946 bytes in 0.107 second response time
[11:29:34] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1958 bytes in 0.132 second response time
[13:31:48] hi
[13:32:11] I need some help
[13:33:27] hello
[13:34:19] kouider: what is your question?
[13:35:20] I want to clean the history of my article
[13:36:42] which article? what do you mean by "clean history"?
[13:41:40] on this link https://ar.wikipedia.org/w/index.php?title=%D8%B3%D9%8A%D8%A7%D9%81_%D8%B3%D9%84%D9%8A%D9%85%D8%A7%D9%86_%D9%85%D8%AD%D9%85%D8%AF_%D8%A3%D8%A8%D8%A7%D8%A7%D9%84%D8%AE%D9%8A%D9%84&action=history
[13:41:56] I want to clean the edit history
[13:44:12] That is not related to the Wikidata project.
[13:44:35] Revision history
[13:44:47] ask someone in #wikipedia-ar
[13:44:55] /join #wikipedia-ar
[13:45:31] ok
[13:46:02] or info-ar@wikimedia.org
[17:56:55] how do I modify a query to ONLY QUERY FOR LIVING PEOPLE?
[17:58:47] ok, I have to filter by P570
[18:03:50] frettchen: MINUS { ?person wdt:P570 ?dateOfDeath. }
[18:05:39] WikidataFacts: How do I get only the first image of a person?
[18:06:12] GROUP BY ?person ?personLabel at the end, and select (SAMPLE(?image) AS ?image)
[18:06:17] but that will get you any image
[18:06:28] there’s no way to get the first one
[18:06:37] afaik statement order isn’t supposed to be meaningful anyway
[18:06:58] and why do I get so many duplicates: http://tinyurl.com/yclum63h
[18:07:31] remove line 4
[18:07:32] there's no such thing as "first" unless you add a qualifier like "ordinal number" or something
[18:07:39] or make it preferred
[18:08:35] frettchen: http://tinyurl.com/ybd8p45d
[18:09:45] oh thanks
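Putting those two tips together (MINUS on P570 for living people; SAMPLE plus GROUP BY for a single image per person), here is a minimal sketch of such a query. The wdt:P31 wd:Q5 restriction and the LIMIT are illustrative additions, not part of the tinyurl queries linked above:

```sparql
# Living people with exactly one (arbitrary) image each.
SELECT ?person ?personLabel (SAMPLE(?image) AS ?img) WHERE {
  ?person wdt:P31 wd:Q5;    # instance of: human (illustrative restriction)
          wdt:P18 ?image.   # image
  MINUS { ?person wdt:P570 ?dateOfDeath. }  # exclude anyone with a date of death
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?person ?personLabel
LIMIT 100
```

Without the GROUP BY, a person with several P18 statements shows up once per image; SAMPLE collapses them to one arbitrary pick, since (as noted above) there is no meaningful "first" image.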
[18:52:31] revi: no :( there was some talk of using ISO format for some languages on https://phabricator.wikimedia.org/T63958 but I dunno if/when that will happen
[19:13:52] SMalyshev, it seems the wikidata+osm service is stabilizing (thanks to you and Jonas_WMDE for all the help!). Now time to optimize it a bit :) Is there a way/need to optimize specific value storage in Blazegraph? e.g. if I have a very common subject prefix, how can I easily adjust Blazegraph to compact its storage?
[19:14:27] yurik: possibly, what does it look like?
[19:16:04] SMalyshev, https://wiki.openstreetmap.org/wiki/Wikidata%2BOSM_SPARQL_query_service#How_OSM_data_is_stored
[19:17:02] osmnode:1234 and such can definitely be optimized, the same way the Wikidata ones are
[19:17:26] SMalyshev, could you point me to the relevant code? Any suggestions are welcome :)
[19:17:33] check the WikibaseInlineUriFactory class and InlineUnsignedIntegerURIHandler
[19:17:57] and probably register the prefix in WikibaseVocabulary
[19:18:27] note that you want to version it, because data stores with different vocabularies are binary-incompatible
[19:19:49] so in the vocabulary you can add common prefixes and common predicates
[19:20:23] version the data store? e.g. if I add a new inline class, do I have to do a full rebuild of the index?
[19:20:52] yurik: yes, if you use a different vocabulary
[19:20:53] the vocabulary is specified in RWStore.properties
[19:21:05] com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=org.wikidata.query.rdf.blazegraph.WikibaseVocabulary$V002
[19:21:26] if you change it (and you should not change the class, you create a new one!) you will have to reindex
[19:22:17] since the vocabulary is a url->int mapping, and a different set of URLs would produce a different mapping
[19:22:25] so stored ints will no longer work
[19:22:51] makes sense. Is it possible to combine multiple classes in RWStore?
[19:23:08] yurik: you mean multiple vocabularies? Don't think so...
[19:23:31] so my class basically has to call your class to register Ps and Qs?
[19:23:32] the setting seems to be global, not per-namespace
[19:23:47] I put everything into one NS, so not an issue
[19:23:51] yurik: yes, you can extend my class
[19:24:01] just as I extend BigdataCoreVocabulary_v20160317
[19:24:23] note that it's only half the work - the other half is WikibaseInlineUriFactory
[19:24:24] gotcha. also, if you collapse everything into an int, how do you keep different types of data separate?
[19:24:44] e.g. P123 vs Q123
[19:24:51] yurik: read the comment in WikibaseInlineUriFactory :) it's not a single int :)
[19:25:02] tl;dr is that there are several ints
[19:25:18] and inlined URIs are stored as 1 flag byte, 1 (or 2) URI prefix bytes, and then the delegate data type
[19:26:30] sorry, I did read that file's comments, it's a bit unclear :)
[19:26:58] yurik: it stores a pair of (type, delegate) basically
[19:27:03] where type would be your prefix
[19:27:23] gotcha, so two ints, one for prefix, one for value. Sounds good to me :)
[19:27:36] so Q123 would be stored as (1, 123) and P123 as (2, 123), provided 1 and 2 are the ints that match Q and P in the vocabulary
[19:27:55] that's why you need to register stuff in the vocabulary - so it gets a stable small id
[19:28:14] (which also means you can't have a vocabulary that is too fat, but we're not near that)
[19:28:47] yep, it is all much clearer now, thanks :)
[19:28:51] yurik: yes.
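For anyone retracing this, a rough sketch of the vocabulary half, written as a new versioned class as SMalyshev advises. The package name, class name, and OSM prefix URIs are hypothetical, and the constructor and method signatures (addValues(), BaseVocabularyDecl) follow my recollection of the Blazegraph vocabulary API that wikidata-query-rdf builds on, so treat them as assumptions rather than confirmed API:

```java
package org.example.osm.rdf; // hypothetical package, for illustration only

import com.bigdata.rdf.vocab.BaseVocabularyDecl;
import com.bigdata.rdf.vocab.core.BigdataCoreVocabulary_v20160317;

/**
 * A versioned vocabulary adding common OSM prefixes on top of the core
 * Blazegraph one. Each change gets a NEW class (V001, V002, ...): the
 * vocabulary defines the url->int mapping baked into the store, so editing
 * an existing version makes old data unreadable and forces a full reindex.
 */
public class OsmVocabulary {
    public static class V001 extends BigdataCoreVocabulary_v20160317 {
        public V001() {
            super();
        }

        public V001(final String namespace) {
            super(namespace);
        }

        @Override
        protected void addValues() {
            super.addValues();
            // Register the common prefixes so each gets a stable small id.
            addDecl(new BaseVocabularyDecl(
                    "https://www.openstreetmap.org/node/",      // hypothetical
                    "https://www.openstreetmap.org/way/",       // prefix URIs
                    "https://www.openstreetmap.org/relation/"));
        }
    }
}
```

RWStore.properties would then point at the new class (which, as noted above, means a reindex): com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=org.example.osm.rdf.OsmVocabulary$V001 — the other half of the work is an inline URI factory along the lines of WikibaseInlineUriFactory, mapping osmnode:1234-style URIs onto the (prefix byte, unsigned integer) pairs described above.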
[19:29:04] hmm... could there be a significant IRC lag?
[19:29:26] yurik: be careful that the suffix part is short - if it's more than 6 bytes there's no point inlining
[19:29:38] I wonder why WD chose Ps and Qs... could it be because you should "mind your Ps and Qs"?
[19:30:02] P is just property :) Q has a more interesting story :)
[19:30:27] SMalyshev, I meant https://en.wikipedia.org/wiki/Mind_your_Ps_and_Qs
[19:30:51] yurik: yeah I know :)
[19:30:55] :D
[19:31:00] wow, the JSON dump is 224 GB now (unpacked)
[19:33:12] hoo, time to switch to PBF for data distribution :)
[19:33:49] SMalyshev, thanks for your help, I will try something out soonish. Thanks!
[19:34:04] np :)
[19:34:13] SMalyshev, the next step would be more complex: how can it store geometries?
[19:34:18] can it?
[19:34:28] yurik: well, we can store any string...
[19:34:43] yeah, but processing it would be painful
[19:34:51] it's a database :) making use of them (aka GeoSPARQL) is another question :)
[19:34:58] Blazegraph doesn't have GeoSPARQL support as of now
[19:35:17] did you have to implement all the geospatial stuff yourself for WD?
[19:35:21] e.g. point distance
[19:35:28] it has geosearch (aka "which points are near this point") but that's it
[19:36:06] our geospatial support is using that, but no operations like shapes, intersections, etc.
[19:36:27] it's just very basic indexing & distance calculations, no more
[19:36:56] I wonder if that old OSM RDF store can be reused for that
[19:37:04] they seemed to have implemented some geo handling
[19:37:30] anyway, this is a much bigger discussion, let's continue another time, after I solve the basic stuff. Thanks for your help!
[19:37:32] later
[19:38:28] SMalyshev, oh, and btw, the OSM+WD db was nominated at this year's OSM awards - https://blog.openstreetmap.org/2017/07/20/choose-the-best-among-us-at-osm-awards-2017/
[19:38:53] coolio!
[20:57:20] yurik: have some time for https://www.mediawiki.org/wiki/User:Graphoid ?
[20:57:42] he keeps telling me that nobody will help him
[21:36:29] nikki: thanks, that sucks (meh)
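To illustrate the geosearch capability SMalyshev describes in the 19:35 exchange, a minimal sketch of the "which points are near this point" search as exposed through the Wikidata Query Service. Prefixes like wikibase:, wdt:, bd:, and geo: are WDQS built-ins; the center point (Berlin) and the radius are illustrative, and the service syntax is given from memory of the WDQS documentation:

```sparql
# Items with coordinates (P625) within 5 km of a given point,
# ordered by distance from it.
SELECT ?place ?placeLabel ?location ?dist WHERE {
  SERVICE wikibase:around {
    ?place wdt:P625 ?location.
    bd:serviceParam wikibase:center "Point(13.3889 52.5170)"^^geo:wktLiteral.
    bd:serviceParam wikibase:radius "5".        # kilometres
    bd:serviceParam wikibase:distance ?dist.    # optional distance output
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?dist
LIMIT 50
```

This matches the "very basic indexing & distance calculations" described above; anything beyond that (shapes, intersections, full GeoSPARQL) is, as SMalyshev says, not available.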