[12:15:14] why does Suriname have "France" and "European Union" as bordering countries?
[12:16:08] or have i messed something
[12:16:38] it neighbours French Guiana, which is a French territory
[12:16:58] can you think of a way of filtering such an edge case out??
[12:17:19] im thinking more just about physically tangential nations
[12:17:52] in theory, these *are* physically tangential nations :)
[12:18:29] sshhhhh
[12:18:45] any magical wiki identifier come to mind?
[12:18:49] :D
[12:21:17] I guess you’re interested in Metropolitan France? https://www.wikidata.org/wiki/Q212429
[12:21:27] not sure if it has useful adjacency data though
[12:26:21] hmm i guess not, but thanks anyway
[14:01:09] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @James_F & @mooeypoo - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:51:08] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @James_F & @mooeypoo - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:17:23] hello, I have started using wikidata recently and I am very impressed by its performance. I am using other SPARQL endpoints at work and those do not perform as well as yours. I would like to learn more about your architecture and data modeling. Could somebody point me to a source with updated information? Thank you.
[15:18:48] miquel: would https://wikitech.wikimedia.org/wiki/Wikidata_query_service answer your questions?
[15:30:01] this is helpful pintoch, I am also interested in learning more details like if you cache results somewhere, SPARQL issues, things that you would change...
[15:30:18] maybe somebody made a presentation about this in the past and it was recorded?
[15:31:12] I’m not aware of any presentation specifically about its performance
[15:31:26] GET requests are cached (in Varnish, I think) for five minutes
[15:49:01] is there somebody that would be open to give an overview of model, sparql related issues, architecture,... in a hangouts or similar?
[15:52:13] this is the most detailed doc I found: https://iccl.inf.tu-dresden.de/w/images/5/5a/Malyshev-et-al-Wikidata-SPARQL-ISWC-2018.pdf
[16:10:08] nobody knows the details :p
[16:14:50] miquel: you might want to join the technical advice IRC meeting: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[16:15:26] thanks SothoTalker, I am there as well (#wikimedia-tech, right?)
[16:15:50] #wikimedia-tech sent me here actually :)
[16:15:57] really?
[16:16:13] bounce bounce :D
[16:17:27] well, i'm just a normal user, the only thing i know is that they make use of multiple servers
[16:51:35] i just found that we miss a lot of properties for species :p
[17:04:11] SothoTalKer: we certainly do :/ https://twitter.com/LucasWerkmeistr/status/1123633357131325441
[17:14:50] i actually want to know how much % i've edited of a query...
[17:15:13] that query assumes that no one else used the same reference as I did
[17:15:19] probably not going to work in most cases
[17:15:29] unless you use the history query service (which I haven’t played with yet)
[17:41:25] lucaswerkmeister: looks like there's some room for improvements
[20:36:04] lucaswerkmeister, matthiasmullie: are you around per chance? need some help with SDC entities setup, I am not sure I understand what's going on there
[20:37:59] around but not sure if I can help
[20:56:10] lucaswerkmeister: do you know how SDC/mediainfo works with namespaces/slots, etc.?
[20:56:28] not very well :/
[20:56:47] only that it uses a different slot in the same (File) namespace, as far as I know
[20:57:31] yep but what is the separate mediainfo namespace then?
[20:57:40] is it used or not?
[20:57:49] I’m not aware of one
[20:58:11] I don’t see it on https://commons.wikimedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces&formatversion=2 either
[20:58:37] on https://federated-commons.wmflabs.org/ there's namespace 144 with a bunch of M* entities
[20:59:08] hm
[20:59:14] might be set up differently there
[20:59:26] no idea if that wiki is even used, I remember it in the context of early tests years ago
[20:59:39] on Commons, the entity ID is M plus the page ID of the file page
[20:59:56] (which you can see on ?action=info, for example)
[20:59:57] lucaswerkmeister: yeah but what namespace is it?
[21:00:08] it’s not stored as a separate page at all
[21:00:31] ok but then how can I get from page to the actual entity? see https://phabricator.wikimedia.org/T222299
[21:01:28] i.e. if I have entity type, how I find entities of this type? previously it went via namespace and then just used page title
[21:01:35] but this doesn't work anymore for mediainfo
[21:01:54] hm, not sure
[21:02:18] a) no idea if Wikibase has code for that already, and b) no idea if it’s even possible to list pages with mediainfo slots at all
[21:04:50] hmm that's what we need for dumps
[21:05:08] otherwise there's no way to make a dump of mediainfo entities
[21:05:15] yeah
[21:06:51] I'm not even sure I understand where it's stored...
[21:06:56] looks like the `slots` table only has indexes starting with slot_revision_id, so getting all the mediainfo slots from there is not going to work
[21:07:04] couldn’t find anything in the page_props either
[21:07:45] so I have page table, with wikitext content type. But it does not say which slots are there...
[21:08:26] and slots table does not index by page id
[21:08:45] no, because slots are per revision
[21:08:51] a file can start out with only the main slot
[21:08:59] and then get a mediainfo slot the first time the structured data is edited
[21:09:02] btw why we have page and revision (singular) but slots (plural)? just for fun
[21:09:08] very inconsistent
[21:09:12] mw.org says plural is preferred
[21:09:26] but we have enough schema changes and other DBA work before we can even think of cleaning that up, I guess ☺
[21:09:37] no, fake news, mw.org says SINGULAR is preferred
[21:09:39] sorry
[21:09:46] oh fun
[21:10:10] it says the plurals are “historical” exceptions, but with `slots` that’s clearly not true
[21:10:15] anyways
[21:10:20] so, let's suppose I want to get a list of entities.
[21:10:34] how I even know if entity type is main type or slot type?
[21:10:53] e.g. items live in main slot but mediainfo lives in separate slot... how do I know?
[21:12:16] if I could then I guess I could use page_latest against slot table and maybe get somewhere
[21:12:53] of course I still need to get to slot ID which I am not sure how... and still not sure how I get entity ID from all that
[21:13:10] it looks like WikibaseMediaInfoHooks adds that to an $entityNamespacesSetting array
[21:13:14] not sure where that array ends up
[21:13:16] I mean mediainfo uses page id, but some other extension might use something else?
[21:13:35] yeah $entityNamespacesSetting[ MediaInfo::ENTITY_TYPE ] = NS_FILE . '/' . MediaInfo::ENTITY_TYPE;
[21:13:45] but I'm not sure what I am supposed to do with this
[21:14:47] I don’t know either
[21:14:57] I think you’ll need help from the Structured Commons people on this
[21:20:24] but also, joining page_latest on slot_revision_id will still require you to scan tens of millions of rows
[21:20:36] even though we’re currently only at a few tens of thousands of entities, I believe
[21:20:41] I was hoping for a way to avoid that
[21:20:50] but I couldn’t find one yet
[21:32:56] yeah not sure either... ok i'll keep digging
[21:33:01] thanks for helping
[21:33:07] good luck
[21:34:21] slots are indexed by slot_revision_id, slot_role_id so maybe it's not that bad
[21:36:20] I thought you’d need an index starting with slot_role_id for that to be efficient
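
The join floated at [21:12:16] and [21:20:24] (page_latest against slot_revision_id, then filtering on the slot role) can be sketched in Python against a toy in-memory SQLite database. This is only an illustration under stated assumptions: the table and column names follow the MediaWiki core schema but include just a subset of columns, the sample rows are made up, and the helpers build_toy_db and mediainfo_entity_ids are hypothetical names, not Wikibase code. The "M plus page ID" rule is the one stated at [20:59:39]. The sketch says nothing about the row-scan cost raised at [21:20:24].

```python
import sqlite3

NS_FILE = 6  # File namespace number in MediaWiki


def build_toy_db():
    """Create an in-memory database with a made-up subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT,
            page_latest INTEGER
        );
        CREATE TABLE slot_roles (
            role_id INTEGER PRIMARY KEY,
            role_name TEXT
        );
        CREATE TABLE slots (
            slot_revision_id INTEGER,
            slot_role_id INTEGER,
            PRIMARY KEY (slot_revision_id, slot_role_id)
        );
        INSERT INTO slot_roles VALUES (1, 'main'), (2, 'mediainfo');
        -- Page 101: latest revision has both slots.
        -- Page 102: latest revision has only the main slot.
        -- Page 103: first revision main only, latest gained mediainfo.
        INSERT INTO page VALUES
            (101, 6, 'A.jpg', 1001),
            (102, 6, 'B.jpg', 1002),
            (103, 6, 'C.jpg', 1004);
        INSERT INTO slots VALUES
            (1001, 1), (1001, 2),
            (1002, 1),
            (1003, 1),
            (1004, 1), (1004, 2);
    """)
    return conn


def mediainfo_entity_ids(conn):
    """Return 'M<page_id>' for File pages whose latest revision has a mediainfo slot."""
    rows = conn.execute(
        """
        SELECT page.page_id
        FROM page
        JOIN slots ON slots.slot_revision_id = page.page_latest
        JOIN slot_roles ON slot_roles.role_id = slots.slot_role_id
        WHERE page.page_namespace = ?
          AND slot_roles.role_name = 'mediainfo'
        ORDER BY page.page_id
        """,
        (NS_FILE,),
    )
    return [f"M{page_id}" for (page_id,) in rows]


if __name__ == "__main__":
    print(mediainfo_entity_ids(build_toy_db()))
```

Joining on page_latest rather than on every revision keeps files that only ever had a main slot out of the result, which matches the remark at [21:08:51] that a file can start out with only the main slot.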