[09:01:49] Lydia_WMDE: Will the future structured data for Commons be able to handle this? https://commons.wikimedia.org/w/index.php?title=File%3AWikimedia_Foundation_website_screenshot_-_21_September_2016.png&type=revision&diff=231531909&oldid=218546874
[09:18:30] or https://commons.wikimedia.org/w/index.php?title=File%3AArticle_recommendations.png&type=revision&diff=231538737&oldid=215358938
[10:50:08] Josve05a: it'll definitely be possible to make statements saying something like "contains: image x" and link that to the other image
[10:50:23] you could then also add the license as a qualifier or leave it out and get it from the linked image
[10:50:29] both are possible
[10:51:10] but determine if the licenses are in conflict with each other as well? If a GFDL-1.2-only image is used in a cc-by-sa-4.0 file etc. Could such a case be detected?
[10:51:43] should be possible to build some tools to find images with conflicting licenses yeah
[10:51:47] if we have the data in wikidata
[10:51:54] like license x conflicts with license y
[10:52:11] then you should in the future be able to query for images that have such a conflict
[11:00:24] woop
[15:38:01] Glorian_WMDE: I was wondering. Are you a volunteer who got hired or did you come from outside of the movement?
[15:44:42] Glorian_WMDE: https://sites.google.com/a/ucsc.edu/luca/
[15:45:46] multichill: I got hired from outside the movement. Why?
[15:47:43] are we a movement?
[15:48:15] There's no such thing as a movement. There are individual men and women and they contribute.
[15:55:00] ethylisocyanat: that's like saying there is no internet, just individual computers sending and receiving data.
[15:55:35] i personally prefer "community" in this context, but "movement" seems fine
[15:56:40] in the case of open source software i would speak of a movement, but not for wikipedia or wikidata
[15:56:52] too many contributors do not have the same goal
[15:59:02] ethylisocyanat: Semantics, movement and community are both acceptable.
[16:00:51] Glorian_WMDE: Because if you've been around as a volunteer, you probably ran into a lot of things volunteers do.
[16:02:26] DanielK_WMDE: Fosdem this year?
[16:05:04] o/ multichill
[16:05:14] Let me know if you want to chat about https://www.wikidata.org/wiki/Wikidata:Project_chat#Quality_Criteria_for_Building_a_Tool_to_Evaluate_Item_Quality
[16:05:26] o/ Glaisher
[16:05:34] woops. Meant to ping Glorian_WMDE
[16:05:45] Sorry Gla.isher
[16:05:46] halfak: Sure!
[16:06:10] 1. I made https://www.wikidata.org/wiki/Wikidata:Item_quality so that we don't need to make edits to Glorian's post as we iterate on this. :)
[16:06:42] halfak: Without taking a look at the history, how would you grade https://www.wikidata.org/wiki/Q28554539 ?
[16:06:50] 2. I'm not sure how to capture notions of quality in each level of the scale while staying subjective.
[16:07:04] * halfak is no wikidata expert.
[16:07:08] This might be embarrassing.
[16:07:10] But I'll try.
[16:07:48] E / Stub
[16:08:21] One reference is repeated for each item. Looks like this was pulled from a table somewhere.
[16:08:47] Maybe a D / Start
[16:09:03] I guess we'll have a lot of items with no reference at all.
[16:09:07] What do you think multichill ?
[16:11:04] Glorian_WMDE, we're discussing the quality scale.
[16:11:17] halfak: o/
[16:11:42] I like the version of the quality scale that Glorian_WMDE proposed because it says things like "many" or "few" rather than tying it to anything exact.
[16:11:58] But I'm not sure we've hit the sweet spot for WD items yet.
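(A minimal sketch of the license-conflict query idea discussed above at 10:51–10:52. No "contains" or "conflicts with" properties existed at the time, so wdt:PcontainsImage and wdt:PconflictsWith below are hypothetical placeholders; only P275, "copyright license", is a real property.)

  SELECT ?file ?fileLicense ?part ?partLicense WHERE {
    ?file wdt:PcontainsImage ?part .                 # hypothetical "contains: image x" statement
    ?file wdt:P275 ?fileLicense .                    # P275 = copyright license
    ?part wdt:P275 ?partLicense .
    ?fileLicense wdt:PconflictsWith ?partLicense .   # hypothetical "license x conflicts with license y"
  }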
[16:12:05] Hence my ping of Glorian_WMDE and multichill
[16:12:40] See https://en.wikipedia.org/wiki/Template:Grading_scheme for enwiki's scale to reference.
[16:13:01] halfak: *reading*
[16:13:12] I wonder if we might talk about what a "Reader's experience" would be for each level that we propose.
[16:13:42] In this case, we'd have to imagine a machine reader or a human reader or maybe a Wikipedian who is pulling in data using a Lua module.
[16:17:09] Are there any Wikidata editors who we really need to pull towards this conversation? Say, maybe the people who curate the lists of Showcase Items?
[16:19:24] Wasn't there a script that calculated the quality of a Wikidata item?
[16:22:15] i want to be the asshole
[16:22:29] but wouldn't it have been better not to import so much low-quality data
[16:22:39] than to introduce a quality meter now?
[16:23:00] sorry, I forgot the *don't*
[16:23:56] what about a mandatory waiting period of ~1h before starting any massive edit?
[16:30:42] typical, they get disconnected before I can answer...
[16:31:14] I was going to say that not adding the data would not have made the situation better
[16:32:27] I still think the biggest problem is that adding references is tedious and it's not even very satisfying, since your really good references are treated just the same as the "imported from wikipedia" ones
[16:37:44] halfak: I would definitely say multichill's item is higher than the lowest category, it has a bunch of statements, almost all of them have references and the references aren't just links to wikipedia
[16:38:16] nikki, gotcha. That's cool with me :)
[16:39:01] nikki, maybe you can take a pass at making the criteria reflect that?
[16:39:14] https://www.wikidata.org/wiki/Wikidata:Item_quality
[16:39:28] sjoerddebruin: the script does not calculate the whole quality.. it only measures completeness
[16:39:30] Maybe the best way to get going would be to collect a bunch of examples.
[16:39:53] In our case, I think we want a better approximation of quality/completeness than a simple set of rules.
[16:40:04] But starting with that script might help.
[16:40:19] it would be nice to have more references as additional evidence but it's still a big improvement on a lot of items
[16:40:43] halfak: the people who curate the showcase item criteria should be in this channel
[16:41:08] halfak: Sorry, got pulled away
[16:41:17] nikki, so we might say that having an external reference *at all* could push you above E/Stub?
[16:41:59] halfak: It's bot generated so I would say start class.
[16:42:28] multichill, what does bot-generation have to do with anything?
[16:44:33] hm... hard to say. I'm not sure I would say having an external reference is sufficient (if all the item has is "instance of human" with an external reference, it's still a stub, since you know basically nothing about the item)
[16:44:49] halfak: I try to generate a basic item with the bot. Not a stub (not enough info), but it shouldn't be rated higher than start
[16:45:29] I see. Bots have a higher threshold for item creation quality than humans?
[16:46:04] but I could also imagine that an item could have lots of statements but no references, in which case it's not really a stub, it's just in dire need of references
[16:46:23] halfak: sorry I gotta leave now.. I will read the log tomorrow
[16:46:50] nikki, agreed. How do we make that distinction clear in the criteria?
[16:47:10] Without saying something too specific so as to tie an assessor's hands?
[17:02:34] hmm... I would say that to get out of the stub class, the statements need to provide enough information to easily identify the item
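(A sketch of how the "lots of statements but no references" case discussed above could be checked against the query service. The wd, p, prov and wikibase prefixes are the standard WDQS ones; Q28554539 is simply the example item halfak was asked to grade, not a special case.)

  SELECT ?item (COUNT(DISTINCT ?statement) AS ?statements)
         (COUNT(DISTINCT ?referenced) AS ?referencedStatements) WHERE {
    VALUES ?item { wd:Q28554539 }                 # the item graded above
    ?item ?claim ?statement .
    ?property wikibase:claim ?claim .             # keep only statement predicates
    OPTIONAL {
      ?statement prov:wasDerivedFrom ?refNode .   # statement has at least one reference
      BIND(?statement AS ?referenced)
    }
  }
  GROUP BY ?item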
[17:02:50] halfak: No higher standard, but it would be a good reference for start
[17:03:29] Looking at the WP system: the lower levels should be able to be derived in an automated way. Good and featured always require human review
[17:07:49] nikki: that sounds like a good criterion (if we ignore external identifiers), but how do you measure that?
[17:08:45] we have properties for saying which properties are expected on certain items, I imagine that would be a good way of determining it
[17:11:28] oh and we also have constraints
[17:19:04] nikki: I use constraints a lot for quality control!
[17:20:43] Do you like the idea of linked items contributing to the quality of the item nikki?
[17:27:28] backlinks? not sure, it would definitely contribute to notability, but for quality it seems too variable
[18:00:46] nikki, I love this criterion: "the statements need to provide enough information to easily identify the item"
[18:01:02] It's both clear and easy to apply and does not rely on specifics of an item.
[18:06:35] PROBLEM - Check systemd state on wdqs1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:06:35] PROBLEM - DPKG on wdqs1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:35] PROBLEM - WDQS HTTP Port on wdqs2001 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:35] PROBLEM - Check systemd state on wdqs2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:06:42] PROBLEM - WDQS HTTP Port on wdqs1002 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:42] PROBLEM - WDQS SPARQL on wdqs2001 is CRITICAL: connect to address 10.192.32.148 and port 80: Connection refused
[18:06:51] PROBLEM - WDQS HTTP Port on wdqs2002 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:52] PROBLEM - DPKG on wdqs1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:52] PROBLEM - DPKG on wdqs2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:53] PROBLEM - WDQS SPARQL on wdqs2002 is CRITICAL: connect to address 10.192.48.65 and port 80: Connection refused
[18:06:54] PROBLEM - WDQS HTTP on wdqs2002 is CRITICAL: connect to address 10.192.48.65 and port 80: Connection refused
[18:08:31] RECOVERY - Check systemd state on wdqs1002 is OK: OK - running: The system is fully operational
[18:08:32] RECOVERY - DPKG on wdqs1001 is OK: All packages OK
[18:08:42] RECOVERY - WDQS HTTP Port on wdqs1002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:08:51] RECOVERY - DPKG on wdqs1002 is OK: All packages OK
[18:09:31] RECOVERY - WDQS HTTP Port on wdqs2001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:09:32] RECOVERY - Check systemd state on wdqs2001 is OK: OK - running: The system is fully operational
[18:09:42] RECOVERY - WDQS SPARQL on wdqs2001 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
[18:09:51] RECOVERY - WDQS HTTP Port on wdqs2002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:09:52] RECOVERY - DPKG on wdqs2001 is OK: All packages OK
[18:09:52] RECOVERY - WDQS SPARQL on wdqs2002 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
[18:09:53] RECOVERY - WDQS HTTP on wdqs2002 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
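(A sketch of nikki's "properties expected on certain items" idea from 17:08:45. Whether P1963, "properties for this type", is the mechanism she had in mind is an assumption; Q28554539 is again just the example item from earlier.)

  SELECT DISTINCT ?expectedProperty ?expectedPropertyLabel WHERE {
    VALUES ?item { wd:Q28554539 }
    ?item wdt:P31 ?class .                            # classes of the item
    ?class wdt:P1963 ?expectedProperty .              # P1963 = properties for this type
    ?expectedProperty wikibase:directClaim ?direct .
    FILTER NOT EXISTS { ?item ?direct [] . }          # expected but missing on the item
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }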
[19:26:48] so, what's the latest on how to run queries that hit the timeout?
[19:50:51] nikki: No I meant frontlinks, not backlinks! ;-)
[19:53:35] abartov: There's nothing really, I think
[19:53:54] If it's important, people with access to the systems can run them there manually
[19:54:36] abartov: You can start by sharing here and looking at the optimizer output
[19:56:57] what is "the optimizer"?
[19:57:01] ^ multichill
[19:57:55] I think what is meant is the explain output
[19:58:23] which is available if you add the "explain" query parameter (you need to call the endpoint directly, the GUI doesn't do it now)
[19:58:33] abartov: what is the query you have trouble with?
[19:59:08] SMalyshev, multichill: http://tinyurl.com/zbxnxy2
[19:59:48] SMalyshev: I remember some months ago, there was talk of upping the timeout once some new hardware is deployed. Has this happened?
[20:00:46] abartov: ?item wdt:P31 wd:Q5 . # human will already time out
[20:01:25] multichill: no, it worked fine, until I added the multiple-countries and badges conditions.
[20:01:26] multichill: The optimizer should look for that last
[20:01:50] wdt:P31 wd:Q5 is in most cases useless
[20:02:13] Can I rely on the gender line, since it's *supposed* only to apply to human females?
[20:02:18] abartov: I think that ?wen wikibase:badge … might cause trouble
[20:02:20] selectivity is extremely low, and all females are human anyway (for non-humans AFAIK another entity is used)
[20:02:26] I’m not sure if that’s the same ?wen as in the FILTER EXISTS
[20:02:48] and if it isn’t, that’s a giant cross product of all your results with every featured article ever
[20:03:07] abartov: also we have a property for sitelink count now afaik
[20:03:19] WikidataFacts: oh! That... would not be what I meant. Can anyone confirm that is the case? and/or suggest a fix?
[20:03:28] SMalyshev: cool. what is it?
[20:04:00] SMalyshev: repasting my question above: "I remember some months ago, there was talk of upping the timeout once some new hardware is deployed. Has this happened?"
[20:04:02] abartov: wikibase:sitelinks, see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_properties
[20:04:21] abartov: note it may be out of date though due to some unfortunate bugs
[20:04:23] abartov: this already doesn’t time out (8s): http://tinyurl.com/hyrsuxg
[20:04:36] but as SMalyshev says it can be improved with the sitelink count, hang on
[20:04:38] maybe I haven't tried enough, but I haven't noticed any negative effect from using wdt:P31 wd:Q5 when trying to select people, it's always something in the rest of the query for me (e.g. sitelinks seems to be quite slow for me)
[20:04:39] abartov: it is happening right now
[20:04:54] it is being configured/installed/etc
[20:05:14] but individual queries won't become faster because of it. otoh, we may set a higher timeout
[20:05:18] 2.6 s: http://tinyurl.com/zypht5o
[20:05:18] will http://tinyurl.com/zz3tzq7 do it for you?
[20:06:01] abartov: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization
[20:06:28] WikidataFacts: oh, very nice, thank you!
[20:06:51] also, if there's a fixed list, it should use VALUES not FILTER I think
[20:06:59] edoderoo: thank you, but no, I did need those other conditions.
[20:07:03] abartov: np – actually, the VALUES ?country also seems to make a lot of a difference, with the FILTER instead I get 21s
[20:07:09] SMalyshev: got it, thanks.
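(A simplified sketch pulling together two of the suggestions above: an explicit VALUES list instead of a FILTER over countries, and the wikibase:sitelinks page property from the RDF dump format page SMalyshev links to. The two countries are an illustrative subset, not abartov's full list, and the badge conditions are left out here.)

  SELECT ?item ?itemLabel ?sitelinks WHERE {
    VALUES ?country { wd:Q1033 wd:Q117 }     # Nigeria, Ghana -- illustrative only
    ?item wdt:P31 wd:Q5 ;                    # instance of: human
          wdt:P21 wd:Q6581072 ;              # sex or gender: female
          wdt:P27 ?country ;                 # country of citizenship
          wikibase:sitelinks ?sitelinks .    # page property: number of sitelinks
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
  ORDER BY DESC(?sitelinks)
  LIMIT 50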
[20:07:27] abartov, just add those extra filters, they didn't cause the time-out
[20:07:28] SMalyshev: We had some performance issues with VALUES. Unpacking in UNIONs turned out to be a lot faster
[20:09:05] multichill: you mean with BIND in the UNION or actually copying the entire query for each case?
[20:09:33] https://phabricator.wikimedia.org/T154280 this ticket
[20:09:34] abartov: also I think your filter exists is wrong... let me rewrite it
[20:10:03] nikki: thx
[20:12:46] abartov: are you sure btw this query returns any results? I see only 4 articles that satisfy your query that have any badges
[20:12:55] none of them have the featured badge
[20:13:26] abartov: see http://tinyurl.com/hg5oa2d
[20:15:04] SMalyshev: WikidataFacts suggested the ?wen outside the filter may not be the same ?wen (?)
[20:15:12] WikidataFacts: your query is wrong unfortunately - the ?wen clause finds any featured article, not a featured article about that specific person
[20:15:43] oops, you’re right :D
[20:15:53] abartov: yes, exactly. but the query above is correct, it just doesn't have any such articles as it seems http://tinyurl.com/hg5oa2d
[20:16:00] hm, 0 results with ?wen schema:about ?item
[20:16:19] that's what I'm saying - looks like there are no articles matching these criteria
[20:16:38] SMalyshev: yeah, it's certainly possible (albeit sad) there are no featured articles on women from those African countries.
[20:17:33] abartov: yup. there are 3 good and 1 featured list one though. Let me add sitelink counts and labels back....
[20:18:00] SMalyshev: got it, thanks.
[20:19:37] abartov: something like this: http://tinyurl.com/gm7q9ta
[20:20:17] (unfortunately, wikibase:sitelinks is not filled in for all items, will be soon once we get new hw and do a reload)
[20:20:38] oh and https://www.wikidata.org/wiki/Q7565408 is also a bug
[20:21:43] it's definitely not a human, no idea why it was done this way
[20:21:45] wow
[20:21:58] even the English description pretends it’s a human
[20:22:27] yeah it's completely messed up... not sure, maybe it's a wrong en sitelink? I'll look into it
[20:22:27] and the Dutch one if I read it correctly
[20:23:44] wow this one is messed up
[20:27:51] so, if I want to generalize this to all African countries --
[20:28:05] https://www.irccloud.com/pastebin/O6y6xGss/
[20:28:09] but it timed out
[20:28:59] I suppose if I collect the QIDs for all African countries it would be faster?
[20:30:55] abartov: yeah looks like there are 71 of them
[20:31:19] worked for me, 8 seconds: http://tinyurl.com/jtf3bax
[20:31:26] of course I might still have an incorrect query :D
[20:31:39] oops, line 14 probably shouldn’t be commented out
[20:31:42] abartov: but it also looks like it includes historic ones: http://tinyurl.com/j6yakeh
[20:33:37] abartov: this list seems to look reasonable: http://tinyurl.com/hvmgwt4
[20:34:29] woops except for https://www.wikidata.org/wiki/Q7204 - somehow it got into this...
[20:34:34] SMalyshev: that last one times out for me.
[20:34:52] I actually want to include historic countries, for my purposes.
[20:35:03] abartov: ah ok then we can drop the filter
[20:35:09] (women from historically African countries are African women)
[20:38:09] yes of course I just wasn't sure which ones you wanted to get. So let me see...
[20:38:55] abartov: I think then the list from here: http://tinyurl.com/j6yakeh should work reasonably well
[20:40:01] let me see if it works directly....
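(A sketch of the correction discussed at 20:15–20:16: the badge clause has to be tied to the same ?item via schema:about, otherwise it matches any featured article at all. Q17437796 is the featured-article badge item; the rest of abartov's conditions are omitted for brevity.)

  SELECT ?item WHERE {
    ?item wdt:P31 wd:Q5 ;
          wdt:P21 wd:Q6581072 .              # female
    FILTER EXISTS {
      ?wen schema:about ?item ;              # the sitelink must be about this very item
           schema:isPartOf <https://en.wikipedia.org/> ;
           wikibase:badge wd:Q17437796 .     # featured article badge
    }
  }
  LIMIT 100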
[20:41:13] nah too slow so maybe using the list is better
[20:41:53] if we exclude "no french article" though it's fast
[20:41:59] so let me see maybe I can make it better
[20:42:55] abartov: ok, here it goes: http://tinyurl.com/hs7pvfs
[20:42:59] seems to work fine
[20:43:24] abartov: the trick is that FILTER NOT EXISTS is much slower than OPTIONAL+FILTER(!bound) for some reason
[20:43:31] probably a bug
[20:43:37] I've noticed that
[20:44:34] abartov: so you still have three good articles with the "en, not fr" criteria. If you drop "no fr", you have 7 good and 2 featured ones
[20:44:51] SMalyshev: okay, nice, this works well, even with the full list of African countries. I finally have what I need. :)
[20:44:59] cool :)
[22:41:39] anyone here?
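(The "no French article" trick from 20:43:24 spelled out as a sketch: the two patterns are logically equivalent, but at the time the OPTIONAL + FILTER(!BOUND(...)) form was reported to run much faster on WDQS. Nigeria stands in for abartov's full country list.)

  SELECT ?item WHERE {
    ?item wdt:P31 wd:Q5 ;
          wdt:P21 wd:Q6581072 ;
          wdt:P27 wd:Q1033 .                           # Nigeria, as one illustrative country
    # Slow variant:
    #   FILTER NOT EXISTS {
    #     ?frwiki schema:about ?item ;
    #             schema:isPartOf <https://fr.wikipedia.org/> .
    #   }
    # Reportedly faster variant:
    OPTIONAL {
      ?frwiki schema:about ?item ;
              schema:isPartOf <https://fr.wikipedia.org/> .
    }
    FILTER(!BOUND(?frwiki))
  }
  LIMIT 100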