[01:27:22] PROBLEM - High lag on wdqs1003 is CRITICAL: 3658 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[02:50:12] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1139 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[12:06:00] Hi! Is it possible to query information mentioned in Wikipedia articles? E.g. the release date of a film in the standard "Home media" section of an article?
[12:14:38] bitbit: only if that information has been imported into wikidata
[13:06:07] DanielK_WMDE: is there any related project that I could do that with?
[13:27:12] bitbit: dbpedia might have it, that's the only thing I'm aware of that has extracted data from wikipedia articles more extensively
[13:59:02] nikki: thanks. I checked, I guess it doesn't have it. I wonder what a researcher would do to overcome this...
[14:15:13] bitbit: write a bot to import the data into Wikidata :)
[14:15:43] Sometimes even simple web tools like https://tools.wmflabs.org/pltools/harvesttemplates/ suffice
[14:19:45] Nemo_bis: I will check it out. Thanks! :)
[14:42:07] UGH I am SO frustrated with not being able to leave explanations when adding (or, especially removing) properties :/
[14:43:01] hm. is there a phabricator ticket about this?
[14:43:10] really really want this
[14:49:34] CatQuest: the API actually allows this. But making this nice in the UI is tricky. Also, the max length of the edit summary may have to be adjusted. Luckily, there no longer is a hard limit imposed by the database.
[14:49:56] I know this was discussed in the past. Not sure we actually have a ticket for it, though
[14:54:16] yes I know, there are people using (api enabled?) editing tools that do this. and being based on wiki(pedia?/media?) which obvs has it. 
but WD is very altered so as you say, the UI does not allow for it nicely :(
[14:54:59] in MusicBrainz we have the "edit note" which is in some cases mandatory (when doing un-revertable data) but else "strongly encouraged"
[14:55:47] CatQuest: I always wanted this, but since not many people asked for it, it was never high priority, so it was never done.
[14:55:56] If you want to push for it, you have my support :)
[14:55:57] if I have some time tomorrow I will try to make a phabricator ticket for it? :D
[14:56:03] i deff want to push for it :D
[14:59:07] SothoTalKer: ^ seem to remember you wanted that too
[14:59:25] CatQuest: if you want to generate interest, after making the ticket, write to Wikidata:Project_Chat and/or the mailing list
[14:59:37] * abian endorses :D
[14:59:52] halp mailing lists :D
[14:59:58] but :D
[15:01:08] abian: didn't you also think a historical rank would make sense? do you have any good examples?
[15:01:17] I was making some notes in case I make a ticket for it any time soon
[15:02:06] nikki: I don't remember right now
[15:02:36] Maybe someone else? 
[15:02:48] I think you said +1 when I mentioned it not too long ago
[15:02:58] so maybe you just liked the idea
[15:04:33] Looking for the string "historical rank" now :)
[15:10:18] nikki: Just found a +100 with <3, :) but not related, apparently
[15:10:42] ah, I probably got confused then
[15:11:06] < nikki> abian: I've seen that suggested before, not sure why it wasn't changed
[15:11:09] < nikki> also what a surprise, they're trying to mass-add descriptions
[15:11:12] < abian> Yes, curious :)
[15:11:14] < nikki> I really really wish we had automatic descriptions so people didn't need to do that
[15:11:17] < abian> +100 <3
[15:11:53] coincidentally I'm in the middle of trying to fix a load of mass-added descriptions >_<
[15:12:33] When you're in the middle of everything, that's no longer a coincidence :D
[15:13:15] I mean I'm alternating between talking in here and editing a file of descriptions to fix :P
[15:13:52] Ah, cool :)
[15:14:30] nikki: So tell us about your idea of a historical rank, is there a task about that? 
[15:14:52] there isn't one yet
[15:16:54] and the problem is that people keep misusing the deprecated rank for things which were correct but no longer are, when it should only be for things which were never correct
[15:17:57] and I think it's because people expect to edit the statement which is old data to mark it as old and less preferred, not edit the other statements to mark them as more preferred
[15:18:45] and because sometimes there aren't any newer statements to mark as preferred (which you can work around by adding a novalue statement, but that's not very intuitive either) or because they don't know which of the newer statements is best, or because there are so many newer statements that it's too much work
[15:20:27] so historical would be less preferred but not wrong and wouldn't appear in the query service when using wdt: (since you can't access start/end dates using wdt:) either, like how deprecated works, but with a different meaning
[15:21:54] I think it could be useful for identifiers, we have lots of those marked as deprecated because they were withdrawn or merged into another
[15:22:32] I assume that's partly because wikipedias don't want to list broken identifiers
[15:25:07] which I can understand but it's still semantically wrong... if I have one of those identifiers and want to find out which wikidata item it corresponds to, a deprecated statement means it's *not* a match for that item
[15:31:33] nikki: Sorry, I was eating an ice cream with priority "Unbreak now!" :)
[15:31:39] ... but reading you at the same time
[15:31:43] hehe
[15:32:01] nikki: I understand your point, but things get weird/unexpected when introducing qualifiers
[15:32:56] How would a normal statement A P B start 1910 end 1950 be different from a historical one with identical value and qualifiers?
[15:33:40] I find this a bit tricky
[15:34:30] how would that statement with preferred rank be different from the same statement with normal rank? 
:)
[15:34:38] I would expect them to be considered duplicates in both cases
[15:36:05] I'm not sure which rank should be the one to keep... I guess it depends on the situation
[15:36:38] you could say the highest rank wins, or you could say anything other than normal wins (since normal is the default whereas the others have to be set deliberately)
[15:37:44] how about if you have two identical statements, one with rank preferred and one with rank deprecated? >:D
[15:38:37] I guess ranks should be used for metadata (we say that [A P B starting 1900 ending 1910] is valid or deprecated) and qualifiers for data ([A P B starting 1900 ending 1910])
[15:38:43] Haha }:)
[15:39:01] "two identical statements, one with rank preferred and one with rank deprecated" → this would be inconsistent, I think
[15:39:31] We say something is invalid and we say that the same thing is valid
[15:39:47] yes, logically it doesn't make sense, but the website won't stop anyone from doing it
[15:42:14] Yeah :/
[15:42:41] But, returning to the problem... could this be solved if the UI tell users what's the purpose of the deprecated rank when setting it?
[15:42:50] *told
[15:43:10] *what the purpose of the deprecated rank is
[15:43:11] :)
[15:44:26] it might help, but I don't think it would solve the problem
[15:45:19] because it would require adding lots of preferred-rank novalue statements to override the old values, and there are still situations where it's difficult to know what to edit instead
[15:45:53] and there are various experienced users who use deprecated rank that way :/
[15:46:32] I often feel like I'm the only person who cares about the semantics :(
[15:47:41] Oh, no, I hope not :(
[15:48:01] What semantics would you propose for each rank when introducing the historical one? 
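[editor's note] The rank behaviour discussed above can be sketched in a few lines of Python. This is only an illustration, not Wikibase code: the `Statement` class and the `truthy` function are hypothetical names, and the "historical" rank is the proposal from this conversation, not an existing Wikidata rank. The sketch assumes the usual rule for wdt: (only best-rank statements are exposed: preferred ones if any exist, otherwise normal ones, never deprecated ones) and adds the proposed rank to the excluded set.

```python
from dataclasses import dataclass

# "preferred", "normal" and "deprecated" are the existing ranks.
# "historical" is hypothetical: it is the rank proposed in the chat.
EXCLUDED_FROM_TRUTHY = {"deprecated", "historical"}


@dataclass(frozen=True)
class Statement:
    value: str
    rank: str = "normal"


def truthy(statements):
    """Return the statements a wdt:-style lookup would expose.

    Excluded ranks are dropped first; if any preferred statements
    remain, only those are returned, otherwise the normal ones.
    """
    candidates = [s for s in statements if s.rank not in EXCLUDED_FROM_TRUTHY]
    preferred = [s for s in candidates if s.rank == "preferred"]
    return preferred if preferred else candidates


# A withdrawn identifier marked "historical" next to a current one.
identifiers = [
    Statement("old-id", "historical"),
    Statement("current-id"),
]
print([s.value for s in truthy(identifiers)])
```

With a historical rank, the old identifier stays on the item without being flagged as wrong, and no preferred-rank statement has to be added just to hide it from wdt:.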
[15:50:21] Assuming the current ones are: preferred (statement S is the most relevant/correct), normal (S is correct) and deprecated (S isn't correct)
[15:50:24] also there are situations where you know the existing value is out of date, but you don't know what the newer value is. for example, if someone gets divorced, you know that they're no longer married to that person, so when querying using wdt: you don't expect that person to show up as their spouse... but what statement do you add instead if you don't know their current marital status? novalue means they are not married, somevalue means they are
[15:53:19] You should use a qualifier (end time or similar), it would be your fault to assume that truthy values are valid at the present time
[15:53:41] Too often SPARQL queries don't return what their writers think they return
[15:54:11] But I see it as a problem of querying
[15:54:30] what's the point of using the truthy values if you have to always use p: in order to check qualifiers?
[15:55:49] and there are multiple qualifiers that could mean something is out of date
[15:56:06] You reduce the amount of data with which you have to deal, but I agree with you, :) perhaps the idea of "truthy" should be sophisticated
[15:57:38] Anyway, the lack of qualifiers shouldn't mean that something is current either
[15:59:46] preferred would be the most current or relevant, normal would be neutral (valid alternative values, values which haven't been checked, or simply ones where changing the rank hasn't been necessary), historical would be values known to be out of date and only useful if you're interested in using historical values, deprecated would mean it was never right
[16:00:05] the main thing I'm not sure about is when the whole item is something historical
[16:02:01] my instinct is that a historical rank would be for values which have been superseded by another value, even if we don't know what that value is (e.g. 
the spouse situation)
[16:02:39] and not this person died, now we have to mark all the statements on it as historical
[16:04:10] The present is infinitesimal and constantly moves on, and we shouldn't speculate about the future, so the history is the only thing we should save in Wikidata
[16:04:53] Entities about people can also be referenced or modified somehow over time, e.g. they can be awarded post mortem
[16:05:44] Or properties about their burials...
[16:07:36] Just when you say that something isn't historical, that can change immediately
[16:08:33] So I guess there's almost nothing we could say to be current
[16:08:39] we can't say for sure that something is current, but we can say that something is definitely not current
[16:09:28] That wouldn't make you sure that you retrieve current things when you use wdt:, which was the issue :/
[16:09:56] you can't be sure, no, but if you find something you know isn't, there's also no way to fix it
[16:10:05] (well, sometimes there are ways, but not always)
[16:13:42] If a historical rank were automatically set when there's an end date as a qualifier, would that be enough?
[16:16:12] not sure... it would get weird with historical items
[16:16:26] also if you want to catch all of them, it would need to be a bit more sophisticated (another example I've used is a population statement for 1900 and two for 2010... the 1900 one would clearly be less preferred than the 2010 ones even if you don't know which of those two should be marked as preferred)
[16:16:36] or perhaps someone used a different qualifier
[16:17:44] and people don't usually add end date qualifiers to old identifiers
[16:22:30] Pff... 
I don't know, the problem exists but the solution isn't perfect :)
[16:23:13] I would rather prefer to keep people informed about what's right and what's wrong, and about the purposes of the current ranks
[16:24:36] But hey, nice approach anyway :)
[16:25:41] Maybe it's me who's wrong and a new rank is the best solution
[17:07:45] Heh
[17:08:16] "population change measured by a new census" should have end time... but when :D I mean, the population didn't stop being exact the day a new census came out
[17:08:24] But probably the day after the census :D
[18:03:40] Sup channel
[18:03:50] I wanted to discuss this page: https://www.wikidata.org/wiki/Q694219
[18:04:01] Is that proper use of the label system?
[18:04:49] :O
[18:04:52] It's the only educational organization with more than 23 labels
[18:05:01] 52 labels, to be precise
[18:05:35] machine translated labels
[18:05:43] at least Finnish is nonsense
[18:06:11] Yep, Portuguese ones aren't very good either. But the worst part is that they're all stuffed as English labels
[18:06:28] let me fix
[18:07:22] https://www.wikidata.org/w/index.php?title=Q694219&diff=698223974&oldid=688905153
[18:08:32] Looking at the resource history, it was born this way, created by a bot. Tunyk did some editing, but didn't touch those 52 English labels
[18:09:06] ah you mean those aliases
[18:09:22] OOOOpsss sorry, yeah. The aliases
[18:09:38] label is the unique one. I meant that it had 52 English aliases
[18:10:30] see https://www.wikidata.org/w/index.php?title=Q694219&diff=351285865&oldid=351282890
[18:11:20] I'm not sure which of those 52 English aliases should be removed
[18:11:27] but it's surely too many...
[18:11:46] Gosh, it already had too many and he added even more. 
Didn't catch that in the history page
[18:14:38] Well, for starters I think I can remove all English aliases that are not actually in English
[18:15:40] yep it's a good start
[18:32:36] It's very frustrating, the inability to reorder aliases :-|
[18:36:12] I reduced it to the best of my common sense and good intents: https://www.wikidata.org/wiki/Q694219
[20:45:32] is there a way to filter the precision of the date in a query?
[20:48:22] something like http://tinyurl.com/ybc7recx
[20:49:20] (and the meaning of the different numbers is explained on https://www.mediawiki.org/wiki/Wikibase/DataModel#Dates_and_times)
[21:00:10] nikki: nice, but how can I filter it to only get results for a specific precision like 9?
[21:00:56] change ?precision to 9
[21:01:28] ah :)
[21:02:53] this stuff is so complicated for my old brain (:
[21:03:10] good job there's help at hand then :D
[21:03:45] i am very grateful
[21:26:06] wow, quite a lot of double entries o_o
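
[editor's note] The query behind the shortened link is not reproduced in the log, so the following is only a guess at its general shape, not the query nikki posted. It is a small Python helper that builds a SPARQL query string filtering a date property by precision, in the spirit of "change ?precision to 9". The function name `precision_query` and the example property P569 (date of birth) are assumptions for illustration. The sketch relies on the full value node reached through p:/psv:, which carries wikibase:timeValue and wikibase:timePrecision, and on the query service predefining those prefixes.

```python
import re


def precision_query(property_id: str, precision: int, limit: int = 10) -> str:
    """Build a SPARQL query for dates of one property with a fixed precision.

    Precision follows the Wikibase data model scale (9 = year, 11 = day).
    """
    if not re.fullmatch(r"P\d+", property_id):
        raise ValueError("property_id must look like 'P569'")
    if not 0 <= precision <= 14:
        raise ValueError("precision must be between 0 and 14")
    lines = [
        "SELECT ?item ?date WHERE {",
        # Go through the statement to the full value node; wdt: alone
        # exposes only the date, not its precision.
        f"  ?item p:{property_id}/psv:{property_id} ?node .",
        "  ?node wikibase:timeValue ?date ;",
        f"        wikibase:timePrecision {precision} .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


# Dates of birth known only to the year.
print(precision_query("P569", 9))
```

The printed text can be pasted into the query service; replacing the fixed number with a ?precision variable and adding it to the SELECT list shows the precision of every result instead of filtering on one.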