[04:20:01] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1117 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[13:07:05] PROBLEM - puppet last run on wdqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:45] RECOVERY - puppet last run on wdqs1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[16:30:12] I'm having trouble understanding what does and doesn't constitute notable data according to https://www.wikidata.org/wiki/Wikidata:Notability . As a hypothetical, if I were the owner of IMDB and wanted to publish all the films and actors I had stored to Wikidata - some of those entries would not meet criterion #1 (because they haven't got an article in another namespace), but they are clearly identifiable entities (#2) and they would all have properties associated with them, so even if they don't immediately contribute to an existing query, they are available to augment future queries against the dataset
[16:44:31] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/08#All_characters_from_all_movies? appears to at least partially address this concern with an example where somebody has included a very large cast list for a film and the consensus was in line with my appreciation of #2 and #3
[16:48:04] it's hard to define exactly what counts for the second criterion, so it tends to be decided on a case-by-case basis. it's usually enough if the item has multiple identifiers which aren't user-created or user-editable (i.e. people can't add themselves as self-promotion or a hoax)
[16:48:31] fixol: If it's quality data (complete, accurate, consistent, timeless enough, etc.)
and mainly has a sufficient number of good references, for me it's great that you include it :)
[16:48:42] And +1 nikki
[17:04:19] To extend the hypothetical, some of the films in the database would be large enough to be recognised in almost any vaguely related database, so mapping my dataset to identifiers in third-party databases would be possible; and I can see the value in doing that where possible. But there would certainly be items in my dataset that I would not be able to directly map to another third-party dataset, but these items would still be mapped to other items (publishers, directors, etc) so would not be completely "orphaned" as such
[17:06:06] in the case of those items, providing much more than first-party references might not be possible - at least at scale. Would this mean I could only offer a subset of records that other databases also have available?
[17:07:09] for the third one, if the item is used by another (already notable) item in a statement which has a reference, that should be enough, since you can't enter the information without having the item
[17:08:21] if there aren't any references, it will probably still be ok, but we don't encourage that :)
[17:12:07] By the item being used by another item meeting notability criteria, is that relationship one-way? For example, a B-movie which itself is not notable but has complete production information available, and which was published by a company that itself meets notability criteria, would not itself be notable because that relationship is not bi-directional?
[17:16:03] hmm...
I'm not sure it's entirely one-directional, but it's harder to argue against deleting an item which isn't being used by other items
[17:16:31] for example, if you delete the company in that example, the statement on the film saying what the production company is would break
[17:16:45] but if you delete the film, the company item wouldn't change
[17:18:24] but you could argue that the company is notable, therefore it should be possible to list the films they made via a SPARQL query, but that's a less convincing argument
[17:26:59] I see, so is this a particularly contentious scenario? Is it the sort of thing where there is a lot of having to defend complete data that is only currently useful in Wiki* projects by enhancing other metadata?
[17:29:20] Seeing as the main English Wikipedia doesn't currently have list-class articles or subsections driven by Wikidata, having "List of Films Published by ACME Corp" or "Filmography of John Doe" sections would be the best examples of use cases
[18:18:24] do edit filters for Wikidata exist?
[18:20:26] they do, but they don't work ideally
[18:25:05] for over a month, the entry on the Prime Minister of Portugal called him "MAIOR CORRUPTO DE PORTUGAL"
[18:31:17] I still think blocking anonymous editing for Spanish speaking languages is the most efficient thing to do.
[18:31:51] his item has two active watchlist holders :|
[18:36:45] the person who came asking about that said they were a representative of CERT.PT
[18:38:15] wanted to know if they could share the diff with the police
[18:51:11] Hi
[18:51:18] there we go
[18:51:33] sjoerddebruin - can you answer some questions about Wikidata for this person?
[18:51:45] If they give me the questions, of course :)
[18:52:17] In a nutshell
[18:52:30] Page of PT Prime Minister vandalized
[18:52:38] https://pt.m.wikipedia.org/wiki/António_Costa
[18:52:57] Only seen if you access it from a mobile device/browser
[18:53:35] The problem is a tag change from "Primeiro Ministro de Portugal" to "MAIOR CORRUPTO DE PORTUGAL"
[18:53:46] Yeah i know the situation
[18:54:32] We see the revert of this but the problem persists. Even after wiping the cache
[18:54:58] I don't see the vandalism anymore...
[18:55:28] hm
[18:55:59] sjoerddebruin - i wonder
[18:56:07] what if we were to impose another edit on that tag
[18:56:16] or, hm
[18:56:35] am I making sense here?
[18:56:51] It's some cache issue, not sure if server or local.
[18:57:39] Can't be local. It's not just me seeing it. Called another person who hadn't seen the page yet
[18:57:45] I think it is local; I do not see the vandalized description either
[18:58:01] But to give some idea why this kind of vandalism is hard to find: we got 92,000 edits done by anonymous users last month. That'll require volunteers to check at least 3000 edits a day in various languages. Most of the people who edit Wikipedia use desktop, so they don't see these descriptions sadly.
[18:58:05] Local
[18:58:11] Sorry ...
[18:58:29] We only patrol like 500 edits a day sadly. :| https://www.wikidata.org/api/rest_v1/page/graph/png/User%3ASjoerddebruin/0/734baac4c899a80fc967659dcace1db19981f28a.png
[18:58:49] No doubt about your tremendous effort
[18:59:46] Can local be ISP cache? I tried accessing through two different ISPs and no changes
[19:01:09] can you open the page with another browser?
[19:01:28] or logged out in a private tab?
[19:01:49] let me check again
[19:02:14] Are you sure that it isn't server cache? You probably have dozens of them
[19:02:43] if it was server cache, we would have the same
[19:03:08] Ok. Makes sense
[19:03:31] Let me check another thing.
Going to try from another IP (outside PT)
[19:04:40] Tried from a Netherlands IP and I see the same. Used hide.me
[19:05:15] I wonder
[19:05:19] I think the version is cached on your computer
[19:05:20] when was this first reported to you?
[19:05:51] This morning
[19:05:56] It does not matter which IP or ISP you use, it always loads from the same local cache
[19:06:06] Thus I was recommending using another browser
[19:06:17] I can see "MAIOR CORRUPTO DE PORTUGAL" too
[19:06:25] It's not you, CSIRT
[19:07:04] abian - where are you located?
[19:07:08] This might be https://phabricator.wikimedia.org/T207651... or not exactly
[19:07:11] Dragonfly6-7: Spain
[19:07:13] Listen. If the name of the PM is spelled "António" - watch the "ó" - the text is still there. If just "Antonio" then it's correct
[19:07:35] "António Costa"
[19:07:47] That's when the nasty words appear
[19:07:49] ....... oh!
[19:07:54] in that case
[19:08:11] but this wasn't a revert, abian
[19:08:48] Oh, indeed
[19:09:26] the insult appears if the accent is correct, but not otherwise?
[19:09:37] Yep
[19:09:44] hm
[19:09:49] sjoerddebruin - where are you located?
[19:09:54] nl
[19:10:00] PT
[19:10:10] you don't see it, but users in .pt and .es do
[19:10:10] hm
[19:10:11] I don't understand the issue about the accent
[19:10:24] There wasn't any change in that sense in the last days I think
[19:10:51] Does anyone see it without the accent?
[19:11:00] On Wikidata?
[19:11:14] what if we do a null edit
[19:11:53] It's fixed!!!
[19:11:53] Did some edit
[19:12:04] sjoerddebruin - you did a null edit?
[19:12:14] You can't do null edits on Wikidata.
[19:12:18] ah
[19:12:20] You have to do a constructive edit :P
[19:12:24] okay
[19:12:50] I already made two; do you see a difference on the ptwiki page?
[19:13:06] That makes it really hard, because sometimes the number of statements isn't updated for disambiguation or category items which are mostly already perfect
[19:13:23] I see it we'll now
[19:13:38] well
[19:13:40] CSIRT - you mentioned earlier that the police might get involved with this?
[19:14:02] I guess. Because it's the PM then it's a crime/offense
[19:14:18] Well the IP is visible so they can contact the ISP.
[19:14:19] They know about this already. They contacted us
[19:14:25] * Dragonfly6-7 nods
[19:14:42] yes, but they will probably ask you for evidence of the changes
[19:14:52] Can you preserve them?
[19:15:08] everything is preserved by default
[19:15:08] Logs and stuff
[19:15:15] Great
[19:15:25] one moment
[19:17:11] CSIRT - check your msg
[19:18:00] Amir1: I think https://www.wikidata.org/?diff=871052009 could have been detected by ORES, maybe you could have a look or use that example somehow for training?
[19:18:55] abian: thanks for telling me. I'll check. It definitely should.
[19:19:11] we still need help with labeling edits :"
[19:19:12] :|
[19:19:41] Amir1: Thanks! :)
[19:19:48] https://labels.wmflabs.org/stats/wikidatawiki/ sigh
[19:20:17] sjoerddebruin: But it's a pity that this kind of bad edit is unusual if we pick edits randomly
[19:21:08] Maybe it would be useful for Amir1 if we send him some problematic edits directly too (?)
[19:21:19] both would help
[19:22:04] I'm planning to use some new technologies in machine learning to improve our accuracy but it takes some time and I have a million things on my plate already...
[19:22:18] Guys, I can't thank you enough for all the help
[19:22:44] Please visit us if you come to Portugal. Just look for CERT.PT
[19:23:01] Beers on the house
[19:23:32] That would be awesome :D
[19:23:50] You're very welcome
[19:27:11] CSIRT: is the problem fixed now? not clear to me
[19:27:56] Yes it is. For now ...
:)
[19:28:13] Election period, so I'm guessing it might come back :(
[19:29:21] I am *SHOCKED* that someone might feel that the leader of their nation is corrupt.
[19:30:22] CSIRT: When does the election period finish?
[19:31:40] We can protect the Wikidata entity until it finishes (although this has some disadvantages, we may be preventing good edits too)
[19:34:17] election day will be in October, according to enwiki
[19:34:42] Yep. But there are EU elections next month
[19:34:53] So it's a full year with elections
[19:35:21] oy
[19:35:36] well, this person didn't edit *much*
[19:36:00] * MisterSynergy is still not a fan of these precautionary protections
[19:37:37] Guys, need to go.
[19:37:53] Again, many thanks for all the help
[19:38:38] Wasn't expecting so much support. Will give feedback on this to my whole organization.
[19:38:39] I've protected the entity for three months, but we'll keep an eye on it and reprotect if vandalism returns
[19:38:48] Btw. there is a user script "WikidataInfo.js" by User:Yair_rand which displays Wikidata descriptions in the Wikipedia desktop view; might be useful to see such problems quicker
[19:38:52] Cheers
[19:39:09] Thanks, CSIRT, read you :)
[19:39:30] Tks abian