[00:00:16] harej: not as far as I can tell
[00:00:28] still 60 seconds
[00:00:46] interesting, i saw a query that took 65000 ms to complete, but maybe that includes network lag
[00:01:06] I’ve had that happen a few times before
[00:01:28] also: journal article related stuff is very difficult to query for now that we have so many of them :s
[00:01:34] yeah :D
[00:01:43] (i disclaim any and all responsibility)
[07:50:50] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1954 bytes in 0.149 second response time
[07:55:50] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1950 bytes in 0.149 second response time
[09:33:05] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1950 bytes in 0.143 second response time
[09:34:40] ACKNOWLEDGEMENT - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1950 bytes in 0.143 second response time Volans Populate term_full_entity_id on www.wikidata.org T171460
[09:54:31] Hi, where can I find the RDF/Turtle dumps from six months ago?
[10:53:13] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1933 bytes in 0.145 second response time
[11:29:44] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1950 bytes in 0.090 second response time
[11:39:31] Ash_Crow: I believe that InternetArchive has some old dumps
[11:41:02] Ash_Crow: E.g. https://archive.org/details/wikidata-json-20160125
[11:41:48] Ash_Crow: The full list: https://archive.org/search.php?query=creator%3A%22Wikidata+editors%22&sort=-publicdate
[11:44:00] Amir1: DB lag seems to be high again....
[11:44:25] multichil: It's okay for now. I'm on it. We need it for some refactoring
[11:44:35] My bot stopped editing 23 minutes ago because of it
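(An aside for context: the database lag that well-behaved bots key off is exposed by the MediaWiki API via siprop=dbrepllag, so a bot can check it before editing. A minimal sketch, assuming the standard requests library; the 5-second threshold is an arbitrary illustration, not what BotMultichill actually uses:)

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def max_replag():
    """Return the highest replica lag (in seconds) reported by the API."""
    r = requests.get(API, params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "dbrepllag",
        "sishowalldb": 1,  # report every replica, not only the most lagged one
        "format": "json",
    })
    r.raise_for_status()
    return max(db["lag"] for db in r.json()["query"]["dbrepllag"])

# A bot could pause itself while lag is above some threshold (5s assumed here):
if max_replag() > 5:
    print("replication lag too high, pausing edits...")
```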
[11:48:23] multichil: don't we prefer the external id property over reference url for things like https://www.wikidata.org/w/index.php?title=Q2222804&diff=prev&oldid=544671458?
[11:48:46] (at least that's what we claim on https://www.wikidata.org/wiki/Help:Sources#Databases)
[11:49:51] I always use reference url. I don't trust that help page.
[11:50:07] I remember it containing all sorts of nonsense and useless information
[11:51:15] multichil: Can we slow down? I know it's lots of things, but this refactor is actually blocking several important things like structured image metadata and stuff
[11:51:43] Who can slow down what, Amir1?
[11:51:57] all bots
[11:52:11] All bots that respect maxlag just stopped editing
[11:52:27] Look at the big gap at https://www.wikidata.org/wiki/Special:Contributions/BotMultichill
[11:52:50] It's the Quickstatements users that keep pooping out 60 edits a minute
[11:53:17] Amir1: https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard#Maxlag_parameter_not_respected
[11:53:48] Thank you
[11:54:01] https://www.wikidata.org/wiki/Special:Contributions/Muhammad_Abul-Futooh <- like this guy
[11:54:29] https://www.wikidata.org/wiki/Special:OAuthListConsumers/view/e096baf9fd24cfe180275b52518d7403 can be set to disabled. That will kill *all* edits
[11:54:45] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1952 bytes in 0.129 second response time
[11:54:54] Or remove the high volume activity grant?
[11:56:04] yeah, I think that's possible. Who should approve that? Lydia_WMDE?
[11:56:19] https://www.wikidata.org/wiki/Special:ListGrants#highvolume only works if a user has the right afaik
[11:56:24] https://www.wikidata.org/wiki/Special:OAuthManageMyGrants/update/33429
[11:56:38] It's not that anyone can magically delete pages when they use widar
[11:57:04] Amir1: ^
[11:58:15] maybe we should try the diplomatic solution first and notify the developer to add a throttle, and then remove it if they don't comply?
[12:02:43] I think it's also causing other issues like https://phabricator.wikimedia.org/T173710
[12:23:09] tarrow: https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard#Maxlag_parameter_not_respected
[12:26:01] multichil: who does the most edits right now?
[12:30:10] Amir1: My guess would be https://www.wikidata.org/wiki/Special:Contributions/Muhammad_Abul-Futooh ; https://www.wikidata.org/wiki/User_talk:Muhammad_Abul-Futooh looks like he has been warned and blocked before
[12:32:31] Amir1: Are you still running the job? Because it seems like my bot has resumed editing
[12:32:50] no, I stopped the job
[12:33:06] you can find clues here: http://wikidata.wikiscan.org/?menu=live&date=6&list=users&sort=weight&filter=all
[12:33:45] Lydia_WMDE: Do you want to warn https://www.wikidata.org/wiki/User_talk:Muhammad_Abul-Futooh ?
[12:34:20] Amir1: if he has been warned before i think more is in order
[12:34:24] and that'd not be up to me
[12:34:36] okay
[12:34:46] let me check and issue a block if needed
[12:41:22] Lydia_WMDE: Thanks, I'm thinking about it now. I think the best thing to do is to offer an option to users to respect maxlag (i.e. let us know they aren't waiting for the result)
[12:41:47] tarrow: this should not be an option
[12:41:55] not respecting the lag is not acceptable
[12:42:10] it is causing all kinds of issues for us right now
[12:42:15] and for the community it sucks as well
[12:42:28] Sure, sorry about that
[12:42:36] yeah - it happens
[12:43:05] * Lydia_WMDE doesn't know why this isn't enforced on the mediawiki side instead of each tool having to do this...
[12:43:32] I thought that according to https://www.mediawiki.org/wiki/API:Etiquette I should only be looking at maxlag if the user isn't waiting for a result
[12:43:50] In fatameh the user *is* waiting for a result (generally)
[12:44:01] this one does not seem to be :D
[12:44:38] ah, that sucks. Is it so bad I should be taking the tool offline for a bit?
[12:44:57] depends if amir blocks them
[12:47:41] Who knows how quickstatements handles maxlag? I guess we (I) should do it in the same way?
[12:48:15] idunno
[12:49:49] Hi
[12:53:42] Ah, I see! Neither QS nor fatameh respects maxlag. I'm guessing (but not sure) that generally OAuth tools aren't looking at it and are assuming that the users will respect it themselves.
[12:57:57] thanks Amir1 :)
[12:58:52] It actually looks (a bit) like the fatameh users did this themselves yesterday: https://grafana-labs-admin.wikimedia.org/dashboard/db/fatameh?orgId=1 but they probably haven't dropped back enough
[13:01:24] my job is also putting pressure; I guess I need to wait until the sleep argument gets deployed. It's not easy to backport it :/
[13:02:07] well, it gets deployed tonight so meh
[13:02:08] sleep argument?
[13:02:57] I'm talking about T171460
[13:02:57] T171460: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460
[13:05:15] most item creations seem to be by https://www.wikidata.org/wiki/User:Research_Bot
[13:05:36] has this been through a community discussion that is suitable for this amount of new items?
[13:06:16] Amir1: how big are the batches you run with that script?
[13:06:40] leszek_wmde: one k; it's a good idea to reduce it
[13:07:18] Amir1: and that is 1000 with no time for the dispatcher to catch up etc., and another thousand right away?
[13:07:57] yeah, but my sleep argument fixes that, and it's not deployed yet
[13:08:20] yeah got it Amir1. I guess it might be worth stopping it indeed
[13:08:42] yeah
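(The "sleep argument" above belongs to the T171460 maintenance job, which is a PHP maintenance script; the sketch below is only a generic illustration of the batch-then-sleep pattern being discussed, with hypothetical names and assumed values:)

```python
import time

BATCH_SIZE = 1000  # "one k", the batch size mentioned above
SLEEP_SECS = 10    # assumed pause; the real value would be tuned to dispatch lag

def populate_batch(rows):
    """Hypothetical stand-in for writing one batch of term_full_entity_id rows."""
    ...

def run(all_rows):
    for start in range(0, len(all_rows), BATCH_SIZE):
        populate_batch(all_rows[start:start + BATCH_SIZE])
        # Give replication and the dispatcher time to catch up between batches,
        # instead of firing off the next thousand writes immediately.
        time.sleep(SLEEP_SECS)
```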
[13:16:56] Lydia_WMDE: Is there anywhere I can read about what OAuth tools in general do about maxlag? Because they aren't quite like bots. For example, if user A is making 100 edits/min they should probably respect maxlag, but if user B is making 1 edit/day and they're expecting an interactive response, the tool may become totally useless to them. If I understand right, they (user B) shouldn't respect maxlag (just like users making a single edit from the UI on wikidata.org don't respect it).
[13:19:43] i am not sure sorry
[13:25:34] I'm not sure that always respecting maxlag is the best solution. I will look at introducing it to fatameh (at least to stop the problems we have now), but if QS or fatameh were to introduce "always respect" (at say 5s), then whenever the lag goes above this, those tools will effectively be totally broken (until the lag drops). Perhaps this is fine, but I think that would suck for the tool users. Fatameh doesn't have a queue and AFAIK neither does QS, so all the tool would be able to do is error at the user (even a very light user who is effectively editing manually).
[13:26:32] I.e. it would be equivalent to disabling the tool and then turning it on again when lag drops
[13:36:25] Lydia_WMDE: I made a ticket for sorting this: T173921. You can let me know there if it sounds like a suitable solution from WMDE's end.
[13:36:25] T173921: Help users respect maxlag - https://phabricator.wikimedia.org/T173921
[13:37:23] tarrow: i'll discuss it with the team
[13:41:21] :) thanks! I don't want to cause problems. When we come to a solution we should add it to the OAuth tools page. I think this may be a problem that hasn't come up before, so there is no documentation for how OAuth tool writers should deal with it.
[14:53:12] tarrow: magnus had similar issues in the past
[14:53:18] and i believe he solved them
[14:53:32] or at least his tools are no longer as bad
[14:53:38] in that regard
[15:13:25] Lydia_WMDE: on the admin noticeboard quickstatements was listed as a worse offender. I'm sure we can come up with a good solution for both though. I think the key thing is to separate out users who are and aren't waiting.
[15:13:58] *nod*
[15:14:10] maybe magnus fixed another tool then. hmmm
[15:17:02] Really it should also be the concern of the bot operator/user, but the tools should be tuned to make it as easy as possible for them to be good citizens.
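(For tool authors following this thread: respecting maxlag amounts to sending the parameter with every write and backing off whenever the API rejects the request with a maxlag error. A minimal sketch; the retry count and fallback wait are assumptions, and production code would also want to honor the Retry-After header:)

```python
import time
import requests

API = "https://www.wikidata.org/w/api.php"

def api_post(session, data, maxlag=5, retries=5):
    """POST to the API with maxlag set, backing off while replicas are lagged."""
    data = dict(data, maxlag=maxlag, format="json")
    for _ in range(retries):
        result = session.post(API, data=data).json()
        error = result.get("error", {})
        if error.get("code") != "maxlag":
            return result
        # The server says it is lagged; wait and retry instead of piling on.
        time.sleep(error.get("lag", 5))
    raise RuntimeError("replication lag stayed above maxlag; giving up")
```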
[16:49:33] do we know what the most common reference on Wikidata is?
[16:49:34] “imported from: English Wikipedia” is used twelve million times (12161850×; see http://tinyurl.com/y7jnss8u), which actually doesn’t feel like *that* much to me
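(The shortened link presumably resolves to a WDQS query along these lines; this is a reconstruction of that kind of reference count, not necessarily the exact query behind the URL:)

```python
import requests

WDQS = "https://query.wikidata.org/sparql"

# Count statements whose reference contains "imported from Wikimedia project
# (P143): English Wikipedia (Q328)".
QUERY = """
SELECT (COUNT(*) AS ?uses) WHERE {
  ?statement prov:wasDerivedFrom ?ref .
  ?ref pr:P143 wd:Q328 .
}
"""

r = requests.get(WDQS, params={"query": QUERY, "format": "json"})
print(r.json()["results"]["bindings"][0]["uses"]["value"])
```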
[17:22:08] I see that there's an issue open for respecting maxlag in WikidataIntegrator, which is what I use
[17:23:15] hey folks, our research showcase today features Andrew Su, from the Gene Wiki Project
[17:23:28] live stream starts in an hour (or you can watch it later): https://www.youtube.com/watch?v=Fa0Ztv2iF4w
[17:26:54] https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#August_2017
[17:42:33] * Lydia_WMDE gets panic attacks when looking at https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel?refresh=30m&panelId=3&fullscreen&orgId=1&from=1440351665808&to=1503510065808
[17:46:18] harej: yes, it doesn't seem to respect maxlag. This means you need to think about that yourself. Unfortunately I'm travelling, but whoever gets time first would be eternally loved for writing the patch which sets a user-configurable maxlag (you just need to tack it on to the end of every API call)
[17:51:19] Is this the correct usage of P971? https://www.wikidata.org/wiki/Q7766663
[17:56:39] Lydia_WMDE: yep, and I wonder how MariaDB will scale in the future (as it cannot scale out of the box...)
[17:57:21] Envlh: yeah...
[18:09:38] Lydia_WMDE: remember that graph isn't 0 at the axis, so it looks worse than it is
[18:10:15] https://usercontent.irccloud-cdn.com/file/vPorbQHH/image.png
[18:10:18] addshore: it is still the steepest increase by far we've ever had, except maybe the very beginning
[18:10:41] Lydia_WMDE: yup
[18:10:56] and it is not backed up by additional tools and usage of the same magnitude
[18:11:02] or a huge increase in editors
[18:11:03] Envlh: we still have a long way to go before we run into issues there I think
[18:12:47] addshore: aren't we already hitting replication lag?
[18:13:42] I wouldn't say that's a MariaDB scaling issue though
[19:17:07] Since I stopped editing, has the replag recovered?
[19:17:31] harej: https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?orgId=1&var-dc=eqiad%20prometheus%2Fops
[19:18:39] I'm not sure how to read those charts
[19:40:11] DanielK_WMDE_: gentle ping about https://gerrit.wikimedia.org/r/#/c/327862/
[19:43:22] SMalyshev: :O
[19:43:26] I hadn’t seen that
[19:43:45] you're welcome to see it now :)
[19:44:20] thanks :D
[19:44:56] so you’re still pursuing this even though MWAPI is now available?
[19:49:20] yes, I think productizing MWAPI would be a much more complex task
[19:49:32] and I want this on production servers, so we could use it in search
[19:50:00] or do you mean the MWAPI service call? then it's a different thing
[19:50:22] I'm not sure the mediawiki API even has this ability
[19:52:26] ok
[19:53:32] WikidataFacts: this is mainly so you could do queries like "give me all child categories of this one" or "give me all parents", etc.
[19:53:53] SMalyshev: will you also include the relationships between individual articles and their categories?
[19:54:03] or just parent categories of a category?
[19:54:33] WikidataFacts: right now just categories
[19:54:47] ok
[19:55:03] I imagine that’s a more manageable number of triples
[20:00:59] WikidataFacts: Were you asking about Commons references? I am killing a lot of them :-)
[20:01:16] was I?
[20:01:36] was it the question of commons category vs. commons sitelink?
[20:02:49] SMalyshev: See you are working on categories! :-) Would it be possible to include the Wikidata link, or is that a lot of overhead?
[20:03:06] (didn't see it in your owl file)
[20:03:44] multichil: wikidata link to the category? hmm. if it's in wikidata it is probably linked via sitelink?
[20:04:12] but categories will be in a separate db, at least for now (easier to manage this way)
[20:04:17] https://commons.wikimedia.org/wiki/Category:Haarlem has a link to https://www.wikidata.org/wiki/Special:EntityPage/Q7427769
[20:05:34] yeah, so in https://www.wikidata.org/wiki/Special:EntityData/Q7427769.ttl the sitelink has schema:about wd:Q7427769
[20:05:38] you can use that
[20:06:36] Is it going to be in the same blazegraph thing?
[20:06:52] Then it would probably be a bit redundant I guess
[20:09:26] SMalyshev: Including the page id would probably turn out to be a bit useful in the future when wanting to combine the data. Including whether the category is hidden or not would be extremely useful
[20:10:20] multichil: hmm ok, could you make a task about hidden ones? I probably won't change it in the existing patch but can definitely add it later
[20:11:15] Sure
[20:11:25] I think you're fixing a really old task of mine ;-)
[20:11:30] multichil: same about page id (please add the argument about why it's useful). I want to keep it simple for now while we're still building it, but am definitely open to additions once the minimal one works
[20:12:03] SMalyshev: https://phabricator.wikimedia.org/T110833 :-D
[20:13:20] ah yes :) so once we load the cats into blazegraph, you probably would be able to do all kinds of nice things
[20:22:11] SMalyshev: https://phabricator.wikimedia.org/T173980 , haha, got a nice tree now ;-)
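(The schema:about sitelink triples mentioned at 20:05 are the join key between a category page and its Wikidata item. A sketch against the public WDQS endpoint, using the Commons category from the conversation:)

```python
import requests

WDQS = "https://query.wikidata.org/sparql"

# Resolve a category page to its Wikidata item via the sitelink triples
# (?page schema:about ?item) in the Wikidata RDF export.
QUERY = """
SELECT ?item WHERE {
  <https://commons.wikimedia.org/wiki/Category:Haarlem> schema:about ?item .
}
"""

r = requests.get(WDQS, params={"query": QUERY, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["item"]["value"])  # expect http://www.wikidata.org/entity/Q7427769
```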