[01:56:17] i can't create any new labs instance for the maps-team project for the past few days. Help?
[02:51:04] YuviPanda, ping
[03:25:58] Labs, Tool-Labs, DBA: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876032 (Beetstra) Please revert this. This is effectively killing the whole anti-spam effort on Wikipedia. The bot needs multiple user connections into the database.
[04:58:50] Labs, Tool-Labs, DBA: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876099 (Billinghurst) Are we able to backtrack to see when this occurred? The bot has been running for years, so if it is a new phenomenon then maybe we can explore that iss...
[05:01:23] Labs, Tool-Labs, DBA: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876102 (Jalexander) It would indeed be good if we can figure out what is causing this instead of just throttling. It's been running for ages so causing massive issues is new...
[05:54:12] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876115 (Billinghurst)
[07:57:53] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pengo was created, changed by Pengo link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Pengo edit summary: Created page with "{{Tools Access Request |Justification=Hi I'm an admin on enwiki, and I've been heavily involved with Wiktionary too. Some projects I've created outside of the toolserver:..."
[07:57:53] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pengo was created, changed by Pengo link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Pengo edit summary: Created page with "{{Tools Access Request |Justification=Hi I'm an admin on enwiki, and I've been heavily involved with Wiktionary too. Some projects I've created outside of the toolserver:..."
[10:10:20] is there any reason why I shouldn't use Redis to cache some simple DB queries when running perl-based import scripts? more to the point: i'm reading the page table on a given wiki each time the script is called.. would a redis call be quicker?
[10:27:33] PAWS, Patch-For-Review: PAWS network error: - https://phabricator.wikimedia.org/T120561#1876200 (yuvipanda) This still is happening, but only on a per-node basis, triggered randomly?! I can reproduce this by just setting up a container in docker (independent of kubernetes)
[10:42:47] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876207 (yuvipanda) @Billinghurst a more respectful tone towards the only person who is keeping labsdb up and running (and the rest of our databa...
[11:22:07] YuviPanda: could you take a look at https://phabricator.wikimedia.org/T121313 or find me someone I could bug about it? :)
[11:48:27] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876227 (Beetstra) @yuvipanda, @jcrespo - with all respect, this has just completely brought the complete Wikipedia anti-spam effort to a near ha...
[11:56:30] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876228 (jcrespo) Not performing this action would have meant all bots and utilities using toolsdb would be down due to massive CPU consumption. I...
[12:01:29] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876233 (Beetstra) @jcrespo .. what issue? What query is making this happen? It can't be the couple of hundred of usual insert queries that...
[12:03:12] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876234 (Beetstra) @jcrespo - I receive complaints every time there is a CPU/IO spike from other users - that means that you knew for a long time...
[12:11:27] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876242 (jcrespo) This is why this issue is not closed- we can figure it out. We cannot even query the server or debug the issue if it is at 100%...
[12:25:03] jynus: it's in /data/project/linkwatcher on toollabs (readable for all), but a grep * -i -e 'select' doesn't immediately show what could be the culprit
[12:28:00] this is the history of CPU usage: https://grafana.wikimedia.org/dashboard/db/server-board?panelId=7&fullscreen&from=1442230023350&to=1450009383350&var-server=labsdb1005&var-network=eth0
[12:28:59] if we do not put a limit or separate the host elsewhere, we can end up with https://phabricator.wikimedia.org/T119604
[12:29:43] this is toolsdb, right?
[12:30:20] the last issue, yes
[12:30:35] I worry about all users in general
[12:30:42] yes, of course :-)
[12:31:00] even if that means one has to suffer for a bit until the problem is fixed
[12:32:12] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876247 (Beetstra) Well, with one connection the bots cannot run. LiWa3 uses something like 50 parallel processes (to keep up with the 600+ edit...
[12:33:20] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876248 (jcrespo) Could it be an increase in traffic/usage? (not your fault)
[12:34:53] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876249 (jcrespo) If you mean that the usage was a one-time spike, I can reenable normal resources, but I will revert if the problem persists, ok...
[12:36:48] jynus: one thing that could be useful for monitoring is to dump a list of all running queries every N seconds. Would that be reasonably possible?
[12:37:19] for debugging?
[12:37:48] we could set up something like that, but that requires maintenance
[12:38:03] what we have now is CPU and server time usage per user
[12:38:20] yeah, for example to report on e.g. long-running queries
[12:38:33] but that is very relative: small, many queries do not hurt other users
[12:38:45] exactly, long ones are the issue
[12:39:09] I would ask however, if people in general are ok with a "shame list"
[12:39:14] and for detecting long-running queries, just sampling a list of all running queries every N seconds should work
[12:39:24] we already do that
[12:39:34] just it is not published
[12:39:58] Ah, ok!
No, I wouldn't publish it, but that means we can probably pinpoint what the issue in linkwatcher is from that?
[13:08:14] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876254 (jcrespo) Let me give you specific examples of problematic queries: means I left some strings out due to privacy concerns (u...
[13:08:24] valhallasw`cloud, https://phabricator.wikimedia.org/T121094#1876254
[13:08:48] jynus: that's great!
[13:09:38] question is, I cannot even do that for production most of the time, there should be someone helping tools users
[13:09:53] which you are, and I am happy
[13:10:33] Do I have access to those logs?
[13:10:45] maybe, contact me privately
[14:10:58] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876273 (Beetstra) Let me have a manual look at the 'offending queries' one of these days .. see if I can reproduce. When WikiData started I had...
[14:24:33] Labs, Tool-Labs: Install Perl module Redis. - https://phabricator.wikimedia.org/T121341#1876278 (Stigmj) NEW
[14:28:46] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876286 (Merl)
[14:29:38] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1876288 (Merl)
[14:30:48] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876289 (jcrespo) Thanks for taking the time to fix the issue. I've allowed 10 simultaneous connections for the user for now: ```GRANT USAGE ON...
[14:37:06] jynus: I have some imports running now.
Any chance you can check if there are any resource issues with the DBs due to this before I let them run unattended?
[14:38:16] hello
[14:38:33] thanks, Stigmj, can you remind me of your user?
[14:38:57] jynus: s52721
[14:39:55] jynus: the scripts have run for about 4 hours now already
[14:39:59] can you suggest any documentation about database schema other than https://wikitech.wikimedia.org/wiki/Help:MySQL_queries ?
[14:40:49] HakanIST, https://www.mediawiki.org/wiki/Manual:Database_layout
[14:41:09] thanks
[14:41:26] jynus: I have some issues with my lockfiles (race conditions) which make the scripts a bit more aggressive than usual, that's why I'm worried.
[14:42:47] well, there is a failover slave, that due to an unrelated maintenance was behind, up to now it is catching up nicely, but it is still 15 hours behind
[14:43:29] I do not see any problem, but do you have any recommendation, should it become a problem?
[14:43:44] as to stop a particular process, etc.?
[14:44:28] jynus: to make the scripts shut down cleanly, just do a "touch /data/project/pagecount/stop-total"
[14:44:32] I only take measures when the server availability is in danger
[14:44:52] that'll kill all the jobs
[14:44:54] I am a bit more picky since the recent double server failure
[14:45:19] thanks, Stigmj, hopefully it will not be necessary
[14:45:34] I will reopen the ticket if there is any issue
[14:45:45] excellent. :)
[14:49:32] jynus: btw. I think the main "issue" with this particular import would be that it does a "SELECT page_id, page_title, page_namespace FROM page" from the nowiki_p every time the script is called.. which takes about 20 seconds every time.
[14:50:07] jynus: and I'm seeing this is called f.ex at these times https://pastee.org/87apt
[14:50:14] jynus: too much?
[14:51:12] Stigmj, do not worry too much until I worry
[14:51:35] jynus, ok..
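Editorial sketch for the Redis question raised at 10:10:20 and the repeated page-table SELECT described at 14:49:32: the usual shape is a cache-aside lookup, where the slow query runs only on a miss. This is an illustration in Python, not Stigmj's Perl script. `load_page_table`, `run_query`, the key name and the one-hour TTL are all hypothetical, and `DictCache` is an in-memory stand-in so the example runs without a server; a real Redis client offering `get` and `setex` could be passed in its place.

```python
import json
import time


class DictCache:
    """In-memory stand-in exposing only the two calls the sketch needs."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Return the stored value, or None if it is missing or has expired.
        value, expires_at = self._store.get(key, (None, 0.0))
        return value if time.monotonic() < expires_at else None

    def setex(self, key, ttl_seconds, value):
        # Store the value together with its expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def load_page_table(cache, run_query, key="nowiki_p:page", ttl_seconds=3600):
    """Return the page rows, running the slow query only on a cache miss.

    run_query is a hypothetical callable that executes the SELECT and
    returns an iterable of (page_id, page_title, page_namespace) rows.
    """
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # Normalise rows to lists so hits and misses return the same shape.
    rows = [list(row) for row in run_query()]
    cache.setex(key, ttl_seconds, json.dumps(rows))
    return rows
```

Whether this is actually quicker depends on how large the serialised page table is and how stale it is allowed to get; the TTL here is a guess, not a recommendation from the channel.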
[14:52:15] The algorithm is: "are users complaining that the server is slow OR it is next to crashing" -> take action
[14:54:03] a single connection is not usually a problem, but things get saturated when multiple connections are doing slow queries at the same time
[14:55:28] imports are a bit problematic because they tend to be slow to transmit over replication, to the point that it is usually better to import separately to each server. I will talk to you if it was necessary, not for now
[14:57:12] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876317 (Merl) Because my bot (dbuser s51826) also runs queries on linkwatcher_linklog I have checked statistics of my script. Mainly to exclude...
[15:03:09] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pengo was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=227121 edit summary:
[15:03:09] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pengo was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=227121 edit summary:
[15:12:35] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876321 (jcrespo) @Merl, I only throttled user s51230. I did not touch s51826, nor did I see any problems with its queries. It is true that if My...
[15:50:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Lebronj23 was created, changed by Lebronj23 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Lebronj23 edit summary: Created page with "{{Tools Access Request |Justification=Host a website helping the creation of references (for the French wikipedia). So far the website is hosted here: http://refswikipedia.toi..."
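Editorial sketch for the 14:54:03 remark that one slow connection is rarely a problem while several at once saturate the server: given a sampled list of running queries (the kind of sample discussed from 12:36:48 onwards), one could count slow queries per user and flag only users with more than one. The row fields loosely mirror the User, Time and Info columns of MySQL's SHOW PROCESSLIST. The function name, both thresholds and any sample rows fed to it are assumptions for illustration, not the DBA's actual tooling.

```python
from collections import Counter


def users_with_concurrent_slow_queries(snapshot, slow_after=10, min_concurrent=2):
    """Map each user to their number of slow queries in one sample.

    snapshot is a list of dicts with keys "user", "time" (seconds the
    statement has been running) and "info" (query text, or None when the
    connection is idle). Only users with at least min_concurrent queries
    running for slow_after seconds or longer are returned.
    """
    slow = Counter(
        row["user"]
        for row in snapshot
        if row["info"] is not None and row["time"] >= slow_after
    )
    return {user: count for user, count in slow.items() if count >= min_concurrent}
```

Running this over successive samples, rather than a single one, would be closer to what the conversation describes, since a user who shows up in only one sample may just have hit a momentary spike.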
[15:50:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Lebronj23 was created, changed by Lebronj23 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Lebronj23 edit summary: Created page with "{{Tools Access Request |Justification=Host a website helping the creation of references (for the French wikipedia). So far the website is hosted here: http://refswikipedia.toi..."
[15:51:50] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Lebronj23 was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=227137 edit summary:
[15:51:50] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Lebronj23 was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=227137 edit summary:
[15:52:03] * valhallasw`cloud eyes wm-bot
[15:52:14] petan: ^ it seems to duplicate every change?
[20:08:22] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876525 (Krenair) >>! In T121094#1876227, @Beetstra wrote: > the WMF (this is not the first time that unannounced and undiscussed actions from WM...
[23:36:39] anyone know about disk space for vagrant for /srv partition? mine has gotten full for some reason ((
[23:36:50] bd808 or YuviPanda ?
[23:37:07] i think it happened during labs-vagrant git-update
[23:37:36] probably too much service crap :)
[23:38:04] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876795 (Billinghurst) >>! In T121094#1876207, @yuvipanda wrote: > @Billinghurst a more respectful tone towards the only person who is keeping la...
[23:39:20] yurik: is this an instance that is still using the old labs-vagrant setup?
[23:39:39] bd808, that is a possibility - i created it about 4 months ago
[23:39:44] I think that role mounted the secondary volume on /srv and put everything on it
[23:39:47] is there someone responsible for xtools here?
[23:40:06] bd808, i would love to create a new one, but all "create instance" fail for maps-team ((
[23:40:24] xtools is inactive...
[23:40:30] or is there an easy way to fix it?
[23:41:59] yurik: create instance fails because you have used up all your cpu quota -- https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&action=displayquotas&projectname=maps-team
[23:42:43] bleh, i wish the failure screen would have told me that - the 3 day suffering :(((
[23:42:49] thanks!
[23:43:01] is there a way to reduce cpu count?
[23:43:13] delete instances
[23:43:30] i will do that too, but if possible, i would like to reduce the number of cores
[23:43:40] because we only have 6 instances
[23:43:43] and 20 cores
[23:43:47] seems excessive
[23:43:55] you can't resize an existing instance as far as I know
[23:44:10] core resizing is not supported?
[23:44:20] should be much easier than HD space
[23:45:18] clear-tables and kertotherian1 are the hogs. Both are 8 core images
[23:46:03] Labs, Labs-Infrastructure: On create instance failure, give link to quotas page and explain the error - https://phabricator.wikimedia.org/T121368#1876798 (Yurik) NEW
[23:46:32] i will poke max about it, thanks!
[23:46:47] still, hope we can easily reduce the count without a reinstall
[23:47:15] Le0n: you can find the admins for xtools on the https://tools.wmflabs.org/ landing page. Cyberpower678 and MusikAnimal are on irc sometimes but they aren't in this channel right now
[23:47:41] yurik: I think that is a lost hope.
There's no UI for doing that in wikitech
[23:50:24] Labs, MediaWiki-extensions-OpenStackManager, Labs-Infrastructure: On create instance failure, give link to quotas page and explain the error - https://phabricator.wikimedia.org/T121368#1876806 (Reedy)
[23:56:47] bd808, well, maybe an admin could do it unless i can get max to rebuild these instances
[23:59:59] Well, I'm walking here, thanks for the help