[02:11:16] !log tools.tb-dev Removing myself from maintainer list (cc T179599)
[02:11:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tb-dev/SAL
[02:11:20] T179599: Adoption of tb-dev Red Link Recovery tools - https://phabricator.wikimedia.org/T179599
[13:59:59] Hi all, is commons beta database replicated at toolsforge?
[14:46:33] Urbanecm: Don't think any of the beta databases are
[14:56:10] Reedy, ok, thanks for info
[22:57:51] toolforge has heavy lags, do you know?
[22:59:35] mbh_: you mean bastion?
[23:00:32] yes
[23:01:31] logging in is very slow and I can't upload file to my folder \data\project\mbh
[23:01:52] and putty works very slow too
[23:05:43] probably someone hammering NFS again
[23:07:35] there are around 10 processes in D state
[23:16:39] I think the culprit is pid 25631
[23:16:50] tools.u+ 25631 0.2 0.0 20060 2580 pts/11 DN+ 22:28 0:06 | \_ /usr/lib/gridengine/qacct -j 2067313
[23:19:17] It reads /var/lib/gridengine/default/common/accounting, which is a symlink to /data/project/.system/accounting, which lives on NFS and is 3.4 gigabytes
[23:21:32] I guess I have to use the stalkword :(
[23:22:17] !help please kill PID 25631 on tools-bastion-03 "/usr/lib/gridengine/qacct -j 2067313" for hammering NFS
[23:22:17] zhuyifei1999_: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[23:23:01] zhuyifei1999_: hey, sure I can do that
[23:23:29] thanks
[23:24:01] tools.ukbot
[23:25:23] !log killed pid 25631 - /usr/lib/gridengine/qacct -j 2067313 being run by tools.ukbot
[23:25:23] madhuvishy: Unknown project "killed"
[23:25:41] !log tools.ukbot killed pid 25631 - /usr/lib/gridengine/qacct -j 2067313
[23:25:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ukbot/SAL
[23:25:59] mbh_: looks faster now?
[23:26:08] (at least to me yes :P)
[23:26:14] yeah loads dropping
[23:26:18] load is*
[23:28:00] hi, could anyone help me delete several grid jobs? ids are 2007566 2075798 2076059 2076195
[23:28:21] qdel -f doesn't work for me, keeps starting new jobs (?)
[23:28:50] what's wrong with jstop?
[23:29:23] they should stop the jobs within a minute
[23:30:14] jstop won't recognize jobs marked as 'dr' or 'dt'
[23:32:23] zhuyifei1999_: do you have powers to qdel -f
[23:32:37] obviously no :P
[23:32:51] we should get those to you ;)
[23:33:00] PeterBowman: I'm looking :)
[23:33:05] thx madhuvishy
[23:33:48] PeterBowman: which tool is this?
[23:33:57] tool/tools
[23:34:00] I usually don't need that anyways. most of the complains are 'bastion is slow' and qdel don't help
[23:34:03] madhuvishy: pbbot
[23:34:22] grid job name is tomcat-pbbot
[23:35:02] https://www.irccloud.com/pastebin/dakjXpiY/
[23:35:04] no, slow as it was
[23:35:20] ugh yeah it just slowed back down
[23:35:56] btw, load on bastion-03 is growing again
[23:36:25] tools.ukbot seems to be back
[23:36:35] now dropping again
[23:36:35] ukbot running that again on PID 12594
[23:37:17] btw, what about an icinga/shinken alert if the load on bastion-02 and -03 gets too big?
[23:37:57] according to nagf this is not the first time on bastion-03 today, around 8PM we got around 20 load
[23:38:06] it's danmichaelo doing `become ukbot`
[23:41:15] I sent: $ write danmichaelo
[23:41:15] Please do not run qacct on the main mastion. It's using up all the NFS and make bastion slower for everyone
[23:41:40] zhuyifei1999_: thanks!
[23:42:03] np
[23:42:17] PeterBowman: hey, I deleted your jobs
[23:43:09] * madhuvishy disappears
[23:43:16] madhuvishy: I have still 8 new jobs running
[23:43:33] now with 'r' state, though
[23:43:41] right, those must have just showed up
[23:44:44] the webservicewatcher thing broke?
[23:48:59] PeterBowman: how's it looking now?
[23:49:26] madhuvishy: the list was already empty, but they keep showing up
[23:49:45] i killed all the jobs, stopped and restarted the service
[23:49:52] there's one running job there now
[23:51:01] madhuvishy: could we switch from lighttpd to tomcat?
[23:51:03] ah you had tomcat
[23:51:08] let me try that
[23:51:23] i suspect we have some kinks there
[23:52:51] PeterBowman: better?
[23:52:58] https://tools.wmflabs.org/pbbot/ seems to be up
[23:53:09] madhuvishy: now it's fine, thanks!
[23:53:15] cool