[00:19:57] !log rcm Tin: Rebooting
[00:19:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[00:29:59] !log rcm Tin and Oxygen: Recreating runners, four per server
[00:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[00:37:50] !log Neon: Setting new weight on runners. Tin = 4; Oxygen = 2
[00:37:51] Sagan: Unknown project "Neon:"
[00:37:57] !log rcm Neon: Setting new weight on runners. Tin = 4; Oxygen = 2
[00:37:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[00:38:14] !log rcm Neon: Testing current config by producing load on runners
[00:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[00:42:40] !log rcm Neon, Oxygen, Tin: Confirming current config is working as expected
[00:42:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL
[01:05:40] tools-bastion-03 is very slow :/
[01:21:33] danilo: you can use -02 in the meantime
[01:21:46] madhuvishy: can you take a look at -03? even the login is very slow
[01:22:16] I ran "ps aux --sort=-pcpu" and I suspect this: v2 25766 13.9 0.4 151716 74780 pts/30 DN+ Dec30 33:44 python pwb.py amb_mehrdadbot.py
[01:22:30] according to nagf the load is currently at least 6.2
[01:22:47] cc andrewbogott bd808
[01:23:16] load is 5.6 currently
[01:23:47] anyway, it's dropping a bit, but still slow
[01:55:25] I wish I could figure out how to ban pwb from executing on the bastions :/
[02:00:55] !log tools Killed some pwb.py and qacct processes running on tools-bastion-03
[02:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:07:26] an idea: a bot with root access could run a "ps aux" periodically and kill processes with D in STAT column for more than 1 minute
[02:10:27] danilo: true. I've thought about writing a watcher process that would at least nag people. I'm not sure how many false positives it would have
[02:11:58] it could only log for some to check the false positives
[02:14:04] *some time
[02:34:29] another idea is apply a "renice 11" to give a lower priority instead of kill
[02:53:54] bd808: I'll work on it ;)
[05:19:25] danilo bd808: https://phabricator.wikimedia.org/P6507
[08:36:47] PID 6622 on tools-bastion-03 is being naughty
[08:53:47] + PID 10156
[08:54:09] ah great it just exited
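
Editor's sketch of the watcher idea discussed between 02:07 and 02:34: poll `ps`, remember when each PID was first seen with `D` in its STAT column, and report any that stay there for more than a minute. This is only an illustration and is not the content of the linked paste. The names `parse_ps`, `DStateTracker`, and `watch` are hypothetical, and the 10-second polling interval is an assumption. Following the 02:11:58 suggestion, it logs instead of killing or renicing anything.

```python
import subprocess
import time

D_STATE_LIMIT = 60  # seconds; the "more than 1 minute" threshold from the discussion


def parse_ps(output):
    """Parse headerless `ps` output (pid, stat, user, args) into tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, stat, user = parts[:3]
        cmd = parts[3] if len(parts) > 3 else ""
        rows.append((int(pid), stat, user, cmd))
    return rows


class DStateTracker:
    """Tracks how long each PID has been observed in uninterruptible sleep (D)."""

    def __init__(self, limit=D_STATE_LIMIT):
        self.limit = limit
        self.first_seen = {}  # pid -> time it was first sampled in D state

    def update(self, rows, now):
        """Record one sample; return (pid, user, cmd) for PIDs in D longer than limit."""
        in_d = {pid: (user, cmd) for pid, stat, user, cmd in rows
                if stat.startswith("D")}
        # Forget PIDs that exited or left D state, so their timer restarts.
        self.first_seen = {pid: t for pid, t in self.first_seen.items()
                           if pid in in_d}
        offenders = []
        for pid, (user, cmd) in in_d.items():
            start = self.first_seen.setdefault(pid, now)
            if now - start > self.limit:
                offenders.append((pid, user, cmd))
        return offenders


def watch(interval=10):
    """Poll ps forever and log long-running D-state processes (log-only)."""
    tracker = DStateTracker()
    # Separate -o options with empty headers give headerless output.
    cmd = ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "user=", "-o", "args="]
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for pid, user, args in tracker.update(parse_ps(out), time.time()):
            print(f"in D state for over {tracker.limit}s: pid={pid} user={user} cmd={args}")
        time.sleep(interval)
```

Calling `watch()` starts the loop. Because the state is only sampled, a process that merely happens to be in `D` at every poll is indistinguishable from one stuck there continuously, and a reused PID inherits the old timer; both are sources of the false positives raised at 02:10:27, which is why the sketch only logs.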