[00:04:45] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 17% free memory [00:14:53] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [00:19:03] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [00:21:46] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [00:24:06] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [00:49:54] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [00:51:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [00:52:43] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: Warning: 18% free memory [00:54:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:09:53] PROBLEM Total processes is now: WARNING on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS WARNING: 175 processes [01:12:42] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: OK: 20% free memory [01:14:53] RECOVERY Total processes is now: OK on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS OK: 96 processes [01:19:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) 
[01:21:54] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [01:25:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:36:53] PROBLEM Free ram is now: WARNING on dumps-bot2 i-000003f4.pmtpa.wmflabs output: Warning: 19% free memory [01:44:24] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 148 processes [01:49:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [01:51:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [01:55:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:59:36] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:20:21] 11/02/2012 - 02:20:20 - Creating a home directory for mono at /export/keys/mono [02:21:12] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [02:21:27] Hello, anyone here? 
[02:21:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [02:25:20] 11/02/2012 - 02:25:20 - Updating keys for mono at /export/keys/mono [02:26:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:29:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:37:23] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 155 processes [02:51:15] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [02:51:55] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [02:52:28] aude: ping [02:56:03] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:56:17] shes very likely asleep [02:57:24] In which timezone is she currently? [02:59:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:09:03] * ori-l does a bugzilla drive-by on andrewbogott_afk. [03:09:08] any idea re: https://bugzilla.wikimedia.org/show_bug.cgi?id=41622 ? [03:09:23] * ori-l rolls up his window and speeds off. 
[03:21:13] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [03:21:53] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [03:26:14] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:29:44] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:51:14] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [03:51:56] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [03:56:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:02:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:21:55] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [04:22:06] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [04:26:55] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:33:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:52:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [04:52:15] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) 
[04:56:54] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:04:12] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:22:04] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [05:22:14] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [05:26:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:34:18] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:52:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [05:53:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [05:57:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:04:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:22:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [06:23:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [06:27:34] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:29:24] PROBLEM Total processes is now: WARNING on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS WARNING: 151 processes [06:32:23] RECOVERY Total processes is 
now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 148 processes [06:34:23] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:36:33] RECOVERY Disk Space is now: OK on testing-arky i-0000033b.pmtpa.wmflabs output: DISK OK [06:52:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [06:53:46] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [06:54:23] RECOVERY Total processes is now: OK on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS OK: 147 processes [06:58:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:04:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:22:47] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [07:23:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [07:29:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:34:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:52:56] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [07:53:53] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [07:59:34] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 
100% [08:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:23:35] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [08:24:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [08:29:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:34:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:53:33] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [08:54:14] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [08:59:44] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:24:26] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [09:24:57] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [09:29:46] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:34:35] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:56:13] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable 
(i-000000e5.pmtpa.wmflabs) [09:56:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [09:59:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:04:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:26:48] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [10:27:20] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [10:29:47] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:34:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:56:54] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [10:57:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [11:00:38] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:04:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:27:02] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [11:27:57] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [11:31:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 
100% [11:34:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:57:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [11:57:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:01:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:07:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:28:09] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:28:12] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [12:31:53] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:38:04] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:49:49] hashar- good morning [12:52:14] hello anomie [12:52:21] hi Platonides [12:52:52] almost noon for hashar :) [12:55:05] Past noon I think, isn't it almost 2pm there? But I subscribe to the philosophy that an exchange "Good morning" "and good afternoon to you too" makes as much sense as anything else when dealing across timezones ;) [12:55:32] hello anomie :-) [12:55:35] Platonides: 2pm for me [12:55:47] though I woke up at noon [12:55:57] I was so tired that I simply disabled the alarm clock [12:56:08] I guess I will sleep again this afternoon; I must be sick or something [12:56:20] hashar- Uh oh. Family feeling any better? 
[12:56:25] kind of [12:56:27] my wife is fine [12:56:36] I guess it is my turn now :-( [12:56:40] going to be a looooonnnnng week-end [12:56:52] (I usually get sick on friday hehe) [12:58:32] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:58:47] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [12:59:37] hashar- As far as I can figure, the problem we were having with "git status" hanging yesterday seems to be that some of the files (e.g. /home/wikipedia/common/.git/objects/08/0ffc63d65a25b3803ac81d4fb66b98e18e60f8) just hang the process trying to access them, e.g. see pid 5369 on deployment-dbdump. I don't know what might be going on with that though. [13:02:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: OK - load average: 4.59, 3.93, 4.89 [13:02:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:03:01] anomie: looking [13:04:02] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 4.82, 3.94, 4.91 [13:04:48] anomie: maybe we should kill those commands [13:05:04] though they are in D state (uninterruptible sleep) [13:05:36] hashar- I tried killing the ones I started, even kill -9 didn't seem to have any effect. 
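[editor's note] On the D-state processes discussed above: a process in uninterruptible sleep generally does not act on signals, SIGKILL included, until the blocked I/O call returns, which matches "even kill -9 didn't seem to have any effect". Below is a minimal sketch, assuming a Linux /proc filesystem, of how such processes could be listed. The function names parse_stat_state and find_uninterruptible are made up for illustration; nothing like this was run in the session.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    command = stat_line[open_paren + 1:close_paren]
    # The single-letter process state is the first field after the command.
    state = stat_line[close_paren + 1:].split()[0]
    return pid, command, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, command) pairs for processes currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No readable /proc here (assumption: non-Linux host or bad path).
        return []
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was malformed
        if state == "D":
            found.append((pid, command))
    return sorted(found)
```

Calling find_uninterruptible() on the affected instance would be expected to show the hung git processes, but it only reports them; as noted in the channel, clearing them usually means fixing the underlying I/O or rebooting.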
[13:05:36] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [13:05:53] yeah uninterruptible :/ [13:06:10] not sure how to get rid of them without rebooting [13:06:42] PROBLEM dpkg-check is now: CRITICAL on wikisource-web i-000000fe.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [13:07:47] and there is a git repack still locked :-] [13:08:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:08:44] anomie: just reboot the instance I guess ;-] [13:08:53] not going to do any harm to the beta cluster [13:09:00] must leave for a few minutes will be back soon [13:10:33] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 110 processes [13:11:38] RECOVERY dpkg-check is now: OK on wikisource-web i-000000fe.pmtpa.wmflabs output: All packages OK [13:12:59] !log deployment-prep Rebooted deployment-dbdump to clear up hung processes, hopefully clear up NFS weirdness [13:13:00] Logged the message, Master [13:13:15] RECOVERY Current Load is now: OK on deployment-dbdump i-000000d2.pmtpa.wmflabs output: OK - load average: 0.41, 0.22, 0.08 [13:13:31] Nifty, reboot fixed "git status" too. [13:18:29] hashar is there a reason you are using dbdump instead of bastion? [13:18:40] :P [13:18:56] !log deployment-prep petrb: rebooting -bastion to install updates [13:18:56] Logged the message, Master [13:27:52] anomie: back :-D [13:28:34] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [13:28:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [13:29:38] hashar- wb. 
Question about groups and permissions: I see most of the files are owned by group "svn", which I don't have so I have to sudo to edit anything in there. Can we do something about that? [13:30:02] dohh [13:30:02] hashar- Also, while you were gone, petan asked "hashar is there a reason you are using dbdump instead of bastion?" [13:30:03] I thought it was the default for everyone [13:30:19] Maybe it was before the changeover to git? [13:30:24] our perm are really a mess [13:30:36] looks like the default was changed from 550 (svn) to 500 (wikidev) [13:30:53] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 3.21, 4.49, 4.98 [13:31:30] might need a change in LDAP [13:31:56] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 5.47, 5.78, 5.16 [13:31:56] wikidev makes sense, but you don't seem to have that one [13:32:28] nop :( [13:32:28] my default gid is 550 [13:33:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:34:36] hashar- Is there a reason all.dblist for beta has just one line? Or should that one line be added to all-wmflabs.dblist and that copied over? [13:34:53] Or do I need to wait for csteipp to get an answer to that? 
[13:38:07] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:38:40] paravoid: do you happen to know why LDAP users either have gid 500 (wikidev) or gid 550 (svn) [13:39:02] paravoid: looks like the default gid has been changed at some point [13:39:28] anomie: still can't git status in the dir :((( [13:39:44] I don't sorry [13:39:44] hashar- odd, it worked for me earlier [13:40:10] paravoid: thanks :-] [13:41:54] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 4.51, 4.49, 4.90 [13:42:18] hashar why we just don't use that mwdev user or something [13:42:26] there is mwdeploy [13:42:30] that [13:42:43] though it is not in the svn/wikidev group [13:42:43] we could make it own all files [13:42:43] and everyone would just switch to it [13:43:14] that is an idea [13:43:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=41311 - mwdeploy unix user should be in svn group [13:43:30] also we really want to switch to -bastion in future [13:43:46] if it is in a workable state sure [13:44:10] that's something I am unable to determine :P [13:44:43] also a reason why we shouldn't kill dbdump before we check it [13:45:58] Damianz if you had a server with raid 0 consisting of 2 physical devices, filesystem was damaged and you weren't able to reboot it and it was a root filesystem, what would you do :D [13:46:27] shoot yourself in a head? :P [13:46:44] err... [13:46:44] raid 1 [13:47:01] !log deployment-prep created a second job runner instance: deployment-jobrunner07 [13:47:02] Logged the message, Master [13:47:32] petan: restore from backup ? :-] [13:47:57] I was thinking of detaching one physical drive, fsck that, mount, chroot into it, unmount raid, recreate it back from detached drive :D but that is creepy [13:48:21] hashar it's not so damaged :) but fsck would be nice [13:48:24] just that machine can't be rebooted... 
[13:48:37] it's permanently in use [13:51:53] RECOVERY Free ram is now: OK on dumps-bot2 i-000003f4.pmtpa.wmflabs output: OK: 26% free memory [13:51:53] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 4.49, 4.38, 4.74 [13:58:38] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [13:58:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:04:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:08:15] I give up with labs for today [14:08:18] too much madness [14:09:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:19:55] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 5.38, 5.74, 5.31 [14:24:16] In beta, where does /home/wikipedia/common/bin come from? Or is it not in git anywhere? [14:25:00] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 4.60, 4.74, 4.98 [14:29:05] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:30:07] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [14:32:28] hashar- In beta, where does /home/wikipedia/common/bin come from? Or is it not in git anywhere? 
[14:32:28] it is not in git afaik [14:32:30] anomie: they are some scripts by petan I think [14:33:16] most are probably unneeded nowadays [14:33:17] since we have mwscript and the WikimediaMaintenance extension deployed [14:33:23] which provide all the admin tools we should need [14:34:04] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:34:37] csteipp- Is there a reason all.dblist for beta has just one line? Or should that one line be added to all-wmflabs.dblist and that copied over? [14:35:12] Hey anomie: Is that in the repo? Or on the actual beta server? [14:35:42] csteipp- Actual beta server. all.dblist in the repo has the production version, I would think [14:37:08] Yep. So yeah, I'm not sure why it's that way in production. I'm guessing that whoever added in all-wmflabs.dblist probably updated the local file to take everything out. [14:37:08] Was the 1 line dewikivoyage? [14:37:28] I added dewikivoyage, and the script I used added to all.dblist (addWiki.php) [14:39:04] so i can't get into bots-3 right now, but bots-1, -2 are working just fine [14:39:05] channel 0: open failed: connect failed: No route to host [14:39:05] ssh_exchange_identification: Connection closed by remote host [14:39:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:41:18] csteipp- Yes, the one line is dewikivoyage. [14:41:36] petan: is bots-3 down right now? [14:41:45] I don't think beta uses all.dblist [14:41:47] so can probably be checked out from master [14:41:49] shouldn't be [14:41:51] Alright-- yeah, that should be moved to all-wmflabs.dblist [14:41:51] legoktm ^ [14:42:05] I think Reedy fixed the scripts to use all-wmflabs.dblist [14:42:09] hashar- Well, some of the scripts still look at all.dblist. The main config doesn't though. 
[14:42:36] the bug report was https://bugzilla.wikimedia.org/show_bug.cgi?id=41133 [14:42:41] if you know of any script that still use all.dblist , we want to fix them [14:42:52] related patch : https://gerrit.wikimedia.org/r/#/c/28642/ [14:42:52] petan: thats weird. i'm getting the error that i pasted above [14:42:56] bots-1,2 are just fine for me [14:43:18] ok let me check it [14:43:41] hashar- I'm trying to figure out how to fix them. But half the git commands hanging is making it hard... [14:43:43] that box is probably down [14:43:49] checking why it crashed [14:43:57] anomie: there must be an issue with the labs setup [14:44:06] [4872766.691154] Out of memory: kill process 30748 (perl) score 9684 or a child [14:44:07] OOM [14:44:15] Unable to create and initialize directory '/home/hashar'. [14:44:17] oh yeahhh [14:44:21] boom [14:44:23] !log bots bots-3 dead OOM [14:44:25] Logged the message, Master [14:44:48] [4872869.210414] Killed process 26970 (glusterfs) [14:44:48] [4874406.797627] EXT3-fs error (device vda): ext3_lookup: deleted inode referenced: 98310 [14:44:48] LOL [14:44:56] Ouch. [14:45:11] !log bots booting bots-3 [14:45:12] Logged the message, Master [14:48:39] Change on mediawiki a page Developer access was modified, changed by Tarheel95 link https://www.mediawiki.org/w/index.php?diff=600512 edit summary: [14:48:55] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 7.58, 7.11, 5.80 [14:50:34] petan: is there any type of process-killer like the toolserver has if something goes over the amount of allotted memory limit? 
[14:50:48] not yet [14:51:04] original proposal was that each botop would have own machine [14:51:13] if they exhaust memory, it's their problem, not affecting others [14:51:46] anomie: some/most scripts are in the WikimediaMaintenance extension [14:52:02] anomie: ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/WikimediaMaintenance.git [14:52:11] anomie: looks like some still refers to all.dblist explicitly [14:52:12] is that still the plan? if so I would like a separate machine... [14:52:36] hashar- Copied once and seldom (if ever) updated, I take it? [14:52:44] RECOVERY host: i-000000e5.pmtpa.wmflabs is UP address: i-000000e5.pmtpa.wmflabs PING OK - Packet loss = 0%, RTA = 10.49 ms [14:53:38] legoktm I will need to talk with Ryan regarding that, but unlikely, given the number of botops and resources of labs [14:53:50] [bz] (REOPENED - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 41133] beta all.dblist is a live hack - https://bugzilla.wikimedia.org/show_bug.cgi?id=41133 [14:53:55] but if you are going to run some expensive bots, I can create a box for you... [14:54:26] i'm not planning on running any expensive (yet), but it would be a nice if there was a box that had near 100% uptime [14:54:44] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 1356% free memory [14:54:50] I have a vague plan to adjust most if not all of this stuff to check /etc/wikimedia-realm and look for *-$realm.dblist, falling back to the current *.dblist if it's not found. If I can ever sort out the current state of the config in order to figure out where to start, what with these git issues and everything. 
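[editor's note] The fallback anomie describes just above (look for *-$realm.dblist, else use the plain *.dblist) can be sketched as follows. This is an illustration only: the real scripts live in the WikimediaMaintenance extension and are PHP, the helper names read_realm and resolve_dblist are hypothetical, and the assumption that /etc/wikimedia-realm holds a single realm string is not confirmed anywhere in this log.

```python
import os


def read_realm(realm_file="/etc/wikimedia-realm"):
    """Return the realm name from realm_file, or None if it cannot be read."""
    try:
        with open(realm_file) as handle:
            # Assumption: the file contains just the realm name.
            return handle.read().strip() or None
    except OSError:
        return None


def resolve_dblist(name, config_dir, realm):
    """Prefer <name>-<realm>.dblist, falling back to <name>.dblist."""
    if realm:
        candidate = os.path.join(config_dir, "%s-%s.dblist" % (name, realm))
        if os.path.isfile(candidate):
            return candidate
    # No realm known, or no realm-specific list on disk: use the current file.
    return os.path.join(config_dir, "%s.dblist" % name)
```

With this shape, a call such as resolve_dblist("all", config_dir, read_realm()) would pick a realm-specific list when one exists and otherwise behave exactly as the scripts do today, which is the point of the fallback.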
[14:54:58] because i'm checking right now, and it looks like my 3am UTC cronjob didn't run either [14:55:01] that will never happen because puppet for some reason reboot the boxes occasionally to patch them [14:55:06] anomie: that would be nice [14:55:23] anomie: also consider updating the WikimediaMaintenance class to provide the dblist filename [14:55:47] it's back up [14:55:52] anomie: or maybe that could be handled by $wgConf [14:56:08] hashar- Good idea, for the bits that actually include either of those anyway. [14:56:09] thanks [14:56:36] anomie: $wgConf is simply = new SiteConfiguration [14:56:37] is any one bots server considered to be more stable than others? besides puppet runs and other maintenance [14:56:40] which is in mw/core : includes/SiteConfiguration.php [14:56:53] * anomie really needs to get into the 'bots' lab project one of these days, in his non-work time anyway [14:56:57] anomie: though that never reference the dblist files apparently [14:58:53] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 2.26, 2.94, 4.29 [14:59:13] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:59:39] Oh look, bug 41133 showed up in my inbox. I think I'll just assign that to myself then. [15:00:12] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 41697] homedir creation fail on a new instance - https://bugzilla.wikimedia.org/show_bug.cgi?id=41697 [15:00:34] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: WARNING - load average: 6.67, 6.28, 5.45 [15:01:02] paravoid: looks like we can't access to new instances :/ Unable to create and initialize directory '/home/hashar'. 
[15:01:02] paravoid: bug report is https://bugzilla.wikimedia.org/show_bug.cgi?id=41697 ;-D [15:01:19] paravoid: I am wondering if that is related to the homedir migrated out of NFS to gluster [15:01:25] anomie: I added you to cc [15:01:35] andrewbogott: around? [15:01:51] can you help with labs issues while I take care of some other stuff? [15:02:08] I'm around to help, but I can't run point on this atm [15:02:14] paravoid: will ping him. Thanks :-] [15:02:24] paravoid: Sure. You're talking about hashar's problem just above? [15:02:24] yeah [15:02:27] morning andrew! [15:02:45] hashar: morning! [15:03:19] So, you can create new instances but you get kicked out when you try to login with 'Unable to create and initialize directory'? [15:03:33] exactly [15:03:50] rebooted the instance but it does not seem to fix it [15:04:07] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:04:14] andrewbogott: I have created deployment-jobrunner07 a few hours ago [15:04:26] Hm. I think Ryan set aside the gluster migration for the time being... [15:04:40] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [15:04:40] so must be something else [15:04:44] I'm going to create a new instance and see if the same thing happens for me. [15:05:13] ah should have done that too myself [15:05:21] will reboot mine once more just in case (id is i-000004fd ) [15:06:21] OK. I know only about 50% of the process for homedir creation so it'll take me a while to sort out. 
[15:07:24] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 151 processes [15:07:45] poor andrewbogott :/ [15:08:06] andrewbogott: if it is going to take a long time, you might want to grab a coffee :-] [15:08:18] Hey, I was just wondering what to do this morning anyway :) [15:08:31] if unsure, ask me :-] [15:08:46] I always have something for ops buckets [15:09:24] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:10:50] hashar csteipp petan (anyone else?) - Any objection to my killing anything from /home/wikipedia/common/bin on beta that seems to exist in extensions/WikimediaMaintenance? Or anything to watch out for when doing that? [15:11:18] anomie: I would say go ahead [15:11:35] the scripts have probably been copied from production or something [15:11:53] * hashar looks at the list [15:12:10] stupid gluster [15:12:45] I can't even cd in the directory [15:15:53] PROBLEM Current Load is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:16:37] PROBLEM Current Users is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:13] PROBLEM Disk Space is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:26] PROBLEM Total processes is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:53] PROBLEM dpkg-check is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:18:03] PROBLEM Free ram is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:23:45] hashar: This is the first instance in a new project? 
[15:23:53] RECOVERY Current Load is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: OK - load average: 4.58, 4.34, 4.99 [15:23:54] andrewbogott: nop [15:24:01] we have several instances already [15:24:02] project is deployment-prep [15:24:18] really? And other instances work properly? [15:24:18] andrewbogott: have you managed to create a new one and connect to it ? [15:24:24] Ok, then I'm looking in the wrong place. [15:24:26] we can connect to the other ones [15:24:34] hashar: Yep. In a different project, though. [15:24:50] let me ditch that instance and recreate it [15:24:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 6.78, 6.47, 5.67 [15:25:46] deployment-jobrunner08 i-000004ff [15:25:49] building … :-] [15:25:56] RECOVERY Current Load is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: OK - load average: 0.20, 0.78, 0.57 [15:25:56] hashar: I'm puzzled because it shouldn't be trying to create your homedir -- homedirs are project-wide and you already have one, right? [15:26:21] So that means the instance failed to mount the existing /home... 
[15:26:32] RECOVERY Current Users is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [15:27:12] RECOVERY Disk Space is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: DISK OK [15:27:22] RECOVERY Total processes is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: PROCS OK: 85 processes [15:27:54] RECOVERY dpkg-check is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: All packages OK [15:28:02] RECOVERY Free ram is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: OK: 1046% free memory [15:29:22] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [15:29:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 107 processes [15:30:06] There are so many things that shouldn't be happening in these gluster logs… I don't know which ones to care about :( [15:31:59] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: WARNING - load average: 4.84, 5.24, 5.26 [15:32:25] andrewbogott: maybe the fstab configuration is wrong on the instance ? [15:32:49] That'd do it, although I can't guess why that would happen in the first place. [15:33:04] I suspect that your new instance will work fine and we will be none the wiser. 
[15:33:20] still waiting for puppet to complete on the new one [15:33:23] if that works I guess we can forget about the bug ;-] [15:33:54] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:34:08] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:34:34] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:35:22] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:35:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: OK - load average: 1.27, 3.16, 4.38 [15:35:52] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:37:05] andrewbogott: I am wondering if the puppet log could give information about /home creation [15:37:22] PROBLEM Total processes is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:37:37] hashar: On that system it might, if we could log into it :( [15:37:54] andrewbogott: can't you log in as root and use its /root homedir ? [15:38:12] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:39:00] hashar: Hm, maybe… I'm not sure that I have root on instances. But maybe there's a backdoor from the nova node. 
[15:39:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:40:05] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 1.02, 2.21, 4.06 [15:41:59] RECOVERY Current Load is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: OK - load average: 4.47, 4.59, 4.93 [15:42:23]