[00:04:45] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5.pmtpa.wmflabs output: Warning: 17% free memory [00:14:53] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [00:19:03] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [00:21:46] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [00:24:06] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [00:49:54] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [00:51:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [00:52:43] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: Warning: 18% free memory [00:54:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [00:59:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:09:53] PROBLEM Total processes is now: WARNING on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS WARNING: 175 processes [01:12:42] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: OK: 20% free memory [01:14:53] RECOVERY Total processes is now: OK on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS OK: 96 processes [01:19:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) 
[01:21:54] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [01:25:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:29:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [01:36:53] PROBLEM Free ram is now: WARNING on dumps-bot2 i-000003f4.pmtpa.wmflabs output: Warning: 19% free memory [01:44:24] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 148 processes [01:49:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [01:51:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [01:55:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:59:36] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:20:21] 11/02/2012 - 02:20:20 - Creating a home directory for mono at /export/keys/mono [02:21:12] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [02:21:27] Hello, anyone here? 
[02:21:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [02:25:20] 11/02/2012 - 02:25:20 - Updating keys for mono at /export/keys/mono [02:26:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:29:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [02:37:23] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 155 processes [02:51:15] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [02:51:55] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [02:52:28] aude: ping [02:56:03] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:56:17] shes very likely asleep [02:57:24] In which timezone is she currently? [02:59:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:09:03] * ori-l does a bugzilla drive-by on andrewbogott_afk.  [03:09:08] any idea re: https://bugzilla.wikimedia.org/show_bug.cgi?id=41622 ? [03:09:23] * ori-l rolls up his window and speeds off. 
[03:21:13] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [03:21:53] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [03:26:14] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:29:44] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [03:51:14] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [03:51:56] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [03:56:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:02:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:21:55] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [04:22:06] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [04:26:55] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:33:05] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [04:52:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [04:52:15] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) 
[04:56:54] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:04:12] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:22:04] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [05:22:14] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [05:26:52] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:34:18] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [05:52:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [05:53:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [05:57:32] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:04:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:22:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [06:23:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [06:27:34] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:29:24] PROBLEM Total processes is now: WARNING on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS WARNING: 151 processes [06:32:23] RECOVERY Total processes is 
now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 148 processes [06:34:23] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [06:36:33] RECOVERY Disk Space is now: OK on testing-arky i-0000033b.pmtpa.wmflabs output: DISK OK [06:52:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [06:53:46] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [06:54:23] RECOVERY Total processes is now: OK on vumi-metrics i-000004ba.pmtpa.wmflabs output: PROCS OK: 147 processes [06:58:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:04:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:22:47] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [07:23:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [07:29:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:34:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [07:52:56] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [07:53:53] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [07:59:34] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 
100% [08:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:23:35] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [08:24:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [08:29:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:34:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [08:53:33] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [08:54:14] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [08:59:44] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:04:33] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:24:26] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [09:24:57] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [09:29:46] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:34:35] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [09:56:13] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable 
(i-000000e5.pmtpa.wmflabs) [09:56:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [09:59:43] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:04:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:26:48] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [10:27:20] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [10:29:47] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:34:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [10:56:54] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [10:57:43] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [11:00:38] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:04:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:27:02] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [11:27:57] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [11:31:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 
100% [11:34:43] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [11:57:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [11:57:53] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:01:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:07:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:28:09] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:28:12] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [12:31:53] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:38:04] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [12:49:49] hashar- good morning [12:52:14] hello anomie [12:52:21] hi Platonides [12:52:52] almost noon for hashar :) [12:55:05] Past noon I think, isn't it almost 2pm there? But I subscribe to the philosophy that an exchange "Good morning" "and good afternoon to you too" makes as much sense as anything else when dealing across timezones ;) [12:55:32] hello anomie :-) [12:55:35] Platonides: 2pm for me [12:55:47] though I woke up at noon [12:55:57] I was so tired that I simply disabled the alarm clock [12:56:08] I guess I will sleep again this afternoon; I must be sick or something [12:56:20] hashar- Uh oh. Family feeling any better? 
[12:56:25] kind of [12:56:27] my wife is fine [12:56:36] I guess it is my turn now :-( [12:56:40] going to be a looooonnnnng week-end [12:56:52] (I usually get sick on friday hehe) [12:58:32] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [12:58:47] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [12:59:37] hashar- As far as I can figure, the problem we were having with "git status" hanging yesterday seems to be that some of the files (e.g. /home/wikipedia/common/.git/objects/08/0ffc63d65a25b3803ac81d4fb66b98e18e60f8) just hang the process trying to access them, e.g. see pid 5369 on deployment-dbdump. I don't know what might be going on with that though. [13:02:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: OK - load average: 4.59, 3.93, 4.89 [13:02:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:03:01] anomie: looking [13:04:02] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 4.82, 3.94, 4.91 [13:04:48] anomie: maybe we should kill those commands [13:05:04] though they are in D state (uninterruptible sleep) [13:05:36] hashar- I tried killing the ones I started, even kill -9 didn't seem to have any effect. 
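(Editor's sketch, not part of the channel log.) The hung "git status" processes discussed above were in state D, uninterruptible sleep. A process in that state is blocked inside the kernel, typically on I/O such as a stalled network filesystem, and signals (including the one sent by kill -9) are not acted on until the wait ends. A minimal Python sketch for spotting such processes on a Linux host follows, assuming a standard /proc layout; the function names are hypothetical and not from any tool mentioned in the log.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) parsed from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Listing the processes only confirms the diagnosis. It does not free them: they stay stuck until the underlying wait completes or the machine is restarted.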
[13:05:36] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [13:05:53] yeah uninterruptible :/ [13:06:10] not sure how to get rid of them without rebooting [13:06:42] PROBLEM dpkg-check is now: CRITICAL on wikisource-web i-000000fe.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [13:07:47] and there is a git repack still locked :-] [13:08:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:08:44] anomie: just reboot the instance I guess ;-] [13:08:53] not going to do any harm to the beta cluster [13:09:00] must leave for a few minutes will be back soon [13:10:33] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 110 processes [13:11:38] RECOVERY dpkg-check is now: OK on wikisource-web i-000000fe.pmtpa.wmflabs output: All packages OK [13:12:59] !log deployment-prep Rebooted deployment-dbdump to clear up hung processes, hopefully clear up NFS weirdness [13:13:00] Logged the message, Master [13:13:15] RECOVERY Current Load is now: OK on deployment-dbdump i-000000d2.pmtpa.wmflabs output: OK - load average: 0.41, 0.22, 0.08 [13:13:31] Nifty, reboot fixed "git status" too. [13:18:29] hashar is there a reason you are using dbdump instead of bastion? [13:18:40] :P [13:18:56] !log deployment-prep petrb: rebooting -bastion to install updates [13:18:56] Logged the message, Master [13:27:52] anomie: back :-D [13:28:34] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [13:28:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [13:29:38] hashar- wb. 
Question about groups and permissions: I see most of the files are owned by group "svn", which I don't have so I have to sudo to edit anything in there. Can we do something about that? [13:30:02] dohh [13:30:02] hashar- Also, while you were gone, petan asked "hashar is there a reason you are using dbdump instead of bastion?" [13:30:03] I thought it was the default for everyone [13:30:19] Maybe it was before the changeover to git? [13:30:24] our perm are really a mess [13:30:36] looks like the default was changed from 550 (svn) to 500 (wikidev) [13:30:53] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 3.21, 4.49, 4.98 [13:31:30] might need a change in LDAP [13:31:56] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 5.47, 5.78, 5.16 [13:31:56] wikidev makes sense, but you don't seem to have that one [13:32:28] nop :( [13:32:28] my default gid is 550 [13:33:23] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:34:36] hashar- Is there a reason all.dblist for beta has just one line? Or should that one line be added to all-wmflabs.dblist and that copied over? [13:34:53] Or do I need to wait for csteipp to get an answer to that? 
[13:38:07] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [13:38:40] paravoid: do you happen to know why LDAP users either have gid 500 (wikidev) or gid 550 (svn) [13:39:02] paravoid: looks like the default gid has been changed at some point [13:39:28] anomie: still can't git status in the dir :((( [13:39:44] I don't sorry [13:39:44] hashar- odd, it worked for me earlier [13:40:10] paravoid: thanks :-] [13:41:54] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 4.51, 4.49, 4.90 [13:42:18] hashar why we just don't use that mwdev user or something [13:42:26] there is mwdeploy [13:42:30] that [13:42:43] though it is not in the svn/wikidev group [13:42:43] we could make it own all files [13:42:43] and everyone would just switch to it [13:43:14] that is an idea [13:43:21] https://bugzilla.wikimedia.org/show_bug.cgi?id=41311 - mwdeploy unix user should be in svn group [13:43:30] also we really want to switch to -bastion in future [13:43:46] if it is in a workable state sure [13:44:10] that's something I am unable to determine :P [13:44:43] also a reason why we shouldn't kill dbdump before we check it [13:45:58] Damianz if you had a server with raid 0 consisting of 2 physical devices, filesystem was damaged and you weren't able to reboot it and it was a root filesystem, what would you do :D [13:46:27] shoot yourself in a head? :P [13:46:44] err... [13:46:44] raid 1 [13:47:01] !log deployment-prep created a second job runner instance: deployment-jobrunner07 [13:47:02] Logged the message, Master [13:47:32] petan: restore from backup ? :-] [13:47:57] I was thinking of detaching one physical drive, fsck that, mount, chroot into it, unmount raid, recreate it back from detached drive :D but that is creepy [13:48:21] hashar it's not so damaged :) but fsck would be nice [13:48:24] just that machine can't be rebooted... 
[13:48:37] it's permanently in use [13:51:53] RECOVERY Free ram is now: OK on dumps-bot2 i-000003f4.pmtpa.wmflabs output: OK: 26% free memory [13:51:53] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 4.49, 4.38, 4.74 [13:58:38] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [13:58:44] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:04:02] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:08:15] I give up with labs for today [14:08:18] too much madness [14:09:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:19:55] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 5.38, 5.74, 5.31 [14:24:16] In beta, where does /home/wikipedia/common/bin come from? Or is it not in git anywhere? [14:25:00] RECOVERY Current Load is now: OK on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: OK - load average: 4.60, 4.74, 4.98 [14:29:05] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:30:07] PROBLEM host: i-000000e5.pmtpa.wmflabs is DOWN address: i-000000e5.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e5.pmtpa.wmflabs) [14:32:28] hashar- In beta, where does /home/wikipedia/common/bin come from? Or is it not in git anywhere? 
[14:32:28] it is not in git afaik [14:32:30] anomie: they are some scripts by petan I think [14:33:16] most are probably unneeded nowadays [14:33:17] since we have mwscript and the WikimediaMaintenance extension deployed [14:33:23] which provide all the admin tools we should need [14:34:04] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:34:37] csteipp- Is there a reason all.dblist for beta has just one line? Or should that one line be added to all-wmflabs.dblist and that copied over? [14:35:12] Hey anomie: Is that in the repo? Or on the actual beta server? [14:35:42] csteipp- Actual beta server. all.dblist in the repo has the production version, I would think [14:37:08] Yep. So yeah, I'm not sure why it's that way in production. I'm guessing that whoever added in all-wmflabs.dblist probably updated the local file to take everything out. [14:37:08] Was the 1 line dewikivoyage? [14:37:28] I added dewikivoyage, and the script I used added to all.dblist (addWiki.php) [14:39:04] so i can't get into bots-3 right now, but bots-1, -2 are working just fine [14:39:05] channel 0: open failed: connect failed: No route to host [14:39:05] ssh_exchange_identification: Connection closed by remote host [14:39:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [14:41:18] csteipp- Yes, the one line is dewikivoyage. [14:41:36] petan: is bots-3 down right now? [14:41:45] I don't think beta uses all.dblist [14:41:47] so can probably be checked out from master [14:41:49] shouldn't be [14:41:51] Alright-- yeah, that should be moved to all-wmflabs.dblist [14:41:51] legoktm ^ [14:42:05] I think Reedy fixed the scripts to use all-wmflabs.dblist [14:42:09] hashar- Well, some of the scripts still look at all.dblist. The main config doesn't though. 
[14:42:36] the bug report was https://bugzilla.wikimedia.org/show_bug.cgi?id=41133 [14:42:41] if you know of any script that still use all.dblist , we want to fix them [14:42:52] related patch : https://gerrit.wikimedia.org/r/#/c/28642/ [14:42:52] petan: thats weird. i'm getting the error that i pasted above [14:42:56] bots-1,2 are just fine for me [14:43:18] ok let me check it [14:43:41] hashar- I'm trying to figure out how to fix them. But half the git commands hanging is making it hard... [14:43:43] that box is probably down [14:43:49] checking why it crashed [14:43:57] anomie: there must be an issue with the labs setup [14:44:06] [4872766.691154] Out of memory: kill process 30748 (perl) score 9684 or a child [14:44:07] OOM [14:44:15] Unable to create and initialize directory '/home/hashar'. [14:44:17] oh yeahhh [14:44:21] boom [14:44:23] !log bots bots-3 dead OOM [14:44:25] Logged the message, Master [14:44:48] [4872869.210414] Killed process 26970 (glusterfs) [14:44:48] [4874406.797627] EXT3-fs error (device vda): ext3_lookup: deleted inode referenced: 98310 [14:44:48] LOL [14:44:56] Ouch. [14:45:11] !log bots booting bots-3 [14:45:12] Logged the message, Master [14:48:39] Change on mediawiki a page Developer access was modified, changed by Tarheel95 link https://www.mediawiki.org/w/index.php?diff=600512 edit summary: [14:48:55] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 7.58, 7.11, 5.80 [14:50:34] petan: is there any type of process-killer like the toolserver has if something goes over the amount of allotted memory limit? 
[14:50:48] not yet [14:51:04] original proposal was that each botop would have own machine [14:51:13] if they exhaust memory, it's their problem, not affecting others [14:51:46] anomie: some/most scripts are in the WikimediaMaintenance extension [14:52:02] anomie: ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/WikimediaMaintenance.git [14:52:11] anomie: looks like some still refers to all.dblist explicitly [14:52:12] is that still the plan? if so I would like a separate machine... [14:52:36] hashar- Copied once and seldom (if ever) updated, I take it? [14:52:44] RECOVERY host: i-000000e5.pmtpa.wmflabs is UP address: i-000000e5.pmtpa.wmflabs PING OK - Packet loss = 0%, RTA = 10.49 ms [14:53:38] legoktm I will need to talk with Ryan regarding that, but unlikely, given the number of botops and resources of labs [14:53:50] [bz] (REOPENED - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 41133] beta all.dblist is a live hack - https://bugzilla.wikimedia.org/show_bug.cgi?id=41133 [14:53:55] but if you are going to run some expensive bots, I can create a box for you... [14:54:26] i'm not planning on running any expensive (yet), but it would be a nice if there was a box that had near 100% uptime [14:54:44] RECOVERY Free ram is now: OK on bots-3 i-000000e5.pmtpa.wmflabs output: OK: 1356% free memory [14:54:50] I have a vague plan to adjust most if not all of this stuff to check /etc/wikimedia-realm and look for *-$realm.dblist, falling back to the current *.dblist if it's not found. If I can ever sort out the current state of the config in order to figure out where to start, what with these git issues and everything. 
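(Editor's sketch, not part of the channel log.) The fallback anomie describes above, reading the realm from /etc/wikimedia-realm, preferring NAME-REALM.dblist and otherwise using plain NAME.dblist, could look roughly like this in Python. The function names, the "production" default and the assumption that the realm file holds a single word such as wmflabs are illustrative guesses, not the actual beta or WikimediaMaintenance code.

```python
import os


def read_realm(realm_file="/etc/wikimedia-realm", default="production"):
    """Return the realm named in realm_file, or default if it is missing or empty.

    Assumption: the file contains a single word such as "wmflabs".
    """
    try:
        with open(realm_file) as fh:
            realm = fh.read().strip()
    except OSError:
        return default
    return realm or default


def resolve_dblist(name, realm, directory):
    """Prefer <name>-<realm>.dblist; fall back to <name>.dblist if it is absent."""
    realm_specific = os.path.join(directory, f"{name}-{realm}.dblist")
    if os.path.isfile(realm_specific):
        return realm_specific
    return os.path.join(directory, f"{name}.dblist")
```

With a realm of wmflabs, resolve_dblist("all", "wmflabs", directory) returns the path to all-wmflabs.dblist when that file exists in the directory and the path to all.dblist otherwise, so scripts calling such a helper would not need to hard-code either file name.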
[14:54:58] because i'm checking right now, and it looks like my 3am UTC cronjob didn't run either [14:55:01] that will never happen because puppet for some reason reboot the boxes occasionally to patch them [14:55:06] anomie: that would be nice [14:55:23] anomie: also consider updating the WikimediaMaintenance class to provide the dblist filename [14:55:47] it's back up [14:55:52] anomie: or maybe that could be handled by $wgConf [14:56:08] hashar- Good idea, for the bits that actually include either of those anyway. [14:56:09] thanks [14:56:36] anomie: $wgConf is simply = new SiteConfiguration [14:56:37] is any one bots server considered to be more stable than others? besides puppet runs and other maintenance [14:56:40] which is in mw/core : includes/SiteConfiguration.php [14:56:53] * anomie really needs to get into the 'bots' lab project one of these days, in his non-work time anyway [14:56:57] anomie: though that never reference the dblist files apparently [14:58:53] RECOVERY Current Load is now: OK on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: OK - load average: 2.26, 2.94, 4.29 [14:59:13] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [14:59:39] Oh look, bug 41133 showed up in my inbox. I think I'll just assign that to myself then. [15:00:12] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - normal) [Bug 41697] homedir creation fail on a new instance - https://bugzilla.wikimedia.org/show_bug.cgi?id=41697 [15:00:34] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: WARNING - load average: 6.67, 6.28, 5.45 [15:01:02] paravoid: looks like we can't access to new instances :/ Unable to create and initialize directory '/home/hashar'. 
[15:01:02] paravoid: bug report is https://bugzilla.wikimedia.org/show_bug.cgi?id=41697 ;-D [15:01:19] paravoid: I am wondering if that is related to the homedir migrated out of NFS to gluster [15:01:25] anomie: I added you to cc [15:01:35] andrewbogott: around? [15:01:51] can you help with labs issues while I take care of some other stuff? [15:02:08] I'm around to help, but I can't run point on this atm [15:02:14] paravoid: will ping him. Thanks :-] [15:02:24] paravoid: Sure. You're talking about hashar's problem just above? [15:02:24] yeah [15:02:27] morning andrew! [15:02:45] hashar: morning! [15:03:19] So, you can create new instances but you get kicked out when you try to login with 'Unable to create and initialize directory'? [15:03:33] exactly [15:03:50] rebooted the instance but it does not seem to fix it [15:04:07] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:04:14] andrewbogott: I have created deployment-jobrunner07 a few hours ago [15:04:26] Hm. I think Ryan set aside the gluster migration for the time being... [15:04:40] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [15:04:40] so must be something else [15:04:44] I'm going to create a new instance and see if the same thing happens for me. [15:05:13] ah should have done that too myself [15:05:21] will reboot mine once more just in case (id is i-000004fd ) [15:06:21] OK. I know only about 50% of the process for homedir creation so it'll take me a while to sort out. 
[15:07:24] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 151 processes [15:07:45] poor andrewbogott :/ [15:08:06] andrewbogott: if it is going to take a long time, you might want to grab a coffee :-] [15:08:18] Hey, I was just wondering what to do this morning anyway :) [15:08:31] if unsure, ask me :-] [15:08:46] I always have something for ops buckets [15:09:24] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:10:50] hashar csteipp petan (anyone else?) - Any objection to my killing anything from /home/wikipedia/common/bin on beta that seems to exist in extensions/WikimediaMaintenance? Or anything to watch out for when doing that? [15:11:18] anomie: I would say go ahead [15:11:35] the script have propably been copied from production or something [15:11:53] * hashar looks at the list [15:12:10] stupid gluster [15:12:45] I can't even cd in the directory [15:15:53] PROBLEM Current Load is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:16:37] PROBLEM Current Users is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:13] PROBLEM Disk Space is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:26] PROBLEM Total processes is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:17:53] PROBLEM dpkg-check is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:18:03] PROBLEM Free ram is now: CRITICAL on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: Connection refused by host [15:23:45] hashar: This is the first instance in a new project? 
[15:23:53] RECOVERY Current Load is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: OK - load average: 4.58, 4.34, 4.99 [15:23:54] andrewbogott: nop [15:24:01] we have several instances already [15:24:02] project is deployment-prep [15:24:18] really? And other instances work properly? [15:24:18] andrewbogott: have you managed to create a new one and connect to it ? [15:24:24] Ok, then I'm looking in the wrong place. [15:24:26] we can connect to the other ones [15:24:34] hashar: Yep. In a different project, though. [15:24:50] let me dish that instance and recreate it [15:24:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 6.78, 6.47, 5.67 [15:25:46] deployment-jobrunner08 i-000004ff [15:25:49] building … :-] [15:25:56] RECOVERY Current Load is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: OK - load average: 0.20, 0.78, 0.57 [15:25:56] hashar: I'm puzzled because it shouldn't be trying to create your homedir -- homedirs are project-wide and you already have one, right? [15:26:21] So that means the instance failed to mount the existing /home... 
[15:26:32] RECOVERY Current Users is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [15:27:12] RECOVERY Disk Space is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: DISK OK [15:27:22] RECOVERY Total processes is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: PROCS OK: 85 processes [15:27:54] RECOVERY dpkg-check is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: All packages OK [15:28:02] RECOVERY Free ram is now: OK on testlabs-abogott1 i-000004fe.pmtpa.wmflabs output: OK: 1046% free memory [15:29:22] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [15:29:32] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 107 processes [15:30:06] There are so many things that shouldn't be happening in these gluster logs… I don't know which ones to care about :( [15:31:59] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: WARNING - load average: 4.84, 5.24, 5.26 [15:32:25] andrewbogott: maybe the fstab configuration is wrong on the instance ? [15:32:49] That'd do it, although I can't guess why that would happen in the first place. [15:33:04] I suspect that your new instance will work fine and we will be none the wiser. 
[15:33:20] still waiting for puppet to complete on the new one [15:33:23] if that works I guess we can forget about the bug ;-] [15:33:54] PROBLEM Current Load is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:34:08] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:34:34] PROBLEM Current Users is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:35:22] PROBLEM Disk Space is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:35:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: OK - load average: 1.27, 3.16, 4.38 [15:35:52] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:37:05] andrewbogott: I am wondering if the puppet log could give informations about /home creation [15:37:22] PROBLEM Total processes is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:37:37] hashar: On that system it might, if we could log into it :( [15:37:54] andrewbogott: can't you login as root and its /root homedir ? [15:38:12] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: Connection refused by host [15:39:00] hashar: Hm, maybe… I'm not sure that I have root on instances. But maybe there's a backdoor from the nova node. 
[15:39:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [15:40:05] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: OK - load average: 1.02, 2.21, 4.06 [15:41:59] RECOVERY Current Load is now: OK on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: OK - load average: 4.47, 4.59, 4.93 [15:42:23] RECOVERY Total processes is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: PROCS OK: 93 processes [15:42:34] andrewbogott: I can connect to the new instance. [15:42:47] andrewbogott: maybe autofs does not work on the other one [15:43:04] hashar: That's good! I will continue to try to understand gluster but… at leisure :) [15:43:13] RECOVERY dpkg-check is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: All packages OK [15:43:30] andrewbogott: good luck on that :/ [15:43:36] hashar: Yeah, possible that it was a random hiccup, but it looks like gluster was misbehaving at roughtly… 15:03, is that when you set up the instance? [15:43:42] That is, 40 minutes ago? [15:43:53] RECOVERY Current Load is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: OK - load average: 0.69, 1.00, 0.68 [15:44:08] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000004fd [15:44:16] apparently started it at 13:50 utc [15:44:28] hm, nope. Ok, so, unrelated. [15:44:33] RECOVERY Current Users is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: USERS OK - 1 users currently logged in [15:44:57] andrewbogott: should we simply forgot about that issue ? If so I will delete the instance and close the bug [15:45:04] don't delete it yet. 
[15:45:14] ok :-] [15:45:22] RECOVERY Disk Space is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: DISK OK [15:45:30] will just update the bug report about how a new instance works [15:45:52] RECOVERY Free ram is now: OK on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: OK: 1692% free memory [15:46:37] andrewbogott: by the way are you working on labs with Ryan ? [15:46:43] or is your expertise in a different area ? [15:46:55] hashar: Ryan and I are the 'labs team'. [15:46:59] ahhh [15:47:03] I'm a developer though, so slower at troubleshooting operations things. [15:47:19] but fast at fixing them probably :-]]]]]]] [15:47:26] I'm mostly here to write and debug python :) [15:47:27] nice!! [15:47:27] andrewbogott paravoid is not a member of that team? :D [15:47:43] petan: Yes, normally, but right now he's sidelined to work on Swift [15:47:51] aha [15:47:58] andrewbogott: do you want to be automatically notified of bugs opened in Wikimedia labs ? [15:47:58] Until we fill the swift engineer position. [15:47:58] among other things... [15:48:14] I'm still around and can help with stuff though [15:48:46] paravoid: I will not forget that, especially while Ryan is away :) [15:51:32] !log deployment-prep applying role::applicationserver::jobrunner to jobrunner08 [15:51:33] Logged the message, Master [15:53:04] andrewbogott: want me to add you as a default CC: to all bugs submitted against Wikimedia Labs > General ? [15:53:13] that is where most people open bug reports when it comes to generic labs issues [15:53:14] hashar: yes please [15:53:29] gmail or wikimedia email ? [15:55:07] andrewbogott: do you prefer mail notification to ends in your gmail email or in your wikimedia one ? [15:55:21] wikimedia (they all go to the same place anyway.) [15:55:24] thanks! 
[15:55:30] you are in :-] [15:55:46] example bugs: https://bugzilla.wikimedia.org/buglist.cgi?component=General&product=Wikimedia%20Labs&list_id=157086 [15:59:12] PROBLEM dpkg-check is now: CRITICAL on deployment-jobrunner08 i-000004ff.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [15:59:27] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [16:01:41] Since we know what public ips are assigned to what boxes inside labs (they're in ldap), is the dns setup for split horizon resolutions so the assigned names resolve to internal ips inside labs? [16:04:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:09:32] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [16:14:28] !log deployment-prep started mw-job-runner on jobrunner08 [16:14:28] Logged the message, Master [16:16:32] RECOVERY Total processes is now: OK on ipv6test1 i-00000282.pmtpa.wmflabs output: PROCS OK: 102 processes [16:18:55] grbmlbl [16:19:18] our permissions are a mess [16:30:54] RECOVERY Current Load is now: OK on deployment-jobrunner06 i-0000031d.pmtpa.wmflabs output: OK - load average: 2.03, 2.07, 4.50 [16:31:23] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [16:34:39] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:39:34] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [16:45:07] * andrewbogott back after lunch [17:01:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [17:04:33] PROBLEM host: 
i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:09:42] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:17:52] PROBLEM Current Load is now: WARNING on parsoid-roundtrip4-8core i-000004ed.pmtpa.wmflabs output: WARNING - load average: 5.21, 5.76, 5.41 [17:31:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [17:35:27] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:39:46] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [17:49:38] [bz] (NEW - created by: Ori Livneh, priority: Unprioritized - normal) [Bug 41622] Unable to create and initialize home directory - https://bugzilla.wikimedia.org/show_bug.cgi?id=41622 [17:50:54] * Damianz frowns at ori-l [17:51:05] Damianz: ? [17:51:45] Damianz: anything specific, or do i just generally inspire sadness? [17:51:58] inspired sadness [17:52:09] also nfs is due to be readonly anytime now [17:52:26] my instance is also readonly [17:52:30] i can read about it in labsconsole :) [17:53:03] your instance shouldn't be ro... wonder what logstatsh says [17:53:30] Damianz: see the first comment / bug desc [17:53:51] in a few, just writing up documentation for work [17:54:18] ok.
i'm going to try restarting it [18:01:52] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [18:04:34] PROBLEM Current Users is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:05:14] PROBLEM Disk Space is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:05:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [18:05:55] PROBLEM Current Load is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:05:55] PROBLEM Free ram is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:07:24] PROBLEM Total processes is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:07:54] PROBLEM dpkg-check is now: CRITICAL on gerrit-stream i-00000500.pmtpa.wmflabs output: Connection refused by host [18:09:44] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:12:23] RECOVERY Total processes is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: PROCS OK: 86 processes [18:13:00] RECOVERY dpkg-check is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: All packages OK [18:14:33] RECOVERY Current Users is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [18:15:12] RECOVERY Disk Space is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: DISK OK [18:15:52] RECOVERY Current Load is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: OK - load average: 0.04, 0.58, 0.56 [18:16:02] RECOVERY Free ram is now: OK on gerrit-stream i-00000500.pmtpa.wmflabs output: OK: 1053% free memory [18:27:14] Damianz: As far as I know the home dir migration to gluster has been postponed 
for a couple of weeks. Is that what you're talking about? And/or did I miss a memo? [18:27:55] Dunno, I know they didn't get done the other day as expected. [18:28:11] I'm half betwean paying attention and hammering out documentation heh [18:28:28] Damianz: Ryan is now on holiday for two weeks and I know /I'm/ not going to do the migration. [18:28:44] delegate to paravoid? :D [18:28:53] he's too busy to handle it [18:29:00] it'll need to wait till I get back [18:29:10] * Ryan_Lane isn't gone just yet ;) [18:29:24] Everyone's allways busy, if you're not busy clearly you don't have enough todo :D [18:29:49] Hope you have fun anyway :) [18:29:49] he's working on keeping our media infrastructure from collapsing on itself [18:30:04] it's more important than the home directories currently ;) [18:30:07] We don't need swift, it's fine [18:30:10] :D [18:30:42] * Damianz hopes this windows server decides to behave its self sometime soon [18:31:57] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [18:33:25] Anyone here? [18:33:35] no [18:34:39] Damianz: -_- [18:34:40] heh [18:34:42] UserMono: what's up? [18:35:04] _-_ [18:35:12] Hi, I wanted to create a new project or at least a new instance :) [18:35:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [18:39:42] What kind of stuff are you working on? [18:40:15] * UserMono gets a link [18:40:46] https://labsconsole.wikimedia.org/wiki/Shell_Request/Mono [18:41:07] details them briefly Extending functionality of wikimediafoundation.org (contact forms and such), Commons POTY tracking and tools, migration of old /~mono/ toolserver tools. 
[18:41:12] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [18:41:24] * Ryan_Lane nods [18:41:33] so tools and stuff probably projects around, other probably new project or such [18:41:46] we really need a tools project that could host multiple projects [18:41:46] * Damianz shrugs and delegates talking to someone with access to magic [18:41:57] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3 i-000004d8.pmtpa.wmflabs output: WARNING - load average: 5.11, 5.75, 5.24 [18:42:49] A misc project might be best [18:43:08] Personally I think we should have tools which is 100% web stuff with multiple frontend instances and data on project storage, a project for scrub mediawiki installs that are auto updated and non-web tools should fall under bots and be packaged/scheduled [18:43:22] Damianz: yeah [18:43:22] agreed [18:43:31] didn't someone start a toolserver-like project? [18:43:38] recently? [18:43:38] is there any web stuff project that is actually working now? [18:43:43] I think someone created tools with no instances [18:43:43] there's 137 projects [18:44:01] I was offering assistance with puppet/my crappy script that auto creates dirs to sort user spaces out [18:44:04] almost all of them have something accessible via a webserver [18:44:17] hmm [18:44:29] heh [18:44:34] Not sure what to do with project spaces... maybe extend so /data/project/users/ and /data/project/projects/ are both user_dir paths and the later is group owned by the project group for access [18:44:50] Would suck for clashing.. 
though tools.wm/~user project.wm/~project could work [18:44:54] * Damianz stops thinking out typing [18:45:11] bleh I really dislike user directories [18:45:40] You know what, shared storange where by like /data/project/somefolder can be mapped into a web project then have .wmflabs.org mapped to it would be best [18:45:54] Though the current gluster setup and dns setup would be restrictive [18:46:00] that's doable [18:46:11] though it should be UserMono: unfortunately you're one of the first trying to migrate tools ;) [18:46:50] so nothing is really set up, yet [18:47:23] I kinda prefer like secEdit.tools.wmflabs.org which is in /data/project/secEdit and perms are setup for to access it [18:47:33] yes [18:47:33] me too [18:47:41] Problem is without full groups we can't use ldap groups... unless we write out own shit to create groups/permissions management [18:47:50] are there apache-type servers, PHP, and databases? [18:47:50] Which we could, if someone has time [18:47:52] UserMono: for bots, yes [18:47:55] not for tools [18:47:58] ah [18:48:10] we're just kicking off toolserver migration talks [18:48:16] none of the work has started yet [18:48:43] It would be interesting if a 'project' died from what it means now [18:48:55] in which way would it change? [18:48:55] Make 'project' a grouping, vms/storage/web stuff services [18:49:11] that's what it is? 
[18:49:31] Well a project right now really means 'can make instances' not 'can consume resources' [18:49:37] that's not true [18:49:43] Though that's mostly because we only have instances not mysql/storage/web resources [18:49:53] I do think a treamlined Toolserver, with a lot of the hard work sorted out like frontend user logins, would be ideal [18:49:59] well, we have project storage [18:50:05] UserMono: agreed [18:50:18] user logins is sorted [18:50:24] all projects share authentication and authorization [18:50:41] Ryan_Lane: not on Toolserver, it's prohibited but there's a nasty workaround for authenticating Wikimedia (non-labs) users [18:50:54] labs is very open about access [18:51:01] soon it'll be self-registration [18:51:24] I think if we added a bunch of stuff like 'manage databases', 'manage webspace' etc under the sysadmin tab and used salt or w/e to handle that, backed with using the project ldap groups then doing .tools. would be easy as pie. [18:51:30] you'll be able to self-register service accounts, too [18:51:32] But as you know, I hate the web interface [18:51:40] I meant trying to tap in to Wikimedia's SUL system to verify people who use the ools [18:51:41] Damianz: yeah, that's the goal [18:51:45] has anyone put mediawiki on a labs server? [18:51:48] UserMono: ooohhhh. that [18:51:56] yes [18:52:00] there's tons of mediawiki around [18:52:16] we need OAuth/OpenID for authenticating against SUL [18:52:18] well, that's might be a start for a tools project because of the dependencies [18:52:29] Or you know, if we used chef you could do that in a databucket and have the config management system do sexy on all the web instances :D [18:53:01] meh. we're definitely not switching from puppet to chef [18:53:18] that's like "do you want the serrated or non-serrated knife to stab you in the face?" 
[18:53:27] Serrated every time [18:53:34] hahaha [18:53:42] UserMono: labs restrictions are fairly minor [18:53:54] There's 1 feature I like in chef - auto discovery. Dozens of arrays/hashes in puppet is painful... but I also hate ruby [18:54:25] Bonus for Puppet though.. it can handle cisco gear [18:54:26] UserMono: http://www.mediawiki.org/wiki/Wikimedia_Labs/Terms_of_use [18:54:30] that's our only restrictions [18:55:30] can we change 'must be hashed' to 'must be hashed and salted'? [18:55:30] it's rather redundant plain hashing stuff these days [18:55:30] I was *just* thinking that [18:55:52] Change on mediawiki a page Wikimedia Labs/Terms of use was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=600595 edit summary: [18:55:54] fixed :) [18:56:26] Everyone should totally use rot13! [18:56:38] that's why I added "strong" :) [18:56:39] rot13 isn't a hash, anyway ;) [18:56:43] it's a cipher [18:56:54] True [18:57:02] Though md5 pretty much falls into the same boat [18:57:18] salted md5 is still relatively safe for passwords [18:57:33] Yeah, unsalted md5 isn't exactally the most secure thing ever though :) [18:57:45] not that people should use it, since there are viable alternatives that are easy to use [18:57:45] My love is retained for sha or bcrypt [18:58:02] Ryan_Lane: is there a specific project running mediawiki that I could look at? [18:58:02] It's interesting that adding a few ms in calc saves you from bruting stuff with bcrypt [18:58:02] UserMono: sorry we don't have an easy solution for you [18:58:20] lemme see [18:58:26] hm.
seems education is still down [18:58:37] 'we don't need no education' [18:58:40] i was just wondering :) - the platform is a whole lot more robust than Toolserver, which has horrible reliability [18:58:42] !resource signwriting [18:58:42] https://labsconsole.wikimedia.org/wiki/Nova_Resource:signwriting [18:58:53] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core i-000004f8.pmtpa.wmflabs output: WARNING - load average: 10.49, 9.13, 6.24 [18:59:03] TS is lacking resources sadly [18:59:10] damn it. what's signwriting's public address? [18:59:18] I really need to make these project pages more useful [18:59:27] Though yes, it's uptime is crappy... some of my labs stuff relies on TS bits and it's commonly the TS that goes down [19:00:18] UserMono: http://ase.wikipedia.wmflabs.org/wiki/Main_Page [19:00:24] andrewbogott: Next time you visit the office can you print off http://sterlinghamilton.com/wp-content/uploads/2011/06/refactor.png and stick it on the wall next to ryan? :D [19:00:35] heh [19:00:54] PROBLEM Current Load is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: WARNING - load average: 11.36, 10.80, 7.55 [19:01:24] well, someone has definitely started [19:01:30] ,gcalc (15/100)*20 [19:01:41] bleh, no gcalc bot [19:01:57] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [19:02:00] UserMono: many of the wiki projects are hidden [19:02:13] since they don't have public IPs [19:02:15] they are accessible by a SOCKS proxy [19:02:48] Talking of socks [19:03:11] * UserMono would basically like to clone that server [19:03:19] ah [19:03:23] well, the good thing is... 
[19:03:36] we have a puppet class that will completely install mediawiki for you [19:03:36] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core i-000004f9.pmtpa.wmflabs output: WARNING - load average: 11.06, 10.62, 7.36 [19:03:42] Ryan_Lane: Any chance of setting up split horizon dns, since everything is NAT? Would be much nice UX wise to just resolve to w/e internal ip the name is mapped to via the public ip. [19:03:44] set up in a sane way [19:03:59] * Damianz raw at not being able to talk to public hostnames of labs stuff [19:04:09] Damianz: ah, so that you can access public ips from inside? [19:04:26] well sorta [19:04:26] well, public addresses [19:04:31] so if I hit bots.wmflabs.org from inside labs it goes to 10. not 203. [19:04:36] Which should be do-able [19:05:00] yeah. not the easiest thing in the world to accomplish [19:05:08] And when I say do-able that totally depends on how our recursors are setup and how ldap plays a part in there. [19:05:33] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [19:05:38] recursors send wmflabs and wmflabs.org to labs dns [19:06:07] the way it's stored in ldap makes it difficult [19:06:08] we'd need to run two dns servers [19:06:24] and store things in different locations [19:06:40] I also don't want to make too many changes to the dns code [19:06:40] Hmmm I guess without straight 1->1 mappings then you'd have to run dedicated recursors which pretended to be authoritive if the view was internal [19:06:43] because we're moving away from it [19:06:48] I think bind calls those views... 
[19:06:58] yeah [19:06:59] * Damianz stabs bind and hugs his pdns [19:07:03] likely possible in pdns too [19:07:14] still not terribly easy [19:07:15] Custom backend would be, not clean or easy though [19:07:33] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [19:07:36] Still, would be nice =D [19:07:53] UserMono: so, andrewbogott will likely be able to help you out with what you need [19:07:57] I'm about to leave for two weeks [19:08:05] and really should be packing right now [19:08:38] have a nice trip :) [19:08:43] thanks :) [19:08:59] UserMono: hello! [19:09:03] Hi there. [19:09:25] Do you have a specific desire for a mediawiki server right now, or are you just trying to lay groundwork for toolserver migration? [19:09:25] Have fun and don't bring back too many tropical diseases for the collection =D [19:09:58] I vote anti-tropical diseases but pro venomous animals. Liven up the office a bit. [19:10:15] oh venomous animals are cool [19:10:31] "I just got back from Australia and there are a bunch of dangerous snakes in the kitchen and a stonefish in the water cooler, in case anyone is interested" [19:10:33] hahaha [19:10:36] Closer to the latter - I am essentially looking for a virtuak LAMP stack [19:10:57] are you going to be doing mediawiki dev? [19:11:06] andrewbogott: You TOTALLY need a Tetraodontidae in the water cooler [19:11:54] it may be good to give you admin in the tools project [19:11:54] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:11:55] Damianz: My housemate used to keep some of these guys, they're the best: http://en.wikipedia.org/wiki/Dwarf_pufferfish [19:12:08] eventually tools will need to work out how it works. 
heh [19:12:10] Ryan_Lane: down the road [19:12:16] * Ryan_Lane nods [19:12:30] Oooh pretty [19:12:58] Hmm do I want fried chicken, pizza or ice cream [19:13:37] UserMono: What we have now is… a) an easy way to create a one-person and one-node server running a trivial mediawiki isntall, and b) a 'beta' cluster running a miniature version of the actual WMF production cluster. What we don't have is c) any actual content from the real projects integrated into a or b. [19:14:36] I'm not really looking for content or interested in importing from things - the tools I run are somewhat independent on WMF data [19:14:44] a) sounds fine [19:14:56] I want c [19:15:08] OK. So if you want your own private a) I can set you up. If you want to mess around on b) then you need to talk to people who manage that project (e.g. Damianz or Petan). [19:15:16] andrewbogott: UserMono there is a fair amount of content at http://en.wikipedia.beta.wmflabs.org, not so much in other language wikis [19:15:28] Hey, I totally don't manage that... it's a hashar thing :P [19:15:45] I just fix it when stuff goes down and get confused by the 'pending deployment' policy on extensions [19:15:49] I can answer a lot of questions about beta, but not all. [19:15:57] Damianz: I think that c is officially Asher's problem and he has the good sense to avoid this channel. [19:16:13] If Asher was here he'd never get anything done =D [19:16:36] UserMono: Are you collaborating with other people who are already using labs or would you like your own project? [19:16:46] andrewbogott: no [19:16:46] beta is going to get a lot more interesting when hashar has Zuul done [19:17:02] Totally could use sharded database config with a local db for users federated to a read only slave of prod for awesomesauce magic... 
mysql federation is pretty cool but sucks to think about [19:17:02] andrewbogott: and yes [19:17:51] UserMono: OK, all I need from you is a suggested project name and your labs username (unless you don't have a labs account yet, in which case we need to do that first.) [19:18:12] Zuul looks pretty cool... personally I think it would be more interesting if we're at the stage where spinning up an entire cluster, pulling in a random dataset, water testing the whole ruddy thing and trashing it all can be done from mr jenkins or mrs hudson =D [19:18:26] he has shell etc [19:18:57] * Damianz wonders how much stuff is A/B testing on prod for ui changes [19:19:14] andrewbogott: https://labsconsole.wikimedia.org/wiki/User:Mono [19:19:30] Damianz: yep, Zuul is so that Jenkins can talk to gerrit. /me oversimplifies [19:19:34] does the name have to be something specific? [19:20:08] chrismcmahon: As I understand it basically that bit is required for proper gerrit intergration with tests and we can replace all the current hooks with the 'framework' but I've not read up on it that much. [19:21:08] It's sad when you have to eat £7 worth of food to avoid walking out in the rain. [19:21:08] UserMono: You'll have an opportunity to add a long-form description as well, but I do need to enter some sort of project name. Usually project names refer to the tools that they're being used to develop. [19:21:30] Damianz: not only tests but also deploying to beta automatically by way of Jenkins builds, whose results will be publicly visible [19:21:42] deploying code [19:21:59] ooh shiny [19:22:16] Not just let gerrit git/ssh replicate on merge but actually visable when stuff has gone over... 
awesome [19:22:18] (which is done right now by hacky shell scripts that have some issues and break a lot) [19:22:48] The loop script is insane and I don't know how it doesn't break db migrations/localisation updates etc [19:22:54] Though tbf I don't know /that/ much about mediawiki [19:23:21] (reading scrollback) I see talk about migration from toolserver. Does that mean we have something resembling a database slave replicated from production available to labs instances now? [19:23:29] anomie: Not yet [19:23:42] andrewbogott: just call it 'glass' for now [19:23:45] and forever [19:24:01] IIRC it's on the roadmap for this Q, which most likely means start of next year [19:24:10] Damianz- Too bad. That's the biggest thing I know of holding up real migration of tools. [19:24:43] Yep - I have a php/json thing so my labs stuff can talk to the db via TS... works for stuff that needs 1 query with little data, not for anything serious though. [19:24:59] UserMono: OK, coming up. [19:26:57] * andrewbogott basks in the glow of now-working project creation code [19:27:22] andrewbogott: :D [19:27:23] =D [19:27:31] I reviewed your code ;) [19:27:33] and +2'd it [19:28:03] You know it's sad when Ryan_Lane reviews and merges your code...
usually you have no reason to use *that* bit of the interface for the next 3 months [19:28:19] heh [19:28:28] at least it's fixed for the next time you need it ;) [19:29:28] I wonder if I order a half & half pizza with both halves the same the pizza guy will freak out :D [19:30:05] 11/02/2012 - 19:30:04 - Creating a project directory for glass [19:30:23] UserMono: While labs catches up, consider reading this page: https://labsconsole.wikimedia.org/wiki/Help:InstanceConfigMediawiki [19:30:38] And let me know if it's baffling :) [19:32:09] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [19:32:11] Damianz: troll ;) [19:32:17] I'll have to set up an instance first [19:32:25] =D [19:32:32] Damianz: did you see that I removed the tiny instance type? [19:32:42] UserMono: Step 2 on that guide is 'make an instance' [19:32:44] Oooh nope, that's awesome =D [19:32:48] ah didn't get there yet [19:32:50] I filtered it from the interface :D [19:32:58] heh [19:32:58] Ryan_Lane: btw, you did that via a hotpatch; would you like me to merge that in? [19:33:07] andrewbogott: please do [19:33:15] hm, not sure 'hotpatch' is the right word for that. [19:33:17] ok [19:33:22] livehack is :) [19:33:22] if($name == 'tiny') // it's the size that matters [19:33:32] livehack! That's what I was looking for. [19:34:06] Meh, hsbc are cheap. Apparently they don't put your details *on* the card. They put it on a tiny film over the card... which I seem to peel off every few months -.- [19:34:13] andrewbogott: does it have the security group? [19:34:22] default [19:34:25] UserMono: You'll need to set that up. [19:34:30] OK [19:34:30] for web access. [19:34:31] De [19:34:38] ok.
I better start packing [19:35:19] andrewbogott: You know if we had a mw specific project where by we created instances/wikis on user request we could default open the right ports [19:35:28] * Damianz yays at better ux [19:36:02] Damianz: Yeah. Projects are cheap so it doesn't really hurt us to have a million one-instance projects, but it does feel silly sometimes. [19:36:17] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [19:36:19] I just want the 80 and 443 ports right? [19:36:21] It also wouldn't be crazy to just always have a 'web' sec group in every new project since that's mostly what people want. [19:36:43] ah yes [19:36:50] It's kind of a shame 'sysadmin' is project wise... delegating instance access in this case would make sense... though for other stuff it wouldn't [19:36:50] UserMono: I think so. The important thing is to create the group and make sure your instance is assigned to it. [19:36:50] Y [19:37:06] Once an instance is running you can change what a group means, but you can't change which groups the instance is in. [19:37:07] the web group? [19:37:23] right. [19:37:23] I was kinda leaning this way when talking about 'sub groups' in the sense that like the beta project has a 'web' group, a 'database' group etc... which security/access rules could be applied at that level. [19:37:39] * Damianz frowns at nova not being magic and supporting weird usage [19:40:00] How much justification is needed to get a labs project, anyway? [19:40:26] 'I want to work on that doesn't totally go against the ToS' [19:40:28] and [19:40:40] 'there isn't a suitable project already' [19:43:03] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [19:43:36] anomie: Yep, the bar is pretty low. Although I might say that needs to be related to the mission of WMF. [19:44:07] So e.g. 
if you just want to hack on Apache we might send you elsewhere. [19:45:23] Apache is dull to hack on anyway, now nginx I could get behind :) [19:45:39] * anomie should probably figure out puppet someday if he's going to keep on working with beta. And on the non-work side, he has [[en:User:AnomieBOT]]. [19:47:11] I've never tried hacking on Apache, although a project I submitted some code to once was later absorbed by ASF. [19:49:33] PROBLEM Current Users is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [19:50:13] PROBLEM Disk Space is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [19:50:56] PROBLEM Current Load is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [19:50:56] PROBLEM Free ram is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [19:52:23] PROBLEM Total processes is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [19:54:34] PROBLEM dpkg-check is now: CRITICAL on mediawikiglass i-00000501.pmtpa.wmflabs output: Connection refused by host [20:02:03] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [20:02:34] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 293 processes [20:04:02] UserMono: How's it going? 
[20:06:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [20:07:33] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [20:09:53] PROBLEM Free ram is now: WARNING on dumps-bot2 i-000003f4.pmtpa.wmflabs output: Warning: 19% free memory [20:13:08] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [20:14:34] RECOVERY Current Users is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [20:14:34] RECOVERY dpkg-check is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: All packages OK [20:15:12] RECOVERY Disk Space is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: DISK OK [20:15:55] RECOVERY Current Load is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: OK - load average: 0.46, 1.03, 0.94 [20:15:55] RECOVERY Free ram is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: OK: 560% free memory [20:17:22] RECOVERY Total processes is now: OK on mediawikiglass i-00000501.pmtpa.wmflabs output: PROCS OK: 93 processes [20:20:56] PROBLEM Current Load is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: Connection refused by host [20:21:33] PROBLEM Current Users is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: Connection refused by host [20:22:19] PROBLEM Disk Space is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: Connection refused by host [20:22:53] PROBLEM dpkg-check is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: Connection refused by host [20:23:08] PROBLEM Free ram is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: Connection refused by host [20:24:23] PROBLEM Total processes is now: CRITICAL on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: 
Connection refused by host [20:31:37] RECOVERY Current Users is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [20:32:12] RECOVERY Disk Space is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: DISK OK [20:32:24] RECOVERY Total processes is now: OK on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS OK: 146 processes [20:32:54] RECOVERY dpkg-check is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: All packages OK [20:33:02] RECOVERY Free ram is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: OK: 1054% free memory [20:33:42] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [20:34:27] RECOVERY Total processes is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: PROCS OK: 84 processes [20:35:53] RECOVERY Current Load is now: OK on mwreview-abogotttest i-00000502.pmtpa.wmflabs output: OK - load average: 0.06, 0.55, 0.60 [20:36:16] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [20:40:25] PROBLEM Total processes is now: WARNING on parsoid-spof i-000004d6.pmtpa.wmflabs output: PROCS WARNING: 157 processes [20:44:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:03:47] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [21:06:13] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [21:14:13] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:23:59] andrewbogott: you there? [21:24:02] yep! [21:24:05] Having any luck? 
[21:24:40] got through most everything [21:24:49] just wondering where the instance should be? [21:24:53] (URL) [21:25:24] Ah, OK. You don't have a public IP at the moment, so you'll need to do some proxy magic to get at it. [21:25:32] Lemme find you a link. [21:26:19] UserMono: This is what I do: https://labsconsole.wikimedia.org/wiki/Help:Access#Accessing_web_services_using_a_SOCKS_proxy [21:26:19] It is not [21:26:20] as painful as it looks. [21:26:23] OK [21:26:51] could I get a public IP? [21:27:37] UserMono: They are in short supply. If you need many other people to have web access then we can give you a public IP, but otherwise proxying is better. [21:27:52] ok [21:28:14] is there a direct way to access the mediawiki install like the example? [21:28:26] or does that require the ip? [21:29:45] Well, the example URLs like http://instance-id.pmtpa.wmflabs/wiki are proxied (you can tell because the TLD is wmflabs with no .org -- that means that wmflabs is aliased to a tunnel). [21:30:05] mmkay [21:30:41] Are those proxy instructions impossible to follow? I haven't looked at them in a while, I'll re-read. [21:30:56] no [21:31:36] the mediawiki setup page you sent was tho [21:32:13] I can believe that :) Can you be more specific about what was unclear? [21:32:29] (It sort of presumes that you already know what labs is and how to use labs console, which is maybe a mistake.) [21:33:39] yep, that's the problem [21:33:45] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [21:33:47] links to all of the consoles would help [21:34:10] ok, that's a good idea. And maybe just a disclaimer that says "Before you start this, read this"? [21:34:41] yep, screenshots would be really nice and consolidating some of the linked pages would help :) [21:35:41] what do you mean by 'consolidating some of the linked pages'?
[21:36:22] placing the https://labsconsole.wikimedia.org/wiki/Help:Security_Groups info right there in the steps [21:36:53] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [21:37:08] Hm… OK, although that page is of general use, not just for mediawiki. [21:37:18] But maybe there's some transcluding magic we can do there. [21:37:32] ok [21:37:44] just based on my experience for what I wanted to do [21:37:56] * andrewbogott may not be using that word correctly. [21:38:04] Yep, what you're saying makes sense. I'll work on this a bit. [21:44:02] <^demon> ori-l: I figured out what broke logging in on my gerrit master :) [21:44:22] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [21:45:48] hey Jamesofur [21:45:54] pm? [21:46:10] UserMono: sure, though I only have about 5 minutes atm before a meeting [21:53:07] https://fbcdn-sphotos-c-a.akamaihd.net/hphotos-ak-prn1/59675_504384319586792_861222566_n.jpg < So true. [21:56:37] ^demon, that's awesome. does the version you're upgrading us to include plug-in support? [21:57:01] <^demon> Yep. We'll be upgrading as soon as this regression is fixed. [21:57:32] Ryan_Lane: http://i.imgur.com/wAKZH.png [22:01:07] any love for node.js on labs? [22:02:32] PROBLEM Total processes is now: CRITICAL on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS CRITICAL: 293 processes [22:02:45] There's a package for it IIRC? 
[22:03:42] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [22:05:58] I'm interested in having a node-based server process hosted on labs [22:06:15] just putting out feelers [22:06:48] but I'm working on a new project that would need it [22:06:53] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [22:07:51] I don't really like node but do actually use it on labs and I'm pretty sure there's a package for it as some prod stuff uses it (or will do soon). [22:08:04] Hmm I wonder if I replaced that node bot with twisted actually [22:12:32] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 200 processes [22:14:25] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [22:32:33] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 113 processes [22:33:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [22:36:56] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [22:40:59] node is nice for realtime stuff (together with socket.io) [22:41:33] plus javascript is as good a scripting language as any, and it has a _darn_ fast implementation with V8 [22:44:24] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [23:03:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [23:04:09] Anyone know why git status / git pull on beta is taking a very, very long time now? I know it didn't take this long last week... not sure what changed.
[23:07:42] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [23:14:37] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs) [23:16:55] Hey hashar, I don't want to bother your friday night, but do you know if Brad did anything to git on beta? I can't git status / git pull [23:17:24] csteipp: ohh [23:17:25] csteipp: yeah we talked about it today [23:17:33] csteipp: seems to be an issue in labs :/ [23:17:34] * RoanKattouw is somewhat surprised to see hashar on IRC past midnight [23:18:00] hashar: Yeah, it does... Alright, I'll keep working on it :) [23:18:01] RoanKattouw: have to. I slept this morning :-D [23:18:25] csteipp: I have no idea what is going on. Most probably glusterFS is doomed [23:18:54] definitely not helping [23:19:25] cd /home/wikipedia/common is stalling for me :/ [23:19:41] Yep... strace stops when git tries to read .git/objects/d6/55a716d866089f94736a6964a303ab73f69b9a [23:20:32] http://ganglia.wmflabs.org/latest/?r=4hr&cs=&ce=&m=load_one&s=by+name&c=deployment-prep&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 [23:20:35] tbf [23:20:41] git = lots of small(ish) files [23:20:42] bah both deployment-bastion and deployment-dbdump show high load [23:20:45] gluster = hater of all things small [23:20:55] As she said, it's the size that matters [23:20:56] all waiting for something [23:23:28] csteipp: can't help that much on this though sorry :-( [23:24:13] hashar: No problem. But I may need to do a few local changes on beta to get voyage working tonight. I'll add gerrit changes if I do :) [23:24:25] urgh [23:24:33] you'll have to fix it after they're merged [23:24:39] as the script doesn't reset + clean [23:24:42] so it will conflict on pull [23:24:55] Yeah, I can do that [23:25:08] anyway, heading to bed for now [23:25:15] nn hashy [23:25:30] have a good weekend! [23:25:42] You as well!
[23:33:43] PROBLEM host: i-0000039b.pmtpa.wmflabs is DOWN address: i-0000039b.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000039b.pmtpa.wmflabs) [23:36:14] PROBLEM Current Load is now: WARNING on deployment-dbdump i-000000d2.pmtpa.wmflabs output: WARNING - load average: 6.10, 6.02, 5.23 [23:38:24] PROBLEM host: i-000004de.pmtpa.wmflabs is DOWN address: i-000004de.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [23:44:34] PROBLEM host: i-000003ef.pmtpa.wmflabs is DOWN address: i-000003ef.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003ef.pmtpa.wmflabs)