[01:09:11] New review: Krinkle; "Going once.. Going twice.." [operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/16241
[01:18:24] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[01:40:27] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 231 seconds
[01:40:27] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 232 seconds
[01:47:48] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 674s
[01:55:45] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 23 seconds
[01:58:09] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[01:58:36] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 14s
[02:00:24] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 4 seconds
[02:10:10] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[02:10:55] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time
[02:10:55] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[02:33:16] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[02:33:16] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[02:33:16] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[02:39:16] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[02:52:46] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[02:53:40] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms
[02:56:49] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[02:59:04] RECOVERY - Puppet freshness on ms-be6 is OK: puppet ran at Mon Sep 3 02:58:47 UTC 2012
[03:23:10] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , plwiktionary (34746)
[03:23:55] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[03:23:55] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , plwiktionary (34142)
[03:34:25] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[03:35:19] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[03:35:37] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Mon Sep 3 03:35:22 UTC 2012
[03:39:40] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[03:45:40] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time
[04:04:16] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[04:04:16] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[04:09:58] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[04:10:43] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[05:56:50] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , itwiki (68153)
[05:57:35] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , itwiki (71025)
[06:16:16] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[06:23:19] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[06:25:43] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[06:37:17] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[06:38:01] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[06:48:15] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time
[07:24:57] PROBLEM - NTP on ms-be7 is CRITICAL: NTP CRITICAL: No response from NTP server
[07:55:06] PROBLEM - Puppet freshness on snapshot1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:07] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:07] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:07] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:07] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[08:34:08] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:08] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:09] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[08:34:09] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[08:37:43] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[08:38:46] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[08:43:07] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[08:53:19] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[08:56:28] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[09:05:47] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time
[09:15:50] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[09:21:32] RECOVERY - Host ms-be7 is UP: PING WARNING - Packet loss = 37%, RTA = 291.75 ms
[09:29:11] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[09:38:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[09:40:44] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[09:43:13] !log manually cleaning out /tmp on gallium. It is being filled by faulty tests not cleaning the files they are creating.
[09:43:23] Logged the message, Master
[09:48:23] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[10:04:44] PROBLEM - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is CRITICAL: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y CRITICAL - *2600*
[10:07:53] RECOVERY - ps1-d1-sdtpa-infeed-load-tower-A-phase-Y on ps1-d1-sdtpa is OK: ps1-d1-sdtpa-infeed-load-tower-A-phase-Y OK - 2350
[10:13:08] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[10:16:53] RECOVERY - Host ms-be7 is UP: PING WARNING - Packet loss = 86%, RTA = 548.53 ms
[10:20:47] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[10:21:59] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[10:24:32] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[10:38:02] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[10:41:29] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[11:01:17] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time
[12:06:04] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[12:06:31] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[12:10:25] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[12:20:09] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time
[12:33:48] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[12:33:48] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[12:33:48] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:48] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[12:43:06] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 15.85 ms
[12:50:45] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[12:59:54] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[13:02:18] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 104.88 ms
[13:09:57] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[13:14:18] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[13:18:52] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 16%, RTA = 144.11 ms
[13:23:13] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time
[13:26:31] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[13:30:52] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[13:32:58] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[13:40:10] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[13:47:58] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[13:50:04] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[13:50:35] !log installed Gerrit commitmsg hook in /h/w/c/php-1.20wmf10/.git/hooks
[13:50:43] Logged the message, Master
[13:52:06] !log Rebased local 1.20wmf10 directory on top of its current origin/1.20wmf10 ( 472ec09 )
[13:52:14] Logged the message, Master
[13:55:28] !log Sent one live hack made to 1.20wmf10 : https://gerrit.wikimedia.org/r/#/c/22461/
[13:55:36] Logged the message, Master
[14:04:55] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[14:04:55] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[14:17:31] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[14:19:01] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[14:24:01] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[14:27:19] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[14:31:40] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[14:33:37] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[14:37:49] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time
[14:44:44] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 40.30 ms
[14:45:29] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[14:45:33] a
[14:52:22] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[15:02:16] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.050 second response time
[15:09:29] !log reinstalling all eqiad search frontends as precise
[15:09:37] Logged the message, notpeter
[15:21:01] PROBLEM - Host search1003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:10] PROBLEM - Host search1008 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:10] PROBLEM - Host search1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:10] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:19] PROBLEM - Host search1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:19] PROBLEM - Host search1009 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:19] PROBLEM - Host search1010 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:19] PROBLEM - Host search1005 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:19] PROBLEM - Host search1007 is DOWN: PING CRITICAL - Packet loss = 100%
[15:21:37] PROBLEM - Host search1006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:26:43] RECOVERY - Host search1003 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms
[15:26:52] RECOVERY - Host search1001 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[15:26:52] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms
[15:26:52] RECOVERY - Host search1008 is UP: PING OK - Packet loss = 0%, RTA = 26.81 ms
[15:27:01] RECOVERY - Host search1009 is UP: PING OK - Packet loss = 0%, RTA = 26.77 ms
[15:27:01] RECOVERY - Host search1010 is UP: PING OK - Packet loss = 0%, RTA = 26.62 ms
[15:27:01] RECOVERY - Host search1004 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[15:27:01] RECOVERY - Host search1005 is UP: PING OK - Packet loss = 0%, RTA = 26.65 ms
[15:27:01] RECOVERY - Host search1007 is UP: PING OK - Packet loss = 0%, RTA = 27.08 ms
[15:27:19] RECOVERY - Host search1006 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms
[15:30:01] PROBLEM - SSH on search1003 is CRITICAL: Connection refused
[15:30:10] PROBLEM - SSH on search1005 is CRITICAL: Connection refused
[15:30:10] PROBLEM - SSH on search1007 is CRITICAL: Connection refused
[15:30:19] PROBLEM - SSH on search1004 is CRITICAL: Connection refused
[15:30:19] PROBLEM - SSH on search1008 is CRITICAL: Connection refused
[15:30:19] PROBLEM - SSH on search1001 is CRITICAL: Connection refused
[15:30:28] PROBLEM - SSH on search1002 is CRITICAL: Connection refused
[15:30:55] PROBLEM - Lucene disk space on search1006 is CRITICAL: Connection refused by host
[15:31:25] PROBLEM - Lucene disk space on search1005 is CRITICAL: Connection refused by host
[15:31:25] PROBLEM - Lucene disk space on search1003 is CRITICAL: Connection refused by host
[15:31:25] PROBLEM - Lucene disk space on search1007 is CRITICAL: Connection refused by host
[15:32:10] PROBLEM - Lucene disk space on search1001 is CRITICAL: Connection refused by host
[15:32:10] PROBLEM - Lucene disk space on search1002 is CRITICAL: Connection refused by host
[15:32:10] PROBLEM - Lucene disk space on search1008 is CRITICAL: Connection refused by host
[15:32:10] PROBLEM - Lucene disk space on search1004 is CRITICAL: Connection refused by host
[15:32:10] PROBLEM - Lucene disk space on search1010 is CRITICAL: Connection refused by host
[15:32:11] PROBLEM - Lucene disk space on search1009 is CRITICAL: Connection refused by host
[15:32:55] PROBLEM - SSH on search1006 is CRITICAL: Connection refused
[15:32:55] PROBLEM - SSH on search1009 is CRITICAL: Connection refused
[15:32:55] PROBLEM - SSH on search1010 is CRITICAL: Connection refused
[15:34:52] PROBLEM - Lucene on search1001 is CRITICAL: Connection refused
[15:34:52] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused
[15:34:52] PROBLEM - Lucene on search1006 is CRITICAL: Connection refused
[15:34:52] PROBLEM - Lucene on search1003 is CRITICAL: Connection refused
[15:34:52] PROBLEM - Lucene on search1004 is CRITICAL: Connection refused
[15:34:53] PROBLEM - Lucene on search1008 is CRITICAL: Connection refused
[15:34:53] PROBLEM - Lucene on search1005 is CRITICAL: Connection refused
[15:34:54] PROBLEM - Lucene on search1007 is CRITICAL: Connection refused
[15:34:54] PROBLEM - Lucene on search1010 is CRITICAL: Connection refused
[15:34:55] PROBLEM - Lucene on search1009 is CRITICAL: Connection refused
[15:35:10] PROBLEM - Host search1012 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:10] PROBLEM - Host search1011 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:10] PROBLEM - Host search1015 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:10] PROBLEM - Host search1013 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:10] PROBLEM - Host search1016 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:11] PROBLEM - Host search1014 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:11] PROBLEM - Host search1017 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:12] PROBLEM - Host search1019 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:12] PROBLEM - Host search1018 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:13] PROBLEM - Host search1020 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:13] PROBLEM - Host search1021 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:14] PROBLEM - Host search1022 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:14] PROBLEM - Host search1024 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:15] PROBLEM - Host search1023 is DOWN: PING CRITICAL - Packet loss = 100%
[15:37:34] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[15:40:52] RECOVERY - Host search1012 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms
[15:40:52] RECOVERY - Host search1011 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms
[15:40:52] RECOVERY - Host search1015 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms
[15:40:52] RECOVERY - Host search1013 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms
[15:40:52] RECOVERY - Host search1014 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms
[15:40:53] RECOVERY - Host search1016 is UP: PING OK - Packet loss = 0%, RTA = 26.82 ms
[15:40:53] RECOVERY - Host search1019 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms
[15:40:54] RECOVERY - Host search1018 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms
[15:40:54] RECOVERY - Host search1017 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms
[15:40:55] RECOVERY - Host search1024 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[15:40:55] RECOVERY - Host search1022 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms
[15:40:56] RECOVERY - Host search1020 is UP: PING OK - Packet loss = 0%, RTA = 26.79 ms
[15:40:56] RECOVERY - Host search1021 is UP: PING OK - Packet loss = 0%, RTA = 26.75 ms
[15:40:57] RECOVERY - Host search1023 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms
[15:42:31] PROBLEM - Host search1001 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:31] PROBLEM - Host search1003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:31] PROBLEM - Host search1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:31] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:40] PROBLEM - Host search1006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:40] PROBLEM - Host search1007 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:40] PROBLEM - Host search1010 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:40] PROBLEM - Host search1008 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:40] PROBLEM - Host search1009 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:52] RECOVERY - SSH on search1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:01] RECOVERY - SSH on search1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:01] RECOVERY - SSH on search1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:01] RECOVERY - SSH on search1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:01] RECOVERY - SSH on search1010 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:01] RECOVERY - SSH on search1009 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:02] RECOVERY - SSH on search1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:02] RECOVERY - SSH on search1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:03] RECOVERY - SSH on search1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:03] RECOVERY - SSH on search1007 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:44:04] PROBLEM - SSH on search1017 is CRITICAL: Connection refused
[15:44:04] PROBLEM - SSH on search1018 is CRITICAL: Connection refused
[15:44:05] PROBLEM - SSH on search1021 is CRITICAL: Connection refused
[15:44:05] RECOVERY - Host search1008 is UP: PING OK - Packet loss = 0%, RTA = 26.68 ms
[15:44:10] PROBLEM - SSH on search1024 is CRITICAL: Connection refused
[15:44:10] PROBLEM - SSH on search1020 is CRITICAL: Connection refused
[15:44:10] PROBLEM - SSH on search1022 is CRITICAL: Connection refused
[15:44:10] RECOVERY - Host search1003 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:44:10] RECOVERY - Host search1006 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:44:11] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:44:11] RECOVERY - Host search1010 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:44:12] RECOVERY - Host search1009 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms
[15:44:12] RECOVERY - Host search1001 is UP: PING OK - Packet loss = 0%, RTA = 26.44 ms
[15:44:13] RECOVERY - Host search1004 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms
[15:44:13] RECOVERY - Host search1007 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms
[15:44:37] PROBLEM - Lucene disk space on search1020 is CRITICAL: Connection refused by host
[15:44:37] PROBLEM - Lucene disk space on search1017 is CRITICAL: Connection refused by host
[15:44:37] PROBLEM - Lucene disk space on search1014 is CRITICAL: Connection refused by host
[15:44:37] PROBLEM - Lucene disk space on search1016 is CRITICAL: Connection refused by host
[15:44:37] PROBLEM - Lucene disk space on search1022 is CRITICAL: Connection refused by host
[15:44:38] PROBLEM - Lucene disk space on search1013 is CRITICAL: Connection refused by host
[15:44:38] PROBLEM - Lucene disk space on search1011 is CRITICAL: Connection refused by host
[15:44:39] PROBLEM - Lucene disk space on search1012 is CRITICAL: Connection refused by host
[15:44:39] PROBLEM - Lucene disk space on search1015 is CRITICAL: Connection refused by host
[15:44:40] PROBLEM - Lucene disk space on search1023 is CRITICAL: Connection refused by host
[15:44:40] PROBLEM - Lucene disk space on search1018 is CRITICAL: Connection refused by host
[15:44:41] PROBLEM - Lucene disk space on search1019 is CRITICAL: Connection refused by host
[15:44:41] PROBLEM - Lucene disk space on search1021 is CRITICAL: Connection refused by host
[15:44:42] PROBLEM - Lucene disk space on search1024 is CRITICAL: Connection refused by host
[15:45:04] PROBLEM - SSH on search1013 is CRITICAL: Connection refused
[15:45:31] PROBLEM - SSH on search1014 is CRITICAL: Connection refused
[15:45:31] PROBLEM - SSH on search1016 is CRITICAL: Connection refused
[15:45:40] PROBLEM - SSH on search1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:45:40] PROBLEM - SSH on search1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:45:40] PROBLEM - SSH on search1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:45:40] PROBLEM - SSH on search1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:45:40] PROBLEM - SSH on search1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:45:41] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 85.68 ms
[15:45:52] Any ops/roots about? Can someone run chgrp wikidev /home/wikipedia/common/php-1.20wmf11/cache/l10n/ on fenari for me please?
[15:45:58] sure
[15:46:06] just that exact thing?
[15:46:19] yup
[15:46:20] please
[15:46:49] done
[15:46:52] need -R ?
[15:47:12] Nope, it has no contents currently
[15:47:18] cool. should be gtg
[15:47:18] Lack of the group means I can't even populate it!
[15:47:19] Thanks
[15:47:24] np
[15:48:31] PROBLEM - Host search1012 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:31] PROBLEM - Host search1015 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:31] PROBLEM - Host search1016 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:31] PROBLEM - Host search1017 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:31] PROBLEM - Host search1024 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:32] PROBLEM - Host search1019 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:32] PROBLEM - Host search1014 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:33] PROBLEM - Host search1021 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:33] PROBLEM - Host search1023 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:34] PROBLEM - Host search1022 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:34] PROBLEM - Host search1020 is DOWN: PING CRITICAL - Packet loss = 100%
[15:48:37] yeah yeah
[15:48:40] RECOVERY - SSH on search1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:40] RECOVERY - SSH on search1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:40] RECOVERY - SSH on search1011 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:40] RECOVERY - SSH on search1023 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:40] RECOVERY - SSH on search1019 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:49] RECOVERY - SSH on search1020 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:49] RECOVERY - SSH on search1015 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:49] RECOVERY - SSH on search1017 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:49] RECOVERY - Host search1012 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[15:48:49] RECOVERY - Host search1014 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[15:48:50] RECOVERY - Host search1023 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms
[15:48:50] RECOVERY - Host search1019 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[15:48:58] RECOVERY - SSH on search1021 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:58] RECOVERY - SSH on search1018 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:48:58] RECOVERY - Host search1020 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:48:58] RECOVERY - Host search1015 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms
[15:48:58] RECOVERY - Host search1017 is UP: PING OK - Packet loss = 0%, RTA = 26.48 ms
[15:49:07] RECOVERY - Host search1021 is UP: PING OK - Packet loss = 0%, RTA = 26.48 ms
[15:49:43] RECOVERY - SSH on search1013 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:50:20] RECOVERY - SSH on search1016 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:50:20] RECOVERY - SSH on search1022 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:50:20] RECOVERY - SSH on search1024 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:50:20] PROBLEM - NTP on search1005 is CRITICAL: NTP CRITICAL: No response from NTP server
[15:50:28] RECOVERY - Host search1016 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms
[15:50:28] RECOVERY - Host search1022 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms
[15:50:28] RECOVERY - Host search1024 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[15:51:23] New patchset: Ottomata; "Precise has libcairo2, not libcairo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22478
[15:52:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22478
[15:52:51] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22478
[15:53:00] ottomata: merged
[15:53:08] (assuming that's what you wanted...)
[15:53:19] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[15:53:38] thanks!
[15:53:40] np
[15:55:25] RECOVERY - Lucene disk space on search1001 is OK: DISK OK
[15:56:47] New patchset: Ottomata; "giovanni is now on stat1 - RT 3460" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22482
[15:56:47] PROBLEM - Lucene on search1011 is CRITICAL: Connection refused
[15:56:47] PROBLEM - Lucene on search1018 is CRITICAL: Connection refused
[15:56:55] PROBLEM - Lucene on search1015 is CRITICAL: Connection refused
[15:56:55] PROBLEM - Lucene on search1012 is CRITICAL: Connection refused
[15:56:55] PROBLEM - Lucene on search1014 is CRITICAL: Connection refused
[15:57:04] PROBLEM - Lucene on search1013 is CRITICAL: Connection refused
[15:57:04] PROBLEM - Lucene on search1017 is CRITICAL: Connection refused
[15:57:18] notpeter, could approve that too pleasuh?
[15:57:25] //gerrit.wikimedia.org/r/22482
[15:57:30] http://gerrit.wikimedia.org/r/22482
[15:57:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22482
[15:58:07] PROBLEM - Lucene on search1021 is CRITICAL: Connection refused
[15:58:07] PROBLEM - Lucene on search1023 is CRITICAL: Connection refused
[15:58:07] PROBLEM - Lucene on search1020 is CRITICAL: Connection refused
[15:58:07] PROBLEM - Lucene on search1019 is CRITICAL: Connection refused
[15:58:25] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused
[15:59:37] PROBLEM - Lucene on search1022 is CRITICAL: Connection refused
[15:59:37] PROBLEM - Lucene on search1024 is CRITICAL: Connection refused
[15:59:38] yep!
[15:59:45] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22482
[15:59:50] danke!
[16:01:52] RECOVERY - Lucene disk space on search1011 is OK: DISK OK
[16:06:04] PROBLEM - NTP on search1008 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:07] PROBLEM - NTP on search1003 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:07] PROBLEM - NTP on search1006 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:16] PROBLEM - NTP on search1002 is CRITICAL: NTP CRITICAL: Offset unknown
[16:07:16] PROBLEM - NTP on search1001 is CRITICAL: NTP CRITICAL: Offset unknown
[16:07:16] PROBLEM - NTP on search1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:16] PROBLEM - NTP on search1004 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:25] PROBLEM - NTP on search1009 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:34] PROBLEM - NTP on search1010 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:07:52] RECOVERY - Lucene disk space on search1012 is OK: DISK OK
[16:07:52] RECOVERY - Lucene disk space on search1002 is OK: DISK OK
[16:10:25] PROBLEM - NTP on search1018 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:11:28] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time
[16:11:37] PROBLEM - NTP on search1012 is CRITICAL: NTP CRITICAL: Offset unknown
[16:11:46] PROBLEM - NTP on search1021 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:11:55] RECOVERY - NTP on search1001 is OK: NTP OK: Offset -0.01949703693 secs
[16:11:55] PROBLEM - NTP on search1013 is CRITICAL: NTP CRITICAL: Offset unknown
[16:11:55] PROBLEM - NTP on search1014 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:11:55] PROBLEM - NTP on search1020 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:12:04] PROBLEM - NTP on search1015 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:12:04] PROBLEM - NTP on search1023 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:12:04] PROBLEM - NTP on search1017 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:12:13] PROBLEM - NTP on search1011 is CRITICAL: NTP CRITICAL: Offset unknown
[16:12:13] PROBLEM - NTP on search1019 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:12:22] RECOVERY - Lucene disk space on search1003 is OK: DISK OK
[16:13:25] PROBLEM - NTP on search1016 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:13:25] PROBLEM - NTP on search1024 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:13:34] PROBLEM - NTP on search1022 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:14:28] RECOVERY - Lucene disk space on search1013 is OK: DISK OK
[16:16:43] RECOVERY - Host ms-be7 is UP: PING WARNING - Packet loss = 93%, RTA = 0.22 ms
[16:17:01] RECOVERY - Lucene disk space on search1004 is OK: DISK OK
[16:18:22] RECOVERY - NTP on search1011 is OK: NTP OK: Offset -0.006490826607 secs
[16:18:49] RECOVERY - Lucene disk space on search1014 is OK: DISK OK
[16:22:52] RECOVERY - NTP on search1002 is OK: NTP OK: Offset -0.0130828619 secs
[16:23:10] RECOVERY - Lucene disk space on search1005 is OK: DISK OK
[16:23:28] RECOVERY - Lucene disk space on search1015 is OK: DISK OK
[16:23:55] RECOVERY - NTP on search1012 is OK: NTP OK: Offset -0.01048111916 secs
[16:24:22] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[16:24:40] RECOVERY - NTP on search1015 is OK: NTP OK: Offset 0.04254758358 secs
[16:28:07] RECOVERY - Lucene disk space on search1006 is OK: DISK OK
[16:28:43] RECOVERY - NTP on search1003 is OK: NTP OK: Offset -0.003215432167 secs
[16:28:52] RECOVERY - NTP on search1013 is OK: NTP OK: Offset -0.01371765137 secs
[16:29:28] RECOVERY - Lucene disk space on search1016 is OK: DISK OK
[16:32:55] RECOVERY - Lucene disk space on search1007 is OK: DISK OK
[16:33:49] RECOVERY - NTP on search1004 is OK: NTP OK: Offset -0.009851694107 secs
[16:34:07] RECOVERY - NTP on search1007 is OK: NTP OK: Offset 0.08741259575 secs
[16:34:07] RECOVERY - NTP on search1017 is OK: NTP OK: Offset 0.0404509306 secs
[16:34:34] RECOVERY - Lucene disk space on search1017 is OK: DISK OK
[16:34:43] New patchset: Ottomata; "Installing required python packages for community-analytics (django) site." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22484
[16:35:01] RECOVERY - NTP on search1014 is OK: NTP OK: Offset -0.006680250168 secs
[16:35:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22484
[16:39:04] RECOVERY - NTP on search1005 is OK: NTP OK: Offset 0.004180908203 secs
[16:39:13] RECOVERY - Lucene disk space on search1008 is OK: DISK OK
[16:39:13] RECOVERY - Lucene disk space on search1018 is OK: DISK OK
[16:43:07] RECOVERY - NTP on search1006 is OK: NTP OK: Offset -0.01088643074 secs
[16:43:25] RECOVERY - NTP on search1009 is OK: NTP OK: Offset 0.1265801191 secs
[16:43:34] RECOVERY - Lucene disk space on search1009 is OK: DISK OK
[16:45:26] RECOVERY - Lucene disk space on search1019 is OK: DISK OK
[16:45:43] RECOVERY - NTP on search1016 is OK: NTP OK: Offset -0.0009349584579 secs
[16:48:52] RECOVERY - NTP on search1010 is OK: NTP OK: Offset 0.009970188141 secs
[16:49:55] RECOVERY - Lucene disk space on search1020 is OK: DISK OK
[16:50:22] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[16:55:01] RECOVERY - NTP on search1021 is OK: NTP OK: Offset 0.02542984486 secs
[16:55:10] RECOVERY - NTP on search1018 is OK: NTP OK: Offset -0.001874923706 secs
[16:55:10] RECOVERY - NTP on search1008 is OK: NTP OK: Offset -0.008196234703 secs
[16:56:04] RECOVERY - Lucene disk space on search1021 is OK: DISK OK
[16:59:49] RECOVERY - NTP on search1019 is OK: NTP OK: Offset -0.009433627129 secs
[17:00:43] RECOVERY - Lucene disk space on search1022 is OK: DISK OK
[17:03:43] RECOVERY - Lucene disk space on search1023 is OK: DISK OK
[17:05:58] RECOVERY - NTP on search1020 is OK: NTP OK: Offset -0.008770346642 secs
[17:08:22] RECOVERY - Lucene disk space on search1024 is OK: DISK OK
[17:09:16] RECOVERY - NTP on search1024 is OK: NTP OK: Offset -0.07337450981 secs
[17:11:49] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time
[17:16:00] !log swift: distributing new ring; disable ms-be7, ms-be8, ms-be11 sdi, increase ms-be9/11/12 weight from 31 to 66
[17:16:09] Logged the message, Master
[17:16:37] RECOVERY - NTP on search1022 is OK: NTP OK: Offset -0.007305502892 secs
[17:17:04] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.027 second response time on port 8123
[17:19:19] RECOVERY - Lucene disk space on search1010 is OK: DISK OK
[17:20:04] RECOVERY - Lucene on search1003 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:04] RECOVERY - Lucene on search1004 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:04] RECOVERY - Lucene on search1009 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:04] RECOVERY - Lucene on search1012 is OK: TCP OK - 0.028 second response time on port 8123
[17:20:13] RECOVERY - Lucene on search1010 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:13] RECOVERY - Lucene on search1008 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:13] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:22] RECOVERY - Lucene on search1005 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:22] RECOVERY - Lucene on search1011 is OK: TCP OK - 0.028 second response time on port 8123
[17:20:22] RECOVERY - Lucene on search1007 is OK: TCP OK - 0.027 second response time on port 8123
[17:20:22] RECOVERY - Lucene on search1013 is OK: TCP OK - 0.037 second response time on port 8123
[17:20:31] RECOVERY - Lucene on search1006 is OK: TCP OK - 0.027 second response time on port 8123
[17:21:34] RECOVERY - NTP on search1023 is OK: NTP OK: Offset 0.002897977829 secs
[17:21:43] RECOVERY - Lucene on search1022 is OK: TCP OK - 0.027 second response time on port 8123
[17:21:43] RECOVERY - Lucene on search1014 is OK: TCP OK - 0.027 second response time on port 8123
[17:21:43] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.030 second response time on port 8123
[17:21:43] RECOVERY - Lucene on search1017 is OK: TCP OK - 0.027 second response time on port 8123
[17:21:43] RECOVERY - Lucene on search1020 is OK: TCP OK - 0.027 second response time on port 8123
[17:21:52] RECOVERY - 
Lucene on search1021 is OK: TCP OK - 0.027 second response time on port 8123 [17:21:52] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123 [17:21:52] RECOVERY - Lucene on search1024 is OK: TCP OK - 0.027 second response time on port 8123 [17:21:52] RECOVERY - Lucene on search1018 is OK: TCP OK - 0.027 second response time on port 8123 [17:22:01] RECOVERY - Lucene on search1019 is OK: TCP OK - 0.027 second response time on port 8123 [17:22:10] RECOVERY - Lucene on search1023 is OK: TCP OK - 0.027 second response time on port 8123 [17:22:10] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 3.48 ms [17:32:46] !log killing power via ipmi on ms-be7; broken hardware, stale data, should not be on rotation [17:32:55] Logged the message, Master [17:35:31] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100% [17:39:34] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [17:40:04] notpeter: is mw8 one of your test boxes? [17:40:33] ah, no broken dimm [17:40:36] paravoid: nope! all test boxes are out of rotation [17:40:48] and was 194 and 281, for the record [17:41:04] yeah I remember those two [17:41:15] are you working on labor day? [17:41:17] bad notpeter [17:42:02] I'm terrible at not labouring... [17:42:08] but now I'm off! have a good day! 
[17:42:27] !log powering down mw8; broken DIMM (#3499), has been flapping for days
[17:42:36] Logged the message, Master
[17:45:16] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100%
[17:56:04] PROBLEM - Puppet freshness on snapshot1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:22:23] PROBLEM - Host virt1002 is DOWN: PING CRITICAL - Packet loss = 100%
[18:24:11] PROBLEM - SSH on virt1001 is CRITICAL: Connection refused
[18:26:35] RECOVERY - Host virt1002 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[18:28:05] PROBLEM - Host virt1001 is DOWN: PING CRITICAL - Packet loss = 100%
[18:28:29] Can someone run this as root on fenari for me please? rm -rf /home/wikipedia/common/php-1.20wmf8
[18:30:13] Reedy: done
[18:30:17] thanks
[18:30:25] More umask issues
[18:30:26] * Reedy grumbles
[18:31:59] RECOVERY - SSH on virt1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[18:32:05] I love that Reedy doesn't have root
[18:32:08] RECOVERY - Host virt1001 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms
[18:32:10] I don't trust him anyway
[18:32:59] domas: I do have this overwhelming urge to use rm -rf / everywhere...
[18:33:36] ye I know
[18:34:59] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[18:34:59] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[18:34:59] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[18:34:59] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[18:34:59] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:00] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:00] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:01] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[18:35:01] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[19:03:30] New patchset: Ottomata; "Excluding /a/wikistats/{tmp,backup} from amanda backup of stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22495
[19:04:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22495
[19:39:02] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[20:37:23] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (33979)
[20:37:50] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (33741)
[20:53:50] New patchset: Hashar; "(bug 38299) alias 'cmr10' font to 'Computer Modern'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22533
[20:54:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/22533
[20:55:43] New review: Hashar; "$ fc-match cmr10" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/22533
[21:02:08] New review: Hashar; "Pending user input on bug 38299 to confirm this fix the issue." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/22533
[21:02:28] I am happy :)
[21:02:31] see you tomorrow!
[21:19:52] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[21:20:03] New patchset: Hoo man; "Clean up: Removed sep11wiki code/ old test files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/22534
[21:34:43] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[22:04:43] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: Connection refused by host
[22:04:43] PROBLEM - Varnish HTTP upload-frontend on cp1024 is CRITICAL: Connection refused
[22:05:01] PROBLEM - NTP on cp1024 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:05:19] PROBLEM - Varnish HTTP upload-backend on cp1024 is CRITICAL: Connection refused
[22:05:19] PROBLEM - Varnish HTCP daemon on cp1024 is CRITICAL: Connection refused by host
[22:05:46] PROBLEM - Varnish HTTP upload-backend on cp1028 is CRITICAL: Connection refused
[22:05:46] PROBLEM - Varnish HTTP upload-frontend on cp1026 is CRITICAL: Connection refused
[22:05:46] PROBLEM - Varnish HTTP upload-frontend on cp1028 is CRITICAL: Connection refused
[22:05:46] PROBLEM - Varnish HTCP daemon on cp1025 is CRITICAL: Connection refused by host
[22:05:55] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: Connection refused by host
[22:06:04] PROBLEM - Varnish HTCP daemon on cp1027 is CRITICAL: Connection refused by host
[22:06:13] PROBLEM - Varnish HTTP upload-backend on cp1027 is CRITICAL: Connection refused
[22:06:22] PROBLEM - Varnish HTTP upload-backend on cp1025 is CRITICAL: Connection refused
[22:06:22] PROBLEM - NTP on cp1025 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:06:31] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: Connection refused by host
[22:06:31] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: Connection refused by host
[22:06:31] PROBLEM - NTP on cp1028 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:06:31] PROBLEM - NTP on cp1026 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:06:40] PROBLEM - Varnish HTCP daemon on cp1028 is CRITICAL: Connection refused by host
[22:06:40] PROBLEM - Varnish HTTP upload-frontend on cp1027 is CRITICAL: Connection refused
[22:06:40] PROBLEM - Varnish HTTP upload-frontend on cp1025 is CRITICAL: Connection refused
[22:06:40] PROBLEM - NTP on cp1027 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:06:49] PROBLEM - Varnish HTCP daemon on cp1026 is CRITICAL: Connection refused by host
[22:06:59] PROBLEM - Varnish traffic logger on cp1027 is CRITICAL: Connection refused by host
[22:07:07] PROBLEM - Varnish HTTP upload-backend on cp1026 is CRITICAL: Connection refused
[22:35:25] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[22:35:25] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[22:35:25] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[22:41:25] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[23:01:22] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[23:05:27] those alerts don't look good but idk if they're really in use (e.g. in pybal)
[23:06:14] but otoh, it's been an hour and no one complained
[23:14:31] also watchmouse is all green
[23:22:31] jeremyb: not in use afaik
[23:22:48] paravoid: k
[23:46:11] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (38478)
[23:46:47] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (37630)
[23:53:14] PROBLEM - Puppet freshness on cp1025 is CRITICAL: Puppet has not run in the last 10 hours
[23:53:55] It's not all paravoid's fault if they are :D