[00:00:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24645
[00:00:58] RECOVERY - Apache HTTP on srv190 is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.006 seconds
[00:09:04] RECOVERY - NTP on srv190 is OK: NTP OK: Offset 0.0157648325 secs
[00:13:16] PROBLEM - Apache HTTP on srv190 is CRITICAL: Connection refused
[00:13:49] AaronSchulz: we should test it :)
[00:19:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:20:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24645
[00:27:04] PROBLEM - Puppet freshness on analytics1005 is CRITICAL: Puppet has not run in the last 10 hours
[00:33:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.804 seconds
[00:38:01] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[00:54:25] RECOVERY - Apache HTTP on srv190 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time
[01:08:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:22:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.045 seconds
[01:41:13] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 216 seconds
[01:42:43] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 262 seconds
[01:45:43] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 5 seconds
[01:45:52] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 7 seconds
[01:54:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:08:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.145 seconds
[02:59:10] RECOVERY - Puppet freshness on analytics1005 is OK: puppet ran at Sat Sep 22 02:59:06 UTC 2012
[03:18:49] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[03:18:49] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[03:18:49] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[03:18:49] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[03:18:49] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[03:18:49] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[05:10:06] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 212 seconds
[05:10:42] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 205 seconds
[05:16:33] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[05:16:51] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[05:18:08] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds
[05:30:08] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 216 seconds
[05:30:35] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 216 seconds
[05:34:47] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 4 seconds
[05:35:05] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[06:06:08] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[06:12:38] so how many jobrunners do we have now compared to before https://gerrit.wikimedia.org/r/24645 ?
[06:24:44] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[06:32:23] RECOVERY - Lucene on search1015 is OK: TCP OK - 9.032 second response time on port 8123
[06:43:50] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[06:52:05] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:53:17] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[07:26:08] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 187 seconds
[07:26:08] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 186 seconds
[07:27:47] RECOVERY - Lucene on search1015 is OK: TCP OK - 9.017 second response time on port 8123
[07:27:56] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[07:29:08] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds
[07:29:08] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[07:38:57] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[07:40:09] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.027 second response time on port 8123
[07:48:34] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[07:50:04] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[08:17:12] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[08:24:33] RECOVERY - Lucene on search1015 is OK: TCP OK - 3.023 second response time on port 8123
[08:33:47] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[08:35:17] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[08:36:38] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[08:41:35] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[08:47:44] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[08:47:44] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.027 second response time on port 8123
[08:49:59] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[08:49:59] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[09:05:26] PROBLEM - Lucene on search1015 is CRITICAL: Connection timed out
[09:24:29] RECOVERY - Lucene on search1015 is OK: TCP OK - 3.018 second response time on port 8123
[09:26:13] !log Restarted lucene on search1015
[09:26:24] Logged the message, Master
[09:43:24] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[10:38:49] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[10:41:29] New patchset: Dereckson; "(bug 40436) Namespaces configuration for se.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24656
[10:45:16] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 199 seconds
[10:46:01] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 220 seconds
[11:11:58] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 7 seconds
[11:12:07] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 16 seconds
[11:18:07] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 243 seconds
[11:18:07] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 243 seconds
[12:07:20] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[12:10:02] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[13:02:36] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 4 seconds
[13:03:12] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 2 seconds
[13:12:03] PROBLEM - Host search32 is DOWN: PING CRITICAL - Packet loss = 100%
[13:15:12] RECOVERY - Host search32 is UP: PING OK - Packet loss = 0%, RTA = 2.42 ms
[13:19:51] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[13:19:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[13:19:51] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[13:19:51] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[13:19:51] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[13:19:51] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[15:18:03] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[16:07:22] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[16:12:28] PROBLEM - Host cp1043 is DOWN: PING CRITICAL - Packet loss = 100%
[17:28:59] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[18:19:18] can anyone tell me when the last updates for http://rt.wikimedia.org/Ticket/Display.html?id=452 and http://rt.wikimedia.org/Ticket/Display.html?id=456 were?
[18:39:36] ACKNOWLEDGEMENT - MySQL Slave Delay on es1001 is CRITICAL: CRIT replication delay 94330 seconds asher part of making read-only
[18:39:36] ACKNOWLEDGEMENT - MySQL Slave Delay on es1002 is CRITICAL: CRIT replication delay 82406 seconds asher part of making read-only
[18:40:06] ACKNOWLEDGEMENT - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 82379 seconds asher part of making read-only
[18:40:06] ACKNOWLEDGEMENT - MySQL Slave Delay on es2 is CRITICAL: CRIT replication delay 94392 seconds asher part of making read-only
[18:40:36] ACKNOWLEDGEMENT - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 94438 seconds asher part of making read-only
[18:46:24] PROBLEM - MySQL Replication Heartbeat on db1001 is CRITICAL: CRIT replication delay 269 seconds
[18:46:42] PROBLEM - MySQL Slave Delay on db1001 is CRITICAL: CRIT replication delay 287 seconds
[18:51:21] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[18:51:21] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[18:54:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:57:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.170 seconds
[19:32:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:38:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.187 seconds
[19:43:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:44:18] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[19:44:27] RECOVERY - MySQL Replication Heartbeat on db1001 is OK: OK replication delay 0 seconds
[19:44:36] RECOVERY - MySQL Slave Delay on db1001 is OK: OK replication delay 0 seconds
[19:44:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[20:12:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:30:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds
[20:39:36] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[20:45:59] New review: Dereckson; "Moving the discussion back to the bug report to discuss the change more broadly (and in a more natur..." [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/23985
[21:00:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:02:14] New patchset: Dereckson; "(bug 39569) Activating flood flag on it.wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24671
[21:12:29] New patchset: Dereckson; "(bug 38398) Namespace configuration for meta." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24672
[21:14:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds
[21:47:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:00:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.203 seconds
[22:15:03] New review: Dereckson; "We need bug reference." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/24561
[22:34:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:48:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.120 seconds
[23:20:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[23:20:51] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[23:20:51] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[23:20:51] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[23:20:51] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[23:20:51] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[23:23:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:35:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.319 seconds