[00:08:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:21:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.213 seconds
[00:27:55] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[00:42:01] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:45:09] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.002 second response time on port 11000
[00:57:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:03:27] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:05:06] RECOVERY - Memcached on virt0 is OK: TCP OK - 8.993 second response time on port 11000
[01:10:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds
[01:12:54] PROBLEM - Apache HTTP on srv220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:15:54] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[01:18:27] PROBLEM - SSH on srv220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:18:51] grrr died again
[01:19:57] RECOVERY - SSH on srv220 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:20:27] swap death?
[01:21:54] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[01:22:02] yes
[01:22:03] but there's a larger issue
[01:22:07] I'm sending a mail to ops@ now
[01:30:45] RECOVERY - Apache HTTP on srv220 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time
[01:33:24] !log powercycling srv220/srv224, swapdeath
[01:33:39] Logged the message, Master
[01:34:03] !log reedy synchronized wmf-config/CommonSettings.php 'wgMaxImageArea to 20MP'
[01:34:16] Logged the message, Master
[01:34:44] it's a PDF
[01:34:47] same PDF
[01:35:06] PROBLEM - Host srv219 is DOWN: PING CRITICAL - Packet loss = 100%
[01:35:53] !log powercycling srv219, swap-died while trying to get a core dump
[01:36:06] Logged the message, Master
[01:38:54] RECOVERY - Host srv219 is UP: PING OK - Packet loss = 0%, RTA = 1.14 ms
[01:40:21] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 188 seconds
[01:42:42] Failed to boot both default and fallback entries.
[01:42:44] yay
[01:43:39] PROBLEM - Apache HTTP on srv219 is CRITICAL: Connection refused
[01:44:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:52:48] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[01:55:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.449 seconds
[01:59:53] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 269 seconds
[02:01:39] RECOVERY - Apache HTTP on srv219 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.129 second response time
[02:10:21] RECOVERY - Host srv224 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[02:12:45] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[02:14:24] PROBLEM - SSH on srv224 is CRITICAL: Connection refused
[02:14:33] PROBLEM - Apache HTTP on srv224 is CRITICAL: Connection refused
[02:28:54] !log LocalisationUpdate completed (1.21wmf2) at Sun Oct 21 02:28:50 UTC 2012
[02:29:07] Logged the message, Master
[02:31:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:35:33] PROBLEM - Host srv224 is DOWN: PING CRITICAL - Packet loss = 100%
[02:40:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[02:41:24] RECOVERY - Host srv224 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[02:42:49] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[02:49:24] PROBLEM - Host srv224 is DOWN: PING CRITICAL - Packet loss = 100%
[02:49:42] !log LocalisationUpdate completed (1.21wmf1) at Sun Oct 21 02:49:42 UTC 2012
[02:49:56] Logged the message, Master
[03:19:48] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 30 seconds
[03:24:54] RECOVERY - Host srv224 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[03:44:15] PROBLEM - Host srv224 is DOWN: PING CRITICAL - Packet loss = 100%
[04:00:54] RECOVERY - Host srv224 is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms
[04:04:50] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[04:04:50] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[04:10:32] RECOVERY - SSH on srv224 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[04:36:27] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time
[04:38:33] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:41:51] RECOVERY - Memcached on virt0 is OK: TCP OK - 9.008 second response time on port 11000
[05:42:36] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:44:06] RECOVERY - Memcached on virt0 is OK: TCP OK - 2.993 second response time on port 11000
[05:57:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:03:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.017 seconds
[06:37:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:50:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.389 seconds
[06:55:11] New review: Dereckson; "Shellpolicy has been mitigated. There is now a local consensus, so the change can be safely merged a..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/23927
[06:55:21] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[07:26:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:37:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.072 seconds
[08:02:32] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:11:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:23:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.428 seconds
[08:28:29] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:28:29] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[08:57:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:03:35] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[09:10:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.018 seconds
[09:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:57:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds
[10:28:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[10:29:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:36:53] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:26] RECOVERY - Memcached on virt0 is OK: TCP OK - 8.993 second response time on port 11000
[10:42:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.438 seconds
[11:16:29] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[11:17:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:22:30] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[11:30:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.172 seconds
[11:53:32] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[12:04:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:13:29] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[12:15:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.176 seconds
[12:43:29] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[12:51:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:04:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.345 seconds
[13:38:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:48:20] fg
[13:48:20] dfg
[13:48:26] fdg
[13:48:26] dfg
[13:48:26] df
[13:48:26] g
[13:48:26] dfgdf
[13:48:26] g
[13:48:26] df
[13:48:26] g
[13:48:26] dfg
[13:48:26] dfg
[13:48:26] dfg
[13:48:26] dfdg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] dfg
[13:48:40] fdgdf
[13:48:40] g
[13:48:41] dfg
[13:48:41] dfg
[13:48:42] dfg
[13:48:42] dfg
[13:48:43] dfg
[13:48:43] df
[13:48:43] g
[13:48:44] dfg
[13:48:44] dfg
[13:49:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.475 seconds
[13:50:19] fd
[13:50:20] df
[13:50:20] df
[13:50:21] dfd
[13:50:21] f
[13:50:21] d
[13:50:21] f
[13:50:21] d
[13:50:21] f
[13:50:22] d
[13:50:23] f
[13:50:23] df
[13:50:23] df
[13:50:23] df
[14:05:32] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[14:05:32] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[14:13:55] !ops
[14:14:23] That host was banned from -dev and was trolling in -labs yesterday
[14:23:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:38:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.205 seconds
[15:10:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:25:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[15:40:38] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:42:09] RECOVERY - Memcached on virt0 is OK: TCP OK - 2.994 second response time on port 11000
[15:57:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:10:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.204 seconds
[16:46:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:56:32] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[16:58:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.248 seconds
[17:29:06] labsoconsole's having issues. see #wikimedia-labs starting around 17:15 UTC
[17:29:13] bbl
[17:32:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:35:33] PROBLEM - Puppet freshness on search1006 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on kaulen is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on search24 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on search1024 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on sq62 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on sq75 is CRITICAL: Puppet has not run in the last 10 hours
[17:35:33] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:35] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours
[17:36:35] PROBLEM - Puppet freshness on search1005 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:35] PROBLEM - Puppet freshness on search1008 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:35] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:35] PROBLEM - Puppet freshness on search1016 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:36] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:36] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:37] PROBLEM - Puppet freshness on sq51 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:37] PROBLEM - Puppet freshness on search32 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:38] PROBLEM - Puppet freshness on sq86 is CRITICAL: Puppet has not run in the last 10 hours
[17:36:38] PROBLEM - Puppet freshness on sq77 is CRITICAL: Puppet has not run in the last 10 hours
[17:37:29] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:37:29] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Puppet has not run in the last 10 hours
[17:37:29] PROBLEM - Puppet freshness on stat1001 is CRITICAL: Puppet has not run in the last 10 hours
[17:37:29] PROBLEM - Puppet freshness on mw7 is CRITICAL: Puppet has not run in the last 10 hours
[17:38:32] PROBLEM - Puppet freshness on argon is CRITICAL: Puppet has not run in the last 10 hours
[17:38:36] PROBLEM - Puppet freshness on nitrogen is CRITICAL: Puppet has not run in the last 10 hours
[17:42:36] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[17:45:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.979 seconds
[18:03:35] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:08:32] PROBLEM - Puppet freshness on yvon is CRITICAL: Puppet has not run in the last 10 hours
[18:19:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:29:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[18:29:32] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[18:34:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.038 seconds
[19:04:29] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[19:06:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:17:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.107 seconds
[19:52:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:06:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[20:29:32] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[20:39:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:54:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[21:17:32] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[21:23:33] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[21:26:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:39:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.015 seconds
[21:40:20] RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Sun Oct 21 21:40:00 UTC 2012
[21:54:39] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[22:13:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:25:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.816 seconds
[22:44:32] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[22:50:10] !log tstarling synchronized php-1.21wmf2/bin/ulimit4.sh
[22:50:24] Logged the message, Master
[23:00:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:13:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.283 seconds
[23:47:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds