[00:02:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[00:06:04] PROBLEM - MySQL Replication Heartbeat on db26 is CRITICAL: CRIT replication delay 181 seconds
[00:06:13] PROBLEM - MySQL Slave Delay on db26 is CRITICAL: CRIT replication delay 187 seconds
[00:06:58] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:13:52] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time
[00:21:13] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:21:58] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 181 seconds
[00:23:01] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.428 second response time
[00:23:11] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 212 seconds
[00:27:04] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:29:55] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:35:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:38:01] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[00:39:04] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:44:10] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time
[00:44:38] !log tstarling synchronized php-1.21wmf6/extensions/ProofreadPage/ProofreadPage.body.php
[00:44:47] Logged the message, Master
[00:46:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.582 seconds
[00:54:40] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[00:55:25] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[01:00:04] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:01:43] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.328 second response time
[01:11:19] RECOVERY - MySQL Slave Delay on db26 is OK: OK replication delay 0 seconds
[01:12:04] RECOVERY - MySQL Replication Heartbeat on db26 is OK: OK replication delay 0 seconds
[01:13:34] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 263 seconds
[01:14:55] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[01:15:13] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[01:21:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:35:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[02:00:58] PROBLEM - Puppet freshness on cp1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:58] PROBLEM - Puppet freshness on es9 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:59] PROBLEM - Puppet freshness on es3 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:59] PROBLEM - Puppet freshness on db48 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:59] PROBLEM - Puppet freshness on es8 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:59] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:59] PROBLEM - Puppet freshness on mw1152 is CRITICAL: Puppet has not run in the last 10 hours
[02:01:00] PROBLEM - Puppet freshness on mw1149 is CRITICAL: Puppet has not run in the last 10 hours
[02:01:00] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours
[02:01:01] PROBLEM - Puppet freshness on srv222 is CRITICAL: Puppet has not run in the last 10 hours
[02:01:01] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours
[02:01:02] PROBLEM - Puppet freshness on srv228 is CRITICAL: Puppet has not run in the last 10 hours
[02:02:01] PROBLEM - Puppet freshness on srv285 is CRITICAL: Puppet has not run in the last 10 hours
[02:02:03] PROBLEM - Puppet freshness on mc2 is CRITICAL: Puppet has not run in the last 10 hours
[02:02:03] PROBLEM - Puppet freshness on srv241 is CRITICAL: Puppet has not run in the last 10 hours
[02:05:01] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:07:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:08:10] PROBLEM - SSH on lvs6 is CRITICAL: Server answer:
[02:09:49] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[02:10:57] * jeremyb tries to decide if the problems he saw were related to that lvs6 blip
[02:18:49] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[02:20:28] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.011 second response time on port 11000
[02:21:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.484 seconds
[02:29:30] !log LocalisationUpdate completed (1.21wmf6) at Mon Dec 24 02:29:30 UTC 2012
[02:29:40] Logged the message, Master
[02:30:58] PROBLEM - Puppet freshness on celsus is CRITICAL: Puppet has not run in the last 10 hours
[02:30:58] PROBLEM - Puppet freshness on cp1029 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:58] PROBLEM - Puppet freshness on mw1030 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:59] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[02:30:59] PROBLEM - Puppet freshness on srv210 is CRITICAL: Puppet has not run in the last 10 hours
[02:30:59] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours
[02:30:59] PROBLEM - Puppet freshness on snapshot1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:32:01] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:41:37] RECOVERY - Puppet freshness on neon is OK: puppet ran at Mon Dec 24 02:41:08 UTC 2012
[02:41:37] RECOVERY - Puppet freshness on ms-be6 is OK: puppet ran at Mon Dec 24 02:41:09 UTC 2012
[02:41:38] RECOVERY - Puppet freshness on es9 is OK: puppet ran at Mon Dec 24 02:41:28 UTC 2012
[02:44:10] RECOVERY - Puppet freshness on db48 is OK: puppet ran at Mon Dec 24 02:43:42 UTC 2012
[02:44:10] RECOVERY - Puppet freshness on srv241 is OK: puppet ran at Mon Dec 24 02:43:45 UTC 2012
[02:44:37] RECOVERY - Puppet freshness on es8 is OK: puppet ran at Mon Dec 24 02:44:13 UTC 2012
[02:46:07] RECOVERY - Puppet freshness on sq85 is OK: puppet ran at Mon Dec 24 02:45:43 UTC 2012
[02:48:13] RECOVERY - Puppet freshness on es3 is OK: puppet ran at Mon Dec 24 02:47:57 UTC 2012
[02:49:34] RECOVERY - Puppet freshness on mw1149 is OK: puppet ran at Mon Dec 24 02:49:12 UTC 2012
[02:51:13] RECOVERY - Puppet freshness on srv228 is OK: puppet ran at Mon Dec 24 02:50:48 UTC 2012
[02:51:40] RECOVERY - Puppet freshness on tin is OK: puppet ran at Mon Dec 24 02:51:28 UTC 2012
[02:57:13] RECOVERY - Puppet freshness on snapshot1001 is OK: puppet ran at Mon Dec 24 02:56:52 UTC 2012
[02:57:40] RECOVERY - Puppet freshness on srv210 is OK: puppet ran at Mon Dec 24 02:57:22 UTC 2012
[02:58:43] RECOVERY - Puppet freshness on cp1001 is OK: puppet ran at Mon Dec 24 02:58:10 UTC 2012
[03:01:07] RECOVERY - Puppet freshness on mw1115 is OK: puppet ran at Mon Dec 24 03:00:54 UTC 2012
[03:03:04] RECOVERY - Puppet freshness on mc2 is OK: puppet ran at Mon Dec 24 03:02:49 UTC 2012
[03:04:43] RECOVERY - Puppet freshness on mw1152 is OK: puppet ran at Mon Dec 24 03:04:31 UTC 2012
[03:05:10] RECOVERY - Puppet freshness on mw1030 is OK: puppet ran at Mon Dec 24 03:04:58 UTC 2012
[03:06:31] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Dec 24 03:06:16 UTC 2012
[03:06:40] RECOVERY - Puppet freshness on srv285 is OK: puppet ran at Mon Dec 24 03:06:30 UTC 2012
[03:06:58] RECOVERY - Puppet freshness on search1002 is OK: puppet ran at Mon Dec 24 03:06:42 UTC 2012
[03:18:13] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Mon Dec 24 03:18:03 UTC 2012
[03:24:13] RECOVERY - Puppet freshness on celsus is OK: puppet ran at Mon Dec 24 03:23:40 UTC 2012
[04:14:38] PROBLEM - MySQL Slave Delay on db26 is CRITICAL: CRIT replication delay 187 seconds
[04:16:08] PROBLEM - MySQL Replication Heartbeat on db26 is CRITICAL: CRIT replication delay 241 seconds
[04:16:26] RECOVERY - MySQL Slave Delay on db26 is OK: OK replication delay 0 seconds
[04:17:47] RECOVERY - MySQL Replication Heartbeat on db26 is OK: OK replication delay 0 seconds
[04:36:14] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:45:14] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[04:45:15] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[04:45:15] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[04:45:15] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[05:33:14] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Puppet has not run in the last 10 hours
[06:06:14] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[06:06:14] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[06:10:35] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time
[06:31:51] PROBLEM - MySQL Replication Heartbeat on db26 is CRITICAL: CRIT replication delay 183 seconds
[06:32:18] PROBLEM - MySQL Slave Delay on db26 is CRITICAL: CRIT replication delay 185 seconds
[06:33:39] RECOVERY - MySQL Replication Heartbeat on db26 is OK: OK replication delay 0 seconds
[06:34:06] RECOVERY - MySQL Slave Delay on db26 is OK: OK replication delay 0 seconds
[07:26:45] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:45] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:46] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:46] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:46] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:26:46] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[07:39:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:44:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[08:15:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:29:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.077 seconds
[08:49:58] New patchset: MaxSem; "Cronjob to delete old Solr logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40304
[09:04:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:16:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.077 seconds
[09:49:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:02:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[10:28:11] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[10:31:11] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[10:32:21] afk for a bit (errands)
[10:35:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:39:17] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[10:47:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.802 seconds
[11:16:11] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:23:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:33:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.827 seconds
[12:02:14] PROBLEM - Puppet freshness on srv222 is CRITICAL: Puppet has not run in the last 10 hours
[12:08:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:20:23] RECOVERY - Puppet freshness on srv222 is OK: puppet ran at Mon Dec 24 12:19:59 UTC 2012
[12:21:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.038 seconds
[12:54:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:06:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds
[13:22:34] New patchset: Reedy; "Update php symlink" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40309
[13:22:48] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/40309
[13:25:10] Not sure who is about...
[13:25:13] Can someone please do rm -rf /home/wikipedia/common/php-1.21wmf3
[13:25:14] Thanks
[13:25:20] sec
[13:26:05] seems to be owned by you
[13:26:21] reedy:wikidev
[13:26:23] rm: cannot remove `php-1.21wmf3/.git/objects/8d/68acec4a6c8ebff7bc3170ae2d93c5a4c58f38': Permission denied
[13:26:23] rm: cannot remove `php-1.21wmf3/extensions/ZeroRatedMobileAccess/.git/objects/d2/17759b70d97d0ef0b223b156d6e39d0b95678a': Permission denied
[13:26:41] ah
[13:27:05] mobile (as usual ;))
[13:27:22] done
[13:27:30] thanks
[13:27:57] yw
[13:28:34] !log reedy synchronized live-1.5/
[13:30:19] deployment?
[13:31:02] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:31:32] Just cleaning up old files from multiple versions ago
[13:34:41] PROBLEM - Varnish HTCP daemon on cp1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:35:08] PROBLEM - Varnish HTTP mobile-backend on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:35] PROBLEM - Varnish HTTP mobile-frontend on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:57] New patchset: Reedy; "Log sync-docroot calls" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40311
[13:36:02] PROBLEM - SSH on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:37:41] RECOVERY - SSH on cp1043 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[13:39:02] RECOVERY - Varnish HTTP mobile-frontend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.053 seconds
[13:39:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:39:56] RECOVERY - Varnish HTCP daemon on cp1043 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker
[13:40:23] RECOVERY - Varnish HTTP mobile-backend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 698 bytes in 0.053 seconds
[13:49:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.805 seconds
[14:24:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:35:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.495 seconds
[14:37:23] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[14:44:38] PROBLEM - Varnish HTCP daemon on cp1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:45:05] PROBLEM - Varnish HTTP mobile-backend on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:45:32] PROBLEM - Varnish HTTP mobile-frontend on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:45:41] PROBLEM - SSH on cp1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:45:50] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[14:45:50] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[14:45:51] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[14:45:51] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[14:46:17] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:49:25] !log rebooting cp1043 which had fallen over
[14:52:08] RECOVERY - Varnish HTTP mobile-backend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 696 bytes in 0.054 seconds
[14:52:44] RECOVERY - SSH on cp1043 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[14:53:38] RECOVERY - Varnish HTCP daemon on cp1043 is OK: PROCS OK: 1 process with UID = 1001 (varnishhtcpd), args varnishhtcpd worker
[14:54:32] RECOVERY - Varnish HTTP mobile-frontend on cp1043 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.053 seconds
[15:10:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:21:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.643 seconds
[15:34:08] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: Puppet has not run in the last 10 hours
[15:55:26] afk again but for several hours this time, things seem pretty quiet though, just the way we like it
[15:56:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:07:08] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[16:07:08] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[16:08:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.958 seconds
[16:42:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:56:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[17:19:35] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 187 seconds
[17:19:44] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 192 seconds
[17:27:59] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:00] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:00] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:00] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:00] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:28:00] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[17:30:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:36:14] PROBLEM - MySQL Replication Heartbeat on db26 is CRITICAL: CRIT replication delay 182 seconds
[17:38:29] PROBLEM - MySQL Slave Delay on db26 is CRITICAL: CRIT replication delay 226 seconds
[17:38:56] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 9 seconds
[17:39:05] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[17:39:23] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 181 seconds
[17:41:11] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[17:42:05] RECOVERY - MySQL Slave Delay on db26 is OK: OK replication delay 0 seconds
[17:42:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.049 seconds
[17:43:08] RECOVERY - MySQL Replication Heartbeat on db26 is OK: OK replication delay 0 seconds
[17:49:26] PROBLEM - Varnish traffic logger on cp1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:15:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:26:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.737 seconds
[18:31:51] New review: Nikerabbit; "Isn't this usually done via logrotate?" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/40304
[18:33:29] New review: MaxSem; "Log4j uses its own rotation, however it doesn't delete the old files, our Lucene cluster does the sa..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/40304
[19:01:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:12:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.535 seconds
[19:47:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:02:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds
[20:26:07] PROBLEM - Host cp1044 is DOWN: PING CRITICAL - Packet loss = 100%
[20:29:34] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[20:31:59] RECOVERY - Host cp1044 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms
[20:32:34] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[20:34:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:35:52] PROBLEM - SSH on cp1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:35:52] PROBLEM - Varnish traffic logger on cp1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:36:28] PROBLEM - Varnish HTCP daemon on cp1044 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:36:47] PROBLEM - Varnish HTTP mobile-backend on cp1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:37:13] PROBLEM - Varnish HTTP mobile-frontend on cp1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:39:08] New patchset: Cmjohnson; "Adding Tampa new frack bast host "pappas"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40333
[20:40:04] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/40333
[20:40:31] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[20:48:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.167 seconds
[20:57:01] PROBLEM - NTP on cp1044 is CRITICAL: NTP CRITICAL: No response from NTP server
[21:17:34] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[21:20:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:34:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.045 seconds
[22:06:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:18:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.111 seconds
[22:53:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:07:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds
[23:40:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:51:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.647 seconds