[00:00:13] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db62 is CRITICAL: Connection refused by host
[00:00:14] <nagios-wm>	 PROBLEM - Full LVS Snapshot on db62 is CRITICAL: Connection refused by host
[00:00:27] <gerrit-wm>	 New patchset: RobH; "adding caesium to parsoid eqiad cluster role" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42358
[00:03:13] <nagios-wm>	 PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[00:03:14] <nagios-wm>	 PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[00:03:14] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[00:03:14] <nagios-wm>	 PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[00:03:14] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:03:14] <nagios-wm>	 PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[00:05:10] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db62 is OK: OK replication delay seconds
[00:05:28] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db62 is OK: OK replication delay seconds
[00:05:37] <nagios-wm>	 RECOVERY - Full LVS Snapshot on db62 is OK: OK no full LVM snapshot volumes
[00:06:04] <nagios-wm>	 RECOVERY - MySQL Idle Transactions on db62 is OK: OK longest blocking idle transaction sleeps for seconds
[00:06:13] <nagios-wm>	 RECOVERY - MySQL Slave Running on db62 is OK: OK replication
[00:06:40] <nagios-wm>	 RECOVERY - MySQL disk space on db62 is OK: DISK OK
[00:06:41] <nagios-wm>	 RECOVERY - MySQL Recent Restart on db62 is OK: OK seconds since restart
[00:08:38] <gerrit-wm>	 New patchset: Pyoungmeister; "testing: swapping db62 to use coredb::research class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42359
[00:10:52] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:57] <binasher>	 notpeter: db66 just died or something
[00:19:39] <binasher>	 s/died/stalled
[00:23:28] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.904 seconds
[00:24:48] <gerrit-wm>	 Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42359
[00:30:52] <gerrit-wm>	 New patchset: Pyoungmeister; "coredb: innodb_log_file_size need to be set to match shard" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42360
[00:31:57] <gerrit-wm>	 Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42360
[00:35:01] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 208 seconds
[00:35:20] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 217 seconds
[00:41:34] <gerrit-wm>	 New patchset: Pyoungmeister; "coredb: adding lucid support for packages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42364
[00:42:22] <gerrit-wm>	 Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42364
[00:42:39] <Susan>	 dumps.wikimedia.org seems super-slow. I'm downloading the MediaWiki installer and it's going at 24.0 KB/sec.
[00:47:28] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[00:47:46] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[00:57:13] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:11:28] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.018 seconds
[01:11:35] <Amgine>	 apergos: !b #43647. Someone is whingeing.
[01:12:13] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 260 seconds
[01:15:41] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[01:43:53] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:56:19] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.852 seconds
[02:20:29] <gerrit-wm>	 New patchset: Dereckson; "Reset to UTC the es.wikivoyage.org timezone." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42368
[02:22:35] <gerrit-wm>	 New review: Dereckson; "Community prefers UTC as timezone:" [operations/mediawiki-config] (master) C: 0;  - https://gerrit.wikimedia.org/r/42368
[02:29:42] <logmsgbot>	 !log LocalisationUpdate completed (1.21wmf6) at Sat Jan  5 02:29:41 UTC 2013
[02:29:52] <morebots>	 Logged the message, Master
[02:32:11] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:37:43] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 194 seconds
[02:38:28] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 202 seconds
[02:39:13] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.229 seconds
[02:43:52] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 185 seconds
[02:44:55] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 204 seconds
[02:47:37] <nagios-wm>	 PROBLEM - Parsoid on lardner is CRITICAL: Connection refused
[02:51:58] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[02:52:34] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds
[02:53:31] <logmsgbot>	 !log LocalisationUpdate completed (1.21wmf7) at Sat Jan  5 02:53:30 UTC 2013
[02:53:40] <morebots>	 Logged the message, Master
[02:57:40] <nagios-wm>	 PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:57:59] <nagios-wm>	 PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:58:43] <nagios-wm>	 PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:59:19] <nagios-wm>	 PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:59:37] <nagios-wm>	 RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.674 second response time
[03:00:23] <nagios-wm>	 RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time
[03:00:58] <nagios-wm>	 RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.064 second response time
[03:01:07] <nagios-wm>	 RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time
[03:03:31] <nagios-wm>	 RECOVERY - Parsoid on lardner is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.007 seconds
[03:22:25] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 209 seconds
[03:22:43] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 215 seconds
[03:44:30] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 5 seconds
[03:44:57] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds
[03:50:21] <nagios-wm>	 PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[05:23:14] <nagios-wm>	 PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[05:23:15] <nagios-wm>	 PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours
[05:45:08] <nagios-wm>	 PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours
[06:00:53] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 215 seconds
[06:01:11] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 223 seconds
[06:02:50] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:04:30] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.585 seconds
[06:09:53] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[06:10:11] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[06:41:23] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:52:01] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[06:59:14] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:00:52] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.046 seconds
[07:11:49] <nagios-wm>	 PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:16:46] <nagios-wm>	 PROBLEM - Puppet freshness on virt1008 is CRITICAL: Puppet has not run in the last 10 hours
[07:25:20] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:39:34] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.053 seconds
[08:11:46] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:24:13] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds
[08:42:13] <nagios-wm>	 PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:42:13] <nagios-wm>	 PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[08:45:49] <nagios-wm>	 PROBLEM - Host virt1008 is DOWN: PING CRITICAL - Packet loss = 100%
[08:57:49] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:03:23] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 203 seconds
[09:03:41] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 215 seconds
[09:10:25] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[09:10:43] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 1 seconds
[09:13:52] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[09:43:16] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 186 seconds
[09:44:01] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 193 seconds
[09:45:49] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:46:52] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[09:47:37] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[10:00:05] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.052 seconds
[10:04:16] <nagios-wm>	 PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:16] <nagios-wm>	 PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:16] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:16] <nagios-wm>	 PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:16] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:17] <nagios-wm>	 PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[10:32:19] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:46] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.541 seconds
[11:06:49] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 208 seconds
[11:06:49] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 207 seconds
[11:10:16] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[11:10:17] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[11:18:31] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:29:10] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.974 seconds
[11:54:13] <nagios-wm>	 PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[12:04:44] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:11:12] <Nemo_bis>	 https://gerrit.wikimedia.org/r/#/c/41562/ would be nice to implement as soon as possible
[12:11:27] <Nemo_bis>	 considering that now also zh.wiki may need HTTPS for the great firewall
[12:17:10] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.048 seconds
[12:41:08] <petan>	 can someone check mc on labs?
[12:41:23] <petan>	 mutante? apergos? paravoid?
[12:44:14] <petan>	 andrewbogott_afk? or anyone who is awake?
[12:44:18] <petan>	 labsconsole is kind of down
[12:51:04] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:52:56] <petan>	 is anyone here awake?
[13:05:19] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.064 seconds
[13:23:14] <petan>	 come on people...
[13:23:30] <petan>	 this is kind of urgent - labs are half unusable
[13:39:04] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:51:13] <nagios-wm>	 PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[13:53:19] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds
[14:25:07] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:26:06] <petan>	 still 0 ops awake?
[14:39:13] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.038 seconds
[14:42:36] <Thehelpfulone>	 petan, hmm it's a Saturday, a number of ops are in SF where it's currently 6:42am
[14:43:25] <petan>	 Thehelpfulone does it mean that if the wikipedia cluster failed now, it would be down for hours?
[14:43:55] <petan>	 I know labs aren't wikipedia, but still, it would be nice to have someone who can be online during EU times
[14:44:04] <Thehelpfulone>	 no, someone would start pinging and paging :P
[14:51:29] <Nemo_bis>	 but as you said, it's only Labs
[14:57:38] <petan>	 Nemo_bis I don't think it's so hard to get someone who could do that
[14:57:45] <petan>	 even if it's just labs
[14:58:05] <petan>	 most open source devs only have time during weekends
[15:07:28] <Thehelpfulone>	 petan, do you know what the problem is, or is labs just down?
[15:12:40] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:24:14] <nagios-wm>	 PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[15:24:14] <nagios-wm>	 PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours
[15:26:04] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.018 seconds
[15:31:49] <petan>	 Thehelpfulone the memcached server is down - which makes it unusable since session info is in memcached
[15:32:04] <petan>	 so we can't login to console and control instances
[15:32:20] <Thehelpfulone>	 labsconsole was up for me a little while back
[15:32:31] <petan>	 could you login?
[15:32:34] <petan>	 because no one can
[15:45:51] <nagios-wm>	 PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours
[15:46:40] <Thehelpfulone>	 ah no, don't seem to be able to
[15:58:19] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:57] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.550 seconds
[16:44:03] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:59:26] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[17:07:50] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 187 seconds
[17:08:27] <paravoid>	 petan: pong
[17:08:43] <paravoid>	 !log restarting memcached on virt0
[17:08:45] <petan>	 memcached for labsconsole is borked
[17:08:54] <morebots>	 Logged the message, Master
[17:09:12] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 219 seconds
[17:09:26] <paravoid>	 it's a saturday, it's not about being awake
[17:10:39] <petan>	 I know, but labs is something that is mostly used during weekends
[17:11:26] <petan>	 if there is no one who is willing to keep it operational during weekends, why not hire someone? or change the system so that labs community people can fix these problems?
[17:11:40] <paravoid>	 I didn't say I don't care, I'm just saying that that's why no one was around
[17:12:09] <petan>	 well, I figured
[17:13:05] <nagios-wm>	 PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:13:59] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1028 is CRITICAL: CRIT replication delay 190 seconds
[17:14:00] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1041 is CRITICAL: CRIT replication delay 190 seconds
[17:14:08] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1041 is CRITICAL: CRIT replication delay 192 seconds
[17:14:35] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1024 is CRITICAL: CRIT replication delay 201 seconds
[17:14:44] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1024 is CRITICAL: CRIT replication delay 205 seconds
[17:28:50] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1024 is OK: OK replication delay 0 seconds
[17:28:59] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1024 is OK: OK replication delay 0 seconds
[17:29:53] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1041 is OK: OK replication delay 0 seconds
[17:29:54] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1028 is OK: OK replication delay 0 seconds
[17:30:02] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1041 is OK: OK replication delay 0 seconds
[17:31:50] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:05] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds
[17:58:50] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[17:59:08] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[18:19:05] <nagios-wm>	 PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 228 seconds
[18:19:06] <nagios-wm>	 PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 228 seconds
[18:19:59] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:30:38] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.126 seconds
[18:31:24] <nagios-wm>	 RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds
[18:31:24] <nagios-wm>	 RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[18:43:05] <nagios-wm>	 PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[18:43:06] <nagios-wm>	 PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[19:06:11] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:20:27] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.073 seconds
[19:51:59] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:01:38] <pgehres_>	 friendly ops folks: how do I silence/ACK a Nagios page?
[20:05:39] <nagios-wm>	 PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[20:05:39] <nagios-wm>	 PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[20:05:39] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[20:05:39] <nagios-wm>	 PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[20:05:39] <nagios-wm>	 PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[20:05:40] <nagios-wm>	 PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[20:06:14] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds
[20:18:32] <nagios-wm>	 PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[20:19:44] <nagios-wm>	 RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[20:24:14] <nagios-wm>	 PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[20:39:41] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:09] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[20:56:25] <nagios-wm>	 RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time
[21:25:14] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:39:28] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds
[21:55:32] <nagios-wm>	 PROBLEM - Puppet freshness on search30 is CRITICAL: Puppet has not run in the last 10 hours
[22:12:37] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:23:16] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.318 seconds
[22:25:45] <gerrit-wm>	 New patchset: Reedy; "Disable TorBlock on private wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/42389
[22:58:08] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:12:23] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.016 seconds
[23:45:51] <nagios-wm>	 PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:52:27] <nagios-wm>	 PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[23:58:26] <nagios-wm>	 RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds