[01:41:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 220 seconds
[01:42:36] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 257 seconds
[01:49:03] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 643s
[01:51:09] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 23 seconds
[01:55:48] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 3s
[01:56:24] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 3 seconds
[02:46:32] New patchset: Krinkle; "Update default/index.html in Apache configuration." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16241
[03:00:15] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[03:24:24] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[05:58:05] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[06:01:05] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[06:03:02] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[06:03:38] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[06:07:32] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[06:30:56] PROBLEM - SSH on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:30:56] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.8:11000 (Connection timed out) 10.0.8.10:11000 (timeout)
[06:31:32] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:33:56] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time
[06:38:07] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[06:39:01] PROBLEM - Memcached on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:41:34] RECOVERY - Memcached on srv260 is OK: TCP OK - 2.999 second response time on port 11000
[06:42:19] PROBLEM - SSH on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:50:07] PROBLEM - Memcached on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:51:28] RECOVERY - Memcached on srv260 is OK: TCP OK - 2.998 second response time on port 11000
[06:58:01] PROBLEM - Memcached on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:02:04] RECOVERY - Memcached on srv260 is OK: TCP OK - 2.999 second response time on port 11000
[07:19:28] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online
[07:19:55] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[07:20:13] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.036 second response time
[09:25:02] PROBLEM - LDAP on sanger is CRITICAL: Connection refused
[09:25:20] PROBLEM - LDAPS on sanger is CRITICAL: Connection refused
[09:29:50] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[10:23:01] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave]
[12:26:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:30:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds
[13:00:47] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[13:02:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:11:08] New patchset: Alex Monk; "(bug 36104) Enable Narayam on bnwikisource." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16252
[13:11:50] Uh oh...
[13:13:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.053 seconds
[13:25:41] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[13:36:00] New review: Alex Monk; "Have you forgotten about this change?" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/12556
[13:45:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:57:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.769 seconds
[14:31:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:44:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds
[15:16:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:28:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds
[15:58:45] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours
[16:01:54] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours
[16:02:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:12:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.770 seconds
[16:46:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:56:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.260 seconds
[17:31:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:42:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.026 seconds
[18:15:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:28:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds
[18:38:09] New patchset: Alex Monk; "(bug 37885) Enable FlaggedRevs on trwikiquote." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16264
[19:00:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:14:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.020 seconds
[19:29:28] New patchset: Alex Monk; "(bug 27918) Enable wgUseRCPatrol on hewikibooks." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16267
[19:30:39] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours
[19:46:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:56:15] New patchset: Alex Monk; "Remove swwiki's ridiculously high account creation throttle." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16268
[19:57:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds
[20:29:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:40:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.916 seconds
[21:07:46] New review: Thehelpfulone; "+1'ing the change, but I would probably put this on hold for a couple more days in case others comme..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/16237
[21:13:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:24:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.460 seconds
[21:46:33] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[21:59:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:07:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.867 seconds
[22:42:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:53:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.585 seconds
[22:56:27] RECOVERY - LDAP on sanger is OK: TCP OK - 0.003 second response time on port 389
[22:56:45] RECOVERY - LDAPS on sanger is OK: TCP OK - 0.005 second response time on port 636
[22:56:56] !log restarted opendj on sanger. The process OOM'd due to heap size.
[22:57:05] Logged the message, Master
[23:01:15] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[23:01:30] ryan_lane - thanks!
[23:01:36] mails are flowing again
[23:26:18] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[23:26:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:37:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.462 seconds
[23:38:56] woosters: yw