[00:12:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:20:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.608 seconds [00:53:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:06] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [01:03:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.017 seconds [01:08:06] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [01:36:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:42:21] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 274 seconds [01:42:48] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 255 seconds [01:44:09] PROBLEM - Puppet freshness on mw1100 is CRITICAL: Puppet has not run in the last 10 hours [01:44:09] PROBLEM - Puppet freshness on dobson is CRITICAL: Puppet has not run in the last 10 hours [01:44:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.534 seconds [01:47:09] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [01:49:42] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 669s [01:49:51] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [01:51:12] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours [01:51:12] PROBLEM - Puppet freshness on mw1042 is CRITICAL: Puppet has not run in the last 10 hours [01:51:12] PROBLEM - Puppet freshness on mw1059 is CRITICAL: Puppet has not run in the last 10 hours [01:51:12] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run 
in the last 10 hours [01:51:12] PROBLEM - Puppet freshness on search29 is CRITICAL: Puppet has not run in the last 10 hours [01:52:15] PROBLEM - Puppet freshness on argon is CRITICAL: Puppet has not run in the last 10 hours [01:52:15] PROBLEM - Puppet freshness on mw1008 is CRITICAL: Puppet has not run in the last 10 hours [01:52:15] PROBLEM - Puppet freshness on mw1015 is CRITICAL: Puppet has not run in the last 10 hours [01:52:15] PROBLEM - Puppet freshness on mw1043 is CRITICAL: Puppet has not run in the last 10 hours [01:52:15] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [01:52:16] PROBLEM - Puppet freshness on mw1095 is CRITICAL: Puppet has not run in the last 10 hours [01:52:16] PROBLEM - Puppet freshness on mw1086 is CRITICAL: Puppet has not run in the last 10 hours [01:52:17] PROBLEM - Puppet freshness on mw1102 is CRITICAL: Puppet has not run in the last 10 hours [01:52:17] PROBLEM - Puppet freshness on mw1073 is CRITICAL: Puppet has not run in the last 10 hours [01:52:18] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [01:52:18] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours [01:52:19] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [01:52:19] PROBLEM - Puppet freshness on search23 is CRITICAL: Puppet has not run in the last 10 hours [01:52:20] PROBLEM - Puppet freshness on sq64 is CRITICAL: Puppet has not run in the last 10 hours [01:52:20] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [01:52:21] PROBLEM - Puppet freshness on ocg1 is CRITICAL: Puppet has not run in the last 10 hours [01:53:09] PROBLEM - Puppet freshness on carbon is CRITICAL: Puppet has not run in the last 10 hours [01:54:12] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 39s [01:54:48] RECOVERY - MySQL Slave Delay on 
storage3 is OK: OK replication delay 2 seconds [02:18:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.306 seconds [02:38:09] RECOVERY - Puppet freshness on mw1047 is OK: puppet ran at Sun Jul 15 02:37:41 UTC 2012 [02:38:09] RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Sun Jul 15 02:37:44 UTC 2012 [02:38:36] RECOVERY - Puppet freshness on search23 is OK: puppet ran at Sun Jul 15 02:38:24 UTC 2012 [02:40:06] RECOVERY - Puppet freshness on mw1060 is OK: puppet ran at Sun Jul 15 02:39:59 UTC 2012 [02:40:15] RECOVERY - Puppet freshness on mw1034 is OK: puppet ran at Sun Jul 15 02:40:06 UTC 2012 [02:41:27] RECOVERY - Puppet freshness on mw1043 is OK: puppet ran at Sun Jul 15 02:41:04 UTC 2012 [02:42:39] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Sun Jul 15 02:42:21 UTC 2012 [02:43:33] RECOVERY - Puppet freshness on mw1057 is OK: puppet ran at Sun Jul 15 02:43:27 UTC 2012 [02:44:37] RECOVERY - Puppet freshness on mw1121 is OK: puppet ran at Sun Jul 15 02:44:19 UTC 2012 [02:44:37] RECOVERY - Puppet freshness on mw1059 is OK: puppet ran at Sun Jul 15 02:44:32 UTC 2012 [02:45:03] RECOVERY - Puppet freshness on mw1155 is OK: puppet ran at Sun Jul 15 02:44:52 UTC 2012 [02:47:18] RECOVERY - Puppet freshness on mw1073 is OK: puppet ran at Sun Jul 15 02:47:08 UTC 2012 [02:47:27] RECOVERY - Puppet freshness on mw1086 is OK: puppet ran at Sun Jul 15 02:47:12 UTC 2012 [02:47:36] RECOVERY - Puppet freshness on sq64 is OK: puppet ran at Sun Jul 15 02:47:30 UTC 2012 [02:48:03] RECOVERY - Puppet freshness on mw1099 is OK: puppet ran at Sun Jul 15 02:47:47 UTC 2012 [02:48:39] RECOVERY - Puppet freshness on mw1007 is OK: puppet ran at Sun Jul 15 02:48:16 UTC 2012 [02:48:57] RECOVERY - Puppet freshness on pdf1 is OK: puppet ran at Sun Jul 15 02:48:42 UTC 2012 [02:49:06] RECOVERY - Puppet freshness 
on mw1003 is OK: puppet ran at Sun Jul 15 02:48:52 UTC 2012 [02:49:33] RECOVERY - Puppet freshness on mw1042 is OK: puppet ran at Sun Jul 15 02:49:21 UTC 2012 [02:49:33] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Sun Jul 15 02:49:29 UTC 2012 [02:50:18] RECOVERY - Puppet freshness on mw1028 is OK: puppet ran at Sun Jul 15 02:50:05 UTC 2012 [02:52:06] RECOVERY - Puppet freshness on mw1095 is OK: puppet ran at Sun Jul 15 02:52:02 UTC 2012 [02:53:27] RECOVERY - Puppet freshness on mw1071 is OK: puppet ran at Sun Jul 15 02:53:13 UTC 2012 [02:54:03] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Sun Jul 15 02:53:48 UTC 2012 [02:54:39] RECOVERY - Puppet freshness on mw1092 is OK: puppet ran at Sun Jul 15 02:54:09 UTC 2012 [02:55:33] RECOVERY - Puppet freshness on ms1004 is OK: puppet ran at Sun Jul 15 02:55:31 UTC 2012 [02:58:33] RECOVERY - Puppet freshness on mw1123 is OK: puppet ran at Sun Jul 15 02:58:21 UTC 2012 [02:58:33] RECOVERY - Puppet freshness on mw1015 is OK: puppet ran at Sun Jul 15 02:58:31 UTC 2012 [02:59:09] RECOVERY - Puppet freshness on mw1078 is OK: puppet ran at Sun Jul 15 02:58:47 UTC 2012 [02:59:36] RECOVERY - Puppet freshness on mw1153 is OK: puppet ran at Sun Jul 15 02:59:19 UTC 2012 [02:59:36] RECOVERY - Puppet freshness on ocg1 is OK: puppet ran at Sun Jul 15 02:59:20 UTC 2012 [03:00:03] RECOVERY - Puppet freshness on mw1089 is OK: puppet ran at Sun Jul 15 02:59:46 UTC 2012 [03:00:03] RECOVERY - Puppet freshness on dobson is OK: puppet ran at Sun Jul 15 02:59:59 UTC 2012 [03:00:57] RECOVERY - Puppet freshness on singer is OK: puppet ran at Sun Jul 15 03:00:45 UTC 2012 [03:01:33] RECOVERY - Puppet freshness on mw1112 is OK: puppet ran at Sun Jul 15 03:01:04 UTC 2012 [03:02:18] RECOVERY - Puppet freshness on mw1110 is OK: puppet ran at Sun Jul 15 03:02:03 UTC 2012 [03:02:36] RECOVERY - Puppet freshness on mw1013 is OK: puppet ran at Sun Jul 15 03:02:30 UTC 2012 [03:03:03] RECOVERY - Puppet freshness on search29 is OK: 
puppet ran at Sun Jul 15 03:02:48 UTC 2012 [03:03:03] RECOVERY - Puppet freshness on mw1104 is OK: puppet ran at Sun Jul 15 03:03:02 UTC 2012 [03:03:39] RECOVERY - Puppet freshness on mw1082 is OK: puppet ran at Sun Jul 15 03:03:23 UTC 2012 [03:04:06] RECOVERY - Puppet freshness on mw1062 is OK: puppet ran at Sun Jul 15 03:03:51 UTC 2012 [03:04:33] RECOVERY - Puppet freshness on mw1119 is OK: puppet ran at Sun Jul 15 03:04:29 UTC 2012 [03:05:36] RECOVERY - Puppet freshness on argon is OK: puppet ran at Sun Jul 15 03:05:31 UTC 2012 [03:06:04] RECOVERY - Puppet freshness on mw1100 is OK: puppet ran at Sun Jul 15 03:05:54 UTC 2012 [03:06:42] RECOVERY - Puppet freshness on mw1088 is OK: puppet ran at Sun Jul 15 03:06:08 UTC 2012 [03:07:06] RECOVERY - Puppet freshness on mw1102 is OK: puppet ran at Sun Jul 15 03:06:50 UTC 2012 [03:07:34] RECOVERY - Puppet freshness on mw1008 is OK: puppet ran at Sun Jul 15 03:07:11 UTC 2012 [03:07:34] RECOVERY - Puppet freshness on analytics1002 is OK: puppet ran at Sun Jul 15 03:07:11 UTC 2012 [03:07:34] RECOVERY - Puppet freshness on mw1115 is OK: puppet ran at Sun Jul 15 03:07:29 UTC 2012 [03:41:46] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [03:44:46] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [03:46:09] @info pc1 [03:46:10] Krinkle: [pc1: s7] 10.0.0.221 [07:01:47] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:02:23] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:10:56] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [07:29:23] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 1 seconds [07:30:26] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 0 seconds [08:10:53] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [08:21:59] PROBLEM - Host mw1059 is DOWN: PING CRITICAL - Packet loss = 100% [09:23:50] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [09:34:47] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [10:59:56] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [11:08:55] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [11:30:01] New patchset: Alex Monk; "(bug 38408) Create suppressredirect group on ruwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/15683 [11:53:01] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [11:59:37] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [12:01:25] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [12:04:45] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [12:21:14] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [12:42:32] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [12:50:56] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [12:51:59] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [13:10:17] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [13:33:32] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [13:35:56] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [13:39:59] PROBLEM - Apache HTTP on 
mw8 is CRITICAL: Connection refused [13:42:32] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [13:45:33] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [14:13:26] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [15:17:05] New patchset: Krinkle; "bits/robots.txt: Remove commented out rules, re-instate noindex." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14744 [15:17:14] New patchset: Krinkle; "Remove old bits/test.txt file" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/14748 [15:59:50] mw8 appears to have been rebooting periodically; anyone know what's up with that? (don't see the cause in the syslog, just the reboots) [16:06:04] stating the obvious: it doesn't sound happy [16:06:05] :p [16:10:10] geethanks [16:10:11] :-P [16:10:50] http://ganglia.wikimedia.org/latest/?r=day&cs=7%2F15%2F2012+9%3A37&ce=7%2F15%2F2012+16%3A6&c=Application+servers+pmtpa&h=mw8.pmtpa.wmnet&tab=m&vn=&mc=2&z=small&metric_group= [16:11:33] wio and mem are interesting [16:12:54] well it's not an emergency so I'll leave it for folks to look at (maybe me) tomorrow, since I don't see anything obvious without running tests I guess... atop didn't tell me much either except for the wio [17:12:03] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [18:03:32] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [18:04:35] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [18:05:40] bah, again? 
[18:05:59] at least it doesn't page [18:07:53] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [18:12:14] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [18:33:41] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:37:08] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [18:40:44] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.62 ms [18:44:11] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [18:48:51] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [19:16:14] LeslieCarr! [19:16:28] what's happening? are people hacking? [19:16:37] everyone's leaving the venue [19:16:38] i am in my room watching dollhouse [19:16:44] and being antisocial [19:16:48] and it's everything i hoped it would be [19:16:53] ! [19:17:27] :) [19:17:49] how was the unconference ? [19:19:12] LeslieCarr: slightly productive (for me at least) [19:25:10] PROBLEM - Puppet freshness on ms3 is CRITICAL: Puppet has not run in the last 10 hours [19:27:35] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun Jul 15 19:27:22 UTC 2012 [19:58:55] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [19:59:49] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 1.78 ms [20:03:34] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [20:30:32] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [21:01:17] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [21:10:17] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [21:14:02] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [21:36:47] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.334 second response time [21:48:47] PROBLEM - Host mw8 is 
DOWN: PING CRITICAL - Packet loss = 100% [21:49:50] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [21:53:26] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [21:53:35] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [22:05:35] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.401 second response time [23:41:44] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [23:43:05] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [23:46:05] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [23:50:48] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15439 [23:50:49] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15438 [23:50:50] Change merged: Tim Starling; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15437