[00:26:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:26:25] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms
[00:30:10] PROBLEM - SSH on ms-be1007 is CRITICAL: Connection refused
[00:39:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds
[00:44:25] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:48:19] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Puppet has not run in the last 10 hours
[00:53:25] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[01:13:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:15:28] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds
[01:17:16] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds
[01:17:54] New patchset: Asher; "memcached package should not be ensure latest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41194
[01:23:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.202 seconds
[01:29:25] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[01:32:25] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours
[01:59:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:15:19] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[02:16:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.034 seconds
[02:27:25] !log LocalisationUpdate completed (1.21wmf6) at Sat Dec 29 02:27:24 UTC 2012
[02:27:36] Logged the message, Master
[02:57:01] RECOVERY - Puppet freshness on manganese is OK: puppet ran at Sat Dec 29 02:56:54 UTC 2012
[03:02:25] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours
[03:02:25] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[03:43:21] New patchset: Ryan Lane; "Use non-deprecated method of getting json" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41200
[03:43:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41200
[04:51:19] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[04:59:25] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[04:59:25] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[04:59:25] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[04:59:25] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[05:35:44] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 195 seconds
[05:36:11] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 204 seconds
[05:38:53] PROBLEM - Memcached on mc1011 is CRITICAL: Connection refused
[05:38:53] PROBLEM - Memcached on mc1003 is CRITICAL: Connection refused
[05:38:53] PROBLEM - Memcached on mc1013 is CRITICAL: Connection refused
[05:38:53] PROBLEM - Memcached on mc1009 is CRITICAL: Connection refused
[05:38:53] PROBLEM - Memcached on mc1010 is CRITICAL: Connection refused
[05:38:54] PROBLEM - Memcached on mc1005 is CRITICAL: Connection refused
[05:38:54] PROBLEM - Memcached on mc1007 is CRITICAL: Connection refused
[05:38:55] PROBLEM - Memcached on mc1015 is CRITICAL: Connection refused
[05:38:55] PROBLEM - Memcached on mc1014 is CRITICAL: Connection refused
[05:38:56] PROBLEM - Memcached on mc1016 is CRITICAL: Connection refused
[05:38:56] PROBLEM - Memcached on mc1006 is CRITICAL: Connection refused
[05:38:57] PROBLEM - Memcached on mc1008 is CRITICAL: Connection refused
[05:39:02] PROBLEM - Memcached on mc1004 is CRITICAL: Connection refused
[05:42:40] New patchset: Ori.livneh; "(RT 4139) vanadium: add 'mflaschen' w/sudo; mongo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41204
[05:46:37] hey ops. anyone around to look at a three-line puppet change? https://gerrit.wikimedia.org/r/#/c/41204/
[05:48:02] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[05:48:29] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[05:55:35] ori-l: can you split that into separate commits?
[05:58:58] otherwise it looks ok
[05:59:08] New patchset: Ori.livneh; "(RT 4139) vanadium: add 'mflaschen' w/sudo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41204
[06:01:08] New patchset: Ori.livneh; "Add MongoDB to vanadium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41206
[06:01:31] jeremyb: done; thanks for feedback.
[06:02:13] ori-l: now you added mongo (but left mysql there too) instead of replacing it
[06:03:32] Yes, I thought better of it. There are various MySQL tools that that class pulls in and I'd need to be careful about specifying them some other way before I drop it.
[06:04:24] ahh
[06:16:32] New review: Jeremyb; "I obviously can't see the RT but otherwise LGTM." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/41204
[06:17:29] ori-l: so why are you working now? ;-)
[06:22:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[06:22:32] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[06:28:41] RECOVERY - Puppet freshness on ms-fe1001 is OK: puppet ran at Sat Dec 29 06:28:22 UTC 2012
[07:43:40] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[07:43:40] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[07:43:40] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:43:40] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:43:40] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[07:43:41] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[08:29:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:33:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.656 seconds
[08:49:56] PROBLEM - Puppet freshness on solr2 is CRITICAL: Puppet has not run in the last 10 hours
[08:51:53] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours
[09:01:56] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours
[09:01:56] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours
[09:02:59] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours
[09:06:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:22:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.113 seconds
[09:49:56] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[09:56:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:06:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.408 seconds
[10:15:35] RECOVERY - Puppet freshness on ms-be3002 is OK: puppet ran at Sat Dec 29 10:15:17 UTC 2012
[10:42:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:53] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[10:54:54] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[10:59:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds
[11:27:02] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 220 seconds
[11:27:20] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 228 seconds
[11:29:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:30:29] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:39:29] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds
[11:39:29] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds
[11:43:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.050 seconds
[12:16:32] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[12:16:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:29:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.130 seconds
[13:02:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:03:29] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours
[13:03:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[13:17:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.063 seconds
[13:50:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:04:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.453 seconds
[14:37:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:52:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.040 seconds
[14:52:21] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[15:00:27] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[15:00:27] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[15:00:27] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[15:00:27] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:40:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds
[16:10:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.841 seconds
[16:23:20] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[16:23:20] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[16:25:26] PROBLEM - Puppet freshness on mw1157 is CRITICAL: Puppet has not run in the last 10 hours
[16:58:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:12:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[17:44:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:45:23] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:23] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:23] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:23] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:23] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:24] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[17:56:47] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 198 seconds
[17:57:05] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 204 seconds
[17:58:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[18:16:12] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds
[18:16:12] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds
[18:31:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:46:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[18:51:36] PROBLEM - Puppet freshness on solr2 is CRITICAL: Puppet has not run in the last 10 hours
[18:53:33] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours
[18:54:36] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: Puppet has not run in the last 10 hours
[19:03:36] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours
[19:03:36] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours
[19:04:39] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours
[19:11:33] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours
[19:18:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:32:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.898 seconds
[19:51:36] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[20:05:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:21:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[20:47:31] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[20:47:31] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[20:49:24] Ryan_Lane: virt0 ^
[20:53:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:56:31] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[21:07:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.622 seconds
[21:25:37] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.008 second response time on port 11000
[21:31:28] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[21:42:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:56:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.162 seconds
[22:18:15] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[22:29:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:42:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.830 seconds
[23:04:18] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours
[23:04:18] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[23:17:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:31:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds