[00:19:00] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[00:23:57] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours
[00:27:51] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[00:33:33] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms
[00:56:57] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[01:05:57] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[01:09:38] Reedy: to bad there is not anything beyond EXPLAIN EXTENDED
[01:09:41] *too
[01:09:51] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[01:09:57] :/
[01:11:51] about ~15k rows get filtered down to 1346 in one table, and then some min,max joins made in that result set...it shouldn't be slower
[01:11:57] *slow
[01:12:13] I don't know why the 'anon' one is waaay faster
[01:12:47] it seemingly "just broke" in wmf3
[01:12:49] apparently
[01:15:33] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[01:40:53] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 316 seconds
[01:42:14] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 398 seconds
[01:43:53] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds
[01:45:50] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds
[02:00:32] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 279 seconds
[02:00:50] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 297 seconds
[02:02:48] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[02:08:29] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 1.21 ms
[02:24:59] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[02:24:59] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[02:24:59] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:24:59] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:27:00] !log LocalisationUpdate completed (1.21wmf4) at Sat Nov 17 02:27:00 UTC 2012
[02:27:07] Logged the message, Master
[02:44:56] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[02:50:38] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[02:52:30] !log LocalisationUpdate completed (1.21wmf3) at Sat Nov 17 02:52:30 UTC 2012
[02:52:38] Logged the message, Master
[03:27:23] PROBLEM - Host ms-be7 is DOWN: PING CRITICAL - Packet loss = 100%
[03:33:05] RECOVERY - Host ms-be7 is UP: PING OK - Packet loss = 0%, RTA = 1.45 ms
[03:42:59] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[03:49:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds
[03:49:53] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[03:50:29] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds
[07:09:04] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay NULL seconds
[07:14:10] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 4879703 seconds
[07:20:09] RECOVERY - Squid on brewster is OK: TCP OK - 0.001 second response time on port 8080
[07:38:27] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[07:43:15] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
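Background for the slow-query exchange at 01:09 to 01:12 above: on MySQL of this era, about the only introspection beyond EXPLAIN is EXPLAIN EXTENDED followed by SHOW WARNINGS, which exposes the optimizer's row estimates and its rewritten form of the query, sometimes enough to see why a near-identical variant (such as the 'anon' one mentioned) plans differently. A minimal sketch, assuming a reachable replica; the host, credentials, and the SQL itself are hypothetical placeholders, not the actual query under discussion:

```python
# Hedged sketch: inspect a slow query with EXPLAIN EXTENDED + SHOW WARNINGS
# (MySQL 5.x era). Host, credentials, and the query are placeholders.
import pymysql

QUERY = """
SELECT MIN(rev_timestamp), MAX(rev_timestamp)
FROM revision
WHERE rev_page = 12345
"""  # hypothetical stand-in for the query being debugged

conn = pymysql.connect(host="db-replica.example", user="reader",
                       password="secret", database="enwiki")
try:
    with conn.cursor() as cur:
        # Per-table row estimates, chosen keys, and filtering.
        cur.execute("EXPLAIN EXTENDED " + QUERY)
        for row in cur.fetchall():
            print(row)
        # The optimizer's rewritten query, useful for comparing two
        # similar queries that end up with very different plans.
        cur.execute("SHOW WARNINGS")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```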
[08:45:12] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[08:45:12] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[10:20:05] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[10:25:02] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours
[10:58:14] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[11:07:23] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:50:55] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[12:26:37] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[12:26:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[12:26:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[12:26:37] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[12:38:37] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[12:49:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:55:07] PROBLEM - LVS Lucene on search-pool3.svc.eqiad.wmnet is CRITICAL: Connection timed out
[12:56:10] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: Connection timed out
[12:57:31] RECOVERY - LVS Lucene on search-pool1.svc.eqiad.wmnet is OK: TCP OK - 0.029 second response time on port 8123
[12:58:16] PROBLEM - Lucene on search1002 is CRITICAL: Connection timed out
[12:58:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.463 seconds
[13:01:16] PROBLEM - Lucene on search1011 is CRITICAL: Connection timed out
[13:09:49] RECOVERY - LVS Lucene on search-pool3.svc.eqiad.wmnet is OK: TCP OK - 9.020 second response time on port 8123
[13:10:52] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: Connection timed out
[13:12:13] RECOVERY - LVS Lucene on search-pool1.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[13:14:28] PROBLEM - Lucene on search1001 is CRITICAL: Connection timed out
[13:14:55] PROBLEM - LVS Lucene on search-pool3.svc.eqiad.wmnet is CRITICAL: Connection timed out
[13:17:28] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.027 second response time on port 8123
[13:20:46] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: Connection timed out
[13:22:07] RECOVERY - LVS Lucene on search-pool1.svc.eqiad.wmnet is OK: TCP OK - 3.023 second response time on port 8123
[13:32:46] RECOVERY - Lucene on search1002 is OK: TCP OK - 9.019 second response time on port 8123
[13:33:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:33:58] RECOVERY - Lucene on search1011 is OK: TCP OK - 0.027 second response time on port 8123
[13:34:25] RECOVERY - LVS Lucene on search-pool3.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[13:44:37] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[13:49:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.049 seconds
[14:23:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:38:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[15:10:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:28:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds
[15:54:05] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:54:41] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:54:50] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:08] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:08] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:08] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:55:35] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.063 second response time
[15:55:35] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:56:29] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.553 second response time
[15:56:38] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.076 second response time
[15:56:38] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.218 second response time
[15:57:05] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.747 second response time
[15:57:50] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.682 second response time
[15:57:59] PROBLEM - LVS Lucene on search-pool1.svc.eqiad.wmnet is CRITICAL: Connection timed out
[15:58:17] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.908 second response time
[15:59:20] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:59:29] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:59:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:00:14] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:00:50] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.350 second response time
[16:01:44] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 63670 bytes in 3.060 seconds
[16:02:20] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:02:38] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.050 second response time
[16:03:50] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.600 second response time
[16:03:59] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:04:35] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:05:03] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:05:03] PROBLEM - LVS Lucene on search-pool2.svc.eqiad.wmnet is CRITICAL: Connection timed out
[16:05:03] PROBLEM - Lucene on search1003 is CRITICAL: Connection timed out
[16:05:29] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.728 second response time
[16:05:47] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:06:05] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.125 second response time
[16:06:14] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:06:14] PROBLEM - Lucene on search1001 is CRITICAL: Connection timed out
[16:06:32] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.060 second response time
[16:06:50] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:06:59] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:07:08] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:07:18] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.064 second response time
[16:07:44] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:07:53] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.685 second response time
[16:07:53] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:02] RECOVERY - LVS Lucene on search-pool2.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[16:08:20] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:29] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 63670 bytes in 8.287 seconds
[16:08:38] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:38] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.325 second response time
[16:08:48] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:56] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:09:14] RECOVERY - LVS Lucene on search-pool1.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[16:09:23] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:09:23] RECOVERY - Apache HTTP on mw32 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.977 second response time
[16:09:50] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.615 second response time
[16:09:59] hey anyone checking this out
[16:10:07] this appears to be search related
[16:10:08] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.491 second response time
[16:10:08] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.376 second response time
[16:10:17] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.279 second response time
[16:10:20] goddamn fucking search
[16:10:23] LeslieCarr: yeah, just for the last few minutes though
[16:10:26] RECOVERY - Apache HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.977 second response time
[16:10:31] i see apaches hanging on connections to search-pool1
[16:10:39] i'm fucking tired of being woken up 4 days a week because of goddamn search
[16:10:40] and pool1 + 2 are both red in nagios
[16:10:47] haven't looked at the search pools yet
[16:10:53] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.062 second response time
[16:10:53] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.466 second response time
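On the 16:10 observation above that apaches were hanging on connections to search-pool1 while the LVS Lucene checks report TCP results on port 8123: the quickest way to tell "refuses" from "hangs" from "answers" is a short-timeout TCP connect, which is essentially what those checks do. A minimal sketch under that assumption; the host list is illustrative, and port 8123 is the one named in the checks:

```python
# Hedged sketch: probe search backends on port 8123 with a short timeout,
# distinguishing accepting / timed-out (hang) / refused, as the LVS
# Lucene checks in the log report. Host list is illustrative.
import socket

HOSTS = ["search-pool1.svc.eqiad.wmnet", "search1001", "search1002", "search1003"]
PORT = 8123

for host in HOSTS:
    try:
        with socket.create_connection((host, PORT), timeout=5):
            print(f"{host}:{PORT} accepting connections")
    except socket.timeout:
        print(f"{host}:{PORT} timed out (hang)")
    except OSError as exc:
        print(f"{host}:{PORT} failed: {exc}")
```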
[16:10:57] this is the first time in a while its woken us up like this :/
[16:11:45] there's been nov 11, 10, 9, 8 for other search shit
[16:12:32] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.027 second response time on port 8123
[16:13:34] New patchset: Asher; "move search pool1+2 to pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33864
[16:14:19] binasher: will it work with pmtpa not being commented out ?
[16:14:31] no haha
[16:14:32] lines 30/36 ?
[16:14:36] hehe
[16:14:41] its ok, jenkins caught it too
[16:14:49] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.832 seconds
[16:14:49] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:14:55] * binasher literally just woke up
[16:15:14] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:15:43] me too
[16:16:03] New patchset: Asher; "move search pool1+2 to pmtpa" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33865
[16:16:17] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.742 second response time
[16:16:44] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.555 second response time
[16:16:52] oh wtf, how did i not amend that. sigh.
[16:17:02] Change abandoned: Asher; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33864
[16:18:35] fuck it, live hack
[16:18:42] !log asher synchronized wmf-config/lucene.php 'failing pool1+2 to pmtpa'
[16:18:50] Logged the message, Master
[16:20:58] RECOVERY - Lucene on search1003 is OK: TCP OK - 0.027 second response time on port 8123
[16:21:18] !log asher synchronized wmf-config/lucene.php 'failing pool3+4 to pmtpa'
[16:21:25] binasher: interesting statistic - http://ganglia.wikimedia.org/latest/?c=Search%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[16:21:25] Logged the message, Master
[16:21:42] even though 1001-1006 are all search pool 1
[16:22:00] Change abandoned: Asher; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33865
[16:23:01] i'm wondering if that's going to move to pmtpa in a bit
[16:23:35] yeah … can you think of a reason why only 2/6 machines would flip out though ?
[16:24:02] system uptime is pretty similar for all of them
[16:26:24] lvs sends pool queries to 1001-3, and those both have the "main" enwiki indexes, search1001 also has spell correction along with search1006.. but yah, not really, since 1003 looked normal
[16:27:12] big spike on a lot of the pool2 hosts too that didn't last too long
[16:28:35] obviously search just hates us sleeping
[16:29:42] ugh the log for just the last day on just search1001 is 17GB
[16:30:43] :(
[16:32:07] oh interesting that search1003 looks normal but it looked down the whole time to 1001/1002 "2012-11-17 16:23:37,819 [Thread-5] WARN org.wikimedia.lsearch.search.NetworkStatusThread - Host search1003 for enwiki.nspart2 still down."
[16:33:19] ah
[16:33:26] search1003 lsearchd is using a full core but getting no queries..
[16:33:32] where are the lucene logs ? don't see them in /var/log
[16:33:35] its doing nothing but.. [pid 16637] accept(19, 0x7ff75befd6a0, [28]) = -1 EMFILE (Too many open files)
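The strace fragment just above, accept() returning -1 EMFILE, means lsearchd on search1003 has exhausted its open-file-descriptor limit, so it spins without accepting any new connections. A minimal sketch of how that condition can be confirmed from /proc, assuming a Linux layout and sufficient privileges; the pid 16637 is the one visible in the strace output and is otherwise illustrative:

```python
# Hedged sketch: compare a process's open file descriptors with its
# "Max open files" soft limit to confirm an EMFILE situation like the
# one strace shows above. Linux /proc only; pid is illustrative.
import os

PID = 16637  # the lsearchd pid seen in the strace output

open_fds = len(os.listdir(f"/proc/{PID}/fd"))

soft_limit = None
with open(f"/proc/{PID}/limits") as f:
    for line in f:
        if line.startswith("Max open files"):
            # Line format: "Max open files   <soft>   <hard>   files"
            soft_limit = int(line.split()[3])
            break

print(f"pid {PID}: {open_fds} open fds, soft limit {soft_limit}")
if soft_limit is not None and open_fds >= soft_limit - 1:
    print("fd limit exhausted: accept() will keep failing with EMFILE")
```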
[16:33:36] hrm, lemme check lvs
[16:33:50] they're local on the boxes in /a/search/log
[16:34:35] ah cool
[16:35:03] oh wow, all of its fd's are tcp sockets in close_wait from the app servers
[16:35:29] ah yeah, i see search1003 going up and down
[16:35:43] several connection refused
[16:35:46] that explains that ...
[16:37:33] looks like search1003 actually got in that state at 2012-11-17 13:09:43,504
[16:38:02] so still no idea what happened at 16:10
[16:38:08] but hey, everything is ok right now!
[16:39:19] hehe yeah
[16:39:47] do you want to email ops@ about the switch to tampa ?
[16:40:04] leave it until monday ?
[16:41:33] i should properly check it in.. i think i can get it right the third time! heh
[16:42:17] i'm going to restart lsearchd everywhere in eqiad, others are in screwy states too
[16:42:24] then i'll email ops
[16:42:54] :)
[16:42:55] cool
[16:42:59] thanks
[16:43:13] lsearchd is such buggy shit
[16:44:34] i'm going to send off a quick "here's all the days search has woken me up in the last week" email followup to my thread and then go back to sleep
[16:45:48] awesome :)
[16:46:45] !log restarting lsearchd on all eqiad hosts
[16:46:51] Logged the message, Master
[16:48:56] g'night
[16:49:05] and FUCK YOU LUCENE
[16:50:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:51:52] New patchset: Asher; "fail search to pmtpa and add vip comments" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33867
[16:52:13] night :) thanks for getting on
[16:52:31] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/33867
[17:03:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.677 seconds
[17:14:22] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused
[17:38:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:54:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.041 seconds
[18:26:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:41:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.858 seconds
[18:46:35] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[18:46:35] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[19:16:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:31:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.114 seconds
[20:05:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:19:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.684 seconds
[20:20:57] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[20:25:54] PROBLEM - Puppet freshness on analytics1002 is CRITICAL: Puppet has not run in the last 10 hours
[20:54:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:58:54] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[21:08:57] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
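On the 16:35 finding above that all of search1003's descriptors were TCP sockets stuck in close_wait from the app servers: those sockets can be counted by matching the process's socket inodes against the state column of /proc/net/tcp, where CLOSE_WAIT is state 08. A minimal sketch under that assumption; the pid is again illustrative and the script needs to run as root or as the process owner:

```python
# Hedged sketch: count a process's TCP sockets per state by matching
# /proc/<pid>/fd socket inodes against /proc/net/tcp (state 08 = CLOSE_WAIT).
# Linux-only; pid is illustrative.
import os
from collections import Counter

PID = 16637  # illustrative lsearchd pid

# Collect the socket inodes owned by the process.
inodes = set()
for fd in os.listdir(f"/proc/{PID}/fd"):
    try:
        target = os.readlink(f"/proc/{PID}/fd/{fd}")
    except OSError:
        continue
    if target.startswith("socket:["):
        inodes.add(target[len("socket:["):-1])

# Tally TCP states for those inodes from the kernel's connection tables.
states = Counter()
for table in ("/proc/net/tcp", "/proc/net/tcp6"):
    with open(table) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            st, inode = fields[3], fields[9]
            if inode in inodes:
                states[st] += 1

print(f"pid {PID}: sockets by state: {dict(states)}")
print(f"CLOSE_WAIT sockets: {states.get('08', 0)}")
```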
[21:09:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.249 seconds
[21:42:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:57:06] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[21:58:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.017 seconds
[22:21:10] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.342 second response time
[22:27:37] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[22:27:37] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[22:27:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[22:27:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[22:31:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:39:37] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[22:47:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.575 seconds
[23:21:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:36:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.798 seconds
[23:45:36] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours