[07:03:11] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.003557801247 secs
[07:20:20] PROBLEM - Host search23 is DOWN: PING CRITICAL - Packet loss = 100%
[07:22:48] RECOVERY - Host search23 is UP: PING OK - Packet loss = 0%, RTA = 36.00 ms
[07:24:58] PROBLEM - search indices - check lucene status page on search23 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:26:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:27:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time
[07:27:48] RECOVERY - search indices - check lucene status page on search23 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.072 second response time
[07:32:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:33:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.161 second response time
[07:37:28] PROBLEM - NTP on search23 is CRITICAL: NTP CRITICAL: Offset unknown
[07:42:27] RECOVERY - NTP on search23 is OK: NTP OK: Offset -0.0009605884552 secs
[07:43:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:44:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time
[07:52:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:53:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time
[07:53:34] New patchset: Nemo bis; "ULS config for deployment phase 1" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63113
[08:28:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:29:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[08:31:47] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.007799744606 secs
[09:29:10] New patchset: Liangent; "zh.planet feed update for Mountain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66326
[10:53:28] New patchset: Petrb; "created a stub for memcache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66328
[11:43:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:44:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[11:53:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:54:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[12:01:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:02:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[13:06:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:07:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[13:40:14] New review: GWicke; "We would like to deploy this soon. Are there outstanding issues, or can this be merged now?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890
[15:08:24] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:09:23] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:12:23] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:13:23] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:17:23] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:18:23] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:23:23] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:25:23] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:26:49] New patchset: Ottomata; "Updating some hadoop mapreduce and yarn configs" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/66337
[15:41:24] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:42:24] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:51:24] PROBLEM - search indices - check lucene status page on search19 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 60051 bytes in 0.110 second response time
[15:52:03] PROBLEM - Puppet freshness on db1032 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:03] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:04] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:04] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:05] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:05] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:06] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[15:52:06] PROBLEM - Puppet freshness on mw1171 is CRITICAL: No successful Puppet run in the last 10 hours
[15:54:52] New patchset: Liangent; "Remove some hard-coded 'wikipedia' from core in aaaee54f" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/66339
[15:55:29] !log staggered reboot of ms-fe1001-4, standard maintenance
[15:55:36] Logged the message, Master
[15:57:14] PROBLEM - Host ms-fe1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:57:43] RECOVERY - Host ms-fe1004 is UP: PING OK - Packet loss = 0%, RTA = 1.94 ms
[16:00:10] New patchset: Liangent; "Remove some hard-coded 'wikipedia' from core in I3c156792" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/66339
[16:02:13] PROBLEM - Host ms-fe1003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:03:23] RECOVERY - Host ms-fe1003 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[16:12:01] PROBLEM - Host ms-fe1002 is DOWN: PING CRITICAL - Packet loss = 100%
[16:13:21] RECOVERY - Host ms-fe1002 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[16:17:52] PROBLEM - Host ms-fe1001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:19:41] RECOVERY - Host ms-fe1001 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[17:04:28] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100%
[17:05:58] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[17:20:43] Something wrong with Gerrit?
[17:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:28:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.181 second response time
[18:53:24] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[18:54:04] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms
[19:48:59] PROBLEM - SSH on pdf3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:49:59] RECOVERY - SSH on pdf3 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[20:19:32] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:21:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s)
[21:26:33] RECOVERY - search indices - check lucene status page on search19 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.110 second response time
[23:25:50] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[23:32:45] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server