[00:01:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:04:20] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: Puppet has not run in the last 10 hours [00:05:23] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours [00:07:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.590 seconds [00:17:01] Ryan_Lane: catrope@srv256:/usr/local/apache/common$ grep -Rn 404.php . [00:17:03] This is taking a while [00:17:11] heh [00:17:22] I figured I'd run it on bare metal rather than NFS [00:17:34] likely a good idea [00:18:19] !log powercycling ssl1001 [00:18:23] Logged the message, Master [00:18:30] !log powercycling ssl1003 [00:18:33] Logged the message, Master [00:20:14] RECOVERY - SSH on ssl1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [00:22:20] RECOVERY - SSH on ssl1003 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [00:23:50] RECOVERY - Puppet freshness on ssl1001 is OK: puppet ran at Sat Mar 10 00:23:20 UTC 2012 [00:26:32] RECOVERY - Puppet freshness on ssl1003 is OK: puppet ran at Sat Mar 10 00:26:22 UTC 2012 [00:42:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:46:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.067 seconds [00:52:29] RoanKattouw: giving me a problem I can't figure out on a friday is just mean [00:52:41] hehe [00:52:43] I'm sorry dude [00:52:45] :D [00:52:58] Benny passed me two problems, I fixed the other one [00:54:14] heh [01:01:00] New patchset: Bhartshorne; "adding a manager to call swiftcleaner multiple times on newly created objects." [operations/software] (master) - https://gerrit.wikimedia.org/r/3040 [01:04:32] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3040 [01:04:34] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/3040 [01:06:41] New patchset: Ryan Lane; "Grasping at straws here." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3041 [01:06:52] !log rebalanced the swift rings to finish decreasing traffic sent to ms1 and ms2 [01:06:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3041 [01:06:55] Logged the message, Master [01:06:55] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3041 [01:06:58] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3041 [01:07:06] !log started swiftcleaner on owa1 looking for (and purging) bad objects [01:07:09] Logged the message, Master [01:15:33] New patchset: Ryan Lane; "More grasping" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3042 [01:15:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3042 [01:15:48] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3042 [01:15:51] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3042 [01:18:24] New patchset: Ryan Lane; "More troubleshooting" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3043 [01:18:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3043 [01:19:02] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3043 [01:19:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3043 [01:21:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:25:46] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [01:25:46] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [01:27:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.684 seconds [02:03:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.375 seconds [02:56:43] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [03:06:19] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [03:07:31] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [03:07:31] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours [03:08:34] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [03:08:34] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [03:08:34] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [03:10:31] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [03:10:31] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [03:10:31] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [03:10:31] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours [03:13:31] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours [03:15:28] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [03:15:28] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [03:25:31] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [03:44:34] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [03:44:34] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours [03:51:28] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [03:51:29] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [03:51:29] PROBLEM - Puppet freshness on 
knsq25 is CRITICAL: Puppet has not run in the last 10 hours [03:51:30] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [03:51:30] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Puppet has not run in the last 10 hours [03:59:26] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [03:59:27] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [03:59:28] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [03:59:28] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [03:59:28] PROBLEM - Puppet freshness on ssl3004 is CRITICAL: Puppet has not run in the last 10 hours [03:59:29] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [03:59:29] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [03:59:30] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [04:00:38] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [04:01:41] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [04:01:41] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours [04:01:41] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [04:02:44] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [04:03:38] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [04:04:41] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [04:05:35] PROBLEM - Puppet 
freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [04:06:38] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [04:06:38] PROBLEM - Puppet freshness on amssq61 is CRITICAL: Puppet has not run in the last 10 hours [04:06:38] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours [04:07:41] PROBLEM - Puppet freshness on knsq20 is CRITICAL: Puppet has not run in the last 10 hours [04:21:20] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [04:21:38] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [04:43:50] New patchset: Dzahn; "nagios - profiler-to-carbon - work around incorrect process count issues with check_procs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3044 [04:44:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3044 [04:45:31] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3044 [04:45:34] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3044 [04:51:02] PROBLEM - Disk space on db57 is CRITICAL: Connection refused by host [04:51:29] PROBLEM - MySQL disk space on db57 is CRITICAL: Connection refused by host [04:52:24] PROBLEM - MySQL Slave Delay on db25 is CRITICAL: Connection refused by host [04:52:24] PROBLEM - mysqld processes on db25 is CRITICAL: Connection refused by host [04:52:24] PROBLEM - Disk space on mw1072 is CRITICAL: Connection refused by host [04:52:33] PROBLEM - Disk space on snapshot1001 is CRITICAL: Connection refused by host [04:52:42] PROBLEM - RAID on capella is CRITICAL: Connection refused by host [04:52:42] PROBLEM - Disk space on db25 is CRITICAL: Connection refused by host [04:52:42] PROBLEM - MySQL Slave Running on db25 is CRITICAL: Connection refused by host [04:52:42] PROBLEM - RAID on db57 is CRITICAL: Connection refused by host [04:52:51] PROBLEM - RAID on search1006 is CRITICAL: Connection refused by host [04:53:00] PROBLEM - MySQL disk space on db25 is CRITICAL: Connection refused by host [04:53:09] PROBLEM - RAID on mw1072 is CRITICAL: Connection refused by host [04:53:18] PROBLEM - MySQL Idle Transactions on db25 is CRITICAL: Connection refused by host [04:53:27] PROBLEM - DPKG on db57 is CRITICAL: Connection refused by host [04:53:36] PROBLEM - DPKG on search1006 is CRITICAL: Connection refused by host [04:53:36] PROBLEM - RAID on srv224 is CRITICAL: Connection refused by host [04:53:36] PROBLEM - DPKG on capella is CRITICAL: Connection refused by host [04:53:36] PROBLEM - DPKG on snapshot1001 is CRITICAL: Connection refused by host [04:53:36] PROBLEM - Disk space on capella is CRITICAL: Connection refused by host [04:53:45] PROBLEM - MySQL Recent Restart on db25 is CRITICAL: Connection refused by host [04:53:54] PROBLEM - Disk space on search1006 is CRITICAL: Connection refused by host [04:53:54] PROBLEM - DPKG on srv210 is CRITICAL: Connection refused by host [04:54:03] PROBLEM - MySQL Replication Heartbeat on db25 is CRITICAL: Connection refused by host [04:54:11] uhh? didnt touch anything related this time.. 
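The check_procs change above (3044) works around miscounting of the profiler-to-carbon process, and the follow-up a little later in the log (3045) drops -a once --ereg-argument-array is in use, since matching on the regex alone is enough. A minimal sketch of that style of check, with the thresholds, plugin path and match string as illustrative assumptions rather than the values in the actual manifest:

    # count processes whose argument list matches the regex; expect exactly one
    /usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 \
        --ereg-argument-array 'profiler-to-carbon'
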
[04:54:12] PROBLEM - Disk space on srv210 is CRITICAL: Connection refused by host [04:54:21] PROBLEM - DPKG on db25 is CRITICAL: Connection refused by host [04:54:39] PROBLEM - DPKG on mw1072 is CRITICAL: Connection refused by host [04:54:39] PROBLEM - DPKG on cp1016 is CRITICAL: Connection refused by host [04:54:48] PROBLEM - DPKG on srv224 is CRITICAL: Connection refused by host [04:56:30] checking [05:00:11] oh yeah, the usual nagios-nrpe fails to restart issue after config change [05:01:06] !log starting nagios-nrpe-server on all via dsh (fail to restart on config change issue) [05:01:10] Logged the message, Master [05:11:38] !log doing more (cp*, db*, msbe-* ,mw*) by hand / for loop [05:11:42] Logged the message, Master [05:21:26] what the.. stopped again ?! [05:41:36] RECOVERY - DPKG on virt2 is OK: All packages OK [05:42:26] hrmm.. and now we'll see if this happens again [05:47:36] New patchset: Dzahn; "profiler-to-carbon process check - remove -a when using --ereg-argument-array" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3045 [05:47:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3045 [05:48:24] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3045 [05:48:27] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3045 [06:37:44] RECOVERY - Disk space on srv196 is OK: DISK OK [06:37:44] RECOVERY - DPKG on srv265 is OK: All packages OK [06:37:44] RECOVERY - RAID on srv274 is OK: OK: no RAID installed [06:38:02] RECOVERY - Disk space on srv235 is OK: DISK OK [06:38:29] RECOVERY - RAID on srv256 is OK: OK: no RAID installed [06:59:31] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [07:03:25] RECOVERY - Puppet freshness on db1022 is OK: puppet ran at Sat Mar 10 07:03:18 UTC 2012 [07:03:48] !log ran puppet on db1022, another one that works fine manually but somehow did not by itself [07:03:52] Logged the message, Master [07:32:40] RECOVERY - Disk space on ms1004 is OK: DISK OK [08:23:37] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [10:52:20] PROBLEM - Disk space on search1018 is CRITICAL: DISK CRITICAL - free space: /a 3253 MB (2% inode=99%): [10:54:26] PROBLEM - Disk space on search1017 is CRITICAL: DISK CRITICAL - free space: /a 2121 MB (1% inode=99%): [11:02:02] PROBLEM - Disk space on search1018 is CRITICAL: DISK CRITICAL - free space: /a 4807 MB (3% inode=99%): [11:06:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:08:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.238 seconds [11:27:23] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [11:27:23] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [11:44:56] New patchset: Mark Bergsma; "Don't use probes for upload backend, use upload squids as backend" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3047 [11:44:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:45:08] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3047 [11:46:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3047 [11:46:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3047 [11:48:05] PROBLEM - Disk space on search1017 is CRITICAL: DISK CRITICAL - free space: /a 3368 MB (2% inode=99%): [11:48:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.746 seconds [12:04:42] New patchset: Mark Bergsma; "Put all probes in every VCL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3048 [12:04:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3048 [12:05:12] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3048 [12:05:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3048 [12:10:26] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [12:24:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:28:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.023 seconds [12:42:46] New patchset: Mark Bergsma; "Cache objects for 1 hour (frontend) or 30 days (backend) by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3049 [12:42:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3049 [12:44:05] New patchset: Mark Bergsma; "Restrict target domain to upload.wikimedia.org on frontends as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3050 [12:44:17] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3049 [12:44:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3049 [12:44:18] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3050 [12:44:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3050 [12:44:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3050 [12:58:01] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [13:04:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:08:04] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [13:08:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.879 seconds [13:09:07] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [13:09:07] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours [13:10:01] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [13:10:01] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [13:10:01] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [13:12:07] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours [13:12:07] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [13:12:07] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [13:12:07] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [13:15:07] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours [13:17:04] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [13:17:04] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [13:27:07] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [13:43:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:35] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [13:46:35] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [13:47:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.177 seconds [13:52:35] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [13:52:35] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [13:52:36] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [13:52:36] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours [13:52:37] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours [13:52:37] PROBLEM - Puppet freshness 
on knsq25 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [14:00:32] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [14:00:33] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [14:00:33] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [14:00:34] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Puppet has not run in the last 10 hours [14:00:34] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [14:00:35] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [14:00:35] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [14:00:36] PROBLEM - Puppet freshness on ssl3004 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [14:02:29] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [14:03:32] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [14:03:32] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours [14:03:32] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [14:04:35] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [14:05:29] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [14:06:32] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [14:07:35] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [14:08:29] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours [14:08:29] PROBLEM - Puppet 
freshness on amssq61 is CRITICAL: Puppet has not run in the last 10 hours [14:08:29] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [14:09:32] PROBLEM - Puppet freshness on knsq20 is CRITICAL: Puppet has not run in the last 10 hours [14:23:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:27:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.425 seconds [15:03:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:09:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.035 seconds [15:18:10] New patchset: Mark Bergsma; "Test modified varnishhtcpd with inline http port" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3051 [15:18:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3051 [15:19:13] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3051 [15:19:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3051 [15:21:18] New patchset: Mark Bergsma; "Typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3052 [15:21:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3052 [15:21:55] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3052 [15:21:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3052 [15:24:41] New patchset: Mark Bergsma; "Purge both Varnish instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3053 [15:24:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3053 [15:25:07] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3053 [15:25:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3053 [15:31:02] New patchset: Mark Bergsma; "Convert mobile servers to new htcppurger class, purging both varnish instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3054 [15:31:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3054 [15:31:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3054 [15:31:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3054 [15:43:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.247 seconds [16:08:20] New patchset: Mark Bergsma; "Let's not have Varnish writing to the same file concurrently, shall we" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3055 [16:08:32] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3055 [16:09:10] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3055 [16:09:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3055 [16:23:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:29:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.675 seconds [17:02:18] New patchset: Mark Bergsma; "Add serve IPs to X-Cache headers for debugging purposes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3056 [17:02:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3056 [17:02:49] New patchset: Mark Bergsma; "Add server IPs to X-Cache headers for debugging purposes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3056 [17:03:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3056 [17:03:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3056 [17:03:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3056 [17:03:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:09] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours [17:09:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.325 seconds [17:14:13] New patchset: Mark Bergsma; "server.ip is not a string, so use hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3057 [17:14:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3057 [17:14:44] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3057 [17:14:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3057 [17:30:19] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:19] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:19] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:19] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:16] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:17] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:17] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:17] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:22] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:22] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:31] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:32] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:44:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:19] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:19] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:20] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:20] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.442 seconds [17:50:16] New patchset: Mark Bergsma; "Cache 4xx on upload" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3058 [17:50:16] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:50:28] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3058 [17:50:34] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:50:34] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:50:34] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:51:19] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3058 [17:51:24] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3058 [17:55:22] RECOVERY - check_minfraud_secondary on payments3 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 118 bytes in 5.748 second response time [17:55:22] PROBLEM - check_minfraud_secondary on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:22] PROBLEM - check_minfraud_secondary on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:55:50] PROBLEM - check_minfraud_secondary on payments4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:51] New patchset: Mark Bergsma; "Specify cache4xx as a time period" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3059 [17:59:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3059 [17:59:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3059 [17:59:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3059 [18:00:19] RECOVERY - check_minfraud_secondary on payments2 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 118 bytes in 0.593 second response time [18:00:19] RECOVERY - check_minfraud_secondary on payments4 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 118 bytes in 0.604 second response time [18:00:19] RECOVERY - check_minfraud_secondary on payments1 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 118 bytes in 0.583 second response time [18:06:09] New patchset: Mark Bergsma; "Don't return(hit_for_pass) when caching 4xx" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3060 [18:06:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3060 [18:06:23] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3060 [18:06:26] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3060 [18:24:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:24:55] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [18:28:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.484 seconds [18:28:58] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [18:32:52] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [18:37:58] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [18:39:48] New patchset: Mark Bergsma; "Don't udplog PURGE requests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3061 [18:40:00] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3061 [18:40:22] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3061 [18:40:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3061 [18:44:11] New patchset: Mark Bergsma; "Don't use single quotes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3062 [18:44:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3062 [18:44:38] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3062 [18:44:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3062 [18:48:02] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [18:48:35] New patchset: Mark Bergsma; "Make Puppet automatically restart the varnish loggers on changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3063 [18:48:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3063 [18:48:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3063 [18:48:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3063 [19:00:15] hey asher [19:00:19] wanna join the hackathon? ;) [19:00:33] ASHER COME TO HACKATHON [19:00:47] binasher: I just made mediawiki twice faster!!!!111 [19:00:58] zomg! [19:01:00] 5.4! [19:01:07] and suhosin [19:01:17] and -O3 hehe [19:01:24] yeh [19:01:33] death to suhosin [19:02:14] and newer APC [19:02:19] anyway [19:02:31] domas is annoyed with current avg mediawiki request latency [19:02:36] "was 40ms in my days!" [19:02:41] 160 now [19:02:44] !!! [19:02:46] well, 100+ [19:04:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:04:18] * domas stares at http://svn.php.net/viewvc/pecl/apc/trunk/?sortby=date#dirlist [19:06:52] when's hphpvm going to be ready?? [19:07:29] i'm about ready to deploy varnish for upload [19:08:59] ooh. want to try the persistent backend instead of file? 
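The question just above is about the storage back-end for the new upload Varnish caches: the file back-end versus the then-experimental persistent one. A rough sketch of the difference at the varnishd command line, with the instance name, ports, paths and sizes made up for illustration and not taken from production:

    # file back-end: cache kept in a mmap'ed file, contents discarded on restart
    varnishd -n backend -a :3128 -T :6083 -f /etc/varnish/upload-backend.vcl \
        -s file,/srv/vcache/varnish.store,300G

    # persistent back-end (experimental in Varnish 3.x): cache survives a restart
    varnishd -n backend -a :3128 -T :6083 -f /etc/varnish/upload-backend.vcl \
        -s persistent,/srv/vcache/varnish.persist,300G
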
[19:09:08] I have file now [19:09:13] but I suppose we could test it [19:09:17] perhaps on just a few boxes [19:09:31] binasher: I guess when it's ready [19:09:34] upload.eqiad goes to squid in pmtpa now [19:09:42] except for swift, it contacts that directly [19:09:54] yeah, on a few boxes to compare would be good [19:10:22] but lets start with file for the first few days, since we might have enough issues [19:11:17] oh i didn't realize squid didn't send a udp packet per log entry [19:11:27] yeah [19:12:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.021 seconds [19:12:40] we could even do jumbo frames, 9000 MTU [19:12:45] that would fit quite a few requests [19:22:31] !log reslaved db1033 [19:22:35] Logged the message, Master [19:22:59] RECOVERY - mysqld processes on db1033 is OK: PROCS OK: 1 process with command name mysqld [19:26:44] PROBLEM - MySQL Replication Heartbeat on db1033 is CRITICAL: CRIT replication delay 148641 seconds [19:27:02] PROBLEM - MySQL Slave Delay on db1033 is CRITICAL: CRIT replication delay 148559 seconds [19:28:08] !log set sync_binlog = 1 on all current masters and eqiad dbs [19:28:11] Logged the message, Master [19:34:20] i wonder how to deal with the varnish metrics for the two varnish instances [19:43:27] New patchset: Mark Bergsma; "Preseed remaining questions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3065 [19:43:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3065 [19:43:55] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3065 [19:43:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3065 [19:44:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:50:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.783 seconds [19:57:13] New patchset: Mark Bergsma; "Add Upload caches eqiad cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3066 [19:57:25] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3066 [19:57:39] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3066 [19:57:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3066 [20:25:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:29:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.893 seconds [21:06:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:12:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.025 seconds [21:29:08] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [21:29:08] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [21:41:21] PROBLEM - Host ms6 is DOWN: PING CRITICAL - Packet loss = 100% [21:41:30] PROBLEM - Host knsq22 is DOWN: PING CRITICAL - Packet loss = 100% [21:41:30] PROBLEM - Host amssq36 is DOWN: PING CRITICAL - Packet loss = 100% [21:41:30] PROBLEM - Host amssq49 is DOWN: PING CRITICAL - Packet loss = 100% [21:41:30] PROBLEM - Host wikipedia-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:41:39] PROBLEM - Host nescio is DOWN: PING CRITICAL - Packet loss = 100% [21:41:57] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:06] PROBLEM - Host amssq51 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:06] PROBLEM - Host knsq27 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:06] PROBLEM - Host amssq43 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:06] PROBLEM - Host knsq29 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host ssl3002 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host amssq44 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host amssq35 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host amssq38 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host amssq33 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host knsq23 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:15] PROBLEM - Host foundation-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:42:16] PROBLEM - Host amssq54 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:16] PROBLEM - Host wiktionary-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:42:17] PROBLEM - Host amssq56 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:24] PROBLEM - Host knsq21 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:24] PROBLEM - Host amssq41 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:24] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:33] PROBLEM - Host knsq19 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:33] PROBLEM - Host amssq50 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:33] PROBLEM - Host amssq32 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:42] PROBLEM - Host amssq59 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:42] PROBLEM - Host knsq25 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:42] PROBLEM - Host upload.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:42:51] PROBLEM - Host ssl3001 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:51] PROBLEM - Host knsq17 is DOWN: PING CRITICAL - Packet 
loss = 100% [21:43:00] PROBLEM - Host wikiversity-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:00] PROBLEM - Host amssq34 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:01] PROBLEM - Host hooft is DOWN: PING CRITICAL - Packet loss = 100% [21:43:09] PROBLEM - Host bits.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:18] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:36] PROBLEM - Host amssq39 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:36] PROBLEM - Host amssq37 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:36] PROBLEM - Host amssq45 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:36] PROBLEM - Host wikinews-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:36] PROBLEM - Host wikisource-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:37] PROBLEM - Host wikiquote-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:45] PROBLEM - Host maerlant is DOWN: PING CRITICAL - Packet loss = 100% [21:43:45] PROBLEM - Host cp3001 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:54] PROBLEM - Host amssq58 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:54] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:54] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100% [21:43:54] PROBLEM - Host ns2.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:43:54] PROBLEM - Host amslvs3 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:12] PROBLEM - Host amssq60 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:12] PROBLEM - Host br1-knams is DOWN: PING CRITICAL - Packet loss = 100% [21:44:21] PROBLEM - Host ssl3004 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:21] PROBLEM - Host csw1-esams is DOWN: PING CRITICAL - Packet loss = 100% [21:44:39] PROBLEM - Host amssq42 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:39] PROBLEM - Host amslvs4 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:39] PROBLEM - Host mediawiki-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:44:48] PROBLEM - Host 91.198.174.6 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:48] PROBLEM - Host ssl3003 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:48] PROBLEM - Host amssq55 is DOWN: PING CRITICAL - Packet loss = 100% [21:44:48] PROBLEM - Host csw2-esams is DOWN: PING CRITICAL - Packet loss = 100% [21:45:06] PROBLEM - Host amssq62 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:15] PROBLEM - Host amssq52 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:15] PROBLEM - Host amssq46 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:15] PROBLEM - Host amssq53 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:24] PROBLEM - Host amssq31 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:42] PROBLEM - Host amssq40 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:42] PROBLEM - Host upload.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:42] PROBLEM - Host text.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:45:43] PROBLEM - Host wikibooks-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:45:43] PROBLEM - Host wikibooks-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:51] PROBLEM - Host amssq61 is DOWN: PING CRITICAL - Packet loss = 100% [21:45:51] PROBLEM - Host wikimedia-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [21:45:51] PROBLEM - Host 
wikimedia-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:52] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:52] PROBLEM - Host wikiquote-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:53] PROBLEM - Host wikisource-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:45:53] PROBLEM - Host wikinews-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:46:00] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100% [21:46:00] PROBLEM - Host cp3002 is DOWN: PING CRITICAL - Packet loss = 100% [21:46:09] PROBLEM - Host wikiversity-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:46:09] PROBLEM - Host wiktionary-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:46:10] PROBLEM - Host amslvs2 is DOWN: PING CRITICAL - Packet loss = 100% [21:46:18] PROBLEM - Host amslvs1 is DOWN: PING CRITICAL - Packet loss = 100% [21:46:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:27] PROBLEM - Host bits.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:46:45] PROBLEM - Host knsq28 is DOWN: PING CRITICAL - Packet loss = 100% [21:47:03] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100% [21:47:03] PROBLEM - Host knsq18 is DOWN: PING CRITICAL - Packet loss = 100% [21:48:24] PROBLEM - Host foundation-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:48:51] PROBLEM - Host mediawiki-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100% [21:50:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.855 seconds [21:51:36] wtf?? [21:51:49] (and why does this always happen midnight on my shift?) [21:55:59] americans waking up? [21:56:45] no, it's 2 pm on the west coast [21:56:50] and so 5 pm on the east [21:56:59] or maybe I'm an hour off, anyways it's definitely not morning [21:57:25] bits and upload squids in esams still seem to be acting up [22:04:33] I can't ping to a bits squid but I can ssh to it? [22:04:54] not making any sense [22:06:46] looks like bits are coming back. weird [22:07:11] and so are the uploads. 
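The puzzle above, a bits squid that answers ssh but not ping, is the kind of case where ICMP and TCP reachability are worth checking separately: ICMP can be dropped or rate-limited somewhere along the path while TCP still gets through. A quick sketch, with the host name as a hypothetical stand-in for whichever squid is being looked at:

    SQUID=knsq16.knams.wikimedia.org   # hypothetical example host
    ping -c 5 "$SQUID"                 # ICMP echo
    nc -zv -w 5 "$SQUID" 22            # TCP connect to the ssh port
    mtr -r -c 10 "$SQUID"              # per-hop loss report, to see where ICMP dies
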
[22:11:30] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [22:20:03] RECOVERY - Host wikipedia-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 16%, RTA = 109.81 ms [22:20:04] RECOVERY - Host wikimedia-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 16%, RTA = 110.06 ms [22:20:05] RECOVERY - Host knsq22 is UP: PING OK - Packet loss = 16%, RTA = 109.21 ms [22:20:05] RECOVERY - Host knsq25 is UP: PING WARNING - Packet loss = 66%, RTA = 109.38 ms [22:20:05] RECOVERY - Host amssq43 is UP: PING OK - Packet loss = 16%, RTA = 110.10 ms [22:20:05] RECOVERY - Host amssq40 is UP: PING OK - Packet loss = 16%, RTA = 109.46 ms [22:20:05] RECOVERY - Host amssq37 is UP: PING OK - Packet loss = 16%, RTA = 109.96 ms [22:20:05] RECOVERY - Host wikibooks-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 16%, RTA = 109.34 ms [22:20:06] RECOVERY - Host knsq24 is UP: PING OK - Packet loss = 16%, RTA = 109.30 ms [22:20:06] RECOVERY - Host amssq31 is UP: PING WARNING - Packet loss = 66%, RTA = 111.00 ms [22:20:07] RECOVERY - Host amssq38 is UP: PING WARNING - Packet loss = 66%, RTA = 109.54 ms [22:20:07] RECOVERY - Host amssq32 is UP: PING WARNING - Packet loss = 66%, RTA = 110.61 ms [22:20:08] RECOVERY - Host knsq19 is UP: PING WARNING - Packet loss = 66%, RTA = 109.24 ms [22:20:08] RECOVERY - Host knsq21 is UP: PING WARNING - Packet loss = 66%, RTA = 109.89 ms [22:20:09] RECOVERY - Host amssq59 is UP: PING WARNING - Packet loss = 66%, RTA = 109.73 ms [22:20:09] RECOVERY - Host amssq56 is UP: PING OK - Packet loss = 0%, RTA = 109.30 ms [22:20:10] RECOVERY - Host amssq50 is UP: PING OK - Packet loss = 0%, RTA = 109.96 ms [22:20:10] RECOVERY - Host amssq33 is UP: PING OK - Packet loss = 0%, RTA = 110.71 ms [22:20:11] RECOVERY - Host knsq18 is UP: PING WARNING - Packet loss = 28%, RTA = 110.13 ms [22:20:12] RECOVERY - Host amssq52 is UP: PING OK - Packet loss = 0%, RTA = 109.42 ms [22:20:12] RECOVERY - Host amssq62 is UP: PING OK - Packet loss = 0%, RTA = 108.84 ms [22:20:12] RECOVERY - Host amssq44 is UP: PING OK - Packet loss = 0%, RTA = 109.16 ms [22:20:13] RECOVERY - Host amssq46 is UP: PING OK - Packet loss = 0%, RTA = 109.33 ms [22:20:13] RECOVERY - Host knsq16 is UP: PING OK - Packet loss = 0%, RTA = 109.32 ms [22:20:14] RECOVERY - Host amslvs1 is UP: PING OK - Packet loss = 0%, RTA = 109.52 ms [22:20:14] RECOVERY - Host amssq58 is UP: PING OK - Packet loss = 0%, RTA = 109.50 ms [22:20:15] RECOVERY - Host amssq36 is UP: PING OK - Packet loss = 0%, RTA = 110.07 ms [22:20:15] RECOVERY - Host ssl3003 is UP: PING WARNING - Packet loss = 80%, RTA = 110.84 ms [22:20:16] RECOVERY - Host ms6 is UP: PING WARNING - Packet loss = 80%, RTA = 110.52 ms [22:20:16] RECOVERY - Host mediawiki-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.43 ms [22:20:21] RECOVERY - Host foundation-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.86 ms [22:20:22] RECOVERY - Host amssq48 is UP: PING OK - Packet loss = 0%, RTA = 109.26 ms [22:20:22] RECOVERY - Host amssq47 is UP: PING OK - Packet loss = 0%, RTA = 109.39 ms [22:20:22] RECOVERY - Host ssl3002 is UP: PING OK - Packet loss = 0%, RTA = 109.65 ms [22:20:22] RECOVERY - Host amssq41 is UP: PING OK - Packet loss = 0%, RTA = 110.47 ms [22:20:22] RECOVERY - Host ns2.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.88 ms [22:20:22] RECOVERY - Host amssq35 is UP: PING OK - Packet loss = 0%, RTA = 110.71 ms [22:20:22] RECOVERY - Host knsq17 is UP: PING OK - Packet loss = 0%, RTA 
= 109.83 ms
[22:20:23] RECOVERY - Host amssq53 is UP: PING OK - Packet loss = 0%, RTA = 109.45 ms
[22:20:23] RECOVERY - Host knsq28 is UP: PING OK - Packet loss = 0%, RTA = 109.79 ms
[22:20:24] RECOVERY - Host amssq60 is UP: PING OK - Packet loss = 0%, RTA = 109.56 ms
[22:20:24] RECOVERY - Host knsq27 is UP: PING WARNING - Packet loss = 80%, RTA = 112.29 ms
[22:20:25] RECOVERY - Host amssq42 is UP: PING WARNING - Packet loss = 86%, RTA = 112.22 ms
[22:20:25] RECOVERY - Host wikiversity-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 80%, RTA = 112.63 ms
[22:20:26] RECOVERY - Host amssq51 is UP: PING OK - Packet loss = 0%, RTA = 109.41 ms
[22:20:26] RECOVERY - Host wikinews-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 110.43 ms
[22:20:30] RECOVERY - Host knsq29 is UP: PING WARNING - Packet loss = 73%, RTA = 112.43 ms
[22:20:30] RECOVERY - Host ssl3004 is UP: PING OK - Packet loss = 0%, RTA = 109.57 ms
[22:20:30] RECOVERY - Host hooft is UP: PING OK - Packet loss = 0%, RTA = 109.28 ms
[22:20:30] RECOVERY - Host knsq26 is UP: PING OK - Packet loss = 0%, RTA = 109.19 ms
[22:20:30] RECOVERY - Host wikisource-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.27 ms
[22:20:39] RECOVERY - Host amssq61 is UP: PING OK - Packet loss = 0%, RTA = 110.13 ms
[22:20:39] RECOVERY - Host amssq49 is UP: PING OK - Packet loss = 0%, RTA = 109.92 ms
[22:20:39] RECOVERY - Host amssq55 is UP: PING OK - Packet loss = 0%, RTA = 109.43 ms
[22:20:39] RECOVERY - Host cp3002 is UP: PING WARNING - Packet loss = 93%, RTA = 110.08 ms
[22:20:39] RECOVERY - Host amssq54 is UP: PING WARNING - Packet loss = 80%, RTA = 112.40 ms
[22:20:39] RECOVERY - Host amssq39 is UP: PING WARNING - Packet loss = 86%, RTA = 112.24 ms
[22:20:39] RECOVERY - Host amssq34 is UP: PING OK - Packet loss = 0%, RTA = 109.24 ms
[22:20:48] RECOVERY - Host amslvs2 is UP: PING OK - Packet loss = 0%, RTA = 108.98 ms
[22:20:48] RECOVERY - Host bits.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.10 ms
[22:20:49] RECOVERY - Host ssl3001 is UP: PING OK - Packet loss = 0%, RTA = 109.44 ms
[22:20:49] RECOVERY - Host knsq20 is UP: PING WARNING - Packet loss = 93%, RTA = 110.06 ms
[22:20:49] RECOVERY - Host amslvs3 is UP: PING WARNING - Packet loss = 66%, RTA = 113.23 ms
[22:20:57] RECOVERY - Host wiktionary-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 58%, RTA = 112.17 ms
[22:20:57] RECOVERY - Host amssq57 is UP: PING OK - Packet loss = 0%, RTA = 109.14 ms
[22:20:58] RECOVERY - Host nescio is UP: PING OK - Packet loss = 0%, RTA = 109.09 ms
[22:20:58] RECOVERY - Host csw2-esams is UP: PING OK - Packet loss = 0%, RTA = 111.74 ms
[22:20:58] RECOVERY - Host cp3001 is UP: PING OK - Packet loss = 0%, RTA = 109.24 ms
[22:20:58] RECOVERY - Host maerlant is UP: PING OK - Packet loss = 0%, RTA = 108.93 ms
[22:20:58] RECOVERY - Host csw1-esams is UP: PING OK - Packet loss = 0%, RTA = 110.20 ms
[22:20:58] RECOVERY - Host upload.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.49 ms
[22:21:15] RECOVERY - Host mediawiki-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.32 ms
[22:21:15] RECOVERY - Host amslvs4 is UP: PING OK - Packet loss = 0%, RTA = 109.38 ms
[22:21:15] RECOVERY - Host br1-knams is UP: PING OK - Packet loss = 0%, RTA = 109.10 ms
[22:21:15] RECOVERY - Host knsq23 is UP: PING OK - Packet loss = 0%, RTA = 109.82 ms
[22:21:15] RECOVERY - Host wikiquote-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 110.24 ms
[22:21:24] RECOVERY - Host 91.198.174.6 is UP: PING OK - Packet loss = 0%, RTA = 109.18 ms
[22:22:09] RECOVERY - Host amssq45 is UP: PING OK - Packet loss = 0%, RTA = 110.71 ms
[22:23:21] RECOVERY - Host text.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 109.92 ms
[22:23:21] RECOVERY - Host wikibooks-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.34 ms
[22:23:22] RECOVERY - Host upload.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.60 ms
[22:23:30] RECOVERY - Host wikiquote-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.13 ms
[22:23:30] RECOVERY - Host wikimedia-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.21 ms
[22:23:31] RECOVERY - Host wikisource-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.86 ms
[22:23:31] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.63 ms
[22:23:32] RECOVERY - Host wikinews-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.56 ms
[22:23:57] RECOVERY - Host wikiversity-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 110.44 ms
[22:23:57] RECOVERY - Host wiktionary-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.71 ms
[22:24:06] RECOVERY - Host bits.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.10 ms
[22:25:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:26:03] RECOVERY - Host foundation-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 109.78 ms
[22:30:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.993 seconds
[22:59:30] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[23:07:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:09:24] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[23:10:27] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours
[23:10:27] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours
[23:11:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.029 seconds
[23:11:30] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours
[23:11:30] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[23:11:30] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours
[23:14:57] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours
[23:14:57] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours
[23:14:57] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours
[23:14:57] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours
[23:16:09] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:06] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[23:18:06] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[23:28:09] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours
[23:46:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:48:06] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[23:48:06] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours
[23:52:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.444 seconds
[23:54:06] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:06] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:06] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:06] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:06] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:06] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:07] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:07] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:08] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:08] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:09] PROBLEM - Puppet freshness on knsq25 is CRITICAL: Puppet has not run in the last 10 hours
[23:57:33] RECOVERY - MySQL Replication Heartbeat on db1033 is OK: OK replication delay 0 seconds
[23:58:00] RECOVERY - MySQL Slave Delay on db1033 is OK: OK replication delay 0 seconds