[00:13:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:25:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.043 seconds
[00:58:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:10:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.522 seconds
[01:41:53] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 246 seconds
[01:42:02] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 254 seconds
[01:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:48:47] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 659s
[01:53:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds
[01:53:08] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 21 seconds
[01:54:20] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 8s
[01:54:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.733 seconds
[02:28:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:38:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds
[04:31:33] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[05:16:40] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[05:40:40] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[08:11:10] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[08:51:22] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[09:02:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:04:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds
[09:38:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:43:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.042 seconds
[10:17:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:28:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.409 seconds
[11:04:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:15:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds
[11:47:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:56:06] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours
[11:58:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.114 seconds
[12:32:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:45:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds
[13:18:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:29:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.334 seconds
[14:02:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:13:56] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.139 seconds
[14:32:41] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours
[14:47:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:59:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.578 seconds
[15:17:45] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[15:33:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:45] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[15:46:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.048 seconds
[16:18:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:29:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.177 seconds
[17:03:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:13:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.397 seconds
[17:47:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:57:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.704 seconds
[18:10:37] New patchset: Alex Monk; "(bug 34769) Let sawikisource admins grant/remove autopatrol" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16921
[18:12:18] PROBLEM - Puppet freshness on db63 is CRITICAL: Puppet has not run in the last 10 hours
[18:32:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:42:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.132 seconds
[18:45:59] RECOVERY - LDAP on virt0 is OK: TCP OK - 0.001 second response time on port 389
[18:46:17] RECOVERY - LDAPS on virt0 is OK: TCP OK - 0.001 second response time on port 636
[18:52:26] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours
[19:17:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:28:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.667 seconds
[20:02:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:12:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.428 seconds
[20:46:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:55:04] New patchset: J; "migrate wikimedia-job-runner to puppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16501
[20:55:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16501
[20:57:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.326 seconds
[21:17:53] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654
[21:18:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654
[21:20:31] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654
[21:21:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654
[21:21:36] New patchset: J; "Add videoscaler class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16654
[21:22:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16654
[21:31:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:44:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds
[21:53:33] FYI -- #wikimedia-tech:
[21:53:45] 14:51 (BarkingFish) evening guys :) Are you aware of any issues with the servers in the Netherlands, please? I just ran a traceroute to en.wikipedia.org, and it's getting as far as my ISP's interchange in Amsterdam, and then stopping dead. I get 6 complete hops out of 30, the other 24 are dead
[21:53:45] 14:51 (BarkingFish) http://pastebin.com/pbqgLLTh
[21:57:10] PROBLEM - Puppet freshness on mw60 is CRITICAL: Puppet has not run in the last 10 hours
[22:16:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:27:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.128 seconds
[22:28:16] I don't get a useful traceroute either
[22:28:27] yet wikipedia works fine an answers ping perfectly
[22:36:37] Platonides: yeah, problem on his isp's end, it turns out
[22:53:13] New patchset: Platonides; "Add docroot for testwiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16935
[22:54:28] New patchset: Platonides; "Add docroot for testwiki." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16935
[23:01:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:11:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.624 seconds
[23:44:19] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer:
[23:45:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:45:49] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[23:53:10] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 225 seconds
[23:53:10] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 227 seconds
[23:56:37] PROBLEM - MySQL Recent Restart on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:56:37] PROBLEM - MySQL Slave Running on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:56:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.212 seconds
[23:57:04] PROBLEM - MySQL Idle Transactions on db12 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:57:49] RECOVERY - MySQL Recent Restart on db12 is OK: OK 5795699 seconds since restart
[23:57:49] RECOVERY - MySQL Slave Running on db12 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[23:58:16] RECOVERY - MySQL Idle Transactions on db12 is OK: OK longest blocking idle transaction sleeps for 0 seconds