[00:11:28] RECOVERY - Disk space on analytics1003 is OK: DISK OK
[00:51:48] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 345 seconds
[00:52:08] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 362 seconds
[00:52:59] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds
[00:53:20] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -1 seconds
[01:02:51] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:19] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:17:10] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:19:40] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:27:50] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:50] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:32:25] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:33:43] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:37:06] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:38:24] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[04:11:22] PROBLEM - RAID on nickel is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:12:22] RECOVERY - RAID on nickel is OK: OK: Active: 3, Working: 3, Failed: 0, Spare: 0
[06:25:30] RECOVERY - Disk space on ocg1001 is OK: DISK OK
[06:28:31] PROBLEM - puppet last run on search1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:40] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:10] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:11] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:21] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:31] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:30] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:44:53] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:45:43] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:45:43] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:46:23] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:46:33] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:33] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[07:12:44] PROBLEM - MySQL Processlist on db1059 is CRITICAL: CRIT 78 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[07:13:44] RECOVERY - MySQL Processlist on db1059 is OK: OK 6 unauthenticated, 0 locked, 0 copy to table, 1 statistics
[07:16:04] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:16:04] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[07:19:04] <_joe_> mmmh
[07:19:27] <_joe_> ok just a spike
[07:24:33] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: puppet fail
[07:25:05] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:32:44] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[07:37:26] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[07:42:56] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[07:43:15] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[07:45:36] PROBLEM - Apache HTTP on mw1160 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:46:37] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.076 second response time
[09:33:26] PROBLEM - Host cp3004 is DOWN: CRITICAL - Plugin timed out after 15 seconds
[09:33:44] RECOVERY - Host cp3004 is UP: PING OK - Packet loss = 0%, RTA = 95.02 ms
[10:55:38] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT
[11:06:15] <_joe_> svn?
[11:06:35] yes
[11:08:01] <_joe_> matanya: yes I was just amused by the idea we still keep an SSL cert for something that outdated
[11:08:24] I share that amusement :)
[11:09:12] you know what else we keep that is outdated? complete version history of all articles :P
[13:01:39] <_joe_> matanya: well, keeping everything on svn is a bit different
[13:01:55] yeah, just kidding
[13:01:56] <_joe_> we have git that should have the full history imported, I hope
[13:02:04] it has
[13:02:27] <_joe_> (sorry for the late answer, I was AFK for a while)
[13:02:51] no worries, my jokes are note that funny :)
[13:02:56] *not
[13:06:36] <_joe_> nerd humour
[14:00:38] svn.wikimedia.org and/or Special:CodeReview is still sometimes useful. Bugs and commit messages reference SVN revisions, not Git hashes.
[14:02:12] Old ones, anyway.
[15:18:33] (CR) Reedy: Add robots.txt rewrite rule where wiki is public (1 comment) [puppet] - https://gerrit.wikimedia.org/r/147487 (owner: Reedy)
[15:22:26] (CR) Reedy: Add robots.txt rewrite rule where wiki is public (3 comments) [puppet] - https://gerrit.wikimedia.org/r/147487 (owner: Reedy)
[15:22:42] (PS5) Reedy: Add robots.txt rewrite rule where wiki is public [puppet] - https://gerrit.wikimedia.org/r/147487
[15:22:57] (PS6) Reedy: Add robots.txt rewrite rule where wiki is public [puppet] - https://gerrit.wikimedia.org/r/147487
[15:25:53] (PS4) Reedy: Make apple-touch-icon.png configurable via touch.php [puppet] - https://gerrit.wikimedia.org/r/147488
[15:29:04] (PS7) Reedy: Add robots.txt rewrite rule where wiki is public [puppet] - https://gerrit.wikimedia.org/r/147487
[15:29:26] (PS5) Reedy: Make apple-touch-icon.png configurable via touch.php [puppet] - https://gerrit.wikimedia.org/r/147488
[15:31:48] PROBLEM - puppet last run on db2004 is CRITICAL: CRITICAL: puppet fail
[15:45:10] <_joe_> Reedy: did you see my PS about those?
[15:49:49] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[17:25:54] (PS1) Reedy: Bump Epochs to 20130601000000 [mediawiki-config] - https://gerrit.wikimedia.org/r/170590
[18:12:07] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 21.43% of data above the critical threshold [500.0]
[18:36:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[20:41:35] !log hoo Synchronized php-1.25wmf5/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 09s)
[20:41:44] Logged the message, Master
[20:41:52] !log hoo Synchronized php-1.25wmf6/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 08s)
[20:41:57] Logged the message, Master