[00:02:28] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:37] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:52] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.678 seconds
[00:14:28] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:22:16] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.671 seconds
[00:23:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.185 seconds
[00:26:10] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:26:16] New patchset: Bhartshorne; "adding etag awareness to abort failed puts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2598
[00:27:31] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[00:27:40] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:27:56] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2598
[00:28:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:34:07] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:35:10] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[00:36:40] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.238 seconds
[00:43:25] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:43:59] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2598
[00:43:59] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2598
[00:44:46] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.537 seconds
[00:49:13] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:50:43] New patchset: Bhartshorne; "typoed semicolon should be comma" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2599
[00:51:15] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2599
[00:51:16] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2599
[00:51:37] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.341 seconds
[00:53:04] New patchset: Bhartshorne; "yay more typos boo no lint checks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2600
[00:53:25] too bad gerrit.2600 doesn't have more subversive content.
[00:53:32] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2600
[00:53:33] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2600
[00:53:35] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.549 seconds
[00:55:40] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:58:58] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:00:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.275 seconds
[01:08:07] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.740 seconds
[01:08:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:14:43] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.769 seconds
[01:16:35] New patchset: Asher; "my fork of gdash from git://github.com/asher/gdash.git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2601
[01:18:46] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:19:43] New patchset: Asher; "my fork of gdash from git://github.com/asher/gdash.git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2601
[01:20:11] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2601
[01:20:12] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2601
[01:20:43] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.528 seconds
[01:21:28] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 606s
[01:21:55] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 635s
[01:22:31] PROBLEM - MySQL replication status on db1025 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 673s
[01:24:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:22] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.128 seconds
[01:28:31] PROBLEM - Puppet freshness on carbon is CRITICAL: Puppet has not run in the last 10 hours
[01:28:40] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.021 seconds
[01:29:16] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:37] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:42:37] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.487 seconds
[01:42:37] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[01:42:55] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[01:46:13] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.094 seconds
[01:46:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:47:35] RECOVERY - MySQL replication status on db1025 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[01:49:04] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.561 seconds
[01:54:10] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:54:37] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 609s
[01:54:46] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 617s
[01:56:43] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.534 seconds
[01:57:10] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:00:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:02:25] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.873 seconds
[02:04:40] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.704 seconds
[02:08:43] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:10:22] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:13:58] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.664 seconds
[02:17:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:24:28] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 243 seconds
[02:25:49] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.344 seconds
[02:29:53] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:30:55] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds
[02:35:19] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 217 seconds
[02:41:55] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:42:22] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:47:10] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds
[02:54:13] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.201 seconds
[02:54:58] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 239 seconds
[02:58:07] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:12:45] PROBLEM - Host amssq50 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:45] PROBLEM - Host amssq59 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:45] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:45] PROBLEM - Host amssq56 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:45] PROBLEM - Host amssq60 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:46] PROBLEM - Host amssq45 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:46] PROBLEM - Host amssq62 is DOWN: PING CRITICAL - Packet loss = 100%
[03:12:47] PROBLEM - Host amssq51 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:03] PROBLEM - Host amssq49 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:03] PROBLEM - Host amssq58 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:03] PROBLEM - Host amssq61 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:03] PROBLEM - Host amssq52 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:03] PROBLEM - Host bits.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:12] PROBLEM - Host cp3002 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:39] PROBLEM - Host cp3001 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:39] PROBLEM - Host br1-knams is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:48] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100%
[03:13:57] PROBLEM - Host bits.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:15] PROBLEM - Host knsq17 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:15] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:15] PROBLEM - Host knsq27 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:15] PROBLEM - Host knsq29 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:15] PROBLEM - Host knsq21 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:16] PROBLEM - Host knsq23 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:16] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:16] who killed bits ?
[03:14:17] PROBLEM - Host knsq18 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:17] PROBLEM - Host csw2-esams is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:18] PROBLEM - Host csw1-esams is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:24] PROBLEM - Host hooft is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:24] PROBLEM - Host foundation-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:42] PROBLEM - Host knsq28 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:51] PROBLEM - Host knsq19 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:51] PROBLEM - Host knsq22 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:51] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100%
[03:14:51] PROBLEM - Host knsq25 is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:09] PROBLEM - Host maerlant is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:09] PROBLEM - Host foundation-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:15] hah, all of us come online
[03:15:18] PROBLEM - Host ms6 is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:18] PROBLEM - Host mediawiki-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:18] PROBLEM - Host lily is DOWN: PING CRITICAL - Packet loss = 100%
[03:15:27] RECOVERY - Host bits.esams.wikimedia.org is UP: PING WARNING - Packet loss = 73%, RTA = 115.55 ms
[03:15:27] RECOVERY - Host cp3001 is UP: PING WARNING - Packet loss = 73%, RTA = 121.22 ms
[03:15:27] RECOVERY - Host knsq24 is UP: PING WARNING - Packet loss = 28%, RTA = 120.16 ms
[03:15:27] RECOVERY - Host knsq27 is UP: PING WARNING - Packet loss = 28%, RTA = 117.23 ms
[03:15:27] RECOVERY - Host amssq52 is UP: PING WARNING - Packet loss = 28%, RTA = 119.56 ms
[03:15:27] RECOVERY - Host amssq58 is UP: PING WARNING - Packet loss = 28%, RTA = 119.56 ms
[03:15:27] RECOVERY - Host amssq49 is UP: PING WARNING - Packet loss = 28%, RTA = 125.82 ms
[03:15:28] RECOVERY - Host maerlant is UP: PING WARNING - Packet loss = 66%, RTA = 115.94 ms
[03:15:28] RECOVERY - Host ms6 is UP: PING OK - Packet loss = 16%, RTA = 116.82 ms
[03:15:36] RECOVERY - Host amssq60 is UP: PING OK - Packet loss = 0%, RTA = 123.57 ms
[03:15:36] RECOVERY - Host amssq59 is UP: PING OK - Packet loss = 0%, RTA = 117.85 ms
[03:15:36] RECOVERY - Host cp3002 is UP: PING OK - Packet loss = 0%, RTA = 117.40 ms
[03:15:36] RECOVERY - Host hooft is UP: PING OK - Packet loss = 0%, RTA = 123.60 ms
[03:15:36] RECOVERY - Host knsq21 is UP: PING OK - Packet loss = 0%, RTA = 117.64 ms
[03:15:36] RECOVERY - Host amssq62 is UP: PING OK - Packet loss = 0%, RTA = 117.62 ms
[03:15:37] RECOVERY - Host amssq56 is UP: PING OK - Packet loss = 0%, RTA = 117.93 ms
[03:15:37] RECOVERY - Host amssq50 is UP: PING OK - Packet loss = 0%, RTA = 123.56 ms
[03:15:38] RECOVERY - Host amssq61 is UP: PING OK - Packet loss = 0%, RTA = 123.70 ms
[03:15:38] RECOVERY - Host knsq20 is UP: PING OK - Packet loss = 0%, RTA = 117.63 ms
[03:15:39] RECOVERY - Host knsq26 is UP: PING OK - Packet loss = 0%, RTA = 117.54 ms
[03:15:39] RECOVERY - Host knsq23 is UP: PING OK - Packet loss = 0%, RTA = 117.87 ms
[03:15:40] RECOVERY - Host knsq29 is UP: PING OK - Packet loss = 0%, RTA = 117.70 ms
[03:15:40] RECOVERY - Host knsq18 is UP: PING OK - Packet loss = 0%, RTA = 123.21 ms
[03:15:41] RECOVERY - Host knsq17 is UP: PING OK - Packet loss = 0%, RTA = 123.59 ms
[03:15:41] RECOVERY - Host foundation-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 123.50 ms
[03:15:42] RECOVERY - Host csw2-esams is UP: PING OK - Packet loss = 0%, RTA = 125.50 ms
[03:15:54] RECOVERY - Host knsq28 is UP: PING OK - Packet loss = 0%, RTA = 126.84 ms
[03:15:54] RECOVERY - Host amssq57 is UP: PING OK - Packet loss = 0%, RTA = 117.08 ms
[03:15:54] RECOVERY - Host amssq51 is UP: PING OK - Packet loss = 0%, RTA = 120.42 ms
[03:15:54] RECOVERY - Host amssq45 is UP: PING OK - Packet loss = 0%, RTA = 120.76 ms
[03:16:03] RECOVERY - Host knsq19 is UP: PING OK - Packet loss = 0%, RTA = 114.12 ms
[03:16:03] RECOVERY - Host knsq22 is UP: PING OK - Packet loss = 0%, RTA = 113.45 ms
[03:16:03] RECOVERY - Host knsq16 is UP: PING OK - Packet loss = 0%, RTA = 120.75 ms
[03:16:03] RECOVERY - Host knsq25 is UP: PING OK - Packet loss = 0%, RTA = 113.59 ms
[03:16:12] RECOVERY - Host br1-knams is UP: PING OK - Packet loss = 0%, RTA = 114.67 ms
[03:16:30] RECOVERY - Host mediawiki-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 114.38 ms
[03:16:39] RECOVERY - Host csw1-esams is UP: PING OK - Packet loss = 0%, RTA = 114.41 ms
[03:18:09] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.458 seconds
[03:19:21] RECOVERY - Host bits.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 113.46 ms
[03:20:33] RECOVERY - Host foundation-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 119.19 ms
[03:20:42] RECOVERY - Host lily is UP: PING OK - Packet loss = 0%, RTA = 113.58 ms
[03:22:21] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:23:42] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.909 seconds
[03:24:18] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds
[03:27:45] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:28:21] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 231 seconds
[03:36:09] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 194 seconds
[03:38:24] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.004 seconds
[03:38:24] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.259 seconds
[03:42:27] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:42:36] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:57:00] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 29 seconds
[04:00:09] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.278 seconds
[04:01:03] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 272 seconds
[04:05:21] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 3.772 seconds
[04:08:30] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds
[04:08:48] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:11:39] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:14:57] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 253 seconds
[04:17:03] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.660 seconds
[04:22:27] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:26:12] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.452 seconds
[04:30:24] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:38:30] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.982 seconds
[04:40:00] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.239 seconds
[04:42:24] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:48:06] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:49:00] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.081 seconds
[04:54:51] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.464 seconds
[04:55:54] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:58:54] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:00:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.695 seconds
[05:04:58] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:11:25] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 4.957 seconds
[05:11:34] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.087 seconds
[05:20:43] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:21:01] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:21:55] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.551 seconds
[05:22:22] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.029 seconds
[05:27:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:28:49] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:29:07] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.780 seconds
[05:31:22] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.928 seconds
[05:52:49] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 1 seconds
[05:58:16] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 234 seconds
[06:37:07] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:37:07] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:38:19] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 0.960 seconds
[06:38:19] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.933 seconds
[06:56:19] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 5 seconds
[07:00:13] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 239 seconds
[07:17:01] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds
[07:23:37] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 260 seconds
[07:47:28] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:50:01] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[07:51:04] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[07:53:46] PROBLEM - LVS Lucene on search-pool2.svc.pmtpa.wmnet is CRITICAL: Connection refused
[07:53:54] yeah it sure is
[07:54:58] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[07:56:55] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[07:57:13] PROBLEM - Lucene on search6 is CRITICAL: Connection refused
[07:59:09] stale NFS file handle
[08:00:18] ah
[08:02:47] apergos: started
[08:02:55] RECOVERY - LVS Lucene on search-pool2.svc.pmtpa.wmnet is OK: TCP OK - 0.003 second response time on port 8123
[08:03:00] !log remounted /home on search6, started lsearchd
[08:03:04] Logged the message, Master
[08:03:09] how did you get the remount to work?
[08:03:15] that's what I was looking for how to do
[08:03:22] just umount /home and mount /home
[08:03:28] cause its in fstab
[08:03:40] ok
[08:03:49] RECOVERY - Lucene on search6 is OK: TCP OK - 0.014 second response time on port 8123
[08:03:53] I was still looking in the irc logs to figure out the workaround
[08:03:54] thanks
[08:04:08] yw
[08:05:50] I wonder if we'll have that on all the other search boxes :-/
[08:06:11] I hate nfs
[08:06:19] is there dsh group search boxes?
[08:06:40] I dunno, let's look
[08:06:46] lets just do an "ls /home"
[08:07:05] won't it hang?
[08:07:28] /usr/local/dsh/node_groups/search
[08:07:36] hmm, didnt for me, well just "cd" then
[08:07:38] -bash: cd: /home: Stale NFS file handle
[08:07:43] k
[08:07:45] /usr/local/dsh/node_groups/searchidx
[08:08:07] the last one has only one member of course
[08:08:26] hmm, search boxes ask for password..
[08:08:50] they do?
[08:09:04] if you can ssh into them that seems odd
[08:09:16] what's one that asked for a password?
[08:09:19] i should be root on fenari first :p
[08:10:28] they all have the Stale NFS file handle :p
[08:10:32] no, just have the dsh go as root
[08:10:50] search9,20,14,7,16,15,2,12,17...
[08:11:24] dsh -cM -g search -- "cd /home"
[08:11:52] it won't bite us til someone has to restart on those
[08:11:54] then, boom
[08:13:02] it only needs it at the beginning for startup, seems like it doesn't actually have anything on /home open after that
[08:13:07] !log all search boxes had /home: Stale NFS file handle.. remounting
[08:13:09] Logged the message, Master
[08:13:44] apergos: better now
[08:13:55] how's the cd look?
[08:14:09] it does not return anythin, so good:)
[08:14:13] yay!
[08:14:18] so........
[08:14:21] well, there was one...
[08:14:26] ?
[08:14:29] searchidx2: mount.nfs: /home is busy or already mounted
[08:14:38] ah, that could be the exception
[08:14:42] searchidx2: umount.nfs: /home: device is busy
[08:14:49] (to having things open)
[08:14:54] yep, looks like it
[08:15:36] the rest have it remounted, example search9, can list /home now..ack
[08:16:22] oh
[08:16:23] heh
[08:16:40] stuff runs as rainman with things in his directory
[08:16:54] it's already mounted
[08:17:00] so yay for that
[08:17:03] k;)
[08:17:17] so... what do you think about another stab at https and office?
[08:17:20] :-)
[08:17:27] ah,heh;)
[08:17:35] after i made coffee?:)
[08:17:39] people in the us are asleep,
[08:17:42] we hardly use it
[08:17:43] sure!
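The stale-NFS remount procedure worked out in the conversation above can be sketched as two dsh invocations. This is a hypothetical reconstruction from the log, not the exact commands that were run: the `search` node group and the fstab-driven bare `mount /home` are taken from the discussion, while the combined umount-and-mount one-liner is an assumption.

```shell
# Sketch of the stale-NFS workaround discussed above (assumptions: the dsh
# node group "search" defined in /usr/local/dsh/node_groups/search, /home
# listed in /etc/fstab so a bare "mount /home" works, and root access on
# the host running dsh).

# Remount /home on every search box:
dsh -cM -g search -- 'umount /home && mount /home'

# Verify: "cd /home" prints nothing when the mount is healthy; a still-stale
# mount prints "cd: /home: Stale NFS file handle" instead.
dsh -cM -g search -- 'cd /home'
```

As the log shows, a host that still has files open on /home (searchidx2 here) fails the umount with `umount.nfs: /home: device is busy` and has to be handled separately.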
[08:17:48] ah it's an hour later for you
[08:18:12] ok, cool, be back soon
[08:18:45] ok
[08:21:04] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[08:22:16] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 6 seconds
[08:28:43] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 235 seconds
[08:28:43] If the client provided a Host: header field the list is searched for a matching vhost and the first hit on a ServerName or ServerAlias is taken and the request is served from that vhost.
[08:28:57] so that's what we have to work with ( http://httpd.apache.org/docs/2.2/vhosts/details.html )
[08:29:01] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours
[08:29:08] * apergos goes to finish fixing their oatmeal
[08:39:04] PROBLEM - Puppet freshness on gilman is CRITICAL: Puppet has not run in the last 10 hours
[08:39:04] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours
[08:39:39] re
[08:40:04] ok, so we need to, quoting Ryan: "you need to configure the redirect to only redirect if X-Forwarded-Proto is http"
[08:40:17] uh huh
[08:40:31] RewriteCond %{HTTP:X-Forwarded-Proto} !https
[08:40:32] so
[08:40:43] :)
[08:40:45] if we set up a separate vhost stanza just for office.wikimedia.org
[08:40:57] maybe we can get away with using what's in remnants.conf
[08:41:06] I'm gonna look at what's there now.
[08:41:52] had you tried putting something in that stanza earlier?
[08:42:15] before the first rewrite rule I guess
[08:44:16] nah, not really, i was looking at a slightly different way to rewrite
[08:44:20] RewriteCond %{HTTPS} off
[08:44:51] but you got the right thing already i think
[08:45:09] well, we put it on one server, we try it from fenari, etc
[08:45:54] ok, srv250 is the guinea pig?
[08:46:07] suer
[08:46:10] sure
[08:46:39] morning hashar
[08:46:51] hello :)
[08:47:15] apergos: do you like shared screen?
[08:47:33] I can do that, whatever you like
[08:47:53] then "screen -x" on srv250 pls
[08:47:59] or you can just tell me when you have made changes
[08:48:04] since I'm on there I can just look at them
[08:48:20] it's not like the process of making them is so special...
[08:48:58] PROBLEM - Puppet freshness on ganglia1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:49:24] it just combines the editing and chatting ;) poor man's etherpad.. but ok ..just editing
[08:49:47] I have my irc window and my terminal window on the same desktop so....
[08:50:10] mind if I attach to srv250 ? Always wondered how it looks :-D
[08:50:20] (and yes, one day I will have to learn how to use screen)
[08:50:26] go ahead
[08:51:11] root only feature 8-))
[08:51:19] will try out on my comp
[08:55:46] apergos: saved redirects.conf on srv250
[08:56:10] oh, you put it there and not remnants.conf?
[08:56:13] lemme look
[08:56:36] i added it back like the circular one.but just added one more condition
[08:56:41] see, because we have the "firstmatch only"
[08:56:42] the !https one
[08:56:52] thing, this means that the stanza in remnants.conf won't ever get used
[08:57:17] and I think we want it (docroot, the math rewrites, all the rest)
[08:58:18] I'm not 100% sure
[08:58:26] gotcha, about "The first matching path on the list .."
[08:58:43] but afaict that's what would have been active til now
[09:00:37] having fun at work is important: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111525#c30978
[09:01:52] apergos: alright, moved it to remnant, same thing
[09:02:08] apergos: before the other standard mediawiki rewrite rules
[09:02:11] lemme stare at it some
[09:02:55] RewriteCond %{HTTP_HOST} office.wikimedia.org that line can go now
[09:03:14] by definition we already match, right?
[09:03:19] ServerName office.wikimedia.org
[09:03:34] true
[09:04:51] hashar: hah, is that "wiki love"?
[09:04:59] eh, "code love"
[09:04:59] somehow!
[09:05:56] there's the "technical barnstar ";)
[09:09:02] since you're in the file are you taking out that line? :-P
[09:09:21] apergos: i just did, gracefulled, and wgot it from fenari..
[09:09:29] ok
[09:09:30] HTTP request sent, awaiting response... HTTP/1.1 301 Moved Permanently
[09:09:32] "wgot" nice
[09:09:35] Location: https://office.wikimedia.org/wiki/ [following]
[09:09:38] :)
[09:10:06] I checked with redirect=0, redirect=1, fior mainpage
[09:10:10] now let me try a few other variants
[09:12:20] hmmm
[09:12:34] wget --header="Host: office.wikimedia.org" --max-redirect=2 -S 'http://srv250/'
[09:12:39] this saves a copy of "index.html"
[09:12:47] I wonder why it saves that and not Main_Page
[09:13:52] also....
[09:14:01] RewriteRule ^/math/(.*) http://upload.wikimedia.org/math/$1 [R=301]
[09:14:08] that's probably going to be a problem with https
[09:14:31] it should be https I guess
[09:14:38] and here that $wg Mediawiki setting also comes into play again, doesnt it
[09:14:51] I dunno about that
[09:15:12] apergos: index.html is probably a wget default whenever you ask for an url ending with /
[09:15:30] let me try it for some other wiki
[09:15:55] index.html contains Main Page content
[09:15:57] hashar: bingo, same behavior for meta (which we didn't touch)
[09:15:59] so that's ok.
[09:16:12] class="firstHeading">Main Page
[09:16:18] yeah, I saw the content was ok, just wanted to make sure we weren't changing the behavior
[09:17:20] but we can just use https:// on upload.. it seems..
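The vhost being assembled above can be sketched roughly as follows. Only the two directives quoted in the log (the `X-Forwarded-Proto` condition and the `/math/` rewrite, switched to https as just agreed) are taken from the conversation; the surrounding stanza shape, rule pattern, and flags are assumptions for illustration, not the actual contents of remnant.conf.

```apache
# Hypothetical sketch of the office.wikimedia.org redirect discussed above.
# SSL is terminated in front of Apache, so the backend must decide "already
# https?" from the X-Forwarded-Proto request header, not from %{HTTPS}.
<VirtualHost *:80>
    ServerName office.wikimedia.org

    RewriteEngine On

    # Redirect only plain-http requests; requests that arrived via the SSL
    # terminator carry "X-Forwarded-Proto: https" and must pass through,
    # otherwise the redirect loops forever.
    RewriteCond %{HTTP:X-Forwarded-Proto} !https
    RewriteRule ^/(.*)$ https://office.wikimedia.org/$1 [R=301,L]

    # Serve old /math/ URLs from upload over https too, so https pages do
    # not trigger the browser's mixed-content warning.
    RewriteRule ^/math/(.*) https://upload.wikimedia.org/math/$1 [R=301]
</VirtualHost>
```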
[09:17:32] good
[09:17:44] hashar: re:screen, actually sharing is not really root-only, it's just "same user", so you can also login as "foobar" multiple times, first one does "screen", and the following ones "screen -x" (for different users you'd have to mess with tty permissions)
[09:17:44] I didn't see "wgMediawiki"
[09:17:51] in the usual config files
[09:18:28] ah, it was this:
[09:18:30] If you'd like to force the wiki to be SSL-only, set $wgServer = 'https://example.com'; (whatever your site is, do NOT include the path to the wiki here), along with .htaccess rewriterules to redirect people from the http site to the https site
[09:18:48] mutante: yeah figured that out on my local comp. Looks like the perfect tool for "Xtreme operating"
[09:18:52] but that does not apply to us, cause we do SSL termination..?
[09:19:27] I don't think you have to set it
[09:20:11] we already have a special stanza for this in CommonSettings
[09:20:18] } elseif ( isset( $_SERVER['HTTP_X_FORWARDED_PROTO'] ) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https' ) {
[09:20:26] $wgServer = preg_replace( '/^http:/', 'https:', $wgServer );
[09:20:38] ah:)
[09:21:19] so just https://upload I guess and we see
[09:21:55] i did, and added another RewriteCond %{HTTP:X-Forwarded-Proto} !https before that one
[09:22:46] and..repeat it again. with the opposite condition
[09:23:40] I think that's superfluous
[09:23:42] nah, nevermind..
[09:23:44] ack
[09:24:32] ah, there's another one down below.. "UseMod comp. URLs.."
[09:24:45] what is interesting to me is that we have had this work for people going to https://office with images before now
[09:25:14] I'm trying to think about how that has worked (without getting the "mixed secure and insecure content" warning
[09:25:15] )
[09:25:58] what about the usemod urls?
[09:27:13] do we need to worry about those?
[09:27:16] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 1 seconds
[09:27:33] no, we don't , i was just in the wrong vhost
[09:27:37] ok
[09:28:44] ok,yeah, is there more we can test ?
[09:28:50] like the math redirect now
[09:29:03] lemme think about that
[09:30:15] this is going to take me a minute, I don't remember how formulas go in
[09:30:23] if it works we could as well fix "chair.wm"
[09:31:35] well I just did a grep on pages-articles for officewiki and there are no math tags :-D
[09:31:52] hmm I
[09:31:58] they probably will be & or something
[09:32:06] heh, good idea to just grep the articles;)
[09:32:22] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 264 seconds
[09:32:47] don't find amp;math either
[09:32:54] so I'm going to give up on testing that
[09:33:00] as for the images...
[09:33:57] hard to test without being logged in :-/
[09:34:00] so
[09:34:11] wanna try making it live...? :-D
[09:35:02] hmmm...yes:)
[09:35:18] ok
[09:35:21] oh boy
[09:35:22] :-D
[09:37:09] at least we know how to purge it quicker now:)
[09:37:15] yes indeed
[09:40:37] svn commited it... syncing
[09:40:46] ok
[09:41:24] does the sync no longer log??
[09:41:42] oh, yea, i noticed that yesterday
[09:41:51] i was sure i didnt have to log them manually in the past.ack
[09:41:54] damn it, broken
[09:41:57] (the logging)
[09:42:04] (not necessarily the redirect)
[09:42:27] !log made a new change to remnant.conf and synced apaches in a fresh attempt to fix office.wm redirect
[09:42:30] Logged the message, Master
[09:43:11] so the sync completed?
[09:43:20] i.e. should I fire up a fresh browser?
[09:43:20] yes, checking if it arrived on 231
[09:43:29] it did
[09:43:37] dum dum dee dum
[09:43:59] apache-graceful-all
[09:44:15] oh yeah :-D
[09:44:49] done
[09:45:05] wow... it looks ..like ...
[09:45:06] it works:)
[09:45:35] hashar: wanna test as well:)
[09:45:59] I checked a File: page
[09:46:01] images show up
[09:46:20] i logged in.. no problems
[09:46:23] what should I test ?
[09:46:28] PROBLEM - check_gcsip on payments4 is CRITICAL: CRITICAL - Socket timeout after 61 seconds
[09:46:35] hashar: http on office wm
[09:46:53] hashar: eh, it should force https now
[09:47:08] and of course not break things as yesterday
[09:47:24] right, all the urls in the page are relative anyways
[09:47:33] for the images...
[09:47:59] uh oh
[09:48:01] let me ask my testing puppet
[09:48:08] https://office.wikimedia.org/wiki/Business_Plan
[09:48:11] * hashar gives kitty a test case
[09:48:29] actually no, it's a chrome issue
[09:48:55] trying to actually upload a file
[09:49:00] this looks like the mixed content warning
[09:49:04] result: http://www.quickmeme.com/meme/364ua9/
[09:49:24] hashar: :))
[09:50:09] apergos: yep, no warning in iceweasel/firefox
[09:50:22] RECOVERY - check_gcsip on payments4 is OK: HTTP OK: HTTP/1.1 200 OK - 378 bytes in 0.164 second response time
[09:50:49] I wish chrome would tell me which elements it doesn't like
[09:51:07] yeah -1 for Safari that does not let me see the cookies :/
[09:51:25] * apergos tries it from firefox
[09:51:51] -1 for Firefox that FU***NG hide the protocol from the URL
[09:51:55] seriously
[09:52:00] that browser used to be a great one
[09:54:26] when I look at page info from firefox, I see it's getting from bits via https and from upload via https
[09:54:33] so that seems ok
[09:54:43] re: uploads, ok, it uses commons only
[09:54:55] let's keep it
[09:55:08] if there are any edge cases someone will turn them up when the sf crew comes on line
[09:56:35] tried to upload "SSL Symbol.png", but This file is a duplicate of the following file: ;9
[09:56:42] :-P
[09:57:29] apergos: awesome, and thanks a lot for help and making me fix it right away..
[09:57:42] actually today my vacation starts.. and this way it feels a lot better!
[09:57:47] congrats on getting it working
[09:57:59] I didn't know this was your vacation or I would have not mentioned it at all!
[09:58:29] its the last (half) day of work..its perfect.. this morning or it wouldnt have worked out [09:58:35] nice!! [09:59:55] yeah, this was top on the list "before vacation" [10:00:06] updating RT:)) [10:00:09] yay! [10:00:13] hmm [10:00:19] looks like the cookie is sent securely [10:00:36] err [10:00:41] sent with the "secure" flag [10:01:17] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 26 seconds [10:01:36] secure flag? [10:01:53] that is a flag you can set when sending a cookie [10:02:02] that instructs the browser to only send the cookie over HTTPS [10:02:05] ok [10:02:08] well good [10:02:14] we don't want it to go over http [10:02:20] else someone connecting to http://office will have his browser send the cookie [10:02:27] (that is what happens with jenkins :-( ) [10:02:34] oh really [10:02:36] "The Secure attribute is meant to keep cookie communication limited to encrypted transmission, directing browsers to use cookies only via secure/encrypted connections. Naturally, web servers should set Secure cookies via secure/encrypted connections, lest the cookie information be transmitted in a way that allows eavesdropping when first sent to the web browser." [10:02:43] if ( isset( $_SERVER['HTTP_X_FORWARDED_PROTO'] ) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https' ) { [10:02:44] $wgCookieSecure = true; [10:02:44] $_SERVER['HTTPS'] = 'on'; // Fake this so MW goes into HTTPS mode [10:02:44] } [10:02:47] OH YEAH [10:02:57] yet another hack in our configuration :-D [10:03:10] I was gonna say our configuration is full of hacks [10:03:16] but really our configuration *is* hacks [10:04:04] anyway that hack makes office send a secure cookie which is great :-D [10:04:17] yay us! [10:04:26] you will probably be able to proceed with the other pirates wikis :-) [10:04:26] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 230 seconds [10:05:33] pirate wikis! [10:05:36] that is what we need!!
[10:06:53] I am totally going to find a reason that the next lab project has to be called "pirate" [10:06:54] !log office.wm now forces https (in a less broken way;) (remnant.conf) [10:06:56] Logged the message, Master [10:09:26] we won't run out of these .. ;) there is still "https://mediawiki.org redirects to http://www.mediawiki.org/" [10:09:41] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds [10:09:54] I seeeeee [10:10:26] just make mediawiki.org https only :-) [10:10:38] er no [10:11:30] RewriteCond %{HTTPS} off [10:11:30] RewriteCond %{HTTP_HOST} mediawiki.org [10:11:30] RewriteRule ^/(.*)$ http://www.mediawiki.org/$1 [R=301,L] [10:11:35] RewriteCond %{HTTPS} on [10:11:35] RewriteCond %{HTTP_HOST} mediawiki.org [10:11:35] RewriteRule ^/(.*)$ https://www.mediawiki.org/$1 [R=301,L] [10:11:49] oh you are on a roll today [10:12:05] that was a suggestion in October :p [10:12:17] meh [10:12:38] !rt 1668 [10:12:38] https://rt.wikimedia.org/Ticket/Display.html?id=1668 [10:13:28] * apergos is returning to their regularly scheduled pile o' cr^H^Hwork for the day... [10:13:29] wasnt used right away mainly cause of "%{HTTPS} might be best... [10:13:32] but it's going to result in an awful lot of code duplication for one letter." [10:14:20] yea, i didnt mean to start another one right now [10:14:47] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 231 seconds [10:15:08] you can... I'm not gonna though :-D [10:16:29] ok,but there's enough "scheduled pile" for me too:) ttyl [10:16:38] see ya :-D [10:16:53] * mutante waves [10:32:02] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds [10:35:56] PROBLEM - MySQL Slave Delay on db30 is CRITICAL: CRIT replication delay 190 seconds [11:09:08] New patchset: Dzahn; "fix pubkey for aengels" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2602 [11:09:30] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2602 [11:11:22] RECOVERY - MySQL Slave Delay on db30 is OK: OK replication delay 0 seconds [11:14:17] New review: Dzahn; "wrong one, user did not have matching private key. yes, not leaving the old one in as "absent", it w..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2602 [11:14:18] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2602 [11:14:58] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 205 seconds [11:21:29] apergos: btw, re: search / stale NFS.. it would have been broken by cron soon .heh :P -_> "lsearchd is currently restarted when its weekly logrotate run.""#2449: change lsearchd logrotate script to not restart lsearchd" [11:21:39] oh joy [11:21:47] looks like peter will fix it:) [11:21:47] good catch [11:21:56] yay for that! [11:26:49] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [11:29:40] PROBLEM - Puppet freshness on carbon is CRITICAL: Puppet has not run in the last 10 hours [11:30:43] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 198 seconds [11:32:39] !log sync-apache / graceful not logged anymore by logmsgbot ? [11:32:42] Logged the message, Master [11:50:13] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [11:58:01] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 212 seconds [12:35:33] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:36:36] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:58:22] I don't see why nagios is whining [12:58:30] ekrem is busily serving requests, seems to be ok [13:03:21] <^demon|zzz> apergos: Want to switch to git? [13:03:29] I sure as heck do! 
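Going back to the pair of mediawiki.org rewrite blocks pasted earlier: the two blocks differ only by one letter in the target scheme, which was the objection quoted from the RT ticket. A common mod_rewrite idiom collapses them by appending a literal "s" to %{HTTPS} and capturing it. This is a sketch of that idiom, not what was (or necessarily should be) deployed:

```apache
# One-block variant of the mediawiki.org -> www.mediawiki.org redirect.
# The %{HTTPS}s trick: with HTTPS on, the test string is "ons" and the
# alternation captures the trailing "s"; with HTTPS off it is "offs",
# nothing is captured, and %1 stays empty -- so http%1:// expands to
# http:// or https:// as appropriate. Host match is anchored here so it
# does not also match www.mediawiki.org and loop. A sketch only.
RewriteCond %{HTTP_HOST} ^mediawiki\.org$
RewriteCond %{HTTPS}s ^on(s)|offs$
RewriteRule ^/(.*)$ http%1://www.mediawiki.org/$1 [R=301,L]
```

Note the HTTPS condition must come last: %1 in the rule target refers to the capture groups of the last matched RewriteCond.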
[13:03:39] * apergos does a little happy dance [13:03:51] you'll have to let me know what the commit/review/etc path is for branches [13:03:53] and for core [13:04:06] and for personal branches :-P [13:06:55] <^demon|zzz> apergos: git clone ssh://ariel@gerrit.wikimedia.org/operations/dumps.git -b ariel [13:07:03] <^demon|zzz> Will give you a clone and switch you to your ariel branch [13:07:14] <^demon|zzz> .org:29418 [13:07:24] <^demon|zzz> Stupid port. I keep forgetting it [13:07:36] operations? [13:08:03] <^demon|zzz> Made the most sense to me to put it there. [13:08:08] huh [13:08:09] :-D [13:08:25] <^demon|zzz> And renames/deletes are impossible. Guess I shoulda asked first :p [13:09:10] I'll switch into my branch later [13:09:16] doing a fresh clone, let's see how it looks [13:10:17] <^demon|zzz> Argghhhh, why is it missing the Notes: [13:10:42] what Notes [13:10:44] <^demon|zzz> Hrm, it shows up on the original copy on formey but not my clone :\ [13:10:56] damn demon polluting our namespace [13:11:00] <^demon|zzz> http://p.defau.lt/?lhKxG2Y_D7EgyOUz0bOgmw [13:11:07] * mark deletes [13:11:26] ah [13:11:29] <^demon|zzz> mark: It's Ariel's stuff. At least I put all the mediawiki stuff in mediawiki/ ;-) [13:11:40] soooo [13:12:01] then at least it should be under operations/software/ [13:12:03] you want to figure out about the Notes? I can toss my clone and redo when it's happy [13:12:17] <^demon|zzz> Does `git log` show the Notes: for you? [13:12:38] no chance [13:15:11] <^demon|zzz> I'm baffled why this disappears when cloning. [13:15:48] <^demon|zzz> Hrm, they're missing on the repo copy too. Must get dropped during push. [13:16:50] <^demon|zzz> Ah, you have to push refs/notes/* [13:17:01] <^demon|zzz> Annoying you have to push that separately. [13:17:17] * apergos waits for it [13:20:01] <^demon|zzz> Ok, they're in the repo now. [13:20:20] getting... [13:21:24] <^demon|zzz> Hopefully they'll show up on clone now and you don't have to pull them explicitly.
[13:21:26] <^demon|zzz> That'd be annoying too [13:21:36] <^demon|zzz> Argggghhhh [13:21:37] <^demon|zzz> Stupid git [13:21:40] I'll find out in a minute [13:21:42] oh? [13:21:45] <^demon|zzz> A clone didn't pull the notes. [13:21:49] baaaahhhhh [13:22:00] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.711 seconds [13:23:12] so... what incantation do I need to get them? [13:24:43] <^demon|zzz> Looking... [13:26:03] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:11] ^demon|zzz: wake up :) /nick ^demon [13:27:58] dear ops, what would it take to have someone run a pear update on gallium (the cont int server). Should I file a bugzilla / rt ticket ? :) [13:28:10] pear / PHPUnit does not seem to be puppetized [13:29:03] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.883 seconds [13:29:42] <^demon> Hrm, well `git clone --config push='+refs/heads/*:refs/notes/*' ssh://demon@gerrit.wikimedia.org:29418/operations/dumps.git` didn't work [13:29:49] <^demon> There's *got* to be an easier way [13:29:58] <^demon> s/push/fetch/ [13:32:21] ^demon: git clone --mirror [13:32:30] <^demon> That gives you a --bare repo [13:32:32] that should copy everything [13:32:32] <^demon> But with notes :p [13:32:52] <^demon> You don't have a working tree in a --mirror repo [13:35:25] <^demon> http://progit.org/2010/08/25/notes.html#sharing_notes [13:35:48] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:36:48] I love the first line of this page [13:38:30] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.965 seconds [13:38:58] <^demon> After cloning, you can `git fetch origin refs/notes/*:refs/notes/*` [13:39:00] <^demon> Should work [13:39:06] yeah I am looking at that [13:39:13] what kind of icky reference string is that??
[13:39:30] * apergos tries it anyways [13:39:45] <^demon> That's obnoxious, but at least we know now and can document it :) [13:40:07] * [new branch] refs/notes/commits -> refs/notes/commits [13:40:10] that's bizarre [13:40:11] but otoh [13:40:28] git log shows the fricking Notes now [13:42:33] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:10] <^demon> Hrm, git review's freezing on me :\ [13:46:29] so when I check something into my branch do I need to go through gerrit or whatever? [13:48:21] <^demon> Yeah, like we do with puppet. We could change the permissions for your branch though if you'd like. [13:48:38] <^demon> Per-branch permissions are possible in gerrit which is cool :) [13:49:30] in order for it to go to the "production" copy of my branch it has to get reviewed etc first eh? [13:49:48] <^demon> Well you work mainly off your 'ariel' branch, right? [13:49:52] uh huh [13:50:23] things do not get automatically synced from there to the snapshot hosts [13:50:52] <^demon> *nod* [13:51:03] updates are done manually after testing [13:51:08] <^demon> Well right now master & ariel are both using the "review to merge." [13:51:10] I'm happy to have folks review things [13:52:07] <^demon> hashar: Mind cloning operations/dumps.git and seeing if you can use git-review? I'm having trouble. [13:52:10] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [13:52:10] I would be irritated if I had to wait for someone to find time to review something that I was trying to get into testing (a number of admittedly large commits in the past have been deferred, which is fine by me, [13:52:16] I just don't want it to hold up the work) [13:52:23] <^demon> I made a .gitreview file but git review -s is freezing.
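The notes behaviour being debugged above reproduces with any local repository: notes live under refs/notes/* and a plain clone does not fetch them, so one explicit fetch (optionally made permanent with an extra fetch refspec) brings them over. A self-contained local demo, using a throwaway repo rather than operations/dumps:

```shell
# Demo of the git-notes behaviour discussed above: notes are stored under
# refs/notes/* and a plain clone does not fetch them. Throwaway repo paths.
tmp=$(mktemp -d)
cd "$tmp"

git init -q origin
cd origin
git config user.email demo@example.org
git config user.name demo
git commit -q --allow-empty -m "initial commit"
git notes add -m "reviewed-by: demo"    # stored under refs/notes/commits
cd ..

git clone -q origin clone
cd clone
git log -1 | grep -q 'Notes:' || echo "no notes after plain clone"

# The one-off fix from the channel:
git fetch -q origin 'refs/notes/*:refs/notes/*'
git log -1 | grep -q 'Notes:' && echo "notes visible after explicit fetch"

# Make future 'git fetch' runs pick notes up automatically:
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'
```

The extra fetch refspec is what the progit notes article linked above suggests for keeping notes in sync without remembering the explicit fetch.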
[13:52:40] * hashar cloning in [13:52:41] <^demon> apergos: Well you have permission to review your own stuff just like with puppet :) [13:52:47] hahaha [13:52:53] ok but hmm [13:53:01] what I would like ideally is this [13:53:07] I can merge my stuff out. fine [13:53:14] ^demon: That's weird [13:53:15] no way!!! [13:53:15] but other people can review before or after [13:53:20] you must face peer review! [13:53:43] without taking into account my merges [13:53:52] <^demon> apergos: There's no such thing as post-merge review in gerrit :\ If you bypass gerrit then there's no code review changeset. [13:54:02] meh [13:54:04] ^demon: you should move it in operations/software/dumps.git IMHO [13:54:07] well I don't want to bypass it [13:54:20] <^demon> hashar: Oh well, too late now. [13:54:22] and yeah we already noted the path preference [13:54:35] ^demon: na too late is not argument :-)) [13:54:53] anyway I have cloned the .git repo and can't see notes [13:55:00] no you won't see them [13:55:01] til you do [13:55:03] <^demon> apergos: As a practical matter, if it's something you need to go ahead and merge you can do so since you're ops. [13:55:19] git fetch origin refs/notes/*:refs/notes/* [13:55:40] New patchset: Catrope; "Add .gitreview file" [operations/dumps] (master) - https://gerrit.wikimedia.org/r/2603 [13:56:02] ^demon: ----^^ [13:56:03] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 204 seconds [13:56:04] Worked just fine for me [13:56:22] <^demon> odd. [13:56:22] I didn't use -s but just committed and ran git-review [13:56:39] <^demon> Tried that too. [13:56:44] oh, that's right, we are going to use that tool now (maybe) [13:56:47] <^demon> I was working on the branch. Wonder if that's a bug [13:56:48] The ariel branch is managed separately, right? [13:57:03] <^demon> apergos: git-review's just an addon. It makes it simpler to use gerrit.
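The .gitreview bodies from changes 2603/2604/2605 are not quoted in the log; for reference, git-review expects a small INI file like the following at the repository root. Host, port, and project match the clone URL ^demon gave earlier; the defaultbranch line for the ariel branch is an assumption about what change 2605 contains.

```ini
# Assumed shape of the .gitreview files discussed above (the actual
# patchset bodies are not shown in the log). git-review reads this from
# the repo root to know where to push changes for review.
[gerrit]
host=gerrit.wikimedia.org
port=29418
project=operations/dumps.git
# For change 2605 on the ariel branch, presumably also:
defaultbranch=ariel
```

As hashar later observes, the trailing .git on the project value appears to be optional.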
[13:57:08] <^demon> But it's not a requirement :) [13:57:40] well "managed" = there's no automated sync out to the snap hosts, and thank god [13:58:02] well I can just push to operations/dumps.git :D [13:58:29] at some point very soon I ought to fold my stuff into "trunk" [13:58:45] <^demon> apergos: master is the new trunk ;-) [13:58:47] heh [13:58:56] then I can do what makes sense: [13:59:01] test code in my branch [13:59:07] test it [13:59:13] New patchset: Hashar; "adding in .gitreview" [operations/dumps] (master) - https://gerrit.wikimedia.org/r/2604 [13:59:17] (actually running some dumps on it) [13:59:18] \O/ [13:59:21] merge to master [13:59:23] ^demon: WFM [13:59:32] <^demon> Hrm.... [13:59:38] where by merge I mean it goes to gerrit [13:59:40] <^demon> Wonder if it's a bug in trying to use it on a branch [13:59:41] I don't think I should be allowed to push to that repo [13:59:43] <^demon> Which would be annoying. [13:59:44] then people review that [13:59:48] * RoanKattouw notices there's a lot of divergence between master and ariel [13:59:54] that's right [13:59:57] <^demon> hashar: Anyone can push to any repo. That's part of the joy of git. [14:00:18] <^demon> But with a gated repo, we can easily say DENIED if you make stupid changes ;-) [14:00:41] Change abandoned: Catrope; "Already done correctly (repo path is wrong, missing .git) in https://gerrit.wikimedia.org/r/#change,..." [operations/dumps] (master) - https://gerrit.wikimedia.org/r/2604 [14:00:55] can we get email notifications about commits to [14:00:56] hmm [14:01:04] I wonder how that would possibly work [14:01:04] Yes [14:01:44] basically I'd love it if I got email notification for anything to operations/dumps master [14:02:14] New patchset: Catrope; "Add .gitreview file for the ariel branch as well" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2605 [14:02:21] That can be done, although I don't know how it works [14:02:25] ok [14:02:27] New review: gerrit2; "Lint check passed."
[operations/dumps] (ariel); V: 1 - https://gerrit.wikimedia.org/r/2605 [14:02:51] 2605 adds a separate .gitreview file for operations/dumps/ariel , assuming that you'll want work on the ariel branch to be pushed into ariel, not into master directly [14:02:56] well for now I'll just go back to my oppy things (I have some other stuff which is not code still on my plate, probably be several days before an issue of a commit comes up) [14:03:06] yes, not into master directly [14:03:10] correct. [14:03:12] Good [14:03:18] never into master directly [14:03:30] Well your first exercise could be approving 2603 & 2605 so they get merged :) [14:03:40] :-D [14:03:50] <^demon> apergos: If we're happy with dumps in git now, I can go ahead and make the current dumps code read-only in svn. [14:04:30] Hmph, and of course gerrit is doing puppet lint checks for this repo, lol [14:04:40] We desperately need to move all that stuff into Jenkins [14:04:58] <^demon> No no no. [14:05:06] <^demon> Lint checks should remain in gerrit, not move to jenkins [14:05:26] <^demon> But we should be able to define per-repo, "do these lints" => php, python, ruby [14:05:28] Why? [14:05:29] <^demon> etc. [14:05:42] Doing per-repo definitions is so much easier in Jenkins [14:05:51] <^demon> But adding new jobs per-repo is annoying :\ [14:06:00] <^demon> Especially once we get ~500 extensions in there. [14:06:12] Well, 1) you can edit the repo filter on existing jobs [14:06:40] and 2) Diederik had this idea of writing a universal lint script that just traverses the entire tree and invokes the correct linter for each file based on the extension [14:06:51] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 134 MB (1% inode=62%): /var/lib/ureadahead/debugfs 134 MB (1% inode=62%): [14:07:03] yes, a short spot check looks ok [14:07:04] read only it [14:07:20] puppet lint checks? [14:07:27] <^demon> RoanKattouw: That's how we do it now for jenkins and it's slow as molasses.
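The "universal lint script" idea mentioned above (walk the whole tree, pick the right linter per file extension) could be sketched in shell as below. The extension-to-linter table is an illustrative assumption, and a real version would batch PHP files rather than invoke php -l once per file, which is the slowness complained about in the channel.

```shell
# Sketch of the universal-lint dispatcher idea above: traverse a tree and
# choose a lint command per file extension. The extension->linter table is
# illustrative; a real run would batch PHP files instead of one php -l each.
lint_command() {
    case "$1" in
        *.php) echo "php -l" ;;
        *.py)  echo "python -m py_compile" ;;
        *.rb)  echo "ruby -c" ;;
        *.pp)  echo "puppet parser validate" ;;  # puppet >= 2.7 syntax check
        *)     echo "" ;;                        # no linter known, skip
    esac
}

# Dry run: print the lint invocation for every file under a directory.
run_lints() {
    find "$1" -type f | while read -r f; do
        cmd=$(lint_command "$f")
        [ -n "$cmd" ] && echo "$cmd $f"    # echo instead of exec: dry run
    done
    return 0
}
```

Swapping `echo` for an actual invocation (and collecting failures) turns the dry run into a per-repo lint job without needing one Jenkins job per repository.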
[14:07:29] on c and python and bash scripts? [14:07:42] <^demon> Doing php -l for thousands of files is sloowwwww [14:07:54] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:07:59] Hmm yeah at least for PHP linting you need parsekit [14:08:12] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 170 MB (2% inode=62%): /var/lib/ureadahead/debugfs 170 MB (2% inode=62%): [14:08:51] Other linters aren't that slow though, are they? [14:09:18] apergos: mutante: thank you for working out the page this morning. I do indeed have a ticket to take care of that, and the plan is to do it today :) [14:09:38] apergos: The puppet lint check skips all the non-puppet files, which is why it's half-useful for the puppet repo and useless for your repo [14:09:40] ah yeah [14:09:47] nice :-D [14:10:09] ^demon: can't you deny git push by default ? [14:10:22] I mean pushing without review sounds like an issue to me [14:10:27] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [14:10:45] RECOVERY - Disk space on srv223 is OK: DISK OK [14:12:56] New review: ArielGlenn; "(no comment)" [operations/dumps] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2603 [14:12:57] Change merged: ArielGlenn; [operations/dumps] (master) - https://gerrit.wikimedia.org/r/2603 [14:13:11] <^demon> Whoops. [14:13:22] <^demon> RoanKattouw: So like I was about to say.... [14:13:30] <^demon> Parsekit is intermittently flakey on 5.3 [14:14:06] So we're stuck with php -l? [14:14:12] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [14:14:18] whoops what? [14:14:19] <^demon> I don't trust parsekit enough in 5.3 [14:14:21] Then doesn't it make sense to make the lint job more asynchronous by putting it in Jenkins? [14:14:27] <^demon> apergos: Wasn't paying attention and my battery died.
[14:14:32] ohhh [14:14:42] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2605 [14:14:43] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2605 [14:14:47] there's those [14:14:51] <^demon> RoanKattouw: Isn't the lint check async anyway? Pushing to operations/puppet doesn't wait for a lint check to complete. [14:15:14] True, but it seems to be flaky sometimes [14:15:22] Sometimes random commits don't get linted [14:15:59] <^demon> Well that's worth fixing. [14:16:07] I personally trust Jenkins's Gerrit plugin more than I trust Gerrit's "hook system" (if you can call it that), especially when concurrency is involved [14:16:10] RECOVERY - Disk space on srv224 is OK: DISK OK [14:16:17] <^demon> *shrug* [14:16:41] <^demon> As long as we don't have to create a new repo for every frickin' extension and it's reliable...you and hashar can do what you want :p [14:16:51] Since you obviously don't really care and I do, can we compromise on this and just let me do it? :) [14:17:08] <^demon> As long as the bikeshed can be orange :p [14:17:10] as long as I don't break stuff and you don't have to touch it [14:17:26] * RoanKattouw hands ^demon some orange paint to distract him while he executes his evil plans [14:17:34] ah a flaw in the implementation... of course. not that we've ever had problems with random number generation before :-/ [14:17:56] well anyway we are going to drop git / gerrit tonight [14:18:06] we migrate to bazaar / launchpad [14:18:34] seriously, I think the linting check will be made async [14:18:39] by using a jenkins build [14:18:54] the issue I have is that I need jenkins to ssh to the gerrit host. Need to catch up with Ryan about it tonight [14:19:27] <^demon> hashar: I read your e-mail. As long as we lock down the account to only do the couple of things we want it to I think we'll be fine.
[14:19:27] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 227 seconds [14:19:42] apergos: check the sec channel for a new problem in the world related to random number generation ;) [14:20:21] that's what I was reacting to [14:20:26] ah [14:21:16] I find it really sad, actually, that when I, say, go for a walk down by the stream near my house, I can see enough entropy to keep the entire world safe and random forever. and yet, harnessing that is somehow hard [14:22:03] it's not that we don't have decent sources of entropy [14:22:13] it's always that implementation sucks (and it's hard) [14:22:15] hashar: We can just create an account for Jenkins in Gerrit, right? That's what I did in my labs project [14:23:55] <^demon> RoanKattouw: git review also does automatic rebasing before pushing-for-review? [14:24:03] <^demon> <3 [14:24:06] Yes [14:24:17] Oh and it uses the local branch name as the topic [14:24:20] that's a nice feature, I have to admit [14:24:20] Unless you override with -t [14:24:27] the auto rebase [14:24:45] <^demon> That's so awesomely useful. Keeps useless patchset 2s due to a failed merge. [14:25:41] 2s? [14:26:09] <^demon> Having to submit a patch #2 because it won't merge :p [14:26:23] ah [14:28:53] * ^demon twiddles thumbs while git-review seems to do its thing all slow-like [14:30:56] <^demon> The hell...it's just...hanging here.... [14:32:11] <^demon> It's hanging on the rebase... | [14:34:50] awwwww [14:35:51] <^demon> And svn is now r/o for dumps stuff :) [14:36:15] <^demon> apergos: Might want to let qchris know :p [14:37:09] I will, he'll want to create a branch [14:37:25] It's hanging on the rebase? That's strange [14:37:39] RoanKattouw: I need jenkins to ssh to formey which hosts gerrit [14:37:48] Why do you need to SSH there? [14:37:56] so jenkins can use the gerrit CLI on formey [14:38:05] And why does it need to use the Gerrit CLI?
[14:38:19] sorry should have made a complete sentence [14:38:37] Don't worry, I'll get it out of you eventually :) [14:38:37] the jenkins plugin uses the gerrit cli to figure out which changes have been added [14:38:43] * RoanKattouw puts on interrogator hat [14:38:48] and to submit comments such as "Lint passed" [14:38:53] and that's not the gerrit cli over gerrit's ssh? [14:38:54] Oh, you mean the Gerrit Trigger Plugin? [14:38:58] Yes, it is [14:39:05] yeah Gerrit Trigger Plugin [14:39:07] that's easy then [14:39:08] I think you just need to create a Gerrit account for Jenkins [14:39:11] That's what I did in labs [14:39:23] <^demon> Swap that. A jenkins account for gerrit. [14:39:32] Eh? [14:39:39] <^demon> Oh wait, am I confused now? [14:39:41] * ^demon gives up [14:39:50] You need to create an account in the Gerrit system called 'jenkins' [14:40:00] * RoanKattouw hopes that's clearer [14:40:10] ^demon: go back to Talk: namespaces removal :-]]] [14:40:13] <^demon> Yeah, we're saying the same thing. I misread you. [14:40:22] <^demon> hashar: Go write unit tests. [14:40:33] :) [14:41:01] if we have a jenkins user on gerrit, we need an ssh key pair for jenkins@gallium [14:41:09] Yes [14:41:51] <^demon> hashar: Well you can sudo as jenkins so that shouldn't take more than...5 seconds :p [14:41:51] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 19 seconds [14:44:02] RoanKattouw: I forwarded you the mail I sent to Ryan [14:46:26] New review: Hashar; "Looks like the .git is optional since I have pushed that change without it."
[operations/dumps] (master) - https://gerrit.wikimedia.org/r/2604 [14:46:44] hashar: Thanks, replied [14:47:06] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 249 seconds [14:47:21] might work [14:48:02] That's how I did it, it gives Jenkins the same trust level as any random person that can push stuff into Gerrit [14:48:34] You would need to set permissions on the account carefully, of course, it would be able to do a little bit more than random people, like V+1 and stuff [14:48:36] I guess I was confused at some point [14:48:44] SSH always tick like "shell" access to me [14:50:26] <^demon> Also remember that gerrit's ssh daemon isn't interactive. You can only issue gerrit commands :) [14:51:00] Yeah SSH != SSH here, I can see how that's confusing [14:51:05] yeah that SSH passphrase field in Jenkins ticked like "Are you sure you want to give anyone shell access to production cluster [y/N]?" [14:51:28] so that locked me in a syndrome of "don't do anything or you will open a serious security breach" [14:52:04] This is why I set this up in labs first :) [14:52:09] So now that you're both here, anyway [14:52:15] What's the story with the two Jenkins puppetizations? [14:52:59] One on gallium (misc::contint::test::jenkins) and one on gilman (misc::jenkins) [14:54:02] The gilman one is incomplete and the gallium one is broken (at least for setting up new installs) [14:54:25] <^demon> Ugh, I want fulltext searching in gerrit. [14:54:51] Welcome to the "I want $foo in Gerrit" club [14:54:54] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.546 seconds [14:56:04] ^demon, hashar: Would either of you care to comment on the double Jenkins thing above? 
[14:56:24] misc::jenkins is for mobile / fundraising / whatever project [14:56:52] what I suspect is that the gallium one was installed with misc::jenkins then the class was copy/pasted in puppet later on [14:57:24] afaik any fundraising part of misc::jenkins does not work [14:57:39] Well misc::jenkins doesn't contain any fundraising-specific parts [14:57:44] All it does is install Jenkins [14:57:48] ok [14:57:56] misc::contint::test::jenkins does a lot of things but what it does *not* do is install Jenkins [14:57:59] that's commented out [14:58:16] so that class is broken when trying to install a new machine [14:58:18] this is one of those things that I need to get to eventually--figuring out how fundraising-jenkins should be set up [14:58:40] Well it would make sense to have a shared class that installs Jenkins [14:58:45] <^demon> Also automating that from package install would be nice :) [14:58:48] totally [14:58:49] <^demon> So we can stop doing it manually. [14:58:49] misc::jenkins does so, but it installs it from a 3rd party PPA [14:59:00] So we need to put the package in our own APT repo [14:59:06] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:20] # FIXME: third party repository [14:59:21] # This needs to removed, and changed to use Jenkins from our own WMF repository instead. 
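The shared install class being argued for could look roughly like this once the jenkins package has been copied into the WMF apt repository, replacing the third-party PPA that misc::jenkins currently adds. Class name and service handling here are illustrative assumptions, not deployed puppet:

```puppet
# Rough sketch of a shared Jenkins install class, assuming the jenkins
# package has been mirrored into the WMF apt repository so the upstream
# PPA is no longer needed. Class name and details are illustrative only.
class jenkins::install {
    package { "jenkins":
        ensure => present,   # pulled from our own repo, not the PPA
    }

    service { "jenkins":
        ensure  => running,
        enable  => true,
        require => Package["jenkins"],
    }
}
```

Both misc::jenkins and misc::contint::test::jenkins could then include this class instead of each half-implementing (or commenting out) the package install.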
[15:01:39] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.971 seconds [15:05:42] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:05:57] RoanKattouw: looks like jenkins was installed on gallium from the WMF Ubuntu mirror [15:06:24] Strange [15:06:28] It's not in puppet anywhere [15:06:49] it is commented out in misc::contint class jenkins [15:06:54] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 9 seconds [15:07:01] # first had code here to add the jenkins repo and key, but this package should be added to our own repo instead [15:07:01] # package { "jenkins": ensure=> present ... [15:07:38] and """ $ apt-cache policy jenkins """ gives out Installed: 1.431 and references the apt.wm.org repo [15:07:43] so maybe we can put a new package there [15:07:53] err [15:08:08] we could uncomment the package{ "jenkins" : ensure=> present } [15:09:09] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [15:09:36] hi robh [15:09:42] hello [15:09:49] sorry for bugging you [15:09:53] :) [15:09:56] or :( [15:10:11] do you have an ETA for the proxy server on locke? [15:10:28] nope, sorry, i have not touched it. [15:10:46] i thought someone else had more knowledge on that, wasnt there a discussion about this? [15:10:48] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 243 seconds [15:11:04] yes, there has been a discussion about this [15:11:15] or is this something i should ask mark? [15:11:32] i would ensure that how its planned to do things passes his review [15:11:39] then he would be able to advise who best to implement as well [15:11:51] (i asked you because you were doing the transition of locke) [15:12:03] ahh, yea i just did the allocation of the server and the OS install [15:12:07] hi mark, are you around?
[15:12:34] New patchset: Hashar; "misc::contint::jenkins now install jenkins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2606 [15:15:05] RobH: So what's the story with fluorine (RT #2350)? [15:15:40] sorry, in middle of swift order, lemme finish this then i check it out [15:16:25] Sure no rush [15:18:57] New patchset: Catrope; "misc::contint::jenkins now install jenkins" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2606 [15:19:15] hashar: ---^^ With spelling fixes in the comments [15:19:35] taking english lessons is on my todo list [15:19:45] once my daughter will stop crying constantly 8-)))) [15:19:51] hehe no worries [15:20:13] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2606 [15:20:21] I +1ed it, I don't have +2 powers [15:20:23] New review: Hashar; "Thanks Roan!" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2606 [15:20:40] I am +1 restricted too [15:20:49] which is fine since I don't want to mess up with ops [15:21:36] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.747 seconds [15:21:49] I will send that one to leslie [15:22:11] another question: it turns out that there are almost 0 referrals in the mobile log files, which surprises me. is this correct behavior or should we expect a decent number of referrals? [15:23:24] PROBLEM - Disk space on db40 is CRITICAL: DISK CRITICAL - free space: /a 91029 MB (3% inode=99%): [15:24:09] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.848 seconds [15:24:09] PROBLEM - MySQL disk space on db40 is CRITICAL: DISK CRITICAL - free space: /a 90973 MB (3% inode=99%): [15:24:45] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.027 second response time on port 8123 [15:27:54] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:28:12] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:28:12] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:29:04] hashar: who should I talk to about getting a proxy for my bugzilla gadget like Tim suggested? [15:29:06] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [15:30:27] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 15 seconds [15:31:45] mutante: around? [15:32:23] * hexmode is tempted to start randomly pinging people [15:33:42] <^demon> hexmode: Why not just file it in rt? [15:35:34] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 262 seconds [15:35:35] ^demon: because "just file it in rt" is the way to get things lost. But, yes, it is a start. [15:35:44] I should've done that first [15:35:51] * hexmode goes to fix that now [15:36:36] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [15:38:26] hexmode: ponf [15:38:33] oh men incorrect implementation [15:39:01] hashar: ponf? [15:39:16] I meant pong [15:39:28] heh [15:39:46] so, hashar, is there just a proxy available right now for my gadget to use? [15:40:02] I have no idea, that sounds like an op thing [15:40:08] maybe he was referring to the front caches ? [15:40:33] and adding a sub domain like dos-me.bugzilla.wikimedia.org that will point to them [15:40:48] I assumed he meant something like that subdomain [15:40:59] thus maybe we could ban any requests made to "dos-me.bugzilla.wikimedia.org" whenever there is an issue [15:41:32] were did he comment about that ? Was it on IRC / a bugzilla / RT ? 
[15:41:38] s/were/where/ [15:41:41] private-l [15:42:37] maybe ask Tim by replying so :/ [15:42:38] or maybe he sent me private email, double checking [15:42:44] I am not sure what he meant [15:42:52] hexmode: I got the email in private-l [15:43:36] hashar: We just want a way to be able to say "these requests are coming from the gadget. Flick this switch to turn them off" [15:43:40] that reminds me I need to open a ticket to get bzapi installed [15:43:55] it is, isn't it? [15:44:02] or do you mean something else? [15:44:14] I mean the REST API to query bugzilla [15:44:33] https://wiki.mozilla.org/Bugzilla:REST_API [15:44:34] the JSON API isn't good enough? [15:44:47] where is it at ? [15:45:06] https://bugzilla.wikimedia.org/jsonrpc.cgi [15:46:34] I've been scripting it with php [15:46:46] my code is in tools/bugzilla/client [15:47:51] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.482 seconds [15:49:41] https://rt.wikimedia.org/Ticket/Display.html?id=2452 [15:49:46] !log updating dns for cadmium [15:49:48] Logged the message, RobH [15:51:01] hexmode: maybe the json interface would do it. Thanks [15:52:44] The dell tech stood me up. 
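The JSON-RPC endpoint linked above (https://bugzilla.wikimedia.org/jsonrpc.cgi) accepts read-only calls over plain GET, with `params` passed as a URL-encoded JSON array. A minimal sketch of building such a request URL — the endpoint is from the log, `Bug.get` is the standard Bugzilla WebService method, and the bug id is purely illustrative:

```python
import json
from urllib.parse import urlencode


def bug_get_url(endpoint: str, bug_id: int) -> str:
    """Build a GET URL for Bugzilla's JSON-RPC Bug.get method.

    Read-only JSON-RPC methods can be invoked via GET; `params` is a
    JSON-encoded array of argument objects, URL-encoded into the query.
    """
    query = {
        "method": "Bug.get",
        "params": json.dumps([{"ids": [bug_id]}]),
    }
    return endpoint + "?" + urlencode(query)


# Hypothetical bug id, used only to show the shape of the request:
url = bug_get_url("https://bugzilla.wikimedia.org/jsonrpc.cgi", 12345)
```

Fetching the resulting URL with any HTTP client returns a JSON object whose `result.bugs` array holds the requested bugs; write methods go through POST instead.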
[15:53:04] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:53:18] !log os install on cadmium [15:53:19] Logged the message, RobH [15:59:04] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [16:01:01] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 84, down: 3, dormant: 0, excluded: 0, unused: 0BRae3.1019: down - Subnet private1-c-eqiadBRae3.32767: down - BRae3.1003: down - Subnet public1-c-eqiadBR [16:01:32] hashar: fwiw, the REST-API is probably enabled similarly, too [16:02:02] hexmode: as I understood it, it is an extension that need to be installed [16:02:31] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.098 seconds [16:02:47] hexmode: anyway, the json interface is probably enough [16:02:58] hashar: ok, I see XML-RPC and JSON-RPC... so you're right [16:03:04] http://www.bugzilla.org/docs/tip/en/html/api/Bugzilla/WebService.html [16:10:28] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:10:46] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 87, down: 3, dormant: 0, excluded: 0, unused: 0BRae3.1003: down - Subnet public1-c-eqiadBRae3.1019: down - Subnet private1-c-eqiadBRae3.32767: down - BR [16:18:13] New review: Mark Bergsma; "Can you please move this out of the puppet repository, and put it in operations/software instead? Or..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2601 [16:23:27] !log carbon halted, allows login and freezes on password entry, rebooting [16:23:29] Logged the message, RobH [16:23:40] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.1409206087 (gt 8.0) [16:23:49] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.227 seconds [16:25:55] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Wed Feb 15 16:25:43 UTC 2012 [16:26:04] RECOVERY - SSH on carbon is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [16:26:13] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.85601438596 [16:27:40] New patchset: Mark Bergsma; "Prepare oxygen for multicast relaying" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2607 [16:28:01] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:29:29] New patchset: Mark Bergsma; "Comment again, until I have time to look at it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2608 [16:37:28] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.005 seconds [16:37:47] !log forgot to log, carbon resumed service normally [16:37:49] Logged the message, RobH [16:41:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:47:20] New review: Dzahn; "enhanced page_all SMS script (the one for manual use, does not affect nagios)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2264 [16:47:22] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2264 [16:54:43] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.558 seconds [16:58:37] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:01:22] !log cadmium setup for wikimania video transcoding [17:01:24] Logged the message, RobH 
[17:03:52] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.982 seconds [17:07:55] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:25] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.382 seconds [17:15:52] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.209 seconds [17:17:22] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.031 second response time on port 8123 [17:19:37] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:20:04] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.923 seconds [17:26:58] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:52] video transcoding? [17:28:13] ah temp [17:32:16] can we just let it sit there unused for a couple of years? :D [17:32:52] Well since I'm the one that's gonna be using it, it's gonna sit unused for at least a week probably [17:33:26] heh. I was referencing the old transcode boxes [17:34:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.000 seconds [17:35:04] there's only transcode1 left [17:35:10] and that's for the dc cameras [17:35:47] totally temp [17:35:56] and roan and i will tear it down once he finishes with it. 
[17:36:58] New patchset: Demon; "Adding .gitreview" [mediawiki/tools/mwdumper] (master) - https://gerrit.wikimedia.org/r/2609 [17:37:16] New review: Demon; "(no comment)" [mediawiki/tools/mwdumper] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2609 [17:37:16] Change merged: Demon; [mediawiki/tools/mwdumper] (master) - https://gerrit.wikimedia.org/r/2609 [17:38:49] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:39:56] New patchset: RobH; "added candium and roan to access it to site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2610 [17:40:10] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.813 seconds [17:44:04] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:31] New patchset: RobH; "added candium and roan to access it to site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2610 [17:45:53] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2610 [17:47:14] New review: RobH; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2610 [17:47:15] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2610 [17:47:40] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.667 seconds [17:51:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:51:52] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [17:55:55] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [17:57:52] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [17:59:50] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 241 seconds [18:07:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.464 seconds [18:08:58] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:10:10] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [18:14:31] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.701 seconds [18:15:29] New patchset: Demon; "Adding redirect for easier finding of gitweb urls" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2611 [18:15:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:17:13] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.479 seconds [18:18:16] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 2 seconds [18:21:17] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:21:17] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:21:52] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours [18:22:10] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 237 seconds [18:24:52] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [18:28:46] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 224 seconds [18:29:49] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours [18:35:47] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 20.9923882609 (gt 8.0) [18:38:02] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.743 seconds [18:38:08] cmjohnson1: I'm sure mark will have better suggestions, but IIRC he was saying something about there being too much RAM in the box? maybe take half of it out and try again? [18:38:24] * maplebed stabs in the dark... 
[18:38:42] New review: Demon; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2575 [18:38:42] Change merged: Demon; [operations/software] (master) - https://gerrit.wikimedia.org/r/2575 [18:39:09] maplebed: ms4? [18:39:15] yeah. [18:39:18] (hey, context!) [18:39:49] worth a shot [18:39:57] New patchset: Pyoungmeister; "perms: they matter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2612 [18:40:08] PROBLEM - Puppet freshness on gilman is CRITICAL: Puppet has not run in the last 10 hours [18:40:08] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours [18:40:49] what's up? [18:42:05] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:09] mark: ms4 rt 885 [18:42:41] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 2 seconds [18:43:05] so the memory in there is either broken or doesn't work well with the motherboard [18:43:39] there should be both the new memory, and the original memory [18:43:47] the original memory had issues and we wanted to upgrade [18:43:54] the new memory is what crucial said would work in the system. [18:44:27] i would reduce to the minimum memory pair and see if it posts and tests. [18:44:53] then work on adding a pair at a time and testing post [18:45:15] offtopic: dell tech has gotten mw1103 to start posting, it only took since noon. [18:45:18] still working on it =P [18:45:20] oh joy [18:46:35] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 228 seconds [18:46:44] I am starving [18:46:52] * RobH foolishly did not eat breakfast, or pack a lunch [18:47:04] i cannot even go to the vending machines, must watch dell tech. [18:47:53] New patchset: Pyoungmeister; "and let's append to a log file, shall we?" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/2613 [18:48:18] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2612 [18:48:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2612 [18:50:11] PROBLEM - Puppet freshness on ganglia1001 is CRITICAL: Puppet has not run in the last 10 hours [18:51:24] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2613 [18:51:25] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2613 [18:56:38] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.01438956522 (gt 8.0) [18:58:27] get the dell tech to buy you a lunch from a vending machine [18:58:33] you can tell him which one to buy :-P [19:04:26] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.421 seconds [19:04:35] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.525 seconds [19:04:53] apergos: that means its that much longer until he finishes. [19:05:01] plus he cannot walk out ther, he cannot get back in ;] [19:07:49] if you go with him he can! [19:07:49] RobH: Bah, cadmium won't let me ssh in [19:07:59] that's how you tell him which one to get: "buy that one" [19:08:00] RoanKattouw: do it as root? [19:08:11] that works [19:08:18] good enough =] [19:08:20] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:29] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:36] RoanKattouw: it would be a problem if that host was going to be around in three months [19:08:43] but it has to die when we finish, as its not fully puppetized. 
[19:09:01] ZOMG, all sorts of Windows garbage on that disk [19:09:06] Run ls /wd for laughs [19:09:12] Yeah [19:09:29] I'm just gonna copy the files from /wd (removable HD) to /a (LVM) [19:11:11] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.82914284483 [19:12:21] RoanKattouw: its ntfs, i had to look up how to mount that shit [19:12:23] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.716 seconds [19:12:26] i didnt recall the flags [19:12:31] heh [19:13:03] !log Copying all Wikimania files from the removable HD to cadmium's HD [19:13:05] Logged the message, Mr. Obvious [19:13:46] RoanKattouw: careful though, that removable disk is larger than the ones in the system [19:13:52] may run out of room if you copy then transcode [19:14:00] rather than transcode from WD to internal. [19:14:10] Well crap, you're right [19:14:19] There's 2T of data on there, only 1.7T of room [19:14:27] yea, that was my fear [19:14:30] I need to du -sh those dirs, see what's actual video and what's now [19:14:31] *not [19:14:41] some may be repeated video as well [19:14:46] !log Aborted copy operation on cadmium, data won't fit [19:14:48] its not organized really. [19:14:48] Logged the message, Mr. Obvious [19:14:50] Yup [19:15:00] I was just gonna copy it all cause ls is so slow [19:15:05] yea =/ [19:15:22] 1.8T Wikimania-Source [19:15:23] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 2 seconds [19:15:26] we dont really have spare hosts with ton of disk space [19:15:30] =/ [19:16:12] I think Wikimania Edited is what I want [19:16:14] I'll copy that over [19:16:23] But please leave the external HD mounted for now, I might need more later [19:16:33] I was hoping to just copy everything so you could unmount it, but meh [19:17:16] !log Let's try that again: copying /wd/Wikimania\ Edited to /a on cadmium [19:17:19] Logged the message, Mr. 
Obvious [19:19:17] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 204 seconds [19:20:11] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:50] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.602 seconds [19:33:41] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds [19:34:44] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:34:53] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.716 seconds [19:37:44] PROBLEM - MySQL Slave Delay on db31 is CRITICAL: CRIT replication delay 253 seconds [19:38:47] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:40:08] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.770 seconds [19:40:46] mark, when you have a minute, I would love some help on the htcp stuff. [19:41:21] ok [19:41:25] I've currently got ~/htcp.php on fenari that I pulled from the file Tim mentioned [19:41:42] I pulled out mediawiki-specific stuff so that it runs without a full MW install [19:41:59] I don't think it's actually sending the packet though (my test is tcpdump listening on a squid) [19:42:34] ok [19:42:37] did you run tcpdump on the sending host? [19:43:23] not yet. [19:43:38] do you set a ttl > 1 ? [19:43:42] a multicast ttl [19:43:45] if you don't that's probably why [19:44:02] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:05] nada. [19:44:14] I set it to 3 initially, [19:44:21] then looked in srv221's config and changed it to 1 [19:44:23] that's not enough to reach eqiad in all cases [19:44:29] you should set it to 10 or so [19:44:40] done [19:44:41] hmm really? I thought we set mediawiki to 10 or something like that [19:44:55] * mark looks at htcp.php [19:45:05] I might have been looking in the wrong place. 
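The aborted copy earlier (2T of data onto 1.7T of room) is the kind of failure a pre-flight free-space check catches; a minimal sketch using only the standard library, where the paths and the headroom fraction are illustrative rather than anything from the log:

```python
import shutil


def fits(src_bytes: int, dest_path: str, headroom: float = 0.05) -> bool:
    """Return True if dest_path's filesystem can hold src_bytes.

    A small headroom fraction is reserved so a copy never fills the
    destination to the last byte (leaving room for transcoding output).
    """
    free = shutil.disk_usage(dest_path).free
    return src_bytes * (1 + headroom) <= free


# e.g. a 2 TB copy onto a filesystem with only 1.7 TB free:
# fits(2 * 10**12, "/a") would return False in that situation.
```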
[19:45:11] (and seen the default rather than the actual config) [19:45:35] huh. tcpdump saw "HEAD htcp.php HTTP/1.0" [19:45:42] so it's sending something, but not yet the right thing. [19:46:02] yay! [19:46:16] I see it sent from fenari and recieved by sq86. [19:46:22] cool :) [19:46:25] so it was the ttl? [19:46:25] thanks, I think the TTL was it. [19:46:35] yeah [19:46:38] then it won't leave the subnet [19:46:41] that and I was pulling argv[0] instead of argv[1] [19:46:48] and squids are in a different subnet than everything else [19:47:23] thanks for the suggestion to just use the php; way easier than trying to rewrite it. [19:47:56] and now I have a generic 'send a URL by htcp' I can run from the command line. [19:48:47] robh: i cannot use b1 for new ms-be...i have room in b3 [19:49:29] maplebed: but that already exists [19:49:35] I'm sure Tim wrote one [19:49:44] he even optimized it at some point, nonblocking and all that iirc [19:50:01] I'm just not entirely sure where it would be, I -think - in mediawiki's maintenance/ dir [19:50:08] ah well. 
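The fix debugged above — raising the multicast TTL from the default of 1 so purge packets can cross subnet boundaries — looks roughly the same in any language with a sockets API. A Python sketch; the group address in the comment is a made-up example, while the port (4827, HTCP's registered UDP port) and the TTL behavior reflect the discussion:

```python
import socket


def make_htcp_socket(ttl: int = 10) -> socket.socket:
    """Create a UDP socket suitable for sending HTCP purges to a
    multicast group.

    IP_MULTICAST_TTL defaults to 1, so without setting it the first
    router drops the datagram and it never leaves the local subnet —
    exactly the symptom seen above before the TTL was raised.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


sock = make_htcp_socket(ttl=10)
# sock.sendto(htcp_payload, ("239.0.0.1", 4827))  # hypothetical group
```

A TTL of 10 comfortably covers a couple of routed hops between datacenters while still bounding how far the datagram can propagate.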
[19:54:59] RECOVERY - MySQL Slave Delay on db31 is OK: OK replication delay 0 seconds [19:56:40] cmjohnson1: checkin [20:11:10] PROBLEM - MySQL Slave Delay on db31 is CRITICAL: CRIT replication delay 203 seconds [20:11:19] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.796 seconds [20:15:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:20:41] New patchset: Pyoungmeister; "should make sure we don't get openjdk in there too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2614 [20:20:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.095 seconds [20:21:55] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2614 [20:21:56] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2614 [20:26:28] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.786 seconds [20:31:34] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:31:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:32:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.697 seconds [20:37:07] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:54:13] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.560 seconds [20:54:13] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.572 seconds [20:57:55] yay wikimania videos! 
:) [20:58:14] Yeah don't get too excited just yet :) [20:58:19] I need to transcode them to OGG first [20:58:25] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:58:25] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:58:33] And I need to get the 6th floor people to give me file description pages [21:02:19] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.242 seconds [21:02:19] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.240 seconds [21:03:30] New patchset: Jgreen; "mystery solved, stupid typo on new apache vhost config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2615 [21:04:22] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2615 [21:04:23] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2615 [21:06:22] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:22] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:58] RECOVERY - Puppet freshness on grosley is OK: puppet ran at Wed Feb 15 21:06:36 UTC 2012 [21:07:24] New patchset: Pyoungmeister; "search hosts now can use xfs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2616 [21:08:28] RECOVERY - Puppet freshness on aluminium is OK: puppet ran at Wed Feb 15 21:08:06 UTC 2012 [21:13:27] robh: did you get a chance to look into b3 as a possible location? 
[21:13:53] cmjohnson1: checkin now [21:14:08] i got distracted, they were getting ready to start doing shit in the cage and i wanted to clear out of their way [21:14:26] yep..no worries...gave me a chance to tinker with ms4 [21:14:38] btw: still sux [21:14:44] cmjohnson1: I wanna leave space in b3 for some things [21:14:51] ok, we are going to be adding 6 more search nodes to there [21:15:08] so you can use b3, but put it in U29/30 [21:15:16] that leaves 6u below it for the rest of the search servers [21:15:23] okay [21:15:42] but otherwise yep thats cool [21:15:55] on a2 and a4. the u's you suggested are filled. I can go higher and power is ok [21:15:59] cmjohnson1: glad to know how to balance power before you rack the heavy server eh? [21:16:17] ok, so i typo'd or racktables is wrong, checking [21:16:28] yep...makes life easier [21:17:17] cmjohnson1: no idea wtf i did with the U space in there [21:17:25] you're distributing the swift cluster over 5 racks? [21:17:26] im not used to putting it in there, and i messed up, but yea the racks are right [21:17:37] ben wanted it on at least 3 [21:17:44] i have a bunch of racks that can take one or two servers most [21:17:49] so we are just filling those in [21:18:09] mark: sound good? [21:18:14] i hate it [21:18:28] you want them in a dedicated rack, but ben insisted that he needed 3 minimum. [21:18:31] ? [21:18:32] but it's tampa [21:18:33] I won't care [21:18:37] the first part is a question, the second a statement [21:18:42] do it :P [21:19:06] fyi: new row c in eqiad will wire slightly differently [21:19:13] as in how i layout the access switch and such for cable mgmt [21:19:17] ...no love for Tampa [21:19:29] * cmjohnson1 sheds a tear or two [21:19:33] top u access switch, then 1u cable mgmt, then msw, then 1u cable mgmt [21:19:53] maybe just one cable mgmt [21:20:05] but something, cuz in the denser racks, wiring is a pain at the top. 
[21:20:35] also once we have the proper fiber trays, I wanna drop the plastic tube/raceways from the downspouts into each rack [21:20:45] i dislike seeing naked fiber. [21:21:08] since we will have to migrate over to new fibers in trays and push traffic from one router to the other [21:21:22] why not cable mgmt above the switch instead of below? [21:21:24] its simple enough to improve the racks in the meantime. [21:21:58] RECOVERY - Puppet freshness on gilman is OK: puppet ran at Wed Feb 15 21:21:36 UTC 2012 [21:22:01] well if i can get by with a single 1u cable manager, between the switches works [21:22:27] i rather not take up the entire rack top with mgmt, so i wanna try one first, but if we need two [21:22:36] i guess they can go above the switches, do you find that makes it easier? [21:22:49] if we have the space, why not go with two ? if we leave one empty, it wouldn't hurt anything [21:23:09] well, i meant in rack go switch, cable mgmt, switch, cable mgmt [21:23:18] but perhaps we would be better with switch, 2u cable mgmt, mgmt switch. [21:23:30] the 2U cable managers are very roomy. [21:24:01] New patchset: Hashar; "dumb test of gerrit / jenkins integration" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2617 [21:24:04] but 1U would be much cleaner then [21:24:06] heh, can even retrofit existing racks slowly, since a rack can lose all mgmt for an hour or two. [21:24:16] yeah sure [21:24:26] yea the 1u look tighter. 
[21:24:29] looks nicer in rack i think [21:24:47] I would do (from top) 1u cable mgmt, 1u prod switch, 1u cable mgmt, 1u mgmt switch [21:24:49] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.230 seconds [21:24:49] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.261 seconds [21:24:53] OR [21:24:57] even switch prod and mgmt around [21:25:05] since now production cords always cross the mgmt switch [21:25:09] and are more at risk that way ;) [21:25:18] but there's also the stacking cables and all [21:25:22] meh, let's keep production at the top I guess [21:25:25] yea [21:25:30] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/24/ (1/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [21:25:31] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/25/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [21:25:58] well, i can just order the cable mgmt and rearrange them and see which looks/works best [21:26:16] I suppose if the cables route from above to the switch, it puts even less strain on the actual switch ports. [21:26:23] though the cable mgmt will eliminate a lot of that anyhow [21:26:45] now I just use velcro to do strain relief, but it loosens over time. [21:27:06] had to go around eqiad on friday and redo half the racks wiring [21:28:29] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/26/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [21:28:34] PROBLEM - Disk space on mw44 is CRITICAL: DISK CRITICAL - free space: /tmp 9 MB (0% inode=87%): [21:28:38] hashar: Wait, are we duplicating work here? 
[21:28:43] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:48] You set up the gerrit trigger plugin in production [21:28:50] ? [21:28:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:29:31] RoanKattouw: yes [21:29:47] and of course it does not work like on my local computer :) [21:29:58] * RoanKattouw is a little sad [21:30:08] I did exactly the same in labs at the SF hackathon [21:30:34] ahhhh [21:30:42] I was hoping to puppetize it there, then move it over [21:30:48] But I guess it's getting done now at lesat [21:30:57] And if the trigger plugin works, I can just put in jobs [21:31:08] what have you worked on ? [21:31:09] [21:31:16] RECOVERY - Disk space on mw44 is OK: DISK OK [21:31:29] Just a basic lint job so far [21:31:43] And a job that I stole from OpenStack that implements their test-the-merged-state-not-the-submitted-state stuff [21:31:54] But I can put all of those things in easily [21:32:27] drdee: Hey you were talking about a universal lint script at some point, did that ever get anywhere? [21:32:34] Cause if not I'll write a basic one next week [21:32:42] Or maybe I'll get hashar to do it :D [21:32:46] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.652 seconds [21:32:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.961 seconds [21:37:19] RoanKattouw: to get a clean state and test the merged state, I took a script from stack overflow https://gerrit.wikimedia.org/r/#change,2513 [21:37:43] I am not sure it is needed though [21:38:19] +# Script extracted from the OpenStack project v2012.02.08 [21:38:23] That's exactly what I used [21:38:28] \o/ [21:38:40] Did you put it in as a separate job? 
[21:38:56] I am not sure yet how to organize the various jobs [21:39:21] Ideally you'd put that script in a job of its own, and you can have one job run another one [21:39:32] what I want is to fetch the changes then trigger a job that run some lints [21:39:36] yeah [21:39:40] I have that in my labs project [21:39:45] Let me give you access [21:40:03] I don't have access to labs [21:40:10] for some reason my account is cursed there [21:40:19] Meh [21:40:22] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/27/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [21:40:33] but go ahead :) [21:40:38] maybe it will work [21:40:43] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:40:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:41:44] labs-home-wm 02/15/2012 - 21:41:18 - Creating a home directory for hashar at /export/home/jenkins/hashar [21:41:58] hashar: OK, you should now be able to ssh to bastion.wmflabs.org and then from there to jenkins2 [21:42:14] It doesn't have a public IP so you can't access the Jenkins install in your browser unless you set up FoxyProxy or something [21:43:34] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.363 seconds [21:44:02] OH MY GOD [21:44:06] IT WORKED!!!!!!!!!!!!!!!!!!!! 
[21:44:15] seriously Roan you are blessed by something
[21:44:23] you should start a new religion together with Ryan
[21:44:32] you will be successful (and I will be your first follower)
[21:45:16] updating the keys solved the issue it seems
[21:47:37] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:51:22] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/28/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617
[21:51:31] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.333 seconds
[21:59:28] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:02:55] RECOVERY - MySQL Slave Delay on db31 is OK: OK replication delay 0 seconds
[22:05:01] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 188 seconds
[22:05:19] PROBLEM - MySQL Slave Delay on db50 is CRITICAL: CRIT replication delay 203 seconds
[22:07:44] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (18966)
[22:29:46] New review: Hashar; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2617
[22:29:47] Change merged: Hashar; [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2617
[22:43:14] Ouch, enwiki picked up 20k jobs at 21:55
[22:43:48] mmm, refreshlinks
[22:44:19] binasher: what's the state of the db updates? Is it only s7 stuff we need to "worry" about for today's deployments?
[22:45:01] s7 is done, so everything is clear for today
[22:45:15] great :)
[22:45:19] and commons is done, so i think everything is clear for next week too
[22:45:49] yup, looks to be
[22:45:53] robla: ^
[22:46:16] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.885 seconds
[22:47:44] sweet
[22:47:48] thanks binasher
[22:48:09] New patchset: Andre Engels; "My files; current status" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2618
[22:48:11] binasher: what's *not* done at this point?
[22:48:27] enwiki? ;)
[22:48:42] s1, s5, and s6
[22:49:03] enwiki, dewiki, fr ja ru wiki
[22:49:05] en, de, fr, ja, ru
[22:49:08] yup
[22:50:10] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:51:00] ok....so we'll be able to do all of the little wikis next week
[22:51:16] (little traffic that is)
[22:51:46] "all projects except for Wikipedia" on Thursday
[22:51:55] I'm assuming the bulk of those are on S3
[22:52:52] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.113 seconds
[22:57:09] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.994 seconds
[22:57:09] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:57:36] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.910 seconds
[23:01:30] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:01:39] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:06:22] !log installed pagecache-management on all search nodes
[23:06:24] Logged the message, Master
[23:06:36] !log updated /etc/lsearch.conf:Rsync.path to "/usr/local/bin/rsync-no-pagecache" on all search nodes
[23:06:39] Logged the message, Master
[23:09:27] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.232 seconds
[23:10:24] New patchset: Pyoungmeister; "new logrotate for search nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2619
[23:10:57] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.134 seconds
[23:14:51] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:20:24] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:23:42] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 602s
[23:24:09] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 631s
[23:24:54] PROBLEM - MySQL replication status on db1025 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 674s
[23:25:39] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.223 seconds
[23:30:45] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.415 seconds
[23:38:51] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:38:51] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:40:39] RECOVERY - MySQL replication status on db1025 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[23:44:51] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[23:45:27] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[23:46:09] !log running a slow staggered restart of lsearchd
[23:46:12] Logged the message, Master
[23:50:51] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.032 seconds
[23:50:51] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.043 seconds
[23:52:03] PROBLEM - RAID on search10 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
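Change 2619 ("new logrotate for search nodes") isn't shown here, but a logrotate stanza for lsearchd would typically look something like the sketch below; the log path and retention are assumptions, not the actual contents of the change.

```
/var/log/lsearchd/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
}
```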
[23:56:42] RECOVERY - MySQL Slave Delay on db50 is OK: OK replication delay 0 seconds
[23:57:18] RECOVERY - RAID on search10 is OK: OK: 1 logical device(s) checked
[23:57:27] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:57:27] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:57:45] RECOVERY - MySQL Replication Heartbeat on db50 is OK: OK replication delay 0 seconds
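The replication alerts above compare `Seconds_Behind_Master` from `SHOW SLAVE STATUS` against a threshold (the storage3 checks fire around 600s). A hedged sketch of that classification logic follows; the actual Nagios plugin's thresholds and internals are assumed, not known.

```shell
# Map a Seconds_Behind_Master value to a Nagios-style status line.
# A real check would obtain the value with something like:
#   mysql -N -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master:/ {print $2}'
classify_lag() {
    lag="$1"
    if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
        # SHOW SLAVE STATUS reports NULL when the SQL thread is stopped
        echo "CRITICAL - replication not running"
    elif [ "$lag" -ge 600 ]; then
        echo "CRITICAL - Seconds_Behind_Master : ${lag}s"
    else
        echo "OK - Seconds_Behind_Master : ${lag}s"
    fi
}
```

Note that `Seconds_Behind_Master` measures how far the SQL thread trails the relay log, so a broken IO thread can report 0 while replication is actually stalled; that is why the NULL case is treated as critical rather than ignored.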