[00:39:30] New review: Tychay; "Okay, let's hold back on this change, but push https://gerrit.wikimedia.org/r/#/c/26770/" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/27830
[00:46:02] !log upgrade Bugzilla to 4.0.8 (security patch)
[00:46:11] Logged the message, Master
[00:48:30] paravoid: is there a bug for TheHammock.jpg ?
[00:48:40] look like another bad stat cache entry
[00:49:00] getting a local file copy and stating that give the correct size, but mc says 108
[00:50:02] hmm, 'latest' doesn't help
[00:52:21] * AaronSchulz wtfs
[00:53:00] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[01:03:51] oh, nvm
[01:04:09] latest has [size] => 791778
[02:00:02] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[02:00:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 286 seconds
[02:21:42] !log LocalisationUpdate completed (1.21wmf2) at Sat Oct 20 02:21:41 UTC 2012
[02:21:56] Logged the message, Master
[02:25:59] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[02:25:59] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:41:10] !log LocalisationUpdate completed (1.21wmf1) at Sat Oct 20 02:41:10 UTC 2012
[02:41:23] Logged the message, Master
[03:00:56] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[03:49:50] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 3 seconds
[04:25:59] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[04:44:34] New patchset: Dzahn; "add wikivoyagelb service IPS to lvs.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28604
[04:45:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28604
[04:51:36] New patchset: Dzahn; "add wikivoyagelb and wikidatalb service IPS to lvs.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28604
[04:52:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/28604
[05:13:59] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[05:19:59] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[05:51:02] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[06:04:05] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 218 seconds
[06:04:17] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 223 seconds
[06:10:59] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[06:12:11] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[06:12:20] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 2 seconds
[06:31:51] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 216 seconds
[06:31:59] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 222 seconds
[06:38:17] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds
[06:38:29] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds
[06:40:59] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[08:02:51] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[08:02:51] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[10:53:51] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[11:43:13] Change abandoned: Hashar; "Going to use a Debian package instead." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/28216
[12:00:45] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[12:26:51] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[12:26:51] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[13:01:48] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[13:04:05] PROBLEM - Host srv222 is DOWN: PING CRITICAL - Packet loss = 100%
[13:06:54] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:09:01] PROBLEM - Host srv221 is DOWN: PING CRITICAL - Packet loss = 100%
[13:15:09] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:18:18] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time
[14:26:51] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[15:14:55] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours
[15:20:51] PROBLEM - Puppet freshness on db42 is CRITICAL: Puppet has not run in the last 10 hours
[15:51:54] PROBLEM - Puppet freshness on ms-fe1 is CRITICAL: Puppet has not run in the last 10 hours
[16:11:54] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours
[16:41:51] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[17:55:11] !log updated otrs motd per requst by otrs admins (it referenced a now closed request for comment)
[17:55:26] Logged the message, Master
[18:03:55] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[18:03:55] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[18:21:23] !log powercycling srv221 & srv222, both dead since ~5h ago
[18:21:36] Logged the message, Master
[18:23:24] RECOVERY - Host srv221 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[18:23:51] RECOVERY - Host srv222 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[18:23:58] it rose from the dead ;)
[18:25:12] PROBLEM - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is CRITICAL: Connection refused
[18:25:28] uhhh
[18:25:29] paravoid
[18:25:37] I know
[18:25:37] it's going to fix itself
[18:26:20] now I get a page?
[18:26:20] as soon as "mw sync" finishes
[18:26:23] * apergos did the backread
[18:26:28] yes. ignore it, I'm on it
[18:26:42] ok thanks
[18:26:56] if rsync ever finishes grr
[18:27:03] ah
[18:27:10] so i'm off the hook? ;)
[18:27:12] heh, there he is
[18:27:20] yes
[18:27:25] ok :)
[18:27:26] we have pybal's depool threshold too high
[18:27:31] it refuses to depool two servers
[18:27:38] it depooled one and we had two down
[18:28:00] the second one never got depooled and so you have one in six chances to have it fail
[18:28:08] i see
[18:28:17] I powercycled the boxes and I've been running puppet on them so that apache can finally start
[18:28:36] there we are, we should get the recovery soon now
[18:28:43] RECOVERY - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 58578 bytes in 0.667 seconds
[18:28:43] :)
[18:28:57] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.036 second response time
[18:48:09] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:51:18] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.002 second response time on port 11000
[20:32:51] PROBLEM - Host srv224 is DOWN: PING CRITICAL - Packet loss = 100%
[20:54:45] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[21:44:32] New patchset: CSteipp; "Add wikivoyage as suffix for wgConf" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/28862
[21:46:48] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:48:27] RECOVERY - Memcached on virt0 is OK: TCP OK - 8.999 second response time on port 11000
[22:01:48] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[22:27:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[22:27:54] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[22:51:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:53:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.278 seconds
[23:02:51] PROBLEM - Puppet freshness on cp1040 is CRITICAL: Puppet has not run in the last 10 hours
[23:21:36] PROBLEM - Memcached on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:23:08] RECOVERY - Memcached on virt0 is OK: TCP OK - 9.003 second response time on port 11000
[23:26:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:34:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.108 seconds
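The depool-threshold behaviour discussed at 18:27–18:28 (pybal depooled one dead rendering server but refused to depool the second, leaving a one-in-six failure rate) can be sketched as follows. This is a simplified illustrative model, not PyBal's actual implementation; the pool size of six and the 5/6 threshold are assumptions inferred from the "one in six" remark.

```python
# Simplified model of a PyBal-style depool threshold (illustrative only;
# the pool size and threshold value are assumptions, not production config).

def can_depool(pool_size: int, already_depooled: int, threshold: float) -> bool:
    """Allow a depool only if the pooled fraction stays >= threshold."""
    remaining = pool_size - already_depooled - 1
    return remaining / pool_size >= threshold

# With 6 rendering servers and a threshold of 5/6, the first dead server
# can be depooled, but depooling a second would drop below the threshold:
print(can_depool(6, 0, 5 / 6))  # True
print(can_depool(6, 1, 5 / 6))  # False

# The still-pooled dead server keeps receiving its share of traffic,
# so roughly 1 request in 6 hits it and fails:
print(f"{1 / 6:.1%}")  # 16.7%
```

The threshold exists to stop a flapping health check from depooling an entire cluster; the trade-off seen here is that a conservative setting can keep a genuinely dead server in rotation.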