[00:00:06] Krinkle, Ryan_Lane: i think that's a reasonable possible explanation for the slowness you saw (if it happens again, you can probably confirm with the 'show-queue' commands and other diagnostics) [00:00:08] yeah, would be nice to have ^demon here. he's the one mostly managing this [00:01:24] Krinkle: cool, thanks, i see the archives:) [00:02:57] Krinkle, Ryan_Lane: https://github.com/openstack-infra/config/blob/master/modules/openstack_project/manifests/review.pp [00:03:02] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 28 seconds [00:03:12] Krinkle, Ryan_Lane: (maybe pass that on to ^demon, or have him ping me next time he's around) [00:03:18] * Ryan_Lane nods [00:03:34] I'm reading up. I'll likely increase it now [00:03:34] Krinkle, Ryan_Lane: (maybe pass that on to ^demon, or have him ping me next time he's around) [00:03:37] Krinkle, Ryan_Lane: we had some growing pains with our gerrit, and eventually found those google groups links in the comments [00:03:43] I doubt anyone has touched that since I initial set it up [00:03:52] *initially [00:03:57] Krinkle, Ryan_Lane: which i think are the secret tuning documentation. 
:/ [00:04:03] heh [00:04:16] fwiw, zuul already has enabled debug.log since the last (fixed) issue [00:04:53] Krinkle, Ryan_Lane: so anyway, the URLs in that file should help explain some of the params, and then there's our values, plus some (pretty poor) notes [00:05:11] thanks [00:07:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:07:59] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:07:55 UTC 2013 [00:08:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:08] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:09:02 UTC 2013 [00:09:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:09] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:10:04 UTC 2013 [00:10:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:08] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:10:59 UTC 2013 [00:11:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:49] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:11:47 UTC 2013 [00:12:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:08] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 00:13:06 UTC 2013 [00:13:49] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [00:14:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 217 seconds [00:18:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [00:22:24] Krinkle: Do you know why PhantomJS is timing out on REL1_20?
[00:22:41] csteipp: It isn't timing out, Apache is serving HTTP 500 [00:22:55] csteipp: https://bugzilla.wikimedia.org/show_bug.cgi?id=47639 [00:22:58] Ah, cool. [00:22:59] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [00:23:24] as a result, there is no QUnit on the page (or anything on the page for that matter) so the grunt-phantomjs is timing out [00:23:56] csteipp: looks like there is a bug in < 1.21 that causes sqlite permissions errors [00:23:59] 1.21 and master are fine [00:24:23] It is only happening to qunit and not phpunit because phpunit is testing from command line, bypassing Apache. [00:24:29] it isn't a bug with qunit though [00:25:16] That makes sense, I hadn't seen that bug. I'll leave you and hasher to it. Thanks! [00:33:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [00:34:59] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 seconds [00:45:58] !log tstarling synchronized wmf-config/PrivateSettings.php [00:46:06] Logged the message, Master [00:48:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 217 seconds [00:49:20] The Nagios page said http://nagios.wikimedia.org "is currently an alias to [[spence]]". Is that still true for its replacement https://icinga.wikimedia.org/ ? 
[00:53:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [00:58:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [00:59:23] spagewmf: I think icinga runs on neon [00:59:36] I noticed that the Parsoid cluster is being hit by both [00:59:56] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 21 seconds [01:06:41] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [01:16:28] Published patchset: Ryan Lane; "Firewall redis from the outside world on vumi" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61732 [01:16:30] Published patchset: Ryan Lane; "Firewall off redis on stat1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61735 [01:16:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61732 [01:16:59] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61735 [01:20:02] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 188 seconds [01:24:01] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [01:25:01] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds [01:26:20] Published patchset: Tim Starling; "Use a password for session redis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61734 [01:29:01] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [01:29:44] New review: Aaron Schulz; "Don't forget $wgJobQueueAggregator (which could also be switched to use the actual queue server inst..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61734 [01:30:02] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [01:37:55] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [01:37:55] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [01:37:55] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [01:37:55] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [01:38:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [01:43:56] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 16 seconds [01:48:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [01:50:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 28 seconds [01:53:33] New patchset: Tim Starling; "Use a password for job queue and session redis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61734 [01:54:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 214 seconds [01:58:26] New patchset: Tim Starling; "Require a password for Redis" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61740 [01:58:55] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [02:02:43] New patchset: Aaron Schulz; "Turned autoResync back on again for multiwrite backend." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61741 [02:02:56] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 28 seconds [02:05:10] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61741 [02:07:53] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [02:09:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [02:09:08] !log aaron synchronized wmf-config/filebackend.php 'Turned autoResync back on again for multiwrite backend' [02:09:16] Logged the message, Master [02:14:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [02:16:40] !log LocalisationUpdate completed (1.22wmf2) at Wed May 1 02:16:40 UTC 2013 [02:16:49] Logged the message, Master [02:17:23] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [02:18:03] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [02:25:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 197 seconds [02:27:30] !log LocalisationUpdate completed (1.22wmf3) at Wed May 1 02:27:30 UTC 2013 [02:27:38] Logged the message, Master [02:29:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [02:31:04] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [02:43:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 185 seconds [02:49:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [02:56:27] New patchset: Tim Landscheidt; "Preliminary toollabs module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59969 [02:58:56] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [03:02:16] PROBLEM - RAID on analytics1017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL 
handshake. [03:02:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 28 seconds [03:05:48] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [03:13:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [03:18:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [03:23:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [03:29:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 220 seconds [03:35:59] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [03:49:56] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 1 03:49:55 UTC 2013 [03:50:00] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds [03:50:04] Logged the message, Master [03:53:00] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 22 seconds [03:59:00] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [04:00:00] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds [04:06:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:07:59] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:07:53 UTC 2013 [04:08:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:08] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:08:59 UTC 2013 [04:09:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:08] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:10:02 UTC 2013 [04:10:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:59] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:10:57 UTC 2013 
[04:11:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:48] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:11:47 UTC 2013 [04:12:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:12:38] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:12:29 UTC 2013 [04:13:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:39] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 04:13:35 UTC 2013 [04:14:18] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [04:18:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [04:21:59] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 10 seconds [04:27:10] New patchset: Faidon; "Add an initial ferm module & base::firewall class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61744 [04:28:59] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 218 seconds [04:34:28] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [04:38:03] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [04:44:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [04:47:03] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 23 seconds [04:57:06] New patchset: Tim Starling; "Move logmsgbot from fenari to neon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61746 [04:59:03] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [05:02:02] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [05:02:29] TimStarling: logmsgbot is not registered. 
[05:04:09] yeah, I guess we should register it [05:04:21] New review: Tim Starling; "I confirmed with strace that nc -q0 does an orderly shutdown:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61746 [05:05:03] is it possible to convince gerrit that not every long number is a partial hash? [05:05:07] https://bugzilla.wikimedia.org/show_bug.cgi?id=22313 [05:05:12] "Register nickserv account for logmsgbot" [05:05:31] funny [05:05:45] https://bugzilla.wikimedia.org/show_bug.cgi?id=45780 "Gerrit mangles Wikipedia permalinks with anchor" [05:05:50] 6 comments and nobody actually does it [05:06:15] At least Gerrit didn't mangle your comment like it does with URLs. [05:06:26] You can add preceding 0s as a workaround. [05:06:31] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [05:06:33] Though you usually need like 30 of them. [05:08:15] gerrit-wm is also unregistered. [05:08:27] I think registering an in-use nick is difficult/impossible. [05:11:30] Heh, I thought of netcat the other day when we were discussing UDP, XMPP, etc. [05:14:02] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 233 seconds [05:14:04] Ryan_Lane, re stat1 lockdown & firewall, will your change affect UDP events that vumi sends to analytics?
[05:14:19] Ryan_Lane, sorry, not stat1, zhen & silver [05:18:04] New patchset: Tim Starling; "Move logmsgbot from fenari to neon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61746 [05:20:02] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 13 seconds [05:24:00] TimStarling: https://gerrit.wikimedia.org/r/#/c/61627/ adds a README to the tcpircbot module (..making good on my threat to extend it) that explains more clearly how it is parametrized [05:38:15] logmsgbot: /msg NickServ SET PRIVATE ON [05:43:58] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 232 seconds [05:44:58] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 24 seconds [05:59:19] yurik: did you look at the change? [05:59:41] yurik: because all it does is block the redis port outside of our network [06:00:25] Ryan_Lane, there was no link in the email, wasn't sure of the best place to check. Cool, sounds like no issues there. [06:01:15] "I've firewalled these to 208.80.152.0/22" ;) [06:01:31] Ryan_Lane, my point exactly! [06:02:05] too cryptic for a casual observer that notices that it has to do with vumi that i will have to support (at least in part) [06:02:16] that's cryptic? [06:02:42] sigh - i wasn't sure what exactly you blocked. UDP traffic does not go through the firewalls too well. [06:03:11] hence it was easier to ask than to assume everything is in order [06:03:15] heh [06:03:28] well, I thought the thread would provide enough context as to what it meant [06:04:01] * Ryan_Lane shrugs [06:04:05] no issues asking, though [06:04:23] Ryan_Lane, some day it will be enough for me. I hope.
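(Editor's note on the "firewalled these to 208.80.152.0/22" remark above: a minimal sketch, assuming only Python's standard ipaddress module, of what that CIDR range covers. The helper name is_allowed and the sample addresses are hypothetical and chosen purely for illustration; nothing here reflects the actual firewall rules from the change under discussion.)

```python
import ipaddress

# The range quoted in the channel. A /22 leaves 10 host bits, so it spans
# 208.80.152.0 through 208.80.155.255 (1024 addresses).
ALLOWED_NET = ipaddress.ip_network("208.80.152.0/22")


def is_allowed(addr: str) -> bool:
    """Return True if addr falls inside the quoted range (hypothetical helper)."""
    return ipaddress.ip_address(addr) in ALLOWED_NET


if __name__ == "__main__":
    print("first:", ALLOWED_NET[0], "last:", ALLOWED_NET[-1])
    # Sample addresses are made up for illustration.
    for sample in ("208.80.154.10", "208.80.156.1"):
        print(sample, "inside range" if is_allowed(sample) else "outside range")
```

(A source address outside that range would be the kind of traffic the redis port is closed to, per the discussion; whether a rule like this also touches UDP traffic depends on the rule's protocol match, which this sketch does not model.)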
[06:04:42] :) [06:06:04] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [06:12:14] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [06:23:44] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 186 seconds [06:24:44] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 5 seconds [06:26:44] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 06:26:37 UTC 2013 [06:27:04] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [06:27:24] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 06:27:21 UTC 2013 [06:28:04] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [06:28:05] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 06:27:57 UTC 2013 [06:29:04] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [06:45:27] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [06:45:27] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [07:08:15] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [07:38:24] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:54] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [08:07:45] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 08:07:43 UTC 2013 [08:07:54] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:25] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 08:08:22 UTC 2013 [08:08:54] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:04] RECOVERY - Puppet freshness on db10 is 
OK: puppet ran at Wed May 1 08:08:57 UTC 2013 [08:09:54] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:04] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 08:12:54 UTC 2013 [08:13:54] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:04] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:45] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 214 seconds [08:24:44] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 22 seconds [08:59:43] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 203 seconds [09:01:43] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [09:05:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [09:13:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 198 seconds [09:14:41] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [09:37:30] g/win 5 [10:05:46] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [10:08:21] !log Removed old Toolserver IP range 91.198.174.192/27 on primary and backup router interfaces [10:08:29] Logged the message, Master [10:30:05] !log reedy synchronized php-1.22wmf2/extensions/ParserFunctions/ 'Update to master to fix common fatals' [10:30:13] Logged the message, Master [10:31:19] professor.pmtpa.wmnet [10.0.6.30] 2003 (cfinger) : Connection timed out [10:32:01] Reedy: that's not how you do it. 
You have to do it like this: [10:32:24] PROBLEM - Connectivity on professor is CRITICAL: CRIT Connection timed out [10:32:42] the repetition of CRITICAL / CRIT is essential [10:33:01] but then as the tension is almost unbearable you have to transition rapidly to: [10:33:40] RECOVERY - Connectivity on professor is OK: OK Ping reply 23 ms [10:34:04] now practice [10:36:34] New patchset: Reedy; "Update php symlink to 1.22wmf2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61762 [10:36:43] ori-l: I think it's bedtime [10:36:51] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61762 [10:37:15] thereabouts, yeah [11:06:01] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [11:30:52] New patchset: ArielGlenn; "script to shovel in pages-logging xml file into logging table" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/61765 [11:31:17] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/61765 [11:32:41] aww [11:38:01] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [11:38:01] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [11:38:02] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [11:38:02] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [12:06:26] New patchset: Physikerwelt; "Intial version of puppet script for LaTeXML" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61767 [12:07:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:01] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:08:55 UTC 2013 [12:09:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:01] RECOVERY - 
Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:09:58 UTC 2013 [12:10:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:01] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:10:57 UTC 2013 [12:11:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:01] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:11:51 UTC 2013 [12:12:50] New review: Physikerwelt; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61767 [12:12:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:21] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:13:13 UTC 2013 [12:13:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:51] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 12:13:47 UTC 2013 [12:14:51] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [12:17:51] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [12:26:32] RECOVERY - search indices - check lucene status page on search1017 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.024 second response time [12:31:25] New patchset: ArielGlenn; "debian packaging for php utfnormal extension" [operations/debs/utfnormal] (master) - https://gerrit.wikimedia.org/r/61768 [12:47:16] PROBLEM - SSH on gadolinium is CRITICAL: Server answer: [12:48:16] RECOVERY - SSH on gadolinium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [12:48:37] PROBLEM - SSH on caesium is CRITICAL: Server answer: [12:49:37] RECOVERY - SSH on caesium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:04:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [13:32:30] Change merged: 
ArielGlenn; [operations/debs/utfnormal] (master) - https://gerrit.wikimedia.org/r/61768 [13:39:01] New patchset: ArielGlenn; "add .gitignore, .gitreview files" [operations/debs/utfnormal] (master) - https://gerrit.wikimedia.org/r/61772 [13:40:41] Change merged: ArielGlenn; [operations/debs/utfnormal] (master) - https://gerrit.wikimedia.org/r/61772 [14:02:57] New patchset: Reedy; "(bug 47418) Babel configuration for th.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60409 [14:03:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60409 [14:03:13] New patchset: Reedy; "$wgNamespaceRobotPolicies for dewiki: Add 101 and 829" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60445 [14:03:32] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60445 [14:05:43] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [14:07:26] !log reedy synchronized wmf-config/InitialiseSettings.php [14:07:35] Logged the message, Master [14:07:45] mw1182: ssh: Could not resolve hostname mw1182: Name or service not known [14:11:12] Reedy: i will take a look at mw1182 [14:12:20] New patchset: Reedy; "(bug 45979) Set $wgCategoryCollation to 'uca-vi' on all Vietnamese-language wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60705 [14:12:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60705 [14:13:14] !log reedy synchronized wmf-config/InitialiseSettings.php [14:13:22] Logged the message, Master [14:34:41] PROBLEM - Puppet freshness on cp1031 is CRITICAL: No successful Puppet run in the last 10 hours [14:39:24] New patchset: ArielGlenn; "have rest of eqiad snapshot hosts use standard snap partition recipe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61778 [14:41:52] Change merged: ArielGlenn; [operations/puppet] 
(production) - https://gerrit.wikimedia.org/r/61778 [14:54:47] New review: Hydriz; "(1 comment)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/61428 [15:05:02] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [15:13:18] New patchset: Demon; "General code cleanup" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/60444 [15:18:03] Ryan_Lane: when you have a few minutes, puppet-merge is claiming it can't ff and it's showing a different commit id for https://gerrit.wikimedia.org/r/#/c/61735/ than gerrit knows.. any ideas? [15:19:33] Change merged: jenkins-bot; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/60444 [15:22:03] New patchset: Burthsceh; "(bug 47933) Rename Module talk namespace for Japanese Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61782 [15:53:42] !log Converted link between cr2-knams:xe-1/0/0 and csw2-esams:xe-0/1/1 to LACP aggregated link cr2-knams:ae1 / csw2-esams:ae2 [15:53:50] Logged the message, Master [15:57:06] apergos: I cherry-picked some commits in yesterday [15:57:18] eurgh [15:57:19] git reset --hard origin/master [15:57:24] err [15:57:28] origin/production [15:57:39] let me see how that does [15:57:50] (yeah, too bad our branch isn't master, eh? ) [15:58:56] did you already do this? I see my change showing up in git log [15:59:13] no, I didn't [15:59:53] hrm [16:00:21] well puppet-merge tells me there's nothing to merge now so... [16:00:55] thanks? 
:-D [16:04:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:05:50] midas touch [16:08:06] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:07:56 UTC 2013 [16:08:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:06] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:09:02 UTC 2013 [16:09:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:06] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:10:05 UTC 2013 [16:10:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:06] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:11:01 UTC 2013 [16:11:27] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:56] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:11:49 UTC 2013 [16:12:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:36] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:12:31 UTC 2013 [16:13:06] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [16:13:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:13:47] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 16:13:40 UTC 2013 [16:14:26] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [16:29:26] New patchset: Ori.livneh; "Lint mediawiki_singlenode module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61787 [16:46:26] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [16:46:27] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 
hours [16:48:11] New patchset: Danny B.; "cswiktionary: Set display title restriction" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61789 [16:53:10] !log torrus deadlocked, fixin [16:53:18] Logged the message, RobH [16:56:57] ottomata, I'm a bit uneasy about the inbetween state that puppetmaster::self is in right now. Are you mostly done with your dev work there and it's just a question of updating the docs? [17:00:56] heya LeslieCarr, you there? [17:01:03] yes [17:01:08] !log torrus compiling, manutilus apache offline [17:01:15] Logged the message, RobH [17:01:34] we want to move the udp2log multicast relay that is currently running on oxygen over to gadolinium so we can upgrade oxygen without losing logs [17:01:57] there shouldn't be a problem with running 2 multicast relays on the same multicast group at once, should there? [17:02:25] i guess the question is: do you know if there are any problems if there are multiple senders of multicast traffic? [17:06:20] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [17:07:04] LeslieCarr: ^^ :) [17:11:11] should not be [17:11:11] nope [17:11:11] well data could be double received by the receivers [17:13:11] New patchset: Lcarr; "latest release is wm10 not wm8 in our apt repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61793 [17:13:15] New review: Reedy; "Why are you changing quotes at the same time and moving stuff around?"
[operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/61782 [17:14:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61793 [17:15:28] hmm, nooo, i think it wouldn't [17:15:31] LeslieCarr: [17:15:36] because each log will only be sent to a single relay [17:15:39] oh okay [17:15:40] plan: [17:15:40] then yeah [17:15:48] i was thinking the plan was it could be sent out by both at once [17:15:49] set up multicast relay on gadolinium [17:16:02] then deploy config changes to frontends to use gadolinium IP instead of oxygen IP [17:16:08] once we are sure that has all rolled out [17:16:12] then we can stop oxygen relay [17:19:48] !log puppet working on bits caches again after gerrit change 61793 [17:19:54] cool [17:19:57] Logged the message, Mistress of the network gear. [17:20:07] New review: Andrew Bogott; "I tested this and it works fine. I have a couple of other minor patches to this module pending; I'l..." [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/61787 [17:20:08] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61787 [17:24:34] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61627 [17:33:02] !log kaldari synchronized php-1.22wmf2/extensions/Echo/Hooks.php 'syncing Hooks in Echo' [17:33:09] Logged the message, Master [17:35:03] Reedy, re https://bugzilla.wikimedia.org/show_bug.cgi?id=47945 I created the RT ticket for the DNS entry, for the apache redirect one (that you created last year: https://bugzilla.wikimedia.org/show_bug.cgi?id=36477#c3 ) - is that just a date change? [17:38:00] New patchset: BBlack; "Work-In-Progress vhtcpd code."
[operations/software/varnish/vhtcpd] (master) - https://gerrit.wikimedia.org/r/60390 [17:38:53] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [17:38:57] * Reedy looks blankly [17:39:00] Hey Krinkle, it looks like PhantomJS is still broken for REL1_20.. is there a way to disable that? Or make it non voting so we can get the security release merged today? [17:39:26] ohai Reedy :) [17:39:37] csteipp: open Special:JavaScriptTest/qunit in your browser (having checked out the gerrit change) and verify it runs [17:39:39] (I forgot hashar was out today on vacation, otherwise I'd be bugging him) [17:39:41] then just override the score and merge it [17:45:43] New review: Faidon; "Yes, it'd be great to reuse existing packages instead of embedding dozens of jars in the same packag..." [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [17:59:26] Probably needs to be announced first [17:59:43] And there's 2 to create currently [17:59:53] !log kaldari synchronized php-1.22wmf3/extensions/Echo/Hooks.php 'syncing Hooks in Echo' [17:59:55] Logged the message, Master [18:00:06] Reedy, yeah it was announced :) [18:00:09] re https://gerrit.wikimedia.org/r/61805 # Uploads are offsite on the previous one RewriteCond %{SERVER_ADDR} !^211\.115\.107 [18:00:09] was added? [18:00:24] What? 
[18:00:33] I assume Thehelpfulone means this: http://lists.wikimedia.org/pipermail/wikimedia-l/2013-May/125659.html [18:01:08] I don't [18:01:16] oh never mind Reedy, in the RT ticket you created for wikimania2013wiki, https://rt.wikimedia.org/Ticket/Display.html?id=2904 that pastebin you linked to had the line " RewriteCond %{SERVER_ADDR} !^211\.115\.107" but I see it's not there on the live version [18:01:18] New patchset: Reedy; "Add virtualhost for wikimania2014wiki" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61804 [18:01:18] * greg-g steps out of it [18:03:51] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: closed, private and fishbowl to 1.22wmf3 [18:03:59] Logged the message, Master [18:07:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [18:08:00] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: special and wikimedia to 1.22wmf3 [18:08:07] Logged the message, Master [18:09:25] * Reedy kicks professor [18:11:18] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikinews and wikiquote to 1.22wmf3 [18:11:25] Logged the message, Master [18:13:29] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikisource, wikivoyage and wikiversity to 1.22wmf3 [18:13:38] Logged the message, Master [18:15:33] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikibooks and wiktionary to 1.22wmf3 [18:15:42] Logged the message, Master [18:16:03] https://gerrit.wikimedia.org/r/#/c/61807/ [18:16:12] Reedy: whenever ready for wikidata ^ [18:16:14] New patchset: Reedy; "Everything bar open 'pedia projects over to 1.22wmf3" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61808 [18:16:35] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61808 [18:18:43] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in 
the last 10 hours [18:19:47] !log reedy synchronized php-1.22wmf3/extensions/ 'Update extensions for wikibase deploy' [18:19:54] Logged the message, Master [18:20:18] Lots of class not found fatals [18:20:23] I wonder if they are just transitionary [18:20:26] !log owa1/2/3 offlinging and wiping [18:20:35] Logged the message, RobH [18:20:57] I'm going with yes [18:23:04] Reedy: we need localisation cache updated :/ [18:23:06] not urgent [18:23:16] otherwise things seem fine [18:27:14] New patchset: Andrew Bogott; "Chown the 'cache' and 'image' dirs to www-data." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60927 [18:27:30] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60927 [18:28:19] andrewbogott: last parameter should have a trailing comma, not a semicolon. allows you to rearrange lines freely. [18:28:47] and use single quotes around literal strings that do not interpolate variables :P [18:29:41] Huh, I thought I did that last one. [18:29:46] Is there an automatic linting tool for this? [18:30:04] puppet-lint picks up most [18:30:14] http://puppet-lint.com/ [18:31:11] !log disabled owa1-3 switchports, starting remote wipe [18:31:18] Logged the message, RobH [18:32:15] !log DNS update - add wikimania2014 entries [18:32:22] Logged the message, Master [18:32:28] heh [18:34:08] thanks mutante [18:36:02] !log LocalisationUpdate completed (1.22wmf3) at Wed May 1 18:36:02 UTC 2013 [18:36:09] Logged the message, Master [18:36:39] aude: ^ Out of interest has that LU fixed it? [18:37:54] New patchset: Andrew Bogott; "Grab-bag of mediawiki_singlenode tuneups." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61811 [18:38:35] ori-l: ^ [18:39:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 224 seconds [18:39:48] New review: Ori.livneh; "LGTM; nice change." 
[operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/61811 [18:40:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [18:40:40] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61811 [18:40:48] New patchset: Ottomata; "Setting up webrequest multicast and eventlogging on gadolinium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61812 [18:42:05] Reedy: checking [18:42:36] still see "wbcreateclaim-create:2" [18:42:52] http://www.wikidata.org/wiki/Special:Contributions/Aude for example [18:42:55] That's a no then ;) [18:44:11] right [18:46:31] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61812 [18:47:47] New patchset: Ottomata; "Fixing typo in upstart conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61813 [18:48:02] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61813 [18:48:50] hey i'm about to merge someone's mediwiki singlenode changes on sockpuppet [18:48:54] andrewbogott: ? [18:48:56] s'ok? [18:49:13] ottomata: yep, go ahead. [18:49:20] k thanksdone [18:49:21] I think that's maybe all of them, even. [18:50:25] New patchset: Jgreen; "deploy db1013 as temporary replacement for db1025" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61814 [18:54:43] Change abandoned: Andrew Bogott; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51799 [18:58:01] mutante: can I talk to you? [18:58:26] odder: what's up [18:58:36] OS [18:58:51] ;) [19:04:34] hmm. review in progress forever... [19:04:59] !log reedy Started syncing Wikimedia installation... 
: rebuild l10n cache for wikidata [19:05:08] Logged the message, Master [19:06:36] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [19:08:15] New review: Hashar; "I am not sure why there is a hadoop::defaults class. Seems all variables defined there are simply pa..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/61710 [19:14:59] !log DNS update - remove project2, anthony and yongle :p [19:15:05] Logged the message, Master [19:16:27] New patchset: Ryan Lane; "deploy db1013 as temporary replacement for db1025" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61814 [19:17:04] New patchset: Ori.livneh; "Improvements to mediawiki_singlenode" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61816 [19:17:13] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61814 [19:17:13] ^ andrewbogott [19:18:55] ori-l, did you test this one or would you like me to? [19:19:14] could you? [19:20:11] Sure. What's the story with 'rewrite'? [19:21:00] mod_rewrite [19:21:27] what about it? [19:21:31] I'm about to die of boredom [19:21:46] Reedy: want me to redeploy E3Experiments? [19:24:28] ori-l: Just, you added it, wondering if it's needed. (Or maybe you didn't add it and I'm confused) [19:24:44] New patchset: Andrew Bogott; "First pass at a labsconsole puppet setup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53989 [19:24:44] New patchset: Andrew Bogott; "Switch the openstack manifest to use webserver::php5." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/51798 [19:24:56] i added a rewrite directive to the apache config [19:25:02] New review: Andrew Bogott; "Still WIP" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/53989 [19:25:15] ok [19:25:28] you could probably do it using mod_alias alone, but this happens to be a combo i tested quite a lot on vagrant [19:26:20] we use mod_rewrite in prod quite extensively http://noc.wikimedia.org/conf/highlight.php?file=main.conf [19:27:41] you could certainly swing the other way and change the alias directives to rewrite directives [19:27:50] since mod_rewrite's functionality is a superset of mod_alias [19:28:31] but i was happy to get something that was exact and correct so i stuck with that combo [19:29:36] "[Wmfsf] Issues with the air conditioner" meanwhile it is snowing again in MN [19:30:03] o_O Snow in MN? [19:30:15] It's full of sunny heat up here in Montreal. [19:31:13] Coren, it was in the 70s yesterday but, yes, snow today. [19:31:58] Ze weather, she is cray-zee! 
[19:32:19] looks like the localisation update is done [19:33:00] it's probably on most servers [19:33:06] not actually returned the console yet [19:33:31] * aude waits [19:34:18] * robla waves at Reedy [19:37:58] ori-l, Error 400 on SERVER: Must pass name to Apache_module[rewrite] [19:38:32] andrewbogott: i'll amend the patch [19:39:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 223 seconds [19:40:24] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [19:43:04] New patchset: Ori.livneh; "Improvements to mediawiki_singlenode" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61816 [19:44:01] ^ andrewbogott [19:44:45] !log reedy synchronized php-1.22wmf3/cache/l10n/ [19:44:53] Logged the message, Master [19:47:03] PROBLEM - Host db1013 is DOWN: PING CRITICAL - Packet loss = 100% [19:49:23] RECOVERY - Host db1013 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms [19:50:14] ori-l, now it's all 'Error 400 on SERVER: Cannot reassign variable name on node' [19:50:20] Not sure what that's about [19:50:38] i should just stop being a hipster and pass it explicitly [19:50:41] i'll update the patch, hang on [19:52:36] oh, fun [19:52:46] apache_module is FUBAR but used extensively [19:52:59] i won't fix it now tho, just work around it [19:53:24] this is why you see apache_module { 'slightly_different_name_than_module_name': name => 'module_name', } [19:55:10] New patchset: Ori.livneh; "Improvements to mediawiki_singlenode" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61816 [19:55:21] ^ andrewbogott [19:59:27] New patchset: Ottomata; "Fixing apache logging for metrics.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61822 [19:59:37] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61822 [20:02:01] ori-l: I think you'll need to hack on this directly. I'm using the system mwreview-dev1. 
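(Annotation on the apache_module workaround quoted at 19:53. The one-line form in the log is easier to read when laid out as a normal resource declaration. This is only a sketch: apache_module is a define from the operations/puppet repository, and both strings below are the placeholders used in the chat line, not real module names.)

```puppet
# Sketch only. The title and the name value are the placeholders from the
# 19:53 chat line; substitute the real Apache module name when using this.
apache_module { 'slightly_different_name_than_module_name':
  name => 'module_name',
}
```

(The resource title differs from the explicit name parameter, which is the pattern described in the log as the way to work around the errors seen at 19:37 and 19:50.)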
[20:02:02] http://mwreview-dev1.instance-proxy.wmflabs.org/wiki/Main_Page [20:02:12] I'll add you to the project [20:02:44] k [20:05:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:07:24] PROBLEM - mysqld processes on db1013 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:07:53] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:07:48 UTC 2013 [20:08:24] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 199 seconds [20:08:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:08:53] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:08:48 UTC 2013 [20:09:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:43] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:09:38 UTC 2013 [20:10:23] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 19 seconds [20:10:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:43] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:10:33 UTC 2013 [20:11:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:53] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:11:50 UTC 2013 [20:12:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:33] RECOVERY - Puppet freshness on db10 is OK: puppet ran at Wed May 1 20:12:32 UTC 2013 [20:13:33] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [20:20:38] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61424 [20:22:11] Ryan_Lane, is the morebot running in this channel puppetized? 
I'm wondering what to do about https://gerrit.wikimedia.org/r/#/c/58922/1 [20:30:14] andrewbogott: nope [20:30:18] andrewbogott: it's packaged [20:30:30] OK… I'll just remove the stuff in that patch then. [20:31:00] ok [20:39:53] I'm pushing out a Parsoid update, please ignore Parsoid-related alerts for the next minutes [20:40:33] New patchset: Ori.livneh; "Improvements to mediawiki_singlenode" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61816 [20:40:50] andrewbogott: ^ [20:41:34] and done [20:49:44] !log adding new wikipediazero.org zone alias, running authdns-update on dobson [20:49:52] Logged the message, Master [20:59:07] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61889 [21:01:00] Reedy, robla -- you guys done with your deploy window? [21:02:44] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61891 [21:03:07] oh; ya... like an hour ago [21:03:38] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:05:19] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61892 [21:05:37] New patchset: Ottomata; "Redirecting wikipediazero.org to http://wikimediafoundation.org/wiki/Wikipedia_Zero" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61893 [21:06:09] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [21:07:29] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61892 [21:08:07] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:09:33] hashar, what version of pep8 is jenkins using? 
[21:12:08] New review: Aaron Schulz; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61425 [21:12:37] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:13:22] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61892 [21:13:34] !log stopping mysql on db78 for cloning [21:13:42] Logged the message, Master [21:14:44] New review: Hashar; "wooorks for me :)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/61892 [21:14:49] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:16:30] New patchset: Dzahn; "Redirecting wikipediazero.org to http://wikimediafoundation.org/wiki/Wikipedia_Zero" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61893 [21:16:49] PROBLEM - mysqld processes on db78 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:18:44] Change merged: Ottomata; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61893 [21:18:55] New review: Hashar; "(1 comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/61891 [21:19:02] New review: Dzahn; "dzahn@fenari:~$ apache-fast-test wikipediazero.org mw1044" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61893 [21:21:34] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61891 [21:24:21] ori-l: I ran your latest patch fresh on mwreview-dev2. 
Changes weren't picked up until I restarted apache2 by hand; also, paths still seem wrong: http://mwreview-dev2.pmtpa.wmflabs/wiki/Main_Page [21:25:13] !log syncing new redirects.conf to apachces to redirect wikipediazero.org [21:25:26] andrewbogott: does puppet manage orig/LocalSettings.php, or write it once and then leave it unmanaged? [21:25:50] It doesn't manage it -- it's generated by the MW init script. [21:26:19] right, but the latter is invoked with puppet vars as args [21:26:56] so the previous version that did not have labs_mediawiki_host or whatever that var is called would have generated an orig/LocalSettings.php that points to ${::hostname}.pmtpa.wmflabs [21:27:11] and then no subsequent fix would remove that, since puppet is not touching that file [21:27:25] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:28:05] New patchset: RobH; "creating racktables role for eqiad based server" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [21:29:31] ori-l: I build mwreview-dev2 from scratch, then built with your patch. [21:29:35] *built [21:29:44] Or, tried to at least. [21:31:16] dzahn is doing a graceful restart of all apaches [21:31:43] can you add a notice($labs_mediawiki_hostname) to the file? 
[21:31:55] let me look at the instance for a moment [21:32:02] !log dzahn gracefulled all apaches [21:32:09] Logged the message, Master [21:32:12] New patchset: Andrew Bogott; "Pep8 cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61889 [21:34:01] andrewbogott: I see what's going on, I think [21:34:19] give me a few [21:34:43] actually, it's quite simple [21:34:50] scriptpath should be /w, as in prod [21:35:14] as opposed to this labsinstance.pmtpa.wmflabs/srv/mediawiki awfulness [21:36:36] New patchset: Ori.livneh; "Improvements to mediawiki_singlenode" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61816 [21:36:45] andrewbogott: can you try again with that ^? [21:37:31] ori-l: I'll start the process; I need to go in 5 though. [21:38:16] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [21:38:16] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [21:38:17] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [21:38:17] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [21:39:42] ok. which instance are you running it on? [21:43:03] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61805 [21:43:29] !log mwalker Started syncing Wikimedia installation... : Updating CentralNotice from a8797938bc for historical allocations and some refactoring [21:43:37] Logged the message, Master [21:44:00] ^ andrewbogott, before you run, can you identify the machine you're running this against? 
[21:45:12] just pulled mediawiki config to create new docroot [21:45:39] and there was one other change to multiversion/MWMultiVersion.php | 2 +- in it [21:49:47] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60766 [22:02:03] !log dzahn synchronized docroot [22:02:10] Logged the message, Master [22:05:28] New patchset: RobH; "magnesium to be racktables host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61904 [22:06:13] !log dzahn synchronized ./multiversion/MWMultiVersion.php [22:06:21] Logged the message, Master [22:06:40] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61904 [22:06:40] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [22:06:56] Banner::getMixins 10.64.16.13 1054 Unknown column 'mixin_name' in 'field list' (10.64.16.13) SELECT mixin_name FROM `cn_template_mixins` WHERE tmp_id = '4249' [22:07:11] awjr: anyone know what that is? [22:07:38] o_O [22:07:57] mwalker ^? [22:08:01] AaronSchulz: ah, saw you +2ed that.. https://gerrit.wikimedia.org/r/#/c/61424/1 fyi, i just synced that out because it wasn't deployed but merged [22:08:43] AaronSchulz: there was a patch in what I'm currently deploying that updates the schema -- but I thought we had already deployed it [22:08:49] working on fixing that [22:09:31] New patchset: RobH; "forgot to include webserver in class call" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61905 [22:11:59] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61905 [22:13:35] I'm going to deploy https://gerrit.wikimedia.org/r/#/c/61746/ now [22:13:39] !log mwalker Finished syncing Wikimedia installation... 
: Updating CentralNotice from a8797938bc for historical allocations and some refactoring [22:13:43] logmsgbot moving from fenari to neon [22:13:47] Logged the message, Master [22:15:09] New patchset: Tim Starling; "Move logmsgbot from fenari to neon" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61746 [22:15:24] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61746 [22:18:10] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:08] New patchset: RobH; "called install cert twice, oops" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61908 [22:20:00] I wonder how long a puppet run on neon takes [22:21:06] New review: Dzahn; "just like wikimania2013, obvious copy and i just synced the new docroot" [operations/apache-config] (master) C: 2; - https://gerrit.wikimedia.org/r/61804 [22:21:06] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/61804 [22:21:57] Change abandoned: RobH; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61908 [22:23:13] hmmm [22:23:14] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class passwords::logmsgbot for neon.wikimedia.org at /var/lib/git/operations/puppet/manifests/site.pp:1983 on node neon.wikimedia.org [22:24:03] never mind [22:37:36] New patchset: RobH; "lets try it this way" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61913 [22:40:10] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61913 [22:41:59] !log powercycling frozen mw1041 [22:42:07] Logged the message, Master [22:44:29] RECOVERY - Host mw1041 is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms [22:45:28] New patchset: RobH; "changing dependency chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61914 [22:46:08] syncing one more time so mw1041 also gets it 
[22:46:39] PROBLEM - Apache HTTP on mw1041 is CRITICAL: Connection refused [22:47:28] New patchset: Tim Starling; "Allow logmsgbot connections from anywhere" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61916 [22:49:22] hello [22:50:01] TimStarling: hi [22:50:13] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61916 [22:50:17] New patchset: RobH; "changing dependency chain" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61914 [22:51:37] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61914 [22:51:59] hello AaronSchulz [22:52:39] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time [23:04:19] RECOVERY - mysqld processes on db1013 is OK: PROCS OK: 1 process with command name mysqld [23:05:23] New patchset: Ori.livneh; "Allow multiple, comma-separated CIDR ranges to be specified" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61920 [23:05:30] ^ TimStarling [23:07:08] want to also submit the configuration change? [23:07:44] sure, ok [23:08:05] amend that patch or create another one? [23:08:35] i'll create another one to avoid colliding with you [23:08:59] thanks [23:09:14] * TimStarling is already on to the next thing [23:09:23] PROBLEM - Puppet freshness on db10 is CRITICAL: No successful Puppet run in the last 10 hours [23:10:48] binasher: I really wish those PumpkinSky lock errors had backtraces... [23:11:01] New patchset: Dzahn; "add secure.wikimedia.org (old SSL site) redirects to cluster." 
[operations/apache-config] (master) - https://gerrit.wikimedia.org/r/60934 [23:11:13] PROBLEM - HTTP on magnesium is CRITICAL: Connection refused [23:12:08] binasher: oh, and look at SpecialNewPagesFeed in the PageTriage ext (near the top of execute) [23:12:35] that's pretty "wow", I have to double check that :) [23:17:59] !log tstarling synchronized php-1.22wmf2/extensions/GettingStarted [23:18:07] Logged the message, Master [23:19:13] RECOVERY - HTTP on magnesium is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 575 bytes in 0.001 second response time [23:19:36] ori-l, I'm back… going to test on a new instance, mwreview-dev3 [23:19:57] is Special:GettingStarted meant to serve something other than a segfault? [23:20:28] * AaronSchulz likes Tim's style of blunt questions? [23:21:21] I'm reverting my change, maybe that'll fix it [23:21:31] superm401 and I are looking too [23:22:05] New patchset: Dzahn; "add secure.wikimedia.org (old SSL site) redirects to cluster." [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/60934 [23:22:35] yeah, it did fix it [23:22:48] working for me now [23:23:11] !log tstarling synchronized php-1.22wmf2/extensions/GettingStarted [23:23:19] Logged the message, Master [23:24:32] it's only on enwiki, test and test2? [23:24:58] test2 is wmf3, and enwiki is wmf2 [23:25:27] so I'll make wmf3 the new version then grab a backtrace for that segfault, if it is reproducible in test2 [23:26:19] unless you know why it happens already? [23:27:14] TimStarling: I think you changed the serializer [23:27:31] the default for phpredis is 0 (none) and with the pool it is 1 (php) [23:28:08] !log tstarling synchronized php-1.22wmf3/extensions/GettingStarted [23:28:14] how can you have no serializer? [23:28:16] Logged the message, Master [23:29:36] it just casts everything to a string?
[23:29:51] you wouldn't think that would make it segfault [23:30:28] it gives errors if you pass things that can't be seen as strings [23:30:39] though they should not be segfaults [23:31:15] having no serializer is what the redis queue uses, so ID strings don't get mangled with bracket/comma soup (which Lua would have to mess around with) [23:35:57] I have a core, but there's no debugging symbols on srv193 [23:36:31] that's just a prelude to you saying you don't actually need them, right? [23:37:49] x/8x $rsp might be loads of fun, but I do have some other things to do today [23:38:14] the PHP symbols were in another package [23:38:38] my son just got up so i was a bit distracted, sorry. it sounds like AaronSchulz identified the issue above? [23:38:51] or is it still mysterious? [23:39:25] I merely identified an issue [23:41:15] it is in RedisConnRef::_call [23:45:26] New patchset: Aaron Schulz; "Keep the GettingStarting redis objects uses no serialization." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61927 [23:45:48] it crashed while freeing arguments to __call() [23:46:10] maybe the extension decremented the reference count on one of them when it shouldn't have done [23:48:39] ori-l: OK, I've completed the puppet run and now I'm about to restart apache. Do you want to poke around before I do that? [23:49:22] sRandMember is the relevant method [23:49:25] andrewbogott: go for it, doing two other things [23:50:42] New review: Andrew Bogott; "Paths seem to be working properly now. This still breaks the apache restart, though, so the wiki do..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/61816 [23:54:17] andrewbogott: the apache restart was broken too. if you specify provider => upstart you can treat it like a regular service and not have a separate exec for restarting it. I'll submit a patch tonight or tomorrow. 
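(Annotation on the 23:54 suggestion about provider => upstart. A minimal sketch of what that could look like, assuming a service whose restart should follow a config file change. The log does not name the service or the file, so 'example-daemon' and '/etc/example-daemon.conf' are hypothetical placeholders, not names from the mediawiki_singlenode module.)

```puppet
# Hypothetical names throughout: neither the service nor the config path
# comes from the log. They only illustrate the shape of the declaration.
file { '/etc/example-daemon.conf':
  ensure  => present,
  content => "setting = value\n",
}

# With an explicit provider, Puppet manages the job as an ordinary service,
# and subscribe restarts it when the file changes, so no separate exec
# resource is needed for the restart.
service { 'example-daemon':
  ensure    => running,
  provider  => 'upstart',
  subscribe => File['/etc/example-daemon.conf'],
}
```

(The sketch also follows the style advice given at 18:28 in this log: a trailing comma after the last parameter, and single quotes around literal strings that interpolate nothing. The content string keeps double quotes because it contains an escape sequence.)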
[23:54:45] New patchset: Ori.livneh; "logmsgbot: only allow connections from tin, fenari & localhost" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61931 [23:57:40] AaronSchulz, Tim -- would it be OK if I take off? I have to run and the problem is just too big to fit in my attention span at the moment. It looks like the revert unbroke things, and I think superm401 is continuing to investigate. I will attend to this later tonight PST. Is that OK? [23:58:04] I'm not actively investigating, though I can if it would be helpful. [23:58:43] I think the revert fixes things well enough to make this a non-emergency, but I'll look later either way. [23:59:53] * AaronSchulz tests around [23:59:57] looks like going from none=>php just makes the data interpreted as a literal string since unserialization fails...which is sane