[00:00:02] (03CR) 10Dzahn: "not yet per talk with chris" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148289 (https://bugzilla.wikimedia.org/38516) (owner: 10Dzahn) [00:00:17] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [00:03:32] (03CR) 10Mwalker: [C: 032] Put quotes around strings in YAML that start with % [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150081 (owner: 10Yuvipanda) [00:03:34] (03Merged) 10jenkins-bot: Put quotes around strings in YAML that start with % [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/150081 (owner: 10Yuvipanda) [00:05:53] (03PS1) 10Ori.livneh: mediawiki: correct name of configuration key in hhvm.ini [operations/puppet] - 10https://gerrit.wikimedia.org/r/150092 [00:06:13] (03CR) 10Ori.livneh: [C: 032 V: 032] mediawiki: correct name of configuration key in hhvm.ini [operations/puppet] - 10https://gerrit.wikimedia.org/r/150092 (owner: 10Ori.livneh) [00:06:27] (03CR) 10Tim Landscheidt: [C: 04-1] "Ori has some points, but I need to sleep first :-)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/120186 (owner: 10Tim Landscheidt) [00:07:03] (03PS2) 10Dzahn: wikidata.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 [00:08:24] greg-g, yuvi has a patch for jouncebot that implements a 'last' command -- but it doesn't actually do anything; what were you intending? 
[00:08:58] (03PS3) 10Dzahn: wikidata.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 [00:11:38] mwalker: it was a joke :) [00:11:53] bd808 was looking for what would be better termed "previous" [00:12:05] (03PS2) 10Dzahn: wikivoyage-old.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150072 [00:12:26] (03CR) 10BBlack: [C: 032] "confirmed whitespace-only" [operations/dns] - 10https://gerrit.wikimedia.org/r/150071 (owner: 10Dzahn) [00:12:33] ah; yes [00:12:40] that needs to happen [00:12:44] *before I depart [00:12:55] because without a deadline [00:12:56] * greg-g shrugs [00:12:57] it'll take a long time [00:12:58] yeah :) [00:14:03] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150072 (owner: 10Dzahn) [00:16:51] (03CR) 10BBlack: [C: 031] wikimedia.ee - own zonefile and set external MX [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 (owner: 10Dzahn) [00:16:56] (03PS1) 10Dzahn: wikivoyage-old.org - fix alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150093 [00:17:44] (03CR) 10Dzahn: "JeremyB: no, the source i had copied from still had the tabs when i made it and i removed those on purpose, you re-added them. 
but meanwhi" [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 (owner: 10Dzahn) [00:18:52] (03CR) 10Dzahn: [C: 032] wikivoyage-old.org - fix alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150093 (owner: 10Dzahn) [00:20:49] (03PS7) 10Dzahn: wikimedia.ee - own zonefile and set external MX [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 [00:20:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [00:22:22] (03PS1) 10Dzahn: wikimedia.com - retab and alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150094 [00:23:38] (03PS2) 10Dzahn: wikimedia.com - retab and alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150094 [00:25:36] (03PS1) 10Dzahn: wmnet - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150095 (https://bugzilla.wikimedia.org/68769) [00:27:08] (03PS1) 10Dzahn: wmftest.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150096 [00:27:42] !log aaron Synchronized php-1.24wmf15/includes: f754c239ce93fc5f2db19e93f4fe8a1d1ba7bc27 (duration: 00m 06s) [00:27:48] Logged the message, Master [00:27:54] !log aaron Synchronized php-1.24wmf15/maintenance: f754c239ce93fc5f2db19e93f4fe8a1d1ba7bc27 (duration: 00m 04s) [00:27:59] Logged the message, Master [00:30:17] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 1 failures [00:30:19] (03PS1) 10Dzahn: wikivoyage.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150099 [00:33:36] (03PS1) 10Dzahn: wiktionary.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150101 [00:33:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [00:35:57] (03PS1) 10Dzahn: wikiversity.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150103 [00:36:33] (03CR) 10Scottlee: [C: 031] wikimedia.com - retab and alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150094 
(owner: 10Dzahn) [00:37:04] (03PS1) 10Dzahn: wikisource.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150105 [00:38:03] (03CR) 10Dzahn: [C: 032] wikimedia.com - retab and alignment [operations/dns] - 10https://gerrit.wikimedia.org/r/150094 (owner: 10Dzahn) [00:41:00] (03PS1) 10Dzahn: 10.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150107 [00:41:33] (03PS1) 10Dzahn: 154.80.208.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150108 [00:44:39] (03PS1) 10Dzahn: 174.198.91.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150109 [00:44:41] !log aaron Synchronized php-1.24wmf15/maintenance/runJobs.php: fcfa3153e53dc70e6cd190a087e7bd577fe380fb (duration: 00m 03s) [00:44:46] Logged the message, Master [00:48:17] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [00:48:52] (03CR) 10Dzahn: [C: 032] Fix str_replace call [operations/apache-config] - 10https://gerrit.wikimedia.org/r/147491 (owner: 10Reedy) [00:55:10] ori: https://gerrit.wikimedia.org/r/#/c/148741/ ? [00:55:17] can all stuff there be abandoned? [00:55:38] because the repo is discontinued.. or should it be converted to puppet repo [00:55:43] a change in the other repo, i mean [00:56:04] basically all these https://gerrit.wikimedia.org/r/#/q/project:operations/apache-config+status:open,n,z [00:56:33] mutante: beta hasn't been ported yet [00:56:41] notice the branch on that particular change is 'betacluster' [00:56:57] but that particular patch can be abandoned for another reason -- namely, it doesn't work very well :) [00:57:10] I think there's only Bryans patch that wants saving from the rest [00:57:22] ah, i see @ beta [00:57:57] ori: gotcha:) [00:58:02] Reedy: haha, what about us-ne though [00:58:10] or should i say.. 
ne-us :) [00:58:28] KILLKILLKILL [00:58:46] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/109460 (https://bugzilla.wikimedia.org/60222) (owner: 10Tim Landscheidt) [00:58:58] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/108880 (https://bugzilla.wikimedia.org/43266) (owner: 10Tim Landscheidt) [00:59:13] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/108465 (https://bugzilla.wikimedia.org/60238) (owner: 10Tim Landscheidt) [00:59:32] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 (owner: 10BryanDavis) [00:59:44] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [00:59:52] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [01:00:04] (03CR) 10Dzahn: [C: 04-1] "apache-config repo has been abandoned (see new file README_BEFORE_EDITING), please abandon or convert to a change in operations/puppet" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/24407 (owner: 10Jeremyb) [01:00:30] Ubuntu 14.04.1 LTS [01:00:34] Didn't realise we were there already [01:03:06] Reedy: yea, we already have f.e. phabricator running on trusty [01:04:16] Reedy: https://gerrit.wikimedia.org/r/#/c/133992/ ? :p [01:04:45] I'd kill it too [01:04:54] Seeing as they don't know what they want [01:05:02] but what are they going to do now? [01:05:02] That, and surely it won't merge cleanly now anyway? ;) [01:05:09] not let us host it all ? [01:05:16] too much drama? sigh [01:05:30] yeah [01:05:30] lol [01:05:34] thinks it's rude to abandon other people's changes [01:05:38] but.. yea [01:05:51] Antoine goes through and abandons some of mine from time to time :) [01:05:58] arg, but all i said was "make it consistent with other US chapter" [01:06:04] who would have known [01:06:40] LOL, i opened the bug and read 1 random sentence and it was [01:06:42] "re MZMcBride's comments about having chapter wikis at all:" [01:08:47] (03CR) 10Dzahn: [C: 04-2] "the bug says "Please place a hold on this"" [operations/dns] - 10https://gerrit.wikimedia.org/r/133992 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [01:11:38] (03CR) 10Dzahn: [C: 04-2] Apache set up for us-ne [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [01:11:50] (03CR) 10Dzahn: [C: 04-2] Redirect usne to us-ne [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133991 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. 
Lewis) [01:13:20] (03CR) 10Dzahn: [C: 04-1] "please convert to admins.yaml:)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/122621 (owner: 10Reedy) [01:14:06] Hahaha [01:15:59] (03CR) 10Dzahn: [C: 031] "https://legalteam.wikimedia.org/w/touch.php" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147488 (owner: 10Reedy) [01:18:54] (03CR) 10Dzahn: Add robots.txt rewrite rule where wiki is public (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147487 (owner: 10Reedy) [01:22:06] (03CR) 10Dzahn: [C: 031] add index.html pages for various directories on dataset hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/144640 (owner: 10ArielGlenn) [01:23:42] (03CR) 10Dzahn: "we should run this on the puppet compile: https://integration.wikimedia.org/ci/view/operations/job/operations-puppet-catalog-compiler/ ->" [operations/puppet] - 10https://gerrit.wikimedia.org/r/124001 (owner: 10Tim Landscheidt) [01:25:04] (03CR) 10Dzahn: [C: 031] "added _joe_" [operations/puppet] - 10https://gerrit.wikimedia.org/r/148544 (owner: 10MaxSem) [01:27:03] (03CR) 10Dzahn: "what Alex said" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/145018 (owner: 10ArielGlenn) [01:36:07] PROBLEM - Disk space on ms-be3004 is CRITICAL: Timeout while attempting connection [01:36:47] PROBLEM - Host amssq39 is DOWN: PING CRITICAL - Packet loss = 100% [01:36:47] PROBLEM - Host amssq40 is DOWN: PING CRITICAL - Packet loss = 100% [01:36:47] PROBLEM - Host amssq37 is DOWN: PING CRITICAL - Packet loss = 100% [01:36:58] PROBLEM - Host amssq42 is DOWN: PING CRITICAL - Packet loss = 100% [01:36:58] PROBLEM - Host cp3017 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:08] PROBLEM - Host amssq43 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:08] PROBLEM - Host amssq31 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:08] PROBLEM - Host amssq41 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:08] PROBLEM - Host amssq44 is DOWN: PING CRITICAL - Packet 
loss = 100% [01:37:08] PROBLEM - Host amssq32 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:15] (03PS8) 10BBlack: wikimedia.ee - own zonefile and set external MX [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 (owner: 10Dzahn) [01:37:17] PROBLEM - Host amssq33 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:18] PROBLEM - Host amssq45 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:18] PROBLEM - Host cp3013 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:24] ah.. what's that [01:37:27] PROBLEM - Host amssq46 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:27] PROBLEM - Host cp3014 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:27] PROBLEM - Host amssq34 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:27] PROBLEM - Host cp3015 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:27] PROBLEM - Host amssq47 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:27] (03CR) 10Scottlee: [C: 031] wikisource.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150105 (owner: 10Dzahn) [01:37:28] PROBLEM - Host amssq35 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:37] PROBLEM - Host amssq36 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:37] PROBLEM - Host cp3016 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:37] PROBLEM - Host cp3018 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:47] PROBLEM - Host amssq38 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:47] PROBLEM - Host ms-be3002 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:47] PROBLEM - Host ms-be3001 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:47] PROBLEM - Host ms-be3004 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:47] PROBLEM - Host lvs3003 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:48] PROBLEM - Host ms-fe3002 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:48] PROBLEM - Host lvs3001 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:49] PROBLEM - Host lvs3002 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:49] PROBLEM - Host 
lvs3004 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:50] PROBLEM - Host ms-be3003 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:50] PROBLEM - Host ms-fe3001 is DOWN: PING CRITICAL - Packet loss = 100% [01:37:53] (03CR) 10Scottlee: [C: 031] wikiversity.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150103 (owner: 10Dzahn) [01:38:48] (03CR) 10Scottlee: [C: 031] wiktionary.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150101 (owner: 10Dzahn) [01:39:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0] [01:40:01] (03CR) 10Scottlee: [C: 031] 154.80.208.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150108 (owner: 10Dzahn) [01:41:58] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 16 failures [01:42:08] RECOVERY - Host amssq31 is UP: PING WARNING - Packet loss = 86%, RTA = 95.31 ms [01:42:08] RECOVERY - Host amssq43 is UP: PING WARNING - Packet loss = 86%, RTA = 97.23 ms [01:42:08] RECOVERY - Host amssq41 is UP: PING WARNING - Packet loss = 86%, RTA = 95.26 ms [01:42:08] RECOVERY - Host amssq44 is UP: PING WARNING - Packet loss = 86%, RTA = 95.42 ms [01:42:08] RECOVERY - Host amssq32 is UP: PING WARNING - Packet loss = 86%, RTA = 95.62 ms [01:42:08] RECOVERY - Host amssq37 is UP: PING WARNING - Packet loss = 80%, RTA = 95.42 ms [01:42:09] RECOVERY - Host amssq34 is UP: PING OK - Packet loss = 16%, RTA = 95.48 ms [01:42:09] RECOVERY - Host cp3017 is UP: PING OK - Packet loss = 16%, RTA = 95.96 ms [01:42:10] RECOVERY - Host cp3013 is UP: PING WARNING - Packet loss = 28%, RTA = 96.06 ms [01:42:10] RECOVERY - Host amssq33 is UP: PING WARNING - Packet loss = 28%, RTA = 97.00 ms [01:42:11] RECOVERY - Host amssq45 is UP: PING WARNING - Packet loss = 28%, RTA = 97.89 ms [01:42:17] RECOVERY - Host amssq46 is UP: PING OK - Packet loss = 0%, RTA = 96.38 ms [01:42:18] RECOVERY - Host cp3014 is UP: PING 
OK - Packet loss = 0%, RTA = 98.89 ms [01:42:18] RECOVERY - Host cp3015 is UP: PING OK - Packet loss = 0%, RTA = 95.39 ms [01:42:18] RECOVERY - Host amssq35 is UP: PING OK - Packet loss = 0%, RTA = 96.15 ms [01:42:18] RECOVERY - Host amssq47 is UP: PING OK - Packet loss = 0%, RTA = 95.48 ms [01:42:27] RECOVERY - Host amssq36 is UP: PING OK - Packet loss = 0%, RTA = 95.64 ms [01:42:27] RECOVERY - Host cp3016 is UP: PING OK - Packet loss = 0%, RTA = 95.78 ms [01:42:27] RECOVERY - Host ms-fe3001 is UP: PING OK - Packet loss = 0%, RTA = 95.85 ms [01:42:27] RECOVERY - Host amssq39 is UP: PING OK - Packet loss = 0%, RTA = 95.72 ms [01:42:27] RECOVERY - Host ms-be3004 is UP: PING OK - Packet loss = 0%, RTA = 95.86 ms [01:42:28] RECOVERY - Host amssq40 is UP: PING OK - Packet loss = 0%, RTA = 95.86 ms [01:42:28] RECOVERY - Host amssq38 is UP: PING OK - Packet loss = 0%, RTA = 96.07 ms [01:42:29] RECOVERY - Host cp3018 is UP: PING OK - Packet loss = 0%, RTA = 96.24 ms [01:42:29] RECOVERY - Host amssq42 is UP: PING OK - Packet loss = 0%, RTA = 96.09 ms [01:42:30] RECOVERY - Host ms-fe3002 is UP: PING OK - Packet loss = 0%, RTA = 96.26 ms [01:42:30] RECOVERY - Host ms-be3001 is UP: PING OK - Packet loss = 0%, RTA = 96.06 ms [01:42:31] RECOVERY - Host lvs3002 is UP: PING OK - Packet loss = 0%, RTA = 96.43 ms [01:42:31] RECOVERY - Host lvs3004 is UP: PING OK - Packet loss = 0%, RTA = 96.25 ms [01:42:32] RECOVERY - Host ms-be3002 is UP: PING OK - Packet loss = 0%, RTA = 96.31 ms [01:42:37] RECOVERY - Host ms-be3003 is UP: PING OK - Packet loss = 0%, RTA = 96.19 ms [01:42:37] RECOVERY - Host lvs3001 is UP: PING OK - Packet loss = 0%, RTA = 95.48 ms [01:42:37] RECOVERY - Host lvs3003 is UP: PING OK - Packet loss = 0%, RTA = 95.48 ms [01:43:08] that's cool.. but [01:43:30] all the addrs that went down (as opposed to those that didn't) were on the private1-esams subnet [01:43:42] the public access and the ones still on the older public subnet didn't down in icinga [01:44:13] ah.. 
eh [01:44:58] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 28 failures [01:44:58] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Puppet has 18 failures [01:44:59] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Puppet has 14 failures [01:44:59] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Puppet has 32 failures [01:44:59] PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Puppet has 27 failures [01:44:59] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 31 failures [01:45:27] PROBLEM - puppet last run on amssq57 is CRITICAL: CRITICAL: Puppet has 21 failures [01:45:27] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 7 failures [01:45:27] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 27 failures [01:45:27] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 29 failures [01:45:27] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 35 failures [01:45:58] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 6 failures [01:45:58] PROBLEM - puppet last run on amssq43 is CRITICAL: CRITICAL: Puppet has 3 failures [01:45:58] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: Puppet has 1 failures [01:45:58] PROBLEM - puppet last run on cp3013 is CRITICAL: CRITICAL: Puppet has 14 failures [01:46:11] lol [01:46:15] gj puppet [01:46:17] PROBLEM - puppet last run on amssq44 is CRITICAL: CRITICAL: Puppet has 6 failures [01:46:18] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 1 failures [01:46:27] PROBLEM - puppet last run on ssl3003 is CRITICAL: CRITICAL: Puppet has 13 failures [01:46:27] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: Puppet has 2 failures [01:46:58] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [01:52:47] PROBLEM - Packetloss_Average on erbium is 
CRITICAL: packet_loss_average CRITICAL: 37.5854172414 [01:53:58] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Epic puppet fail [01:55:58] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [01:56:18] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [01:56:47] RECOVERY - Packetloss_Average on erbium is OK: packet_loss_average OKAY: -0.0539559302326 [01:56:58] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [01:56:58] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [01:57:17] PROBLEM - Packetloss_Average on analytics1003 is CRITICAL: packet_loss_average CRITICAL: 14.0737649167 [01:57:46] ok, that's no good, packetloss on analytics is a different DC [01:57:58] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [01:57:58] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [01:58:27] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [01:58:58] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [01:59:27] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [01:59:27] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [02:01:01] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [02:01:01] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [02:01:17] RECOVERY - 
Packetloss_Average on analytics1003 is OK: packet_loss_average OKAY: 1.17930608333 [02:01:27] RECOVERY - puppet last run on ssl3003 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [02:01:27] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [02:01:58] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [02:01:58] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [02:01:58] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [02:02:11] neat. [02:02:18] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [02:02:19] seems like a link flap maybe? [02:02:39] the esams case did, analytics1003 is .... strange? [02:02:57] PROBLEM - Packetloss_Average on oxygen is CRITICAL: packet_loss_average CRITICAL: 9.18785825 [02:03:07] ditto that ^ [02:03:27] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [02:03:27] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [02:03:57] from oxygen i can ICMP to something external without loss [02:05:20] yeah I checked lvs boxes too, but analytics100[34] both showing the filter thing [02:05:28] maybe that's normal and we filter those hosts normally, and unrelated [02:06:36] it wasn't on icinga earlier though [02:06:57] RECOVERY - Packetloss_Average on oxygen is OK: packet_loss_average OKAY: 1.15079566667 [02:07:00] I mean maybe packet_loss_average is unrelated to whatever normal circumstances filter pins on analytics [02:07:07] s/pins/pings/ [02:07:17] ah, yea [02:07:46] I'm unclear on what's up we lost the link to esams from eqiad momentarily? 
[02:07:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [02:09:43] (03PS9) 10BBlack: wikimedia.ee - own zonefile and set external MX [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 (owner: 10Dzahn) [02:10:19] so from analytics1003 you can't tcptraceroute iron, but from iron you can get to ana1003 [02:10:21] chasemp: it looks like we lost some kind of link or routing between esams and eqiad for a few minutes that only affected the private subnet in esams but not the public [02:10:50] possibly-unrelatedly: two hosts in eqiad have temporarily reported packet loss in icinga for a few minutes shortly afterwards [02:11:10] (although perhaps that was picking traffic between those hosts and esams during the esams event?) [02:11:28] hmmm ok thank you, seems weird [02:19:08] (03PS1) 10Aaron Schulz: Removed old jobrunner.ini file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150124 [02:24:12] (03CR) 10BBlack: [C: 032] "I rebased, updating formatting to match wikimedia.com, and validated that the only data diff was the MX part." [operations/dns] - 10https://gerrit.wikimedia.org/r/148762 (owner: 10Dzahn) [02:24:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [02:27:48] (03CR) 10Ori.livneh: [C: 04-1] "Go ahead and just remove the entry for the file rather than ensure => absent it. I'll remove it via Salt from the relevant hosts." [operations/puppet] - 10https://gerrit.wikimedia.org/r/150124 (owner: 10Aaron Schulz) [02:29:27] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only" [operations/dns] - 10https://gerrit.wikimedia.org/r/150095 (https://bugzilla.wikimedia.org/68769) (owner: 10Dzahn) [02:30:14] (03PS2) 10BBlack: wmftest.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150096 (owner: 10Dzahn) [02:30:41] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/150096 (owner: 10Dzahn) [02:31:00] (03PS2) 10BBlack: wikivoyage.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150099 (owner: 10Dzahn) [02:31:35] (03CR) 10BBlack: [C: 032] "Confirmed whitespace only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150099 (owner: 10Dzahn) [02:31:48] (03PS2) 10BBlack: wikiversity.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150103 (owner: 10Dzahn) [02:32:25] (03CR) 10BBlack: "Confirmed whitespace only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150103 (owner: 10Dzahn) [02:32:34] (03CR) 10BBlack: [C: 032] "Confirmed whitespace only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150103 (owner: 10Dzahn) [02:32:47] (03PS2) 10BBlack: wikisource.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150105 (owner: 10Dzahn) [02:33:22] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150105 (owner: 10Dzahn) [02:33:31] (03PS2) 10BBlack: wiktionary.org - retab and align [operations/dns] - 10https://gerrit.wikimedia.org/r/150101 (owner: 10Dzahn) [02:34:03] (03CR) 10BBlack: [C: 032] "Confirmed whitespace only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150101 (owner: 10Dzahn) [02:36:21] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-29 02:35:18+00:00 [02:36:28] Logged the message, Master [02:38:20] (03CR) 10BBlack: "Joe is this still a desired change? Is it pending something?" [operations/dns] - 10https://gerrit.wikimedia.org/r/146647 (owner: 10Jkrauska) [02:41:14] <^d> Ugh. https://github.com/elasticsearch/elasticsearch/issues/6168 [02:41:35] (03PS2) 10BBlack: 10.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150107 (owner: 10Dzahn) [02:42:08] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/150107 (owner: 10Dzahn) [02:43:11] (03PS2) 10BBlack: 174.198.91.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150109 (owner: 10Dzahn) [02:43:42] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150109 (owner: 10Dzahn) [02:44:33] (03PS2) 10BBlack: 154.80.208.in-addr.arpa - retab only [operations/dns] - 10https://gerrit.wikimedia.org/r/150108 (owner: 10Dzahn) [02:44:55] (03CR) 10BBlack: [C: 032] "Confirmed whitespace-only." [operations/dns] - 10https://gerrit.wikimedia.org/r/150108 (owner: 10Dzahn) [02:46:09] (03PS2) 10Aaron Schulz: Removed old jobrunner.ini file [operations/puppet] - 10https://gerrit.wikimedia.org/r/150124 [02:48:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [02:59:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [03:10:07] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:58] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.026 second response time [03:11:34] !log LocalisationUpdate completed (1.24wmf15) at 2014-07-29 03:10:31+00:00 [03:11:39] Logged the message, Master [03:18:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [03:25:25] (03CR) 10Ori.livneh: [C: 032] "trivial" [operations/puppet] - 10https://gerrit.wikimedia.org/r/150124 (owner: 10Aaron Schulz) [03:48:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 29 03:47:39 UTC 2014 (duration 47m 38s) [03:48:51] Logged the message, Master [04:05:28] (03PS1) 10Dzahn: wikibooks.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150127 [04:17:37] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: 
Anomaly detected: 15 data above and 9 below the confidence bounds [04:17:44] (03PS1) 10Dzahn: wikinews.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150129 [04:19:47] (03PS1) 10Dzahn: wikiquote.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150130 [04:22:09] (03CR) 10Scottlee: [C: 031] wikinews.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150129 (owner: 10Dzahn) [04:22:26] (03CR) 10Scottlee: [C: 031] wikibooks.org - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150127 (owner: 10Dzahn) [04:26:07] (03PS1) 10Dzahn: 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150132 [04:28:50] (03PS1) 10Dzahn: 26.35.198.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150133 [04:33:37] (03PS1) 10Dzahn: 2.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150134 [04:36:29] (03PS1) 10Dzahn: 3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150135 [04:40:10] (03PS1) 10Dzahn: 59.15.185.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150136 [04:40:45] (03CR) 10Dzahn: "and that's it. all for Bug 68769" [operations/dns] - 10https://gerrit.wikimedia.org/r/150136 (owner: 10Dzahn) [04:42:07] (03CR) 10Scottlee: [C: 031] 3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150135 (owner: 10Dzahn) [04:42:36] (03CR) 10Scottlee: [C: 031] 59.15.185.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150136 (owner: 10Dzahn) [04:43:50] (03CR) 10Scottlee: [C: 031] 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150132 (owner: 10Dzahn) [04:44:12] (03CR) 10Scottlee: [C: 031] 26.35.198.in-addr.arpa - retab [operations/dns] - 10https://gerrit.wikimedia.org/r/150133 (owner: 10Dzahn) [05:07:41] apache-config is dead? 
[05:10:56] long live apache-config [05:13:27] That repo name is probably documented. [05:19:33] (03PS1) 10MZMcBride: Minor tweaks to README_BEFORE_EDITING [operations/apache-config] - 10https://gerrit.wikimedia.org/r/150144 [05:23:27] (03CR) 10Ori.livneh: [C: 032 V: 032] Minor tweaks to README_BEFORE_EDITING [operations/apache-config] - 10https://gerrit.wikimedia.org/r/150144 (owner: 10MZMcBride) [05:23:45] office action. [05:31:20] ori++ [05:36:09] * Carmela tickles ori. [05:39:51] (03PS1) 10Florianschmidtwelzow: WIP: Add Uploadrestriction for Commons in MF [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598) [05:48:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [05:56:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [05:57:37] bblack: is that still varnish restarts? [05:57:44] yeah [05:57:59] * ori nods [05:58:41] there's only 100 machines to update manually and watch for random restart failures and then retry repeatedly, one by one with some reasonable delay between them each so the effects don't stack up [05:58:45] it's tons of fun :P [05:59:25] we have 100 varnishes? 
[05:59:29] yeah [05:59:42] wow, i hadn't realized [06:01:06] i understand a little bit better now why you were motivated to think about varnish orchestration the other week [06:01:45] the main problem with updates and restarts is the random chance of failure due to the stupid persistent mmap implementation [06:01:55] and this update addresses that, so hopefully things will eventually get better [06:02:24] (but after the update + related puppet config change, each and every varnish needs its persistent cache wiped out before the fix takes effect, so that will go out even slower) [06:10:07] PROBLEM - puppetmaster backend https on strontium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:11:07] RECOVERY - puppetmaster backend https on strontium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 6.258 second response time [06:21:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [06:25:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [06:28:47] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:57] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:57] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:58] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:17] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:17] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:17] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:37] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:37] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:47] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 
failures [06:29:47] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:17] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:47] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [06:40:37] PROBLEM - puppet last run on ssl1004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:42:18] PROBLEM - puppet last run on virt0 is CRITICAL: CRITICAL: Puppet has 1 failures [06:45:57] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:45:58] ^ those seem to be a spate of "500 internal server error" on puppetmaster(s) [06:46:17] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [06:46:17] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:46:37] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:46:37] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 60 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:46:58] RECOVERY - puppet last run on amssq35 is OK: 
OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:47:17] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [06:47:17] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:47:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [06:58:37] RECOVERY - puppet last run on ssl1004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:59:19] RECOVERY - puppet last run on virt0 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [07:00:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [07:13:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [07:26:27] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [07:26:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [07:45:26] (PS1) Alexandros Kosiaris: sudo privileges should be in array [operations/puppet] - https://gerrit.wikimedia.org/r/150160 [07:51:24] !log uploaded PHP 5.3.10-1ubuntu3.13+wmf1 on apt.wikimedia.org.
Puppet will upgrade it across the fleet within 20 mins [07:51:30] Logged the message, Master [07:51:36] let's watch the chaos now :-) [07:53:37] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Tue 29 Jul 2014 05:53:20 UTC [07:57:04] <_joe_> akosiaris: well a minor php update, what could possibly go wrong(TM) [07:59:29] _joe_: exactly :-) [08:20:54] (PS1) Dan-nl: added Universiteits Museum Utrecht to the wgCopyUploadsDomains array [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150163 [08:26:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [08:33:37] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Tue Jul 29 08:33:28 UTC 2014 [08:37:46] good morning [08:41:44] hey hashar [08:49:04] godog: hi! do you know how serious we are about releasing debian packages to the public? [08:50:13] hashar: you mean a package available from us to be downloaded or uploaded to the official debian archive?
[08:52:11] godog: I guess the first step would be to have it offered for download [08:52:21] with the long term goal of having them added to Debian [08:53:11] the reason I am asking is that I really really would like our debian packaging to be as automatic as possible [08:53:29] with a nice workflow and appropriate responsibilities / accesses [08:54:28] hashar_: yep, that's sort of where we are going with the reprepro repo on releases.wm.o I mailed ops about last week [08:54:57] yeah that sounds like a good step forward :D [08:54:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [08:55:14] hashar_: of course the goal as you mentioned would be to have packages uploaded to debian, at least the ones that make sense to have [08:55:29] yup [08:55:40] and on the other side having a Jenkins instance to build the package for us [08:56:04] cause that is totally doable and would let non ops release packages easily :-] [08:58:02] indeed! [09:00:28] !log Zuul bumping Zuul cloner from patchset 21 to patchset 23. 
Deploying with tag wmf-deploy-2014-07-29-1 [09:00:34] Logged the message, Master [09:02:19] !log restarted zuul-server and zuul-merger on gallium (new version though that is a noop) [09:02:24] Logged the message, Master [09:10:04] <_joe_> !log stopping jobrunner on mw1053, disabling puppet as well - running tests [09:10:09] Logged the message, Master [09:31:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [09:37:24] (CR) Giuseppe Lavagetto: [C: 1] HHVM: log warnings and stacktraces [operations/puppet] - https://gerrit.wikimedia.org/r/148544 (owner: MaxSem) [09:38:40] (PS5) Giuseppe Lavagetto: apache: cherry-pick mods added in Ia46312071 [operations/puppet] - https://gerrit.wikimedia.org/r/149066 (owner: Ori.livneh) [09:57:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [10:29:27] <_joe_> !log temporarily stopping puppet on appservers, releasing a potentially dangerous puppet change [10:29:32] Logged the message, Master [10:29:32] (CR) Giuseppe Lavagetto: [C: 2] apache: cherry-pick mods added in Ia46312071 [operations/puppet] - https://gerrit.wikimedia.org/r/149066 (owner: Ori.livneh) [10:32:07] (PS1) Matanya: access: Matt Flaschen new key [operations/puppet] - https://gerrit.wikimedia.org/r/150175 [10:35:03] <_joe_> !log puppet re-enabled on the appservers [10:35:07] Logged the message, Master [10:37:18] (PS2) Matanya: access: Matt Flaschen new key [operations/puppet] - https://gerrit.wikimedia.org/r/150175 [10:45:13] (Abandoned) Alexandros Kosiaris: osm-dbs to labsdbs mgmt IPs rename [operations/dns] - https://gerrit.wikimedia.org/r/143601 (owner: Alexandros Kosiaris) [10:45:29] (Abandoned) Alexandros Kosiaris: osm-dbs to labsdb1006 and labsdb1007 [operations/dns] - https://gerrit.wikimedia.org/r/143600 (owner: Alexandros Kosiaris) [11:08:23] (PS1) Alexandros Kosiaris: Add CNAME for osmdb
service [operations/dns] - https://gerrit.wikimedia.org/r/150180 [11:23:58] (PS3) Hoo man: Add client settings needed for the other projects beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149031 [11:24:00] (PS1) Hoo man: Configure testwikidata to use the "special" site link group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150187 [11:28:27] Hmpf. There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection timed out [11:30:56] <_joe_> !log upgrading packages on mw1053, for testing hhvm with pcre-jit enabled [11:31:02] Logged the message, Master [13:11:33] (CR) Ottomata: Split kafka package into 3 separate packages (4 comments) [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/149889 (owner: Ottomata) [13:15:53] (CR) Manybubbles: "Chad - any chance you can have a glance at this before this morning's SWAT?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149906 (owner: Manybubbles) [13:22:27] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [13:28:30] (PS3) Rillke: UploadWizard config: Add PD-old-70-1923 license [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/145375 [13:29:50] (PS1) Ottomata: Add refinery deploy target to stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/150208 [13:32:05] uh whoa, bad icinga data for all analytics stuff! all at once?! whaaa? [13:32:30] hm I think it's the check_ganglia stuff [13:32:30] hmm [13:33:26] (PS2) Mark Bergsma: Remove DNS recursor on sodium [operations/puppet] - https://gerrit.wikimedia.org/r/146072 [13:35:58] ah ganglia down?
[13:35:59] m [13:36:00] hm [13:38:36] !log restarted gmetad on nickel, seems to have brought ganglia back up [13:38:41] Logged the message, Master [13:41:09] (PS4) Ottomata: statistics: Add packages for halfak [operations/puppet] - https://gerrit.wikimedia.org/r/150045 (owner: Yuvipanda) [13:41:16] (CR) Ottomata: [C: 2 V: 2] statistics: Add packages for halfak [operations/puppet] - https://gerrit.wikimedia.org/r/150045 (owner: Yuvipanda) [13:41:29] (PS2) Ottomata: Add refinery deploy target to stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/150208 [13:41:34] (CR) Ottomata: [C: 2 V: 2] Add refinery deploy target to stat1002 [operations/puppet] - https://gerrit.wikimedia.org/r/150208 (owner: Ottomata) [13:41:35] ottomata: ty [13:44:20] (PS3) Mark Bergsma: Remove DNS recursor on sodium [operations/puppet] - https://gerrit.wikimedia.org/r/146072 [13:45:19] (CR) Mark Bergsma: [C: 2] Remove DNS recursor on sodium [operations/puppet] - https://gerrit.wikimedia.org/r/146072 (owner: Mark Bergsma) [13:45:54] (Abandoned) Ottomata: added basic hbase support [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/99381 (owner: Physikerwelt) [13:48:37] (CR) Chad: [C: 1] Cirrus all field rollout stage two [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149906 (owner: Manybubbles) [13:52:59] (CR) Hashar: [C: 2] added Universiteits Museum Utrecht to the wgCopyUploadsDomains array [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150163 (owner: Dan-nl) [13:53:04] (Merged) jenkins-bot: added Universiteits Museum Utrecht to the wgCopyUploadsDomains array [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150163 (owner: Dan-nl) [13:53:14] Reedy: please save one sim for me [13:54:48] !log hashar Synchronized wmf-config/InitialiseSettings.php: added Universiteits Museum Utrecht to the wgCopyUploadsDomains array {{gerrit|150163}} (duration: 00m
04s) [13:54:53] Logged the message, Master [13:55:19] (CR) Hashar: "deployed" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150163 (owner: Dan-nl) [14:05:36] matanya: standard/micro or nano? [14:06:08] Reedy: micro [14:08:41] (PS1) Giuseppe Lavagetto: hhvm: provide hhvm-build-$VERSION [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150212 [14:08:43] (PS1) Giuseppe Lavagetto: hhvm: lintian fixes [operations/debs/hhvm] - https://gerrit.wikimedia.org/r/150213 [14:10:00] <_joe_> godog: ^^ the first one is quite important and I'll probably ask brett for some feedback as well [14:11:10] <_joe_> I don't want to rebuild all extensions every time I rebuild hhvm [14:19:51] _joe_: I think Ori requested that [14:19:57] i.e. rebuild all extensions with latest hhvm [14:20:07] to ensure they can still be compiled [14:21:46] <_joe_> hashar: no the issue here is - we are not guaranteed any ABI compatibility from upstream [14:22:00] <_joe_> so we should rebuild all extensions each time we update the source [14:22:26] <_joe_> but - I'm building and rebuilding packages changing just files in debian, so it's abi compatible [14:22:33] ah [14:22:35] :-] [14:22:40] <_joe_> thus I found a way to provide an abi 'signature' [14:23:06] <_joe_> that would be the version of our package stripped of the +wmfN that I append to the version number [14:42:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [14:49:58] (CR) Daniel Kinzler: [C: 1] "looks fine from the software side, can't vouch for the community side of things."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149187 (owner: Hoo man) [14:51:34] (CR) Daniel Kinzler: [C: 1] "makes sense to use "special" as an alias to refer to all groups defined to be "special" (that is, singleton groups like commons or species" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150187 (owner: Hoo man) [14:52:15] (CR) Daniel Kinzler: [C: 1] "make sense in general, don't know how it fits with the deployment schedule." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149031 (owner: Hoo man) [14:54:02] (CR) Tobias Gritschacher: "Ping. What are the remaining blockers of this?" [operations/puppet] - https://gerrit.wikimedia.org/r/136095 (owner: Christopher Johnson (WMDE)) [15:00:04] manybubbles, Reedy, manybubbles: Respected human, time to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140729T1500). Please do the needful. [15:04:54] (CR) Manybubbles: [C: 2] Cirrus all field rollout stage two [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149906 (owner: Manybubbles) [15:05:06] (Merged) jenkins-bot: Cirrus all field rollout stage two [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149906 (owner: Manybubbles) [15:06:02] !log manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - deploy cirrussearch all field stage 2 part 1 (duration: 00m 04s) [15:06:08] Logged the message, Master [15:06:38] !log manybubbles Synchronized wmf-config: SWAT - deploy cirrussearch all field stage 2 part 2 (duration: 00m 04s) [15:06:43] Logged the message, Master [15:08:56] (PS4) 20after4: collect host keys on deployment-bastion for beta environment.
[operations/puppet] - https://gerrit.wikimedia.org/r/149364 [15:08:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [15:11:27] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [15:13:25] !log building cirrus indexes for group0 wikis in place to turn on the weighted all field we'll use for performance improvements later [15:13:30] Logged the message, Master [15:14:15] (CR) jenkins-bot: [V: -1] collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4) [15:15:57] PROBLEM - puppet last run on mw1053 is CRITICAL: CRITICAL: Puppet last ran 21865 seconds ago, expected 14400 [15:16:57] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:21:56] ^d: As per https://rt.wikimedia.org/Ticket/Display.html?id=7837, it looks to me like you have Oxygen access now. Is that correct? [15:22:08] <^d> Yep. [15:22:18] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [15:23:20] ^d: OK, guess I just need to close this ticket then :) Thanks. [15:36:57] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 5913: relocating_shards: 0: initializing_shards: 34: unassigned_shards: 156 [15:36:57] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running.
status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 5913: relocating_shards: 0: initializing_shards: 34: unassigned_shards: 156 [15:36:57] PROBLEM - ElasticSearch health check on elastic1019 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 5913: relocating_shards: 0: initializing_shards: 34: unassigned_shards: 156 [15:38:47] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 5918: relocating_shards: 0: initializing_shards: 29: unassigned_shards: 153 [15:38:47] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 5918: relocating_shards: 0: initializing_shards: 29: unassigned_shards: 153 [15:38:47] PROBLEM - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 5918: relocating_shards: 0: initializing_shards: 29: unassigned_shards: 153 [15:38:47] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 5918: relocating_shards: 0: initializing_shards: 29: unassigned_shards: 153 [15:38:47] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2033: active_shards: 5918: relocating_shards: 0: initializing_shards: 29: unassigned_shards: 153 [15:40:18] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:43:44] (CR) Legoktm: "@Faidon, could you take another look at this patch? The corresponding core change has been merged." [operations/puppet] - https://gerrit.wikimedia.org/r/141287 (owner: 01tonythomas) [15:45:39] (PS5) 20after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364 [15:49:00] (CR) jenkins-bot: [V: -1] collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4) [15:59:47] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:47] RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:47] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:47] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:48] RECOVERY - ElasticSearch health check on elastic1017 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:57] RECOVERY - ElasticSearch health check on elastic1019 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:57] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [15:59:58] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 19: number_of_data_nodes: 19: active_primary_shards: 2034: active_shards: 6101: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [16:00:04] hoo: Dear anthropoid, the time has come. Please deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140729T1600). 
[16:00:27] anthropoid... wow, that's new [16:00:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [16:01:11] 'dear anthropoid' :p [16:01:38] (PS2) Hoo man: Enable Wikibase other projects links per default for ruwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149187 [16:02:23] (CR) Hoo man: [C: 2] Enable Wikibase other projects links per default for ruwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149187 (owner: Hoo man) [16:02:26] (Merged) jenkins-bot: Enable Wikibase other projects links per default for ruwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149187 (owner: Hoo man) [16:03:08] (PS2) Florianschmidtwelzow: WIP: Add Uploadrestriction for Commons in MF [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598) [16:03:13] (PS3) Florianschmidtwelzow: Add Uploadrestriction for Commons in MF [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598) [16:03:22] !log hoo Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase other projects links per default for ruwiki (duration: 00m 07s) [16:03:28] Logged the message, Master [16:03:39] (PS9) 01tonythomas: Removed exim errors_to to support custom Return-Path [operations/puppet] - https://gerrit.wikimedia.org/r/141287 [16:04:31] putnik: ^ [16:05:38] putnik: wikibase-otherprojects-$project are the message for localization [16:05:44] * messages [16:05:56] You should probably create those [16:06:05] * hoo sees the links in Engl.
atm [16:07:29] (PS2) Hoo man: Configure testwikidata to use the "special" site link group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150187 [16:07:57] (PS4) Hoo man: Add client settings needed for the other projects beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149031 [16:08:19] (CR) Hoo man: [C: 2] Add client settings needed for the other projects beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149031 (owner: Hoo man) [16:08:29] <_joe_> !log reenabled puppet on mw1053 [16:08:32] (CR) Hoo man: [C: 2] Configure testwikidata to use the "special" site link group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150187 (owner: Hoo man) [16:08:33] Logged the message, Master [16:08:35] (Merged) jenkins-bot: Configure testwikidata to use the "special" site link group [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150187 (owner: Hoo man) [16:08:38] (Merged) jenkins-bot: Add client settings needed for the other projects beta feature [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149031 (owner: Hoo man) [16:09:41] !log restarted logstash on logstash1001.eqiad.wmnet; log volume looked to be down from expected levels [16:09:46] Logged the message, Master [16:10:32] !log logstash log event volume up after restart [16:10:36] Logged the message, Master [16:10:41] !log hoo Synchronized wmf-config/Wikibase.php: Make testwikidata use the "special" sitelink group. Preparations for submodule updates. (duration: 00m 08s) [16:10:46] Logged the message, Master [16:11:07] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0531561461794 [16:14:52] (PS6) 20after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364 [16:14:57] mh...
got i18n changes, so will need to scap :P [16:15:39] hoo :o [16:18:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [16:20:14] !log hoo Started scap: Updating Wikidata with various changes for testwikidata and a client bug fix. [16:20:20] Logged the message, Master [16:33:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [16:37:10] !log replacing defective disk virt1009 [16:37:16] Logged the message, Master [16:39:18] RECOVERY - RAID on virt1009 is OK: OK: Active: 14, Working: 14, Failed: 0, Spare: 0 [16:42:08] (PS1) Hoo man: Only declare "special" site groups for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150246 [16:42:16] (PS3) Aaron Schulz: [WIP] Added a streamlined RunJobs that can be used by redisJobService [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149587 [16:47:42] !log hoo Finished scap: Updating Wikidata with various changes for testwikidata and a client bug fix. (duration: 27m 27s) [16:47:49] Logged the message, Master [16:47:52] (CR) Hoo man: [C: 2] Only declare "special" site groups for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150246 (owner: Hoo man) [16:48:06] (Merged) jenkins-bot: Only declare "special" site groups for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150246 (owner: Hoo man) [16:48:53] !log hoo Synchronized wmf-config/Wikibase.php: Only declare "special" sitegroups for testwikidata (duration: 00m 08s) [16:49:00] Logged the message, Master [16:49:32] Reedy: wtf [16:49:39] are you messing on tin?
[16:49:49] yeah [16:50:02] I need to deploy one more change [16:50:04] moment [16:50:10] (PS1) Reedy: Non wikipedias to 1.24 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150249 [16:50:11] and then probably touch some JS :/ [16:50:16] (Abandoned) Mwalker: Jenkins job validation (DO NOT SUBMIT) [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/149387 (owner: Hashar) [16:50:17] reverted [16:50:20] go ahead [16:50:23] sorry [16:50:37] RECOVERY - check configured eth on gold is OK: NRPE: Unable to read output [16:50:40] !log hoo Synchronized wmf-config/Wikibase.php: Only declare "special" sitegroups for testwikidata (duration: 00m 07s) [16:50:46] Logged the message, Master [16:52:01] (CR) Krinkle: [C: 1] "This doesn't seem to take care of $wgWikimediaShopShowLinkCountries, however that config variable was actually a no-op because the Wikimed" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149787 (https://bugzilla.wikimedia.org/55678) (owner: Legoktm) [16:52:21] !log hoo Synchronized php-1.24wmf14/extensions/Wikidata/: Touch JS (duration: 00m 11s) [16:52:28] Logged the message, Master [16:52:36] !log hoo Synchronized php-1.24wmf15/extensions/Wikidata/: Touch JS (duration: 00m 10s) [16:52:42] Logged the message, Master [16:52:54] In theory I'm done now [16:53:19] oh wait no [16:55:11] (PS1) Hoo man: Bump $wgCacheEpoch for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150250 [16:55:47] (CR) Hoo man: [C: 2] Bump $wgCacheEpoch for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150250 (owner: Hoo man) [16:55:53] (Merged) jenkins-bot: Bump $wgCacheEpoch for testwikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150250 (owner: Hoo man) [16:56:31] !log hoo Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for testwikidata (duration: 00m 08s) [16:56:37] Logged the message, Master [16:59:56] Ok, all
verified [17:14:31] (CR) Jgreen: [C: 2 V: 1] Give OCG admins the ability to run things as OCG [operations/puppet] - https://gerrit.wikimedia.org/r/150016 (owner: Mwalker) [17:15:25] (PS1) Jforrester: Enable TemplateData GUI on Dutch Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150257 (https://bugzilla.wikimedia.org/68795) [17:16:43] hoo, why aren't these messages in Translatewiki? [17:19:22] (CR) Jkrauska: [C: 1] "yes please -- couldn't find a pusher" [operations/dns] - https://gerrit.wikimedia.org/r/146647 (owner: Jkrauska) [17:20:44] Oh, they are already there =) [17:20:54] putnik: They are [17:23:06] (PS2) Dzahn: Final change for corp dns - change ttl back to 1h [operations/dns] - https://gerrit.wikimedia.org/r/146647 (owner: Jkrauska) [17:24:01] (CR) Dzahn: "needed manual rebase" [operations/dns] - https://gerrit.wikimedia.org/r/146647 (owner: Jkrauska) [17:24:52] (CR) Dzahn: [C: 2] Final change for corp dns - change ttl back to 1h [operations/dns] - https://gerrit.wikimedia.org/r/146647 (owner: Jkrauska) [17:29:32] mwalker: I'm preparing a patch to disable your cluster access… I have two questions. 1) Is there any access that you're hoping to retain after you go? 2) I note that you are the only member of group pdf-render-admins. Have you chosen an heir for that? [17:31:03] (PS1) Andrew Bogott: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 [17:31:07] :'( 1) no, disable it all 2) my heirs will be cscott and gwicke [17:31:07] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [17:32:14] cscott_away, gwicke, can you please open access-request tickets regarding access to pdf renderers? [17:32:40] mwalker: ok, thanks -- shall I postpone merging that patch until Monday? [17:32:59] (I'm in no hurry to give you the boot, just taking care of things while I'm thinking of it.)
[17:33:11] aww mwalker
[17:33:14] andrewbogott, yes please :) I still have work to do
[17:36:28] andrewbogott, while we're on this topic; I asked terry to ask Eloquence to keep me on the operations list -- I think that's managed in puppet somewhere?
[17:36:50] (PS2) Andrew Bogott: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263
[17:37:04] mwalker: no idea, but I'll try to figure it out.
[17:37:19] chasemp: ^ ?
[17:37:39] operations list like mailing list?
[17:37:40] (CR) Andrew Bogott: [C: -2] "Merge after the first of August." [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott)
[17:37:45] idk mutante ^ :)
[17:37:55] yep; mailing list
[17:38:22] chasemp: sorry, the ^ was meant to refer to the gerrit patch: https://gerrit.wikimedia.org/r/#/c/150263/
[17:38:26] ah
[17:38:27] sorry ok
[17:38:51] hoo|away, I finished =) But why this block before Tools?
[17:39:00] no, the ops list is not managed in puppet
[17:39:16] it's not like puppet has to do who is subscribed
[17:39:30] putnik: don't know offhand
[17:39:40] (CR) Mwalker: [C: 1] "Aye; don't merge it until after the 1st of August; but otherwise this looks correct to me." [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott)
[17:39:53] mwalker: ^ it's just.. the list admins not doing anything..
[17:40:15] mutante, they have to do something though because my email address is going to change
[17:40:16] and you would stay on it.. besides.. of course depends if you still control the email address that is subscribed
[17:40:33] (PS3) Andrew Bogott: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263
[17:40:39] I'll put in an RT for it
[17:40:49] mwalker: technically it is ops-owner@lists.wikimedia.org
[17:40:51] or that, yea
[17:41:03] putnik: if it's a problem, you can open a bug and we'll take care of it
[17:41:29] https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions
[17:41:37] component is WikidataClient
[17:41:47] (CR) Rush: Disable access for mwalker, who is leaving WMF. (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott)
[17:42:04] mwalker: if your email address changes then it's removing old user and subscribing new user, you can also just request to be subscribed , self service on https://lists.wikimedia.org/mailman/listinfo/ops
[17:43:12] ah; good point
[17:43:15] hoo|away, ok, I'll do. Other projects and Other languages are similar, they should be close to each other.
[17:43:56] (PS7) 20after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364
[17:47:26] (PS4) Andrew Bogott: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263
[17:49:36] (CR) Rush: [C: 1] "technically cool! (literally)" [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott)
[17:52:57] !log aaron Synchronized php-1.24wmf15/includes/media: 76459cebd9cfbb33e9845f7acd8b8c1382cdae61 (duration: 00m 08s)
[17:53:02] Logged the message, Master
[17:55:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[17:57:57] (PS1) Dzahn: restrict access to puppet logs to root users [operations/puppet] - https://gerrit.wikimedia.org/r/150273
[17:58:11] godog: Jeff_Green : ^
[17:59:34] (PS2) Dzahn: restrict access to puppet logs to root users [operations/puppet] - https://gerrit.wikimedia.org/r/150273
[17:59:44] mutante: looking
[18:00:04] Reedy, greg-g: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140729T1800).
[18:00:14] heh
[18:00:46] mutante: that's a good start, but you really have to survey the logs on different builds and make sure puppet client logs aren't ending up in /var/log/syslog etc
[18:00:58] damn, jouncebot needs plural detection
[18:01:39] Jeff_Green: yea, i was thinking that sometimes they were in syslog, true. also .. there is an empty /var/log/puppet/ dir .. hrmm
[18:02:07] afaik puppet just spews to syslog with an ident and priority or whatever, and what happens next depends on the syslog config
[18:02:11] we should just make sure they are all configured the same
[18:02:31] that makes sense, yea
[18:02:50] you could add rsyslog config that preempts all the other rules to catch things matching puppet
[18:03:21] in frack i decided this was a wild goosechase and it was safer to just stop puppet from spewing diffs to syslog altogether
[18:03:25] do we need puppet service to syslog at all? just a thought
[18:03:33] Jeff_Green: you ate my idea first :)
[18:04:00] hmm,, maybe we dont
[18:04:09] i ate it after poking my eyes with gnarly puppet sticks for a half day before throwing up my hands
[18:04:20] that ...is a visual :)
[18:04:24] i did look at it before..but rarely
[18:04:29] usually it's just looking at the live run
[18:04:44] iirc even if you disable puppet's spewing of diffs, puppet agent -tv overrides that
[18:04:44] though /var/log/syslog isn't world readable already
[18:05:03] we have a class for that, make syslog world readable
[18:05:12] ah!
[18:05:13] where it's desired, such as on ocg
[18:05:32] modules/base/manifests/init.pp:class base::syslogs (
[18:05:51] if you set /etc/default/puppet or whatever to no syslog
[18:05:54] nevermind, I thought running puppet >> log from cron wouldn't touch syslog at all
[18:05:57] does it still syslog with puppet agent test?
[18:06:13] I guess it does
[18:06:30] godog: even if that's true, its standard practice to puppet agent -tv when testing that a host builds correctly
[18:07:23] i'd just like to be sure that it's _always_ in the same location and only there
[18:07:42] rsyslog rules I guess are the way, and that covers your backside for any other solution
[18:07:43] Jeff_Green: true, yeah I don't think there's any way around that ATM
[18:08:09] mutante: yeah, I think that's a sane compromise
[18:08:12] (PS8) 20after4: collect host keys on deployment-bastion for beta environment. [operations/puppet] - https://gerrit.wikimedia.org/r/149364
[18:08:17] (CR) Kaldari: [C: -1] Add Uploadrestriction for Commons in MF (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598) (owner: Florianschmidtwelzow)
[18:09:27] mutante: you are going to hate me :) does that rule set the perms for the logfile _after_ puppet has spewed passwords into it?
[18:09:37] and are we worried about this edge case, just thought I'd bring it up :)
[18:10:25] chasemp: hrmmm :) i don't know, maybe we just do what Jeff did and send it to /dev/null :)
[18:10:39] (CR) Hoo man: "As a stopgap: Can we maybe set group to 705 so that deployers can still view? I found these logs useful in the past..." [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[18:10:43] the change was more to start the discussion
[18:11:02] the idea of password leakage is pretty serious here I think
[18:11:09] I am kinda with jeff on the null it till needed
[18:11:33] I'm not sure what value we get from having diffs in logs
[18:11:42] especially how we double and triple duty hosts, hard to keep track of who can see what and what functions a box does vs. who can see what logs
[18:11:50] nonobvious disclosure becomes oh so easy I think
[18:11:51] although fwiw the /dev/null hack breaks even puppet agent -tv
[18:12:05] you won't see diffs in agent runs after that hack
[18:12:10] or perhaps just remove --show_diff and keep the rest
[18:12:27] oh, but we need that, i could also amend my change to ensure => absent ?
[18:12:28] in cron I mean
[18:12:55] godog all the normal settings are overridden with the -v flag when you do a puppet run though
[18:12:59] it's really badly implemented
[18:13:31] anyone with root on a box can get any secret on that box
[18:13:45] password file -> blank it -> puppet -v
[18:14:08] NO ROOT FOR ANYBODY OMGZ!
[18:14:12] (PS2) Reedy: Non wikipedias to 1.24 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150249
[18:14:33] Jeff_Green: yeah I think it is less of an issue for manual puppet runs tbh
[18:14:49] still undesirable
[18:14:49] (CR) 20after4: "Well everything should be right but it still isn't showing the" [operations/puppet] - https://gerrit.wikimedia.org/r/149364 (owner: 20after4)
[18:14:51] (CR) Reedy: [C: 2] Non wikipedias to 1.24 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150249 (owner: Reedy)
[18:14:53] (Merged) jenkins-bot: Non wikipedias to 1.24 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150249 (owner: Reedy)
[18:14:57] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[18:15:00] Jeff_Green: :) I thought we already knew that?
[18:15:03] my vote is for the syslog rule
[18:15:15] and let puppet spew whatever it wants to syslog
[18:15:29] and then it's at least limited to those who can sudo and run puppet
[18:15:31] i mean the route-all-puppet-to-one-file plus permissions hack
[18:15:37] yeah
[18:16:05] only if the file that controls the blackholing is similarly protected tho
[18:16:07] idk if that's true
[18:17:03] you mean make puppet deploy /etc/rsyslog.d/puppet root.root 644?
[18:17:28] I guess? turles all the way down
[18:17:34] turtles even
[18:18:31] kernel security modules to control where puppet is allowed to write :p
[18:20:41] if the config that black holes the diffs is root only, and the file they go to during a manual run is as well
[18:20:58] that's as good as it gets?
[18:21:20] (PS1) RobH: dns for dbproxy1001-1002 [operations/dns] - https://gerrit.wikimedia.org/r/150283
[18:22:12] yeah
[18:23:18] (CR) RobH: [C: 2] dns for dbproxy1001-1002 [operations/dns] - https://gerrit.wikimedia.org/r/150283 (owner: RobH)
[18:25:27] so, consensus?
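The race raised in the discussion above (permissions being tightened only after the log file already exists and may already hold secrets) can be sidestepped by creating the file with restrictive permissions in the first place. A minimal Python sketch of that idea follows; the helper names `create_private_log` and `is_private` and any path passed to them are hypothetical, and this is not the puppet or rsyslog change that was under review:

```python
import os
import stat


def create_private_log(path):
    """Create `path` with mode 0600 in a single step and open it for writing.

    O_EXCL makes the call fail if the file already exists, so there is no
    window in which the file exists with looser permissions. The process
    umask can only clear bits, so group/other access stays off.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    return os.fdopen(fd, "w")


def is_private(path):
    """Return True if neither group nor others have any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

A periodic check with something like `is_private` would only report drift; it would not by itself protect content written while the mode was wrong.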
[18:26:05] !log removed "filter { input labs6-in; }" from ae3.1119 (labs-support1-c-eqiad) on cr[12]-eqiad
[18:26:10] Logged the message, Master
[18:26:21] godog: sounds like it, yea
[18:26:24] sure
[18:27:06] (PS5) Mwalker: Disable access for mwalker, who is leaving WMF. [operations/puppet] - https://gerrit.wikimedia.org/r/150263 (owner: Andrew Bogott)
[18:27:20] (PS4) Florianschmidtwelzow: Add Uploadrestriction for Commons in MF [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598)
[18:27:58] in the end root gunna be root
[18:28:20] we seem to have some kind of a mail loop between polonium and iridium
[18:28:33] !log reedy Synchronized php-1.24wmf15/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php: touch (duration: 00m 16s)
[18:28:37] Logged the message, Master
[18:28:57] ori: _joe_: can we change apache log location to fluorine instead of nfs easily?
[18:29:01] Jeff_Green: noticed the same between magnesium (rt) and polonium
[18:29:04] (CR) Florianschmidtwelzow: Add Uploadrestriction for Commons in MF (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150145 (https://bugzilla.wikimedia.org/62598) (owner: Florianschmidtwelzow)
[18:29:05] seems same symptoms?
[18:29:09] maybe since mail revamp?
[18:29:21] i suspect so yeah
[18:29:23] I forgot about it till now, but wasn't obvious to me what was up
[18:29:24] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf15
[18:29:31] Logged the message, Master
[18:31:07] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667
[18:32:29] seems like in this case there's no destination for root@iridium.eqiad.wmnet. iridium originates, smartroutes to polonium, which sends it back to iridium by subdomain name
[18:32:30] (CR) Reedy: "1.24wmf15" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150249 (owner: Reedy)
[18:33:27] mw1177 enwikivoyage: [db60dfac] /wiki/Special:CentralAutoLogin/start?type=1x1&from=enwiki Exception from line 4186 of /usr/local/apache/common-local/php-1.24wmf15/languages/Language.php: Invalid language code "WikiPedia"
[18:33:31] lol wut
[18:36:34] (PS1) RobH: install params for dbproxy1001-1002 [operations/puppet] - https://gerrit.wikimedia.org/r/150288
[18:37:09] Jeff_Green: so what's the fix for that in our setup?
[18:37:13] maybe both aren't getting standard
[18:37:23] because exim is not a straight forward-er
[18:37:24] writing it up in rt
[18:37:29] and something in that logic handles this
[18:37:37] cool
[18:39:12] ottomata: ping
[18:39:13] (CR) RobH: [C: 2] install params for dbproxy1001-1002 [operations/puppet] - https://gerrit.wikimedia.org/r/150288 (owner: RobH)
[18:39:20] (CR) Filippo Giunchedi: [C: 1] "a bit racy when the file gets first created by puppet and permissions get changed, but certainly better than nothing!" [operations/puppet] - https://gerrit.wikimedia.org/r/150273 (owner: Dzahn)
[18:39:21] cmjohnson1: ponnnng
[18:39:35] can I power cycle virt1009?
[18:40:11] what is iridium? i can't log in
[18:40:33] phabricator(someday)
[18:40:46] try jeff instead of root?
[18:40:58] cmjohnson1: virt1009? uhhhh
[18:41:04] i don't know anything about it, so you might be asking the wrong person!
[18:41:05] :)
[18:41:05] chasemp: worked, thx
[18:41:14] oh
[18:41:19] analytics1009 you mean?
[18:41:39] oh..hrm..okay wrong person then
[18:41:40] sorry
[18:41:49] Jeff_Green: can you peek at magnesium
[18:41:52] cmjohnson1: try Coren or andrewbogott
[18:41:53] symptoms look the same to me
[18:41:53] so used to going to you with cisco problems
[18:42:19] Virt1009? Lemme check.
[18:42:38] cmjohnson1: Give me a second to check...
[18:42:42] okay
[18:42:58] Heh. Double answer!
[18:43:09] double the fun....double the trouble
[18:43:12] cmjohnson1: Yes, should be safe. Why are you rebooting it?
[18:43:18] I see no reason not to, this doesn't look like it's sporting a compute node. Safe enough.
[18:43:20] replaced /dev/sdb
[18:43:29] chasemp: no luck on magnesium, it just disconnects me
[18:43:38] really? weird
[18:43:46] iridium has some ganglia/gmond-related puppet fail
[18:43:49] cmjohnson1: ok, have at.
[18:43:52] !log power cycling virt1009
[18:43:54] which is why it's cronspamming in the first place
[18:43:56] Logged the message, Master
[18:46:20] Jeff_Green: yeah haven't seen that before
[18:46:32] consequence of killing generic::systemusers
[18:46:34] I think
[18:46:37] PROBLEM - Host virt1009 is DOWN: PING CRITICAL - Packet loss = 100%
[18:47:28] oic
[18:48:06] I will look at fixing that error to reduce the spam at least :)
[18:48:30] ok cool
[19:06:05] i really have a hard time parsing exim logic...
[19:06:14] Jeff_Green: I fixed the puppet error at least
[19:06:18] cool
[19:06:23] thanks for looking at this
[19:06:30] I also have a hard time understanding exim
[19:09:09] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf14
[19:09:14] Logged the message, Master
[19:09:31] (PS1) Reedy: Wikivoyages back to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150296
[19:10:18] Ah
[19:10:39] 'WikiPedia' => 'Wikipedia',
[19:12:09] But that hasn't changed
[19:13:40] (PS1) Reedy: Remove WikiPedia from wmgExtraLanguageNames [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150299
[19:14:22] !log reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
[19:14:27] Logged the message, Master
[19:14:55] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf15...
[19:15:00] Logged the message, Master
[19:15:53] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf14
[19:16:11] (Abandoned) Reedy: Remove WikiPedia from wmgExtraLanguageNames [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150299 (owner: Reedy)
[19:18:27] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 0 unmerged changes in mediawiki_config (dir /a/common/).
[19:19:26] (CR) Jforrester: [C: -1] "Not sure that this is ready yet." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149597 (owner: Jforrester)
[19:25:26] (PS1) Reedy: Disable RelatedSites Extension [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150301 (https://bugzilla.wikimedia.org/68815)
[19:27:51] (CR) Dzahn: [C: 1] "yes please, there are fatals apparently related to this" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150301 (https://bugzilla.wikimedia.org/68815) (owner: Reedy)
[19:28:36] Reedy: but.. i dont see the error he describes?
[19:29:01] mutante: I reverted wikivoyage back to 1.24wmf14
[19:29:04] (twice)
[19:29:13] that explains it, thanks
[19:29:25] I'm just trying to find what change he minds
[19:29:25] https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf15/Changelog
[19:30:20] uhm.. yea.. "put inter-project links into the sidebar"
[19:31:02] maybe https://gerrit.wikimedia.org/r/#/c/146708/ ?
[19:31:13] at least it's related to interwiki
[19:46:33] !log Reloading Zuul to deploy bast1001
[19:46:39] Logged the message, Master
[19:46:45] !log Reloading Zuul to deploy I7f80ee0b85d29791b7.
[19:46:51] Logged the message, Master
[19:47:10] James_F:
[19:47:24] mutante: did you see matt f ?
[19:47:31] matanya: no
[19:47:39] he timed out from IRC after I pinged him
[19:47:58] Krinkle: Ta.
[19:47:59] ok, the change is there, if you wish to execute it
[19:48:08] matanya: how to confirm?
[19:48:10] once you see him of course
[19:48:14] i dont
[19:48:21] i'm going to be in travel tomorrow
[19:48:32] this needs to be possible from remote somehow
[19:48:46] ask him/request him to vote on the change/put the key on office wiki
[19:48:47] doesn't scale (tm)
[19:48:58] yes, that will be it
[19:49:12] but he already said on ticket he'd do it later
[19:49:29] ask for his ID, social security number, his bank account etc :)
[19:49:53] ok, no rush on my end anyway, he wants the access, not me
[19:50:47] ACKNOWLEDGEMENT - DPKG on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - Disk space on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - NTP on virt1009 is CRITICAL: NTP CRITICAL: No response from NTP server Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - RAID on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - SSH on virt1009 is CRITICAL: Connection timed out Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - check configured eth on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:47] ACKNOWLEDGEMENT - check if dhclient is running on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:50] ACKNOWLEDGEMENT - puppet disabled on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:50:50] ACKNOWLEDGEMENT - puppet last run on virt1009 is CRITICAL: Timeout while attempting connection Chris Johnson There was a problem with the disk replacement.
[19:52:37] ACKNOWLEDGEMENT - Host virt1009 is DOWN: PING CRITICAL - Packet loss = 100% Chris Johnson There was a problem with the disk replacement. - The acknowledgement expires at: 2014-08-03 19:52:06.
[19:59:11] (PS1) Dzahn: mailman lighttpd redirect for research-team list [operations/puppet] - https://gerrit.wikimedia.org/r/150366
[20:02:37] (CR) Dzahn: "Aaron, this part does the http redirect from the listinfo page.." [operations/puppet] - https://gerrit.wikimedia.org/r/150366 (owner: Dzahn)
[20:04:29] (PS1) Dzahn: mailman list alias for renamed research list [operations/puppet] - https://gerrit.wikimedia.org/r/150370
[20:06:44] (CR) Dzahn: "Aaron, and this part does the actual mail redirect. then it just needs also a config change which is " add the old list email address to "" [operations/puppet] - https://gerrit.wikimedia.org/r/150370 (owner: Dzahn)
[20:19:06] (CR) Dzahn: [C: 2] wikibooks.org - retab [operations/dns] - https://gerrit.wikimedia.org/r/150127 (owner: Dzahn)
[20:20:10] greg-g, we're going to enable OCG today -- cscott has some work before we're ready but... I'm thinking we'll be ready to go in an hour or two
[20:21:11] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[20:21:55] mwalker, greg-g: yup, just re-running my test suites to make sure i'm not going to deploy something broken.
[20:21:59] (CR) Dzahn: [C: 2] wikinews.org - retab [operations/dns] - https://gerrit.wikimedia.org/r/150129 (owner: Dzahn)
[20:22:53] cscott, I typically deploy to beta first, with the built and committed node_modules -- which is probably where we can add the automated tests in jenkins
[20:26:06] (PS2) Dzahn: wikiquote.org - retab [operations/dns] - https://gerrit.wikimedia.org/r/150130
[20:31:50] mwalker: cscott find a good time and doit
[20:35:11] how about enabling it on wikitech, then you can make a meta PDF of https://wikitech.wikimedia.org/wiki/OCG
[20:35:31] hehe
[20:35:50] does wikitech have a parsoid instance?
[20:36:41] I don't think so...
[20:36:50] i'm not sure, but i think no
[20:37:08] sadness; no OCG for wikitech then... it needs parsoid
[20:37:09] having some articles as offline content wouldn't be bad though
[20:37:14] gotcha
[20:37:48] (CR) Dzahn: [C: 2] wikiquote.org - retab [operations/dns] - https://gerrit.wikimedia.org/r/150130 (owner: Dzahn)
[20:38:06] (PS4) Aaron Schulz: Added a streamlined RunJobs that can be used by redisJobService [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149587
[20:39:35] we could enable the Parsoid extension on wikitech though; and do some configuration magic there and on the parsoid cluster and it would probably "just work" though...
[20:39:43] James_F|Away, RoanKattouw ^ thoughts?
[20:40:12] Hi everyone. I've been trying to run queries on Hive, but I keep getting "Permission denied". I wonder if it's my query, the system, or something I don't know about?
[20:41:01] have you asked ironholds?
[20:41:07] he was the test bunny for hive
[20:41:30] mwalker: I'm pretty sure we have VE on wikitechwiki...
[20:42:28] RoanKattouw, I dont have an editing preference for it (and beta features is not enabled on wikitech)
[20:42:35] not in person, but noticed that he filed a similar bugzilla bug so thought to ask here first cause i just got my access right recently.
[20:42:47] mwalker: Try ?veaction=edit ?
[20:43:15] RoanKattouw, no dice...
[20:43:55] 'labswiki': 'https://wikitech.wikimedia.org/w/api.php' // Not private but can't use proxy
[20:43:57] In the Parsoid config
[20:45:26] (CR) Andrew Bogott: [C: 2] mailman list alias for renamed research list [operations/puppet] - https://gerrit.wikimedia.org/r/150370 (owner: Dzahn)
[20:45:29] RoanKattouw, ooh; that does work
[20:45:42] (CR) Andrew Bogott: [C: 2] mailman lighttpd redirect for research-team list [operations/puppet] - https://gerrit.wikimedia.org/r/150366 (owner: Dzahn)
[20:45:47] Reedy, are you the one who administers wikitech?
[20:46:01] Nope
[20:46:06] do you know who does?
[20:46:08] I don't have shell there :(
[20:46:09] Ops
[20:46:11] ;)
[20:46:22] VE is installed there
[20:46:22] Try andrewbogott
[20:46:28] According to Special:Version
[20:46:46] I notice that VE emits an error when you try to load it on "Shell Request/Artcelestine" (I used Special:Random) - $.uls is undefined
[20:46:53] Wouldn't be a surprise if the config was much out of date
[20:47:09] Also on "Salt"
[20:47:27] (CR) Edenhill: "Looks good, merge it!" [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/127804 (owner: CSteipp)
[20:47:35] ULS is needed?
[20:47:53] wat
[20:48:00] I didn't realize we depended on ULS?
[20:48:04] Oh, we have a thing
[20:48:10] To backfill it if not installed
[20:48:13] Also I notice there's a "Enable VisualEditor. It will be available in the following namespaces: (Main), User" pref in the 'Editing' tab, defaulting to unchecked
[20:48:15] ve.init.mw.Platform.prototype.getLanguageAutonym = $.uls.data.getAutonym;
[20:48:19] That must not work very well then
[20:48:23] Uncaught TypeError: Cannot read property 'data' of undefined
[20:49:31] I will look into this once I'm done eating
[20:50:27] (PS1) Dzahn: listserver_aliases - fix formatting [operations/puppet] - https://gerrit.wikimedia.org/r/150387
[20:50:50] greg-g: Flow has a couple of minor tweaks for our Tuesday window, I updated wikitech Deployments. Efficient and joyful!
[20:51:44] RoanKattouw: I usually look after wikitech. I have a few minutes to look at things now, or you can ping me first thing tomorrow...
[20:51:53] spagewmf, can you let me and cscott know when you're done
[20:51:59] we're going to enable the new pdf renderer
[20:52:45] andrewbogott: It's probably just our code being broken, and wikitech is the only place that trips this edge case
[20:52:46] mwalker: I was going to start at 2pm SF time and should be done in 30 minutes or less
[20:53:13] RoanKattouw: ok. There's a vagrant role for wikitech which may or may not exhibit the same behavior.
[20:53:26] And if it doesn't, that's interesting as well :)
[20:53:35] spagewmf, awesome -- if it takes you longer no worries (I'm just trying to edge in if there's free space between you and the swat)
[20:54:03] !log aaron Synchronized php-1.24wmf14/includes/media: b45248509c07acb8146d6e735ef68dff193ac290 (duration: 00m 07s)
[20:54:08] Logged the message, Master
[20:55:40] (CR) Andrew Bogott: [C: 2] listserver_aliases - fix formatting [operations/puppet] - https://gerrit.wikimedia.org/r/150387 (owner: Dzahn)
[21:00:04] spagewmf: Respected human, time to deploy Flow (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140729T2100). Please do the needful.
[21:00:28] (CR) Dzahn: [C: 2] 26.35.198.in-addr.arpa - retab [operations/dns] - https://gerrit.wikimedia.org/r/150133 (owner: Dzahn)
[21:02:54] I need to nuke my bot's watchlist but I go on getting wikimedia errors
[21:03:03] (PS1) Mwalker: Enable OCG in production [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150393
[21:03:22] (CR) Dzahn: [C: 2] 3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - https://gerrit.wikimedia.org/r/150135 (owner: Dzahn)
[21:04:19] (CR) Spage: [C: 2] "Works on my local wiki." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149077 (owner: EBernhardson)
[21:04:25] (Merged) jenkins-bot: Enable sandbox page for wikimania user testing [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149077 (owner: EBernhardson)
[21:05:03] Vito: username?
[21:05:12] On what wiki?
[21:05:14] bottuzzu @itwiki
[21:06:01] Just put my laptop away. Will deal with it in a bit if someone doesn't beat me to it
[21:06:39] Reedy: `git fetch origin && git diff HEAD origin` reports unexpected changes to wikiversions.json. ?
[21:06:39] ty Reedy, it's not urgent though it's useful to be done
[21:07:55] (CR) Dzahn: [C: 2] 59.15.185.in-addr.arpa - retab [operations/dns] - https://gerrit.wikimedia.org/r/150136 (owner: Dzahn)
[21:08:26] Reedy: seems tin's b097756 Wikivoyages back to 1.24wmf14 is not in origin/master?
[21:09:01] Did I not submit it?
[21:09:58] Reedy it's not in `git log --oneline -3 origin/master`
[21:10:06] Check gerrit
[21:10:13] I'm not on my laptop
[21:10:19] Need to go home
[21:10:44] Reedy: will do, thanks. So tin has the intended code
[21:11:11] (PS2) BBlack: 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - https://gerrit.wikimedia.org/r/150132 (owner: Dzahn)
[21:11:40] (CR) BBlack: [C: 2] "Confirmed whitespace-only." [operations/dns] - https://gerrit.wikimedia.org/r/150132 (owner: Dzahn)
[21:12:35] (PS2) BBlack: 2.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa - retab [operations/dns] - https://gerrit.wikimedia.org/r/150134 (owner: Dzahn)
[21:13:09] (CR) BBlack: [C: 2] "Confirmed whitespace-only." [operations/dns] - https://gerrit.wikimedia.org/r/150134 (owner: Dzahn)
[21:15:34] (CR) Spage: [C: 2] "Reedy already deployed this change, so I'm +2ing after the fact." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150296 (owner: Reedy)
[21:15:40] (Merged) jenkins-bot: Wikivoyages back to 1.24wmf14 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/150296 (owner: Reedy)
[21:16:43] bblack: thanks very much . all clean
[21:16:46] /dns/templates$ grep -l -P '\t' *.*
[21:16:49] = nothing :)
[21:17:07] hashar will like
[21:17:12] :)
[21:18:00] * greg-g walks afk for a bit, didn't really take a lunch break
[21:18:22] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[21:18:29] !log spage updated /a/common to {{Gerrit|I3b4622e27}}: Wikivoyages back to 1.24wmf14
[21:18:35] Logged the message, Master
[21:19:21] ^
[21:19:41] ^ "if this creates a merge commit, that means someone has committed a local change without pushing; this isn't the end of the world but it should be fixed, so either fix it yourself or ask for help" . I think that's what the above messages are reporting
[21:19:57] (CR) Ori.livneh: [C: 2] Added a streamlined RunJobs that can be used by redisJobService [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149587 (owner: Aaron Schulz)
[21:20:02] (Merged) jenkins-bot: Added a streamlined RunJobs that can be used by redisJobService [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/149587 (owner: Aaron Schulz)
[21:21:10] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.02
[21:21:18] ^ ^d
[21:21:55] <^d> We need to tweak that metric.
[21:22:01] <^d> It's a little too aggressive.
[21:22:08] !log ori updated /a/common to {{Gerrit|Ia62e9158f}}: Added a streamlined RunJobs that can be used by redisJobService
[21:22:15] Logged the message, Master
[21:22:19] !log spage Synchronized wmf-config/InitialiseSettings.php: Enable Flow on Wikimania testing page (duration: 00m 13s)
[21:22:24] Logged the message, Master
[21:22:50] (PS1) Dzahn: langlist.tmpl - retab [operations/dns] - https://gerrit.wikimedia.org/r/150397
[21:23:04] ori: heh I was going to poke spage and then saw you already synced it ;)
[21:23:51] (PS4) BBlack: normalize_path should not read past the end of the url string [operations/puppet] - https://gerrit.wikimedia.org/r/149119
[21:24:49] (CR) Dzahn: [C: 2] langlist.tmpl - retab [operations/dns] - https://gerrit.wikimedia.org/r/150397 (owner: Dzahn)
[21:25:44] (CR) BBlack: "I've gone back to the PS2 variant that just fixes the problem noted (reading past the end) and adds a const, and then fixed the underflow " [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[21:26:55] ori: ^ 149119?
[21:27:26] OK something is very wrong on wikitech wiki
[21:27:51] https://wikitech.wikimedia.org/w/extensions/VisualEditor/lib/ve/lib/jquery.uls/src/jquery.uls.data.js really should not be a 404
[21:28:29] Jeff_Green: do we need MX records for donate-lb.pmtpa.wm.o ?
[21:28:36] note the tampa part
[21:28:42] RoanKattouw: wikitech is in a weird state, where it has a more up to date mw core than extensions, I think. poke andrewbogott_afk
[21:29:10] mutante: afaik no
[21:29:37] Jeff_Green: i ask because of the comments on https://gerrit.wikimedia.org/r/#/c/143201/
[21:29:54] it's like the only thing left there that responds
[21:30:03] Jeff_Green: yeah the bigger question is, do we need MX on any of donate-lb.$site ?
[21:30:41] bblack: we do send mail for donate.wikimedia.org which is a cname for donate-lb.eqiad.wmnet
[21:31:12] "send mail for" means receive mail to @donate.wikimedia.org, right?
[21:31:38] send mail from whatever@donate.wikimedia.org and ostensibly receive bounce mail back
[21:31:46] ah
[21:32:05] why we don't just have a simple mx record for donate.wikimedia.org I'm not sure. maybe you can't tie a cname to a mx record?
[21:32:06] is that the silverpop part?
[21:32:19] you can't have a CNAME and an MX (or anything else) at the same name
[21:32:25] (yet another reason CNAMEs suck!)
[21:32:27] mutante: either that or civicrm
[21:32:30] !log spage ran `mwscript namespaceDupes.php --wiki=enwiki --prefix Topic`, 5 pages renamed
[21:32:35] Logged the message, Master
[21:32:38] bblack: ok so then there you have it
[21:33:03] also, I'm not entirely sure that email to foo@bar with bar CNAME baz then baz MX quux is RFC-legal, although it probably works
[21:33:21] I assume it's working today in practice, right?
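The CNAME restriction mentioned above can be checked mechanically: a name that owns a CNAME record should not own other record types such as MX. A rough Python sketch, assuming the records are supplied as hypothetical `(name, type)` pairs rather than parsed from the real zone templates, and with `cname_conflicts` being an illustrative helper name:

```python
from collections import defaultdict

# DNSSEC record types that are allowed to sit next to a CNAME.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}


def cname_conflicts(records):
    """Return the sorted names that have a CNAME plus some other record type.

    `records` is an iterable of (owner_name, record_type) pairs. Names are
    compared case-insensitively and without a trailing dot.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())

    conflicts = []
    for name, types in types_by_name.items():
        others = types - {"CNAME"} - ALLOWED_WITH_CNAME
        if "CNAME" in types and others:
            conflicts.append(name)
    return sorted(conflicts)
```

For example, `cname_conflicts([("donate.example.org", "CNAME"), ("donate.example.org", "MX")])` would flag `donate.example.org`, while an MX placed on the CNAME's target name would not be flagged.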
[21:33:59] afaik yes it works in practice
[21:34:28] that mail ends up back on the civicrm server, waiting for the day when we actually make civicrm process it like it's supposed to
[21:34:34] mwalker: I think I'm done, the rest of the Flow window is yours for only $45 plus shipping and handling
[21:34:55] ah shucks
[21:34:57] that's a deal!
[21:35:22] mutante: yeah so we do need the other 3, but the pmtpa name isn't configured as a possible geoip response, so we're good to go.
[21:35:57] (CR) BBlack: [C: 1] "^ All the above resolved on IRC w/ Jeff, we're good to remove the pmtpa MX here." [operations/dns] - https://gerrit.wikimedia.org/r/143201 (owner: Dzahn)
[21:35:58] bblack: ah, cool, i was almost about to amend and leave only that one in there
[21:36:10] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[21:38:11] this is another example of things that will get much simpler with the eventual DNS refactor away from the use of CNAMEs for prod service names
[21:41:50] gah, https://tickets.puppetlabs.com/browse/PUP-1062
[21:41:50] sigh
[21:41:51] (CR) Ori.livneh: "Tested, works. Small quibble noted inline." (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[21:41:56] ^ bblack
[21:42:02] spagewmf: thanks
[21:42:44] (CR) Dzahn: [C: 1] solr: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/149960 (owner: Matanya)
[21:43:02] eh fair enough
[21:43:03] bblack: of course if you choose to change it to - 3 be sure to change the boundary check to > 2 -- i.e., line 190 should read "const size_t lastConvert = urlLength > 2 ?
urlLength - 3 : 0;"
[21:43:09] yeah
[21:43:28] I quibbled on that myself, but figured the comment somewhat clarified by noting it was a comparator for "i" rather than an index
[21:43:31] but the index is clearer
[21:43:43] (CR) Dzahn: [C: -1] "how about <%= debug_logging %> , <%= spamd_user %> and <%= spamd_group %> ?" [operations/puppet] - https://gerrit.wikimedia.org/r/149993 (owner: Matanya)
[21:44:36] Vito: done
[21:44:44] !log cleared bottuzzu@itwiki watchlist
[21:44:48] Logged the message, Master
[21:44:59] (PS5) BBlack: normalize_path should not read past the end of the url string [operations/puppet] - https://gerrit.wikimedia.org/r/149119
[21:45:01] (CR) Dzahn: [C: -1] "now you could try rebasing this because all the tabs are gone, and your other alignment fixes should stay" [operations/dns] - https://gerrit.wikimedia.org/r/147168 (https://bugzilla.wikimedia.org/68769) (owner: Scottlee)
[21:45:10] bblack, around?
[21:45:23] * bblack hides
[21:45:40] (CR) Ori.livneh: [C: 1] normalize_path should not read past the end of the url string [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[21:46:16] (CR) BBlack: [C: 2] normalize_path should not read past the end of the url string [operations/puppet] - https://gerrit.wikimedia.org/r/149119 (owner: BBlack)
[21:46:38] bblack, hehe :) apparently opera mini has some complications with javascript (it runs it, so it is not