[00:00:34] (PS1) Jdlrobson: Enable VectorBeta form refresh on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/205474
[00:01:21] (CR) Jdlrobson: "Let's put this on beta labs https://gerrit.wikimedia.org/r/205474 so Jared can make a call about whether we want to go ahead with this or " [mediawiki-config] - https://gerrit.wikimedia.org/r/175406 (https://phabricator.wikimedia.org/T73477) (owner: Glaisher)
[00:01:27] (CR) Jdlrobson: [C: 1] Enable "Form Refresh" as a BetaFeature [mediawiki-config] - https://gerrit.wikimedia.org/r/175406 (https://phabricator.wikimedia.org/T73477) (owner: Glaisher)
[00:07:00] !log krenair Synchronized php-1.26wmf2/extensions/WikiGrok/resources/startup/init.js: https://gerrit.wikimedia.org/r/#/c/205470/ (duration: 00m 13s)
[00:07:02] kaldari, ^
[00:07:06] Logged the message, Master
[00:07:17] checking….
[00:08:55] Krenair: looks good
[00:09:28] kaldari
[00:09:34] !log krenair Synchronized php-1.26wmf1/extensions/WikiGrok/resources/startup/init.js: https://gerrit.wikimedia.org/r/#/c/205469/ (duration: 00m 13s)
[00:09:37] Logged the message, Master
[00:09:38] checking...
[00:10:59] Krenair: looks good. Thanks!
[00:12:11] yw
[00:22:53] hmm icinga for neon says: Icinga configuration contains errors, please check!
[00:23:01] and /etc/init.d/icinga check says: Error: Contact group 'pager_testing' specified in service 'check_to_check_nagios_paging' for host 'neon' is not defined anywhere!
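[annotation] The Icinga error above is a dangling reference: a service names a contact group that no object defines. As a rough illustration only (this is not how Icinga itself validates its configuration), the same check can be sketched in Python. The function name, the dictionary layout, and the second service name `puppet_last_run` are hypothetical; only `pager_testing`, `admins`, and `check_to_check_nagios_paging` come from the log.

```python
def undefined_contact_groups(services, defined_groups):
    """Return sorted (service, group) pairs whose contact group is not defined.

    `services` maps a service name to the contact groups it references;
    `defined_groups` lists every contact group that actually exists.
    Both shapes are assumptions made for this sketch.
    """
    defined = set(defined_groups)
    missing = []
    for service, groups in services.items():
        for group in groups:
            if group not in defined:
                missing.append((service, group))
    return sorted(missing)


# Hypothetical snapshot resembling the broken state reported in the log.
services = {
    "check_to_check_nagios_paging": ["pager_testing"],
    "puppet_last_run": ["admins"],  # hypothetical second service
}
problems = undefined_contact_groups(services, ["admins"])
```

With the data above, `problems` holds the single pair naming `check_to_check_nagios_paging` and `pager_testing`; pointing that service at `admins` instead would leave the list empty.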
[00:23:59] it's been in this state for 2.5 hours [00:24:05] * jgage looks for culprit [00:33:54] (03PS1) 10Yuvipanda: tools: Pass in path to MySQL credentials as environment variable [puppet] - 10https://gerrit.wikimedia.org/r/205480 [00:34:06] (03CR) 10jenkins-bot: [V: 04-1] tools: Pass in path to MySQL credentials as environment variable [puppet] - 10https://gerrit.wikimedia.org/r/205480 (owner: 10Yuvipanda) [00:34:10] bblack: ^^ is how [00:34:14] * YuviPanda does rebase [00:35:55] (thanks jgage) [00:36:11] (03PS2) 10Yuvipanda: tools: Pass in path to MySQL credentials as environment variable [puppet] - 10https://gerrit.wikimedia.org/r/205480 [00:37:04] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Pass in path to MySQL credentials as environment variable [puppet] - 10https://gerrit.wikimedia.org/r/205480 (owner: 10Yuvipanda) [00:37:29] (03PS1) 10Gage: check_to_check_nagios_paging: alert admins instead of pager_testing [puppet] - 10https://gerrit.wikimedia.org/r/205482 [00:38:08] (03PS2) 10Gage: check_to_check_nagios_paging: alert admins instead of pager_testing [puppet] - 10https://gerrit.wikimedia.org/r/205482 [00:38:33] ottomata, for analytics1021? 
no problem :) [00:38:59] (03PS1) 10EBernhardson: Revert "Enable VisualEditor for Flow posts on Beta Labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205483 [00:39:11] (03CR) 10Gage: [C: 032] check_to_check_nagios_paging: alert admins instead of pager_testing [puppet] - 10https://gerrit.wikimedia.org/r/205482 (owner: 10Gage) [00:40:42] (03PS2) 10EBernhardson: Revert "Enable VisualEditor for Flow posts on Beta Labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205483 [00:41:05] (03CR) 10EBernhardson: [C: 032] Revert "Enable VisualEditor for Flow posts on Beta Labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205483 (owner: 10EBernhardson) [00:41:13] (03Merged) 10jenkins-bot: Revert "Enable VisualEditor for Flow posts on Beta Labs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205483 (owner: 10EBernhardson) [00:43:11] !log ebernhardson Synchronized wmf-config/InitialiseSettings-labs.php: keeping prod in sync with labs-only mediawiki-config changes (duration: 00m 14s) [00:43:14] Logged the message, Master [00:54:08] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [00:54:15] finally [01:11:33] (03PS1) 10Rush: Allow catchpoint to request a file from icinga [puppet] - 10https://gerrit.wikimedia.org/r/205486 [01:53:56] (03PS1) 10Yuvipanda: tools: Let 5xx errors pass through to client if debug is set [puppet] - 10https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393) [01:54:47] (03PS2) 10Yuvipanda: tools: Let 5xx errors pass through to client if debug is set [puppet] - 10https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393) [01:55:10] ori: is the debug mode http header documented somewhere? 
[01:55:39] https://wikitech.wikimedia.org/wiki/Debugging_in_production
[01:56:37] (PS3) Yuvipanda: tools: Let 5xx errors pass through to client if debug is set [puppet] - https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393)
[01:56:49] wheeee
[01:57:03] (PS4) Yuvipanda: tools: Let 5xx errors pass through to client if debug is set [puppet] - https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393)
[01:57:14] (PS5) Yuvipanda: tools: Let 5xx errors pass through to client if debug is set [puppet] - https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393)
[01:57:22] (CR) Yuvipanda: [C: 2 V: 2] tools: Let 5xx errors pass through to client if debug is set [puppet] - https://gerrit.wikimedia.org/r/205495 (https://phabricator.wikimedia.org/T66393) (owner: Yuvipanda)
[02:00:51] (PS1) Yuvipanda: tools: Move error_page directives under location [puppet] - https://gerrit.wikimedia.org/r/205497 (https://phabricator.wikimedia.org/T66393)
[02:01:03] (CR) jenkins-bot: [V: -1] tools: Move error_page directives under location [puppet] - https://gerrit.wikimedia.org/r/205497 (https://phabricator.wikimedia.org/T66393) (owner: Yuvipanda)
[02:01:34] (PS2) Yuvipanda: tools: Move error_page directives under location [puppet] - https://gerrit.wikimedia.org/r/205497 (https://phabricator.wikimedia.org/T66393)
[02:01:42] (CR) Yuvipanda: [C: 2 V: 2] tools: Move error_page directives under location [puppet] - https://gerrit.wikimedia.org/r/205497 (https://phabricator.wikimedia.org/T66393) (owner: Yuvipanda)
[02:06:37] ori: toollabs supports it too now :D
[02:06:44] X-Wikimedia-Debug
[02:06:50] awesome!
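[annotation] The change merged above ("Let 5xx errors pass through to client if debug is set") lives in the proxy configuration in the puppet repository, which is not shown in this log. The decision it describes can be sketched in Python under two assumptions: that "debug is set" means the request carries an `X-Wikimedia-Debug` header, and that without it a 5xx response is swapped for a generic page. The function name and the placeholder body are hypothetical.

```python
DEBUG_HEADER = "X-Wikimedia-Debug"
GENERIC_ERROR_BODY = "generic error page"  # hypothetical placeholder text


def choose_response(request_headers, upstream_status, upstream_body):
    """Return the (status, body) pair the client should receive.

    Assumption for this sketch: the mere presence of the debug header
    counts as "debug is set"; its value is not inspected.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in request_headers}
    debug = DEBUG_HEADER.lower() in names
    if 500 <= upstream_status <= 599 and not debug:
        # Hide the tool's own error output behind a generic page.
        return upstream_status, GENERIC_ERROR_BODY
    # Debug requests and non-5xx responses pass through untouched.
    return upstream_status, upstream_body


plain = choose_response({}, 500, "stack trace")
debugged = choose_response({"x-wikimedia-debug": "1"}, 500, "stack trace")
```

Here `plain` carries the generic body while `debugged` carries the upstream body unchanged, which is the before/after difference the next message in the log describes.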
[02:11:09] ori: yeah, earlier if you had a 500 it would give you a generic page, now it’ll give you the actual 500 page
[02:26:24] !log l10nupdate Synchronized php-1.26wmf1/cache/l10n: (no message) (duration: 05m 56s)
[02:26:32] Logged the message, Master
[02:30:55] !log LocalisationUpdate completed (1.26wmf1) at 2015-04-21 02:29:52+00:00
[02:31:00] Logged the message, Master
[02:54:20] !log l10nupdate Synchronized php-1.26wmf2/cache/l10n: (no message) (duration: 08m 25s)
[02:54:25] Logged the message, Master
[02:56:08] PROBLEM - puppet last run on mw2031 is CRITICAL puppet fail
[03:01:11] !log LocalisationUpdate completed (1.26wmf2) at 2015-04-21 03:00:08+00:00
[03:01:17] Logged the message, Master
[03:13:59] RECOVERY - puppet last run on mw2031 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[03:47:40] !log catrope Synchronized php-1.26wmf2/extensions/WikiEditor/: SWAT (duration: 00m 15s)
[03:47:46] Logged the message, Master
[03:49:01] !log catrope Synchronized php-1.26wmf1/extensions/WikiEditor/: SWAT (duration: 00m 13s)
[03:49:05] Logged the message, Master
[03:49:14] robh: did your swat just complete or?!
[03:50:00] gah
[03:50:02] I meant RoanKattouw
[03:50:29] Sorry, force of habit
[03:50:33] That clearly wasn't actually a SWAT
[03:50:42] But an out-of-band deploy of a data gathering fix
[03:51:03] ah :)
[03:51:40] YuviPanda: he was trying to set wikis ablaze (team!)
[04:07:56] operations, Wikimedia-Apache-configuration, Patch-For-Review: wikibooks.org redirects to en.wikibooks.org - https://phabricator.wikimedia.org/T87039#1223222 (Glaisher) Weird. It wasn't working when the Gerrit change was merged (apparently due to cache) but it was working when I checked the next day and...
[04:08:14] operations, Wikimedia-Apache-configuration: wikibooks.org redirects to en.wikibooks.org - https://phabricator.wikimedia.org/T87039#1223223 (Glaisher)
[04:10:34] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223228 (vshchepakina) @Dzahn there is a domain problem!!! Go to the store, add items to the cart, click checkout and it redirects you to the main page...
[04:17:42] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223231 (yuvipanda) Hey @vshchepakina! I just added items and clicked checkout, and it's taking me to a checkout.shopify.com URL.
[04:18:13] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223232 (vshchepakina) @Dzahn shop.wikimedia.org just kept pulling the website back to that url so I had to delete it.
[04:18:56] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223233 (jeremyb) what does "had to delete it" mean?
[04:28:03] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223238 (vshchepakina) @jeremyb We deleted the url (shop.wikimedia.org) that was redirecting the cart page back to itself. It was creating a loop.
[04:29:36] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223245 (jeremyb) Who deleted it from where? your web browser? Were you able to replicate this behavior on multiple devices?
[04:32:19] operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1223255 (Glaisher) >>! In T96468#1223032, @Legoktm wrote: > If possible, it would be appreciated if we could delay this until the majority of SULF ren...
[04:32:56] operations, Wikimedia-Language-setup, Wikimedia-Site-requests: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1223256 (Glaisher)
[04:35:41] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223264 (vshchepakina) We deleted it from the Shopify admin. And yes, I can replicate this behavior on multiple devices.
[04:42:36] operations, Security, Wikimedia-Shop, HTTPS, Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1223276 (jeremyb) ok, and now everything's good/
[04:46:40] (CR) MZMcBride: "I wonder if there's a term for this.... I view this is as basically anti-dogfooding. Instead of feeling the pain of the bad search interfa" [mediawiki-config] - https://gerrit.wikimedia.org/r/204536 (https://phabricator.wikimedia.org/T94856) (owner: Glaisher)
[05:11:15] Hmm
[05:11:18] Database error
[05:15:23] Nevermind..
[05:46:05] (03PS2) 10Springle: Lowered "max lag" to 10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204843 (owner: 10Aaron Schulz) [05:46:37] (03CR) 10Springle: [C: 032] Lowered "max lag" to 10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204843 (owner: 10Aaron Schulz) [05:46:43] (03Merged) 10jenkins-bot: Lowered "max lag" to 10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204843 (owner: 10Aaron Schulz) [05:47:56] !log springle Synchronized wmf-config/db-eqiad.php: reduce max lag to 10s, gerrit 204843 (duration: 00m 12s) [05:48:03] Logged the message, Master [05:48:19] !log springle Synchronized wmf-config/db-codfw.php: reduce max lag to 10s, gerrit 204843 (duration: 00m 12s) [05:48:22] Logged the message, Master [05:52:18] (03PS1) 10Springle: depool db1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205531 [05:52:52] (03CR) 10Springle: [C: 032] depool db1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205531 (owner: 10Springle) [05:52:59] (03Merged) 10jenkins-bot: depool db1019 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205531 (owner: 10Springle) [05:53:51] !log springle Synchronized wmf-config/db-eqiad.php: depool db1019 (duration: 00m 12s) [05:53:56] Logged the message, Master [06:01:22] !log fixed invalid accounts due to bad SULF renames on bat_smgwiki [06:01:28] Logged the message, Master [06:15:59] (03PS1) 10Springle: upgrade db1019 [puppet] - 10https://gerrit.wikimedia.org/r/205533 [06:17:09] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 21 06:16:06 UTC 2015 (duration 16m 5s) [06:17:15] Logged the message, Master [06:18:49] (03CR) 10Springle: [C: 032] upgrade db1019 [puppet] - 10https://gerrit.wikimedia.org/r/205533 (owner: 10Springle) [06:23:56] (03CR) 10Yurik: [C: 031] "Looks ok to me. Overall, I get a feeling of "too many moving parts" - adding one service requires modification of 10 files, plus 3 new. 
I " [puppet] - 10https://gerrit.wikimedia.org/r/205350 (https://phabricator.wikimedia.org/T90487) (owner: 10Mobrovac) [06:29:39] PROBLEM - puppet last run on iron is CRITICAL Puppet has 2 failures [06:29:39] PROBLEM - puppet last run on cp1056 is CRITICAL Puppet has 1 failures [06:30:09] PROBLEM - puppet last run on db1015 is CRITICAL Puppet has 1 failures [06:30:19] PROBLEM - puppet last run on mw2043 is CRITICAL Puppet has 1 failures [06:30:19] PROBLEM - puppet last run on analytics1030 is CRITICAL Puppet has 2 failures [06:30:39] PROBLEM - puppet last run on elastic1022 is CRITICAL Puppet has 1 failures [06:30:48] PROBLEM - puppet last run on wtp2015 is CRITICAL Puppet has 2 failures [06:30:49] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 1 failures [06:31:09] PROBLEM - puppet last run on mw1009 is CRITICAL Puppet has 1 failures [06:31:18] PROBLEM - puppet last run on cp3042 is CRITICAL Puppet has 1 failures [06:33:39] PROBLEM - puppet last run on mw1119 is CRITICAL Puppet has 2 failures [06:33:49] PROBLEM - puppet last run on mw1144 is CRITICAL Puppet has 3 failures [06:35:19] PROBLEM - puppet last run on mw2097 is CRITICAL Puppet has 1 failures [06:36:19] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures [06:37:28] !log xtrabackup clone db1027 to db1019 [06:37:32] Logged the message, Master [06:43:20] (03Abandoned) 10Giuseppe Lavagetto: LVS: add api and apaches for codfw [puppet] - 10https://gerrit.wikimedia.org/r/196067 (https://phabricator.wikimedia.org/T92377) (owner: 10Dzahn) [06:45:21] RECOVERY - puppet last run on mw1009 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:45:33] RECOVERY - puppet last run on iron is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [06:45:33] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on cp1056 is OK Puppet is currently enabled, last 
run 7 seconds ago with 0 failures [06:45:42] RECOVERY - puppet last run on mw2043 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [06:45:52] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:46:12] RECOVERY - puppet last run on mw2097 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:46:21] RECOVERY - puppet last run on db1015 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:46:21] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:46:52] RECOVERY - puppet last run on analytics1030 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:47:12] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:22] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:42] RECOVERY - puppet last run on elastic1022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:42] RECOVERY - puppet last run on wtp2015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [07:19:39] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium - https://phabricator.wikimedia.org/T94093#1223535 (10mobrovac) [07:20:04] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium - https://phabricator.wikimedia.org/T94093#1155079 (10mobrovac) >>! In T94093#1222806, @GWicke wrote: > For now, we need: > > - ability to run node and pixz as a plain user With a normal shell access we should be able to do tha... 
[07:20:59] Blocked-on-Operations, Ops-Access-Requests, operations: Access to francium - https://phabricator.wikimedia.org/T94093#1223537 (mobrovac) a:mobrovac>RobH @RobH I have changed the task description to list the exact things we need, so reassigning back to you.
[07:28:54] operations: Java security updates (CPU 2014) - https://phabricator.wikimedia.org/T96125#1223557 (MoritzMuehlenhoff) CVE-2015-0491 and CVE-2015-0459 are specific to Oracle Java and don't affect the Java packages (icedtea/openjdk) as used by us.
[07:55:07] (PS1) Ori.livneh: coal: initial frontend [puppet] - https://gerrit.wikimedia.org/r/205544
[07:55:48] (PS2) Ori.livneh: coal: initial frontend [puppet] - https://gerrit.wikimedia.org/r/205544
[07:56:06] (CR) Ori.livneh: [C: 2 V: 2] "I'll move this to a separate repository sometime this week." [puppet] - https://gerrit.wikimedia.org/r/205544 (owner: Ori.livneh)
[08:03:21] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 2 below the confidence bounds
[08:04:29] YuviPanda: I think that was an artifact of the graph, if you append format=json it returns integers
[08:31:32] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 1 below the confidence bounds
[08:36:50] operations, HHVM: luasandbox is failing with HHVM 3.6 - https://phabricator.wikimedia.org/T96661#1223637 (Joe) NEW
[08:40:16] _joe_: ouch :(
[08:40:34] operations, HHVM: luasandbox is failing with HHVM 3.6 - https://phabricator.wikimedia.org/T96661#1223645 (Joe) The error comes from Extension::CompileSystemlib in file hphp/runtime/ext/extension.cpp This is probably due to some breaking change in how native extensions are treated?
[08:40:49] <_joe_> ori: heh.
[08:41:25] <_joe_> ori: I'll speak with mark, I think we need a few people that devote part of their time to HHVM continuously [08:41:37] <_joe_> or we won't be able to keep up with it [08:42:00] i think we'll have to re-do luasandbox using HNI and ditch the zend compat layer [08:42:04] which will make tim very sad [08:42:08] since he worked very hard on it [08:42:12] <_joe_> I know [08:42:14] but it looks like upstream maintainership for it is zilch [08:43:24] going to deploy cherry-picks of https://gerrit.wikimedia.org/r/#/c/205345/ in a moment to ease pressure on carbon [08:44:00] <_joe_> carbon? [08:44:14] <_joe_> graphite you mean? [08:44:24] (03CR) 10Muehlenhoff: "memcached also runs as nobody (as does ircecho, but little harm could be done here). nobody is also typically used in various aspects of N" [puppet] - 10https://gerrit.wikimedia.org/r/174896 (owner: 10Hoo man) [08:44:28] yeah [08:44:49] (this is coordinated with godog) [08:45:37] *nod* [08:45:59] 6operations, 7Graphite: audit graphite retention schemas - https://phabricator.wikimedia.org/T96662#1223660 (10fgiunchedi) p:5Triage>3Normal [08:46:12] PROBLEM - puppet last run on cp4006 is CRITICAL puppet fail [08:46:19] !log ori Synchronized php-1.26wmf2/includes/jobqueue: Ifa478996f: Revert 'Added per-wiki queue stats information' (duration: 00m 13s) [08:46:24] Logged the message, Master [08:47:42] !log ori Synchronized php-1.26wmf1/includes/jobqueue: Ifa478996f: Revert 'Added per-wiki queue stats information' (duration: 00m 12s) [08:47:45] Logged the message, Master [08:48:00] godog: {{done}} [08:48:28] \o/ thanks ori [08:49:08] i'm going to stay up for a few more minutes for due diligence since i just synced, but gonna hit the sack after that. 
good night [08:51:07] sounds good, good night [08:56:42] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 1 below the confidence bounds [08:57:43] _joe_: i think i got it [08:58:17] <_joe_> ori: oh, really? I was testing the other extensions too at the moment [08:58:26] https://github.com/facebook/hhvm/commit/6b94cd47463883b0f1682cdf84b93da511e2fc15 [08:58:52] just have to drop the ': mixed;' [08:58:55] i'll submit a patch [08:59:43] 6operations, 7HHVM: luasandbox is failing with HHVM 3.6 - https://phabricator.wikimedia.org/T96661#1223694 (10ori) Looks like this was broken by https://github.com/facebook/hhvm/commit/6b94cd47463883b0f1682cdf84b93da511e2fc15 . [09:00:12] <_joe_> \o/ [09:02:21] _joe_: https://gerrit.wikimedia.org/r/#/c/205549/ (totally untested but should work) [09:02:35] now i'm really off [09:02:43] <_joe_> ori: ok thanks :) [09:02:55] <_joe_> sleep well [09:03:01] RECOVERY - puppet last run on cp4006 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [09:08:27] (03PS1) 10Filippo Giunchedi: nova: install ::mediawiki::cgroup [puppet] - 10https://gerrit.wikimedia.org/r/205553 (https://phabricator.wikimedia.org/T92712) [09:15:53] 6operations, 7Graphite: revisit what percentiles are calculated by statsite - https://phabricator.wikimedia.org/T88662#1223740 (10fgiunchedi) [09:17:30] <_joe_> ori: it works, <3 [09:33:13] 6operations, 7Graphite: revisit what percentiles are calculated by statsite - https://phabricator.wikimedia.org/T88662#1223809 (10fgiunchedi) we do have median and p95/p99, we could add p98 too but is it worth it? [09:36:41] 6operations: Upgrade xenon, cerium and praseodymium to jessie - https://phabricator.wikimedia.org/T90955#1223827 (10fgiunchedi) @gwicke @mobrovac @eevans is the test cluster in use? 
if not I'd like to start reimaging tomorrow with jessie [09:39:04] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-JobRunner, 7Graphite: jobrunner metrics audit - https://phabricator.wikimedia.org/T95913#1223836 (10fgiunchedi) a:3fgiunchedi [09:42:46] 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, 7Nodepool, and 2 others: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1223839 (10hashar) [09:57:48] (03PS1) 10Hashar: Support spaces in Gearman functions names [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/205564 [09:58:11] (03CR) 10Hashar: "Sent upstream to: https://review.openstack.org/#/c/175791/" [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/205564 (owner: 10Hashar) [10:06:28] ARHRAHAZEAZE [10:06:31] Debian [10:09:06] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: enable authenticated access to Cassandra JMX - https://phabricator.wikimedia.org/T92471#1223912 (10fgiunchedi) >>! In T92471#1220602, @GWicke wrote: > We can enforce localhost-only access from a specific group or root with iptables. @Dzahn a... [10:11:33] (03PS8) 10Hashar: Initial Debian packaging [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) [10:13:22] (03CR) 10Hashar: "Modified changelog version to use a -wmf suffix and the distribution jessie-wikimedia:" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) (owner: 10Hashar) [10:15:23] (03PS2) 10Hashar: Support spaces in Gearman functions names [debs/nodepool] (patch-queue/debian) - 10https://gerrit.wikimedia.org/r/205564 [10:23:31] (03PS1) 10Hashar: wmf2: patch to support spaces in Gearman functions [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/205571 [10:26:32] (03CR) 10Hashar: "Note: this change has some follow up to include custom patches pending review upstream. 
So it is no more directly deployed on labnodepool1" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) (owner: 10Hashar) [10:27:18] (03CR) 10Hashar: "That creates nodepool_0.0.1-104-gddd6003-wmf2_amd64.deb which I have deployed on labnodepool1001" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/205571 (owner: 10Hashar) [10:35:01] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [10:37:34] 6operations: Multiple PHP security issues - https://phabricator.wikimedia.org/T96586#1223981 (10MoritzMuehlenhoff) CVE-2015-2305, CVE-2015-2783,CVE-2015-2787,CVE-2015-3329 and CVE-2015-3330 do not affect HHVM. CVE-2015-2348 needs to be discussed with upstream [10:49:50] (03CR) 10Filippo Giunchedi: [C: 031] Change BZ references to Phabricator tickets [puppet] - 10https://gerrit.wikimedia.org/r/204626 (owner: 10Alex Monk) [10:50:27] (03CR) 10Filippo Giunchedi: "does this have a phabricator ticket? (SCNR)" [puppet] - 10https://gerrit.wikimedia.org/r/204626 (owner: 10Alex Monk) [10:52:24] (03CR) 10Filippo Giunchedi: [C: 031] contint: allow gearman/zeromq from labnodepool1001 [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [10:52:54] godog: if you feel brave, we can get it merged and applied :) [10:53:44] hashar: sorry I can't ATM, was flushing the review queue [10:54:05] godog: ok :) [10:54:33] godog: oh and on Friday I successfully switched the Zuul prod server to use the Debian package \o/ [10:54:38] thank you a ton for all the reviews! [10:54:41] nice one! 
[10:55:04] no problem, took a few iterations [11:07:13] (03CR) 10Filippo Giunchedi: "python-statsd from https://github.com/jsocol/pystatsd is in debian with packages at http://anonscm.debian.org/viewvc/python-modules/packag" [debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [11:08:21] (03CR) 10Filippo Giunchedi: "no I don't think so, there are gdash-based dashboards though https://gdash.wikimedia.org/dashboards/swift.eqiad-prod/" [puppet] - 10https://gerrit.wikimedia.org/r/170007 (owner: 10Alexandros Kosiaris) [11:10:23] PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 3 failures [11:13:00] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 3 failures [11:13:00] PROBLEM - puppet last run on mw1208 is CRITICAL Puppet has 1 failures [11:13:22] PROBLEM - puppet last run on mw1238 is CRITICAL Puppet has 3 failures [11:13:31] PROBLEM - puppet last run on mw1180 is CRITICAL Puppet has 1 failures [11:14:21] PROBLEM - puppet last run on mw1074 is CRITICAL Puppet has 1 failures [11:25:16] _joe_: akosiaris 503's [11:25:25] <_joe_> what page? [11:25:35] https://meta.wikimedia.org/wiki/Special:GlobalRenameQueue/open [11:25:40] If you report this error to the Wikimedia System Administrators, please include the details below. 
[11:25:40] 503 Service Temporarily Unavailable
[11:26:07] <_joe_> that's most probably a problem with that page getting over the memory limit or something like that
[11:26:09] works fine for me
[11:26:28] now it is intermediate
[11:27:01] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[11:27:09] thanks, i guess that is what stewards deserve :D
[11:27:12] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures
[11:27:17] https://gdash.wikimedia.org/dashboards/reqerror/ does point out a slightly elevated rate
[11:28:26] <_joe_> akosiaris: 200 5xx/min is basically noise
[11:28:31] RECOVERY - puppet last run on mw1208 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:29:01] seems back to normal now.
[11:29:52] RECOVERY - puppet last run on mw1238 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:30:02] RECOVERY - puppet last run on mw1180 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:32:30] RECOVERY - puppet last run on mw1074 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:32:51] (PS1) Filippo Giunchedi: gdash: restore median vs mean [puppet] - https://gerrit.wikimedia.org/r/205581 (https://phabricator.wikimedia.org/T88662)
[11:32:53] (PS1) Filippo Giunchedi: gdash: switch to p95 from p99 [puppet] - https://gerrit.wikimedia.org/r/205582 (https://phabricator.wikimedia.org/T88662)
[12:08:08] operations, RESTBase, RESTBase-Cassandra: Set up multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1224156 (faidon) >>! In T95253#1221187, @Eevans wrote: >> I took a look at running multiple cassandra hosts on the same host, it looks like we'd need a different i...
[12:20:51] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 9 below the confidence bounds [12:24:15] 6operations, 10Datasets-General-or-Unknown, 10Wikidata, 10Wikidata-Sprint-2015-04-21, and 2 others: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1224189 (10Tobi_WMDE_SW) [12:44:21] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 726.340917589 [12:44:50] 6operations, 10Deployment-Systems, 6Services: Automate compiling service dependencies using production Jessie libraries - https://phabricator.wikimedia.org/T94611#1224227 (10mobrovac) [12:48:20] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: switch to p95 from p99 [puppet] - 10https://gerrit.wikimedia.org/r/205582 (https://phabricator.wikimedia.org/T88662) (owner: 10Filippo Giunchedi) [12:49:32] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] gdash: restore median vs mean [puppet] - 10https://gerrit.wikimedia.org/r/205581 (https://phabricator.wikimedia.org/T88662) (owner: 10Filippo Giunchedi) [12:54:11] PROBLEM - puppet last run on mw1183 is CRITICAL Puppet has 1 failures [12:54:11] PROBLEM - puppet last run on mw1181 is CRITICAL Puppet has 4 failures [12:55:51] PROBLEM - puppet last run on mw1171 is CRITICAL Puppet has 3 failures [13:08:37] (03PS7) 10Filippo Giunchedi: graphite: introduce carbon-c-relay [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) [13:09:06] PROBLEM - puppet last run on mw1228 is CRITICAL Puppet has 1 failures [13:09:07] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [13:09:32] (03CR) 10jenkins-bot: [V: 04-1] graphite: introduce carbon-c-relay [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) (owner: 10Filippo Giunchedi) [13:10:17] RECOVERY - puppet last run on 
mw1181 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [13:10:18] RECOVERY - puppet last run on mw1183 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [13:10:48] (03CR) 10Filippo Giunchedi: graphite: introduce carbon-c-relay (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) (owner: 10Filippo Giunchedi) [13:11:08] RECOVERY - puppet last run on mw1171 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [13:11:47] PROBLEM - puppet last run on mw1206 is CRITICAL Puppet has 4 failures [13:12:20] (03PS8) 10Filippo Giunchedi: graphite: introduce carbon-c-relay [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) [13:14:24] 6operations, 7Graphite, 5Patch-For-Review: revisit what percentiles are calculated by statsite - https://phabricator.wikimedia.org/T88662#1224304 (10fgiunchedi) I've restored median and changed p99 with p95, p98 seems redudant and I'm not sure about p75 either. thoughts? 
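[annotation] The question above is which percentiles (median, p75, p95, p98, p99) are worth keeping. For readers unfamiliar with the terms, here is a minimal Python sketch using the nearest-rank definition. This is an assumption chosen for clarity and is not necessarily how statsite computes its percentiles; the function name and the sample data are hypothetical.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical timings: the integers 1..100 (say, milliseconds).
timings_ms = list(range(1, 101))
summary = {
    "median": percentile(timings_ms, 50),
    "p95": percentile(timings_ms, 95),
    "p99": percentile(timings_ms, 99),
}
```

On this evenly spread sample the three values are 50, 95 and 99. On real latency data with a long tail, p95 and p99 can differ far more than p98 and p99 do, which is one way to read the "p98 seems redundant" remark.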
[13:15:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 6 below the confidence bounds [13:17:56] PROBLEM - Varnishkafka Delivery Errors per minute on cp4006 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:18:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4007 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:20:16] PROBLEM - Varnishkafka Delivery Errors per minute on cp4014 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:22:57] RECOVERY - Varnishkafka Delivery Errors per minute on cp4006 is OK Less than 1.00% above the threshold [0.0] [13:23:20] 6operations: adjust CirrusSearch monitoring - https://phabricator.wikimedia.org/T84163#1224311 (10fgiunchedi) [13:23:37] RECOVERY - Varnishkafka Delivery Errors per minute on cp4014 is OK Less than 1.00% above the threshold [0.0] [13:23:37] (03Abandoned) 10Alexandros Kosiaris: Add unpuppetized ganglia swift views [puppet] - 10https://gerrit.wikimedia.org/r/170007 (owner: 10Alexandros Kosiaris) [13:23:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp4007 is OK Less than 1.00% above the threshold [0.0] [13:25:17] PROBLEM - puppet last run on mw2099 is CRITICAL Puppet has 5 failures [13:25:47] RECOVERY - puppet last run on mw1228 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [13:28:36] RECOVERY - puppet last run on mw1206 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [13:34:56] PROBLEM - puppet last run on mw2194 is CRITICAL Puppet has 1 failures [13:35:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp4005 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:36:35] hashar: https://phabricator.wikimedia.org/T86170 is resolved, right? 
[13:36:36] PROBLEM - Varnishkafka Delivery Errors per minute on cp4015 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:36:53] andrewbogott: yeah indeed [13:36:56] let me close it :) [13:37:26] as soon as my mail client unfreezes I will respond to that other ticket [13:37:55] andrewbogott: nodepool managed to boot its first instance [13:38:03] that’s great! [13:38:14] though it stalls because our firstboot.sh script relies on puppetVar / puppetClass which are set in LDAP [13:38:33] and are not available when spawning an instance directly via the openstack api [13:39:34] is there a ticket for that? I thought I saw one before my mail client freaked out [13:39:36] PROBLEM - puppet last run on mw2157 is CRITICAL Puppet has 1 failures [13:39:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp4013 is CRITICAL 11.11% of data above the critical threshold [20000.0] [13:40:40] andrewbogott: "Instances created by Nodepool cant run puppet due to missing certificate" https://phabricator.wikimedia.org/T96670 [13:41:08] RECOVERY - Varnishkafka Delivery Errors per minute on cp4005 is OK Less than 1.00% above the threshold [0.0] [13:41:12] which has its root cause somewhere in firstboot.sh expecting a PuppetVar from ldap :D [13:41:46] RECOVERY - Varnishkafka Delivery Errors per minute on cp4015 is OK Less than 1.00% above the threshold [0.0] [13:41:51] I found out that neither the openstack nor the ec2 API expose the project name :/ [13:42:06] RECOVERY - puppet last run on mw2099 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [13:43:06] PROBLEM - puppet last run on mw2119 is CRITICAL Puppet has 1 failures [13:43:27] (03PS2) 10Faidon Liambotis: Allow Catchpoint to request a file from icinga [puppet] - 10https://gerrit.wikimedia.org/r/205486 (https://phabricator.wikimedia.org/T95758) (owner: 10Rush) [13:43:40] (03CR) 10Faidon Liambotis: [C: 031] Allow Catchpoint to request a file from icinga [puppet] - 
10https://gerrit.wikimedia.org/r/205486 (https://phabricator.wikimedia.org/T95758) (owner: 10Rush) [13:44:57] RECOVERY - Varnishkafka Delivery Errors per minute on cp4013 is OK Less than 1.00% above the threshold [0.0] [13:47:03] hashar: I added a subtask — I’ll think about what corners I can cut in the meantime. [13:48:17] hashar: Your new instances are actually already creating dummy ldap records as a proof of concept. So that part (presumably, the hard part) is working. At the moment those records can’t be picked up by firstboot due to intentionally obfuscated domain names… but you can see them if you dig around in ldap. [13:51:03] andrewbogott: yeah seems the basic capabilities are working just fine :) [13:51:36] fully migrating to horizon is going to be fun [13:51:46] RECOVERY - puppet last run on mw2194 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [13:51:49] there are a lot of bits that rely on OpenStackManager [13:53:09] andrewbogott: I also looked at the 'disk image builder' utility. It requires full root access apparently :((( [13:53:16] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 9090.42969555 [13:53:35] hashar: the full migration is a big job, but if instances can be created/deleted then users can switch back and forth between Horizon and OSM as appropriate in the meantime. [13:53:43] hashar: it shouldn’t need root, you can build images with ‘fakeroot' [13:53:53] oh [13:54:11] hashar, btw, can you please read https://phabricator.wikimedia.org/T96678 and comment? I’d like to do that today if there are no objections — I’ll bring it up in the meeting later. [13:54:45] doing [13:56:17] hashar: are you talking about building images by hand, or having nodepool (or something) build images automatically? 
[13:56:24] later [13:56:26] RECOVERY - puppet last run on mw2157 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [13:56:40] nodepool uses https://pypi.python.org/pypi/diskimage-builder [13:56:43] you can also build images /on/ a labs instance, which can solve the ‘needs root’ issue. [13:56:51] but I must say I haven't thoroughly looked at it yet [13:57:06] dammit there are too many competing tools for image creation :( [13:57:13] (03PS1) 10Filippo Giunchedi: logging: update CirrusSearch thresholds [puppet] - 10https://gerrit.wikimedia.org/r/205603 (https://phabricator.wikimedia.org/T84163) [13:57:20] that one comes from their tripleO project [13:57:34] a way to run openstack over openstack (as I understand it) [13:58:07] RECOVERY - puppet last run on mw2119 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [13:58:26] 6operations, 5Patch-For-Review: adjust CirrusSearch monitoring - https://phabricator.wikimedia.org/T84163#1224392 (10fgiunchedi) a:3fgiunchedi [14:00:19] commented [14:00:19] andrewbogott: https://phabricator.wikimedia.org/T96678#1224401 :) [14:00:19] off to lead a meeting [14:01:20] * hashar going to start the CI weekly triage meeting in #wikimedia-office [14:02:48] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224417 (10Anomie) The `/api/{name}/` pattern seems ok, although I still find "v1" to be a strange name for restbase. 
[14:11:08] (03CR) 10Manybubbles: [C: 031] logging: update CirrusSearch thresholds [puppet] - 10https://gerrit.wikimedia.org/r/205603 (https://phabricator.wikimedia.org/T84163) (owner: 10Filippo Giunchedi) [14:12:57] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, and 3 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1224467 (10hashar) 5Resolved>3Open [14:13:09] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 5Continuous-Integration-Isolation, and 3 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#489927 (10hashar) Reopening, the package hasn't been provided for Trusty on apt.wikimedia.org. [14:15:38] !log enwiki master under unusual jobrunner load, not terminal but see https://phabricator.wikimedia.org/T96686 [14:15:39] !log legoktm Synchronized php-1.26wmf1/extensions/OAI/OAIHooks.php: better debugging for T96686 (duration: 00m 11s) [14:15:46] Logged the message, Master [14:15:49] Logged the message, Master [14:17:25] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224475 (10mobrovac) >>! In T95229#1224417, @Anomie wrote: > The `/api/{name}/` pattern seems ok, although I still find "v1" to be a strange name for restbase.... [14:20:37] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224477 (10Anomie) >>! In T95229#1224475, @mobrovac wrote: > I don't see //v1// as the //name for RESTBase//, but rather as the //version of the content API//.... 
[14:24:10] (03CR) 10Anomie: CX: Enable Content Translation in given wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204722 (https://phabricator.wikimedia.org/T95848) (owner: 10KartikMistry) [14:36:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 3 below the confidence bounds [14:44:21] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1224509 (10Eevans) >>! In T95253#1224156, @faidon wrote: >>>! In T95253#1221187, @Eevans wrote: >>> I took a look at running multiple cassandra hosts on the same hos... [14:45:06] move [14:51:26] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224521 (10BBlack) Where are we going to place non-restbase services in this URL path schema relative to restbase? I assume "page" isn't the name of restbase e... [14:53:13] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1224524 (10MoritzMuehlenhoff) I have evaluated "pass" by Jason Donenfeld (passwordstorage.org) and "pws" by Peter Palfrader and various contributors from the Debian System Administrators team.(http://code.google.com/p/pwstore/, t... [14:55:15] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224530 (10mobrovac) >>! In T95229#1224477, @Anomie wrote: > Above Gabriel claimed "v1" was the name. If it's the version then it's not following the `/api/{nam... [15:00:04] manybubbles, anomie, ^d, thcipriani, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150421T1500). Please do the needful. 
[15:00:26] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected [15:00:28] * marktraceur hasn't had coffee, should probably not do it [15:01:00] * greg-g pushes down on his french press [15:01:02] anomie: fixing [15:01:02] kart_: Hi there, looks like you're the only one, are you around? [15:01:06] Oh, good. [15:01:19] (03PS4) 10KartikMistry: CX: Enable Content Translation in given wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204722 (https://phabricator.wikimedia.org/T95848) [15:01:26] marktraceur: now :) [15:01:31] akosiaris: around? [15:01:54] 6operations, 6Services, 7Service-Architecture: Proxying new services through RESTBase - https://phabricator.wikimedia.org/T96688#1224550 (10mobrovac) 3NEW [15:03:04] akosiaris: godog Please merge https://gerrit.wikimedia.org/r/#/c/204725/ :) [15:04:37] (03PS3) 10Filippo Giunchedi: CX: Enable Content Translation in given wikis [puppet] - 10https://gerrit.wikimedia.org/r/204725 (https://phabricator.wikimedia.org/T95848) (owner: 10KartikMistry) [15:04:41] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1224566 (10mobrovac) >>! In T95229#1224521, @BBlack wrote: > Where are we going to place non-restbase services in this URL path schema relative to restbase? I... [15:04:44] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] CX: Enable Content Translation in given wikis [puppet] - 10https://gerrit.wikimedia.org/r/204725 (https://phabricator.wikimedia.org/T95848) (owner: 10KartikMistry) [15:05:07] kart_: done [15:05:10] godog: thanks! [15:05:17] marktraceur: we can go ahead. [15:05:41] marktraceur: kart_ I can swat this morning, if you don't want to [15:05:53] mobrovac: good to go ahead tomorrow with https://phabricator.wikimedia.org/T90955 ? 
[15:07:03] thcipriani: that is helpful [15:07:10] heh, kk, swatting [15:07:15] godog: probably, we'll let you know on the ticket by the end of the day today [15:07:22] thnx [15:08:33] (03CR) 10Thcipriani: [C: 032] CX: Enable Content Translation in given wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204722 (https://phabricator.wikimedia.org/T95848) (owner: 10KartikMistry) [15:08:41] (03Merged) 10jenkins-bot: CX: Enable Content Translation in given wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204722 (https://phabricator.wikimedia.org/T95848) (owner: 10KartikMistry) [15:08:53] (03PS2) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [15:09:41] mobrovac: sweet thanks [15:11:17] !log thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT [[gerrit:204722]] (duration: 00m 11s) [15:11:21] Logged the message, Master [15:11:26] ^ kart_ [15:11:29] thcipriani: Sorry, I will definitely be able to tomorrow, I'm getting back in the swing of things here [15:11:43] marktraceur: I need all the practice I can get :) [15:12:40] manybubbles: the new logstash servers...any particular place you want them (rack/row with a specific server?) [15:12:59] bd808: ^^^^^ - I don't think it matters [15:13:18] cmjohnson1: spread them out so that a switch going down doesn't kill them all again please [15:13:31] other than that no requirements that I can think of [15:14:00] thcipriani: cool. [15:14:01] ^ hah...okay will do, thx [15:14:05] thcipriani: thanks! [15:14:31] cmjohnson1: At some point I'd like to have 2 of the current ones moved too for the same reason [15:14:33] kart_: yw [15:25:56] 6operations, 10ops-eqiad: Rack and Setup (3) Logstash Servers - https://phabricator.wikimedia.org/T96692#1224681 (10Cmjohnson) 3NEW a:3Cmjohnson [15:36:58] we're still in the SWAT window right? 
[15:37:29] (03PS2) 10Anomie: Revert "Revert of Iab860b8a5: Make puppet cronjob to run SecurePoll/cli/purgePrivateVoteData.php" [puppet] - 10https://gerrit.wikimedia.org/r/184637 [15:37:56] we are [15:41:47] PROBLEM - puppet last run on cp3020 is CRITICAL puppet fail [15:45:06] !log legoktm Synchronized php-1.26wmf2/extensions/OAI/OAIHooks.php: better debugging for T96686 (duration: 00m 11s) [15:45:12] Logged the message, Master [15:53:51] (03PS3) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [15:58:10] (03Abandoned) 10Gage: initial debianization [debs/python-statsd] - 10https://gerrit.wikimedia.org/r/131449 (owner: 10Gage) [16:00:16] RECOVERY - puppet last run on cp3020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:03:17] PROBLEM - puppet last run on cp3032 is CRITICAL puppet fail [16:05:28] (03CR) 10Bartosz Dziewoński: "Upstream is not receptive (they have a point, too), so I'll poke at this." 
[puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [16:20:07] RECOVERY - puppet last run on cp3032 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:22:43] (03CR) 10Dzahn: [C: 031] "lgtm, i just wouldn't know how to actually which IP addresses it uses" [puppet] - 10https://gerrit.wikimedia.org/r/205486 (https://phabricator.wikimedia.org/T95758) (owner: 10Rush) [16:25:46] (03CR) 10Dzahn: "will it really work with "srange => $nodepool_host," when $nodepool_host is just the IP, i would think it has to be a network so end in /3" [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [16:27:32] (03CR) 1020after4: [C: 032] Beta: Add wikis for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202689 (https://phabricator.wikimedia.org/T90683) (owner: 10KartikMistry) [16:27:39] (03Merged) 10jenkins-bot: Beta: Add wikis for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202689 (https://phabricator.wikimedia.org/T90683) (owner: 10KartikMistry) [16:34:35] 10ops-fundraising, 10Fundraising-Backlog: Need Civi access for Donor Services agent - https://phabricator.wikimedia.org/T95011#1224861 (10Jgreen) great, I'll get her cert done today [16:35:03] (03PS3) 10Rush: Allow Catchpoint to request a file from icinga [puppet] - 10https://gerrit.wikimedia.org/r/205486 (https://phabricator.wikimedia.org/T95758) [16:35:20] (03CR) 10Rush: [C: 032 V: 032] Allow Catchpoint to request a file from icinga [puppet] - 10https://gerrit.wikimedia.org/r/205486 (https://phabricator.wikimedia.org/T95758) (owner: 10Rush) [16:36:27] greg-g: I'm going to backport and deploy an OAI fix that's currently hitting enwiki's master due to SULF: https://phabricator.wikimedia.org/T96686 [16:40:20] !log legoktm Synchronized php-1.26wmf2/extensions/OAI/OAIHooks.php: Don't try to update up_page=0 if page moves 
suppressed redirects (duration: 00m 11s) [16:40:26] Logged the message, Master [16:40:48] legoktm: ex post facto "kk" [16:40:58] !log legoktm Synchronized php-1.26wmf1/extensions/OAI/OAIHooks.php: Don't try to update up_page=0 if page moves suppressed redirects (duration: 00m 13s) [16:41:01] Logged the message, Master [16:43:56] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [16:44:48] (03PS1) 10Ottomata: Add impala and llama CDH pacakges to reprepro updates [puppet] - 10https://gerrit.wikimedia.org/r/205632 (https://phabricator.wikimedia.org/T96329) [16:44:56] 10ops-fundraising, 10Fundraising-Backlog: Need Civi access for Donor Services agent - https://phabricator.wikimedia.org/T95011#1224917 (10CCogdill_WMF) Thank you! [16:45:28] (03PS2) 10Ottomata: Add impala and llama CDH pacakges to reprepro updates [puppet] - 10https://gerrit.wikimedia.org/r/205632 (https://phabricator.wikimedia.org/T96329) [16:46:06] (03CR) 10Ottomata: [C: 032 V: 032] Add impala and llama CDH pacakges to reprepro updates [puppet] - 10https://gerrit.wikimedia.org/r/205632 (https://phabricator.wikimedia.org/T96329) (owner: 10Ottomata) [17:01:30] (03PS1) 10Rush: git::install do fetch tags explicitly [puppet] - 10https://gerrit.wikimedia.org/r/205634 [17:05:17] (03PS3) 10Bartosz Dziewoński: Preserve order of 'maniphest.statuses' in Phabricator settings [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) [17:08:37] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1224990 (10Dzahn) Oh! So this was a setting in the shop admin ui, right? I did not make any changes there, i only focused on the redirect on our servers.... 
[17:09:21] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1224993 (10vshchepakina) Everything is ok! The store is up and running! [17:09:50] (03CR) 10Bartosz Dziewoński: "I have no idea if this has even a chance of working; I don't know anything about Puppet. Also, on a scale from mildly to very alarming, th" [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [17:10:56] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1224998 (10Dzahn) 5Resolved>3declined Great! Happy we can call it resolved then in time. [17:12:22] 10ops-fundraising, 10Fundraising-Backlog: Need Civi access for Donor Services agent - https://phabricator.wikimedia.org/T95011#1225000 (10CCogdill_WMF) @k4-713 Kristie has her cert set up! Can you reset her Civi password and confirm her username? Thank you! [17:17:21] legoktm: are you keeping an eye on https://tendril.wikimedia.org/host/view/db1072.eqiad.wmnet/3306 ? [17:17:37] randomcat: no, should I be? [17:17:41] (03CR) 10Alexandros Kosiaris: [C: 031] contint: allow gearman/zeromq from labnodepool1001 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [17:18:00] legoktm: maybe, not sure how much of that is T96686 [17:18:00] randomcat: I'm not really sure what those graphs mean... 
[17:18:17] hmm, I already deployed the fix for that [17:19:48] (03CR) 10Rush: [C: 032 V: 032] git::install do fetch tags explicitly [puppet] - 10https://gerrit.wikimedia.org/r/205634 (owner: 10Rush) [17:19:51] 10ops-fundraising, 10Fundraising-Backlog: Need Civi access for Donor Services agent - https://phabricator.wikimedia.org/T95011#1225005 (10Jgreen) a:5Jgreen>3K4-713 [17:19:57] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1225006 (10jeremyb) 5declined>3Resolved This is not declined. :) [17:23:37] (03PS9) 10Ori.livneh: graphite: introduce carbon-c-relay [puppet] - 10https://gerrit.wikimedia.org/r/181080 (https://phabricator.wikimedia.org/T85908) (owner: 10Filippo Giunchedi) [17:26:00] (03PS4) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [17:26:39] (03PS5) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [17:29:33] (03PS2) 10Aude: Add subscriptionLookupMode setting for wikidata + testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204728 [17:31:10] 6operations: Upgrade xenon, cerium and praseodymium to jessie - https://phabricator.wikimedia.org/T90955#1225021 (10GWicke) @fgiunchedi, go ahead! 
[17:31:45] (03PS1) 10Alex Monk: Lift account creation throttle for Santiago Wikipedia editing workshop in a few hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205640 (https://phabricator.wikimedia.org/T96696) [17:32:17] !log aaron Synchronized php-1.26wmf2/includes/GlobalFunctions.php: b5b054e2f5b53e30d5aca21d046aa0ac33d5c407 (duration: 00m 12s) [17:32:25] Logged the message, Master [17:34:52] randomcat: umm a bunch of " 25 [10000ms] at runtime/ext_mysql: slow query: SELECT MASTER_POS_WAIT('db1052-bin.001859', 102804609, 10)" just showed up on fatalmonitor and the job queue stalled for like 2 minutes? [17:41:29] greg-g: would it be okay if i deploy https://gerrit.wikimedia.org/r/204728 before the train? [17:41:56] * aude forgot about it for swat [17:45:59] (03PS6) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [17:46:09] * aude proceeds... [17:47:26] (03CR) 10Aude: [C: 032] Add subscriptionLookupMode setting for wikidata + testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204728 (owner: 10Aude) [17:47:33] (03Merged) 10jenkins-bot: Add subscriptionLookupMode setting for wikidata + testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/204728 (owner: 10Aude) [17:47:45] (03PS7) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [17:48:37] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. 
[17:48:57] !log aude Synchronized wmf-config/Wikibase.php: Add subscriptionLookupMode setting for wikidata (duration: 00m 13s) [17:49:01] Logged the message, Master [17:54:43] legoktm: https://tendril.wikimedia.org/report/slow_queries?host=^db1052&user=wikiuser&schema=wik&qmode=eq&query=&hours=1 [17:54:49] ugh, I need to tweak Renameuser [17:55:35] odd, it's suppose to respect $wgUpdateRowsPerJob for jobs [17:56:20] (03PS8) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [17:56:59] randomcat: the jobs are moving quickly for a few minutes and then stall for a minute or two and repeat... [17:57:25] I wonder if those renames are slow due to scanning, row changes, or lock contention on gaps [17:57:44] legoktm: which jobs? [17:57:57] not all of them http://ganglia.wikimedia.org/latest/?c=Jobrunners%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2 [17:58:34] randomcat: LocalRenameUserJobs [17:58:43] it makes sense for runners on renameuser to close out if queries are taking 5 seconds [17:59:09] they will timeout on the slave lag check and return 'slave-lag-limit' in the json to the coordinator [18:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150421T1800). Please do the needful. [18:05:20] (03PS1) 1020after4: Group1 wikis to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205641 [18:09:01] (03PS9) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [18:10:11] 6operations, 6Security, 10Wikimedia-Shop, 7HTTPS, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1225091 (10Dzahn) That was a mistake, i did not mean to change it to declined. Thanks for fixing it. 
[18:10:16] (03CR) 10MZMcBride: "It looks like there are Windows line endings (carriage returns) in PS3's modules/phabricator/lib/ordered_hash.rb. I'm not sure if that was" [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [18:12:45] (03CR) 1020after4: "Evan Priestley wrote a fairly thorough response detailing how phacility deals with deploying configuration for their hosted phabricator in" [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [18:13:33] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1225095 (10mobrovac) After a heated IRC discussion, there is consensus around the fact that there should be a 1:1 mapping between `name` in `/api/{name}` and ba... [18:20:09] (03CR) 1020after4: [C: 04-1] Preserve order of 'maniphest.statuses' in Phabricator settings [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [18:20:29] (03PS10) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [18:20:56] (03CR) 1020after4: "I will submit a change for this which avoids the need for custom stuff in puppet." [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [18:22:26] (03CR) 10Bartosz Dziewoński: "(I don't mean to "enforce" high priority on this, it waited for half a year, it can wait more. 
Whenever you can work on it :) )" [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [18:22:33] (03CR) 1020after4: [C: 032] Group1 wikis to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205641 (owner: 1020after4) [18:22:39] (03Merged) 10jenkins-bot: Group1 wikis to 1.26wmf2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205641 (owner: 1020after4) [18:22:48] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1225111 (10GWicke) >>! In T95229#1225095, @mobrovac wrote: > After a heated IRC discussion, there is consensus around the fact that there should be a 1:1 mappin... [18:24:40] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.26wmf2 [18:24:51] Logged the message, Master [18:34:25] (03PS11) 10Ottomata: Puppetize impala [puppet/cdh] - 10https://gerrit.wikimedia.org/r/205446 (https://phabricator.wikimedia.org/T96329) [18:43:10] (03PS1) 10Aude: Update dispatchChanges cronjob to use new script location [puppet] - 10https://gerrit.wikimedia.org/r/205644 [18:53:56] PROBLEM - puppet last run on mw2065 is CRITICAL puppet fail [19:05:28] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1225155 (10TheDJ) >Also, RESTbase already sends liberal CORS headers with its responses anyway. Another reason to do this, not yet mentioned as far as I see, i... [19:11:57] RECOVERY - puppet last run on mw2065 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [19:32:28] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 21.43% of data above the critical threshold [500.0] [19:36:57] PROBLEM - puppet last run on cp3019 is CRITICAL puppet fail [19:44:09] jgage: you there? 
[19:48:57] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [19:54:57] RECOVERY - puppet last run on cp3019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:03:46] aude: sorry, was away, are you ok now? [20:04:25] greg-g: yes, although about to send a mail [20:04:35] kk [20:04:36] https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=154892&oldid=154817 for next tuesday [20:04:44] to enable more usage tracking etc. [20:08:26] greg-g: sent [20:09:47] 6operations, 10Parsoid, 6Services: Let's consider upgrading our Node.js installs to io.js (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1225282 (10Krinkle) @gwicke Experimental or non-default flags like `--harmony` are not practical requirements for stable release of our... [20:27:07] 6operations, 10Wikimedia-Apache-configuration: wikibooks.org redirects to en.wikibooks.org - https://phabricator.wikimedia.org/T87039#1225317 (10Dzahn) maybe there was a second change related to wikibooks domains [20:28:12] (03PS1) 10coren: Labs: Add a script to reboot idmap instances [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) [20:29:35] andrewbogott_afk: ^^ when you get a minute. [20:32:11] (03CR) 10Dzahn: [C: 032] contint: allow gearman/zeromq from labnodepool1001 [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [20:32:36] 6operations, 7Monitoring: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1225325 (10faidon) [20:33:55] (03CR) 10Dzahn: "doing this now. 
we can always come back to the IP vs @resolve question" [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [20:33:55] (03CR) 10Dzahn: "Could not find data item contint::nodepool_host in any Hiera data file :/" [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [20:33:55] !log aaron Synchronized php-1.26wmf2/includes/jobqueue/JobRunner.php: 2f3b7594650162b04f55e63e8df251d3913ab7ca (duration: 00m 11s) [20:33:55] (03CR) 10Hashar: "Yup sounds better for now to mess up with @resolve() As Daniel said we can update later on." [puppet] - 10https://gerrit.wikimedia.org/r/204876 (https://phabricator.wikimedia.org/T96426) (owner: 10Hashar) [20:33:55] hashar: it cant find the labnodepool host :p [20:33:55] mutante: yeah it is hidden [20:33:57] mutante: anyway that change is for gallium :) [20:34:03] yea, it fails on gallium [20:34:10] :=( [20:34:18] * hashar shakes fists at puppet [20:34:19] Could not find data item contint::nodepool_host in any Hiera data file [20:34:22] Logged the message, Master [20:34:58] somehow it doesnt find it in ./common/ in hiera [20:35:28] eh, what was the advantage again over having that IP in the role class :) [20:35:52] it is defined in hieradata/common/contint.yaml [20:36:01] not sure how to hiera() works though [20:37:11] mutante: maybe in the yaml file we should remove the prefix contint:: [20:37:38] PROBLEM - puppet last run on gallium is CRITICAL puppet fail [20:38:30] hashar: looking at other ones in there, yes, that sounds right [20:38:35] i can make a patch [20:39:49] 6operations, 10Parsoid, 6Services: Offer io.js on Jessie - https://phabricator.wikimedia.org/T91855#1225349 (10GWicke) [20:41:44] (03PS1) 10Dzahn: contint: fix hiera variable name for nodepool host [puppet] - 10https://gerrit.wikimedia.org/r/205703 (https://phabricator.wikimedia.org/T96426) [20:42:34] (03CR) 10Dzahn: [C: 032] contint: fix hiera variable 
name for nodepool host [puppet] - 10https://gerrit.wikimedia.org/r/205703 (https://phabricator.wikimedia.org/T96426) (owner: 10Dzahn) [20:43:37] hashar: yep, that works [20:45:16] mutante: !!! [20:45:57] RECOVERY - puppet last run on gallium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:46:42] mutante can confirm nodepool manage to join gallium. Thank you ! [20:46:59] hashar: confirming ferm rules have been created. updating ticket , cool [20:47:28] mutante: you can close it https://phabricator.wikimedia.org/T96426 :) [20:47:38] mutante: thank you to have suggested the use of hiera [20:47:45] 6operations, 5Continuous-Integration-Isolation, 5Patch-For-Review: Allow gearman and zeromq connections from labnodepool1001 to gallium.wikimedia.org - https://phabricator.wikimedia.org/T96426#1225359 (10Dzahn) Notice: /Stage[main]/Contint::Firewall/Ferm::Service[jenkins_zeromq_from_nodepool]/File[/etc/ferm/... [20:48:02] hashar: yw! ..and done :) [20:49:08] 6operations, 5Continuous-Integration-Isolation: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1225361 (10Dzahn) [20:49:11] 6operations, 5Continuous-Integration-Isolation, 5Patch-For-Review: Allow gearman and zeromq connections from labnodepool1001 to gallium.wikimedia.org - https://phabricator.wikimedia.org/T96426#1225360 (10Dzahn) 5Open>3Resolved [20:49:47] (03CR) 10Andrew Bogott: "shouldn't this cleanup testfile before rebooting?" [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:50:18] (03CR) 10coren: "It does. That what the 'trap "rm..." 0' does." [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:52:23] hashar: should we do the other one as well? [20:52:34] the one that moves zuul_merger_hosts to hiera [20:52:47] (03CR) 10Andrew Bogott: "so trap blahblah 0 means 'blahblah on exit' approximately?" 
[puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:53:16] !log aaron Synchronized php-1.26wmf1/includes/GlobalFunctions.php: bceb4de391bd8a321921a8587988cb1be7b71556 (duration: 00m 11s) [20:53:23] Logged the message, Master [20:53:24] (03CR) 10coren: "Yep, regardless of how the shell ends up exiting." [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:53:39] !log aaron Synchronized php-1.26wmf1/includes/jobqueue/JobRunner.php: 4285f1921585ee87034e9739b1353fbad35f3a29 (duration: 00m 11s) [20:53:44] Logged the message, Master [20:53:48] (03CR) 10Andrew Bogott: [C: 031] "ok then" [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:54:22] (03CR) 10coren: [C: 032] Labs: Add a script to reboot idmap instances [puppet] - 10https://gerrit.wikimedia.org/r/205702 (https://phabricator.wikimedia.org/T95556) (owner: 10coren) [20:54:40] 6operations, 10Parsoid, 6Services: Offer io.js on Jessie - https://phabricator.wikimedia.org/T91855#1225381 (10cscott) Note that coffeescript packages are typically not handled "properly" by npm, either -- they depend on `require('coffeescript')` being included first, which then "teaches" npm how to load `.c... 
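Note on the `trap "rm..." 0` exchange in the review of change 205702 above: condition 0 is the shell's EXIT condition, so the handler runs when the shell exits, whether it finishes normally or bails out with an error. The following is a minimal sketch of that pattern only. The function name, the temporary file and the exit code are hypothetical and are not taken from the script under review.

```shell
#!/bin/sh
# Hypothetical sketch; the names and the exit code are illustrative,
# not taken from the script under review.
run_with_cleanup() {
    (
        testfile=$(mktemp) || exit 1
        # Condition 0 is EXIT: the handler runs whenever this subshell exits.
        trap 'rm -f "$testfile"' 0
        printf '%s\n' "$testfile"
        # Simulate a failure partway through; the handler still runs.
        exit 3
    )
}

# Capture the file name the subshell printed, and its exit status.
if leftover=$(run_with_cleanup); then
    status=0
else
    status=$?
fi
echo "exit status: $status, file still present: $([ -e "$leftover" ] && echo yes || echo no)"
```

The handler cannot run if the shell is killed with SIGKILL, so "regardless of how the shell exits" holds only for exits the shell itself gets to process.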
[20:55:15] mutante: havent looked at it today though [20:56:14] mutante should drop the contint:: prefix in yaml files :/ [20:56:17] hashar: well, you already cherry-picked it and confirmed it working, so that sounds good [20:56:42] oh, heh, but why did it work:) [20:57:01] https://gerrit.wikimedia.org/r/#/c/201882/9/hieradata/labs/integration/common.yaml,unified that is the one from labs [20:57:28] so in realms 'labs' , project 'integration' that loads common.yaml [20:57:35] and define contint::zuul_merger_hosts [20:57:44] but on prod [20:57:45] https://gerrit.wikimedia.org/r/#/c/201882/9/hieradata/common/contint.yaml,unified [20:57:47] ugh, it's another thing that is different in labs? [20:58:00] the contint.yaml file content is namespaced with contint:: [20:58:15] (03CR) 1020after4: "I'm working on it now. shouldn't be that hard to fix and it's been bothering me for a while." [puppet] - 10https://gerrit.wikimedia.org/r/205338 (https://phabricator.wikimedia.org/T548) (owner: 10Bartosz Dziewoński) [20:58:33] (03CR) 10Hashar: contint: move zuul_merger_hosts to hiera, use in ferm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [20:59:27] (03PS10) 10Hashar: contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [20:59:39] (03CR) 10jenkins-bot: [V: 04-1] contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [20:59:43] mutante: bah [21:00:04] rmoen, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150421T2100). 
[21:00:06] hashar: i know :) needs manual rebase :p [21:00:10] i was doing the same thing [21:00:15] anyway [21:00:24] that one has the potential to disrupt CI [21:00:38] so I would like to have some time ahead before it lands [21:00:42] ok [21:00:43] and it is a bit late right now :/ [21:00:49] I am just being paranoid :) [21:00:49] agreed. not merging [21:00:54] that's a good thing [21:01:55] (03PS11) 10Hashar: contint: move zuul_merger_hosts to hiera, use in ferm [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [21:01:57] at least one patch got merged :) [21:02:02] and the other is in a better shape! [21:02:15] yep, all good [21:04:40] (03PS1) 10Rush: phab: apply phabricator-roots group [puppet] - 10https://gerrit.wikimedia.org/r/205718 [21:05:08] (03CR) 10Rush: [C: 032 V: 032] phab: apply phabricator-roots group [puppet] - 10https://gerrit.wikimedia.org/r/205718 (owner: 10Rush) [21:05:46] (03CR) 10Hashar: "PS11 should be good for production. Since this change can potentially disrupt the CI infra, it is probably better to merge it during Euro" [puppet] - 10https://gerrit.wikimedia.org/r/201882 (https://phabricator.wikimedia.org/T87519) (owner: 10Dzahn) [21:05:54] mutante: Danke Schon! [21:07:35] hashar: de rien [21:07:42] (03CR) 10Dzahn: [C: 031] Revert "Revert of Iab860b8a5: Make puppet cronjob to run SecurePoll/cli/purgePrivateVoteData.php" [puppet] - 10https://gerrit.wikimedia.org/r/184637 (owner: 10Anomie) [21:13:50] (03CR) 10Dzahn: [C: 032] ocg: Fix ferm rules [puppet] - 10https://gerrit.wikimedia.org/r/204772 (owner: 10Tim Landscheidt) [21:16:19] (03CR) 10Dzahn: "thanks. confirmed on ocg1001." [puppet] - 10https://gerrit.wikimedia.org/r/204772 (owner: 10Tim Landscheidt) [21:16:33] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1225413 (10Tgr) >>! 
In T95229#1225155, @TheDJ wrote: > I know @tgr or @gilles gathered some statistics on the level of 'brokenness' of CORS throughout the user... [21:22:52] (03PS1) 10Rush: phab: use 'service' to manage upgrade processes [puppet] - 10https://gerrit.wikimedia.org/r/205722 [21:26:42] (03PS1) 10Rush: phab stage tags for upgrade [puppet] - 10https://gerrit.wikimedia.org/r/205723 [21:32:21] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium - https://phabricator.wikimedia.org/T94093#1225443 (10GWicke) [21:36:01] (03CR) 10Negative24: [C: 031] "Looks sane. All depends on whats in git." [puppet] - 10https://gerrit.wikimedia.org/r/205723 (owner: 10Rush) [21:38:43] (03PS2) 10Rush: phab: use 'service' to manage upgrade processes [puppet] - 10https://gerrit.wikimedia.org/r/205722 [21:38:56] (03CR) 10Rush: [C: 032 V: 032] phab: use 'service' to manage upgrade processes [puppet] - 10https://gerrit.wikimedia.org/r/205722 (owner: 10Rush) [21:45:58] (03PS1) 10Andrew Bogott: Don't clean up unused image types. [puppet] - 10https://gerrit.wikimedia.org/r/205731 [21:47:09] (03PS2) 10Ori.livneh: coal: update frontend code [puppet] - 10https://gerrit.wikimedia.org/r/205732 [21:47:11] (03CR) 10Ori.livneh: [C: 032 V: 032] coal: update frontend code [puppet] - 10https://gerrit.wikimedia.org/r/205732 (owner: 10Ori.livneh) [21:47:27] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1225502 (10GWicke) To add to what @Eevans said, the primary concern here is not about anything too special about Cassandra. It's more about the generic complexity of... [21:47:58] (03CR) 10Andrew Bogott: [C: 032] Don't clean up unused image types. 
[puppet] - 10https://gerrit.wikimedia.org/r/205731 (owner: 10Andrew Bogott) [22:17:03] (03PS1) 10Krinkle: performance: Implement hash permalinks for tabs [puppet] - 10https://gerrit.wikimedia.org/r/205766 [22:18:07] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium - https://phabricator.wikimedia.org/T94093#1225814 (10RobH) @mobrovac, Thanks! I'll sync up with @dzahn and @arielglenn and see who will be implementing the puppetization of the group. (We'll have to check out the work Daniel a... [22:19:04] (03PS2) 1020after4: phab stage tags for upgrade [puppet] - 10https://gerrit.wikimedia.org/r/205723 (owner: 10Rush) [22:19:53] (03CR) 10Ori.livneh: [C: 032] "Thanks very much!" [puppet] - 10https://gerrit.wikimedia.org/r/205766 (owner: 10Krinkle) [22:21:24] 6operations, 5Interdatacenter-IPsec: Kernel panics on Jessie (3.16.0-4-amd64) during IPsec load test - https://phabricator.wikimedia.org/T94820#1225815 (10Gage) 5Open>3Resolved a:3Gage This seems to be fixed in linux-image-3.19.0-trunk-amd64 version 3.19.3-1~exp1, currently in Debian/Experimental. * 3.... 
[22:24:39] !log disabled a bunch of old rt queues from allowing ticket creation, tired of spam [22:24:50] Logged the message, Master [22:31:00] jouncebot, next [22:31:00] In 0 hour(s) and 28 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150421T2300) [22:34:00] (03PS1) 10Yuvipanda: labs: Move inlcude of labs_lvm to the Volume define [puppet] - 10https://gerrit.wikimedia.org/r/205770 [22:34:09] (03PS2) 10Yuvipanda: labs: Move inlcude of labs_lvm to the Volume define [puppet] - 10https://gerrit.wikimedia.org/r/205770 [22:34:53] (03CR) 10Yuvipanda: [C: 032 V: 032] labs: Move inlcude of labs_lvm to the Volume define [puppet] - 10https://gerrit.wikimedia.org/r/205770 (owner: 10Yuvipanda) [22:39:48] (03PS2) 10QChris: Add alerts for missing hours in pagecounts_all_sites and pagecounts_raw [puppet] - 10https://gerrit.wikimedia.org/r/205067 [22:44:08] (03PS1) 10Ori.livneh: coal: allow 'hour' resolution to be reselected after being deselected [puppet] - 10https://gerrit.wikimedia.org/r/205775 [22:44:17] (03CR) 10QChris: Add alerts for missing hours in pagecounts_all_sites and pagecounts_raw (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/205067 (owner: 10QChris) [22:45:08] (03PS2) 10Ori.livneh: coal: allow 'hour' resolution to be reselected after being deselected [puppet] - 10https://gerrit.wikimedia.org/r/205775 [22:45:21] (03PS3) 10Ori.livneh: coal: allow 'hour' resolution to be reselected after being deselected [puppet] - 10https://gerrit.wikimedia.org/r/205775 [22:45:31] (03CR) 10Ori.livneh: [C: 032 V: 032] coal: allow 'hour' resolution to be reselected after being deselected [puppet] - 10https://gerrit.wikimedia.org/r/205775 (owner: 10Ori.livneh) [22:47:09] 10Ops-Access-Requests, 6operations, 10Analytics: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1225949 (10Shouston_WMF) 5Open>3Resolved [22:51:43] 6operations, 10ops-eqiad: db1060 raid degraded - 
https://phabricator.wikimedia.org/T96471#1225976 (10RobH) a:3Cmjohnson [22:53:22] I'll swat today [22:53:29] seeing as I have a couple of patches [22:54:09] oooooh [22:54:11] I just realized [22:54:19] that we are swatting bug(fixes) [22:54:21] (03PS1) 10Jforrester: Enable a test of the VisualEditor A/B testing framework [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205778 [22:54:26] and it’s not SWAT as in SWATCats [22:54:38] :) [22:55:07] i thought it was because deployments are akin to tear gas [22:55:53] (at least when they go wrong) [22:56:14] :D [22:56:25] Friendly SpaOMGTEARGAS [22:56:50] James_F, ... sigh. enwiki only? [22:57:06] Krenair: For the test of the test? Yes. [22:57:38] (03CR) 10Jforrester: [C: 04-1] "Not until Analytics says OK." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205778 (owner: 10Jforrester) [22:58:23] YuviPanda: also "Setting Wikis Ablaze Team" backronym is pretty awesome :) [22:58:31] :D [22:59:07] * YuviPanda takes SWAT to arbcom [22:59:07] oh wait [22:59:34] haha [22:59:56] YuviPanda: http://www.roflcat.com/images/cats/Swat_Team.jpg [23:00:04] RoanKattouw, ^d, Krenair, Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150421T2300). [23:00:08] mutante: :D [23:00:10] so i a Gentoo guy chats me up on the train [23:00:10] ok [23:00:20] Krenair: not too soon, I think. [23:00:24] :) [23:00:42] mutante: but did he build himself from scratch to be the most optimal person to have that conversation before having that conversation? [23:01:09] YuviPanda: lol. he claimed "but after you compile it once you never have to do it again" [23:01:28] “But what if Intel releases a new microcode package!!!1" [23:01:43] wait... what? [23:01:56] krenair@tin:/srv/mediawiki-staging/php-1.26wmf2$ git rebase origin/wmf/1.26wmf2 [23:01:56] First, rewinding head to replay your work on top of it... 
[23:01:56] Applying: Add namespace aliases for Luri (lrc) [23:02:02] why is that there? [23:04:24] whatever [23:04:28] James_F [23:04:31] !log krenair Synchronized php-1.26wmf2/extensions/VisualEditor: https://gerrit.wikimedia.org/r/205774 - should effectively be a no-op until config (duration: 00m 12s) [23:04:37] Logged the message, Master [23:04:37] Thanks. [23:04:38] Krenair: lrc is not in langlist in DNS yet [23:04:48] missing ? [23:05:22] lrc is Northern Luri, apparently there is also Southern and Bakhtiari https://en.wikipedia.org/wiki/Luri_language [23:05:41] (03CR) 10Negative24: [C: 031] phab stage tags for upgrade [puppet] - 10https://gerrit.wikimedia.org/r/205723 (owner: 10Rush) [23:05:46] (03PS2) 10Alex Monk: Lift account creation throttle for Santiago Wikipedia editing workshop in a few hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205640 (https://phabricator.wikimedia.org/T96696) [23:05:56] (03CR) 10Alex Monk: [C: 032] Lift account creation throttle for Santiago Wikipedia editing workshop in a few hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205640 (https://phabricator.wikimedia.org/T96696) (owner: 10Alex Monk) [23:06:03] (03Merged) 10jenkins-bot: Lift account creation throttle for Santiago Wikipedia editing workshop in a few hours [mediawiki-config] - 10https://gerrit.wikimedia.org/r/205640 (https://phabricator.wikimedia.org/T96696) (owner: 10Alex Monk) [23:06:08] mutante, right, but I'm wondering why this was applied to the cluster [23:07:18] !log krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/205640/ (duration: 00m 13s) [23:07:23] Logged the message, Master [23:09:36] Krenair: hrmm.. no idea, should technically show up on deployment calendar? it's this https://gerrit.wikimedia.org/r/#/c/203648/ [23:10:13] Krenair: tried searching for gerrit:203648 in wikitech search to find it on Deployment calendar history .. but it takes me straight to gerrit [23:11:36] Hmm. 
[23:13:37] woah, wtf [23:14:03] committer Alex Monk 1429657264 +0000 [23:14:38] I don't remember doing that... Maybe I put in the wrong commit somewhere? I had touched that change in gerrit earlier [23:14:53] It's not the deployed version [23:15:14] Krenair: well, you said "Re-applying Nikerabbit's +2" [23:15:22] and that merged it i guess [23:15:24] right, and that would've merged it to master [23:15:45] but not to our deployment branch and it wouldn't give that @tin.eqiad.wmnet commit host [23:16:33] mutante, I'm not sure what you're asking about on the change by the way... [23:16:47] I don't think we host any wikis defaulting to that language. [23:19:47] Krenair: yes, we can't have any because "lrc" is not in DNS, so the question is if it should be added [23:20:20] If the committee were to approve of WMF creating it, sure [23:21:12] checks. https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Lurish [23:22:03] So basically no [23:23:08] i guess. it was not very obvious. there are lots of support votes [23:23:18] 6operations, 10RESTBase, 10VisualEditor, 7Performance: Set up an API base path for REST and action APIs - https://phabricator.wikimedia.org/T95229#1226371 (10GWicke) [23:23:19] but no decision [23:25:02] "10 milions speakers in iran, oman, kuwait, iraq and other location in the world" sounds good to me, they have support votes and it waits since 2013 .. hmm.. shrug [23:29:39] mutante, it's really not relevant to the commit [23:30:28] Anyone know why SSH freezes and becomes useless if I leave it open in the background for too long? [23:32:19] Krenair http://superuser.com/questions/98562/way-to-avoid-ssh-connection-timeout-freezing-of-terminal-tab [23:32:22] Krenair: idle timeout [23:32:36] increase ServerAliveInterval [23:33:15] thanks [23:36:17] Krenair: yw! 
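Note on the SSH idle-timeout advice above: a minimal client-side sketch, assuming OpenSSH. `ServerAliveInterval` defaults to 0 (no keepalive probes), so "increasing" it means setting a nonzero number of seconds. The host pattern and the values below are illustrative assumptions, not settings used in this channel.

```
# ~/.ssh/config (hypothetical values)
Host *
    # Send a keepalive probe through the encrypted channel after 60 s of silence.
    ServerAliveInterval 60
    # Disconnect after 3 unanswered probes instead of hanging indefinitely.
    ServerAliveCountMax 3
```

The probes keep idle sessions from being dropped by NAT or firewall state timeouts, and the count limit makes a dead connection close rather than freeze the terminal.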
[23:38:16] 7Blocked-on-Operations, 10Ops-Access-Requests, 6operations: Access to francium for gwicke,mobrovac,eevans (htmldumps-admins) - https://phabricator.wikimedia.org/T94093#1226388 (10Dzahn) [23:39:50] (03PS1) 10Dzahn: admin: create html dumps admin group [puppet] - 10https://gerrit.wikimedia.org/r/205786 (https://phabricator.wikimedia.org/T94093) [23:56:28] springle: is there a maria 10 upgrade task in phab? [23:56:35] (03CR) 10Dzahn: [C: 032] "adds empty group" [puppet] - 10https://gerrit.wikimedia.org/r/205786 (https://phabricator.wikimedia.org/T94093) (owner: 10Dzahn)