[00:00:05] (CR) Dzahn: [C: 2] "yea, unfortunately it did not work but it did not log to the "log.out" file or syslog either" [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:00:11] cajoel: i don't know
[00:00:43] mutante: I'll open a phab
[00:00:48] maybe 2
[00:00:53] one for me, one for public
[00:01:12] !log ori Synchronized php-1.26wmf7/extensions/RSS/RSSParser.php: Ice44740fb: Don't rely on strip marker uniqueness (T10104) (duration: 00m 13s)
[00:01:15] cajoel: alright
[00:01:16] Logged the message, Master
[00:01:54] operations: Login for jkrauska to librenms - https://phabricator.wikimedia.org/T101064#1327667 (JKrauska)
[00:02:37] !log ori Synchronized php-1.26wmf8/extensions/RSS/RSSParser.php: Ice44740fb: Don't rely on strip marker uniqueness (T10104) (duration: 00m 14s)
[00:02:41] Logged the message, Master
[00:02:48] (CR) Paladox: "I think it is to do with it requires" [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:02:53] operations: Public Access to librenms - https://phabricator.wikimedia.org/T101067#1327681 (JKrauska) NEW
[00:03:09] (CR) Dzahn: [V: 2] "yea, unfortunately it did not work but it did not log to the "log.out" file or syslog either." [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:03:27] (PS1) Paladox: Revert "Revert "Add link in gitblit for phabricator"" [puppet] - https://gerrit.wikimedia.org/r/215247
[00:03:49] ori, did you forget to push https://gerrit.wikimedia.org/r/215234 ?
[00:04:04] operations, Wikimedia-Apache-configuration: Redirect for Wikimedia v NSA - https://phabricator.wikimedia.org/T97341#1327702 (Heather) Right now it would redirect to the blog post. http://blog.wikimedia.org/2015/03/10/wikimedia-v-nsa/
[00:04:11] AaronSchulz: I didn't forget, I just haven't done it. Do you want me to?
[00:04:55] AaronSchulz: I'll deploy it
[00:07:04] !log Updated jobrunner for I1d351d8d1: Made periodictasks stats calls more useful
[00:07:08] Logged the message, Master
[00:07:11] AaronSchulz: done, restarted, etc.
[00:07:19] (CR) Paladox: "Would the above work. Since it is looking for comit link" [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:07:31] ok
[00:07:57] (CR) Paladox: "Or trackingid" [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:09:40] (CR) Paladox: "It is looking for global and trackingid sound like a name that would suggestion that other software can look for." [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:11:11] (PS2) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:11:21] (PS3) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:11:45] (PS4) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:12:14] Nemo_bis, https://phabricator.wikimedia.org/T101060
[00:15:55] (PS16) BBlack: sslcert: Deploy new x509-bundle script + test output [puppet] - https://gerrit.wikimedia.org/r/197341 (owner: Faidon Liambotis)
[00:15:57] (PS1) BBlack: sslcert: switch all install_certificate to x509-bundle [puppet] - https://gerrit.wikimedia.org/r/215249
[00:16:50] (PS5) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:17:49] (CR) Paladox: "@Dzahn please review this since I changed it. Because it should use trackingid." [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox)
[00:19:55] (PS17) BBlack: sslcert: Deploy new x509-bundle script + test output [puppet] - https://gerrit.wikimedia.org/r/197341 (owner: Faidon Liambotis)
[00:19:57] (PS2) BBlack: sslcert: switch all install_certificate to x509-bundle [puppet] - https://gerrit.wikimedia.org/r/215249
[00:21:03] (CR) Paladox: Add link in gitblit for phabricator (2 comments) [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox)
[00:21:45] (CR) BBlack: [C: 2] sslcert: Deploy new x509-bundle script + test output [puppet] - https://gerrit.wikimedia.org/r/197341 (owner: Faidon Liambotis)
[00:22:51] (PS6) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:23:07] (PS7) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[00:23:51] (CR) Paladox: "I might have fixed problem that occurred with this patch https://gerrit.wikimedia.org/r/#/c/215247" [puppet] - https://gerrit.wikimedia.org/r/215238 (owner: Dzahn)
[00:36:03] PROBLEM - puppet last run on rcs1002 is CRITICAL Puppet has 1 failures
[00:36:33] PROBLEM - puppet last run on rcs1001 is CRITICAL Puppet has 1 failures
[00:40:53] ^ related to above: (CR) BBlack: [C: 2] sslcert: Deploy new x509-bundle script + test output [puppet] - https://gerrit.wikimedia.org/r/197341
[00:41:06] (nothing to worry about, it's failing on generating some test output, which was the point of testing it)
[00:54:50] (CR) Kaldari: [C: 1] Disable WikiGrok in all production wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/215109 (https://phabricator.wikimedia.org/T101016) (owner: Bmansurov)
[01:19:21] ori, looks like tmh1001/1002 were not restarted
[01:20:50] (PS1) BBlack: add DigiCertSHA2HighAssuranceServerCA Intermediate cert, used by stream.wm.o cert [puppet] - https://gerrit.wikimedia.org/r/215266
[01:22:11] (PS2) BBlack: add DigiCertSHA2HighAssuranceServerCA Intermediate cert, used by stream.wm.o cert [puppet] - https://gerrit.wikimedia.org/r/215266
[01:24:30] (PS3) BBlack: add DigiCertSHA2HighAssuranceServerCA Intermediate cert, used by stream.wm.o cert [puppet] - https://gerrit.wikimedia.org/r/215266
[01:24:52] (CR) BBlack: [C: 2 V: 2] add DigiCertSHA2HighAssuranceServerCA Intermediate cert, used by stream.wm.o cert [puppet] - https://gerrit.wikimedia.org/r/215266 (owner: BBlack)
[01:27:56] (PS1) BBlack: bugfix for 44c3d7201 [puppet] - https://gerrit.wikimedia.org/r/215268
[01:28:13] (CR) BBlack: [C: 2 V: 2] bugfix for 44c3d7201 [puppet] - https://gerrit.wikimedia.org/r/215268 (owner: BBlack)
[01:30:03] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[01:30:13] RECOVERY - puppet last run on rcs1001 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[01:30:42] PROBLEM - puppet last run on virt1004 is CRITICAL Puppet has 1 failures
[01:31:03] PROBLEM - puppet last run on dataset1001 is CRITICAL Puppet has 1 failures
[01:31:03] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 1 failures
[01:31:04] PROBLEM - puppet last run on virt1001 is CRITICAL Puppet has 1 failures
[01:31:16] I screwed up the digicert commit, there will be a few intermediate puppetfails there from before the fix :/
[01:31:23] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 1 failures
[01:31:23] PROBLEM - puppet last run on cp3006 is CRITICAL Puppet has 1 failures
[01:31:32] PROBLEM - puppet last run on cp1071 is CRITICAL Puppet has 1 failures
[01:31:42] PROBLEM - puppet last run on cp1058 is CRITICAL Puppet has 1 failures
[01:31:43] PROBLEM - puppet last run on plutonium is CRITICAL Puppet has 1 failures
[01:31:46] nothing serious, just annoying spam
[01:31:52] PROBLEM - puppet last run on antimony is CRITICAL Puppet has 1 failures
[01:31:53] PROBLEM - puppet last run on cp4001 is CRITICAL Puppet has 1 failures
[01:31:54] PROBLEM - puppet last run on cp3004 is CRITICAL Puppet has 1 failures
[01:31:54] PROBLEM - puppet last run on cp3041 is CRITICAL Puppet has 1 failures
[01:32:14] PROBLEM - puppet last run on virt1003 is CRITICAL Puppet has 1 failures
[01:32:33] PROBLEM - puppet last run on cp1050 is CRITICAL Puppet has 1 failures
[01:32:33] PROBLEM - puppet last run on silver is CRITICAL Puppet has 1 failures
[01:32:42] PROBLEM - puppet last run on cp1046 is CRITICAL Puppet has 1 failures
[01:32:44] PROBLEM - puppet last run on cp4018 is CRITICAL Puppet has 1 failures
[01:32:53] PROBLEM - puppet last run on cp3005 is CRITICAL Puppet has 1 failures
[01:32:54] PROBLEM - puppet last run on nembus is CRITICAL Puppet has 1 failures
[01:33:02] PROBLEM - puppet last run on cp4019 is CRITICAL Puppet has 1 failures
[01:33:02] PROBLEM - puppet last run on cp4005 is CRITICAL Puppet has 1 failures
[01:33:03] PROBLEM - puppet last run on cp3003 is CRITICAL Puppet has 1 failures
[01:33:12] RECOVERY - puppet last run on rcs1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:33:52] PROBLEM - puppet last run on cp1063 is CRITICAL Puppet has 1 failures
[01:46:44] RECOVERY - puppet last run on cp1058 is OK Puppet is currently enabled, last run 1 second ago with 0 failures
[01:46:53] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures
[01:47:02] RECOVERY - puppet last run on antimony is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[01:47:23] RECOVERY - puppet last run on virt1003 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[01:47:52] RECOVERY - puppet last run on dataset1001 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures
[01:47:53] RECOVERY - puppet last run on virt1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:48:02] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:48:04] RECOVERY - puppet last run on nembus is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures
[01:48:12] RECOVERY - puppet last run on cp4019 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[01:48:13] RECOVERY - puppet last run on cp3006 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures
[01:48:13] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:48:13] RECOVERY - puppet last run on cp1071 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:48:43] RECOVERY - puppet last run on cp4001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:48:43] RECOVERY - puppet last run on cp3004 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[01:48:52] RECOVERY - puppet last run on cp3041 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures
[01:49:03] (CR) Tim Landscheidt: [C: 1] Add cnwikimedia to the list of wikis on labs [puppet] - https://gerrit.wikimedia.org/r/214995 (owner: Jcrespo)
[01:49:12] RECOVERY - puppet last run on virt1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:22] RECOVERY - puppet last run on cp1050 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:23] RECOVERY - puppet last run on silver is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:23] (PS1) Legoktm: Remove references to $wgEchoCohortInterval [mediawiki-config] - https://gerrit.wikimedia.org/r/215270 (https://phabricator.wikimedia.org/T101047)
[01:49:32] RECOVERY - puppet last run on cp1046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:43] RECOVERY - puppet last run on cp4018 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures
[01:49:44] RECOVERY - puppet last run on cp3005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:49:53] RECOVERY - puppet last run on cp4005 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures
[01:49:53] RECOVERY - puppet last run on cp3003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:50:04] RECOVERY - puppet last run on plutonium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:50:42] RECOVERY - puppet last run on cp1063 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:50:56] bblack: How is digicert_sha2_ca used in stream.wm.o?
[01:52:41] (CR) Alex Monk: "Now that's merged, this should be OK to do after the 11th of June, when 1.26wmf9 is scheduled to go to wikipedias." [puppet] - https://gerrit.wikimedia.org/r/139581 (owner: Withoutaname)
[02:04:07] Krinkle: I don't really know. stream.wm.o doesn't actually have any working SSL as far as I can tell
[02:04:18] it has a listener on port 443 which apparently serves plaintext HTTP?
[02:04:25] Hm..
[02:04:31] bblack: I'm just curious what led you to that commit.
[02:04:36] I don't see any references to it from stream's config.
[02:04:39] but in any case, it had a certificate for stream.wm.o deployed onto the server, and that cert needed that intermediate to generated a correct chained file
[02:04:52] interesting
[02:04:57] there's an install_certificate, and an nginx config for :443, in puppet
[02:05:48] https://github.com/wikimedia/operations-puppet/blob/production/modules/rcstream/manifests/proxy/ssl.pp
[02:06:13] ^ that stuff, which AFAICS doesn't end up configuring a functional SSL server anyways, just an HTTP listener than happens to be on port 443
[02:06:24] but at least the cert chain file is deployed correctly with the new script now :)
[02:06:58] !log krinkle Synchronized php-1.26wmf7/resources/src/mediawiki/mediawiki.js: backport rl-fix I717b86573 (duration: 00m 14s)
[02:07:05] Logged the message, Master
[02:23:35] !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 06m 26s)
[02:23:52] Logged the message, Master
[02:23:59] (PS3) BBlack: sslcert: switch all install_certificate to x509-bundle [puppet] - https://gerrit.wikimedia.org/r/215249
[02:26:36] (CR) BBlack: [C: 2] "I've manually verified this in every case where install_certificate installs certs in our fleet, host by host and cert by cert. In every " [puppet] - https://gerrit.wikimedia.org/r/215249 (owner: BBlack)
[02:27:24] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (100068s 100000s)
[02:28:06] ^ not me!
[02:28:45] !log LocalisationUpdate completed (1.26wmf7) at 2015-06-02 02:27:42+00:00
[02:28:49] Logged the message, Master
[02:33:00] operations, Traffic, HTTPS: review/rebase/merge the final sslcert patch... - https://phabricator.wikimedia.org/T97316#1327995 (BBlack) Open>Resolved This was broken up and then merged/fixed over 4 commits: https://gerrit.wikimedia.org/r/197341 https://gerrit.wikimedia.org/r/215266 https://gerrit...
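Editor's sketch, not part of the channel log: the exchange above mentions that the stream.wm.o certificate needed its intermediate to produce a correct chained file. A minimal illustration of that idea, assuming PEM text inputs, is to concatenate the leaf certificate first and the intermediates after it. This is not the x509-bundle script from the patches above; `split_pem` and `build_chain` are hypothetical names chosen for this example, and no signature or issuer validation is attempted.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem(text):
    """Return every PEM certificate block found in text, in order."""
    return PEM_CERT.findall(text)


def build_chain(leaf_pem, intermediate_pems):
    """Concatenate the leaf certificate and its intermediates, leaf first.

    Hypothetical helper: it only orders and joins PEM blocks. It does not
    check that each intermediate actually signed the certificate before it.
    """
    leaf = split_pem(leaf_pem)
    if len(leaf) != 1:
        raise ValueError("expected exactly one leaf certificate")
    blocks = list(leaf)
    for pem in intermediate_pems:
        found = split_pem(pem)
        if not found:
            raise ValueError("intermediate input contains no certificate")
        blocks.extend(found)
    return "\n".join(blocks) + "\n"
```

Servers that read a single chained file generally expect the server certificate before the intermediates, which is why the sketch enforces leaf-first ordering and drops any text outside the PEM markers.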
[02:44:55] !log l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 05m 45s)
[02:45:01] Logged the message, Master
[02:49:26] !log LocalisationUpdate completed (1.26wmf8) at 2015-06-02 02:48:23+00:00
[02:49:31] Logged the message, Master
[02:53:02] PROBLEM - MySQL Processlist on db1040 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 67 statistics
[02:54:42] RECOVERY - MySQL Processlist on db1040 is OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[03:11:33] (Abandoned) Mattflaschen: Add php5-xdebug to deployment-bastion for command-line debugging [puppet] - https://gerrit.wikimedia.org/r/215214 (owner: Mattflaschen)
[03:25:13] AaronSchulz: i'll do those now
[03:28:01] ori, I restarted them already
[03:28:24] ok
[03:29:34] operations, Analytics-Cluster, Fundraising Sprint Kraftwerk, Fundraising Sprint Lou Reed, Fundraising Tech Backlog: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1328015 (Ottomata) In those cases, there are more requests in kafkatee than in udp2log...
[03:30:24] amazingly the wikidata Site key, which has been dominating memcache network utilization since 2013, is back
[03:31:18] AaronSchulz: http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Redis+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[03:32:31] yeah I saw cpu/io reduction
[03:33:03] push/pop rates are the same, and I tested a delayed job, which was fine too
[03:33:21] well done
[03:33:58] https://twitter.com/soulislove/status/605086929042972672
[03:34:33] * AaronSchulz keeps forgetting to add acks to https://gdash.wikimedia.org/dashboards/jobq/
[03:34:48] AaronSchulz: also: https://gerrit.wikimedia.org/r/#/c/215213/
[03:35:06] kind of crazy that it's not set by default
[03:35:17] i updated https://github.com/facebook/hhvm/wiki/INI-Settings so the next person to come along doesn't have to dig as much
[03:35:19] what is the max key count?
[03:35:47] i don't think there is one
[03:36:19] does it lock apc while doing that?
[03:36:46] good question, let's see
[03:36:57] as a way of controlling growth you can set a maxttl btw
[03:37:07] which basically forces every key to expire eventually
[03:38:15] the relevant code is here https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/concurrent-shared-store.cpp#L182-214
[03:53:56] AaronSchulz: https://gerrit.wikimedia.org/r/#/c/215278/
[05:08:04] operations, Phabricator, database: Missing data in Phab reporting dump - https://phabricator.wikimedia.org/T101038#1328050 (JAufrecht) Open>Invalid
[05:08:31] operations, Phabricator, database: Missing data in Phab reporting dump - https://phabricator.wikimedia.org/T101038#1327263 (JAufrecht) Dug deeper, was able to answer all of these questions with the data available and/or identify problems in the script.
[05:26:16] operations, Engineering-Community, ECT-May-2015: date/budget proposal for 2015 Ops Offsite - https://phabricator.wikimedia.org/T89023#1328078 (Rfarrand) @mark any thoughts about this, or do you want to wait to move forward until the 2015/2016 budget has been revealed? I will wait to add this task to a s...
[05:41:00] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 2 05:39:57 UTC 2015 (duration 39m 56s)
[05:41:04] Logged the message, Master
[05:57:35] (PS1) KartikMistry: CX: Add wikis for deployment on 20150406 [mediawiki-config] - https://gerrit.wikimedia.org/r/215281 (https://phabricator.wikimedia.org/T100622)
[06:05:43] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (19603 100000s)
[06:07:14] (CR) Liuxinyu970226: CX: Add wikis for deployment on 20150406 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/215281 (https://phabricator.wikimedia.org/T100622) (owner: KartikMistry)
[06:08:51] (CR) KartikMistry: "And, it should be fix ASAP!" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/215281 (https://phabricator.wikimedia.org/T100622) (owner: KartikMistry)
[06:11:31] (PS1) KartikMistry: CX: Add wikis for deployment on 20150406 [puppet] - https://gerrit.wikimedia.org/r/215282 (https://phabricator.wikimedia.org/T100622)
[06:13:01] (PS1) KartikMistry: CX: Fix typo fiu_vro -> fiu_vrowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/215283
[06:13:34] who can fix and deploy, https://gerrit.wikimedia.org/r/215283 quickly?
[06:13:39] typo in config :/
[06:30:23] PROBLEM - puppet last run on cp4008 is CRITICAL Puppet has 2 failures
[06:30:32] PROBLEM - puppet last run on db2059 is CRITICAL Puppet has 1 failures
[06:30:33] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 1 failures
[06:30:34] PROBLEM - puppet last run on cp3040 is CRITICAL Puppet has 1 failures
[06:31:23] PROBLEM - puppet last run on cp1061 is CRITICAL Puppet has 1 failures
[06:32:02] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 2 failures
[06:32:03] PROBLEM - puppet last run on cp3042 is CRITICAL Puppet has 1 failures
[06:32:13] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[06:33:03] PROBLEM - puppet last run on db2065 is CRITICAL Puppet has 1 failures
[06:33:22] PROBLEM - puppet last run on subra is CRITICAL Puppet has 1 failures
[06:34:53] PROBLEM - puppet last run on mw1175 is CRITICAL Puppet has 1 failures
[06:35:02] PROBLEM - puppet last run on mw1217 is CRITICAL Puppet has 1 failures
[06:35:13] PROBLEM - puppet last run on mw2118 is CRITICAL Puppet has 1 failures
[06:35:13] PROBLEM - puppet last run on mw2114 is CRITICAL Puppet has 1 failures
[06:35:13] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures
[06:35:13] PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 2 failures
[06:35:13] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 1 failures
[06:35:22] PROBLEM - puppet last run on mw1226 is CRITICAL Puppet has 1 failures
[06:35:43] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 2 failures
[06:35:43] PROBLEM - puppet last run on mw1060 is CRITICAL Puppet has 1 failures
[06:36:12] PROBLEM - puppet last run on mw1100 is CRITICAL Puppet has 1 failures
[06:36:32] PROBLEM - puppet last run on mw2163 is CRITICAL Puppet has 1 failures
[06:36:32] PROBLEM - puppet last run on mw2059 is CRITICAL Puppet has 1 failures
[06:36:53] PROBLEM - puppet last run on mw2113 is CRITICAL Puppet has 1 failures
[06:43:25] (CR) Nikerabbit: [C: 1] CX: Fix typo fiu_vro -> fiu_vrowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/215283 (owner: KartikMistry)
[06:43:46] Nikerabbit: can you deploy it?
[06:45:22] RECOVERY - puppet last run on mw2118 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:45:24] RECOVERY - puppet last run on mw1226 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1060 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:46:23] RECOVERY - puppet last run on mw1100 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on cp1061 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:46:42] RECOVERY - puppet last run on mw2163 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:46:52] RECOVERY - puppet last run on mw1217 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:52] RECOVERY - puppet last run on subra is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:47:03] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:47:03] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:47:03] RECOVERY - puppet last run on mw2023 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:47:13] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:47:13] RECOVERY - puppet last run on cp4008 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:47:13] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:47:23] RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:23] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:47:33] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:33] RECOVERY - puppet last run on cp3040 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:33] RECOVERY - puppet last run on mw1123 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:48:13] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:22] RECOVERY - puppet last run on mw2059 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:48:23] RECOVERY - puppet last run on mw1175 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:43] RECOVERY - puppet last run on mw2114 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:43] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:49:33] kart_: deploy or merge?
[06:55:36] Nikerabbit: both
[07:21:54] (PS2) GWicke: Add basic alerts on RESTBase error rates and storage latencies [puppet] - https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514)
[07:22:39] (CR) GWicke: Add basic alerts on RESTBase error rates and storage latencies (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: GWicke)
[07:24:32] (CR) GWicke: "@Filippo, thanks! Added that stanza. I'm assuming the corresponding contact group is defined in puppet-private? IIRC there is already one " [puppet] - https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: GWicke)
[07:33:02] (CR) Nikerabbit: [C: 2] CX: Fix typo fiu_vro -> fiu_vrowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/215283 (owner: KartikMistry)
[07:33:08] (Merged) jenkins-bot: CX: Fix typo fiu_vro -> fiu_vrowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/215283 (owner: KartikMistry)
[07:33:46] operations: Remove default cipers in OpenSSL - https://phabricator.wikimedia.org/T101082#1328307 (MoritzMuehlenhoff) NEW a:MoritzMuehlenhoff
[07:36:16] !log nikerabbit Synchronized wmf-config/InitialiseSettings.php: Fixed wiki id for fiu_vro for CX beta feature (duration: 00m 13s)
[07:36:21] Logged the message, Master
[07:39:11] Thanks Nikerabbit
[07:46:55] operations, discovery-system, services-tooling: [RFC] Define the on-disk and live structure of etcd pool data - https://phabricator.wikimedia.org/T100793#1328332 (GWicke) I'm approaching this discussion more from the perspective of a discovery API consumer. As a consumer, I'm more concerned about ease-o...
[08:15:39] operations, Deployment-Systems: Unhashable type: dict error when running salt --batch-size - https://phabricator.wikimedia.org/T99776#1328353 (ArielGlenn) This fix is deployed on virt1000 and on the production salt master as of last week. Please watch over the next few days; if I have no more reports of...
[08:18:52] operations, RESTBase-Cassandra: configure less aggressive cassandra log rotation - https://phabricator.wikimedia.org/T100970#1328355 (GWicke) Yes, I think we should look into sending cassandra logs to logstash. That way it should be easier to plot trends, set up alerts etc.
[08:19:10] operations, RESTBase-Cassandra: configure less aggressive cassandra log rotation / send cassandra logs to logstash - https://phabricator.wikimedia.org/T100970#1328356 (GWicke)
[08:26:27] operations, RESTBase-Cassandra: configure less aggressive cassandra log rotation / send cassandra logs to logstash - https://phabricator.wikimedia.org/T100970#1328368 (mobrovac) Since Cassandra uses the /log4j/ back-end for logging, this should be straightforward to configure using Logstash' [log4j receiv...
[08:32:32] PROBLEM - puppet last run on mw2200 is CRITICAL puppet fail
[08:35:55] (PS8) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247
[08:43:36] (CR) Jcrespo: [C: 1] Tools: Add database alias for wikimania2016wiki [puppet] - https://gerrit.wikimedia.org/r/214718 (https://phabricator.wikimedia.org/T96638) (owner: Tim Landscheidt)
[08:51:03] RECOVERY - puppet last run on mw2200 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:54:48] operations, Engineering-Community, ECT-May-2015: date/budget proposal for 2015 Ops Offsite - https://phabricator.wikimedia.org/T89023#1328390 (mark) @RFarrand: I do want to get moving on this soon, but it would be good to have our budget clarified indeed.
[08:56:30] operations, Labs, Labs-Infrastructure, Labs-Sprint-100: Make a block-level copy of the codfw mirror of labstore1001 to eqiad - https://phabricator.wikimedia.org/T101010#1328391 (mark) (Sparse) block level copying of the thin volumes started between the systems: ``` pv -eprab /dev/mapper/store-now_...
[09:10:14] operations, Datasets-General-or-Unknown: snaphot1004 running dumps very slowly, investigate - https://phabricator.wikimedia.org/T98585#1328416 (ArielGlenn) New June runs are underway for all wikis, stubs first. After a couple of days when these are all done I'll do tables next for all wikis, then the res...
[09:10:23] PROBLEM - DPKG on db1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:13:40] operations, database: Enabling automatic buffer pool dumping on start/stop (puppet) for all servers - https://phabricator.wikimedia.org/T101009#1328418 (jcrespo) Good things: * Dump takes 1 second (it only writes the LRU list, not the actual pages) even for a 200GB buffer pool * Load is asynchronous by de...
[09:30:12] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 609
[09:35:12] RECOVERY - check_mysql on db1008 is OK: Uptime: 4049375 Threads: 1 Questions: 13872971 Slow queries: 26895 Opens: 64734 Flush tables: 2 Open tables: 64 Queries per second avg: 3.425 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:53:53] RECOVERY - DPKG on db1011 is OK: All packages OK
[09:55:53] PROBLEM - DPKG on db1038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:12] PROBLEM - DPKG on db1040 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:13] PROBLEM - DPKG on db1024 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:22] PROBLEM - DPKG on db1041 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:23] PROBLEM - DPKG on db1047 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:24] PROBLEM - DPKG on db1048 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:31] ^strange
[09:56:33] PROBLEM - DPKG on db1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:34] PROBLEM - DPKG on db1022 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:43] PROBLEM - DPKG on db1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:43] PROBLEM - DPKG on db1043 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:56:53] PROBLEM - DPKG on db1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:57:12] PROBLEM - DPKG on db1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:57:23] PROBLEM - DPKG on db1042 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:59:24] all those are still in 5.5
[10:00:48] jynus: that's alright, I'm currently removing unused fuse packages and when installing/removing packages icinga sometimes flags these alerts
[10:00:58] ah!
[10:01:09] good to know
[10:12:06] operations, database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1328470 (jcrespo) NEW a:jcrespo
[10:16:05] operations, database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1328478 (jcrespo) While a baseline of "5.6/10 or higher" is needed for some of the features we want/need, and normalization is great for maintenance, I have some reserves against 100% of exact same OS an...
[10:17:02] RECOVERY - DPKG on db1006 is OK: All packages OK
[10:17:11] moritzm, maybe part of your field? if you want to provide some feedback, subscribe ^
[10:17:42] RECOVERY - DPKG on db1038 is OK: All packages OK
[10:18:02] RECOVERY - DPKG on db1040 is OK: All packages OK
[10:18:02] RECOVERY - DPKG on db1024 is OK: All packages OK
[10:18:03] RECOVERY - DPKG on db1041 is OK: All packages OK
[10:18:13] RECOVERY - DPKG on db1047 is OK: All packages OK
[10:18:13] RECOVERY - DPKG on db1048 is OK: All packages OK
[10:18:13] RECOVERY - DPKG on db1002 is OK: All packages OK
[10:18:23] RECOVERY - DPKG on db1022 is OK: All packages OK
[10:18:23] RECOVERY - DPKG on db1005 is OK: All packages OK
[10:19:13] RECOVERY - DPKG on db1042 is OK: All packages OK
[10:19:59] operations: Public Access to librenms - https://phabricator.wikimedia.org/T101067#1328482 (faidon) I don't think we can, no. First off, I don't trust it all that much as a software, but more importantly, dilvuging all those bits of information would be too revealing and hurt our negotiating position with vend...
[10:20:14] RECOVERY - DPKG on db1043 is OK: All packages OK
[10:20:52] RECOVERY - DPKG on db1034 is OK: All packages OK
[10:23:00] jynus: sure, let me followup on the phab task
[10:32:24] (CR) Phuedx: [C: 2] Disable WikiGrok in all production wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/215109 (https://phabricator.wikimedia.org/T101016) (owner: Bmansurov)
[10:32:31] (Merged) jenkins-bot: Disable WikiGrok in all production wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/215109 (https://phabricator.wikimedia.org/T101016) (owner: Bmansurov)
[10:33:37] operations, database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1328502 (MoritzMuehlenhoff) At least for the OS all machines should be updated consistently to trusty, which brings many important improvements to the low-level OS components.
[10:35:43] moritzm, thanks!
[10:36:20] I will discuss MySQL with sean, I am having some reserves against a potential future 10.1 [10:48:03] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/). [10:50:28] 6operations, 10Traffic: Spammer using //bits.wikimedia.org/geoiplookup - https://phabricator.wikimedia.org/T100902#1328516 (10faidon) I never liked geoiplookup.wm.org for precisely this reason: it can be easily abused. I'd love to see it go away. With the GeoIP cookie stuff + IPv6/libmaxminddb it's my understa... [10:52:13] 6operations: graphite2001 bios config issue - https://phabricator.wikimedia.org/T100959#1328517 (10faidon) [10:53:23] 6operations: graphite2001 bios config issue - https://phabricator.wikimedia.org/T100959#1324524 (10faidon) That's not the firmware, that's GRUB printing that. Let's do some basic debugging before we send it off to Papaul: do you see a) a grub prompt b) kernel output c) a getty on the serial port? [11:03:13] PROBLEM - puppet last run on mw1070 is CRITICAL Puppet has 1 failures [11:07:14] 6operations: Remove default ciphers in OpenSSL - https://phabricator.wikimedia.org/T101082#1328524 (10Aklapper) [11:10:03] 6operations, 7database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1328528 (10faidon) (trusty at minimum, ideally jessie!) 
[11:18:13] RECOVERY - puppet last run on mw1070 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:53:01] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1328599 (10mmodell) [11:53:32] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1328600 (10mmodell) 5Open>3stalled [12:08:21] (03CR) 1020after4: [C: 032] Lint JSON files [tools/scap] - 10https://gerrit.wikimedia.org/r/214288 (https://phabricator.wikimedia.org/T100600) (owner: 10Legoktm) [12:08:40] (03Merged) 10jenkins-bot: Lint JSON files [tools/scap] - 10https://gerrit.wikimedia.org/r/214288 (https://phabricator.wikimedia.org/T100600) (owner: 10Legoktm) [12:18:51] !log installed linux-tools-3.19.8-1 for jessie-wikimedia on carbon [12:18:58] Logged the message, Master [12:20:54] akosiaris, hi, has max spoken with you? [12:25:14] 6operations, 10Citoid, 6Services: Separate citoid service for beta that runs off master instead of deploy - https://phabricator.wikimedia.org/T92304#1328641 (10mobrovac) >>! In T92304#1284821, @Mvolz wrote: > @JDForrester-WMF, should we change beta to run off citoid.wmflabs.org instead of citoid.wikimedia.or... [12:26:18] 6operations, 10Citoid, 6Services: Separate citoid service for beta that runs off master instead of deploy - https://phabricator.wikimedia.org/T92304#1328643 (10mobrovac) [12:31:51] !log merged https://gerrit.wikimedia.org/r/#/c/214288/ and deployed scap [12:31:55] Logged the message, Master [12:32:59] 6operations: Backport and include linux-tools-3.19 to our jessie repository - https://phabricator.wikimedia.org/T100216#1328656 (10MoritzMuehlenhoff) I've created/forward-ported a linux-tools-3.19 package based on the last 3.18.5-1~exp1 upload. That was a real pain to build... 
I've run various tests with perf an... [12:35:29] (03PS1) 10Andrew Bogott: No longer set use_dnsmasq for new instances. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215317 [12:36:18] andrewbogott: thanks for the help last night, jessie does work for me, for now [12:36:35] that’s great! Is your current quota going to be adequate? [12:36:53] not going to change from one gigantic [12:37:12] ok, sounds good [12:37:17] should be enough in the current scale [12:37:29] if that changes, i'll let you know [12:38:26] (03CR) 10Alex Monk: [C: 031] No longer set use_dnsmasq for new instances. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215317 (owner: 10Andrew Bogott) [12:38:44] (03CR) 10Yuvipanda: [C: 031] No longer set use_dnsmasq for new instances. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215317 (owner: 10Andrew Bogott) [12:39:39] _joe_: the nudge is back [12:40:10] can you please enlighten me why the plan for image/video scalers is ==> trusty and not ==> jessie ? [12:50:57] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1328728 (10BBlack) So, varnish supports passing WS through, but only in "pipe" mode: https://www.varnish-cache.org/docs/3.0/tutorial/... [12:56:53] (03PS9) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [13:04:45] _joe_: I just applied https://gerrit.wikimedia.org/r/#/c/202924/ on an instance with role::puppet::self and it seems to work fine. Do you have any other reservations about that patch? Or is there another test case I should try first? 
[13:18:57] YuviPanda: when you’re done stress-testing, I’d appreciate a review of this: https://gerrit.wikimedia.org/r/#/c/202924/ [13:21:59] andrewbogott: it's a bank holiday in Italy [13:28:06] (03PS1) 10Jcrespo: Enable buffer pool load at start and dump at stop [puppet] - 10https://gerrit.wikimedia.org/r/215320 (https://phabricator.wikimedia.org/T101009) [13:30:12] Hm.. did we change upload.wikimedia.org caching? It seems none of the responses are being given a proper 304 response anymore. And there is also no longer an Expires header, so the client has do to a roundtrip each time. [13:30:29] The server then responds sometimes with 304 but a 304 with content, whereas it's supposed to not have content I think. [13:30:37] It has both ETag and Last-Modified too [13:32:42] PROBLEM - puppet last run on cp4005 is CRITICAL puppet fail [13:36:34] <_joe_> Krinkle: ask bblack, I don't think we ever removed an Expires: header, nor set it at the edge [13:37:02] <_joe_> matanya: how long do you want to wait before we migrate the image/videoscalers? a few months or virtually forever? [13:37:18] <_joe_> matanya: migrating to jessie requires a substantial amount of work [13:37:42] _joe_: well, aren't we already waiting for ever ? :P [13:37:48] <_joe_> andrewbogott: go on then, I had issues with external clients to a self-hosted puppetmaster [13:38:20] <_joe_> matanya: very funny. I guess I'll just release that ticket then, so that others more capable and less busy than me can go on with jessie [13:38:35] <_joe_> also, I'm off today, I'm just answering out of courtesy [13:39:09] _joe_: oh, I didn’t test that. I will make sure that that works before merging. Thanks. 
[13:39:25] <_joe_> andrewbogott: but I figured that should not be a problem in case you already migrated the main puppetmaster [13:39:32] <_joe_> but yeah, test it [13:42:32] 6operations, 6Commons, 6Multimedia, 7HHVM, and 4 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1328917 (10Joe) Since the latest test I did after ori found we needed that file was unsuccessful, and I have zero time to work on this at the moment, I'll release this ti... [13:42:42] 6operations, 6Commons, 6Multimedia, 7HHVM, and 4 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1328918 (10Joe) a:5Joe>3None [13:43:00] _joe_: sorry, no offiense [13:43:10] *offense, didn't mean to hurt [13:43:27] I truly apologize [13:44:03] your answer makes sense and i am sorry i am troubling you on your day off, thanks for replying. [13:44:04] <_joe_> matanya: oh you didn't [13:44:58] <_joe_> matanya: I'm sure someone else will grab that ticket, if no one does, I may take a look in the next quarter, maybe [13:45:18] (03PS1) 10Jcrespo: Depooling es1010 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215323 [13:45:52] _joe_: I am sorry again. [13:46:36] 6operations, 7HHVM: Switch HAT appservers to trusty's ICU - https://phabricator.wikimedia.org/T86096#1328937 (10Joe) Since I have no time to work on the imagescalers/videoscalers and I don't think I'll be able to work on this for the forseeable future, I'll release the ticket in hope someone else will have tim... 
[13:46:47] 6operations, 7HHVM: Switch HAT appservers to trusty's ICU - https://phabricator.wikimedia.org/T86096#1328938 (10Joe) a:5Joe>3None [13:49:20] <_joe_> matanya: really, no hard feelings, you just made me realize that maybe I'm being a blocker (working on this maybe 2 hours a week) [13:49:32] RECOVERY - puppet last run on cp4005 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [13:49:50] thank you _joe_ [13:50:03] Krinkle: 13:30 < Krinkle> Hm.. did we change upload.wikimedia.org caching? It seems none of the responses are being given a proper 304 response anymore. And there is also no longer an Expires header, so the client has do to a roundtrip each time. [13:50:20] are you sure there's a change from a previously-observed behavior? [13:51:34] (as opposed to an observation of something that may have been faulty forever) [13:52:18] there was a change to the upload caches relatively-recently, which I wouldn't expect to have these effects, but it's possible I fail to understand something about the varnish<->swift interaction's effects. [13:53:46] (that change went live on May 22, a week and 4 days ago or so? [13:53:48] ) [13:54:21] (03PS1) 10Muehlenhoff: Update to 3.19.8-ckt1 [debs/linux] - 10https://gerrit.wikimedia.org/r/215324 [13:54:26] https://gerrit.wikimedia.org/r/#/c/212788/1/modules/role/manifests/cache/upload.pp [13:57:47] moritzm: so the .8-ckt1 is just the final .8 kernel.org + those sec fixes? [13:58:06] oh no, nevermind. I see it has its own changelog too [14:01:26] bblack: beside the three security fixes it also has fixes for the ext4 data corruption bugs that made the news a while ago [14:02:42] yeah [14:02:55] that only triggered on 4.0+ I thought? 
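(Editor's aside on the upload.wikimedia.org question above, where Krinkle reports 304 responses that appear to carry content while both ETag and Last-Modified are sent. The sketch below is a minimal illustration of how a server might evaluate a conditional GET when both validators exist. It is not taken from the Varnish or Swift configuration discussed in the channel. The function name `conditional_status` and the plain-dict header representation are assumptions made for illustration. The only behavior relied on is standard HTTP: If-None-Match is evaluated before If-Modified-Since, and a 304 carries no body.)

```python
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    # Weak comparison: a leading W/ is ignored when matching entity tags.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers, etag, last_modified):
    """Return (status_code, send_body) for a GET on an existing resource.

    request_headers is a hypothetical plain dict with canonical header names;
    etag and last_modified are the current validators of the resource.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it decides the outcome and
        # If-Modified-Since is ignored.
        candidates = [t.strip() for t in if_none_match.split(",")]
        current = _strip_weak(etag)
        if "*" in candidates or current in {_strip_weak(t) for t in candidates}:
            return 304, False  # a 304 must not include a message body
        return 200, True

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if parsedate_to_datetime(last_modified) <= since:
                return 304, False
        except (TypeError, ValueError):
            # An unparseable or incomparable date is ignored: send the full response.
            pass
    return 200, True
```

(With both validators available, a client that sends a matching If-None-Match gets a bodiless 304 regardless of the date header, which is why sending both is redundant rather than harmful.)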
[14:04:23] at least one of them affected jessie's 3.16 kernel, let me double-check (but the circumstances to hit are very narrow) [14:04:32] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to contint-admins for Jan Zerebecki - https://phabricator.wikimedia.org/T98961#1328986 (10hashar) @JanZerebecki confirmed sudo works for him. Thanks! [14:08:40] the discard bug was introduced in 4.0, but backported to various stable kernels, our kernels are fine, it never made it to trusty/precise/jessie kernels and the fix for 3.19 was only in git, but not built [14:08:55] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to contint-admins for Jan Zerebecki - https://phabricator.wikimedia.org/T98961#1328991 (10Abraham) sorry guys I've been late - yes I approve this request :-) @dzahn thank you! [14:09:40] ah ok, cool [14:10:17] the extents bug dates back way longer (the fix was merged in 3.10.78), so precise/trusty/3.19 are affected, but it's hard to hit (after all it wasn't noted/fixed for several years) [14:23:15] bblack: The 304 responses being weird is probably just a bug in my debug tools. I see it works properly via cURL [14:23:26] bblack: However the ETag/Last-Modified being sent both is new afaik. [14:24:34] Krinkle: do you know when it changed? [14:25:07] (also, it is a problem that both are sent?) [14:32:36] 6operations, 10Traffic, 7HTTPS, 5HTTPS-by-default: Switch to ECDSA hybrid certificates - https://phabricator.wikimedia.org/T86654#1329572 (10konklone) > Another alternative would be to try to finish the work of the original patch author and get it accepted upstream, with all of the stapling stuff sorted ou... [14:39:16] bblack: It's not a problem, it's just silly. [14:39:36] Seems a waste to generate and maintain both. [14:39:43] highly uncommon. [14:42:05] 6operations, 6Release-Engineering: Try out hack (3None Don't want to lick this cookie too hard, haven't had time to play with it. 
We'll have to change some settings in HHVM (forgot which one) and scap & co will... [14:44:03] !testwiki [14:44:07] (03PS1) 10Aklapper: Improve static-bugzilla frontpage (mention Phabricator etc.) [puppet] - 10https://gerrit.wikimedia.org/r/215328 [14:45:52] !testwiki is mw1017 [14:45:53] Key was added [14:49:12] PROBLEM - puppet last run on mw1164 is CRITICAL Puppet has 1 failures [14:58:17] andrewbogott: ping for SWAT in 2 minutes [14:58:25] yep, I’m here. thanks. [15:00:05] manybubbles, anomie, ^d, thcipriani, marktraceur, andrewbogott: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T1500). Please do the needful. [15:00:05] (03PS10) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [15:00:07] (03PS1) 10Andrew Bogott: For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 [15:00:18] (03CR) 10John F. Lewis: [C: 031] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/215328 (owner: 10Aklapper) [15:00:33] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215317 (owner: 10Andrew Bogott) [15:00:39] (03Merged) 10jenkins-bot: No longer set use_dnsmasq for new instances. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215317 (owner: 10Andrew Bogott) [15:00:45] (03CR) 10jenkins-bot: [V: 04-1] For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 (owner: 10Andrew Bogott) [15:00:50] (03CR) 10jenkins-bot: [V: 04-1] For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 (owner: 10Andrew Bogott) [15:01:05] (03CR) 10Andrew Bogott: "Note that after this is merged I will go through ldap by hand and correct existing puppetmaster settings." 
[puppet] - 10https://gerrit.wikimedia.org/r/215333 (owner: 10Andrew Bogott) [15:02:09] andrewbogott: btw, turning use_dnsmasq off doesn't mean the $instancename.eqiad.wmflabs will stop working, right? [15:02:23] It will still work [15:02:36] but puppet code that uses $domain and $fqdn will get the new fqdn. [15:03:20] !log thcipriani Synchronized wmf-config/wikitech.php: SWAT: No longer set use_dnsmasq for new instances. [[gerrit:215317]] (duration: 00m 12s) [15:03:23] Logged the message, Master [15:03:24] andrewbogott: ok! [15:03:42] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [15:03:45] ^ andrewbogott should be deployed! Test please. [15:03:50] ok! [15:03:55] (03PS2) 10Andrew Bogott: For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 [15:03:57] (03PS11) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [15:04:35] (03CR) 10jenkins-bot: [V: 04-1] For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 (owner: 10Andrew Bogott) [15:04:49] (03CR) 10jenkins-bot: [V: 04-1] For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 (owner: 10Andrew Bogott) [15:05:17] 6operations, 6Phabricator, 10Wikimedia-Bugzilla, 7Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1329741 (10Aklapper) Great great work. Timing sounds good to me. Anyone planning to draft that announcement email? I'm happy to help (or even draft it, just want... 
[15:05:53] RECOVERY - puppet last run on mw1164 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [15:07:34] 6operations, 6Phabricator, 10Wikimedia-Bugzilla, 7Tracking: Tracking: Remove Bugzilla from production - https://phabricator.wikimedia.org/T95184#1329749 (10JohnLewis) I believe Daniel's workload is mixed at the minute so if you want to draft an email @aklapper that'll be extremely helpful. [15:09:04] thcipriani: that change works, although there’s a sub-problem that I haven’t diagnosed yet. May require a followup patch, not sure. [15:10:01] andrewbogott: kk, deploy window is open for another 50 minutes or so, plus there's evening SWAT [15:10:38] (03PS2) 10Yuvipanda: ores: Specify protocol explicitly for nginx backend [puppet] - 10https://gerrit.wikimedia.org/r/214912 [15:10:46] (03CR) 10Yuvipanda: [C: 032 V: 032] ores: Specify protocol explicitly for nginx backend [puppet] - 10https://gerrit.wikimedia.org/r/214912 (owner: 10Yuvipanda) [15:14:03] (03PS1) 10Andrew Bogott: Remove call to onInstanceActionCompletion.php after signing [puppet] - 10https://gerrit.wikimedia.org/r/215334 [15:14:14] YuviPanda: can I get a quick review for ^ ? It’s causing some breakage. [15:14:40] andrewbogott: yeah, I don't even know what that was doing [15:14:50] (03PS2) 10Andrew Bogott: Remove call to onInstanceActionCompletion.php after signing [puppet] - 10https://gerrit.wikimedia.org/r/215334 [15:14:52] It’s for echo [15:15:01] (03CR) 10Yuvipanda: [C: 031] Remove call to onInstanceActionCompletion.php after signing [puppet] - 10https://gerrit.wikimedia.org/r/215334 (owner: 10Andrew Bogott) [15:15:04] Which, I’m not sure where that should happen, but I’d say not there. 
[15:15:20] andrewbogott: hmm, even then - wikitech's not on the same host as puppetmaster anymore [15:15:29] right [15:15:33] andrewbogott: +1'd it [15:15:35] thanks [15:15:49] (03CR) 10Andrew Bogott: [C: 032] Remove call to onInstanceActionCompletion.php after signing [puppet] - 10https://gerrit.wikimedia.org/r/215334 (owner: 10Andrew Bogott) [15:20:59] andrewbogott, it used to send notifications when instances finished building [15:21:23] Krenair: yeah, seems like a nice feature but it would need to be redesigned somehow. [15:21:35] I missed that one when I split off wikitech onto silver. Sorry about ripping out your feature :( [15:22:08] thcipriani: instance creation is working now, so you can stand down. Thank you! [15:22:23] andrewbogott: kk, thanks :) [15:22:29] I'm not really convinced a lot of people used them anyway [15:22:47] and with that SWAT is complete [15:23:35] Ryan brought up the idea and I thought it could be useful [15:26:10] (03PS3) 10Andrew Bogott: For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 [15:26:12] (03PS12) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [15:26:48] (03CR) 10jenkins-bot: [V: 04-1] For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 (owner: 10Andrew Bogott) [15:26:53] (03CR) 10jenkins-bot: [V: 04-1] For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 (owner: 10Andrew Bogott) [15:28:29] (03PS4) 10Andrew Bogott: For self-hosted puppet, require simple puppetmaster name. [puppet] - 10https://gerrit.wikimedia.org/r/215333 [15:28:31] (03PS13) 10Andrew Bogott: For cert names, use the fqdn instead of the ec2id if use_dnsmasq is lowered. [puppet] - 10https://gerrit.wikimedia.org/r/202924 [15:29:43] thcipriani: can you live with this? 
https://gerrit.wikimedia.org/r/#/c/215333/ [15:29:50] (looks like you touched the related code recently) [15:31:43] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1329824 (10Cmjohnson) The restbase servers and ssds arrived on-site. [15:31:54] andrewbogott: I don't have a problem with it, have to update deployment-prep, integration, and staging before it rolls though. Simple Hiera update, shouldn't be a big deal. [15:32:27] thcipriani: ok… I was going to fix things in ldap but I didn’t think that there might be local hiera settings for the name. [15:32:50] I’ll make you a phab task [15:33:09] andrewbogott: ah, right, well, hiera will override ldap in labs. Phab task would be good. Thanks :) [15:33:27] er, the hiera page on wikitech. [15:35:36] thcipriani: https://phabricator.wikimedia.org/T101110 [15:35:47] * thcipriani looks [15:36:07] That should happen before I switch things over to the new dns setup. Deadline for that is Monday, although I’d like to merge my enforcement patch sooner than that. [15:37:40] andrewbogott: shouldn't take me too long to check everything. I'll set priority to high, likely finished by tomorrow. I'll update phab ticket if I hit any snags. [15:41:59] (03PS2) 10Jcrespo: Depooling es1010 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215323 [15:42:16] 6operations, 10Analytics-Cluster, 10hardware-requests: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1329868 (10Ottomata) Ok cool, noted for the future danke. How goes? 
:) [15:42:57] 6operations, 10Analytics-Cluster, 10hardware-requests: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1329872 (10Ottomata) Oh, also, same number of cores please :) [15:44:11] 6operations, 6Release-Engineering: Try out hack ( !log jynus Synchronized wmf-config/db-eqiad.php: depool es1010 (duration: 00m 13s) [15:44:23] Logged the message, Master [15:45:39] 6operations, 10ops-eqiad: Setup/rack new restbase servers - https://phabricator.wikimedia.org/T101112#1329877 (10Cmjohnson) 3NEW a:3Cmjohnson [15:45:49] 6operations, 7HHVM, 7Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#1329886 (10bd808) [15:45:52] 6operations, 6Release-Engineering: Try out hack ( 6operations, 10ops-eqiad: Setup/rack new restbase servers - https://phabricator.wikimedia.org/T101112#1329888 (10Cmjohnson) [15:46:57] (03PS1) 10Cmjohnson: Adding dns entries for restbase1006-1009 (T101112) [dns] - 10https://gerrit.wikimedia.org/r/215339 [15:48:02] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1329895 (10mmodell) @BBlack: how would that work with both varnish and nginx in front of phabricator? They would both have to be able... 
[15:48:34] (03PS1) 10Yuvipanda: dynamicproxy: Hardcode resolver URL to new designate server [puppet] - 10https://gerrit.wikimedia.org/r/215340 [15:48:41] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for restbase1006-1009 (T101112) [dns] - 10https://gerrit.wikimedia.org/r/215339 (owner: 10Cmjohnson) [15:48:46] (03PS2) 10Yuvipanda: dynamicproxy: Hardcode resolver URL to new designate server [puppet] - 10https://gerrit.wikimedia.org/r/215340 [15:48:52] (03CR) 10Yuvipanda: [C: 032 V: 032] dynamicproxy: Hardcode resolver URL to new designate server [puppet] - 10https://gerrit.wikimedia.org/r/215340 (owner: 10Yuvipanda) [15:58:07] 10Ops-Access-Requests, 6operations: Login for jkrauska to librenms - https://phabricator.wikimedia.org/T101064#1329972 (10Dzahn) [15:59:05] (03CR) 10Muehlenhoff: [C: 032 V: 032] Update to 3.19.8-ckt1 [debs/linux] - 10https://gerrit.wikimedia.org/r/215324 (owner: 10Muehlenhoff) [16:03:34] 6operations: Requesting access to create projects in Phab. - https://phabricator.wikimedia.org/T101117#1330019 (10Dbrant) 3NEW [16:05:05] dr0ptp4kt: mind approving? https://phabricator.wikimedia.org/T101117 [16:10:30] 10Ops-Access-Requests, 6operations: Login for jkrauska to librenms - https://phabricator.wikimedia.org/T101064#1330090 (10faidon) All access requests should be accompanied by a rationale. What do you need this for? [16:16:43] "Loading cpufreq kernel modules...[fail]" on boot [16:16:43] didn't someone change this not a long time ago? [16:17:13] wasn't bblack talking about ^ earlier? [16:17:57] let me check the log [16:23:37] dbrant: not an operations responsibility btw. Tagged as phab and doesn't need any approval from managers or so :) [16:23:58] JohnFLewis: cool!
wasn't sure; thanks [16:27:32] dbrant: done [16:28:40] jouncebot: next [16:28:40] In 0 hour(s) and 31 minute(s): Mailman Maintainance (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T1700) [16:28:40] (03CR) 10Filippo Giunchedi: Add basic alerts on RESTBase error rates and storage latencies (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: 10GWicke) [16:31:27] so is this broken normal? http://stream.wikimedia.org/rc = 404 while wikitech says " publishes this on the endpoint stream.wikimedia.org/rc" [16:31:35] broken normal [16:31:41] jynus: which kernel? [16:31:43] or :p [16:32:16] mutante: is there a thing such as "broken normal"? [16:32:29] Linux es1010 3.13.0-53-generic #89-Ubuntu SMP [16:33:16] it's something different, then. the fix mentioned by brandon was for the 3.19 kernels [16:33:19] 10Ops-Access-Requests, 6operations: Login for jkrauska to librenms - https://phabricator.wikimedia.org/T101064#1330203 (10chasemp) p:5Triage>3Normal [16:33:22] JohnFLewis: i just missed the "or", but yes there is, it's "SNAFU" [16:33:38] Where the hell is sodium calling in the install cert parameter for the lists certificate =P [16:33:56] robh: not in mail.pp? 
[16:34:10] not that i can see [16:34:12] i see an ssl check [16:34:17] but no actual install [16:34:29] then i see in the apache template file the reference, but thats not an install, just reference [16:35:00] I don't trust my change to work until I see how its implemented (well, i also wont merge it, heh, cuz thats just bad) [16:35:36] JohnFLewis: ah, it's like this it seems, got link from godog https://phabricator.wikimedia.org/T69955 [16:35:43] (03PS1) 10Faidon Liambotis: certs: remove random certificates::* includes [puppet] - 10https://gerrit.wikimedia.org/r/215346 [16:35:45] (03PS1) 10Faidon Liambotis: certs: kill a bunch of Labs classes [puppet] - 10https://gerrit.wikimedia.org/r/215347 [16:35:47] (03PS1) 10Faidon Liambotis: certs: inline certificate:: classes to ::base [puppet] - 10https://gerrit.wikimedia.org/r/215348 [16:35:49] (03PS1) 10Faidon Liambotis: base: certificates::base -> base::certificates [puppet] - 10https://gerrit.wikimedia.org/r/215349 [16:35:51] (03PS1) 10Faidon Liambotis: sslcert: include ::chainedcert from ::certificate [puppet] - 10https://gerrit.wikimedia.org/r/215350 [16:35:53] (03PS1) 10Faidon Liambotis: sslcert: remove ::certificate's $content parameter [puppet] - 10https://gerrit.wikimedia.org/r/215351 [16:35:55] (03PS1) 10Faidon Liambotis: certs: replace require by collector ordering [puppet] - 10https://gerrit.wikimedia.org/r/215352 [16:36:03] 6operations: Public Access to librenms - https://phabricator.wikimedia.org/T101067#1330208 (10chasemp) 5Open>3declined a:3chasemp >>! In T101067#1328482, @faidon wrote: > I don't think we can, no. First off, I don't trust it all that much as a software, but more importantly, dilvuging all those bits of inf... [16:36:03] let's see how many jenkins failure I'll get [16:36:08] bblack: ^ all completely untested [16:36:17] robh: looks like it does not [16:36:37] so do we have a cert on a box that is manually installed and not defined by puppet? 
[16:36:42] it certainly seems that way [16:36:45] (sodium) [16:36:47] seems like it, yea [16:36:57] or was perhaps previously puppet managed but has since ceased [16:37:00] ok, well, thats ok... [16:37:04] since we'll fix it today. [16:37:11] but otherwise, eww. [16:37:19] right, yes [16:37:22] to both [16:37:28] I just wanted to make sure I wasn't missing something [16:37:49] As I see absolutely nothing that actually sets the installation of the certificate for lists.wikimedia.org on sodium. [16:38:17] With paravoid's newest change, I should then simply be able to call install_certificate and be done with it =] [16:38:30] (except the update issue, which im not sure about how to work around or if it applies here) [16:38:43] seems like it does, as its a new intermediary for the cert (though not new on our cluster) [16:39:30] paravoid: will the ssl changes be ok for me to update the sha256 cert on lists.wikimedia.org during my planned maint in about 20 minutes? [16:39:52] and if so, can I simply define install_certificate for the host and what else do i need to do (still manually remove chained file?) [16:40:24] robh: as long as the intermediate cert is there globally. "basically don't include a "ca" parameter with install_certificate, but do be sure any new intermediates that we're not already using are installed for everyone in manifests/certs.pp in general." [16:40:36] mutante: yes, i have that [16:40:40] which doesnt address my question [16:40:51] in another channel he mentioned an update issue [16:41:05] and what im doing is an update to an existing cert [16:41:17] i quoted that because you said it's a new intermediate [16:41:23] ok, i didn't read that yet [16:41:29] well, its new to the cert but not to the cluster [16:41:30] robh: yes and yes [16:41:40] :) [16:41:41] paravoid: ok, coolness [16:41:44] thx!
[16:41:46] manually remove chained file, for now [16:41:58] will fix, but not in the next 20' [16:42:09] no worries, easy enough to do [16:42:18] (03PS1) 10Faidon Liambotis: sslcert: automatically regenerate chained cert on changes [puppet] - 10https://gerrit.wikimedia.org/r/215353 [16:42:21] there you go :) [16:42:23] thanks for simplifying the entire process though, not having to determine intermediaries and the like [16:42:36] oh I'm going to simplify it even more [16:42:43] mutante: sorry if my tone was harsh, it totally wasnt meant to be =] [16:42:51] but yes, you're welcome -- thank bblack for rolling it out [16:42:59] I just threw code over the wall :D [16:44:30] (03PS2) 10Faidon Liambotis: certs: replace require by collector ordering [puppet] - 10https://gerrit.wikimedia.org/r/215352 [16:44:32] (03PS2) 10Faidon Liambotis: sslcert: automatically regenerate chained cert on changes [puppet] - 10https://gerrit.wikimedia.org/r/215353 [16:45:37] paravoid: I think your <| |> collector would only grab ones that are already instantiated for that host, it's not going to avoid the need to require/include them first. [16:45:56] what do you mean? [16:46:33] oh ignore me, I'm reading one of the last patches, not the first [16:46:41] probably it was addressed in an earlier one [16:47:55] (03PS1) 10Yuvipanda: tools: Add an /etc/hosts entry for localhost [puppet] - 10https://gerrit.wikimedia.org/r/215355 [16:47:58] Coren: ^ [16:48:05] paravoid: stepping back a sec from the real content of your changes: why do we install the globalsign root, and not apparently the intermediates, in that list? 
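(Editor's aside on the chained-certificate discussion above: the chained file has to be removed by hand when a cert or its intermediate changes, until the "automatically regenerate chained cert on changes" patch lands. The sketch below is a rough illustration of that idea only. It is not the puppet sslcert module or the x509-bundle script, whose internals are not shown in this log. The function names `build_chain` and `needs_regeneration` are hypothetical. It assumes the usual server-side ordering of the leaf certificate first, followed by its intermediates.)

```python
def build_chain(leaf_pem, intermediate_pems):
    """Concatenate PEM blocks into one chained file: leaf first, then intermediates.

    Each block is normalised to end with exactly one newline so that the
    result is stable and can be compared against an existing file.
    """
    parts = [leaf_pem] + list(intermediate_pems)
    return "".join(part.strip() + "\n" for part in parts)


def needs_regeneration(existing_chain, leaf_pem, intermediate_pems):
    """Return True when the chained file is missing or out of date.

    existing_chain is the current file content, or None if the file does
    not exist (hypothetical convention for this sketch).
    """
    if existing_chain is None:
        return True
    return existing_chain != build_chain(leaf_pem, intermediate_pems)
```

(Comparing content rather than deleting the file by hand means a renewed leaf cert or a swapped intermediate, such as the sha1 to sha256 change for lists.wikimedia.org, would be picked up on the next run.)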
[16:48:20] I haven't audited those CAs yet [16:48:40] I mean apparently it works today in practice, but I've never understood how/why [16:48:42] I'd expect the globalsign root to be already in debian/ubuntu's default certificate stores [16:48:47] so I'm not sure why we're shipping it [16:49:34] lrwxrwxrwx 1 root root 57 Mar 14 21:16 GlobalSign_Root_CA.pem -> /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA.crt [16:49:36] well what I mean is, we're using an intermediate there too aren't we? [16:49:37] lrwxrwxrwx 1 root root 62 Mar 14 21:16 GlobalSign_Root_CA_-_R2.pem -> /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA_-_R2.crt [16:49:40] lrwxrwxrwx 1 root root 62 Mar 14 21:16 GlobalSign_Root_CA_-_R3.pem -> /usr/share/ca-certificates/mozilla/GlobalSign_Root_CA_-_R3.crt [16:49:43] these are shipped [16:49:45] lrwxrwxrwx 1 root root 50 Mar 14 21:51 GlobalSign_CA.pem -> /usr/local/share/ca-certificates/GlobalSign_CA.crt [16:49:48] this is not [16:50:17] oh it's just bad naming confusing me [16:50:30] our installed "GlobalSign_CA.crt" is "Subject: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2" [16:50:45] it should just be named better so it doesn't sound like a real Root [16:50:50] it's a specific intermediate [16:51:03] for i in GlobalSign_*; do echo $(readlink $i); openssl x509 -in $i -noout -subject -issuer; echo; done [16:51:45] TL;DR: what you said [16:51:46] :) [16:53:16] (03Abandoned) 10Yuvipanda: tools: Add an /etc/hosts entry for localhost [puppet] - 10https://gerrit.wikimedia.org/r/215355 (owner: 10Yuvipanda) [16:53:18] (03PS2) 10RobH: lists.wikimedia.org certificate sha1 to sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214680 (https://phabricator.wikimedia.org/T100832) [16:53:23] ok I'm switching over to questions in commit comments instead, there's a lot [16:53:29] heh [16:54:13] I wrote these in a hurry (again) [16:54:22] well I need to read it all through [16:54:37] (03CR) 10Dzahn: "blocked on 
https://gerrit.wikimedia.org/r/#/c/206083/ ??" [mediawiki-config] - https://gerrit.wikimedia.org/r/206300 (https://phabricator.wikimedia.org/T96468) (owner: Dzahn) [16:54:44] my first thoughts are along the lines of "yeah but installed_certificate might not be the only reason we install a given certificates:: file" [16:54:45] eep, found a bug already [16:54:53] err install_certificate [16:55:04] what are you referring to? [16:55:28] (PS2) Krinkle: webperf: Remove JQMigrateUsage deprecate handler [puppet] - https://gerrit.wikimedia.org/r/210263 [16:55:53] (CR) Dzahn: [C: 2] Improve static-bugzilla frontpage (mention Phabricator etc.) [puppet] - https://gerrit.wikimedia.org/r/215328 (owner: Aklapper) [16:55:54] the removal of random certificates::foo all over puppet, on the assumption that having them in install_certificate is good enough. having them all in base:: might be? [16:56:07] we do install certs for reasons other than install_certificate [16:56:27] (PS4) Jcrespo: mariadb: indentation fixes [puppet/mariadb] - https://gerrit.wikimedia.org/r/211357 (owner: Dzahn) [16:56:37] https://gerrit.wikimedia.org/r/#/c/215346/1 you mean [16:56:50] yeah [16:56:57] I guess I failed at moving my comments over to gerrit heh [16:57:03] (CR) Jcrespo: [C: 1] mariadb: indentation fixes [puppet/mariadb] - https://gerrit.wikimedia.org/r/211357 (owner: Dzahn) [16:57:21] that is true, although there are spurious ones for sure [16:57:22] e.g. 
icinga's [16:57:31] I think there are labs/ldap-ish cases where they're installing something with certificate::foo and not using install_certificate [16:57:38] but in any case, I /am/ moving them to base a couple of patches later :) [16:57:45] ok :) [16:58:10] (CR) Dzahn: [C: 2] "thx" [puppet/mariadb] - https://gerrit.wikimedia.org/r/211357 (owner: Dzahn) [16:58:12] (PS4) Andrew Bogott: Allow for new labs domain schema in ENC [puppet] - https://gerrit.wikimedia.org/r/202790 (owner: Thcipriani) [16:58:23] but you're right, this may not be correct for the Labs case as a standalone patch [16:58:45] Labs/LDAP [16:58:55] robh: re: lists cert patch, it was unpuppetized before? [16:59:12] well, i have an error in my current one, but yea [16:59:15] andrewbogott: getting rid of ec2id? \o/ [16:59:20] i cannot find any referene to the cert installation on sodium [16:59:26] daniel did a quick check and confirmed the same [16:59:29] that's awesome heh [16:59:33] :/ [16:59:34] indeed =P [16:59:43] the apache template that uses the cert is in puppet, but not install_certificate it looks, yea [17:00:04] RobH: Respected human, time to deploy Mailman Maintainance (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T1700). Please do the needful. [17:00:26] robh: I guess your error is the random change to the nrpe check [17:00:41] yep [17:00:48] accidental typo introduction [17:01:00] and now my local git repo is in some kind of git fuckign hell [17:01:03] and im having issues resolving [17:02:17] I can amend if you want [17:02:25] please do im tired of fighting this [17:02:38] and i wanna figure it out rather than blow it away and start over, but i dont wanna wait on my maint ;] [17:02:45] so i rather leave it in a bad state and figure it out later today. 
[17:02:50] (on my local repo) [17:03:03] (PS3) BBlack: lists.wikimedia.org certificate sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214680 (https://phabricator.wikimedia.org/T100832) (owner: RobH) [17:03:26] (CR) BBlack: [C: 1] lists.wikimedia.org certificate sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214680 (https://phabricator.wikimedia.org/T100832) (owner: RobH) [17:03:29] looks good, thank you =] [17:04:07] matanya: are we doing ensure => ‘present’ these days or ensure => present? [17:04:10] !log starting the lists.wikimedia.org certificate update, archives will offline during this process [17:04:14] Logged the message, Master [17:04:24] (PS3) Rush: Deployment group for trebuchet [puppet] - https://gerrit.wikimedia.org/r/209045 (https://phabricator.wikimedia.org/T97775) (owner: Thcipriani) [17:04:34] (CR) RobH: [C: 2] lists.wikimedia.org certificate sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214680 (https://phabricator.wikimedia.org/T100832) (owner: RobH) [17:05:30] (PS1) Jcrespo: Repool es1010 [mediawiki-config] - https://gerrit.wikimedia.org/r/215358 [17:05:42] JohnFLewis: fyi im going to do the cert replacement, and then the rename [17:05:57] robh: okay :) [17:05:59] (PS4) Rush: Deployment group for trebuchet [puppet] - https://gerrit.wikimedia.org/r/209045 (https://phabricator.wikimedia.org/T97775) (owner: Thcipriani) [17:06:21] wee, i see puppet picking the sha256 intermediary automatically [17:06:22] woot! [17:06:43] cool! [17:06:48] (PS2) Jcrespo: Repool es1010 [mediawiki-config] - https://gerrit.wikimedia.org/r/215358 [17:07:31] !log lists.wikimedia.org is now sha256 cert [17:07:35] Logged the message, Master [17:07:37] (CR) Rush: [C: 2 V: 2] "thcipriani is going to babysit this a bit as designated releng dude, seems like a total noop post manual fixes. 
It has also been tested i" [puppet] - https://gerrit.wikimedia.org/r/209045 (https://phabricator.wikimedia.org/T97775) (owner: Thcipriani) [17:07:43] so it didnt rehup apache on cert replacement, which didnt change the cert presented [17:08:12] but, that may be part of the changes being worked on, for now required manual kick [17:08:37] Puppet, operations, Beta-Cluster, Patch-For-Review: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1330437 (chasemp) Open>Resolved a:chasemp merged [17:08:44] operations, HTTPS: replace lists.wikimedia.org's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100832#1330440 (RobH) [17:09:39] (CR) Dzahn: [C: 1] "happened to run across this looking at "LightProcess"" [puppet] - https://gerrit.wikimedia.org/r/215187 (owner: Ori.livneh) [17:10:40] JohnFLewis: so thanks for making the patchset for the renames =] [17:10:47] im moving onto that now [17:10:58] okay [17:11:04] (PS7) Andrew Bogott: dnsrecursor: ensure => 'present' rather than 'latest' [puppet] - https://gerrit.wikimedia.org/r/211060 [17:11:25] (CR) Andrew Bogott: [C: 2] dnsrecursor: ensure => 'present' rather than 'latest' [puppet] - https://gerrit.wikimedia.org/r/211060 (owner: Andrew Bogott) [17:12:10] Ops-Access-Requests, operations: Login for jkrauska to librenms - https://phabricator.wikimedia.org/T101064#1330449 (JKrauska) As a direct user of ulsfo bandwidth, it would be helpful to identify when those links are having problems. [17:12:32] chasemp: shall I merge this change to deployment_server.pp? [17:12:54] Error: You are not authorized to create new mailing lists [17:12:56] wtf... [17:13:03] is the list admin password not able to create lists? [17:13:39] robh: what list(s) do you need? 
I'll magic them up quickly [17:13:48] andrewbogott: yes please sorry [17:13:50] hrmm, i think i just have old pass [17:13:53] lemme test one more thing [17:14:05] andrewbogott: I got distracted thank you [17:14:12] robh: the master password can create new lists, but maybe you have an outdated one, it was changed not that long ago [17:14:18] yea, old pass [17:14:22] okay :p [17:14:22] i made list no problem [17:14:44] (PS1) Jdlrobson: Enable MediaWiki logo on mobile login page [mediawiki-config] - https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) [17:15:49] (CR) Bmansurov: Enable MediaWiki logo on mobile login page (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: Jdlrobson) [17:16:19] operations: Public Access to librenms - https://phabricator.wikimedia.org/T101067#1330499 (JKrauska) @faidon I can't speak to the vendor leverage issue, but as a network person I always found are open torrus stats to be incredibly helpful/interesting. I see there's a ticket pointing out that torrus is brok... [17:16:57] list naming inconsistency makes me want to cry :( [17:17:22] (PS3) Dzahn: icinga.wikimedia.org cert sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214674 (https://phabricator.wikimedia.org/T100830) (owner: RobH) [17:17:32] it makes me cry-l [17:17:52] haha :) [17:18:10] (CR) Dzahn: [C: 1] "amended to _not_ set the intermediate cert anymore" [puppet] - https://gerrit.wikimedia.org/r/214674 (https://phabricator.wikimedia.org/T100830) (owner: RobH) [17:18:26] !log mailing list traffic halted for list renames [17:18:30] Logged the message, Master [17:18:38] ok, now we are in mailing list downtime, copying the config files for pywikibot list [17:18:40] robh: the dramaz must flow [17:19:05] will the list messages queue? 
[17:19:16] cajoel: yes [17:19:44] ori: no repeat of last time hopefully :p [17:22:04] operations, Commons, MediaWiki-Database, Multimedia, and 2 others: internal_api_error_DBQueryError: Database query error while (mass) deleting file over api - https://phabricator.wikimedia.org/T98706#1330544 (chasemp) a:aaron @aaron, did your patch address this successfully? I'm not sure where t... [17:23:10] argh,,, why am i having a brain fart on the copy of mboxing properly, arghhh [17:23:12] * robh still workign [17:23:46] (PS9) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [17:23:52] operations, HTTPS, Patch-For-Review: replace librenms's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100831#1330552 (chasemp) a:RobH Don't shoot me Robh, it just seems like this is your deal :) [17:24:03] operations, HTTPS, Patch-For-Review: replace librenms's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100831#1330555 (chasemp) p:High>Normal [17:24:10] (PS4) Dzahn: wikitech.wikimedia.org certificate sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214666 (https://phabricator.wikimedia.org/T92709) (owner: RobH) [17:25:10] robh: don't worry - you have two hours [17:25:18] (CR) Dzahn: [C: 1] "amended to _not_ set the intermediate cert anymore" [puppet] - https://gerrit.wikimedia.org/r/214666 (https://phabricator.wikimedia.org/T92709) (owner: RobH) [17:25:20] (CR) Tim Landscheidt: "Perhaps the entry for the IP is managed by DHCP/resolvconf?" [puppet] - https://gerrit.wikimedia.org/r/215355 (owner: Yuvipanda) [17:25:36] then you have the 'crap something is wrong - we need 3 more hours' excuse like last time ;) [17:26:09] operations, HHVM, Patch-For-Review: investigate HHVM mysqlExtension::ConnectTimeout - https://phabricator.wikimedia.org/T98489#1330560 (chasemp) a:jcrespo thanks! [17:26:13] (CR) Yuvipanda: "Very possibly. 
Coren found a solution via http://gridscheduler.sourceforge.net/htmlman/htmlman5/host_aliases.html though." [puppet] - https://gerrit.wikimedia.org/r/215355 (owner: Yuvipanda) [17:27:32] Cannot open mbox file /var/lib/mailman/archives/private/pywikibot.mbox/pywikibot.mbox: [Errno 2] No such file or directory: '/var/lib/mailman/archives/private/pywikibot.mbox/pywikibot.mbox' [17:27:47] uhh, it totally is i copied it and it has the right permissions... [17:28:07] (PS1) Ori.livneh: don't include varnishstatsd in kafka role [puppet] - https://gerrit.wikimedia.org/r/215362 [17:29:10] robh: ls the file works? [17:29:55] yep [17:29:57] (PS2) Ori.livneh: Apply role::cache::statsd on bits [puppet] - https://gerrit.wikimedia.org/r/215362 [17:30:01] ls's the same as the original pywikipedia-l [17:30:18] godog: ^ [17:30:42] both 47m [17:30:43] the directory doesn't have the -l but the file inside it does [17:30:45] pywikipedia-l.mbox [17:30:52] ha [17:30:57] mutante: duh, thank you. [17:31:37] so i have to rename them all? [17:31:41] i dont recall doing that before... [17:32:00] * JohnFLewis silently looks on echoing https://phabricator.wikimedia.org/T99734 [17:32:05] but what mutante said yeah [17:32:09] heh [17:32:11] robh: you did :) [17:33:34] (CR) Filippo Giunchedi: [C: -1] Apply role::cache::statsd on bits (1 comment) [puppet] - https://gerrit.wikimedia.org/r/215362 (owner: Ori.livneh) [17:33:55] godog: ohhhhh. good catch. 
[17:34:53] (PS3) Ori.livneh: Apply role::cache::statsd on bits [puppet] - https://gerrit.wikimedia.org/r/215362 [17:35:37] its owrkign now [17:35:39] working even [17:35:40] ori: heh, you are correct though that class shouldn't be required anymore by kafka [17:35:50] rebuilding in arch for pywikibot [17:36:13] JohnFLewis: mailman was specifically called out as a next quarter project in the meeting this week =] [17:36:30] robh: mark told me :) [17:36:33] argh, pywikipedia-l is a busy list [17:36:38] its goign to take a bit to rebuild [17:36:47] (CR) Filippo Giunchedi: [C: 1] Apply role::cache::statsd on bits [puppet] - https://gerrit.wikimedia.org/r/215362 (owner: Ori.livneh) [17:37:20] JohnFLewis: since we are redirecting, there is no reason to keep the old archives online right? [17:37:26] we're redirecting old stuff for it [17:37:32] so having it on sodium as a dupe is silly right? [17:37:44] (CR) Ori.livneh: [C: 2] Apply role::cache::statsd on bits [puppet] - https://gerrit.wikimedia.org/r/215362 (owner: Ori.livneh) [17:37:46] (PS5) Dduvall: ci: Role for running Raita [puppet] - https://gerrit.wikimedia.org/r/208024 [17:37:49] (once we confirm rebuild and new version is online and working of course) [17:37:52] I'm unsure if apache redirects the archives as well though [17:38:08] we'll have to test, if not we may want to add that in the future [17:38:13] so prevent data duplication [17:38:17] just thinking aloud =] [17:38:19] (PS6) Dduvall: ci: Role for running Raita [puppet] - https://gerrit.wikimedia.org/r/208024 [17:38:25] yeah [17:38:43] 2010 figuring article archives for pywikibot [17:39:18] expects that list to be full of subversion commit messages :p [17:39:32] yea [17:39:33] godog: applied correctly, it seems [17:39:44] i went to prestage my next chagne in web ui [17:39:51] \o/ I'll watch graphite [17:39:53] then i recalled my changes are now offlining it until it finishes [17:39:55] =P [17:41:08] (CR) Dzahn: 
"@Paladox so the "match = "\\b[bB][uU][gG]\\:?\\s+#?(\\d+)\\b"" is identical for Bugzilla and Phabricator, that's ok?" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [17:41:47] ok, done with figuring archives and now rebuilding actual html [17:42:33] 9k of them. [17:42:38] its at 300. [17:43:48] heh, pywikipedia-l is the largest (i think) of the three lists for rename [17:43:53] godog: one small issue: for varnish.eqiad.backends.ipv4_10_2_2_1, i see counts for 'Wxx', 'oxx', 'vxx', 'yxx'. So I guess first_letter_of_response_code + 'xx' is not correct. [17:44:39] likely will be [17:45:23] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [17:45:32] 2900 of 9000 [17:45:34] ori: indeed, perhaps it needs casting to signed int :P [17:45:39] its moving right along [17:46:13] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [17:47:06] (CR) Dzahn: [C: 2] icinga.wikimedia.org cert sha1 to sha256 [puppet] - https://gerrit.wikimedia.org/r/214674 (https://phabricator.wikimedia.org/T100830) (owner: RobH) [17:47:47] watches the cert update on icinga [17:48:15] ori: not related to your change but I'm seeing 'rcvbuferrors' inbound, investigating that (last graph on https://gdash.wikimedia.org/dashboards/graphite.eqiad/) [17:48:25] Blocked-on-Operations, operations, Maps, Scrum-of-Scrums, hardware-requests: Eqiad Spare allocation: 1 hardware access request for OSM Maps project - https://phabricator.wikimedia.org/T97638#1330678 (yuvipanda) @akosiaris - so, is this just giving the maps team access to the psql on this machin... 
[17:49:33] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl [17:49:55] meh, i should have copied all the configs over and such [17:50:00] and then did all the rebuilds [17:50:01] oh well [17:50:07] wait, why is mailman ok? [17:50:10] it should be stopped... [17:50:19] (PS3) Jcrespo: mysql: indentation fixes [puppet] - https://gerrit.wikimedia.org/r/211358 (owner: Dzahn) [17:50:31] (CR) Jcrespo: [C: 1] mysql: indentation fixes [puppet] - https://gerrit.wikimedia.org/r/211358 (owner: Dzahn) [17:50:43] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [17:51:02] argh... [17:51:07] who knows if thats gonna fubar my rebuild [17:51:15] restopped it but dunno why it fired [17:51:20] puppet agent is disabled. [17:51:44] (CR) Jcrespo: "This package is deprecated and should not be used for new machines" [puppet] - https://gerrit.wikimedia.org/r/211358 (owner: Dzahn) [17:52:46] (CR) Jcrespo: [C: 2] Repool es1010 [mediawiki-config] - https://gerrit.wikimedia.org/r/215358 (owner: Jcrespo) [17:54:54] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [17:55:31] weird [17:55:48] ok, pywikibot is moved and renamed and acceptable aliases updated, moving next [17:55:48] (PS1) Ori.livneh: varnishstatsd: don't report stats for bogus HTTP status codes [puppet] - https://gerrit.wikimedia.org/r/215370 [17:55:53] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [17:55:53] well, copied [17:56:09] archives work https://lists.wikimedia.org/pipermail/pywikibot/ [17:56:22] onto pywikipedia-announce [17:56:22] !log jynus Synchronized wmf-config/db-eqiad.php: repool es1010 (duration: 00m 12s) [17:56:26] Logged the message, Master [17:57:28] operations, 
Analytics-Cluster: Build Kafka 0.8.1.1 package for Jessie and upgrade Brokers to Jessie. - https://phabricator.wikimedia.org/T98161#1330742 (faidon) Yeah, what @Ottomata said, it's just a handful of packages and we have this running on Ubuntu as well, so how hard can it be... On the above: -... [17:59:03] operations: Backport & test firmware-linux 0.44 - https://phabricator.wikimedia.org/T100771#1330753 (faidon) Open>Resolved I wouldn't mind all that much but I don't see a reason yet, so let's not for now, I'd say. That said, if we do decide to do it, let's add use firmware-linux instead, which includes... [17:59:14] (CR) Filippo Giunchedi: [C: 1] "LGTM, I guess no chance we start returning non-standard http codes" [puppet] - https://gerrit.wikimedia.org/r/215370 (owner: Ori.livneh) [18:00:05] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T1800). Please do the needful. [18:00:20] (PS2) Ori.livneh: varnishstatsd: don't report stats for bogus HTTP status codes [puppet] - https://gerrit.wikimedia.org/r/215370 [18:00:31] (CR) Ori.livneh: [C: 2 V: 2] varnishstatsd: don't report stats for bogus HTTP status codes [puppet] - https://gerrit.wikimedia.org/r/215370 (owner: Ori.livneh) [18:01:29] (CR) Paladox: "I am not sure but could we test because it may use that or something else but I would think it would may. 
I am not 100% sure but testing s" [puppet] - https://gerrit.wikimedia.org/r/215247 (owner: Paladox) [18:01:44] (PS10) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [18:03:03] hrmm, the -bugs seems large too [18:03:35] robh: the name would suggest so :) [18:04:18] (PS2) Bmansurov: Enable MediaWiki logo on mobile login page [mediawiki-config] - https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: Jdlrobson) [18:04:31] andrewbogott: https://wikitech.wikimedia.org/wiki/Puppet_coding#Resources no quote [18:05:05] (CR) Aaron Schulz: [C: 1] HHVM APC: enable item expiration [puppet] - https://gerrit.wikimedia.org/r/215213 (owner: Ori.livneh) [18:05:08] matanya: ok, thanks. [18:08:36] (PS1) Filippo Giunchedi: gdash: fix graphite dashboard colors [puppet] - https://gerrit.wikimedia.org/r/215375 [18:12:46] (CR) Filippo Giunchedi: [C: 2 V: 2] gdash: fix graphite dashboard colors [puppet] - https://gerrit.wikimedia.org/r/215375 (owner: Filippo Giunchedi) [18:13:21] ori: good to merge your change ? [18:13:33] godog: ack, yes [18:13:57] yup, merged [18:16:19] operations, database: es[12]00[123] maintenance and upgrade - https://phabricator.wikimedia.org/T101084#1330859 (jcrespo) es1010 upgraded (if you do not want to be notified, unsubscribe!) [18:16:55] ok, copy of announce is progressing [18:16:58] sorry, rebuild [18:18:35] and bugs is rebuilding [18:19:56] and its a big one, as expected [18:20:00] (PS11) Paladox: Add link in gitblit for phabricator [puppet] - https://gerrit.wikimedia.org/r/215247 [18:20:11] (PS1) Dzahn: Revert "icinga.wikimedia.org cert sha1 to sha256" [puppet] - https://gerrit.wikimedia.org/r/215377 [18:21:00] (CR) Dzahn: [C: 2] "new cert apparently is not in x509 format" [puppet] - https://gerrit.wikimedia.org/r/215377 (owner: Dzahn) [18:21:36] mutante: hm? 
[18:22:07] oh saw the +2 comment [18:22:07] nvm [18:24:00] 14k messages in -bugs [18:25:32] godog: i'll remove [^1-5]xx from graphite100{1,2} [18:26:29] (PS1) Dzahn: do not set ca parameter with install_certificate [puppet] - https://gerrit.wikimedia.org/r/215378 [18:27:05] (CR) Dzahn: [C: 2] "Error 400 on SERVER: Invalid parameter ca" [puppet] - https://gerrit.wikimedia.org/r/215378 (owner: Dzahn) [18:28:23] PROBLEM - puppet last run on neon is CRITICAL puppet fail [18:28:53] operations, Graphite: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141#1330890 (fgiunchedi) NEW [18:29:03] (PS2) Dzahn: do not set ca parameter with install_certificate [puppet] - https://gerrit.wikimedia.org/r/215378 [18:29:23] ori: kk, thanks (graphite2001 not graphite1002 btw) [18:29:28] yep. done btw [18:29:43] PROBLEM - puppet last run on mw2017 is CRITICAL puppet fail [18:29:53] operations, Graphite: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141#1330898 (fgiunchedi) [18:30:28] godog: these are the ones i removed: https://dpaste.de/AKke/raw [18:31:38] operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1330909 (fgiunchedi) [18:32:27] godog: should i increase the sample factor for xhprof? [18:33:00] it's 1:1000 currently [18:33:03] could make that 1:10,000 [18:33:26] ori: heh was going to mention the same thing, that'd help for sure [18:33:52] 100k packets/s pushes things a bit over the edge, not sure if it is statsdlb or statsite or both [18:34:36] are we still using statsdlb? [18:35:24] yup, each statsite is at ~20% cpu afaict [18:35:32] RECOVERY - puppet last run on neon is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [18:38:49] 12k of 14k [18:41:17] still going... 
=P [18:41:52] (CR) RobH: [C: 2] pywikipedia->pywikibot in mailman [puppet] - https://gerrit.wikimedia.org/r/214694 (https://phabricator.wikimedia.org/T100707) (owner: John F. Lewis) [18:42:58] ok, redirections merged and puppet is running [18:43:17] should also rehup mailman since its stopped. [18:43:27] yep. [18:43:57] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [18:44:35] ok, starting to see list traffic resume [18:44:47] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl [18:45:46] (PS1) Ori.livneh: Sample profiling data at 1:10,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/215383 [18:46:22] godog: ^. Keep in mind that since this touches StartProfile.php, I would not expect the change to take effect until an HHVM restart. But since I plan on pushing out HHVM configuration changes later (which will refresh the service), we can merge it and have it queued up. [18:46:29] !log sodium has resumed normal service. 
all items on https://phabricator.wikimedia.org/T100711 addressed [18:46:34] Logged the message, Master [18:46:36] RECOVERY - puppet last run on mw2017 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [18:48:00] (PS2) Filippo Giunchedi: Sample profiling data at 1:10,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/215383 (https://phabricator.wikimedia.org/T101141) (owner: Ori.livneh) [18:48:10] (CR) Filippo Giunchedi: [C: 1] Sample profiling data at 1:10,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/215383 (https://phabricator.wikimedia.org/T101141) (owner: Ori.livneh) [18:48:13] operations, pywikibot-core, Patch-For-Review, Wikimedia-Mailing-lists: Rename pywikipedia list prefixes to pywikibot - https://phabricator.wikimedia.org/T100707#1330997 (RobH) Open>Resolved The rename of these lists has completed and should now function normally. [18:48:19] ori: yup, I've linked the relevant ticket [18:49:34] operations, HTTPS: replace lists.wikimedia.org's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100832#1331008 (RobH) [18:49:43] operations, HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1331011 (RobH) [18:50:28] so i see emails coming through on lists, but its still over 5 minutes behind [18:50:38] * robh is still awaiting his test email, though he is now getting many other list emails [18:51:02] (PS1) 20after4: Group1 wikis to 1.26wmf8 [mediawiki-config] - https://gerrit.wikimedia.org/r/215385 [18:51:28] (CR) 20after4: [C: 2] Group1 wikis to 1.26wmf8 [mediawiki-config] - https://gerrit.wikimedia.org/r/215385 (owner: 20after4) [18:52:46] (Merged) jenkins-bot: Group1 wikis to 1.26wmf8 [mediawiki-config] - https://gerrit.wikimedia.org/r/215385 (owner: 20after4) [18:53:14] godog: do you know http://linux.die.net/man/1/dropwatch ? 
[18:53:41] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: Group1 wikis to 1.26wmf8 [18:53:48] there it goes, 8 minutes after send [18:53:51] Logged the message, Master [18:54:40] operations, HTTPS: replace lists.wikimedia.org's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100832#1331035 (RobH) [18:54:42] operations, Roadmap, notice, user-notice: Mailing list maintenance window - 2015-06-02 17:00 UTC to 19:00 UTC - https://phabricator.wikimedia.org/T100711#1331033 (RobH) Open>Resolved All items on this task have been completed, resolving task. [18:54:45] operations, pywikibot-core, Patch-For-Review, Wikimedia-Mailing-lists: Rename pywikipedia list prefixes to pywikibot - https://phabricator.wikimedia.org/T100707#1331036 (RobH) [18:58:21] ori: hah, no never heard of it, looks very useful though! [18:58:46] $ apt-cache search dropwatch [18:58:47] $ [18:58:58] sad_trombone.au [19:01:57] uhh [19:02:03] it.voy is down [19:02:30] 2015-06-02 21:02:19 mw1219 itwikivoyage exception INFO: [3da1636c] /wiki/Assos_(Cefalonia) BadMethodCallException from line 367 of /srv/mediawiki/php-1.26wmf8/includes/skins/SkinTemplate.php: Call to a member function getCredits() on a non-object (boolean) [19:02:49] mobilefrontend? [19:03:03] 2015-06-02 21:02:51 mw1075 dewikivoyage exception INFO: [2bccf59b] /wiki/Buto BadMethodCallException from line 367 of /srv/mediawiki/php-1.26wmf8/includes/skins/SkinTemplate.php: Call to a member function getCredits() on a non-object (boolean) [19:03:04] yeah [19:03:06] heh [19:03:26] twentyafterfour: revert [19:04:11] all wikivoyages are down afais [19:04:27] why just wikivoyages? [19:04:48] they have different extensions probably [19:04:56] well they do. 
and it's probably one of them [19:05:00] (PS1) Aude: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/215389 [19:05:03] that MF is not compatible with [19:05:06] special extensions [19:05:07] don't revert yet [19:05:33] (PS1) Ori.livneh: wgMaxCredits to 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215390 [19:05:36] That error seems to indicate that Action::factory( 'credits', ... ) returns null? [19:05:48] (CR) Ori.livneh: [C: 2 V: 2] wgMaxCredits to 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215390 (owner: Ori.livneh) [19:06:00] ori: umm, that's used for the wikitravel licensing thing [19:06:13] !log ori Synchronized wmf-config/InitialiseSettings.php: wgMaxCredits to 0 (duration: 00m 13s) [19:06:19] Logged the message, Master [19:06:29] well, it's back up now, which is better than before [19:07:00] now let's find the regression [19:07:27] * aude needs to deploy config patch for wikidata [19:07:54] https://www.mediawiki.org/wiki/MediaWiki_1.26/wmf8/Changelog [19:10:02] none of the wikivoyage specific extensions got updates [19:10:07] oh [19:10:45] CommonSettings.php has $wgActions['credits'] = false; [19:11:22] legoktm: Per https://gerrit.wikimedia.org/r/215390 , only wikivoyages had a non-zero $wgMaxCredits [19:12:12] RoanKattouw_away: yeah, because they have the special wikitravel licensing credits on every page's footer [19:12:29] does anyone mind if i deploy my patch? [19:12:58] shouldn't take long [19:13:11] YuviPanda: if you have a moment, could you please review https://gerrit.wikimedia.org/r/#/c/202790/ ? [19:13:14] aude: I don't mind, I'm just trying to figure out whether we need a revert or not [19:13:26] https://gerrit.wikimedia.org/r/#/c/213263/1 looks weird. [19:13:28] ok [19:13:32] legoktm: ori: so no revert? 
[19:13:43] but it affected more than just the main apge [19:13:47] i think we can figure out the issue with wikivoyage [19:13:52] twentyafterfour: probably not [19:13:52] (PS1) BBlack: icinga.wikimedia.org cert sha1 to sha256 + chained file usage [puppet] - https://gerrit.wikimedia.org/r/215393 [19:14:10] (PS1) Ori.livneh: Only set $wgActions['credits'] to false if $wgMaxCredits is 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215394 [19:14:20] (CR) Aude: [C: 2] Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/215389 (owner: Aude) [19:14:27] (Merged) jenkins-bot: Bump cache epoch for Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/215389 (owner: Aude) [19:15:24] robh: so... sucessful? [19:15:40] !log aude Synchronized wmf-config/Wikibase.php: bump cache epoch for wikidata (duration: 00m 13s) [19:15:44] done [19:15:46] Logged the message, Master [19:15:47] aude: btw [19:15:49] I haven't filed a bug yet [19:16:01] but the SiteGroup key is using up a shitton of memcached bandwidth... again [19:16:12] (CR) Dzahn: [C: 1] icinga.wikimedia.org cert sha1 to sha256 + chained file usage [puppet] - https://gerrit.wikimedia.org/r/215393 (owner: BBlack) [19:16:12] gah... 
[19:16:22] that really needs to be fixed, like, permanently [19:16:28] please do file a bug [19:16:58] (CR) BBlack: [C: 2] icinga.wikimedia.org cert sha1 to sha256 + chained file usage [puppet] - https://gerrit.wikimedia.org/r/215393 (owner: BBlack) [19:17:42] JohnFLewis: all seem to work, i sent an email to the list and didnt get a reply back yet though [19:17:47] no bounce notice though [19:17:48] (PS2) Ori.livneh: Only set $wgActions['credits'] to false if $wgMaxCredits is 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215394 [19:17:49] so thats good =] [19:18:01] robh: I sent one before and never got either [19:18:02] (CR) Ori.livneh: [C: 2] Only set $wgActions['credits'] to false if $wgMaxCredits is 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215394 (owner: Ori.livneh) [19:18:08] (Merged) jenkins-bot: Only set $wgActions['credits'] to false if $wgMaxCredits is 0 [mediawiki-config] - https://gerrit.wikimedia.org/r/215394 (owner: Ori.livneh) [19:18:50] (PS1) Ori.livneh: Revert "wgMaxCredits to 0" [mediawiki-config] - https://gerrit.wikimedia.org/r/215397 [19:18:57] (CR) Ori.livneh: [C: 2 V: 2] Revert "wgMaxCredits to 0" [mediawiki-config] - https://gerrit.wikimedia.org/r/215397 (owner: Ori.livneh) [19:19:49] !log ori Synchronized wmf-config: I35255f357 and I026dfdbf68 (duration: 00m 12s) [19:19:58] Logged the message, Master [19:20:27] looks fine now [19:20:30] i don't know how it ever worked [19:20:51] legoktm, twentyafterfour: could one of you file a bug and try to isolate the guilty commit? 
[19:21:15] ori, twentyafterfour: the bug is https://phabricator.wikimedia.org/T101148 [19:21:33] it also looks to me like the Action::factory call in SkinTemplate.php should handle null return values better [19:22:30] I don't really understand wgMaxCredits so I'm not well equipped to debug it [19:22:42] (03PS3) 10Ori.livneh: Sample profiling data at 1:10,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215383 (https://phabricator.wikimedia.org/T101141) [19:22:55] but I will try to poke it [19:23:00] twentyafterfour: it was the first time i've seen this variable too, i'm just reading the code [19:23:05] (03CR) 10Ori.livneh: [C: 032] Sample profiling data at 1:10,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215383 (https://phabricator.wikimedia.org/T101141) (owner: 10Ori.livneh) [19:23:11] (03Merged) 10jenkins-bot: Sample profiling data at 1:10,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215383 (https://phabricator.wikimedia.org/T101141) (owner: 10Ori.livneh) [19:24:03] (03PS1) 10Dzahn: tendril: use SSLCertificateChainFile [puppet] - 10https://gerrit.wikimedia.org/r/215403 [19:24:09] !log ori Synchronized wmf-config/StartProfiler.php: I7810b72d5: Sample profiling data at 1:10,000 (duration: 00m 12s) [19:24:14] (03PS2) 10Dzahn: tendril: use SSLCertificateChainFile [puppet] - 10https://gerrit.wikimedia.org/r/215403 [19:24:15] Logged the message, Master [19:25:53] (03PS1) 10BBlack: icinga apache config: Add back missing SSLCertificateFile [puppet] - 10https://gerrit.wikimedia.org/r/215404 [19:25:55] (03PS2) 10Ori.livneh: HHVM APC: enable item expiration [puppet] - 10https://gerrit.wikimedia.org/r/215213 [19:27:08] (03CR) 10BBlack: [C: 032 V: 032] icinga apache config: Add back missing SSLCertificateFile [puppet] - 10https://gerrit.wikimedia.org/r/215404 (owner: 10BBlack) [19:29:21] (03PS3) 10Ori.livneh: Apply ::varnish::logging::statsd on all varnishes, not just bits [puppet] - 10https://gerrit.wikimedia.org/r/214651 [19:31:12] 
(03PS2) 10Ori.livneh: HHVM canaries: set light_process_count to 5 [puppet] - 10https://gerrit.wikimedia.org/r/215187 [19:31:18] (03CR) 10Ori.livneh: [C: 032 V: 032] HHVM canaries: set light_process_count to 5 [puppet] - 10https://gerrit.wikimedia.org/r/215187 (owner: 10Ori.livneh) [19:32:55] godog: https://graphite.wikimedia.org/render/?title=udp%20errors/drops,%20all%20interfaces&vtitle=packets/s&from=-4hour&width=1024&height=500&until=now&areaMode=none&hideLegend=&target=alias(servers.graphite1001.udp.InDatagrams,%22rx%20datagrams%22)&target=secondYAxis(group(servers.graphite1001.udp.InErrors,%20servers.graphite1001.udp.NoPorts,%20servers.graphite1001.udp.RcvbufErrors)) starting to register a drop [19:33:24] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM but looping in bblack just in case" [puppet] - 10https://gerrit.wikimedia.org/r/214651 (owner: 10Ori.livneh) [19:34:20] ori: yep looks good, hopefully there'll be less stragglers than last time [19:34:36] (03CR) 10GWicke: Add basic alerts on RESTBase error rates and storage latencies (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: 10GWicke) [19:34:38] (03PS3) 10GWicke: Add basic alerts on RESTBase error rates and storage latencies [puppet] - 10https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) [19:34:47] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.69% of data above the critical threshold [500.0] [19:35:01] godog: the APC change should trigger a service refresh of HHVM, so i think in 10 minutes or so it should apply everywhere [19:35:18] (03PS1) 10Aaron Schulz: Add job ack rates to gdash [puppet] - 10https://gerrit.wikimedia.org/r/215406 [19:36:31] (03CR) 10Ori.livneh: [C: 031] Add job ack rates to gdash [puppet] - 10https://gerrit.wikimedia.org/r/215406 (owner: 10Aaron Schulz) [19:37:32] (03PS1) 10Andrew Bogott: Allow the labs recursor to respond to all wmf networks. 
[puppet] - 10https://gerrit.wikimedia.org/r/215407 [19:38:40] (03CR) 10Andrew Bogott: [C: 032] Allow the labs recursor to respond to all wmf networks. [puppet] - 10https://gerrit.wikimedia.org/r/215407 (owner: 10Andrew Bogott) [19:39:57] RECOVERY - Recursive DNS on 208.80.154.20 is OK: DNS OK: 0.036 seconds response time. www.wikipedia.org returns 208.80.154.224 [19:48:40] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Add job ack rates to gdash [puppet] - 10https://gerrit.wikimedia.org/r/215406 (owner: 10Aaron Schulz) [19:48:59] (03PS1) 10Cmjohnson: Adding dhcp entries for restbase1007-9 [puppet] - 10https://gerrit.wikimedia.org/r/215482 [19:50:17] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [19:52:18] (03CR) 10Cmjohnson: [C: 032] Adding dhcp entries for restbase1007-9 [puppet] - 10https://gerrit.wikimedia.org/r/215482 (owner: 10Cmjohnson) [19:53:35] (03PS3) 10Dzahn: replace tendril.wikimedia.org's sha1 cert with sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214692 (https://phabricator.wikimedia.org/T100835) (owner: 10RobH) [19:54:29] (03CR) 10Dzahn: "amended to only update the cert itself" [puppet] - 10https://gerrit.wikimedia.org/r/214692 (https://phabricator.wikimedia.org/T100835) (owner: 10RobH) [19:55:22] ori: the effect would be that bump in 5xx too I suppose? [19:56:59] godog: I don't think so. I was just investigating that. I suspect it relates to the MediaWiki deployment train. [19:57:26] Oh. It appears to have subsided. In that case, yeah, that's probably the rolling restart of HHVMs. 
[19:57:34] the spike is about 20 minutes in duration [19:57:39] which coincides with the period of puppet runs [19:58:10] and it peaks at 500/s, which is plausible given that each app server handles ~90-100 reqs/s (very vague memory) [19:59:27] (03CR) 10Dzahn: [C: 032] replace tendril.wikimedia.org's sha1 cert with sha256 [puppet] - 10https://gerrit.wikimedia.org/r/214692 (https://phabricator.wikimedia.org/T100835) (owner: 10RobH) [20:00:59] ori: yeah I looked a bit too but couldn't find any obviously smoking gun [20:01:30] godog: it coincides well with fluorine:/a/mw-log/kernel.log showing HHVM restarts [20:02:12] (03CR) 10Dzahn: [C: 032] "chained files has been regenerated now" [puppet] - 10https://gerrit.wikimedia.org/r/215403 (owner: 10Dzahn) [20:02:19] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [20:05:13] ori, looks like gdash doesn't know what teal is :/ [20:05:27] i don't think *i* know what teal is [20:05:37] stick to roygbiv, you commie [20:06:07] I guess I can use yellow, assuming it's visible... [20:06:39] 6operations, 7HTTPS, 5Patch-For-Review: replace tendril.wikimedia.org's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100835#1331447 (10Dzahn) a:3Dzahn [20:06:46] (03CR) 10Dzahn: "this was for T100835" [puppet] - 10https://gerrit.wikimedia.org/r/215403 (owner: 10Dzahn) [20:06:57] (03CR) 10Ori.livneh: [C: 04-1] " ori: I'd avoid long commands in cron especially with date since % is cron's comment character" [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [20:07:56] or maybe #04B4AE [20:09:36] godog: sooooo... varnish stats? 
[20:11:02] 6operations, 7HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1331462 (10Dzahn) [20:11:04] 6operations, 7HTTPS, 5Patch-For-Review: replace tendril.wikimedia.org's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100835#1331460 (10Dzahn) 5Open>3Resolved also was: https://gerrit.wikimedia.org/r/#/c/215403/ replaced now: Signature algorithm SHA256withRSA https://www.ssllabs.com/sslte... [20:11:15] ori: yeah with more xhprof sampling things look more reasonable [20:11:41] (03CR) 10Filippo Giunchedi: [C: 032] Apply ::varnish::logging::statsd on all varnishes, not just bits [puppet] - 10https://gerrit.wikimedia.org/r/214651 (owner: 10Ori.livneh) [20:11:59] \o/ [20:12:00] thanks :) [20:12:43] np, now I'm curious to see what happens! [20:14:02] (03PS1) 10Aaron Schulz: Fix 0a5165b by picking a color alias graphite accepts [puppet] - 10https://gerrit.wikimedia.org/r/215504 [20:14:06] ori, ^ [20:14:44] 6operations, 7HTTPS, 5Patch-For-Review: replace icinga's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100830#1331483 (10Dzahn) fixed by @bblack https://gerrit.wikimedia.org/r/#/c/215404/ https://gerrit.wikimedia.org/r/#/c/215393/ https://www.ssllabs.com/ssltest/analyze.html?d=icinga.wikimedia... 
[20:15:02] 6operations, 7HTTPS, 5Patch-For-Review: replace icinga's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100830#1331485 (10Dzahn) a:3BBlack [20:15:12] 6operations, 7HTTPS: Replace SHA1 certificates with SHA256 - https://phabricator.wikimedia.org/T73156#1331487 (10Dzahn) [20:15:13] 6operations, 7HTTPS, 5Patch-For-Review: replace icinga's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100830#1331486 (10Dzahn) 5Open>3Resolved [20:15:24] (03CR) 10Ori.livneh: [C: 032] Fix 0a5165b by picking a color alias graphite accepts [puppet] - 10https://gerrit.wikimedia.org/r/215504 (owner: 10Aaron Schulz) [20:15:44] 6operations, 7HTTPS, 5Patch-For-Review: replace librenms's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100831#1331498 (10Dzahn) a:5RobH>3Dzahn [20:17:37] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1331523 (10Qgil) [20:19:15] (03Abandoned) 10Andrew Bogott: Make the DNS server for .wmflabs configurable [puppet] - 10https://gerrit.wikimedia.org/r/211063 (owner: 10Andrew Bogott) [20:20:24] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1331538 (10Qgil) @Dbrant said in T101117: > I'd like to be able to create projects in Phab. Being the acting product owner for the Android app... [20:20:51] matanya: hey, could i ask you to make a small edit to https://meta.wikimedia.org/wiki/Www.wikimedia.org_template ? 
[20:21:52] not just that [20:21:58] all of the projects' landing pages [20:22:44] maybe Nemo_bis can do it [20:23:22] (03PS1) 10Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:23:24] (03CR) 10jenkins-bot: [V: 04-1] Add support for Generic Webservice [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) (owner: 10Yuvipanda) [20:25:09] (03PS2) 10Yuvipanda: Add support for Generic Webservice [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/215505 (https://phabricator.wikimedia.org/T97230) [20:27:17] godog: looking awesome. thanks again [20:28:48] ori: looking good indeed! http://graphite.wikimedia.org/render/?width=878&height=345&_salt=1433276879.495&from=-1hours&target=aliasByNode(sumSeries(varnish.esams.backends.*.5xx.rate)%2C%201)&target=aliasByNode(sumSeries(varnish.ulsfo.backends.*.5xx.rate)%2C%201)&target=aliasByNode(sumSeries(varnish.eqiad.backends.*.5xx.rate)%2C%201) [20:32:21] 6operations, 7HTTPS, 5Patch-For-Review: replace librenms's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100831#1331579 (10Dzahn) a:5Dzahn>3None [20:33:59] 6operations, 7HTTPS, 5Patch-For-Review: replace librenms's sha1 cert with sha256 - https://phabricator.wikimedia.org/T100831#1321685 (10Dzahn) librenms doesn't use an individual .erb template for the Apache config. instead it uses @webserver::apache::site which uses templates/apache/generic_vhost.erb which... [20:37:07] (03CR) 10Dzahn: [C: 032] Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [20:37:27] ori: I'm not seeing anything obviously broken, I'll go to lunch in ~5 [20:37:56] cool [20:37:58] thanks again [20:38:40] np! thanks for taking care of that, reqstats almost out of the door [20:38:49] what's left? [20:38:55] oh ottomata's stuff right? 
[20:39:30] yup, also the breakdown per-wiki but we should approach that better I think [20:39:47] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 23.08% of data above the critical threshold [500.0] [20:39:55] uhm [20:40:04] https://gdash.wikimedia.org/dashboards/reqerror/ ? [20:40:22] interesting [20:40:52] Error: 2013 Lost connection to MySQL server during query (10.64.48.20) [20:40:54] bunch of those [20:41:02] jynus: ^ [20:41:47] (03CR) 10Dzahn: [C: 031] Enable MediaWiki logo on mobile login page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/215361 (https://phabricator.wikimedia.org/T100633) (owner: 10Jdlrobson) [20:43:33] (03CR) 10Dzahn: [C: 04-1] "don't do this yet, should be 2 separate changes and check Apache config to use SSLCertificateChainFile directive" [puppet] - 10https://gerrit.wikimedia.org/r/214666 (https://phabricator.wikimedia.org/T92709) (owner: 10RobH) [20:43:56] (03CR) 10Paladox: "Thanks I am going to see If this works." [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [20:44:22] godog: ori, sorry been busy with some spark troubleshooting for analytics folks [20:44:30] will get back to reqstats sometime... [20:45:08] (03PS1) 10Dzahn: ganglia: use SSLCertificateChainFile [puppet] - 10https://gerrit.wikimedia.org/r/215508 (https://phabricator.wikimedia.org/T100825) [20:45:51] (03CR) 10Dzahn: "also see https://gerrit.wikimedia.org/r/#/c/215508/" [puppet] - 10https://gerrit.wikimedia.org/r/214670 (https://phabricator.wikimedia.org/T100825) (owner: 10RobH) [20:48:34] bblack: seems to have been a transient db issue [20:49:19] ori: yes? [20:49:43] or paravoid [20:49:46] mutante: that's incorrect [20:49:53] what do you want me to edit ? 
[20:50:18] matanya: the portals need to have references to bits.wikimedia.org replaced [20:50:25] but kaldari is on it [20:50:27] so it's ok :) [20:50:47] sure [20:51:01] btw, ori, it is a wiki, you can edit yourself :D [20:51:13] matanya: i don't have staff rights or admin [20:51:33] (03CR) 10Paladox: "+2 for verified please/" [puppet] - 10https://gerrit.wikimedia.org/r/215247 (owner: 10Paladox) [20:51:38] aha, good point [20:51:46] i don't like to have to remember to switch hats between a regular account and a '(WMF)' account so i just forfeit having the staff bit [20:52:08] what happened to bits, ori ? [20:53:04] it got dissolved. with SPDY / HTTP 2 and HTTPS everywhere on the horizon, the cost of establishing a connection to a separate domain goes up relative to just requesting the resource from the same domain that is serving the page [20:53:42] so before: //bits.wikimedia.org/meta.wikimedia.org/load.php?... , after: //meta.wikimedia.org/w/load.php?... [20:54:05] I see, good to see this going forward [20:54:09] ori: I updated the wikipedia portal, but it's cached, so no idea when it will update [20:54:19] bblack: can you purge www.wikimedia.org? [20:54:25] brb [20:54:40] doesn't actio=purge do that ? [20:54:45] action= [20:54:57] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [20:55:39] matanya: doesn't seem to [21:00:05] rmoen, kaldari: Dear anthropoid, the time has come. Please deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T2100). [21:00:29] bblack: let me know if you can purge. otherwise I'm stuck here forever and can't get lunch [21:06:12] oh, hmmm [21:06:27] purge exactly which/what? 
[21:06:41] kaldari: ^ [21:07:56] bblack: http://www.wikipedia.org/ [21:08:44] oh, I figured you guys were trying to purge the template page itself [21:08:47] ok [21:10:19] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1331775 (10csteipp) 3NEW [21:10:43] bblack: the cache on that page is supposed to expire within an hour on it's own, but I have to be around when it expires in case I broke anything, so would be nice to just purge it now. [21:13:03] hmmm I thought I just did purge it, but I still see bits [21:13:57] yeah I did purge it, the age is fresh [21:14:06] but whatever it fetches from indirectly must still have old content [21:14:31] with X-Wikimedia-Debug it picks up kaldari's change [21:14:41] yeah but... [21:14:59] there's no other potential caching involved somewhere in rendering that template thing? [21:15:16] the Age: and hit counters reset after my purge [21:15:16] ori, there are spikes of errors when mass edits happen [21:15:30] you've been here long enough to know the answer to that. TIAAC. (there is always another cache) [21:16:23] kaldari: your edit is good [21:16:32] csteipp, I think those tasks are supposed to get security=access request which sets up a private blocker for ops review? [21:16:41] I am more worried about access denieds to parser cache [21:16:41] just did it all over again, purging through the cache stages, just in case [21:16:44] < Age: 12 [21:16:46] < Connection: keep-alive [21:16:48] but sitll has old content [21:16:50] < X-Cache: cp1055 hit (3), cp1055 frontend hit (278) [21:17:07] if you append a cache-busting query string, you get the update [21:17:17] whose cache are we busting, though? 
[21:17:31] varnish [21:17:36] you're not purging it, somehow [21:18:00] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1331813 (10chasemp) [21:18:23] it might not work for existing tickets, no idea [21:18:26] no [21:18:32] even when I cache bust the URL, I still see: [21:18:32] ori: did it work? [21:50:13] kaldari: ori: May wanna update the logos on wikimedia.org as well to use /static [21:50:43] e.g. https://www.wikimedia.org/static/images/project-logos/enwiki.png [21:51:04] via / not the full url. [21:55:01] ori: does this look correct: https://meta.wikimedia.org/w/index.php?title=Www.wikipedia.org_template%2Ftemp&type=revision&diff=12369460&oldid=12051390 [21:56:12] Krinkle: hmm, that's probably a good idea [21:57:12] No need to use meta.wikimedia.org though [21:57:26] (except for load.php) [21:57:27] legoktm: http://i.imgur.com/V6dQpFK.png [21:57:53] HAHAHA that's my user-agent all right [21:57:59] woot :D [21:58:17] ori: so, how do we turn this into a pretty graph? :) [21:58:27] kaldari: yes [21:59:46] so, now purge the P one? [21:59:58] i don't think kaldari made the edit yet [22:00:08] Difference between revisions of "Www.wikipedia.org template/temp" [22:00:10] note /tmp [22:00:12] */temp [22:00:16] ah [22:00:19] not yet [22:01:19] bblack: check out the beautiful varnish/ hierarchy in graphite.wikimedia.org [22:01:32] we should have per-backend 5xx stats / alerts [22:01:49] ori, Krinkle: would it make more sense to just use href="/static/..." for all the static assets instead of href="//meta.wikimedia.org/static/"? [22:02:27] kaldari: actually, yes. that's a good point. [22:02:50] ori: oh, you mean per-backend, as in "responses from all varnishes to backend X"? 
[22:03:16] no, as in which backend is actually issuing the 5xxs [22:03:22] backend-backend, not varnish backend [22:03:48] well I meant that too, but there's no non-confusing way to say it, I think [22:03:51] heh [22:04:33] this is stats on 5xx aggregate on the host varnish receives the 5xx from, as opposed to the varnish host that received it, right? [22:04:51] ori: OK, if this looks good, I'll update the real one: https://meta.wikimedia.org/w/index.php?title=Www.wikipedia.org_template%2Ftemp&type=revision&diff=12369478&oldid=12051390 [22:05:15] kaldari: Yeah, load from current domain via / [22:05:23] the www domains have /static [22:05:30] bblack: yes, *but*, the data center the varnish host is in is factored into the key [22:05:40] right [22:05:43] makes sense, awesome [22:06:22] that will be so useful for narrowing down reqerror [22:06:54] i hope so. only a couple of varnishes were emitting these metrics until a couple of hours ago, so the past 24 hours will look spikey, but they'll smooth out now [22:07:33] (03PS12) 10Paladox: Add link in gitblit for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/215247 [22:08:37] kaldari: lgtm [22:10:59] 6operations, 7database: Permission problem on parsercache db servers - https://phabricator.wikimedia.org/T101182#1332012 (10jcrespo) 3NEW a:3Springle [22:11:43] bblack, ori, OK, you can purge www.wikipedia.org now [22:12:39] actually looks like it's purged now :) [22:12:46] yeah [22:12:50] done! :) [22:13:00] not quite :P [22:13:05] eh? [22:13:25] www.wikinews.org, wiktionary.org, wikiquote.org, wikiversity.org, wikibooks.org, wikivoyage.org [22:13:38] but you don't have to do all those, kaldari [22:13:39] I know, I know :) [22:13:52] well if you're going that far, I think there something like 10 of those to go [22:14:03] I just wanted to get Wikipedia over with [22:14:04] bbl! 
[22:14:07] bblack: i think that's the set [22:14:09] ciao [22:17:36] (03PS2) 10Ori.livneh: HHVM: set light_process_count to 5, light_process_file_prefix to /tmp/hhvm. [puppet] - 10https://gerrit.wikimedia.org/r/215188 [22:22:10] 6operations, 7Graphite, 5Patch-For-Review: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141#1332057 (10fgiunchedi) increasing xhprof sampling helped by reducing the incoming packets rates, though it is still there (the new increase is due to varnishstatsd https://gerr... [22:24:10] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332060 (10csteipp) >>! In T101170#1331900, @greg wrote: >>>! In T101170#1331852, @RobH wrote: >> We'll need @dpatrck's manager to approve this access on task, that would be @greg correct? > >... [22:27:47] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332088 (10dpatrick) Wikitech username: dpatrick Preferred shell username: dpatrick Public RSA key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3dtG8+VT7ldgcssl6OFgWbZThbShLay2LufVhVCmXEyOzoBGV3miwhy... [22:30:06] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332113 (10RobH) Just let us know once you have reviewed and signed https://phabricator.wikimedia.org/L3 Thanks! [22:37:09] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332160 (10dpatrick) @RobH I read and signed. Thanks. [22:41:59] robh: does dpatrick have a uid ? 
[22:42:09] im getting him setup now [22:42:31] he says has wikitech, so has to have labs uid [22:44:32] (03PS5) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 [22:44:36] 10Ops-Access-Requests, 6operations: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332192 (10RobH) a:5dpatrick>3RobH [22:45:15] (03CR) 10jenkins-bot: [V: 04-1] Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [22:45:19] robh: i already did the puppet changes, apart from uid [22:45:24] oh, same [22:45:31] is yours already committed? [22:45:33] so i'll withdraw [22:45:35] "uidNumber: 12203" [22:45:42] (03PS6) 10Ori.livneh: Log a 20s sample of memcached usage to a file once a day [puppet] - 10https://gerrit.wikimedia.org/r/214762 [22:45:51] yea i have it done already just adidng in groups but i dont care, if you wanna do it you can [22:45:52] sorry for stepping on your toes [22:46:15] no, no go ahead [22:48:18] (03CR) 10Ori.livneh: [C: 032] HHVM: set light_process_count to 5, light_process_file_prefix to /tmp/hhvm. 
[puppet] - 10https://gerrit.wikimedia.org/r/215188 (owner: 10Ori.livneh) [22:49:57] (03PS1) 10RobH: setting up user dpatrick with deploy access [puppet] - 10https://gerrit.wikimedia.org/r/215537 (https://phabricator.wikimedia.org/T101170) [22:50:29] yay i finally unfubar'd my repo without losing any of my work from earlier [22:50:41] (which is now invalid and outdated since brandon did it for me, but i am happy i figured it out) [22:50:55] =P [22:51:12] totally yay :) [22:51:39] it took an embarassing amount of tinkering on my end, git aint easy =P [22:52:30] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332260 (10RobH) a:5RobH>3None Placing this up for grabs for merge by any opsen on Friday (pending no objections.) [22:53:14] robh, git stash is your friend on these cases [22:53:26] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Deployment access for Darian Patrick - https://phabricator.wikimedia.org/T101170#1332265 (10dpatrick) There's a typo there. In the "Name:" field, it lists "dpatrck" instead of "dpatrick". 
[22:54:54] (03CR) 10Alex Monk: [C: 04-1] setting up user dpatrick with deploy access (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/215537 (https://phabricator.wikimedia.org/T101170) (owner: 10RobH) [22:55:20] ha, typo =P [22:55:49] (03PS2) 10RobH: setting up user dpatrick with deploy access [puppet] - 10https://gerrit.wikimedia.org/r/215537 (https://phabricator.wikimedia.org/T101170) [22:55:50] you should change his username to dptrck :P [22:56:17] then if he asks, you just explain that when creating his account you were out of vowels [22:58:14] i did poorly in the first round and was unable to buy vowels [22:58:23] (wheel of fortune reference, wooo) [23:00:04] RoanKattouw, ^d, Krinkle, Kaldari, matt_flaschen, AaronSchulz: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150602T2300). Please do the needful. [23:00:11] we had to stick to consonants because they're cheaper [23:00:16] pong [23:05:04] I'll do SWAT. [23:05:28] Krinkle's here, RL change is first. [23:05:39] OK [23:08:54] 6operations: Fix all .erb variable warnings - https://phabricator.wikimedia.org/T97251#1332310 (10Matanya) Can you please have a paste with all warnings ? [23:14:08] 6operations: Fix all .erb variable warnings - https://phabricator.wikimedia.org/T97251#1332327 (10ori) >>! In T97251#1332310, @Matanya wrote: > Can you please have a paste with all warnings ? {P714} [23:14:57] 6operations, 6Release-Engineering: Try out hack (>! In T91590#1329608, @demon wrote: > (Should we be linting with hhvm anyway? Hmm....) Once tin is upgraded to trusty `php` will point to `hhvm` and we'd be linting with... 
[23:15:31] (03PS4) 10Filippo Giunchedi: Add basic alerts on RESTBase error rates and storage latencies [puppet] - 10https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: 10GWicke) [23:15:37] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Add basic alerts on RESTBase error rates and storage latencies [puppet] - 10https://gerrit.wikimedia.org/r/215004 (https://phabricator.wikimedia.org/T78514) (owner: 10GWicke) [23:16:13] thanks ori [23:16:39] godog: will these metrics (esp. 5xx) be emitted if restbase is crashing hard? [23:16:52] i wonder if the new varnishstatsd stuff would be better for this [23:16:59] Krinkle, one of the tests failed. Although it is a RL test, I don't think it's related. Seems it only works if the second happens to not tick during the test method: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/5863/console [23:17:04] Krinkle, how do you want to proceed? [23:17:10] It it not related. [23:17:20] It is a race condition that hasn't failed since 2013 but it's intesteresting. [23:17:34] Due to load, Jenkins was slower so the test took longer than 1 second and the tiemstamp has a mismatch [23:17:35] intesteresting is an awesome word [23:17:45] :D [23:17:52] I rechecked it, Should be fine. [23:17:55] It could still fail even if it took .05 s, just depends when the second ticks. [23:18:11] I'll force it. [23:18:16] Yeah, I'm actually working these few weeks on refactoring that whole part of RL to new use timestamps [23:18:29] This is one of many reasons I didn't remember that are also good reasons to get rid of it [23:18:45] We also really need fake time for core. Something like SinonJS timers but for PHP. 
[23:18:53] Yup [23:21:17] (03CR) 10Filippo Giunchedi: [C: 04-1] Log a 20s sample of memcached usage to a file once a day (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [23:23:28] ori: yeah they won't be emitted, the varnishstatsd metrics will help but I'm imagining sth like 'too many 5xx' and then we can drilldown like the top5 backends by 5xx/s and not individual backend alarms [23:25:49] is there still time to add something to the ongoing SWAT? tiny extension bump [23:26:46] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors [23:27:19] oops, that'd be me, checking [23:28:10] gilles, ping the deployer, matt_flaschen in this case [23:28:26] gilles, yeah [23:28:46] thanks! adding it to the page now [23:31:28] !log mattflaschen Synchronized php-1.26wmf7/includes/OutputPage.php: Don't cache minification of user.tokens (duration: 00m 13s) [23:31:34] Logged the message, Master [23:31:43] !log mattflaschen Synchronized php-1.26wmf7/includes/resourceloader/ResourceLoader.php: Don't cache minification of user.tokens (duration: 00m 14s) [23:31:48] Logged the message, Master [23:31:56] !log mattflaschen Synchronized php-1.26wmf7/includes/resourceloader/ResourceLoaderStartUpModule.php: Don't cache minification of user.tokens (duration: 00m 13s) [23:32:02] Logged the message, Master [23:32:39] ori, https://gerrit.wikimedia.org/r/#/c/215263/3 [23:33:05] !log mattflaschen Synchronized php-1.26wmf8/includes/OutputPage.php: Don't cache minification of user.tokens (duration: 00m 14s) [23:33:11] Logged the message, Master [23:33:18] !log mattflaschen Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoader.php: Don't cache minification of user.tokens (duration: 00m 13s) [23:33:24] Logged the message, Master [23:33:34] !log mattflaschen Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoaderStartUpModule.php: Don't cache minification of user.tokens (duration: 00m 
15s) [23:33:40] Logged the message, Master [23:34:31] Krinkle, you're done. [23:34:41] 10Ops-Access-Requests, 6operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1332375 (10dr0ptp4kt) @ArielGlenn - sitemap.wikimedia.org/* entries can be safely removed, making it possible to make room for those top 15 https://.wikipedia.org entries plus both http://www.... [23:34:48] matt_flaschen: confirmed on en.wikipedia.org and mediawiki.org. Thanks [23:35:06] kaldari, present? [23:35:14] present [23:36:47] kaldari, was this already deployed? It was merged this morning. [23:37:16] (03CR) 10Gage: Log a 20s sample of memcached usage to a file once a day (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/214762 (owner: 10Ori.livneh) [23:37:57] matt_flaschen: Sam accidently merged it, but he said it wasn't deployed [23:38:09] Okay [23:39:16] jgage: hah, good catch re: dates! [23:39:17] 6operations: graphite2001 bios config issue - https://phabricator.wikimedia.org/T100959#1332388 (10Gage) a) no grub prompt b) yes, I see kernel output c) yes, I see getty on the serial port. I'm a bit surprised that I don't find any matches for 'com1' under /etc/grub.d/ or /boot/ . I saw this message both befor... [23:43:51] !log mattflaschen Synchronized wmf-config/InitialiseSettings.php: Disable WikiGrok (duration: 00m 13s) [23:43:57] Logged the message, Master [23:44:01] kaldari, done. Please test. [23:47:16] 6operations, 10Traffic, 7Mobile, 7Varnish: Static image files from en.m.wikipedia.org are served with cache-suppressing headers - https://phabricator.wikimedia.org/T86993#1332403 (10Krinkle) Note that while Cache-Control may wrongly include private and max-age=0, it does have an Expires header still. Which... [23:48:51] matt_flaschen: looks good [23:49:03] Thanks [23:49:10] Flow's next. 
[23:56:03] !log mattflaschen Synchronized php-1.26wmf8/extensions/Flow/: Sync Flow 1.26wmf8 for import fix (duration: 00m 15s) [23:56:09] Logged the message, Master [23:56:51] I can't test it without doing an import, but I'll be doing one later tonight. [23:57:16] AaronSchulz, present? [23:57:29] yep [23:58:19] (03PS1) 10Filippo Giunchedi: Revert "Add basic alerts on RESTBase error rates and storage latencies" [puppet] - 10https://gerrit.wikimedia.org/r/215557 [23:58:39] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Revert "Add basic alerts on RESTBase error rates and storage latencies" [puppet] - 10https://gerrit.wikimedia.org/r/215557 (owner: 10Filippo Giunchedi) [23:59:56] matt_flaschen, hoping I'll get ori for non-mw one