[00:00:05] RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151210T0000). Please do the needful.
[00:00:08] Krinkle legoktm RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:10] o/
[00:00:11] o/
[00:00:39] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[00:01:11] (CR) Andrew Bogott: [C: 2] Add direct hostname lookup to labs hiera [puppet] - https://gerrit.wikimedia.org/r/258071 (owner: Andrew Bogott)
[00:01:25] jdlrobson: ^ you forgot to sync
[00:01:46] It's a labs change
[00:01:49] And I'm about to SWAT
[00:01:55] So we'll be fine
[00:01:57] still, needs to be synced
[00:02:01] but ok if you do it
[00:02:41] !log catrope@tin Synchronized wmf-config/InitialiseSettings-labs.php: Shut up unmerged change warning (duration: 00m 30s)
[00:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:03:18] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[00:04:43] (CR) Catrope: [C: 2] Set $wgExtDistGraphiteRenderApi [mediawiki-config] - https://gerrit.wikimedia.org/r/257077 (https://phabricator.wikimedia.org/T120339) (owner: Legoktm)
[00:05:35] (Merged) jenkins-bot: Set $wgExtDistGraphiteRenderApi [mediawiki-config] - https://gerrit.wikimedia.org/r/257077 (https://phabricator.wikimedia.org/T120339) (owner: Legoktm)
[00:06:48] !log catrope@tin Synchronized wmf-config/CommonSettings.php: Set $wgExtDistGraphiteRenderApi (duration: 00m 28s)
[00:06:52] * RoanKattouw glares at Krinkle for putting up an unreviewed change for SWAT
[00:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:07:04] RoanKattouw: Which one is that
[00:07:04] Saved by the bell, though, Aaron just +2ed it
[00:07:08] * legoktm tests
[00:07:09] Yeah :)
[00:07:12] https://www.mediawiki.org/wiki/Special:ExtensionDistributor \o/
[00:07:34] thanks RoanKattouw
[00:07:41] legoktm: nice, very nice :)
[00:07:45] I also like the ajax selector
[00:07:50] Didn't know that landed
[00:07:57] that happened in Lyon
[00:08:06] legoktm, VE is at the top? nice
[00:08:07] James_F, ^
[00:08:43] https://grafana.wikimedia.org/dashboard/db/extension-distributor-downloads is the full dashboard
[00:09:27] Krinkle: https://gerrit.wikimedia.org/r/#/c/258023/
[00:10:47] Is there space in the SWAT window, RoanKattouw, for https://gerrit.wikimedia.org/r/258053 ?
[00:11:11] (CR) Catrope: [C: 2] Enable the A/B test to measure impact of collapsing content [mediawiki-config] - https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) (owner: Jdlrobson)
[00:11:16] jdlrobson: Yes, please add to the wiki page
[00:11:18] thanks :D
[00:11:21] RoanKattouw: doing now!
[00:16:04] RoanKattouw: done
[00:16:13] WTF why is Jenkins not picking up my +2
[00:16:23] (CR) Catrope: [C: 2] Enable the A/B test to measure impact of collapsing content [mediawiki-config] - https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) (owner: Jdlrobson)
[00:16:40] (Merged) jenkins-bot: Enable the A/B test to measure impact of collapsing content [mediawiki-config] - https://gerrit.wikimedia.org/r/258053 (https://phabricator.wikimedia.org/T120292) (owner: Jdlrobson)
[00:16:55] OK I guess it did, it was just being confusing
[00:18:07] !log catrope@tin Synchronized wmf-config/mobile.php: Enable A/B test to measure impact of collapsing content (duration: 00m 29s)
[00:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:18:38] PROBLEM - Unmerged changes on repository puppet on labcontrol1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[00:19:40] operations, Traffic, HTTPS, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1451751 (Merl)
[00:20:43] ^ the labcontrol1001 is ok, it's just us
[00:22:33] thanks for the heads-up
[00:25:11] Krenair: Not so nice for me as I get so many support questions. :-)
[00:25:36] (CR) Dzahn: [C: 2] introduce technetium.eqiad.wmnet [dns] - https://gerrit.wikimedia.org/r/257975 (https://phabricator.wikimedia.org/T118763) (owner: Dzahn)
[00:25:59] operations, Discovery, Maps, Traffic, Discovery-Maps-Sprint: Load testing Maps sent all traffic to only one server - https://phabricator.wikimedia.org/T117937#1867945 (Yurik) Open>Invalid Thanks @bblack, closing for now.
[00:26:28] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/Thanks/: Fix bug with topic titles in Flow thank notifications (duration: 00m 28s)
[00:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:32:22] hoo, RoanKattouw: skipping the sync is my fault. I've been in the lazy habit of merging labs only config changes, fetching on tin and not syncing to the cluster
[00:32:28] RECOVERY - Unmerged changes on repository puppet on labcontrol1001 is OK: No changes to merge.
[00:36:41] * Krinkle is awaiting the msgblobstore change
[00:39:35] Yeah I deliberately did everything else first
[00:39:38] And then got distracted, sorry
[00:40:29] Oh, right.
[00:40:36] I would've sworn I saw you +2 them earlier
[00:40:51] No, that was because Aaron +2ed the maintenance script change
[00:41:03] At which point I glared at you for putting up unmerged changes for SWAT
[00:41:26] (of course I should have checked that myself, not relied on Aaron +2ing the change to notice that it wasn't merged)
[00:42:11] I think I'm just gonna sync-dir php-1.27.0-wmf.8 for this one
[00:42:37] Cause I don't think there's a more sensible way to deploy it
[00:42:54] Hmm maybe includes/ + the other stuff separately, but meh, might as well
[00:43:44] yeah, wmf.8, or includes + maintenance
[00:44:24] (PS1) Dzahn: dhcp: add install server config for technetium.eqiad [puppet] - https://gerrit.wikimedia.org/r/258077 (https://phabricator.wikimedia.org/T118763)
[00:45:55] (PS2) Dzahn: dhcp: add install server config for technetium.eqiad [puppet] - https://gerrit.wikimedia.org/r/258077 (https://phabricator.wikimedia.org/T118763)
[00:51:08] !log catrope@tin Synchronized php-1.27.0-wmf.8/skins/Vector/: Make placeholder in logged-out personal bar greyed out (duration: 00m 29s)
[00:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:51:47] (PS3) Dzahn: dhcp: add install server config for technetium.eqiad [puppet] - https://gerrit.wikimedia.org/r/258077 (https://phabricator.wikimedia.org/T118763)
[00:52:09] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: puppet fail
[00:52:13] (CR) Dzahn: [C: 2] dhcp: add install server config for technetium.eqiad [puppet] - https://gerrit.wikimedia.org/r/258077 (https://phabricator.wikimedia.org/T118763) (owner: Dzahn)
[00:52:32] RoanKattouw: confirmed the vector change on mediawiki.org
[00:54:35] !log catrope@tin Synchronized php-1.27.0-wmf.8/: MessageBlobStore changes (duration: 02m 29s)
[00:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:55:22] can i sneak a couple more patches into SWAT? I can deploy them myself if you are done. Just got it merged a few minutes ago
[00:55:28] RoanKattouw: I'm changing the WikimediaMaintenance patch, don't deploy that one yet if you haven't already
[00:55:32] (or whenever you finish)
[00:55:36] Reedy just pointed out a flaw
[00:55:38] I just did
[00:55:44] Fortunately it's a maintenance script
[00:55:45] OK. it's fine
[00:55:52] So you have until whenever it runs
[00:56:01] Soon IIRC
[00:56:26] Yeah does it run at 01:00 UTC? Or 02:00?
[00:57:55] looks like 2 UTC
[00:58:59] RoanKattouw: +2'ed in wmf.8
[00:59:05] of WikimediaMaintenance
[00:59:06] anyway
[00:59:08] checking RL impact now
[01:00:04] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151210T0100). Please do the needful.
[01:01:00] RoanKattouw: Can you sync the WikimediaMaintenance fixup?
[01:01:10] Syncing
[01:01:32] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/WikimediaMaintenance/: Fix refreshMessageBlobs script (duration: 00m 29s)
[01:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:01:42] thx
[01:02:36] (PS1) EBernhardson: A/B test for search lang detect via accept-language [mediawiki-config] - https://gerrit.wikimedia.org/r/258084 (https://phabricator.wikimedia.org/T119527)
[01:03:27] Hm.. it didn't sync to terbium
[01:03:35] Just tested refreshMessageBlobs.php there for testwiki
[01:03:45] it has the previous update but not this one
[01:04:06] still has MessageBlobStore::clear();
[01:04:47] Oh ahm
[01:04:54] git submodule update would help
[01:05:39] oh you mean on tin. Right
[01:05:41] ;-)
[01:05:43] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/WikimediaMaintenance/: Fix refreshMessageBlobs script (duration: 00m 29s)
[01:05:58] OK. works now
[01:09:43] (PS2) EBernhardson: A/B test for search lang detect via accept-language [mediawiki-config] - https://gerrit.wikimedia.org/r/258084 (https://phabricator.wikimedia.org/T119528)
[01:09:45] (PS1) EBernhardson: Turn off A/B test for search lang detect via accept-language" [mediawiki-config] - https://gerrit.wikimedia.org/r/258087 (https://phabricator.wikimedia.org/T119529)
[01:14:53] (PS2) EBernhardson: Turn off A/B test for search lang detect via accept-language [mediawiki-config] - https://gerrit.wikimedia.org/r/258087 (https://phabricator.wikimedia.org/T119529)
[01:18:41] (CR) EBernhardson: [C: 2] Increase Cirrus master timeout to 2 minutes [mediawiki-config] - https://gerrit.wikimedia.org/r/258027 (owner: EBernhardson)
[01:19:29] (Merged) jenkins-bot: Increase Cirrus master timeout to 2 minutes [mediawiki-config] - https://gerrit.wikimedia.org/r/258027 (owner: EBernhardson)
[01:19:37] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[01:20:55] !log ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: Increase elasticsearch master timeout for maint actions to 2 minutes (duration: 00m 31s)
[01:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:23:42] !log ebernhardson@tin Synchronized php-1.27.0-wmf.8/extensions/CirrusSearch/: Add master timeout parameter to mapping updates so it can be increased when the master is slow (duration: 00m 29s)
[01:23:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:25:24] !log ebernhardson@tin Synchronized php-1.27.0-wmf.7/extensions/CirrusSearch/: Add master timeout parameter to mapping updates so it can be increased when the master is slow (duration: 00m 30s)
[01:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:26:59] operations, Discovery, Discovery-Cirrus-Sprint: Make elasticsearch cluster accessible from analytics hadoop workers - https://phabricator.wikimedia.org/T120281#1868214 (EBernhardson)
[01:40:49] (PS1) Madhuvishy: [WIP] apache: Add role to serve static sites on multiple hosts using apache [puppet] - https://gerrit.wikimedia.org/r/258096
[01:41:57] (CR) jenkins-bot: [V: -1] [WIP] apache: Add role to serve static sites on multiple hosts using apache [puppet] - https://gerrit.wikimedia.org/r/258096 (owner: Madhuvishy)
[01:46:31] (CR) Dzahn: "fyi, the "order Deny,Allow" thing changes syntax between Apache 2.2 and 2.4, It won't work on newer instances (jessie) unless a compatibil" [puppet] - https://gerrit.wikimedia.org/r/258096 (owner: Madhuvishy)
[01:53:22] (CR) Madhuvishy: "Aaah, thanks for mentioning that. Is there a way to define it without the Order Deny, Allow that's not 2.2 specific, or should we just mak" [puppet] - https://gerrit.wikimedia.org/r/258096 (owner: Madhuvishy)
[01:57:32] (CR) Dzahn: "you could either put an "if" around it and make it work for both versions:" [puppet] - https://gerrit.wikimedia.org/r/258096 (owner: Madhuvishy)
[02:19:08] (CR) Madhuvishy: "Cool! I like the If idea - will do that! Thanks :)" [puppet] - https://gerrit.wikimedia.org/r/258096 (owner: Madhuvishy)
[02:20:27] (PS2) Madhuvishy: [WIP] apache: Add role to serve static sites on multiple hosts using apache [puppet] - https://gerrit.wikimedia.org/r/258096
[02:27:33] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.7) (duration: 10m 53s)
[02:27:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:44] !log updated restbase1008 to cassandra 2.1.12
[02:28:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:53:11] (PS3) Madhuvishy: [WIP] apache: Add role to serve static sites on multiple hosts using apache [puppet] - https://gerrit.wikimedia.org/r/258096
[03:02:47] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:02:47] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.8) (duration: 15m 54s)
[03:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:02:48] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 953455 bytes in 9.700 second response time
[03:25:47] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[03:27:17] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[03:31:18] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:31:48] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:41:45] (CR) KartikMistry: service-runner migration for cxserver (1 comment) [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: KartikMistry)
[03:45:32] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:47:31] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 953442 bytes in 3.510 second response time
[03:52:41] (PS13) KartikMistry: service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657)
[03:55:07] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 10 03:55:07 UTC 2015 (duration 54m 36s)
[03:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:00:29] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[04:16:37] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [100000000.0]
[04:16:40] PROBLEM - Labs LDAP on seaborgium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:18:18] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[04:22:27] (PS1) Catrope: Enable Flow opt-in beta feature on cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258101 (https://phabricator.wikimedia.org/T120829)
[04:22:27] RECOVERY - Labs LDAP on seaborgium is OK: LDAP OK - 0.011 seconds response time
[04:24:25] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[04:26:51] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:26:51] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 15.38% of data above the critical threshold [100000000.0]
[04:30:11] (PS1) Catrope: Allow all logged-in users to create Flow boards on mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258104 (https://phabricator.wikimedia.org/T120468)
[04:31:11] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 953320 bytes in 4.496 second response time
[04:33:45] operations, MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1868483 (Catrope)
[04:43:30] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:30] test
[04:45:30] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 953387 bytes in 6.992 second response time
[04:47:50] chasemp: fail
[04:51:37] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[05:11:49] PROBLEM - Labs LDAP on serpens is CRITICAL: Could not bind to the LDAP server
[05:17:09] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[05:23:18] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[05:27:18] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: puppet fail
[05:48:07] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[05:54:08] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[05:54:08] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[06:17:58] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[06:23:48] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[06:30:49] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:08] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:29] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: puppet fail
[06:31:37] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:38] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:18] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:47] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:59] PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[06:44:47] RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[06:49:58] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:08] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:56:09] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:56:28] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:59] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:57:00] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:57:00] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:08] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:57:38] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:16:27] (PS16) Giuseppe Lavagetto: etcd: auth puppetization [puppet] - https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972)
[07:17:25] (CR) jenkins-bot: [V: -1] etcd: auth puppetization [puppet] - https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972) (owner: Giuseppe Lavagetto)
[07:17:29] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:20:39] <_joe_> wat? this was just a rebase
[07:20:47] <_joe_> and now jenkins is downvoting?
[07:48:28] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[07:49:18] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: puppet fail
[07:54:17] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[08:14:57] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[08:17:48] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[08:21:59] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0]
[08:23:38] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[08:26:06] operations, RESTBase-Cassandra: Update to Cassandra 2.1.12 - https://phabricator.wikimedia.org/T120803#1868654 (GWicke) 1008 bootstrapped successfully, and is now also upgraded to 2.1.12. General metrics are continuing to look significantly better with 2.1.12, with less than half iowait and a lower numb...
[08:28:06] operations, Discovery, Discovery-Cirrus-Sprint: Make elasticsearch cluster accessible from analytics hadoop workers - https://phabricator.wikimedia.org/T120281#1868662 (EBernhardson) I took a look around, and afaict this is managed directly on the routers and not from within operations/puppet or any oth...
[08:31:58] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[08:36:02] (CR) DCausse: [C: 1] A/B test for search lang detect via accept-language (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/258084 (https://phabricator.wikimedia.org/T119528) (owner: EBernhardson)
[08:41:58] (CR) EBernhardson: A/B test for search lang detect via accept-language (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/258084 (https://phabricator.wikimedia.org/T119528) (owner: EBernhardson)
[08:43:51] (PS14) KartikMistry: service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657)
[08:59:04] (PS2) Jcrespo: Reconfiguration of External Storage servers in codfw [puppet] - https://gerrit.wikimedia.org/r/257954
[09:00:14] (CR) Jcrespo: [C: 2] Reconfiguration of External Storage servers in codfw [puppet] - https://gerrit.wikimedia.org/r/257954 (owner: Jcrespo)
[09:04:57] !log rebooting, restarting and upgrading mysql on es2 (codfw external storage servers)
[09:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:06:27] (PS1) Merlijn van Deen: toollabs: install openjdk-8-headless [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020)
[09:08:02] (PS2) Merlijn van Deen: toollabs: install openjdk-8-headless on trusty [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020)
[09:15:53] (PS17) Giuseppe Lavagetto: etcd: auth puppetization [puppet] - https://gerrit.wikimedia.org/r/255155 (https://phabricator.wikimedia.org/T97972)
[09:17:18] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[09:19:10] (PS1) Jcrespo: Fix typo s/off/ROW/g for binlog_format on ES codfw [puppet] - https://gerrit.wikimedia.org/r/258115
[09:19:58] (CR) Jcrespo: [C: 2] Fix typo s/off/ROW/g for binlog_format on ES codfw [puppet] - https://gerrit.wikimedia.org/r/258115 (owner: Jcrespo)
[09:23:17] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[09:38:16] akosiaris: ping me when around.
[09:39:31] jynus: I see db10{41,56,11,26,16,34,21} run an old version of diamond, any reservations if I upgrade that?
[09:40:23] let me check
[09:42:16] 41:OK, 56:OK, 11:OK, 26: OK, 16: OK, 34: OK, 21: OK
[09:42:50] I have an upgrade of all of them pending, but it will take me some time
[09:45:06] jynus: ack, thanks! no worries on my side we're running the same diamond version on all distros now
[09:45:23] also, please do not upgrade mysql!
[09:46:17] haha no I'll wait sunday or saturday night
[09:46:44] sounds good
[09:47:29] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[09:47:32] can it be christmas' eve's night?
[09:48:17] haha I can do that too, a memorable present
[09:51:03] (CR) Muehlenhoff: [C: -1] "openjdk-8 is not in trusty, where are these packages coming from and how are they being updated for security updates?" [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020) (owner: Merlijn van Deen)
[09:53:29] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[10:02:19] (CR) Filippo Giunchedi: [C: -1] "the puppet compiler for restbase1001 shows a notify on the config, which we don't want" [puppet] - https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) (owner: Mobrovac)
[10:17:48] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[10:23:48] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[10:27:26] (CR) Mobrovac: "Uf, good catch Filippo! Shall fix." [puppet] - https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) (owner: Mobrovac)
[10:40:17] Are there any known problems with email sending?
[10:41:04] I am missing at least one (talk page notification) and Lydia_WMD.E also misses several
[10:44:51] (CR) Merlijn van Deen: "It's in trusty-wikimedia/main (pool/main/o/openjdk-8/openjdk-8-dbg_8u40~b09-1+wm1_amd64.deb). I thought there was no jessie package, but i" [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020) (owner: Merlijn van Deen)
[10:47:19] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000000.0]
[10:47:58] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[10:49:59] (CR) Hashar: "IIRC we had jdk8 added to Trusty for wikidata/gremlin back in January 2015. The project has been shot down. So maybe that is a leftov" [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020) (owner: Merlijn van Deen)
[10:53:49] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[10:53:55] (CR) Muehlenhoff: "Ok, that b40 version seems to have been a one-off backport, but it's more than a year old and has plenty of open vulnerabilities." [puppet] - https://gerrit.wikimedia.org/r/258113 (https://phabricator.wikimedia.org/T121020) (owner: Merlijn van Deen)
[10:55:31] moritzm: ok, so that one should be killed with fire. Would it be possible for ops to also provide a trusty backport when building the jessie package?
[10:56:39] Nemo_bis, can you join me on #wikimedia-databases
[10:56:41] it seems it does need a bit of work, as libnss3-dev in trusty is too old...
[11:01:29] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[11:10:02] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:11:43] <_joe_> uhm any labs people around?
[11:12:13] <_joe_> it's working for me
[11:14:02] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 955242 bytes in 5.480 second response time
[11:18:03] I think there is some flapping on the check, they were commenting on it yesterday
[11:18:34] that is why I am not too worried
[11:18:38] something with DNS, iirc.
[11:18:55] the DNS lookup times out or takes ages, and then once the domain is resolved everything is fine
[11:19:15] also about the home page taking more time than the timeout
[11:20:59] valhallasw`cloud: it's possible, but would need to be discussed, since it's a fair amount of work involved
[11:21:31] (PS15) KartikMistry: service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657)
[11:22:08] mobrovac: Can you review again? ^
[11:22:34] mobrovac: also, can the config.yaml.erb name be changed?
[11:22:53] mobrovac: ie config.prod.yaml.erb
[11:23:21] moritzm: *nod*. I'm trying to build the package now for trusty -- your jessie backport, with older gcc and libnss3. I can't really estimate the effort it takes to keep a backport updated, though.
[11:23:51] oh, there's a simpler way:
[11:24:37] the debian/rules has a target to rebuild for older distros (and trusty is likely in there)
[11:25:00] this results in a rewritten debian/control file, so that the respective GCC etc. are used
[11:25:07] ah, interesting
[11:25:09] https://wikitech.wikimedia.org/wiki/Building_OpenJDK_8_backports
[11:25:35] possibly replacing distrel=jessie with distrel=trusty would be enough backport-wise
[11:26:11] the biggest timedrain for these backports is testing
[11:26:55] (and the actual build takes many hours on even a fast host since the test suite doesn't use multiple cores/CPUs)
[11:27:16] yep, that indeed works
[11:27:27] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:28:08] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:31:38] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:32:19] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:38:59] that's going to page isn't it?
[11:39:08] <_joe_> what is?
[11:39:23] no sorry, that was meant for the tools alarms
[11:58:38] PROBLEM - Apache HTTP on mw1119 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:00:17] PROBLEM - HHVM rendering on mw1119 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:00:18] PROBLEM - RAID on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:00:49] PROBLEM - Check size of conntrack table on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:02:08] PROBLEM - SSH on mw1119 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:02:08] PROBLEM - puppet last run on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:06:49] PROBLEM - salt-minion processes on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:06:58] PROBLEM - dhclient process on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:06:58] PROBLEM - configured eth on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:06:59] PROBLEM - Disk space on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:07:18] PROBLEM - nutcracker process on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:07:28] PROBLEM - HHVM processes on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:08:07] PROBLEM - nutcracker port on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:08:08] PROBLEM - DPKG on mw1119 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:11:23] !log codfw es2 server restart finished [12:11:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:11:36] moritzm: and thanks for the clarification on where the effort lies. [12:13:05] !log powercycle mw1119, login on console sluggish and no ssh [12:13:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:13:15] valhallasw`cloud: sure, let me know how the build works out [12:15:17] RECOVERY - nutcracker process on mw1119 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [12:15:18] RECOVERY - HHVM processes on mw1119 is OK: PROCS OK: 12 processes with command name hhvm [12:15:58] RECOVERY - nutcracker port on mw1119 is OK: TCP OK - 0.000 second response time on port 11212 [12:15:58] RECOVERY - SSH on mw1119 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [12:15:59] RECOVERY - DPKG on mw1119 is OK: All packages OK [12:15:59] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 47 minutes ago with 0 failures [12:16:18] RECOVERY - HHVM rendering on mw1119 is OK: HTTP OK: HTTP/1.1 200 OK - 65186 bytes in 2.355 second response time [12:16:19] RECOVERY - RAID on mw1119 is OK: OK: no RAID installed [12:16:48] RECOVERY - salt-minion processes on mw1119 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:16:48] RECOVERY - Check size of conntrack table on mw1119 is OK: OK: 
nf_conntrack is 1 % full [12:16:57] RECOVERY - dhclient process on mw1119 is OK: PROCS OK: 0 processes with command name dhclient [12:16:58] RECOVERY - Apache HTTP on mw1119 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 7.216 second response time [12:16:58] RECOVERY - configured eth on mw1119 is OK: OK - interfaces up [12:16:58] RECOVERY - Disk space on mw1119 is OK: DISK OK [12:32:53] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:33:23] <_joe_> seems just extremely slow tbh [12:33:27] <_joe_> wfm [12:33:33] what is? [12:33:43] <_joe_> https://tools.wmflabs.org/ [12:34:42] that is damn slow [12:34:55] <_joe_> took me 4s to get the response [12:35:04] more like 20-30s here [12:35:12] Coren/mutante were saying yesterday that it takes 8 seconds to load and that's the normal behavior [12:35:16] alert fires up at 10s [12:35:46] going from 8s -> 10s is probably within normal bounds, the problem is the 8 seconds in the first place [12:36:18] which instance hosts that? [12:36:51] don't know [12:37:02] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 955840 bytes in 7.271 second response time [12:37:04] the page is 933K [12:38:15] http://tools.wmflabs.org/?Help is fun [12:38:28] it takes 5-6 seconds... to emit a 302 redirect [12:39:03] which doesn't include any LDAP data, so unless it's so badly coded that it does LDAP queries nonetheless... [12:39:12] it may not be ldap related at all [12:39:16] paravoid: the proxy needs ldap, though. [12:39:22] what for? 
[12:39:38] hm, wait, no, that should be just redis [12:39:46] I was thinking 'to route requests', but that's not true [12:41:05] nginx using like 25% cpu on tools-proxy-01 [12:41:31] Dec 10 12:40:27 tools-proxy-01 nslcd[10149]: [25cb9e] no available LDAP server found, sleeping 1 seconds [12:41:31] Dec 10 12:40:28 tools-proxy-01 nslcd[10149]: [25cb9e] connected to LDAP server ldap://ldap-labs.eqiad.wikimedia.org:389 [12:41:35] current proxy is tools-proxy-01, current webgrid for /admin is tools-webgrid-lighttpd-1403, which has load average 2-3 as well (4 cpu host) [12:41:50] maybe restart tools-proxy-02, then switch over to there? [12:41:57] Dec 10 12:40:23 tools-proxy-01 nslcd[10149]: [f233cd] ldap_result() failed: Can't contact LDAP server [12:42:01] my login was very slow [12:44:27] in the mean time, on seaborgium: [12:44:28] Dec 10 12:44:09 seaborgium slapd[20480]: connection_read(167): no connection! [12:44:39] quality error message [12:45:06] bah, I think I broke tools-proxy-02 by restarting nslcd :/ [12:45:45] mark: that's a red herring [12:46:02] the "no connection" part [12:46:05] just misbehaving clients [12:46:13] see my mail to ops: [12:46:15] That is caused by incorrect client behaviour: An LDAP client connected to the [12:46:16] server, but didn't unbind before terminating the connection. slapd then tries to [12:46:18] send a response, but can't and logs this message. [12:48:03] root@seaborgium:~# sudo lsof -i -n -P |egrep -c '(389|636)' [12:48:03] 126 [12:48:04] uhm [12:48:08] wasn't this 1000+ yesterday? 
[12:48:22] i'm stracing it [12:49:29] brb [12:50:37] indeed, used to be between 1100 and 1600 yesterday [12:52:37] fwiw, the issue is on tools-webgrid-lighttpd-1403, not on the proxy: time curl http://tools-webgrid-lighttpd-1403:38816/ > /dev/null --> real 0m6.934s [12:52:42] let me just restart the webservice >_< [12:52:45] nooo [12:52:52] we're debugging [12:52:56] don't change things while we're looking into it :) [12:53:07] !log performing switchover of es2 es1011-> es1015 for master maintenance, no production impact is expected [12:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:53:28] RECOVERY - Labs LDAP on serpens is OK: LDAP OK - 0.112 seconds response time [12:53:58] mark: sorry, I should have thought of that myself >_<. Anyway, restart doesn't fix it either, so there's that :-p [12:54:02] paravoid: it's likely the effect of idletimeout [12:54:27] valhallasw`cloud: when working together on resolving a problem, please always coordinate with others before such actions :) [12:54:35] !log re-enabled puppet on serpens / restarted slapd [12:54:37] Got it. [12:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:56:46] uuh, all my connections to labs just died [12:56:53] about a minute ago [12:57:51] (03PS3) 10BBlack: varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) [12:59:29] kart_: will take a look soon [13:00:11] kart_: akosiaris: let's schedule the deployment of service-runner-based cxserver for next week? 
[13:00:21] i'd appreciate if we set a concrete slot for that [13:00:38] (03PS2) 10Muehlenhoff: Further updates to LDAP indices [puppet] - 10https://gerrit.wikimedia.org/r/257871 [13:01:24] (03CR) 10Muehlenhoff: [C: 032 V: 032] Further updates to LDAP indices [puppet] - 10https://gerrit.wikimedia.org/r/257871 (owner: 10Muehlenhoff) [13:02:43] ok, so tools.wmflabs.org *does* hit ldap, through posix_getpwnam, I think. [13:03:39] !log stopping slapd on serpens to update LDAP indices [13:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:05:21] !log restarted slapd on serpens [13:05:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:05:28] i'm not seeing ldap errors on tools-proxy-01 now [13:06:05] mark: there are many on tools-webgrid-lighttpd-1415. [13:06:12] which is where the actual webservice runs [13:08:47] hmm yea [13:08:51] (03PS4) 10BBlack: varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) [13:10:19] (03CR) 10BBlack: [C: 032 V: 032] varnish: sanitize XFF better [puppet] - 10https://gerrit.wikimedia.org/r/257984 (https://phabricator.wikimedia.org/T118769) (owner: 10BBlack) [13:14:37] (03CR) 10Billinghurst: [C: 031] "looks like all the others :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257868 (https://phabricator.wikimedia.org/T120568) (owner: 10Glaisher) [13:16:23] bblack: I'd also unset XRIP [13:16:37] to make things a little less confusing :) [13:16:41] we do, when the client is direct [13:17:00] no, I mean at the end, just remove it entirely [13:17:08] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [13:17:33] I figured we might have use for it somewhere, in VCL or in an app, but donno [13:17:42] you think not? 
[13:18:13] yeah I doubt it [13:18:30] and I can easily see someone misconfiguring a misc-web app or something [13:19:26] well the other thing is: before the recent XFF/XCIP work, we were already sending XRIP for HTTPS traffic (nginx setting it). Given that, and that various random internet links and software are configured to look at either XRIP or XCIP... maybe just alias the two values? [13:19:44] although it's wasteful, and *probably* nobody was using it yet anyways [13:20:29] I wouldn't [13:23:17] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed [13:28:45] (03PS1) 10Jcrespo: Switchover from es1011 to es1015 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258132 [13:29:06] moritzm: so you haven't reverted the config on seaborgium? [13:29:28] I'll do that now [13:30:57] !log reenabled puppet on seaborgium & forced a puppet run [13:30:58] Please tell me if you experience any problems when writing changes on the wikis, switchover is about to be applied [13:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:31:04] (03PS5) 10Mobrovac: RESTBase: Switch to service::node [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) [13:31:19] (03PS2) 10Jcrespo: Switchover from es1011 to es1015 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258132 [13:32:12] (03CR) 10Jcrespo: [C: 032] Switchover from es1011 to es1015 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258132 (owner: 10Jcrespo) [13:33:42] paravoid: ok, hadn't done that yet, wanted to wait a bit until the replication has caught up [13:35:32] (03PS1) 10Jcrespo: Apply master swithcover also on codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258134 [13:35:41] (03PS2) 10Jcrespo: Apply master swithcover also on codfw [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/258134 [13:35:59] (03CR) 10Jcrespo: [C: 032] Apply master swithcover also on codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258134 (owner: 10Jcrespo) [13:36:21] (03Merged) 10jenkins-bot: Apply master swithcover also on codfw [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258134 (owner: 10Jcrespo) [13:36:30] applying it also on codfw (although no errors wouls have been produced) [13:36:38] (03PS1) 10BBlack: VCL: tighten up XFF regex slightly [puppet] - 10https://gerrit.wikimedia.org/r/258135 [13:36:40] (03PS1) 10BBlack: VCL: do not expose X-Real-IP to applayer [puppet] - 10https://gerrit.wikimedia.org/r/258136 [13:36:42] (03PS1) 10BBlack: tlsproxy: also set XCIP to same value as XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258137 [13:36:44] (03PS1) 10BBlack: VCL: switch nginx IP data from XRIP to XCIP [puppet] - 10https://gerrit.wikimedia.org/r/258138 [13:36:46] (03PS1) 10BBlack: tlsproxy: stop sending XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258139 [13:37:19] (03CR) 10BBlack: [C: 032 V: 032] VCL: tighten up XFF regex slightly [puppet] - 10https://gerrit.wikimedia.org/r/258135 (owner: 10BBlack) [13:38:25] (03CR) 10BBlack: [C: 032 V: 032] VCL: do not expose X-Real-IP to applayer [puppet] - 10https://gerrit.wikimedia.org/r/258136 (owner: 10BBlack) [13:38:45] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: also set XCIP to same value as XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258137 (owner: 10BBlack) [13:39:06] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Switchover es2 master es1011 -> es1015 (duration: 00m 37s) [13:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:39:54] !log jynus@tin Synchronized wmf-config/db-codfw.php: Switchover codfw es2 master es1011 -> es1015 (duration: 00m 29s) [13:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:41:31] mobrovac: yes. Talked with akosiaris. 
We're planning for service-runner deployment next week. [13:41:41] I need akosiaris today :) [13:41:49] :) [13:42:05] kart_: kk, but would like to settle on a specific slot [13:42:16] "next week" is too broad [13:42:47] editing works for me, has anyone seen any problems? [13:44:37] (03PS2) 10BBlack: tlsproxy: stop sending XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258139 [13:44:39] (03PS2) 10BBlack: VCL: switch nginx IP data from XRIP to XCIP [puppet] - 10https://gerrit.wikimedia.org/r/258138 [13:45:48] !log stopping/starting slapd on seaborgium to update LDAP indices [13:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:50:57] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active [13:50:58] and this is also our first critical production server with ROW-based replication [13:50:59] a long-overdue change that will bring a lot of advantages [13:52:39] mobrovac: Tuesday. I'll add a window. [13:52:51] (03CR) 10Mobrovac: "https://puppet-compiler.wmflabs.org/1467/ looks much better now :)" [puppet] - 10https://gerrit.wikimedia.org/r/257898 (https://phabricator.wikimedia.org/T118401) (owner: 10Mobrovac) [13:53:43] kart_: around 12 UTC would suit all of us, i think [13:53:49] mobrovac: but I would like to get +1 before. [13:53:57] mobrovac: yes. That's fine. [13:54:06] kart_: ofc, we need to get it in shape before that :) [13:56:26] kart_: what are the differences in the new patch for https://gerrit.wikimedia.org/r/#/c/244145/ ? [13:57:01] and not a single connection/edit was impacted, according to the logs, External Storage service is a gift for DBAs [13:57:30] kart_: I am around, what do you need me for ? [13:57:31] !log restarting, upgrading and reconfiguring mysql @ es1011 [13:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:57:53] cxserver upgrade ? Tuesday ? 
sounds ok [14:02:43] akosiaris: mobrovac See: https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0December.C2.A015 [14:02:50] akosiaris: +1 :) [14:03:22] akosiaris: also see my comments about renaming config.yaml.erb to config.prod.yaml.erb [14:03:35] (03PS1) 10Jcrespo: Applying configuration changes on es1011 [puppet] - 10https://gerrit.wikimedia.org/r/258145 [14:04:08] mobrovac: added new config var and file rename. [14:04:14] (new in last PS) [14:04:20] kk thnx kart_ [14:04:47] (03PS2) 10Jcrespo: Applying configuration changes on es1011 [puppet] - 10https://gerrit.wikimedia.org/r/258145 [14:07:00] (03CR) 10Jcrespo: [C: 032] Applying configuration changes on es1011 [puppet] - 10https://gerrit.wikimedia.org/r/258145 (owner: 10Jcrespo) [14:11:49] (03CR) 10Mobrovac: [C: 04-1] "One minor detail and we should be GTG." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [14:16:15] PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:18:24] RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 955806 bytes in 8.848 second response time [14:20:33] mobrovac: q: What should the value of the restbase url be for Beta? [14:20:52] mobrovac: The production one won't be accessible from Beta. [14:21:26] (03PS1) 10Jcrespo: Repool es1011 with lower weight after mantenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258146 [14:21:37] mobrovac: old one? [14:21:51] kart_: http://deployment-restbase02.deployment-prep.eqiad.wmflabs/ ... 
[14:22:15] kart_: however, a big note is that prod domains are not available in restbase in beta [14:23:01] kart_: only the beta ones, so the other part of the uri should probably be /@lang.wikipedia.beta.wmflabs.org/v1/page/html/@title [14:24:00] kart_: cf https://github.com/wikimedia/operations-puppet/blob/production/modules/restbase/templates/config.labs.yaml.erb#L53-L84 for a full list of domains served by restbase in beta [14:26:01] valhallasw`cloud: which tool is the tools.wmflabs.org homepage? [14:26:09] paravoid: tools.admin [14:27:01] paravoid: /data/project/admin/public_html/content/tool.php is probably the culprit [14:27:07] ...in combination with fa.m.wikinews.org [14:27:17] 10.68.21.49 tools.wmflabs.org - [10/Dec/2015:14:27:08 +0000] "POST /weather/api/v1 HTTP/1.1" 404 845 "https://fa.m.wikinews.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C" "Mozilla/5.0 (Android 5.1.1; Mobile; rv:42.0) Gecko/42.0 Firefox/42.0" [14:27:25] there's a gazillion entries like that in access.log [14:29:27] (03PS16) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [14:32:34] (03PS3) 10BBlack: VCL: switch nginx IP data from XRIP to XCIP [puppet] - 10https://gerrit.wikimedia.org/r/258138 [14:33:25] (03CR) 10BBlack: [C: 032 V: 032] VCL: switch nginx IP data from XRIP to XCIP [puppet] - 10https://gerrit.wikimedia.org/r/258138 (owner: 10BBlack) [14:46:37] 6operations, 7Mail: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#1869230 (10hoo) 3NEW [14:46:56] paravoid, mark, coren: https://phabricator.wikimedia.org/T121104#1869216. tools.wmflabs.org is still slow, but at least the tools.admin webservices are no longer taking up a lot of cpu. Hopefully that at least helps with the flapping for now... 
[14:47:30] I lack the mediawiki skills to figure out what javascript is hitting tools, though :( [14:48:09] aha [14:48:11] thanks :) [14:48:12] what the fuck [14:48:21] H41 Remove Mjbmr [14:48:22] Passed Subscribers include any of Mjbmr [14:48:22] Action: Remove me as a subscriber [14:48:22] Removed Subscribers Removed a subscriber: Mjbmr. [14:48:28] and I was wondering why the page takes 3s to load now.. [14:48:32] I've been trying to catch it in the act [14:48:45] when did you do that? like in the last 10-15 minutes? [14:48:52] paravoid: yeah, a few minutes ago [14:49:17] the load is now on some other webgrid host, as tools.weather is still being hammered [14:49:21] I found the weather thing myself, it's ~30% of all HTTP requests to tools-proxy [14:49:40] yeah, that's because it's being retried without sleep [14:49:53] HTTP OK: HTTP/1.1 200 OK - 955936 bytes in 3.390 second response time [14:49:56] icinga now [14:50:05] so yeah, I think that was it [14:51:16] andre__: can you disable herald rule H41? [14:51:32] that one does not seem conducive to... solving bugs. [14:53:18] the "multiple times per second" is because that page has multiple cities [14:53:59] paravoid: it actually seems to retry on error? I get about 10 req/s and it doesn't seem to stop. [14:54:24] it's all coming from https://fa.m.wikinews.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C [14:54:32] judging from the HTML [14:54:45] it tries to load the weather for 4 cities x 3 past days [14:54:50] ah [14:55:00] and that is the frontpage [14:55:41] is it? I don't think so? [14:55:46] yes, it is [14:55:53] ah, it's the *mobile* frontpage [14:55:58] which is apparently different than the desktop frontpage [14:56:07] but it happens on the desktop one too [14:56:27] (maybe it has less traffic) [14:57:08] valhallasw`cloud: no I cannot plus no idea what H41 does exactly [14:57:14] yeah [14:57:26] andre__: it removes Mjbmr from CCs if they are CC'ed... 
[14:57:27] valhallasw`cloud, could you elaborate? its author could disable it though :) [14:57:47] valhallasw`cloud, admins do not have permissions to disable other people's herald rules [14:58:00] (03CR) 10Ottomata: "This version of pykafka has librdkafka support! I haven't tried it, and also haven't succesfully built it. We aren't going to use it yet" [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/257974 (owner: 10Ottomata) [14:58:10] I'm trying to solve the issue from the TL end by setting a sensible CORS header -- hopefully it'll stop retrying when it just gets an empty response [14:58:16] valhallasw`cloud: "Access Denied: Restricted Herald Rule" - "A personal rule's owner can always view and edit it." [14:58:20] sorry [14:58:28] np [14:58:47] faidon@tin:~$ mwgrep tools.wmflabs.org/weather [14:58:48] ## Public wiki results [14:58:48] fawikinews MediaWiki:Gadget-Weather.js [14:58:48] (total: 1, shown: 1) [14:59:20] https://fa.wikinews.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Gadget-Weather.js [15:01:26] ok, broke it from the tool labs end [15:01:36] header("Access-Control-Allow-Origin: *"); echo 'No content'; [15:01:50] heh [15:02:00] a 404 was also retried >_< [15:02:51] 6operations, 7Mail: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#1869258 (10Lydia_Pintscher) To be specific I do miss several notification emails for changes on meta, wikidata and dewp. I do receive some notification emails from these wikis but apparently not all. 
[15:05:12] (03CR) 10Alexandros Kosiaris: service-runner migration for cxserver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:11:06] (03CR) 10Jcrespo: [C: 032] Repool es1011 with lower weight after mantenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258146 (owner: 10Jcrespo) [15:11:43] PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Puppet has 1 failures [15:13:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1011 with lower weight after mantenance (duration: 00m 29s) [15:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:16:41] (03PS1) 10Andrew Bogott: Change the icinga settings for labs public dns monitoring [puppet] - 10https://gerrit.wikimedia.org/r/258152 [15:21:26] (03PS2) 10Andrew Bogott: Change the icinga settings for labs public dns monitoring [puppet] - 10https://gerrit.wikimedia.org/r/258152 [15:22:48] !log perming es3 master switchover of es1014 -> es1019, no production impact is expected [15:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:23:04] s/perming/performing/ [15:26:17] (03CR) 10KartikMistry: service-runner migration for cxserver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [15:29:01] (03PS17) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [15:33:31] (03PS3) 10BBlack: tlsproxy: stop sending XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258139 [15:33:54] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: stop sending XRIP [puppet] - 10https://gerrit.wikimedia.org/r/258139 (owner: 10BBlack) [15:35:35] (03PS18) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 
(https://phabricator.wikimedia.org/T117657) [15:36:52] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [15:37:17] (03PS1) 10BBlack: VCL: explicitly clear XRIP to prevent spoofing to applayer [puppet] - 10https://gerrit.wikimedia.org/r/258154 [15:37:40] (03CR) 10BBlack: [C: 032 V: 032] VCL: explicitly clear XRIP to prevent spoofing to applayer [puppet] - 10https://gerrit.wikimedia.org/r/258154 (owner: 10BBlack) [15:40:03] (03PS1) 10Jcrespo: Switchover es3 master es1014 -> es1019 and depool es1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258155 [15:41:17] (03CR) 10Jcrespo: [C: 032] Switchover es3 master es1014 -> es1019 and depool es1014 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258155 (owner: 10Jcrespo) [15:42:47] again, please report any issue on saving edits, about to apply switchover [15:44:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Switchover es3 master es1014 -> es1019 and depool es1014 (duration: 00m 28s) [15:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:44:24] (03CR) 10Hashar: "recheck" [software/conftool] - 10https://gerrit.wikimedia.org/r/256480 (owner: 10Giuseppe Lavagetto) [15:44:41] !log jynus@tin Synchronized wmf-config/db-codfw.php: Switchover es3 master es1014 -> es1019 and depool es1014 (duration: 00m 31s) [15:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:54:30] (03CR) 10DCausse: [C: 031] Use event-schemas repository for avro schemas (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255135 (https://phabricator.wikimedia.org/T118570) (owner: 10EBernhardson) [15:58:51] (03CR) 10Hashar: "So that fails with:" [software/conftool] - 10https://gerrit.wikimedia.org/r/256480 (owner: 10Giuseppe Lavagetto) [16:00:05] anomie ostriches thcipriani marktraceur: Dear anthropoid, the time has come. 
Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151210T1600). [16:00:05] Luke081515 bearND, mdholloway: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [16:00:20] !log restarting, upgrading and configuring es1014 [16:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:01:19] I can SWAT. Luke081515 bearND|afk mdholloway ping for swat. [16:01:29] ok [16:04:35] (03PS1) 10Jcrespo: Reconfiguring es1014 (ferm, binlog_format, p_s, ssl) [puppet] - 10https://gerrit.wikimedia.org/r/258157 [16:05:24] (03CR) 10Jcrespo: [C: 032] Reconfiguring es1014 (ferm, binlog_format, p_s, ssl) [puppet] - 10https://gerrit.wikimedia.org/r/258157 (owner: 10Jcrespo) [16:05:26] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/254029 (https://phabricator.wikimedia.org/T113109) (owner: 10Luke081515) [16:05:44] thcipriani: good morning! [16:06:45] mdholloway: hiya. Getting out a handful of config patches, then I'll get the mobileapp backport. [16:06:54] thcipriani: sounds good [16:07:30] mdholloway: That's my patches, I got 5 config changes to deploy ;) [16:07:55] thcipriani: I can close the phab tasks at my own, so you don't have to [16:08:06] Luke081515: appreciated. [16:09:30] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/252012 (https://phabricator.wikimedia.org/T113109) (owner: 10Luke081515) [16:09:43] ^ just realized first patch had that as a dependency. [16:10:03] :) [16:10:18] (03Merged) 10jenkins-bot: Add new group "curator" to enwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/252012 (https://phabricator.wikimedia.org/T113109) (owner: 10Luke081515) [16:10:38] looks good [16:10:42] mdholloway: is the MobileApp backport needed for .7 .8 or both? 
either way could I get you to make those patches while I [16:10:51] deploy config changes [16:10:55] (03Merged) 10jenkins-bot: Nuke and unblockself only for bureaucrats on en.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/254029 (https://phabricator.wikimedia.org/T113109) (owner: 10Luke081515) [16:11:47] thcipriani: sure. i think we'll want for both .7 and .8, but let's confirm with bearND, i think he just got here [16:12:22] that would make sense, wikipedias are running .7 until ~11am Pacific today, after which point everything should be on .8 [16:12:34] yes, we want both then [16:12:38] thank you [16:14:16] thcipriani: ah, good to know about the upgrade [16:14:17] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Nuke and unblockself only for bureaucrats on en.wikiversity [[gerrit:254029]] and Add new group "curator" to enwikiversity [[gerrit:252012]] (duration: 00m 36s) [16:14:20] ^ Luke081515 check please [16:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:14:54] thcipriani: bearND here's the .7 patch: https://gerrit.wikimedia.org/r/#/c/258159/ [16:14:57] (03PS1) 10BBlack: VCL: clear XFF if empty [puppet] - 10https://gerrit.wikimedia.org/r/258160 [16:15:32] thcipriani: was succesful :) [16:15:36] (03CR) 10BBlack: [C: 032 V: 032] VCL: clear XFF if empty [puppet] - 10https://gerrit.wikimedia.org/r/258160 (owner: 10BBlack) [16:15:39] Luke081515: thanks [16:15:59] thcipriani: bearND: and the .8: https://gerrit.wikimedia.org/r/#/c/258161/ [16:16:32] mdholloway: cool, thanks. Could I get you to throw those on the wikitech deployments page so they're easy to find? 
[16:16:40] (03CR) 10Mobrovac: [C: 04-1] service-runner migration for cxserver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: 10KartikMistry) [16:16:49] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255422 (https://phabricator.wikimedia.org/T119636) (owner: 10Luke081515) [16:17:05] thcipriani: sure thing [16:17:30] (03Merged) 10jenkins-bot: Enable filemover group at ukwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255422 (https://phabricator.wikimedia.org/T119636) (owner: 10Luke081515) [16:19:25] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable filemover group at ukwiki [[gerrit:255422]] (duration: 00m 29s) [16:19:27] ^ Luke081515 check please [16:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:20:11] thcipriani: Works :) [16:20:23] Luke081515: awesome thanks. [16:21:09] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/253084 (https://phabricator.wikimedia.org/T116270) (owner: 10Luke081515) [16:22:39] (03Merged) 10jenkins-bot: Add a rollbacker group at wuuwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/253084 (https://phabricator.wikimedia.org/T116270) (owner: 10Luke081515) [16:23:34] 6operations, 10RESTBase-Cassandra: Update to Cassandra 2.1.12 - https://phabricator.wikimedia.org/T120803#1869679 (10GWicke) Latency improvement since the upgrade: {F3064913} [16:24:34] hmm, just had a large spike of notices from the Math extension, seemingly unrelated to SWAT: Notice: Undefined property: stdClass::$detail in /srv/mediawiki/php-1.27.0-wmf.8/extensions/Math/MathRestbaseInterface.php on line 84 [16:25:36] as consequence of my last patch? 
[16:26:23] 6operations, 10DBA, 5Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#1869691 (10jcrespo) External storage link has been encrypted, that makes 2 out of the 14 cross-datacenter links using TLS now. I am using it but not enforcing it at user level to make sure i... [16:26:51] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add a rollbacker group at wuuwiki [[gerrit:253084]] (duration: 00m 28s) [16:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:27:03] Luke081515: I don't think so, spiked a little after and then quieted down immediately [16:27:10] Luke081515: ^ check last sync please [16:27:39] thcipriani: Works too :) [16:29:04] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) (owner: 10Luke081515) [16:30:01] (03Merged) 10jenkins-bot: Add three new groups to pawiki, and allow sysops to add or remove users to them [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257881 (https://phabricator.wikimedia.org/T120369) (owner: 10Luke081515) [16:30:40] (03PS1) 10Krinkle: Remove unused $wgObjectCaches['resourceloader'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258165 [16:31:39] (03PS19) 10KartikMistry: service-runner migration for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) [16:32:03] (03CR) 10Milimetric: "any reason to not merge this? If not, ottomata would you do it? 
Please thank you :)" [puppet] - https://gerrit.wikimedia.org/r/252863 (https://phabricator.wikimedia.org/T118519) (owner: GWicke)
[16:32:23] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Add three new groups to pawiki, and allow sysops to add or remove users to them [[gerrit:257881]] (duration: 00m 29s)
[16:32:25] ^ Luke081515 check please
[16:32:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:33:17] thcipriani: Great, works. Thanks for deploying my 5 patches :)
[16:33:35] Luke081515: yw, thanks for checking them :)
[16:34:11] * Luke081515 closes the tasks at phab now
[16:36:13] milimetric: i'm not really sure what that does or even what it affects, I might not be the right person to merge it....buuuut also not sure who else should
[16:37:07] (CR) Mobrovac: [C: -1] service-runner migration for cxserver (1 comment) [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: KartikMistry)
[16:37:20] operations, ops-codfw: rack new yubico auth system - https://phabricator.wikimedia.org/T120263#1869744 (Papaul) Moritz is still working on the YubiHSM modules. Once this is done he will update me.
[16:38:14] (PS1) Jcrespo: Enforce SSL on change master [software] - https://gerrit.wikimedia.org/r/258167
[16:38:20] mdholloway: bearND I'll roll out .8 first, then .7: wasn't there some kind of page purge that needed to be run the last time we did this?
[16:38:40] (PS1) coren: Labs: Add a timeout check to getent (via ldap) on labstore [puppet] - https://gerrit.wikimedia.org/r/258168
[16:38:52] Hm. Needs moar comments.
[16:39:14] thcipriani: yes, I can do the purge
[16:39:26] bearND: kk, thanks
[16:39:27] thcipriani: sounds good. yep, we just need to ssh into tin and purge the cache for the remote config file url.
[16:39:46] (PS1) Ottomata: Initial debianization and release [debs/python-sprockets] (debian) - https://gerrit.wikimedia.org/r/258169
[16:40:11] (PS2) Ottomata: Initial debianization and release [debs/python-sprockets] (debian) - https://gerrit.wikimedia.org/r/258169 (https://phabricator.wikimedia.org/T121112)
[16:40:42] mobrovac: and restbase_url in cxserver :)
[16:41:03] so that need to change in config.yaml.erb.
[16:41:24] (CR) jenkins-bot: [V: -1] Labs: Add a timeout check to getent (via ldap) on labstore [puppet] - https://gerrit.wikimedia.org/r/258168 (owner: coren)
[16:41:41] yup, kart_, in modules/cxserver/manifests/init.pp too
[16:42:02] !log thcipriani@tin Synchronized php-1.27.0-wmf.8/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android Beta app: 30% [[gerrit:258161]] (duration: 00m 29s)
[16:42:07] ^ mdholloway bearND .8 is sync'd
[16:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:42:14] k
[16:42:32] (PS2) coren: Labs: Add a timeout check to getent (via ldap) on labstore [puppet] - https://gerrit.wikimedia.org/r/258168
[16:43:16] once .7 is done then I'll purge
[16:44:38] bearND: you going to switch up the order this time (www, then meta)?
[16:45:55] !log thcipriani@tin Synchronized php-1.27.0-wmf.7/extensions/MobileApp/config/config.json: SWAT: Roll out RESTBase usage to Android Beta app: 30% [[gerrit:258159]] (duration: 00m 29s)
[16:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:46:03] ^ bearND mdholloway .7 done
[16:46:23] chasemp: https://gerrit.wikimedia.org/r/#/c/258168/2 has a rough first draft of a timeout check.
[16:46:28] ok, thanks
[16:46:58] * Coren needs some lunch.
[16:47:08] (PS20) KartikMistry: service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657)
[16:47:46] thcipriani: mdholloway purged
[16:47:59] https://meta.wikimedia.org/static/current/extensions/MobileApp/config/android.json
[16:48:12] \o/
[16:48:20] thcipriani: thanks again, thcipriani
[16:48:20] nice. thanks bearND !
[16:48:48] mdholloway: thanks for the patches today: appreciated.
[16:49:45] mobrovac: I'm re-fixing.
[16:52:10] PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: puppet fail
[16:54:26] (PS21) KartikMistry: service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657)
[17:00:04] godog robh: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151210T1700).
[17:00:15] mobrovac: good night. Let's finish tomorrow :)
[17:00:27] but there are no swat patchesssss
[17:00:29] (I have fixed restbase_url)
[17:00:32] godog: no one wants our help!
[17:01:02] robh: success_baby.jpg !
[17:06:02] !log Updated scholarships.wikimedia.org to a4e3dbf (Ensure that Auth\UserData is constructed with an array)
[17:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:13:48] moritzm: build successful!
[17:19:57] heya mutante, would you know anything about this? or who to ask?
[17:19:58] https://gerrit.wikimedia.org/r/#/c/252863/
[17:19:59] it seems fine
[17:20:02] but i'm not sure where it gets applied
[17:20:09] so i want to get someone else in on it
[17:20:49] RECOVERY - puppet last run on mw2130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:21:47] (PS1) Jcrespo: Repool es1011 at 100% load, repool es1014 with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/258171
[17:23:32] (CR) Jcrespo: [C: 2] Repool es1011 at 100% load, repool es1014 with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/258171 (owner: Jcrespo)
[17:25:13] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1011 at 100% load, repool es1014 with low load (duration: 00m 29s)
[17:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:36:56] (PS3) Andrew Bogott: Change the icinga settings for labs public dns monitoring [puppet] - https://gerrit.wikimedia.org/r/258152
[17:37:53] (CR) coren: [C: 1] "I think more sensitive is better in this case, given how critical that is." [puppet] - https://gerrit.wikimedia.org/r/258152 (owner: Andrew Bogott)
[17:42:26] (CR) Andrew Bogott: [C: 1] "Can we have similar tests for ldaps and ldap using starttls?" [puppet] - https://gerrit.wikimedia.org/r/258168 (owner: coren)
[17:42:56] (PS4) Andrew Bogott: Change the icinga settings for labs public dns monitoring [puppet] - https://gerrit.wikimedia.org/r/258152
[17:44:39] (CR) Andrew Bogott: [C: 2] Change the icinga settings for labs public dns monitoring [puppet] - https://gerrit.wikimedia.org/r/258152 (owner: Andrew Bogott)
[17:45:45] (CR) coren: "Would it make sense for them to run on that host, though? That one test is reasonable imo because it uses the same mechanism NFS (which r" [puppet] - https://gerrit.wikimedia.org/r/258168 (owner: coren)
[17:46:26] ori: I'm trying to find your(?)
blog post on async loading of js, but it isn't showing up in my searches of blog.wikimedia.org
[17:46:29] help?
[17:49:20] (CR) coren: [C: 2] Labs: Add a timeout check to getent (via ldap) on labstore [puppet] - https://gerrit.wikimedia.org/r/258168 (owner: coren)
[17:51:36] greg-g: there wasn't one. some tweets and an article in the ny observer, i think.
[17:51:48] ori: oh, hence not being able to find it :)
[17:51:53] http://observer.com/2015/08/how-wikipedia-upped-its-page-load-speed-by-roughly-40-percent-and-why/ ?
[17:52:06] (PS3) coren: Labs: Add a timeout check to getent (via ldap) on labstore [puppet] - https://gerrit.wikimedia.org/r/258168
[17:52:33] Bah, sometimes the fastforward logic is... not.
[17:53:21] ori: that helps, thanks!
[17:55:45] operations, ops-codfw: rack 8 new misc systems - https://phabricator.wikimedia.org/T120885#1869960 (Papaul)
[17:59:39] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors
[18:01:12] andrewbogott, "Icinga configuration contains errors"
[18:01:26] jynus: ok, will look in a moment
[18:02:31] (CR) Mobrovac: [C: 1] service-runner migration for cxserver [puppet] - https://gerrit.wikimedia.org/r/250910 (https://phabricator.wikimedia.org/T117657) (owner: KartikMistry)
[18:05:44] (PS1) Andrew Bogott: Add a stupid hack for determining whether or not a labs box is bare metal. [puppet] - https://gerrit.wikimedia.org/r/258176
[18:05:46] (PS1) Andrew Bogott: Disable everything NFS-related on baremental labs boxes. [puppet] - https://gerrit.wikimedia.org/r/258177
[18:06:11] "Invalid max_check_attempts, check_interval, retry_interval, or notification_interval value"
[18:06:18] well, thanks for being so specific
[18:08:22] (PS1) Andrew Bogott: Change labs public dns recheck interval to 1 minute.
[puppet] - https://gerrit.wikimedia.org/r/258179
[18:09:19] Coren: https://gerrit.wikimedia.org/r/#/c/258179/ :(
[18:09:49] (CR) coren: [C: 1] "Sad." [puppet] - https://gerrit.wikimedia.org/r/258179 (owner: Andrew Bogott)
[18:09:59] (CR) Andrew Bogott: [C: 2] Change labs public dns recheck interval to 1 minute. [puppet] - https://gerrit.wikimedia.org/r/258179 (owner: Andrew Bogott)
[18:10:14] (CR) Ori.livneh: [C: -1] Add a stupid hack for determining whether or not a labs box is bare metal. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/258176 (owner: Andrew Bogott)
[18:10:18] andrewbogott: "And error has occured because a value is invalid."
[18:11:02] “I must decline, for secret reasons"
[18:13:59] operations, hardware-requests: spare swift disks order - https://phabricator.wikimedia.org/T119698#1870067 (Cmjohnson)
[18:14:22] (PS2) Andrew Bogott: Disable everything NFS-related on baremental labs boxes. [puppet] - https://gerrit.wikimedia.org/r/258177
[18:14:24] (PS2) Andrew Bogott: Add a stupid hack for determining whether or not a labs box is bare metal. [puppet] - https://gerrit.wikimedia.org/r/258176
[18:15:35] andrewbogott: Why no NFS? I mean, why hardcode no NFS? The hiera defaults won't grab /home and /data/project nowadays.
[18:15:48] ori is all “Stupid hack? Gotta get me some of that!"
[18:15:59] heh
[18:16:53] Coren: I’m pretty sure we don’t want to support nfs on bare-metal nodes. And in the short run it won’t work due to being on the wrong vlan
[18:17:24] I’m not committed to that change in the long run, but it will make things a lot simpler for current hack attempts.
[18:17:34] I don't think bare metal vs instance changes anything but the vlan point is compelling.
:-)
[18:17:41] (alternative answer: Andrew is dumb and put the baremetal node in ‘testlabs’ which uses NFS)
[18:18:03] I’d be interested in switching NFS back on after we move things into the right vlan, just because it might work for free
[18:18:04] (CR) coren: [C: 1] "They don't live on the right network anyways." [puppet] - https://gerrit.wikimedia.org/r/258177 (owner: Andrew Bogott)
[18:19:45] (PS3) Andrew Bogott: Disable everything NFS-related on baremental labs boxes. [puppet] - https://gerrit.wikimedia.org/r/258177
[18:24:32] (CR) Rush: "is_virtual is notoriously stupid, I used just $virtual in the past which comes back w/ "physical" or "kvm" afaict so we can do a case chec" [puppet] - https://gerrit.wikimedia.org/r/258176 (owner: Andrew Bogott)
[18:25:00] operations, hardware-requests: spare swift disks order - https://phabricator.wikimedia.org/T119698#1870166 (RobH)
[18:25:13] (Abandoned) Andrew Bogott: Add a stupid hack for determining whether or not a labs box is bare metal. [puppet] - https://gerrit.wikimedia.org/r/258176 (owner: Andrew Bogott)
[18:26:25] (CR) Catrope: [C: 1] "Scheduled for SWAT later today" [mediawiki-config] - https://gerrit.wikimedia.org/r/257826 (owner: Legoktm)
[18:26:49] (PS4) Andrew Bogott: Disable everything NFS-related on baremental labs boxes. [puppet] - https://gerrit.wikimedia.org/r/258177
[18:27:21] chase, ori, dig this: https://dpaste.de/Ac32
[18:29:11] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
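Editorial aside on the is_virtual discussion above: the review comment suggests checking the `virtual` fact, which reportedly comes back as "physical" or "kvm", instead of relying on `is_virtual`. A minimal Python sketch of that case check follows. The function names are hypothetical and only the two fact values quoted in the log are assumed; the actual change in this log was made in Puppet, not Python.

```python
def is_bare_metal(virtual_fact: str) -> bool:
    """Return True when a Facter-style `virtual` value denotes a physical host.

    Assumption: the fact is the string "physical" on bare metal, as the
    review comment in the log describes. Other values are treated as
    "not bare metal".
    """
    return virtual_fact.strip().lower() == "physical"


def is_kvm_instance(virtual_fact: str) -> bool:
    """Return True when the `virtual` value is "kvm" (a virtual instance)."""
    return virtual_fact.strip().lower() == "kvm"


if __name__ == "__main__":
    # Show how each of the two quoted fact values is classified.
    for value in ("physical", "kvm"):
        print(value, is_bare_metal(value), is_kvm_instance(value))
```

Comparing against an explicit string avoids depending on how a boolean-like fact is represented, which seems to be the concern raised about is_virtual.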
[18:29:26] operations, ops-eqiad: rack/setup/deploy rdb1005 & rdb1006 - https://phabricator.wikimedia.org/T119543#1870197 (Cmjohnson)
[18:29:28] operations, hardware-requests: Add another redis jobqueue server master and slave - https://phabricator.wikimedia.org/T89400#1870199 (Cmjohnson)
[18:30:03] Puppet, Phabricator: phabricator at labs is not up to date - https://phabricator.wikimedia.org/T117441#1870206 (Luke081515) Open>Resolved Works, thanks for instructions :).
[18:31:02] (PS1) Jcrespo: Repool es1014 at 100% load [mediawiki-config] - https://gerrit.wikimedia.org/r/258182
[18:32:40] (PS2) Jcrespo: Repool es1014 at 100% load and reduce es1019 load [mediawiki-config] - https://gerrit.wikimedia.org/r/258182
[18:32:45] andrewbogott: how nice, especially from such a factual command :)
[18:33:44] !log reprepro: updating cassandra to latest upstream (2.1.12)
[18:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:33:54] godog, gwicke, urandom, mobrovac ^
[18:34:35] wow that was fast :)
[18:36:08] <_joe_> also logged!
[18:36:24] I'm annoyed that facter doesn't set is_virtual correctly more than I am at the hack to circumvent it.
[18:36:45] Oh.
[18:36:55] andrewbogott: Heh. Didn't read your last comment.
[18:37:46] _joe_: have a min to take a look at https://gerrit.wikimedia.org/r/#/c/257898/ perhaps?
[18:38:20] <_joe_> mobrovac: not really
[18:38:46] (CR) Jcrespo: [C: 2] Repool es1014 at 100% load and reduce es1019 load [mediawiki-config] - https://gerrit.wikimedia.org/r/258182 (owner: Jcrespo)
[18:39:04] (CR) Andrew Bogott: [C: 2] Disable everything NFS-related on baremental labs boxes.
[puppet] - https://gerrit.wikimedia.org/r/258177 (owner: Andrew Bogott)
[18:40:06] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1014 at 100% load and reduce es1019 load (duration: 00m 29s)
[18:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:41:40] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[18:45:07] paravoid: ack! thanks
[18:45:15] ori, AaronSchulz: I've upgraded and reconfigured the ES servers to use ROW based replication, performance_schema and SSL. I do not expect changes on savetiming, but feel free to ping me if it gets worse
[18:45:39] nice
[18:45:39] jynus: ES?
[18:45:45] jynus: I amended https://gerrit.wikimedia.org/r/#/c/256875/6/includes/db/DatabaseMysqlBase.php btw
[18:46:07] eqiad slaves?
[18:46:18] External Storage, sorry
[18:46:22] oh, right
[18:46:24] the revision text
[18:46:32] thanks, i'm not hip to the acronyms
[18:46:52] well, it is the name of the hosts es1011...
[18:47:04] yeah, i'm just ignorant
[18:47:09] not your fault. but thanks for explaining ;)
[18:47:10] although I always had preferred Storage External
[18:47:28] (aka sex servers)
[18:47:37] lol
[18:47:53] to not confuse them with Elastic Search, which are called elastic1001
[18:48:32] AaronSchulz, I saw it, do you think it will impact performance on the masters? I can still do the extra column
[18:48:35] if we went with that naming scheme, then the main wiki dbs should be 'storage internal', or 'sin'
[18:48:53] what extra column?
[18:49:53] I mentioned that if master ids were a bad idea, I could add a column with the hostname on the heartbeat table
[18:50:01] ori: appropriate
[18:50:23] I like the current state, though
[18:51:09] I think server ids are the way to go
[18:51:10] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: puppet fail
[18:52:53] I'm ok with it
[18:53:20] I will send you a couple of changes on the comments, but not today, tomorrow
[18:54:57] (PS1) Ottomata: Initial debianization and 1.2.1 release [debs/python-sprockets-clients-statsd] (debian) - https://gerrit.wikimedia.org/r/258187
[18:59:07] These are the couple of warnings I mentioned on the commit, and why I didn't like being based on the log names https://phabricator.wikimedia.org/P2402
[19:00:04] thcipriani: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151210T1900).
[19:02:32] starting update of wikipedia wikis to wmf.8 now
[19:05:53] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct
[19:06:26] (PS1) Ottomata: Initial debianization and release 1.3.1 [debs/python-sprockets-mixins-statsd] (debian) - https://gerrit.wikimedia.org/r/258189 (https://phabricator.wikimedia.org/T121112)
[19:06:33] PROBLEM - Getent speed check on labstore2001 is CRITICAL: CRITICAL: getent group tools.admin failed
[19:06:43] (PS2) Ottomata: Initial debianization and 1.2.1 release [debs/python-sprockets-clients-statsd] (debian) - https://gerrit.wikimedia.org/r/258187 (https://phabricator.wikimedia.org/T121112)
[19:06:54] PROBLEM - Getent speed check on labstore1003 is CRITICAL: CRITICAL: getent group tools.admin failed
[19:07:27] (PS1) Thcipriani: all wikis to 1.27.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/258190
[19:10:21] (CR) Thcipriani: [C: 2] "php-1.27.0-wmf.8 train" [mediawiki-config] - https://gerrit.wikimedia.org/r/258190 (owner:
Thcipriani)
[19:10:53] (Merged) jenkins-bot: all wikis to 1.27.0-wmf.8 [mediawiki-config] - https://gerrit.wikimedia.org/r/258190 (owner: Thcipriani)
[19:11:20] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.27.0-wmf.8
[19:11:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:11:43] all wikis now running php-1.27.0-wmf.8
[19:12:41] thcipriani: Woo.
[19:12:55] * thcipriani wipes brow
[19:13:38] thcipriani: Thursday is kind of anticlimactic after all the work you had to do on Tuesday :)
[19:13:57] bd808: I'm fine with that.
[19:15:25] (PS1) Cmjohnson: Adding mgmt dns entries for rdb1005/rdb1006 bug: task T119543 [dns] - https://gerrit.wikimedia.org/r/258191
[19:15:38] thcipriani: \o/
[19:15:50] (CR) jenkins-bot: [V: -1] Adding mgmt dns entries for rdb1005/rdb1006 bug: task T119543 [dns] - https://gerrit.wikimedia.org/r/258191 (owner: Cmjohnson)
[19:17:19] ori: :D
[19:17:35] (PS2) Cmjohnson: Adding mgmt dns entries for rdb1005/rdb1006 bug: task T119543 [dns] - https://gerrit.wikimedia.org/r/258191
[19:18:03] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[19:18:04] (CR) Cmjohnson: [C: 2] Adding mgmt dns entries for rdb1005/rdb1006 bug: task T119543 [dns] - https://gerrit.wikimedia.org/r/258191 (owner: Cmjohnson)
[19:18:13] PROBLEM - puppet last run on mw2015 is CRITICAL: CRITICAL: puppet fail
[19:20:16] (PS1) Andrew Bogott: Replace use of $::is_virtual with $::virtual == kvm [puppet] - https://gerrit.wikimedia.org/r/258193
[19:22:29] (CR) Andrew Bogott: [C: 2] Replace use of $::is_virtual with $::virtual == kvm [puppet] - https://gerrit.wikimedia.org/r/258193 (owner: Andrew Bogott)
[19:23:36] operations, Reading-Admin, Reading-Community-Engagement: UX strategic test: redirect small portion of unauthenticated desktop users to mobile web -
https://phabricator.wikimedia.org/T117826#1870395 (dr0ptp4kt)
[19:29:14] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[19:36:34] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:37:15] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:40:35] moritzm: build succeeded! Does that also mean the tests ran OK? From what I can find on google I understand building packages should auto-run tests as well, but it's not entirely clear to me.
[19:41:50] Ops-Access-Requests, operations, Multimedia, Patch-For-Review: Give Bartosz access to stat1003 ("researchers" and "statistics-users") - https://phabricator.wikimedia.org/T119404#1870469 (Milimetric) I think Ariel meant to ping @TrevorParscal here, which is important because Bartosz needs this access...
[19:42:30] (CR) Aude: [C: 1] "we need to schedule this for swat..." [mediawiki-config] - https://gerrit.wikimedia.org/r/257397 (owner: Matěj Suchánek)
[19:43:24] PROBLEM - PyBal backends health check on lvs1002 is CRITICAL: PYBAL CRITICAL - parsoidsvc_80 - Could not depool server cp1058.eqiad.wmnet because of too many down!: parsoidcachelb_80 - Could not depool server cp1058.eqiad.wmnet because of too many down!
[19:46:24] RECOVERY - puppet last run on mw2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:48:09] operations, Monitoring, Privacy, Security-Core: status.wikimedia.org should not load Google Analytics - https://phabricator.wikimedia.org/T115945#1870474 (Dzahn) status.wikimedia.org is an alias for status.watchmouse.com. http://status.watchmouse.com/ http://status.cloudmonitor.ca.com/ ^ the site i...
[19:48:34] valhallasw`cloud: it somehow depends on your build environment and whether it exports DEB_BUILD_OPTIONS. With the default build there's heaps of output of the test suite written to stdout, if you have the build log it should be in there
[19:49:23] valhallasw`cloud: and could you mention your successful build on the respective ticket?
[19:50:23] Yes, will do. I'll try to see if I can find the build log -- I couldn't find a log in the obvious places (current dir, /var/cache/pbuilder/output)
[19:50:46] (PS1) Catrope: Add infrastructure for $wmgEchoUseCrossWikiTrackingTable and enable it on testwiki and test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258201
[19:51:14] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:52:25] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:53:43] (CR) Legoktm: [C: 1] Add infrastructure for $wmgEchoUseCrossWikiTrackingTable and enable it on testwiki and test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258201 (owner: Catrope)
[19:54:42] ahh, it's called .build :)
[20:03:09] (PS1) Ottomata: Add kafka::server::monitoring for WMF specific monitoring, remove deprecated kafka::client class [puppet/kafka] - https://gerrit.wikimedia.org/r/258202
[20:06:07] (PS6) Hashar: zuul: move roles into role module [puppet] - https://gerrit.wikimedia.org/r/257039 (owner: Dzahn)
[20:06:44] (CR) Hashar: [C: 1] "Dropped the link to puppet compiler from commit message." [puppet] - https://gerrit.wikimedia.org/r/257039 (owner: Dzahn)
[20:07:08] Ops-Access-Requests, operations, Multimedia, Patch-For-Review: Give Bartosz access to stat1003 ("researchers" and "statistics-users") - https://phabricator.wikimedia.org/T119404#1870536 (TrevorParscal) I, Trevor Parscal, hereby approve of giving Bartosz Dziewoński membership in the "researchers" and...
[20:07:59] (PS1) Jforrester: Allow VisualEditor feedback pages to be centralised [mediawiki-config] - https://gerrit.wikimedia.org/r/258205
[20:08:01] (PS1) Jforrester: Centralise VisualEditor feedback pages for all wikis except de/en/es/frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258206
[20:08:27] (CR) Jforrester: [C: -2] "Need to plan and announce first, this is unlikely to be the right list of wikis, etc." [mediawiki-config] - https://gerrit.wikimedia.org/r/258206 (owner: Jforrester)
[20:08:30] (CR) jenkins-bot: [V: -1] Allow VisualEditor feedback pages to be centralised [mediawiki-config] - https://gerrit.wikimedia.org/r/258205 (owner: Jforrester)
[20:08:42] (CR) jenkins-bot: [V: -1] Centralise VisualEditor feedback pages for all wikis except de/en/es/frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258206 (owner: Jforrester)
[20:09:26] (PS2) Ottomata: Add kafka::server::monitoring for WMF specific monitoring, remove deprecated kafka::client class [puppet/kafka] - https://gerrit.wikimedia.org/r/258202
[20:09:37] (PS3) Ottomata: Add kafka::server::monitoring for WMF specific monitoring, remove deprecated kafka::client class [puppet/kafka] - https://gerrit.wikimedia.org/r/258202 (https://phabricator.wikimedia.org/T120957)
[20:10:36] going to deploy a parsoid config change.
[20:10:40] (PS2) Jforrester: Allow VisualEditor feedback pages to be centralised [mediawiki-config] - https://gerrit.wikimedia.org/r/258205
[20:10:41] !log starting parsoid deploy
[20:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:12:22] (PS1) BBlack: VCL: no special handling for CentralAutoLogin [puppet] - https://gerrit.wikimedia.org/r/258207 (https://phabricator.wikimedia.org/T96847)
[20:12:36] (PS1) BBlack: Text VCL: same no-article-cache for mobile as desktop [puppet] - https://gerrit.wikimedia.org/r/258208 (https://phabricator.wikimedia.org/T109286)
[20:13:08] (PS7) Dzahn: zuul: move roles into role module [puppet] - https://gerrit.wikimedia.org/r/257039
[20:13:14] (CR) Dzahn: [C: 2] zuul: move roles into role module [puppet] - https://gerrit.wikimedia.org/r/257039 (owner: Dzahn)
[20:13:31] (CR) Ottomata: [C: 2] Add kafka::server::monitoring for WMF specific monitoring, remove deprecated kafka::client class [puppet/kafka] - https://gerrit.wikimedia.org/r/258202 (https://phabricator.wikimedia.org/T120957) (owner: Ottomata)
[20:13:46] (CR) Jforrester: "Patch that will use this: Id48729c89db" [mediawiki-config] - https://gerrit.wikimedia.org/r/258205 (owner: Jforrester)
[20:14:21] (PS2) Jforrester: Centralise VisualEditor feedback pages for all wikis except de/en/es/frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258206
[20:14:29] (PS1) Ottomata: Update kafka submodule, use kafka::server::monitoring class from it in role::analytics::kafka::* [puppet] - https://gerrit.wikimedia.org/r/258210 (https://phabricator.wikimedia.org/T120957)
[20:14:29] !log restarted parsoid on wtp1003 as canary
[20:14:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:14:43] (PS3) Jforrester: Centralise VisualEditor feedback pages except for de/en/es/frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258206
(https://phabricator.wikimedia.org/T92661)
[20:15:14] PROBLEM - Host mw2031 is DOWN: PING CRITICAL - Packet loss = 100%
[20:15:24] (Abandoned) BBlack: varnish: return (pass) for CAL URLs [puppet] - https://gerrit.wikimedia.org/r/257632 (https://phabricator.wikimedia.org/T96847) (owner: BBlack)
[20:15:44] RECOVERY - Host mw2031 is UP: PING OK - Packet loss = 0%, RTA = 36.13 ms
[20:17:53] (PS2) Ottomata: Update kafka submodule, use kafka::server::monitoring class from it in role::analytics::kafka::* [puppet] - https://gerrit.wikimedia.org/r/258210 (https://phabricator.wikimedia.org/T120957)
[20:18:15] (CR) Ottomata: "No change, good!" [puppet] - https://gerrit.wikimedia.org/r/258210 (https://phabricator.wikimedia.org/T120957) (owner: Ottomata)
[20:18:55] (CR) Dzahn: "no change on gallium and scandium" [puppet] - https://gerrit.wikimedia.org/r/257039 (owner: Dzahn)
[20:18:58] (PS2) BBlack: Text VCL: same no-article-cache for mobile as desktop [puppet] - https://gerrit.wikimedia.org/r/258208 (https://phabricator.wikimedia.org/T109286)
[20:19:05] (CR) BBlack: [C: 2 V: 2] Text VCL: same no-article-cache for mobile as desktop [puppet] - https://gerrit.wikimedia.org/r/258208 (https://phabricator.wikimedia.org/T109286) (owner: BBlack)
[20:19:17] restarting parsoid on all nodes
[20:19:34] (PS3) Ottomata: Update kafka submodule, use kafka::server::monitoring class from it in role::analytics::kafka::* [puppet] - https://gerrit.wikimedia.org/r/258210 (https://phabricator.wikimedia.org/T120957)
[20:19:43] (CR) Ottomata: [C: 2 V: 2] Update kafka submodule, use kafka::server::monitoring class from it in role::analytics::kafka::* [puppet] - https://gerrit.wikimedia.org/r/258210 (https://phabricator.wikimedia.org/T120957) (owner: Ottomata)
[20:21:26] operations, ops-codfw, Patch-For-Review: power off Codfw-Cisco Servers - https://phabricator.wikimedia.org/T115372#1870588 (Papaul) All Cisco servers are power off
and disconnect from network and power. I will pull them out from C1 and C5 to stock them in the cage close to D8. It will make it easy for...
[20:21:38] (PS1) Ottomata: Remove pasted error reassigning nagios_servicegroup [puppet/kafka] - https://gerrit.wikimedia.org/r/258211
[20:21:51] (CR) Ottomata: [C: 2 V: 2] Remove pasted error reassigning nagios_servicegroup [puppet/kafka] - https://gerrit.wikimedia.org/r/258211 (owner: Ottomata)
[20:22:33] operations, ops-codfw, Patch-For-Review: power off Codfw-Cisco Servers - https://phabricator.wikimedia.org/T115372#1870591 (Papaul) I am also removing them from racktables and putting them in the decommission rack D4
[20:22:38] (PS1) Ottomata: Update kafka submodule with fix for nagios_servicegroup puppet error [puppet] - https://gerrit.wikimedia.org/r/258212
[20:22:46] (PS2) Ottomata: Update kafka submodule with fix for nagios_servicegroup puppet error [puppet] - https://gerrit.wikimedia.org/r/258212
[20:22:54] (CR) Ottomata: [C: 2 V: 2] Update kafka submodule with fix for nagios_servicegroup puppet error [puppet] - https://gerrit.wikimedia.org/r/258212 (owner: Ottomata)
[20:24:04] (PS3) BBlack: ssl_ciphersuite: add DHE+3DES option only for "mid" [puppet] - https://gerrit.wikimedia.org/r/251153
[20:25:17] (CR) BBlack: [C: 2] ssl_ciphersuite: add DHE+3DES option only for "mid" [puppet] - https://gerrit.wikimedia.org/r/251153 (owner: BBlack)
[20:27:24] (PS4) Dzahn: RT: move role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112)
[20:27:34] PROBLEM - puppet last run on kafka1018 is CRITICAL: CRITICAL: puppet fail
[20:28:12] (PS5) Dzahn: RT: move role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112)
[20:28:22] (CR) jenkins-bot: [V: -1] RT: move role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047
(https://phabricator.wikimedia.org/T119112) (owner: Dzahn)
[20:28:41] (PS6) Dzahn: RT: add role to krypton [puppet] - https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112)
[20:32:02] !log finished parsoid deploy
[20:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:32:23] operations, ops-codfw: rack 8 new misc systems - https://phabricator.wikimedia.org/T120885#1870631 (Papaul)
[20:37:00] (CR) BBlack: [C: 1] "This should be probably be in a ticket rather than codereview, but FWIW, I'm of the opposite opinion. The meaning of "Test change 123456"" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/248634 (owner: Alex Monk)
[20:41:49] Ops-Access-Requests, operations, Patch-For-Review: Grant jgirault an jan_drewniak access to the eventlogging db on stat1003 and hive to query webrequests tables on stat1002 - https://phabricator.wikimedia.org/T118998#1870644 (Dzahn) a:JGirault
[20:42:38] Ops-Access-Requests, operations, Patch-For-Review: Grant jgirault an jan_drewniak access to the eventlogging db on stat1003 and hive to query webrequests tables on stat1002 - https://phabricator.wikimedia.org/T118998#1815162 (Dzahn) [stat1002:~] $ lastlog | grep jgirault jgirault...
[20:53:14] RECOVERY - puppet last run on kafka1018 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:58:56] (03PS1) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:00:05] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) (owner: 10Ottomata) [21:04:01] (03PS2) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:04:54] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) (owner: 10Ottomata) [21:07:28] (03PS3) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:08:23] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) (owner: 10Ottomata) [21:10:55] (03PS4) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:10:56] 6operations, 7Mail: Mails from MediaWiki seem to get (partially) lost - https://phabricator.wikimedia.org/T121105#1870741 (10Dzahn) please add more details: which email addresses, around what time should they have been sent, the mail subject if you know it without that we don't really have anything to check... 
[21:11:51] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) (owner: 10Ottomata) [21:12:45] (03PS5) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:13:20] (03CR) 10coren: "Some instances of role::labs::lvm::srv not included anymore (noted inline) which introduces a semantic change." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/255082 (owner: 10Yuvipanda) [21:19:14] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 1 failures [21:19:19] (03CR) 10coren: [C: 04-1] "Discarding a mismatching server cert and automatically accepting the new one pretty much entirely defeats the point of server certs to beg" [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:20:15] (03CR) 10Yuvipanda: "This is what setting up role::puppet::self right now does - it discards your current certificates and accepts new ones by changing ssldir." 
[puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:20:24] (03CR) 10Ottomata: "No changes according to" [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) (owner: 10Ottomata) [21:20:49] (03CR) 10Yuvipanda: "You also need to explicitly opt into this via a hiera variable, which is also an improvement over the current role::puppet::self" [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:20:52] !log Changed email for mbeattie on otrs_wikiwiki per request of mdennis [21:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:21:47] (03PS6) 10Ottomata: [WIP] Using more generic roles for kafka classes [puppet] - 10https://gerrit.wikimedia.org/r/258220 (https://phabricator.wikimedia.org/T120957) [21:22:17] Reedy, heh, another private wiki email address issue? :) [21:22:26] Indeed [21:23:27] oh, I see the other channel [21:23:32] not quite the same thing that I had in mind [21:23:45] (03CR) 10RobH: "RT's migration is blocked (needs mail relays) until this task is completed: https://phabricator.wikimedia.org/T118176" [puppet] - 10https://gerrit.wikimedia.org/r/250047 (https://phabricator.wikimedia.org/T119112) (owner: 10Dzahn) [21:23:59] (I once had to modify an officewiki email address because it contained a typo, making the account useless) [21:24:10] (03CR) 10Catrope: [C: 032] Enable Flow opt-in beta feature on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258101 (https://phabricator.wikimedia.org/T120829) (owner: 10Catrope) [21:24:18] (03CR) 10Catrope: [C: 032] Allow all logged-in users to create Flow boards on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258104 (https://phabricator.wikimedia.org/T120468) (owner: 10Catrope) [21:24:23] (03CR) 10Catrope: [C: 032] Stop overriding Echo's EventLogging revision ids 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/257826 (owner: 10Legoktm) [21:24:28] (03CR) 10Catrope: [C: 032] Add infrastructure for $wmgEchoUseCrossWikiTrackingTable and enable it on testwiki and test2wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258201 (owner: 10Catrope) [21:24:38] bit easier than I would like to create an account no one but sysadmins can fix [21:24:50] (03Merged) 10jenkins-bot: Enable Flow opt-in beta feature on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258101 (https://phabricator.wikimedia.org/T120829) (owner: 10Catrope) [21:25:17] (03Merged) 10jenkins-bot: Allow all logged-in users to create Flow boards on mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258104 (https://phabricator.wikimedia.org/T120468) (owner: 10Catrope) [21:25:37] (03Merged) 10jenkins-bot: Stop overriding Echo's EventLogging revision ids [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257826 (owner: 10Legoktm) [21:25:58] (03Merged) 10jenkins-bot: Add infrastructure for $wmgEchoUseCrossWikiTrackingTable and enable it on testwiki and test2wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258201 (owner: 10Catrope) [21:26:31] (03CR) 10coren: "I would be much more comfortable with it if there was /some/ method by which not just /any/ certificate was blindly accepted." [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:26:57] (03CR) 10Yuvipanda: "Suggestions welcome. 
I think this is an improvement over current status quo :)" [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:27:08] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: Echo and Flow config patches (duration: 00m 29s) [21:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:27:14] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [21:27:25] YuviPanda: I can think of a couple solutions for specific use cases, but not of one that is generally useful. [21:27:45] (03CR) 10Yuvipanda: "One option is to have a hiera variable that specifies the puppetmaster's fingerprint and switch only if that matches. I think that'll be s" [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:27:50] (I.e.: switching the 'true' puppetmaster we know we have the new cert handy, switching to self-hosted you can at least validate the host) [21:28:05] !log catrope@tin Synchronized wmf-config/CommonSettings.php: SWAT: Echo and Flow config patches (duration: 00m 29s) [21:28:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:29:53] (03CR) 10coren: "Yep; matching the fingerprint would make a great deal of sense. 
Check that only if the cert mismatch and the semantics of this change fro" [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) [21:32:35] (03CR) 10Catrope: [C: 032] Set initial Staff password policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222057 (https://phabricator.wikimedia.org/T104370) (owner: 10CSteipp) [21:33:27] (03Merged) 10jenkins-bot: Set initial Staff password policy [mediawiki-config] - 10https://gerrit.wikimedia.org/r/222057 (https://phabricator.wikimedia.org/T104370) (owner: 10CSteipp) [21:34:58] (03PS1) 10Cmjohnson: Adding production dns for rdb1005/1006 bug: task# T119543 [dns] - 10https://gerrit.wikimedia.org/r/258228 [21:35:11] !log catrope@tin Synchronized wmf-config/CommonSettings.php: New password policy for staff group (duration: 00m 28s) [21:35:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:51] csteipp: ---^^ [21:36:02] csteipp: Please confirm that that's working, if you can [21:36:02] RoanKattouw: Thanks! [21:37:56] (03CR) 10Catrope: [C: 032] Remove never-used VisualEditorBetaInTab config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257801 (owner: 10Jforrester) [21:38:09] (03CR) 10Catrope: [C: 032] Remove always-used VisualEditorShowBetaWelcome config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257802 (owner: 10Jforrester) [21:38:54] (03Merged) 10jenkins-bot: Remove never-used VisualEditorBetaInTab config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257801 (owner: 10Jforrester) [21:39:13] (03Merged) 10jenkins-bot: Remove always-used VisualEditorShowBetaWelcome config option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257802 (owner: 10Jforrester) [21:41:06] (03CR) 10coren: "Some notes inline. nothing big." 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [21:41:11] 6operations, 6Phabricator: migrate RT maint-announce into phabricator - https://phabricator.wikimedia.org/T118176#1870845 (10RobH) I have to put too much info regarding aliases for this to remain in public domain. [21:42:17] (03CR) 10Cmjohnson: [C: 032] Adding production dns for rdb1005/1006 bug: task# T119543 [dns] - 10https://gerrit.wikimedia.org/r/258228 (owner: 10Cmjohnson) [21:42:33] (03PS3) 10EBernhardson: A/B test for search lang detect via accept-language [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258084 (https://phabricator.wikimedia.org/T119528) [21:43:18] (03CR) 10Ori.livneh: toollabs: migrate to redis::instance (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [21:44:50] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/Gadgets/: Fix MediaWiki:MediaWiki: (duration: 00m 29s) [21:44:54] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:45:21] (03PS3) 10Jforrester: Allow VisualEditor feedback pages to be centralised [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258205 [21:45:37] (03CR) 10coren: [C: 031] "As far as I can tell, this covers all references in the manifest. (The filenames continue to use a dash but I expect that's not an issue)" [puppet] - 10https://gerrit.wikimedia.org/r/258057 (owner: 10Dzahn) [21:46:23] (03CR) 10Catrope: [C: 032] Allow VisualEditor feedback pages to be centralised [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258205 (owner: 10Jforrester) [21:46:32] !log restbase: canary deploy of 7398850fe9 to restbase1001 [21:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:47:13] ostriches: hmm, github replication broken? 
https://github.com/wikimedia/mediawiki-extensions-NotebookViewer is 404 gerrit repo exists [21:47:19] (03Merged) 10jenkins-bot: Allow VisualEditor feedback pages to be centralised [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258205 (owner: 10Jforrester) [21:47:49] "Surprise SWAT"? :o [21:48:08] thanks RoanKattouw [21:48:10] legoktm: This evening's SWAT had 14 patches. Greg was lovely and agreed to do an early one. [21:48:13] YuviPanda: I'm off today, sick. Not broken. Auto Repo creation is broke [21:48:17] (And RoanKattouw was lovely too, of course.) [21:48:26] Ergo repo needs creation [21:48:26] ostriches: <3 ok take care [21:48:44] Repos get man-made now [21:49:29] !log catrope@tin Synchronized wmf-config/: SWAT: VisualEditor config patches (duration: 00m 30s) [21:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:49:39] !log restbase: starting full deploy of 7398850fe9 to restbase cluster [21:49:42] ostriches: ah, ok. I'll just create it then [21:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:50:41] (03PS1) 10EBernhardson: Bump portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258246 [21:53:08] (03CR) 10Dzahn: [C: 04-1] "@Coren thank you! no change was intended, only moving the existing role classes around. must have been a rebase issue, i need amend" [puppet] - 10https://gerrit.wikimedia.org/r/255082 (owner: 10Yuvipanda) [21:53:54] (03CR) 10Yuvipanda: "No I think I explicitly removed them. They aren't really being used, and removing them won't change the current instances. 
New instances d" [puppet] - 10https://gerrit.wikimedia.org/r/255082 (owner: 10Yuvipanda) [21:56:40] !log restbase: finished full deploy of 7398850fe9 to restbase cluster [21:56:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:05:37] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/VisualEditor/: SWAT (duration: 00m 47s) [22:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:06:19] (03PS1) 10Dzahn: site: add technetium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/258354 (https://phabricator.wikimedia.org/T118763) [22:06:45] (03CR) 10coren: [C: 031] "Works for me, then." [puppet] - 10https://gerrit.wikimedia.org/r/257534 (owner: 10Ori.livneh) [22:07:20] (03PS2) 10Dzahn: site: add technetium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/258354 (https://phabricator.wikimedia.org/T118763) [22:07:51] (03CR) 10Dzahn: [C: 032] site: add technetium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/258354 (https://phabricator.wikimedia.org/T118763) (owner: 10Dzahn) [22:09:26] 6operations, 6Services: reinstall OCG servers - https://phabricator.wikimedia.org/T84723#1870907 (10Dzahn) 5Open>3stalled [22:12:18] (03PS4) 10Jforrester: Centralise all VisualEditor feedback pages except for a few wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258206 (https://phabricator.wikimedia.org/T92661) [22:12:30] (03CR) 10coren: [C: 031] "This is very nice, and works on my local puppet install. 
That said, this very much needs an extra pair of eyes on it because I don't know" [puppet] - 10https://gerrit.wikimedia.org/r/249489 (https://phabricator.wikimedia.org/T116813) (owner: 10Merlijn van Deen) [22:13:07] (03CR) 10coren: [C: 031] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/242044 (https://phabricator.wikimedia.org/T114063) (owner: 10Yuvipanda) [22:13:53] 6operations: ftpsync@carbon - mirror sync - ERROR - https://phabricator.wikimedia.org/T100482#1870917 (10Dzahn) just checked on carbon if this is still here, and it's not.. must have been fixed at some point. not sure how but it's gone [22:14:16] 6operations: ftpsync@carbon - mirror sync - ERROR - https://phabricator.wikimedia.org/T100482#1870918 (10Dzahn) 5Open>3Resolved a:3Dzahn [22:15:25] 6operations, 6Security, 7Monitoring: use rkhunter/chkrootkit and have monitoring for it - https://phabricator.wikimedia.org/T82288#1870921 (10Dzahn) 5Open>3declined a:3Dzahn [22:15:27] 6operations, 6Security, 7Monitoring: use rkhunter/chkrootkit and have monitoring for it - https://phabricator.wikimedia.org/T82288#898979 (10Dzahn) [22:17:17] 6operations, 10Reading-Web, 7Varnish: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1870935 (10Dzahn) Should we delete it from DNS or redirect it? Will there ever be an actual mobile site? probably not, right? 
[22:18:06] 6operations, 6Labs, 10Labs-Infrastructure, 10Reading-Web, 7Mobile: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1870936 (10Dzahn) [22:22:33] 6operations, 7HTTPS: SSL cert needed for benefactorevents.wikimedia.org - https://phabricator.wikimedia.org/T115028#1870941 (10Dzahn) [22:23:09] 6operations: ssh connection to wikimedia cluster is slooooooow for me - https://phabricator.wikimedia.org/T119275#1870944 (10Dzahn) 5Open>3Resolved a:3Dzahn 14:24 < Reedy> I thought it was an ipv6 issue 14:25 < Reedy> It seems ok 14:25 < Reedy> We can probably close it for now [22:24:20] 6operations, 7HTTPS: SSL cert needed for benefactorevents.wikimedia.org - https://phabricator.wikimedia.org/T115028#1712876 (10Dzahn) @Bblack @Robh thoughts on this? [22:25:32] legoktm: poke [22:25:39] matanya: hey [22:25:50] I'd like to solve https://phabricator.wikimedia.org/T119998 [22:26:13] can you please guide me to the right conf file ? i can push the fix if pointed at [22:26:18] 6operations, 7HTTPS: SSL cert needed for benefactorevents.wikimedia.org - https://phabricator.wikimedia.org/T115028#1870962 (10RobH) a:3BBlack Brandon would need to weigh in on this, I just order them. [22:27:28] so legoktm i looked at mediawiki-config but didn't find any pointers to this [22:27:41] matanya: https://tools.wmflabs.org/robin/?tool=uploadconfig "Disabled due to lack of MediaWiki:Licenses" [22:28:00] matanya: they just need to create "MediaWiki:Licenses" on the local wiki and uploads will be enabled again [22:28:12] with some actual licenses hopefully :) [22:28:25] that's all ? 
[22:28:42] yeah [22:29:02] https://no.wikimedia.org/wiki/MediaWiki:Licenses [22:29:45] hmm [22:30:01] 6operations, 6Labs, 10Labs-Infrastructure, 10Reading-Web, 7Mobile: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1870974 (10Krenair) Well, wikitech has MobileFrontend installed: https://wikitech.wikimedia.org/wiki/?useformat=mobile That domain w... [22:30:20] matanya: for whatever reason, that isn't good enough... [22:30:30] https://no.wikimedia.org/wiki/Spesial:Last_opp To be able to use this special page to upload to this wiki, an administrator needs to add one or more license options to the page MediaWiki:Licenses. [22:30:30] Use the following format: * Template name|Label. Use any text to enable uploading without license options. [22:30:39] (03PS1) 10GWicke: Minor / restbase config: Explicitly set the return status for robots.txt [puppet] - 10https://gerrit.wikimedia.org/r/258356 [22:30:40] 6operations, 6Labs, 10Reading-Web, 10wikitech.wikimedia.org: [Regression] Unable to browse certain wikitech.wikimedia.org urls from mobile device (Apache error) - https://phabricator.wikimedia.org/T120528#1870977 (10Krenair) [22:30:49] 6operations, 6Labs, 10Labs-Infrastructure, 10Reading-Web, and 2 others: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1870980 (10Krenair) [22:31:38] legoktm: i changed it to any text, and still failing [22:32:23] 6operations, 6Labs, 10Labs-Infrastructure, 10Reading-Web, and 2 others: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1855709 (10Krenair) See T87633 [22:32:39] matanya: I'm not sure then, I can take a more detailed look in a bit [22:32:40] 6operations, 6Labs, 10Labs-Infrastructure, 10Reading-Web, and 2 others: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#1870989 (10Krenair) [22:32:51] thanks 
legoktm [22:34:31] proof that deployment sucks: Notice: Undefined variable: wmgVisualEditorConsolidateFeedback in /srv/mediawiki/wmf-config/CommonSettings.php on line 2033 https://github.com/wikimedia/operations-mediawiki-config/commit/646b1441d0ef8 [22:34:36] https://logstash.wikimedia.org/#/dashboard/elasticsearch/hhvm [22:34:40] Syncing to live servers [22:34:44] No such thing as atomicity that way [22:35:10] sync IS [22:35:12] sync CS [22:35:14] ... [22:35:15] profit [22:38:26] 6operations, 7HTTPS: SSL cert needed for benefactorevents.wikimedia.org - https://phabricator.wikimedia.org/T115028#1871008 (10BBlack) Yeah @CCogdill_WMF and I talked about this before and we're planning to do it, I just haven't had time to get back around to this yet. This will be a case where we order and s... [22:41:25] (03CR) 10Cmjohnson: [C: 032] Minor / restbase config: Explicitly set the return status for robots.txt [puppet] - 10https://gerrit.wikimedia.org/r/258356 (owner: 10GWicke) [22:41:35] Reedy: Yeah [22:42:00] If C wasn't before I, we might not have this problem :) [22:43:34] Maybe we should rename InitialiseSettings.php to SpecificSettings.php? [22:43:40] Then it would be after. :_0 [22:43:56] `git blame` would hate me. [22:45:08] James_F: -M -w to git blame will ignore both whitespace changes and moves [22:45:18] will track even just moving blocks of code inside the same file [22:45:19] YuviPanda: Sure, and --follow will help too. [22:45:30] YuviPanda: But people won't use the flags by default. [22:45:37] indeed [22:45:50] as other channels note, everything sucks, including git's UI [22:46:16] * James_F grins.
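[editor's note] The "sync IS, sync CS" exchange above describes an ordering problem: CommonSettings.php (CS) reads a variable that InitialiseSettings.php (IS) defines, so if CS reaches a live server first, requests hit the "Undefined variable" notice until IS arrives. The snippet below is only a minimal sketch of that idea, not how the sync tooling works. The `DEPENDS_ON` map and the `sync_order` helper are hypothetical names invented for illustration, and the assumption that a plain directory sync visits files alphabetically is inferred from the "If C wasn't before I" remark rather than confirmed by the log.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (not taken from any real tool): each file maps
# to the set of files that must already be live before it is synced.
DEPENDS_ON = {
    "CommonSettings.php": {"InitialiseSettings.php"},
    "InitialiseSettings.php": set(),
}


def sync_order(depends_on):
    """Return the files ordered so that every dependency comes before its user."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Alphabetical order puts CommonSettings.php first, which is the
    # problematic order joked about in the channel.
    print("alphabetical:    ", sorted(DEPENDS_ON))
    # Dependency-first order matches the manual "sync IS, then sync CS" advice.
    print("dependency-first:", sync_order(DEPENDS_ON))
```

Ordering the files only narrows the window in which a server has a mismatched pair; it does not make the deployment atomic, which is the underlying complaint in the channel.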
[22:47:19] "As a git user I want to spend a lot of time learning it so I can feel smug about knowing more of those things than all the other people who have not spent the time to dive into it' [22:47:24] s/'/"/ [22:48:07] 6operations, 6Reading-Admin, 10Reading-Community-Engagement: UX strategic test: redirect small portion of unauthenticated desktop users to mobile web - https://phabricator.wikimedia.org/T117826#1871047 (10BBlack) So, in technical terms at the varnish level, by the time we're working on this we'll have mobile... [22:48:09] !log Re-deployed a bunch of security patches for wmf8 [22:48:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:48:45] YuviPanda: I was told off for complaining that you can't do `git filter-branch` in the UI interface of GitHub. Apparently I ask too much for what other people don't consider a simple operation. :-) [22:49:08] git filter-branch is a bit dangerous since it rewrites history intensely i guess [22:49:21] so clearly you must be hazed before you can be trusted with it! [22:49:57] (03PS1) 10Cmjohnson: Adding dhcp entries and add setting uprdb105/6 to intall mw.cfg bug: task# T119543 [puppet] - 10https://gerrit.wikimedia.org/r/258360 [22:51:01] (03CR) 10Cmjohnson: [C: 032] Adding dhcp entries and add setting uprdb105/6 to intall mw.cfg bug: task# T119543 [puppet] - 10https://gerrit.wikimedia.org/r/258360 (owner: 10Cmjohnson) [22:51:54] (03PS1) 10RobH: new unified cluster cert [puppet] - 10https://gerrit.wikimedia.org/r/258362 [22:51:59] bblack: ^ all yours [22:52:06] oh [22:52:10] is this including w.wiki? [22:52:12] legoktm: ^ [22:53:00] 6operations, 10ops-eqiad, 5Patch-For-Review: rack/setup/deploy rdb1005 & rdb1006 - https://phabricator.wikimedia.org/T119543#1871062 (10Cmjohnson) [22:53:04] YuviPanda: yes [22:53:18] wooo! 
[22:53:20] \o/ [22:53:20] the cert does, i cannot speak to backend other things [22:53:22] but yep [22:53:23] right [22:53:27] but woooo \o/ [22:53:57] i had to add w.wiki to our domain listing to do it, its most def on there [22:54:11] robh: did you disable puppet on carbon? [22:54:17] nope [22:54:18] bblack ? [22:54:29] cmjohnson1: no one put a reason in the disable eh? [22:54:39] nope [22:54:42] folks seem to forget they can but comments in =P [22:54:48] and should SAL it as well [22:55:45] i wonder if I should re-enable or leave it till tomorrow ... [22:56:42] the last sal about carbon was mortiz updating openssl on the 7th [22:56:46] has it been disabled since then? [22:56:51] yeah it has it [22:57:00] we have a whole deploy process to do though, and verification before that [22:58:00] sorry that answer was about w.wiki [22:58:12] not about whatever above [22:58:20] yeah [22:59:49] cmjohnson1: if its been since then id say they forgot to renable but dunno [23:00:29] yeah....dunno either. I will wait till tomorrow. I have to go soon anyway...have to get to practice [23:05:50] (03PS1) 10Aaron Schulz: [WIP] Configure $wgCdnReboundPurgeDelay [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258365 [23:09:29] robh: yaaaay thanks :D [23:10:21] (03PS2) 10Krinkle: Remove profiler config variables that no longer exist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/257437 [23:14:14] (03PS9) 10MaxSem: WIP: OSM replication for maps [puppet] - 10https://gerrit.wikimedia.org/r/254490 (https://phabricator.wikimedia.org/T110262) [23:14:33] fuck [23:14:50] so git-review works on this repo but not on GeoData... 
[23:15:22] (03CR) 10jenkins-bot: [V: 04-1] WIP: OSM replication for maps [puppet] - 10https://gerrit.wikimedia.org/r/254490 (https://phabricator.wikimedia.org/T110262) (owner: 10MaxSem) [23:17:14] (03PS2) 10Aaron Schulz: [WIP] Configure $wgCdnReboundPurgeDelay [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258365 (https://phabricator.wikimedia.org/T113192) [23:20:19] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 29s) [23:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:23:56] (03PS1) 10Alex Monk: Update interwiki.cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258375 (https://phabricator.wikimedia.org/T120937) [23:24:26] (03CR) 10Alex Monk: [C: 032] "Already in prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258375 (https://phabricator.wikimedia.org/T120937) (owner: 10Alex Monk) [23:25:32] (03Merged) 10jenkins-bot: Update interwiki.cdb [mediawiki-config] - 10https://gerrit.wikimedia.org/r/258375 (https://phabricator.wikimedia.org/T120937) (owner: 10Alex Monk) [23:25:52] 6operations, 6Release-Engineering-Team, 10Wikimedia-Apache-configuration: Make it possible to quickly and programmatically pool and depool application servers - https://phabricator.wikimedia.org/T73212#760100 (10bd808) [23:25:55] 6operations, 10Deployment-Systems, 6Performance-Team, 7HHVM, 3Scap3: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#1414314 (10bd808) [23:27:30] !log ori@tin Synchronized php-1.27.0-wmf.8/extensions/WikimediaEvents: Ia44ec5ed4: Updated mediawiki/core Project: mediawiki/extensions/WikimediaEvents 152ecb10311bb04f4f2f91775cf821aff14aa327 (duration: 00m 30s) [23:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:30:07] ori: RoanKattouw's about to emergency-deploy a couple of fatals fixes. Clear or do you need him to hold off? 
[23:30:19] * RoanKattouw waits for Jenkins [23:30:25] That too. ;-) [23:31:59] Is the Math one that important? [23:35:00] clear [23:38:25] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/Echo: Fix errors (duration: 00m 29s) [23:38:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:38:56] !log catrope@tin Synchronized php-1.27.0-wmf.8/extensions/Math: Fix errors (duration: 00m 29s) [23:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:39:34] 'fix errors'? [23:39:52] isn't that the goal of most commits? :) [23:39:55] hehe [23:40:32] supposedly [23:40:34] haha yeah [23:40:37] PHP fatals/notices in this case [23:49:45] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: puppet fail [23:50:05] The ImageMap error was pretty spammy too :P [23:54:45] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: puppet fail [23:55:56] !log restbase: deploy 9657c4e [23:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:58:02] ori: Hm.. that wikimediaevents change worked without the core change? [23:58:24] Oh it did get backported. Didn't show up in SAL though [23:59:22] I silenced it, since i figure one log message for that change was enough