[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T0000).
[00:00:04] subbu, tgr, and Amir1: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:28] o/
[00:00:39] o/
[00:02:05] (PS1) Andrew Bogott: horizon: managehome => true for horizon service user [puppet] - https://gerrit.wikimedia.org/r/408720
[00:03:29] o/
[00:09:20] anyone swatting? :)
[00:10:04] if no one is doing it, I do it
[00:13:36] okay let's do it
[00:14:15] ty :)
[00:15:07] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3951131 (Dzahn) Confirmed i saw the new signature now. I added bstorm to the .users file and then gpg --clearsign'ed the .users file Then i re-encrypted all (ops) files with...
[00:15:28] Krinkle: You said the swagger spec is enwiki-only...where does that enwiki get defined?
[00:15:35] I see how it's passed to the service checker script
[00:15:37] :)
[00:16:07] no_justification: Presumably something somewhere will need to know where to fetch the swagger file, and assuming it gets it over http, it'll be in the hostname there as en.wikipedia.org
[00:16:50] (PS2) Ladsgroup: Enable RemexHtml on wikis with < 10 errors in all high-priority categories [mediawiki-config] - https://gerrit.wikimedia.org/r/407706 (https://phabricator.wikimedia.org/T184656) (owner: Subramanya Sastry)
[00:16:52] Ahhh, so what we need is to have it in all the project docroots.
[00:17:05] And like you said: swap out "Main Page" for isMainPage or w/e?
[00:17:16] subbu: Do I need to do any sort of special stuff?
[00:17:33] like creating tables, running maintenance script, etc?
[00:17:41] for my patch, it is just config deploy.
[00:17:50] put it on mwdebug1002 first.
[00:18:23] okay
[00:18:44] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3951145 (Dzahn)
[00:19:17] (CR) Ladsgroup: [C: 2] Enable RemexHtml on wikis with < 10 errors in all high-priority categories [mediawiki-config] - https://gerrit.wikimedia.org/r/407706 (https://phabricator.wikimedia.org/T184656) (owner: Subramanya Sastry)
[00:20:02] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917475 (Dzahn) Open>Resolved Looks like we are all done. If there are any issues or things missing, please just reopen it.
[00:20:09] zuul is crazy
[00:20:19] (so many patchsets)
[00:21:07] (Merged) jenkins-bot: Enable RemexHtml on wikis with < 10 errors in all high-priority categories [mediawiki-config] - https://gerrit.wikimedia.org/r/407706 (https://phabricator.wikimedia.org/T184656) (owner: Subramanya Sastry)
[00:21:23] (CR) jenkins-bot: Enable RemexHtml on wikis with < 10 errors in all high-priority categories [mediawiki-config] - https://gerrit.wikimedia.org/r/407706 (https://phabricator.wikimedia.org/T184656) (owner: Subramanya Sastry)
[00:21:36] (PS2) Dzahn: gerrit: Move header styles back out of login section [puppet] - https://gerrit.wikimedia.org/r/408611 (owner: Krinkle)
[00:21:55] (CR) Dzahn: [C: 2] gerrit: Move header styles back out of login section [puppet] - https://gerrit.wikimedia.org/r/408611 (owner: Krinkle)
[00:22:45] subbu: your patch is in wmdebug1002
[00:22:51] *mwdebug1002
[00:23:09] ty. testing.
[00:24:22] no_justification: I see lots of 28 Catchable fatal error: Argument 2 passed to MWHttpRequest::factory() must be an instance of array, null given in /srv/mediawiki/php-1.31.0-wmf.20/extensions/ExtensionDistributor/includes/providers/GerritExtDistProvider.php on line 42 in fatalmonitor
[00:24:22] Amir1, tests are good ... if there are no errors / warnings in the logs related to this, good to go.
[00:24:29] is it good?
[00:24:48] Amir1: There's a fix I just backported to wmf.20 for that
[00:24:56] cool
[00:24:59] thanks
[00:25:17] If you're doing swat, pull in wmf.20 for core and sync out MWHttpRequest
[00:25:18] :)
[00:25:59] no_justification: sure
[00:26:09] let me get this deployed first and then get to it
[00:26:13] :)
[00:26:21] (CR) Dzahn: [C: 2] gerrit: Apply class=loginParent earlier on login page. [puppet] - https://gerrit.wikimedia.org/r/408612 (https://phabricator.wikimedia.org/T185506) (owner: Krinkle)
[00:26:29] (PS2) Dzahn: gerrit: Apply class=loginParent earlier on login page. [puppet] - https://gerrit.wikimedia.org/r/408612 (https://phabricator.wikimedia.org/T185506) (owner: Krinkle)
[00:27:50] (CR) Dzahn: [C: 2] gerrit: Scope login-specific styles to loginParent [puppet] - https://gerrit.wikimedia.org/r/408613 (https://phabricator.wikimedia.org/T185506) (owner: Krinkle)
[00:27:57] (PS2) Dzahn: gerrit: Scope login-specific styles to loginParent [puppet] - https://gerrit.wikimedia.org/r/408613 (https://phabricator.wikimedia.org/T185506) (owner: Krinkle)
[00:28:43] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:407706|Enable RemexHtml on wikis with < 10 errors in all high-priority categories (T184656)]] (duration: 01m 09s)
[00:28:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:57] T184656: Enable RemexHTML on additional wikis with < 10 high-priority linter errors in all linter categories - https://phabricator.wikimedia.org/T184656
[00:30:22] Amir1, \o/ ty.
[00:31:08] thanks for deploying with releng, please keep the logs clean
[00:31:17] no_justification: syncing yours
[00:31:30] tgr: around?
[00:31:48] !log ladsgroup@tin Synchronized php-1.31.0-wmf.20/includes/http/MWHttpRequest.php: [[gerrit:408718|MWHttpRequest: Restore ability to pass null for $options]] (duration: 01m 11s)
[00:31:49] Amir1: here
[00:32:05] deploying yours next, does it require any order or any special work?
[00:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:32:16] Amir1: Ty!
[00:32:16] like running maintenance script
[00:32:28] yw :)
[00:32:49] Amir1: no
[00:34:07] (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/407872 (https://phabricator.wikimedia.org/T180921) (owner: Gergő Tisza)
[00:37:06] (PS2) Ladsgroup: Support fallback values for referrer policy [mediawiki-config] - https://gerrit.wikimedia.org/r/407872 (https://phabricator.wikimedia.org/T180921) (owner: Gergő Tisza)
[00:37:43] (CR) Ladsgroup: [C: 2] Support fallback values for referrer policy [mediawiki-config] - https://gerrit.wikimedia.org/r/407872 (https://phabricator.wikimedia.org/T180921) (owner: Gergő Tisza)
[00:38:38] sorry, I completely forgot this rebase thing, happens all the time
[00:40:59] (Merged) jenkins-bot: Support fallback values for referrer policy [mediawiki-config] - https://gerrit.wikimedia.org/r/407872 (https://phabricator.wikimedia.org/T180921) (owner: Gergő Tisza)
[00:41:09] (CR) jenkins-bot: Support fallback values for referrer policy [mediawiki-config] - https://gerrit.wikimedia.org/r/407872 (https://phabricator.wikimedia.org/T180921) (owner: Gergő Tisza)
[00:42:10] tgr: it's on mwdebug1002
[00:43:04] Amir1: works
[00:43:12] ack.
[00:44:17] (PS2) Ladsgroup: Enable AICaptcha data collection everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/408364 (https://phabricator.wikimedia.org/T186244) (owner: Gergő Tisza)
[00:45:18] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:407872|Support fallback values for referrer policy (T180921)]] (duration: 01m 12s)
[00:45:23] (CR) Dzahn: [C: 1] ircecho: Support ssl when connecting to irc [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[00:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:32] T180921: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921
[00:46:34] how long are wiki pages cached in Varnish?
[00:46:39] (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/408364 (https://phabricator.wikimedia.org/T186244) (owner: Gergő Tisza)
[00:49:26] (Merged) jenkins-bot: Enable AICaptcha data collection everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/408364 (https://phabricator.wikimedia.org/T186244) (owner: Gergő Tisza)
[00:49:40] (CR) jenkins-bot: Enable AICaptcha data collection everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/408364 (https://phabricator.wikimedia.org/T186244) (owner: Gergő Tisza)
[00:50:10] Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3951223 (Papaul) Open>Resolved
[00:50:43] tgr: it's up in mwdebug1002
[00:51:19] arg, more broken RL messages
[00:52:19] (CR) Dzahn: ircecho: Support ssl when connecting to irc (3 comments) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[00:52:26] should I revert?
[00:52:41] this came up yesterday, thcipriani used https://gist.github.com/thcipriani/869dc99a53dba8c99eaaf3cbcad1b8f5 to fix it
[00:52:54] except it would be group2 now
[00:53:29] or you can just revert if you want to finish the last patch in time, and I'll find another window for this
[00:53:41] should have thought of checking the messages for group2
[00:55:24] Krinkle hi, the styles are not applying on the iPhone
[00:55:24] it's updating these wikis
[00:55:27] https://gerrit.wikimedia.org/r/login/%23%2Fc%2F405591%2F
[00:55:50] group2 is a lot of wikis but this is rather fast
[00:55:58] paladox: Did you refresh?
[00:56:00] now bewiki
[00:56:44] Krinkle: yep
[00:56:48] (CR) Paladox: ircecho: Support ssl when connecting to irc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[00:56:50] Or is it cached?
[00:56:51] paladox: screenshot?
[00:56:58] Yep
[00:57:43] (PS1) Papaul: Partman: Add tendril2001 to partman recipe [puppet] - https://gerrit.wikimedia.org/r/408731
[00:58:16] Seems to work in simulation - https://i.imgur.com/bcMHtO8.png
[00:58:19] (CR) jerkins-bot: [V: -1] Partman: Add tendril2001 to partman recipe [puppet] - https://gerrit.wikimedia.org/r/408731 (owner: Papaul)
[00:58:26] Krinkle: https://phabricator.wikimedia.org/F13158016
[00:59:46] tgr: mlwiki now
[00:59:52] will end really soon
[01:00:23] cool
[01:00:29] besides that is there anything else? Is it okay to move forward or do you want to test again?
[01:00:31] enwiki is fixed so it seems to work
[01:00:40] okay
[01:01:32] Krinkle: hmm it works in private mode
[01:01:37] other than the RL cache issue, works as expected
[01:01:56] paladox: tried on iPhone SE and 5S in person just now.
[01:01:59] Works without private mode
[01:02:01] probably cache
[01:02:09] Yeh
[01:02:28] I was looking for the cache but found nothing under gerrit.wikimedia.org
[01:02:37] Yeah, me neither.
[01:02:40] Though I found wikimedia.org (14mb)
[01:02:46] Might be varnish misc
[01:03:10] Gerrit isn't behind varnish
[01:03:15] Yeah
[01:03:16] It uses letsencrypt
[01:03:18] but the server cache is short
[01:03:29] okay the cache invalidation is done
[01:03:31] Oh
[01:03:32] and not the issue as otherwise we'd see it on all devices
[01:03:33] moving forward
[01:03:51] paladox: Looks like Gerrit is sending "Cache-Control: private, max-age=31536000"
[01:03:55] which means no cache in Varnish
[01:04:00] and 1 year cache in your browser
[01:04:02] that's... not very good
[01:04:12] I guess it's expected to change the url if it changes
[01:05:07] Krinkle: works now
[01:05:16] Krinkle: oh
[01:05:27] That's probably gerrit sending that
[01:05:35] (I restarted the iPhone X)
[01:05:38] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:408364|Enable AICaptcha data collection everywhere (T186244)]] (duration: 01m 11s)
[01:05:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:05:51] tgr: Live everywhere, please take a look
[01:05:52] T186244: Deploy AICaptcha data collection - https://phabricator.wikimedia.org/T186244
[01:05:57] (PS1) Krinkle: gerrit: Invalidate gerritLogin.js cache [puppet] - https://gerrit.wikimedia.org/r/408732
[01:06:10] mutante: paladox: ^
[01:06:12] (PS2) Ladsgroup: Enable fine grained usage tracking, another batch. [mediawiki-config] - https://gerrit.wikimedia.org/r/408624 (https://phabricator.wikimedia.org/T186645)
[01:06:38] looks good, thanks Amir1!
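Krinkle's reading of the Gerrit header (no storage in Varnish, one year in the browser) can be reproduced with the minimal Python sketch below. It is a simplified parser written for this note, not Gerrit's or Varnish's actual logic. It looks only at the `private`, `no-store`, and `max-age` directives and ignores `no-cache`, `s-maxage`, and the rest of the HTTP caching rules. The names `CachePolicy`, `parse_cache_control`, and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachePolicy:
    shared_cacheable: bool          # may a shared cache such as Varnish store it?
    browser_max_age: Optional[int]  # seconds a browser may reuse it without revalidating


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def classify(header: str) -> CachePolicy:
    """Simplified reading: only private, no-store and max-age are considered."""
    d = parse_cache_control(header)
    # "private" and "no-store" both keep shared caches (e.g. Varnish) from storing it.
    shared = "private" not in d and "no-store" not in d
    # "no-store" also rules out the browser cache; otherwise max-age sets its lifetime.
    raw = d.get("max-age")
    browser = int(raw) if raw and raw.isdigit() and "no-store" not in d else None
    return CachePolicy(shared_cacheable=shared, browser_max_age=browser)


if __name__ == "__main__":
    policy = classify("private, max-age=31536000")
    print(policy.shared_cacheable)          # False
    print(policy.browser_max_age // 86400)  # 365 (days)
```

A one-year browser lifetime means a changed file generally needs a new URL before existing clients fetch it again. This matches Krinkle's remark and the cache-invalidation patch that follows.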
[01:06:42] Thank you
[01:06:59] Thank you for your great work tgr, keep it up
[01:07:17] my patch is not testable
[01:07:23] Lenovo says i should immediately stop using my computer
[01:07:33] because there might be a loose screw that makes it go up in flames
[01:07:48] so i gotta go and schedule an appointment to have that screw checked
[01:07:58] (CR) Paladox: [C: 1] gerrit: Invalidate gerritLogin.js cache [puppet] - https://gerrit.wikimedia.org/r/408732 (owner: Krinkle)
[01:08:49] mutante the screw can break your battery I think
[01:09:06] (CR) Dzahn: [C: 2] gerrit: Invalidate gerritLogin.js cache [puppet] - https://gerrit.wikimedia.org/r/408732 (owner: Krinkle)
[01:09:09] (CR) Ladsgroup: [C: 2] Enable fine grained usage tracking, another batch. [mediawiki-config] - https://gerrit.wikimedia.org/r/408624 (https://phabricator.wikimedia.org/T186645) (owner: Ladsgroup)
[01:10:53] (Merged) jenkins-bot: Enable fine grained usage tracking, another batch. [mediawiki-config] - https://gerrit.wikimedia.org/r/408624 (https://phabricator.wikimedia.org/T186645) (owner: Ladsgroup)
[01:11:18] (CR) jenkins-bot: Enable fine grained usage tracking, another batch. [mediawiki-config] - https://gerrit.wikimedia.org/r/408624 (https://phabricator.wikimedia.org/T186645) (owner: Ladsgroup)
[01:14:22] !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:408624|Enable fine grained usage tracking, another batch. (T186645)]] (duration: 01m 11s)
[01:14:33] (CR) Paladox: ircecho: Support ssl when connecting to irc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[01:14:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:14:34] T186645: Enable lua fine grained usage tracking -Early February 2018 batch - https://phabricator.wikimedia.org/T186645
[01:14:41] SWAT is done
[01:25:15] (CR) Dzahn: ircecho: Support ssl when connecting to irc (2 comments) [puppet] - https://gerrit.wikimedia.org/r/405591 (owner: Paladox)
[01:29:00] Operations, Gerrit, Patch-For-Review, Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3951289 (Krinkle) Open>Resolved
[01:29:06] Operations, Gerrit, Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3917733 (Krinkle)
[01:29:55] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951291 (Krinkle) Open>stalled
[01:40:54] (PS2) Dzahn: Partman: Add tendril2001 to partman recipe [puppet] - https://gerrit.wikimedia.org/r/408731 (https://phabricator.wikimedia.org/T186123) (owner: Papaul)
[01:41:09] (CR) Dzahn: "fixed commit message to make jenkins-bot vote +1" [puppet] - https://gerrit.wikimedia.org/r/408731 (https://phabricator.wikimedia.org/T186123) (owner: Papaul)
[01:42:23] (CR) Dzahn: "i'm afraid the naming thing hasn't been decided though (on ticket)" [puppet] - https://gerrit.wikimedia.org/r/408731 (https://phabricator.wikimedia.org/T186123) (owner: Papaul)
[02:01:20] (PS4) Krinkle: Improve wmf-config file documentation headers [mediawiki-config] - https://gerrit.wikimedia.org/r/407152
[02:01:33] Operations, Analytics, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3951340 (Tgr) @Nuria the new config is live now (although it will only take effect gradually due to Varnish caching). Can you check if t...
[02:02:00] (CR) Krinkle: [C: 2] Improve wmf-config file documentation headers [mediawiki-config] - https://gerrit.wikimedia.org/r/407152 (owner: Krinkle)
[02:02:18] Operations, Continuous-Integration-Infrastructure, Traffic: Lower varnish caching length on doc.wikimedia.org - https://phabricator.wikimedia.org/T184255#3951341 (Legoktm) Open>Resolved a:Legoktm Yep!
[02:06:12] (Merged) jenkins-bot: Improve wmf-config file documentation headers [mediawiki-config] - https://gerrit.wikimedia.org/r/407152 (owner: Krinkle)
[02:09:27] (CR) jenkins-bot: Improve wmf-config file documentation headers [mediawiki-config] - https://gerrit.wikimedia.org/r/407152 (owner: Krinkle)
[02:16:58] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951353 (BBlack) @Krinkle - Seems sane to try. I do wonder if browsers will actually allow `Age: 0` to...
[02:21:02] (CR) Chad: "If we just moved the file to not have .cached. in it, we'd drop the caching. I don't like style numbers..." [puppet] - https://gerrit.wikimedia.org/r/408732 (owner: Krinkle)
[02:23:36] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951362 (Krinkle) stalled>Open
[02:24:43] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#1448575 (Krinkle) @BBlack Thanks for the quick response. I'll move it out of blocked then. Will let...
[02:24:49] Operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951365 (Krinkle) a:Krinkle>None
[02:25:16] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.17) (duration: 06m 34s)
[02:25:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:08] (Abandoned) Legoktm: Add basic debug logging functionality [software/service-checker] - https://gerrit.wikimedia.org/r/308019 (owner: Legoktm)
[03:07:12] (PS4) KartikMistry: Add Matxin MT config [puppet] - https://gerrit.wikimedia.org/r/407197 (https://phabricator.wikimedia.org/T186204)
[03:25:59] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 783.52 seconds
[03:26:14] (PS4) Chad: Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199
[03:29:17] (CR) jerkins-bot: [V: -1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199 (owner: Chad)
[03:41:15] (PS2) Andrew Bogott: horizon: managehome => true for horizon service user [puppet] - https://gerrit.wikimedia.org/r/408720
[03:47:03] (CR) Andrew Bogott: [C: 2] horizon: managehome => true for horizon service user [puppet] - https://gerrit.wikimedia.org/r/408720 (owner: Andrew Bogott)
[03:49:49] PROBLEM - puppet last run on scb2006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apertium-apy]
[03:56:00] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 257.45 seconds
[04:06:50] (PS1) Andrew Bogott: openstack::common: Add version switching so this can be applied on Stretch [puppet] - https://gerrit.wikimedia.org/r/408757
[04:14:02] (CR) Andrew Bogott: [C: 2] openstack::common: Add version switching so this can be applied on Stretch [puppet] - https://gerrit.wikimedia.org/r/408757 (owner: Andrew Bogott)
[04:15:14] (PS5) Chad: Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199
[04:17:02] (CR) jerkins-bot: [V: -1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199 (owner: Chad)
[04:19:49] RECOVERY - puppet last run on scb2006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:27:57] (PS6) Chad: Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199
[04:29:30] (CR) jerkins-bot: [V: -1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199 (owner: Chad)
[04:31:46] (PS7) Chad: Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199
[04:33:22] (CR) jerkins-bot: [V: -1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199 (owner: Chad)
[04:38:59] PROBLEM - puppet last run on etherpad1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:01:51] (PS8) Chad: Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199
[05:03:20] (CR) jerkins-bot: [V: -1] Move all dblists on noc to dblists/ directory, rather than individually [mediawiki-config] - https://gerrit.wikimedia.org/r/394199 (owner: Chad)
[05:03:59] RECOVERY - puppet last run on etherpad1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:27:43] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused
[06:27:43] PROBLEM - Check systemd state on scb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:00] Operations, ops-codfw, DBA: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3951635 (Marostegui) Open>Resolved Worked fine this time - thanks! ``` root@db2039:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380312089D0) Port...
[06:28:20] PROBLEM - ores on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:29:09] PROBLEM - cxserver endpoints health on scb1002 is CRITICAL: /v2/translate/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium, adapt the links to target language wiki.) timed out before a response was received: /v1/mt/{from}/{to}{/provider} (Machine translate an HTML fragment using Apertium.) timed out before a response was received
[06:32:39] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[06:32:49] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received
[06:33:39] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy
[06:34:00] RECOVERY - cxserver endpoints health on scb1002 is OK: All endpoints are healthy
[06:39:20] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 3691 bytes in 0.014 second response time
[06:39:49] RECOVERY - Check systemd state on scb1002 is OK: OK - running: The system is fully operational
[06:52:26] Operations, ops-eqiad, DBA: db1051 database host BBU issues - https://phabricator.wikimedia.org/T186049#3951640 (Marostegui) Open>Resolved After a full recharge it looks good now: ``` root@db1051:~# megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType: BBU Voltage: 3936 mV Current...
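Chad's review comment on the gerritLogin.js patch touches on cache busting: since browsers keep the file for up to a year, the URL has to change when the content does. A common alternative to hand-maintained version numbers is to derive the URL from the file's content, so the URL changes exactly when the file does. The Python sketch below shows that idea under stated assumptions. The `v` query parameter, the 8-character SHA-256 prefix, the `versioned_url` name, and the example path are all hypothetical. They do not describe what patch 408732 or Gerrit's `.cached.` naming actually does.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, content: bytes, length: int = 8) -> str:
    """Return `url` with a ?v=<hash prefix> parameter derived from `content`.

    Hypothetical scheme: any change to the bytes yields a new URL, so a browser
    still holding the old URL in its year-long cache simply stops requesting it.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Drop any earlier version parameter and keep other query parameters.
    params = [(k, v) for k, v in parse_qsl(query) if k != "v"]
    params.append(("v", digest))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


if __name__ == "__main__":
    base = "https://gerrit.example.org/static/gerritLogin.js"  # hypothetical path
    print(versioned_url(base, b"old login script"))
    print(versioned_url(base, b"new login script"))
```

Identical content always maps to the same URL, so long browser lifetimes stay safe. The cost is that whatever emits the `<script>` tag must know the current hash.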
[06:55:08] (PS1) Marostegui: db-eqiad.php: Repool db1051 on vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/408761 (https://phabricator.wikimedia.org/T186049)
[06:56:58] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1051 on vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/408761 (https://phabricator.wikimedia.org/T186049) (owner: Marostegui)
[06:58:59] (Merged) jenkins-bot: db-eqiad.php: Repool db1051 on vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/408761 (https://phabricator.wikimedia.org/T186049) (owner: Marostegui)
[06:59:09] (CR) jenkins-bot: db-eqiad.php: Repool db1051 on vslow [mediawiki-config] - https://gerrit.wikimedia.org/r/408761 (https://phabricator.wikimedia.org/T186049) (owner: Marostegui)
[06:59:28] (PS6) Elukey: profile::analytics::refinery::job::json_refine: add netflow job [puppet] - https://gerrit.wikimedia.org/r/408535 (https://phabricator.wikimedia.org/T181036)
[07:00:08] (CR) Elukey: [C: 2] profile::analytics::refinery::job::json_refine: add netflow job [puppet] - https://gerrit.wikimedia.org/r/408535 (https://phabricator.wikimedia.org/T181036) (owner: Elukey)
[07:00:44] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Start repooling db1051 after the BBU change - T186049 (duration: 01m 15s)
[07:00:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:00] T186049: db1051 database host BBU issues - https://phabricator.wikimedia.org/T186049
[07:05:30] !log Change triggers for s6 on db1102 - T174569
[07:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:43] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[07:13:39] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received
[07:14:30] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[07:17:33] !log Change triggers for s7 on db1102 - T174569
[07:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:52] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[07:30:38] akosiaris: I will keep cxserver/deploy patch ready for Matxin deployment.
[07:53:36] !log Change triggers for s8 on db1095 - T174569
[07:53:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:50] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[08:11:10] !log Change triggers for s5 on db1095 - T174569
[08:11:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:23] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[08:21:27] !log Change triggers for s1 on db1095 - T174569
[08:21:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:40] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[08:43:07] !log Change triggers for s3 on db1095 - T174569
[08:43:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:17] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[09:02:12] (PS1) Marostegui: dbproxy: Switchover labsdb1011 to labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/408762
[09:04:49] (CR) Marostegui: [C: 2] dbproxy: Switchover labsdb1011 to labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/408762 (owner: Marostegui)
[09:05:33] !log Failover labsdb1011 to labsdb1010 - T174569
[09:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:46] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[09:09:50] (PS1) Marostegui: Revert "dbproxy: Switchover labsdb1011 to labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/408764
[09:11:22] (CR) Marostegui: [C: 2] Revert "dbproxy: Switchover labsdb1011 to labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/408764 (owner: Marostegui)
[09:15:49] (PS1) Marostegui: dbproxy1010: Switchover labsdb1009 to labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/408765
[09:16:45] !log Failover back labsdb1010 to labsdb1011 - T174569
[09:16:49] (PS2) Marostegui: dbproxy1010: Switchover labsdb1009 to labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/408765
[09:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:58] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[09:17:39] (CR) Marostegui: [C: 2] dbproxy1010: Switchover labsdb1009 to labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/408765 (owner: Marostegui)
[09:18:38] !log Failover labsdb1009 to labsdb1010 - T174569
[09:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:21] (PS1) Marostegui: Revert "dbproxy1010: Switchover labsdb1009 to labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/408766
[09:22:05] (CR) Marostegui: [C: 2] Revert "dbproxy1010: Switchover labsdb1009 to labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/408766 (owner: Marostegui)
[09:23:09] PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[09:24:30] ^ me
[09:27:09] RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0
[09:37:37] !log kartik@tin Started deploy [cxserver/deploy@eabb6d7]: Update cxserver to e164ead and Matxin MT deployment (T184901)
[09:37:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:50] T184901: Add apertium-rus-urk MT language pair - https://phabricator.wikimedia.org/T184901
[09:38:07] !log Failover back labsdb1010 to labsdb1009 - T174569
[09:38:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:19] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[09:41:20] !log kartik@tin Finished deploy [cxserver/deploy@eabb6d7]: Update cxserver to e164ead and Matxin MT deployment (T184901) (duration: 03m 44s)
[09:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:28] (CR) Alexandros Kosiaris: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/9876/scb1001.eqiad.wmnet/ says ok, merging" [puppet] - https://gerrit.wikimedia.org/r/407197 (https://phabricator.wikimedia.org/T186204) (owner: KartikMistry)
[09:47:31] (PS5) Alexandros Kosiaris: Add Matxin MT config [puppet] - https://gerrit.wikimedia.org/r/407197 (https://phabricator.wikimedia.org/T186204) (owner: KartikMistry)
[09:47:47] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add Matxin MT config [puppet] - https://gerrit.wikimedia.org/r/407197 (https://phabricator.wikimedia.org/T186204) (owner: KartikMistry)
[09:47:49] (PS1) Marostegui: db-eqiad.php: Fully repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/408770
[09:49:32] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/408770 (owner: Marostegui)
[09:51:05] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/408770 (owner: Marostegui)
[09:51:16] (CR) jenkins-bot: db-eqiad.php: Fully repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/408770 (owner: Marostegui)
[09:52:21] (PS5) Jayprakash12345: Add "Portal" namespace on it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/405024 (https://phabricator.wikimedia.org/T185232)
[09:52:41] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1051 after the BBU change - T186049 (duration: 01m 14s)
[09:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:54] T186049: db1051 database host BBU issues - https://phabricator.wikimedia.org/T186049
[10:02:54] (PS1) Elukey: role::logging::kafkatee::webrequest::base: move out code related to outputs [puppet] - https://gerrit.wikimedia.org/r/408771
[10:07:15] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/9877/" [puppet] - https://gerrit.wikimedia.org/r/408771 (owner: Elukey)
[10:11:35] (CR) Hashar: "recheck" [debs/contenttranslation/apertium-ukr] - https://gerrit.wikimedia.org/r/408264 (https://phabricator.wikimedia.org/T184901) (owner: KartikMistry)
[10:19:32] (CR) Hashar: "recheck" [debs/contenttranslation/apertium-rus] - https://gerrit.wikimedia.org/r/407202 (https://phabricator.wikimedia.org/T184901) (owner: KartikMistry)
[10:20:37] (CR) Hashar: "Depends on having apertium-rus and apertium-ukr packages to be made available in apt.wikimedia.org. That is until one day CI supports dep" [debs/contenttranslation/apertium-rus-ukr] - https://gerrit.wikimedia.org/r/408508 (https://phabricator.wikimedia.org/T184901) (owner: KartikMistry)
[10:41:09] (PS1) Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598)
[10:44:29] (PS1) Marostegui: db-eqiad.php: Depool db1069 [mediawiki-config] - https://gerrit.wikimedia.org/r/408774 (https://phabricator.wikimedia.org/T186321)
[10:46:19] (PS1) Marostegui: db1069: Switch its binlog to STATEMENT [puppet] - https://gerrit.wikimedia.org/r/408775 (https://phabricator.wikimedia.org/T186321)
[10:53:02] (CR) Volans: "I'm not sure we really need both the version and the source of the package. It's not clear to me from this CR if they will be used both an" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/405808 (https://phabricator.wikimedia.org/T185501) (owner: Herron)
[10:53:53] (CR) Elukey: "I don't remember if it is enough to include the new hosts in hiera and then add the ipsec role, going to wait for a more expert review :)" [puppet] - https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) (owner: Elukey)
[10:54:55] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1069 [mediawiki-config] - https://gerrit.wikimedia.org/r/408774 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[10:55:22] (PS4) Volans: puppetdb: add support for puppetlabs puppetdb 4.4 package [puppet] - https://gerrit.wikimedia.org/r/407492 (https://phabricator.wikimedia.org/T185500) (owner: Herron)
[10:55:55] (PS2) Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598)
[10:56:24] (Merged) jenkins-bot: db-eqiad.php: Depool db1069 [mediawiki-config] - https://gerrit.wikimedia.org/r/408774 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[10:56:43] (CR) Volans: "I've rebased it on top of I0e19c279e295ca097d8ecc6d9396ec931221752a, so that CI passes and you can try the compiler too.
See also a couple" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/407492 (https://phabricator.wikimedia.org/T185500) (owner: 10Herron) [10:57:02] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1069 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408774 (https://phabricator.wikimedia.org/T186321) (owner: 10Marostegui) [10:58:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1069 - T186321 (duration: 01m 09s) [10:58:49] (03CR) 10Marostegui: [C: 032] db1069: Switch its binlog to STATEMENT [puppet] - 10https://gerrit.wikimedia.org/r/408775 (https://phabricator.wikimedia.org/T186321) (owner: 10Marostegui) [10:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:53] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321 [11:04:52] !log Stop MySQL on db1069 for MySQL upgrade, kernel upgrade and change binlog format to statement - T186321 [11:05:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:05] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321 [11:06:45] marostegui: about T186685, the thing is running maintain_views and friends? [11:06:46] T186685: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685 [11:08:25] arturo: No, there are no views there ;-) T186685#3951677 [11:08:59] !log install libc6-dbg on phab1001 to get a more precise gdb stack trace - T182832 [11:09:05] marostegui: so, anything pending in our side? 
[11:09:11] arturo: nope
[11:09:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:17] T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832
[11:10:23] marostegui: ok, then not sure it shows up in our phab query, since I don't see any Cloud-Services tags
[11:10:32] not sure why*
[11:10:38] arturo: I added Data-Services
[11:10:53] Which we normally do for stuff related to Cloud and/or for heads up about ongoing things there
[11:11:00] ah ok
[11:19:47] (CR) Filippo Giunchedi: hieradata: extend SMART eqiad deployment (1 comment) [puppet] - https://gerrit.wikimedia.org/r/408543 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[11:19:56] Operations, ops-eqiad, DBA: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#3952063 (Marostegui) p:Triage>Normal
[11:20:16] (PS2) Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - https://gerrit.wikimedia.org/r/408543 (https://phabricator.wikimedia.org/T86552)
[11:22:11] (CR) Alexandros Kosiaris: [C: -1] "Various minor comments inline" (8 comments) [puppet] - https://gerrit.wikimedia.org/r/390330 (https://phabricator.wikimedia.org/T106056) (owner: Ayounsi)
[11:22:47] (CR) Filippo Giunchedi: [C: 1] cassandra: create parent data directories with exec (1 comment) [puppet] - https://gerrit.wikimedia.org/r/404705 (https://phabricator.wikimedia.org/T175284) (owner: Eevans)
[11:26:33] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1069" [mediawiki-config] - https://gerrit.wikimedia.org/r/408780
[11:28:28] (CR) Volans: "I like the new abstraction, see some comments inline" (8 comments) [software/conftool] - https://gerrit.wikimedia.org/r/405303 (owner: Giuseppe Lavagetto)
[11:29:11] (CR) Giuseppe Lavagetto: [C: 2] remove compare-puppet-catalogs [software] - https://gerrit.wikimedia.org/r/408527 (https://phabricator.wikimedia.org/T186304) (owner: Giuseppe Lavagetto)
[11:51:22] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1069" [mediawiki-config] - https://gerrit.wikimedia.org/r/408780 (owner: Marostegui)
[11:52:59] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1069" [mediawiki-config] - https://gerrit.wikimedia.org/r/408780 (owner: Marostegui)
[11:53:57] (PS1) Marostegui: check_private_data_report: Changed regex [puppet] - https://gerrit.wikimedia.org/r/408784
[11:54:23] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1069 - T186321 (duration: 01m 11s)
[11:54:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:35] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[11:54:36] (CR) Marostegui: [C: 2] check_private_data_report: Changed regex [puppet] - https://gerrit.wikimedia.org/r/408784 (owner: Marostegui)
[11:55:14] (PS1) Filippo Giunchedi: alerts: add varnish HTTP availability [puppet] - https://gerrit.wikimedia.org/r/408785 (https://phabricator.wikimedia.org/T186069)
[11:56:49] (CR) Filippo Giunchedi: "Straw man for now, the check isn't paging and we need to fix T181410 too." [puppet] - https://gerrit.wikimedia.org/r/408785 (https://phabricator.wikimedia.org/T186069) (owner: Filippo Giunchedi)
[11:58:46] (CR) Filippo Giunchedi: alerts: add varnish HTTP availability (1 comment) [puppet] - https://gerrit.wikimedia.org/r/408785 (https://phabricator.wikimedia.org/T186069) (owner: Filippo Giunchedi)
[12:01:13] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1069" [mediawiki-config] - https://gerrit.wikimedia.org/r/408780 (owner: Marostegui)
[12:03:36] (PS3) Alexandros Kosiaris: Remove ores::stresstest as its no longer needed [puppet] - https://gerrit.wikimedia.org/r/408558 (https://phabricator.wikimedia.org/T171851)
[12:03:39] (PS2) Alexandros Kosiaris: ores: Allow oresX00X to reach respective oresrdb [puppet] - https://gerrit.wikimedia.org/r/408564 (https://phabricator.wikimedia.org/T171851)
[12:03:41] (PS6) Alexandros Kosiaris: ores: Set oresX00X hosts as role::ores [puppet] - https://gerrit.wikimedia.org/r/408559 (https://phabricator.wikimedia.org/T171851)
[12:03:43] (PS6) Alexandros Kosiaris: Remove ORES profile from scb [puppet] - https://gerrit.wikimedia.org/r/408560 (https://phabricator.wikimedia.org/T171851)
[12:04:43] (CR) Alexandros Kosiaris: [C: 2] Remove ores::stresstest as its no longer needed [puppet] - https://gerrit.wikimedia.org/r/408558 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[12:04:52] (CR) Alexandros Kosiaris: [C: 2] ores: Allow oresX00X to reach respective oresrdb [puppet] - https://gerrit.wikimedia.org/r/408564 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[12:04:56] (CR) Alexandros Kosiaris: [C: 2] ores: Set oresX00X hosts as role::ores [puppet] - https://gerrit.wikimedia.org/r/408559 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[12:15:03] (PS1) Muehlenhoff: Remove access for jgonsior [puppet] - https://gerrit.wikimedia.org/r/408786
[12:16:37] (PS1) MarcoAurelio: Archive repository [software/gdash] - https://gerrit.wikimedia.org/r/408787 (https://phabricator.wikimedia.org/T186696)
[12:20:32] (CR) Muehlenhoff: [C: 2] Remove access for jgonsior [puppet] - https://gerrit.wikimedia.org/r/408786 (owner: Muehlenhoff)
[12:20:53] (Draft1) MarcoAurelio: Mark repository as read-only [software/gdash] (refs/meta/config) - https://gerrit.wikimedia.org/r/408788 (https://phabricator.wikimedia.org/T186696)
[12:20:57] (PS2) MarcoAurelio: Mark repository as read-only [software/gdash] (refs/meta/config) - https://gerrit.wikimedia.org/r/408788 (https://phabricator.wikimedia.org/T186696)
[12:24:00] jouncebot: next
[12:24:00] In 1 hour(s) and 35 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1400)
[12:24:01] (PS1) Marostegui: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408789
[12:27:08] (CR) Marostegui: [C: 2] db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408789 (owner: Marostegui)
[12:28:47] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408789 (owner: Marostegui)
[12:28:58] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408789 (owner: Marostegui)
[12:30:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 01m 40s)
[12:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:27] (PS1) Marostegui: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408792
[12:33:50] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[enforce-users-groups-cleanup]
[12:34:59] PROBLEM - MariaDB Slave Lag: s1 on dbstore2002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 335.70 seconds
[12:35:07] It seems there is a permission issu on ores, is there a deploy going on?
[12:35:29] I am getting 100 emails with errors
[12:35:29] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors
[12:35:42] the dbstore2002 is me doing a backups
[12:35:59] I will ack/downtime on icinga, it is a 1 time thing
[12:36:22] <_joe_> icinga config errors?
[12:37:01] <_joe_> lemme see
[12:37:03] icinga is not mine
[12:37:11] <_joe_> yeah I know
[12:37:23] <_joe_> Error: Could not find any hostgroup matching 'ores_codfw' (config file '/etc/icinga/puppet_hosts.cfg', starting on line 34275)
[12:37:30] <_joe_> it's akosiaris
[12:37:30] akosiaris: could ores be related to your deploy, too
[12:37:31] <_joe_> :P
[12:37:34] ?
[12:37:53] I would advice a revert- there are 200 emails or so arriving
[12:38:08] not a huge issue, but a lot of spam
[12:39:00] (PS1) Rush: openstack: labtest codify mysql settings [puppet] - https://gerrit.wikimedia.org/r/408793
[12:41:00] (PS1) Elukey: role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262)
[12:41:27] yeah yeah that's me
[12:41:37] (CR) jerkins-bot: [V: -1] role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262) (owner: Elukey)
[12:42:31] (PS2) Giuseppe Lavagetto: [WiP] Add support for jsonschema-based entities [software/conftool] - https://gerrit.wikimedia.org/r/408585
[12:43:24] (CR) Marostegui: [C: 2] db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408792 (owner: Marostegui)
[12:44:37] (PS7) Alexandros Kosiaris: Remove ORES profile from scb [puppet] - https://gerrit.wikimedia.org/r/408560 (https://phabricator.wikimedia.org/T171851)
[12:44:39] (PS1) Alexandros Kosiaris: Add ORES to monitoring::groups [puppet] - https://gerrit.wikimedia.org/r/408796
[12:44:51] (PS2) Alexandros Kosiaris: Add ORES to monitoring::groups [puppet] - https://gerrit.wikimedia.org/r/408796
[12:44:56] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add ORES to monitoring::groups [puppet] - https://gerrit.wikimedia.org/r/408796 (owner: Alexandros Kosiaris)
[12:45:30] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/9882/kafka-jumbo1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262) (owner: Elukey)
[12:45:35] (Merged) jenkins-bot: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408792 (owner: Marostegui)
[12:47:05] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1089 weight (duration: 01m 11s)
[12:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:23] (CR) jenkins-bot: db-eqiad.php: Increase weight for db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408792 (owner: Marostegui)
[12:47:41] PROBLEM - Check systemd state on ores2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:41] PROBLEM - Check systemd state on ores2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:41] PROBLEM - Check systemd state on ores2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:42] PROBLEM - Check systemd state on ores2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:51] PROBLEM - Check systemd state on ores1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:51] PROBLEM - Check systemd state on ores1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:52] PROBLEM - Check systemd state on ores1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:52] PROBLEM - Check systemd state on ores2009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:52] PROBLEM - Check systemd state on ores2005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:47:52] PROBLEM - Check systemd state on ores1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:01] PROBLEM - Check systemd state on ores1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:01] PROBLEM - Check systemd state on ores1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:01] PROBLEM - Check systemd state on ores1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:01] RECOVERY - keystone admin endpoint port 35357 on labtestcontrol2003 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 783 bytes in 0.078 second response time
[12:48:02] PROBLEM - Check systemd state on ores1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:02] PROBLEM - Check systemd state on ores1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:11] PROBLEM - Check systemd state on ores2008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:11] PROBLEM - Check systemd state on ores2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:11] PROBLEM - Check systemd state on ores2007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:48:24] hmm, something wrong
[12:49:01] nothing really
[12:49:09] ok :)
[12:49:14] those hosts are not yet in production
[12:49:45] PROBLEM - puppet last run on ores2007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 13 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:49:45] plus we have our best people working on it!
[12:49:51] PROBLEM - puppet last run on ores2008 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 13 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:49:51] :-D
[12:50:03] PROBLEM - puppet last run on ores2002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 12 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:21] PROBLEM - puppet last run on ores1008 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:22] PROBLEM - puppet last run on ores2006 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 13 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:51] PROBLEM - puppet last run on ores2004 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 14 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:51] PROBLEM - puppet last run on ores2009 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 13 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:51] PROBLEM - puppet last run on ores1003 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 17 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:50:51] PROBLEM - puppet last run on ores1004 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:51:22] PROBLEM - puppet last run on ores1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[ores/deploy]
[12:51:22] PROBLEM - ores uWSGI web app on ores1004 is CRITICAL: NRPE: Command check_uwsgi-ores not defined
[12:51:22] PROBLEM - ores on ores2006 is CRITICAL: connect to address 10.192.32.174 and port 8081: Connection refused
[12:51:22] PROBLEM - ores on ores2003 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.073 second response time
[12:52:02] (PS2) Elukey: role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262)
[12:52:44] (CR) jerkins-bot: [V: -1] role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262) (owner: Elukey)
[12:53:11] PROBLEM - ores on ores1001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.001 second response time
[12:53:11] PROBLEM - ores on ores1007 is CRITICAL: connect to address 10.64.48.16 and port 8081: Connection refused
[12:53:11] PROBLEM - ores on ores2001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 136 bytes in 0.072 second response time
[12:53:11] PROBLEM - ores uWSGI web app on ores2006 is CRITICAL: NRPE: Command check_uwsgi-ores not defined
[12:54:52] PROBLEM - ores uWSGI web app on ores1007 is CRITICAL: NRPE: Command check_uwsgi-ores not defined
[12:55:31] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct
[12:55:41] jouncebot: next
[12:55:41] In 1 hour(s) and 4 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1400)
[12:55:51] RECOVERY - puppet last run on ores2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:55:51] RECOVERY - puppet last run on ores1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:56:31] PROBLEM - ores on ores2009 is CRITICAL: connect to address 10.192.48.90 and port 8081: Connection refused
[12:58:24] (PS8) Alexandros Kosiaris: Remove ORES profile from scb [puppet] - https://gerrit.wikimedia.org/r/408560 (https://phabricator.wikimedia.org/T171851)
[12:58:26] (PS1) Alexandros Kosiaris: Add ores scap::dsh::groups [puppet] - https://gerrit.wikimedia.org/r/408799 (https://phabricator.wikimedia.org/T171851)
[12:58:41] (PS2) Alexandros Kosiaris: Add ores scap::dsh::groups [puppet] - https://gerrit.wikimedia.org/r/408799 (https://phabricator.wikimedia.org/T171851)
[12:58:45] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add ores scap::dsh::groups [puppet] - https://gerrit.wikimedia.org/r/408799 (https://phabricator.wikimedia.org/T171851) (owner: Alexandros Kosiaris)
[12:59:35] Operations, Commons, Multimedia, media-storage, User-ArielGlenn: Generate a list of files that are supposed to exist but 404s - https://phabricator.wikimedia.org/T182822#3952396 (Aklapper) >>! In T182822#3950035, @ArielGlenn wrote: > In the meantime, @Aklapper what scripts are you using to fi...
[13:03:43] (PS1) BBlack: Add v6 DNS for kafka-jumbo10* [dns] - https://gerrit.wikimedia.org/r/408801 (https://phabricator.wikimedia.org/T185262)
[13:04:33] (PS1) Marostegui: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408802
[13:07:38] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408802 (owner: Marostegui)
[13:11:01] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408802 (owner: Marostegui)
[13:11:12] (CR) jenkins-bot: db-eqiad.php: Fully repool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408802 (owner: Marostegui)
[13:13:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1089 (duration: 01m 11s)
[13:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:03] (CR) Muehlenhoff: [WIP] php7 manifests for mediawiki on stretch (1 comment) [puppet] - https://gerrit.wikimedia.org/r/394977 (owner: ArielGlenn)
[13:16:10] !log Drop wikidata tables and database from s5 codfw hosts - T184599
[13:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:23] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[13:16:41] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: (no justification provided)
[13:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:18] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[13:17:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:30] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[13:18:37] and of course it's timing out or something
[13:18:39] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 01m 22s)
[13:18:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:31] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[13:19:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:26] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 00m 55s)
[13:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:21:52] Needed a single revision\nUnable to find current revision in submodule path 'submodules/draftquality'\n"
[13:21:59] great... I thought we had solved that
[13:24:39] (PS3) Elukey: role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262)
[13:25:07] (CR) jerkins-bot: [V: -1] role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262) (owner: Elukey)
[13:26:05] (CR) Elukey: [V: 2 C: 2] role::kafka::jumbo::broker: add static IPV6 mapped addrs [puppet] - https://gerrit.wikimedia.org/r/408795 (https://phabricator.wikimedia.org/T185262) (owner: Elukey)
[13:29:28] (CR) Arturo Borrero Gonzalez: "thanks Faidon for the review" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[13:30:36] arturo: I don't see a PS8?
[13:32:51] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[13:32:51] (CR) Arturo Borrero Gonzalez: "> * sort output of python3 apt-upgrade report so it shows per repo" [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[13:32:56] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 00m 06s)
[13:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:33:04] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[13:33:05] finally some success
[13:33:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:01] (PS8) Arturo Borrero Gonzalez: apt: merge report-pending-upgrades script into apt-upgrade [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647)
[13:34:05] faidon ^^^
[13:34:09] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[13:34:12] oh
[13:34:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:30] it's possible to respond comments from an old PS btw
[13:34:38] ok, good to know
[13:34:40] so you could upload PS8 and then respond on the comments on PS7 etc. :)
[13:35:30] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 01m 21s)
[13:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:51] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[13:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:35] (CR) Faidon Liambotis: apt: merge report-pending-upgrades script into apt-upgrade (1 comment) [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[13:36:43] !log installing p7zip security updates
[13:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:35] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 02m 45s)
[13:38:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:47] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[13:39:03] (PS9) Arturo Borrero Gonzalez: apt: merge report-pending-upgrades script into apt-upgrade [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647)
[13:39:56] Operations, Icinga, monitoring, Patch-For-Review, Wikimedia-Incident: Icinga: page in case all MediaWiki are throwing 5xx - https://phabricator.wikimedia.org/T186069#3932784 (fgiunchedi) Tangentially related to this and something I wanted to experiment with, namely what the straw man in https...
[13:55:40] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: (no justification provided)
[13:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:42] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: (no justification provided) (duration: 03m 02s)
[13:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:54] Operations, Commons, Multimedia, media-storage, User-ArielGlenn: Generate a list of files that are supposed to exist but 404s - https://phabricator.wikimedia.org/T182822#3952456 (ArielGlenn) @Aklapper woops yes, indeed.
[14:00:05] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1400).
[14:00:05] Jayprakash12345 and Trey314159: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:26] I can SWAT today
[14:00:49] I'm here.
[14:01:20] hello
[14:02:52] Trey314159, Jayprakash12345: do you want to do your deployment yourself (if you have the access)?
[14:03:25] No, I have +2 power
[14:03:29] not
[14:03:37] no
[14:03:47] !log akosiaris@tin Started deploy [ores/deploy@eb0f776]: T171851
[14:04:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:03] T171851: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851
[14:04:08] Can you deploy withot +2 Power?
[14:04:23] Jayprakash12345: no
[14:04:30] Jayprakash12345: It is not possible
[14:04:31] Trey314159, Jayprakash12345: do you need more than a few minutes to test the change? do I need to run scripts after deployment?
[14:04:49] I only need a minute or two, no scripts afterwards.
[14:04:59] Jayprakash12345: no, you have to have +2 rights on mediawiki/config, but that's only a start :)
[14:05:34] !log akosiaris@tin Finished deploy [ores/deploy@eb0f776]: T171851 (duration: 01m 47s)
[14:05:37] zeljkof: No need to run Script in My case.
[14:05:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:19] Trey314159: you should work on getting deploy rights :)
[14:06:35] zeljkof: And I :)
[14:06:44] zeljkof: probably, but this is fun, too ;)
[14:06:49] (CR) Elukey: [C: 1] Add v6 DNS for kafka-jumbo10* [dns] - https://gerrit.wikimedia.org/r/408801 (https://phabricator.wikimedia.org/T185262) (owner: BBlack)
[14:06:56] ok, starting the deploy, first Jayprakash12345, then Trey314159, will ping you when your commits are at mwdebug1002
[14:07:32] Trey314159: developers should deploy their code, if possible :) how are you going to earn "I broke wikipedia" t-shirt ;)
[14:08:34] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/405024 (https://phabricator.wikimedia.org/T185232) (owner: Jayprakash12345)
[14:10:13] (Merged) jenkins-bot: Add "Portal" namespace on it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/405024 (https://phabricator.wikimedia.org/T185232) (owner: Jayprakash12345)
[14:10:24] (CR) jenkins-bot: Add "Portal" namespace on it.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/405024 (https://phabricator.wikimedia.org/T185232) (owner: Jayprakash12345)
[14:13:25] paravoid: review welcome :-)
[14:14:16] (CR) Elukey: [C: 2] Add v6 DNS for kafka-jumbo10* [dns] - https://gerrit.wikimedia.org/r/408801 (https://phabricator.wikimedia.org/T185262) (owner: BBlack)
[14:14:24] Jayprakash12345: the patch is at mwdebug1002, please test and let me know if I can send the patch to orbit https://en.wikipedia.org/wiki/Low_Earth_orbit
[14:14:40] zeljkof: https://it.wikiquote.org/wiki/Portale:Sandbox Tested.
[14:14:51] Jayprakash12345: ok to deploy? [14:15:02] zeljkof: yeah GO ahead. [14:16:40] All systems are go, launching 405024 to low earth orbit [14:18:01] this was in logs before the deployment [14:18:07] 297 Notice: Undefined index: 0 in /srv/mediawiki/php-1.31.0-wmf.20/vendor/wikimedia/running-stat/src/Wikimedia/PSquare.php on line 115 [14:18:21] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:405024|Add "Portal" namespace on it.wikiquote (T185232)]] (duration: 01m 13s) [14:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:33] T185232: Activation of "Portal" namespace on it.wikiquote - https://phabricator.wikimedia.org/T185232 [14:19:07] Jayprakash12345: deployed, please check and thanks for deploying with #releng ;) [14:19:24] zeljkof: Thanks for being here. [14:19:45] Jayprakash12345: it's my job to be here! :D [14:20:32] (03PS3) 10Zfilipin: Updates to enable transliteration for crhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408540 (https://phabricator.wikimedia.org/T23582) (owner: 10Tjones) [14:20:54] Trey314159: please stand by, your patch will be in low earth orbit in a few minutes [14:21:43] zeljkof: I'm here and ready! (Does the deployment come with a Tesla Roadster?) [14:21:56] * Trey314159 would like one of those. 
[14:22:18] Trey314159: you don't want scap to deliver a testla to your home :D trust me ;) [14:22:45] post mortem of a failure would probably be a real post mortem [14:23:16] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408540 (https://phabricator.wikimedia.org/T23582) (owner: 10Tjones) [14:24:57] (03Merged) 10jenkins-bot: Updates to enable transliteration for crhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408540 (https://phabricator.wikimedia.org/T23582) (owner: 10Tjones) [14:26:07] 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3952506 (10awight) [14:26:34] Trey314159: your tesla is in orbit, please check all parameters and let me know if I can push it to mars [14:26:49] zeljkof: checking.... [14:27:05] (03CR) 10jenkins-bot: Updates to enable transliteration for crhwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408540 (https://phabricator.wikimedia.org/T23582) (owner: 10Tjones) [14:28:36] zeljkof: everything seems to be in order; ready for the next rocket stage! 
[14:29:25] (03PS1) 10BBlack: Varnish: swizzle default/cap 1d TTLs by 5% [puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) [14:29:28] initiating thruster burn [14:30:43] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:408540|Updates to enable transliteration for crhwiki (T23582)]] (duration: 01m 11s) [14:30:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:57] T23582: Transliteration of Crimean Wiki - https://phabricator.wikimedia.org/T23582 [14:31:19] Trey314159: main engine cutoff, please check [14:31:32] (03PS1) 10Muehlenhoff: Extend library hint for poppler [puppet] - 10https://gerrit.wikimedia.org/r/408811 [14:31:48] zeljkof: I don't know if they can see it on Mars yet—pesky time delay—but I can see it here on Earth! Woo hoo! Thanks! [14:32:10] (03CR) 10Muehlenhoff: [C: 032] Extend library hint for poppler [puppet] - 10https://gerrit.wikimedia.org/r/408811 (owner: 10Muehlenhoff) [14:32:18] Trey314159: they should see the light from engines in 10 or so minutes, forgot the exact time ;) [14:32:30] :) [14:32:53] and if there are no mistakes in scap, tesla will not land on their head :) [14:33:04] 10Operations, 10ops-eqiad, 10DBA: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952557 (10jcrespo) a:03jcrespo claiming it for cleaning up purposes only. [14:33:18] no more patches for swat, closing window [14:33:19] (03PS1) 10Jcrespo: mariadb: Remove dbstore1001 role [puppet] - 10https://gerrit.wikimedia.org/r/408812 (https://phabricator.wikimedia.org/T186596) [14:33:36] !log EU SWAT finished [14:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:17] Thanks again, zeljkof! 
[14:34:57] Trey314159: no problem, but you should work on getting deployment rights, if you don't have them yet, in case releng is not around for deployment [14:35:33] 10Operations, 10Discovery, 10Discovery-Search, 10Wikidata, 10Wikidata-Query-Service: Setup a WDQS test cluster on real hardware - https://phabricator.wikimedia.org/T186713#3952575 (10Gehel) [14:37:38] zeljkof: this was a 10% project outside of my usual work area, and this particular task was open for over 8 years (I've only been working on it since April 2017). Waiting for releng to be around was not a problem! (If I do more in this area in the future, I will look into getting more permissions.) [14:37:39] !log installing poppler security updates [14:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:30] Trey314159: I looked at the phab task and notice it was imported from bugzilla, you don't see those much these days :) [14:40:01] zeljkof: yep... it's been a looooooong time coming. [14:40:22] I have to learn more rocket launch terminology, it's a fun thing to do while doing deployments :) [14:40:46] "t minus 1 minute to swat" [14:41:24] patch is ok for ignition [14:41:45] :) [14:41:58] oh man http://www.braeunig.us/space/glossary.htm [14:42:23] lol [14:43:11] <_joe_> deploy the core! [14:43:52] scap deploy tesla mars [14:46:09] <_joe_> zeljkof: well let's hope it doesn't fail for 33% [14:46:29] let's hope it does not fall on our heads :) [14:47:41] 10Operations, 10DBA: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179#3952629 (10jcrespo) This is important, but not a goal for this quarter- we are still blocked on mediawiki extension maintainers to be compatible with it; however, all databases (misc, x1, parsercache, e... [14:48:55] (03CR) 10Jcrespo: [C: 032] "I am going to start deploying without your ok, I can always revert if there is any problem." 
[puppet] - 10https://gerrit.wikimedia.org/r/408812 (https://phabricator.wikimedia.org/T186596) (owner: 10Jcrespo) [14:49:00] (03PS2) 10Jcrespo: mariadb: Remove dbstore1001 role [puppet] - 10https://gerrit.wikimedia.org/r/408812 (https://phabricator.wikimedia.org/T186596) [14:50:36] (03PS3) 10Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/408543 (https://phabricator.wikimedia.org/T86552) [14:50:59] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952677 (10jcrespo) @Marostegui, let me know what you think of the plan: * Deploy the above patch * Move current backup files to d... [14:52:39] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/408543 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [14:52:52] 10Operations, 10Discovery, 10Discovery-Search, 10Wikidata, and 2 others: Setup a WDQS test cluster on real hardware - https://phabricator.wikimedia.org/T186713#3952685 (10Gehel) Subtasks will be created for the different steps once we all agree on the principle. [14:53:12] (03PS4) 10Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - 10https://gerrit.wikimedia.org/r/408543 (https://phabricator.wikimedia.org/T86552) [14:54:35] jynus moritzm merging your changes too [14:55:22] godog: thanks, sorry about that [14:55:37] no worries, easy enough! [14:56:48] (03CR) 10Imarlier: [C: 031] "Nice one." 
[puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) (owner: 10BBlack) [14:57:21] (03PS3) 10Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) [14:58:49] (03Abandoned) 10Ottomata: Add IPv6 to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/405891 (https://phabricator.wikimedia.org/T185262) (owner: 10Ottomata) [14:58:57] elukey: thank you for ^ ! :) [14:59:02] and ^^^ [15:05:31] ottomata: we are still trying to figure out in #traffic how to proceed since it would be the first use case of ipsec jessie-stretch (minor version change of strongswan) [15:06:07] joining traffic [15:09:39] (03PS1) 10Muehlenhoff: Correctly absent jgonsior as well [puppet] - 10https://gerrit.wikimedia.org/r/408816 [15:11:06] (03PS2) 10Gehel: maps: new path to osm-bright-style is now the default [puppet] - 10https://gerrit.wikimedia.org/r/408554 [15:12:26] (03CR) 10Muehlenhoff: [C: 032] Correctly absent jgonsior as well [puppet] - 10https://gerrit.wikimedia.org/r/408816 (owner: 10Muehlenhoff) [15:12:40] (03CR) 10Ema: [C: 031] "Minor question inline, LGTM otherwise." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) (owner: 10BBlack) [15:13:32] 10Operations, 10Fr-CentralNotice-Translation-Bugs, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 4 others: Publishing translations for central notice banners fails - https://phabricator.wikimedia.org/T104774#3952775 (10DStrine) [15:17:19] (03PS9) 10Alexandros Kosiaris: Remove ORES profile from scb [puppet] - 10https://gerrit.wikimedia.org/r/408560 (https://phabricator.wikimedia.org/T171851) [15:17:21] (03PS1) 10Alexandros Kosiaris: Disable notification for role::ores [puppet] - 10https://gerrit.wikimedia.org/r/408817 (https://phabricator.wikimedia.org/T171851) [15:17:41] (03CR) 10BBlack: Varnish: swizzle default/cap 1d TTLs by 5% (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) (owner: 10BBlack) [15:21:09] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3952792 (10akosiaris) [15:24:14] 10Operations, 10ORES, 10Scap, 10Scoring-platform-team: Use external dsh group to list pooled ORES nodes - https://phabricator.wikimedia.org/T179501#3726795 (10akosiaris) ORES dsh groups populated via conftool alongside 2 canaries is in https://gerrit.wikimedia.org/r/#/c/408799/. Some changes are required... [15:26:07] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3952800 (10akosiaris) ORES deploy has failed (see T182799#3952787 and T184135#3873517) but the topic of this task, which is to reimage the hosts t... 
[15:30:07] (03PS2) 10BBlack: Varnish: swizzle TTLs by 5% [puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) [15:32:52] (03PS3) 10BBlack: Varnish: swizzle TTLs by 5% [puppet] - 10https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315) [15:36:48] 10Operations: Add email addresses to techcom@wikimedia.org email alias - https://phabricator.wikimedia.org/T186718#3952825 (10debt) [15:37:07] TIL: the verb "swizzle" [15:37:30] 10Operations: Add email addresses to techcom@wikimedia.org email alias - https://phabricator.wikimedia.org/T186718#3952844 (10debt) [15:37:39] (03CR) 10Alexandros Kosiaris: [C: 032] Disable notification for role::ores [puppet] - 10https://gerrit.wikimedia.org/r/408817 (https://phabricator.wikimedia.org/T171851) (owner: 10Alexandros Kosiaris) [15:37:46] (03PS2) 10Alexandros Kosiaris: Disable notification for role::ores [puppet] - 10https://gerrit.wikimedia.org/r/408817 (https://phabricator.wikimedia.org/T171851) [15:37:49] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Disable notification for role::ores [puppet] - 10https://gerrit.wikimedia.org/r/408817 (https://phabricator.wikimedia.org/T171851) (owner: 10Alexandros Kosiaris) [15:38:57] paravoid: TIL: the noun "incantation" [15:39:21] lol [15:39:40] my use of "incantation" is probably near the normal definition, I'm not so sure about "swizzle". [15:40:02] I don't think I'm the only one to use it for that meaning in a technical context, but normal dictionaries don't line up well with my meaning I guess. 
[15:40:54] my meaning of "swizzle" in technical context is usually "randomly vary something to avoid some ill effect of too many things sharing some common value" [15:41:26] ahahaha [15:41:38] I learned something today as well [15:41:55] https://en.wikipedia.org/wiki/Swizzling_(computer_graphics) [15:42:04] http://www.catb.org/jargon/html/S/swizzle.html doesn't line up with my meaning either, maybe it's just mine :) [15:45:02] 10Operations, 10Continuous-Integration-Infrastructure, 10Patch-For-Review: legoktm can't deploy docker images on contint1001 - https://phabricator.wikimedia.org/T186475#3952875 (10akosiaris) >>! In T186475#3950009, @hashar wrote: > Both @Legoktm and @Addshore already have the privileges to run privileged cod... [15:46:50] (03PS1) 10Alexandros Kosiaris: Add addshore to contint-docker admins [puppet] - 10https://gerrit.wikimedia.org/r/408823 (https://phabricator.wikimedia.org/T186475) [15:58:13] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952904 (10Marostegui) >>! In T186596#3952677, @jcrespo wrote: > @Marostegui, let me know what you think of the plan: > * Deploy t... [15:59:32] (03PS10) 10Arturo Borrero Gonzalez: apt: merge report-pending-upgrades script into apt-upgrade [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) [16:00:27] !log jmm@puppetmaster1001 conftool action : set/pooled=yes; selector: mw1271.eqiad.wmnet [16:00:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:49] 10Operations, 10ops-eqiad: Hardware check on mw1271 - https://phabricator.wikimedia.org/T184722#3952910 (10MoritzMuehlenhoff) 05Open>03Resolved Thanks, I ran "scap pull" and repooled the server. 
[16:14:55] (03PS4) 10Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) [16:20:52] (03PS3) 10Gehel: maps: new path to osm-bright-style is now the default [puppet] - 10https://gerrit.wikimedia.org/r/408554 [16:21:27] 10Operations: Add email addresses to techcom@wikimedia.org email alias - https://phabricator.wikimedia.org/T186718#3953007 (10Joe) 05Open>03Resolved [16:21:52] 10Operations: Add email addresses to techcom@wikimedia.org email alias - https://phabricator.wikimedia.org/T186718#3952825 (10Joe) Both addressed added. [16:27:53] !log upgrading tilerator / kartotherian on maps eqiad [16:28:05] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=maps1001.eqiad.wmnet [16:28:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:56] !log gehel@tin Started deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging [16:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:20] 10Operations, 10Traffic, 10HTTPS: Wikimania.org uses an invalid security certificate - https://phabricator.wikimedia.org/T186717#3953055 (10Stryn) 05Open>03declined See T133548 [16:30:12] !log gehel@tin Finished deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging (duration: 01m 17s) [16:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:39] (03PS5) 10Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) [16:31:02] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: new tilerator packaging [16:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:20] !log gehel@tin Finished deploy 
[tilerator/deploy@29d633e]: new tilerator packaging (duration: 00m 20s) [16:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:38] (03PS4) 10Gehel: maps: new path to osm-bright-style is now the default [puppet] - 10https://gerrit.wikimedia.org/r/408554 [16:32:03] (03CR) 10Gehel: [C: 032] maps: new path to osm-bright-style is now the default [puppet] - 10https://gerrit.wikimedia.org/r/408554 (owner: 10Gehel) [16:34:30] !log gehel@tin Started deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging [16:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:10] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: new tilerator packaging [16:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:28] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: new tilerator packaging (duration: 00m 17s) [16:37:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:17] !log gehel@tin Started deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging [16:38:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:41] !log gehel@tin Finished deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging (duration: 00m 24s) [16:38:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:16] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: new tilerator packaging [16:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:39:34] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: new tilerator packaging (duration: 00m 18s) [16:39:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:20] (03PS6) 10Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) [16:42:59] !log gehel@tin 
Started deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging [16:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:23] !log gehel@tin Finished deploy [kartotherian/deploy@ecdda41]: new kartotherian packaging (duration: 00m 24s) [16:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:46] (03CR) 10Ottomata: [C: 031] role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) (owner: 10Elukey) [16:44:04] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: new tilerator packaging [16:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:21] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: new tilerator packaging (duration: 00m 18s) [16:44:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:28] !log gehel@tin Started deploy [tilerator/deploy@29d633e]: new tilerator packaging [16:46:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:41] (03CR) 10BBlack: [C: 031] role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) (owner: 10Elukey) [16:46:43] (03CR) 10Giuseppe Lavagetto: Add preemptive validation. 
(032 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080) (owner: 10Giuseppe Lavagetto) [16:46:48] !log gehel@tin Finished deploy [tilerator/deploy@29d633e]: new tilerator packaging (duration: 00m 21s) [16:46:51] (03CR) 10Giuseppe Lavagetto: cli.tool: drop the "find" interface (032 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/405301 (owner: 10Giuseppe Lavagetto) [16:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:47:19] !log upgrade of tilerator / kartotherian on maps eqiad completed, sorry for the noise... [16:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:45] akosiaris hi, wondering could you review https://gerrit.wikimedia.org/r/c/405591/ please? [16:48:53] (03PS7) 10Elukey: role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) [16:50:56] jouncebot: now [16:50:56] No deployments scheduled for the next 1 hour(s) and 9 minute(s) [16:50:59] jouncebot: next [16:51:00] In 1 hour(s) and 9 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1800) [16:51:22] I'm deploying a patch that for some reason didn't get deployed last night [16:53:48] !log legoktm@tin Synchronized php-1.31.0-wmf.20/includes/http/MWHttpRequest.php: MWHttpRequest: Restore ability to pass null for $options - https://gerrit.wikimedia.org/r/408718 (Unbreak ExtensionDistributor) (duration: 01m 12s) [16:53:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:16] Amir1: I think something went wrong when you tried to deploy https://gerrit.wikimedia.org/r/408718 last night...forgot to git rebase? [16:56:27] akosiaris: I don’t think I have the permissions to read the logs, but it looks like 3 of the eqiad ORES nodes went down due to OOM just now. 
[16:56:33] Did the worker count change? [16:57:36] (03PS1) 10Ottomata: Set api.version.request: false for eventstreams [puppet] - 10https://gerrit.wikimedia.org/r/408831 (https://phabricator.wikimedia.org/T176126) [16:57:40] (03CR) 10Elukey: [C: 032] role::kafka::jumbo::broker: add ipsec configuration to cache hosts [puppet] - 10https://gerrit.wikimedia.org/r/408773 (https://phabricator.wikimedia.org/T186598) (owner: 10Elukey) [16:57:45] legoktm: AFAIK I remember rebasing [16:57:52] but it might something wrong [16:58:25] also the fatals dropped [16:59:01] hello people [16:59:37] we are going to merge/test https://gerrit.wikimedia.org/r/408773 that might raise some ipsec alerts [17:00:11] we are adding ipsec from all the non-eqiad cp hosts to kafka jumbo [17:03:42] (03CR) 10Rush: [C: 031] "small note on declarative cleanup of removed script. Seems ok to me, let's put it through the paces in Toolforge and see what we love and" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [17:05:40] (03PS1) 10Gehel: maps: backport cache header configuration from upstream [puppet] - 10https://gerrit.wikimedia.org/r/408832 (https://phabricator.wikimedia.org/T108435) [17:08:17] (03PS11) 10Arturo Borrero Gonzalez: apt: merge report-pending-upgrades script into apt-upgrade [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) [17:22:21] 10Operations, 10Cloud-Services, 10netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#3953299 (10brion) In the middle of the week it seems less congested than on the weekend, still on the same route. Seeing up to 32 megabits downl... [17:23:04] akosiaris: I have no idea why scb* ORES is running out of memory. The scb100[1-2] worker count is dialed way down. Maybe a new service was deployed to these machines? 
[17:23:18] It’s pretty urgent as far as our service goes. [17:28:15] (03PS1) 10Awight: Reduce ORES workers everywhere [puppet] - 10https://gerrit.wikimedia.org/r/408834 [17:28:39] akosiaris: mutante: ^ Looks like we need to take drastic measures to keep the ORES service alive. [17:30:24] Any opsen? ^ [17:30:45] I can deploy that for you, but I do not know anything about the service [17:31:15] someone else will have to babysit the patch after deploy [17:31:42] (03CR) 10Jcrespo: [C: 032] Reduce ORES workers everywhere [puppet] - 10https://gerrit.wikimedia.org/r/408834 (owner: 10Awight) [17:31:56] jynus: Thanks, I’ll babysit! [17:32:36] do you habe puppet run rights on those hosts? [17:32:44] 10Operations, 10Cassandra, 10RESTBase-Cassandra, 10Services (doing), and 2 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3953326 (10fgiunchedi) [17:32:48] *have [17:32:55] no [17:33:04] ok, I will run them myself [17:33:05] I have sudo for the service though, so I can restart when the change lands. [17:33:21] ah, so puppet will not do that automaticall? 
that works, too [17:34:53] deployed into scb1001 [17:35:02] jynus: Thanks for the suggestion, I’ll make a task to create that dependency [17:35:27] actually it is ok [17:35:44] it allows manual deploy, I do not see as a bug- I do it like that for my mysqls [17:36:03] I was just asking in case I had to disable puppet for coordinated deploy [17:36:30] there are cases were you may want to not reload things automatically, depending on the case [17:37:02] swaps seems indeed concerning [17:37:27] and ores seems to be the cause [17:37:33] based on memory pressure [17:37:55] please reload what you have to do, and log the change after that [17:37:57] The machine is certainly fine without us [17:38:10] !log ORES celery workers restarted on scb100[1-4] [17:38:18] my observations was based on ores app memory usage [17:38:22] it is the top app there [17:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:28] and swap is almost to the limit [17:38:36] to the point it may start killing processes soon [17:39:02] I have not applied the change to scb1002 [17:39:10] so you may want to wait for that [17:39:46] do you know where ores::web runs? [17:40:03] jynus: yes, ores::web is on scb[1-2]* [17:40:04] I can search it myself, but you probably know that faster [17:40:46] The ones I’m concerned about for the moment are scb1* [17:40:47] interesting I didn't see the celery_workers change being applied [17:41:17] 99-main.yaml is confirmed to be applied on both hosts now [17:41:48] looks like scb100[3-4] don’t have the change yet. No rush, those machines are healthy for the moment. [17:42:04] let me run puppet there, too [17:42:27] scb2* is healthy FWIW, those machines are built with more memory. [17:42:32] Thanks again! [17:42:58] the key to deploments is to have patience on roll over, rollback fast on issues [17:43:12] I am not very familiar with these machines and services [17:43:48] so I am a bit cautious [17:44:02] That sounds right. 
I’m fine with restarting one box at a time, good reminder. [17:44:04] I do not see ores::web running on those hosts, let me check puppet [17:44:41] oh, I see now why [17:44:49] Looks like conftool is in charge of maintaining the pools? [17:44:52] it is a default parameter, being overriden [17:45:21] puppet was run on the 4 hosts [17:46:42] The bad news is that the available memory is GC cycling, but the minimum memory seems to be decreasing each cycle: https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&orgId=1&from=1518024667681&to=1518025507681 [17:46:49] (03PS1) 10Elukey: profile::analytics::refinery::job::json_refine: standardize netflow conf [puppet] - 10https://gerrit.wikimedia.org/r/408836 (https://phabricator.wikimedia.org/T181036) [17:47:25] If true, we might crash again in 1 minute. [17:48:05] (03CR) 10Dzahn: "yea, it's not wrong and i see why you want it, it's just that we want to stop using the apache module entirely. to be replaced by the http" [puppet] - 10https://gerrit.wikimedia.org/r/407962 (owner: 10Paladox) [17:48:17] did you do any code deploy today at 17? [17:48:30] Not personally, but let me check the scap logs [17:49:02] actually, it seems to be crashing regularly every day [17:49:34] (03CR) 10Pnorman: [C: 031] "Opening a ticket to consider what cache-control headers we should send, but this follows the pattern used elsewhere and is okay for now." [puppet] - 10https://gerrit.wikimedia.org/r/408832 (https://phabricator.wikimedia.org/T108435) (owner: 10Gehel) [17:51:43] jynus: This is why we’re migrating off of shared boxes :) [17:53:59] can you or did you restart the service? [17:54:03] 10Operations, 10Maps-Sprint, 10Traffic: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732#3953401 (10Gehel) [17:55:32] I’ve restarted, but it looks like the changes weren’t applied to scb100[3-4]. 
[17:55:47] (03CR) 10Ottomata: [C: 031] profile::analytics::refinery::job::json_refine: standardize netflow conf [puppet] - 10https://gerrit.wikimedia.org/r/408836 (https://phabricator.wikimedia.org/T181036) (owner: 10Elukey) [17:56:09] pupped didn't change anything there [17:56:20] so your change may be missing something [17:56:31] +1 That would explain it. [17:56:56] I asked where ores::web was deployed because the parameter (not the hiera changes) didn't seem to have any effect on any of the servers [17:57:15] so there must be something missing, give a second look at the code [17:57:18] 10Operations, 10Cloud-Services, 10netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2343755 (10ayounsi) What speed/path do you get for example between your home and your linode server? Can you also try for example: https://uplo... [17:57:57] (03PS1) 10Awight: Reduce redundant default value for ORES worker count [puppet] - 10https://gerrit.wikimedia.org/r/408838 [17:58:03] jynus: There was a second default :-/ [17:58:16] yeah, something like taht happens [17:58:47] Available memory on scb100[1-2] look stable, btw! I think this fix will hold us over for a while. [17:58:52] my recommendation is to document that, can be done on a later patch [17:59:04] (to avoid worse problems in the future) [17:59:18] (03CR) 10Jcrespo: [C: 032] Reduce redundant default value for ORES worker count [puppet] - 10https://gerrit.wikimedia.org/r/408838 (owner: 10Awight) [18:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1800). [18:00:05] subbu,James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:33] o/ [18:01:34] - CELERYD_CONCURRENCY: 45 / + CELERYD_CONCURRENCY: 35 [18:01:42] awight^ [18:01:53] Fun. [18:02:47] (03PS6) 10Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) [18:04:22] I can SWAT. Got a meeting at 11:30 but it looks like it'll be a short one. [18:04:37] (I've certainly jinxed that now though) [18:05:48] (03PS2) 10Thcipriani: Tidy: Re-do this as a sorted negative list that gets shorter over time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407727 (owner: 10Jforrester) [18:05:58] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407727 (owner: 10Jforrester) [18:06:54] 10Operations, 10ORES, 10Scoring-platform-team: Clean up redundant ORES celery_workers defaults - https://phabricator.wikimedia.org/T186734#3953455 (10awight) [18:07:36] !log fixing ferm breakage by restarting the service on db1051 [18:07:41] (03Merged) 10jenkins-bot: Tidy: Re-do this as a sorted negative list that gets shorter over time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407727 (owner: 10Jforrester) [18:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:52] (03CR) 10jenkins-bot: Tidy: Re-do this as a sorted negative list that gets shorter over time [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407727 (owner: 10Jforrester) [18:09:34] subbu: James_F ^ is live on mwdebug1002, check please [18:09:56] checking .. [18:10:22] 10Operations, 10Cassandra, 10RESTBase-Cassandra, 10Services (doing), and 2 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3953499 (10MoritzMuehlenhoff) Is there a specific reason for calling the repo component cassandra311? That's very specific a... 
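The effective change is `CELERYD_CONCURRENCY` going from 45 to 35. As a rough rule, worker-pool memory scales with the number of worker processes. The back-of-envelope Python sketch below shows that arithmetic. It assumes each worker costs a fixed amount of memory, and the 500 MB per-worker figure is purely hypothetical. The log gives no per-worker RSS, and copy-on-write sharing between forked workers can make the real cost lower.

```python
import math


def pool_memory_mb(workers: int, per_worker_mb: float) -> float:
    """Estimated pool memory, assuming a fixed cost per worker process."""
    return workers * per_worker_mb


def max_workers(memory_budget_mb: float, per_worker_mb: float,
                reserve_mb: float = 0.0) -> int:
    """Largest worker count fitting the budget after reserving headroom."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = memory_budget_mb - reserve_mb
    return max(0, math.floor(usable / per_worker_mb))


PER_WORKER_MB = 500  # hypothetical figure, not measured on scb hosts
saved = pool_memory_mb(45, PER_WORKER_MB) - pool_memory_mb(35, PER_WORKER_MB)
print(saved)  # 5000
print(max_workers(20000, PER_WORKER_MB, reserve_mb=2500))  # 35
```

The `reserve_mb` parameter represents memory that other services on a shared host still need. That consideration matters on the scb boxes, which run more than ORES.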
[18:10:46] (PS7) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[18:11:03] (CR) Zoranzoki21: "Sorry for spamming, but I working on this patch" [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) (owner: Zoranzoki21)
[18:12:39] James_F, thcipriani had to line up some test pages .. but, lgtm.
[18:13:25] cool. James_F anything you wanted to check? If not I'll go ahead and sync.
[18:13:52] thcipriani, no errors in logs i presume
[18:14:12] logs look clean for mwdebug1002
[18:14:33] thcipriani: Go for it.
[18:14:37] * thcipriani does
[18:16:21] jynus: Thanks, I think we’re out of the danger zone.
[18:16:58] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:407727|Tidy: Re-do this as a sorted negative list that gets shorter over time]] (duration: 01m 13s)
[18:17:04] ^ subbu James_F live now
[18:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:18] ty!
[18:17:23] yw :)
[18:19:02] wikimania2018wiki is not in there, fyi :p
[18:19:08] in that list
[18:19:27] Operations, Analytics, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3953540 (Nuria) Will do, let's let it bake a bit and i shall check.
[18:20:09] awight: I am not sure I agree from checking the server resources, but if you are happy, I am happy :-)
[18:21:38] Wiki13: Yes, because it's already using Remex.
[18:22:23] jynus: Would you mind pointing out what you see?
[18:22:24] i see
[18:22:56] after i asked i thought myself, maybe its there already
[18:23:42] awight: servers are swapping hard: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=16&fullscreen&orgId=1&var-server=scb1001&var-network=eth0&from=now-6h&to=now
[18:24:01] that is an abnormal state, and slows down hosts multiple times more than just being busy
[18:24:45] others are in a worse state, with a full swap- meaning oom happens and kills random processes
[18:24:50] making thins very unstable
[18:25:15] red == bad
[18:25:41] but that seems to be happening for over a month already, so I guess as long as OOM isn't killing much, people are happy
[18:26:15] oh dear!
[18:26:23] That’s horrifying
[18:26:24] https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=16&fullscreen&orgId=1&var-server=scb1001&var-network=eth0&from=now-1y&to=now
[18:26:35] it seems to be worse lately
[18:26:42] a bit of swap is not that bad
[18:26:50] actively swapping is
[18:30:26] (PS1) Andrew Bogott: labweb: move labweb1001 and 1002 back to public IPs [dns] - https://gerrit.wikimedia.org/r/408841 (https://phabricator.wikimedia.org/T186729)
[18:35:50] (PS8) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[18:38:15] Operations, ops-eqiad, DBA, Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3953599 (jcrespo) I am currently on step 2, "Moving current backup files to dbstore2001", FYI, `/srv/backups/_mysqldump`, will t...
[18:38:33] (PS9) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[18:41:05] (PS4) BBlack: Varnish: swizzle TTLs by 5% [puppet] - https://gerrit.wikimedia.org/r/408810 (https://phabricator.wikimedia.org/T181315)
[18:43:25] (PS10) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[18:45:04] (PS3) Thcipriani: Remove old 'accountcreator' rules now handled by default [mediawiki-config] - https://gerrit.wikimedia.org/r/408071 (https://phabricator.wikimedia.org/T185417) (owner: Framawiki)
[18:45:51] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/408071 (https://phabricator.wikimedia.org/T185417) (owner: Framawiki)
[18:45:53] Operations, ops-eqiad, DBA, Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3953621 (Cmjohnson) I will not be available this week..Let's circle back to this mid-week next week please.
[18:48:25] (Merged) jenkins-bot: Remove old 'accountcreator' rules now handled by default [mediawiki-config] - https://gerrit.wikimedia.org/r/408071 (https://phabricator.wikimedia.org/T185417) (owner: Framawiki)
[18:48:37] (CR) jenkins-bot: Remove old 'accountcreator' rules now handled by default [mediawiki-config] - https://gerrit.wikimedia.org/r/408071 (https://phabricator.wikimedia.org/T185417) (owner: Framawiki)
[18:49:50] framawiki: ^ is live on mwdebug1002, check please
[18:52:35] thcipriani: looks good
[18:52:44] ok, going live
[18:55:52] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:408071|Remove old "accountcreator" rules now handled by default]] T185417 T186462 (duration: 01m 12s)
[18:55:58] ^ framawiki live now
[18:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:05] T186462: Remove old 'accountcreator' rules now handled by default - https://phabricator.wikimedia.org/T186462
[18:56:06] T185417: Bureaucrats on WMF wikis to add/remove 'accountcreator' by default - https://phabricator.wikimedia.org/T185417
[18:56:35] (PS7) Volans: Migration to Python 3 [software/cumin] - https://gerrit.wikimedia.org/r/402059
[18:57:06] (PS3) Volans: Backends: add known hosts files backend [software/cumin] - https://gerrit.wikimedia.org/r/405719
[18:57:37] thcipriani: looks good too
[18:57:52] (PS2) Thcipriani: Rename Project NS on Wikimedia Canada Chapter's wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/407040 (https://phabricator.wikimedia.org/T185661) (owner: Framawiki)
[18:58:01] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/407040 (https://phabricator.wikimedia.org/T185661) (owner: Framawiki)
[19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T1900)
[19:00:04] No GERRIT patches in the queue for this window AFAICS.
[19:00:48] swat running a bit long, will cut sanity down a touch.
[19:02:40] (Merged) jenkins-bot: Rename Project NS on Wikimedia Canada Chapter's wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/407040 (https://phabricator.wikimedia.org/T185661) (owner: Framawiki)
[19:03:15] (CR) Volans: "In the last PS I've just replaced mock with unittest.mock and adjusted the mock of input() in two tests for the cli." [software/cumin] - https://gerrit.wikimedia.org/r/402059 (owner: Volans)
[19:04:06] framawiki: Project NS rename for cawikimedia is on mwdebug1002, check please, I'll deploy and run namespacedupes after you check it out.
[19:05:09] thcipriani: siprop=namespaces is good
[19:06:18] okie doek, going live
[19:08:57] * Hauskatze was about to suggest the namespaceDupes thing
[19:09:03] (PS2) Thcipriani: Add NS_MAIN to $wgNamespacesWithSubpages for cawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/405755 (https://phabricator.wikimedia.org/T185436) (owner: Framawiki)
[19:09:20] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:407040|Rename Project NS on Wikimedia Canada Chapter wiki]] T185661 (duration: 01m 11s)
[19:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:34] T185661: Rename Project NS on Wikimedia Canada Chapter's wiki - https://phabricator.wikimedia.org/T185661
[19:09:48] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/405755 (https://phabricator.wikimedia.org/T185436) (owner: Framawiki)
[19:10:21] framawiki: Project NS rename live and namespaceDupes run: Looks Good!
[19:10:40] (Merged) jenkins-bot: Add NS_MAIN to $wgNamespacesWithSubpages for cawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/405755 (https://phabricator.wikimedia.org/T185436) (owner: Framawiki)
[19:11:02] !log after conversation with andrew we moved labweb to public for T186729
[19:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:17] T186729: labweb1001 and 1002 need to access labnet1001.eqiad.wmnet:8774 (and labnet1002.eqiad.wmnet:8774) - https://phabricator.wikimedia.org/T186729
[19:11:32] framawiki: NS_MAIN to $wgNamespacesWithSubpages for cawikimedia is on mwdebug1002, check please
[19:12:16] (CR) jenkins-bot: Rename Project NS on Wikimedia Canada Chapter's wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/407040 (https://phabricator.wikimedia.org/T185661) (owner: Framawiki)
[19:12:19] (CR) jenkins-bot: Add NS_MAIN to $wgNamespacesWithSubpages for cawikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/405755 (https://phabricator.wikimedia.org/T185436) (owner: Framawiki)
[19:12:47] (CR) Volans: [C: 2] Migration to Python 3 [software/cumin] - https://gerrit.wikimedia.org/r/402059 (owner: Volans)
[19:13:17] thcipriani: ok for me: https://ca.wikimedia.org/wiki/Grants/Requests/2018?action=info shows subpage numbers
[19:13:30] framawiki: cool, thanks for checking, going live
[19:14:05] (CR) Andrew Bogott: [C: 2] labweb: move labweb1001 and 1002 back to public IPs [dns] - https://gerrit.wikimedia.org/r/408841 (https://phabricator.wikimedia.org/T186729) (owner: Andrew Bogott)
[19:14:10] (PS1) Andrew Bogott: Move labweb1001 and 1002 back to public IPs [puppet] - https://gerrit.wikimedia.org/r/408854
[19:14:54] (PS2) Andrew Bogott: Move labweb1001 and 1002 back to public IPs [puppet] - https://gerrit.wikimedia.org/r/408854 (https://phabricator.wikimedia.org/T186729)
[19:15:44] (Merged) jenkins-bot: Migration to Python 3 [software/cumin] - https://gerrit.wikimedia.org/r/402059 (owner: Volans)
[19:15:55] (CR) jenkins-bot: Migration to Python 3 [software/cumin] - https://gerrit.wikimedia.org/r/402059 (owner: Volans)
[19:16:03] (CR) Andrew Bogott: [C: 2] Move labweb1001 and 1002 back to public IPs [puppet] - https://gerrit.wikimedia.org/r/408854 (https://phabricator.wikimedia.org/T186729) (owner: Andrew Bogott)
[19:16:24] Operations, Page-Previews, RESTBase, Services, and 2 others: Cached page previews not shown when refreshed - https://phabricator.wikimedia.org/T184534#3953710 (Volans)
[19:16:42] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:405755|Add NS_MAIN to $wgNamespacesWithSubpages for cawikimedia]] T185436 (duration: 01m 12s)
[19:16:50] ^ framawiki live now
[19:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:56] T185436: Activate Subpages feature for NS:0 on Wikimedia Canada Chapter's wiki - https://phabricator.wikimedia.org/T185436
[19:17:20] Operations, Cloud-Services, netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#3953714 (brion) I get a full 150 megabits download (my bandwidth cap) on that file from ulsfo, and about 100 megabits from my Linode server (t...
[19:17:38] thcipriani: confirmed, thank you !!
[19:17:51] yw, thanks for the patches :)
[19:24:40] Operations, OCG-General, Readers-Community-Engagement, Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3953779 (Jdlrobson)
[19:40:33] (PS1) Andrew Bogott: labweb1002: added hiera for profile::openstack::main::horizon::webserver_hostname [puppet] - https://gerrit.wikimedia.org/r/408858
[19:41:24] (CR) jerkins-bot: [V: -1] labweb1002: added hiera for profile::openstack::main::horizon::webserver_hostname [puppet] - https://gerrit.wikimedia.org/r/408858 (owner: Andrew Bogott)
[19:42:14] (PS2) Andrew Bogott: labweb1002: added hiera for webserver_hostname [puppet] - https://gerrit.wikimedia.org/r/408858
[19:42:47] (CR) Andrew Bogott: [C: 2] labweb1002: added hiera for webserver_hostname [puppet] - https://gerrit.wikimedia.org/r/408858 (owner: Andrew Bogott)
[19:51:01] (PS11) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[19:57:47] (CR) Addshore: [C: 1] Add addshore to contint-docker admins [puppet] - https://gerrit.wikimedia.org/r/408823 (https://phabricator.wikimedia.org/T186475) (owner: Alexandros Kosiaris)
[20:00:04] no_justification: #bothumor I � Unicode. All rise for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T2000).
[20:00:05] No GERRIT patches in the queue for this window AFAICS.
[20:00:50] Oh shut up bot, you know nothing
[20:01:31] (PS2) Zoranzoki21: Disable Flow extension on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463)
[20:02:17] (PS1) Chad: group1 to wmf.20 [mediawiki-config] - https://gerrit.wikimedia.org/r/408862
[20:02:19] (PS3) Zoranzoki21: Disable Flow extension on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463)
[20:02:40] (CR) Zoranzoki21: Disable Flow extension on Commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463) (owner: Zoranzoki21)
[20:08:53] (PS1) Madhuvishy: nfs-mount-manager: Add option to kill process accessing a mount [puppet] - https://gerrit.wikimedia.org/r/408864 (https://phabricator.wikimedia.org/T171540)
[20:16:55] (CR) Chad: [C: 2] group1 to wmf.20 [mediawiki-config] - https://gerrit.wikimedia.org/r/408862 (owner: Chad)
[20:18:34] (Merged) jenkins-bot: group1 to wmf.20 [mediawiki-config] - https://gerrit.wikimedia.org/r/408862 (owner: Chad)
[20:18:47] (CR) jenkins-bot: group1 to wmf.20 [mediawiki-config] - https://gerrit.wikimedia.org/r/408862 (owner: Chad)
[20:18:58] Operations, ops-eqiad, Analytics, hardware-requests: Decommission kafka1018 - https://phabricator.wikimedia.org/T182955#3953998 (RobH) All decommissioning should be tagged with #hw-requests.
[20:20:56] (PS2) Herron: puppetdb: add major version and package variant parameters [puppet] - https://gerrit.wikimedia.org/r/405808 (https://phabricator.wikimedia.org/T185501)
[20:35:09] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#3954012 (Niedzielski) p:Triage>Normal
[20:36:57] !log demon@tin rebuilt and synchronized wikiversions files: group1 to wmf.20
[20:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:55] jynus: Heya, I rolled out wmf.20 to group1 wikis and I saw a huge spike in db lag
[20:39:05] Rolling back, but fyi (and needs investigation)
[20:39:18] !log demon@tin rebuilt and synchronized wikiversions files: revert, huge spike in db lag
[20:39:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:58] Operations, ops-eqiad, Analytics, hardware-requests: Decommission kafka1018 - https://phabricator.wikimedia.org/T182955#3954040 (RobH) a:RobH
[20:45:02] Operations, ops-eqiad, Analytics, hardware-requests: Decommission kafka1018 - https://phabricator.wikimedia.org/T182955#3954043 (RobH) So I cannot see kafka1018 on the switch stack in row D. @Cmjohnson, I cannot actually finish the non-interrupt steps, since the port isn't noted. The host is cu...
[20:45:17] robh sounds good
[20:45:33] im just doing the rest of the steps
[20:45:41] cuz its not going to power on and call into puppet, its dead.
[20:45:45] okay I will figure out the port
[20:45:56] yeah, that's all that matters
[20:48:23] (CR) Herron: "> I'm not sure we really need both the version and the source of the package. It's not clear to me from this CR if they will be used both " (4 comments) [puppet] - https://gerrit.wikimedia.org/r/405808 (https://phabricator.wikimedia.org/T185501) (owner: Herron)
[20:48:53] (PS1) RobH: kafka1018 decommission [puppet] - https://gerrit.wikimedia.org/r/408870 (https://phabricator.wikimedia.org/T182955)
[20:49:55] (PS5) Herron: puppetdb: add support for puppetlabs puppetdb 4.4 package [puppet] - https://gerrit.wikimedia.org/r/407492 (https://phabricator.wikimedia.org/T185500)
[20:51:42] (PS1) RobH: kafka1018 decom, production dns [dns] - https://gerrit.wikimedia.org/r/408871 (https://phabricator.wikimedia.org/T182955)
[20:52:07] (CR) RobH: [C: 2] kafka1018 decommission [puppet] - https://gerrit.wikimedia.org/r/408870 (https://phabricator.wikimedia.org/T182955) (owner: RobH)
[20:52:28] (CR) RobH: [C: 2] kafka1018 decom, production dns [dns] - https://gerrit.wikimedia.org/r/408871 (https://phabricator.wikimedia.org/T182955) (owner: RobH)
[20:54:17] Operations, ops-eqiad, Analytics, hardware-requests: Decommission kafka1018 - https://phabricator.wikimedia.org/T182955#3954057 (RobH) a:RobH>Cmjohnson
[20:54:57] (PS12) Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393)
[20:55:27] Operations, ops-eqiad, Analytics, hardware-requests: Decommission kafka1018 - https://phabricator.wikimedia.org/T182955#3839641 (RobH) Ok, ready for on-site wipe and unracking (plus the tracing and disabling of the switch port)
[20:59:39] Operations, ops-eqiad, User-Eevans: Degraded RAID on restbase-dev1006 - https://phabricator.wikimedia.org/T185494#3954115 (Eevans)
[21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: How many deployers does it take to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180207T2100).
[21:00:04] No GERRIT patches in the queue for this window AFAICS.
[21:00:16] Nothing for ORES.
[21:00:31] !log andrew@tin Started deploy [horizon/deploy@9773454]: (no justification provided)
[21:00:36] !log andrew@tin Finished deploy [horizon/deploy@9773454]: (no justification provided) (duration: 00m 05s)
[21:00:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:46] !log andrew@tin Started deploy [horizon/deploy@9773454]: (no justification provided)
[21:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:31] !log andrew@tin Finished deploy [horizon/deploy@9773454]: (no justification provided) (duration: 00m 44s)
[21:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:01:43] !log andrew@tin Started deploy [horizon/deploy@9773454]: (no justification provided)
[21:01:45] !log andrew@tin Started deploy [horizon/deploy@9773454]: (no justification provided)
[21:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:24] !log andrew@tin Finished deploy [horizon/deploy@9773454]: (no justification provided) (duration: 02m 38s)
[21:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:05] !log andrew@tin Started deploy [horizon/deploy@9773454]: (no justification provided)
[21:06:09] !log andrew@tin Finished deploy [horizon/deploy@9773454]: (no justification provided) (duration: 00m 03s)
[21:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:19] !log demon@tin rebuilt and synchronized wikiversions files: mw.org also back to wmf.17
[21:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:23] andrewbogott is above justifying his actions :)
[21:09:01] Ok, definitely proven there's a replag issue in wmf.20
[21:09:16] Maybe I can hardcode that to always say "This isn't working and I'm going to do this 1000 times"
[21:09:18] rolled mw.org back to wmf.17 -> went from #3 most replag'd wiki to #7
[21:09:39] andrewbogott: I'm thinking of giving it a list of snarky ways to put it
[21:09:50] "you got some 'splainin' to do!"
[21:11:27] Anyway, we gotta isolate what caused the replag.
[21:11:33] * no_justification pokes around
[21:12:34] !log andrew@tin Started deploy [horizon/deploy@9773454]: This isn't working and I'm going to do this 1000 times
[21:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:58] !log andrew@tin Finished deploy [horizon/deploy@9773454]: This isn't working and I'm going to do this 1000 times (duration: 01m 24s)
[21:14:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:17] andrewbogott: Isn't working...on horizon's end or isn't working in scap?
[21:14:45] no_justification: scap, but it's almost certainly my fault
[21:14:55] Operations, Cloud-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954161 (ArielGlenn)
[21:15:03] actually that last one went better
[21:15:11] Operations, Cloud-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954173 (ArielGlenn) p:Triage>Normal
[21:15:24] I still need to figure out how to allow deploy-service to restart apache on this host
[21:27:08] !log deploying wmf.20 to en* (except enwiki) on mwdebug1001 to debug new cirrus errors in wmf.20/wmf.19 mixed sister search
[21:27:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:41] err, wmf.20/wmf.17
[21:29:30] !log mlitn@tin Started deploy [3d2png/deploy@8135c2d]: Updating 3d2png
[21:29:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:25] !log mlitn@tin Finished deploy [3d2png/deploy@8135c2d]: Updating 3d2png (duration: 03m 55s)
[21:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:58] (PS1) Andrew Bogott: Horizon/queens: replace deprecated django.utils.log.NullHandler [puppet] - https://gerrit.wikimedia.org/r/408931
[21:35:00] (PS1) Andrew Bogott: horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932
[21:35:30] (PS2) Ottomata: Set api.version.request: false for eventstreams [puppet] - https://gerrit.wikimedia.org/r/408831 (https://phabricator.wikimedia.org/T176126)
[21:35:38] (CR) Ottomata: [V: 2 C: 2] Set api.version.request: false for eventstreams [puppet] - https://gerrit.wikimedia.org/r/408831 (https://phabricator.wikimedia.org/T176126) (owner: Ottomata)
[21:35:49] (CR) jerkins-bot: [V: -1] horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932 (owner: Andrew Bogott)
[21:35:51] (CR) jerkins-bot: [V: -1] Horizon/queens: replace deprecated django.utils.log.NullHandler [puppet] - https://gerrit.wikimedia.org/r/408931 (owner: Andrew Bogott)
[21:39:05] !log otto@tin Started deploy [eventstreams/deploy@ee854df]: (no justification provided)
[21:39:07] !log otto@tin Finished deploy [eventstreams/deploy@ee854df]: (no justification provided) (duration: 00m 02s)
[21:39:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:14] !log otto@tin Started deploy [eventstreams/deploy@ee854df]: (no justification provided)
[21:40:16] !log otto@tin Finished deploy [eventstreams/deploy@ee854df]: (no justification provided) (duration: 00m 02s)
[21:40:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:38] (PS2) Andrew Bogott: Horizon/queens: replace deprecated django.utils.log.NullHandler [puppet] - https://gerrit.wikimedia.org/r/408931
[21:40:40] (PS2) Andrew Bogott: horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932
[21:41:33] (PS3) Andrew Bogott: Horizon/queens: replace deprecated django.utils.log.NullHandler [puppet] - https://gerrit.wikimedia.org/r/408931
[21:42:02] (PS3) Zoranzoki21: horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932 (owner: Andrew Bogott)
[21:42:04] (CR) Andrew Bogott: [C: 2] Horizon/queens: replace deprecated django.utils.log.NullHandler [puppet] - https://gerrit.wikimedia.org/r/408931 (owner: Andrew Bogott)
[21:42:40] (PS4) Andrew Bogott: horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932
[21:43:23] (CR) Andrew Bogott: [C: 2] horizon/queens: disable offline compression [puppet] - https://gerrit.wikimedia.org/r/408932 (owner: Andrew Bogott)
[21:44:26] !log bsitzmann@tin Started deploy [mobileapps/deploy@fe3cd60]: Update mobileapps to 7a3b19c (T186745 T186643)
[21:44:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:41] T186745: MW API requests to mediawiki.org (without the www subdomain) fail - https://phabricator.wikimedia.org/T186745
[21:51:07] !log bsitzmann@tin Finished deploy [mobileapps/deploy@fe3cd60]: Update mobileapps to 7a3b19c (T186745 T186643) (duration: 06m 41s)
[21:51:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:22] T186745: MW API requests to mediawiki.org (without the www subdomain) fail - https://phabricator.wikimedia.org/T186745
[21:53:46] !log mwdebug1001 back to standard deployed versions
[21:53:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:34] (PS1) BBlack: dhcp bootstrap for bast5001 [puppet] - https://gerrit.wikimedia.org/r/408934 (https://phabricator.wikimedia.org/T156027)
[21:56:26] (CR) BBlack: [C: 2] dhcp bootstrap for bast5001 [puppet] - https://gerrit.wikimedia.org/r/408934 (https://phabricator.wikimedia.org/T156027) (owner: BBlack)
[21:59:09] (CR) MarcoAurelio: [C: -2] Mark repository as read-only [software/gdash] (refs/meta/config) - https://gerrit.wikimedia.org/r/408788 (https://phabricator.wikimedia.org/T186696) (owner: MarcoAurelio)
[22:04:48] (CR) MarcoAurelio: "(testing new access done)" [software/gdash] (refs/meta/config) - https://gerrit.wikimedia.org/r/408788 (https://phabricator.wikimedia.org/T186696) (owner: MarcoAurelio)
[22:05:36] does filipo use IRC?
[22:05:48] Hauskatze: Yep, go.dog :)
[22:06:04] thanks
[22:06:31] godog: got https://gerrit.wikimedia.org/r/#/c/408787/ wrt your request to archive operations/software/gdash
[22:12:42] (PS1) Chad: Gerrit: Rename link to gitiles [puppet] - https://gerrit.wikimedia.org/r/408936
[22:16:45] (CR) Paladox: [C: 1] Gerrit: Rename link to gitiles [puppet] - https://gerrit.wikimedia.org/r/408936 (owner: Chad)
[22:32:25] Operations, ops-eqiad: OfflineUncorrectableSector on mw1256 sda - https://phabricator.wikimedia.org/T186535#3954339 (Cmjohnson) @Joe The disk is out of warranty but I have kajillion 500GB disks lying around. The disk can be replaced anytime but a re-install will be needed. Let me know once the server i...
[22:33:45] Operations, ops-eqiad: Offline uncorrectable sectors on poolcounter1002 /dev/sda - https://phabricator.wikimedia.org/T186534#3946536 (Cmjohnson) I have several 500GB disks lying around. Let me when this server is depooled and powered off. Most likely will need a re-install.
[22:35:50] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954359 (madhuvishy)
[22:40:30] Operations, ops-eqiad, User-Eevans: Degraded RAID on restbase-dev1006 - https://phabricator.wikimedia.org/T185494#3954364 (Cmjohnson) Because of the AHCI configuration the h/w does not show up in the standard log and HP has no way of proving the SSD type
[22:41:46] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954368 (madhuvishy) @Cmjohnson When we racked labstore1006 & 7 we approved the proposal for racking in 1GBE racks (T167984). I did not know that we h...
[22:44:24] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954377 (Cmjohnson) @madhuvishy we do not have 10G racks in row B yet. We are doing a network refresh and will be adding 10G racks in the next couple...
[22:45:43] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954383 (madhuvishy) @Cmjohnson Can we move them to a row with 10G then? These are in public vlan so don't need labs-support. I believe they are curre...
[22:47:21] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954386 (RobH) Moving rows means the IP address and vlan change. So that is usually a reimage but can also be done manually I suppose, unless anyone...
[22:49:28] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954388 (Cmjohnson) @madhuvishy I will talk with @faidon. I have no issues moving one to row D in a 10G rack but the 10G racks in A and C are changing...
[22:51:00] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954394 (madhuvishy) @Cmjohnson So to clarify, do both row A and D (or the racks we have these servers in - D6 and A1) not have 10G enabled? @RobH On...
[22:54:15] Operations, DC-Ops, Data-Services, User-ArielGlenn: set up labstore1006,1007 for use of their 10G nics - https://phabricator.wikimedia.org/T186756#3954401 (Cmjohnson) @madhuvishy They are currently not in 10G racks. I can move one from d6 to d7 or d2(both 10G racks). The network gear has alread...
[23:01:54] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954433 (madhuvishy) a:Cmjohnson
[23:04:19] Operations, Cassandra, RESTBase-Cassandra, Services (doing), and 2 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3954438 (Eevans) >>! In T186619#3953499, @MoritzMuehlenhoff wrote: > Is there a specific reason for calling the repo compo...
[23:12:51] Operations, ops-eqiad, User-Eevans: Degraded RAID on restbase-dev1006 - https://phabricator.wikimedia.org/T185494#3954459 (Cmjohnson) After more review, it turns out these servers are using the old Intel S3610 ssds. I will need to check with @Faidon and @Robh about ordering another one.
[23:17:05] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954466 (ayounsi) Is having them both in the same row a temporary solution, and are those servers redundant (can loose both without service inte...
[23:20:50] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954470 (madhuvishy) @ayounsi No we can't lose both without service interruption. I am not sure how we can have row level redundancy in this cas...
[23:22:19] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954473 (RobH) My understanding is there is 10G available in other rows, but they will go away during the refersh and possible be replaced with...
[23:24:26] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954479 (Cmjohnson) I can probably make room in C8 but that requires expediting the decom of several MC servers. I am not a fan of having to mo...
[23:25:30] Operations, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954482 (madhuvishy) +1 on moving only once!
[23:28:43] (PS1) BBlack: eqsin: add prometheus placeholder files [puppet] - https://gerrit.wikimedia.org/r/408942 (https://phabricator.wikimedia.org/T156027)
[23:29:03] (PS1) Andrew Bogott: horizon/queens: disable static compression entirely [puppet] - https://gerrit.wikimedia.org/r/408943
[23:29:45] (CR) Andrew Bogott: [C: 2] horizon/queens: disable static compression entirely [puppet] - https://gerrit.wikimedia.org/r/408943 (owner: Andrew Bogott)
[23:31:42] (CR) BBlack: [C: 2] eqsin: add prometheus placeholder files [puppet] - https://gerrit.wikimedia.org/r/408942 (https://phabricator.wikimedia.org/T156027) (owner: BBlack)
[23:31:47] (PS2) BBlack: eqsin: add prometheus placeholder files [puppet] - https://gerrit.wikimedia.org/r/408942 (https://phabricator.wikimedia.org/T156027)
[23:31:51] (CR) BBlack: [V: 2 C: 2] eqsin: add prometheus placeholder files [puppet] - https://gerrit.wikimedia.org/r/408942 (https://phabricator.wikimedia.org/T156027) (owner: BBlack)
[23:36:11] (PS1) Andrew Bogott: horizon keystone policy.json: add a missing rule [puppet] - https://gerrit.wikimedia.org/r/408944
[23:39:48] (PS1) Dzahn: phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947
[23:40:15] (CR) jerkins-bot: [V: -1] phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947 (owner: Dzahn)
[23:40:23] (PS2) Andrew Bogott: horizon policy files: add some missing rules [puppet] - https://gerrit.wikimedia.org/r/408944
[23:41:28] (CR) Andrew Bogott: [C: 2] horizon policy files: add some missing rules [puppet] - https://gerrit.wikimedia.org/r/408944 (owner: Andrew Bogott)
[23:42:11] (PS1) BBlack: eqsin: rename asw-eqsin to asw1-eqsin [dns] - https://gerrit.wikimedia.org/r/408948
[23:43:15] (PS1) BBlack: eqsin: netops monitoring defs [puppet] - https://gerrit.wikimedia.org/r/408949 (https://phabricator.wikimedia.org/T156027)
[23:43:18] (PS2) Dzahn: phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947
[23:43:19] (CR) BBlack: [C: 2] eqsin: rename asw-eqsin to asw1-eqsin [dns] - https://gerrit.wikimedia.org/r/408948 (owner: BBlack)
[23:43:36] Operations, ops-eqiad, DC-Ops, Data-Services, User-ArielGlenn: Move labstore1006 and 1007 to 10G enabled racks in row D - https://phabricator.wikimedia.org/T186756#3954511 (RobH)
[23:43:44] (PS1) Chad: wmf.20 off most wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/408950
[23:43:46] (CR) Chad: [C: 2] wmf.20 off most wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/408950 (owner: Chad)
[23:44:04] (CR) BBlack: [C: 2] eqsin: netops monitoring defs [puppet] - https://gerrit.wikimedia.org/r/408949 (https://phabricator.wikimedia.org/T156027) (owner: BBlack)
[23:44:07] (CR) jerkins-bot: [V: -1] phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947 (owner: Dzahn)
[23:45:26] (PS1) Andrew Bogott: horizon designate policy.json: add a missing comma [puppet] - https://gerrit.wikimedia.org/r/408951
[23:45:32] eventually, some alerts may appear for hosts/routers with "eqsin" in the name, or bast5001. they can be safely ignored!
[23:45:32] (Merged) jenkins-bot: wmf.20 off most wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/408950 (owner: Chad)
[23:45:34] (PS2) Andrew Bogott: horizon designate policy.json: add a missing comma [puppet] - https://gerrit.wikimedia.org/r/408951
[23:45:52] (PS3) Dzahn: phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947
[23:46:40] (CR) Andrew Bogott: [C: 2] horizon designate policy.json: add a missing comma [puppet] - https://gerrit.wikimedia.org/r/408951 (owner: Andrew Bogott)
[23:46:58] (CR) jenkins-bot: wmf.20 off most wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/408950 (owner: Chad)
[23:54:47] (CR) Dzahn: [C: -1] "The title 'status' has already been used in this resource expression at /srv/jenkins-workspace/puppet-compiler/9889/change/src/modules/htt" [puppet] - https://gerrit.wikimedia.org/r/408947 (owner: Dzahn)
[23:54:54] ok I think I ack'd everything before it hit IRC :)
[23:55:42] :) and yay @ eqsin
[23:56:21] icinga-wm: ping
[23:56:34] (PS4) Dzahn: phabricator: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/408947
[23:56:55] there hasn't been any icinga-wm traffic in quite a while I think, unless I lost it in other noise
[23:57:16] it should've at least alerted when I broke einsteinium config earlier (now fixed)
[23:57:41] surprising, 43 CRITs in web ui ...
[23:57:52] but most are disabled notifications, yea
[23:58:21] yea, seems broken :/
[23:58:27] manual ACK not showing
[23:59:05] It was working in -releng
[23:59:34] !log restarted icinga-wm, too quiet
[23:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log