[00:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T0000). [00:00:04] ebernhardson: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:00:28] I can deploy [00:00:34] ACKNOWLEDGEMENT - Check systemd state on mw1393 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:34] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1393 is CRITICAL: Host mw1393 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:34] ACKNOWLEDGEMENT - Check systemd state on mw1394 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:34] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1394 is CRITICAL: Host mw1394 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:34] ACKNOWLEDGEMENT - Check systemd state on mw1395 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:34] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1395 is CRITICAL: Host mw1395 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:34] ACKNOWLEDGEMENT - Check systemd state on mw1396 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:35] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1396 is CRITICAL: Host mw1396 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:35] ACKNOWLEDGEMENT - Check systemd state on mw1397 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:36] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1397 is CRITICAL: Host mw1397 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:36] ACKNOWLEDGEMENT - Check systemd state on mw1398 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:37] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1398 is CRITICAL: Host mw1398 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:37] ACKNOWLEDGEMENT - Check systemd state on mw1399 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:38] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1399 is CRITICAL: Host mw1399 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:38] ACKNOWLEDGEMENT - Check systemd state on mw1400 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:39] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1400 is CRITICAL: Host mw1400 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:39] ACKNOWLEDGEMENT - Check systemd state on mw1401 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:40] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1401 is CRITICAL: Host mw1401 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:40] ACKNOWLEDGEMENT - Check systemd state on mw1402 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn envoyproxy not starting. not pooled. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:41] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw1402 is CRITICAL: Host mw1402 is not in mediawiki-installation dsh group daniel_zahn envoyproxy not starting. not pooled. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [00:00:41] PROBLEM - Check systemd state on mw1404 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:46] ok cool, looks like all that can be ignored :) [00:00:48] PROBLEM - Check systemd state on mw1403 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:01:21] (03PS2) 10EBernhardson: [cirrus] Configuration for glent m0 AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576952 (https://phabricator.wikimedia.org/T246947) [00:02:15] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576952 (https://phabricator.wikimedia.org/T246947) (owner: 10EBernhardson) [00:02:35] (03CR) 10Jforrester: [C: 03+1] Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [00:03:20] I have some last-minute SWAT additions [00:03:23] (03Merged) 10jenkins-bot: [cirrus] Configuration for glent m0 AB test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576952 (https://phabricator.wikimedia.org/T246947) (owner: 10EBernhardson) [00:03:29] can self-deploy after you are finished ebernhardson [00:03:42] tgr: sure, i'll be done quick, both of these configure something that doesn't change any web requests [00:03:59] (03CR) 10RobH: [C: 03+2] adding R640 skus [software] - 10https://gerrit.wikimedia.org/r/576944 (owner: 10RobH) [00:04:03] Hopefully. :-) [00:05:21] (03PS2) 10Krinkle: Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 [00:06:01] James_F: want to swat the gewikimedia fix? 
[00:06:05] unfix* [00:06:14] (03PS3) 10EBernhardson: [cirrus] use 2 shards for commonswiki_content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576659 (https://phabricator.wikimedia.org/T246882) (owner: 10DCausse) [00:06:23] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576659 (https://phabricator.wikimedia.org/T246882) (owner: 10DCausse) [00:06:25] Krinkle: Unless you do? [00:06:33] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cirrus: Backend configuration for glent m0 ab test (duration: 01m 04s) [00:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:07:08] James_F: can do, np [00:07:19] (03Merged) 10jenkins-bot: [cirrus] use 2 shards for commonswiki_content [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576659 (https://phabricator.wikimedia.org/T246882) (owner: 10DCausse) [00:08:49] !log ebernhardson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cirrus: use 2 shards for commonswiki_content (duration: 01m 04s) [00:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:08:58] (03PS3) 10Jforrester: Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [00:09:08] (03PS4) 10Jforrester: MWConfigCacheGenerator: Stop reading most wiki-family dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle) [00:09:30] tgr: all yours [00:09:40] (03CR) 10Jforrester: [C: 03+1] "Let's land this ASAP." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/576977 (https://phabricator.wikimedia.org/T239301) (owner: 10Krinkle) [00:09:49] Krinkle: go ahead, mine will take a while [00:09:50] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [00:10:01] (03PS1) 10Papaul: DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/577005 (https://phabricator.wikimedia.org/T241852) [00:10:05] (03CR) 10Krinkle: [C: 03+2] Revert "Add gewikimedia to special wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576974 (owner: 10Jforrester) [00:10:28] ok, now I need to figure out how to generate that interwiki thing [00:10:37] scap [00:11:11] (03Merged) 10jenkins-bot: Revert "Add gewikimedia to special wikis" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576974 (owner: 10Jforrester) [00:11:52] Krinkle: Hmm, your change only touches liwikinews now. [00:12:05] Which is… better, but probably not OK? [00:12:21] yeah, still need to resolve that as well [00:12:59] aha update-interwiki-cache runs the maintenance script and pushes to gerrit *and* syncs it [00:13:10] Yes. [00:13:16] Not at all scary. [00:13:19] (03PS1) 10Dwisehaupt: Add frdb2001 to monitoring [puppet] - 10https://gerrit.wikimedia.org/r/577006 (https://phabricator.wikimedia.org/T242269) [00:13:48] so.. no mwdebug testing then I guess.. [00:13:54] Nope. [00:14:01] Or even code review. [00:14:09] I've seen people run it on Sundays, indeed. 
[00:14:11] and then it wants my Gerrit password [00:14:18] ok, let's do this differently :) [00:14:23] !log krinkle@deploy1001 update-interwiki-cache aborted: Update interwiki cache (duration: 00m 31s) [00:14:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:14:30] Krinkle: ssh-agent is your friend. [00:15:12] You can use a http password and then invalidate it after [00:15:19] fwiw the password it probably wants is your gerrit HTTP password, which is a random token and you can reset it at any time [00:17:19] right, but running three commands back to back instead of 1 isn't bad enough imho to need to create that in Gerrit, copy/paste, and then reset. [00:17:49] scap seems to have crashed on me [00:18:54] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [00:18:57] 10Operations, 10ops-codfw, 10DC-Ops, 10fundraising-tech-ops: (Need by: ASAP) rack/setup/install frdb2001 - https://phabricator.wikimedia.org/T245566 (10Dwisehaupt) [00:19:00] (03PS1) 10Krinkle: Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577007 [00:22:13] (03CR) 10Krinkle: [C: 03+2] Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577007 (owner: 10Krinkle) [00:22:37] Krinkle: Don't run `scap pull` from a deploy server, then. :-P [00:23:11] (03Merged) 10jenkins-bot: Update interwiki.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577007 (owner: 10Krinkle) [00:23:30] James_F: that's to sync from /mw-staging to /mw, required for the maintenance script. [00:23:37] I'm not sure how the scap command does it [00:23:40] I guess it doesn't [00:23:46] You'd have to sync the dblist separately first [00:23:48] I guess that's fine [00:23:50] meh [00:23:54] Yeah. 
[00:24:05] !log krinkle@deploy1001 Synchronized dblists/: I4fb3d14ed86 (duration: 01m 04s) [00:24:05] The 2 mins is just a timeout, I guess. [00:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:25:17] James_F: it worked though [00:25:21] not sure what it was doing [00:25:42] the fpm is because there's no public http on that host, but that step is optional anyway [00:27:27] !log krinkle@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 02s) [00:27:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:29:17] tgr: all yours [00:29:24] (03CR) 10Krinkle: [C: 03+2] tests: Re-enable 'family' dblist test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576977 (https://phabricator.wikimedia.org/T239301) (owner: 10Krinkle) [00:29:26] thx [00:30:24] (03Merged) 10jenkins-bot: tests: Re-enable 'family' dblist test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576977 (https://phabricator.wikimedia.org/T239301) (owner: 10Krinkle) [00:31:40] James_F: so wgEnableUploads, 11 wikinews have true explicitly, then 'wikinews' has default of false, and 'commonsuploads' (incl 8 wikinews) has it true. [00:31:56] and it seems only 1 wiki in this mess has an ambiguous outcome: liwikinews. [00:32:01] Yeah. [00:32:45] commonsuploads is our worst dblist [00:34:19] I have no idea what it means to be on that list [00:34:31] Yeah, it's a mess. [00:34:44] commonsuploads originally meant "the upload link points to Commons". [00:34:56] But then it became "… and local uploads aren't allowed". [00:35:01] Except not always. [00:35:11] it actually enables local uploads [00:35:20] but many of the wikis on that list have an override for false indeed [00:35:23] "Soft-disabled" [00:35:27] But only for sysops? [00:35:31] yeah [00:35:35] Oh [00:35:35] * James_F nods. 
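Editor's note: the wgEnableUploads ambiguity described above (a wiki on two dblists whose values disagree, with no explicit per-wiki value) can be sketched as follows. This is a minimal illustration, not the mediawiki-config test itself: the function name `find_ambiguous`, the data layout, and the placeholder wikis `xxwikinews` and `yywikinews` are assumptions made for the example. Only the 'wikinews' = false and 'commonsuploads' = true values and the liwikinews outcome come from the log.

```python
def find_ambiguous(setting, tags_by_wiki):
    """Return {wiki: {tag: value}} for wikis whose tags disagree.

    `setting` maps a wiki name or a tag (dblist) name to a value.
    A wiki with its own explicit entry is never ambiguous, because
    the explicit value takes precedence over any tag value.
    """
    ambiguous = {}
    for wiki, tags in tags_by_wiki.items():
        if wiki in setting:
            continue  # explicit per-wiki value settles the outcome
        tagged = {tag: setting[tag] for tag in tags if tag in setting}
        if len(set(tagged.values())) > 1:
            ambiguous[wiki] = tagged  # two tags, two different values
    return ambiguous


# Hypothetical data layout; xxwikinews and yywikinews are placeholders.
wg_enable_uploads = {
    "wikinews": False,        # family-wide value, per the log
    "commonsuploads": True,   # dblist value, per the log
    "xxwikinews": True,       # stands in for an explicitly set wiki
}
tags_by_wiki = {
    "xxwikinews": ["wikinews", "commonsuploads"],
    "yywikinews": ["wikinews"],
    "liwikinews": ["wikinews", "commonsuploads"],
}

result = find_ambiguous(wg_enable_uploads, tags_by_wiki)
print(result)
```

Under these assumptions, setting an explicit liwikinews value would empty the result, since explicit values are skipped before any tag comparison.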
[00:35:40] right, that's the link [00:35:43] okay I get it now [00:36:30] https://li.wikinews.org/wiki/Speciaal:Lètste_verangeringe is active enough that I'd be OK changing it and expecting them to shout if they're unhappy. [00:37:03] it's fine, I'll just set it explicitly no biggie [00:37:08] OK. [00:40:42] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) [00:41:43] !log tgr@deploy1001 Synchronized php-1.35.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/SearchStrategy/SearchStrategy.php: SWAT: [[gerrit:577001|Newcomer tasks: Set search sort to random for ORES based topics (T242476)]] (duration: 01m 04s) [00:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:41:48] T242476: Newcomer tasks: when selecting multiple topics, one topic should not dominate over the others - https://phabricator.wikimedia.org/T242476 [00:44:54] jouncebot: next [00:44:54] In 0 hour(s) and 15 minute(s): Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T0100) [00:45:21] (03PS1) 10Krinkle: Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 [00:45:39] !log tgr@deploy1001 Synchronized php-1.35.0-wmf.22/extensions/GrowthExperiments/modules/homepage/: SWAT: [[gerrit:577002|Adjust topic UX (T244421)]] (duration: 01m 05s) [00:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:45:44] T244421: Newcomer tasks: UX changes for ORES topics - https://phabricator.wikimedia.org/T244421 [00:47:33] 10Operations, 10ops-codfw, 10DC-Ops: (Need by: TBD) rack/setup/install ganeti20[19-24] - https://phabricator.wikimedia.org/T244783 (10Papaul) [00:51:04] how do you tell these days which train schedule is followed? 
[00:51:22] or is it always in the EU slot and the US slot is used for troubleshooting? [00:51:24] jouncebot: next [00:51:24] In 0 hour(s) and 8 minute(s): Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T0100) [00:52:00] tgr: eu slot when the train conductor is in europe, us when the conductor is in the us [00:52:20] so it varies as we rotate conductor duties amongst the whole team [00:52:31] (03PS2) 10Krinkle: Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 [00:52:45] mutante: I only have a small phabricator change to deploy, not very important or risky [00:52:50] (03PS4) 10Krinkle: Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 [00:53:02] (03PS3) 10Krinkle: Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 [00:53:08] well the conductor seems to be liw based on ticket assignment, but he is marked for both slots in the deploy calendar [00:53:13] (03PS5) 10Krinkle: MWConfigCacheGenerator: Stop reading most wiki-family dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) [00:53:24] do people really have to look up personal data on officewiki to tell when the train runs? [00:53:42] normally one of them is marked as reserved but not used this week [00:53:47] past few months that is [00:54:03] but not this week it seems not sure if that was an intentional omission. 
[00:54:07] tgr: yeah, you shouldn't have to look up on office wiki [00:54:17] twentyafterfour: SWAT is done [00:54:25] probably an oversight when thcipriani generated the deployments page this week [00:55:06] James_F: no change detected :) [00:55:09] (on both) [00:55:10] yay [00:55:19] ack, thanks [00:55:29] both slots said "European+American Version" so I thought something has changed [00:55:57] EU slot used, US for troubleshooting. There's also American+European which is opposite. [00:56:12] depends on if primary conductor is US or EU. [00:56:12] twentyafterfour: sounds good [00:56:28] oh, so the order counts [00:57:03] !log deploying phabricator-extensions tag release/2020-03-04/1 ( https://phabricator.wikimedia.org/source/phab-extensions/history/wmf%252Fstable/;release/2020-03-04/1 ) [00:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:58:03] the American+European schedule skips generating the Tuesday EU Train window in the schedule since primary conductor is US: I think that's the main difference between the two in terms of schedule. [01:00:04] twentyafterfour: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T0100). [01:03:22] (03PS6) 10Krinkle: MWConfigCacheGenerator: Stop reading most wiki-family dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) [01:03:52] (03CR) 10Dzahn: [C: 03+1] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/577005 (https://phabricator.wikimedia.org/T241852) (owner: 10Papaul) [01:03:56] !log phabricator deployment done [01:03:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:05:50] nice twentyafterfour. cya later. 
off [01:19:37] (03CR) 10Jforrester: [C: 03+1] Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [01:20:10] (03CR) 10Jforrester: [C: 03+1] Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 (owner: 10Krinkle) [01:20:33] (03CR) 10Jforrester: [C: 03+1] "Perfect." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle) [01:24:52] (03PS4) 10Jforrester: [WiP] Provide infrastructure to create InitialiseSettings.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576514 [01:29:02] RECOVERY - Check systemd state on mw1404 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:32:46] (03PS1) 10Krinkle: tests: Remove unused phpunit.xml entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577018 [01:32:47] James_F: three violations ^ [01:32:48] (03PS1) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 [01:33:23] (03CR) 10Krinkle: "Failed asserting that two arrays are equal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [01:33:54] James_F: I guess some of the commons uploads wikis got closed [01:34:42] (03CR) 10jerkins-bot: [V: 04-1] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [01:35:12] ugh [01:35:14] labtestwiki is a dblist [01:35:20] but also an identically named wiki [01:35:38] it has to be a dblist because it's on a separate db cluster, and we require 1:1 with those [01:35:41] that'll need to be renamed [01:35:44] * Krinkle writes another test against that [01:35:50] who knew tests were useful? [01:36:58] actually.. 
[01:37:02] no it *was* renamed [01:37:04] it was named s11 [01:37:12] PROBLEM - Hadoop NodeManager on analytics1074 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [01:37:26] 10Operations, 10netops, 10Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (10Papaul) @ayounsi please see below for the configuration i just added the first interface for now. If all looks good I will add the other interfaces ` [edit interfaces ge-0/0/0 unit... [01:47:26] (03PS1) 10Krinkle: tests: Remove outdated 'testExpressionListsNaming' test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577025 [01:48:06] (03CR) 10Papaul: [C: 03+2] DHCP: Add MAC address for mw2350 to mw2365, Add those servers too to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/577005 (https://phabricator.wikimedia.org/T241852) (owner: 10Papaul) [01:50:53] James_F: so, the fixme for group1 and group2. That's fixable right? [01:51:50] Might be a good one for buildDBLists.php to expand :) [01:53:38] (03PS1) 10Krinkle: multiversion: Remove Beta-hack from buildDBLists.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 [01:56:17] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2350.codfw.wmnet ` The log can be fou... 
[01:58:55] (03PS1) 10Mstyles: kibana: add kibana to relforge [puppet] - 10https://gerrit.wikimedia.org/r/577031 (https://phabricator.wikimedia.org/T246961) [01:59:48] (03CR) 10jerkins-bot: [V: 04-1] kibana: add kibana to relforge [puppet] - 10https://gerrit.wikimedia.org/r/577031 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [02:01:24] (03CR) 10Mstyles: "I am aware that this is nowhere close to correct. Outside of the tests failing, I don't know if I added things in the correct place. Also " [puppet] - 10https://gerrit.wikimedia.org/r/577031 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [02:05:51] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2351.codfw.wmnet ` The log can be fou... [02:11:02] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [02:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:13:27] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [02:13:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:18:13] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2350.codfw.wmnet'] ` and were **ALL** successful. [02:20:19] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2352.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[02:20:50] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [02:20:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:22:16] (03CR) 10Krinkle: [C: 03+2] Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [02:23:15] (03Merged) 10jenkins-bot: Fix wgUploadNavigationUrl conflict between 'commonsuploads' and 'wikinews' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [02:23:18] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [02:23:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:28:03] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2351.codfw.wmnet'] ` and were **ALL** successful. [02:28:31] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2353.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[02:29:46] (03CR) 10Krinkle: [C: 03+2] "Verified via mwdebug1002 that (when bypassing the sidebar cache by switching to a rarely used interface language), the Upload link now con" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576999 (owner: 10Krinkle) [02:29:55] (03CR) 10Krinkle: [C: 03+2] Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 (owner: 10Krinkle) [02:30:02] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ia5b12516be59fd (duration: 01m 05s) [02:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:31:36] (03Merged) 10jenkins-bot: Resolve 'wgEnableUploads' ambiguity for wikinews wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577010 (owner: 10Krinkle) [02:33:31] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I52bb7024384 (no-op) (duration: 01m 04s) [02:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:35:07] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [02:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:35:17] (03CR) 10Krinkle: [C: 03+2] MWConfigCacheGenerator: Stop reading most wiki-family dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle) [02:36:13] (03Merged) 10jenkins-bot: MWConfigCacheGenerator: Stop reading most wiki-family dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle) [02:36:29] (03PS1) 10Samwilson: Enable watchlist expiry on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577033 (https://phabricator.wikimedia.org/T246849) [02:36:32] * Krinkle staging on mwdebug1002 [02:36:49] (03PS2) 10C. 
Scott Ananian: Check out parsoid from git on paroid::testing machines [puppet] - 10https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) [02:37:37] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [02:37:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:40:18] (03CR) 10Krinkle: [C: 03+2] "Verified via mwdebug1002 that, hard-disabling config cache and profiling with XHGui, that DBList reads went down to 35 from 43." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576490 (https://phabricator.wikimedia.org/T169821) (owner: 10Krinkle) [02:41:39] (03CR) 10Krinkle: [C: 03+2] tests: Remove unused phpunit.xml entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577018 (owner: 10Krinkle) [02:42:20] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2352.codfw.wmnet'] ` and were **ALL** successful. [02:42:30] !log krinkle@deploy1001 Synchronized multiversion/MWConfigCacheGenerator.php: Ib2aaf6540d85 (duration: 01m 04s) [02:42:33] (03Merged) 10jenkins-bot: tests: Remove unused phpunit.xml entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577018 (owner: 10Krinkle) [02:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:42:52] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2354.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[02:43:33] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [02:43:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:44:04] (03CR) 10DannyS712: [C: 03+1] "LGTM, but would it make sense to have a wiki or 2 with it not enabled, to see if that results in any issues?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577033 (https://phabricator.wikimedia.org/T246849) (owner: 10Samwilson) [02:45:40] (03CR) 10DannyS712: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 (owner: 10Krinkle) [02:46:02] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [02:46:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:51:48] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2353.codfw.wmnet'] ` and were **ALL** successful. [02:52:56] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [02:53:13] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2355.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [02:54:20] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [02:57:54] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [02:57:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:00:21] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [03:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:02:34] (03PS6) 10Andrew Bogott: neutron: update l3_agent hacks for Queens [puppet] - 10https://gerrit.wikimedia.org/r/576928 [03:02:36] (03PS1) 10Andrew Bogott: designate: update api-paste.ini [puppet] - 10https://gerrit.wikimedia.org/r/577036 [03:04:00] (03PS1) 10Krinkle: multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 [03:05:09] (03CR) 10Andrew Bogott: [C: 03+2] designate: update api-paste.ini [puppet] - 10https://gerrit.wikimedia.org/r/577036 (owner: 10Andrew Bogott) [03:05:25] (03CR) 10jerkins-bot: [V: 04-1] multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (owner: 10Krinkle) [03:07:13] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2354.codfw.wmnet'] ` and were **ALL** successful. 
[03:08:14] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [03:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:09:52] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.22/includes/SiteConfiguration.php: I723133e68, I2b90e8e9b0 (duration: 01m 05s) [03:09:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:10:45] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [03:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:14:43] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) |Servers Rack D3| Ready for service| |mw2350|Yes| |mw2351|Yes| |mw2352|Yes| |mw2353|Yes| |mw2354|Yes| |mw2355|Yes| |mw2356|| |mw2357|| |mw2358| |mw2359|| |mw2... [03:15:29] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2355.codfw.wmnet'] ` and were **ALL** successful. [03:23:03] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.22/extensions/WikimediaMaintenance/dumpInterwiki.php: Iec6da824cca (duration: 01m 04s) [03:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:23:48] (03CR) 10Krinkle: "Looks the oddball 'labtestwiki' dblist isn't tracked by the YAML files currently. 
I'll delete that first then, which I was gonna do anyway" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (owner: 10Krinkle) [03:37:10] (03PS2) 10Krinkle: multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) [03:37:12] (03PS1) 10Krinkle: dblists: Remove 'labtestwiki' dblist containing 'labtestwiki' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) [03:37:42] (03PS2) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 [03:38:51] (03CR) 10jerkins-bot: [V: 04-1] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [03:43:31] (03CR) 10Krinkle: "A few have been fixed. Left: Failed asserting that two arrays are equal." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [03:55:57] (03CR) 10Krinkle: "Filed T246968 about wmgUseGlobalAbuseFilters." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [04:07:51] (03PS1) 10Krinkle: dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577042 [04:08:23] (03PS1) 10C. Scott Ananian: All parsoid profiles use_php=true [puppet] - 10https://gerrit.wikimedia.org/r/577043 [04:08:25] (03PS1) 10C. Scott Ananian: WIP: purge parsoid service from puppet [puppet] - 10https://gerrit.wikimedia.org/r/577044 [04:09:02] (03CR) 10jerkins-bot: [V: 04-1] dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577042 (owner: 10Krinkle) [04:09:56] (03CR) 10C. 
Scott Ananian: "> So scandium uses "role(parsoid::testing)" and wtp prod servers use" [puppet] - https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) (owner: C. Scott Ananian) [04:11:17] (CR) C. Scott Ananian: "Low-hanging fruit -- just remove the 'use_php' property from profile::parsoid since all parsoid machines use PHP now; we're removing suppo" [puppet] - https://gerrit.wikimedia.org/r/577043 (owner: C. Scott Ananian) [04:12:15] (PS2) Krinkle: dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - https://gerrit.wikimedia.org/r/577042 [04:14:48] (PS3) Krinkle: dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - https://gerrit.wikimedia.org/r/577042 [04:16:46] (CR) Krinkle: [C: +2] dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - https://gerrit.wikimedia.org/r/577042 (owner: Krinkle) [04:17:45] (Merged) jenkins-bot: dblists: Remove closed wikis from commonsuploads (angwikibooks, iewikibooks) [mediawiki-config] - https://gerrit.wikimedia.org/r/577042 (owner: Krinkle) [04:19:54] RECOVERY - Check systemd state on mw1398 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:21:37] (PS2) Krinkle: dblists: Remove 'labtestwiki' dblist containing 'labtestwiki' [mediawiki-config] - https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) [04:22:21] !log krinkle@deploy1001 Synchronized dblists/commonsuploads.dblist: Idb69b82f5 (duration: 01m 04s) [04:22:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:23:45] (CR) Krinkle: [C: +2] "Verified on mwdebug1002 that on these wikis, when logged-in and bypassing sidebar cache via random uselang, that previously "Upload file" w" [mediawiki-config] - https://gerrit.wikimedia.org/r/577042
(owner: 10Krinkle) [04:23:54] (03PS3) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 [04:24:08] (03CR) 10Krinkle: "The permission conflict between closed/commonsuploads is fixed as of Idb69b82f56c854." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [04:25:09] (03CR) 10jerkins-bot: [V: 04-1] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [04:29:29] (03CR) 10Krinkle: "I've confirmed that the $wgSessionCacheType conflict seems to be won by the wrong side in prod." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [04:30:28] (03PS3) 10Krinkle: multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) [04:49:31] (03PS1) 10KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) [04:49:44] (03PS3) 10KartikMistry: cg3: Update to new upstream release 1.3.1 [debs/contenttranslation/cg3] - 10https://gerrit.wikimedia.org/r/576833 (https://phabricator.wikimedia.org/T234182) [04:53:16] (03CR) 10jerkins-bot: [V: 04-1] apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: 10KartikMistry) [04:57:06] (03PS2) 10KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) [04:57:14] (03CR) 10jerkins-bot: [V: 04-1] apertium-lex-tools: Update to new upstream release 0.2.3 
[debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: 10KartikMistry) [04:58:15] (03PS5) 10KartikMistry: Apertium: Update to new upstream release 3.6.1 [debs/contenttranslation/apertium] - 10https://gerrit.wikimedia.org/r/576664 (https://phabricator.wikimedia.org/T234182) [05:01:21] (03PS3) 10KartikMistry: apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) [05:04:14] (03CR) 10jerkins-bot: [V: 04-1] apertium-lex-tools: Update to new upstream release 0.2.3 [debs/contenttranslation/apertium-lex-tools] - 10https://gerrit.wikimedia.org/r/577045 (https://phabricator.wikimedia.org/T234182) (owner: 10KartikMistry) [05:32:14] (03PS1) 10KartikMistry: apertium-separable: Update to new upstream release 0.3.3 [debs/contenttranslation/apertium-separable] - 10https://gerrit.wikimedia.org/r/577046 (https://phabricator.wikimedia.org/T234182) [05:35:08] (03CR) 10jerkins-bot: [V: 04-1] apertium-separable: Update to new upstream release 0.3.3 [debs/contenttranslation/apertium-separable] - 10https://gerrit.wikimedia.org/r/577046 (https://phabricator.wikimedia.org/T234182) (owner: 10KartikMistry) [05:36:10] RECOVERY - Check systemd state on mw1399 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:39:05] (03PS1) 10KartikMistry: Add apertium-oci-fra package [debs/contenttranslation/apertium-oci-fra] - 10https://gerrit.wikimedia.org/r/577047 (https://phabricator.wikimedia.org/T202360) [05:42:17] (03CR) 10jerkins-bot: [V: 04-1] Add apertium-oci-fra package [debs/contenttranslation/apertium-oci-fra] - 10https://gerrit.wikimedia.org/r/577047 (https://phabricator.wikimedia.org/T202360) (owner: 10KartikMistry) [06:00:56] (03PS2) 10KartikMistry: Add apertium-oci-fra package 
[debs/contenttranslation/apertium-oci-fra] - 10https://gerrit.wikimedia.org/r/577047 (https://phabricator.wikimedia.org/T202360) [06:04:09] (03CR) 10jerkins-bot: [V: 04-1] Add apertium-oci-fra package [debs/contenttranslation/apertium-oci-fra] - 10https://gerrit.wikimedia.org/r/577047 (https://phabricator.wikimedia.org/T202360) (owner: 10KartikMistry) [06:18:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1103:3312 db1103:3314 for reimage T246604', diff saved to https://phabricator.wikimedia.org/P10615 and previous config saved to /var/cache/conftool/dbconfig/20200305-061811-marostegui.json [06:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:18:17] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [06:20:41] !log Stop MySQL on db1103:3312 and db1103:3314 for reimage T246604 [06:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:28:59] (03PS1) 10Marostegui: Revert "install_server: Allow reimage db1103" [puppet] - 10https://gerrit.wikimedia.org/r/577049 [06:31:14] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10ArielGlenn) Hey @Krenair, I don't have to have it right now, but I might need it again in the future. Basically I used it when we were doing transitio... [06:32:57] 10Operations, 10DBA, 10OTRS, 10Recommendation-API, 10Research: Upgrade and restart m2 primary database master (db1132) - https://phabricator.wikimedia.org/T246098 (10Marostegui) @leila would you be able to discuss this ticket with your team to try to find some suitable dates?. If your service is resilien... 
[06:34:55] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime [06:34:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:37:25] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [06:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:48:42] !log restart yarn on analytics1074 (GC overhead, traces of network errors with datanodes) [06:48:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:28] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:54:02] RECOVERY - Hadoop NodeManager on analytics1074 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [06:54:42] goood [06:55:20] (03CR) 10Marostegui: [C: 03+2] Revert "install_server: Allow reimage db1103" [puppet] - 10https://gerrit.wikimedia.org/r/577049 (owner: 10Marostegui) [06:56:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312 db1103:3314 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10616 and previous config saved to /var/cache/conftool/dbconfig/20200305-065603-marostegui.json [06:56:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:56:09] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [06:58:21] (03PS4) 10Giuseppe Lavagetto: prometheus::ops: collect envoy stats from all servers [puppet] - 10https://gerrit.wikimedia.org/r/575504 [06:58:48] (03CR) 10Giuseppe Lavagetto: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/576851 (owner: 10Giuseppe Lavagetto) [07:03:19] (03CR) 10Giuseppe Lavagetto: [C: 03+2] prometheus::ops: collect envoy stats from all servers [puppet] - 
10https://gerrit.wikimedia.org/r/575504 (owner: 10Giuseppe Lavagetto) [07:09:18] (03PS1) 10Marostegui: install_server: Allow reimage db1078 and db2109 [puppet] - 10https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) [07:14:44] (03PS1) 10Giuseppe Lavagetto: prometheus:fix typo [puppet] - 10https://gerrit.wikimedia.org/r/577059 [07:15:04] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] prometheus:fix typo [puppet] - 10https://gerrit.wikimedia.org/r/577059 (owner: 10Giuseppe Lavagetto) [07:19:15] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312 db1103:3314 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10617 and previous config saved to /var/cache/conftool/dbconfig/20200305-071915-marostegui.json [07:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:19:20] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [07:22:06] (03CR) 10Giuseppe Lavagetto: [C: 03+2] envoyproxy: support tcp fast open [puppet] - 10https://gerrit.wikimedia.org/r/576851 (owner: 10Giuseppe Lavagetto) [07:31:47] (03PS1) 10Gergő Tisza: Enable ORES topic matching + remote search on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577060 [07:33:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1103:3312 db1103:3314 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10618 and previous config saved to /var/cache/conftool/dbconfig/20200305-073319-marostegui.json [07:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:25] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [07:37:54] (03PS1) 10Marostegui: install_server: Reimage db1078 and db2109 [puppet] - 10https://gerrit.wikimedia.org/r/577090 (https://phabricator.wikimedia.org/T246604) [07:39:09] (03CR) 10Marostegui: [C: 03+2] install_server: 
Reimage db1078 and db2109 [puppet] - 10https://gerrit.wikimedia.org/r/577090 (https://phabricator.wikimedia.org/T246604) (owner: 10Marostegui) [07:47:13] o/ I upped the write rate of the terms migration and am keeping my eyes wide open [07:48:00] wrong channel... [07:55:42] (03PS2) 10Muehlenhoff: Add profile::base::no_firewall to load balancers [puppet] - 10https://gerrit.wikimedia.org/r/575022 [08:05:23] (03CR) 10Jcrespo: [C: 03+1] "> The same change as this one? Is there another bug on the port side?" [puppet] - 10https://gerrit.wikimedia.org/r/576398 (https://phabricator.wikimedia.org/T242702) (owner: 10Jcrespo) [08:05:54] (03PS3) 10Jcrespo: prometheus-mysqld-exporter: Workaround upstream package regression [puppet] - 10https://gerrit.wikimedia.org/r/576398 (https://phabricator.wikimedia.org/T242702) [08:05:56] (03CR) 10Marostegui: [C: 03+1] "> > The same change as this one? Is there another bug on the port" [puppet] - 10https://gerrit.wikimedia.org/r/576398 (https://phabricator.wikimedia.org/T242702) (owner: 10Jcrespo) [08:06:22] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 60, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:07:12] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:07:36] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:12:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1103:3312 db1103:3314 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10619 and previous config saved to 
/var/cache/conftool/dbconfig/20200305-081227-marostegui.json [08:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:12:34] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [08:13:34] RECOVERY - Disk space on stat1007 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=stat1007&var-datasource=eqiad+prometheus/ops [08:14:49] (03CR) 10Jcrespo: [C: 03+2] prometheus-mysqld-exporter: Workaround upstream package regression [puppet] - 10https://gerrit.wikimedia.org/r/576398 (https://phabricator.wikimedia.org/T242702) (owner: 10Jcrespo) [08:21:26] (03PS3) 10Filippo Giunchedi: install_server: use buster for theemin [puppet] - 10https://gerrit.wikimedia.org/r/576840 (https://phabricator.wikimedia.org/T215301) [08:21:30] RECOVERY - Check systemd state on mw1397 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:27:27] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] install_server: use buster for theemin [puppet] - 10https://gerrit.wikimedia.org/r/576840 (https://phabricator.wikimedia.org/T215301) (owner: 10Filippo Giunchedi) [08:35:24] (03CR) 10Kosta Harlan: [C: 03+1] Enable ORES topic matching + remote search on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577060 (owner: 10Gergő Tisza) [08:35:40] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:36:28] RECOVERY - Check systemd state on mw1400 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:36:28] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:36:52] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 133, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [08:47:37] 10Operations, 10Phabricator, 10Traffic: Phabricator is inaccessible from Egypt: HTTP 501 error - https://phabricator.wikimedia.org/T246923 (10akosiaris) >>! In T246923#5943203, @Urbanecm wrote: >>>! In T246923#5943004, @Krenair wrote: >> If it is that I would not expect HTTP 501 responses. > >>>! In T246923... [08:47:48] RECOVERY - Check systemd state on mw1393 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:48:04] 10Operations, 10Phabricator, 10Traffic: Phabricator is inaccessible from Egypt: HTTP 501 error - https://phabricator.wikimedia.org/T246923 (10akosiaris) [08:49:30] 10Operations, 10Traffic: switch to irate() instead of rate() for traffic graphs - https://phabricator.wikimedia.org/T246902 (10akosiaris) p:05Triage→03Lowest Lowest priority until it is better phrased and work on it scheduled. But +1 on premise. 
[08:49:48] 10Operations, 10procurement: eqiad: (16) Hadoop worker node refresh - FY19/20 Q3 - https://phabricator.wikimedia.org/T246784 (10akosiaris) p:05Triage→03Medium [08:49:55] 10Operations, 10Traffic, 10netops: BGP: Investigate isolating codfw and eqiad - https://phabricator.wikimedia.org/T246721 (10akosiaris) p:05Triage→03Medium [08:50:25] !log START warm cache for db1111 & db1126 for Q25-30 million T219123 (pass 1 today) [08:50:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:36] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123 [08:55:12] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Add es5 as new ES, for initial testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) [08:55:38] (03CR) 10Marostegui: [C: 04-2] "Wait for Tuesday 10th" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [08:55:48] 10Operations: Log the real X-Client-IP in apache mediawiki logs - https://phabricator.wikimedia.org/T246348 (10akosiaris) p:05Triage→03High [08:55:54] (03CR) 10Marostegui: [C: 04-2] "es1023 is the master for es5" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [08:56:04] 10Operations, 10netops: can aggregated netflow data include the router it was sampled from? 
- https://phabricator.wikimedia.org/T246186 (10akosiaris) p:05Triage→03Medium [09:04:01] (03PS1) 10Giuseppe Lavagetto: envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 [09:04:03] (03PS1) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [09:05:03] 10Operations, 10Growth-Team, 10Mail, 10MediaWiki-Watchlist: Notifications about changes by Oznamovatel sent to Janbery doesn't seem to be reliable - https://phabricator.wikimedia.org/T245762 (10akosiaris) p:05Triage→03Lowest I 've had a look and I see nothing but successful deliveries to jan.beranek@wi... [09:06:03] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown: Different age of history logged in and out when from the EU (but not SF office) - https://phabricator.wikimedia.org/T246185 (10akosiaris) p:05Triage→03Medium I can not reproduce this either. Is it still a problem? [09:06:22] 10Operations, 10observability: Have monitoring of updatequerypages cronjobs - https://phabricator.wikimedia.org/T246097 (10akosiaris) p:05Triage→03Medium [09:08:44] (03PS2) 10Giuseppe Lavagetto: envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 [09:08:46] (03PS2) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [09:12:41] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Set es5 as writable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577189 (https://phabricator.wikimedia.org/T246072) [09:13:58] (03CR) 10Marostegui: [C: 04-2] "Wait for Tuesday 10th and wait for https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/577185/ to be pushed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577189 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [09:17:17] <_joe_> is ci broken? 
[09:17:39] I was checking that https://integration.wikimedia.org/zuul/ [09:17:47] It doesn't look like it is examining anything [09:20:51] <_joe_> hashar: ^^ [09:22:07] marostegui: which change? [09:22:27] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/577185/ [09:22:34] oh the mediawiki-config postmerge thing bah [09:22:59] <_joe_> hashar: also https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/577186/ [09:23:10] <_joe_> this is not being processed either [09:23:35] the mediawiki-config jobs are not processed by Jenkins due to some weird deadlock in Jenkins which we never figured out [09:23:41] fixing it up [09:23:47] the puppet patch, I am digging into the logs [09:24:04] <_joe_> hashar: <3 [09:24:17] thanks hashar! [09:27:36] !log ci: Zuul has not been processing changes since ~08:07 [09:27:43] (PS1) Vgutierrez: ATS: Enable KA between ats-tls and varnish-fe globally [puppet] - https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) [09:30:50] (CR) Gehel: "This change is ready for review."
[puppet] - https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: Herron) [09:34:22] I am digging into zuul [09:36:27] filed as https://phabricator.wikimedia.org/T246973 [09:36:49] (PS2) Vgutierrez: ATS: Enable KA between ats-tls and varnish-fe globally [puppet] - https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) [09:38:59] (CR) Jcrespo: "warning" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: Marostegui) [09:39:53] (CR) Marostegui: install_server: Allow reimage db1078 and db2109 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: Marostegui) [09:40:15] <_joe_> ok I'll cancel my deployment window at 10:00Z then [09:41:07] Operations, ops-codfw: (Need by: TBD) codfw: rack/setup/install wdqs200[7-8].codfw.wmnet - https://phabricator.wikimedia.org/T242301 (Gehel) Open→Resolved [09:41:49] (PS2) Marostegui: install_server: Allow reimage db1078 and db2109 [puppet] - https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) [09:42:38] (CR) Vgutierrez: "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1003/21279/" [puppet] - https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) (owner: Vgutierrez) [09:43:44] (CR) Alexandros Kosiaris: [C: +2] Route intake-analytics.wm.org to eventgate-analytics-external [puppet] - https://gerrit.wikimedia.org/r/573369 (https://phabricator.wikimedia.org/T233629) (owner: Ottomata) [09:43:51] (PS2) Alexandros Kosiaris: Route intake-analytics.wm.org to eventgate-analytics-external [puppet] - https://gerrit.wikimedia.org/r/573369 (https://phabricator.wikimedia.org/T233629) (owner: Ottomata) [09:44:11] Operations, ops-codfw, Discovery-Search (Current work): (Need by: TBD) codfw: rack/setup/install
elastic20{55,56,57,58,59,60}.codfw.wmnet - https://phabricator.wikimedia.org/T241337 (Gehel) [09:44:23] Operations, ops-codfw, Discovery-Search (Current work): (Need by: TBD) codfw: rack/setup/install elastic20{55,56,57,58,59,60}.codfw.wmnet - https://phabricator.wikimedia.org/T241337 (Gehel) Open→Resolved [09:45:27] (CR) Ema: [C: +1] ATS: Enable KA between ats-tls and varnish-fe globally [puppet] - https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) (owner: Vgutierrez) [09:45:29] (CR) Giuseppe Lavagetto: "recheck" [puppet] - https://gerrit.wikimedia.org/r/577186 (owner: Giuseppe Lavagetto) [09:51:33] (CR) Jcrespo: [C: +1] db-eqiad,db-codfw.php: Add es5 as new ES, for initial testing [mediawiki-config] - https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) (owner: Marostegui) [09:52:10] (CR) Alexandros Kosiaris: [V: +2 C: +2] Route intake-analytics.wm.org to eventgate-analytics-external [puppet] - https://gerrit.wikimedia.org/r/573369 (https://phabricator.wikimedia.org/T233629) (owner: Ottomata) [09:52:12] (CR) Jcrespo: [C: +1] db-eqiad,db-codfw.php: Set es5 as writable [mediawiki-config] - https://gerrit.wikimedia.org/r/577189 (https://phabricator.wikimedia.org/T246072) (owner: Marostegui) [09:53:04] !log Restarting Zuul, it no longer processes Gerrit events due to a thread stuck waiting on Gerrit. T246973 [09:53:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:10] T246973: CI / Zuul not processing changes - https://phabricator.wikimedia.org/T246973 [09:53:31] marostegui: _joe_ : I have restarted zuul. Somehow something got stuck with Gerrit :-\ [09:53:46] hashar: thanks :* [09:54:00] I have no idea what happened :/ [09:54:00] hashar: should we "recheck"?
[09:54:04] (03CR) 10Vgutierrez: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [09:54:12] (03CR) 10Jcrespo: [C: 03+1] install_server: Allow reimage db1078 and db2109 [puppet] - 10https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: 10Marostegui) [09:54:22] marostegui: yes [09:54:28] cool, thanks [09:54:34] (03CR) 10Marostegui: [C: 04-2] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [09:54:45] (03CR) 10Marostegui: [C: 04-2] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577189 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [09:55:02] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577187 (owner: 10Giuseppe Lavagetto) [09:55:04] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577186 (owner: 10Giuseppe Lavagetto) [09:55:07] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [09:55:09] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: 10Marostegui) [09:55:21] (03CR) 10Marostegui: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: 10Marostegui) [09:57:00] jouncebot: next [09:57:00] In 0 hour(s) and 2 minute(s): SRE mediawiki-config rollout (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1000) [09:58:33] (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 (owner: 10Giuseppe Lavagetto) [10:00:05] _joe_ and hnowlan: I, the Bot under the Fountain, allow thee, The Deployer, to do SRE mediawiki-config rollout deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1000). [10:00:05] _joe_: A patch you scheduled for SRE mediawiki-config rollout is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [10:00:35] (03CR) 10Vgutierrez: [C: 03+2] ATS: Enable KA between ats-tls and varnish-fe globally [puppet] - 10https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [10:02:25] (03PS3) 10Vgutierrez: ATS: Enable KA between ats-tls and varnish-fe globally [puppet] - 10https://gerrit.wikimedia.org/r/577198 (https://phabricator.wikimedia.org/T244464) [10:04:10] (03PS2) 10Muehlenhoff: Remove cas-logstash-next from IDP service definition [puppet] - 10https://gerrit.wikimedia.org/r/576921 [10:04:11] <_joe_> hashar: is ci back then? [10:04:22] <_joe_> ok, so back to the deployment [10:04:44] yeah hopefully [10:04:59] (03CR) 10Marostegui: [C: 03+2] install_server: Allow reimage db1078 and db2109 [puppet] - 10https://gerrit.wikimedia.org/r/577058 (https://phabricator.wikimedia.org/T246604) (owner: 10Marostegui) [10:08:43] (03CR) 10Alexandros Kosiaris: [C: 03+1] ProductionServices: use local http proxy for parsoid, parsoidphp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575269 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:08:49] (03PS3) 10Giuseppe Lavagetto: ProductionServices: use local http proxy for parsoid, parsoidphp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575269 (https://phabricator.wikimedia.org/T244843) [10:11:06] !log START warm cache for db1111 & db1126 for Q25-30 million T219123 (pass 2 today) [10:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:11] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123 [10:11:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2109 for reimage to 
buster - T246604', diff saved to https://phabricator.wikimedia.org/P10621 and previous config saved to /var/cache/conftool/dbconfig/20200305-101111-marostegui.json [10:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:16] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [10:12:05] (03CR) 10Giuseppe Lavagetto: [C: 03+2] ProductionServices: use local http proxy for parsoid, parsoidphp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575269 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:12:32] !log Stop MySQL on db2109 for reimage - T246604 [10:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:05] (03Merged) 10jenkins-bot: ProductionServices: use local http proxy for parsoid, parsoidphp [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575269 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:14:30] !log Enable keep alive between ats-tls and varnish-fe globally - T244464 [10:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:14:34] T244464: Investigate side-effects of enabling KA between ats-tls and varnish-fe - https://phabricator.wikimedia.org/T244464 [10:16:07] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Yippi!" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/575270 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:18:05] !log oblivian@deploy1001 Synchronized wmf-config/ProductionServices.php: Switch parsoid calls to use envoy as a proxy (duration: 01m 07s) [10:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:11] (03PS3) 10Giuseppe Lavagetto: ProductionServices: use the local proxy for sessionstore [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575270 (https://phabricator.wikimedia.org/T244843) [10:22:18] (03CR) 10Giuseppe Lavagetto: [C: 03+2] ProductionServices: use the local proxy for sessionstore [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575270 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:23:11] (03Merged) 10jenkins-bot: ProductionServices: use the local proxy for sessionstore [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575270 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [10:23:25] <_joe_> so hnowlan [10:23:29] <_joe_> the change is merged [10:23:44] marostegui: seems the root cause of the ci/zuul lock is a mysql connection being lost :\ [10:23:56] <_joe_> now if you run git remote update in mediawiki-staging, then run git diff origin/master [10:24:04] <_joe_> you should confirm only my change is pending [10:24:06] which eventually causes Gerrit to get a connection exception and it does not properly terminate the ssh command Zuul does ;) [10:24:38] _joe_: yep [10:24:41] I see it [10:24:52] <_joe_> ok now you can git pull --ff-only [10:25:05] done [10:25:12] <_joe_> and that should pull just my change to wmf-config/ProductionServices.php [10:25:22] yeah [10:25:30] <_joe_> now, do you have the wikimedia debug browser extension installed [10:25:32] <_joe_> ? 
[10:25:48] no, will do now [10:25:52] <_joe_> ok [10:26:08] got it [10:26:09] <_joe_> I can in the meanwhile proceed with the deploy and describe you what I do [10:26:13] <_joe_> oh great [10:26:20] <_joe_> so, if you open its icon [10:26:30] <_joe_> you can choose some hosts to send your request to [10:27:14] <_joe_> you can use mwdebug1002 [10:27:22] (03PS1) 10Marostegui: install_server: Remove duplicate line [puppet] - 10https://gerrit.wikimedia.org/r/577206 [10:27:27] <_joe_> so, first we need to pull the new code to mwdebug1002 [10:27:38] <_joe_> ssh to mwdebug1002 and run 'scap pull' as your user [10:27:48] <_joe_> (yes, this is seriously the procedure) [10:28:20] okay, done (seems to have been very quick?) [10:28:28] <_joe_> yes [10:28:34] <_joe_> it's just one file to download [10:28:43] <_joe_> now, are you logged in on mediawiki.org? [10:29:15] yes [10:29:26] <_joe_> if you enable the debug extension selecting mwdebug1002 [10:29:34] <_joe_> you should keep being logged in if everything works [10:29:41] (03CR) 10Marostegui: [C: 03+2] install_server: Remove duplicate line [puppet] - 10https://gerrit.wikimedia.org/r/577206 (owner: 10Marostegui) [10:29:42] <_joe_> it means we still reach sessionstore correctly [10:30:02] lgtm [10:30:02] (03CR) 10Jbond: [C: 03+1] Remove cas-logstash-next from IDP service definition [puppet] - 10https://gerrit.wikimedia.org/r/576921 (owner: 10Muehlenhoff) [10:30:23] <_joe_> hnowlan: hah sigh, hold off [10:30:41] <_joe_> the access log from envoy leaks session keys I think [10:30:46] <_joe_> I have to remove it [10:30:47] heh [10:31:27] <_joe_> hnowlan: let's revert for now :P [10:31:32] <_joe_> I'll fix this later [10:31:34] sigh [10:31:46] <_joe_> akosiaris: it will be matter of today hopefully [10:31:59] aren't session keys in the url? 
[10:32:04] <_joe_> yes [10:32:14] <_joe_> so I just have to avoid logging them [10:32:15] _joe_: ack - revert via gerrit etc I assume [10:32:21] <_joe_> hnowlan: exactly [10:32:22] what's the value?, I don't remember [10:32:26] <_joe_> wanna do the honors? [10:32:29] <_joe_> akosiaris: the session content [10:32:30] I mean, maybe it's kv sotre? [10:32:33] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] toolforge: remove monitoring for old k8s cluster nodes and flannel etcd [puppet] - 10https://gerrit.wikimedia.org/r/576992 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [10:32:39] store* and it's fine to log the name of the key ? [10:32:44] <_joe_> which is what is actually important [10:32:50] <_joe_> well I wouldn't take the chance [10:32:56] (03CR) 10Jbond: [V: 03+2 C: 03+2] ssosessions: enable the sso sessions end point [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/576884 (https://phabricator.wikimedia.org/T233938) (owner: 10Jbond) [10:33:24] sure revert, but maybe instead of not logging them, ask eric+petr first? [10:33:35] it might be ok to log the session keys [10:33:45] it might also not be ok, not sure [10:33:53] <_joe_> akosiaris: also given the req/s I was already dubious [10:34:22] (03PS1) 10Giuseppe Lavagetto: Revert "ProductionServices: use the local proxy for sessionstore" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577207 [10:34:49] <_joe_> hnowlan: so I'll merge this revert, then you would need to do the same steps we just did [10:34:55] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "ProductionServices: use the local proxy for sessionstore" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577207 (owner: 10Giuseppe Lavagetto) [10:35:19] <_joe_> in general I'm a fan of reverting when in doubt :) [10:35:20] (03PS1) 10Hnowlan: ProductionServices: Revert to using discovery for sessionstore. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/577208 (https://phabricator.wikimedia.org/T244843) [10:35:25] <_joe_> ah [10:35:29] <_joe_> sorry I already did it [10:35:31] oh heh [10:35:37] I'll close [10:35:48] (03Merged) 10jenkins-bot: Revert "ProductionServices: use the local proxy for sessionstore" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577207 (owner: 10Giuseppe Lavagetto) [10:35:55] <_joe_> ok [10:36:04] (03Abandoned) 10Hnowlan: ProductionServices: Revert to using discovery for sessionstore. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577208 (https://phabricator.wikimedia.org/T244843) (owner: 10Hnowlan) [10:36:05] <_joe_> do you want to re-do the same steps as before? [10:36:20] cool [10:36:35] <_joe_> basically git remote update, git diff origin/master, git pull --ff-only should work [10:36:39] (03PS1) 10Addshore: Read from the new term store up to Q30 mill everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577209 (https://phabricator.wikimedia.org/T219123) [10:36:49] <_joe_> and then scap pull on the server where did scap pull before [10:37:25] (03CR) 10Hnowlan: [C: 03+2] MWScript: Allow MWScript to be invoked via phpdbg as well as the cli [mediawiki-config] - 10https://gerrit.wikimedia.org/r/573558 (https://phabricator.wikimedia.org/T244549) (owner: 10Hnowlan) [10:37:48] (03PS1) 10Vgutierrez: ATS: Disable parent proxies on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/577210 (https://phabricator.wikimedia.org/T244464) [10:38:01] (03PS1) 10Addshore: Write to the new terms store up to Q 87 million [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577211 (https://phabricator.wikimedia.org/T219123) [10:38:21] <_joe_> ok, with your patch we should get to the last step [10:38:24] (03Merged) 10jenkins-bot: MWScript: Allow MWScript to be invoked via phpdbg as well as the cli [mediawiki-config] - 10https://gerrit.wikimedia.org/r/573558 (https://phabricator.wikimedia.org/T244549) (owner: 10Hnowlan) [10:38:29] 
<_joe_> which is actuall deploying it :P [10:39:56] (03CR) 10Jbond: [C: 03+1] "lgtm" [homer/public] - 10https://gerrit.wikimedia.org/r/576886 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [10:40:21] <_joe_> hnowlan: did you run scap pull on mwdebug1002? [10:40:25] _joe_: yep [10:40:27] (03CR) 10Jbond: [C: 03+1] Add support for multiple SNMP communities [homer/mock-private] - 10https://gerrit.wikimedia.org/r/576885 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [10:40:30] <_joe_> ok [10:40:34] <_joe_> test that your change works [10:40:47] <_joe_> basically that you can call mwscript from phpdbg :) [10:41:28] (03CR) 10Muehlenhoff: [C: 03+2] Remove cas-logstash-next from IDP service definition [puppet] - 10https://gerrit.wikimedia.org/r/576921 (owner: 10Muehlenhoff) [10:41:41] looks good to me [10:43:05] <_joe_> ok now you need to run scap from deploy1001, I'll talk you through it in private [10:45:45] (03CR) 10Jbond: [C: 03+2] mediawiki::maintenance: force removal of directories [puppet] - 10https://gerrit.wikimedia.org/r/576323 (https://phabricator.wikimedia.org/T242910) (owner: 10Jbond) [10:46:44] (03PS2) 10Vgutierrez: ATS: Disable parent proxies on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/577210 (https://phabricator.wikimedia.org/T244464) [10:47:44] (03CR) 10Jbond: [C: 03+2] cli: filter the hosts array to remove empty elements [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/576663 (owner: 10Jbond) [10:48:02] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime [10:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:34] <_joe_> jbond42: argh I didn't see that patch [10:48:34] !log hnowlan@deploy1001 Synchronized multiversion/MWScript.php: T244549: enable running MWScript with phpdbg (duration: 01m 04s) [10:48:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:38] T244549: Enable phpdbg on mwdebug* servers - 
https://phabricator.wikimedia.org/T244549 [10:48:39] (03Merged) 10jenkins-bot: cli: filter the hosts array to remove empty elements [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/576663 (owner: 10Jbond) [10:48:43] <_joe_> it wasn't needed :) [10:48:53] <_joe_> the one to mediawiki::maintenance [10:49:09] <_joe_> I should fix it next week actually by switching to timers [10:49:47] (03CR) 10Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1001/21281/" [puppet] - 10https://gerrit.wikimedia.org/r/577210 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [10:50:01] <_joe_> hnowlan: you're now a crowned deployer! [10:50:07] \o/ [10:50:26] <_joe_> jokes aside, things get more complicated if you don't do simple patches to mediawiki-config [10:50:29] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:50:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:37] yeah I can imagine :) [10:51:07] _joe_: i was in the middle of applying the change, and it had already started to delete the historic log files. puppet claims it has sent them to the bucket and i have halted and disabled puppet now on mwmain2001. are those log files needed or can i continue with the change as is? [10:51:36] <_joe_> no no it's ok sorry [10:51:44] <_joe_> I just wanted to spare you the work [10:52:39] ahh ok well that's good :) thanks [10:55:21] <_joe_> we've finished our deployment window btw [10:55:39] woo!
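Editor's note: the staging steps _joe_ walks through above (git remote update, git diff origin/master, confirm only the expected change is pending, then git pull --ff-only) can be sketched roughly as below. This is a minimal sketch, not the actual deployment tooling. The helper names are hypothetical, and it assumes the list of pending paths was obtained separately, for example with `git diff --name-only origin/master` (the `--name-only` flag is an addition here and is not mentioned in the log).

```python
# Commands quoted from the conversation; everything else here is illustrative.
STAGING_STEPS = [
    "git remote update",
    "git diff origin/master",
    "git pull --ff-only",
]


def unexpected_paths(changed_paths, expected_paths):
    """Return pending paths that the deployer did not expect, sorted."""
    changed = {p.strip() for p in changed_paths if p.strip()}
    return sorted(changed - set(expected_paths))


def safe_to_pull(changed_paths, expected_paths):
    """True only if something is pending and all of it was expected."""
    changed = {p.strip() for p in changed_paths if p.strip()}
    return bool(changed) and not unexpected_paths(changed, expected_paths)
```

For the change discussed above, the expected list would contain only wmf-config/ProductionServices.php; anything else showing up in the diff would be a reason to stop and ask before pulling.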
[10:55:52] jouncebot: next [10:55:52] In 1 hour(s) and 4 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1200) [10:56:27] * addshore will be raising wikidata new term storage reads form 25 million to 30 million soon (once the cache warming finishes) [10:57:06] (03PS2) 10Addshore: Read from the new term store up to Q30 mill everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577209 (https://phabricator.wikimedia.org/T219123) [10:57:11] (03PS2) 10Addshore: Write to the new terms store up to Q 87 million [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577211 (https://phabricator.wikimedia.org/T219123) [10:57:24] (03CR) 10Addshore: [C: 03+2] Read from the new term store up to Q30 mill everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577209 (https://phabricator.wikimedia.org/T219123) (owner: 10Addshore) [10:58:50] (03Merged) 10jenkins-bot: Read from the new term store up to Q30 mill everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577209 (https://phabricator.wikimedia.org/T219123) (owner: 10Addshore) [10:59:43] (03PS1) 10Jbond: 0.6.2: prepare for release [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/577214 [11:00:02] (03CR) 10Jbond: [V: 03+2 C: 03+2] 0.6.2: prepare for release [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/577214 (owner: 10Jbond) [11:03:58] (03CR) 10Ema: [C: 03+1] ATS: Disable parent proxies on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/577210 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [11:04:31] !log small update to PCC https://gerrit.wikimedia.org/r/c/operations/software/puppet-compiler/+/576663 [11:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:55] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Reading up to Q30M for the new term store everywhere (was Q25M) + warm db1126 & db1111 caches (T219123) 
(duration: 01m 05s) [11:04:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:59] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123 [11:06:08] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Reading up to Q30M for the new term store everywhere (was Q25M) + warm db1126 & db1111 caches (T219123) cache bust (duration: 01m 04s) [11:06:09] (03CR) 10Vgutierrez: [C: 03+2] ATS: Disable parent proxies on ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/577210 (https://phabricator.wikimedia.org/T244464) (owner: 10Vgutierrez) [11:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:33] !log Disable parent proxies on ats-tls in ulsfo - T244464 [11:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:38] T244464: Investigate side-effects of enabling KA between ats-tls and varnish-fe - https://phabricator.wikimedia.org/T244464 [11:12:02] 10Operations, 10Phabricator, 10Security-Team, 10Security: Adjust onboarding/offboarding logic to accommodate changes to #security (now acl*security) - https://phabricator.wikimedia.org/T245771 (10jbond) @chasemp can you please crate me a couple of test users so i can ensure the off boarding script is worki... 
[11:15:36] (03CR) 10Addshore: [C: 03+2] Write to the new terms store up to Q 87 million [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577211 (https://phabricator.wikimedia.org/T219123) (owner: 10Addshore) [11:16:42] (03Merged) 10jenkins-bot: Write to the new terms store up to Q 87 million [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577211 (https://phabricator.wikimedia.org/T219123) (owner: 10Addshore) [11:19:26] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87 million, was 86 (T219123) (duration: 01m 04s) [11:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:31] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123 [11:20:34] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87 million, was 86 (T219123) cache bust (duration: 01m 03s) [11:20:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:38] addshore: I think you forgot to do "git submodule update" on the wikibase backport [11:21:48] (03PS1) 10KartikMistry: apertium-fra-cat: Updated to upstream release 1.7.0 [debs/contenttranslation/apertium-fra-cat] - 10https://gerrit.wikimedia.org/r/577216 (https://phabricator.wikimedia.org/T233700) [11:21:58] Amir1: I havn't done the backport yet, swat isnt for 40 mins :) [11:22:18] oh okay [11:22:52] I did +2 it 1 hour early though (by accident) I thought the last deploy slot went straight into swat, but I want an hour out [11:23:08] (I did it for you, we will just sync it during the SWAT) [11:23:27] Amir1: well, we could also do it now :P [11:23:50] cool, I do Cognate change you do the wikibase [11:24:05] Amir1: you want to sync the wikibase one? (as you touched it last)? 
[11:24:23] * addshore will watch [11:24:46] (03CR) 10jerkins-bot: [V: 04-1] apertium-fra-cat: Updated to upstream release 1.7.0 [debs/contenttranslation/apertium-fra-cat] - 10https://gerrit.wikimedia.org/r/577216 (https://phabricator.wikimedia.org/T233700) (owner: 10KartikMistry) [11:24:48] (03PS3) 10Giuseppe Lavagetto: envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 [11:24:50] (03PS3) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [11:24:56] sure [11:25:22] addshore: to be 100% sure, you're talking about https://gerrit.wikimedia.org/r/c/576963 ? [11:25:45] !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.22/extensions/Cognate: [[gerrit:576876|Exit undelete hook early if revision not found (T245869)]] (duration: 01m 04s) [11:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:50] T245869: Undeleting of an entry (NS=0) is no longer possible in de.wiktionary - https://phabricator.wikimedia.org/T245869 [11:26:06] Amir1: yup [11:26:26] Amir1: I added some panels for the job to the term storage grafana dashboard too btw :) [11:26:36] (just incase you havn't reloaded it in days) [11:28:11] (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 (owner: 10Giuseppe Lavagetto) [11:28:47] thanks. 
I haven't reloaded it for a week now [11:29:29] !log ladsgroup@deploy1001 Synchronized php-1.35.0-wmf.22/extensions/Wikibase: [[gerrit:576963|Schedule 1 CleanTermsIfUnusedJob per ID to clean (T244115 T246898)]] (duration: 01m 08s) [11:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:34] T244115: Investigate & Fix holes for aliases in new term tables (take 3) - https://phabricator.wikimedia.org/T244115 [11:29:35] T246898: Wikibase\Repo\Content\DataUpdateAdapter::doUpdate: Commit failed on server(s) 10.64.48.172: Cannot execute query from Wikibase\Repo\Content\DataUpdateAdapter::doUpdate while transaction status is ERROR - https://phabricator.wikimedia.org/T246898 [11:29:42] Yup, I figured as much, I just have a lovely windows full of terms related dashboards that stays open right now.. [11:32:42] (03PS1) 10Ladsgroup: Stop writing to the old term store for properties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577218 (https://phabricator.wikimedia.org/T219301) [11:35:10] (03PS2) 10Ladsgroup: Stop writing to the old term store for properties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577218 (https://phabricator.wikimedia.org/T219301) [11:36:04] (03CR) 10Ladsgroup: [C: 03+2] Stop writing to the old term store for properties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577218 (https://phabricator.wikimedia.org/T219301) (owner: 10Ladsgroup) [11:36:20] orilly! wow [11:37:02] (03Merged) 10jenkins-bot: Stop writing to the old term store for properties [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577218 (https://phabricator.wikimedia.org/T219301) (owner: 10Ladsgroup) [11:37:36] apergos: ! :D [11:38:12] can't wait to see that for items too! [11:38:19] it's been a long time coming [11:38:35] apergos: at the current rate we are writing & migrating, that coul happen as soon as next week [11:39:05] good! 
because it was coming up time for me to ask again how long for the migration to complete (I ask once a month) :-D [11:39:38] I'll probably switch over to dumping the new tables on the 20th run then. excellent! [11:42:17] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:577218|Stop writing to the old term store for properties (T219301 T225054)]] (duration: 01m 04s) [11:42:18] Amir1: where can I see the length of the job queue for a certain type? On the dashboard all I see if processing rate etc [11:42:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:23] T225054: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_NEW - https://phabricator.wikimedia.org/T225054 [11:42:23] T219301: Migrate to and read from new store for property terms - https://phabricator.wikimedia.org/T219301 [11:42:46] addshore: there's time in backlog [11:42:51] gotcha! [11:42:51] let me get it [11:43:04] I see it :) it all looks good [11:43:31] https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?orgId=1&fullscreen&panelId=6 [11:43:35] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:577218|Stop writing to the old term store for properties (T219301 T225054)]], take II (duration: 01m 04s) [11:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:11] PROBLEM - Check systemd state on mwdebug2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:45:35] !log deleting property terms from wb_terms in wikidatawiki (T225054) [11:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:45] I do this mostly to make sure we catch bugs [11:46:05] yup, and so people notice [11:48:23] RECOVERY - Check systemd state on mwdebug2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:50:31] https://phabricator.wikimedia.org/T246980 -- potential train blocker [11:51:54] addshore: ^ The same thing we had [11:52:06] * addshore looks up [11:52:16] liw: I think there's a ticket already [11:52:32] sorry, I didn't find that [11:53:02] also related https://phabricator.wikimedia.org/T245396 [11:53:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2109 after reimage to buster - T246604', diff saved to https://phabricator.wikimedia.org/P10622 and previous config saved to /var/cache/conftool/dbconfig/20200305-115322-marostegui.json [11:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:28] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [11:54:19] 100k rows removed from wb_terms so far, that would stop it from growing for a couple of days :D [11:54:47] Amir1: unfortunately until we defragment it, we won't see that space back on disk :( [11:55:53] marostegui: nah, let's just drop it (probably in a couple of weeks I think) [11:57:37] Amir1, addshore, should I un-UBN the task I filed, and drop it as a train blocker? is the already filed ticket T245396 or something else? 
I can at least link the two [11:57:37] T245396: SimpleCacheWithBagOStuff shouldnt be so easy to use bad keys with - https://phabricator.wikimedia.org/T245396 [11:58:04] tarrow: is going to look at it now has it keeps coming up [11:58:20] (03PS4) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [11:58:48] 200k rows [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1200). Please do the needful. [12:00:04] addshore: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:01:06] (03PS4) 10Giuseppe Lavagetto: envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 [12:01:08] (03PS5) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [12:01:24] Amir1: yeah, that's what I told addshore a few days ago, there's no point in investing time cleaning it up, just let's drop it when ready [12:01:59] marostegui: this is mostly for catching bugs, if any place uses the old store, it starts to show the holes [12:02:29] ah ok! [12:02:52] Amir1: can you !log when that property deletion stuff finishes? so that I can add it to the graphs as an annotation? :) [12:03:19] Sure, I'm taking it slow so the whole database doesn't go read-only (In batches of 10k) [12:03:49] 300k rows [12:05:27] (03CR) 10jerkins-bot: [V: 04-1] envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 (owner: 10Giuseppe Lavagetto) [12:05:27] <_joe_> that's faster than jenkins [12:05:30] <_joe_> arg [12:05:36] <_joe_> 5 minutes for a -1? [12:06:42] !log the property terms removal is finished. 
312K rows deleted (T225054) [12:06:46] woo! [12:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:06:47] T225054: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_NEW - https://phabricator.wikimedia.org/T225054 [12:08:45] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10aborrero) Fixed package is neutron-bgp-dragent_11.0.0-2~bpo9+1 and friends. [12:10:05] (03PS1) 10Jcrespo: wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) [12:10:13] Amir1: \o/ [12:10:40] (03CR) 10jerkins-bot: [V: 04-1] wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) (owner: 10Jcrespo) [12:10:54] Finally [12:18:32] (03PS1) 10ArielGlenn: rename 'parts' attribute of Dump subclasses to something more accurate [dumps] - 10https://gerrit.wikimedia.org/r/577225 (https://phabricator.wikimedia.org/T246465) [12:23:13] (03PS1) 10ArielGlenn: make value of 'parts' in the file listing methods be None or a list [dumps] - 10https://gerrit.wikimedia.org/r/577226 (https://phabricator.wikimedia.org/T246465) [12:31:38] (03PS1) 10ArielGlenn: New class for output file listing methods to move them out of jobs code [dumps] - 10https://gerrit.wikimedia.org/r/577228 (https://phabricator.wikimedia.org/T246465) [12:33:22] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10aborrero) hey @ayounsi do you have to enable anything in your side for BGP to work? I see something weird, I get a no route to host error here: `lang=s... 
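Editor's note: the property-terms cleanup above was run "in batches of 10k" so that the database would not be pushed into read-only mode. The sketch below illustrates that batching pattern only. It uses an in-memory SQLite database as a stand-in for the production database, and the table name `terms` and column `entity_type` are hypothetical, not the real wb_terms schema or the script that was actually run.

```python
import sqlite3
import time


def delete_in_batches(conn, batch_size=10_000, pause_seconds=0.0):
    """Delete property rows in small committed batches; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM terms WHERE rowid IN ("
            " SELECT rowid FROM terms WHERE entity_type = 'property' LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause_seconds)  # optional breathing room between batches
    return total


def demo():
    """Build a tiny table, delete the property rows, report the counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (entity_type TEXT, text TEXT)")
    rows = [("property", f"p{i}") for i in range(25)]
    rows += [("item", f"q{i}") for i in range(5)]
    conn.executemany("INSERT INTO terms VALUES (?, ?)", rows)
    conn.commit()
    deleted = delete_in_batches(conn, batch_size=10)
    remaining = conn.execute("SELECT COUNT(*) FROM terms").fetchone()[0]
    return deleted, remaining
```

Committing after every batch keeps each transaction small, which is the point of going slowly; the pause between batches is an assumed knob and was not specified in the conversation.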
[12:49:43] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10aborrero) The ARP reply is produced and reach my server, just I don't know yet what's going on, or which interface is this reply packet using: `lang=sh... [12:52:42] !log START warm cache for db1111 & db1126 for Q30-32 million (100k batch selects, 30s sleep) T219123 (pass 1) [12:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:47] T219123: Migrate to and read from new store for item terms - https://phabricator.wikimedia.org/T219123 [12:56:21] 10Operations, 10LDAP-Access-Requests: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (10akosiaris) @Moushira Any updates on the new contract expiry? Thanks! [12:56:38] !log stop that cache warming .... [12:56:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:39] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10aborrero) Forget last 2 comments. I think I can assign the address to the bridge device instead of the vlan device and everything should work as expected. [13:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1300) [13:05:00] 10Operations, 10LDAP-Access-Requests: access to Superset for Alex Hollender - https://phabricator.wikimedia.org/T244490 (10akosiaris) 05Open→03Resolved @MNovotny_WMF thanks! @alexhollender, you should already have access to http://superset.wikimedia.org. Make sure to type your shell login name (ahollender... 
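Editor's note: the cache-warming !log entry above mentions "100k batch selects, 30s sleep" over Q30-32 million. A rough sketch of that loop shape follows. The function names are hypothetical, the half-open ID ranges are an assumption, and `select_batch` is a placeholder for whatever query was actually issued, which the log does not show.

```python
import time


def batch_ranges(start, end, size):
    """Yield half-open (low, high) ID ranges covering start..end."""
    for low in range(start, end, size):
        yield low, min(low + size, end)


def warm_cache(start, end, select_batch, batch_size=100_000,
               sleep_seconds=30, sleep=time.sleep):
    """Call select_batch for each range, sleeping in between; return batch count."""
    count = 0
    for low, high in batch_ranges(start, end, batch_size):
        select_batch(low, high)  # placeholder for the real SELECT
        sleep(sleep_seconds)
        count += 1
    return count
```

With these assumed parameters, the range from 30 million to 32 million splits into 20 batches, so one pass would spend roughly ten minutes sleeping on top of the query time.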
[13:05:47] (03PS1) 10Arturo Borrero Gonzalez: codfw: cloudnet: refresh extra address FQDN [dns] - 10https://gerrit.wikimedia.org/r/577232 (https://phabricator.wikimedia.org/T245606) [13:06:58] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal: Add Abban Dunne to the ldap/wmde group - https://phabricator.wikimedia.org/T246664 (10akosiaris) 05Open→03Resolved a:03akosiaris @AbbanWMDE you have been added to the WMDE ldap group. I 'll resolve this, feel free to reopen. Thanks! [13:07:22] (03CR) 10CDanis: [C: 03+1] grafana: remove the obsolete X-WEBAUTH-USER hack for grafana logins [puppet] - 10https://gerrit.wikimedia.org/r/575759 (https://phabricator.wikimedia.org/T246508) (owner: 10Andrew Bogott) [13:07:45] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] codfw: cloudnet: refresh extra address FQDN [dns] - 10https://gerrit.wikimedia.org/r/577232 (https://phabricator.wikimedia.org/T245606) (owner: 10Arturo Borrero Gonzalez) [13:07:47] (03CR) 10CDanis: [C: 03+1] db-eqiad,db-codfw.php: Add es5 as new ES, for initial testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577185 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [13:08:05] (03CR) 10CDanis: [C: 03+1] db-eqiad,db-codfw.php: Set es5 as writable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577189 (https://phabricator.wikimedia.org/T246072) (owner: 10Marostegui) [13:09:03] 10Operations, 10Release-Engineering-Team-TODO: Should 'doc' machines (i.e. doc1001) have contint-roots as a group? - https://phabricator.wikimedia.org/T245691 (10akosiaris) I 'll remove the #sre-access-requests tag for based on the task status. Feel free to readd when it's back to Open so SRE can work on it. 
[13:09:19] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: introduce filtering for neutron BGP addresses - https://phabricator.wikimedia.org/T246887 (10aborrero) [13:09:23] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2356.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [13:09:47] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2357.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [13:10:27] (03CR) 10CDanis: [C: 03+1] Add support for multiple BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/576886 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:10:42] (03CR) 10CDanis: [C: 03+1] Add support for multiple SNMP communities [homer/mock-private] - 10https://gerrit.wikimedia.org/r/576885 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:23:33] 10Operations, 10Discovery-Search: commonswiki_content shards > 50GB - need resharding - https://phabricator.wikimedia.org/T246986 (10Gehel) [13:24:24] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [13:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:43] 10Operations, 10Discovery-Search: commonswiki_content shards > 50GB - need resharding - https://phabricator.wikimedia.org/T246986 (10Gehel) [13:26:30] (03PS5) 10Dave Pifke: Scrape webperf Prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) [13:26:52] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [13:26:55] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:58] (03CR) 10jerkins-bot: [V: 04-1] Scrape webperf Prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) (owner: 10Dave Pifke) [13:32:40] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2356.codfw.wmnet'] ` and were **ALL** successful. [13:34:26] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2358.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [13:37:55] (03CR) 10Ayounsi: [C: 03+2] Add support for multiple SNMP communities [homer/mock-private] - 10https://gerrit.wikimedia.org/r/576885 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:37:59] (03CR) 10Ayounsi: [C: 03+2] Add support for multiple BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/576886 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:38:31] (03Merged) 10jenkins-bot: Add support for multiple SNMP communities [homer/mock-private] - 10https://gerrit.wikimedia.org/r/576885 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:38:36] (03Merged) 10jenkins-bot: Add support for multiple BGP communities [homer/public] - 10https://gerrit.wikimedia.org/r/576886 (https://phabricator.wikimedia.org/T246890) (owner: 10Ayounsi) [13:45:31] tarrow, re class@anonymous -- it's currently a train blocker, but I'm thinking there's maybe not enough instances of the error to block the train: do you have an opinion [13:47:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1078 for reimage to buster - T246604', diff saved to https://phabricator.wikimedia.org/P10623 and 
previous config saved to /var/cache/conftool/dbconfig/20200305-134701-marostegui.json [13:47:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:08] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [13:48:49] !log Stop MySQL on db1078 for reimage - T246604 [13:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:23] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [13:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:44] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [13:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:58] liw: I *suspect* that we won't see an increase when going to group2 [13:51:30] tarrow, in that case I'll promote to group2 in a few minutes, thanks [13:51:49] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [13:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:55] (03PS6) 10Dave Pifke: Scrape webperf Prometheus metrics [puppet] - 10https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) [13:53:37] 👍 [13:54:21] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [13:54:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:08] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) a:03Papaul [13:56:34] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2358.codfw.wmnet'] ` and were **ALL** successful. 
[13:59:06] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2357.codfw.wmnet'] ` and were **ALL** successful. [14:00:04] liw and Brennen: Your horoscope predicts another unfortunate Mediawiki train - European+American Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1400). [14:01:52] (03PS1) 10Lars Wirzenius: all wikis to 1.35.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577238 [14:01:54] (03CR) 10Lars Wirzenius: [C: 03+2] all wikis to 1.35.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577238 (owner: 10Lars Wirzenius) [14:02:33] * addshore watches [14:03:06] (03Merged) 10jenkins-bot: all wikis to 1.35.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577238 (owner: 10Lars Wirzenius) [14:03:07] !log set all eqiad/codfw PDUs, cord W thresholds to 3440 - T245655 [14:03:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:12] T245655: audit/rebalance power in a5-eqiad - https://phabricator.wikimedia.org/T245655 [14:03:21] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime [14:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:37] (03PS6) 10Alexandros Kosiaris: changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) [14:03:39] (03PS1) 10Alexandros Kosiaris: admin: Add redis databases for changeprop [deployment-charts] - 10https://gerrit.wikimedia.org/r/577239 (https://phabricator.wikimedia.org/T213193) [14:05:20] !log liw@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.22 [14:05:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:05:39] 10Operations, 10ops-eqiad, 10DC-Ops: 
audit/rebalance power in a5-eqiad - https://phabricator.wikimedia.org/T245655 (10ayounsi) Done and doc updated - https://wikitech.wikimedia.org/wiki/LibreNMS#Mass_update_PDU_alerting_thresholds [14:05:40] 10Operations, 10ops-eqiad, 10DC-Ops: audit/rebalance power in a5-eqiad - https://phabricator.wikimedia.org/T245655 (10ayounsi) 05Open→03Resolved [14:05:50] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:08] 10Operations, 10observability, 10serviceops, 10vm-requests: Provision grafana VM in codfw - https://phabricator.wikimedia.org/T244357 (10akosiaris) Is there anything left to do in this task? [14:09:10] 10Operations, 10netops, 10cloud-services-team (Kanban): CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 (10ayounsi) `lang=diff [edit protocols bgp] group Netflow { ... } + /* T245606 */ + group Cloud { + import BGP_Cloud_in; + family... [14:09:27] Hey operations just dropping in because I wanted to thank you all for what you do! 
[14:09:28] !log push BGP to Cloud on cr1-codfw - T245606 [14:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:33] T245606: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 [14:12:06] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, worth a try" [puppet] - 10https://gerrit.wikimedia.org/r/576910 (https://phabricator.wikimedia.org/T239458) (owner: 10Cwhite) [14:13:08] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, worth a try" [puppet] - 10https://gerrit.wikimedia.org/r/576908 (https://phabricator.wikimedia.org/T239090) (owner: 10Cwhite) [14:13:48] !log Password reset for SUL User:Yezi Brook (T246988) [14:13:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:52] T246988: Password reset for SUL user:Yezi Brook - https://phabricator.wikimedia.org/T246988 [14:16:03] (03CR) 10Filippo Giunchedi: [C: 03+2] "LGTM https://puppet-compiler.wmflabs.org/compiler1002/21282/" [puppet] - 10https://gerrit.wikimedia.org/r/572141 (https://phabricator.wikimedia.org/T175087) (owner: 10Dave Pifke) [14:17:56] Zppix: <3 [14:18:24] cdanis: np <3 [14:21:08] (03PS1) 10KartikMistry: apertium-spa-cat: Update to new upstream release 0.2.2 [debs/contenttranslation/apertium-spa-cat] - 10https://gerrit.wikimedia.org/r/577243 (https://phabricator.wikimedia.org/T233700) [14:21:20] (03CR) 10jerkins-bot: [V: 04-1] apertium-spa-cat: Update to new upstream release 0.2.2 [debs/contenttranslation/apertium-spa-cat] - 10https://gerrit.wikimedia.org/r/577243 (https://phabricator.wikimedia.org/T233700) (owner: 10KartikMistry) [14:25:34] !log push BGP to Cloud on cr2-codfw - T245606 [14:25:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:39] T245606: CloudVPS: enable BGP in the neutron transport network - https://phabricator.wikimedia.org/T245606 [14:27:51] (03PS5) 10Giuseppe Lavagetto: envoyproxy: run validation as user envoy [puppet] - 
10https://gerrit.wikimedia.org/r/577186 [14:27:53] (03PS6) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format [puppet] - 10https://gerrit.wikimedia.org/r/577187 [14:29:17] * liw doesn't see widespread breakage after promoting to group2, so far [14:29:37] (03PS3) 10Andrew Bogott: grafana: remove the obsolete X-WEBAUTH-USER hack for grafana logins [puppet] - 10https://gerrit.wikimedia.org/r/575759 (https://phabricator.wikimedia.org/T246508) [14:36:13] (03CR) 10Andrew Bogott: [C: 03+2] grafana: remove the obsolete X-WEBAUTH-USER hack for grafana logins [puppet] - 10https://gerrit.wikimedia.org/r/575759 (https://phabricator.wikimedia.org/T246508) (owner: 10Andrew Bogott) [14:36:54] (03CR) 10Andrew Bogott: [C: 03+2] neutron: update l3_agent hacks for Queens [puppet] - 10https://gerrit.wikimedia.org/r/576928 (owner: 10Andrew Bogott) [14:38:48] tarrow, so far there's no explosion of class@anonymous errors at least, after group2 promotion [14:39:19] liw: I agree, the little recent flurry was created by my digging [14:40:09] tarrow, excellent; I'll give it another 20 minutes and if there's still no explosion, I'll remove the task as a blocker [14:40:17] just to be tidy [14:40:19] Cool! Thanks :) [14:45:10] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [14:45:10] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . 
[14:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:18] !log copied hpssaducli to thirdparty/hwraid for buster-wikimedia (current releases are named ssaducli now, but retain the old package (which only uses libc anyway) for backwards compat [14:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:02] (03CR) 10Jforrester: [C: 03+1] "Oops, my fault, I renamed it to avoid the dblist/wiki name collision, and forgot to clean this one up." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [14:52:15] !log copied hpssacli to thirdparty/hwraid for buster-wikimedia (current Gen 10 releases are named ssaducli now, but retain the old package (which only uses libc anyway) for backwards compat with gen9 on Buster) [14:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:55] (03PS1) 10Gehel: cirrus: initial configuration of elastic20[55-60] [puppet] - 10https://gerrit.wikimedia.org/r/577250 (https://phabricator.wikimedia.org/T246975) [14:55:34] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [14:55:34] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . [14:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:29] 10Operations, 10LDAP-Access-Requests: Allow LDAP access to superset dashboards for Moushira Elamrawy - https://phabricator.wikimedia.org/T242000 (10Moushira) Hello @akosiaris actually not yet, possibly within the next two weeks. Thanks for checking. 
[14:56:42] tarrow, removed the train blocker [14:59:43] (03CR) 10Giuseppe Lavagetto: [C: 03+2] envoyproxy: run validation as user envoy [puppet] - 10https://gerrit.wikimedia.org/r/577186 (owner: 10Giuseppe Lavagetto) [15:01:14] 10Operations, 10netops, 10Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (10ayounsi) Looks good! Instead of manually applying the profile to each interface I think we should refactor them and use interface-range like we do on access switches. That range coul... [15:01:50] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [15:01:50] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . [15:01:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:34] (03CR) 10Jforrester: [C: 03+1] multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [15:03:40] (03PS1) 10Marostegui: Revert "install_server: Allow reimage db1078 and db2109" [puppet] - 10https://gerrit.wikimedia.org/r/577255 [15:04:12] (03CR) 10jerkins-bot: [V: 04-1] Revert "install_server: Allow reimage db1078 and db2109" [puppet] - 10https://gerrit.wikimedia.org/r/577255 (owner: 10Marostegui) [15:05:42] (03Abandoned) 10Marostegui: Revert "install_server: Allow reimage db1078 and db2109" [puppet] - 10https://gerrit.wikimedia.org/r/577255 (owner: 10Marostegui) [15:15:05] !log add SNMP community to Juniper devices [15:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:37] PROBLEM - Ensure local MW versions match expected deployment on mw1394 is CRITICAL: CRITICAL: 304 mismatched wikiversions 
https://wikitech.wikimedia.org/wiki/Application_servers [15:16:47] PROBLEM - Ensure local MW versions match expected deployment on mw1395 is CRITICAL: CRITICAL: 304 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:17:17] 👀 [15:17:21] uh? [15:17:31] PROBLEM - Ensure local MW versions match expected deployment on mw1402 is CRITICAL: CRITICAL: 304 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:17:33] <_joe_> don't worry, expected [15:17:41] <_joe_> don't freak out [15:17:42] should we downtime the alert? [15:17:45] :) [15:17:47] PROBLEM - Ensure local MW versions match expected deployment on mw1400 is CRITICAL: CRITICAL: 304 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:17:49] <_joe_> those servers are out of rotation [15:17:50] if it is 1x/appserver it will be a lot of spam [15:17:55] <_joe_> cdanis: no I'm going to solve it [15:17:56] ok [15:17:57] (03CR) 10Bstorm: [C: 03+2] toolforge: remove monitoring for old k8s cluster nodes and flannel etcd [puppet] - 10https://gerrit.wikimedia.org/r/576992 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [15:17:58] <_joe_> it's 10 servers [15:18:07] PROBLEM - Ensure local MW versions match expected deployment on mw1404 is CRITICAL: CRITICAL: 304 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:18:20] <_joe_> !log fixing the envoy installation on mw1394-1404, running scap pull [15:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:30] <_joe_> hnowlan: ^^ your check works! [15:18:43] rad! 
[15:18:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10627 and previous config saved to /var/cache/conftool/dbconfig/20200305-151858-marostegui.json [15:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:03] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [15:19:40] <_joe_> these are servers that mutante was setting up yesterday [15:19:49] PROBLEM - Ensure local MW versions match expected deployment on mw1399 is CRITICAL: CRITICAL: 304 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:20:01] RECOVERY - Check systemd state on mw1396 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:20:51] RECOVERY - Check systemd state on mw1395 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:21:47] RECOVERY - Check systemd state on mw1403 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:22:53] RECOVERY - Ensure local MW versions match expected deployment on mw1394 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:23:05] RECOVERY - Ensure local MW versions match expected deployment on mw1395 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:23:47] RECOVERY - Ensure local MW versions match expected deployment on mw1402 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:24:03] RECOVERY - Ensure local MW versions match expected deployment on mw1400 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:24:08] <_joe_> yay [15:24:14] !log otto@deploy1001 helmfile [EQIAD] Ran 
'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [15:24:14] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . [15:24:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:23] RECOVERY - Ensure local MW versions match expected deployment on mw1404 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:25:59] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [15:26:00] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . [15:26:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:03] RECOVERY - Ensure local MW versions match expected deployment on mw1399 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:09] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . [15:28:09] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . 
[15:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:49] 10Operations, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 6 others: Public EventGate instance and endpoint for analytics event intake: eventgate-analytics-external - https://phabricator.wikimedia.org/T233629 (10Ottomata) [15:37:36] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10JHedden) [15:38:16] (03PS1) 10Ottomata: Enable client side error logging on haw.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577260 (https://phabricator.wikimedia.org/T246030) [15:38:17] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10JHedden) [15:39:52] 10Operations: smartd not starting properly on gen9 + buster - https://phabricator.wikimedia.org/T246997 (10Marostegui) [15:40:22] ACKNOWLEDGEMENT - Check systemd state on db1078 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
Marostegui T246997 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:41:10] (03PS1) 10Bstorm: tools-prometheus: removing material related to the legacy k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/577261 (https://phabricator.wikimedia.org/T246689) [15:44:03] (03PS1) 10Ppchelko: Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) [15:45:21] (03CR) 10jerkins-bot: [V: 04-1] Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [15:45:32] (03CR) 10Jforrester: All parsoid profiles use_php=true (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/577043 (owner: 10C. Scott Ananian) [15:47:14] (03PS2) 10Ppchelko: Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) [15:48:47] (03PS1) 10Ottomata: eventstreams - bump limits to 2000m cpu and 1900mi memory [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) [15:49:13] (03PS4) 10Ppchelko: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [15:49:53] (03PS2) 10Ottomata: eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) [15:50:37] (03CR) 10jerkins-bot: [V: 04-1] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [15:50:44] 10Operations, 10DBA, 10Goal, 10Patch-For-Review: Implement logic to be able to perform full and incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 
(10jcrespo) {F31667036} [15:51:28] (03CR) 10Ppchelko: "PS4 adds a Depends-On clause to the commit message and fixes the wgSessionCacheType failure." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (owner: 10Krinkle) [15:51:29] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:52:38] Pchelolo: thx! [15:52:50] https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-deploy-2020.03.05/mediawiki?id=AXCrZN1dh3Uj6x1znNdN&_g=h@44136fa there's a lot of these complaining about elasticsearch being down [15:53:07] Krinkle: that's a pretty awesome test btw [15:54:17] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1004 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [600.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen [15:54:49] (03PS2) 10Bstorm: toolforge: remove old k8s client material for Jessie [puppet] - 10https://gerrit.wikimedia.org/r/576995 (https://phabricator.wikimedia.org/T246689) [15:56:11] (03CR) 10Alexandros Kosiaris: [C: 04-1] eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [15:56:34] Pchelolo: thx, I'm frankly surprised that in 10 years of this, we only have two violations of it. 
[15:57:02] with one of them added last month or so [15:57:35] liw: > Elastica\Exception\Connection\HttpException from line 189 of /srv/mediawiki/php-1.35.0-wmf.22/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php: Couldn't connect to host, Elasticsearch down? [15:57:53] Yikes, I didn't know elastic down could cause fatal errors. I thought it was contained. [15:58:18] do we have a list of hosts? [15:59:13] > _security [15:59:36] 10Operations: Enable SSO for Kibana - https://phabricator.wikimedia.org/T246998 (10MoritzMuehlenhoff) [15:59:59] ack [16:01:25] !log mw1394 (api_appserver) is fatalling search-related api requests due to "Elastic down?" [16:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:45] !log depool mw1394 [16:01:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:48] (03CR) 10Jhedden: [C: 03+1] tools-prometheus: removing material related to the legacy k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/577261 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [16:11:38] (03CR) 10Jforrester: "Neat." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 (owner: 10Krinkle) [16:12:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10629 and previous config saved to /var/cache/conftool/dbconfig/20200305-161222-marostegui.json [16:12:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:27] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [16:14:01] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:18:40] <_joe_> !log repooling mw1394 [16:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:09] RECOVERY - Check systemd state on mw1394 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:20:55] (03PS1) 10Elukey: Add xmldumps to stat100[4,5] [puppet] - 10https://gerrit.wikimedia.org/r/577278 (https://phabricator.wikimedia.org/T243934) [16:21:14] (03CR) 10CRusnov: tox: Support DNS_INCLUDE_DIR and generated DNS (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/569340 (https://phabricator.wikimedia.org/T243362) (owner: 10CRusnov) [16:21:17] RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1004 is OK: OK: Less than 20.00% above the threshold [300.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen [16:22:06] (03CR) 10Elukey: [C: 03+1] cirrus: initial configuration of elastic20[55-60] [puppet] - 10https://gerrit.wikimedia.org/r/577250 (https://phabricator.wikimedia.org/T246975) (owner: 10Gehel) [16:22:13] !log Restart tendril/dbtree database [16:22:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:48] 10Operations, 10Mobile-Content-Service, 10Wikimedia-Logstash, 10observability, and 4 others: Move mobile apps logging to new logging pipeline - https://phabricator.wikimedia.org/T219924 (10Mholloway) a:03Mholloway [16:23:52] (03CR) 10Jforrester: [C: 03+1] tests: Remove outdated 'testExpressionListsNaming' test (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577025 (owner: 10Krinkle) [16:24:22] 10Operations, 10Mobile-Content-Service, 
10Wikimedia-Logstash, 10observability, and 4 others: Move mobileapps logging to new logging pipeline - https://phabricator.wikimedia.org/T219924 (10Mholloway) [16:26:58] (03PS3) 10Ottomata: eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) [16:27:41] (03PS2) 10Krinkle: tests: Remove outdated 'testExpressionListsNaming' test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577025 [16:27:50] (03CR) 10Krinkle: [C: 03+2] tests: Remove outdated 'testExpressionListsNaming' test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577025 (owner: 10Krinkle) [16:27:56] (03PS2) 10Krinkle: multiversion: Remove Beta-hack from buildDBLists.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 [16:28:00] (03CR) 10Krinkle: [C: 03+2] multiversion: Remove Beta-hack from buildDBLists.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 (owner: 10Krinkle) [16:28:27] (03PS4) 10Gehel: mjolnir: Install python3.7 to older debian versions [puppet] - 10https://gerrit.wikimedia.org/r/576116 (owner: 10EBernhardson) [16:28:55] (03Merged) 10jenkins-bot: tests: Remove outdated 'testExpressionListsNaming' test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577025 (owner: 10Krinkle) [16:29:19] (03Merged) 10jenkins-bot: multiversion: Remove Beta-hack from buildDBLists.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577028 (owner: 10Krinkle) [16:30:05] (03PS1) 10Bstorm: toolforge-clush: correct the classifications and remove legacy k8s [puppet] - 10https://gerrit.wikimedia.org/r/577279 (https://phabricator.wikimedia.org/T246689) [16:31:16] (03CR) 10Gehel: [C: 03+2] mjolnir: Install python3.7 to older debian versions [puppet] - 10https://gerrit.wikimedia.org/r/576116 (owner: 10EBernhardson) [16:33:45] (03CR) 10Bstorm: "The general gist of this was to use the prefixes to form a hierarchical way of pooling clush commands. 
I broke this by using the sge pref" [puppet] - 10https://gerrit.wikimedia.org/r/577279 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [16:33:52] gehel, ebernhardson can you please wait for me or andrew before merging something related to hadoop packages common? The change seems good but it is a delicate profile for our infra :) [16:37:15] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Last rounds of comments, this looks ready to merge, some nitpicks inline." (034 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: 10MSantos) [16:40:17] (03CR) 10Alexandros Kosiaris: [C: 03+1] eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [16:40:30] 10Operations, 10fundraising-tech-ops, 10netops: DHCP routing issue with civi2001 - https://phabricator.wikimedia.org/T246812 (10Papaul) 05Stalled→03Resolved This is fixed. 
eth0 cable was connected to payments2002 eth1 [16:40:39] (03Abandoned) 10Herron: dns: add kibana-next and logstash-next service addresses [dns] - 10https://gerrit.wikimedia.org/r/556442 (owner: 10Herron) [16:41:07] 10Operations, 10DC-Ops, 10decommission: decommission WMF6141 (old payments2001.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246697 (10Papaul) [16:41:22] (03Abandoned) 10Herron: lvs: add entries for logstash-next and kibana-next [puppet] - 10https://gerrit.wikimedia.org/r/556443 (owner: 10Herron) [16:41:43] 10Operations, 10DC-Ops, 10decommission: decommission WMF6143 (old payments2002.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246698 (10Papaul) [16:42:04] 10Operations, 10DC-Ops, 10decommission: decommission WMF6142 (old payments2003.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246699 (10Papaul) [16:42:30] 10Operations, 10ops-codfw, 10DC-Ops, 10Traffic, 10decommission: decommission lvs2004.codfw.wmnet - https://phabricator.wikimedia.org/T246669 (10Papaul) [16:43:00] 10Operations, 10ops-codfw, 10DC-Ops, 10Traffic, 10decommission: decommission lvs2005.codfw.wmnet - https://phabricator.wikimedia.org/T246666 (10Papaul) [16:43:13] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Nice! +2ing and merging. Thanks!" 
[deployment-charts] - https://gerrit.wikimedia.org/r/570162 (https://phabricator.wikimedia.org/T218733) (owner: Mholloway) [16:43:17] (PS8) Alexandros Kosiaris: Add chart for mobileapps [deployment-charts] - https://gerrit.wikimedia.org/r/570162 (https://phabricator.wikimedia.org/T218733) (owner: Mholloway) [16:43:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1078 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10630 and previous config saved to /var/cache/conftool/dbconfig/20200305-164319-marostegui.json [16:43:23] Operations, ops-codfw, DC-Ops, Traffic, decommission: decommission lvs2006.codfw.wmnet - https://phabricator.wikimedia.org/T246329 (Papaul) [16:43:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:37] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [16:43:46] (CR) Jforrester: [C: +1] Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: Ppchelko) [16:45:01] Operations, ops-codfw, DC-Ops, Traffic, decommission: decommission lvs2003.codfw.wmnet - https://phabricator.wikimedia.org/T246334 (Papaul) [16:45:09] Operations, Phabricator, Security-Team, Security: Adjust onboarding/offboarding logic to accommodate changes to #security (now acl*security) - https://phabricator.wikimedia.org/T245771 (chasemp) >>! In T245771#5944399, @jbond wrote: > @chasemp can you please crate me a couple of test users so i c... 
[16:45:25] 10Operations, 10ops-codfw, 10DC-Ops, 10Traffic, 10decommission: decommission lvs2006.codfw.wmnet - https://phabricator.wikimedia.org/T246329 (10Papaul) [16:46:28] (03CR) 10Elukey: [C: 03+1] profile::java::analytics: Switch to apt::package_from_component [puppet] - 10https://gerrit.wikimedia.org/r/565567 (owner: 10Muehlenhoff) [16:47:03] (03PS3) 10Krinkle: dblists: Remove 'labtestwiki' dblist containing 'labtestwiki' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) [16:47:15] (03CR) 10Krinkle: [C: 03+2] dblists: Remove 'labtestwiki' dblist containing 'labtestwiki' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [16:47:54] (03CR) 10Elukey: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/576845 (https://phabricator.wikimedia.org/T246578) (owner: 10Elukey) [16:48:19] (03Merged) 10jenkins-bot: dblists: Remove 'labtestwiki' dblist containing 'labtestwiki' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577040 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [16:48:59] (03CR) 10EBernhardson: "test failure is only extra whitespace, no big deal. A common way to test puppet patches is the puppet compiler, see https://wikitech.wikim" [puppet] - 10https://gerrit.wikimedia.org/r/577031 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [16:49:12] Urbanecm: are you around monday/tuesday swat if i do a patch to remove expired throttle.php config? When’s best for you? 
[16:50:01] (03CR) 10CRusnov: [C: 03+1] "LGTM" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/576985 (https://phabricator.wikimedia.org/T233183) (owner: 10Volans) [16:50:17] !log krinkle@deploy1001 Synchronized dblists/: I22a3c2a82f7be4a (duration: 00m 57s) [16:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:27] (03CR) 10Ottomata: [C: 03+2] eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [16:50:35] (03PS4) 10Ottomata: eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) [16:50:39] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventstreams - bump limits to 2000m cpu, 1900mi memory, 8 replicas [deployment-charts] - 10https://gerrit.wikimedia.org/r/577264 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [16:51:49] (03CR) 10Elukey: [C: 03+1] "The change looks very good to me, I am wondering if we care about old logs when we remove them or not. In theory no, like in this case, so" [puppet] - 10https://gerrit.wikimedia.org/r/576364 (https://phabricator.wikimedia.org/T242910) (owner: 10Jbond) [16:51:55] (03CR) 10EBernhardson: "Oh and for the kibana version, nothing special should need to be done. 
We have and apt repository that contains only 6.5.4 packages, it co" [puppet] - 10https://gerrit.wikimedia.org/r/577031 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [16:53:04] (03PS1) 10Muehlenhoff: Extend package list for HP package sync with ssaducli [puppet] - 10https://gerrit.wikimedia.org/r/577292 [16:54:27] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2359.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [16:54:38] (03PS4) 10Krinkle: multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) [16:54:44] (03CR) 10Krinkle: [C: 03+2] multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [16:54:56] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [16:54:56] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . 
[16:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:44] (03Merged) 10jenkins-bot: multiversion: Make buildDBLists.php both create and delete dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577037 (https://phabricator.wikimedia.org/T223602) (owner: 10Krinkle) [16:55:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1078 after reimage to buster T246604', diff saved to https://phabricator.wikimedia.org/P10631 and previous config saved to /var/cache/conftool/dbconfig/20200305-165555-marostegui.json [16:55:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:59] T246604: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 [16:56:35] (03PS1) 10Alexandros Kosiaris: mobileapps: Switch to wmf named templates [deployment-charts] - 10https://gerrit.wikimedia.org/r/577293 [16:56:37] (03PS1) 10Alexandros Kosiaris: mobileapps: Package version 0.0.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/577294 [16:57:18] (03CR) 10Dzahn: [C: 03+2] Check out parsoid from git on paroid::testing machines [puppet] - 10https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) (owner: 10C. Scott Ananian) [16:58:39] (03CR) 10Alexandros Kosiaris: [C: 03+2] mobileapps: Switch to wmf named templates [deployment-charts] - 10https://gerrit.wikimedia.org/r/577293 (owner: 10Alexandros Kosiaris) [16:58:45] (03CR) 10Alexandros Kosiaris: [C: 03+2] mobileapps: Package version 0.0.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/577294 (owner: 10Alexandros Kosiaris) [16:58:53] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [16:58:53] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . 
[16:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:05] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2360.codfw.wmnet ` The log can be found in `/var/log/wmf-au... [17:00:31] (03CR) 10Elukey: [C: 03+2] admin: deprecate two old analytics posix groups [puppet] - 10https://gerrit.wikimedia.org/r/576845 (https://phabricator.wikimedia.org/T246578) (owner: 10Elukey) [17:00:59] (03CR) 10Dzahn: "Notice: /Stage[main]/Profile::Parsoid::Testing/Git::Clone[mediawiki/services/parsoid]/File[/srv/parsoid-testing]/mode: mode changed '2775'" [puppet] - 10https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) (owner: 10C. Scott Ananian) [17:03:49] (03CR) 10RLazarus: [C: 03+1] "We might be interested in logging 4xxs at the service level too, since some of them might reflect bugs in the client service rather than b" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/577187 (owner: 10Giuseppe Lavagetto) [17:08:29] (03PS1) 10Ottomata: eventstreams - Use kafka brokers in codfw in codfw k8s cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/577296 (https://phabricator.wikimedia.org/T238658) [17:08:51] (03CR) 10Ottomata: [C: 03+2] eventstreams - Use kafka brokers in codfw in codfw k8s cluster [deployment-charts] - 10https://gerrit.wikimedia.org/r/577296 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [17:08:58] RhinosF1: I'd say anything is fine - just put it there, maybe note it's not testable :-) [17:09:25] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [17:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:45] 10Operations, 10Traffic, 
10Wikimedia-Logstash, 10observability, and 3 others: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (10Krinkle) [17:09:49] (03PS1) 10Elukey: Remove statistics-admins and statistics-web-admins from Analytics [puppet] - 10https://gerrit.wikimedia.org/r/577297 (https://phabricator.wikimedia.org/T243934) [17:10:42] (03CR) 10Dzahn: "Keep in mind the default with the git::clone is just "ensure: present" (it just clones once but does not keep pulling all the time). Not s" [puppet] - 10https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) (owner: 10C. Scott Ananian) [17:11:08] 10Operations, 10Traffic, 10Wikimedia-Logstash, 10observability, and 3 others: Changing Kibana filters is ridiculously slow - https://phabricator.wikimedia.org/T189333 (10Krinkle) This is still an issue. Editing Kibana dashboards: * In Safari, crashes the tab. * In Firefox, times out after 30 seconds, the... [17:11:26] (03CR) 10Elukey: [C: 03+2] Remove statistics-admins and statistics-web-admins from Analytics [puppet] - 10https://gerrit.wikimedia.org/r/577297 (https://phabricator.wikimedia.org/T243934) (owner: 10Elukey) [17:11:49] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [17:11:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:11:57] (03PS4) 10Alexandros Kosiaris: facilities:monitor_pdu_service: Add types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/576390 [17:14:22] (03PS1) 10Ayounsi: Nfacct, add export_proto_sysid [puppet] - 10https://gerrit.wikimedia.org/r/577298 (https://phabricator.wikimedia.org/T246186) [17:14:39] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [17:14:39] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . 
[17:14:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:14:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:03] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [17:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:33] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "very nice!" [puppet] - 10https://gerrit.wikimedia.org/r/576485 (owner: 10RLazarus) [17:15:49] (03CR) 10Ayounsi: [C: 03+2] Nfacct, add export_proto_sysid [puppet] - 10https://gerrit.wikimedia.org/r/577298 (https://phabricator.wikimedia.org/T246186) (owner: 10Ayounsi) [17:16:41] elukey: sorry for that, I'll make sure to get your feedback next time ! [17:18:07] gehel: <3 np! [17:18:20] (03CR) 10Giuseppe Lavagetto: profile::services_proxy: use non-deprecated config format (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/577187 (owner: 10Giuseppe Lavagetto) [17:18:38] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2359.codfw.wmnet'] ` and were **ALL** successful. 
[17:19:16] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [17:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:19:39] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash: (Need by: 2020-03-06) rack/setup/install logstash102[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T240881 (10Cmjohnson) [17:21:22] (03CR) 10Alexandros Kosiaris: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/21286/ says ok, merging" [puppet] - 10https://gerrit.wikimedia.org/r/576390 (owner: 10Alexandros Kosiaris) [17:22:57] (03CR) 10Alexandros Kosiaris: changeprop: Add nutcracker sidecar (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:22:58] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2360.codfw.wmnet'] ` and were **ALL** successful. [17:23:31] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2361.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[17:23:35] (03PS1) 10Krinkle: Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) [17:23:48] (03PS7) 10Alexandros Kosiaris: changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) [17:23:50] (03PS2) 10Alexandros Kosiaris: admin: Add redis databases for changeprop [deployment-charts] - 10https://gerrit.wikimedia.org/r/577239 (https://phabricator.wikimedia.org/T213193) [17:24:17] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [17:24:17] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . [17:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:23] (03CR) 10Krinkle: "diffConfig job confirms there is no difference :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:25:38] (03CR) 10Krinkle: "@Daimona Does this look good to you?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:26:22] (03PS2) 10Krinkle: Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) [17:26:37] 10Operations, 10ops-codfw, 10serviceops: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2362.codfw.wmnet ` The log can be found in `/var/log/wmf-au... 
[17:26:45] (03CR) 10Alexandros Kosiaris: [C: 03+1] "> No errors, and the result looks correct, but it seems like it's diffing against an empty file instead of the old content -- maybe becaus" [puppet] - 10https://gerrit.wikimedia.org/r/576485 (owner: 10RLazarus) [17:27:04] (03CR) 10Daimona Eaytoy: [C: 03+1] "> @Daimona Does this look good to you?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:27:11] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [17:27:11] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . [17:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:37] (03PS3) 10Krinkle: Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) [17:27:55] (03CR) 10Elukey: [C: 03+2] Remove old eventgate-analytics LVS port from Analyitcs VLAN firewall [homer/public] - 10https://gerrit.wikimedia.org/r/576873 (https://phabricator.wikimedia.org/T233629) (owner: 10Ottomata) [17:28:00] (03CR) 10Krinkle: [C: 03+2] Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:28:19] (03PS1) 10Dzahn: add mw1385 through mw1392 as api and appservers [puppet] - 10https://gerrit.wikimedia.org/r/577304 (https://phabricator.wikimedia.org/T241849) [17:28:32] (03PS3) 10Krinkle: Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:28:34] (03PS3) 10Herron: elasticsearch: add max_clause_count setting [puppet] - 
10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) [17:28:56] (03PS1) 10Ayounsi: Nfacct, export proper field [puppet] - 10https://gerrit.wikimedia.org/r/577305 (https://phabricator.wikimedia.org/T246186) [17:29:38] (03CR) 10Hnowlan: [C: 03+1] changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:30:18] (03CR) 10Ayounsi: [C: 03+2] Nfacct, export proper field [puppet] - 10https://gerrit.wikimedia.org/r/577305 (https://phabricator.wikimedia.org/T246186) (owner: 10Ayounsi) [17:30:25] (03CR) 10Krinkle: [C: 03+2] Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:30:53] (03PS4) 10Herron: elasticsearch: add max_clause_count setting [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) [17:31:17] (03CR) 10jerkins-bot: [V: 04-1] Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:31:51] (03PS4) 10Krinkle: Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) [17:31:54] (03PS4) 10Krinkle: Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:32:02] (03CR) 10Krinkle: [C: 03+2] Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:32:07] (03CR) 10Krinkle: [C: 03+2] Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:32:51] !log run homer on cumin1001 to apply https://gerrit.wikimedia.org/r/576873 on cr1/cr2-eqiad [17:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:13] (03Merged) 10jenkins-bot: Resolve wmgUseGlobalAbuseFilters dblist ambiguity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577302 (https://phabricator.wikimedia.org/T246968) (owner: 10Krinkle) [17:33:24] (03Merged) 10jenkins-bot: Explicitly set wikitech wgSessionCacheType to kask-session [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577262 (https://phabricator.wikimedia.org/T246996) (owner: 10Ppchelko) [17:33:28] (03PS5) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) [17:33:39] (03PS6) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) [17:33:41] (03PS8) 10Alexandros Kosiaris: changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) [17:33:43] (03PS3) 10Alexandros Kosiaris: admin: Add redis databases for changeprop [deployment-charts] - 10https://gerrit.wikimedia.org/r/577239 (https://phabricator.wikimedia.org/T213193) [17:33:45] (03PS1) 10Alexandros Kosiaris: changeprop: Release 0.9.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/577307 (https://phabricator.wikimedia.org/T213193) [17:33:47] (03CR) 10Elukey: [C: 03+2] "To keep archives happy:" [homer/public] - 10https://gerrit.wikimedia.org/r/576873 (https://phabricator.wikimedia.org/T233629) (owner: 10Ottomata) [17:33:48] * Krinkle staging on mwdebug1002 [17:33:56] (03CR) 10Alexandros Kosiaris: "Thanks!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:34:11] 10Operations, 10ops-codfw, 10fundraising-tech-ops: new fundraising Buster servers - bonded ethernet network error/warning - https://phabricator.wikimedia.org/T246492 (10Jgreen) [17:34:43] (03CR) 10Alexandros Kosiaris: [C: 03+2] changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:34:47] (03CR) 10jerkins-bot: [V: 04-1] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) (owner: 10Krinkle) [17:34:55] (03CR) 10Alexandros Kosiaris: [C: 03+2] changeprop: Release 0.9.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/577307 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:35:08] (03Merged) 10jenkins-bot: changeprop: Add nutcracker sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/576827 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:35:17] (03Merged) 10jenkins-bot: changeprop: Release 0.9.8 [deployment-charts] - 10https://gerrit.wikimedia.org/r/577307 (https://phabricator.wikimedia.org/T213193) (owner: 10Alexandros Kosiaris) [17:35:21] (03PS7) 10Krinkle: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) [17:35:33] (03CR) 10Herron: "gehel how does this look to you in the larger context of all wmf ES instances?" 
[puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:36:58] (03PS5) 10Hnowlan: jobrunner: Standard mediawiki webserver configuration [puppet] - 10https://gerrit.wikimedia.org/r/576913 (https://phabricator.wikimedia.org/T246389) [17:37:26] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I8f0d82164, Iaac7cbfbb9 (no-op) (duration: 00m 59s) [17:37:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:47] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Upgrade ELK Stack - https://phabricator.wikimedia.org/T234854 (10herron) [17:38:32] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [17:38:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:52] 10Operations, 10netops, 10Patch-For-Review: can aggregated netflow data include the router it was sampled from? - https://phabricator.wikimedia.org/T246186 (10ayounsi) a:03elukey The new field is being exported properly to Kafka, see: `"peer_ip_src": "103.102.166.129"` In Turnilo it would be convenient to... 
[17:38:56] (03PS1) 10Dzahn: fake certificates for all remaining new eqiad and codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577308 (https://phabricator.wikimedia.org/T241849) [17:39:11] (03CR) 10Krinkle: [C: 03+2] tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) (owner: 10Krinkle) [17:39:35] (03CR) 10Gehel: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:40:18] (03PS6) 10Hnowlan: jobrunner: Standard mediawiki webserver configuration [puppet] - 10https://gerrit.wikimedia.org/r/576913 (https://phabricator.wikimedia.org/T246389) [17:40:20] (03Merged) 10jenkins-bot: tests: Assert there are no ambiguously tagged config values [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577019 (https://phabricator.wikimedia.org/T246996) (owner: 10Krinkle) [17:40:59] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [17:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:39] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [17:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:13] (03CR) 10Herron: "> > Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:44:09] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [17:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:30] (03CR) 10EBernhardson: "perhaps i'm missing, but i don't see any justification in the ticket or this patch for why it needs 4k clauses in a boolean?" 
[puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:46:49] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2361.codfw.wmnet'] ` and were **ALL** successful. [17:46:56] (03CR) 10Papaul: [C: 03+1] fake certificates for all remaining new eqiad and codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577308 (https://phabricator.wikimedia.org/T241849) (owner: 10Dzahn) [17:47:30] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2363.codfw.wmnet ` The log can be fou... [17:47:40] 10Operations, 10Analytics, 10User-Elukey: Refactor Analytics POSIX groups in puppet to improve maintainability - https://phabricator.wikimedia.org/T246578 (10elukey) [17:48:05] (03CR) 10EBernhardson: "I'm totally missing it, you had it in the last comment :) Which field expansion is doing this? Typically in search instead of searching 1" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:48:55] (03PS1) 10Elukey: Remove profiles from stat100[6,7]'s roles not used anymore [puppet] - 10https://gerrit.wikimedia.org/r/577309 (https://phabricator.wikimedia.org/T243934) [17:48:58] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2362.codfw.wmnet'] ` and were **ALL** successful. [17:49:03] (03CR) 10Ayounsi: [C: 03+1] "LGTM." 
[software/netbox-extras] - 10https://gerrit.wikimedia.org/r/576391 (https://phabricator.wikimedia.org/T239244) (owner: 10CRusnov) [17:49:20] (03CR) 10Dzahn: [V: 03+2 C: 03+2] fake certificates for all remaining new eqiad and codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577308 (https://phabricator.wikimedia.org/T241849) (owner: 10Dzahn) [17:50:22] (03CR) 10Herron: "> I'm totally missing it, you had it in the last comment :) Which" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [17:50:44] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [17:50:44] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . [17:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:58] (03CR) 10Ayounsi: "LGTM." [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/576499 (https://phabricator.wikimedia.org/T241289) (owner: 10CRusnov) [17:51:43] (03CR) 10Ayounsi: [C: 03+1] reports/coherence.py: Add check for Juniper inventory item descriptions [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/576499 (https://phabricator.wikimedia.org/T241289) (owner: 10CRusnov) [17:54:14] (03CR) 10Ayounsi: "> Patch Set 2:" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/550051 (https://phabricator.wikimedia.org/T237464) (owner: 10CRusnov) [17:54:27] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2364.codfw.wmnet ` The log can be fou... 
[17:56:54] (03PS7) 10Hnowlan: jobrunner: Standard mediawiki webserver configuration [puppet] - 10https://gerrit.wikimedia.org/r/576913 (https://phabricator.wikimedia.org/T246389) [18:00:00] (03PS1) 10Ottomata: schema.wikimedia.org - show 'latest' files as text in browser [puppet] - 10https://gerrit.wikimedia.org/r/577311 (https://phabricator.wikimedia.org/T245859) [18:00:04] halfak and accraze: May I have your attention please! Services – Graphoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1800) [18:00:26] (03CR) 10Elukey: [C: 03+2] Remove profiles from stat100[6,7]'s roles not used anymore [puppet] - 10https://gerrit.wikimedia.org/r/577309 (https://phabricator.wikimedia.org/T243934) (owner: 10Elukey) [18:00:58] (03PS1) 10EBernhardson: Whitelist urls for inclusion in wikidata statements indexed to search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 [18:01:19] (03CR) 10jerkins-bot: [V: 04-1] Whitelist urls for inclusion in wikidata statements indexed to search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 (owner: 10EBernhardson) [18:02:28] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [18:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:50] (03CR) 10Ottomata: [C: 03+2] schema.wikimedia.org - show 'latest' files as text in browser [puppet] - 10https://gerrit.wikimedia.org/r/577311 (https://phabricator.wikimedia.org/T245859) (owner: 10Ottomata) [18:05:03] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [18:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:29] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [18:09:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:37] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 
(10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2363.codfw.wmnet'] ` and were **ALL** successful. [18:11:41] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` mw2365.codfw.wmnet ` The log can be fou... [18:11:58] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [18:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:02] (03PS1) 10Dzahn: fix fake certificate names for new codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577314 (https://phabricator.wikimedia.org/T241852) [18:15:05] (03CR) 10EBernhardson: "For some context to perhaps help, elasticsearch used to have an auto-populated field called "_all" where content was copied for filtering " [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [18:18:18] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2364.codfw.wmnet'] ` and were **ALL** successful. 
[18:18:42] (03PS2) 10Dzahn: fix fake certificate names for new codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577314 (https://phabricator.wikimedia.org/T241852) [18:21:19] (03PS1) 10Ottomata: eventstreams - Use schema.wm.org stream schema documentation links [deployment-charts] - 10https://gerrit.wikimedia.org/r/577315 (https://phabricator.wikimedia.org/T238658) [18:21:30] (03CR) 10jerkins-bot: [V: 04-1] eventstreams - Use schema.wm.org stream schema documentation links [deployment-charts] - 10https://gerrit.wikimedia.org/r/577315 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [18:22:25] (03PS2) 10Ottomata: eventstreams - Use schema.wm.org stream schema documentation links [deployment-charts] - 10https://gerrit.wikimedia.org/r/577315 (https://phabricator.wikimedia.org/T238658) [18:24:03] (03CR) 10Ottomata: [C: 03+2] eventstreams - Use schema.wm.org stream schema documentation links [deployment-charts] - 10https://gerrit.wikimedia.org/r/577315 (https://phabricator.wikimedia.org/T238658) (owner: 10Ottomata) [18:24:05] (03PS3) 10Dzahn: fix fake certificate names for new codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577314 (https://phabricator.wikimedia.org/T241852) [18:24:45] (03CR) 10Cwhite: "I'm not sure I'm for this." 
[puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T234854) (owner: 10Herron) [18:25:42] (03CR) 10Dzahn: [V: 03+2 C: 03+2] fix fake certificate names for new codfw appservers [labs/private] - 10https://gerrit.wikimedia.org/r/577314 (https://phabricator.wikimedia.org/T241852) (owner: 10Dzahn) [18:26:19] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime [18:26:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:38] 10Operations, 10Security-Team, 10User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (10HMarcus) Thanks @chasemp @MoritzMuehlenhoff please confirm you can access the admin dashboard, and I will go ahead and close this task out. [18:27:01] (03PS2) 10EBernhardson: Whitelist urls for inclusion in wikidata statements indexed to search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 [18:27:20] (03PS3) 10EBernhardson: Whitelist urls for inclusion in wikidata statements indexed to search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 (https://phabricator.wikimedia.org/T243693) [18:27:29] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [18:27:29] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . 
[18:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:49] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [18:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:11] (03PS1) 10Ayounsi: Sample all inbound traffic [homer/public] - 10https://gerrit.wikimedia.org/r/577316 (https://phabricator.wikimedia.org/T246618) [18:30:36] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [18:30:36] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . [18:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:30:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:14] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [18:31:54] (03CR) 10Jforrester: "Sorry, was going to suggest doing this today; glad you got support to do it without needing me to remember. :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/573558 (https://phabricator.wikimedia.org/T244549) (owner: 10Hnowlan) [18:32:00] (03CR) 10Subramanya Sastry: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/576990 (https://phabricator.wikimedia.org/T240055) (owner: 10C. Scott Ananian) [18:32:35] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw2365.codfw.wmnet'] ` and were **ALL** successful. 
[18:32:58] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' . [18:32:58] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . [18:33:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:07] (03PS3) 10Jforrester: Stop loading 'wikipedia-english', 'wikipedia-e-acute', 'wikipedia-cyrillic', 'wikipedia-devanagari' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575365 [18:35:03] 10Operations, 10Wikimedia-Logstash: ELK7 shards failed errors when loading saved objects, e.g. "field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10herron) [18:36:00] (03PS2) 10Jcrespo: wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) [18:36:15] (03PS3) 10Jforrester: Stop defining 'wikipedia-english', 'wikipedia-e-acute', 'wikipedia-cyrillic', 'wikipedia-devanagari' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575366 [18:36:24] (03CR) 10jerkins-bot: [V: 04-1] wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) (owner: 10Jcrespo) [18:37:17] (03PS5) 10Herron: elasticsearch: add max_clause_count setting [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T247014) [18:37:45] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: ELK7 shards failed errors when loading saved objects, e.g. "field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10herron) In testing I was able to work around this by increasing `indices.query.bo... 
[18:39:47] (03CR) 10Herron: "alrighty, error is producible once again at https://logstash-next.wikimedia.org, and tracking task is at T247014" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T247014) (owner: 10Herron) [18:39:50] jouncebot: now [18:39:50] For the next 0 hour(s) and 20 minute(s): Services – Graphoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1800) [18:39:53] (03PS3) 10Jcrespo: wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) [18:39:57] jouncebot: next [18:39:57] In 0 hour(s) and 20 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1900) [18:40:09] Right-o. [18:40:10] (03CR) 10Bstorm: "After fighting a bit with a clumsy typo on PCC, this seems to do the thing. Merging https://puppet-compiler.wmflabs.org/compiler1003/2129" [puppet] - 10https://gerrit.wikimedia.org/r/577261 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [18:40:14] (03CR) 10Bstorm: [C: 03+2] tools-prometheus: removing material related to the legacy k8s cluster [puppet] - 10https://gerrit.wikimedia.org/r/577261 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [18:41:12] (03PS2) 10ArielGlenn: New class for output file listing methods to move them out of jobs code [dumps] - 10https://gerrit.wikimedia.org/r/577228 (https://phabricator.wikimedia.org/T246465) [18:41:30] (03CR) 10jerkins-bot: [V: 04-1] New class for output file listing methods to move them out of jobs code [dumps] - 10https://gerrit.wikimedia.org/r/577228 (https://phabricator.wikimedia.org/T246465) (owner: 10ArielGlenn) [18:41:48] (03CR) 10RLazarus: [C: 03+1] add mw1385 through mw1392 as api and appservers [puppet] - 10https://gerrit.wikimedia.org/r/577304 (https://phabricator.wikimedia.org/T241849) (owner: 10Dzahn) [18:43:34] (03CR) 
10Jcrespo: "This is not in any way finished, but what would you think about leaving this as is for now, and I focusing on the incrementals part so we " [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) (owner: 10Jcrespo) [18:43:41] (03PS3) 10ArielGlenn: New class for output file listing methods to move them out of jobs code [dumps] - 10https://gerrit.wikimedia.org/r/577228 (https://phabricator.wikimedia.org/T246465) [18:44:53] (03PS4) 10Jcrespo: wmfbackups: Add new simple script to analyze dump row ids [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) [18:47:08] (03CR) 10Jcrespo: "For the record, analysis of a full enwiki backup on hds (no ssds) took 11 minutes." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/577224 (https://phabricator.wikimedia.org/T244884) (owner: 10Jcrespo) [18:50:43] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) For a Total of 86 mw servers , 71 are done and 15 left. 
The 15 left are waiting for space in row C in rack C3 [18:55:08] (03PS1) 10Andrew Bogott: codfw1dev: move all hosts to openstack queens [puppet] - 10https://gerrit.wikimedia.org/r/577319 [18:56:15] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [18:57:19] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [18:57:55] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: ELK7 shards failed errors when loading saved objects, e.g. "field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10herron) Additionally, before moving on to `max_clause_count` I had experimented w... [19:00:03] Reedy: Nice work. [19:00:05] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T1900). [19:00:05] No GERRIT patches in the queue for this window AFAICS. [19:00:06] 10Operations, 10DC-Ops, 10decommission: decommission WMF6147 (old frpig2001.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246824 (10Papaul) ` [edit interfaces interface-range disabled] member "ge-[0-1]/0/3" { ... } + member "ge-[0-1]/0/13"; [edit interfaces interface-range vlan-listenerdmz]... [19:00:23] James_F: What did I break now? [19:00:40] Reedy: No no, meant seriously. Fixing the log key clashes. 
[19:00:50] 10Operations, 10DC-Ops, 10decommission: decommission WMF6147 (old frpig2001.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246824 (10Papaul) [19:00:59] I mean, if you give me a moment I'm sure I find /something/ to complain about. ;-) [19:01:16] :D [19:01:27] We both know "Nice work" can be a compliment or sarcasm... [19:02:30] Sorry, the lack of the sarcasm co-channel is a real weakness of textual communications. [19:03:30] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: TBD) rack/setup/install wdqs101[123].eqiad.wmnet - https://phabricator.wikimedia.org/T246352 (10Gehel) 05Resolved→03Open Re-opening since it looks (from the checklist above and from the status in netbox) that this isn't completed yet. [19:03:52] (03PS1) 10Elukey: jupyterhub: force systemd spawner to use the user.slice [puppet] - 10https://gerrit.wikimedia.org/r/577320 [19:05:54] looks like swat is empty...i'm going to add a patch and deploy it [19:06:27] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for WhatamIdoing - https://phabricator.wikimedia.org/T247016 (10Whatamidoing-WMF) [19:07:07] (03CR) 10Ottomata: [C: 03+1] jupyterhub: force systemd spawner to use the user.slice [puppet] - 10https://gerrit.wikimedia.org/r/577320 (owner: 10Elukey) [19:07:25] (03CR) 10EBernhardson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 (https://phabricator.wikimedia.org/T243693) (owner: 10EBernhardson) [19:08:27] (03Merged) 10jenkins-bot: Whitelist urls for inclusion in wikidata statements indexed to search [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577312 (https://phabricator.wikimedia.org/T243693) (owner: 10EBernhardson) [19:09:06] 10Operations, 10DC-Ops, 10decommission: decommission WMF6147 (old frpig2001.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246824 (10Papaul) [19:09:08] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/21299/notebook1003.eqiad.wmnet/" [puppet] -
10https://gerrit.wikimedia.org/r/577320 (owner: 10Elukey) [19:10:14] !log ebernhardson@deploy1001 Synchronized wmf-config/SearchSettingsForWikibase.php: (no justification provided) (duration: 00m 57s) [19:10:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:56] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Papaul) [19:15:17] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 145.4 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1 [19:17:47] 10Operations, 10netops, 10Wikimedia-Incident: Add graceful-restart to cr2-esams - https://phabricator.wikimedia.org/T246338 (10CDanis) a:05ayounsi→03CDanis [19:21:32] (03PS1) 10Gehel: wdqs: Initial configuration of wdqs200[78]. [puppet] - 10https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) [19:22:55] (03CR) 10Gehel: [C: 03+2] elasticsearch: return the cluster name in __str__ for ElasticsearchCluster [software/spicerack] - 10https://gerrit.wikimedia.org/r/576650 (owner: 10Elukey) [19:22:57] (03CR) 10Elukey: [C: 03+1] wdqs: Initial configuration of wdqs200[78]. [puppet] - 10https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) (owner: 10Gehel) [19:23:41] 10Operations, 10ops-codfw, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install 86 new codfw mw systems - https://phabricator.wikimedia.org/T241852 (10Dzahn) 05Open→03Stalled thanks Papaul for all the new servers. We'll continue with racking right after eqiad is done. We can do that on... 
[19:23:44] (03PS1) 10Gergő Tisza: Switch GrowthExperiments topic search to ORES [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577325 (https://phabricator.wikimedia.org/T240517) [19:24:03] 10Operations, 10ops-codfw, 10serviceops: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10Papaul) [19:24:43] (03CR) 10DCausse: "> Patch Set 5:" [puppet] - 10https://gerrit.wikimedia.org/r/576967 (https://phabricator.wikimedia.org/T247014) (owner: 10Herron) [19:25:45] (03CR) 10Gehel: [C: 03+2] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/576641 (owner: 10Elukey) [19:29:24] 10Operations, 10MediaWiki-ResourceLoader, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: load.php?modules=startup miss rate tripped on 2012-02-05 - https://phabricator.wikimedia.org/T247020 (10Krinkle) [19:29:30] 10Operations, 10serviceops: move all 86 new codfw appservers into production - https://phabricator.wikimedia.org/T247021 (10Dzahn) [19:30:24] ebernhardson: are you done SWATting? I have a last-minute addition [19:30:31] tgr: yup, all done [19:30:55] 10Operations, 10serviceops: move all 86 new codfw appservers into production - https://phabricator.wikimedia.org/T247021 (10Dzahn) Only 71 of 86 servers can be done until T247018 is resolved first because there is not enough rack space for them. 
[19:31:32] (03CR) 10Gergő Tisza: [C: 03+2] Enable ORES topic matching + remote search on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577060 (owner: 10Gergő Tisza) [19:32:25] (03Merged) 10jenkins-bot: Enable ORES topic matching + remote search on beta enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577060 (owner: 10Gergő Tisza) [19:32:47] (03PS2) 10Gergő Tisza: Switch GrowthExperiments topic search to ORES [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577325 (https://phabricator.wikimedia.org/T240517) [19:32:58] (03CR) 10Gergő Tisza: [C: 03+2] Switch GrowthExperiments topic search to ORES [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577325 (https://phabricator.wikimedia.org/T240517) (owner: 10Gergő Tisza) [19:34:05] (03Merged) 10jenkins-bot: Switch GrowthExperiments topic search to ORES [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577325 (https://phabricator.wikimedia.org/T240517) (owner: 10Gergő Tisza) [19:37:07] (03PS2) 10Dzahn: add mw1385 through mw1392 as api and appservers [puppet] - 10https://gerrit.wikimedia.org/r/577304 (https://phabricator.wikimedia.org/T241849) [19:37:31] 10Operations, 10DC-Ops, 10decommission: decommission WMF6141 (old payments2001.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246697 (10Papaul) [19:39:57] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [19:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:01] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [19:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:08] 10Operations, 10serviceops, 10Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install ` mw[1385-1392].eqi... 
[19:40:14] 10Operations, 10netops, 10Wikimedia-Incident: Add graceful-restart to cr2-esams - https://phabricator.wikimedia.org/T246338 (10CDanis) Will do this tonight, at or after 00:00 UTC. Currently there are just a few (AMSIX peer) BGP sessions down: `cdanis@re0.cr2-esams> show bgp summary | match "(Active|Idle|Con... [19:40:24] (03CR) 10Dzahn: [C: 03+2] add mw1385 through mw1392 as api and appservers [puppet] - 10https://gerrit.wikimedia.org/r/577304 (https://phabricator.wikimedia.org/T241849) (owner: 10Dzahn) [19:40:45] 10Operations, 10DC-Ops, 10decommission: decommission WMF6143 (old payments2002.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246698 (10Papaul) [19:42:10] (03CR) 10Andrew Bogott: [C: 03+2] codfw1dev: move all hosts to openstack queens [puppet] - 10https://gerrit.wikimedia.org/r/577319 (owner: 10Andrew Bogott) [19:45:22] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:577325|Switch GrowthExperiments topic search to ORES (T240517)]] (duration: 00m 58s) [19:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:31] T240517: [EPIC] Growth: Newcomer tasks 1.1.1 (ORES topics) - https://phabricator.wikimedia.org/T240517 [19:46:51] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: re-sync for bug 236104 (duration: 00m 56s) [19:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:04] liw and Brennen: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Mediawiki train - European+American Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T2000). 
[20:08:01] jouncebot: now [20:08:01] For the next 0 hour(s) and 51 minute(s): Mediawiki train - European+American Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200305T2000) [20:11:10] (03PS1) 10Andrew Bogott: Add python3 version of designate/makedomain [puppet] - 10https://gerrit.wikimedia.org/r/577333 [20:11:53] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 76.69 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1 [20:11:59] (03PS2) 10Andrew Bogott: Install python3 version of designate/makedomain [puppet] - 10https://gerrit.wikimedia.org/r/577333 [20:12:26] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash: (Need by: 2020-03-06) rack/setup/install logstash102[6-9].eqiad.wmnet - https://phabricator.wikimedia.org/T240881 (10Cmjohnson) @herron do these need to be 10G? 
I racked them today in 1G racks [20:13:12] (03CR) 10Andrew Bogott: [C: 03+2] Install python3 version of designate/makedomain [puppet] - 10https://gerrit.wikimedia.org/r/577333 (owner: 10Andrew Bogott) [20:13:28] (03PS1) 10Mholloway: MachineVision: Update label blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577335 [20:17:51] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [20:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:52] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [20:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:55] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [20:18:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:19:02] 10Operations, 10serviceops, 10Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 3:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install ` mw[1385-1392].eqi... [20:19:42] (03PS1) 10Andrew Bogott: Keystone: Include python3-mwclient [puppet] - 10https://gerrit.wikimedia.org/r/577336 [20:20:23] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [20:20:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:40] 10Operations, 10serviceops, 10Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install ` mw[1385-1392].eqi... 
[20:21:46] (03CR) 10Andrew Bogott: [C: 03+2] Keystone: Include python3-mwclient [puppet] - 10https://gerrit.wikimedia.org/r/577336 (owner: 10Andrew Bogott) [20:24:13] 10Operations, 10MediaWiki-ResourceLoader, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: load.php?modules=startup miss rate tripped on 2012-02-05 - https://phabricator.wikimedia.org/T247020 (10Jdforrester-WMF) The `Accept-Encoding` changes seem most suspect? [20:25:36] (03PS1) 10Cmjohnson: Add production dns entries for logstash102[6-9] [dns] - 10https://gerrit.wikimedia.org/r/577337 (https://phabricator.wikimedia.org/T240881) [20:27:20] (03PS2) 10Cmjohnson: Add production dns entries for logstash102[6-9] [dns] - 10https://gerrit.wikimedia.org/r/577337 (https://phabricator.wikimedia.org/T240881) [20:27:49] (03CR) 10Cmjohnson: [C: 03+2] Add production dns entries for logstash102[6-9] [dns] - 10https://gerrit.wikimedia.org/r/577337 (https://phabricator.wikimedia.org/T240881) (owner: 10Cmjohnson) [20:29:33] 10Operations, 10MediaWiki-ResourceLoader, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: load.php?modules=startup miss rate tripped on 2012-02-05 - https://phabricator.wikimedia.org/T247020 (10Krinkle) Indeed. I suspect that maybe it is now getting both gzip and no-gzip traffic and/or no longer sha... 
[20:30:13] 10Operations, 10MediaWiki-ResourceLoader, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: load.php?modules=startup miss rate tripped on 2020-02-05 - https://phabricator.wikimedia.org/T247020 (10Mholloway) [20:32:32] 10Operations, 10MediaWiki-ResourceLoader, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: load.php?modules=startup miss rate trippled on 2020-02-05 - https://phabricator.wikimedia.org/T247020 (10Krinkle) [20:32:44] (03CR) 10Mholloway: [C: 03+2] MachineVision: Update label blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577335 (owner: 10Mholloway) [20:33:38] (03Merged) 10jenkins-bot: MachineVision: Update label blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577335 (owner: 10Mholloway) [20:37:21] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update label blacklist (duration: 00m 59s) [20:37:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:39:18] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Update label blacklist (once more for good measure) (duration: 00m 57s) [20:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:08] (03PS1) 10Andrew Bogott: Update keystone-wsgi-admin.py [puppet] - 10https://gerrit.wikimedia.org/r/577339 [20:42:56] 10Operations, 10DC-Ops, 10decommission: decommission WMF6142 (old payments2003.frack.codfw.wmnet) - https://phabricator.wikimedia.org/T246699 (10Papaul) [20:43:01] (03CR) 10jerkins-bot: [V: 04-1] Update keystone-wsgi-admin.py [puppet] - 10https://gerrit.wikimedia.org/r/577339 (owner: 10Andrew Bogott) [20:44:14] (03PS2) 10Andrew Bogott: Update keystone-wsgi-admin.py [puppet] - 10https://gerrit.wikimedia.org/r/577339 [20:45:31] 10Operations, 10Mobile-Content-Service, 10Wikimedia-Logstash, 10observability, and 4 others: Move mobileapps logging to new logging pipeline - 
https://phabricator.wikimedia.org/T219924 (10Mholloway) Hmm, is this still worth doing if mobileapps is finally moving to k8s soon (T218733)? Same question for Pro... [20:45:42] (03CR) 10Andrew Bogott: [C: 03+2] Update keystone-wsgi-admin.py [puppet] - 10https://gerrit.wikimedia.org/r/577339 (owner: 10Andrew Bogott) [20:45:45] * Krinkle testing on mwdebug1002 [20:47:09] 10Operations, 10Page Content Service, 10Wikimedia-Logstash, 10observability, and 4 others: Move mobileapps logging to new logging pipeline - https://phabricator.wikimedia.org/T219924 (10Mholloway) [20:54:58] (03CR) 10Mholloway: WIP: Add chart for chromium-render (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: 10MSantos) [20:55:12] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: ELK7 shards failed errors when loading saved objects, e.g. "field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10colewhite) It looks like the issue has been run into before in the Beats family o... [20:56:26] (03CR) 10Nuria: "Thanks for the cleanup" [puppet] - 10https://gerrit.wikimedia.org/r/576845 (https://phabricator.wikimedia.org/T246578) (owner: 10Elukey) [20:59:53] (03PS10) 10Mholloway: Add chart for chromium-render [deployment-charts] - 10https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: 10MSantos) [21:00:25] (03CR) 10Mholloway: Add chart for chromium-render (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: 10MSantos) [21:02:27] 10Operations, 10netops, 10Wikimedia-Incident: Investigate Juniper storm control - https://phabricator.wikimedia.org/T245192 (10Papaul) I created the interface range mgmt-switches, added interfaces ge-0/0/0 to ge-0/0/31 to it and bind the storm control profile wmf-mgmt-storm to it. ` papaul@msw1-codfw# show |... 
[21:09:43] (03PS1) 10Bstorm: toolschecker: Update to monitor the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/577341 (https://phabricator.wikimedia.org/T246689) [21:11:01] (03CR) 10Jforrester: [C: 03+1] Add chart for chromium-render [deployment-charts] - 10https://gerrit.wikimedia.org/r/557090 (https://phabricator.wikimedia.org/T238830) (owner: 10MSantos) [21:14:01] (03CR) 10Bstorm: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/21300/tools-checker-03.tools.eqiad.wmflabs/" [puppet] - 10https://gerrit.wikimedia.org/r/577341 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [21:14:49] (03CR) 10Bstorm: [C: 04-1] "I think this one will work I28c12ab4cb283d5ed49f4a554f4b9d73dd152105" [puppet] - 10https://gerrit.wikimedia.org/r/573738 (owner: 10BryanDavis) [21:21:39] Krinkle: Still testing? [21:24:42] 10Operations, 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, and 6 others: Public EventGate instance and endpoint for analytics event intake: eventgate-analytics-external - https://phabricator.wikimedia.org/T233629 (10Nuria) 05Open→03Resolved [21:26:05] James_F: no [21:26:57] (03PS1) 10RhinosF1: Remove expired throttle config from throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 [21:28:17] (03PS1) 10Andrew Bogott: keystone: use python2 mod_wsgi for pike but python3 for queens [puppet] - 10https://gerrit.wikimedia.org/r/577347 (https://phabricator.wikimedia.org/T242766) [21:31:05] (03CR) 10Andrew Bogott: [C: 03+2] keystone: use python2 mod_wsgi for pike but python3 for queens [puppet] - 10https://gerrit.wikimedia.org/r/577347 (https://phabricator.wikimedia.org/T242766) (owner: 10Andrew Bogott) [21:31:56] (03PS2) 10RhinosF1: Remove expired throttle config from throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 [21:35:23] (03PS3) 10RhinosF1: Remove expired throttle config from throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 [21:36:58] (03PS4) 10RhinosF1: 
Remove expired throttle config from throttle.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 [21:37:29] (03CR) 10RhinosF1: [C: 04-1] "Do not merge until all included have expired." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 (owner: 10RhinosF1) [21:37:47] (03PS1) 10Jhedden: toolforge: increase elasticsearch timeout on haproxy [puppet] - 10https://gerrit.wikimedia.org/r/577352 (https://phabricator.wikimedia.org/T236606) [21:40:53] !log rzl@cumin1001 START - Cookbook sre.hosts.downtime [21:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:58] !log rzl@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [21:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:04] 10Operations, 10serviceops, 10Patch-For-Review: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 1:00:00 set by rzl@cumin1001 on 9 host(s) and their services with reason: new install ` mw[1405-1413].eqiad... [21:41:09] (03CR) 10Jhedden: [C: 03+2] toolforge: increase elasticsearch timeout on haproxy [puppet] - 10https://gerrit.wikimedia.org/r/577352 (https://phabricator.wikimedia.org/T236606) (owner: 10Jhedden) [21:41:11] (03CR) 10RLazarus: [C: 03+2] site: Assign appservers and API servers in eqiad row C. [puppet] - 10https://gerrit.wikimedia.org/r/576966 (https://phabricator.wikimedia.org/T241849) (owner: 10RLazarus) [21:41:25] (03PS5) 10RLazarus: site: Assign appservers and API servers in eqiad row C. [puppet] - 10https://gerrit.wikimedia.org/r/576966 (https://phabricator.wikimedia.org/T241849) [21:42:36] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: ELK7 shards failed errors when loading saved objects, e.g. 
"field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10herron) Thank you @gehel @EBernhardson @dcausse @colewhite for looking at this!... [21:44:01] Urbanecm: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1859265&oldid=1859244 - that look okay? [21:56:10] (03PS4) 10Jforrester: Stop loading 'wikipedia-english', 'wikipedia-e-acute', 'wikipedia-cyrillic', 'wikipedia-devanagari' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575365 [21:56:11] OK, I'm going to try this again. [21:56:20] (03CR) 10Jforrester: [C: 03+2] Stop loading 'wikipedia-english', 'wikipedia-e-acute', 'wikipedia-cyrillic', 'wikipedia-devanagari' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575365 (owner: 10Jforrester) [21:57:22] (03Merged) 10jenkins-bot: Stop loading 'wikipedia-english', 'wikipedia-e-acute', 'wikipedia-cyrillic', 'wikipedia-devanagari' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575365 (owner: 10Jforrester) [21:57:33] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) >>! In T241719#5943938, @ArielGlenn wrote: > Hey @Krenair, I don't have to have it right now, but I might need it again in the future. Basica... [22:00:15] (03CR) 10CDanis: [C: 03+1] "lgtm, thanks!" [homer/public] - 10https://gerrit.wikimedia.org/r/577316 (https://phabricator.wikimedia.org/T246618) (owner: 10Ayounsi) [22:01:32] !log jforrester@deploy1001 Synchronized multiversion/MWConfigCacheGenerator.php: Stop loading four old logo dblists (duration: 00m 59s) [22:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:01:49] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: ELK7 shards failed errors when loading saved objects, e.g. 
"field expansion matches too many fields, limit: 1024, got: 1726" - https://phabricator.wikimedia.org/T247014 (10colewhite) I have concerns about re-implementing the _all field given that it is... [22:02:13] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) [22:02:19] OK, with that done, let's give some wikis some custom logos finally. [22:02:55] (03PS4) 10Jforrester: Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556212 (owner: 10TechneSiyam) [22:03:05] (03PS5) 10Jforrester: Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556212 (owner: 10TechneSiyam) [22:03:15] (03CR) 10Jforrester: [C: 03+2] Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556212 (owner: 10TechneSiyam) [22:04:10] (03Merged) 10jenkins-bot: Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556212 (owner: 10TechneSiyam) [22:05:40] !log rzl@cumin1001 START - Cookbook sre.hosts.downtime [22:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:49] !log rzl@cumin1001 END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) [22:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:06:59] (03CR) 10Ottomata: "Alex, this is the first step eh? Maybe on Monday we can just add one k8s node in each DC as a backend here?" 
[puppet] - 10https://gerrit.wikimedia.org/r/566771 (https://phabricator.wikimedia.org/T238658) (owner: 10Alexandros Kosiaris) [22:07:33] (03PS3) 10Jforrester: [ukwikivoyage] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556214 (owner: 10TechneSiyam) [22:07:33] !log rzl@cumin1001 START - Cookbook sre.hosts.downtime [22:07:35] (03PS1) 10Jforrester: [bnwikibooks] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577355 [22:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:07:37] (03PS1) 10Jforrester: [bnwikisource] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577356 [22:09:07] !log jforrester@deploy1001 Synchronized static/images/project-logos/: Provide HD logos for bnwikibooks, bnwikisource, and ukwikivoyage (duration: 01m 00s) [22:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:09:23] (03CR) 10Jforrester: [C: 03+2] [bnwikibooks] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577355 (owner: 10Jforrester) [22:09:40] (03CR) 10Jforrester: [C: 03+2] [bnwikisource] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577356 (owner: 10Jforrester) [22:09:44] (03CR) 10Jforrester: [C: 03+2] [ukwikivoyage] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556214 (owner: 10TechneSiyam) [22:10:16] James_F: can you do me a favour and comment “recheck” on https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/577346/ to trigger jenkins [22:10:25] (03Merged) 10jenkins-bot: [bnwikibooks] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577355 (owner: 10Jforrester) [22:10:29] (03CR) 10Jforrester: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577346 (owner: 10RhinosF1) [22:10:32] Done. 
[22:10:41] !log rzl@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [22:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:47] 10Operations, 10serviceops: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 1:00:00 set by rzl@cumin1001 on 9 host(s) and their services with reason: new install ` mw[1405-1413].eqiad.wmnet ` [22:10:50] (03Merged) 10jenkins-bot: [bnwikisource] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577356 (owner: 10Jforrester) [22:10:53] (03Merged) 10jenkins-bot: [ukwikivoyage] Set HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556214 (owner: 10TechneSiyam) [22:10:58] Thanks James_F [22:12:03] (03PS4) 10Jforrester: Added betawikiversity and ukwikinews hd logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556863 (owner: 10TechneSiyam) [22:12:49] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use HD logos at bnwikibooks, bnwikisource, and ukwikivoyage (duration: 00m 59s) [22:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:26] (03CR) 10Jforrester: [C: 04-1] "Is the logo for betawikiversity intentionally different, or is it just drift?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/556863 (owner: 10TechneSiyam) [22:14:35] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 59s) [22:14:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:14:50] (03PS4) 10Jforrester: Update three logos with more detailed versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555620 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:14:55] (03PS1) 10Andrew Bogott: Openstack glance queens: override package-installed init script [puppet] - 10https://gerrit.wikimedia.org/r/577358 (https://phabricator.wikimedia.org/T242766) [22:15:04] (03CR) 10Jforrester: [C: 03+2] Update three logos with more detailed versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555620 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:15:28] (03PS3) 10Jforrester: [arwikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555629 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:15:33] (03PS2) 10Jforrester: [cawikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576453 (https://phabricator.wikimedia.org/T150618) [22:15:48] (03PS2) 10Jforrester: [plwikivoyage] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576454 (https://phabricator.wikimedia.org/T150618) [22:16:04] (03CR) 10jerkins-bot: [V: 04-1] Openstack glance queens: override package-installed init script [puppet] - 10https://gerrit.wikimedia.org/r/577358 (https://phabricator.wikimedia.org/T242766) (owner: 10Andrew Bogott) [22:16:13] (03Merged) 10jenkins-bot: Update three logos with more detailed versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555620 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:16:52] 10Operations, 10DNS, 10Technical blog, 10Traffic, 10cloud-services-team (Kanban): 
Setup DNS to direct techblog.wikimedia.org to new Wordpress VIP hosting - https://phabricator.wikimedia.org/T246507 (10bd808) Upstream instructions https://wpvip.com/documentation/vip-go/managing-domains-and-dns/#external-d... [22:17:06] (03CR) 10Jforrester: [C: 03+2] [arwikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555629 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:17:13] (03CR) 10Jforrester: [C: 03+2] [cawikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576453 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:17:24] (03CR) 10Jhedden: [C: 03+1] Openstack glance queens: override package-installed init script [puppet] - 10https://gerrit.wikimedia.org/r/577358 (https://phabricator.wikimedia.org/T242766) (owner: 10Andrew Bogott) [22:17:42] (03CR) 10Jforrester: [C: 03+2] [plwikivoyage] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576454 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:17:51] !log jforrester@deploy1001 Synchronized static/images/project-logos/: Provide HD logos for arwikibooks, cawikibooks, and plwikivoyage (duration: 01m 00s) [22:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:18:12] (03Merged) 10jenkins-bot: [arwikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/555629 (https://phabricator.wikimedia.org/T150618) (owner: 10Bjornskjald) [22:18:18] (03PS2) 10Andrew Bogott: Openstack glance queens: override package-installed init script [puppet] - 10https://gerrit.wikimedia.org/r/577358 (https://phabricator.wikimedia.org/T242766) [22:18:24] (03Merged) 10jenkins-bot: [cawikibooks] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576453 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:19:07] (03Merged) 10jenkins-bot: [plwikivoyage] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576454 
(https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:19:09] (03CR) 10Jforrester: "Added in 2ad7adcab8816a0c018a008cf1e1a8185cc7b3c4 but not used?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560386 (https://phabricator.wikimedia.org/T150618) (owner: 10Subscriptshoe9) [22:20:44] (03CR) 10Andrew Bogott: [C: 03+2] Openstack glance queens: override package-installed init script [puppet] - 10https://gerrit.wikimedia.org/r/577358 (https://phabricator.wikimedia.org/T242766) (owner: 10Andrew Bogott) [22:20:51] (03PS13) 10Jforrester: [cywikiquote] Add custom logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/560386 (https://phabricator.wikimedia.org/T150618) (owner: 10Subscriptshoe9) [22:21:36] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use HD logos at arwikibooks, cawikibooks, and plwikivoyage (duration: 00m 59s) [22:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:42] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 59s) [22:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:55] (03CR) 10BryanDavis: [C: 03+1] toolschecker: Update to monitor the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/577341 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [22:23:14] (03PS2) 10Jforrester: [fawikivoyage] Add custom logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576458 (https://phabricator.wikimedia.org/T150618) [22:23:20] (03CR) 10Jforrester: [C: 03+2] [fawikivoyage] Add custom logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576458 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:23:44] (03CR) 10Jforrester: [C: 04-1] "This logo is the same as the currently-configured (default) one. Let's just use that?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/560386 (https://phabricator.wikimedia.org/T150618) (owner: 10Subscriptshoe9) [22:23:52] (03CR) 10Jforrester: [C: 04-1] "This logo is the same as the currently-configured (default) one. Let's just use that?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576457 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:23:54] (03CR) 10Bstorm: [C: 03+2] toolschecker: Update to monitor the new etcd cluster [puppet] - 10https://gerrit.wikimedia.org/r/577341 (https://phabricator.wikimedia.org/T246689) (owner: 10Bstorm) [22:23:56] (03Abandoned) 10BryanDavis: toolschecker: Add tools-k8s-etcd-[456] to checks [puppet] - 10https://gerrit.wikimedia.org/r/573738 (owner: 10BryanDavis) [22:24:28] (03Merged) 10jenkins-bot: [fawikivoyage] Add custom logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576458 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:24:34] James_F: do you know if top6-wikipedia is needed? [22:25:07] looks like that might be delete that as well [22:25:23] Krinkle: I think it's used for reports. [22:25:50] The problem with dblists is that we can never know if they're part of some foreachwikiindblist job. [22:25:52] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [fawikivoyage] Add custom logos (duration: 00m 58s) [22:25:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:27:08] (03CR) 10Jforrester: [C: 04-1] [betawikiversity] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556864 (https://phabricator.wikimedia.org/T150618) (owner: 10TechneSiyam) [22:27:08] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 56s) [22:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:28:30] James_F: hm.. 
ok, but at least we can not load it on web requests [22:28:36] in fact, that could use another test [22:28:40] Yes. [22:30:59] (03PS5) 10Jforrester: [ukwikinews] Provide HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556863 (owner: 10TechneSiyam) [22:31:01] (03PS2) 10Jforrester: [ukwikinews] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576455 (https://phabricator.wikimedia.org/T150618) [22:31:03] (03PS4) 10Jforrester: [betawikiversity] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556864 (https://phabricator.wikimedia.org/T150618) (owner: 10TechneSiyam) [22:31:05] (03PS1) 10Jforrester: [betawikiversity] Provide HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577362 [22:31:30] (03PS1) 10CDanis: depool esams for cr2 router maintenance [dns] - 10https://gerrit.wikimedia.org/r/577363 (https://phabricator.wikimedia.org/T246338) [22:31:48] (03CR) 10CDanis: [C: 04-2] "doing this around/after 00:00 UTC" [dns] - 10https://gerrit.wikimedia.org/r/577363 (https://phabricator.wikimedia.org/T246338) (owner: 10CDanis) [22:31:50] (03CR) 10Jforrester: [C: 04-1] "This logo is wrong." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577362 (owner: 10Jforrester) [22:32:04] (03CR) 10Jforrester: [C: 03+2] "Split off the problematic logo." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/556863 (owner: 10TechneSiyam) [22:33:04] (03Merged) 10jenkins-bot: [ukwikinews] Provide HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/556863 (owner: 10TechneSiyam) [22:33:57] (03CR) 10Jforrester: [C: 03+2] [ukwikinews] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576455 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:34:40] !log jforrester@deploy1001 Synchronized static/images/project-logos/: [ukwikinews] Provide HD logos (duration: 00m 59s) [22:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:34:49] (03Merged) 10jenkins-bot: [ukwikinews] Add HD logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/576455 (https://phabricator.wikimedia.org/T150618) (owner: 10Jforrester) [22:36:36] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [ukwikinews] Add HD logos (duration: 00m 59s) [22:36:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:04] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s) [22:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:38:19] James_F: when convenient, can I have the prod conch for a couple of minutes to pool a handful of new appservers? won't take long [22:38:28] rlazarus: It's yours. [22:38:33] thanks! 
[22:39:08] jouncebot: now [22:39:08] No deployments scheduled for the next 1 hour(s) and 20 minute(s) [22:40:01] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [22:40:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:40:05] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [22:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:40:09] !log rzl@cumin1001 conftool action : set/weight=30; selector: name=mw14(0[5-9]|1[0-2]).eqiad.wmnet [22:40:09] 10Operations, 10serviceops: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10ops-monitoring-bot) Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install ` mw[1385-1392].eqiad.wmnet ` [22:40:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:40:16] (03PS1) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [22:40:26] !log rzl@cumin1001 conftool action : set/pooled=yes; selector: name=mw14(0[5-9]|1[0-2]).eqiad.wmnet [22:40:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:40:32] James_F: done, thanks! 
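Note on the conftool selector logged above: `name=mw14(0[5-9]|1[0-2]).eqiad.wmnet` reads as a regular expression. The sketch below is only an illustration in plain Python of which host names such a pattern would cover. It assumes the selector has to match the whole host name, and the host list is a hypothetical inventory built from the mw[1385-1413] range named in T241849; it does not call conftool itself.

```python
import re

# Selector value copied from the conftool log line above.
SELECTOR = r"mw14(0[5-9]|1[0-2]).eqiad.wmnet"

# Hypothetical inventory: the install range named in T241849.
hosts = [f"mw{n}.eqiad.wmnet" for n in range(1385, 1414)]

# Assumption: the selector must match the entire host name.
matched = [h for h in hosts if re.fullmatch(SELECTOR, h)]

for host in matched:
    print(host)
```

Under these assumptions the pattern covers mw1405 through mw1412 and leaves out mw1413. The dots in the pattern are unescaped, so they match any character, which is harmless for host names of this shape.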
[22:40:45] !log [cumin1001:~] $ sudo -i cumin -b 15 'mw13[85-92].eqiad.wmnet' 'sudo -u dzahn scap pull' [22:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:41:51] !log reimaging mw1413 (new appserver, not pooled) to test https://gerrit.wikimedia.org/r/c/576464 [22:41:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:41:58] !log dzahn@cumin1001 conftool action : set/weight=20; selector: name=mw138[5-9]eqiad.wmnet [22:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:09] (03PS2) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [22:42:33] (03PS7) 10Jforrester: Scap: update-interwiki-cache for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446507 (https://phabricator.wikimedia.org/T198844) (owner: 10Thcipriani) [22:42:38] !log dzahn@cumin1001 conftool action : set/weight=20; selector: name=mw139[0-2]eqiad.wmnet [22:42:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:44] (03CR) 10Jforrester: [C: 03+2] "It's not worse." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/446507 (https://phabricator.wikimedia.org/T198844) (owner: 10Thcipriani) [22:43:44] (03Merged) 10jenkins-bot: Scap: update-interwiki-cache for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446507 (https://phabricator.wikimedia.org/T198844) (owner: 10Thcipriani) [22:44:12] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw138[5-9]eqiad.wmnet [22:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:18] !log dzahn@cumin1001 conftool action : set/weight=20; selector: name=mw139[0-2].eqiad.wmnet [22:46:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:37] (03PS3) 10Jforrester: scap: Add Python 3 support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566411 (owner: 10Legoktm) [22:46:39] (03PS1) 10Krinkle: multiversion: Introduce MWMultiVersion::SUFFIXES constant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577366 [22:46:43] (03CR) 10Jforrester: [C: 03+2] scap: Add Python 3 support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566411 (owner: 10Legoktm) [22:46:45] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw139[0-2].eqiad.wmnet [22:46:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:00] (03PS3) 10Jforrester: scap: Clean up unused build configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566412 (owner: 10Legoktm) [22:47:06] (03CR) 10Jforrester: [C: 03+2] scap: Clean up unused build configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566412 (owner: 10Legoktm) [22:47:10] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw138[5-9].eqiad.wmnet [22:47:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:31] (03CR) 10jerkins-bot: [V: 04-1] multiversion: Introduce MWMultiVersion::SUFFIXES constant [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/577366 (owner: 10Krinkle) [22:47:42] (03Merged) 10jenkins-bot: scap: Add Python 3 support [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566411 (owner: 10Legoktm) [22:47:55] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw138[5-9].eqiad.wmnet [22:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:48:10] (03Merged) 10jenkins-bot: scap: Clean up unused build configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/566412 (owner: 10Legoktm) [22:50:19] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw139[0-2].eqiad.wmnet [22:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:50:52] !log added 8 new appservers to pool in eqiad [22:51:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:51:09] (03PS3) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [22:52:22] (03Abandoned) 10Jforrester: Added betawikiversity,hiwikibooks,ukwikinews hd logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558103 (owner: 10TechneSiyam) [22:52:27] (03Abandoned) 10Jforrester: Modified IS.php with betawikiveristy,ukwikines,hiwikibooks under wghdlogos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/558106 (owner: 10TechneSiyam) [22:52:49] (03Abandoned) 10Jforrester: Added bnwikibooks,bnwikisource,ukwikivoyage under wiki hd logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/557053 (owner: 10TechneSiyam) [22:53:43] (03CR) 10jerkins-bot: [V: 04-1] Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 (owner: 10Andrew Bogott) [22:55:41] 10Operations, 10serviceops: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10Dzahn) ` {"mw1385.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, 
"tags": "dc=eqiad,cluster=appserver,service=nginx"} {"mw1385.eqiad.wmnet": {"weight": 30, "pooled": "yes... [22:55:44] (03PS2) 10Krinkle: multiversion: Introduce MWMultiVersion::SUFFIXES constant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577366 [22:55:56] (03PS4) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [22:56:28] (03PS1) 10BryanDavis: techblog.wikimedia.org: Reduce TTL [dns] - 10https://gerrit.wikimedia.org/r/577370 (https://phabricator.wikimedia.org/T246507) [22:56:30] (03PS1) 10BryanDavis: techblog.wikimedia.org: Point at upstream service provider [dns] - 10https://gerrit.wikimedia.org/r/577371 (https://phabricator.wikimedia.org/T246507) [22:56:51] (03CR) 10BryanDavis: [C: 04-1] techblog.wikimedia.org: Point at upstream service provider [dns] - 10https://gerrit.wikimedia.org/r/577371 (https://phabricator.wikimedia.org/T246507) (owner: 10BryanDavis) [22:57:37] (03CR) 10Jforrester: [C: 03+1] multiversion: Introduce MWMultiVersion::SUFFIXES constant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577366 (owner: 10Krinkle) [22:57:58] (03PS5) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [22:58:12] (03PS2) 10BryanDavis: techblog.wikimedia.org: Point at upstream service provider [dns] - 10https://gerrit.wikimedia.org/r/577371 (https://phabricator.wikimedia.org/T246507) [22:58:15] 10Operations, 10serviceops: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10Dzahn) [22:58:16] Krinkle: So… deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/443866 feels like something big could go bang, but "should" be a no-op. Thoughts? [22:59:24] James_F: eh yeah, prefer not. [22:59:29] what do we want to solve? [22:59:47] Krinkle: Anything exceptionally old lying around in cache. 
[22:59:52] we don't rely on cache epoch for expiration or ttl afaik. [23:00:03] Don't we use it for parser cache? [23:00:10] not anymore anyway. would be good to inventorise where it is used. [23:00:15] Yeah. [23:00:19] 10Operations, 10serviceops: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (10Dzahn) 05Open→03Resolved The one exception (1413 at the bottom) is currently being used for a test. Everything else is pooled with weight 30 and alternating between... [23:00:25] James_F: right, it's a way to override that, but parser cache's actual ttl is far shorter than that. it's 30 days. [23:00:36] so unless set to < 30d it's a no-op for parser cache [23:00:42] not worries about PC [23:00:45] worried* [23:00:48] (03PS6) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [23:01:14] !log rzl@cumin1001 START - Cookbook sre.hosts.downtime [23:01:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:01:19] until recently, RL used it as salt for all its cache versions and stuff and used it by string identity which meant even a second different or backwards would invalidate all caches [23:01:22] (03PS7) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [23:01:22] Aha. FlaggedRevs uses it directly. Of course it does. [23:01:24] * James_F sighs. [23:01:25] that's fixed as of last year [23:01:29] (RL) [23:01:30] https://gerrit.wikimedia.org/g/mediawiki/extensions/FlaggedRevs/+/7709d5de005fa822d1e178af4e5eebf6b706724d/backend/FRParserCacheStable.php [23:01:53] I'd actually prefer it be deprecated as it's too powerful and vague at the same time [23:02:03] * James_F nods. 
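Note on the parser cache point above: with a 30 day TTL, an epoch only has an effect if it is more recent than the oldest entry the TTL still allows. The sketch below illustrates that reasoning in plain Python. It is not MediaWiki code; `is_usable`, the sample dates, and the two epoch values are hypothetical, and only the 30 day figure comes from the discussion.

```python
from datetime import datetime, timedelta, timezone

# 30 days is the parser cache TTL quoted in the discussion above.
PARSER_CACHE_TTL = timedelta(days=30)


def is_usable(stored_at, now, epoch, ttl=PARSER_CACHE_TTL):
    """Hypothetical check: an entry is usable only if it is still
    within the TTL and was stored at or after the cache epoch."""
    return now - stored_at <= ttl and stored_at >= epoch


# Illustrative dates only.
now = datetime(2020, 3, 5, tzinfo=timezone.utc)
entry = now - timedelta(days=20)  # a 20 day old cache entry

old_epoch = datetime(2018, 1, 1, tzinfo=timezone.utc)  # older than the TTL window
recent_epoch = now - timedelta(days=7)                 # inside the TTL window

print(is_usable(entry, now, old_epoch))     # True: the epoch changes nothing
print(is_usable(entry, now, recent_epoch))  # False: the epoch invalidates the entry
```

In this model, any epoch older than `now - ttl` is a no-op, because every entry it would reject has already expired through the TTL.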
[23:02:05] Krinkle: https://codesearch.wmflabs.org/deployed/?q=(wg)%3F(Cache|Thumbnail)Epoch [23:02:41] (03CR) 10BryanDavis: [C: 04-1] "This needs to wait until we are ready to make the upstream blog "live"." [dns] - 10https://gerrit.wikimedia.org/r/577371 (https://phabricator.wikimedia.org/T246507) (owner: 10BryanDavis) [23:02:47] (03CR) 10Andrew Bogott: "pcc run: https://puppet-compiler.wmflabs.org/compiler1003/21307/" [puppet] - 10https://gerrit.wikimedia.org/r/577365 (owner: 10Andrew Bogott) [23:03:45] !log rzl@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [23:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:03:49] James_F: looks like FileCache (third-party on disk Varnish-light basically) usees it as its only ttl. [23:03:58] That could probably do with a dedicated config var as minimum change [23:04:14] and also preferably something dynamic like a normal ttl instead of fixed date [23:04:18] interesting how that works [23:04:32] wgThumbnailEpoch is interesting. I wonder if that still does anything for WMF [23:04:58] Probably. We've not changed much about media handling. [23:05:05] well, there's thumbor [23:05:15] mediawiki no longer gets called for most thumbnail requests [23:05:19] Front-end, I mean. [23:05:33] straight from cdn to thumbor and then back via swift. mw doesn't get to decide anything there [23:05:35] but... [23:05:38] there's thumb.php [23:05:41] and api rotate [23:05:57] Yeah. [23:06:00] which do provide a rare entry point for calling thumbor proxied from inside MW php [23:06:15] might be that this does something in that case, and when we change it we'll probably find out some bot is using that a lot [23:06:47] That'd be un-fun. [23:06:56] so.. yeah, without a specific bug or evidence of something old being stored I'd lean towards not touching .. 
[23:07:18] (03CR) 10Jhedden: Neutron: override the default neutron init script from Queens (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/577365 (owner: 10Andrew Bogott) [23:07:28] the number of search results there is lower than I feared [23:07:40] could be doable to fix that all up [23:07:41] (03CR) 10Jforrester: [C: 04-1] " so.. yeah, without a specific bug or evidence of something old being stored I'd lean towards not touching .." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/443866 (owner: 10Reedy) [23:07:52] sorry R.eedy [23:08:17] I never broke anything bumping it before ;p [23:08:38] I doubt bumping it to 2018 would make any difference at this point either [23:09:46] Krinkle: Filed T247040 for dumping it. [23:09:47] T247040: Consider replacing wgCacheEpoch and wgThumbnailEpoch with more specific flags - https://phabricator.wikimedia.org/T247040 [23:10:02] Reedy: How about you merge it whilst Krinkle and I aren't looking, then? ;-) [23:10:11] lols [23:10:21] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) [23:10:35] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) [23:11:18] 10Operations, 10DNS, 10Technical blog, 10Traffic, and 2 others: Setup DNS to direct techblog.wikimedia.org to new Wordpress VIP hosting - https://phabricator.wikimedia.org/T246507 (10bd808) [23:14:28] 10Operations, 10cloud-services-team (Kanban): Migrate remaining self-hosted puppet masters to Puppet 5 / facter 3 - https://phabricator.wikimedia.org/T241719 (10Krenair) [23:16:20] (03PS1) 10BryanDavis: redirects: Remove redirect handling for techblog.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/577373 (https://phabricator.wikimedia.org/T246507) [23:16:43] (03PS8) 10Andrew Bogott: Neutron: override the default neutron 
init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [23:17:18] (03PS1) 10Krinkle: tests: Remove "wiki-suffix disambiguation" dblist structure test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577374 [23:17:20] (03PS1) 10Krinkle: tests: Move MWWikiversionsTest out of dblistTest.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577375 [23:18:31] 10Operations, 10DNS, 10Technical blog, 10Traffic, and 2 others: Setup DNS to direct techblog.wikimedia.org to new Wordpress VIP hosting - https://phabricator.wikimedia.org/T246507 (10bd808) [23:19:17] (03CR) 10jerkins-bot: [V: 04-1] redirects: Remove redirect handling for techblog.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/577373 (https://phabricator.wikimedia.org/T246507) (owner: 10BryanDavis) [23:19:39] (03CR) 10jerkins-bot: [V: 04-1] Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 (owner: 10Andrew Bogott) [23:19:57] (03PS9) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [23:21:19] (03PS3) 10Krinkle: multiversion: Introduce MWMultiVersion::SUFFIXES constant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577366 [23:21:22] (03PS1) 10Krinkle: wmf-config: Document wgConf.php load order [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577376 [23:21:23] (03PS2) 10Krinkle: tests: Remove "wiki-suffix disambiguation" dblist structure test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577374 [23:21:28] (03PS2) 10Krinkle: tests: Move MWWikiversionsTest out of dblistTest.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/577375 [23:21:46] (03PS10) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 10https://gerrit.wikimedia.org/r/577365 [23:23:02] (03PS11) 10Andrew Bogott: Neutron: override the default neutron init script from Queens [puppet] - 
https://gerrit.wikimedia.org/r/577365 [23:24:29] (CR) Andrew Bogott: "latest pcc: https://puppet-compiler.wmflabs.org/compiler1002/21311/" [puppet] - https://gerrit.wikimedia.org/r/577365 (owner: Andrew Bogott) [23:26:45] !log mw1413 test-reimage completed successfully, pooling [23:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:51] !log rzl@cumin1001 conftool action : set/weight=30; selector: name=mw1413.eqiad.wmnet [23:27:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:07] !log rzl@cumin1001 conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet [23:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:29] (CR) Jhedden: [C: +1] Neutron: override the default neutron init script from Queens [puppet] - https://gerrit.wikimedia.org/r/577365 (owner: Andrew Bogott) [23:41:32] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [23:41:33] Operations, Cloud-Services: During labservices1001 failover fqdn changed from foo.project.eqiad.wmflabs to foo.eqiad.wmflabs - https://phabricator.wikimedia.org/T163823 (Krenair) is it still worth having this task open? [23:41:38] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [23:50:45] Operations: Migrate Cumin hosts to Buster - https://phabricator.wikimedia.org/T245114 (Krenair) Noticed in T236576 that cumin is not packaged for buster. Is that going to change soon or should I make a new stretch instance?
[23:55:13] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet [23:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:55:50] !log pooled mw2290 - noticed it was the only API appserver in codfw not pooled but did not see why, fine in Icinga and no open tickets/SAL [23:55:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:57:04] Operations, Performance-Team, serviceops, Performance-Team-publish: (Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet - https://phabricator.wikimedia.org/T241849 (Krinkle) I've added a Grafana annotation for this event, and looks like we've got some nice perf wins here across the...