[00:00:04] twentyafterfour: That opportune time is upon us again. Time for a Phabricator update deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T0000).
[00:02:39] (PS1) Wolfgang Kandek: Documentation for locust setup on AWS. Locust is a stress testing program. [software/locust] - https://gerrit.wikimedia.org/r/597655
[00:09:29] (CR) Thcipriani: [C: +1] "Like the idea!" (6 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264) (owner: Jeena Huneidi)
[00:16:35] Operations, Analytics, Analytics-Cluster, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (diego)
[00:32:56] (CR) Dave Pifke: [C: +1] "Dunno if it matters the sequencing between this patch and the companion one needed to set up ArcLamp to receive the new event channel. If" (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/597654 (https://phabricator.wikimedia.org/T253160) (owner: Ori.livneh)
[00:41:23] (CR) Krinkle: wall-clock excimer profiling for production (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/597654 (https://phabricator.wikimedia.org/T253160) (owner: Ori.livneh)
[00:54:13] (PS6) Krinkle: Set "coalesceKeys" to "non-global" for testwiki and mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/575098 (https://phabricator.wikimedia.org/T252564) (owner: Aaron Schulz)
[00:54:17] (CR) Krinkle: [C: +2] Set "coalesceKeys" to "non-global" for testwiki and mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/575098 (https://phabricator.wikimedia.org/T252564) (owner: Aaron Schulz)
[00:54:57] (Merged) jenkins-bot: Set "coalesceKeys" to "non-global" for testwiki and mediawikiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/575098 (https://phabricator.wikimedia.org/T252564) (owner: Aaron Schulz)
[00:56:44] * Krinkle staging on mwdebug1002
[01:03:53] !log krinkle@deploy1001 Synchronized wmf-config/mc.php: Ic9efa98312b (duration: 01m 08s)
[01:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:04:17] (PS1) Krinkle: mc-labs: Remove unused wan/purge config from Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/597662
[01:13:01] (PS2) Jeena Huneidi: [WIP] Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264)
[01:15:17] (PS3) Jeena Huneidi: [WIP] Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264)
[01:17:35] (PS4) Jeena Huneidi: [WIP] Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264)
[01:19:04] (PS5) Jeena Huneidi: [WIP] Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264)
[01:19:26] Operations, Core Platform Team, MediaWiki-General, serviceops, and 2 others: Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (tstarling) > * SwiftFileBackend has a concerning mix of high timeouts, high traffic and an inability t...
[01:25:18] PROBLEM - Check systemd state on ms-be2026 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:25:20] Operations, Thumbor, serviceops, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (AntiCompositeNumber)
[01:25:31] Blocked-on-Operations, Operations, Commons, Performance-Team, and 3 others: Convert eqiad imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842 (AntiCompositeNumber)
[01:27:20] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2026 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[01:29:00] Operations, DBA, Patch-For-Review, Performance-Team (Radar), and 2 others: [RFC] improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Krinkle)
[01:31:40] Operations, DBA, Patch-For-Review, Performance-Team (Radar), and 2 others: Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Krinkle)
[01:45:44] Operations, LDAP-Access-Requests, SRE-Access-Requests: Add LMata to wmf ldap group - https://phabricator.wikimedia.org/T253277 (lmata)
[01:46:09] Operations, LDAP-Access-Requests, SRE-Access-Requests: Add LMata to wmf ldap group - https://phabricator.wikimedia.org/T253277 (lmata)
[01:51:16] RECOVERY - Check systemd state on ms-be2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:58:10] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2026 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[02:09:55] Operations, Analytics, Analytics-Cluster, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (Reedy)
[02:09:57] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Reedy)
[02:10:04] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Reedy)
[02:10:06] Operations, Analytics, Analytics-Cluster, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (Reedy)
[02:10:32] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Reedy)
[02:10:35] Operations, Jupyter-Hub, LDAP-Access-Requests: Give access to the JupyterHub (SWAP) notebooks to (@Rvvalentim) - https://phabricator.wikimedia.org/T253155 (Reedy)
[02:10:40] Operations, SRE-Access-Requests: Give access to the Analytics Cluster to Research Inter (Rodolfo) - https://phabricator.wikimedia.org/T252476 (Reedy)
[02:10:42] Operations, Jupyter-Hub, LDAP-Access-Requests: Give access to the JupyterHub (SWAP) notebooks to (@Rvvalentim) - https://phabricator.wikimedia.org/T253155 (Reedy)
[03:43:33] (CR) Ammarpad: [C: -1] "This is a duplicate of Ia31af6e" [mediawiki-config] - https://gerrit.wikimedia.org/r/597647 (https://phabricator.wikimedia.org/T253268) (owner: BrandonXLF)
[03:56:52] (CR) BrandonXLF: [C: +1] Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[03:58:00] (Abandoned) BrandonXLF: Drop enwiki mobile main page special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/597647 (https://phabricator.wikimedia.org/T253268) (owner: BrandonXLF)
[04:02:07] Operations, Commons, SRE-swift-storage, Thumbor, Wikimedia-SVG-rendering: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141 (AntiCompositeNumber) In my opinion, these fonts are non-free and shouldn't be installed on Wikimedia servers. The non...
[04:04:15] (CR) Andrew Bogott: [C: +2] cloudnet1003 and cloudnet1004: next install with Buster [puppet] - https://gerrit.wikimedia.org/r/597560 (https://phabricator.wikimedia.org/T253124) (owner: Andrew Bogott)
[04:32:20] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:32:50] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 62, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:43:51] Operations, ops-eqiad, DBA, DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui) a:Jclark-ctr→jcrespo Per my IRC chat with John, assigning this back to Jaime as the on-site part is done Than you John!
[05:00:04] marostegui: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) s1 primary database master restart deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T0500).
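The ms-be2026 alerts above come from check_systemd_state, which reports CRITICAL while systemd's overall state is "degraded" and OK once it is back to "running". The plugin's source is not part of this log. The following is only a minimal Python sketch, assuming the check maps the output of `systemctl is-system-running` onto the standard Nagios/Icinga exit codes. The state table and the `classify`/`current_state` helpers are hypothetical. Only the "running" and "degraded" messages are copied from the alerts.

```python
import subprocess

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical mapping of `systemctl is-system-running` states to check results.
# The "running" and "degraded" texts match the alerts in the log; the rest are
# illustrative assumptions.
STATES = {
    "running": (OK, "The system is fully operational"),
    "degraded": (CRITICAL, "The system is operational but one or more units failed."),
    "initializing": (WARNING, "The system is still booting"),
    "starting": (WARNING, "The system is still booting"),
    "maintenance": (CRITICAL, "The system is in rescue or emergency mode"),
    "stopping": (CRITICAL, "The system is shutting down"),
}


def classify(state):
    """Return (exit_code, status_line) for a systemd system state string."""
    code, text = STATES.get(state, (UNKNOWN, "Unrecognized system state"))
    return code, f"{LABELS[code]} - {state}: {text}"


def current_state():
    """Ask systemd for the overall system state, e.g. "running" or "degraded"."""
    try:
        # is-system-running exits non-zero for every state except "running",
        # so the exit status is ignored and only stdout is inspected.
        result = subprocess.run(
            ["systemctl", "is-system-running"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this host
    return result.stdout.strip() or "unknown"
```

A plugin wrapper would print the line from `classify(current_state())` and exit with its code. Running `systemctl reset-failed`, as done on deneb later in the log, clears the failed units so the state returns to "running".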
[05:00:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set enwiki as read-only for maintenance T251982', diff saved to https://phabricator.wikimedia.org/P11257 and previous config saved to /var/cache/conftool/dbconfig/20200521-050029-marostegui.json
[05:00:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:35] T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May - https://phabricator.wikimedia.org/T251982
[05:00:56] restarting
[05:03:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Set enwiki as read-only=off after maintenance T251982', diff saved to https://phabricator.wikimedia.org/P11258 and previous config saved to /var/cache/conftool/dbconfig/20200521-050328-marostegui.json
[05:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:47] I can edit fine
[05:03:50] same
[05:03:56] oh hi
[05:04:10] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:06:39] Operations, Analytics, Analytics-Kanban, LDAP-Access-Requests: LDAP access to the wmf group for Segun Oworu (superset, turnilo, hue) - https://phabricator.wikimedia.org/T252703 (Nuria) I think all access needed is granted as @soworu is able to access turnilo and superset. I have to say that this...
[05:07:02] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[05:07:56] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:08:31] (not time sensitive) please see https://phabricator.wikimedia.org/T251985#6154444 for an issue with the centralnotice for the readonly window
[05:08:49] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[05:09:05] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui) Open→Resolved This is all done!
[05:09:09] Operations, Puppet, Patch-For-Review, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (Marostegui)
[05:09:11] Operations, Puppet, Patch-For-Review, User-jbond: Extend Puppet CA Expiry date - https://phabricator.wikimedia.org/T236277 (Marostegui)
[05:09:18] DannyS712: I have commented there
[05:09:39] Sorry, I should have been clearer - was the centralnotice meant to only be enabled for enwiki?
[05:10:20] DannyS712: this maintenance only affected enwiki, yes
[05:11:19] So the centralnotice was shown to everyone seeing pages in english on wikipedias and it should have been for everyone seeing pages on the english wikipedia
[05:13:18] DannyS712: Let's follow up on the task, I guess that's a mistake, but I don't really know how that system works, so probably T_rizek is the one that can help you out there
[05:13:36] okay
[05:21:33] (PS1) Marostegui: mariadb: Add db1143 to s4 [puppet] - https://gerrit.wikimedia.org/r/597685 (https://phabricator.wikimedia.org/T252512)
[05:30:50] Operations, Thumbor, serviceops, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (AntiCompositeNumber)
[05:42:32] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool es1019 with low weight', diff saved to https://phabricator.wikimedia.org/P11259 and previous config saved to /var/cache/conftool/dbconfig/20200521-054231-jynus.json
[05:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:10] Operations, Analytics, Analytics-Cluster, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (elukey) Open→Resolved a:elukey @diego any user in analytics-privatedata-users can have access to the GPUs by default (since a few months...
[05:48:39] Operations, Traffic, conftool, discovery-system, services-tooling: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (Joe) I think I will try to implement the following RBAC schema: - | user | RO | RW | from | | - | * | - | * | | root | * | * | cumin | | confto...
[05:49:28] Operations, Analytics, Analytics-Cluster, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (elukey) Also another detail: see the versions of tensorflow supported - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU#Use_ten...
[06:04:35] !log pool cp5012 - T251219
[06:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:39] T251219: cp5012 memory errors - https://phabricator.wikimedia.org/T251219
[06:05:54] Operations, ops-eqsin, Traffic: cp5012 memory errors - https://phabricator.wikimedia.org/T251219 (Vgutierrez) Open→Stalled @robh done. let's see how it goes, thanks!
[06:28:24] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool es1019 with 50% weight', diff saved to https://phabricator.wikimedia.org/P11260 and previous config saved to /var/cache/conftool/dbconfig/20200521-062823-jynus.json
[06:28:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:16] Operations, Discovery, Traffic, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Ladsgroup) >>! In T243701#6152282, @Gehel wrote: > I'm wondering if exposing both MySQL lag and WDQS la...
[06:58:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1091 - T252512', diff saved to https://phabricator.wikimedia.org/P11261 and previous config saved to /var/cache/conftool/dbconfig/20200521-065858-marostegui.json
[06:59:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:03] T252512: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512
[07:00:33] (CR) Marostegui: [C: +2] mariadb: Add db1143 to s4 [puppet] - https://gerrit.wikimedia.org/r/597685 (https://phabricator.wikimedia.org/T252512) (owner: Marostegui)
[07:03:36] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool es1019 fully', diff saved to https://phabricator.wikimedia.org/P11263 and previous config saved to /var/cache/conftool/dbconfig/20200521-070335-jynus.json
[07:03:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:54] Operations, DC-Ops, SRE-Access-Requests: access request on cumin[1-2]001 for John Clark - https://phabricator.wikimedia.org/T249916 (Dzahn) Hi @Jclark-ctr, from the log files on bast1002 and cumin1001 I can see there are 2 different keys involved. The first one is the one in /Users/jclark/.ssh/id_r...
[07:11:48] (PS1) Elukey: role::druid::analytics::worker: add autoconfig for historical [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771)
[07:12:39] (CR) Dzahn: [C: +2] wikidough: add role and profile (first commit) [puppet] - https://gerrit.wikimedia.org/r/597580 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[07:12:47] (PS2) Dzahn: wikidough: add role and profile (first commit) [puppet] - https://gerrit.wikimedia.org/r/597580 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[07:19:38] (CR) Dzahn: [C: +2] contint: Drop the slave_scripts manifest, no longer used [puppet] - https://gerrit.wikimedia.org/r/597644 (https://phabricator.wikimedia.org/T252955) (owner: Jforrester)
[07:20:10] (CR) Dzahn: [C: +2] contint: Fix comment claiming the file is remote [puppet] - https://gerrit.wikimedia.org/r/597643 (owner: Jforrester)
[07:22:17] * marostegui !log Purge events from tendril.global_status_log older than 24h - T252331
[07:22:27] !log Purge events from tendril.global_status_log older than 24h - T252331
[07:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:22:31] T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging - https://phabricator.wikimedia.org/T252331
[07:23:20] (PS1) Ayounsi: Depool ulsfo [dns] - https://gerrit.wikimedia.org/r/597738 (https://phabricator.wikimedia.org/T253196)
[07:24:13] (PS1) Dzahn: Revert "contint: Drop the slave_scripts manifest, no longer used" [puppet] - https://gerrit.wikimedia.org/r/597739
[07:25:16] (CR) jerkins-bot: [V: -1] Revert "contint: Drop the slave_scripts manifest, no longer used" [puppet] - https://gerrit.wikimedia.org/r/597739 (owner: Dzahn)
[07:25:32] (PS2) Elukey: role::druid::analytics::worker: add autoconfig for historical [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771)
[07:26:26] (CR) Dzahn: "without revert: puppet broken due to dependency cycle" [puppet] - https://gerrit.wikimedia.org/r/597739 (owner: Dzahn)
[07:26:45] (CR) Dzahn: "profile 'profile::ci::slave' includes non-profile class contint::slave_scripts" [puppet] - https://gerrit.wikimedia.org/r/597739 (owner: Dzahn)
[07:27:31] (PS2) Dzahn: Revert "contint: Drop the slave_scripts manifest, no longer used" [puppet] - https://gerrit.wikimedia.org/r/597739 (https://phabricator.wikimedia.org/T252955)
[07:27:59] (CR) Vgutierrez: [C: +1] Depool ulsfo [dns] - https://gerrit.wikimedia.org/r/597738 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[07:28:24] (CR) Ayounsi: [C: +2] Depool ulsfo [dns] - https://gerrit.wikimedia.org/r/597738 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[07:28:34] (CR) jerkins-bot: [V: -1] Revert "contint: Drop the slave_scripts manifest, no longer used" [puppet] - https://gerrit.wikimedia.org/r/597739 (https://phabricator.wikimedia.org/T252955) (owner: Dzahn)
[07:29:12] !log depool ulsfo - T253196
[07:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:16] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[07:31:34] (CR) Ayounsi: [C: +1] "LGTM! Feedback on the task." [puppet] - https://gerrit.wikimedia.org/r/597606 (https://phabricator.wikimedia.org/T249454) (owner: CDanis)
[07:32:19] Operations, Traffic, netops, Patch-For-Review: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[07:37:43] (PS3) Elukey: role::druid::analytics::worker: add autoconfig for historical [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771)
[07:44:46] (PS4) Elukey: role::druid::analytics::worker: add autoconfig for historical [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771)
[07:47:54] (PS1) Dzahn: ci::slave: add /srv/deployment dir to fix dependency problem [puppet] - https://gerrit.wikimedia.org/r/597741 (https://phabricator.wikimedia.org/T252955)
[07:48:07] (CR) Dzahn: "this broke puppet on contint* with "Could not find resource 'File[/srv/deployment]' in parameter 'require'"" [puppet] - https://gerrit.wikimedia.org/r/597644 (https://phabricator.wikimedia.org/T252955) (owner: Jforrester)
[07:53:06] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/22663/" [puppet] - https://gerrit.wikimedia.org/r/597741 (https://phabricator.wikimedia.org/T252955) (owner: Dzahn)
[07:54:41] (Abandoned) Dzahn: Revert "contint: Drop the slave_scripts manifest, no longer used" [puppet] - https://gerrit.wikimedia.org/r/597739 (https://phabricator.wikimedia.org/T252955) (owner: Dzahn)
[07:55:00] (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/597644 (https://phabricator.wikimedia.org/T252955) (owner: Jforrester)
[07:57:29] (CR) Giuseppe Lavagetto: [C: +2] jobrunner: add code to switch to envoy, switch codfw [puppet] - https://gerrit.wikimedia.org/r/597513 (https://phabricator.wikimedia.org/T247389) (owner: Giuseppe Lavagetto)
[08:00:36] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1003/22665/" [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[08:01:17] (CR) Elukey: [C: +2] role::druid::analytics::worker: add autoconfig for historical [puppet] - https://gerrit.wikimedia.org/r/597737 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[08:02:00] Operations, SRE-Access-Requests: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (Dzahn) a:Dzahn→MBinder_WMF Thanks for taking care of this @RLazarus I can confirm Max's user exists on the Phabri...
[08:03:46] !log Shrink ulsfo's 198.35.26.0/23 to 198.35.26.0/24 - T253196
[08:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:51] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:04:37] Operations, Security-Team, serviceops, vm-requests, PM: Eqiad: 1VM request for Peek (PM service in use by Security Team) - https://phabricator.wikimedia.org/T252210 (Dzahn) Open→Stalled a:Dzahn→None Giving it back to the pool and setting to stalled because of ongoing discussio...
[08:04:37] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:05:13] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:08:07] (CR) QEDK: [C: +1] "Scheduling for SWAT." [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[08:08:40] XioNoX: see? you work in ulsfo and all of a sudden the Telia link gets up in eqiad
[08:08:43] :D
[08:09:04] butterfly effect
[08:09:15] (PS1) Ayounsi: Revert "Depool ulsfo" [dns] - https://gerrit.wikimedia.org/r/597742
[08:10:18] (CR) Ayounsi: [C: +2] Revert "Depool ulsfo" [dns] - https://gerrit.wikimedia.org/r/597742 (owner: Ayounsi)
[08:10:43] !log repool ulsfo - T253196
[08:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:47] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:11:24] (PS1) Dzahn: admin: add Leonardo Mata to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/597743 (https://phabricator.wikimedia.org/T253277)
[08:12:13] are swat deploys being done today?
[08:12:23] Operations, Traffic, netops, Patch-For-Review: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[08:13:37] !log Delete ROA for 198.35.26.0/23 - T253196
[08:13:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:10] !log Delete ARIN route object for 198.35.26.0/23 - T253196
[08:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:13] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:23:22] (CR) Ayounsi: [C: +2] ulsfo: shrink 198.35.26.0/23 to 198.35.26.0/24 [homer/public] - https://gerrit.wikimedia.org/r/597486 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:23:40] (Merged) jenkins-bot: ulsfo: shrink 198.35.26.0/23 to 198.35.26.0/24 [homer/public] - https://gerrit.wikimedia.org/r/597486 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:27:02] !log Advertise Anycast 198.35.27.0/24 from ulsfo - T253196
[08:27:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:06] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:30:51] Operations, Traffic: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (fgiunchedi)
[08:34:42] !log Advertise Anycast 198.35.27.0/24 from dfw - T253196
[08:34:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:45] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:42:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1143 to the list of s4 hosts, depooled - T252512', diff saved to https://phabricator.wikimedia.org/P11264 and previous config saved to /var/cache/conftool/dbconfig/20200521-084226-marostegui.json
[08:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:30] T252512: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512
[08:43:11] Operations, Traffic, netops, Patch-For-Review: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[08:43:58] (CR) Ayounsi: [C: +2] Add 198.35.27.27 to border-in4 term authdns [homer/public] - https://gerrit.wikimedia.org/r/597598 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:44:17] (Merged) jenkins-bot: Add 198.35.27.27 to border-in4 term authdns [homer/public] - https://gerrit.wikimedia.org/r/597598 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:45:08] (CR) Ayounsi: [C: +2] Advertise anycast 198.35.27.0/24 to the world [homer/public] - https://gerrit.wikimedia.org/r/597603 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:45:27] (Merged) jenkins-bot: Advertise anycast 198.35.27.0/24 to the world [homer/public] - https://gerrit.wikimedia.org/r/597603 (https://phabricator.wikimedia.org/T253196) (owner: Ayounsi)
[08:47:30] !log Advertise Anycast 198.35.27.0/24 from eqiad/eqord - T253196
[08:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:34] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:49:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1143 with minimal weight for the first time T252512', diff saved to https://phabricator.wikimedia.org/P11265 and previous config saved to /var/cache/conftool/dbconfig/20200521-084933-marostegui.json
[08:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:38] T252512: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512
[08:50:59] Operations, Traffic: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (Dzahn) I think option 3) is the easiest of all, has no risk to break existing checks and we already have many different check_commands using check_http in different ways so another one should not hurt us. In `mo...
[08:51:10] (PS1) Marostegui: db1143: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597746 (https://phabricator.wikimedia.org/T252512)
[08:51:58] (CR) Marostegui: [C: +2] db1143: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/597746 (https://phabricator.wikimedia.org/T252512) (owner: Marostegui)
[08:52:21] !log Advertise Anycast 198.35.27.0/24 from eqsin - T253196
[08:52:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:22] !log Advertise Anycast 198.35.27.0/24 from esams - T253196
[08:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:25] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[08:56:57] (CR) Filippo Giunchedi: "LGTM, see nit on commit message" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/597743 (https://phabricator.wikimedia.org/T253277) (owner: Dzahn)
[08:57:02] (CR) Filippo Giunchedi: [C: +1] admin: add Leonardo Mata to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/597743 (https://phabricator.wikimedia.org/T253277) (owner: Dzahn)
[08:59:57] (CR) Dzahn: [C: +2] admin: add Leonardo Mata to ldap_only_admins (wmf) [puppet] - https://gerrit.wikimedia.org/r/597743 (https://phabricator.wikimedia.org/T253277) (owner: Dzahn)
[09:01:03] !log LDAP - added lmata to wmf group (T253277)
[09:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:07] T253277: Add LMata to wmf ldap group - https://phabricator.wikimedia.org/T253277
[09:01:12] (PS1) Elukey: turnilo: move broker config to an-druid1001 [puppet] - https://gerrit.wikimedia.org/r/597747 (https://phabricator.wikimedia.org/T252771)
[09:02:51] Operations, LDAP-Access-Requests, SRE-Access-Requests, Patch-For-Review: Add LMata to wmf ldap group - https://phabricator.wikimedia.org/T253277 (Dzahn) Open→Resolved a:Dzahn @lmata Welcome! This is done. You should now be able to log into Icinga, Grafana etc.
[09:04:38] (CR) Elukey: [C: +2] turnilo: move broker config to an-druid1001 [puppet] - https://gerrit.wikimedia.org/r/597747 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[09:12:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11266 and previous config saved to /var/cache/conftool/dbconfig/20200521-091245-marostegui.json
[09:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:58] (CR) Giuseppe Lavagetto: [C: +2] "> > Patch Set 4: Code-Review+1" (1 comment) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/597322 (owner: Cwhite)
[09:26:04] (Merged) jenkins-bot: add ca_bundle configuration option [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/597322 (owner: Cwhite)
[09:26:10] (CR) Giuseppe Lavagetto: [V: +2 C: +2] update readme [docker-images/production-images] - https://gerrit.wikimedia.org/r/597304 (owner: Cwhite)
[09:26:34] Operations, Traffic, netops: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[09:28:41] !log deneb - sudo systemctl reset-failed to clear Icinga alerts about systemd degraded state
[09:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:24] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:33:55] (PS1) Apakhomov: changeprop: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597749
[09:36:19] (CR) Giuseppe Lavagetto: "this LGTM! I just have one question: why do we want to have a golang 1.13 image?" (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299 (owner: Cwhite)
[09:38:23] Operations, Core Platform Team, Traffic, serviceops, and 2 others: Reduce rate of purges emitted by MediaWiki - https://phabricator.wikimedia.org/T250205 (aaron) I'm not fond of the idea of not sending purges for indirect edits nor using RefreshLinksJob instead of HtmlCacheUpdateJob (too slow IMO...
[09:41:53] (PS2) Apakhomov: changeprop: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597749
[09:44:01] Operations, netops: scrape ripe atlas data for a few anchors at other large networks - https://phabricator.wikimedia.org/T252890 (ayounsi) Good idea! What's the limit? I'd suggest: * Comcast - large US ISP - https://atlas.ripe.net/probes/6080/ - https://atlas.ripe.net/probes/6072/ * RIPE to have somethin...
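The T253196 work above relies on simple prefix arithmetic. A /23 is exactly two adjacent /24s. Shrinking ulsfo's announcement from 198.35.26.0/23 to 198.35.26.0/24 leaves the upper half, 198.35.27.0/24, free to be advertised separately as the anycast prefix. The authdns address 198.35.27.27 added to the border-in4 term falls inside that upper half. The following is an illustrative check using Python's standard `ipaddress` module. The variable names are only for illustration.

```python
import ipaddress

# ulsfo's aggregate before the change, as given in the !log entries.
aggregate = ipaddress.ip_network("198.35.26.0/23")

# Lengthening the prefix by one bit splits the /23 into exactly two /24s.
unicast, anycast = aggregate.subnets(new_prefix=24)

# Address added to the border-in4 "authdns" term in change 597598.
authdns = ipaddress.ip_address("198.35.27.27")

print(unicast)                       # 198.35.26.0/24, what ulsfo keeps announcing
print(anycast)                       # 198.35.27.0/24, the new anycast prefix
print(authdns in anycast)            # True
print(anycast.subnet_of(aggregate))  # True: the old /23 covered the new /24
```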
[09:51:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11267 and previous config saved to /var/cache/conftool/dbconfig/20200521-095100-marostegui.json
[09:51:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:52] (PS2) Dzahn: switch peopleweb service/discovery names to people1002 [dns] - https://gerrit.wikimedia.org/r/595959 (https://phabricator.wikimedia.org/T247649)
[09:57:38] (CR) Dzahn: [C: +2] "[cumin1001:~] $ httpbb /tmp/sc --hosts=people1002.eqiad.wmnet" [dns] - https://gerrit.wikimedia.org/r/595959 (https://phabricator.wikimedia.org/T247649) (owner: Dzahn)
[10:00:10] (PS1) Apakhomov: citoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597750
[10:04:03] ACKNOWLEDGEMENT - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=netbox_device_statistics site=codfw Ayounsi https://phabricator.wikimedia.org/T243927 - The acknowledgement expires at: 2020-05-25 10:03:35. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:06:09] !log test adding --sni to check_http -S on icinga2001 - T253292
[10:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:12] T253292: check_http and SNI support - https://phabricator.wikimedia.org/T253292
[10:06:44] (PS1) Dzahn: site: remove peopleweb role from people1001 [puppet] - https://gerrit.wikimedia.org/r/597751 (https://phabricator.wikimedia.org/T247649)
[10:07:38] !log replaced backend of people.wikimedia.org - people1001 will be inaccessible, replaced with people1002 on buster. all home dirs have been synced over, there should be no difference except you have to use people1002 now for uploads (T247649)
[10:07:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:41] T247649: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649
[10:08:44] (PS1) Apakhomov: cxserver: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597752
[10:11:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11268 and previous config saved to /var/cache/conftool/dbconfig/20200521-101100-marostegui.json
[10:11:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:53] (CR) Jbond: [C: +2] puppetmaster::gitclone: add pre-commit to private repo [puppet] - https://gerrit.wikimedia.org/r/596649 (https://phabricator.wikimedia.org/T251247) (owner: Jbond)
[10:13:05] !log deploy CI for puppet private repo
[10:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:04] (PS2) Dzahn: site: remove peopleweb role from people1001 [puppet] - https://gerrit.wikimedia.org/r/597751 (https://phabricator.wikimedia.org/T247649)
[10:17:03] !log restart of acme-chief servers for kernel update
[10:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:00] (PS1) Jbond: puppetmaster: correct path in pre-commit hook [puppet] - https://gerrit.wikimedia.org/r/597753
[10:21:58] (CR) Jbond: [C: +2] puppetmaster: correct path in pre-commit hook [puppet] - https://gerrit.wikimedia.org/r/597753 (owner: Jbond)
[10:23:38] Operations, SRE-tools: E901 SyntaxError: invalid syntax is wrongly raised on using python's abc by jenkins python CI linter - https://phabricator.wikimedia.org/T152950 (Aklapper) Three years later, is this still a problem? If yes, is `move the code out of puppet.git to a standalone repo` the preferred so...
[10:26:28] (CR) Dzahn: [C: +2] site: remove peopleweb role from people1001 [puppet] - https://gerrit.wikimedia.org/r/597751 (https://phabricator.wikimedia.org/T247649) (owner: Dzahn)
[10:30:20] PROBLEM - Keyholder SSH agent on acmechief1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
[10:33:11] (CR) Arturo Borrero Gonzalez: [C: +1] "The patch itself LGTM. I have some concerns on diverting from the tools models, but is not a strong opinion either. So, feel free to merge" [puppet] - https://gerrit.wikimedia.org/r/597591 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[10:34:54] PROBLEM - Keyholder SSH agent on acmechief2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
[10:35:04] PROBLEM - Check systemd state on acmechief1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:36:02] (PS1) Vgutierrez: hieradata: Remove authdns1001 from authdns_servers [puppet] - https://gerrit.wikimedia.org/r/597755 (https://phabricator.wikimedia.org/T241770)
[10:36:45] keyholder @ acme-chief it's me, fixing
[10:39:34] RECOVERY - Keyholder SSH agent on acmechief1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[10:40:26] RECOVERY - Keyholder SSH agent on acmechief2001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[10:42:29] RECOVERY - Check systemd state on acmechief1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:43:18] Operations, serviceops, Patch-For-Review: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (Dzahn) Open→Resolved This has happened and an announcement has been sent to ops and wikitech-l lists.
[10:43:21] Operations, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[10:43:23] (CR) Vgutierrez: "pcc looks happy for acme-chief: https://puppet-compiler.wmflabs.org/compiler1001/22671/" [puppet] - https://gerrit.wikimedia.org/r/597755 (https://phabricator.wikimedia.org/T241770) (owner: Vgutierrez)
[10:44:00] Operations, serviceops: decom people1001 - https://phabricator.wikimedia.org/T253296 (Dzahn)
[10:49:46] (CR) Vgutierrez: [C: +2] hieradata: Remove authdns1001 from authdns_servers [puppet] - https://gerrit.wikimedia.org/r/597755 (https://phabricator.wikimedia.org/T241770) (owner: Vgutierrez)
[10:50:18] Operations, DBA, Patch-For-Review, User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (Aklapper) >>! In T161296#5005686, @jcrespo wrote: > We can thing of enabling extra metrics later, but stalling this as the basic work is done (not a blocker anymo...
[10:51:15] Operations, DBA, Patch-For-Review, User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (jcrespo) Stalled→Resolved a:jcrespo
[10:51:19] Operations, DBA, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[10:53:56] (PS1) Dzahn: site/DHCP: remove people1001.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/597756 (https://phabricator.wikimedia.org/T253296)
[10:54:43] Operations, DBA, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[10:55:20] PROBLEM - Check systemd state on thanos-be2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:56:17] PROBLEM - Check systemd state on thanos-be2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:58:23] (PS1) Dzahn: ATS: fix never_cache rule to apply to peopleweb discovery name [puppet] - https://gerrit.wikimedia.org/r/597757 (https://phabricator.wikimedia.org/T247649)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T1100).
[11:00:05] qedk: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:09] (CR) Vgutierrez: [C: +1] "nice catch, LGTM" [puppet] - https://gerrit.wikimedia.org/r/597757 (https://phabricator.wikimedia.org/T247649) (owner: Dzahn)
[11:00:22] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570180
[11:01:47] Amir1 are we deploying this window hmm
[11:03:23] (CR) Dzahn: [C: +2] ATS: fix never_cache rule to apply to peopleweb discovery name [puppet] - https://gerrit.wikimedia.org/r/597757 (https://phabricator.wikimedia.org/T247649) (owner: Dzahn)
[11:04:02] RECOVERY - Check systemd state on thanos-be2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:04:20] RECOVERY - Check systemd state on thanos-be2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:04:38] !log rolling restart of ncredir servers for kernel update
[11:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:58] re-scheduling patch welp
[11:06:11] jouncebot: now
[11:06:12] For the next 0 hour(s) and 53 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T1100)
[11:07:17] (PS1) Dzahn: peopleweb: set rsync_src and rsync_dst host to the same server [puppet] - https://gerrit.wikimedia.org/r/597758 (https://phabricator.wikimedia.org/T247649)
[11:14:31] (CR) Dzahn: [C: +2] "Wasn't sure if this conflicts but apparently it does not. And then it's a nice solution because we can just keep the migration code around" [puppet] - https://gerrit.wikimedia.org/r/597758 (https://phabricator.wikimedia.org/T247649) (owner: Dzahn)
[11:14:47] (CR) Hnowlan: [C: +2] changeprop: remove changeprop configuration from scb [puppet] - https://gerrit.wikimedia.org/r/597258 (https://phabricator.wikimedia.org/T248677) (owner: Hnowlan)
[11:18:46] !log Removed changeprop from scb hosts
[11:18:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:01] Operations, netops: scrape ripe atlas data for a few anchors at other large networks - https://phabricator.wikimedia.org/T252890 (jbond) Which measurements to you plan to scrap? - all measurements the anchors are performing outbound? - the anchoring measurements directed at thes anchor. If its t...
[11:37:20] PROBLEM - Check systemd state on ncredir1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:37:22] PROBLEM - Check systemd state on ncredir5002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:37:35] vgutierrez: ^
[11:41:42] hmmm
[11:41:50] * vgutierrez checking
[11:42:26] Operations, ops-codfw, decommission, serviceops, Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (Dzahn) Stalled→Open
[11:44:44] RECOVERY - Check systemd state on ncredir1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:44:48] RECOVERY - Check systemd state on ncredir5002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:54:16] (CR) Alexandros Kosiaris: [C: -1] "Overall I agree with this, left a couple of comments inline. As far as the golang versions go, I am also unsure why we need both 1.13 and " (7 comments) [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299 (owner: Cwhite)
[11:54:49] Operations, Puppet, User-jbond: Add CI to the private repo - https://phabricator.wikimedia.org/T251247 (jbond) Have added yamllint checking to the private repo
[12:05:34] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw215[8-9].codfw.wmnet
[12:05:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11270 and previous config saved to /var/cache/conftool/dbconfig/20200521-120555-marostegui.json
[12:05:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:23] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw216[0-9].codfw.wmnet
[12:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:09] (PS1) Marostegui: dbproxy1018: Repool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/597762 (https://phabricator.wikimedia.org/T249188)
[12:11:38] (CR) Marostegui: [C: +2] dbproxy1018: Repool labsdb1011 [puppet] - https://gerrit.wikimedia.org/r/597762 (https://phabricator.wikimedia.org/T249188) (owner: Marostegui)
[12:12:47] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw217[0-2].codfw.wmnet
[12:12:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:51] !log Repool labsdb1011 into the analytics role 🤞- T249188
[12:12:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:54] T249188: Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188
[12:13:19] !log depooled mw2158 through mw2172 to make room again in C3 as planned (T247018)
[12:13:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:13:22] T247018: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018
[12:20:12] Operations, Traffic: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (fgiunchedi) >>! In T253292#6154832, @Dzahn wrote: > I think option 3) is the easiest of all, has no risk to break existing checks and we already have many different check_commands using check_http in different wa...
[12:20:46] (PS1) Elukey: role::druid::public::worker: update historical's settings [puppet] - https://gerrit.wikimedia.org/r/597764 (https://phabricator.wikimedia.org/T252771)
[12:22:49] Operations, Traffic: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (CDanis) +1, option 3 is the most expedient but creates the most technical debt. Re: option 1, I think it would be relatively straightforward to dump all check_http from the puppet db (or from the icinga config f...
[12:24:49] (PS1) Filippo Giunchedi: icinga: add --sni to check_http --ssl invocations [puppet] - https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292)
[12:25:23] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/22673/druid1004.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/597764 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[12:26:30] (CR) Filippo Giunchedi: "I tried this on icinga2001 and no new/unexpected alerts fired, whereas sni-only envoy checks started working (i.e. thanos-swift). Let me k" [puppet] - https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292) (owner: Filippo Giunchedi)
[12:28:14] (CR) CDanis: [C: +1] "Ah, I didn't even think of just trying it on icinga2001! LGTM" [puppet] - https://gerrit.wikimedia.org/r/597765 (https://phabricator.wikimedia.org/T253292) (owner: Filippo Giunchedi)
[12:28:49] cdanis: aye I just tried it earlier today, brutal but worked :)
[12:29:04] !log roll restart druid-public cluster (druid100[4-6], backend for the AQS API) to apply new settings + openjdk upgrade - T252771
[12:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:08] T252771: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771
[12:33:54] Operations, Puppet, DBA, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[12:37:24] (PS1) Dzahn: site: remove decom'ed codfw appservers [puppet] - https://gerrit.wikimedia.org/r/597769
[12:44:22] !log reimaging cloudnet1004.eqiad.wmnet for T253124
[12:44:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:26] T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster - https://phabricator.wikimedia.org/T253124
[12:48:09] Operations, Cassandra, Services (watching), User-Eevans: Cassandra uses default ip address for outbound packets while bootstrapping - https://phabricator.wikimedia.org/T128590 (Aklapper) >>! In T128590#4397640, @fgiunchedi wrote: >>>! In T128590#4396369, @Eevans wrote: >> is this still a thing? >...
[12:50:17] Operations, Graphite: graphite / carbon-cache leaks memory on corrupted whisper files - https://phabricator.wikimedia.org/T101572 (Aklapper) >>! In T101572#1606068, @fgiunchedi wrote: > upstream issue hasn't seen a lot of activity, stalling @fgiunchedi: https://github.com/graphite-project/carbon/issues/...
[12:50:48] Operations, Traffic, conftool, discovery-system, services-tooling: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (akosiaris) RBAC without roles isn't really Role Based Access Control, but I digress. LGTM on my side for those permissions.
[12:54:26] Operations, Traffic, conftool, discovery-system, services-tooling: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (CDanis) These permissions LGTM.
[12:57:25] (PS1) Dzahn: site: remove 13 old jobrunners from codfw rack C3 [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018)
[13:00:44] (PS2) Dzahn: site: remove 13 old jobrunners from codfw rack C3 [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018)
[13:00:51] (Abandoned) Dzahn: site: remove decom'ed codfw appservers [puppet] - https://gerrit.wikimedia.org/r/597769 (owner: Dzahn)
[13:04:37] (PS1) Apakhomov: eventgate: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597772
[13:08:32] (PS1) Dzahn: site: fix comment regarding servers in the wrong rack [puppet] - https://gerrit.wikimedia.org/r/597773
[13:11:02] (PS1) Apakhomov: eventstreams: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597774
[13:16:34] (PS1) Apakhomov: kask: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597776
[13:19:01] Operations, observability: run nic_saturation_exporter on all physical hosts - https://phabricator.wikimedia.org/T250401 (CDanis)
[13:22:22] !log cloudnet1004 - reboot to test PXE boot
[13:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:04] Operations, observability: run nic_saturation_exporter on all physical hosts - https://phabricator.wikimedia.org/T250401 (CDanis) Open→Resolved The saturation metrics don't nicely fit into the existing dashboards for LVS/MySQL/appservers, really. I think the additions on the host dashboard and t...
[13:31:13] (PS3) Vgutierrez: Release 8.0.7-1wm10 [debs/trafficserver] - https://gerrit.wikimedia.org/r/597552
[13:34:47] PROBLEM - Host authdns1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:36:13] ^^ that's an expired downtime
[13:36:43] hmm or not.. that's the mgmt one
[13:37:08] (PS1) Apakhomov: mathoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597777
[13:40:41] RECOVERY - Host authdns1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms
[13:43:02] (PS1) Apakhomov: termbox: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597778
[13:43:31] Operations, ops-eqiad, DBA, DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (jcrespo) > Per my IRC chat with John Could you tell me more, as before a processor error was mentioned, but then a board change?
[13:45:10] Operations, ops-eqiad, DBA, DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui) >>! In T250602#6155334, @jcrespo wrote: >> Per my IRC chat with John > > Could you tell me more, as before a processor error was mentioned, but then a board change? My cha...
[13:51:25] !log depool cp4032 for some ats tests
[13:51:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:30] (PS1) Cmjohnson: Updating mac address for new nic card authnds1001 [puppet] - https://gerrit.wikimedia.org/r/597779 (https://phabricator.wikimedia.org/T241770)
[13:54:20] (CR) Cmjohnson: [C: +2] Updating mac address for new nic card authnds1001 [puppet] - https://gerrit.wikimedia.org/r/597779 (https://phabricator.wikimedia.org/T241770) (owner: Cmjohnson)
[14:00:07] (PS1) Jcrespo: mariadb-backups: Prepare db1140 for reimage to buster [puppet] - https://gerrit.wikimedia.org/r/597780 (https://phabricator.wikimedia.org/T250602)
[14:00:10] (PS1) Dzahn: install_server: add an nginx also on servers with "light" role [puppet] - https://gerrit.wikimedia.org/r/597781 (https://phabricator.wikimedia.org/T252526)
[14:00:59] (PS22) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:01:11] (PS2) Jcrespo: mariadb-backups: Prepare db1140 for reimage to buster [puppet] - https://gerrit.wikimedia.org/r/597780 (https://phabricator.wikimedia.org/T250602)
[14:01:15] (CR) jerkins-bot: [V: -1] install_server: add an nginx also on servers with "light" role [puppet] - https://gerrit.wikimedia.org/r/597781 (https://phabricator.wikimedia.org/T252526) (owner: Dzahn)
[14:01:48] (CR) jerkins-bot: [V: -1] hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559 (owner: Jbond)
[14:03:54] (PS1) Apakhomov: wikifeeds: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597782
[14:04:34] (PS23) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:05:55] (CR) jerkins-bot: [V: -1] hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559 (owner: Jbond)
[14:06:20] (PS1) Apakhomov: zotero: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597783
[14:07:49] (PS24) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:08:38] (CR) jerkins-bot: [V: -1] hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559 (owner: Jbond)
[14:09:39] (PS7) Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/588415
[14:11:04] (PS2) Dzahn: install_server: add an nginx also on servers with "light" role [puppet] - https://gerrit.wikimedia.org/r/597781 (https://phabricator.wikimedia.org/T252526)
[14:11:09] (CR) jerkins-bot: [V: -1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/588415 (owner: Alexandros Kosiaris)
[14:12:44] (PS1) Apakhomov: chromium-render: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597785
[14:13:03] (PS1) Elukey: role::druid::public::worker: increase historical thread pool [puppet] - https://gerrit.wikimedia.org/r/597786 (https://phabricator.wikimedia.org/T252771)
[14:14:45] Operations, ops-eqiad, DC-Ops: (Need By: ASAP) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (Jclark-ctr)
[14:15:10] (PS1) Apakhomov: mediawiki-dev: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597787
[14:15:32] (PS8) Alexandros Kosiaris: Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/588415
[14:16:50] (CR) Elukey: [C: +2] role::druid::public::worker: increase historical thread pool [puppet] - https://gerrit.wikimedia.org/r/597786 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[14:17:51] (CR) jerkins-bot: [V: -1] Bump to 0.109.0 [debs/helmfile] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/588415 (owner: Alexandros Kosiaris)
[14:19:14] (PS1) Apakhomov: mobileapps: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597788
[14:19:19] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/22675/install2003.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/597781 (https://phabricator.wikimedia.org/T252526) (owner: Dzahn)
[14:21:26] (PS1) Apakhomov: parsoid: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597789
[14:22:25] (PS1) DCausse: [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790
[14:23:19] (PS1) Apakhomov: restrouter: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597791
[14:23:30] (CR) jerkins-bot: [V: -1] [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790 (owner: DCausse)
[14:23:58] Operations, Graphite: graphite / carbon-cache leaks memory on corrupted whisper files - https://phabricator.wikimedia.org/T101572 (fgiunchedi) Stalled→Declined >>! In T101572#6155200, @Aklapper wrote: >>>! In T101572#1606068, @fgiunchedi wrote: >> upstream issue hasn't seen a lot of activity, sta...
[14:28:55] (CR) Alexandros Kosiaris: [V: +2 C: +2] "This is failing tests for helmfile itself, it's not the debianization that's wrong. For now bypassed with DEB_BUILD_OPTIONS=nocheck, it se" [debs/helmfile] (debian/buster-wikimedia) - https://gerrit.wikimedia.org/r/588415 (owner: Alexandros Kosiaris)
[14:29:32] (PS1) Marostegui: dashboard.sql: Remove limit clause [software/tendril] - https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331)
[14:32:07] (PS1) Arturo Borrero Gonzalez: paws: install the helm3 package in the control nodes [puppet] - https://gerrit.wikimedia.org/r/597793 (https://phabricator.wikimedia.org/T253241)
[14:33:40] !log upload helmfile 0.109.0 to apt.wikimedia.org/buster-wikimedia and stretch-wikimedia, component main
[14:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:12] Operations, LDAP-Access-Requests, SRE-Access-Requests: Add LMata to wmf ldap group - https://phabricator.wikimedia.org/T253277 (lmata) Thank you!
[14:39:27] (PS25) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:40:52] (PS26) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:41:44] (CR) Alexandros Kosiaris: "operations/software/locust as a project name would be good as a name for a local fork of the locust software itself, not puppet code for m" [software/locust] - https://gerrit.wikimedia.org/r/597649 (owner: Wolfgang Kandek)
[14:42:08] (PS2) DCausse: [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790
[14:43:14] (CR) jerkins-bot: [V: -1] [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790 (owner: DCausse)
[14:43:23] (CR) Alexandros Kosiaris: [C: +2] mathoid: Uninstall mathoid canary in codfw [deployment-charts] - https://gerrit.wikimedia.org/r/597587 (owner: Alexandros Kosiaris)
[14:43:44] (Merged) jenkins-bot: mathoid: Uninstall mathoid canary in codfw [deployment-charts] - https://gerrit.wikimedia.org/r/597587 (owner: Alexandros Kosiaris)
[14:44:18] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
[14:44:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:56] Operations, ops-eqiad, DC-Ops: (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (Jclark-ctr)
[14:45:15] (CR) Alexandros Kosiaris: [C: +2] Revert "mathoid: Test canary functionality in codfw" [deployment-charts] - https://gerrit.wikimedia.org/r/597588 (owner: Alexandros Kosiaris)
[14:45:36] (Merged) jenkins-bot: Revert "mathoid: Test canary functionality in codfw" [deployment-charts] - https://gerrit.wikimedia.org/r/597588 (owner: Alexandros Kosiaris)
[14:47:22] (PS27) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:47:32] !log bblack@cumin1001 START - Cookbook sre.hosts.downtime
[14:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:06] (CR) Dzahn: "wmcs team reported not being able to do installs from the cloudvirt VLAN (cloudnet1004)" [puppet] - https://gerrit.wikimedia.org/r/597781 (https://phabricator.wikimedia.org/T252526) (owner: Dzahn)
[14:50:12] !log bblack@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:10] authdns1001 is being reimaged - the linked task is ACL'd so we don't as much auto-communication
[14:51:26] but just FYI, if you see some random alerts about DNS IPs or hostnames, it's probably that.
[14:51:59] (CR) Ayounsi: [C: +1] cookbooks sre.hosts.rotate-pdu-password: small refactor [cookbooks] - https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[14:52:34] (PS1) Alexandros Kosiaris: Revert "mathoid: Support canary functionality" [deployment-charts] - https://gerrit.wikimedia.org/r/597796
[14:53:16] (CR) Ayounsi: [C: +1] cookbook sre.hosts.rotate-pdu-password: use request.Session and response.raise_for_status [cookbooks] - https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[14:53:40] (CR) Alexandros Kosiaris: [C: +2] Revert "mathoid: Support canary functionality" [deployment-charts] - https://gerrit.wikimedia.org/r/597796 (owner: Alexandros Kosiaris)
[14:53:52] Operations, Patch-For-Review: serve tftpboot environment from the install servers and create one in each edge POP - https://phabricator.wikimedia.org/T252526 (Dzahn) wmcs team reported not being able to do installs from the cloudvirt VLAN (cloudnet1004) debugging it I saw DHCP worked but after that the...
[14:54:01] (Merged) jenkins-bot: Revert "mathoid: Support canary functionality" [deployment-charts] - https://gerrit.wikimedia.org/r/597796 (owner: Alexandros Kosiaris)
[14:54:30] (PS28) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[14:56:58] (PS4) Alexandros Kosiaris: admin: deduplicate main helmfile.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/581656
[14:57:00] (PS4) Alexandros Kosiaris: admin/namespace: Deduplicate all helmfile templates [deployment-charts] - https://gerrit.wikimedia.org/r/581657
[14:57:02] (PS4) Alexandros Kosiaris: admin: Default to sensible values for deploUser, namespaceName [deployment-charts] - https://gerrit.wikimedia.org/r/581658
[14:57:04] (PS4) Alexandros Kosiaris: admin: Remove all override files [deployment-charts] - https://gerrit.wikimedia.org/r/581748
[14:58:56] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: name=mw215[8-9].codfw.wmnet
[14:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:05] Operations, Traffic, Patch-For-Review: check_http and SNI support - https://phabricator.wikimedia.org/T253292 (fgiunchedi) In terms of `check_http` usage ATM we have these invocations, for which we'd stop checking the default (non-SNI) certificate: ` $ # the only check_http check not named after che...
[14:59:20] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet
[14:59:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:28] (CR) Bstorm: [C: +2] "> Patch Set 2: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/597591 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[14:59:38] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: name=mw217[0-2].codfw.wmnet
[14:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:25] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[15:00:59] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:02:33] 👀
[15:03:27] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 59 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:03:43] PROBLEM - mediawiki-installation DSH group on mw2165 is CRITICAL: Host mw2165 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[15:04:03] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[15:04:43] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[15:05:30] ACKNOWLEDGEMENT - mediawiki-installation DSH group on mw2165 is CRITICAL: Host mw2165 is not in mediawiki-installation dsh group daniel_zahn decom https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[15:07:43] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[15:07:43] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[15:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:51] (CR) Jcrespo: [C: +1] "As long as it doesn't put down the mariadb server, I don't have any thoughts about this." [software/tendril] - https://gerrit.wikimedia.org/r/597792 (https://phabricator.wikimedia.org/T252331) (owner: Marostegui)
[15:08:32] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[15:08:32] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[15:08:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:45] (PS1) BBlack: Revert "hieradata: Remove authdns1001 from authdns_servers" [puppet] - https://gerrit.wikimedia.org/r/597800
[15:08:51] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 45 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:08:59] (PS2) BBlack: Revert "hieradata: Remove authdns1001 from authdns_servers" [puppet] - https://gerrit.wikimedia.org/r/597800
[15:09:06] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[15:09:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:12] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:19] Operations, ops-codfw, decommission, serviceops, Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (ops-monitoring-bot) Icinga downtime for 4 days, 0:00:00 set by dzahn@cumin10...
[15:09:46] (CR) BBlack: [V: +2 C: +2] Revert "hieradata: Remove authdns1001 from authdns_servers" [puppet] - https://gerrit.wikimedia.org/r/597800 (owner: BBlack)
[15:10:33] (PS1) Elukey: Add AAAA records for an-druid100[1,2] and stat1008 [dns] - https://gerrit.wikimedia.org/r/597801 (https://phabricator.wikimedia.org/T252771)
[15:10:58] (CR) jerkins-bot: [V: -1] Add AAAA records for an-druid100[1,2] and stat1008 [dns] - https://gerrit.wikimedia.org/r/597801 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[15:11:29] (CR) Papaul: [C: +2] Add kubestage200[1-2] , kubernetes200[7-14] MAC entries and to site.pp [puppet] - https://gerrit.wikimedia.org/r/597595 (https://phabricator.wikimedia.org/T252185) (owner: Papaul)
[15:11:39] (PS7) Papaul: Add kubestage200[1-2] , kubernetes200[7-14] MAC entries and to site.pp [puppet] - https://gerrit.wikimedia.org/r/597595 (https://phabricator.wikimedia.org/T252185)
[15:11:46] (CR) Papaul: [V: +2 C: +2] Add kubestage200[1-2] , kubernetes200[7-14] MAC entries and to site.pp [puppet] - https://gerrit.wikimedia.org/r/597595 (https://phabricator.wikimedia.org/T252185) (owner: Papaul)
[15:12:27] (PS2) Elukey: Add AAAA records for an-druid100[1,2] and stat1008 [dns] - https://gerrit.wikimedia.org/r/597801 (https://phabricator.wikimedia.org/T252771)
[15:13:05] elukey: I'll hold that one maybe a few minutes /cc bblack
[15:15:49] (PS29) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[15:16:20] vgutierrez: sorry good point, didn't notice!
[15:17:08] almost there
[15:17:39] yep yep no rush, very low priority
[15:18:56] ok you should be good to go, authdns-update has been tested both directions wrt authdns1001 and is working now :)
[15:19:14] elukey: ^
[15:19:17] (CR) Jcrespo: [C: +2] mariadb-backups: Prepare db1140 for reimage to buster [puppet] - https://gerrit.wikimedia.org/r/597780 (https://phabricator.wikimedia.org/T250602) (owner: Jcrespo)
[15:19:26] (PS3) Jcrespo: mariadb-backups: Prepare db1140 for reimage to buster [puppet] - https://gerrit.wikimedia.org/r/597780 (https://phabricator.wikimedia.org/T250602)
[15:19:53] (CR) Ayounsi: [C: +1] "Another nit." (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[15:20:38] (PS1) Arturo Borrero Gonzalez: openstack: oidentd: change arguments to support 2.3.2 [puppet] - https://gerrit.wikimedia.org/r/597804
[15:21:50] bblack: ack thanks :)
[15:22:10] (CR) Andrew Bogott: [C: +2] openstack: oidentd: change arguments to support 2.3.2 [puppet] - https://gerrit.wikimedia.org/r/597804 (owner: Arturo Borrero Gonzalez)
[15:22:26] !log Add BGP between cr1/2-eqiad and authdns1001 - T253196
[15:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:30] T253196: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196
[15:27:46] (PS1) Giuseppe Lavagetto: profile::etcd::tlsproxy: refresh code for modern puppet [puppet] - https://gerrit.wikimedia.org/r/597805 (https://phabricator.wikimedia.org/T97972)
[15:27:48] (PS1) Giuseppe Lavagetto: profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972)
[15:29:05] (CR) jerkins-bot: [V: -1] profile::etcd::tlsproxy: add additional users for pools [puppet] - https://gerrit.wikimedia.org/r/597806 (https://phabricator.wikimedia.org/T97972) (owner: Giuseppe Lavagetto)
[15:29:36] Operations, Traffic, netops: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[15:33:39] Operations, Puppet, User-jbond: Add CI to the private repo - https://phabricator.wikimedia.org/T251247 (jbond) Open→Resolved a:jbond
[15:39:58] (CR) Jforrester: "Aha, thank you, sorry." [puppet] - https://gerrit.wikimedia.org/r/597741 (https://phabricator.wikimedia.org/T252955) (owner: Dzahn)
[15:42:31] (PS30) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[15:46:10] Operations, ops-eqiad, DBA, DC-Ops, Patch-For-Review: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (jcrespo) a:jcrespo→Jclark-ctr Hi, I cannot reinstall the server because the remote ipmi interface doesn't work (and the ssh or the https acesses, that are...
[15:47:20] (PS31) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[15:48:44] !log rebuilding cloudnet1003.eqiad.wmnet with Debian Buster for T253124
[15:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:48] T253124: Upgrade cloudnet2003 and cloudnet2004 to Debian Buster - https://phabricator.wikimedia.org/T253124
[15:49:15] (PS3) Jdlrobson: Use AddFooterLink hook for code of conduct and contact links [mediawiki-config] - https://gerrit.wikimedia.org/r/596277 (https://phabricator.wikimedia.org/T251817)
[15:53:43] (PS5) Cwhite: add golang1-devel builder images based on golang 1.14 using wikimedia-buster base [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299
[15:53:45] (PS5) Cwhite: add loki 1.5.0 [docker-images/production-images] - https://gerrit.wikimedia.org/r/597317 (https://phabricator.wikimedia.org/T222826)
[15:55:37] (CR) Cwhite: "Based on golang's commitment to maintaining backwards compatibility between minor releases, I see no reason to maintain lower versions as " [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299 (owner: Cwhite)
[15:55:58] (PS32) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[15:56:54] Hey all - I'd like to deploy a small change to PrivateSettings.php if that won't interfere with the puppet swat or anybody else.
[15:57:47] (CR) Elukey: [C: +2] Add AAAA records for an-druid100[1,2] and stat1008 [dns] - https://gerrit.wikimedia.org/r/597801 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[15:58:12] sbassett: I don't think there'll be anything for the puppet swat
[15:58:23] rzl: cool thanks
[15:58:42] can't speak for the "anybody else" part :)
[16:00:04] godog and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T1600).
[16:00:04] No GERRIT patches in the queue for this window AFAICS.
[16:00:41] (PS33) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[16:02:33] (PS34) Jbond: hiera5: upgrade to hiera5 [puppet] - https://gerrit.wikimedia.org/r/566559
[16:04:07] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: Update mitigations for T250887 (duration: 01m 08s)
[16:04:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:11:04] (PS1) Papaul: Partman: Add kubestsge200[1-2] kubernetes200[7-14] [puppet] - https://gerrit.wikimedia.org/r/597815 (https://phabricator.wikimedia.org/T252185)
[16:12:15] !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[16:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:15] (CR) Bstorm: [C: +2] "Thanks for finding this!" [puppet] - https://gerrit.wikimedia.org/r/597793 (https://phabricator.wikimedia.org/T253241) (owner: Arturo Borrero Gonzalez)
[16:14:55] !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:27:02] (PS1) Elukey: Add PTR/AAAA records for druid100[7,8] [dns] - https://gerrit.wikimedia.org/r/597817 (https://phabricator.wikimedia.org/T252771)
[16:28:39] (CR) Elukey: [C: +2] Add PTR/AAAA records for druid100[7,8] [dns] - https://gerrit.wikimedia.org/r/597817 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[16:31:10] (PS1) Bstorm: kubeadm: Our template should match package versions for bootstraps [puppet] - https://gerrit.wikimedia.org/r/597818 (https://phabricator.wikimedia.org/T188912)
[16:31:50] (CR) Ayounsi: [C: +1] "Not tested but the logic looks sound." (6 comments) [cookbooks] - https://gerrit.wikimedia.org/r/594436 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[16:33:20] (PS1) Elukey: Add druid100[7,8] to the druid_public_hosts network/firewall range [puppet] - https://gerrit.wikimedia.org/r/597821 (https://phabricator.wikimedia.org/T252771)
[16:35:43] (PS1) BBlack: Remove unused systemd fragment [puppet] - https://gerrit.wikimedia.org/r/597822 (https://phabricator.wikimedia.org/T98006)
[16:35:45] (PS1) BBlack: [WIP] dnsbox: looser binding for anycast healtcheck [puppet] - https://gerrit.wikimedia.org/r/597823 (https://phabricator.wikimedia.org/T98006)
[16:35:53] (PS1) Bstorm: paws-k8s: Adding a checkout of the paws repo [puppet] - https://gerrit.wikimedia.org/r/597824 (https://phabricator.wikimedia.org/T211096)
[16:36:54] (Abandoned) Bstorm: paws: Add a profile to provide some special config for paws [puppet] - https://gerrit.wikimedia.org/r/597602 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[16:38:06] (PS2) BBlack: Remove unused systemd fragment [puppet] - https://gerrit.wikimedia.org/r/597822 (https://phabricator.wikimedia.org/T98006)
[16:38:08] (PS2) BBlack: [WIP] dnsbox: looser binding for anycast healtcheck [puppet] - https://gerrit.wikimedia.org/r/597823 (https://phabricator.wikimedia.org/T98006)
[16:42:27] (CR) Elukey: [C: +2] Add druid100[7,8] to the druid_public_hosts network/firewall range [puppet] - https://gerrit.wikimedia.org/r/597821 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[16:44:46] (CR) Bstorm: "Since this doesn't seem to damage anything, I'll merge it." [puppet] - https://gerrit.wikimedia.org/r/597818 (https://phabricator.wikimedia.org/T188912) (owner: Bstorm)
[16:44:48] (CR) Bstorm: [C: +2] kubeadm: Our template should match package versions for bootstraps [puppet] - https://gerrit.wikimedia.org/r/597818 (https://phabricator.wikimedia.org/T188912) (owner: Bstorm)
[16:46:15] (CR) BBlack: [C: +2] Remove unused systemd fragment [puppet] - https://gerrit.wikimedia.org/r/597822 (https://phabricator.wikimedia.org/T98006) (owner: BBlack)
[16:58:02] (CR) Bstorm: [C: +2] paws-k8s: Adding a checkout of the paws repo [puppet] - https://gerrit.wikimedia.org/r/597824 (https://phabricator.wikimedia.org/T211096) (owner: Bstorm)
[16:59:18] (PS1) RLazarus: admin: Fix my typo, s/ssh-keys/ssh_keys/. [puppet] - https://gerrit.wikimedia.org/r/597830 (https://phabricator.wikimedia.org/T251349)
[17:00:04] halfak and accraze: Your horoscope predicts another unfortunate Services – Graphoid / Citoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T1700).
[17:00:18] (CR) CDanis: [C: +1] admin: Fix my typo, s/ssh-keys/ssh_keys/. [puppet] - https://gerrit.wikimedia.org/r/597830 (https://phabricator.wikimedia.org/T251349) (owner: RLazarus)
[17:02:15] (CR) RLazarus: [C: +2] admin: Fix my typo, s/ssh-keys/ssh_keys/. [puppet] - https://gerrit.wikimedia.org/r/597830 (https://phabricator.wikimedia.org/T251349) (owner: RLazarus)
[17:04:23] Operations, SRE-Access-Requests, Patch-For-Review: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (RLazarus) @MBinder_WMF On the offchance you'd already tried logging into phab1001, and it didn't wor...
[17:04:24] !log starting labstore1005 upgrades T224582
[17:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:04:27] T224582: Migrate labstore1004/labstore1005 to Stretch/Buster - https://phabricator.wikimedia.org/T224582
[17:05:04] (PS3) DCausse: [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790
[17:06:10] (CR) jerkins-bot: [V: -1] [WIP][wdqs] add a new streaming updater test role [puppet] - https://gerrit.wikimedia.org/r/597790 (owner: DCausse)
[17:10:52] (CR) Gehel: profile::java: one profile to rule them all (openjdk-x versions) (3 comments) [puppet] - https://gerrit.wikimedia.org/r/597219 (owner: Elukey)
[17:17:29] (CR) CDanis: [C: +2] fnm: increase thresholds [puppet] - https://gerrit.wikimedia.org/r/597606 (https://phabricator.wikimedia.org/T249454) (owner: CDanis)
[17:20:47] (PS1) Elukey: Assign role::druid::public::worker to druid100[7,8] [puppet] - https://gerrit.wikimedia.org/r/597833 (https://phabricator.wikimedia.org/T252771)
[17:20:57] !log anycast experimentation commencing in ulsfo (test route withdrawal)...
[17:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:01] (may see some alerts related to ulsfo DNS, it's ok!)
[17:23:32] (CR) Elukey: [C: +2] Assign role::druid::public::worker to druid100[7,8] [puppet] - https://gerrit.wikimedia.org/r/597833 (https://phabricator.wikimedia.org/T252771) (owner: Elukey)
[17:24:27] !log anycast experiment done, all back to normal
[17:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:43] Operations, Traffic, netops: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi)
[17:26:55] Operations, Traffic, netops: Advertise 198.35.27.0/24 as anycast prefix - https://phabricator.wikimedia.org/T253196 (ayounsi) Open→Resolved Confirmed that if dns4001 and dns4002 are down, ulsfo will stop advertising `198.35.27.0/24` to the world but still had routes to 198.35.27.27/32 via codfw.
[17:26:59] Operations, Traffic, netops, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (ayounsi)
[17:47:52] Operations, LDAP-Access-Requests, SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (RLazarus) Open→Resolved
[17:51:23] Operations, Analytics, Traffic: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990 (RLazarus) p:Triage→Medium a:ssingh Trying to route this -- @ssingh, should this be assigned to you?
[17:51:47] (CR) Ayounsi: [C: -1] "Thanks! Nits and see the RW community and the log comments inline." (7 comments) [cookbooks] - https://gerrit.wikimedia.org/r/594445 (https://phabricator.wikimedia.org/T246890) (owner: Jbond)
[17:52:30] Operations, Readers-Web-Backlog, Traffic: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (RLazarus)
[17:53:37] Operations, Analytics, Traffic: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990 (ssingh) Hi, yes that's fine for now. The privacy threats will be more suited for the Security team but I will triage it again when required...
[18:00:05] RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T1800).
[18:00:05] qedk: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:07:11] (PS1) Aaron Schulz: arclamp: set --minwidth to 1 for all SVG flame graphs [puppet] - https://gerrit.wikimedia.org/r/597839
[18:09:26] Operations, Traffic, Patch-For-Review: Implement a prometheus exporter for rdkafka in golang - https://phabricator.wikimedia.org/T253197 (RLazarus) p:Triage→Medium
[18:14:21] (CR) Krinkle: [C: +1] arclamp: set --minwidth to 1 for all SVG flame graphs [puppet] - https://gerrit.wikimedia.org/r/597839 (owner: Aaron Schulz)
[18:15:05] (PS2) Aaron Schulz: arclamp: set --minwidth to 1 for all SVG flame graphs [puppet] - https://gerrit.wikimedia.org/r/597839 (https://phabricator.wikimedia.org/T247717)
[18:17:47] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:23:15] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:23:44] i am workin on this :)
[18:24:10] !log restarting phabricator on phab1001 to deploy https://phabricator.wikimedia.org/rPHEX2687d08786a9dadcbaa96709de991f471f239830
[18:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:55] Operations, Readers-Web-Backlog, Traffic: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (BBlack) Interestingly, the mobile redirect code in varnish doesn't strip any parameters. The problem is that the analytics-side VCL code that consumes the `wprov` parame...
[18:25:09] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:30:45] Urbanecm RoanKattouw are swat deploys happening?
[18:31:10] Operations, Analytics, Readers-Web-Backlog, Traffic: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (BBlack)
[18:31:56] Operations, LDAP-Access-Requests, SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (wkandek) pwstore installed, access pending regeneration of pgp key. Task can be closed.
[18:32:03] qedk: oh, nk one came so far?
[18:32:10] In that case, let me take the swat
[18:32:29] i just joined now '=D
[18:33:37] no one was there in mid-day swat so i rescheduled (and forgot)
[18:33:46] thanks urbanecm!
[18:35:17] qedk: there is a cofnlict, could you rebase, please?
[18:35:19] *conflict
[18:35:42] jdlrobson is patch owner :/
[18:36:12] let me patch it up wait
[18:36:41] hmm, Jdlrobson: is this ready to be deployed?
[18:38:50] it was supposed to be deployed on 18th
[18:41:02] (CR) Krinkle: [C: +1] "LGTM." [puppet] - https://gerrit.wikimedia.org/r/597839 (https://phabricator.wikimedia.org/T247717) (owner: Aaron Schulz)
[18:42:52] (CR) RLazarus: "T247018 mentions removing mw[2158-2172] but these start at 2150, not 2158 -- is that an intentional change?" [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018) (owner: Dzahn)
[18:49:05] yes. but perhaps there is a reason to wait
[18:49:09] Urbanecm my data is terrible, pulling is taking way too much time :/ might as well let Jdlrobson take another window for this
[18:49:28] if Jdlrobson is around, I'm happy to do that, but I don't want to deploy something that's supposed to wait
[18:49:28] oh, he was waiting for no one to complain
[18:49:43] and till 20th no one did, so he +1ed it
[18:49:55] and marked it ready for review
[18:50:06] either way no pressure, i can't seem to get to rebase it anyway :)
[18:52:49] (PS1) MarcoAurelio: Fix typo 'desciption' [puppet] - https://gerrit.wikimedia.org/r/597850 (https://phabricator.wikimedia.org/T201491)
[18:54:39] (Abandoned) MarcoAurelio: Fix typo 'desciption' [puppet] - https://gerrit.wikimedia.org/r/597850 (https://phabricator.wikimedia.org/T201491) (owner: MarcoAurelio)
[18:55:35] (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018) (owner: Dzahn)
[18:57:17] (CR) RLazarus: [C: +1] "Makes sense, thanks!" [puppet] - https://gerrit.wikimedia.org/r/597771 (https://phabricator.wikimedia.org/T247018) (owner: Dzahn)
[18:58:09] Urbanecm: hey im around
[18:58:43] no reason to wait, but also no urgency
[19:00:53] (CR) Jdlrobson: [C: +1] "Note: enwiki is ready for the switch and that's all that's important right now. That said switching it off does have a visual effect so I'" [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[19:01:33] (PS1) CRusnov: netbox scriptproxy: Add listen stanza for 8443 for the scriptproxy [puppet] - https://gerrit.wikimedia.org/r/597851 (https://phabricator.wikimedia.org/T243927)
[19:03:22] (CR) QEDK: [C: +1] "> Patch Set 2:" [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[19:05:24] (PS3) Jdlrobson: Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405)
[19:07:28] (CR) Dave Pifke: [C: +1] "LGTM. (And thanks for quantifying the storage effect, Timo.)" [puppet] - https://gerrit.wikimedia.org/r/597839 (https://phabricator.wikimedia.org/T247717) (owner: Aaron Schulz)
[19:12:26] Jdlrobson: hey, just noticed that. If you want, I can push that - or it can also wait - your call :)
[19:32:02] (CR) RLazarus: [C: +2] arclamp: set --minwidth to 1 for all SVG flame graphs [puppet] - https://gerrit.wikimedia.org/r/597839 (https://phabricator.wikimedia.org/T247717) (owner: Aaron Schulz)
[19:49:15] (CR) Papaul: [C: +2] Partman: Add kubestsge200[1-2] kubernetes200[7-14] [puppet] - https://gerrit.wikimedia.org/r/597815 (https://phabricator.wikimedia.org/T252185) (owner: Papaul)
[19:49:25] (PS2) Papaul: Partman: Add kubestsge200[1-2] kubernetes200[7-14] [puppet] - https://gerrit.wikimedia.org/r/597815 (https://phabricator.wikimedia.org/T252185)
[19:49:28] (CR) Papaul: [V: +2 C: +2] Partman: Add kubestsge200[1-2] kubernetes200[7-14] [puppet] - https://gerrit.wikimedia.org/r/597815 (https://phabricator.wikimedia.org/T252185) (owner: Papaul)
[19:50:01] Operations, SRE-Access-Requests: Request for srv/phab/phabricator/bin/bulk make-silent --id * command via SSH for moving tasks quarterly - https://phabricator.wikimedia.org/T251349 (MBinder_WMF) Thanks! Will investigate tomorrow, I hope. Currently swamped by a (virtual) offsite. :)
[19:53:34] Operations, Wikimedia-Mailing-lists: Create SDAW-Internal mailing list - https://phabricator.wikimedia.org/T253340 (CBogen)
[19:56:55] (PS1) Papaul: Fix typo for kubernetes2007 to kubernetes2014 [puppet] - https://gerrit.wikimedia.org/r/597856 (https://phabricator.wikimedia.org/T252185)
[19:59:29] (CR) CRusnov: "I have tested this." [puppet] - https://gerrit.wikimedia.org/r/597851 (https://phabricator.wikimedia.org/T243927) (owner: CRusnov)
[20:00:38] (PS6) Cwhite: add golang 1.13-1 builder image based on golang 1.13 using wikimedia-buster base [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299
[20:03:38] (CR) Cwhite: "I take that prior comment back. I ran into a bug that prevented running tests against golang 1.14 on a golang 1.13 targeted project. May" [docker-images/production-images] - https://gerrit.wikimedia.org/r/597299 (owner: Cwhite)
[20:06:00] (PS6) Cwhite: add loki 1.5.0 [docker-images/production-images] - https://gerrit.wikimedia.org/r/597317 (https://phabricator.wikimedia.org/T222826)
[20:06:30] (PS7) Cwhite: add loki 1.5.0 [docker-images/production-images] - https://gerrit.wikimedia.org/r/597317 (https://phabricator.wikimedia.org/T222826)
[20:06:38] (CR) BrandonXLF: [C: +1] Drop enwiki mainpage special casing [mediawiki-config] - https://gerrit.wikimedia.org/r/570180 (https://phabricator.wikimedia.org/T32405) (owner: Jdlrobson)
[20:06:58] (PS8) Cwhite: add loki 1.5.0 [docker-images/production-images] - https://gerrit.wikimedia.org/r/597317 (https://phabricator.wikimedia.org/T222826)
[20:08:33] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:19:45] (PS6) Jeena Huneidi: Automate deployments [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264)
[20:20:14] (CR) Jeena Huneidi: "This change is ready for review." [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264) (owner: Jeena Huneidi)
[20:28:46] Operations, DC-Ops, cloud-services-team (Kanban): labstore1005 A PCIe link training failure error on boot - https://phabricator.wikimedia.org/T169286 (Bstorm) Managed to get the error after 2 more reboots! ` UEFI0067: A PCIe link training failure is observed in PCIe Slot 6 and the link is disabled....
[20:42:06] Operations, DC-Ops, cloud-services-team (Kanban): labstore1005 A PCIe link training failure error on boot - https://phabricator.wikimedia.org/T169286 (Bstorm) Hitting F1 and letting it boot resulted in a totally working system. I'm not sure I care enough to fight with it again until refresh.
[20:42:56] (PS2) CRusnov: netbox scriptproxy: Add listen stanza for 8443 for the scriptproxy [puppet] - https://gerrit.wikimedia.org/r/597851 (https://phabricator.wikimedia.org/T243927)
[20:44:39] !log labstore1005 is now running stretch and drbd devices are resyncing after several reboots and some significant effort T224582
[20:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:43] T224582: Migrate labstore1004/labstore1005 to Stretch/Buster - https://phabricator.wikimedia.org/T224582
[20:45:00] (CR) CRusnov: [C: +2] netbox scriptproxy: Add listen stanza for 8443 for the scriptproxy [puppet] - https://gerrit.wikimedia.org/r/597851 (https://phabricator.wikimedia.org/T243927) (owner: CRusnov)
[20:47:32] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (leila) @bmansurov You can fetch the videos from YT now: Wiki Workshop 2020 - A conversation: https://www.youtube.com/watch?v=YmdS4ZzToU8 Wiki Workshop 2020 - Keynote: https://www.youtube.com/...
[20:48:11] (PS1) CDanis: taskgen: invoke tox:adminschema on admin changes [puppet] - https://gerrit.wikimedia.org/r/597862
[20:51:31] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:53:02] Operations, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban): Migrate labstore1004/labstore1005 to Stretch/Buster - https://phabricator.wikimedia.org/T224582 (Bstorm) Ok, so labstore1005 upgrade notes. - Downtimed the server. - sudo puppet agent --disable "upgrading to stretch [bst...
[20:53:27] Operations, Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban): Migrate labstore1004/labstore1005 to Stretch/Buster - https://phabricator.wikimedia.org/T224582 (Bstorm)
[20:57:17] (CR) CDanis: "For reasons I don't yet understand, I'm not able to validate that this change actually fails an invalid schema with run_ci_locally.sh; how" [puppet] - https://gerrit.wikimedia.org/r/597862 (owner: CDanis)
[20:59:00] (CR) CRusnov: [C: +1] "LGTM, this *should* work. However I think that testing the series of API calls against a test host with interfaces and IP addresses would " [software/netbox-extras] - https://gerrit.wikimedia.org/r/576461 (owner: Volans)
[21:04:07] (CR) Jbond: [C: +1] "thanks i forgot tox only ran fpr everything when tox.ini was updated" [puppet] - https://gerrit.wikimedia.org/r/597862 (owner: CDanis)
[21:10:20] !log removing two files for legal compliance
[21:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:12:08] (PS1) CDanis: run_ci_locally: update to the current docker image version [puppet] - https://gerrit.wikimedia.org/r/597863
[21:15:04] (CR) CDanis: [C: +2] "After figuring out I62b9d03b I was able to verify locally that 1) tox:adminschema executes and passes on a no-op change to data.yaml, and " [puppet] - https://gerrit.wikimedia.org/r/597862 (owner: CDanis)
[21:16:25] (CR) RLazarus: [C: +1] run_ci_locally: update to the current docker image version [puppet] - https://gerrit.wikimedia.org/r/597863 (owner: CDanis)
[21:19:24] (CR) CDanis: [C: +2] run_ci_locally: update to the current docker image version [puppet] - https://gerrit.wikimedia.org/r/597863 (owner: CDanis)
[21:41:12] (PS1) Bstorm: labstore: the monitor_systemd_service module doesn't work with drbd [puppet] - https://gerrit.wikimedia.org/r/597868 (https://phabricator.wikimedia.org/T224582)
[21:41:40] (PS1) Cmjohnson: Removing mgmt for the asset tag associated with analytics1032 [dns] - https://gerrit.wikimedia.org/r/597869 (https://phabricator.wikimedia.org/T233080)
[21:42:35] (CR) Thcipriani: Automate deployments (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597653 (https://phabricator.wikimedia.org/T253264) (owner: Jeena Huneidi)
[21:42:51] Operations, Wikimedia-Mailing-lists: Create SDAW-Internal mailing list - https://phabricator.wikimedia.org/T253340 (RLazarus) Open→Resolved a:RLazarus All set! Info page: https://lists.wikimedia.org/mailman/listinfo/sdaw-internal Admin page: https://lists.wikimedia.org/mailman/admin/sdaw-int...
[21:44:53] (CR) Cmjohnson: [C: +2] Removing mgmt for the asset tag associated with analytics1032 [dns] - https://gerrit.wikimedia.org/r/597869 (https://phabricator.wikimedia.org/T233080) (owner: Cmjohnson)
[21:46:01] (PS2) Cmjohnson: Removing mgmt for the asset tag associated with analytics1032 [dns] - https://gerrit.wikimedia.org/r/597869 (https://phabricator.wikimedia.org/T233080)
[21:46:04] (CR) Cmjohnson: [V: +2 C: +2] Removing mgmt for the asset tag associated with analytics1032 [dns] - https://gerrit.wikimedia.org/r/597869 (https://phabricator.wikimedia.org/T233080) (owner: Cmjohnson)
[21:48:51] Operations, ops-eqiad, decommission, Patch-For-Review: Decommission analytics1032 - https://phabricator.wikimedia.org/T233080 (Cmjohnson)
[21:49:24] Operations, ops-eqiad, decommission, Patch-For-Review: Decommission analytics1032 - https://phabricator.wikimedia.org/T233080 (Cmjohnson) Open→Resolved removed from rack, mgmt dns removed, switch ports were already removed. Updated netbox
[21:49:26] Operations, ops-eqiad, decommission: Decommission analytics10[28-31,33-41] - https://phabricator.wikimedia.org/T227485 (Cmjohnson)
[22:01:13] (PS1) Bstorm: labstore: current setup doesn't allow check_call against exportfs [puppet] - https://gerrit.wikimedia.org/r/597873 (https://phabricator.wikimedia.org/T224582)
[22:02:07] (PS1) BryanDavis: d/changelog: prepare for 0.70 release [software/tools-webservice] - https://gerrit.wikimedia.org/r/597875
[22:02:09] (CR) Bstorm: [C: +2] labstore: the monitor_systemd_service module doesn't work with drbd [puppet] - https://gerrit.wikimedia.org/r/597868 (https://phabricator.wikimedia.org/T224582) (owner: Bstorm)
[22:03:02] (CR) BryanDavis: [C: +1] "as far as bandaids go, this should work" [puppet] - https://gerrit.wikimedia.org/r/597873 (https://phabricator.wikimedia.org/T224582) (owner: Bstorm)
[22:03:25] (CR) Bstorm: [C: +2] labstore: current setup doesn't allow check_call against exportfs [puppet] - https://gerrit.wikimedia.org/r/597873 (https://phabricator.wikimedia.org/T224582) (owner: Bstorm)
[22:03:40] (CR) BryanDavis: [C: +2] d/changelog: prepare for 0.70 release [software/tools-webservice] - https://gerrit.wikimedia.org/r/597875 (owner: BryanDavis)
[22:04:22] (Merged) jenkins-bot: d/changelog: prepare for 0.70 release [software/tools-webservice] - https://gerrit.wikimedia.org/r/597875 (owner: BryanDavis)
[22:37:55] (PS1) Cmjohnson: Removing mgmt dns entries of asset tags associated w/labsdb1006/7 [dns] - https://gerrit.wikimedia.org/r/597882 (https://phabricator.wikimedia.org/T220144)
[22:38:59] (PS2) Cmjohnson: Removing mgmt dns entries of asset tags associated w/labsdb1006/7 [dns] - https://gerrit.wikimedia.org/r/597882 (https://phabricator.wikimedia.org/T220144)
[22:40:49] (CR) Cmjohnson: [C: +2] Removing mgmt dns entries of asset tags associated w/labsdb1006/7 [dns] - 
10https://gerrit.wikimedia.org/r/597882 (https://phabricator.wikimedia.org/T220144) (owner: 10Cmjohnson) [22:41:56] 10Operations, 10ops-eqiad, 10Data-Services, 10decommission, 10Patch-For-Review: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet - https://phabricator.wikimedia.org/T220144 (10Cmjohnson) [22:42:15] 10Operations, 10ops-eqiad, 10Data-Services, 10decommission, 10Patch-For-Review: Decommission labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet - https://phabricator.wikimedia.org/T220144 (10Cmjohnson) 05Open→03Resolved Removed from rack, netbox updated [22:46:37] (03PS1) 10Bstorm: paws-k8s: set some volumes up [puppet] - 10https://gerrit.wikimedia.org/r/597884 (https://phabricator.wikimedia.org/T211096) [22:53:00] 10Operations, 10Commons, 10SRE-swift-storage, 10Thumbor, 10Wikimedia-SVG-rendering: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141 (10kaldari) 05Open→03Declined @AntiCompositeNumber - You've convinced me. [23:00:04] RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200521T2300). Please do the needful. [23:00:04] No GERRIT patches in the queue for this window AFAICS. 
[23:01:40] (CR) Cwhite: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) (owner: Cwhite)
[23:15:38] (PS1) Cmjohnson: Removing mgmt dns asset tags associated w/elastic1017-1020 [dns] - https://gerrit.wikimedia.org/r/597890 (https://phabricator.wikimedia.org/T239821)
[23:17:32] Operations, DC-Ops, decommission, Discovery-Search (Current work), Patch-For-Review: decommission elastic10[18-31].eqiad.wmnet - https://phabricator.wikimedia.org/T239821 (Cmjohnson) Resolved→Open this has not been completed
[23:17:39] (PS2) Cmjohnson: Removing mgmt dns asset tags associated w/elastic1017-1020 [dns] - https://gerrit.wikimedia.org/r/597890 (https://phabricator.wikimedia.org/T239821)
[23:22:52] (CR) Cmjohnson: [C: +2] Removing mgmt dns asset tags associated w/elastic1017-1020 [dns] - https://gerrit.wikimedia.org/r/597890 (https://phabricator.wikimedia.org/T239821) (owner: Cmjohnson)
[23:23:46] (CR) Papaul: [C: +2] Fix typo for kubernetes2007 to kubernetes2014 [puppet] - https://gerrit.wikimedia.org/r/597856 (https://phabricator.wikimedia.org/T252185) (owner: Papaul)
[23:24:47] (PS1) Cmjohnson: Removing mgmt dns for asset tags associated w/elastic1020-1031 [dns] - https://gerrit.wikimedia.org/r/597894 (https://phabricator.wikimedia.org/T239821)
[23:25:55] * Krinkle staging on mwdebug1002
[23:26:30] (PS1) Aaron Schulz: Set "coalesceKeys" to "non-global" for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/597895 (https://phabricator.wikimedia.org/T252564)
[23:26:41] Krinkle: ^
[23:27:21] AaronSchulz: OK :) - rolling two wmf.32 patches as well while at it
[23:27:26] doing yours first since they'll merge quicker
[23:27:29] nice
[23:27:39] AaronSchulz: the puppet change is live and hourly/daily graphs are updated.
[23:27:52] are you pushing the SubmitAction change as wlel, or still working on it?
no problem :)
[23:28:01] (CR) Krinkle: [C: +2] Set "coalesceKeys" to "non-global" for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/597895 (https://phabricator.wikimedia.org/T252564) (owner: Aaron Schulz)
[23:28:59] (Merged) jenkins-bot: Set "coalesceKeys" to "non-global" for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/597895 (https://phabricator.wikimedia.org/T252564) (owner: Aaron Schulz)
[23:30:06] AaronSchulz: staged on mwdebug1002
[23:36:09] (PS2) Cmjohnson: Removing mgmt dns for asset tags associated w/elastic1020-1031 [dns] - https://gerrit.wikimedia.org/r/597894 (https://phabricator.wikimedia.org/T239821)
[23:36:46] (CR) Krinkle: "No, I'm asking whether within the mtail software as it exists, whether there is any way at all to do this in one go without it triggering " [puppet] - https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) (owner: Cwhite)
[23:37:06] Operations, ops-eqiad, DC-Ops, decommission, and 2 others: decommission elastic10[18-31].eqiad.wmnet - https://phabricator.wikimedia.org/T239821 (Cmjohnson)
[23:38:20] Operations, ops-eqiad, decommission: decommission elastic1017 - https://phabricator.wikimedia.org/T234045 (Cmjohnson)
[23:38:26] Operations, ops-eqiad, decommission: decommission elastic1017 - https://phabricator.wikimedia.org/T234045 (Cmjohnson) Open→Resolved
[23:38:30] Krinkle: looks OK
[23:38:36] profiling percents are still messed up btw
[23:38:46] at least in view-source
[23:39:26] aye yeah.
[23:39:30] There's a ticket for that
[23:40:45] (PS2) Krinkle: mc-labs: Remove unused wan/purge config from Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/597662
[23:40:52] AaronSchulz: is that ok too? ^
[23:41:02] !log krinkle@deploy1001 Synchronized wmf-config/mc.php: I222457729a5b (duration: 01m 08s)
[23:41:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:25] Operations, ops-eqiad, DC-Ops, decommission, and 2 others: decommission elastic10[18-31].eqiad.wmnet - https://phabricator.wikimedia.org/T239821 (Cmjohnson)
[23:45:06] Krinkle: haha, sure
[23:45:16] (CR) Krinkle: [C: +2] mc-labs: Remove unused wan/purge config from Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/597662 (owner: Krinkle)
[23:46:01] (Merged) jenkins-bot: mc-labs: Remove unused wan/purge config from Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/597662 (owner: Krinkle)
[23:47:15] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/extensions/LiquidThreads/classes/Thread.php: If3418cba06e (duration: 01m 07s)
[23:47:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:31] Operations, Research: Add Git LFS support for research/wikiworkshop - https://phabricator.wikimedia.org/T252956 (bmansurov) Open→Resolved Thanks, @leila!
[23:54:29] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/includes/content/ContentHandlerFactory.php: If578893f5689 (duration: 01m 06s)
[23:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log