[00:00:04] RoanKattouw, Niharika, and Urbanecm: Dear deployers, time to do the Evening backport window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T0000). [00:00:05] No GERRIT patches in the queue for this window AFAICS. [00:01:02] (03CR) 10Bstorm: [C: 03+1] "Seems legit." [puppet] - 10https://gerrit.wikimedia.org/r/651550 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [00:13:02] (03PS4) 10Dave Pifke: profiler: remove MongoDB client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621095 (https://phabricator.wikimedia.org/T180761) [00:21:26] 10Operations, 10SRE-Access-Requests: convert Maya Kampurath to full-time employee - https://phabricator.wikimedia.org/T271169 (10Dzahn) @MoritzMuehlenhoff ACK, will do. The one for "Jim Maddock" ist not extended yet though. I told them to let you know the new expiry date once they have it. [00:32:17] 10Operations, 10Inuka-Team, 10SRE-Access-Requests, 10Security-Team, 10Product-Analytics (Kanban): Provide raw KaiOSAppFeedback data to Chelsea Riley for analysis - https://phabricator.wikimedia.org/T271202 (10Dzahn) >>! In T271202#6722852, @nshahquinn-wmf wrote: >Getting and setting up private data acces... [00:34:47] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:46:06] (03PS2) 10Dzahn: openldap::management: hiera->lookup [puppet] - 10https://gerrit.wikimedia.org/r/651838 (https://phabricator.wikimedia.org/T651838) [00:47:03] (03CR) 10Dzahn: [C: 03+2] "Another one that is very simple and only applied on mwmaint* and is noop." 
[puppet] - 10https://gerrit.wikimedia.org/r/651838 (https://phabricator.wikimedia.org/T651838) (owner: 10Dzahn) [00:49:53] (03PS3) 10Dzahn: openldap::management: hiera->lookup [puppet] - 10https://gerrit.wikimedia.org/r/651838 (https://phabricator.wikimedia.org/T651838) [00:51:27] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4478389408 and 393 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:55] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 4258420904 and 406 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:53:58] (03PS1) 10Bstorm: wikireplicas: add a multiinstance role for the dedicated analytics host [puppet] - 10https://gerrit.wikimedia.org/r/654558 (https://phabricator.wikimedia.org/T269211) [00:54:33] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 102512 and 341 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:55:03] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 37512 and 371 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:58:07] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (10Dzahn) @hashar I created new buster VMs... [01:01:33] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (10Dzahn) Ah, I was just on the "next" depl... 
[01:06:59] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 181467416 and 7 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:07:26] 10Operations, 10Parsoid-Tests, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (10Dzahn) [01:07:53] PROBLEM - Postgres Replication Lag on maps2009 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 147845168 and 7 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:08:05] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 302593136 and 14 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:08:37] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 359469792 and 21 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:09:31] RECOVERY - Postgres Replication Lag on maps2009 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 65696 and 38 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:09:43] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 29744 and 50 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:09:54] 10Operations, 10Parsoid-Tests, 10serviceops, 10Parsoid (Tracking), 10Patch-For-Review: Make testreduce web UI publicly accessible on the internet - https://phabricator.wikimedia.org/T266509 (10Dzahn) @ssastry ACK, only "rt" no "vd" needed. Adjusted the patches accordingly. Regarding the database: - I... 
[01:10:15] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 24144 and 82 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:10:15] RECOVERY - Postgres Replication Lag on maps2005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 84 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:22:15] !log testreduce1001 rm -rf /srv/deployment/parsoid/deploy [01:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:32:24] (03PS1) 10Dzahn: parsoid::testing: switch db_host from m5-master to localhost [puppet] - 10https://gerrit.wikimedia.org/r/654565 (https://phabricator.wikimedia.org/T266509) [02:48:01] (03PS1) 10Andrew Bogott: nova vendordata.txt: move some repo and package actions into cloud-init rules [puppet] - 10https://gerrit.wikimedia.org/r/654569 (https://phabricator.wikimedia.org/T271273) [03:29:26] (03PS2) 10Andrew Bogott: nova vendordata.txt: move some repo and package actions into cloud-init rules [puppet] - 10https://gerrit.wikimedia.org/r/654569 (https://phabricator.wikimedia.org/T271273) [03:53:55] PROBLEM - Check systemd state on ms-be2051 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:07:20] (03CR) 10Andrew Bogott: [C: 03+2] nova vendordata.txt: move some repo and package actions into cloud-init rules [puppet] - 10https://gerrit.wikimedia.org/r/654569 (https://phabricator.wikimedia.org/T271273) (owner: 10Andrew Bogott) [04:11:17] RECOVERY - Check systemd state on ms-be2051 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:16:51] (03PS1) 10Andrew Bogott: nova-api: restart service if vendor_data.json changes [puppet] - 10https://gerrit.wikimedia.org/r/654571 (https://phabricator.wikimedia.org/T271273) [04:17:42] (03CR) 10Andrew Bogott: [C: 03+2] nova-api: restart service if vendor_data.json changes [puppet] - 10https://gerrit.wikimedia.org/r/654571 (https://phabricator.wikimedia.org/T271273) (owner: 10Andrew Bogott) [04:39:42] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:56:16] RECOVERY - exim queue on mx1001 is OK: OK: Less than 2000 mails in exim queue. https://wikitech.wikimedia.org/wiki/Exim [06:59:30] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:01:10] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [07:17:23] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/654515 (owner: 10CRusnov) [07:30:29] (03CR) 10Muehlenhoff: [C: 03+2] Enable base::service_auto_restart for Apache on Netmon [puppet] - 10https://gerrit.wikimedia.org/r/654429 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [08:29:02] (03CR) 10David Caro: [C: 03+2] wmcs.backup: Add a images summary command [puppet] - 10https://gerrit.wikimedia.org/r/651166 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:29:05] (03CR) 10David Caro: [C: 03+2] wmcs.backup: Add a method to create a vm backup [puppet] - 10https://gerrit.wikimedia.org/r/651507 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:29:35] (03CR) 10David Caro: [C: 03+2] wmcs.backup: Remove all dangling snapshots [puppet] - 10https://gerrit.wikimedia.org/r/651537 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:29:45] (03CR) 10David Caro: [C: 03+2] wmcs.backup: Add a way to remove old backups and snapshots [puppet] - 10https://gerrit.wikimedia.org/r/651550 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:30:07] (03CR) 10David Caro: [C: 03+2] wmcs.backup: Add command to backup all assigned vms [puppet] - 10https://gerrit.wikimedia.org/r/651761 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:30:21] (03CR) 10David Caro: [C: 03+2] wmcs.backup: add a command to remove non-handled backups [puppet] - 10https://gerrit.wikimedia.org/r/651776 (https://phabricator.wikimedia.org/T267195) (owner: 10David Caro) [08:35:48] (03CR) 10JMeybohm: [C: 03+1] Redirect top level URL to https://dockerregistry.toolforge.org/ [puppet] - 10https://gerrit.wikimedia.org/r/650215 (https://phabricator.wikimedia.org/T179696) (owner: 10Ahmon Dancy) [08:40:58] !log installing Linux 4.9.246 on 
stretch hosts (no reboots yet) [08:41:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:16] (03CR) 10JMeybohm: [C: 04-1] Add new service eventstreams-internal (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/644612 (https://phabricator.wikimedia.org/T269160) (owner: 10Ottomata) [08:49:36] PROBLEM - exim queue on mx1001 is CRITICAL: CRITICAL: 4213 mails in exim queue. https://wikitech.wikimedia.org/wiki/Exim [08:56:05] (03PS4) 10David Caro: wmcs.backup: Add a command to create the next backup [puppet] - 10https://gerrit.wikimedia.org/r/654220 (https://phabricator.wikimedia.org/T267195) [08:56:07] (03PS4) 10David Caro: wmcs.backup: Add host to the rbd snapshot name [puppet] - 10https://gerrit.wikimedia.org/r/654221 (https://phabricator.wikimedia.org/T267195) [08:56:09] (03PS4) 10David Caro: wmcs.backup: Add backup_image command [puppet] - 10https://gerrit.wikimedia.org/r/654266 (https://phabricator.wikimedia.org/T270478) [09:00:26] (03Abandoned) 10David Caro: wmcs.backup: blacked all files [puppet] - 10https://gerrit.wikimedia.org/r/654267 (owner: 10David Caro) [09:01:32] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Migrate WDQS to Debian Buster - https://phabricator.wikimedia.org/T244753 (10Gehel) 05Open→03Resolved [09:01:38] 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10Gehel) [09:45:20] PROBLEM - Host db2140 is DOWN: PING CRITICAL - Packet loss = 100% [10:08:40] (03PS1) 10ZPapierski: Fix deployment for internal wdqs hosts [puppet] - 10https://gerrit.wikimedia.org/r/654597 [10:09:18] (03PS1) 10Gehel: wdqs: the wdqs DSH group should contain all wdqs servers [puppet] - 10https://gerrit.wikimedia.org/r/654598 [10:10:34] (03CR) 10ZPapierski: [C: 03+1] wdqs: the wdqs DSH group should contain all wdqs servers [puppet] - 10https://gerrit.wikimedia.org/r/654598 (owner: 10Gehel) [10:10:49] (03CR) 
10Gehel: [C: 03+2] wdqs: the wdqs DSH group should contain all wdqs servers [puppet] - 10https://gerrit.wikimedia.org/r/654598 (owner: 10Gehel) [10:21:44] 10Operations, 10Abstract Wikipedia, 10LDAP-Access-Requests: Grant Access to ldap/wmf for Cory Massaro - https://phabricator.wikimedia.org/T271245 (10Aklapper) Hi and welcome! > Purpose: Gerrit access Anyone [can self-create an account to get Gerrit access](https://www.mediawiki.org/wiki/Developer_account),... [10:29:10] 10Operations, 10ops-codfw, 10DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (10MoritzMuehlenhoff) 05Resolved→03Open [10:29:13] 10Operations, 10ops-codfw, 10DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (10MoritzMuehlenhoff) The server went down again: ` Record: 1 Date/Time: 01/04/2021 15:42:11 Source: system Severity: Ok Description: Log cleared. ----------------------------... [10:38:15] 10Operations, 10ops-eqiad: Interface errors between pfw3a-eqiad and fasw-c1a-eqiad - https://phabricator.wikimedia.org/T271295 (10ayounsi) p:05Triage→03High [10:38:55] !log depooling db2140 T271084 [10:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:00] T271084: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 [10:40:30] !log jmm@cumin2001 dbctl commit (dc=all): 'Depool db2140', diff saved to https://phabricator.wikimedia.org/P13658 and previous config saved to /var/cache/conftool/dbconfig/20210106-104029-jmm.json [10:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:37] 10Operations, 10ops-codfw, 10DBA: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (10MoritzMuehlenhoff) Given the flaky state of the hardware I'm not powering up the server again, db2140 was depooled with dbctl. @papaul: Since the faulty DIMM moved around we should be el... 
[10:47:03] 10Operations, 10netops, 10observability: Add Icinga check for SRX cluster status - https://phabricator.wikimedia.org/T271298 (10ayounsi) p:05Triage→03Medium [11:10:48] (03PS3) 10Urbanecm: Add uz.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987) [11:14:53] !log remove cloudceph2002-dev.wikimedia.org and cloudceph2003-dev.wikimedia.org from debmonitor (got reinstalled as .wmnet) [11:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:41] !log installing libjpeg-turbo security updates on buster [11:18:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:14] (03CR) 10Klausman: [C: 03+2] cache: Migrate hiera() to lookup() and setting datatype [puppet] - 10https://gerrit.wikimedia.org/r/651640 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [11:33:04] (03CR) 10Klausman: [C: 03+2] hive: Migrate hiera() to lookup() and setting datatype in metastore [puppet] - 10https://gerrit.wikimedia.org/r/654521 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [11:33:07] !log installing ruby2.5 security updates [11:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: May I have your attention please! European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1200) [12:00:04] Jayprakash12345: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:16] o/ [12:01:01] Jayprakash12345: Hi, I can deploy this. [12:01:13] awight: okay, I'll let you do this :) [12:01:21] :-) [12:01:30] awight: Please go ahead :) [12:04:24] (03CR) 10Awight: [C: 03+2] "Config deployment." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/654005 (https://phabricator.wikimedia.org/T270864) (owner: 10Jayprakash12345) [12:05:17] (03Merged) 10jenkins-bot: Create rollbacker group on mrwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654005 (https://phabricator.wikimedia.org/T270864) (owner: 10Jayprakash12345) [12:06:49] I have enabled mwdebug1002, let me know when to test. [12:07:23] Thanks, I'm looking at some other pending changes which haven't been deployed yet. [12:07:41] okay, they were labs-only. [12:07:53] meh, people not updating deploy1001 :( [12:09:33] Jayprakash12345: The change is deployed to mwdebug1002. [12:09:51] awight: Thanks, checking ... [12:10:28] awight: Looks good to me, please deploy it [12:10:32] ack [12:11:56] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:654005|Create rollbacker group on mrwiki (T270864)]] (duration: 01m 21s) [12:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:01] T270864: Create Rollbackers usergroup on Marathi Wikipedia - https://phabricator.wikimedia.org/T270864 [12:21:13] 10Operations, 10serviceops, 10cloud-services-team (Kanban): Upgrade labweb servers to buster - https://phabricator.wikimedia.org/T269004 (10MoritzMuehlenhoff) >>! In T269004#6720008, @Andrew wrote: > @jijiki, my tests suggest that this upgrade will go smoothly. If you judge MW to be mostly ready for Buster t... 
[12:38:54] (03PS3) 10WMDE-Fisch: Migrate TemplateWizard to full "new" events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650093 (https://phabricator.wikimedia.org/T238230) (owner: 10Awight) [12:39:04] (03CR) 10jenkins-bot: [V: 04-1] Migrate TemplateWizard to full "new" events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650093 (https://phabricator.wikimedia.org/T238230) (owner: 10Awight) [12:40:12] (03PS4) 10WMDE-Fisch: Migrate TemplateWizard to full "new" events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650093 (https://phabricator.wikimedia.org/T238230) (owner: 10Awight) [12:40:23] (03CR) 10jenkins-bot: [V: 04-1] Migrate TemplateWizard to full "new" events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/650093 (https://phabricator.wikimedia.org/T238230) (owner: 10Awight) [13:03:47] !log installing tcpdump security updates [13:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:57] (03PS1) 10Gehel: wdqs: fix DSH group for WDQS [puppet] - 10https://gerrit.wikimedia.org/r/654606 [13:05:04] (03CR) 10ZPapierski: [C: 03+1] wdqs: fix DSH group for WDQS [puppet] - 10https://gerrit.wikimedia.org/r/654606 (owner: 10Gehel) [13:05:21] (03CR) 10Gehel: [C: 03+2] wdqs: fix DSH group for WDQS [puppet] - 10https://gerrit.wikimedia.org/r/654606 (owner: 10Gehel) [13:08:31] 10Operations: Integrate Buster 10.7 point update - https://phabricator.wikimedia.org/T269558 (10MoritzMuehlenhoff) [13:09:06] (03PS1) 10Klausman: Add tentative recipe for ml-serve machines [puppet] - 10https://gerrit.wikimedia.org/r/654609 [13:11:46] (03CR) 10Klausman: [C: 03+2] Add tentative recipe for ml-serve machines [puppet] - 10https://gerrit.wikimedia.org/r/654609 (owner: 10Klausman) [13:17:31] (03CR) 10Muehlenhoff: [C: 03+2] Add some more Cumin aliases [puppet] - 10https://gerrit.wikimedia.org/r/654440 (owner: 10Muehlenhoff) [13:20:34] (03PS1) 10Klausman: Whitespace fixes and add missing backslash [puppet] - 
10https://gerrit.wikimedia.org/r/654611 [13:21:05] (03PS2) 10Klausman: Whitespace fixes and add missing backslash [puppet] - 10https://gerrit.wikimedia.org/r/654611 [13:30:01] 10Operations, 10Traffic: ats-be occasional system CPU usage increase - https://phabricator.wikimedia.org/T265625 (10ema) It turns out that `malloc(3)` does not really say the whole truth: the threshold for choosing when to use mmap vs brk is dynamic, and not hardcoded to 128K. For instance I found values of ~4... [13:54:44] 10Operations, 10serviceops, 10Wikimedia-production-error: PHP7 corruption reports in 2020-2021 (Call on wrong object, etc.) - https://phabricator.wikimedia.org/T245183 (10jijiki) >>! In T245183#6724069, @hashar wrote: > That happened during 1.36.0-wmf.25 promotion to testwiki. We then had three servers showi... [14:00:04] longma and hashar: My dear minions, it's time we take the moon! Just kidding. Time for Mediawiki train - American+European Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1400). 
[14:01:50] PROBLEM - Host kafka-jumbo1009.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:05:35] (03CR) 10Dzahn: "thanks for these :)" [puppet] - 10https://gerrit.wikimedia.org/r/654521 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [14:06:46] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:07:24] RECOVERY - Host kafka-jumbo1009.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms [14:15:34] (03CR) 10Ottomata: Add new service eventstreams-internal (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/644612 (https://phabricator.wikimedia.org/T269160) (owner: 10Ottomata) [14:15:39] (03PS5) 10Ottomata: Add new service eventstreams-internal [deployment-charts] - 10https://gerrit.wikimedia.org/r/644612 (https://phabricator.wikimedia.org/T269160) [14:17:26] (03CR) 10Ladsgroup: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/654521 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [14:20:38] (03CR) 10Klausman: [C: 03+2] Whitespace fixes and add missing backslash [puppet] - 10https://gerrit.wikimedia.org/r/654611 (owner: 10Klausman) [14:26:32] PROBLEM - ElasticSearch health check for shards on 9200 on logstash1009 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7fd192472518: Failed to establish a new connection: [Errno 111] Connection [14:26:32] ://wikitech.wikimedia.org/wiki/Search%23Administration [14:26:52] PROBLEM - Check systemd state on logstash1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:37:24] (03PS1) 10JMeybohm: Update to v3.17.1 [debs/calico] - 10https://gerrit.wikimedia.org/r/654620 [14:42:32] PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:48:38] RECOVERY - k8s API server requests latencies on kubestagemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:48:41] 10Operations, 10Project-Admins: Rename #Operations Phab project to #SRE - https://phabricator.wikimedia.org/T258305 (10Aklapper) [14:55:06] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [14:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:18] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [14:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:55] RECOVERY - Check systemd state on logstash1009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:01:24] !log installing p11-kit security updates on stretch [15:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:04] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single [15:02:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:03] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [15:04:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:21] RECOVERY - ElasticSearch health check for shards on 9200 on logstash1009 is OK: OK - elasticsearch status production-logstash-eqiad: cluster_name: production-logstash-eqiad, active_primary_shards: 483, timed_out: False, 
delayed_unassigned_shards: 0, status: green, initializing_shards: 0, number_of_data_nodes: 3, number_of_pending_tasks: 0, number_of_nodes: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, acti [15:04:21] _as_number: 100.0, relocating_shards: 0, unassigned_shards: 0, active_shards: 916 https://wikitech.wikimedia.org/wiki/Search%23Administration [15:08:11] (03PS1) 10Klausman: netboot: Fix missing ;; [puppet] - 10https://gerrit.wikimedia.org/r/654628 [15:09:49] (03CR) 10Klausman: [C: 03+2] netboot: Fix missing ;; [puppet] - 10https://gerrit.wikimedia.org/r/654628 (owner: 10Klausman) [15:17:44] (03PS1) 10Klausman: netboot: Fix missing 'r' [puppet] - 10https://gerrit.wikimedia.org/r/654630 [15:18:26] (03CR) 10Klausman: [C: 03+2] netboot: Fix missing 'r' [puppet] - 10https://gerrit.wikimedia.org/r/654630 (owner: 10Klausman) [15:20:13] !log restarting FPM/Apache on mw canaries to pick up p11-kit update [15:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:42] (03PS2) 10JMeybohm: Update to v3.17.1 [debs/calico] - 10https://gerrit.wikimedia.org/r/654620 [15:22:00] (03CR) 10JMeybohm: [V: 03+2 C: 03+2] Update to v3.17.1 [debs/calico] - 10https://gerrit.wikimedia.org/r/654620 (owner: 10JMeybohm) [15:22:40] (03PS1) 10JMeybohm: admin_ng: Set global calico version to 3.17.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/654631 [15:26:24] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:26:37] (03PS1) 10Klausman: netboot: Fix missing 'custom/' [puppet] - 10https://gerrit.wikimedia.org/r/654632 [15:27:16] (03CR) 10Klausman: [C: 03+2] netboot: Fix missing 'custom/' [puppet] - 10https://gerrit.wikimedia.org/r/654632 (owner: 10Klausman) [15:27:30] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics 
within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:28:01] !log imported calico 3.17.1-1 to component/calico-future stretch-wikimedia [15:28:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:37] (03PS1) 10WMDE-Fisch: Enable bracket matching on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654633 (https://phabricator.wikimedia.org/T271293) [15:40:36] (03CR) 10Bstorm: "I *think*, as long as the resulting configs are not invalid, this should be safe enough to merge at this point because it should not touch" [puppet] - 10https://gerrit.wikimedia.org/r/627379 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm) [15:42:20] (03PS3) 10Clarakosi: Remove OAuth experimental routes from beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644553 (https://phabricator.wikimedia.org/T262495) [15:44:22] (03CR) 10Clarakosi: [C: 03+1] "Other changes have landed." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/644553 (https://phabricator.wikimedia.org/T262495) (owner: 10Clarakosi) [15:44:24] (03PS1) 10Ladsgroup: eventschemas: Migrate hiera() to lookup() and setting datatype [puppet] - 10https://gerrit.wikimedia.org/r/654635 (https://phabricator.wikimedia.org/T209953) [15:47:27] (03CR) 10Ladsgroup: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/27363/" [puppet] - 10https://gerrit.wikimedia.org/r/654635 (https://phabricator.wikimedia.org/T209953) (owner: 10Ladsgroup) [15:54:09] (03PS1) 10Ppchelko: Return back accidentally removed ParserCache 'hit' metric [core] (wmf/1.36.0-wmf.25) - 10https://gerrit.wikimedia.org/r/654456 [15:54:12] PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:54:14] !log installing openexr security updates [15:54:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:24] (03CR) 10Razzi: [C: 03+2] Set fs.permissions.umask-mode for the Hadoop cluster [puppet] - 10https://gerrit.wikimedia.org/r/654187 (https://phabricator.wikimedia.org/T270629) (owner: 10Elukey) [15:57:18] (03CR) 10JMeybohm: [C: 03+2] admin_ng: Set global calico version to 3.17.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/654631 (owner: 10JMeybohm) [15:57:28] RECOVERY - k8s API server requests latencies on kubestagemaster2001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:58:41] (03Merged) 10jenkins-bot: admin_ng: Set global calico version to 3.17.1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/654631 (owner: 10JMeybohm) [16:01:37] !log installing cups security updates on buster (client-side tools/libs) [16:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:33] !log razzi@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters [16:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:25] (03PS1) 10JMeybohm: Check for already published images before pushing [debs/calico] - 10https://gerrit.wikimedia.org/r/654637 [16:15:12] !log jayme@deploy1001 helmfile [staging-codfw] START helmfile.d/admin 'sync'. [16:15:12] !log jayme@deploy1001 helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [16:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:41] ACKNOWLEDGEMENT - SSH on db2140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Muehlenhoff T271084 https://wikitech.wikimedia.org/wiki/SSH/monitoring [16:15:41] ACKNOWLEDGEMENT - Host db2140 is DOWN: PING CRITICAL - Packet loss = 100% Muehlenhoff T271084 [16:18:04] 10ops-codfw, 10SRE, 10SRE-swift-storage: Degraded RAID on ms-be2055 - https://phabricator.wikimedia.org/T271055 (10Papaul) a:05Papaul→03fgiunchedi @fgiunchedi disk replaced [16:22:04] RECOVERY - HP RAID on ms-be2055 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 3I:3:1, 3I:3:2, 3I:3:3, 3I:3:4, 4I:5:1, 4I:5:2 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [16:25:52] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter 
site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:27:24] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:28:20] (03PS1) 10Effie Mouzeli: hiera: upgrade mc1026, mc2026 to buster [puppet] - 10https://gerrit.wikimedia.org/r/654639 (https://phabricator.wikimedia.org/T213089) [16:31:32] !log jayme@deploy1001 helmfile [staging-codfw] START helmfile.d/admin 'sync'. [16:31:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:50] !log jayme@deploy1001 helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [16:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:05] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: upgrade mc1026, mc2026 to buster [puppet] - 10https://gerrit.wikimedia.org/r/654639 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [16:42:47] !log razzi@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) [16:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:58] PROBLEM - MediaWiki memcached error rate on alert1001 is CRITICAL: 5168 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:46:36] RECOVERY - MediaWiki memcached error rate on alert1001 is OK: (C)5000 gt (W)1000 gt 14 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:51:47] ~. 
[16:56:30] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1026.eqiad.wmnet with reason: REIMAGE [16:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:20] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2026.codfw.wmnet with reason: REIMAGE [16:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:34] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1026.eqiad.wmnet with reason: REIMAGE [16:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:40] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2026.codfw.wmnet with reason: REIMAGE [17:00:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:45] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10calbon) @Jclark-ctr Thanks for the photos. For clarification, does this mean the GPU does not fit or that the GPU does fit but any future GPU's order should be smaller in dimensions? [17:24:50] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10Jclark-ctr) GPU does not fit. [17:26:24] (03PS1) 10Ottomata: Adjust refine_event memory and parallelism [puppet] - 10https://gerrit.wikimedia.org/r/654650 [17:26:56] !log depooling labweb1002 for rebuild [17:26:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:05] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10calbon) @Jclark-ctr Got it. Thanks. @RobH @wiki_willy Can we set up a call to discuss next steps? [17:30:15] (03CR) 10Ayounsi: [C: 03+1] "Overall LGTM, but I don't know logstash enough to review it in detail."
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/647029 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [17:31:07] (03CR) 10Ayounsi: [C: 03+1] profile: update netdev rsyslog template to ecs 1.7.0 [puppet] - 10https://gerrit.wikimedia.org/r/647032 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [17:33:07] (03CR) 10Cicalese: [C: 03+2] Remove OAuth experimental routes from beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644553 (https://phabricator.wikimedia.org/T262495) (owner: 10Clarakosi) [17:33:57] (03Merged) 10jenkins-bot: Remove OAuth experimental routes from beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644553 (https://phabricator.wikimedia.org/T262495) (owner: 10Clarakosi) [17:42:21] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) >>! In T267050#6726182, @calbon wrote: > @Jclark-ctr Got it. Thanks. > > @RobH @wiki_willy Can we set up a call to discuss next steps? Sure. Additionally, my understanding... [17:51:32] !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE [17:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:33] !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE [17:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:02] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) Additionally I've created T271351 to determine if there is another version of this GPU chipset via Rahi that may have a differing fan/heatsink assembly that would suit our nee... 
[17:55:07] (03PS1) 10Andrew Bogott: cloudweb mcrouter config: use AllFastestRoute for write actions [puppet] - 10https://gerrit.wikimedia.org/r/654654 (https://phabricator.wikimedia.org/T271349) [17:56:57] (03CR) 10Andrew Bogott: [C: 03+2] cloudweb mcrouter config: use AllFastestRoute for write actions [puppet] - 10https://gerrit.wikimedia.org/r/654654 (https://phabricator.wikimedia.org/T271349) (owner: 10Andrew Bogott) [18:04:59] (03PS1) 10Andrew Bogott: cloudweb mcrouter config: use LatestRoute for read actions [puppet] - 10https://gerrit.wikimedia.org/r/654655 (https://phabricator.wikimedia.org/T271349) [18:05:13] (03PS2) 10Andrew Bogott: cloudweb mcrouter config: use LatestRoute for read actions [puppet] - 10https://gerrit.wikimedia.org/r/654655 (https://phabricator.wikimedia.org/T271349) [18:08:11] (03CR) 10Andrew Bogott: [C: 03+2] cloudweb mcrouter config: use LatestRoute for read actions [puppet] - 10https://gerrit.wikimedia.org/r/654655 (https://phabricator.wikimedia.org/T271349) (owner: 10Andrew Bogott) [18:14:04] (03CR) 10Dzahn: [C: 03+2] "https://meta.wikimedia.org/wiki/User:MKaur_(WMF)" [dns] - 10https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987) (owner: 10Urbanecm) [18:14:46] !log creating uz.wikimedia.org - Uzbek language User Group - https://meta.wikimedia.org/wiki/Affiliations_Committee/Resolutions/Wikimedians_of_the_Uzbek_language_User_Group [18:14:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:51] (03CR) 10Dzahn: "uz.wikimedia.org is an alias for dyna.wikimedia.org." 
[dns] - 10https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987) (owner: 10Urbanecm) [18:22:59] (03PS2) 10Dzahn: etcd::v3: hiera->lookup [puppet] - 10https://gerrit.wikimedia.org/r/651834 (https://phabricator.wikimedia.org/T209953) [18:24:12] (03CR) 10Dzahn: [C: 04-1] "I would prefer if we skip this step and go straight to https://gerrit.wikimedia.org/r/c/operations/puppet/+/649752 and then https://gerrit" [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar) [18:26:26] (03PS1) 10Urbanecm: add uk.wikimedia.org to wikimedia-chapter [puppet] - 10https://gerrit.wikimedia.org/r/654659 (https://phabricator.wikimedia.org/T270987) [18:27:01] mutante: can you do the above too please? Forgot to submit earlier :( [18:28:52] uhm.. yes, but that will mean another global reload of appservers.. sigh [18:29:02] but we need to do it, yea [18:29:46] (03CR) 10Dzahn: [C: 04-2] "but wait, this is not Uzbek, it should be UZ" [puppet] - 10https://gerrit.wikimedia.org/r/654659 (https://phabricator.wikimedia.org/T270987) (owner: 10Urbanecm) [18:30:10] mutante: oh, did i do a mistake? 
[18:30:17] Urbanecm: uk.wikimedia.org would be https://wikimedia.org.uk/ :) [18:30:25] ah [18:30:26] typo [18:30:49] mutante: fixed [18:30:50] (03PS2) 10Urbanecm: add uz.wikimedia.org to wikimedia-chapter [puppet] - 10https://gerrit.wikimedia.org/r/654659 (https://phabricator.wikimedia.org/T270987) [18:31:38] (03CR) 10Dzahn: [C: 03+1] add uz.wikimedia.org to wikimedia-chapter [puppet] - 10https://gerrit.wikimedia.org/r/654659 (https://phabricator.wikimedia.org/T270987) (owner: 10Urbanecm) [18:32:18] (03CR) 10Dzahn: [C: 03+2] add uz.wikimedia.org to wikimedia-chapter [puppet] - 10https://gerrit.wikimedia.org/r/654659 (https://phabricator.wikimedia.org/T270987) (owner: 10Urbanecm) [18:32:26] jouncebot: now [18:32:26] No deployments scheduled for the next 0 hour(s) and 27 minute(s) [18:32:28] jouncebot: next [18:32:28] In 0 hour(s) and 27 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1900) [18:32:28] In 0 hour(s) and 27 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1900) [18:33:58] Urbanecm: you know.. we should also add that to the httpbb tests.. if all other wikimedia.org are in it [18:35:24] mutante: but not all need to be [18:35:31] species.wikimedia.org is not a chapter wiki [18:37:37] Urbanecm: I don't think we need to make that distinction for the tests. there it is more to have a complete list of all the virtual hosts expected to be on appservers [18:40:35] Urbanecm: but.. it's not that we list all the existing aliases now.. that's what I meant "if".. just checked [18:40:46] got it [18:41:09] (03PS1) 10Ahmon Dancy: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config into train-dev [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/654661 [18:43:16] Urbanecm: by the way, this is how this works. 
confirmed on mwdebug1001 as an example: [18:43:52] a yaml file with this: [18:43:53] http://uz.wikimedia.org: [18:43:53] - path: / assert_status: 302 [18:44:09] [deploy1001:~] $ httpbb --hosts mwdebug1001.eqiad.wmnet ~/test_uz.yaml [18:44:12] Sending to mwdebug1001.eqiad.wmnet... [18:44:14] PASS: 1 request sent to mwdebug1001.eqiad.wmnet. All assertions passed. [18:49:14] (03PS1) 10Hnowlan: tegola: Add docker image. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) [18:52:13] (03PS1) 10Andrew Bogott: Striker: replace requirement with libmariadb3 [puppet] - 10https://gerrit.wikimedia.org/r/654663 (https://phabricator.wikimedia.org/T269004) [18:53:01] (03CR) 10Andrew Bogott: [C: 03+2] Striker: replace requirement with libmariadb3 [puppet] - 10https://gerrit.wikimedia.org/r/654663 (https://phabricator.wikimedia.org/T269004) (owner: 10Andrew Bogott) [18:56:35] (03CR) 10Ahmon Dancy: [C: 03+2] Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config into train-dev [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/654661 (owner: 10Ahmon Dancy) [18:57:26] (03Merged) 10jenkins-bot: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config into train-dev [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/654661 (owner: 10Ahmon Dancy) [19:00:04] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1900). [19:00:04] dpifke: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:04] longma and hashar: How many deployers does it take to do Train log triage with CPT deploy? 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T1900). [19:00:22] dpifke: I guess you'll self-service? [19:00:38] Yup, but need a minute or two to verify the correct way to roll this out. Fighting breakage in beta right now. [19:00:58] sure, thanks [19:05:02] (03CR) 10Dzahn: "@Muehlenhoff Thoughts on www-data owning a file in /etc ?" [puppet] - 10https://gerrit.wikimedia.org/r/606824 (owner: 10Ladsgroup) [19:09:01] OK, I'm fairly certain the problem was with the beta environment, not with the scap commands I was running. Proceeding with backport. [19:10:18] (03CR) 10Dave Pifke: [C: 03+2] Remove Excimer single-shot profiling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/651267 (owner: 10Ori.livneh) [19:11:08] (03Merged) 10jenkins-bot: Remove Excimer single-shot profiling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/651267 (owner: 10Ori.livneh) [19:11:24] (03CR) 10Dave Pifke: [C: 03+2] profiler: remove MongoDB client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621095 (https://phabricator.wikimedia.org/T180761) (owner: 10Dave Pifke) [19:11:36] (03PS5) 10Dave Pifke: profiler: remove MongoDB client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621095 (https://phabricator.wikimedia.org/T180761) [19:11:49] (03CR) 10Dave Pifke: [V: 03+2 C: 03+2] profiler: remove MongoDB client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621095 (https://phabricator.wikimedia.org/T180761) (owner: 10Dave Pifke) [19:12:41] (03Merged) 10jenkins-bot: profiler: remove MongoDB client [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621095 (https://phabricator.wikimedia.org/T180761) (owner: 10Dave Pifke) [19:14:23] dpifke: Hurrah for getting rid of Mongo finally. :_) [19:14:51] Yeah, looking forward to it being deleted not just unused. 
:) [19:17:17] !log dpifke@deploy1001 Synchronized wmf-config/profiler.php: Removing unused profiler code (duration: 01m 08s) [19:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:39] !log dpifke@deploy1001 Synchronized wmf-config/PhpAutoPrepend.php: Removing unused profiler code (duration: 01m 04s) [19:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:38] !log andrew@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE [19:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:21] !log dpifke@deploy1001 Synchronized lib: Removing unused profiler libraries (duration: 01m 03s) [19:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:36] !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE [19:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:26] !log Morning backport window complete. [19:25:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:17] (03PS3) 10Dave Pifke: mediawiki: remove mongodb PHP extension from appservers [puppet] - 10https://gerrit.wikimedia.org/r/620729 (https://phabricator.wikimedia.org/T180761) (owner: 10Dzahn) [19:37:08] (03PS6) 10Ladsgroup: mailman3: Start mailman3 [puppet] - 10https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536) [19:37:58] (03CR) 10Ladsgroup: mailman3: Start mailman3 (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/608163 (https://phabricator.wikimedia.org/T256536) (owner: 10Ladsgroup) [19:43:55] (03CR) 10Dave Pifke: [C: 03+1] "Sorry it took me a while to get back to this. LGTM now." 
[puppet] - 10https://gerrit.wikimedia.org/r/620729 (https://phabricator.wikimedia.org/T180761) (owner: 10Dzahn) [19:47:13] (03CR) 10Legoktm: [C: 04-1] "$wmgElectronSecret should also be removed from private/readme.php (and then afterward from PrivateSettings.php, but that should be done af" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/634935 (owner: 10Giuseppe Lavagetto) [19:51:32] (03PS1) 10Bstorm: maintain-meta_p: simplify getting a str from requests module [puppet] - 10https://gerrit.wikimedia.org/r/654667 (https://phabricator.wikimedia.org/T270839) [20:00:04] longma and hashar: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Mediawiki train - American+European Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T2000). [20:00:29] (03CR) 10Andrew Bogott: [C: 03+1] maintain-meta_p: simplify getting a str from requests module [puppet] - 10https://gerrit.wikimedia.org/r/654667 (https://phabricator.wikimedia.org/T270839) (owner: 10Bstorm) [20:01:09] (03CR) 10Bstorm: [C: 03+2] maintain-meta_p: simplify getting a str from requests module [puppet] - 10https://gerrit.wikimedia.org/r/654667 (https://phabricator.wikimedia.org/T270839) (owner: 10Bstorm) [20:01:13] I'll be doing a backport and deploying to group0. 
If there are no issues I'll then deploy to group1 [20:03:12] (03CR) 10Jeena Huneidi: [C: 03+2] "Backport" [core] (wmf/1.36.0-wmf.25) - 10https://gerrit.wikimedia.org/r/654456 (owner: 10Ppchelko) [20:26:44] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy [20:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:51] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 00m 08s) [20:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:18] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: (no justification provided) [20:27:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:27:23] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: (no justification provided) (duration: 00m 05s) [20:27:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:31] (03Merged) 10jenkins-bot: Return back accidentally removed ParserCache 'hit' metric [core] (wmf/1.36.0-wmf.25) - 10https://gerrit.wikimedia.org/r/654456 (owner: 10Ppchelko) [20:38:43] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster [20:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:48] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 00m 05s) [20:38:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:20] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster [20:40:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:27] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 00m 07s) [20:40:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[20:40:53] !log andrew@deploy1001 Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster [20:40:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:23] !log andrew@deploy1001 Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 02m 30s) [20:43:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:55:01] deployment server is in a weird state following morning backport window... [20:55:08] trying to figure out what's deployed and what's not [20:58:17] !log andrew@deploy1001 Started deploy [horizon/deploy@965995d]: update codfw1dev deploy from train-buster branch [20:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:00] oh! I think I see what happened. Something merged for labs-only [21:00:04] chrisalbon and accraze: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210106T2100). [21:00:08] !log andrew@deploy1001 Finished deploy [horizon/deploy@965995d]: update codfw1dev deploy from train-buster branch (duration: 01m 51s) [21:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:14] and that wasn't deployed, but it should affect prod [21:01:04] you mean it should be deployed to prod? 
[21:01:37] I mean that it got merged to master, but doesn't really need a deployment to production (since it's a labs-only change) [21:01:47] longma: you should be clear now [21:02:06] as long as /srv/mediawiki-staging looks like it's supposed to anyway [21:02:08] ah ok, I didn't understand this part of what you said :"but it should affect prod" [21:02:30] bah, I meant to say "shouldn't" [21:02:33] yeah it looks clean now, thanks [21:03:34] sure thing, git riddle of the day [21:03:55] :) [21:04:12] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:04:19] !log andrew@deploy1001 deploy aborted: update codfw1dev deploy from train-buster branch (duration: 00m 07s) [21:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:37] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:43] !log andrew@deploy1001 deploy aborted: update codfw1dev deploy from train-buster branch (duration: 00m 07s) [21:04:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:51] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:53] (03CR) 10Ori.livneh: "Thanks, Dave :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/651267 (owner: 10Ori.livneh) [21:08:06] (03PS1) 10Jeena Huneidi: group0 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654673 [21:08:08] (03CR) 10Jeena Huneidi: [C: 03+2] group0 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654673 (owner: 10Jeena Huneidi) [21:08:56] 
(03Merged) 10jenkins-bot: group0 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654673 (owner: 10Jeena Huneidi) [21:10:27] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.25 refs T267418 [21:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:34] T267418: 1.36.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T267418 [21:13:32] !log andrew@deploy1001 Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 08m 42s) [21:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:19] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:14:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:14:25] !log andrew@deploy1001 Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 05s) [21:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:15] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:20] !log andrew@deploy1001 Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 05s) [21:15:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:23] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:30] !log andrew@deploy1001 Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 06s) [21:16:32] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:38] !log andrew@deploy1001 Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch [21:16:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:40] deploying to group1 now [21:20:43] (03PS1) 10Jeena Huneidi: group1 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654674 [21:20:45] (03CR) 10Jeena Huneidi: [C: 03+2] group1 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654674 (owner: 10Jeena Huneidi) [21:21:31] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.25 refs T267418 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654674 (owner: 10Jeena Huneidi) [21:22:54] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.25 refs T267418 [21:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:00] T267418: 1.36.0-wmf.25 deployment blockers - https://phabricator.wikimedia.org/T267418 [21:23:59] !log jhuneidi@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.25 refs T267418 (duration: 01m 04s) [21:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:25:42] !log andrew@deploy1001 Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 09m 04s) [21:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:28:55] (03CR) 10Bartosz Dziewoński: "`false` and not `'unavailable'`?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/654520 (owner: 10Esanders) [21:33:33] (03PS1) 10Dzahn: snapshot: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654675 (https://phabricator.wikimedia.org/T266479) [21:34:00] (03CR) 10jerkins-bot: [V: 04-1] snapshot: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654675 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [21:44:44] (03CR) 10Hashar: "The goal here is to fix the canonical hostname on the Gerrit replica. We already spent a fair amount of time to polish this change to a po" [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar) [21:44:57] (03PS1) 10Dzahn: netbox: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654676 (https://phabricator.wikimedia.org/T266479) [21:47:00] (03PS1) 10Dzahn: homer: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654677 (https://phabricator.wikimedia.org/T266479) [21:53:45] (03PS1) 10Dzahn: iegreview: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654681 (https://phabricator.wikimedia.org/T266479) [22:00:13] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) Please note that after our sync up with @wiki_willy and @calbon, the followup steps are: * new GPU will be ordered for testing via T271351. ** due to having to order another te... [22:05:53] Hey all - would like to try to deploy a security patch for T270988 now.
[22:06:21] (03CR) 10SBassett: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654449 (owner: 10DannyS712) [22:09:07] (03CR) 10SBassett: [C: 03+2] Revoke `tboverride` from testwiki template editors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654449 (owner: 10DannyS712) [22:10:50] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) [22:12:18] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) This had a mistake, introduced by me in the racking task, of 10G networking. These have 1G/10G nics, but ONLY need 1G. I had updated the codfw racking task, but failed to up... [22:12:44] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10RobH) [22:13:14] (03PS4) 10SBassett: Revoke `tboverride` from testwiki template editors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654449 (owner: 10DannyS712) [22:14:33] (03CR) 10SBassett: [C: 03+2] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654449 (owner: 10DannyS712) [22:15:23] (03Merged) 10jenkins-bot: Revoke `tboverride` from testwiki template editors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654449 (owner: 10DannyS712) [22:17:12] !log forced gerrit to replicate RequestTimeout to github (`ssh gerrit.wikimedia.org replication start mediawiki/libs/RequestTimeout --wait`) [22:17:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:26:07] (03PS1) 10SBassett: Revert "Revoke `tboverride` from testwiki template editors" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654462 [22:26:17] (03CR) 10SBassett: [C: 03+2] Revert "Revoke `tboverride` from testwiki template editors" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654462 (owner: 10SBassett) [22:28:20] (03Merged) 10jenkins-bot: Revert "Revoke 
`tboverride` from testwiki template editors" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/654462 (owner: 10SBassett) [22:32:02] !log Deployed security patch for T270988 to wmf.25 [22:32:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:34:15] !log Deployed security patch for T270988 to wmf.22 [22:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:36:13] (03PS1) 10Dzahn: mail::smarthost: hiera -> lookup [puppet] - 10https://gerrit.wikimedia.org/r/654697 [22:39:03] (03CR) 10Dzahn: [C: 03+2] iegreview: require_package -> ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/654681 (https://phabricator.wikimedia.org/T266479) (owner: 10Dzahn) [22:41:01] (03CR) 10Dzahn: [C: 04-1] "The other patch fixes the same issue, as you said yourself further up. I don't see why we would have to go through it twice." [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar) [22:44:41] (03CR) 10Dzahn: [C: 04-1] "going back to hardcoded wikimedia.org domain will break cloud / devtools too" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar) [22:47:26] (03CR) 10Jdlrobson: [C: 03+1] "Can confirm that this revert is safe and the preferred rollback strategy." [core] (wmf/1.36.0-wmf.25) - 10https://gerrit.wikimedia.org/r/654461 (https://phabricator.wikimedia.org/T271365) (owner: 10Krinkle) [22:47:52] (03CR) 10Dzahn: [C: 04-1] "> Patch Set 6:" [puppet] - 10https://gerrit.wikimedia.org/r/643919 (owner: 10Hashar) [22:58:42] (03CR) 10Legoktm: [C: 04-1] "I don't think it's a good idea to redirect a *.wikimedia.org domain to a much lower security *.toolforge.org one. 
I'm working on porting t" [puppet] - 10https://gerrit.wikimedia.org/r/650215 (https://phabricator.wikimedia.org/T179696) (owner: 10Ahmon Dancy) [23:19:13] 10ops-eqiad, 10DC-Ops, 10SRE: (Need By: TBD) rack/setup/install ml-serve100[1-4] - https://phabricator.wikimedia.org/T267050 (10Jclark-ctr) Host racked, working on cabling, netbox updated. host / port / rack: ml-serve1001 / 12 / a1; ml-serve1002 / 36 / b5; ml-serve1003 / 5 / c3; ml-serve1004 / 27 / d8