[00:04:06] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11407956 (Papaul)
[00:09:23] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11407960 (Papaul) @RobH I update the task description with all the connections that we need for phase 1 in December. Please don't forget the Cable ID's. Please...
[08:59:10] SRE-tools, Infrastructure-Foundations, serviceops: Add a --rack flag to sre.k8s.pool-depool-node - https://phabricator.wikimedia.org/T410537#11408474 (MLechvien-WMF) a:MLechvien-WMF
[09:53:30] SRE-tools, DC-Ops, Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#11408657 (jcrespo) > The only supported/working way is to stage the firmwares manually on the cumin nodes and use those :( How?...
[10:23:53] netops, Infrastructure-Foundations, Toolforge, tools-infrastructure-team: Plan networking for Toolforge-on-Metal experiment - https://phabricator.wikimedia.org/T407140#11408747 (taavi)
[14:24:56] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098 (cmooney) NEW p:Triage→Medium
[14:25:09] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11409583 (cmooney)
[14:38:07] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11409646 (Jclark-ctr) a:Jclark-ctr Relocated sretest1006 to D8 U37.
Connected to lswtest-d8-eqiad Port 1
[14:58:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:03:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:25:43] CAS-SSO, Gerrit, Infrastructure-Foundations: Use IDP for authentication in Gerrit - https://phabricator.wikimedia.org/T147864#11409769 (hashar) It is stalled because that is merely a which list and it is definitely not a priority. Before that can be acted on there is a thorough analysis of what need...
[15:40:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11409838 (fnegri) @Jclark-ctr `clouddb10[17-20]` are now depooled, but not downtimed. Can you please downtime them yourself when you migrate them? Otherwise...
[15:48:56] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11409855 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=80e83414-993e-4a63-b612-9625174481c7) set by fnegri@cumin1003 for 2:00:00 on 4 ho...
[17:35:54] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11410416 (RobH) Day 11 Update: * 8 hosts moved, 5 remain out of 308 total hosts. * John did all the moves today working with Andrew. * Migrated 6 of the 8 W...
[17:36:23] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11410427 (RobH)
[19:21:15] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410830 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[19:54:08] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410946 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[19:56:33] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410949 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[20:37:34] FIRING: DiskSpace: Disk space serpens:9100:/ 6.433% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=serpens - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[20:37:57] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11411029 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[23:58:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Remove lvs1018 L2 link to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T405499#11411590 (VRiley-WMF) Hey @cmooney It has been reused for that purpose, however it's still being worked on to update the connection in netbox