[03:17:15] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11888290 (Papaul) All the servers in rack 22 are connected to the new switch and all the links are up. I just tested cp4037, but all others should be online....
[03:35:40] FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:20:25] RESOLVED: SystemdUnitFailed: netbox_ganeti_ulsfo02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:34:16] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11888629 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bdfd24a0-f5cd-4c3b-945b-36deeb91ba1c) set by ayounsi@cumin1003 for 20:00:00 o...
[10:48:06] CFSSL-PKI, Infrastructure-Foundations: Establish a process to periodically upgrade the CFSSL infrastructure - https://phabricator.wikimedia.org/T365361#11889062 (elukey) Open→Resolved a:elukey We are using T416664 to upgrade the current pki nodes to the latest upstream, 1.6.5. This seems...
[11:38:10] I'm rebooting the etcd nodes for the aux k8s cluster to move them to 2G RAM for T422596, should cause no issues, but if you notice anything let me know
[11:38:11] T422596: Failing Trixie VM installations on routed Ganeti - https://phabricator.wikimedia.org/T422596
[13:29:46] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11889565 (Papaul) Open→Resolved We can close this
[16:37:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:57:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:37:56] FIRING: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:42:56] RESOLVED: [2x] ProbeDown: Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[18:13:34] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891287 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie
[18:26:00] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891333 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie executed with...
[18:30:35] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891359 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie
[18:44:42] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891449 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie executed with...
[18:49:38] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891474 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie
[18:55:11] netops, Infrastructure-Foundations, SRE: Packet loss on eqsin OOB CCT via IPv6 - https://phabricator.wikimedia.org/T425471 (cmooney) NEW p:Triage→Medium
[19:04:32] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891558 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie executed with...
[19:15:32] netops, Infrastructure-Foundations, SRE: Packet loss on eqsin OOB CCT via IPv6 - https://phabricator.wikimedia.org/T425471#11891576 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e811f42a-7bc8-4cee-b558-794852157c2b) set by cmooney@cumin1003 for 0:30:00 on 6 host(s) and their servi...
[19:20:25] FIRING: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:45:20] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891705 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie
[20:20:25] RESOLVED: SystemdUnitFailed: update-tails-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:41:37] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11891874 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1003 for host cp4038.ulsfo.wmnet with OS trixie completed: - c...
[22:48:39] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11892268 (Papaul) @RobH see below the list of nodes still on 10G DAC that we will need to move to 25G DAC. Can you please order 7x2m 25G DAC? Thank you...