[02:01:50] sukhe: Wikimedia DNS seems to start sending China to ulsfo, can you confirm that change?
[02:10:33] Ok, confirmed via dig lookup.
[02:11:11] The weird thing is that it's still eqsin for China in AuthDNS.
[04:23:09] Traffic, SRE: Anycast ns[01].wikimedia.org for IPv4 - https://phabricator.wikimedia.org/T366193#11686108 (cmooney) @ssingh in terms of the IPv6 anycast plans what is the current situation? I notice some patches like [[ https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1238015 | this one ]] have...
[09:10:30] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11686377 (ayounsi) a:RobH Rob, could you prioritize this? Thanks
[09:39:08] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11686507 (ayounsi) a:ayounsi→Papaul @papaul, can you try a factory reset of the switch from rack 23? (the one failing the TLS cookbook). I'm also still...
[09:40:15] netops, Infrastructure-Foundations, sre-alert-triage: Alert in need of triage: PeeringBGPDown (instance cr1-drmrs:9804) - https://phabricator.wikimedia.org/T416987#11686528 (ayounsi) Open→Resolved
[13:36:59] Acme-chief, Traffic, Patch-For-Review, Upstream: acme-chief is unable to validate challenges against GTS staging environment - https://phabricator.wikimedia.org/T419352#11687747 (Vgutierrez) I've tried to patch our client to skip already validated challenges but I'm running into another issue, th...
[13:40:10] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11687791 (Papaul) @ayounsi yes I can do that. Do we have like some Documentation on how to factory reset the Nokia switch somewhere or it is just "delete /"...
[14:29:10] Acme-chief, Traffic, Patch-For-Review, Upstream: acme-chief is unable to validate challenges against GTS staging environment - https://phabricator.wikimedia.org/T419352#11688026 (Vgutierrez) p:Triage→High
[14:51:09] Traffic, SRE: Image Rate Limiting Issues For Future Audiences Project - https://phabricator.wikimedia.org/T418377#11688138 (CDanis) >>! In T418377#11678861, @HSwan-WMF wrote: > Hey Chris, > > Perhaps we can talk live about this. I'm concerned about you mentioning that there will be no version of the li...
[14:59:29] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11688170 (MoritzMuehlenhoff) p:Triage→High
[15:10:35] Traffic, SRE: Image Rate Limiting Issues For Future Audiences Project - https://phabricator.wikimedia.org/T418377#11688217 (derenrich) Amazing. Thank you so much.
[15:21:06] Traffic, SRE: Image Rate Limiting Issues For Future Audiences Project - https://phabricator.wikimedia.org/T418377#11688263 (HSwan-WMF) >>! In T418377#11688138, @CDanis wrote: > Sorry for being unclear -- there's no version of the //bot// ratelimits that could yield an acceptable UX for this. Which is so...
[15:44:15] !log vgutierrez@acmechief-test2001:~$ sudo -i systemctl disable reload-acme-chief-backend.timer - T419352
[15:44:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:19] T419352: acme-chief is unable to validate challenges against GTS staging environment - https://phabricator.wikimedia.org/T419352
[15:48:36] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11688418 (RobH) I'll work on this now.
[16:00:09] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11688523 (RobH) CS1253254 filed, listed myself, Arzhel, Cathal, and Papaul on the CC list. > Account: WIKIMEDIA > Contact: Robert McMahon Halsell > D...
[16:24:59] Traffic, Beta-Cluster-Infrastructure, Patch-For-Review: Puppet agent failure detected on instance deployment-cache-upload08 in project deployment-prep - https://phabricator.wikimedia.org/T419099#11688671 (elukey) Open→Resolved a:elukey ` elukey@deployment-cache-upload08:~$ sudo run-pu...
[17:46:02] Acme-chief, Traffic, Patch-For-Review, Upstream: acme-chief is unable to validate challenges against GTS staging environment - https://phabricator.wikimedia.org/T419352#11689171 (Vgutierrez) p:High→Medium with the patch applied acme-chief was able to issue the certificate a few hours afte...
[19:27:17] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11689626 (RobH) Ok, they swapped the optic in cr2-magru but still shows down: et-0/0/1 up down Core: asw1-b3-magru:et-0/0/50 {#70130} The o...
[19:35:51] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11689696 (RobH) > Support, > > Thank you, we can see the old module QSFP-100GBASE-SR4 SN GT3AAG00321 was removed and replaced with QSFP-100GBASE-SR4 mo...
[20:56:50] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11690062 (BCornwall)
[21:03:00] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11690071 (CDobbins)
[21:03:17] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11690073 (BCornwall)
[21:06:44] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp7002:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:06:44] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp7002 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7002 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:18:55] FIRING: SystemdUnitFailed: haproxy.service on cp7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:21:43] RESOLVED: HaproxyKafkaExporterDown: HaproxyKafka on cp7002 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7002 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:23:55] FIRING: [4x] SystemdUnitFailed: haproxy.service on cp7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:26:47] brett: cjd91: I see you have a bit of work going on in magru. any concerns if I roll through with a puppet change? (varnish)
[21:28:55] RESOLVED: [4x] SystemdUnitFailed: haproxy.service on cp7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:31:43] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp7002 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7002 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:33:55] FIRING: [4x] SystemdUnitFailed: haproxy.service on cp7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:36:40] RESOLVED: VarnishPrometheusExporterDown: Varnish Exporter on instance cp7002:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[21:36:43] RESOLVED: HaproxyKafkaExporterDown: HaproxyKafka on cp7002 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=magru&var-instance=cp7002 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:38:55] RESOLVED: [4x] SystemdUnitFailed: haproxy.service on cp7002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:52:44] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11690222 (CDobbins)
[22:23:05] Traffic, Commons, UploadWizard, 2026-user-javascript-incident, and 3 others: UploadWizard on Wikimedia Commons no longer able to import files from Flickr - https://phabricator.wikimedia.org/T419263#11690295 (sbassett) Hey all - * live.staticflickr.com should be allowed within Wikimedia project...
[22:28:37] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11690332 (RobH) They've now replaced the patch cable but we're still seeing down: > Comment generated in Smart Hands: Dear, evening. > > As requested...
[23:23:28] Traffic, Commons, UploadWizard, 2026-user-javascript-incident, and 3 others: UploadWizard on Wikimedia Commons no longer able to import files from Flickr - https://phabricator.wikimedia.org/T419263#11690470 (A_smart_kitten) Open→Resolved a:sbassett Based on some brief testing I've...
[23:53:55] FIRING: SystemdUnitFailed: prometheus_nic_queue_cpu_eno16795np0.service on cp2058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed