[00:16:17] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11690573 (bd808) >>! In T399419#11183331, @BCornwall wrote: > Hm, it s...
[02:47:17] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11690710 (Papaul) Looks like changing the module on the switch side fixed the issue. ` sw1-b3-magru> show interfaces et-0/0/50 descriptions Interface...
[03:54:15] FIRING: SystemdUnitFailed: prometheus_nic_queue_cpu_eno16795np0.service on cp2058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:54:31] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11690822 (cmooney) Showing as down right now both sides, lane 3 RX still poor on cr2-magru: ` cmooney@cr2-magru> show interfaces diagnostics optics et-0...
[06:09:52] Traffic, Commons, UploadWizard, 2026-user-javascript-incident, and 3 others: UploadWizard on Wikimedia Commons no longer able to import files from Flickr - https://phabricator.wikimedia.org/T419263#11690830 (TimSC) I can see thumbnails in the selection UI, but the following screen in which I...
[07:54:15] FIRING: SystemdUnitFailed: prometheus_nic_queue_cpu_eno16795np0.service on cp2058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:42:19] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11691068 (ayounsi) Link is down again: https://alerts.wikimedia.org/?q=scope%3Dnetwork&q=site%3Dmagru&q=%40state%3Dsuppressed Thanks rob for leading on...
[10:04:42] Traffic, Commons, UploadWizard, 2026-user-javascript-incident, and 3 others: UploadWizard on Wikimedia Commons no longer able to import files from Flickr - https://phabricator.wikimedia.org/T419263#11691116 (A_smart_kitten) Hmmm... Thanks for the comment, @timsc. A similar experience was also...
[10:33:55] RESOLVED: SystemdUnitFailed: prometheus_nic_queue_cpu_eno16795np0.service on cp2058:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:37:47] Wikimedia-Apache-configuration, ServiceOps new, SRE, Wikibase GraphQL, and 2 others: Create a rewrite for the GraphQL endpoint on wikidata.org - https://phabricator.wikimedia.org/T417026#11691213 (Ifrahkhanyaree_WMDE) Open→Invalid Hey @Clement_Goubert, I've chatted with Halley and we'...
[11:53:31] Traffic: Dump stats-file before haproxy restart/reload - https://phabricator.wikimedia.org/T419524 (Fabfur) NEW
[13:25:32] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11691843 (Papaul) @ayounsi factory reset the switch same issue.
[13:28:40] !log testing acme-chief 0.39 in acmechief-test2001 - T419352
[13:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:43] T419352: acme-chief is unable to validate challenges against GTS staging environment - https://phabricator.wikimedia.org/T419352
[13:29:04] Cuthead: hi, sorry I was out so just reading
[13:29:21] you meant that Wikimedia DNS traffic is going to ulsfo from CN, correct?
[13:29:24] yeah that should not happen
[15:26:52] sukhe: Now I think it's not related to China, but the ECS itself.
[15:27:17] It just fell back to ulsfo.
[15:28:13] e.g. even specifying the subnet as a Singapore IP address, I still get ulsfo.
[15:28:28] dig @185.71.138.138 +https dyna.wikimedia.org a +subnet=103.102.166.224/24
[15:30:54] https://fars.ee/T2O4.png
[16:04:39] Traffic: Revisit HAProxy cpu-map directive usage - https://phabricator.wikimedia.org/T419568 (BCornwall) NEW
[16:05:02] Traffic: Revisit HAProxy cpu-map directive usage - https://phabricator.wikimedia.org/T419568#11692999 (taavi)
[16:08:51] Traffic: Revisit HAProxy cpu-map directive usage - https://phabricator.wikimedia.org/T419568#11693028 (BCornwall)
[16:09:03] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11693030 (Krinkle)
[16:11:01] Traffic, Commons: HTTP 429 error on original image requests on Commons (iOS app by default hiding the Referrer header) - https://phabricator.wikimedia.org/T413570#11693050 (Nylki) @SuperHamster thanks!
[16:11:02] Cuthead: hmm I am a bit puzzled by that
[16:11:12] > generic-map 103.102.166.224
[16:11:12] generic-map => 103.102.166.224/24 => eqsin, codfw, eqiad, ulsfo, esams, drmrs, magru
[16:11:46] you should have been getting text-lb eqsin from that, not ulsfo
[16:15:14] Cuthead: what do you get when you resolve reflect.wikimedia.org?
[16:18:58] reflect-edns.wikimedia.org to see if ECS is being sent in general. but well, +subnet does that explicitly though
[16:47:33] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11693404 (RobH) > Support, > > You have swapped the optic on the router side, and the MPO patch cable. The link is still down, so we'd like you to swa...
[16:51:43] Traffic, Fundraising-Backlog, Fundraising-Tech-Roadmap, Wikimedia-Fundraising-CiviCRM, fr-acoustic: Acoustic SMS: Domain needed for short links - https://phabricator.wikimedia.org/T379318#11693429 (ssingh) @greg: Traffic was notified about this an it seems like wiki.gives is a 404. Is tha...
[17:01:13] Traffic, Fundraising-Backlog, Fundraising-Tech-Roadmap, Wikimedia-Fundraising-CiviCRM, fr-acoustic: Acoustic SMS: Domain needed for short links - https://phabricator.wikimedia.org/T379318#11693521 (greg) >>! In T379318#11693429, @ssingh wrote: > @greg: Traffic was notified about this an i...
[17:05:00] Traffic, Fundraising-Backlog, Fundraising-Tech-Roadmap, Wikimedia-Fundraising-CiviCRM, fr-acoustic: Acoustic SMS: Domain needed for short links - https://phabricator.wikimedia.org/T379318#11693535 (EWilfong_WMF) I can confirm that is how it works. The root URL does not have a page, but t...
[17:30:55] XioNoX: dig @185.71.138.138 +https reflect.wikimedia.org I did this multiple times, I saw my own IPv4 address at the first attempt, but all later responses are empty.
[17:38:28] And for AAAA lookup, in multiple attempts there are times I got my own IPv6 prefix, but more were 2620:0:863:1:198:35:26:6
[17:38:28] or 2620:0:863:1:198:35:26:14.
[17:49:56] vgutierrez: It would look like if I want to regex match on anything other than the host I need to use regex_remap and a remap.config?
(according to ATS docs)
[17:50:32] https://docs.trafficserver.apache.org/en/latest/admin-guide/files/remap.config.en.html#regular-expression-regex-remap-support
[17:50:37] yeah... that's why I was asking if you were going to add a regex :)
[17:50:40] "Only the host field can contain a regex; the scheme, port, and other fields cannot. For path manipulation via regexes, use the Regex Remap Plugin."
[17:51:01] Ugh
[17:51:21] So the simplest would actually be 3 map stanzas, one for core, one for service, one for feed
[17:51:29] and everything else goes to the mw-web catch all
[17:51:37] but that sucks
[17:54:21] I'll think about it tomorrow :P
[17:56:03] Cuthead: are you connecting from CN?
[18:16:09] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694093 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp7003.magru.wmnet with OS trixie
[18:16:19] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11694107 (BCornwall) @bd808 You would think so with such a key name! I...
[18:21:35] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11694127 (RobH) The order is placed and I'm currently scheduling the Unisys/Dell engineer to go onsite sometime between Friday-Wednesday of this/next week. Host is hard down, so no traffic interven...
[18:23:04] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11694128 (BCornwall) Ah, that service doesn't handle certs to consumer...
[18:23:08] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694129 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp7004.magru.wmnet with OS trixie
[18:37:17] Acme-chief, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: Warning about /etc/acmecerts/unified contents during puppet run on deployment-cache-text08 & deployment-cache-upload08 - https://phabricator.wikimedia.org/T399419#11694194 (BCornwall) If I'm reading things correctly, the certs are no...
[18:42:12] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694201 (BCornwall)
[18:47:56] Cuthead: I think I can reproduce the issue as well and so can other Traffic folk. I will look into this and follow up
[18:48:58] so far, not sure what is happening so I will revisit the Wikimedia DNS ECS interactions from dnsdist to pdns-rec to the auth
[19:16:12] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694360 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp7003.magru.wmnet with OS trixie completed: - cp7003 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pup...
[19:19:51] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694374 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp7004.magru.wmnet with OS trixie completed: - cp7004 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pup...
[19:19:54] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694375 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp7005.magru.wmnet with OS trixie
[19:20:03] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694377 (CDobbins)
[19:35:24] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11694408 (RobH) They swapped the optic GT3AAG00314 out of the switch for optic GT3AAG00316 and now the link shows up: router: ` et-0/0/1 up...
[19:39:28] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694414 (CDobbins)
[19:40:31] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694415 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cdobbins@cumin2002 for host cp7006.magru.wmnet with OS trixie
[19:42:40] FIRING: [17x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:47:40] FIRING: [17x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:52:40] FIRING: [23x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[19:57:40] FIRING: [29x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:02:40] FIRING: [29x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high -
https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:04:23] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694491 (BCornwall)
[20:06:30] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11694501 (RobH) Sent an email to investigate the return/repair of GT3AAG00314 & GT3AAG00321
[20:12:40] FIRING: [15x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[20:16:29] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11694523 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cdobbins@cumin2002 for host cp7005.magru.wmnet with OS trixie completed: - cp7005 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled Pup...
[20:17:40] RESOLVED: [14x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount