[01:34:37] Traffic, Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-cache-text08 is down - https://phabricator.wikimedia.org/T418884#11671573 (bd808) Open→Invalid Looks like this was a temporary overload that broke the Prometheus scrape: `lang=shell-session bd808@deployment-cach...
[02:01:32] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11671606 (matmarex) >>! In T412585#11461635, @daniel wrote: >>>! In T412585#11458421, @matmarex wrote: >> is there some way to "force" an error res...
[02:24:09] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11671623 (ssingh) @BrokenImages1234, @ihurbain: Hi. I can confirm I can reproduce the issue as well. I have a possible theory I would like to confirm and then tomorrow we can work on hopefully resolving this issue. Do you have...
[03:19:40] FIRING: [8x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:24:43] FIRING: [49x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[03:29:43] FIRING: [55x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[03:39:43] RESOLVED: [55x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[03:44:01] netops, Infrastructure-Foundations, SRE: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11671668 (Papaul) I update the diagram
[03:48:49] netops, Infrastructure-Foundations, SRE: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11671673 (cmooney) Looks great, nice work!
[03:54:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:59:40] FIRING: [12x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:04:40] FIRING: [16x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:14:40] FIRING: [14x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:19:40] FIRING: [12x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:24:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp7001:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[04:39:39] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11671699 (Papaul) @ayounsi prior of deleting the sandbox1-ulsfo range 198.35.26.240/28 I will have to delete the interfaces et-0/0/1.1221 on both routers. D...
[04:54:14] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971 (Papaul) NEW
[05:01:16] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11671717 (Papaul) @jcrespo @Marostegui @MatthewVernon can you please let us know if backup1007, dbprov1004 and ms-be1093 need depool befo...
[05:10:41] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11671721 (BrokenImages1234) @ssingh Yes, of course. I don't think it's even possible to enable them in Tor...
[07:33:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:38:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:05:55] hello traffic folks! thanks vgutierrez for the hotfix yesterday! on the same topic of moving gerrit instances behind CDN, we have created small 2 CR (-0/+2) https://w.wiki/J9qq https://w.wiki/J9qr would someone be available to review them?
[08:18:40] FIRING: [2x] VarnishHighThreadCount: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:25:44] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11671968 (jcrespo) @papaul for backup1007, dbprov1004, while they are a production host with important content, a small network interrupti...
[08:26:14] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11671969 (jcrespo)
[08:36:52] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11671989 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cd8c8777-0916-4a5b-b6f5-55f2535990f4) set by jynus@cumin1003 fo...
[08:37:05] netops, Infrastructure-Foundations, ops-magru, SRE: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978 (cmooney) NEW p:Triage→High
[08:38:15] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11672002 (jcrespo)
[08:38:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp5024:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5024 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:40:46] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support ECH on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11672025 (Diskdance) FYI, the ECH standard has been stabilized as RFC9848: https://www.rfc-editor.org/info/rfc9848.
[08:46:57] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support Encrypted Client Hello (ECH) on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11672036 (Diskdance)
[08:47:54] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support Encrypted Client Hello (ECH) on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11672040 (Diskdance)
[09:20:19] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux DHCP Relay Bug - https://phabricator.wikimedia.org/T411054#11672136 (cmooney) @ayounsi thanks for following up on this. I've done some testing to see if there may be a better way to force a tunnel teardown/re-establishment today. The reason cl...
[09:34:28] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11672217 (MatthewVernon)
[09:36:07] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11672223 (MatthewVernon) Is this maintenance happening at 15:00 UTC today? @Papaul ms-be1093 needs no action taking, but it'd be worth co...
[10:10:04] Traffic, Patch-For-Review: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11672354 (Vgutierrez)
[10:16:59] Traffic, MW-Interfaces-Team, Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team (Radar), and 2 others: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11672387 (JAllemandou) a:JAllemandou
[10:37:28] Traffic, MW-Interfaces-Team, Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11672485 (Fabfur) The [[ https://gerrit.wikimedia.org/r/1247034 |...
[10:49:09] Traffic, JWTAuth, MediaWiki-Platform-Team: JWT tokens issued with empty subject - https://phabricator.wikimedia.org/T418991 (Vgutierrez) NEW
[10:49:13] Traffic, JWTAuth, MediaWiki-Platform-Team: JWT tokens issued with empty subject - https://phabricator.wikimedia.org/T418991#11672552 (Vgutierrez) p:Triage→High
[10:52:50] Traffic, JWTAuth, MediaWiki-Platform-Team: JWT tokens issued with empty subject - https://phabricator.wikimedia.org/T418991#11672563 (Vgutierrez) this seems to be tracked as T417278
[10:56:09] Traffic, JWTAuth, MediaWiki-Platform-Team: JWT tokens issued with empty subject - https://phabricator.wikimedia.org/T418991#11672576 (Tgr) →Duplicate dup:T417278
[11:23:13] Traffic, Patch-For-Review: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11672711 (Vgutierrez)
[11:33:45] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 3 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11672762 (Ladsgroup) I just added any requests to non-standard thumbs that the referrer is not us to the global rate limit. So far...
[12:04:01] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11672840 (ayounsi)
[12:04:11] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11672843 (ayounsi)
[12:04:21] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11672845 (ayounsi)
[12:04:28] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11672844 (ayounsi)
[12:32:03] Traffic, Patch-For-Review: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11672921 (Fabfur)
[12:34:18] Traffic, MW-Interfaces-Team, MediaWiki-Platform-Team (Radar), OKR-Work: haproxy: strip x-wmf-* headers from responses - https://phabricator.wikimedia.org/T417781#11672925 (Raine)
[14:27:49] Traffic, MediaWiki-Revision-deletion, MW-Interfaces-Team: 404 error when using action=raw on an admin-level hidden revision - https://phabricator.wikimedia.org/T351688#11673467 (HCoplin-WMF) @Krinkle -- do you have any thoughts about this? We chatted about this in MWI, but it's unclear if action=raw...
[14:33:01] Traffic, Patch-For-Review: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11673497 (Fabfur)
[14:39:34] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11673532 (Papaul) @MatthewVernon thank you. Yes it will be at 15:00 UTC
[14:40:45] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11673540 (Tgr) Create a global group like `rate-limit-testers`, make it emit an `rlc`, make the rate limit for that class 0? A dedicated Wikimedia...
[14:41:47] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11673545 (ssingh) >>! In T418323#11671721, @BrokenImages1234 wrote: > @ssingh > Yes, of course. I don't think it's even possible to enable them in Tor... You mentioned Firefox and Brave above. Can you please try with those?
[15:07:51] Traffic, MW-Interfaces-Team, ServiceOps new, ServiceOps-SharedInfra, and 4 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11673699 (daniel)
[15:22:04] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, and 2 others: ULSFO: Update ULSFO LVS service IP's - https://phabricator.wikimedia.org/T418971#11673768 (ssingh) Per T410411, we no longer need at least `pybal-high-traffic1-ulsfo.wikimedia.org` and `pybal-high-traffic2-ulsfo.wikimedia.org` i...
[15:38:34] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 3 others: MediaViewer (and the commons file page) should serve WebP originals not thumbnails of equivalent size - https://phabricator.wikimedia.org/T418745#11673841 (Ladsgroup) Open→Resolved
[15:39:01] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11673851 (Ladsgroup)
[15:46:59] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11673879 (ayounsi) cirrussearch repooled
[16:00:20] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11673956 (matmarex) I have third-party cookies disabled and I am apparently also affected by this bug. I'm using Firefox 149 with the "Enhanced Tracking Protection" setting set to "Strict", which blocks (among other things) "Cr...
[16:25:23] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support Encrypted Client Hello (ECH) on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11674077 (Naruse_shiroha) Nah, you linked to the wrong number. 9848 is for ECH configuration distribution, ECH itself is in 9849. https://datat...
[16:26:03] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support Encrypted Client Hello (ECH) on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11674078 (Naruse_shiroha)
[16:26:17] HTTPS, Traffic, SRE, Traffic-Icebox, Upstream: Support Encrypted Client Hello (ECH) on Wikimedia servers - https://phabricator.wikimedia.org/T205378#11674081 (Naruse_shiroha)
[16:26:42] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11674086 (ayounsi) >>! In T408892#11671699, @Papaul wrote: > @ayounsi prior of deleting the sandbox1-ulsfo range 198.35.26.240/28 I will have to delete the...
[18:37:07] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11674804 (ssingh) Thanks for confirming @matmarex. We are discussing this internally and will follow up.
[19:08:07] Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11674977 (BCornwall)
[19:22:03] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11674997 (RobH) >>! In T418018#11648884, @Papaul wrote: > @RobH like @ayounsi mentioned today everything for row A/B should be QSFP-100G CWMD4 like in...
[20:16:29] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012#11675191 (RobH)
[20:26:49] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11675231 (RobH) >>! In T418018#11674997, @RobH wrote: >>>! In T418018#11648884, @Papaul wrote: >> @RobH like @ayounsi mentioned today everything for r...
[21:46:19] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012#11675615 (RobH)
[21:48:23] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad row A/B switch upgrade - https://phabricator.wikimedia.org/T418012#11675636 (RobH)
[22:20:11] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 3 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11675775 (Ladsgroup) >>! In T414805#11668230, @Ladsgroup wrote: > Top "file formats" for the non-standard sizes with enwiki as refe...
[23:04:44] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad A/B switch cabling documentation - https://phabricator.wikimedia.org/T418018#11675991 (Papaul) Please see below for the spine to spine port information |Switch|Interface|Switch|Interface| |ssw1-a1-eqiad|ethernet-1/31|ssw1-f1-e...
[23:08:59] Traffic, API Platform, MediaWiki-User-login-and-signup, MediaWiki-Platform-Team (Q3 Kanban Board), and 2 others: Login with `action=login` and bot password does not create a JWT session cookie - https://phabricator.wikimedia.org/T415007#11676009 (Tgr)
[23:33:10] Traffic, API Platform, MediaWiki-User-login-and-signup, MediaWiki-Platform-Team (Q3 Kanban Board), and 2 others: Login with `action=login` and bot password does not create a JWT session cookie - https://phabricator.wikimedia.org/T415007#11676041 (Tgr) Undeployed temporarily as change management f...