[03:24:28] Traffic, Wikimedia-Site-requests, Logos: logos/manage.py failing due to 429 (thumbnail steps) - https://phabricator.wikimedia.org/T414048 (Chlod) NEW
[03:29:21] Traffic, Wikimedia-Site-requests, Logos: logos/manage.py failing due to 429 (thumbnail steps) - https://phabricator.wikimedia.org/T414048#11502067 (Pppery) I was aware of this rate limit, but my thinking was that it wouldn't apply since making 3 requests once when a human runs a script should be far...
[09:33:03] netops, fundraising-tech-ops, Infrastructure-Foundations: Remove pfw configuration related to former pybal/LVS service - https://phabricator.wikimedia.org/T414015#11502556 (ayounsi) 208.80.155.5 still have a DNS PTR record, can you take care of removing it ? To be done on pfw1-eqiad: `lang=diff [edi...
[09:33:24] netops, fundraising-tech-ops, Infrastructure-Foundations: Remove pfw configuration related to former pybal/LVS service - https://phabricator.wikimedia.org/T414015#11502557 (ayounsi) a:ayounsi
[13:51:07] netops, fundraising-tech-ops, Infrastructure-Foundations: Remove pfw configuration related to former pybal/LVS service - https://phabricator.wikimedia.org/T414015#11503671 (Jgreen) >>! In T414015#11502556, @ayounsi wrote: > 208.80.155.5 still have a DNS PTR record, can you take care of removing it ?...
[15:05:40] sukhe: we'll update esams to Bird 2.18 now, ok?
[15:05:53] I'm around
[15:06:08] moritzm: sure
[15:08:49] I'm upgrading ganeti3006 first
[15:08:59] ok
[15:09:49] and durum3005
[15:10:40] both upgraded, I'm also forcing a puppet run on durum3005 to doublecheck we have no similar issues to what we found yesterday
[15:11:55] thanks! good idea
[15:14:14] puppet ran and I also restarted Bird manually on durum3005,
[15:14:19] looks all fine in journalctl
[15:14:59] cool
[15:16:07] proceeding with doh3006
[15:16:38] FIRING: LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[15:16:52] also upgraded now
[15:17:34] bird looks good, I will let XioNoX check BGP on the routers but if not, I can also
[15:17:55] I can :)
[15:18:05] <3
[15:19:41] sukhe: lgtm
[15:20:04] thanks!
[15:20:04] seeing the VMs IPs and IPs advertized by the VMs
[15:21:38] RESOLVED: LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[15:24:37] movin on with ganeti3007
[15:25:29] and durum3006 next
[15:26:15] and hcaptcha-proxy3002
[15:26:33] ok!
[15:27:12] and doh3005 (last VM on ganeti3007)
[15:27:26] all three done now
[15:34:33] XioNoX: ok to proceed with ganeti3008 and the last VM on routed Ganeti (hcaptcha-proxy3001)?
[15:36:04] looking
[15:38:44] moritzm: lgtm!
[15:39:37] resuming with ganeti3008 and hcaptcha-proxy3001
[15:42:46] both upgraded now
[15:46:18] lgtm, same routes before/after
[15:46:37] nice
[15:47:11] as a next step we should upgrade the hcaptcha-proxy, doh and durum VMs not on routed also to 2.18 next week, I'll prepare patches to enable the component providing 2.18 also in the classic ganeti setup
[15:47:21] and then dns*
[15:47:48] and there's a handful of cloud hosts and then we'd be in position to default to 2.18 across all services and simply upload it to "main"
[15:56:47] ok
[16:29:48] netops, Infrastructure-Foundations, Data-Engineering (Q2 FY25/26 October 1st - December 31th): Handle `network_flows_internal` data growth - https://phabricator.wikimedia.org/T412443#11504396 (Ahoelzl) Open→Resolved
[16:38:57] Traffic, Beta-Cluster-Infrastructure: CDN edge HAProxy config broken by missing `/usr/share/GeoIP/proxy.mmdb` data file - https://phabricator.wikimedia.org/T414111#11504535 (bd808)
[16:39:11] Traffic, Beta-Cluster-Infrastructure: Remove need for manually applied MaxMind data hacks on Beta Cluster cache servers - https://phabricator.wikimedia.org/T403105#11504538 (bd808)
[16:47:11] Traffic, Beta-Cluster-Infrastructure: CDN edge HAProxy config broken by missing `/usr/share/GeoIP/proxy.mmdb` data file - https://phabricator.wikimedia.org/T414111#11504567 (bd808) `counterexample HACK incoming!!!!! ` `lang=shell-session bd808@deployment-cache-text08.deployment-prep.eqiad1:~$ sudo ln -s...
[16:50:16] Traffic, Beta-Cluster-Infrastructure: CDN edge HAProxy config broken by missing `/usr/share/GeoIP/proxy.mmdb` data file - https://phabricator.wikimedia.org/T414111#11504579 (ssingh) @SLyngshede-WMF can you look into this please? I think we can `ensure` on `$use_private_data.bool2str` for `/usr/share/GeoI...
[16:51:27] Traffic, Beta-Cluster-Infrastructure: Remove need for manually applied MaxMind data hacks on Beta Cluster cache servers - https://phabricator.wikimedia.org/T403105#11504582 (bd808) Another symlink hack has been applied for {T414111}.
[16:52:04] Traffic, Beta-Cluster-Infrastructure: Remove need for manually applied MaxMind data hacks on Beta Cluster cache servers - https://phabricator.wikimedia.org/T403105#11504584 (bd808)
[17:32:43] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11504733 (ssingh)
[18:08:17] Traffic, Beta-Cluster-Infrastructure: CDN edge HAProxy config broken by missing `/usr/share/GeoIP/proxy.mmdb` data file - https://phabricator.wikimedia.org/T414111#11504849 (bd808) p:Triage→Medium
[18:08:46] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11504853 (Joe) @Ragesoss what is the User-Agent you use when making those requests?
[18:15:44] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11504886 (Joe) @Ragesoss as far as I can tell, the problem is you are not honoring the wikimedia User-Agent policy, and we have recently started to enforce stricter rat...
[18:16:14] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11504890 (ssingh) Hi @Ragesoss: We looked through the logs and it seems like requests originating from your end are not respecting our UA policy, documented at https://...
[18:58:49] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11505043 (Ragesoss) Thanks! Unfortunately, the OAuth library we use doesn't support setting the User Agent, so I'm going to have to figure out how to monkey patch it. :-(
[19:01:53] Traffic, Essential-Work, MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), Test Kitchen (Test Kitchen (Experiment Platform Sprint 17)): Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11505059 (JVanderhoop-WMF) @ssingh - when...
[19:34:38] Traffic, Essential-Work, MW-1.46-notes (1.46.0-wmf.5; 2025-12-02), Test Kitchen (Test Kitchen (Experiment Platform Sprint 17)): Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11505162 (ssingh) >>! In T407570#11505059,...
[20:37:43] FIRING: HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp3066 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3066 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[21:02:43] RESOLVED: [2x] HaproxyKafkaNoMessages: Unexpected rate of produced HaproxyKafka messages by cp3066 - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaNoMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=esams&var-instance=cp3066 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaNoMessages
[22:20:28] Traffic, SRE: Wiki Education Dashboard being rate-limited for OAuth login and token fetching - https://phabricator.wikimedia.org/T414114#11505652 (Ragesoss) @ssingh I've just deployed an update that should fix it. Now the user agent is `Wiki Education Dashboard/1.0 (dashboard.wikiedu.org; sage@wikiedu.or...
[22:46:03] Traffic, Wikimedia-Site-requests, Logos: logos/manage.py failing due to 429 (thumbnail steps) - https://phabricator.wikimedia.org/T414048#11505718 (SuperHamster) See T413570, may be related. We found that requests for thumbnails that don't have a Referer header would quickly get 429ed. Currently an...