[07:10:40] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11661729 (Tacsipacsi) >>! In T414805#11591625, @Ladsgroup wrote: > This image is in a standard size and passes through our rate lim...
[07:47:09] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11661784 (Bugreporter) Stalled→Invalid Many countries and many people have slow or intermittent Internet access, either because of low bandwidth or Internet censorship. WMF can give little help on that. So close as i...
[08:05:09] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11661828 (BrokenImages1234) Invalid→Open
[08:38:01] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11661927 (BrokenImages1234) This only started happening a few weeks ago. Doesn't matter if I use Tor, a VPN or nothing at all, so this is not a specific connection issue. Tor bridges in particular work the same for ev...
[09:18:59] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11662005 (Bugreporter) Oh, if it is returning 429, then please provide a screenshot with more detail as follows: * Click the red number "429" in browser console * Click "Response" at the right of newly-shown detail field * Sc...
[09:27:51] FYI, lvs1013 seems to have lost network connectivity over the main IP; it's still reachable via mgmt
[09:39:26] weird
[09:41:18] switch port is up, but no inbound traffic
[09:41:26] Last flapped : 2026-02-27 20:25:19 UTC (2d 13:15 ago)
[09:41:34] so something was funny 2 days ago
[09:41:37] funky*
[09:53:45] netops, Infrastructure-Foundations, SRE: Update esams network pop diagrams - https://phabricator.wikimedia.org/T368084#11662141 (ayounsi) To not have it become obsolete too fast, I suggest regrouping the transit providers in a single "peerings and transits" cloud (or one per esams core router) Same w...
[10:09:20] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11662211 (MatthewVernon) So that's a "thumbnail the same size as original, rather than original" issue (the original image is 1074p...
[10:11:30] Traffic, Patch-For-Review: Refresh trafficserver_backend_requests_seconds histogram - https://phabricator.wikimedia.org/T411584#11662213 (SLyngshede-WMF) Notes to myself, for deployment: sudo cumin C:mtail "disable-puppet 'T411584'" sudo cumin cp2027.codfw.wmnet "enable-puppet 'T411584'" sudo cumin...
[10:49:59] Traffic: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11662343 (Fabfur)
[10:55:25] Traffic: Upgrade to HAProxy 3.0 on cache (bullseye) hosts - https://phabricator.wikimedia.org/T417253#11662355 (Vgutierrez)
[11:22:19] Traffic, Patch-For-Review: Refresh trafficserver_backend_requests_seconds histogram - https://phabricator.wikimedia.org/T411584#11662384 (SLyngshede-WMF) Open→In progress
[11:22:36] Traffic, Patch-For-Review: Refresh trafficserver_backend_requests_seconds histogram - https://phabricator.wikimedia.org/T411584#11662385 (SLyngshede-WMF) a:CDobbins→SLyngshede-WMF
[11:28:39] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11662413 (Tacsipacsi) I didn’t realize it’s the same size as the original. However, I did notice that – unlike the thumbnail I foun...
[11:29:58] Traffic, MW-Interfaces-Team, Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team (Radar), and 2 others: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11662416 (Fabfur) Thanks to the @JAllemandou summary, I've wrote...
[13:05:14] Hi! Is there a way to yank a page from our cache servers? Specifically, I've added the turnilo-next.w.o -> turnilo-next.discovery.wmnet:30443 mapping to our trafficserver hosts a couple of hours ago, and I keep being served the generic "Welcome! The Wikimedia movement is a global community of people..." page. I think I might have cached the page by opening it in my browser to see if the redirection was effective. Thanks!
[13:08:45] brouberol: yes, https://wikitech.wikimedia.org/wiki/Kafka_HTTP_purging#One-off_purge
[13:10:21] thanks! So that'd be `echo 'https://turnilo-next.wikimedia.org' | mwscript-k8s --attach -- purgeList.php` I take it?
[13:15:01] yes
[13:15:18] BTW.. you can see in the response headers what the CDN is replying
[13:15:24] are you getting a hit from the CDN?
[13:15:53] curl https://en.wikipedia.org -v -o /dev/null 2>&1 |grep -i cache-status
[13:15:53] < x-cache-status: hit-front
[13:15:55] something like that
[13:16:17] ah, while we were talking, seems that whatever cache was there expired, and I'm now getting to the app itself
[13:16:42] brouberol: or your IP changed and you're now hitting another cp server
[13:17:38] I'm still getting the "Welcome! The Wikimedia movement..."
[13:17:55] I'm getting `< x-cache-status: hit-front`
[13:18:22] yep, I spoke too fast, some requests are going to the app, some are not
[13:18:39] I'm tempted to yank the domain from the cache
[13:18:43] so what happened here? you added the rule in ATS before the backend was ready?
[13:19:13] the backend was ready, and had been for 4-5 days. I think I opened the page in my browser too fast, before puppet could run on all cache hosts
[13:19:30] and that might have poisoned the cache?
[13:19:56] cf https://gerrit.wikimedia.org/r/c/operations/puppet/+/1247013/2#message-1fbb4274b423dc7141dded7c958efb048a4d2659
[13:20:33] oh.. so you hit turnilo-next before the rule was actually in place and you reached the catch-all MW rule
[13:20:41] ack
[13:20:55] so you need to PURGE the URLs or wait ~24h :)
[13:21:22] and by purge, you mean through the mwscript-k8s snippet you shared?
[13:21:29] hi, I'm trying to track down whether HA Proxy does URL normalization before logging to Kafka (as I believe Varnish used to do). I see the log format here uses %HPO:
[13:21:34] https://gerrit.wikimedia.org/g/operations/puppet/+/5bb8a7d2cb6bfe4e45b43b3a9d320491c6d5fc77/modules/profile/templates/cache/haproxy.cfg.erb#71
[13:21:35] brouberol: yes
[13:21:46] (sorry if I'm being slow there, I'm a bit terrified of doing the wrong thing :D)
[13:22:40] !log Running `echo 'https://turnilo-next.wikimedia.org' | mwscript-k8s --attach -- purgeList.php`
[13:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:50] milimetric: yes.. we send %HPO as the uri_path to haproxykafka
[13:23:20] milimetric: by URL, what are you referring to? Host, path, query?
[13:23:33] (all of them)?
[13:24:18] I think all of them, I'm trying to figure out what logic applies to what HAProxy first sees from the request and what ultimately is served from cache to the response
[13:24:31] so like, if a request came in for /wiki/barack%20Obama, would %HPO be /wiki/Barack_Obama?
[13:25:28] so we have certain character normalization that happens in varnish and ATS that's missed by haproxy
[13:25:35] hmm, somehow I'm still getting the "Welcome! The Wikimedia movement..." page.
[13:25:42] Oh well, I might just have to wait then
[13:26:04] brouberol: the only other thing I've seen work with poison is true love's kiss
[13:26:52] one day, my sweet https://en.wikipedia.org/wiki/Prince_(musician) will come
[13:28:42] milimetric: for the host header we perform certain normalization (http-request set-header Host %[req.fhdr(Host),host_only,rtrim(.)]) but haproxykafka gets it before normalization IIRC /cc fabfur
[13:28:48] vgutierrez: Timo helped me realize that, but somehow I don't see _any_ requests with a " " (as opposed to %20) in kafka, and I think some browsers send literal spaces, so I'm thinking some normalization happens still, maybe %HPO is the URI that HAProxy sees as it sends the response back? And then it's going through processing still?
[13:29:26] milimetric: I can test that easily....
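The Host normalization quoted at 13:28:42 (`req.fhdr(Host),host_only,rtrim(.)`) can be approximated in Python to see which raw Host values would be logged differently from the normalized header, if haproxykafka does get it before normalization as recalled above. This is a minimal sketch assuming `host_only` drops a trailing `:port` (keeping IPv6 brackets) and `rtrim(.)` strips trailing dots; the function names are hypothetical and the HAProxy converter documentation remains the authority.

```python
def host_only(host: str) -> str:
    """Approximation of HAProxy's host_only converter: drop a trailing :port.

    Assumption: IPv6 literals keep their brackets, e.g. "[::1]:8080" -> "[::1]".
    """
    if host.startswith("["):
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    name, sep, port = host.rpartition(":")
    # Only strip when what follows the last colon looks like a port number.
    if sep and port.isdigit():
        return name
    return host


def normalize_host(raw_host: str) -> str:
    """Mimic `req.fhdr(Host),host_only,rtrim(.)`: strip the port, then trailing dots."""
    return host_only(raw_host).rstrip(".")
```

With this sketch, `en.wikipedia.org.:443`, `en.wikipedia.org.` and `en.wikipedia.org` all collapse to `en.wikipedia.org`, which is the kind of spread one would expect to see in the raw logged Host.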
[13:30:54] or not that easily.. curl refuses the white space :D
[13:32:41] oh, what would you curl? I can run some tests on all the things I'm wondering about (there are a few, and some may be happening while others aren't)
[13:34:07] so a white space shouldn't be there per RFC 3986.. so curl/wget will refuse to do that for you
[13:35:13] that's ok, I'm wondering about lowercase first letter, percent-encoded things, etc.
[13:35:30] so like /wiki/barack%20Obama
[13:35:48] https://www.irccloud.com/pastebin/clh8tjxv/
[13:35:59] it looks like we refuse white spaces in URLs
[13:37:43] so in barack%20Obama the U-A must be sending a %20 in there
[13:41:24] milimetric: so, if a U-A sends an actual white space in the URL it ends up in turnilo as `BADREQ`
[13:41:44] so for `/wiki/barack%20Obama` the U-A actually sent that `%20`
[13:42:01] vgutierrez: ok, makes sense for spaces, I'm trying to figure out if other normalization happens too, like turning the %20 to _
[13:43:51] milimetric: that's varnishland
[13:44:06] and/or ATS
[13:45:31] so you won't see that on turnilo
[13:45:49] vgutierrez: right, I remain confused as to whether HA Proxy is sending me the URL it first sees, with no Varnish/ATS logic (as described here: https://phabricator.wikimedia.org/T210295) or the processed URL after it comes back from that and/or MediaWiki, as it's sending the response back to the UA
[13:46:53] milimetric: what do you mean the processed URL?
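Why a raw space ends up as `BADREQ` follows from the HTTP/1.1 request line format, `METHOD SP request-target SP HTTP-version`: an unencoded space splits the target, so the line no longer has exactly three fields. The sketch below is a rough, hypothetical validator that assumes only the RFC 3986 character set; it illustrates the rule and is not how HAProxy implements its parser.

```python
import string
from typing import Optional, Tuple

# Rough RFC 3986 character set for a request-target: unreserved + reserved + '%'.
# '#' is left out because a fragment is never part of what the client sends.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?[]@!$&'()*+,;=")
ALLOWED = UNRESERVED | RESERVED | {"%"}


def is_valid_target(target: str) -> bool:
    """Allowed characters only, and every '%' must start a two-hex-digit escape."""
    i = 0
    while i < len(target):
        ch = target[i]
        if ch not in ALLOWED:
            return False
        if ch == "%":
            escape = target[i + 1 : i + 3]
            if len(escape) != 2 or any(c not in string.hexdigits for c in escape):
                return False
            i += 3
        else:
            i += 1
    return True


def parse_request_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split 'METHOD SP target SP version'; None means the line would be rejected."""
    parts = line.split(" ")
    if len(parts) != 3 or not is_valid_target(parts[1]):
        return None  # the kind of request that shows up as BADREQ
    method, target, version = parts
    return method, target, version
```

Under this check `GET /wiki/barack Obama HTTP/1.1` is rejected while `GET /wiki/barack%20Obama HTTP/1.1` parses, consistent with the observation that anything in kafka containing a space must have arrived as `%20`.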
[13:47:02] the URL is a request field
[13:47:11] unless you're referring to some sort of `link` canonical header
[13:48:09] so the path in `%HPO` is the one sent by the U-A
[13:49:03] ok, that's interesting, this would then be quite different from the way we were getting logs from Varnish
[13:50:56] yes, you'll miss all the normalization magic that happens in Varnish
[13:51:23] that's `templates/normalize_path.inc.vcl.erb`
[13:52:50] very interesting, ok, thank you very much @vgutierrez
[14:02:08] I think by the current character-level normalization bits at the top of normalize_path, a raw space from the URI is supposed to be translated to %20
[14:03:24] (at least at the varnish layer)
[14:04:26] there's a generic set of rules in normalize_path, and then MW- and RB-specific ones in templates/text-frontend.inc.vcl.erb as normalize_mediawiki_path and normalize_rest_path
[14:04:58] it's always been challenging to pin some of this down without creating bugs or finding ambiguities in how the applayer and/or browsers handle things, but those do normalize a lot of character variants.
[14:05:42] this is my takeaway so far:
[14:06:12] haproxy may have its own rules now though, which would happen before the varnish ones
[14:06:47] 1. before the varnishkafka -> haproxykafka migration, what JS would give a UA for location.pathname would match what we had in kafka for uri_path decently well (because JS updates the URI client-side after it gets the response)
[14:07:30] it makes sense the kafka transition might affect all of this. Now your kafka stream comes from before varnish can normalize the input URL path.
[14:07:36] 2. after the migration, location.pathname is usually different from uri_path in any case where VCL/RB/MW would change the URL
[14:08:12] yep, and Valentin is saying HA Proxy doesn't normalize at all, so at least there's not another layer there
[14:08:17] ok
[14:08:35] I mean, we could probably add matching normalization there. It might not be trivial.
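Takeaway 2 implies that `uri_path` and `location.pathname` have to be reconciled downstream. One rough way to flag rows where they will diverge is to compute a canonical article path (decode `%XX`, spaces to underscores, collapse and trim underscores, upper-case the first letter) and compare. This is a hypothetical sketch: it is not the VCL `normalize_mediawiki_path` logic, it assumes a wiki with first-letter capitalization (`$wgCapitalLinks`), and it only rewrites `/wiki/` paths.

```python
from urllib.parse import unquote

WIKI_PREFIX = "/wiki/"  # assumption: only article paths are canonicalized


def canonical_article_path(uri_path: str) -> str:
    """Hypothetical helper approximating the canonical article URL path.

    Decodes %XX escapes, turns spaces into underscores, collapses repeated
    underscores, trims them at both ends and upper-cases the first character.
    Non-article paths are only percent-decoded.
    """
    decoded = unquote(uri_path)
    if not decoded.startswith(WIKI_PREFIX):
        return decoded
    title = decoded[len(WIKI_PREFIX):].replace(" ", "_")
    title = "_".join(part for part in title.split("_") if part)
    if title:
        title = title[0].upper() + title[1:]
    return WIKI_PREFIX + title


def differs_from_canonical(uri_path: str) -> bool:
    """True when the raw logged path would not match its canonical form."""
    return unquote(uri_path) != canonical_article_path(uri_path)
```

For the example from the discussion, `/wiki/barack%20Obama` is flagged and maps to `/wiki/Barack_Obama`, while an already-canonical `/wiki/Barack_Obama` is left alone.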
[14:09:11] nono, it's ok, analytics and this matching can happen downstream, and I'll put all this together and make it work, thanks a lot to you all for helping me gather the info
[14:11:11] oh, I missed my own comment from back then: the decoder ring entries with 0 (meaning, always force encoding that character as %XX) are not currently enforced at all. I guess we were scared of that at the time.
[14:11:34] but the entries marked 1 are forced from superfluous-%XX to a regular raw character.
[14:19:43] FIRING: [9x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:24:43] FIRING: [15x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:29:43] RESOLVED: [15x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[14:38:22] Traffic, MW-Interfaces-Team, Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-Platform-Team (Radar), and 2 others: haproxy: capture x-wmf-* headers in webrequest data set - https://phabricator.wikimedia.org/T417864#11663137 (Tgr) > Can be set in X-Analytics directly in the backen...
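The decoder-ring behaviour described at 14:11 (entries marked 1 turn a superfluous `%XX` back into the raw character; entries marked 0 are not enforced) and the raw-space rule recalled at 14:02 can be sketched as a small string transform. The ring below is invented for illustration (unreserved ASCII marked 1, `?` and `#` marked 0); the real table lives in `normalize_path.inc.vcl.erb` and is not reproduced here.

```python
import re
import string

# Hypothetical decoder ring, NOT the real table from normalize_path.inc.vcl.erb.
#   1 = a %XX escape of this byte is superfluous: decode it to the raw character
#   0 = keep this character encoded as %XX (per the log, this is not enforced)
DECODER_RING = {ord(c): 1 for c in string.ascii_letters + string.digits + "-._~"}
DECODER_RING.update({ord("?"): 0, ord("#"): 0})

PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _apply_ring(match: "re.Match[str]") -> str:
    byte = int(match.group(1), 16)
    # Only entries marked 1 are rewritten; 0 and unlisted bytes keep their escape.
    return chr(byte) if DECODER_RING.get(byte) == 1 else match.group(0)


def normalize_path(path: str) -> str:
    """Character-level path normalization in the style described in the log."""
    path = path.replace(" ", "%20")  # raw space -> %20, as recalled at 14:02
    return PERCENT_ESCAPE.sub(_apply_ring, path)
```

Here `%20` itself is kept, so `/wiki/barack Obama` and `/wiki/barack%20Obama` end up identical; turning `%20` into `_` is outside this sketch.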
[14:50:41] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: MediaViewer (and the commons file page) should serve WebP originals not thumbnails of equivalent size - https://phabricator.wikimedia.org/T418745 (MatthewVernon) NEW
[14:56:50] Traffic, MW-Interfaces-Team, MediaWiki-Platform-Team (Radar), OKR-Work, Patch-For-Review: haproxy: strip x-wmf-* headers from responses - https://phabricator.wikimedia.org/T417781#11663198 (Joe) For now, let's remove the individual headers at the edge; we will have time to come up with a pref...
[15:20:46] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN:New switch setup/configuration - https://phabricator.wikimedia.org/T418439#11663288 (ayounsi) p:Triage→Low
[15:28:14] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11663336 (AntiCompositeNumber) Original-size thumbnails must be supported as not all formats are web-safe.
[15:39:55] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11663413 (Ladsgroup) >>! In T414805#11663336, @AntiCompositeNumber wrote: > Original-size thumbnails must be supported as not all f...
[15:51:20] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11663515 (AntiCompositeNumber) We intentionally don't serve WebP originals for browser support reasons at the time support was adde...
[15:59:36] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11663557 (Ladsgroup) >>! In T414805#11663515, @AntiCompositeNumber wrote: > We intentionally don't serve WebP originals for browser...
[16:54:44] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11664108 (ihurbain) I can see this behaviour on the (internal) OfficeWiki, on the Contact List, on Firefox 148.0 on Linux (Ubuntu snap packaging), from a connection with the Init7 internet provider in Switzerland. The 429s loo...
[17:14:28] netops, Infrastructure-Foundations, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772 (Papaul) NEW
[17:34:08] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11664460 (JMeybohm)
[17:50:01] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11664578 (Papaul) p:Triage→High
[17:51:42] Traffic: Images randomly fail to load - https://phabricator.wikimedia.org/T418323#11664582 (Bugreporter) Can you add a screenshot of the body (not just header) of the response?
[17:59:38] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: MediaViewer (and the commons file page) should serve WebP originals not thumbnails of equivalent size - https://phabricator.wikimedia.org/T418745#11664624 (Aklapper) [Please review project tags and subscribers when creating s...
[18:13:39] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: MediaViewer (and the commons file page) should serve WebP originals not thumbnails of equivalent size - https://phabricator.wikimedia.org/T418745#11664686 (Ladsgroup) Based on comparing https://caniuse.com/?search=WebP and ht...
[18:17:12] sukhe: we've opened this task to track the brief planned maintenance: https://phabricator.wikimedia.org/T418772 (one CP and one LVS for traffic)
[18:19:14] hi traffic, ok if I do a rollout to A:cp soon?
[18:21:18] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, and 2 others: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11664738 (Ladsgroup) For the sake of bookkeeping. This is 1:128 sample of requests to non-standard sizes that haven't been blocked...
[18:30:48] brett: ^
[18:30:49] XioNoX: thanks, will respond
[18:31:28] cdanis: Sure, may I get the patch to follow?
[18:34:44] brett: it's now https://gerrit.wikimedia.org/r/c/operations/puppet/+/1247126 😅
[18:38:13] cdanis: I'm in a meeting but will keep an eye - have at it
[18:38:44] cdanis: is that a revert of the incident changes?
[18:38:48] no
[18:39:03] it's some improvements to the default automatic policy
[18:40:36] I'd love a more descriptive commit message if that's okay
[19:14:18] brett: all done :)
[19:17:58] ty!
[19:25:39] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11665163 (ssingh)
[20:18:40] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: MediaViewer (and the commons file page) should serve WebP originals not thumbnails of equivalent size - https://phabricator.wikimedia.org/T418745#11665357 (Tacsipacsi) TIFFs (e.g. https://commons.wikimedia.org/wiki/Category:T...
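The argument running through T414805 and T418745 (original-size thumbnails are still needed because not every original format is web-safe, while web-safe originals such as WebP could be served directly) can be expressed as a small decision function. This is a hypothetical sketch: `choose_rendition`, `MediaFile` and `WEB_SAFE_FORMATS` are assumptions for illustration, with WebP included because that is what T418745 proposes and TIFF treated as a non-web-safe example.

```python
from dataclasses import dataclass

# Assumption: formats browsers can display directly. WebP is included because
# T418745 proposes serving WebP originals; TIFF is an example of a format that is not.
WEB_SAFE_FORMATS = {"jpeg", "png", "gif", "webp"}


@dataclass(frozen=True)
class MediaFile:
    name: str
    fmt: str
    width: int  # original width in pixels


def choose_rendition(media: MediaFile, requested_width: int) -> str:
    """Hypothetical helper returning 'original' or 'thumbnail:<width>'."""
    if requested_width >= media.width:
        if media.fmt.lower() in WEB_SAFE_FORMATS:
            return "original"
        # Not web-safe (e.g. TIFF): an original-size thumbnail is still needed.
        return f"thumbnail:{media.width}"
    return f"thumbnail:{requested_width}"
```

The design choice being debated is only the first branch: whether a request at or above the original width of a web-safe file should get the original instead of a same-size thumbnail.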
[21:17:29] netops, Infrastructure-Foundations, Prod-Kubernetes, ServiceOps new, SRE: Eqiad: lsw1-d7-eqiad BGP maintenance - https://phabricator.wikimedia.org/T418772#11665601 (colewhite)
[22:36:54] Traffic, OKR-Work, Test Kitchen (Experiment Platform Sprint 20): Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11665964 (Sfaci) Hi again @ssingh! After discussing a bit the above (the privacy considerations), we think t...