[02:47:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp3069:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=esams&var-instance=cp3069 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:52:40] FIRING: [20x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[02:57:41] FIRING: [20x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:02:40] FIRING: [22x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:07:40] FIRING: [22x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:12:40] FIRING: [20x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:22:40] FIRING: [18x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:27:40] FIRING: [20x] VarnishHighThreadCount: Varnish's thread count on cp3066:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:42:40] FIRING: [9x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:47:40] RESOLVED: [8x] VarnishHighThreadCount: Varnish's thread count on cp5017:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:17:43] FIRING: [12x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[06:22:43] FIRING: [33x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[06:32:43] RESOLVED: [33x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:12:11] netops, Infrastructure-Foundations, SRE: cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 transport flapping, disabled for now - https://phabricator.wikimedia.org/T407578#11284197 (cmooney) Thanks Brandon you did the right thing. For now, for troubleshooting, I have set the Arelion circuit to 'drained' sta...
[08:24:49] netops, Infrastructure-Foundations, SRE: cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 transport flapping, disabled for now - https://phabricator.wikimedia.org/T407578#11284209 (cmooney) Seems this started fairly suddenly yesterday afternoon: {F66756494 width=800} The link is flapping hard up/down cons...
[08:52:04] netops, Infrastructure-Foundations, SRE: Arelion 100G transport cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 flapping on eqiad side [Oct 2025] - https://phabricator.wikimedia.org/T407578#11284373 (cmooney)
[09:02:11] netops, Infrastructure-Foundations, SRE: Arelion 100G transport cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 flapping on eqiad side [Oct 2025] - https://phabricator.wikimedia.org/T407578#11284488 (cmooney) The link has been mostly stable since re-enabling it at 08:15 UTC, it flapped a few times immediat...
[10:00:21] netops, Infrastructure-Foundations, SRE: Arelion 100G transport cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 flapping on eqiad side [Oct 2025] - https://phabricator.wikimedia.org/T407578#11284632 (cmooney) So there was a known fault on the Arelion side and they had raised a ticket internally about it....
[11:22:46] netops, Infrastructure-Foundations, Documentation: The links under "Test IP fragmentation issues" on `wikitech:Reporting a connectivity issue` no longer appear to work - https://phabricator.wikimedia.org/T407505#11284843 (Ladsgroup) Triaging the ticket to the correct team. Please correct me if it's w...
[13:18:33] Traffic, Experimentation Lab: Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11285200 (ssingh) Thanks for filling the task, @JVanderhoop-WMF. As per the discussion on Slack, the above sounds good. Please let us know if Traff...
[13:31:39] netops, Infrastructure-Foundations, Observability-Alerting: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#11285219 (cmooney) Further to note that the Nokias do support the Openconfig BGP paths, apart from this one which is no main issue: ` /network-insta...
[14:53:24] sukhe: is the list of 'haproxy_allowed_healthcheck_sources' in common.yaml still used? I think it might be missing a few sources
[14:54:27] I'm not sure on how strict it needs to be either, looking at the list it strikes me it might be easier to just list all the vlan ranges
[14:55:37] topranks: yes, that's used
[14:56:11] topranks: it's used to build an HAProxy map
[14:59:07] I really don't recall why we have IPs there instead of just directly the ranges (I mean, we do have some /64s there)
[14:59:10] vgutierrez: do you recall?
[15:01:15] topranks: I think we can just do the VLAN ranges but I will defer to vg in case I am missing something. that takes away the manual maintenance work
[15:01:29] (I mean putting those vs the /32s we have)
[15:11:51] we need to be pretty strict there, allowing lvs only
[15:12:10] yeah I think that's what he meant
[15:31:13] Traffic, SRE, FY2025-26 WE3.3 Engaging core audiences: [Reading Lists] Monitor potential performance impact of Reading Lists for Web - https://phabricator.wikimedia.org/T397526#11285716 (CDanis) Are there any early estimates of the expected %age increase in something like logged-in daily active users?
[15:40:56] well you are allowing literally millions of IPv6 IPs
[15:41:03] so you need to be more strict I think
[15:41:29] yeah we have a bunch of /64s in there right
[15:41:44] let us know if you want us to patch it instead
[15:41:47] but anyway... I'll make some patches next week to fix the list, it's missing some of them
[15:41:53] ok thanks <3
[15:42:32] nah it's fine. I don't get what security benefit being strict with v4 brings if everything on the vlan can do it over v6 though?
[15:43:28] I am not sure why we put those in and it seems like v..g doesn't recall either but we can discuss in the Traffic meeting and follow up
[15:43:43] ok thanks
[15:44:23] I'm not opposed to being strict for security purposes if it's needed, but given _all_ our hosts have IPv6 addressing blocking them on v4 only doesn't seem to add much on the face of it
[15:45:42] blaming the line to see if we can get some context on the specificity of v4s
[15:46:12] https://phabricator.wikimedia.org/T348851
[15:46:44] doesn't really say why we made the v4s specific though
[15:47:05] topranks: anyway, fabrizi.o is out till Monday so I will check with him and follow up
[15:49:27] hmm
[15:49:49] vgutierrez: see you on Monday, this is not urgent <3
[15:50:49] the IPv6 /64 is actually allocated to LBs but we don't have that segmentation on IPv4?
[15:52:32] hmm nope
[15:52:33] well, we do have the v4 /22 equivalents
[15:52:44] 10.64.16.0/22 -> private1-b-eqiad should cover it for example
[16:04:22] oh.. I think i remember
[16:05:04] yeah.. IPv6 addresses on vlan interfaces in lvs boxes
[16:05:55] use vlan1061@lvs1020 as an example https://www.irccloud.com/pastebin/NvqpQhbz/
[16:05:56] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11285785 (BCornwall) Stalled→In progress I was able to get in contact and the domain transfer will begin shortly. Unfortunately, services will be disrupted for a short while as we initiat...
[16:06:24] in non-core DCs we can be specific with IPv6 addresses as well
[16:06:27] and drop the /64 prefixes
[16:07:47] in core DCs we can do that already with high traffic LVS given all the traffic is IPIP encapsulated already
[16:07:51] so no need for vlans anymore
[16:08:01] yeah that too
[16:08:17] so we should revisit the list
[16:09:06] 🚴‍♂️ time here :D see on Monday
[16:09:16] good weekend vg!
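A rough sketch of the trade-off discussed above (per-host entries vs. whole ranges in 'haproxy_allowed_healthcheck_sources'), using Python's standard `ipaddress` module. Only 10.64.16.0/22 comes from the conversation; the IPv6 /64 is a documentation prefix standing in for a real LB prefix, and `allowed_sources` / `is_allowed` are hypothetical names for illustration, not how the HAProxy map is actually built.

```python
import ipaddress

# Hypothetical allow-list. Only 10.64.16.0/22 appears in the discussion;
# 2001:db8:0:1::/64 is a documentation prefix used as a stand-in.
allowed_sources = [
    ipaddress.ip_network("10.64.16.0/22"),      # private1-b-eqiad, per the chat
    ipaddress.ip_network("2001:db8:0:1::/64"),  # placeholder for an LB /64
]


def is_allowed(addr: str) -> bool:
    """Return True if addr falls inside any allowed network."""
    ip = ipaddress.ip_address(addr)
    # Membership is False when the address and network versions differ.
    return any(ip in net for net in allowed_sources)


def allowed_address_count() -> int:
    """Total number of addresses the allow-list admits."""
    return sum(net.num_addresses for net in allowed_sources)


if __name__ == "__main__":
    for candidate in ("10.64.16.5", "10.64.20.1", "2001:db8:0:1::abcd"):
        print(candidate, is_allowed(candidate))
    print("addresses admitted:", allowed_address_count())
```

The count makes the point raised in the chat concrete: a single /64 admits far more addresses than any set of v4 /32s or /22s, so strictness on v4 alone adds little while the /64s stay in the list.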
[16:09:59] he will be back from the bike
[16:10:07] he is doing 100km bike runs and reading Hiera
[16:11:06] nah.. I've stopped doing that
[16:11:39] I've noticed that my IRC interactions can be impacted by the elevated HR
[16:12:02] so.. I'm currently sitting at 50bpms... interacting with coworkers at 150bpm isn't ok :D
[16:12:34] 12:11:39 < vgutierrez> I've noticed that my IRC interactions can be impacted by the elevated HR
[16:12:37] of course you have
[16:12:45] on your grafana dashboard that tracks your bpm
[16:13:28] I wonder if an LLM could score my IRC messages by moodiness
[16:13:37] and correlate that with HR measures
[16:13:47] vgutierrez: bike calling, bye :*
[16:13:54] sentiment analysis was one of the early medium language model applications
[17:18:46] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11286029 (violetwtf) @BCornwall I would like to take a moment to commend your persistence through all of this. 2.5 years later, here we go!
[17:31:15] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11286089 (Dzahn) Seconded! It's great to see old domain tickets being handled. Thank you, Brett.
[19:25:38] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11286514 (ssingh) Nice job indeed in pursuing this over the years, Brett!
[19:49:43] FIRING: HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=drmrs&var-instance=cp6016&viewPanel=panel-19 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:54:43] FIRING: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:59:43] FIRING: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:04:43] RESOLVED: [16x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[20:57:25] Domains, Traffic, SRE, Patch-For-Review: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11286687 (BCornwall) a:BCornwall