[02:00:31] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10921099 (BCornwall)
[05:54:10] netops, Infrastructure-Foundations: Enable gNMI on SRX devices and fasw - https://phabricator.wikimedia.org/T390052#10921326 (ayounsi) From JTAC: > Engineering stated that JSD the process code that manages gRPC missed being shipped to this platform and they are working to push the grpc library code to th...
[08:20:42] Traffic, Hiddenparma, Patch-For-Review: Requestctl should use x-provenance header - https://phabricator.wikimedia.org/T396621#10921633 (Fabfur)
[09:48:02] o/ I didn't get around to deploying the ATS change I linked yesterday, would it suit to do it now? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1156813
[10:21:30] Traffic, Maps, SRE: Allow Wikimedia Maps usage on my maps - https://phabricator.wikimedia.org/T397151 (Fpisot) NEW
[10:28:44] hnowlan: sure
[10:30:16] thanks!
[10:30:22] Traffic, Maps, SRE: Allow Wikimedia Maps usage on my maps - https://phabricator.wikimedia.org/T397151#10922246 (Bugreporter) Open→Invalid This is not a valid usecase. See https://switch2osm.org/providers/ instead, or if you just want to use in your personal project, create a reverse proxy...
[10:31:33] netops, Infrastructure-Foundations, SRE: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153 (cmooney) NEW p:Triage→Low
[10:49:20] Traffic, Maps, SRE: Allow Wikimedia Maps usage on my maps - https://phabricator.wikimedia.org/T397151#10922331 (Aklapper) Invalid→Declined Per https://wikitech.wikimedia.org/wiki/Maps/External_usage, > maps.wikimedia.org tiles may only be used by Wikimedia wikis, and sites hosted by Wikim...
[10:51:02] testing done, rolling out now
[10:57:53] Traffic, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team: [WE5.5.3] Decide how to expose session information to infrastructure layers in front of MediaWiki - https://phabricator.wikimedia.org/T394012#10922373 (Vgutierrez) //Turn some session tokens into JWTs, deprecate the rest// (Option 3) is th...
[12:59:33] Traffic, Liberica, Patch-For-Review: seamless upgrade triggers dropped packets with katran - https://phabricator.wikimedia.org/T397053#10923087 (Vgutierrez) even after restarting liberica-fp with `User=root` on its service unit we can see the following warning on its journallog: ` Jun 17 11:00:50 lvs...
[13:00:07] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185 (taavi) NEW
[13:53:43] Traffic, Liberica: seamless upgrade triggers dropped packets with katran - https://phabricator.wikimedia.org/T397053#10923448 (Vgutierrez) Open→Resolved spawning daemons that use eBPF with `User=root` and `ProtectKernelTunables=no` solved the issue
[13:54:08] Traffic: Upgrade to ATS 9.2.10 - https://phabricator.wikimedia.org/T390912#10923462 (ssingh)
[15:10:46] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924062 (xcollazo) CC @BTullis
[15:16:00] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924112 (xcollazo) @cmooney: +1 to the change. Can you please share the link to this dashboard?
[15:21:48] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924152 (cmooney) >>! In T397153#10924112, @xcollazo wrote: > @cmooney: +1 to the change. > > Can you please share the link to this dashboard?...
[16:07:21] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924481 (Dzahn) Would it be easier and more consistent to point all domains to the main Wikimedia NS servers as previously came up in another ticket? (...
[16:08:52] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10924494 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye
[16:12:50] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924503 (taavi) >>! In T397185#10924480, @Dzahn wrote: > Would it be easier and more consistent to point all domains to the main Wikimedia NS servers a...
[16:19:01] Traffic: Upgrade to ATS 9.2.10 - https://phabricator.wikimedia.org/T390912#10924577 (ssingh)
[16:22:25] FIRING: SystemdUnitFailed: prometheus-nft-throttling-denylist.service on durum7003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:23:31] ^ this is a insetup server
[16:23:32] but still looking
[16:36:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924650 (xcollazo) Now that I think more about this: I don't know where in puppet, but I am aware that we throttle any individual download to 3-6...
[16:52:47] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924738 (Dzahn) Gotcha. Thanks for adding that.
[16:55:52] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10924754 (cmooney) >>! In T397153#10924650, @xcollazo wrote: > Perhaps this also includes rsync traffic? Yeah the throughput graphs include all...
[17:10:26] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10924834 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host lvs1016.eqiad.wmnet with OS bullseye completed: - lvs1016 (**PASS**)...
[17:12:51] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10924863 (BCornwall)
[17:22:51] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10924906 (BCornwall)
[17:28:02] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10924947 (BCornwall)
[17:34:14] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10925006 (ssingh) You can add them if required even though my reading of the other tasks and RFC indicates it is optional. That being said, why only v6...
[17:44:02] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925056 (BCornwall)
[17:54:26] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10925115 (taavi) >>! In T397185#10925006, @ssingh wrote: > You can add them if required even though my reading of the other tasks and RFC indicates it i...
[17:59:32] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925156 (BCornwall)
[18:16:27] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925216 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by brett@cumin2002 for hosts: `lvs1017.eqiad.wmnet` - lvs1017.eqiad.wmnet (**PASS**) - Downtimed hos...
[18:58:12] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925320 (BCornwall)
[19:01:49] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925328 (BCornwall)
[19:15:02] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925361 (BCornwall)
[19:15:59] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925367 (BCornwall) @VRiley-WMF Okay! We've reimaged lvs1016 as the new primary and have lvs1020 as secondary. lvs1017 has been decommissioned and is ready to be removed/ser...
[19:24:18] Traffic, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q3:test NIC for lvs1017 - https://phabricator.wikimedia.org/T387145#10925414 (BCornwall)
[20:02:25] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_varnish-frontend-slowlog.service on cp6010:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:04:27] hmmmm
[20:04:32] Jun 16 19:57:01 cp6010 wmf-auto-restart[645535]: INFO: 2025-06-16 19:57:01,362 : No restart necessary for service varnish-frontend-slowlog
[20:04:56] It just kept looping itself
[20:05:08] thanks for looking
[20:06:25] So it detected it needed to restart, kicked the slowlog, then exited, then got restarted again
[20:06:42] and then detected no restart needed, exited, then got retriggered
[20:10:13] I'mma be honest; I'm not really interested in mucking around the auto_restart mess in puppet, so unless this happens again I'mma just let it go
[20:10:38] brett@cp6010:~$ sudo systemctl reset-failed wmf_auto_restart_varnish-frontend-slowlog.service
[20:14:03] yeah
[20:27:25] FIRING: [2x] SystemdUnitFailed: wmf_auto_restart_varnish-frontend-slowlog.service on cp6010:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:33:40] grr
[20:52:32] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Map dumps HTTPS traffic as low-priority for QoS - https://phabricator.wikimedia.org/T397153#10925689 (xcollazo) >That said we you can see that many of the busiest times - as seen on the Grafana throughput graph - correlate with times when...
[23:12:37] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10926105 (BCornwall)