[03:19:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[03:24:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp5018:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:54:59] Traffic, Prod-Kubernetes, serviceops, Kubernetes, Patch-For-Review: Handling inbound IPIP traffic on low traffic LVS k8s based realservers - https://phabricator.wikimedia.org/T352956#11015954 (Vgutierrez) @akosiaris I think we could start considering enabling inbound IPIP traffic on the stagi...
[08:49:47] netops, Infrastructure-Foundations, SRE: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931 (cmooney) NEW p:Triage→Low
[08:54:57] netops, Traffic, DC-Ops, Infrastructure-Foundations, and 2 others: eqsin purged consumers lag - https://phabricator.wikimedia.org/T399221#11016109 (cmooney) All looks clean overnight with this, I have confirmed to Arelion they can close their ticket and we will re-open if the same thing happens a...
[08:55:06] netops, Infrastructure-Foundations, SRE: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931#11016110 (ayounsi) Makes sense!
[09:08:15] Traffic, Prod-Kubernetes, serviceops, Kubernetes, Patch-For-Review: Handling inbound IPIP traffic on low traffic LVS k8s based realservers - https://phabricator.wikimedia.org/T352956#11016142 (akosiaris) >>! In T352956#11015954, @Vgutierrez wrote: > @akosiaris I think we could start consideri...
[09:33:26] netops, Infrastructure-Foundations, SRE, Patch-For-Review: BGP: Support receipt of graceful-shutdown community and set local-pref - https://phabricator.wikimedia.org/T399931#11016186 (cmooney)
[09:43:28] Traffic, Patch-For-Review: Consider using the alternate chain of Google Trust Services certificates - https://phabricator.wikimedia.org/T398596#11016205 (Vgutierrez) `sql SELECT CASE WHEN hostname IN ('cp7009.magru.wmnet', 'cp7010.magru.wmnet', 'cp7011.magru.wmnet', 'cp7012.magru.wmnet') TH...
[11:02:15] cdanis: could I get https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1169149 reviewed? initial figures available from webrequest data look good (https://phabricator.wikimedia.org/T398596#11016205), but I think it could be useful to be able to stare at full data without the webrequest sampling
[11:46:09] Traffic: Puppet CI should test haproxy configuration file syntax - https://phabricator.wikimedia.org/T399941 (Fabfur) NEW
[12:25:39] Traffic: ncredir sometimes receives large trafic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947 (jcrespo) NEW
[12:28:24] Traffic: ncredir sometimes receives large trafic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947#11016696 (jcrespo) @vgutierrez sent https://gerrit.wikimedia.org/r/c/operations/puppet/+/1170536
[12:29:58] Traffic: ncredir sometimes receives large trafic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947#11016701 (jcrespo)
[12:31:37] Traffic: ncredir sometimes receives large trafic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947#11016709 (Vgutierrez) p:Triage→Medium
[13:52:06] vgutierrez: homer wants to add BGP peering to lvs1017 on lsw1-e2-eqiad
[13:52:18] is it set up to peer with the switch?
[13:52:23] nope
[13:52:36] ok - I'll set bgp to false for it in netbox for now and we can enable when it's ready
[13:52:42] right now it's idling
[13:52:45] it was probably set to true from previous
[13:52:46] ok
[13:53:02] yeah.. lvs1017 was handling production traffic 2 weeks ago
[13:53:05] topranks: I remember we toggled it though, so I wonder if after the reimage it was set to true, or did we miss a step?
[13:54:23] sukhe: I can't see in the netbox logs that it was ever disabled
[13:54:46] ok then it's our bad.
[13:54:49] however the host status changed to 'decommissioning', which would cause it to be removed from the switch
[13:54:58] now it's state=active again
[13:55:03] no biggie, thanks!
[13:55:06] ok, that would be after the reimage then
[13:55:11] state=active
[14:27:31] vgutierrez: I can help you with the schema change if you need, give me a shout
[14:28:27] that will require a version bump right? aka 1.0.1?
[14:29:17] I had that CR ready till I realized about the version bump :D
[14:33:11] yeah
[14:33:15] there is a tool to do it for you
[14:33:25] (easy enough to do it by hand ofc but)
[14:34:50] npm? 👀
[14:35:10] use `fresh` if you like ;)
[15:05:11] Traffic: Adapt varnish test script(s) to perform HAProxy configuration validation - https://phabricator.wikimedia.org/T399941#11017257 (Fabfur)
[17:42:46] Traffic, Hiddenparma: Introduce allowlists into the CDN (text) filtering - https://phabricator.wikimedia.org/T399057#11017599 (CDanis) We discussed this -- both the overall structure and some of the finer points -- on IRC this week, and again at the WE4.3.1 kickoff. * Letter grades imply a total orderi...
[19:55:38] Traffic, Hiddenparma: Introduce allowlists into the CDN (text) filtering - https://phabricator.wikimedia.org/T399057#11017815 (CDanis)
[23:17:30] Traffic: ncredir sometimes receives large traffic spikes leading to unavailability - https://phabricator.wikimedia.org/T399947#11018062 (Reedy)