[01:48:31] 06Traffic, 06MediaWiki-Platform-Team (Radar), 13Patch-For-Review, 07User-notice: [Main Rollout] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11288292 (10Krinkle) >>! https://gerrit.wikimedia.org/r/1191134 **merged** by BCornwall: > varnish: Enable unified mob...
[01:48:45] 06Traffic, 06MediaWiki-Platform-Team (Radar), 13Patch-For-Review, 07User-notice: [Main Rollout] Enable unified mobile routing on remaining wikis - https://phabricator.wikimedia.org/T403510#11288295 (10Krinkle)
[06:38:00] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp5027:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[06:40:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5028:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5028 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:43:00] FIRING: [2x] PurgedHighBacklogQueue: Large backlog queue for purged on cp5027:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[06:45:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:48:00] FIRING: [2x] PurgedHighBacklogQueue: Large backlog queue for purged on cp5027:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[06:50:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:55:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[06:58:00] RESOLVED: [2x] PurgedHighBacklogQueue: Large backlog queue for purged on cp5028:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5028 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[07:00:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:20:40] FIRING: [4x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:35:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp5026:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:45:40] FIRING: [6x] VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[07:55:40] FIRING: [5x] VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:05:40] RESOLVED: [3x] VarnishHighThreadCount: Varnish's thread count on cp5027:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[08:05:43] FIRING: [8x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:10:43] FIRING: [32x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:15:43] FIRING: [32x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[08:20:43] RESOLVED: [32x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[13:48:25] 06Traffic, 10MediaWiki-Core-AuthManager, 06MediaWiki-Platform-Team: Consider using EdDSA rather than RSA for MediaWiki session tokens - https://phabricator.wikimedia.org/T407194#11289498 (10Tgr) >>! In T407194#11273155, @ssingh wrote: > (Is there anything -- including input -- required from Traffic on this?...
[13:49:33] 06Traffic, 10bot-traffic-requests: Global block exception for AddDesc app - https://phabricator.wikimedia.org/T407706#11289502 (10CDanis)
[15:03:20] 10netops, 06Infrastructure-Foundations, 06SRE: Arelion 100G transport cr1-eqiad:et-1/1/2 <-> cr1-codfw:et-1/0/2 flapping on eqiad side [Oct 2025] - https://phabricator.wikimedia.org/T407578#11290097 (10cmooney) p:05Triage→03Low a:03cmooney Gonna leave this a few days before closing, we've had a few fla...
[15:31:46] 06Traffic, 06SRE: Improve how we build the 'haproxy_allowed_healthcheck_sources' list of IPs - https://phabricator.wikimedia.org/T407769#11290283 (10ssingh) Thanks for filing this task! I think this is a good idea to reduce the manual updates to this list, and something we have failed to keep updated. We will...
[15:51:20] 06Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11290389 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum6001.drmrs.wmnet with OS trixie
[16:31:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 198.35.26.96:443 @ cp4038 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=ulsfo&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:31:51] yeah ok
[16:32:31] just sukhe doing sukhe things
[16:33:55] * sukhe is guilty
[16:34:06] alert fires here, I open karma to silence, gone
[16:34:30] pre-emptive silencing didn't work last time so at least that's something to debug in a task
[16:34:57] tried again
[16:40:38] 10Domains, 06Traffic, 06SRE: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11290600 (10BCornwall) Thank you, all. :) This has been migrated and things should continue to behave as expected. If that's not true, please re-open this ticket so we can look into it!
[16:40:47] 10Domains, 06Traffic, 06SRE: Acquire enwp.org - https://phabricator.wikimedia.org/T332220#11290601 (10BCornwall) 05In progress→03Resolved
[16:46:31] 06Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11290610 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum6001.drmrs.wmnet with OS trixie completed: - durum6001 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled...
[16:49:00] 06Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11290624 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin1003 for host durum6002.drmrs.wmnet with OS trixie
[17:18:32] 06Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11290695 (10ssingh)
[17:42:59] 06Traffic: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832#11290758 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin1003 for host durum6002.drmrs.wmnet with OS trixie completed: - durum6002 (**PASS**) - Downtimed on Icinga/Alertmanager - Disabled...
[18:04:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[18:04:56] silence worked on ulsfo, failed here
[18:05:25] FIRING: SystemdUnitFailed: haproxykafka.service on cp5025:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:05:57] hmm so no timing issues in ulsfo but again in eqsin
[18:07:12] Deleted silence ID 48ba116f-8035-47e0-aad8-d4e9b53fa1dc
[18:07:12] Sleeping for 5 seconds (until 2025-10-20T18:04:08+0000)
[18:07:27] so it deleted the silence *after* but it still alerted here
[18:10:25] RESOLVED: [3x] SystemdUnitFailed: haproxy.service on cp5017:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:14:21] can't be just IRC though since the email was also delivered
[18:49:25] FIRING: [2x] SystemdUnitFailed: haproxy.service on cp5018:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:50:24] ^ again, fired *after* everything was done.
[18:50:31] filing a task, I have sufficient data
[18:51:43] FIRING: [4x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[18:52:22] that's esams and not related
[18:54:25] RESOLVED: [2x] SystemdUnitFailed: haproxy.service on cp5018:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:56:43] FIRING: [13x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:01:52] 06Traffic, 10Observability-Alerting: Alertmanager triggers an alert on IRC and email after the alert has resolved - https://phabricator.wikimedia.org/T407787 (10ssingh) 03NEW
[19:01:59] 06Traffic, 10Observability-Alerting: Alertmanager triggers an alert on IRC and email after the alert has resolved - https://phabricator.wikimedia.org/T407787#11291074 (10ssingh) p:05Triage→03Low
[19:06:43] RESOLVED: [13x] HaproxyKafkaSocketDroppedMessages: Sustained high rate of dropped messages from HaproxyKafka - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaSocketDroppedMessages - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaSocketDroppedMessages
[19:16:17] bblack: sukhe: ok I've actually fixed it, this time https://gerrit.wikimedia.org/r/c/operations/puppet/+/1197323
[19:18:23] cdanis: thanks, going to 302 to bblack on this since he has the most context
[19:18:37] ack, I might just self-merge, I'm pretty convinced (and tested it by hand on one cp host)
[19:19:30] bblack is around I think in case you want to get his input
[19:20:24] looks good in theory, the clamping
[19:24:52] thanks :)
[19:29:20] something's not right with the explanation, at least
[19:29:41] "days" are a fixed concept (they reset at midnight UTC or whatever approximation of that)
[19:30:02] but weeks are counted from that particular cookie's creation day
[19:31:26] (for the purpose of incrementing "count")
[19:31:39] oh, hm, okay, I guess I had taken "number of distinct weeks" too literally
[19:32:00] yeah this stuff can be confusing, even to me in retrospect
[19:32:15] but I think the correct low-level understanding is:
[19:32:29] this does seem to be fixing the bad data though -- cp1100 hasn't logged a single freq>10 since
[19:32:31] count==0 -> freshly-baked cookie during this request
[19:32:47] count==1 -> a returned cookie, during the first ~week since creation
[19:33:29] count>1 -> a returned cookie, which has been returned in multiple distinct weeks since creation (including the first one).
[19:35:37] so, for example, if count is 5 and the cookie is 100 days old (which is ~14.29 weeks), then their frequency of visitation down to 1-week resolution is basically 5/15 => 1/3
[19:36:01] and then all the blah blah about rounding to integers, and that we only want a rough gauge with ~10 steps.
[19:36:42] the fractional weeks part doesn't have to be perfect, this is all meant to be very rough for privacy anyways.
[19:39:01] I suspect the real issue is in those latter parts? maybe the initial weeks = (days+6)/7?
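
(Editor's note: a minimal C sketch of the counting scheme bblack lays out above, just to make the 100-day, count=5 arithmetic concrete. The variable names are illustrative only, not the vmod's or VCL's actual identifiers.)

#include <stdio.h>

int main(void) {
    /* Hypothetical values matching the example above: a returned cookie
     * that has been seen in 5 distinct weeks and is now 100 days old. */
    int count = 5;       /* 0 = fresh this request, 1 = first week, >1 = several distinct weeks */
    int cday_age = 100;  /* whole wallclock days since the cookie was created */

    /* A returned cookie is always in week >= 1; 100 days (~14.29 weeks)
     * puts it in its 15th week. */
    int weeks = 1 + cday_age / 7;                /* 15 */
    double freq = (double)count / (double)weeks; /* 5/15 ~= 1/3 */

    printf("weeks=%d freq=%.2f\n", weeks, freq); /* weeks=15 freq=0.33 */
    return 0;
}
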
[19:39:29] that works fine with floor division
[19:41:37] then something else is wrong, hmmmm
[19:46:53] I'm not sure what yet, still kinda digging around and trying to verify assumptions
[19:47:49] bblack: hmm... not sure what to make of https://w.wiki/Fkcw
[19:55:34] I'm still going through the process of "question everything" at the low level, I'll get back up there eventually :)
[19:56:16] oh -- btw -- count being weeks+1 was a direct observation, not a guess (the mechanism was a guess)
[19:56:56] yeah but "weeks" is not a value the vmod reports, it's something we're calculating based on the reported cday_age, which is itself a calculation based on the current wallclock and the cookie's stored creation date.
[19:57:26] so I'm trying to jump back through all the mental hoops of that and make sure I understand
[19:57:48] (and that there's not an actual bug in cookie creation/refresh)
[19:58:30] bblack: sooooooo we're calling get_cookie_count() after process_cookie() which can increment count, is that relevant?
[19:59:42] maybe to our mental understanding. nothing outside the vmod ever really sees a state based on what the user actually provided (or nothing at all, if they didn't). all state we see outside the vmod is after the vmod's validation/creation/refresh
[19:59:47] nod
[20:00:43] and since (unlike weeks) days are a fixed concept on the wallclock, we do expect anomalies on the 24-hour cycle
[20:01:10] (it's not even UTC midnight, technically. it ignores all the date math about leaps and just uses unix_time/86400 as the day-boundary)
[20:02:12] https://gitlab.wikimedia.org/repos/sre/libvmod-wmfuniq/-/blob/main/src/cookies.c?ref_type=heads#L325
[20:02:32] ^ those are the source numbers, and the logic below it affects count. it all flows from there somehow (or from bugs there!)
[20:04:30] refresh_basis could cause count>weeks too, but that was something early I left in as an option in case we ever needed it. I don't think we ever set it.
[20:04:59] errr no, even that wouldn't bump the count. it would just refresh the salt+mac with the same count.
[20:07:51] ah wait, I think I get part of the puzzle now...
[20:08:51] cday_age should not be what determines the logic we're following there about it being a fresh cookie.
[20:10:05] cday_age will be zero from the moment of creation, until the next fixed "day" boundary on the wallclock (so anywhere from 0-86400 seconds), then will roll over to 1 at our sloppy-midnight boundary.
[20:10:19] var.set_int("wmfuniq_days", wu_cfg.get_cookie_cday_age()); // 0 on a freshly-generated cookie, 1+ otherwise
[20:10:34] ^ which is not how this comment and the rest of the logic are treating it
[20:11:04] "count" acts like that: it's zero on fresh-generation, 1+ on any request where it was returned.
[20:12:13] basically, the outer logic should be: if (count > 0) { do math } else { set everything as zero }
[20:12:19] right
[20:13:53] 10netops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-esams: esams switch oritentation migration - https://phabricator.wikimedia.org/T407794 (10RobH) 03NEW p:05Triage→03Medium
[20:19:08] and honestly, probably the VCL "weeks" concept should be like the vmod version it uses to decide whether to bump count. weeks = "1 + (cday_age/7)"
[20:19:45] it's always at least week 1, for a returned cookie
[20:20:04] and that will match count's counting
[20:20:19] then freq = (count/weeks)
[20:21:01] maybe I should move some of this up to the vmod, once we're sure this is the useful info we want.
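
(Editor's note: a rough C sketch of the corrected shape bblack describes above: count > 0, not cday_age, decides whether the cookie is fresh; the day boundary is plain unix_time/86400; and weeks is always at least 1 for a returned cookie. The struct, function names, and the exact freq scaling are assumptions for illustration, not the real vmod/VCL code.)

#include <stdio.h>
#include <time.h>

struct wmfuniq_report {
    int days;   /* whole wallclock days since creation */
    int weeks;  /* 1-based week the cookie is currently in */
    int freq;   /* rough integer gauge of count/weeks, ~10 steps */
};

static void build_report(long creation_day, int count, struct wmfuniq_report *out)
{
    /* sloppy day boundary, as in the vmod: unix_time/86400, no UTC-midnight
     * or leap-second math */
    long today = (long)time(NULL) / 86400;
    int cday_age = (int)(today - creation_day);

    if (count > 0) {
        /* a returned cookie: report real values */
        out->days  = cday_age;                  /* stays 0 until the first day boundary */
        out->weeks = 1 + cday_age / 7;          /* matches how count is bumped */
        out->freq  = (10 * count) / out->weeks; /* exact scaling here is a guess */
    } else {
        /* freshly-baked cookie during this request: everything zero */
        out->days = out->weeks = out->freq = 0;
    }
}

int main(void) {
    struct wmfuniq_report r;
    long today = (long)time(NULL) / 86400;
    build_report(today - 100, 5, &r);  /* the 100-day, count=5 example */
    printf("days=%d weeks=%d freq=%d\n", r.days, r.weeks, r.freq);
    return 0;
}
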
[20:21:03] yep
[20:21:35] because this seems fragile for VCL heh
[20:23:35] bblack: okay, I have verified that, for all the distinct cday_ages I observed on cp1100 with a nonsense freq result, adding 1 to the days makes the count match weeks
[20:25:57] yeah and that other part about the outer if (count > 0) logic, I think that's basically suppressing some data (that should've reported "real" values, but instead is reporting all-zeros, because it's a returned cookie and its first midnight hasn't happened yet)
[20:28:08] yeah, that would explain the 1-day-old discontinuity at 00:00
[20:28:12] I am pretty sure
[20:28:16] yeah I think so
[20:30:20] I really wanted to confirm that the spike in turnilo lines up with my sloppy midnight, but I think it rounds everything to minutes no matter how far you zoom in :P
[20:30:31] there's only ever been +27 leap seconds
[20:30:43] yeah, indeed
[20:30:51] it's pre-aggregated data
[20:31:02] we don't get fractional seconds in the full data lake either (kinda wish we did)
[20:34:40] I think, even if you want cday_age to always be 1+ in reporting for consistency vs all-zeros, the week calculation still has to be different
[20:35:02] it has to be 1 + (raw_cday_age/7), which will roll over at a different time than you have it right now
[20:36:16] cdanis: ^
[20:37:29] hmmm
[20:37:45] aye
[20:39:26] it's hard for me to wrap my brain around it half the time, because all numbers start at zero, and some of these (count, and the internal "weeks") are really 1-based numbers with zero as a "special value"
[20:40:25] yeah
[20:41:22] 10netops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-esams, 06SRE: esams switch oritentation migration - https://phabricator.wikimedia.org/T407794#11291474 (10RobH)
[20:42:37] cdanis: almost, one more loop of madness to go: var.set_int("wmfuniq_freq", (var.get_int("wmfuniq_freq")
[20:43:05] ^ it's not set yet in the current code. I guess just call get_count() again there, it should be ~cheap to repeat these calls.
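
(Editor's note: a quick standalone check of the week boundaries implied by the formula discussed above, 1 + raw_cday_age/7. Since count is bumped at most once per such week for a returned cookie, count/weeks should stay <= 1 under this math. The helper name is hypothetical and this is not taken from the vmod's varnishtest suite.)

#include <assert.h>
#include <stdio.h>

/* 1-based week a returned cookie is in, given whole days since creation */
static int weeks_for(int cday_age) {
    return 1 + cday_age / 7;
}

int main(void) {
    assert(weeks_for(0)   == 1);   /* creation day: already week 1 */
    assert(weeks_for(6)   == 1);   /* still week 1 on day 6 */
    assert(weeks_for(7)   == 2);   /* rolls over to week 2 on day 7 */
    assert(weeks_for(13)  == 2);
    assert(weeks_for(14)  == 3);
    assert(weeks_for(100) == 15);  /* the 100-day example: ~14.29 -> week 15 */
    puts("week boundaries ok");
    return 0;
}
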
[20:43:16] argh yeah
[20:43:22] I was running varnishtest without looking
[20:43:30] and yeah I think get_count() is just a struct member read
[20:44:19] yeah they all are
[20:45:30] process_cookie() does all the real work, and then all the get_foo() rest just return struct members (plus some assertions and checking but whatever)
[20:48:53] thanks for taking the time, this makes much more sense to me now
[20:53:35] 06Traffic, 10HaproxyKafka: HAProxy sometimes does not apply host normalization - https://phabricator.wikimedia.org/T407796 (10Krinkle) 03NEW
[20:54:32] 06Traffic, 10HaproxyKafka: HAProxy sometimes does not apply host normalization - https://phabricator.wikimedia.org/T407796#11291513 (10Krinkle)
[20:55:36] 10netops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-esams, 06SRE: esams switch orientation migration - https://phabricator.wikimedia.org/T407794#11291516 (10Krinkle)
[20:56:30] cdanis: thanks for wading through my silly vmod code with me :)
[20:56:45] it was fun the whole time :)
[21:01:13] bblack: only about 1/3rd rolled out, but you can see it's changing some 0-days to being reported as 1, which seems right https://w.wiki/FkfF
[21:42:43] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp5022 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqsin&var-instance=cp5022 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:47:43] RESOLVED: HaproxyKafkaExporterDown: HaproxyKafka on cp5022 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqsin&var-instance=cp5022 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[21:56:32] reboots