[11:24:16] FYI, I've just deployed the new lossy Wikipedia logos. Not that I expect it to affect performance metrics in a significant way...
[12:01:59] based on a very rough estimate this could save users globally as much as 5GB/minute
[12:02:49] or 7TB per day. but that's difficult to get a reliable figure for, this is extrapolated from one minute of traffic on a single Varnish frontend
[12:05:11] the order of magnitude is probably correct, but not the actual figure
[12:29:01] quite nice
[12:31:13] Krinkle: re: slow kibana, I dug in a little a couple of days ago. There's a regression that will be fixed in 7.9.1 and the other glaring issue I could find is https://github.com/elastic/kibana/issues/76401
[12:54:16] godog: ack, I'll see if I can file some issues as well. So -next is close to latest stable?
[12:54:51] Krinkle: it is in fact the latest stable yeah, 7.9
[12:57:11] FWIW I'm using 'varnish webrequest 50x' dashboard as a testbed
[20:01:08] dpifke: would it make sense to have only one navtiming instance produce the last_handled metric at a time?
[20:01:28] e.g. the alert would query it without specifying the dc
[20:01:49] or maybe by max()-ing it
[20:02:04] and letting the inactive one just repeat the old timestamp
[20:02:23] given graphite is not active-active and that we intentionally switch off the secondary
[20:07:28] I'm not sure we can aggregate across Prometheus instances (DCs), and even if we could, I'm not sure that's a great idea from a reliability standpoint.
[20:08:09] they would both write to the same graphite instance, so only one needs to have done something, right?
[20:08:25] I'm thinking it's possible to add an "is_active" label, and then aggregate across that if we want true numbers, or select just {is_active="true"} for alerts.
[20:08:33] ah hm.. right we'd have to add these to the list of global aggregated metrics
[20:08:52] The trick is we need to send NaN in the correct place so that it doesn't extrapolate when the label changes.
[20:09:41] I have to look at what Icinga does when given a NaN. In this case, it should fail open, but I can imagine cases where the desired behavior would be the opposite.
[20:09:46] grafana does support querying multiple proms in one panel but aggregating functions over them is limited and confusing at best indeed.
[20:10:00] .. right we use prom directly, right
[20:10:04] The alert is coming from a PromQL query.
[20:11:19] It's not an actionable alert, so arguably it shouldn't require a manual silence, but DC switchovers are rare enough that it's not a huge amount of toil to do so.
[20:11:52] I don't want to add a bunch of complexity that makes it fragile.
[20:11:55] yeah, if we make them separate alerts with the dc in the title, that should suffice.
[20:12:18] is that the only alert that did/should fire?
[20:12:23] from navtiming
[20:12:51] I think so. Whatever we do for navtiming will get copied over to XHGui, ArcLamp, etc. But those alerts aren't live yet.
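
The 12:01–12:05 extrapolation in the log is just unit conversion from the rough 5GB/minute sample; a quick check of the stated figures (only the per-minute estimate comes from the log, the rest is arithmetic):

```python
# Back-of-the-envelope check of the bandwidth-savings figures quoted above.
# 5 GB/minute is the rough estimate from one minute of traffic on a single
# Varnish frontend; the daily figure follows by straight extrapolation.
gb_per_minute = 5
gb_per_day = gb_per_minute * 60 * 24   # 7200 GB/day
tb_per_day = gb_per_day / 1000         # ~7.2 TB/day, i.e. "or 7TB per day"
print(f"{gb_per_day} GB/day ~= {tb_per_day:.1f} TB/day")
```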
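For context on the 20:01–20:11 discussion, here is a minimal sketch of what the last_handled freshness alert could look like as PromQL instant queries run against the standard Prometheus HTTP API. The metric name navtiming_last_handled_seconds, the is_active label, the two-hour threshold, and the Prometheus URL are illustrative assumptions, not the actual production rule.

```python
import requests

# Option A: one alert per DC, evaluated independently by each datacenter's
# Prometheus (the "separate alerts with the dc in the title" approach from
# 20:11:55); no cross-DC aggregation is needed.
PER_DC_QUERY = 'time() - max(navtiming_last_handled_seconds) > 2 * 3600'

# Option B: only consider the active instance, using the proposed is_active
# label from 20:08:25, so the intentionally idle DC never fires.
ACTIVE_ONLY_QUERY = (
    'time() - max(navtiming_last_handled_seconds{is_active="true"}) > 2 * 3600'
)


def query_prometheus(base_url: str, promql: str) -> list:
    """Run an instant query via the standard /api/v1/query endpoint."""
    resp = requests.get(f"{base_url}/api/v1/query",
                        params={"query": promql}, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["result"]


if __name__ == "__main__":
    # Placeholder endpoint; the real Prometheus URL is deployment-specific.
    for row in query_prometheus("http://prometheus.example.org:9090",
                                PER_DC_QUERY):
        print(row["metric"], row["value"])
```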
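And on the producer side, a sketch of the is_active-label-plus-NaN idea from 20:08, using the stock prometheus_client Python library. The gauge name, label values, and helper function are made up for illustration; the point is that publishing NaN under the non-matching label keeps the series present without letting stale values carry across an active/standby switchover.

```python
import math
import time

from prometheus_client import Gauge

# Hypothetical gauge; the real navtiming metric and label names may differ.
LAST_HANDLED = Gauge(
    "navtiming_last_handled_seconds",
    "Unix timestamp of the last event handled by this navtiming instance",
    ["is_active"],
)


def record_handled(active: bool) -> None:
    """Record a handled event, exporting the timestamp only under the label
    value matching this instance's current role.

    Sending NaN "in the correct place" (20:08:52) means the other label
    value is still exported, but as NaN, so dashboards and alerts cannot
    extrapolate an old timestamp when the active DC changes.
    """
    now = time.time()
    LAST_HANDLED.labels(is_active="true").set(now if active else math.nan)
    LAST_HANDLED.labels(is_active="false").set(math.nan if active else now)
```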