[10:07:47] <_joe_> ok, I think this https://github.com/lavagetto/cdk8s-stub/blob/master/blubberoid/main.py is a nicer abstraction than what helm offers to write a kubernetes manifest
[10:08:08] <_joe_> and this is still extremely raw, with some more time to think it can be made much better
[10:12:16] <_joe_> we'd need to show this to the users though :)
[11:16:25] serviceops, Deployments, Prod-Kubernetes, Release-Engineering-Team: Evaluate cost/benefits of switching to using a programming language to define kubernetes resources - https://phabricator.wikimedia.org/T254739 (Joe)
[14:09:05] i was almost able to attend today's serviceops meeting today, but then someone booked over it again ;(
[14:27:25] <_joe_> heh
[14:37:43] _joe_: do you have 5 mins to sync about purges?
[14:37:58] <_joe_> Pchelolo: sure
[14:38:13] TLDR: from now on enabling kafka purges/disabling HTCP is really easy on our side
[14:38:35] <_joe_> ack, that sounds great
[14:38:42] so, I'm pretty much waiting for your 'go' to do those
[14:38:47] <_joe_> we should try disabling htcp somewhere and check that purges work
[14:38:53] test wiki/
[14:38:56] ?
[14:39:00] <_joe_> apart from that you have my +1 to proceed
[14:39:02] <_joe_> yes
[14:39:09] <_joe_> testwiki or mediawiki.org should do
[14:39:53] ok. So, I'll do testwiki on today morning SWAT
[14:40:18] <_joe_> cool
[14:40:41] <_joe_> I would suggest you try both using action=purge and modifying a template
[14:40:54] one thing I was concerned - can we be double-purging for some time on big wikis? would it make any trouble for varnish?
[14:42:10] or it's better to prep more and then pull the switch in a single deploy?
[14:42:40] <_joe_> Pchelolo: I'd defer that to ema, but I don't think it's a problem as long as we don't leave it running for very long
[14:42:49] <_joe_> days with double purges seems wasteful
[14:55:04] yeah I don't think it would be the end of the world, but we do get enough purges as it is without doubling them :)
[14:56:45] <_joe_> for context, restbase is ~ 1/3 of all purges, and we doubled them for a couple days
[14:56:57] <_joe_> I'm more worried about CDN cache depletion
[15:10:19] _joe_: one more thing - do we want to completely move beta cluster first?
[15:10:34] is everything there set up to do it, or we ignore beta cluster?
[15:10:43] <_joe_> I have no idea!
[15:11:42] ok. in that case, I'll do testwiki
[15:11:52] eventually I guess we'd need to do labs too
[15:11:55] <_joe_> I mean I don't even know if purged runs in beta
[15:12:01] <_joe_> ema: ^
[15:36:26] _joe_: rzl: can i get a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/603511 ?
[15:36:41] <_joe_> in a few mins
[15:36:50] ty
[15:36:51] we're both in meeting but will look right after, 9m on the calendar, might be up to 15 more
[15:49:54] _joe_, Pchelolo: so, I see that puppet has been broken for a while on deployment-cache-text06.deployment-prep.eqiad.wmflabs
[15:50:43] other than that, I have only worried about production so far but of course we can/should take care of !prod too
[15:54:13] ema: hm.. Would be neat if we had purged working in beta before we undeploy scb-config-based changeprop in beta. That would allow for much less specialcasing there
[15:54:24] not sure what's the schedule on that one though, cc hnowlan
[15:55:04] switching MW to disable htcp purges and enable kafka purges would be really easy. I'll make a patch, add you there, and you can just +1 it once you think we should do it
[16:01:25] ok
[16:05:29] serviceops, Operations, Performance-Team, Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) (//Moving to team inbox for next meeting.//)
[16:20:40] _joe_: btw what should we do about that opcache health alert in codfw?
[16:20:57] is that just expected due to low traffic, and should we turn it off in the passive dc?
[16:21:27] <_joe_> rzl: that or improve detection
[16:23:07] rzl: could make it rps-dependent
[16:23:35] <_joe_> it already is
[16:23:39] <_joe_> somehow
[16:25:53] i recently added https://gerrit.wikimedia.org/r/c/operations/puppet/+/602071/3/modules/profile/manifests/mediawiki/php/monitoring.pp so now it can be turned off in Hiera. the only host i actually did turn it off for so far was scandium
[16:27:21] would just be a one-liner to add to role/codfw/mediawwiki/
[16:28:17] IIRC it's dependent on total requests since process start, but not on rps
[16:29:54] <_joe_> rzl: correct
[18:36:45] fyi: as of now, we're only purging test.wikipedia.org via kafka
[18:36:54] it... works.
[18:41:34] Pchelolo: <3
[18:41:37] 🎉
[18:41:52] 🚀
[20:10:44] serviceops, Core Platform Team, Performance-Team, Wikimedia-Rdbms: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (Krinkle) a:Krinkle
[20:10:59] serviceops, Core Platform Team, Performance-Team, Wikimedia-Rdbms: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (Krinkle) a:Krinkle→None
[22:20:08] serviceops, Operations, decommission, ops-codfw, Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (Papaul) switch ports removed for mw2154 through mw2186