[10:07:47] <_joe_>	 ok, I think this https://github.com/lavagetto/cdk8s-stub/blob/master/blubberoid/main.py is a nicer abstraction than what helm offers to write a kubernetes manifest
[10:08:08] <_joe_>	 and this is still extremely raw, with some more time to think it can be made much better
[10:12:16] <_joe_>	 we'd need to show this to the users though :)
[11:16:25] <wikibugs>	 serviceops, Deployments, Prod-Kubernetes, Release-Engineering-Team: Evaluate cost/benefits of switching to using a programming language to define kubernetes resources - https://phabricator.wikimedia.org/T254739 (Joe)
[14:09:05] <mark>	 i was almost able to attend today's serviceops meeting, but then someone booked over it again ;(
[14:27:25] <_joe_>	 heh
[14:37:43] <Pchelolo>	 _joe_: do you have 5 mins to sync about purges? 
[14:37:58] <_joe_>	 Pchelolo: sure
[14:38:13] <Pchelolo>	 TLDR: from now on enabling kafka purges/disabling HTCP is really easy on our side
[14:38:35] <_joe_>	 ack, that sounds great
[14:38:42] <Pchelolo>	 so, I'm pretty much waiting for your 'go' to do those
[14:38:47] <_joe_>	 we should try disabling htcp somewhere and check that purges work
[14:38:53] <Pchelolo>	 test wiki/
[14:38:56] <Pchelolo>	 ?
[14:39:00] <_joe_>	 apart from that you have my +1 to proceed
[14:39:02] <_joe_>	 yes
[14:39:09] <_joe_>	 testwiki or mediawiki.org should do
[14:39:53] <Pchelolo>	 ok. So, I'll do testwiki in today's morning SWAT
[14:40:18] <_joe_>	 cool
[14:40:41] <_joe_>	 I would suggest you try both using action=purge and modifying a template
[14:40:54] <Pchelolo>	 one thing I was concerned about - can we be double-purging for some time on big wikis? would it cause any trouble for varnish?
[14:42:10] <Pchelolo>	 or it's better to prep more and then pull the switch in a single deploy?
[14:42:40] <_joe_>	 Pchelolo: I'd defer that to ema, but I don't think it's a problem as long as we don't leave it running for very long
[14:42:49] <_joe_>	 days with double purges seems wasteful
[14:55:04] <ema>	 yeah I don't think it would be the end of the world, but we do get enough purges as it is without doubling them :)
[14:56:45] <_joe_>	 for context, restbase is ~ 1/3 of all purges, and we doubled them for a couple days
[14:56:57] <_joe_>	 I'm more worried about CDN cache depletion
[15:10:19] <Pchelolo>	 _joe_: one more thing - do we want to completely move beta cluster first?
[15:10:34] <Pchelolo>	 is everything there set up to do it, or do we ignore beta cluster?
[15:10:43] <_joe_>	 I have no idea!
[15:11:42] <Pchelolo>	 ok. in that case, I'll do testwiki
[15:11:52] <Pchelolo>	 eventually I guess we'd need to do labs too
[15:11:55] <_joe_>	 I mean I don't even know if purged runs in beta
[15:12:01] <_joe_>	 ema: ^
[15:36:26] <cdanis>	 _joe_: rzl: can i get a quick look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/603511 ?
[15:36:41] <_joe_>	 in a few mins
[15:36:50] <cdanis>	 ty
[15:36:51] <rzl>	 we're both in meeting but will look right after, 9m on the calendar, might be up to 15 more
[15:49:54] <ema>	 _joe_, Pchelolo: so, I see that puppet has been broken for a while on deployment-cache-text06.deployment-prep.eqiad.wmflabs
[15:50:43] <ema>	 other than that, I have only worried about production so far but of course we can/should take care of !prod too 
[15:54:13] <Pchelolo>	 ema: hm.. Would be neat if we had purged working in beta before we undeploy scb-config-based changeprop in beta. That would allow for much less special-casing there
[15:54:24] <Pchelolo>	 not sure what's the schedule on that one though, cc hnowlan
[15:55:04] <Pchelolo>	 switching MW to disable htcp purges and enable kafka purges would be really easy. I'll make a patch, add you there, and you can just +1 it once you think we should do it
[16:01:25] <ema>	 ok
[16:05:29] <wikibugs>	 serviceops, Operations, Performance-Team, Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) (//Moving to team inbox for next meeting.//)
[16:20:40] <rzl>	 _joe_: btw what should we do about that opcache health alert in codfw?
[16:20:57] <rzl>	 is that just expected due to low traffic, and should we turn it off in the passive dc?
[16:21:27] <_joe_>	 rzl: that or improve detection
[16:23:07] <cdanis>	 rzl: could make it rps-dependent 
[16:23:35] <_joe_>	 it already is
[16:23:39] <_joe_>	 somehow
[16:25:53] <mutante>	 i recently added https://gerrit.wikimedia.org/r/c/operations/puppet/+/602071/3/modules/profile/manifests/mediawiki/php/monitoring.pp so now it can be turned off in Hiera. the only host i actually did turn it off for so far was scandium
[16:27:21] <mutante>	 would just be a one-liner to add to role/codfw/mediawwiki/
[16:28:17] <rzl>	 IIRC it's dependent on total requests since process start, but not on rps
[16:29:54] <_joe_>	 rzl: correct
[18:36:45] <Pchelolo>	 fyi: as of now, we're only purging test.wikipedia.org via kafka
[18:36:54] <Pchelolo>	 it... works.
[18:41:34] <cdanis>	 Pchelolo: <3
[18:41:37] <cdanis>	 🎉
[18:41:52] <rzl>	 🚀
[20:10:44] <wikibugs>	 serviceops, Core Platform Team, Performance-Team, Wikimedia-Rdbms: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (Krinkle) a:Krinkle
[20:10:59] <wikibugs>	 serviceops, Core Platform Team, Performance-Team, Wikimedia-Rdbms: Determine multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (Krinkle) a:Krinkle→None
[22:20:08] <wikibugs>	 serviceops, Operations, decommission, ops-codfw, Patch-For-Review: codfw: decom at least 15 appservers in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (Papaul) switch ports removed for mw2154 through mw2186