[07:05:49] serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Joe) a:Dzahn Assigning to Daniel as he's actively working on this.
[07:08:20] serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Move proton to use TLS only - https://phabricator.wikimedia.org/T255877 (Joe) a:JMeybohm
[09:49:11] <_joe_> akosiaris, effie janis and I were debating on a UX question
[09:49:46] <_joe_> I've written a kube_env bash function that allows us to declare all the kubernetes-related env variables for a specific cluster and service
[09:49:55] <_joe_> now we were debating if we prefer it to be
[09:50:17] <_joe_> kube_env NS CLUSTER or kube_env CLUSTER NS
[09:50:21] <_joe_> so for instance
[09:50:33] <_joe_> kube_env blubberoid eqiad
[09:50:35] <_joe_> or
[09:50:44] <_joe_> kube_env eqiad blubberoid
[09:51:02] <_joe_> what seems more natural to you?
[09:55:24] are you trolling me or what?
[09:56:03] if you are serious first service then cluster. If you are not, it's a pretty good bait
[09:56:18] <_joe_> akosiaris: it was a serious question :P
[09:56:49] cool, service then cluster
[09:56:54] <_joe_> we had this discussion with jayme
[09:56:57] it's what we are anyway going to go for in helmfile
[09:57:12] the contrary of what we currently have
[09:57:19] <_joe_> for the record, that's how I did it
[09:57:33] which is mind bending to me (and I assume other people as well) every single time
[10:19:12] hello! do ye know much about the integration of service-checker into kubernetes? I see a test in the scaffold but I'm not sure whether/where it gets called.
[10:27:54] <_joe_> I think the idea was to use it as a readiness probe at some point? but maybe akosiaris remembers more details off of the top of his head
[10:29:57] hnowlan: it is being called IIRC during integration tests by CI. That gets triggered if there is a helm.yaml file in .pipeline
[10:30:21] few services currently have it IIRC, e.g. mathoid. I think kask is going to use that soon though, cc longma
[10:34:01] <_joe_> jayme: so I'm going to merge the kube_env change but not the one converting .hfenv
[10:34:18] <_joe_> we will still have time to change it back in case
[10:36:12] https://gerrit.wikimedia.org/r/c/operations/puppet/+/613190/4 and next one, right ?
[10:36:14] * akosiaris reviewing as well
[10:38:58] +1ed both
[10:39:41] <_joe_> yep
[10:49:03] akosiaris: ah cool, thanks!
[11:05:08] <_joe_> so adding the __kube_env_ps1 to my prompt I get this https://dpaste.org/v6do
[11:06:13] _joe_: Okay. I'm fine with either way. Stick with what the majority thinks is natural ;-)
[13:04:23] jayme: and notes_url fixed as well https://puppet-compiler.wmflabs.org/compiler1002/23999/icinga1001.wikimedia.org/fulldiff.html
[13:04:52] akosiaris: yeah, saw your change. Nice!
[13:05:18] <_joe_> uhm so I'm taking a look at helmfile, and I'm kinda convinced we're doing it all wrong
[13:05:59] <_joe_> I'm also unconvinced that there is a right way to do it
[13:06:42] <_joe_> one q: do you think it's acceptable to require people to change how they deploy now?
[13:07:37] <_joe_> like moving from "helmfile sync -i" into a specific dir
[13:08:21] <_joe_> to defining what they want to operate on, then running helmfile sync, all from the root of the service hierarchy?
[13:15:47] serviceops, DBA, OTRS, Operations: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (akosiaris) Many thanks!
[13:20:13] _joe_: what do you mean?
[13:20:28] <_joe_> so, right now you do something
[13:20:30] <_joe_> like
[13:20:36] it's not gonna be easy to separate services between them without some structure
[13:20:45] <_joe_> cd services/codfw/mathoid
[13:20:55] <_joe_> source .hfenv
[13:21:01] <_joe_> helmfile sync -i
[13:21:16] <_joe_> my proposal was, possibly, to do something like
[13:21:21] <_joe_> cd services
[13:21:38] <_joe_> kube_env mathoid codfw
[13:22:00] <_joe_> helmfile sync -i
[13:22:18] <_joe_> I think I can make that work
[13:22:57] _joe_: I seem to have lost part of the buffer backlog, can you repaste?
[13:23:43] <_joe_> so say you want to release mathoid right now, you need to
[13:23:52] fwiw my proposal was to do cd services/mathoid ; helmfile sync -I
[13:24:19] <_joe_> yeah I can't get that to work without setting the env vars
[13:24:21] s/proposal/grand plan/
[13:24:37] <_joe_> and helm-diff still doesn't support --kubeconfig
[13:24:46] is that the only blocker?
[13:24:55] <_joe_> they do have the flag now, but it does nothing
[13:25:02] <_joe_> so I don't think it works?
[13:25:08] what?
[13:25:42] <_joe_> https://github.com/databus23/helm-diff/blob/55d757254771bb0eb0a4636feb7830b3af9e4f72/cmd/upgrade.go#L95
[13:26:09] This flag is ignored, to allow passing of this top level flag to helm
[13:26:15] cool, that's exactly what we want it to do
[13:26:20] <_joe_> ok just that?
[13:26:22] <_joe_> ok then
[13:26:31] <_joe_> so we can declare kubeconfig in the helmfile
[13:27:13] <_joe_> ok, let's start by having our official golang packaging expert work on that
[13:27:24] * _joe_ look at jayme
[13:27:31] jayme: run for your life ! :P
[13:27:53] * jayme hides
[13:27:55] <_joe_> akosiaris: hey, we've done our tours in the meat grinder, now it's their turn
[13:28:15] I am still doing mine... damn otrs
[13:36:58] I'm not sure if I got the problem right. They now ignore the flag so we should no longer get errors when passing it in via helmfile, right?
[13:37:28] So you're just asking for an updated helm-diff package?
[13:45:20] <_joe_> yes
[13:45:33] <_joe_> apparently it will work as intended :)
[13:46:10] <_joe_> akosiaris: so we will need to have environments in helmfile, rather than separate helmfiles
[13:46:17] <_joe_> one env per cluster I guess?
[13:48:44] the releases I 'd say
[13:48:47] not even environments
[13:49:21] at least in the beginning. Maybe environments make sense with multiple releases per cluster ?
[13:49:53] the diff being in the calling out to them
[13:50:13] helmfile sync (for all releases in the helmfile) vs helmfile -e sync for a specific env (and its releases)
[13:50:36] and I am not sure helmfile -e staging sync ; helmfile -e codfw sync ; etc is much better
[15:14:07] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Investigate why mobileapps in k8s "/{domain}/v1/data/css/mobile/base" endpoint takes way longer than on scb to complete - https://phabricator.wikimedia.org/T258186 (Mhollow...
[15:14:10] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (Mholloway)
[15:14:49] serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Investigate why mobileapps in k8s "/{domain}/v1/data/css/mobile/site" endpoint takes way longer than on scb to complete - https://phabricator.wikimedia.org/T258186 (Mhollow...
[16:08:55] Re deploying, I did make a deployment script so people don't have to change directories, etc. Is there a task I can watch so I can update the script appropriately when changes are made _joe_ ?
[16:10:05] <_joe_> longma: your script is deploy.sh in the deployment-charts repo, correct longma?
[16:10:22] <_joe_> I plan on keeping it working through the transition
[16:10:30] akosiaris: as for kask, yes on the service checker, just waiting for your final review :) I didn't notice kask was actually deployed to kubernetes though
[16:10:51] Okay, thanks _joe_ !
[16:11:56] <_joe_> longma: do you know how many devs use it vs using helmfile directly, btw?
[16:13:44] I don't know if anyone uses it except me lol :'( I guess I should add it to the documentation
[16:16:45] longma: change reviewed btw. Btw, kask powers sessionstore and echostore (https://grafana.wikimedia.org/d/000000519/kubernetes-overview?panelId=9&fullscreen&orgId=1)
[16:21:43] Oh thanks!
[19:20:24] btw is there a new way to publish charts now?
[20:08:33] serviceops, Operations, Performance-Team, Patch-For-Review, Sustainability (Incident Prevention): Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (Krinkle)
[20:11:16] serviceops, Operations, Performance-Team, Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) a:aaron→Krinkle Todo for me: Update docs on .
[22:06:00] serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (jijiki) I am not very confident there is a "right" value, given that it will depend on the circumstances every time the gutter pool kicks in. Since mcrouter flapping is part of the equation, my opinion is to c...
[22:06:12] serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (jijiki) p:Triage→Medium
[22:47:10] serviceops, Patch-For-Review: Replace rdb200[34] with rdb200[78] - https://phabricator.wikimedia.org/T255250 (jijiki) rdb2003 is primary for ores2* servers and changeprop/changeprop-jobqueue.
* changeprop/changeprop-jobqueue is generally ok with briefly losing its redis connectivity/data (cc @Pchelolo)...