[07:05:49] <wikibugs>	 serviceops, Operations, Parsoid, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (Joe) a:Dzahn Assigning to Daniel as he's actively working on this.
[07:08:20] <wikibugs>	 serviceops, Operations, Prod-Kubernetes, Kubernetes, Patch-For-Review: Move proton to use TLS only - https://phabricator.wikimedia.org/T255877 (Joe) a:JMeybohm
[09:49:11] <_joe_>	 akosiaris, effie janis and I were debating on a UX question
[09:49:46] <_joe_>	 I've written a kube_env bash function that allows us to declare all the kubernetes-related env variables for a specific cluster and service
[09:49:55] <_joe_>	 now we were debating if we prefer it to be
[09:50:17] <_joe_>	 kube_env NS CLUSTER or kube_env CLUSTER NS
[09:50:21] <_joe_>	 so for instance
[09:50:33] <_joe_>	 kube_env blubberoid eqiad 
[09:50:35] <_joe_>	 or
[09:50:44] <_joe_>	 kube_env eqiad blubberoid
[09:51:02] <_joe_>	 what seems more natural to you?
[09:55:24] <akosiaris>	 are you trolling me or what?
[09:56:03] <akosiaris>	 if you are serious first service then cluster. If you are not, it's a pretty good bait
[09:56:18] <_joe_>	 akosiaris: it was a serious question :P
[09:56:49] <akosiaris>	 cool, service then cluster
[09:56:54] <_joe_>	 we had this discussion with jayme 
[09:56:57] <akosiaris>	 it's what we are anyway going to go for in helmfile
[09:57:12] <akosiaris>	 the contrary of what we currently have 
[09:57:19] <_joe_>	 for the record, that's how I did it
[09:57:33] <akosiaris>	 which is mind bending to me (and I assume other people as well) every single time
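A minimal sketch of what a helper with the agreed `kube_env SERVICE CLUSTER` argument order might look like. This is not the actual function from the puppet change: the kubeconfig path layout, the `KUBE_CONFIG_DIR` override, and the `KUBE_ENV_SERVICE` / `KUBE_ENV_CLUSTER` variables are assumptions made for illustration. Only `KUBECONFIG` is a variable that kubectl and helm are known to read.

```shell
# Hypothetical sketch of a kube_env-style helper, service first, then cluster.
kube_env() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: kube_env SERVICE CLUSTER" >&2
        return 1
    fi
    local service
    local cluster
    service="$1"
    cluster="$2"
    # Assumed layout: one kubeconfig file per service and cluster.
    export KUBECONFIG="${KUBE_CONFIG_DIR:-/etc/kubernetes}/${service}-${cluster}.config"
    # Hypothetical bookkeeping variables, e.g. for a prompt helper to display.
    export KUBE_ENV_SERVICE="$service"
    export KUBE_ENV_CLUSTER="$cluster"
}

# Usage, matching the order preferred in the discussion:
kube_env blubberoid eqiad
echo "$KUBECONFIG"
```

Because the function must change the caller's environment, it has to be sourced into the interactive shell rather than run as a separate script.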
[10:19:12] <hnowlan>	 hello! do ye know much about the integration of service-checker into kubernetes? I see a test in the scaffold but I'm not sure whether/where it gets called. 
[10:27:54] <_joe_>	 I think the idea was to use it as a readiness probe at some point? but maybe akosiaris remembers more details off of the top of his head
[10:29:57] <akosiaris>	 hnowlan: it is being called IIRC during integration tests by CI. That gets triggered if there is a helm.yaml file in .pipeline
[10:30:21] <akosiaris>	 few services currently have it IIRC, e.g. mathoid. I think kask is going to use that soon though, cc longma
[10:34:01] <_joe_>	 jayme: so I'm going to merge the kube_env change but not the one converting .hfenv
[10:34:18] <_joe_>	 we will still have time to change it back in case
[10:36:12] <akosiaris>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/613190/4 and next one, right ?
[10:36:14] * akosiaris reviewing as well
[10:38:58] <akosiaris>	 +1ed both
[10:39:41] <_joe_>	 yep
[10:49:03] <hnowlan>	 akosiaris: ah cool, thanks! 
[11:05:08] <_joe_>	 so adding the __kube_env_ps1 to my prompt I get this https://dpaste.org/v6do
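The paste above is not reproduced in the log, so the following is only a guess at the general shape of such a prompt helper. The function is a hypothetical stand-in for `__kube_env_ps1`, and the `KUBE_ENV_SERVICE` / `KUBE_ENV_CLUSTER` variable names and the `(service@cluster)` output format are assumptions, not taken from the real implementation.

```shell
# Hypothetical prompt helper: prints "(service@cluster) " when both
# variables are set, and nothing otherwise.
__kube_env_ps1() {
    if [ -n "${KUBE_ENV_SERVICE:-}" ] && [ -n "${KUBE_ENV_CLUSTER:-}" ]; then
        printf '(%s@%s) ' "$KUBE_ENV_SERVICE" "$KUBE_ENV_CLUSTER"
    fi
}

# Example wiring for bash; single quotes defer the call to prompt time,
# so the prompt follows later changes to the variables.
PS1='$(__kube_env_ps1)\u@\h:\w\$ '
```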
[11:06:13] <jayme>	 _joe_: Okay. I'm fine with either way. Stick with what the majority thinks is natural ;-)
[13:04:23] <akosiaris>	 jayme: and notes_url fixed as well https://puppet-compiler.wmflabs.org/compiler1002/23999/icinga1001.wikimedia.org/fulldiff.html
[13:04:52] <jayme>	 akosiaris: yeah, saw your change. Nice!
[13:05:18] <_joe_>	 uhm so I'm taking a look at helmfile, and I'm kinda convinced we're doing it all wrong
[13:05:59] <_joe_>	 I'm also unconvinced that there is a right way to do it
[13:06:42] <_joe_>	 one q: do you think it's acceptable to require people to change how they deploy now?
[13:07:37] <_joe_>	 like moving from "helmfile sync -i" into a specific dir
[13:08:21] <_joe_>	 to defining what they want to operate on, then running helmfile sync, all from the root of the service hierarchy?
[13:15:47] <wikibugs>	 serviceops, DBA, OTRS, Operations: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (akosiaris) Many thanks!
[13:20:13] <akosiaris>	 _joe_: what do you mean? 
[13:20:28] <_joe_>	 so, right now you do something
[13:20:30] <_joe_>	 like
[13:20:36] <akosiaris>	 it's not gonna be easy to separate services between them without some structure 
[13:20:45] <_joe_>	 cd services/codfw/mathoid
[13:20:55] <_joe_>	 source .hfenv
[13:21:01] <_joe_>	 helmfile sync -i
[13:21:16] <_joe_>	 my proposal was, possibly, to do something like
[13:21:21] <_joe_>	 cd services
[13:21:38] <_joe_>	 kube_env mathoid codfw
[13:22:00] <_joe_>	 helmfile sync -i
[13:22:18] <_joe_>	 I think I can make that work
[13:22:57] <akosiaris>	 _joe_: I seem to have lost part of the buffer backlog, can you repaste?
[13:23:43] <_joe_>	 so say you want to release mathoid right now, you need to
[13:23:52] <akosiaris>	 fwiw my proposal was to do cd services/mathoid ; helmfile sync -I
[13:24:19] <_joe_>	 yeah I can't get that to work without setting the env vars
[13:24:21] <akosiaris>	 s/proposal/grand plan/
[13:24:37] <_joe_>	 and helm-diff still doesn't support --kubeconfig
[13:24:46] <akosiaris>	 is that the only blocker?
[13:24:55] <_joe_>	 they do have the flag now, but it does nothing
[13:25:02] <_joe_>	 so I don't think it works?
[13:25:08] <akosiaris>	 what?
[13:25:42] <_joe_>	 https://github.com/databus23/helm-diff/blob/55d757254771bb0eb0a4636feb7830b3af9e4f72/cmd/upgrade.go#L95
[13:26:09] <akosiaris>	 This flag is ignored, to allow passing of this top level flag to helm
[13:26:15] <akosiaris>	 cool, that's exactly what we want it to do
[13:26:20] <_joe_>	 ok just that?
[13:26:22] <_joe_>	 ok then
[13:26:31] <_joe_>	 so we can declare kubeconfig in the helmfile
[13:27:13] <_joe_>	 ok, let's start by having our official golang packaging expert work on that
[13:27:24] * _joe_ looks at jayme
[13:27:31] <akosiaris>	 jayme: run for your life ! :P
[13:27:53] * jayme hides
[13:27:55] <_joe_>	 akosiaris: hey, we've done our tours in the meat grinder, now it's their turn
[13:28:15] <akosiaris>	 I am still doing mine... damn otrs
[13:36:58] <jayme>	 I'm not sure if I got the problem right. They now ignore the flag so we should no longer get errors when passing it in via helmfile, right?
[13:37:28] <jayme>	 So you're just asking for an updated helm-diff package?
[13:45:20] <_joe_>	 yes
[13:45:33] <_joe_>	 apparently it will work as intended :)
[13:46:10] <_joe_>	 akosiaris: so we will need to have environments in helmfile, rather than separate helmfiles
[13:46:17] <_joe_>	 one env per cluster I guess?
[13:48:44] <akosiaris>	 the releases I'd say
[13:48:47] <akosiaris>	 not even environments
[13:49:21] <akosiaris>	 at least in the beginning. Maybe environments make sense with multiple releases per cluster ?
[13:49:53] <akosiaris>	 the diff being in the calling out to them
[13:50:13] <akosiaris>	 helmfile sync (for all releases in the helmfile) vs helmfile -e <env> sync for a specific env (and its releases)
[13:50:36] <akosiaris>	 and I am not sure helmfile -e staging sync ; helmfile -e codfw sync ; etc is much better 
[15:14:07] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Investigate why mobileapps in k8s "/{domain}/v1/data/css/mobile/base" endpoint takes way longer than on scb to complete - https://phabricator.wikimedia.org/T258186 (Mhollow...
[15:14:10] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Migrate mobileapps to k8s and node 10 - https://phabricator.wikimedia.org/T218733 (Mholloway)
[15:14:49] <wikibugs>	 serviceops, Mobile-Content-Service, Page Content Service, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Investigate why mobileapps in k8s "/{domain}/v1/data/css/mobile/site" endpoint takes way longer than on scb to complete - https://phabricator.wikimedia.org/T258186 (Mhollow...
[16:08:55] <longma>	 Re deploying, I did make a deployment script so people don't have to change directories, etc. Is there a task I can watch so I can update the script appropriately when changes are made _joe_ ? 
[16:10:05] <_joe_>	 longma: your script is deploy.sh in the deployment-charts repo, correct longma?
[16:10:22] <_joe_>	 I plan on keeping it working through the transition
[16:10:30] <longma>	 akosiaris: as for kask, yes on the service checker, just waiting for your final review :) I didn't notice kask was actually deployed to kubernetes though
[16:10:51] <longma>	 Okay, thanks _joe_ ! 
[16:11:56] <_joe_>	 longma: do you know how many devs use it vs using helmfile directly, btw?
[16:13:44] <longma>	 I don't know if anyone uses it except me lol :'( I guess I should add it to the documentation
[16:16:45] <akosiaris>	 longma: change reviewed btw. Btw, kask powers sessionstore and echostore (https://grafana.wikimedia.org/d/000000519/kubernetes-overview?panelId=9&fullscreen&orgId=1)
[16:21:43] <longma>	 Oh thanks! 
[19:20:24] <longma>	 btw is there a new way to publish charts now?
[20:08:33] <wikibugs>	 serviceops, Operations, Performance-Team, Patch-For-Review, Sustainability (Incident Prevention): Reduce read pressure on memcached servers by adding a machine-local Memcache instance - https://phabricator.wikimedia.org/T244340 (Krinkle)
[20:11:16] <wikibugs>	 serviceops, Operations, Performance-Team, Sustainability (Incident Prevention): Test gutter pool failover in production and memcached 1.5.x - https://phabricator.wikimedia.org/T240684 (Krinkle) a:aaron→Krinkle Todo for me: Update docs on <https://wikitech.wikimedia.org/wiki/Memcached>.
[22:06:00] <wikibugs>	 serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (jijiki) I am not very confident there is a "right" value, given that it will depend on the circumstances every time the gutter pool kicks in. Since mcrouter flapping is part of the equation, my opinion is to c...
[22:06:12] <wikibugs>	 serviceops: mcrouter memcached flapping in gutter pool - https://phabricator.wikimedia.org/T255511 (jijiki) p:Triage→Medium
[22:47:10] <wikibugs>	 serviceops, Patch-For-Review: Replace rdb200[34] with rdb200[78] - https://phabricator.wikimedia.org/T255250 (jijiki) rdb2003 is primary for ores2* servers and changeprop/changeprop-jobqueue.   * changeprop/changeprop-jobqueue is generally ok with briefly losing its redis connectivity/data (cc @Pchelolo)...