[07:03:17] <fsero>	 thcipriani: is helmfile command not helm
[07:03:32] <fsero>	 While helmfile uses helm underneath
[07:04:05] <fsero>	 And what do you missed is the source part
[07:07:35] <fsero>	 thcipriani: https://wikitech.wikimedia.org/wiki/Migrating_from_scap-helm
[09:13:47] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10User-fsero: set up limitranges and resourcequotas to protect the cluster from resource abuse and starvation - https://phabricator.wikimedia.org/T228965 (10fsero)
[09:15:21] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10User-fsero: Set up PodSecurityPolicies in clusters - https://phabricator.wikimedia.org/T228967 (10fsero)
[09:24:33] <wikibugs>	 10serviceops, 10Operations, 10ops-codfw: (OoW) restbase2009 lockup - https://phabricator.wikimedia.org/T227408 (10jijiki) @Eevans I was under the impression we have more work to be done on the server. Shall we mark this task as resolved?
[09:24:34] <fsero>	 mutante: is part of the helm index operation and its expected
[09:25:51] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10User-fsero: recreate codfw cluster state from code stored in deployment-charts with helmfile [MIGHT CAUSE DOWNTIME] - https://phabricator.wikimedia.org/T228837 (10fsero) p:05Triage→03High
[09:26:00] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10User-fsero: recreate eqiad cluster state from code stored in deployment-charts with helmfile [MIGHT CAUSE DOWNTIME] - https://phabricator.wikimedia.org/T228836 (10fsero) p:05Triage→03High
[09:26:05] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Patch-For-Review, 10User-fsero: Set up PodSecurityPolicies in clusters - https://phabricator.wikimedia.org/T228967 (10fsero) p:05Triage→03Normal
[09:26:11] <wikibugs>	 10serviceops, 10Prod-Kubernetes, 10Patch-For-Review, 10User-fsero: set up limitranges and resourcequotas to protect the cluster from resource abuse and starvation - https://phabricator.wikimedia.org/T228965 (10fsero) p:05Triage→03Normal
[09:26:43] <wikibugs>	 10serviceops, 10Analytics, 10EventBus, 10Patch-For-Review: helmfile apply with values.yaml file change did not deploy new k8s pods - https://phabricator.wikimedia.org/T228700 (10fsero) 05Open→03Resolved a:03fsero
[09:27:15] <elukey>	 _joe_ it would be interesting to test https://blog.box.com/introducing-memsniff-robust-memcache-traffic-analyzer
[09:27:18] <elukey>	 to replace memkeys
[09:27:41] <_joe_>	 heh, sure
[09:27:56] <elukey>	 I opened a pull request to what I think is memkeys current upstream but I didn't get any answer https://github.com/bmatheny/memkeys/issues/25
[09:28:06] <elukey>	 (it segfaults on stretch+ sigh)
[09:29:16] <wikibugs>	 10serviceops, 10Operations, 10Core Platform Team Legacy (Watching / External), 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki)
[09:32:29] <fsero>	 that other project doesnt seem very active as well elukey 
[09:32:38] <fsero>	 but 🤷
[09:32:59] <wikibugs>	 10serviceops, 10Operations, 10Core Platform Team Legacy (Watching / External), 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki) All async jobs run on PHP7, we will keep an eye for about a week, and then c...
[09:34:30] <wikibugs>	 10serviceops, 10Operations, 10observability, 10User-Elukey: Test memsniff as possible replacement of memkeys - https://phabricator.wikimedia.org/T228970 (10elukey) p:05Triage→03Normal
[09:36:42] <elukey>	 fsero: ah just noticed no response from upstream in any issue opened
[09:36:46] <elukey>	 sigh
[09:39:22] <elukey>	 I don't find any alternative, just also asked in #memcached
[09:42:48] <fsero>	 if you want help packaging it
[09:42:51] <fsero>	 i can help you
[09:42:56] <fsero>	 im the local golang packaging expert
[09:43:06] <fsero>	 apparently
[09:44:14] <elukey>	 ahahhahah
[09:44:25] <elukey>	 that would be great thanks :)
[09:44:32] <elukey>	 speaking of memcached! 
[09:44:34] <elukey>	 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/525224/
[09:44:58] <elukey>	 this is a bold move to enable async replication for all mw mcrouters
[09:45:26] <elukey>	 given how mediawiki works with memcached it seems reasonable for the moment, and we could revise it in the future if needed
[09:45:51] <jijiki>	 sounds good
[09:45:58] <elukey>	 but, the agreement was that I would have reached the canaries and then ask for opinions before proceeding further :)
[09:46:17] <jijiki>	 if we don't see anything funno for now 
[09:46:21] <jijiki>	 funny*
[09:46:59] <jijiki>	 do you think there is anything else we should keep and eye on/check out?
[09:47:09] <wikibugs>	 10serviceops, 10Operations, 10Wikimedia-General-or-Unknown, 10Performance-Team (Radar), 10User-Elukey: Deprecate the usage of nutcracker for memcached - https://phabricator.wikimedia.org/T214275 (10elukey) 05Open→03Resolved a:03elukey
[09:49:01] <elukey>	 jijiki: in theory no from my knowledge
[09:49:56] <elukey>	 the only pros would be to find a metric that could tell us if a latency improvement happened or not
[09:53:02] <jijiki>	 hmm
[09:54:57] <elukey>	 the per-shard latency metrics from mcrouter are not helping, since I believe that mcrouter immediately returns not_stored or similar but registers the latency of the command anyway
[09:55:33] <elukey>	 so it is probably on the mediawiki side that we'll see improvements 
[09:58:45] <jijiki>	 we'll see, we'll figure it out 
[10:24:37] <wikibugs>	 10serviceops, 10Parsoid-PHP, 10Core Platform Team (Parsoid REST API in PHP  (CDP2)): Deploy Parsoid-PHP with Mediawiki to scandium for RT and performance testing - https://phabricator.wikimedia.org/T228069 (10Joe) ok this sounds reasonable.  @Mutante I think we need to do what follows:  [] make the HHVM inst...
[10:27:05] <wikibugs>	 10serviceops, 10Parsoid-PHP, 10Core Platform Team (Parsoid REST API in PHP  (CDP2)): Allow to avoid installing HHVM from the mediawiki puppet module and profile - https://phabricator.wikimedia.org/T228976 (10Joe)
[13:45:25] <wikibugs>	 10serviceops, 10Parsoid-PHP, 10Core Platform Team (Parsoid REST API in PHP  (CDP2)): Allow to avoid installing HHVM from the mediawiki puppet module and profile - https://phabricator.wikimedia.org/T228976 (10Joe) p:05Triage→03High
[14:35:33] <mutante2>	 hello, it's me. i have an issue with the laptop again
[14:35:51] <mutante2>	 cant charge it. usb-c ports physical issue
[14:36:21] <mutante2>	 i was on my last percent of battery starting to send a mail to you guys about it ..and shut down on me
[14:37:28] <mutante2>	 local repair places would take days and then order the parts . so i just have to go to SF to the office asking for a loaner. dont have any wmf credentials on phone
[14:38:21] <mutante2>	 apergos: ^ will miss the meeting because of this. can you forward that
[14:38:31] <apergos>	 I will relay it
[14:38:35] <mutante2>	 will be on my way to SF
[14:38:40] <apergos>	 ok, good luck!
[14:38:46] <mutante2>	 thanks Apergos
[14:38:51] <apergos>	 yw!
[14:54:41] <fsero>	 tarrow: thcipriani and akosiaris this affects several deployments https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/525558
[14:55:04] <fsero>	 tarrow:  and thcipriani could you at least +1 the ones affecting your services blubberoid and termbox
[14:56:38] <fsero>	 _joe_: akosiaris this end my thursday stim of k8s, there is limitranges and resource quotas in the staging cluster ( prevents resource abuse and starvation), there are now podsecuritypolicies in place (no limits in things running in kube-system, and no privileges in the services part) and now users like urandom can do kubectl get events and see things if needed amongst other things.
[14:57:02] <akosiaris>	 cool
[14:57:04] <fsero>	 i'd say to leave the cluster in this state for a few days and if nothing goes wrong we can move this into production
[14:57:13] <akosiaris>	 I was about to say
[14:57:15] <_joe_>	 that's great
[14:57:15] <akosiaris>	 ^ 
[14:57:26] <urandom>	 fsero: \o/
[14:59:03] <fsero>	 tarrow: one byproduct of that CR is the termbox staging pod is not being launched
[14:59:08] <thcipriani>	 fsero: sure. One question: possible to change this once at the chart level and remove this from the individual cluster helmfiles?
[15:00:19] <tarrow>	 Just in some meeting; will look in a mo
[15:00:35] <fsero>	 thcipriani: not sure im following, but dont think so, one of the things we want to gain from this is observability from current cluster state
[15:00:49] <fsero>	 and regarding resources and configs could differ between clusters
[15:01:01] <fsero>	 think for instance in staging a new release requires more cpu ram et al
[15:01:06] <fsero>	 or other configs
[15:01:23] <fsero>	 i know it seems a lot of duplication (it is) but i dont have a better way
[15:02:32] <thcipriani>	 are values that aren't present in the individual helmfile.d values.yaml inherited from the stable chart?
[15:05:04] <thcipriani>	 that is, is https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/master/charts/blubberoid/values.yaml replaced by the values.yaml in helmfile.d/<env>/blubber/values.yaml or is it selectively overridden?
[15:05:25] <thcipriani>	 it seems like, from your patch, it is replaced?
[15:06:35] <fsero>	 it is overriden
[15:10:22] <tarrow>	 fsero: no objection to bumping it. I'm not sure I actually have the knowledge to know what I'm looking at though
[15:10:47] <tarrow>	 you say that if it is merged termbox staging will no longer launch
[15:11:35] <fsero>	 it will be fixed after merge
[15:12:06] <fsero>	 i deployed limitranges which control the resources used in any container
[15:12:26] <fsero>	 i've set the minimum request cpu to 100m
[15:13:00] <tarrow>	 ok!
[16:19:17] <wikibugs>	 10serviceops, 10Parsoid-PHP, 10Core Platform Team (Parsoid REST API in PHP  (CDP2)): Deploy Parsoid-PHP with Mediawiki to scandium for RT and performance testing - https://phabricator.wikimedia.org/T228069 (10ssastry)
[17:12:34] <wikibugs>	 10serviceops, 10Continuous-Integration-Config, 10Epic, 10Release-Engineering-Team (Pipeline), 10Release-Engineering-Team-TODO (201907): Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (10greg)
[17:52:35] <ottomata>	 fsero:  q: are helmfile.d values merged with chart values?
[17:52:46] <ottomata>	 what happens if there is a field defined in both?
[17:53:02] <ottomata>	 (i want to override a config value that is set in chart values in helmfile values)
[17:53:06] <fsero>	 values defined in helmfile takes precedence
[17:53:11] <ottomata>	 great.
[17:53:32] <fsero>	 im not totally on keyboard ottomata  so i might answer with great latency :P
[17:55:04] <ottomata>	 :)
[17:55:07] <ottomata>	 thank you anyway!
[19:15:27] <wikibugs>	 10serviceops, 10Analytics, 10EventBus: Allow eventgate-analytics service to reach schema.svc.{eqiad,codfw}.wmnet:8190 - https://phabricator.wikimedia.org/T229051 (10Ottomata)
[20:30:09] <thcipriani>	 > Error creating: pods "blubberoid-blubber-thcipriani-8567b55bc5-jlttg" is forbidden: [minimum cpu usage per Container is 100m, but request is 1m.,
[20:30:39] <thcipriani>	 oh...nevermind...I already see my answer in scrollback " i deployed limitranges which control the resources used in any container"
[20:30:54] <thcipriani>	 broke helm test for blubberoid
[20:30:56] * thcipriani fixes.
[23:00:40] <wikibugs>	 10serviceops, 10DBA: phased rollout of dbctl, etcd-backed database configuration in Mediawiki - https://phabricator.wikimedia.org/T229070 (10CDanis)
[23:01:00] <wikibugs>	 10serviceops, 10DBA: phased rollout of dbctl, etcd-backed database configuration in Mediawiki - https://phabricator.wikimedia.org/T229070 (10CDanis)
[23:01:45] <wikibugs>	 10serviceops, 10DBA: phased rollout of dbctl, etcd-backed database configuration in Mediawiki - https://phabricator.wikimedia.org/T229070 (10CDanis)
[23:03:21] <wikibugs>	 10serviceops, 10Operations, 10Patch-For-Review, 10Release-Engineering-Team-TODO (201907), 10Wikimedia-Incident: docker-registry: some layers has been corrupted due to deleting other swift containers - https://phabricator.wikimedia.org/T228196 (10greg) What are the next steps with this incident task?  The...
[23:03:25] <wikibugs>	 10serviceops, 10DBA, 10Performance-Team (Radar): phased rollout of dbctl, etcd-backed database configuration in Mediawiki - https://phabricator.wikimedia.org/T229070 (10Krinkle)
[23:59:26] <wikibugs>	 10serviceops, 10Release Pipeline: Staging k8s ci namespace limitranges - https://phabricator.wikimedia.org/T229073 (10thcipriani)