[08:42:06] <wikibugs>	 serviceops, Prod-Kubernetes, observability, Kubernetes: Increase visibility of container/pod ressource exhaustion - https://phabricator.wikimedia.org/T266216 (JMeybohm) p:Triage→Medium
[08:42:45] <wikibugs>	 serviceops, Kubernetes, User-jijiki: Deploy kube-state-metrics - https://phabricator.wikimedia.org/T264625 (JMeybohm)
[08:42:47] <wikibugs>	 serviceops, Prod-Kubernetes, observability, Kubernetes: Increase visibility of container/pod ressource exhaustion - https://phabricator.wikimedia.org/T266216 (JMeybohm)
[08:42:49] <wikibugs>	 serviceops, ChangeProp, Kubernetes, Sustainability (Incident Followup): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (JMeybohm)
[08:45:03] <wikibugs>	 serviceops: Upgrade kubernetes nodes to kernel 4.19.x - https://phabricator.wikimedia.org/T255273 (JMeybohm)
[08:45:07] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes, User-jijiki: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527 (JMeybohm)
[08:46:42] <wikibugs>	 serviceops, Wikifeeds, Kubernetes: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation - https://phabricator.wikimedia.org/T266194 (JMeybohm) a:JMeybohm I think you are right, thanks for the heads up!  While this probably is also an issue of "too much throttling" (T262527),...
[10:05:13] <wikibugs>	 serviceops, Wikifeeds, Kubernetes, Patch-For-Review: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation - https://phabricator.wikimedia.org/T266194 (akosiaris) That's pretty interesting, there shouldn't be so much throttling at so low CPU usage. user+system summed barely h...
[10:11:16] <wikibugs>	 serviceops, Wikifeeds, Kubernetes, Patch-For-Review: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation - https://phabricator.wikimedia.org/T266194 (JMeybohm) >>! In T266194#6570986, @akosiaris wrote: > That's pretty interesting, there shouldn't be so much throttling at so...
[12:57:25] <cdanis>	 akosiaris: jayme: what's the eventual plan (if any) for CI validation of helm charts and values.yaml files? :)
[13:01:05] <akosiaris>	 cdanis: they are being validated right now. Both helm charts AND deployments
[13:01:23] <akosiaris>	 as in validated they are valid per the kubernetes schemas
[13:01:36] <akosiaris>	 not just valid yaml. Is there more you would like to see ?
[13:01:46] <cdanis>	 akosiaris: I was asking about the 'ressources' typo :)
[13:02:03] <cdanis>	 I understand that values.yaml can be freeform in some parts, but not all parts, right?
[13:02:27] <akosiaris>	 ah, yeah that one is interesting. That's an invalid key but valid yaml. So it does not override anything and would end up being a no-op
[13:02:59] <cdanis>	 right, but I feel like that's something that we can/should protect against -- admin module data.yaml for instance was similar, but we added validation against a schema there
[13:03:51] <akosiaris>	 note btw, that if the typo was not in ressources, but in limits (e.g. lamits), CI would have caught it
[13:04:19] <akosiaris>	 cause it wouldn't generate an invalid manifest for kubernetes
[13:04:22] <akosiaris>	 it would*
[13:04:50] <jayme>	 Yeah, I just made a pretty good typo :-P
[13:06:30] <akosiaris>	 not sure how we could have caught that tbh. Maybe diff against a manifest without the patch? And say it's noop? but that wouldn't catch the typo, it would just tell you your change doesn't really do anything
[13:07:17] <jayme>	 And that's probably something that we want to be able to do (noop changes)
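[editor's sketch] The no-op check floated above (render the manifest with and without the patch and compare) might look roughly like this. It is a minimal sketch, not the CI that exists: `render_manifest` is a hypothetical stand-in for templating the chart, `deep_merge` and `is_noop` are illustrative helpers, and the values shown are made up. As noted in the discussion, this only reports that a change does nothing; it does not point at the typo, and intentional no-op changes would also trip it.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` recursively merged onto `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_manifest(values):
    """Hypothetical stand-in for chart templating.

    Like a real template, it only reads the keys it knows about,
    so a misspelled top-level key is silently ignored.
    """
    limits = values.get("resources", {}).get("limits", {})
    return {"containers": [{"name": "app", "resources": {"limits": dict(limits)}}]}


def is_noop(defaults, old_overrides, new_overrides):
    """True if the rendered manifest is identical before and after the patch."""
    before = render_manifest(deep_merge(defaults, old_overrides))
    after = render_manifest(deep_merge(defaults, new_overrides))
    return before == after


# Illustrative values only (assumed, not taken from any real chart).
defaults = {"resources": {"limits": {"cpu": "1", "memory": "200Mi"}}}
typo_patch = {"ressources": {"limits": {"cpu": "2"}}}
real_patch = {"resources": {"limits": {"cpu": "2"}}}

print(is_noop(defaults, {}, typo_patch))  # True: the misspelled key changes nothing
print(is_noop(defaults, {}, real_patch))  # False: the limit actually changes
```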
[13:15:07] <akosiaris>	 it's kind of a weird corner case. If the input was resulting in invalid kubernetes yaml we would have caught it, but it doesn't cause it's a noop. But there isn't really a schema for values.yaml files as they can have arbitrary data in them
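[editor's sketch] One way to catch a misspelled key without a full schema is to compare the keys of an override file against the chart's default values and flag anything the defaults do not define. This is a sketch under assumptions: `unknown_keys` and the `freeform` allowlist are hypothetical names, the dicts stand in for parsed YAML, and it presumes the chart's defaults list every key a deployment may legitimately set. Sections that hold arbitrary data would have to be listed in `freeform` so they are not descended into.

```python
def unknown_keys(defaults, overrides, freeform=frozenset(), path=""):
    """Return dotted paths that appear in `overrides` but not in `defaults`.

    Paths listed in `freeform` are treated as arbitrary data and not checked.
    """
    unknown = []
    for key, value in overrides.items():
        dotted = f"{path}.{key}" if path else key
        if key not in defaults:
            unknown.append(dotted)
        elif dotted in freeform:
            continue  # free-form section: anything goes below this point
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            unknown.extend(unknown_keys(defaults[key], value, freeform, dotted))
    return unknown


# Illustrative values only (assumed, not taken from any real chart).
chart_defaults = {
    "resources": {"limits": {"cpu": "1", "memory": "200Mi"}},
    "config": {},  # free-form application config
}
deployment_values = {
    "ressources": {"limits": {"cpu": "2"}},  # the typo from the discussion
    "config": {"anything": 1},
}

print(unknown_keys(chart_defaults, deployment_values, freeform={"config"}))
# ['ressources']
```

The trade-off is the one raised in the conversation: the stricter the check, the more the free-form parts of values.yaml need to be declared explicitly.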
[13:27:09] <wkandek>	 akosiaris, jayme, cdanis: not directly related, but I really enjoyed this talk on Senpai, a system that "starves" memory and tries to find out the minimal memory footprint: https://www.youtube.com/watch?v=ujk2pfgPul8
[13:28:08] <wkandek>	 Watched it because it came indirectly through a candidate for our open position. Indirectly: I watched the candidate's presentation, this one was next and sounded pretty interesting.