[10:09:09] serviceops, affects-Kiwix-and-openZIM: ETAG response headers not always with double-quotes - https://phabricator.wikimedia.org/T256217 (Kelson)
[10:20:05] serviceops, Operations, Traffic, affects-Kiwix-and-openZIM: ETAG response headers not always with double-quotes - https://phabricator.wikimedia.org/T256217 (ema) p:Triage→Medium
[11:16:50] serviceops, Operations: Remaining nginx packages on some mw servers - https://phabricator.wikimedia.org/T255565 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff This is complete
[11:17:04] serviceops, Operations, SRE-swift-storage, Patch-For-Review: Access to the thanos-swift cluster for ChartMuseum - https://phabricator.wikimedia.org/T256020 (JMeybohm) Commit in private is `e427c266f2d6ac0a937bf5d972b759933a9f9a18`
[11:57:54] serviceops, Operations, Sustainability (Incident Prevention): Increase capacity of the sessionstore dedicated kubernetes nodes - https://phabricator.wikimedia.org/T256236 (akosiaris)
[11:58:01] serviceops, Operations, Sustainability (Incident Prevention): Increase capacity of the sessionstore dedicated kubernetes nodes - https://phabricator.wikimedia.org/T256236 (akosiaris) p:Triage→High
[12:02:08] serviceops, Operations, Sustainability (Incident Prevention): Increase capacity of the sessionstore dedicated kubernetes nodes - https://phabricator.wikimedia.org/T256236 (akosiaris) Currently, sessionstore sets a limit of 400Mi and 2.5 CPUs[1]. Memory wise, the nodes have 4GB RAM and 6 CPUs. The eas...
[13:24:44] serviceops, Operations, Patch-For-Review, Sustainability (Incident Prevention): Increase capacity of the sessionstore dedicated kubernetes nodes - https://phabricator.wikimedia.org/T256236 (akosiaris)
[13:35:58] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Investigate the iowait issues plaguing kubernetes nodes since 2020-05-29 - https://phabricator.wikimedia.org/T255975 (akosiaris) >>! In T255975#6247460, @JMeybohm wrote: > Thanks for writing this up @akosiaris! I think it w...
[13:38:39] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Investigate the iowait issues plaguing kubernetes nodes since 2020-05-29 - https://phabricator.wikimedia.org/T255975 (akosiaris)
[13:41:11] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (akosiaris)
[13:41:20] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (akosiaris) p:Triage→Medium
[13:43:47] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (akosiaris) An interesting thing to note here is that some services have quite often pod restarts. e.g. ` kubectl get pod...
[13:50:12] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (JMeybohm) With kube-state-metrics (sorry for me repeating this over and over 😂 ) there is `kube_pod_container_status_resta...
[14:16:40] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (akosiaris) >>! In T256256#6252638, @JMeybohm wrote: > With kube-state-metrics (sorry for me repeating this over and over 😂...
[14:25:11] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Prevention): Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (JMeybohm) > That could help but the alert should always be actionable. For that to happen the owner needs to acknowledge t...
[15:47:06] serviceops, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Release-Engineering-Team: Migrate push-notifications service to latest node LTS version - https://phabricator.wikimedia.org/T256131 (LGoto) p:Triage→Medium
[17:17:41] serviceops, Operations: SRE FY2019-20 Q3 goal: Increase reach of deployment pipeline - https://phabricator.wikimedia.org/T212935 (Aklapper)
[17:49:34] serviceops, Operations: Clean up the /*/mw/ mcrouter routing prefix - https://phabricator.wikimedia.org/T256291 (RLazarus) p:Triage→Low
[23:41:47] serviceops, Continuous-Integration-Infrastructure, Operations, Patch-For-Review: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652 (Dzahn) @hashar For some reason on releases1002/2002 (new VMs on buster), after applying the releases role, one...