[00:59:36] serviceops, Release-Engineering-Team, Scap: Missing annotations for sync-wikiversions - https://phabricator.wikimedia.org/T235787 (thcipriani)
[01:00:03] serviceops, Beta-Cluster-Infrastructure, Release-Engineering-Team, SRE, Scap: Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033 (thcipriani)
[01:05:53] serviceops, Release-Engineering-Team, dev-images: Sync node versions between docker dev and slim images - https://phabricator.wikimedia.org/T265554 (thcipriani)
[01:07:07] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, Patch-For-Review: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (thcipriani)
[01:19:48] serviceops, Gerrit, Release-Engineering-Team: Rename operations/debs/poolcounter-prometheus-exporter to match other Prometheus repositories - https://phabricator.wikimedia.org/T239688 (thcipriani)
[01:19:51] serviceops, Release Pipeline, Release-Engineering-Team, Wikimedia-Portals: Migrate www.wikimedia.org (the portal) to be hosted as a service - https://phabricator.wikimedia.org/T238747 (thcipriani)
[01:19:55] serviceops, MediaWiki-Docker, Release-Engineering-Team, User-brennen: Clarify and document our docker image building process and policies. - https://phabricator.wikimedia.org/T216234 (thcipriani)
[01:20:37] serviceops, Release Pipeline, Release-Engineering-Team, SRE, Goal: Self-service Deployment Pipeline - https://phabricator.wikimedia.org/T228676 (thcipriani)
[03:30:50] serviceops, MW-on-K8s, Release-Engineering-Team: Investigate how we can provide an mwdebug functionality on kubernetes - https://phabricator.wikimedia.org/T276994 (thcipriani)
[03:30:56] serviceops, MW-on-K8s, Release-Engineering-Team: Progressive rollout of MediaWiki deployment on Kubernetes - https://phabricator.wikimedia.org/T276487 (thcipriani)
[03:31:12] serviceops, Release-Engineering-Team: switch contint prod server back from contint2001 to contint1001 - https://phabricator.wikimedia.org/T256422 (thcipriani)
[03:46:32] serviceops, MW-on-K8s, Release-Engineering-Team, Release Pipeline (Blubber): Blubber needs to check if a user is present before creating it as part of its runs stanza - https://phabricator.wikimedia.org/T268819 (thcipriani)
[03:48:48] serviceops, MW-on-K8s, Platform Engineering, Release-Engineering-Team, and 4 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (thcipriani)
[03:53:54] serviceops, Gerrit, Release-Engineering-Team, SRE: Deploy multi-site plugin to gerrit1001 and gerrit2001 - https://phabricator.wikimedia.org/T217174 (thcipriani)
[07:54:32] serviceops, Machine-Learning-Team: Kubernetes packages in Debian Bullseye - https://phabricator.wikimedia.org/T280625 (elukey)
[08:05:09] serviceops, Machine-Learning-Team, SRE: Kubernetes packages in Debian Bullseye - https://phabricator.wikimedia.org/T280625 (MoritzMuehlenhoff)
[08:43:35] serviceops, Citoid, SRE, Wikimedia-Logstash, and 2 others: Citoid is logging all request / response headers as separate fields - https://phabricator.wikimedia.org/T239713 (fgiunchedi)
[08:44:13] serviceops, Icinga, SRE, observability: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (fgiunchedi)
[08:49:29] serviceops, Machine-Learning-Team, SRE: Kubernetes packages in Debian Bullseye - https://phabricator.wikimedia.org/T280625 (akosiaris) For the "main" set of clusters, we have devised a plan to adopt upstream binaries but without relying on their repos. The Policy as well as the reasoning for that (th...
[09:13:44] hello folks, is there anybody using minikube on bullseye?
[09:15:04] I keep seeing "Kubelet: mountpoint for cpu not found", and docker runs with systemd cgroups and version 2
[09:16:08] https://github.com/systemd/systemd/issues/13477#issuecomment-528076476 is the most relevant thing that I found, but no idea what is the best setting for minikube
[09:16:15] (still super ignorant about it)
[09:19:50] elukey: i'm on bullseye and latest minikube works for me out of the box, only seeing issues with older k8s versions
[09:23:40] Majavah: thanks for the feedback, I have the same error with k8s 1.16 and 1.20, weird
[09:31:46] my impression is that the minikube's kubelet is trying to use cgroups v1 for some reason, weird
[09:32:38] wait now I tried k8s 1.20.2 and it worked
[09:32:41] * elukey cries in a corner
[09:32:46] 1.16.0 still doesn't
[09:32:55] and yesterday 1.20.1 didn't as well
[09:33:10] for me >= 1.19 works, only tried last patches
[09:36:06] ack thanks, will try istio on 1.20 then :)
[09:46:27] ---
[09:46:42] for Istio, to keep going with the chat that we had yesterday: https://phabricator.wikimedia.org/T278192#7019228
[09:47:18] for the bare minimal config, that we need for the mvp, it seems that three dockerhub images are required
[09:49:07] serviceops, SRE, observability: conftool unable to announce changes to icinga.wikimedia.org:9200 - https://phabricator.wikimedia.org/T280642 (jijiki)
[09:49:18] serviceops, SRE, observability: conftool unable to announce changes to icinga.wikimedia.org:9200 - https://phabricator.wikimedia.org/T280642 (jijiki) p:Triage→High
[09:49:20] assumption - the istio deployment was done via istioctl, not helm
[09:51:46] and https://github.com/istio/istio/blob/release-1.6/tools/istio-docker.mk seems interesting
[09:55:12] serviceops, SRE, observability: conftool unable to announce changes to icinga.wikimedia.org:9200 - https://phabricator.wikimedia.org/T280642 (jijiki) Open→Invalid
[09:56:57] serviceops, SRE, observability: conftool unable to announce changes to icinga.wikimedia.org:9200 - https://phabricator.wikimedia.org/T280642 (fgiunchedi) For future reference, we're explicitly allowing 9200/tcp only from a selection of hosts (deployment, cumin, etc) (in `modules/profile/manifests/tcp...
[10:21:20] serviceops, SRE, WMF-JobQueue, Patch-For-Review, Sustainability (Incident Followup): Have some dedicated jobrunners that aren't active videoscalers - https://phabricator.wikimedia.org/T279100 (akosiaris) Resolved→Open Reopening. The bug that @jeena reported in T279100#7000270 is repro...
[10:25:43] serviceops, SRE, WMF-JobQueue, Patch-For-Review, Sustainability (Incident Followup): Have some dedicated jobrunners that aren't active videoscalers - https://phabricator.wikimedia.org/T279100 (akosiaris) So, the crux of the issue is at those 2 functions below `lang=python def pool(self,...
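On the minikube error from earlier in the morning ("Kubelet: mountpoint for cpu not found" on a Bullseye host where docker runs with systemd cgroups version 2, and the suspicion that the kubelet expects cgroups v1): a quick first check is which cgroup hierarchy the host actually uses. The following is a minimal Python sketch, assuming the standard mount point `/sys/fs/cgroup`. The function name `cgroup_version` is hypothetical and is not part of minikube or the kubelet. It relies on the fact that the root of a unified cgroup v2 hierarchy contains a `cgroup.controllers` file, which legacy (v1) and hybrid layouts do not have at that path.

```python
import os


# Hypothetical helper, not part of minikube or kubelet.
# Assumption: cgroups are mounted at the standard location /sys/fs/cgroup.
def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 for a unified cgroup v2 hierarchy, 1 otherwise.

    The v2 root exposes a `cgroup.controllers` file; a v1 (legacy) or
    hybrid layout instead has per-controller mounts such as `cpu`.
    """
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    return 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()} detected")
```

On a v2-only host, a kubelet that only looks for the v1 `cpu` controller mount would plausibly fail with the error above. That is consistent with, but does not prove, the observation in the log that only newer Kubernetes versions started cleanly.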
[11:49:54] <_joe_> uhm eventstreams fails helm test
[12:08:42] kubeflow also uses https://github.com/jetstack/cert-manager
[12:08:50] to complete the joy
[12:13:50] <_joe_> elukey: so basically https://cdn.thenewstack.io/media/2020/09/91df2aa6-ehg8e7suyaechic.jpg that's kubeflow dependency diagram
[12:14:22] <_joe_> and that's either you or klausman right there
[12:14:24] <_joe_> :D
[12:26:10] _joe_ you are clearly not thinking as MLOps
[12:26:17] embrace the madness
[12:26:49] I know that akosiaris is secretly planning to steal all our technologies for the "main" clusters
[12:27:36] mlops -> flops :-P
[12:27:53] jokes aside, the kfserving part of kubeflow (what we need for ml-serve) seems using istio mildly (basically only its gateway and basic control plane)
[12:28:10] apergos: how dare you mocking this brand new evolution of ops people :D
[12:28:54] as a devops until I die (tm), who am I to mock... except in unit tests of course!
[12:29:04] :D
[12:32:36] serviceops, WMF-JobQueue, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Pchelolo) After deploying all the updated improved timeouts, we're d...
[12:50:20] <_joe_> akosiaris: I thought all the "helm test" stanzas were corrected
[12:50:39] <_joe_> as in running helmfile test fails on quite a few charts
[12:56:16] serviceops, WMF-JobQueue, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Ottomata) That's great!
[12:58:50] serviceops, WMF-JobQueue, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Pchelolo) >>! In T249745#7020022, @Ottomata wrote: > That's great!...
[13:02:38] _joe_: No I don't think we ever put any effort into fixing all charts
[13:02:47] <_joe_> ack
[13:02:50] * _joe_ sad
[13:03:15] but running service-checker against them from deploy1002 should work for most
[13:03:30] <_joe_> oh sure
[13:03:49] <_joe_> I'm just writing this small script that does diff -> deploy -> test
[13:04:02] <_joe_> and if test fails, it marks the deployment as failed and doesn' proceed
[13:04:07] <_joe_> *doesn't
[13:13:06] serviceops, WMF-JobQueue, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Ottomata) > Do not divide the skin of a bear we didn't kill yet. ٩(...
[16:17:54] serviceops, Scap, Release-Engineering-Team (Radar): Deploy Scap version 3.17.1-1 - https://phabricator.wikimedia.org/T279695 (thcipriani)
[16:24:06] serviceops, MW-on-K8s, Release Pipeline (Blubber), Release-Engineering-Team (Seen): Blubber needs to check if a user is present before creating it as part of its runs stanza - https://phabricator.wikimedia.org/T268819 (dduvall) p:Low→Medium
[16:26:09] serviceops, MW-on-K8s, Release Pipeline (Blubber), Release-Engineering-Team (Seen): Blubber needs to check if a user is present before creating it as part of its runs stanza - https://phabricator.wikimedia.org/T268819 (dduvall) a:dduvall Picking this back up in the context of #mw-on-k8s work....
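_joe_ mentions a small script that runs diff -> deploy -> test and, if the test step fails, marks the deployment as failed and does not proceed. That script is not shown in the log. The following is a minimal Python sketch of the same gating idea. It assumes each stage is a `helmfile` invocation: `diff`, `apply` and `test` are real helmfile subcommands, but the environment name and the exact commands used in production are assumptions. The function `deploy_with_tests` and its `runner` parameter are hypothetical. `runner` is injectable so that the gate can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: run diff -> deploy -> test and stop at the first
# failing stage. Assumption: each stage is a helmfile command.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status (0 means success)."""
    return subprocess.run(list(cmd), check=False).returncode


def deploy_with_tests(env: str, runner: Runner = run) -> str:
    """Run the stages in order for environment `env`.

    Returns "ok" if every stage succeeds, otherwise "failed at <stage>";
    once a stage fails, the remaining stages are skipped.
    """
    stages: List[Tuple[str, List[str]]] = [
        ("diff", ["helmfile", "-e", env, "diff"]),
        ("deploy", ["helmfile", "-e", env, "apply"]),
        ("test", ["helmfile", "-e", env, "test"]),
    ]
    for name, cmd in stages:
        if runner(cmd) != 0:
            # Mark the deployment as failed and do not proceed.
            return f"failed at {name}"
    return "ok"
```

The log notes that `helmfile test` fails on quite a few charts, so the `test` stage could be replaced with a service-checker run, as akosiaris suggests. The log does not give that command's invocation, so the sketch does not guess at it.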
[16:31:07] serviceops, SRE, Release-Engineering-Team (Radar): Hundreds of tags for `wikimedia/mediawiki-core` image - https://phabricator.wikimedia.org/T242775 (thcipriani)
[16:38:04] serviceops, Release Pipeline, SRE, Release-Engineering-Team (Radar), and 2 others: Remove obsoleted docker images - https://phabricator.wikimedia.org/T242604 (thcipriani)
[16:45:06] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (Rileych)