[06:57:10] jayme: yes, more or less, in the sense that the operator requires some RBAC to function. The difference is that the operator is deployed once, whereas we'll have multiple airflow deployments, each in its own namespace. As we can't seem to be able to package RBAC resources in charts, this would mean I'd need to either have admin_ng loop over each airflow NS to create the SA, Role and RoleBinding, or create a single SA/ClusterRole/ClusterRoleBinding and refer to this SA in all my airflow deployments, I think.
[06:57:23] cdanis: ah, I might not have expressed myself properly. I was concerned that advertising service cluster IPs will automatically do so for all services (not just the one we want). But indeed that could be circumvented with externalIPs - making it more obvious for which services this is enabled and for which it is not
[07:02:09] brouberol: Ah... now I think I remember when I last dealt with airflow. So basically you're deploying a k8s client (kubernetes-executor) that is running inside airflow and schedules workloads (jobs/pods) on its own, right?
[07:02:37] that's exactly right
[07:03:37] Airflow has the notion of an executor, which is how airflow jobs will run. The LocalExecutor will run jobs as subprocesses, the CeleryExecutor will run them as Celery tasks (async worker tasks), etc, and the KubernetesExecutor will create a Pod that will run the job
[07:03:47] yeah, I remember now
[07:04:13] so the Airflow Deployment must be tied to a ServiceAccount with R/W permissions on pod resources
[07:07:23] brouberol: and there is no central "authority", like an operator, that is deployed via admin_ng, right?
[07:11:59] what I was thinking: you could have a proper ClusterRole defined via admin_ng and refer to that by a RoleBinding that could be created by the airflow chart
[07:13:38] might be too early in the day for me, but allowing the deploy user to create RoleBinding and ServiceAccount objects should not allow it to escalate privileges beyond its own namespace
[07:14:12] but for that global change, we should probably have a phab task - if only for a paper trail
[07:33:21] ah, that's a good idea! I can try that, thanks. I have a phab task in which I can explain what we're doing w.r.t. RBAC. I'll tag you in there
[07:45:31] o/ - I filed https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075502 as a follow-up of the chat that we had yesterday about the docker-report timeouts when fetching the catalog from the registry
[07:46:20] jayme: https://phabricator.wikimedia.org/T364389#10174265
[07:48:47] thank
[07:48:52] +s
[07:49:26] brouberol: RoleBinding, not ClusterRoleBinding
[07:50:07] allowing the deploy user to create ClusterRoleBindings would make it possible to grant yourself privileges in other namespaces
[07:50:21] and on unscoped objects... so basically root
[07:51:22] but you can create a RoleBinding (namespace scoped) for a ClusterRole (cluster scoped)
[07:53:32] oh, you're right
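To illustrate the split being discussed here - a single ClusterRole defined once via admin_ng, with each airflow namespace only adding a ServiceAccount and a namespace-scoped RoleBinding that points at it - a minimal sketch using the Python kubernetes client follows. All names are hypothetical; in practice these objects would be templated in the chart or admin_ng rather than created programmatically. The key detail is that roleRef may reference a ClusterRole even from a namespaced RoleBinding, and the granted permissions stay confined to the binding's namespace.

```python
# Sketch only: hypothetical names, objects shown as plain dicts for clarity.
from kubernetes import client, config

config.load_kube_config()

namespace = "airflow-test-k8s"        # assumption: one of the airflow namespaces
cluster_role = "airflow-pod-manager"  # assumption: defined once via admin_ng

core = client.CoreV1Api()
rbac = client.RbacAuthorizationV1Api()

# The ServiceAccount the airflow Deployment (KubernetesExecutor) runs as.
core.create_namespaced_service_account(
    namespace,
    {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "airflow", "namespace": namespace},
    },
)

# A namespace-scoped RoleBinding whose roleRef targets a cluster-scoped ClusterRole.
rbac.create_namespaced_role_binding(
    namespace,
    {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "airflow-pod-manager", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": "airflow", "namespace": namespace}
        ],
    },
)
```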
[07:56:05] elukey: impressive how shitty the docker registry documentation is, given that it's more or less the only available implementation
[07:56:43] jayme: ahhaha yes.. I am trying the setting on docker-registry1005, just to be sure
[07:57:14] elukey: please do... I'm already confused by the fact that the code says the default is 1000 - not 100
[07:57:54] jayme: there is a default max entries variable in the code with 100, I think that the docstring is not up-to-date
[07:58:03] Link: ; rel="next"
[07:58:07] yep, it works :)
[07:59:01] cool
[07:59:39] brouberol: is the deploy user allowed to create RoleBinding objects? I don't recall
[08:00:37] all right, rolling out, thanks for the review!
[08:24:55] jayme: I'm not seeing any authorization.k8s.io in the permission groups
[08:25:10] (in the deploy clusterrole)
[08:29:58] `User "airflow-test-k8s-deploy" cannot get resource "rolebindings" in API group "rbac.authorization.k8s.io" in the namespace "airflow-test-k8s"`
[08:30:06] yep, deploy can't manage rolebindings either
[08:31:17] I think that is something we could change globally
[08:31:38] but I'd like to have that discussed to make sure we're not missing anything here
[08:32:57] ack. Until then, would it be ok if I added a loop in admin_ng/helmfile_rbac.yaml, going over all the airflow namespaces and creating a SA, Role and RoleBinding triplet?
[08:33:42] brouberol: maybe helmfile_namespace is more suited as it already loops over all namespaces
[08:33:58] I hate it - but I also don't want to block you :)
[08:34:17] haha, thanks
[08:35:06] if you'd open a separate task specifically for granting the deploy user permissions to manage Role and RoleBinding objects, I would assume we can pass that by a couple of people quickly
[08:35:19] yep, I can do that
[08:35:20] not having to wait for the next sig meeting
[08:36:12] I think there is no harm in allowing those two as they are namespace bound anyway... but I might be missing something
[08:39:37] the maxentries=50 setting improved the timeouts a lot, but it is still too slow, so I propose to go down to 25 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075510
[08:39:43] hopefully it is the sweet spot
[08:40:18] sorry, the best balance between perf and flexibility
[08:40:33] I've created https://phabricator.wikimedia.org/T375597
[08:41:57] thanks
[08:51:57] brouberol: and now it comes to mind that this means the deploy user could also modify its own privileges then :/
[08:55:02] brouberol: what you could do as well is override the deployClusterRole for airflow namespaces - which means not modifying the global deploy user rights
[08:55:28] that is already done for kserve and spark
[09:02:43] ah, so we'd create a deploy-airflow ClusterRole that is authorized to create/delete Role and RoleBinding objects, and we'd deploy the chart via helmfile with that ClusterRole?
[09:03:28] yes
[09:03:29] * we'd bind this custom ClusterRole to the deploy user
[09:03:38] yes :)
[09:03:50] I like that better than injecting more cr*p into helmfile_namespace.yaml
[09:04:13] I still hate it, but I agree :D
[09:04:42] TBF you hate many things
[09:04:43] plus you could then easily test/verify the statements from https://kubernetes.io/docs/reference/access-authn-authz/rbac/#restrictions-on-role-binding-creation-or-update
[09:05:05] it sounds like there is no way for privilege escalation...
[09:05:11] that is indeed true :)
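A rough sketch of the per-namespace override idea: an airflow-specific deploy ClusterRole, bound to the deploy user only in airflow namespaces. The name and rule list below are assumptions for illustration, not the actual admin_ng definitions; the point is that only the extra rbac.authorization.k8s.io rules differ from the regular deploy role. Per the Kubernetes docs linked above, a user can only create a RoleBinding granting permissions it already holds at that scope (or on roles it can `bind`), which is why this does not open an escalation path.

```python
# Sketch only: "deploy-airflow" and the verb lists are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

rbac.create_cluster_role(
    {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "deploy-airflow"},
        "rules": [
            # ... the rules of the regular deploy ClusterRole would be repeated here ...
            {
                "apiGroups": [""],
                "resources": ["serviceaccounts"],
                "verbs": ["get", "list", "create", "patch", "delete"],
            },
            {
                # the extra bit airflow needs: managing Role/RoleBinding objects
                "apiGroups": ["rbac.authorization.k8s.io"],
                "resources": ["roles", "rolebindings"],
                "verbs": ["get", "list", "create", "patch", "delete"],
            },
        ],
    }
)
```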
[09:54:59] jayme: I've refactored https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1075508 according to your last suggestion
[09:57:30] you already know I hate it, right? :p
[10:00:14] yes I do
[10:00:22] :D
[11:12:14] pagination to 25 seems to work better on the docker registry, but I still see 504s
[11:12:28] even if in the nginx logs (on the docker registry) I don't see the 499s anymore
[11:12:47] so I filed https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075528 to change the ATS timeouts for the docker registry
[13:56:07] answering to myself - at the moment it is not ats complaining, but nginx - https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075570
[13:56:35] I tried docker-report directly against docker-registry.discovery and I got 504s from nginx at 240s
[13:57:54] pagination seems to help a bit, but over time, after some paginated requests, the registry starts to accumulate time and replies with worse and worse timigns
[13:57:57] *timings
[14:12:16] hm... are we filling some cache maybe? Or does it take longer even if you start "in the middle"?
[14:18:35] lemme try
[14:23:32] nope, ok, even a single request takes ages
[14:23:40] 4 minutes and it hits 240s
[14:24:43] trying another one
[14:25:04] (I tried the one leading to the 504, now I am trying another one that gets served with a 200 to check the timing)
[14:30:13] jayme: ah, I did understand your concern but maybe I was unclear in replying -- I think that previously Calico only supported advertising all service IPs, but you can give it CIDRs now and I don't see any reason why you couldn't tell it to only advertise a specific /32 service IP
[14:30:22] otoh externalIP makes it much clearer in a few ways
[14:31:45] indeed... also it might make it more clear that the IP is supposed to be "static"
[14:32:07] jayme: the timings are consistent, even if the requests are not "cumulated"
[14:33:20] elukey: and the first page is as slow as a page in the middle, I suppose?
[14:34:50] jayme: so far it seems that the first paginated requests are quicker, and the last ones tend to take more time to complete... but if you pick randomly and execute a single request, the timing is the same
[14:35:00] I tried 2/3 times with different examples from the access log
[14:35:28] hm... odd
[14:36:47] not sure if a lower pagination makes things worse, but it shouldn't... I've seen docker-report failures in the past weeks
[14:37:18] maybe there is some ordering that the registry keeps, and the last pages are slower for some reason
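For reference, the pagination being tuned here is the registry's v2 catalog endpoint: the client passes a page size via `n` and follows the `Link: ...; rel="next"` header. Below is a hedged sketch of the kind of loop a client such as docker-report has to perform (the registry URL is an assumption), which also makes it easy to time individual pages the way the tests above do:

```python
# Sketch: walk /v2/_catalog in pages of 25 and time each page.
import time
import requests

REGISTRY = "https://docker-registry.discovery.wmnet"  # assumption: internal registry URL

url = f"{REGISTRY}/v2/_catalog?n=25"
repositories = []
while url:
    start = time.monotonic()
    resp = requests.get(url, timeout=300)
    resp.raise_for_status()
    repositories.extend(resp.json()["repositories"])
    # requests parses the Link header for us; no "next" rel means this was the last page
    nxt = resp.links.get("next", {}).get("url")
    print(f"{url} -> {len(repositories)} repos so far, {time.monotonic() - start:.1f}s")
    url = requests.compat.urljoin(REGISTRY, nxt) if nxt else None
```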
[14:37:34] the sad thing is that a ton of the images that the pagination returns are already gone
[14:38:16] I'd honestly bump the timeout to 300s; the real fix is to run a proper GC cleanup for the registry
[14:38:28] even if we know it may not be feasible
[14:39:29] what kind of GC? The actual swift GC (which isn't really working) or setting up a process to start untagging images?
[14:39:39] the latter could help even without the former
[14:40:34] akosiaris: ideally a GC that cleans up the catalog from images that we have deleted :D
[14:41:09] the latter might be ok then
[14:41:14] I was chatting with Janis about https://gitlab.com/gitlab-org/docker-distribution-pruner earlier on, could be something to try (it doesn't support Swift out of the box, we'd need to send a pull request)
[14:41:25] because IIRC deleted images aren't listed in the catalog
[14:41:46] please tell me this ^ is right
[14:41:57] I thought the same but nope, they are :(
[14:42:13] 😢
[14:42:46] because I went through the deletion of a ton of stretch/buster stuff that I see flying through requests (paginated) when docker-report calls the registry for the catalog
[14:43:04] elukey: doesn't support swift as in, won't send the swift-specific auth headers?
[14:43:09] I see *those images
[14:43:33] cdanis: I think it supports only the S3 API, not something different
[14:43:56] and of course they mark everything as experimental
[14:44:03] isn't... isn't Swift largely S3-compatible for basic CRUD?
[14:44:35] I am very ignorant, so can't say; hopefully yes, but we'd need to check
[14:44:51] assuming that the tool is something that we want to explore
[14:44:54] https://docs.openstack.org/swift/latest/s3_compat.html
[14:45:29] but the alternative is running the registry in GC mode (read-only) and IIRC akosiaris tried it in the past, it takes days
[14:45:47] cdanis: looks promising
[14:46:10] yeah, there is the compat layer. But we're not using that for the docker registry... might be possible to point the cleanup thing at it though
[14:46:14] yup
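Since the open question is whether Swift's S3 compatibility layer would be enough for docker-distribution-pruner, a quick read-only smoke test could look like the sketch below. The endpoint URL, credentials and container/bucket name are placeholders, not real values; the idea is just to confirm that basic S3 list calls answer before investing in a proper Swift driver.

```python
# Sketch: read-only listing against Swift's S3 compatibility layer.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://swift.example.wmnet",  # assumption: Swift S3 API endpoint
    aws_access_key_id="REDACTED",
    aws_secret_access_key="REDACTED",
)

# "docker-registry" is an assumed container/bucket name for the registry blobs.
resp = s3.list_objects_v2(Bucket="docker-registry", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```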
[14:47:35] https://lounge.uname.gr/uploads/2874118810982d0a/IMG_20240812_103026455_HDR.jpg
[14:47:42] I present you... the registry ^
[14:48:14] the artwork is from yours truly, probably including exim metadata so that you know exactly where it is
[14:48:35] exif* (dammit exim, you've scarred me for life)
[14:49:36] we need to somehow get cross-team work into the APP to make this better
[14:49:51] cause we all rely on it and we aren't maintaining it as well as needed
[14:50:53] the water under the registry does not look good...
[14:52:59] ahahahahahaha
[15:07:33] so, if everybody is ok with it, I'd bump the timeout to 300s, so docker-report should start working again (hopefully)
[15:08:06] then we can open a specific task for the distribution pruner, and see if we can test it (even a dry run would be fine, IIUC it supports it)
[15:53:10] have we ever looked at https://goharbor.io/ or any other image registry? Not sure if that solves our problems, just curious if there's a better way
[15:57:06] IIUC harbor uses docker distribution behind the scenes, which is more or less the only really open source option (but there may be something that we haven't found yet)
[15:57:09] yes. Quite a while ago and once more last year. The first time we felt we couldn't support it (it was like 8 containers?)
[15:57:35] last year we mostly reiterated quickly (like 1h or so?) on the previous result.
[15:58:02] we aren't in the same place of course as either 7 years ago or last year, we should re-evaluate
[15:58:26] toolforge runs harbor btw. if you search phab there's a ton of tickets
[15:59:04] akosiaris: one question - do you think that it would be viable to run https://docker-docs.uclv.cu/registry/garbage-collection/#run-garbage-collection in dry-run mode, maybe over a weekend, to see roughly if 48h could be enough for a first big mark/sweep phase?
[15:59:31] not sure if the registry needs to be in read-only mode when running it in dry-run
[15:59:44] It does not.
[15:59:53] yeah, it isn't needed
[16:00:03] ah lovely, shall we set up a test then?
[16:00:10] elukey: I don't mind. I also don't expect it to finish in a weekend, but go ahead
[16:00:51] akosiaris: I have the same feeling yes, but maybe if it is 3 days we could have a ballpark figure about what it takes
[16:01:20] like how much time it would need to be read-only if we remove the dry-run etc..
[16:01:36] 3 days if we count a weekend could be doable
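A sketch of the weekend test being proposed: run the bundled garbage collector in dry-run mode and simply record how long the mark phase takes. The binary and config paths below are assumptions about how the registry is packaged on these hosts; the `garbage-collect --dry-run` invocation itself is the one documented at the link above, and in dry-run mode nothing is deleted.

```python
# Sketch: time a registry garbage-collect dry run (paths are assumptions).
import subprocess
import time

cmd = [
    "/usr/bin/registry",                  # assumption: docker-distribution binary path
    "garbage-collect",
    "--dry-run",                          # mark only, nothing is swept
    "/etc/docker/registry/config.yml",    # assumption: registry config path
]

start = time.monotonic()
with open("/var/log/registry-gc-dryrun.log", "w") as log:
    subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
print(f"dry-run finished in {time.monotonic() - start:.0f}s")
```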
[16:06:13] elukey: btw, with claime we were discussing this exact same thing in June. The best we arrived at was https://www.raftt.io/post/keeping-your-private-image-registry-clean.html
[16:06:38] see, it's that link :p
[16:06:56] so maybe docker-distribution-pruner (which you linked above) is going to work better
[16:07:03] give it a try
[16:07:11] or not at all, because it requires s3
[16:07:23] I was chatting about it with Janis earlier on, and there are some concerns:
[16:07:30] ah, good point
[16:07:38] I skipped the chat above, my bad
[16:07:41] 1) the code has not really been updated in a while, except for "update for golang XXXX" commits
[16:07:51] 2) everything is marked as experimental :D
[16:08:03] awesome :-(
[16:08:12] 3) the s3 stuff, but in theory there is a decent interface so we could send a pull request for swift
[16:08:38] OTOH, all the design decisions discussed in docker-dist-pruner's README, I like
[16:09:17] elukey: imo it's worth trying the s3 compat layer + d-d-p dry-run asap
[16:10:05] I have a feeling we're going to wind up doing something 'experimental' no matter which path we take
[16:10:42] cdanis: I am struggling between the d-d bundled GC dry run and the d-d-p one, the former feels a little safer (but way slower)
[16:11:03] fail fast ;)
[16:11:53] (I'm being a bit cheeky but also totally serious)
[16:12:41] you could probably give d-d-p an access token that only has read access, if you're really concerned, and at that point it's just load that's the potential issue -- and it looks like the [experimental] parallelism is also off by default
[16:16:49] going to open a task to outline all the options, then we can decide
[16:25:04] https://phabricator.wikimedia.org/T375645
[16:25:13] lemme know if I missed something etc..
[16:26:53] my only remark is, "its last commits are only related to supporting new golang versions (so active development of features and general reliability seems to have stopped)" -- maybe that's *good* news :D
[16:31:58] cdanis: hahah, could be yes, but if everything is marked as experimental I have some doubts.. I can definitely accept something like "use it at your own risk, the code is stable but we don't really guarantee anything, don't complain to us if you drop images that weren't supposed to be dropped :D"
[16:32:40] not that I trust the docker distribution built-in GC more
[16:32:49] the design of that one is awful
[16:32:50] maybe just a little, but not dramatically more
[16:32:53] I almost suspect deliberately so
[16:33:25] I suppose all the big clouds have written their own GCs
[16:33:34] it can't be that they use this
[16:34:31] hah
[16:34:34] I just looked at the Harbor docs
[16:34:40] v1 had the same "puts cluster into read-only mode"
[16:34:44] v2 doesn't, it does this instead:
[16:34:46] > To avoid damaging the uploading artifact, the garbage collection introduces a time windows(2 hours) to reserve the recent uploaded layers. Garbage collection does not sweep the manifest & blob files that have a timestamp in the time window. Harbor runs garbage collection without interrupting your ability to continue use Harbor, for example you are able to push, pull, or delete artifacts while
[16:34:48] garbage collection is running.
[16:35:03] Janis linked me this earlier on.. https://github.com/goharbor/harbor/issues/19401
[16:35:18] lol
[16:35:22] that's encouraging
[16:35:23] lol
[16:35:24] not
[16:35:46] https://gitlab.wikimedia.org/repos/releng/reggie ? :-)
[16:36:02] 👀
[16:36:59] wow TIL
[16:37:36] dancy: sqlite metadata + local file storage for blobs only?
[16:38:48] nod. We use this for the gitlab-cloud-runners
[19:15:52] wow, harbor sounds pretty bad ;(
[19:16:40] dancy: how hard would it be to move the storage side of things to swift?
[19:17:13] (let's assume for the moment that the sqlite part doesn't matter)
[20:19:31] looks like quay can be self-hosted too, I didn't realize that https://github.com/quay/quay/blob/master/docs/quick-local-deployment.md
[21:13:27] well, damn, I guess it really is docker-distribution all the way down ;(