[07:25:42] We haven't set such a quota yet, so no practical experience. I see it was marked as stable in 1.25, so it should be possible to set it in any of our k8s clusters (with the caveat that something might have changed between versions). Even without quotas set though, you can probably set it anyway in your pods to inform the scheduler and allow evictions to happen (which btw would happen anyway under disk pressure). The docs imply there is some configuration for the kubelet, but I don't see anything that we need to change?
[07:26:04] Test it out in dse-k8s, I'd suggest? Whatever applies there would apply in wikikube too
[11:32:34] inflatador: 50 GB feels to me like a lot to pull across the network every time a blazegraph pod restarts. I'm not sure that wikikube would be a great target for migration without persistent storage options. I'll chip in on the doc.
[11:55:29] Agreed wdqs isn't particularly in scope for wdqs.
[12:16:33] *in scope for wikikube ;)
[12:20:34] yup, typo, thanks for the fix
[12:50:37] qq folks - I keep seeing docker-report's k8s run failing to fetch the image catalog from the docker registry
[12:51:26] I remember that sometimes it was a problem in the past, especially depending on traffic, but I keep seeing it more consistently now
[12:56:10] probably more images around
[12:56:27] afaics we go through the CDN, and on the registry's nginx access logs I see a lot of 499s registered
[12:58:22] akosiaris: both base and k8s fail with the same error, it seems as if we reached a tipping point
[12:59:04] quite probably
[12:59:26] the plank bridge is showing another crack
[12:59:44] I am a little worried since without docker-report our ability to track packages deployed is reduced a lot
[13:00:21] IIUC docker-report wants all the images and then it filters, maybe there is a way to reduce the scope of what is fetched?
[13:00:26] from the registry I mean
[13:00:31] totally ignorant about it
[13:29:23] Do we have a feeling for where the bottleneck is in docker-registry that is causing the 499s? Could it be swift? Redis?
[13:30:52] Or can we tweak the nginx caching, somehow?
[13:33:24] Ha! Made me laugh out loud. https://github.com/wikimedia/operations-puppet/blob/production/modules/docker/templates/registry-nginx.conf.erb#L71
[13:40:04] sorry my bad, the error from docker-report is the following
[13:40:05] Max retries exceeded with url: /v2/_catalog?last=releng%2Fphp70&n=100 (Caused by ResponseError('too many 504 error responses'))
[13:40:23] 499 came from a quick check in the nginx access logs but probably not the issue
[13:40:46] was about to say: IIRC the catalog calls are paginated
[13:41:21] but each of them probably requires a bunch of reads from swift
[13:41:46] it is interesting that on nginx's access log I see
[13:41:47] "GET /v2/_catalog?last=releng%2Fphp70&n=100 HTTP/1.1" 499
[13:42:18] 499 is nginx for "client side went away" right?
[13:42:20] 499 is nginx saying the client went away, no?
[13:42:22] is that just the CDN timing out?
[13:42:25] must be then :D
[13:42:29] ATS will only wait so long for you
[13:42:42] the IPs listed are indeed cpXXXX nodes
[13:42:56] so that sounds like the issue is still the registry itself
[13:43:29] and given that we now have precise timings in the access logs (when past-self is so wise I am very happy)
[13:43:35] I see "rt=180.896"
[13:43:52] so possibly ATS waits 180s and then it gives up?
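(For reference, a minimal sketch of walking the registry catalog with the same `last`/`n` pagination the failing URL shows, timing each page to see which ones approach the ~180s budget. The registry hostname, page size handling, and timeout are assumptions for illustration, not lifted from docker-report's code.)

```python
# Sketch: walk /v2/_catalog page by page and report how long each page takes,
# to spot pages that blow past the CDN's ~180s time-to-first-byte limit.
# REGISTRY and PAGE_SIZE are assumptions, not docker-report's real settings.
import requests

REGISTRY = "https://docker-registry.wikimedia.org"  # assumed endpoint
PAGE_SIZE = 100  # the error URL shows n=100; try 50 to compare

def walk_catalog(session: requests.Session, page_size: int = PAGE_SIZE):
    last = None
    while True:
        params = {"n": page_size}
        if last:
            params["last"] = last
        resp = session.get(f"{REGISTRY}/v2/_catalog", params=params, timeout=300)
        elapsed = resp.elapsed.total_seconds()
        print(f"last={last!r} n={page_size} -> HTTP {resp.status_code} in {elapsed:.1f}s")
        if resp.status_code != 200:
            break  # e.g. a 504 from the CDN once the TTFB budget is exceeded
        repos = resp.json().get("repositories", [])
        if not repos:
            break
        yield from repos
        last = repos[-1]  # the registry pages by "last repository seen"

if __name__ == "__main__":
    with requests.Session() as s:
        total = sum(1 for _ in walk_catalog(s))
        print(f"{total} repositories listed")
```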
[13:45:01] The timeout value (in seconds) for time to first byte for HTTP and HTTP2 connections (default: 180 secs)
[13:45:10] https://gerrit.wikimedia.org/g/operations/puppet/+/94ba07246380becd93139731e4763ec9e8003f95/hieradata/common/profile/trafficserver/backend.yaml#583
[13:45:32] the catalog btw is the one endpoint of the registry that we allow to be cached at the CDN
[13:46:05] but apparently by now we're far beyond the CDN's tolerance
[13:46:44] cdanis: the only thing that is slightly off is that the 499s are HTTP/1.1 conns, while in the settings in backend.yaml I see h2_settings, but maybe it is just a misnaming
[13:47:58] jayme: qq about the pagination - docker-report asks for 100 images from the catalog at a time, and it terminates when the docker registry returns the last batch? (namely, does it iterate etc..?). If so maybe reducing the window to 50 could help
[13:49:43] elukey: I'd have to look at the code, not sure. But I would assume it reads the whole response body before beginning its iteration over all tags in that batch
[13:50:53] so it might help to take smaller batches - especially with images having many tags (which I think slows down the response from the registry - as it has to fetch all the manifest hashes)
[13:51:28] jayme: ack I'll check the code and report back, maybe there is a quick way out of this
[13:51:35] thanks all for the brainbounce :)
[14:54:38] folks due to baby-duties I'll not be able to attend the SIG meeting, sorry :( For https://phabricator.wikimedia.org/T373526 I am on board with both of the points of course, with the caveat that we should all agree that adding the SIG's email to "core" images means that we'll take turns in upgrading/keeping up those images
[15:27:11] qq related to RBAC. I'm working on making airflow instances deployed via our airflow chart use the Kubernetes executor, which means that the airflow Deployment must be able to create/read/list/watch/delete pods, meaning a ServiceAccount/Role/RoleBinding. However, the `-deploy` user used by the helmfile isn't authorized to deploy the chart, due to missing permissions when it comes to handling these RBAC resources. Does anyone have an idea as to how we should proceed? Deploy the chart via the admin user? Add an exception in admin_ng/helmfile_rbac.yaml? Something else? Thanks
[15:30:42] Or possible other solution: package the RBAC as a ServiceAccount/ClusterRole/ClusterRoleBinding in admin_ng/helmfile_rbac.yaml?
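(Not an answer to where the RBAC should live, but for concreteness, a sketch of the objects the KubernetesExecutor needs, expressed as the dicts a chart template would otherwise render. Names, namespace, and the pods/log rule are illustrative assumptions, not the real chart values.)

```python
# Sketch of the ServiceAccount/Role/RoleBinding an Airflow KubernetesExecutor
# deployment needs in its namespace. Object names and namespace are made up.
from kubernetes import client, config, utils

NAMESPACE = "airflow-test"          # hypothetical namespace
SERVICE_ACCOUNT = "airflow-worker"  # hypothetical name

manifests = [
    {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": SERVICE_ACCOUNT, "namespace": NAMESPACE},
    },
    {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "airflow-pod-launcher", "namespace": NAMESPACE},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["pods"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            },
            # Assumption: if task logs are read back via the API, pods/log is
            # needed too; drop this rule if logs are shipped another way.
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get", "list"]},
        ],
    },
    {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "airflow-pod-launcher", "namespace": NAMESPACE},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "airflow-pod-launcher",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE},
        ],
    },
]

if __name__ == "__main__":
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    api = client.ApiClient()
    for m in manifests:
        utils.create_from_dict(api, m)
```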
[16:50:08] Gotta say, after listening in I'm very pleased WMDE just pays for a GKE managed cluster; I felt horrendously out of my depth and have large quantities of sympathy for y'all worrying about this closer to the metal and super old k8s stuff
[16:50:52] <3
[16:51:27] I remember that "Upgrade to vX.Y.Z" button next to the worker pool in GKE :D
[16:51:40] I just have auto upgrades on :P
[16:51:55] (with canary)
[16:52:04] I didn't dare :D
[16:52:37] we have no SLAs and not much traffic so very different situation
[16:54:16] I was trying to follow along with your PTR/DNS talk. Why do you want to expose these cluster-internal IPs for DNS outside the cluster? Did I even get it right that that's what you want to do?
[16:55:12] brouberol: isn't that RBAC stuff similar to what we do for some of the operators? Maybe I'm missing something ... but we can have a chat tomorrow if you like
[16:56:20] tarrow: mainly for debugging reasons. We would like to enable systems outside k8s to properly look up the source of connections to them, for example
[16:56:55] if those are pod IPs, we currently only know exactly that. But we can't (easily) figure out which pod or which namespace initiated the connection
[16:57:58] jayme: gotcha; (super naive questions) but aren't they ephemeral anyway? That pod IP can be reallocated to a different pod whenever, can't it?
[16:58:54] yes, absolutely. But if we do a reverse lookup while logging for example, we would capture the current state
[17:01:18] tarrow: yeah, they're short-lived -- but if you're looking at, say, a bare-metal database server, and running `ss -t` to see what's overwhelming it with connections, and all you see is 200 pod IPs, ... :)
[17:02:30] the pod IPs are routable everywhere inside production for us, and are what the pods use to make all their outbound connections
[17:03:38] and actually, I've dug deeper into the Calico and k8s docs and I have yet another proposal
[17:05:06] cdanis: gotcha!
[17:05:49] tarrow: more context at https://phabricator.wikimedia.org/T372943 if you are morbidly curious :) and some proposed fun with eBPF
[17:37:38] jayme: I found yet another configuration knob we could use! https://phabricator.wikimedia.org/T344171#10172491
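(A minimal sketch of the "reverse lookup while logging" idea discussed above: resolve a pod IP's PTR record at the moment the connection is logged, so the log captures which name held that IP at the time even though the pod is ephemeral. Whether pod IPs actually get PTR records depends on the proposal in T372943; the IP below is made up.)

```python
# Resolve a peer IP to its PTR name at log time, so ephemeral pod IPs are
# captured with whatever name they carry right now.
import datetime
import socket

def describe_peer(ip: str) -> str:
    """Return 'ip (ptr-name)' if a PTR record exists right now, else just the IP."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
        return f"{ip} ({name})"
    except socket.herror:
        return ip  # no PTR record, or it disappeared along with the pod

def log_connection(ip: str) -> None:
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    print(f"{ts} connection from {describe_peer(ip)}")

if __name__ == "__main__":
    # Hypothetical pod IP, e.g. taken from `ss -t` output on a database host.
    log_connection("10.64.75.23")
```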