[07:39:38] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team, Patch-For-Review: Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (JMeybohm) >>! In T279411#6981284, @kostajh wrote: > @akosiaris @JMeybohm wondering if you all have idea...
[07:51:45] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team, Patch-For-Review: Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh) >>! In T279411#6982708, @JMeybohm wrote: >>>! In T279411#6981284, @kostajh wrote: >> @akosiari...
[07:53:59] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team, Patch-For-Review: Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (akosiaris) >>! In T279411#6981284, @kostajh wrote: > @akosiaris @JMeybohm wondering if you all have ide...
[08:20:10] serviceops, Maps, Packaging: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064 (MoritzMuehlenhoff) @msantos, @hnowlan : I've uploaded the postgis 3.1.1 backport to the newly created component/postgis for buster. You can add it to the maps Puppet manifests using...
[08:21:14] serviceops, Maps, Packaging, SRE: Packaging PostGIS 3.1 for the new Maps stack - https://phabricator.wikimedia.org/T277064 (MoritzMuehlenhoff)
[09:37:03] akosiaris: elukey: https://gerrit.wikimedia.org/r/c/labs/private/+/677825 I would add a similar patch to private puppet while having puppet disabled on the production masters only (so ml and staging would pick up the change) if you're okay with that
[09:38:55] jayme: +1
[09:56:02] elukey: done. Also ran puppet on ml-serve-ctrl*
[09:57:20] super thanks a lot
[09:57:59] akosiaris: I see a diff in admin_ng for staging-eqiad that removes globalnetworkpolicies...won't apply
[10:00:44] yes I was looking at that
[10:01:14] seems like calico/helmfile.yaml isn't including the per "what we call it" values
[10:01:27] what we call it = main vs ml-serve
[10:03:39] hmm...I think it should not include that
[10:03:48] that would make it environment specific
[10:05:32] ah, now I get it...yeah.
[10:06:34] That's king of a blocker for the common.yaml / common-but-not-common-between-ml-and-main.yaml split
[10:06:38] *kind
[10:08:19] king was correct too ;-) cause it's a big one
[10:08:29] I am thinking how to resolve it, haven't found an easy way yet
[10:09:15] because there is none as we try to introduce a second layer of "environments" and that means fighting king helmfile and his army
[10:09:53] yup
[10:10:05] I have an idea...maybe
[10:10:53] of course coredns and eventrouter have the same issue, it's just that we don't have any value differing there yet
[10:12:48] so, really hackish idea is: We prefix the environments with something (let's say "main_" and "ml_" for now) so they become "main_codfw", "main_staging-eqiad", "ml_ml-serve-codfw"...
[10:14:08] and then include another values file in calico(etc)/helmfile.yaml by splitting that prefix off. Like "values/{{ (split "_" .Environment.Name)._0 }}/common.yaml"
[10:16:49] that would break a lot of stuff where we currently use .Environment.Name ..
[10:26:10] indeed it would, and it's pretty hackish.. sigh
[10:50:14] akosiaris: https://gerrit.wikimedia.org/r/c/operations/puppet/+/677839 last step for infrastructure_users template if you have a second
[10:57:16] +1ed
[10:57:39] thanks
[10:59:06] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[10:59:29] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: k8s_infrastructure_users: rsyslog and echostore share the same id - https://phabricator.wikimedia.org/T269461 (JMeybohm) Open→Resolved Users and template migrated to use the username as user ID and a YAML list of groups instead of...
[11:00:02] serviceops, Prod-Kubernetes, Kubernetes: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times - https://phabricator.wikimedia.org/T270271 (JMeybohm)
[11:00:04] serviceops, Prod-Kubernetes, Kubernetes: Support multiple kubernetes versions with puppet - https://phabricator.wikimedia.org/T278329 (JMeybohm)
[11:00:51] serviceops, Prod-Kubernetes, Kubernetes: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times - https://phabricator.wikimedia.org/T270271 (JMeybohm) This is probably something we can revisit when we've decided how we deal with multiple k8s versions in the future (T...
[11:01:03] serviceops, Prod-Kubernetes, Kubernetes: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times - https://phabricator.wikimedia.org/T270271 (JMeybohm) p:Triage→Medium
[11:05:04] jayme: got it. So an easy way to solve this is to specify per environment a value that specifies which "class" it belongs to. so say in staging-codfw adding classname: main (I did that in values/main.yaml but it could go anywhere) and then add a line in calico/helmfile.yaml to include in the values array values/{{ .Values.classname}}.yaml
[11:05:13] tested on deploy1002 and staging-codfw and it works
[11:05:43] the open questions are the best placement of that naming value and of course the naming of that naming value
[11:06:11] Oh, nice. I was thinking about that too but convinced myself that .Values. is not accessible in that phase
[11:06:57] as for the place: I would put it in admin_ng/helmfile.yaml directly
[11:07:24] below each environment I mean. They should take values as well as values files
[11:08:03] yup, I was gravitating towards that place too
[11:08:29] but there will be some repetition then
[11:08:49] I am wondering whether we can use it though to reference which file to include
[11:08:59] I think not cause it's still in the same layer of helmfile, but it's worth a try
[11:09:31] It's repeating a bit, sure. But maybe it's worth the visibility
[11:10:15] ok, I'll rethink it a bit over lunch, but it seems like we have an easy way out
[11:10:16] and for name...yeah...you know. :) ".cluster-class", "cluster-group", "cluster-cluster" :P
[11:10:38] I was going for mantziafazoula :P
[11:10:50] it's a given it will be almost unique
[11:11:31] lolol
[11:11:50] also I like that your name shows up properly in gerrit as a reviewer
[11:12:03] where by "properly" I mean with the right script
[11:13:59] mantziafazoula is fine. So it's "mantziafazoula: ml" and "mantziafazoula: tfkcropsimw" then :D
[11:16:51] jayme: the difference is that mantziafazoula is actually pronounceable
[11:16:57] while tfkcropsimw isn't
[11:17:00] so, veto
[13:20:40] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts...
[13:36:18] serviceops, Prod-Kubernetes, SRE, Kubernetes, Patch-For-Review: Set resource requests and limits for calico PODs - https://phabricator.wikimedia.org/T277877 (JMeybohm) a:JMeybohm Added some defaults based on the current maximum values (https://grafana-rw.wikimedia.org/d/2AfU0X_Mz/jayme-ca...
[13:49:07] serviceops, Performance-Team, Traffic: Decide on details of progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (Krinkle)
[14:00:40] serviceops, SRE, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['parse2001.codfw.wmnet'] ` and were **ALL*...
[14:16:38] apergos: yup, found out I can do that today. It's awesome. I just hope ppl can still add me as a reviewer easily
[14:17:19] they can add by email still I think
[14:20:30] akosiaris: I was afraid at first, but it's still working as it would with ASCII characters (the completion still shows them as well)
[14:21:35] \o/
[14:51:19] akosiaris: we should probably think about a way of not having to list all default admission controllers
[14:53:09] you mean in hiera? yeah
[14:53:44] helm3 is driving me crazy. It can find 1 chart from our repo, but not the other one? insanity...
[14:53:53] must be a botched local install or something
[14:54:59] akosiaris: or even from the puppet class completely. I guess they don't need to be explicitly set
[14:56:52] k8s::apiserver? Hmmm, if we don't pass --enable-admission-plugins, does kube-apiserver have that as a default? cause we can do that
[14:57:27] yeah it does. There is --disable-admission-plugins as well (to disable one of the defaults)
[14:58:18] I thought it had something to do with the arguments to admission plugins (that they have to appear at a specific position) but obviously that's not the case
[15:00:04] well, yeah...how we do it is even really make sense now. I'll rework
[15:00:52] *does not even make sense...whatever :)
[15:06:46] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: kube-apiserver flag --admission-control has been deprecated - https://phabricator.wikimedia.org/T270063 (JMeybohm) a:JMeybohm We should take the chance and refactor this a bit. According to `kube-apiserver -h` we don't need to list th...
[15:31:02] serviceops, Parsoid (Tracking), User-jijiki: Remove parsoidJS leftovers from production - https://phabricator.wikimedia.org/T279059 (jijiki)
[15:45:47] serviceops, SRE, Parsoid (Tracking), Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` ['wtp1025.eqiad.wmnet'] ` The log can be found in `/var/log/...
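Note on the admission-plugin exchange above (14:51 to 15:06): a minimal Python sketch of the refactoring idea, assuming the goal is to pass only deviations from the defaults via --enable-admission-plugins and --disable-admission-plugins instead of listing every default. The function name and the shortened DEFAULTS set are hypothetical and only for illustration; the real default list depends on the kube-apiserver version (check `kube-apiserver -h`).

```python
def admission_flags(wanted, defaults):
    """Return kube-apiserver flags that cover only deviations from the defaults."""
    wanted, defaults = set(wanted), set(defaults)
    flags = []
    extra = sorted(wanted - defaults)    # plugins to switch on explicitly
    dropped = sorted(defaults - wanted)  # defaults to switch off explicitly
    if extra:
        flags.append("--enable-admission-plugins=" + ",".join(extra))
    if dropped:
        flags.append("--disable-admission-plugins=" + ",".join(dropped))
    return flags


# Hypothetical, shortened default set (assumption, not the real list).
DEFAULTS = {"NamespaceLifecycle", "LimitRanger", "ServiceAccount"}

if __name__ == "__main__":
    print(admission_flags(DEFAULTS | {"PodSecurityPolicy"}, DEFAULTS))
```

With this shape, a cluster that wants exactly the defaults ends up with no admission-related flags at all, which seems to be the outcome the task comment is after.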
[15:48:15] serviceops, SRE, Parsoid (Tracking), Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (jijiki)
[15:48:28] serviceops, SRE, Parsoid (Tracking), Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (jijiki)
[16:58:43] serviceops, SRE, Parsoid (Tracking), Patch-For-Review: Upgrade Parsoid servers to buster - https://phabricator.wikimedia.org/T268524 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp1025.eqiad.wmnet'] ` and were **ALL** successful.
[17:28:53] serviceops, Scap, Release-Engineering-Team-TODO: Deploy Scap version 3.17.0-1 - https://phabricator.wikimedia.org/T279695 (LarsWirzenius)
[17:29:30] serviceops, Scap, Release-Engineering-Team-TODO: Deploy Scap version 3.17.0-1 - https://phabricator.wikimedia.org/T279695 (LarsWirzenius)
[18:05:29] serviceops, Scap, Release-Engineering-Team-TODO: Deploy Scap version 3.17.0-1 - https://phabricator.wikimedia.org/T279695 (LarsWirzenius) We had a problem on beta; {{T279703}}. Until that's fixed, let's not build or deploy the 3.17.0 release.
[19:49:11] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team: Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh) a:kostajh
[19:49:55] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh)
[20:02:11] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh) Updated query information: Production, external traffic release (13 seconds) ` $ curl "https://ap...
[20:06:09] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh) >>! In T279411#6982721, @akosiaris wrote: >>>! In T279411#6981284, @kostajh wrote: >> @akosiaris @J...
[20:11:26] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why querying is slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh) p:Triage→Medium
[20:11:40] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (kostajh)
[20:32:30] serviceops: bring 10 new mediawiki appserver in codfw into production, new rack A5 (mw2402 - mw2411) - https://phabricator.wikimedia.org/T279599 (Dzahn)
[20:34:16] serviceops: bring 10 new mediawiki appserver in codfw into production, new rack A5 (mw2402 - mw2411) - https://phabricator.wikimedia.org/T279599 (Dzahn) Open→Resolved
[23:09:57] serviceops, SRE, WMF-Annual-Report: Update annual.wikimedia.org redirect to point to 2020 Annual Report - https://phabricator.wikimedia.org/T279571 (Dzahn)
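Note on the helmfile discussion from 10:12 to 11:09: a minimal Python sketch of the two ideas for picking a per-class values file, assuming nothing about helmfile itself. The first function mirrors the prefix split from `values/{{ (split "_" .Environment.Name)._0 }}/common.yaml`; the second mirrors the per-environment `classname` value used as `values/{{ .Values.classname}}.yaml`. The ENV_CLASS mapping, the function names and the resulting file list are hypothetical and only model the lookup logic.

```python
# Hypothetical environment -> class mapping, modelled on the "classname" idea.
ENV_CLASS = {
    "codfw": "main",
    "staging-codfw": "main",
    "staging-eqiad": "main",
    "ml-serve-codfw": "ml",
}


def class_from_prefix(prefixed_env):
    """Prefix approach: 'main_staging-eqiad' -> 'main' (first '_'-separated part)."""
    return prefixed_env.split("_", 1)[0]


def values_files(environment, env_class=ENV_CLASS):
    """Classname approach: look up the class, then build the values file list.

    The unprefixed environment name stays usable elsewhere, which is the
    advantage raised in the discussion over renaming the environments.
    """
    cls = env_class[environment]
    return ["values/common.yaml", f"values/{cls}.yaml"]


if __name__ == "__main__":
    print(class_from_prefix("ml_ml-serve-codfw"))
    print(values_files("staging-codfw"))
```

The prefix variant encodes the class in the environment name and so touches every place that reads the name; the mapping variant repeats one value per environment but leaves the names alone.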