[03:11:24] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[03:17:22] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul) @jcrespo no IP change just switch port change
[03:19:32] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[04:44:51] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (jijiki)
[05:37:39] serviceops, Scap, Release-Engineering-Team (Radar): Deploy Scap version 3.17.1-1 - https://phabricator.wikimedia.org/T279695 (jijiki) Open→Resolved a:jijiki @dancy it appears that rolling out scap coincided with the Greek Easter this time, sorry! Update complete.
[08:07:19] profile::conftool::client pulls in python-socks (which is unavailable in Bullseye since it is Python 2 only), does anyone know what it's used for? I don't see it in conftool itself (wondering if it should get replaced with python3-socks or removed entirely)
[09:00:22] moritzm: I don't know, but given that it is the py2 version and we now ship python3 everywhere I doubt it is something needed for conftool itself (it should be part of its dependencies in that case)
[09:00:33] does git blame help at all?
[09:02:35] digging around now
[09:05:03] ah yes, so this was added to suppress a warning printed by urllib3: https://github.com/wikimedia/puppet/commit/cf23d518861cfe9e928fbf711a46b366ba0ae1e5
[09:06:47] given that conftool is now using Python 3 and we only installed the Py2 version of pysocks this doesn't seem needed anyway, I'll remove it
[09:06:59] needed anymore
[09:10:32] I concur™ :)
[09:29:57] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint): Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (akosiaris) >>! In T279411#7061266, @kostajh wrote: > Another snapshot from today: > > {F34441646}...
[10:04:23] <_joe_> yes +1
[10:06:19] <_joe_> moritzm: actually, maybe we need the py3 version too?
[10:16:58] I'm not sure, conftool has been using Py3 for quite a while and the py3 counterpart was never added, I'd say we rather drop it, and if we spot that spurious warning again in the wild it's easy to re-add
[10:18:08] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint), Patch-For-Review: Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (akosiaris) Post deployment all 4 metrics (cpu/memory avg/maxes) look quite a...
[10:21:34] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (jcrespo) @Papaul could you turn dbprov2002 back on when you finish all needed maintenance? That's all it will need to be back into service. Thank you.
[10:26:45] <_joe_> moritzm: I think I did and I was like "didn't I fix it"? but ack
[10:32:26] ack, if I see the error on cumin2002 (new bullseye cumin hosts), I'll re-add python3-socks
[10:33:46] what's the underlying issue? is it something we could ignore in conftool?
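A minimal sketch of one way to check, from Python 3, whether a SOCKS module is visible to the interpreter before deciding to re-add the package. It assumes the Debian package provides the PySocks module importable as `socks`; the helper name `has_module` is hypothetical and nothing here is part of conftool.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "socks" is the import name assumed for PySocks; "urllib3" is the library
    # that, per the discussion, printed the original warning.
    for mod in ("socks", "urllib3"):
        state = "available" if has_module(mod) else "missing"
        print(f"{mod}: {state}")
```

This only reports whether the modules can be found; it does not reproduce or detect the warning itself.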
[10:37:26] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint), Patch-For-Review: Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (Tgr) >>! In T279411#7065251, @akosiaris wrote: > I 'll admit that with a lac...
[10:41:55] volans: it's about the socks/urllib thing mentioned above
[10:44:12] moritzm: what command can I run on cumin2002 to see it?
[10:44:29] a standard confctl gives me conn refused
[10:46:09] well, that's the issue, it's not known. python-socks was added back in 2018 to suppress a urllib warning. So _if_ we see some urllib warning again on cumin2002 we can check whether installing python3-socks makes a difference
[10:46:42] I'd like to see the warning to understand what it is :)
[10:48:01] sure, but in the absence of any warning, there's nothing to understand/fix to begin with :-)
[10:48:42] maybe I misread the backlog but I thought j.oe had seen the warning recently
[10:51:22] yeah, but that must have still been on buster-based cumin hosts and we have two more years of upstream development in the Python stack underneath conftool
[10:54:00] sure :)
[11:37:17] hnowlan: Thanks!
[11:42:48] Is it my task to configure kube-proxy (or what not) to handle scaling of pods? Our project consists of a number of images that communicate internally via REST, and some of these are rather heavy on the CPU (speech synthesis). I'm rather uncertain as to how WMF prefers setting up scaling. Bundle all containers in a single pod and let HPA bring up more as they are needed? Single container pods
[11:42:54] that bring up more replicas for the resource heavy instances and communicate via kube-proxy? Something else?
[11:43:54] FYI, we are not going to be deployed in the WMF infrastructure anytime soon, we are deploying on our (WMSE) own infrastructure for now, but we want to set it up to make it as seamless as possible if we would switch to WMF.
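For the HPA option raised in the question, a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name, the min/max bounds and the sample numbers are illustrative assumptions, and the real controller also applies a tolerance and stabilization behaviour that this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Approximate the HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Core rule: scale the replica count by the ratio of observed to target metric.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Bounds are hypothetical defaults, standing in for minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two pods averaging 90% CPU against a 60% target.
    print(desired_replicas(2, 90, 60))
```

Because the HPA scales whole pods, a single pod bundling every container would replicate the light containers along with the CPU-heavy ones; separate workloads per container would let only the heavy ones grow.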
[11:46:38] I can't seem to find any other helm chart in operations/deployment-chart that consists of a bundle of containers and scaling of specific ones as we probably want to do it with Speechoid.
[11:48:02] Is that correct, or am I missing something?
[12:06:15] _joe_: another bit of puppet/Python 2 archeology when you have a moment: https://gerrit.wikimedia.org/r/685759
[12:13:20] <_joe_> moritzm: uhm yeah in fact we can probably remove it everywhere, as we actually use python3-etcd which is a dependency of python3-conftool
[12:15:19] ack, I'll update the patch
[13:06:38] serviceops, Add-Link, Data-Persistence (Consultation), Growth-Team (Current Sprint), Patch-For-Review: Determine why service responses are slow and what we can do about it - https://phabricator.wikimedia.org/T279411 (akosiaris) >>! In T279411#7065373, @Tgr wrote: >>>! In T279411#7065251, @ako...
[13:57:18] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[14:46:35] serviceops, Scap, Release-Engineering-Team (Radar): Deploy Scap version 3.17.1-1 - https://phabricator.wikimedia.org/T279695 (dancy) @jijiki Thanks!
[15:29:49] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[15:31:15] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[15:55:33] FYI an article about signed container images with cosign: https://security.googleblog.com/2021/05/making-internet-more-secure-one-signed.html
[16:34:06] <_joe_> we bypass that whole issue by brewing everything at home
[16:45:40] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[16:58:55] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (RKemper)
[17:31:18] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[19:20:18] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (BBlack)
[22:11:27] serviceops, DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul) @BBlack i had meetings from 12:30 pm to 4PM so I didn't have the chance to work on the cp nodes. You can re-pool those since i will not be able to get back on those until th...