[01:39:17] serviceops, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install conf200[456].codfw.wmnet - https://phabricator.wikimedia.org/T275637 (Papaul)
[04:42:06] serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (Legoktm)
[07:48:58] serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (Joe) I agree that it would make sense for anyone with global root to also be able to manage CI, but it...
[08:46:27] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jijiki) >>! In T255568#6858884, @akosiaris wrote: >>>! In T255568#6781621, @jijiki wrote: > It's not just "on the wire", it can and will happen even over localhost....
[09:01:33] serviceops, MediaWiki-extensions-OAuth, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Tgr) Tagging serviceops because this seems to be caused by high redis error rat...
[09:34:48] <_joe_> effie: ^^ the errors have increased since jan 15th
[09:34:48] <_joe_> does that coincide with mc1024 failing?
[09:34:48] <_joe_> did we remove mc1024 from nutcracker?
[09:34:48] no, but nutcracker should eject it
[09:34:48] let me read the task
[09:46:58] kubelet is moving to structured logging. https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md
[09:47:01] That's nice news
[09:47:54] KEP's here https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#beta
[09:52:23] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jbond) Just a note this was added to the main envoy config on [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/612603/2/modules/envoyproxy/templates/tls_term...
[10:00:12] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jbond) p:Triage→Medium
[10:19:07] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (jijiki) We lost mc1024 (shard06 on redis) on Jan14, which...
[10:25:49] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jbond) I ran a quick (and likely error prone) script to see which other daemons listen with mapped addresses ` $ sudo cumin -o json 'A:all' "ss -ltp6 | awk '$ 4 ~ /...
[11:06:55] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Joe) >>! In T272319#6860273, @jijiki wrote: > We lost mc1...
[11:09:25] serviceops, Add-Link, Growth-Team, Patch-For-Review: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (kostajh) Our side of this is done, so I'm moving into External for us to keep an eye on.
[11:10:46] serviceops, Add-Link, Growth-Team, Patch-For-Review: Add Link engineering: Allow external traffic to linkrecommendation service - https://phabricator.wikimedia.org/T269581 (kostajh) a:kostajh→hnowlan @hnowlan I'm assigning to you as you're working on the patch(es) to make this work. Thanks!
[11:11:59] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (akosiaris) >>! In T255568#6860123, @jbond wrote: > Just a note this was added to the main envoy config on [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/61...
[11:31:20] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jbond) > 2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it at least. Ah thanks, i had only skimmed my...
[11:43:36] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Joe) a:Joe After merging my change, the number of err...
[11:49:28] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Joe) Further update: in the next 10 minutes, after the du...
[11:52:03] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Vort) My bot run finished successfully a minute ago.
[12:50:06] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (akosiaris) >>! In T255568#6860521, @jbond wrote: >> 2.5.5.1 is what I was referring to. It's unfortunately confusingly named too. Seems like I managed to misread it...
[12:52:21] serviceops, envoy, observability, User-fgiunchedi: Envoy should listen on ipv6 and ipv4 - https://phabricator.wikimedia.org/T255568 (jbond) > True, I think it's a bug as well. But one that needs to be solved multiple times across multiple languages/frameworks agree and doesn't help with other issu...
[13:11:51] serviceops, ops-eqiad: decommission scb100[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T275759 (akosiaris)
[13:12:02] serviceops, ops-eqiad: decommission scb100[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T275759 (akosiaris) p:Triage→Medium
[13:13:13] serviceops, ops-codfw: decommission scb200[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T275760 (akosiaris)
[13:13:26] serviceops, ops-codfw: decommission scb200[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T275760 (akosiaris) p:Triage→Medium
[13:13:35] <_joe_> that's UBN! akosiaris
[13:13:47] <_joe_> before someone changes their minds :D
[13:24:26] serviceops, SRE: Support proxying to etcd v3 storage on buster or later - https://phabricator.wikimedia.org/T275600 (jbond) p:Triage→Medium
[13:26:15] lol
[13:26:28] heads up btw, I am killing staging-codfw cluster
[13:26:40] hopefully I am also resurrecting it after that
[14:31:56] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Joe) Open→Resolved I got a few reports of bots no...
[15:09:52] serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) p:Triage→Medium
[15:15:19] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (akosiaris) https://github.com/helm/helm/issues/8271 says that --recreatepods won't work in helm3, we need to find an alternative.
[15:21:09] serviceops, Prod-Kubernetes, observability, Kubernetes: Kubernetes 1.16 dropped deprecated cadvisor metric labels pod_name and container_name - https://phabricator.wikimedia.org/T275618 (fgiunchedi) Just chiming in to say that FWIW I concur with the `or` usage trick !
[15:26:34] serviceops, Continuous-Integration-Infrastructure, SRE: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (thcipriani) >>! In T275731#6859724, @Joe wrote: > I agree that it would make sense for anyone with glob...
[15:32:09] serviceops, MediaWiki-extensions-OAuth, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), cloud-services-team (Kanban): Frequent "Nonce already used" errors in scripts and tools - https://phabricator.wikimedia.org/T272319 (Tgr) >>! In T272319#6860581, @Joe wrote: > Having said th...
[15:59:58] serviceops, Discovery-Search, Maps, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): [OSM] Backport imposm3 to the debian channel - https://phabricator.wikimedia.org/T238753 (MSantos)
[16:11:39] serviceops, Continuous-Integration-Infrastructure, SRE, Patch-For-Review: legoktm can't build CI docker images without using root because he's no longer in contint-admins - https://phabricator.wikimedia.org/T275731 (jbond) > would adding *contint_roots_members explicitly to contint-admin with a c...
[21:41:05] serviceops: Phase out legacy "uploader" docker-registry.wikimedia.org user - https://phabricator.wikimedia.org/T275581 (Legoktm)
[21:41:17] serviceops, Patch-For-Review: Switch deneb.codfw.wmnet image building to use "prod-build" user for docker-registry pushes - https://phabricator.wikimedia.org/T275582 (Legoktm) Open→Resolved a:Legoktm
[21:46:57] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Switch CI docker usage to use dedicated "ci-build" account - https://phabricator.wikimedia.org/T275559 (Legoktm) > but are the credentials anywhere else outside of puppet like in Jenkins? https://integ...
[23:52:06] serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)): Replace production deployment servers and update them to Buster - https://phabricator.wikimedia.org/T265963 (Dzahn)