[06:44:12] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10916704 (Volans) > Why is reimaging messing with those addresses at all? @Volans says that it's b...
[06:52:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:53:37] (PS1) Muehlenhoff: Remove obsolete keytab [labs/private] - https://gerrit.wikimedia.org/r/1159125
[06:56:26] (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete keytab [labs/private] - https://gerrit.wikimedia.org/r/1159125 (owner: Muehlenhoff)
[06:57:51] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:58:33] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916708 (Wangombe)
[07:07:45] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916740 (Wangombe) @Nokib_Sarkar please go ahead and give @translatewiki commit access to CampWiz NXT repos...
[07:41:20] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916831 (Nokib_Sarkar) @Wangombe Added translatewiki as collaborator.
[07:44:50] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916835 (abi_) >>! In T393850#10916831, @Nokib_Sarkar wrote: > @Wangombe Added translatewiki as collaborato...
[07:56:44] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10916877 (Aklapper) All linked patches are merged or closed. Should this remain opened or can this be resolved?
[07:57:08] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916881 (Nokib_Sarkar)
[07:57:44] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916882 (Nokib_Sarkar) I have one question, is it possible to redirect the [[ https://translatewiki.net/wik...
[08:15:16] (CR) Alexandros Kosiaris: [V:+2 C:+2] Remove old docker_registry_ha hiera keys [labs/private] - https://gerrit.wikimedia.org/r/1156762 (https://phabricator.wikimedia.org/T390251) (owner: Alexandros Kosiaris)
[08:18:59] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10916937 (Nokib_Sarkar) Some translations were copied from English in the source code for technical reasons....
[08:19:13] cloud-services-team, Cloud-VPS: consider storing information on cloud NAT mappings - https://phabricator.wikimedia.org/T273734#10916939 (taavi)
[08:23:24] (Abandoned) Alexandros Kosiaris: Rename docker_registry_ha's occurrences to docker_registry [labs/private] - https://gerrit.wikimedia.org/r/1155601 (https://phabricator.wikimedia.org/T390251) (owner: Elukey)
[08:23:28] (open) dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850)
[08:23:38] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10916955 (dcaro) a:dcaro
[08:29:35] cloud-services-team, Cloud-VPS, Epic: Replace remaining IPv4 NAT exemptions by IPv6 adoption - https://phabricator.wikimedia.org/T396986 (taavi) NEW
[08:30:06] cloud-services-team, Cloud-VPS, Patch-Needs-Improvement: Change routing to ensure that traffic originating from Cloud VPS is seen as non-private IPs by Wikimedia wikis - https://phabricator.wikimedia.org/T209011#10917003 (taavi) {T396986} should be considered the spiritual successor of this task.
[08:31:27] cloud-services-team, Cloud-VPS, Epic: Replace remaining IPv4 NAT exemptions by IPv6 adoption - https://phabricator.wikimedia.org/T396986#10917011 (taavi)
[08:31:32] cloud-services-team, Toolforge, IPv6, Kubernetes: Support IPv6 in Toolforge Kubernetes - https://phabricator.wikimedia.org/T380060#10917012 (taavi)
[08:35:15] cloud-services-team (Kanban), Cloud-VPS, Patch-For-Review: cloud: current nova-fullstack mechanism requires cloudcontrol nodes to access individual VMs - https://phabricator.wikimedia.org/T272587#10917023 (taavi)
[08:42:52] Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991 (taavi) NEW
[08:46:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:46:19] cloud-services-team, Cloud-VPS, Epic, Patch-For-Review: Cloud: reduce NAT exceptions from production to cloud - https://phabricator.wikimedia.org/T272585#10917139 (taavi) I suspect the only remaining traffic here is for some Zuul2 workflows. So we/I should revisit this once the Zuul3 migration is...
[08:49:31] (update) dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850)
[09:20:25] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283#10917409 (fnegri) Open→Resolved @Aklapper thanks for the ping. I think this can be resolved, we can create new tasks if we want to do further improvements.
[09:26:10] (update) dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850)
[09:28:27] (update) dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850)
[09:31:04] (update) dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850)
[09:31:14] (open) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[09:37:26] (update) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[09:37:43] (update) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[09:48:35] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10917561 (dcaro) Open→In progress
[09:52:51] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[10:01:45] (update) dcaro: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072) (owner: raymond-ndibe)
[10:03:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-puppetdb-03 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:06:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-puppetdb-2 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:07:26] (update) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[10:13:53] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[10:13:58] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[10:17:07] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[10:17:11] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[10:22:25] cloud-services-team, Cloud-VPS, Patch-For-Review: Store logs for Cloud VPS egress NAT mappings - https://phabricator.wikimedia.org/T273734#10917659 (taavi)
[10:48:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-46 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:53:24] FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-46 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[10:56:08] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 21), Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10917734 (dcaro)
[10:56:44] cloud-services-team, Cloud-VPS, Patch-For-Review: Store logs for Cloud VPS egress NAT mappings - https://phabricator.wikimedia.org/T273734#10917736 (taavi) a:taavi
[10:58:02] (approved) taavi: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275) (owner: dcaro)
[10:58:24] RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-46 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[10:58:28] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-46 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:33:58] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-puppetdb-2 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:35:58] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-puppetdb-03 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:43:53] Quarry: [bug] "Internal Server Error" when logging into Quarry - https://phabricator.wikimedia.org/T333043#10917896 (SD0001) Open→Resolved The root cause was probably T332650, which is not Quarry-related and long since resolved. >>! In T333043#8726691, @Tgr wrote: > Quarry should probably be fix...
[12:13:32] (approved) dcaro: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072) (owner: raymond-ndibe)
[12:15:23] (merge) dcaro: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072) (owner: raymond-ndibe)
[12:15:24] (update) dcaro: [deploy] support health-checks and port [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/75 (https://phabricator.wikimedia.org/T362072) (owner: raymond-ndibe)
[12:17:57] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.117-20250616121532-66131f90 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/815 (https://phabricator.wikimedia.org/T362072)
[12:21:57] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, and 2 others: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10918036 (Andrew)
[12:24:16] cloud-services-team, Cloud-VPS: Import Fedora CoreOS 42 image for use with Magnum - https://phabricator.wikimedia.org/T396912#10918045 (Andrew) I am interested, but right now I'm trying to get the new capi-helm driver online which will change everything (including using different base images). I'll circ...
[12:24:21] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1159435 (owner: L10n-bot)
[12:24:24] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1159436 (owner: L10n-bot)
[12:39:15] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39
[12:45:22] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39
[12:47:36] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[12:48:51] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:58:53] (update) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[13:00:01] !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:04:21] (approved) dcaro: components-api: bump to 0.0.117-20250616121532-66131f90 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/815 (https://phabricator.wikimedia.org/T362072) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[13:04:25] (merge) dcaro: components-api: bump to 0.0.117-20250616121532-66131f90 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/815 (https://phabricator.wikimedia.org/T362072) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[13:04:39] (update) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[13:06:02] Toolforge (Toolforge iteration 21), Patch-For-Review: [components-api] Add support for port/healthcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10918201 (dcaro)
[13:06:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:06:20] (merge) dcaro: prometheus: use multiproc stats [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/85 (https://phabricator.wikimedia.org/T394275)
[13:06:24] Toolforge (Toolforge iteration 21), Patch-For-Review: [components-api] Add support for port/healthcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10918205 (dcaro) In progress→Resolved
[13:09:06] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.118-20250616130629-11e499d8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/816 (https://phabricator.wikimedia.org/T394275)
[13:29:49] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:29:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:46:54] !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:52:40] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10918634 (cmooney) > Why just those four hosts, when I re-imaged more than four? If I had to gue...
[14:04:37] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10918674 (abi_) > Or at least, rebase from my current commit where all the redundant translations are remove...
[14:12:15] cloud-services-team, Striker: Striker LibUp runs failing due to weird handling of .dockerignore - https://phabricator.wikimedia.org/T397044 (taavi) NEW
[14:22:04] (approved) dcaro: components-api: bump to 0.0.118-20250616130629-11e499d8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/816 (https://phabricator.wikimedia.org/T394275) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:22:20] (merge) dcaro: components-api: bump to 0.0.118-20250616130629-11e499d8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/816 (https://phabricator.wikimedia.org/T394275) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:22:26] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[14:23:18] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10918790 (Andrew) @Jhancock.wm any more blockers to this? There's no actual rush although finishing this will help me a bit with T309789 as it will allow...
[15:15:21] Cloud-VPS (Quota-requests): Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059 (ayounsi) NEW
[15:15:38] Cloud-VPS (Quota-requests): Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059#10919120 (ayounsi)
[15:18:40] Cloud-VPS (Quota-requests): Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059#10919132 (Andrew) +1
[15:19:56] !log taavi@cloudcumin1001 automation-framework START - Cookbook wmcs.openstack.quota_increase (T397059)
[15:19:59] T397059: Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059
[15:20:02] !log taavi@cloudcumin1001 automation-framework END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397059)
[15:20:20] Cloud-VPS (Quota-requests): Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059#10919141 (taavi) a:taavi
[15:20:34] Cloud-VPS (Quota-requests): Floating IP request for diffscan - https://phabricator.wikimedia.org/T397059#10919142 (taavi) Open→Resolved
[15:40:09] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10919254 (fnegri)
[15:48:48] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[15:51:10] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10919349 (fnegri) clouddb1013 upgraded and repooled. I have now //depooled// clouddb1017 so all traffic for `s1` and `s3...
[15:57:27] (03open) 10dcaro: create runtime [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [15:58:35] (03update) 10chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990) [16:23:29] (03update) 10dcaro: create runtime [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [16:28:12] (03update) 10dcaro: create runtime [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [16:31:24] (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [16:32:35] (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [16:33:04] (03update) 10dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88 [16:35:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudcephmon2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources [16:40:50] (03update) 10dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044) [16:42:10] (03update) 10addshore: builds: Introduce 
_get_status_style with default fallback [repos/cloud/toolforge/builds-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/109 [16:42:16] (03update) 10addshore: builds: Introduce _get_status_style with default fallback [repos/cloud/toolforge/builds-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/109 [16:43:20] (03approved) 10dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850) [16:43:31] (03update) 10dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850) [16:43:31] (03merge) 10dcaro: jobs-emailer: add alert when no emails are sent [repos/cloud/toolforge/alerts] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/32 (https://phabricator.wikimedia.org/T396850) [16:44:18] (03approved) 10dcaro: toolforge_deploy_mr: force install the packages [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/247 [16:44:27] (03merge) 10dcaro: toolforge_deploy_mr: force install the packages [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/247 [16:50:33] (03update) 10dcaro: logging: Init component [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480) (owner: 10taavi) [16:57:03] FIRING: [2x] JobsEmailerNoEmails: No emails sent in the last five hours - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [17:03:58] 
(03update) 10dcaro: logging: Init component [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480) (owner: 10taavi) [17:10:10] (03update) 10dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044) [17:10:27] (03update) 10dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044) [17:13:09] (03open) 10dcaro: Draft: api: add idp authentication [repos/cloud/toolforge/api-gateway] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/69 [17:14:09] (03update) 10dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044) [17:15:14] (03open) 10dcaro: d/changelog: bump to 0.0.21 [repos/cloud/toolforge/builds-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/110 [17:18:23] FIRING: AlertLintProblem: Linting problems found for JobsEmailerNoEmails - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [17:28:23] RESOLVED: AlertLintProblem: Linting problems found for JobsEmailerNoEmails - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://prometheus-alerts.wmcloud.org/?q=alertname%3DAlertLintProblem [17:29:12] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-cli [17:29:15] Logged the message 
at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:31:10] !log dcaro@acme toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli [17:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:32:43] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-cli [17:32:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:35:21] !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli [17:35:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [17:38:32] !log dcaro@acme tools START - Cookbook wmcs.toolforge.component.deploy for component builds-cli [17:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:41:34] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli [17:41:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:42:30] (03approved) 10dcaro: d/changelog: bump to 0.0.21 [repos/cloud/toolforge/builds-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/110 [17:42:33] (03merge) 10dcaro: d/changelog: bump to 0.0.21 [repos/cloud/toolforge/builds-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/110 [17:44:33] RESOLVED: [2x] JobsEmailerNoEmails: No emails sent in the last five hours - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails [17:55:33] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Magnum created instances failing to talk to OpenStack user_data service - https://phabricator.wikimedia.org/T396935#10919962 (10bd808) 
05Open→03Invalid This seems to have been a transient failure. Let's shake... [19:11:56] FIRING: SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1002-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:19:21] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920138 (10Mhurd) >>! In T395837#10888171, @Andrew wrote: > Is there a reason that Cinder volumes can't address your storage needs? Yes Whether it's a good reason... possi... [19:21:39] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920142 (10Mhurd) 05Declined→03Open Re-opening with quick comment above [19:23:25] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920149 (10Mhurd) Please keep in mind I am quite new to all of this so it's highly likely there are easier ways/methods/approaches to achieve what I'm trying that I am simply un... [19:39:21] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920231 (10Mhurd) [19:39:47] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920234 (10Mhurd) >>! In T395837#10885894, @Raymond_Ndibe wrote: > Hello @Mhurd can you structure you request a bit like the example below to enable us understand exactly what y... [19:45:34] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? 
- https://phabricator.wikimedia.org/T395837#10920267 (Mhurd)
[19:56:56] FIRING: [2x] SystemdUnitDown: The service unit backup_cinder_volumes.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:13:04] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920358 (bd808) [[ https://wikitech.wikimedia.org/wiki/Help:Puppet | Puppet ]] and [[ https://wikitech.wikimedia.org/wiki/Help:Using_OpenTofu_on_Cloud_VPS | OpenTofu ]] are mo...
[20:38:21] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storge quota for zuul project - https://phabricator.wikimedia.org/T397098 (bd808) NEW
[20:38:44] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storge quota for zuul project - https://phabricator.wikimedia.org/T397098#10920512 (bd808)
[20:42:31] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storge quota for zuul project - https://phabricator.wikimedia.org/T397098#10920523 (Andrew) +1
[20:44:24] Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storge quota for zuul project - https://phabricator.wikimedia.org/T397098#10920527 (bd808)
[20:44:54] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storge quota for zuul project - https://phabricator.wikimedia.org/T397098#10920529 (bd808)
[20:50:36] Toolforge (Toolforge iteration 21), good first task: [components-cli] make `toolforge components deployment show` show the latest deployment if no id passed - https://phabricator.wikimedia.org/T394994#10920539 (Chuckonwumelu) a:Chuckonwumelu
[20:57:57] (open) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[21:01:34] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[21:06:56] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1002-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:07:01] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1002-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397100 (phaultfinder) NEW
[21:08:55] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade hosts to bullseye - https://phabricator.wikimedia.org/T309789#10920605 (Andrew)
[21:51:56] FIRING: SystemdUnitDown: The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:52:02] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours.
- https://phabricator.wikimedia.org/T397105 (phaultfinder) NEW
[22:12:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:17:51] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:40:43] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920935 (Mhurd) Thanks?
[23:44:40] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920940 (Mhurd) Bryan to be clear should I not use my Cloud config script? As I mentioned I’m a newbie here 😆
[23:47:00] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920942 (Mhurd) I liked the approach as it takes all of 30 seconds to use to bring up my entire project - just wanted a few more GB of space
[23:52:32] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10920943 (Mhurd) Was hoping to avoid using a Tofu script, but since they can also be given cloud-config scripts perhaps I should resign myself to it 🤔