[00:04:23] !log root@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [00:09:39] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node [00:14:58] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-etcd-33 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [00:25:57] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) [00:27:07] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:37:33] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [00:37:49] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:38:52] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [00:39:54] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:40:59] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [00:44:02] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:45:06] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [00:45:29] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:46:32] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [00:52:47] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [00:53:51] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [01:04:11] 10VPS-project-Codesearch, 
06collaboration-services: confd fails with "no such host" in SRV lookup from _etcd-client-ssl._tcp.codesearch.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T417458#11671542 (10Dzahn) in the template for the systemd unit: ` <%- if @scheme == "https" -%> Environment="CONFD... [01:08:48] 06cloud-services-team, 10Toolforge (Toolforge iteration 25): [fourohfour] general unavailability / overload - https://phabricator.wikimedia.org/T418829#11671546 (10bd808) >>! In T418829#11667295, @dcaro wrote: > That tool is redirected from the proxies too: > https://codesearch.wmcloud.org/search/?q=quentinv57... [01:12:55] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [01:13:59] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [01:18:10] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [01:19:14] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [01:19:29] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/22650516003 (https://github.com/cluebotng/component-configs/commits/e7a1e2e06f2ccf038c06cb203369f336c298cf6c) [01:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [01:21:25] (03open) 10raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518) [01:21:43] (03update) 10raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518) [01:27:29] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance extdist-06 in project extdist - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [01:49:32] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [01:50:37] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [01:52:30] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [01:53:05] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [01:57:00] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [01:57:33] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:03:08] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [02:03:43] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:23:11] FIRING: [2x] SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:36:42] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [02:37:16] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:43:54] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [02:47:07] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [02:49:37] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [02:50:11] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [03:05:42] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [03:06:16] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [03:08:40] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [03:09:14] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [03:31:35] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [03:32:09] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [03:55:41] 10Cloud-VPS (Project-requests): Request creation of azwikimedia VPS project - https://phabricator.wikimedia.org/T417736#11671678 (10Nemoralis) @Andrew, thank you for the answer. Yes, I am sure that we need a project to manage the communication infrastructure and the project will be limited to this only. Also, if... 
[04:03:23] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:03:57] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [04:04:21] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:04:55] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [04:05:09] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:13:59] !log root@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) [04:15:45] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node [04:26:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-33 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [04:32:30] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) [04:36:17] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [04:37:21] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) [04:41:36] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster [04:46:24] !log root@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=0) [04:46:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-33 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [04:47:09] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:47:45] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) 
[04:48:24] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:48:58] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [04:49:57] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [04:50:30] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [05:47:14] (03open) 10legoktm: Draft: Require OAuth login to use the tool [toolforge-repos/otd-helper] - 10https://gitlab.wikimedia.org/toolforge-repos/otd-helper/-/merge_requests/4 [05:50:03] (03update) 10legoktm: Draft: Require OAuth login to use the tool [toolforge-repos/otd-helper] - 10https://gitlab.wikimedia.org/toolforge-repos/otd-helper/-/merge_requests/4 [05:50:20] (03update) 10legoktm: Require OAuth login to use the tool [toolforge-repos/otd-helper] - 10https://gitlab.wikimedia.org/toolforge-repos/otd-helper/-/merge_requests/4 [06:23:11] FIRING: [2x] SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [06:25:20] (03update) 10raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518) [07:02:56] RESOLVED: [2x] SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [07:41:33] FIRING: [4x] ProbeDown: Service tools-k8s-haproxy-8:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [07:46:33] RESOLVED: [4x] ProbeDown: Service tools-k8s-haproxy-8:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [08:06:56] FIRING: ProbeDown: Service tools-k8s-haproxy-8:443 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [08:11:56] RESOLVED: [4x] ProbeDown: Service tools-k8s-haproxy-8:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [08:45:56] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11672034 (10fgiunchedi) I spoke with @cmooney today and got Tues March 10th in Europe morning as a day to carry 
out tests in C8 [08:55:12] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11672052 (10fgiunchedi) [08:56:56] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Cloud-VPS: Increased openstack latency and rabbitmq rolling restarts on certificate update - https://phabricator.wikimedia.org/T418444#11672055 (10fgiunchedi) I have disabled automated roll-restart for rabbit on cert renewal, and made a note to verify certs are inde... [09:19:18] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/22662750872 (https://github.com/cluebotng/component-configs/commits/01746ef8804c30c85963ea888a75887ebe879e3b) [09:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [09:35:14] 06cloud-services-team, 10Toolforge (Toolforge iteration 25): [fourohfour] general unavailability / overload - https://phabricator.wikimedia.org/T418829#11672218 (10dcaro) There was one restart of one pod during the night that caused an alert email to be sent. I merged @bd808's patch to remove that specific to... [09:51:27] 10Tool-campwiz-nxt, 10Google-Summer-of-Code (Google Summer of Code (2026)): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11672280 (10Dipanshu1223) hi @Nokib_Sarkar and @Tiven2240 can I work microtask now ? or wait for march 16 after work on microtask ? @LGoto say some mentor ok... [09:52:38] 06cloud-services-team, 10Toolforge: toolforge-deploy tests failure: Your local changes to the following files would be overwritten by checkout: components/jobs-api/2025_04_migration_of_all_jobs_to_version_2 - https://phabricator.wikimedia.org/T418897#11672284 (10dcaro) 05Open→03Resolved p:05Triage→0... 
[11:28:37] 06cloud-services-team, 10Toolforge (Toolforge iteration 25), 13Patch-For-Review: [fourohfour] general unavailability / overload - https://phabricator.wikimedia.org/T418829#11672735 (10dcaro) Note that since I enabled request logging yesterday, from the last 100k requests, >50k are for that specific host nam... [12:52:03] (03merge) 10taavi: ingress-admission: bump to 0.0.85-20260303181419-22df37af [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1161 (https://phabricator.wikimedia.org/T418276) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [12:53:17] (03update) 10dcaro: functional_tests: added test to check for sha [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1137 (https://phabricator.wikimedia.org/T417503) [12:53:48] (03merge) 10dcaro: functional_tests: added test to check for sha [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1137 (https://phabricator.wikimedia.org/T417503) [12:53:51] (03merge) 10dcaro: docs: update deployment docs [repos/cloud/toolforge/registry-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/registry-admission/-/merge_requests/36 [12:53:58] (03merge) 10dcaro: docs: update deployment instructions [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/70 [12:54:03] (03merge) 10dcaro: docs: update deployment docs [repos/cloud/toolforge/volume-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/44 [12:54:07] (03merge) 10dcaro: docs: added standard deployment notes [repos/cloud/toolforge/image-config] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/20 [12:56:27] (03update) 
10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: image-config: bump to 0.0.24-20260304125431-e51ef3ab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1162 (https://phabricator.wikimedia.org/T407477) [12:56:31] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: image-config: bump to 0.0.24-20260304125431-e51ef3ab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1162 (https://phabricator.wikimedia.org/T407477) [12:59:55] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: volume-admission: bump to 0.0.82-20260304125428-06ff775e [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1163 [13:00:18] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: registry-admission: bump to 0.0.73-20260304125424-c5fd960d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1164 (https://phabricator.wikimedia.org/T407477) [13:00:23] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: registry-admission: bump to 0.0.73-20260304125424-c5fd960d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1164 (https://phabricator.wikimedia.org/T407477) [13:03:47] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: envvars-api: bump to 0.0.82-20260304125430-4d1e4c78 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1165 (https://phabricator.wikimedia.org/T407477) [13:03:56] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: envvars-api: bump to 0.0.82-20260304125430-4d1e4c78 [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1165 (https://phabricator.wikimedia.org/T407477) [13:05:11] (03open) 10dcaro: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276 [13:05:25] (03update) 10dcaro: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276 [13:11:49] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-api [13:12:46] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api [13:15:17] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-api [13:16:31] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api [13:19:25] (03approved) 10dcaro: envvars-api: bump to 0.0.82-20260304125430-4d1e4c78 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1165 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:19:31] (03merge) 10dcaro: envvars-api: bump to 0.0.82-20260304125430-4d1e4c78 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1165 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:19:40] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component registry-admission [13:28:11] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission [13:32:49] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component registry-admission [13:38:18] 10Toolforge (Toolforge iteration 25), 13Patch-For-Review: [docs] update all readmes with the same deployment docs - https://phabricator.wikimedia.org/T407477#11673199 (10dcaro) [13:42:17] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission [13:43:22] (03approved) 10dcaro: registry-admission: bump to 0.0.73-20260304125424-c5fd960d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1164 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:43:25] (03update) 10dcaro: registry-admission: bump to 0.0.73-20260304125424-c5fd960d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1164 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:43:40] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component volume-admission [13:43:52] (03merge) 10dcaro: registry-admission: bump to 0.0.73-20260304125424-c5fd960d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1164 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:45:59] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission [13:46:56] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component volume-admission [13:48:24] 06cloud-services-team, 10Cloud-VPS: cloudgw2004-dev service implementation 
- https://phabricator.wikimedia.org/T418765#11673241 (10taavi) [13:49:41] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission [13:50:59] (03approved) 10dcaro: volume-admission: bump to 0.0.82-20260304125428-06ff775e [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1163 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:51:07] (03update) 10dcaro: volume-admission: bump to 0.0.82-20260304125428-06ff775e [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1163 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:51:33] (03merge) 10dcaro: volume-admission: bump to 0.0.82-20260304125428-06ff775e [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1163 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [13:51:37] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component image-config [13:59:53] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config [14:00:10] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component image-config [14:01:31] (03open) 10dcaro: ci: added vale documentation linter [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/156 [14:01:46] (03update) 10dcaro: ci: added vale documentation linter [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/156 [14:07:58] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config [14:08:45] (03approved) 10dcaro: 
image-config: bump to 0.0.24-20260304125431-e51ef3ab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1162 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:08:48] (03update) 10dcaro: image-config: bump to 0.0.24-20260304125431-e51ef3ab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1162 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:09:17] (03merge) 10dcaro: image-config: bump to 0.0.24-20260304125431-e51ef3ab [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1162 (https://phabricator.wikimedia.org/T407477) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:13:20] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge (Toolforge iteration 25), 13Patch-For-Review: [components-api] updated image is not restarted - https://phabricator.wikimedia.org/T417503#11673390 (10DamianZaremba) Checked my deployments yesterday, all appear to work as expected with the digest. Th... 
[14:18:40] (03approved) 10taavi: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276 (owner: 10dcaro) [14:25:24] (03update) 10raymond-ndibe: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276 (owner: 10dcaro) [14:25:34] (03approved) 10raymond-ndibe: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276 (owner: 10dcaro) [14:44:17] 10Toolforge (Toolforge iteration 25), 13Patch-For-Review: add more logs tests to toolforge-deploy - https://phabricator.wikimedia.org/T418326#11673551 (10Raymond_Ndibe) [14:54:32] 06cloud-services-team, 10Toolforge: Port fourohfour tool to new Gateway system - https://phabricator.wikimedia.org/T418479#11673598 (10taavi) p:05Triage→03High [14:54:50] 06cloud-services-team, 10Cloud-VPS: grafana.wmcloud.org unavailable - failed db migration - https://phabricator.wikimedia.org/T418831#11673604 (10taavi) p:05Triage→03Medium [14:55:11] 06cloud-services-team, 10PAWS, 06tools-platform-team: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T418629#11673605 (10taavi) [14:55:14] 10Wikibugs, 10Gerrit: Wikibugs should ignore `check experimental` messages for operations/puppet - https://phabricator.wikimedia.org/T417866#11673607 (10MLechvien-WMF) Removing the SRE tag as this is a feature request for Wikibugs, feel free to add back if we can help [14:57:59] 06cloud-services-team: changedetection-io tool not working as expected - https://phabricator.wikimedia.org/T416738#11673620 (10taavi) 05Open→03Invalid Does not seem to be an issue with 
Toolforge infrastructure, thus declining. Please see the links mentioned above and refer to our support channels if you... [15:00:44] 06cloud-services-team, 10Quarry: Quarry should support connecting to alternative extension databases - https://phabricator.wikimedia.org/T395800#11673654 (10dcaro) p:05Triage→03Low [15:00:46] 06cloud-services-team, 10Quarry: Quarry should support connecting to alternative extension databases - https://phabricator.wikimedia.org/T395800#11673655 (10fnegri) p:05Low→03Medium [15:01:33] 06cloud-services-team, 10Quarry: Quarry should support connecting to alternative extension databases - https://phabricator.wikimedia.org/T395800#11673656 (10fnegri) p:05Medium→03Low [15:05:05] (03update) 10taavi: Deploy istio and a Gateway [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1143 (https://phabricator.wikimedia.org/T418274) [15:20:52] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [15:21:25] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [15:28:05] 10Toolforge (Toolforge iteration 25): [harbor,tools] Harbor object usage in S3 is steadily increasing - https://phabricator.wikimedia.org/T418528#11673787 (10dcaro) >>! In T418528#11668662, @fnegri wrote: > Looking at the rate of increase, we have about 10 days before we reach the current limit. > > Shall we bu... 
[15:34:37] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [15:35:11] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [15:39:27] (03approved) 10dcaro: Allow tools to manage HTTPRoute resources [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/83 (https://phabricator.wikimedia.org/T418276) (owner: 10taavi) [15:41:03] (03merge) 10taavi: Allow tools to manage HTTPRoute resources [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/83 (https://phabricator.wikimedia.org/T418276) [15:43:53] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: maintain-kubeusers: bump to 0.0.192-20260304154115-f23cb8df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1166 (https://phabricator.wikimedia.org/T418276) [15:44:02] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: maintain-kubeusers: bump to 0.0.192-20260304154115-f23cb8df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1166 (https://phabricator.wikimedia.org/T418276) [15:44:19] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers [16:00:09] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [16:00:43] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [16:01:38] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers [16:01:49] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers [16:02:18] !log 
root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [16:02:53] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [16:03:04] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers [16:04:00] (03merge) 10taavi: maintain-kubeusers: bump to 0.0.192-20260304154115-f23cb8df [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1166 (https://phabricator.wikimedia.org/T418276) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [16:04:24] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Update access control logic for migration from Ingress to Gateway API objects - https://phabricator.wikimedia.org/T418276#11673982 (10taavi) [16:04:34] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Update access control logic for migration from Ingress to Gateway API objects - https://phabricator.wikimedia.org/T418276#11673983 (10taavi) 05Open→03Resolved [16:06:34] 10Toolforge (Toolforge iteration 25), 13Patch-For-Review: add more logs tests to toolforge-deploy - https://phabricator.wikimedia.org/T418326#11674007 (10dcaro) > either add tests for the logs endpoint in jobs-api, refactor the logs endpoint to start using logs-api behind the wheels, or remove the endpoint com... 
[16:18:13] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:18:47] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[16:18:54] (open) dcaro: toolforge_get_versions: use component for td version [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1167
[16:19:18] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:19:51] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[16:20:18] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:20:45] (update) dcaro: toolforge_get_versions: use component for td version [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1167
[16:20:52] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[16:21:13] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:30:05] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[16:30:43] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:32:58] Tool-campwiz-nxt, Google-Summer-of-Code (Google Summer of Code (2026)): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11674124 (LGoto) The mentors for this project have asked that their contribution period start on March 2, so it is ok to begin submitting your work.
[16:34:48] Tool-campwiz-nxt: CampWiz Nxt Redesign: Root Path - https://phabricator.wikimedia.org/T415408#11674132 (Sanjaydevs) Hi @Nokib_Sarkar and @Tiven2240 . I've tried to run the next.js project, but it returns **Error : HTTP error! status: 403, message:** in the home and login page and //"i18next::translator: mis...
[16:37:52] Tool-campwiz-nxt: CampWiz Nxt Redesign: Root Path - https://phabricator.wikimedia.org/T415408#11674140 (Sanjaydevs) also, the **//CampaignEdit.tsx//** file has been migrated. So are there any other pages that need work which can be assigned as a MICROTASK ?
[16:40:11] !log root@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[16:52:28] Tool-campwiz-nxt, Google-Summer-of-Code (Google Summer of Code (2026)): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11674172 (Dipanshu1223) @LGoto ok
[17:11:07] FIRING: ToolsDBReplicationLagIsTooHigh: ToolsDB replication is lagging - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationLagIsTooHigh
[17:18:14] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[17:29:18] (merge) dcaro: Revert "runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2" [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/276
[17:32:28] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.468-20260304172934-e06d0070 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1168 (https://phabricator.wikimedia.org/T359649)
[17:32:35] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.468-20260304172934-e06d0070 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1168 (https://phabricator.wikimedia.org/T359649)
[17:37:47] (update) dcaro: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (owner: raymond-ndibe)
[17:38:30] !log root@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[17:38:33] (update) dcaro: core: add prometheus counter for jobs synced from runtime [repos/cloud/toolforge/jobs-api] (use_custom_resources_in_code) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/253
[17:46:00] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[17:56:48] cloud-services-team, Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045 (fnegri) NEW
[17:57:30] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[17:58:17] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[18:01:07] RESOLVED: ToolsDBReplicationLagIsTooHigh: ToolsDB replication is lagging - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationLagIsTooHigh
[18:01:42] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674654 (fnegri) I'll try to fix it following the runbook https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication#Runn...
[18:01:48] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674655 (fnegri) p:Triage→High
[18:02:07] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674658 (fnegri) Open→In progress
[18:02:20] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674661 (fnegri)
[18:03:57] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[18:04:07] FIRING: [2x] ToolsDBReplicationMissing: ToolsDB replication is not running - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationMissing
[18:04:07] FIRING: ToolsDBReplicationError: ToolsDB replication is broken - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationError
[18:06:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-etcd-29 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[18:13:37] FIRING: ToolsDBReplicationLagIsTooHigh: ToolsDB replication is lagging - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationLagIsTooHigh
[18:13:42] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674688 (fnegri) The DELETE query completed in 5 minutes. SET GLOBAL sql_slave_skip_counter failed, so I had to manually ad...
[18:13:52] RESOLVED: ToolsDBReplicationLagIsTooHigh: ToolsDB replication is lagging - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationLagIsTooHigh
[18:17:12] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[18:17:28] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[18:17:57] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674709 (fnegri) Replication has resumed and the replica host is catching up. It might take a while, I have to log off for a...
[18:18:37] FIRING: ToolsDBReplicationLagIsTooHigh: ToolsDB replication is lagging - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationLagIsTooHigh
[18:18:50] (approved) dcaro: jobs-api: bump to 0.0.468-20260304172934-e06d0070 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1168 (https://phabricator.wikimedia.org/T359649) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[18:18:56] (merge) dcaro: jobs-api: bump to 0.0.468-20260304172934-e06d0070 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1168 (https://phabricator.wikimedia.org/T359649) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[18:19:37] RESOLVED: [2x] ToolsDBReplicationMissing: ToolsDB replication is not running - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationMissing
[18:19:55] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11674742 (fnegri) {F72519920}
[18:20:37] RESOLVED: ToolsDBReplicationError: ToolsDB replication is broken - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationError
[18:21:22] cloud-services-team, Toolforge: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#11674747 (fnegri)
[18:26:23] cloud-services-team, superset.wmcloud.org: Sunset superset.wmcloud.org - https://phabricator.wikimedia.org/T416373#11674761 (KCVelaga_WMF) >>! In T416373#11665536, @Andrew wrote: >> If it is shutting down, it would be good to understand. which will be the available alternatives to provide communities wit...
[18:27:45] cloud-services-team, superset.wmcloud.org: Sunset superset.wmcloud.org - https://phabricator.wikimedia.org/T416373#11674771 (KCVelaga_WMF) > I tried to copy over the query for that graph to Quarry, but I was getting an error. Despite the error, I have the impression that Quarry seems quite SQL-centric wh...
[18:30:54] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 488 bytes in 0.488 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[18:32:58] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tools-k8s-etcd-29 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[18:53:42] Tool-recaptime-dev: What is recaptime-dev and how does it advance the goals of the Wikimedia movement? - https://phabricator.wikimedia.org/T418818#11674926 (bd808)
[18:57:52] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[18:58:58] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[18:59:02] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:00:07] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:00:58] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:02:02] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:02:10] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:03:14] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:03:44] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:04:48] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:05:21] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:06:24] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:06:27] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:07:31] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:07:54] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:08:57] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:09:03] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:09:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-etcd-29 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[19:10:07] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:10:24] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:11:27] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:11:31] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:12:01] !log root@cloudcumin1001 tools END (ERROR) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=97)
[19:12:09] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[19:13:14] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
[19:13:52] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:14:28] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:14:51] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:29:40] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:30:56] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:44:54] !log root@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[19:45:31] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:46:07] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:46:27] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:57:41] !log root@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:58:19] !log root@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:10:28] !log root@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[20:12:05] Tool-inteGraality: Support explicit groupings - https://phabricator.wikimedia.org/T419059 (JeanFred) NEW
[20:12:54] Tool-inteGraality: Support explicit groupings - https://phabricator.wikimedia.org/T419059#11675182 (JeanFred) I’m pretty much done with this. one question: should the grouping order be preserved in the output, instead of being sorted by count ? Or be sorted by count, just like now?
[20:20:54] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.440 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[20:23:08] Tool-inteGraality: Support explicit groupings - https://phabricator.wikimedia.org/T419059#11675212 (Ainali) Sorted by count is what I would expect.
[20:26:58] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance tools-k8s-etcd-29 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[21:17:36] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11675426 (fnegri) Got stuck again on a similar query: ` | 30836877 | system user | | s51698__yetkin | Slave_wo...
[21:19:07] FIRING: [2x] ToolsDBReplicationMissing: ToolsDB replication is not running - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationMissing
[21:19:07] FIRING: ToolsDBReplicationError: ToolsDB replication is broken - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationError
[21:19:51] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[21:24:07] RESOLVED: [2x] ToolsDBReplicationMissing: ToolsDB replication is not running - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationMissing
[21:24:07] RESOLVED: ToolsDBReplicationError: ToolsDB replication is broken - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBReplicationError
[21:25:26] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11675511 (fnegri) The manual `DELETE` completed in 3 minutes, replication has resumed but I see there are more slow queries o...
[21:35:17] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[21:42:34] (CR) Jforrester: [C:+2] "check experimental" [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1246076 (owner: Libraryupgrader)
[21:42:53] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2026-03-04 - https://phabricator.wikimedia.org/T419045#11675596 (fnegri) In progress→Resolved We're back in sync! {F72550482}
[21:43:07] cloud-services-team, Openstack-Magnum: Investigate new Magnum drivers - https://phabricator.wikimedia.org/T393782#11675602 (bd808)
[21:44:04] (CR) Jforrester: "check experimental" [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/1063008 (https://phabricator.wikimedia.org/T372565) (owner: Pwangai)
[21:46:04] (CR) Jforrester: "check experimental" [labs/countervandalism/cvn-api] - https://gerrit.wikimedia.org/r/879971 (owner: Krinkle)
[21:48:37] (CR) Jforrester: "check experimental" [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/1152873 (owner: Krinkle)
[21:51:12] (CR) Jforrester: "check experimental" [labs/tools/orphantalk] - https://gerrit.wikimedia.org/r/1242375 (owner: L10n-bot)
[21:52:52] (CR) Jforrester: "check experimental" [labs/tools/usage] - https://gerrit.wikimedia.org/r/908978 (owner: Krinkle)
[21:56:09] (CR) Jforrester: "check experimental" [labs/tools/guc] - https://gerrit.wikimedia.org/r/1235785 (owner: L10n-bot)
[21:58:18] Tool-Global-user-contributions, PHP 8.5 support: labs/tools/guc fails its CI tests in PHP 8.5 due to old Phan install - https://phabricator.wikimedia.org/T419080 (Jdforrester-WMF) NEW
[21:58:31] (CR) Jforrester: "check experimental" [labs/tools/fileprotectionsync] - https://gerrit.wikimedia.org/r/1149847 (owner: AntiCompositeNumber)
[22:01:06] (CR) Jforrester: "check experimental" [labs/tools/toolbase] - https://gerrit.wikimedia.org/r/1180641 (owner: Krinkle)
[22:26:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-34 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[22:32:08] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11675842 (thcipriani) The background here: - We've been running out DO cluster since 2021 or so and we've been keen to get to a platform that is supported by SRE. - We talked with WMCS about this back...
[22:35:59] cloud-services-team, Toolforge (Toolforge iteration 25): [harbor,toolsbeta] for some reason maintain_harbor seems to not be cleaning up toolforge/* images - https://phabricator.wikimedia.org/T417894#11675860 (Raymond_Ndibe)
[22:36:32] cloud-services-team, Toolforge (Toolforge iteration 25): [harbor,toolsbeta] for some reason maintain_harbor seems to not be cleaning up toolforge/* images - https://phabricator.wikimedia.org/T417894#11675872 (Raymond_Ndibe) a:Raymond_Ndibe
[22:39:47] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11675877 (Andrew) Hey folks, sorry about the not-very-coherent response on this. The bottom line is that compute+storage resources are not an issue, we can definitely provide what you need. The thing t...
[22:45:00] cloud-services-team, Openstack-Magnum: Investigate new Magnum drivers - https://phabricator.wikimedia.org/T393782#11675893 (Andrew) The status of this is: - The new drivers are running in codfw1dev and seem to be working. - Migrating to the new drivers in eqiad1 will require some template adjustments fo...
[22:45:57] (open) raymond-ndibe: jobs: fix toolsbeta stale images cleanup bug [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/62 (https://phabricator.wikimedia.org/T417894)
[22:46:09] (update) raymond-ndibe: jobs: fix toolsbeta stale images cleanup bug [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/62 (https://phabricator.wikimedia.org/T417894)
[23:11:24] (update) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[23:11:51] (update) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[23:28:03] (update) raymond-ndibe: logs-api,logs: add logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:28:32] (update) raymond-ndibe: logs-api,logs: add logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:28:48] (update) raymond-ndibe: logs-api,logs: add logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:29:29] (update) raymond-ndibe: logs-api: add more logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:37:29] (update) raymond-ndibe: logs-api: add more logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:38:23] (update) raymond-ndibe: logs-api: add more logs tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1140 (https://phabricator.wikimedia.org/T418326)
[23:47:45] (update) raymond-ndibe: jobs: fix toolsbeta stale images cleanup bug [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/62 (https://phabricator.wikimedia.org/T417894)
[23:50:13] (update) raymond-ndibe: jobs: fix toolsbeta stale images cleanup bug [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/62 (https://phabricator.wikimedia.org/T417894)
[23:55:43] (update) raymond-ndibe: jobs: fix toolsbeta stale images cleanup bug [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/62 (https://phabricator.wikimedia.org/T417894)