[00:21:51] VPS-project-Codesearch: not getting matches in modules/admin/data/data.yaml - https://phabricator.wikimedia.org/T419484#11690577 (A_smart_kitten) AIUI this has the same root-cause as {T241033} -- Codesearch [[https://codesearch.wmcloud.org/puppet/?action=excludes#:~:text=modules/admin/data,probably%20not%20t...
[00:45:19] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/22881598726 (https://github.com/cluebotng/component-configs/commits/bc32d8044077ff83db8b985b87df029ff564ad29)
[00:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[04:32:11] cloud-services-team, Cloud-VPS, Toolforge, Documentation, good first task: Update Help:Access to Toolforge instances with PuTTY and WinSCP - https://phabricator.wikimedia.org/T334697#11690763 (Hype4shreshth) a:Hype4shreshth
[04:38:33] cloud-services-team, Cloud-VPS, Toolforge, RoadToWiki, and 2 others: Update Help:Access to Toolforge instances with PuTTY and WinSCP - https://phabricator.wikimedia.org/T334697#11690777 (Hype4shreshth)
[06:53:22] VPS-project-Extdist, CirrusSearch, Discovery-Search: latest Elastica release does not include Elastica\Client? - https://phabricator.wikimedia.org/T418469#11690845 (Nimworking) >>! In T418469#11690213, @Reedy wrote: > This *should* fix itself overnight when(ever) the nightly jobs run to build the lat...
[07:10:56] FIRING: SystemdUnitDown: The service unit hdfs_rsync_mediawiki_content_history.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:21:09] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1249275 (owner: L10n-bot)
[07:41:47] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11690883 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=5025cdeb-6797-439c-a30c-98b645a86cc9) set by filippo@cumin1003 for 4:00:00 on...
[07:47:39] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11690885 (fgiunchedi) For reference these are the host facing interfaces we'll be operating on: ` xe-0/0/3 up up cloudnet1005 {#20220119} xe...
[07:56:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[08:53:10] RESOLVED: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown
[08:53:19] RESOLVED: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[08:53:25] RESOLVED: JobsApiDown: JobsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsApiDown
[08:53:33] RESOLVED: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown
[08:53:34] RESOLVED: ComponentsApiDown: ComponentsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ComponentsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DComponentsApiDown
[08:54:14] RESOLVED: [12x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server toolsbeta-test-k8s-control-10.toolsbeta.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerD
[08:54:15] FIRING: [12x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[08:54:21] RESOLVED: Toolforge Kyverno no policy resources: Toolforge Kyverno has no policy resources - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/Toolforge_Kyverno_no_policy_resources - https://grafana.wmcloud.org/d/kyverno/kyverno?orgId=1&var-DS_PROMETHEUS_KYVERNO=prometheus-tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforge+Kyverno+no+policy+resources
[08:56:25] RESOLVED: [17x] ProbeDown: Service tools-k8s-haproxy-7:30004 has failed probes (http_infra_tracing_loki_svc_tools_eqiad1_wikimedia_cloud_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:56:47] RESOLVED: [9x] ProbeDown: Service toolsbeta-static-2:80 has failed probes (http_toolsbeta_static_wmcloud_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:57:21] RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[08:57:25] RESOLVED: JobsApiDown: JobsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsApiDown
[08:57:58] RESOLVED: JobsEmailerDown: JobsEmailer is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerDown
[08:58:18] RESOLVED: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown
[08:58:18] RESOLVED: EnvvarsApiDown: EnvvarsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsApiDown
[08:58:19] RESOLVED: TektonDown: Tekton is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/TektonDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTektonDown
[08:58:21] RESOLVED: MaintainKubeusersDown: maintain-kubeusers is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainKubeusersDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DMaintainKubeusersDown
[08:58:28] RESOLVED: ToolforgeKubernetesNodeNotReady: (no data) Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[08:58:34] RESOLVED: ToolforgeKubernetesNodeNotReady: (no data) Multiple Kubernetes nodes are not ready #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[08:58:38] RESOLVED: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown
[08:58:41] RESOLVED: BuildsApiDown: BuildsApi is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/BuildsApiDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DBuildsApiDown
[08:58:46] RESOLVED: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown
[08:58:50] RESOLVED: EnvvarsAdmissionDown: EnvvarsAdmission is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/EnvvarsAdmissionDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DEnvvarsAdmissionDown
[08:59:14] RESOLVED: [12x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-7.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[09:03:09] (CR) David Caro: "Tested during the network switch down maintenance" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1249902 (owner: David Caro)
[09:03:26] (CR) Filippo Giunchedi: [C:+1] neutronhastate: add unknown [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1249902 (owner: David Caro)
[09:05:56] FIRING: SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:06:14] (approved) raymond-ndibe: gitlab: move to trixie images [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/12 (owner: dcaro)
[09:06:28] RESOLVED: [2x] MetricsinfraAlertmanagerDown: Metricsinfra alertmanager is unreachable #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MetricsinfraAlertmanagerDown - TODO - https://alerts.wikimedia.org/?q=alertname%3DMetricsinfraAlertmanagerDown
[09:09:39] (update) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[09:09:45] (update) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[09:09:46] (approved) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[09:09:50] (merge) raymond-ndibe: fix filelog bug [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/275 (https://phabricator.wikimedia.org/T417518)
[09:13:50] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.476-20260310091002-5c8652ff [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1174 (https://phabricator.wikimedia.org/T417518)
[09:13:54] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.476-20260310091002-5c8652ff [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1174 (https://phabricator.wikimedia.org/T417518)
[09:17:32] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:22:10] (update) raymond-ndibe: support start and end params in logs endpoint [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/267 (https://phabricator.wikimedia.org/T400917)
[09:24:36] (close) raymond-ndibe: [dev] add CONTRIBUTING.md, LICENSE [repos/cloud/toolforge/buildpacks/dotnetcore-buildpack] (move_to_api_0.10) - https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/dotnetcore-buildpack/-/merge_requests/1 (https://phabricator.wikimedia.org/T408783)
[09:24:54] (close) raymond-ndibe: [dev] add CONTRIBUTING.md, LICENSE [repos/cloud/toolforge/buildpacks/cmake-buildpack] - https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/cmake-buildpack/-/merge_requests/1 (https://phabricator.wikimedia.org/T408783)
[09:25:12] (close) raymond-ndibe: [dev] CONTRIBUTING.md, LICENSE [repos/cloud/toolforge/buildpacks/rust-buildpack] (move_to_api_0.10) - https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/rust-buildpack/-/merge_requests/1 (https://phabricator.wikimedia.org/T408783)
[09:25:28] (close) raymond-ndibe: [dev] add CONTRIBUTING.md, LICENSE [repos/cloud/toolforge/buildpacks/locale-buildpack] (move_to_api_0.10) - https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/locale-buildpack/-/merge_requests/1 (https://phabricator.wikimedia.org/T408783)
[09:25:42] (close) raymond-ndibe: [dev] add CONTRIBUTING.md, LICENSE [repos/cloud/toolforge/buildpacks/clojure-buildpack] (move_to_api_0.10) - https://gitlab.wikimedia.org/repos/cloud/toolforge/buildpacks/clojure-buildpack/-/merge_requests/1 (https://phabricator.wikimedia.org/T408783)
[09:25:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[09:30:16] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[09:31:30] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:37:23] (CR) David Caro: [C:+2] neutronhastate: add unknown [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1249902 (owner: David Caro)
[09:40:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[09:41:13] (Merged) jenkins-bot: neutronhastate: add unknown [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1249902 (owner: David Caro)
[09:43:16] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11691071 (fgiunchedi) Tests have been completed for today: good news and bad news. Good news is that ceph with `ceph osd noout` behave as expected i.e. b...
[09:45:31] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508 (fgiunchedi) NEW
[09:45:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[09:47:13] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[09:48:26] (update) raymond-ndibe: jobs-api: bump to 0.0.476-20260310091002-5c8652ff [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1174 (https://phabricator.wikimedia.org/T417518) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:48:27] (approved) raymond-ndibe: jobs-api: bump to 0.0.476-20260310091002-5c8652ff [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1174 (https://phabricator.wikimedia.org/T417518) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:48:34] (merge) raymond-ndibe: jobs-api: bump to 0.0.476-20260310091002-5c8652ff [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1174 (https://phabricator.wikimedia.org/T417518) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:54:29] FIRING: ToolforgeToolviewsStale: Toolviews data is stale - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeToolviewsStale - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeToolviewsStale
[09:56:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[10:05:44] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11691118 (fgiunchedi) Focusing on the neutron part first, I noticed two agents on cloudnet1006 are marked DOWN ` root@...
[10:10:01] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11691123 (cmooney) Thanks @fgiunchedi. The other aspects we should monitor are the keepalived operations on both the c...
[10:10:17] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, and 4 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11691124 (taavi) New problem: ` Mar 10 10:08:41 cloudcontrol1007 maintain-dbusers[3910393]: WARNING [root._create_accounts_o...
[10:24:58] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[10:26:03] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[10:57:12] (update) raymond-ndibe: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[10:57:58] Cloud-VPS (Project-requests), Lingua-Libre, Hackathon-Northwestern-Europe-2026: Request creation of lingualibre VPS project - https://phabricator.wikimedia.org/T419182#11691280 (Yug) Hello @dcaro , Thanks a lot for this activation. Volumes : We need 50GB of space to temporarily store our audios....
[10:59:39] Cloud-VPS (Project-requests), Lingua-Libre, Hackathon-Northwestern-Europe-2026: Request creation of lingualibre VPS project - https://phabricator.wikimedia.org/T419182#11691293 (taavi) >>! In T419182#11691280, @Yug wrote: > Hello @dcaro , > Thanks a lot for this activation. > Volumes : We need 5...
[11:01:59] RESOLVED: ToolforgeToolviewsStale: Toolviews data is stale - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeToolviewsStale - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeToolviewsStale
[11:03:20] VPS-project-Extdist, CirrusSearch, Discovery-Search: latest Elastica release does not include Elastica\Client? - https://phabricator.wikimedia.org/T418469#11691325 (Reedy)
[11:04:28] (merge) dcaro: gitlab: move to trixie images [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/12
[11:05:30] (open) dcaro: d/changelog: bump to 1.49.4 [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/13 (https://phabricator.wikimedia.org/T408783)
[11:06:17] VPS-project-Extdist, CirrusSearch, Discovery-Search: latest Elastica release does not include Elastica\Client? - https://phabricator.wikimedia.org/T418469#11691342 (Reedy) Open→Resolved a:Reedy You don’t need to open tasks for every extension when they all have the same underlying cau...
[11:07:34] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11691356 (fgiunchedi)
[11:10:12] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11691360 (dcaro) a:dcaro > Yes. Pinging @dcaro to fill the quota request since he's on clinic duty this week. I'll take this as a +1 :)
[11:10:18] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11691363 (dcaro)
[11:10:22] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11691365 (dcaro) Open→In progress
[11:12:28] !log dcaro@cloudcumin1001 gitlab-runners START - Cookbook wmcs.openstack.quota_increase by 84 cores, 1 floating-ips, 890 gigabytes, 12 instances, 335872 ram, 18 volumes (T418813)
[11:12:32] T418813: Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813
[11:12:32] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11691369 (fgiunchedi) >>! In T419508#11691123, @cmooney wrote: > Thanks @fgiunchedi. The other aspects we should monit...
[11:12:36] !log dcaro@cloudcumin1001 gitlab-runners END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) by 84 cores, 1 floating-ips, 890 gigabytes, 12 instances, 335872 ram, 18 volumes (T418813)
[11:14:24] Cloud-VPS (Quota-requests): Quota increases for gitlab-runners - https://phabricator.wikimedia.org/T418813#11691373 (dcaro) This should be done, let me know if you find any issues! {F72791433}
[11:15:32] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
[11:19:11] VPS-project-Extdist, CirrusSearch, Discovery-Search: latest Elastica release does not include Elastica\Client? - https://phabricator.wikimedia.org/T418469#11691387 (Nimworking) ok, thanks again and sorry for the extra trouble
[11:27:55] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
[11:28:34] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
[11:35:25] Tool-wikicontest, Indic-TechCom: Make Category field optional during contest creation in Wikicontest Tool - https://phabricator.wikimedia.org/T419521 (Adityakumar0545) NEW
[11:44:10] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
[11:51:41] (approved) dcaro: d/changelog: bump to 1.49.4 [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/13 (https://phabricator.wikimedia.org/T408783)
[11:51:46] (merge) dcaro: d/changelog: bump to 1.49.4 [repos/cloud/toolforge/misctools-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/merge_requests/13 (https://phabricator.wikimedia.org/T408783)
[12:57:36] (open) dcaro: builds-api: update buildpacks to 24_0.20.7/24_0.21.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1175 (https://phabricator.wikimedia.org/T380127)
[13:06:11] FIRING: SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:25:42] Toolforge (Toolforge iteration 26): [jobs-api] omitting filelog field from jobs creation payload completely breaks jobs-api for the tool - https://phabricator.wikimedia.org/T417518#11691845 (Raymond_Ndibe) In progress→Resolved
[13:31:08] VPS-project-Codesearch: not getting matches in modules/admin/data/data.yaml - https://phabricator.wikimedia.org/T419484#11691874 (Dzahn) +1 on merging tasks and renaming them slightly to reflect this
[13:47:00] VPS-project-Codesearch, Upstream: Codesearch is not searching some files that it thinks are "probably not text" (incl. package-lock.json, modules/admin/data/data.yaml) - https://phabricator.wikimedia.org/T241033#11691956 (A_smart_kitten)
[13:47:18] VPS-project-Codesearch: not getting matches in modules/admin/data/data.yaml - https://phabricator.wikimedia.org/T419484#11691959 (A_smart_kitten) →Duplicate dup:T241033
[13:47:24] VPS-project-Codesearch, Upstream: Codesearch is not searching some files that it thinks are "probably not text" (incl. package-lock.json, modules/admin/data/data.yaml) - https://phabricator.wikimedia.org/T241033#11691961 (A_smart_kitten)
[13:51:51] (open) dcaro: build: add use_deprecated_versions [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/155 (https://phabricator.wikimedia.org/T380127)
[13:54:27] (update) dcaro: build: add use_deprecated_versions [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/155 (https://phabricator.wikimedia.org/T380127)
[13:54:55] cloud-services-team, Horizon: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538 (Physikerwelt) NEW
[14:16:53] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11692130 (fgiunchedi)
[14:31:57] cloud-services-team, Cloud-VPS: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538#11692199 (taavi)
[14:39:48] cloud-services-team, Cloud-VPS: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538#11692241 (taavi) Open→Invalid Something on the instance causes [[ https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&from=2026-03-10T09:56:33.469Z&to=2026...
[14:51:06] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11692315 (Andrew) It seems that the network services on 1006 were manually (or via cookbook) set to down. That would ce...
[14:52:33] cloud-services-team, Cloud-VPS: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538#11692316 (Physikerwelt) @taavi thank you. This information >>! In T419538#11692241, @taavi wrote: > ` > Mar 10 12:31:56 worker-2 systemd-journald[404]: Under memory pressure, flushin...
[14:54:10] cloud-services-team, Cloud-VPS: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538#11692317 (taavi) >>! In T419538#11692316, @Physikerwelt wrote: > Is very helpful for us. Can we access it? `journalctl --boot -1` will show the full system log for the previous boot.
[14:59:50] cloud-services-team, Cloud-VPS: wmcloud node dies without any information - https://phabricator.wikimedia.org/T419538#11692355 (Physikerwelt) Thank you. That's excellent. We can use the grafana dashboard to determine that there is a problem and then use the logs to find the traces.
[15:15:29] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, Patch-For-Review: Debug and understand why bringing down cloud net/gw/lb resulted in cloud vps network down - https://phabricator.wikimedia.org/T419508#11692473 (taavi) Open→Resolved
[15:19:57] cloud-services-team, Horizon, OABot: Horizon logins failing in codfw1dev - https://phabricator.wikimedia.org/T419558 (Andrew) NEW
[15:21:19] cloud-services-team, Horizon: Horizon logins failing in codfw1dev - https://phabricator.wikimedia.org/T419558#11692593 (taavi)
[15:21:36] (open) r4356th: Noramlise lemmas using Arabic script [toolforge-repos/wiktlexbot] - https://gitlab.wikimedia.org/toolforge-repos/wiktlexbot/-/merge_requests/2 (https://phabricator.wikimedia.org/T419315)
[15:34:47] (update) r4356th: Normalise lemmas using Arabic script [toolforge-repos/wiktlexbot] - https://gitlab.wikimedia.org/toolforge-repos/wiktlexbot/-/merge_requests/2 (https://phabricator.wikimedia.org/T419315)
[15:35:04] (merge) r4356th: Normalise lemmas using Arabic script [toolforge-repos/wiktlexbot] - https://gitlab.wikimedia.org/toolforge-repos/wiktlexbot/-/merge_requests/2 (https://phabricator.wikimedia.org/T419315)
[15:37:01] Tool-wiktlexbot, Patch-For-Review: Transform Arabic script lemmas for entry titles - https://phabricator.wikimedia.org/T419315#11692647 (Redmin) In progress→Resolved a:Redmin
[15:38:24] Tool-wiktlexbot: Transform Arabic script lemmas for entry titles - https://phabricator.wikimedia.org/T419315#11692657 (Redmin)
[15:43:05] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-29 (2026-03-10 to 2026-03-24)): Fix linter issues discovered during implementation of the OAD example - https://phabricator.wikimedia.org/T414974#11692678 (HCoplin-WMF)
[15:43:47] Tool-delintbot: Explore visual regression testing - https://phabricator.wikimedia.org/T415882#11692686 (Redmin)
[15:45:43] Tool-delintbot: Add ability to only enable or disable certain fixes - https://phabricator.wikimedia.org/T416550#11692721 (Redmin)
[15:46:43] Tool-delintbot, Tool-redminbot: Use argparse to parse CLI arguments - https://phabricator.wikimedia.org/T415812#11692766 (Redmin)
[15:49:00] Tool-campwiz-nxt, Google-Summer-of-Code (Google Summer of Code (2026)): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11692809 (Only-Vikas) Hi everyone! I'm planning to apply for the CampWiz NxT Redesign project for GSoC 2026. I've spent some time reviewing the objectives,...
[15:50:12] Tool-delintbot: Document how to operate the bot - https://phabricator.wikimedia.org/T416070#11692826 (Redmin)
[15:52:23] (update) nurahwakili: Add missing information for author and work page [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/183
[15:53:36] Tool-wmf-openapi-linter, MW-Interfaces-Team: Fix linter issues discovered during implementation of the OAD example - https://phabricator.wikimedia.org/T414974#11692862 (HCoplin-WMF)
[15:55:38] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-29 (2026-03-10 to 2026-03-24)): Fix linter issues discovered during implementation of the OAD example - https://phabricator.wikimedia.org/T414974#11692877 (HCoplin-WMF)
[15:56:00] cloud-services-team, Quarry: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564 (Pppery) NEW
[15:57:19] cloud-services-team, Quarry, patch-welcome: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#11692900 (taavi) p:Triage→Low
[15:58:26] Tool-delintbot: Fix pwrap-bug-workaround errors - https://phabricator.wikimedia.org/T418464#11692911 (Redmin)
[16:01:59] (merge) ttaylor: Adding search functionality [toolforge-repos/wikipedia-tree-browser] - https://gitlab.wikimedia.org/toolforge-repos/wikipedia-tree-browser/-/merge_requests/1
[16:11:20] (update) dcaro: build: add use_deprecated_versions [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/155 (https://phabricator.wikimedia.org/T380127)
[16:34:51] cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#11693248 (Reedy)
[16:57:43] Tool-wmf-openapi-linter, MW-Interfaces-Team: Add tests to ensure consistency between OAD example and OpenAPI linter - https://phabricator.wikimedia.org/T419576 (KBach) NEW
[16:59:29] Tool-wmf-openapi-linter, MW-Interfaces-Team: Add tests to ensure consistency between OAD example and OpenAPI linter - https://phabricator.wikimedia.org/T419576#11693498 (KBach)
[17:06:11] FIRING: SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:10:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[17:14:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for all services
[17:42:26] cloud-services-team, Cloud-VPS (Quota-requests): Add floating IP, PTR record, and vanity domain for azwikimedia project - https://phabricator.wikimedia.org/T419582 (Nemoralis) NEW
[17:42:42] cloud-services-team, Cloud-VPS (Quota-requests): Add floating IP, PTR record, and vanity domain for azwikimedia project - https://phabricator.wikimedia.org/T419582#11693802 (Nemoralis)
[17:44:48] cloud-services-team, Cloud-VPS (Quota-requests): Add floating IP and vanity domain for azwikimedia project - https://phabricator.wikimedia.org/T419582#11693815 (Nemoralis)
[17:57:28] cloud-services-team, Cloud-VPS (Quota-requests): Add floating IP and vanity domain for azwikimedia project - https://phabricator.wikimedia.org/T419582#11693920 (Nemoralis)
[17:58:38] cloud-services-team, Horizon: Horizon logins failing in codfw1dev - https://phabricator.wikimedia.org/T419558#11693937 (Andrew) Open→Resolved a:Andrew
[18:08:48] !log root@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node
[18:16:06] (update) dcaro: build: add use_deprecated_versions [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/155 (https://phabricator.wikimedia.org/T380127)
[18:20:41] !log root@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[18:29:36] (open) dcaro: start: add --use-deprecated-versions flag [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/129
[18:31:45] (update) dcaro: start: add --use-deprecated-versions flag [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/129
[18:59:09] (update) dcaro: start: add --use-deprecated-versions flag [repos/cloud/toolforge/builds-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/129
[18:59:18] (update) dcaro: build: add use_deprecated_versions [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/155 (https://phabricator.wikimedia.org/T380127)
[19:26:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-34 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[19:33:00] cloud-services-team, Tool-spacemedia, Toolforge: [Build service] latest builder has old Java - https://phabricator.wikimedia.org/T405415#11694405 (Don-vip) @dcaro is this update live for us? https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1175/diffs I tried t...
[21:00:20] (open) bd808: QuickInstantCommons: set user-agent and thumbnail steps [toolforge-repos/mwdemo] - https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/9
[21:06:11] FIRING: SystemdUnitDown: The systemd unit hdfs_rsync_mediawiki_content_history.service on node clouddumps1001 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [21:20:02] (03merge) 10bd808: QuickInstantCommons: set user-agent and thumbnail steps [toolforge-repos/mwdemo] - 10https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/merge_requests/9 [22:12:03] (03CR) 10Jforrester: [C:03+2] build: Updating mediawiki/mediawiki-phan-config to 0.20.0 [labs/tools/coverme] - 10https://gerrit.wikimedia.org/r/1249464 (owner: 10Libraryupgrader) [23:07:25] 10Tools, 06All-and-every-Wikisource, 06Community-Tech, 10Wikimedia OCR: Wikisource OCR UI supplies a non-standard thumbnail size to the OCR tool hosted on the cloud - https://phabricator.wikimedia.org/T419246#11695427 (10Samwilson) @Karanissolvingproblems See [[https://gerrit.wikimedia.org/r/c/mediawiki/ex...