[02:54:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudgw1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:04:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cloudgw1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[04:08:45] cloud-services-team, Toolforge: [builds-builder,dotnet] migrate to Heroku buildpack for dotnet 10 - https://phabricator.wikimedia.org/T412653#11490818 (Hawkeye7) I couldn't get heroku:24 to work, but I was able to build .NET 10 with ` pack build $CONTAINER --buildpack paketo-buildpacks/dotnet-core --buil...
[04:47:23] (CR) Ladsgroup: [C:+2] "root@codesearch9:/srv/codesearch/frontend# docker build . -t codesearch-frontend && systemctl restart codesearch-frontend" [labs/codesearch] - https://gerrit.wikimedia.org/r/1221217 (https://phabricator.wikimedia.org/T413538) (owner: Reedy)
[04:54:34] VPS-project-Codesearch: codesearch vm running out of inode again - https://phabricator.wikimedia.org/T413739 (Ladsgroup) NEW
[05:02:59] (CR) Ladsgroup: [C:+2] "(deployed)" [labs/codesearch] - https://gerrit.wikimedia.org/r/1221217 (https://phabricator.wikimedia.org/T413538) (owner: Reedy)
[05:39:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[05:44:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[09:37:45] (PS1) Majavah: openstack: quota_increase: Use absolute value for RAM 1G check [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223146
[09:42:54] (PS1) Majavah: wmcs_libs: openstack: Support negative human quota amounts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223147
[09:43:04] !log taavi@cloudcumin1001 mwoffliner START - Cookbook wmcs.openstack.quota_increase (T411751)
[09:43:08] T411751: Temporary quota increase for mwoffliner project - https://phabricator.wikimedia.org/T411751
[09:43:12] !log taavi@cloudcumin1001 mwoffliner END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T411751)
[09:46:11] (PS1) Majavah: openstack: quota_increase: Set runtime_description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223148
[09:56:16] cloud-services-team, Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM: Temporary quota increase for mwoffliner project - https://phabricator.wikimedia.org/T411751#11491168 (taavi) Open→Resolved
[10:07:27] cloud-services-team, Striker: Prevent globally blocked users from requesting Toolforge access - https://phabricator.wikimedia.org/T413641#11491198 (taavi) p:Triage→Medium
[10:08:26] (CR) FNegri: [C:+1] openstack: quota_increase: Use absolute value for RAM 1G check [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223146 (owner: Majavah)
[10:08:56] (CR) FNegri: [C:+1] wmcs_libs: openstack: Support negative human quota amounts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223147 (owner: Majavah)
[10:09:47] (CR) FNegri: [C:+1] openstack: quota_increase: Set runtime_description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223148 (owner: Majavah)
[10:10:10] (CR) Majavah: [C:+2] openstack: quota_increase: Use absolute value for RAM 1G check [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223146 (owner: Majavah)
[10:10:12] (CR) Majavah: [C:+2] wmcs_libs: openstack: Support negative human quota amounts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223147 (owner: Majavah)
[10:10:16] (CR) Majavah: [C:+2] openstack: quota_increase: Set runtime_description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223148 (owner: Majavah)
[10:10:54] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491202 (taavi) Hey. You will need to disable Cloudflare's "proxying" (MITM-as-a-service) functionality before we can use that domain.
[10:13:27] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491203 (Nemoralis) >>! In T413467#11491202, @taavi wrote: > Hey. You will need to disable Cloudflare's "proxying" (MITM-as-a-service) functionality before we can use that domain. Done.
[10:14:05] (Merged) jenkins-bot: openstack: quota_increase: Use absolute value for RAM 1G check [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223146 (owner: Majavah)
[10:14:05] (Merged) jenkins-bot: wmcs_libs: openstack: Support negative human quota amounts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223147 (owner: Majavah)
[10:14:05] (Merged) jenkins-bot: openstack: quota_increase: Set runtime_description [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1223148 (owner: Majavah)
[10:17:07] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491204 (taavi) a:taavi
[10:28:22] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491230 (taavi) https://wikilovesmonuments.az/ now has a valid cert and points to the "No such proxy" page. You should be all set to create the proxy via Horizon.
[10:36:32] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491256 (Nemoralis) @taavi what should I use as hostname when creating a web proxy? I tried `wlmaz` but then it creates proxy for `wlmaz.wikilovesmonuments.az` {F71445671}
[10:38:20] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491273 (taavi) Documented at https://wikitech.wikimedia.org/wiki/Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet#Vanity_domains: > When configuring the web proxy, use `@` in...
[10:39:50] cloud-services-team, Cloud-VPS: Configure vanity domain for wlmaz - https://phabricator.wikimedia.org/T413467#11491280 (Nemoralis) Open→Resolved https://wikilovesmonuments.az/ now works! Thanks!
[10:55:39] (open) samwilson: Make header logo and title into a link [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/5
[12:23:42] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/28
[12:54:25] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/28 (owner: l10n-bot)
[12:54:28] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/28 (owner: l10n-bot)
[12:55:35] cloud-services-team, Toolforge: add sql script to mono6.8 - https://phabricator.wikimedia.org/T413508#11491768 (taavi) Open→Declined The MariaDB command-line tool is installed in the `mariadb` image. And presumably, if you're using a `mono` image, you're running some Mono software which can u...
[12:56:23] cloud-services-team, Cloud-VPS: upgrade WMCS ceph nodes to Debian Trixie - https://phabricator.wikimedia.org/T413726#11491772 (taavi) p:Triage→Medium
[12:56:40] cloud-services-team, Toolforge, Patch-For-Review: Maintain-dbusers is having sustained errors - https://phabricator.wikimedia.org/T413558#11491774 (taavi) p:Triage→High
[13:46:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[13:51:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[14:19:44] cloud-services-team, Toolforge: lima-kilo: INJECT_FACTS_AS_VARS default to `True` is deprecated - https://phabricator.wikimedia.org/T413782 (taavi) NEW
[14:51:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.922% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:01:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.877% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:40:51] cloud-services-team, PAWS: paws-nfs-1 attempts invalid NFS mounts - https://phabricator.wikimedia.org/T413786 (fnegri) NEW
[15:44:02] cloud-services-team, PAWS: [bug] server won't launch - https://phabricator.wikimedia.org/T413510#11492459 (fnegri) Open→Resolved a:fnegri >>! In T413510#11487487, @fnegri wrote: > I had a quick look at this issue, the NFS server at paws-nfs-1.paws.eqiad1.wikimedia.cloud is failing to moun...
[16:02:49] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 80.18% full for project tools-logging - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[16:05:44] cloud-services-team, PAWS: paws-nfs-1 attempts invalid NFS mounts - https://phabricator.wikimedia.org/T413786#11492603 (fnegri) Open→Resolved a:fnegri These mounts were probably remnants of a previous configuration that were not cleaned out correctly. I removed them manually from `/etc/f...
[16:20:26] (merge) bd808: Fix errors when running tool locally [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/5 (owner: hashar)
[16:22:23] (update) bd808: Split testwikis to their own group [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/6 (https://phabricator.wikimedia.org/T412834) (owner: hashar)
[16:32:03] (update) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:32:04] (update) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:32:06] (update) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:32:06] (open) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:32:08] (open) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:32:11] (open) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:32:21] (update) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:32:22] (update) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:32:22] (update) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:32:38] (update) bd808: Split testwikis to their own group [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/6 (https://phabricator.wikimedia.org/T412834) (owner: hashar)
[16:32:42] (update) bd808: Split testwikis to their own group [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/6 (https://phabricator.wikimedia.org/T412834) (owner: hashar)
[16:33:16] (merge) bd808: Split testwikis to their own group [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/6 (https://phabricator.wikimedia.org/T412834) (owner: hashar)
[16:33:36] cloud-services-team, Toolforge, Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.31 - https://phabricator.wikimedia.org/T372697#11492721 (taavi)
[16:35:23] (update) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:35:23] (update) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:35:23] (update) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:35:52] cloud-services-team, Toolforge, Patch-For-Review: [lima-kilo] error mounting docker cache - https://phabricator.wikimedia.org/T411208#11492727 (taavi) a:taavi
[16:39:00] (update) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:39:00] (update) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:39:01] (update) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:40:07] (approved) fnegri: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301 (owner: taavi)
[16:41:45] (approved) fnegri: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697) (owner: taavi)
[16:44:17] (approved) fnegri: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208) (owner: taavi)
[16:44:54] (merge) taavi: docker: Continue to use overlay2 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300 (https://phabricator.wikimedia.org/T411208)
[16:44:54] (update) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:44:55] (merge) taavi: playbooks: Upgrade Kind to v0.31.0 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/301
[16:44:58] (update) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:45:03] (merge) taavi: kind: Update Kubernetes to 1.31 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/302 (https://phabricator.wikimedia.org/T372697)
[16:47:56] cloud-services-team, Toolforge, Patch-For-Review: [lima-kilo] error mounting docker cache - https://phabricator.wikimedia.org/T411208#11492761 (taavi) Open→Resolved
[16:48:04] Tools, Abstract Wikipedia team, Design: tools.abstract-wiki-prototype hosts an open registration MediaWiki install - https://phabricator.wikimedia.org/T413324#11492763 (Jdforrester-WMF) This was created by @gonyeahialam as part of his design prototyping work; I don't know if anyone else has access. P...
[16:49:08] cloud-services-team, Toolforge: lima-kilo: Migrate away from overlay2 storage - https://phabricator.wikimedia.org/T413793 (taavi) NEW
[17:12:20] cloud-services-team, Toolforge: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796 (taavi) NEW
[17:12:50] cloud-services-team, Toolforge: Upgrade tools cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413797 (taavi) NEW
[17:13:09] cloud-services-team, Toolforge: Upgrade toolsbeta cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413796#11492920 (taavi) p:Triage→High
[17:13:19] cloud-services-team, Toolforge: Upgrade tools cluster to Kubernetes 1.31 - https://phabricator.wikimedia.org/T413797#11492921 (taavi) p:Triage→High
[17:48:17] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: updatetools is frequently timing out - https://phabricator.wikimedia.org/T413431#11493099 (fnegri)
[17:48:27] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Uprade cloudservices1005 and cloudservices1006 to MariaDB 10.11 - https://phabricator.wikimedia.org/T409395#11493104 (fnegri)
[17:48:48] Cloud Services Proposals, cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence, and 2 others: Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#11493106 (fnegri)
[17:48:54] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS (Debian Bullseye Deprecation): Upgrade cloudinfra database hosts off of Bullseye - https://phabricator.wikimedia.org/T402005#11493108 (fnegri)
[17:49:05] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services: [wikireplicas] Gather usage stats - https://phabricator.wikimedia.org/T381587#11493110 (fnegri)
[17:49:14] cloud-services-team (FY2025/2026-Q1-Q2), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716#11493112 (fnegri)
[17:49:31] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Patch-For-Review: [wikireplicas] add proper dry-run/diff mode to maintain-views - https://phabricator.wikimedia.org/T351637#11493114 (fnegri)
[17:49:38] cloud-services-team (FY2025/2026-Q1-Q2), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716#11493116 (fnegri)
[17:52:19] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25), Patch-For-Review: [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11493125 (fnegri)
[17:52:31] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25), Patch-For-Review: [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11493126 (fnegri) p:High→Medium
[17:55:24] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25), Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#11493141 (fnegri)
[17:56:23] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, Toolforge: If the inactive clouddumps host goes down, it causes a ripple effect on Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T391369#11493151 (fnegri)
[18:01:22] cloud-services-team, GitLab (CI & Job Runners), Release-Engineering-Team (Priority Backlog 📥): Recent incidents of buildkitd's storage volume filling up - https://phabricator.wikimedia.org/T395097#11493173 (fnegri)
[18:02:39] cloud-services-team, serviceops, SRE: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#11493258 (fnegri)
[18:03:20] cloud-services-team, Striker, Patch-For-Review: Attaching Phabricator account to a second Developer account via Striker results in a fatal error - https://phabricator.wikimedia.org/T319500#11493261 (fnegri)
[18:21:47] cloud-services-team (FY2025/2026-Q1-Q2): MaxConntrack Max conntrack at 100% on cloudcephosd1042:9100 - https://phabricator.wikimedia.org/T402480#11493343 (fnegri) Open→Resolved a:fnegri Optimistically resolving, this alert has never fired in the past 2 months.
[18:26:20] Tool-curator: Improve title validation in frontend - https://phabricator.wikimedia.org/T413512#11493396 (DaxServer) In progress→Resolved
[18:26:26] Tool-curator: Re-trigger file upload upon uploadstash-file-not-found error - https://phabricator.wikimedia.org/T413564#11493398 (DaxServer) In progress→Resolved
[18:26:39] Tool-curator: Allow updating SDC (safe-merge) when a file already exists - https://phabricator.wikimedia.org/T412958#11493400 (DaxServer) Open→In progress
[18:32:35] cloud-services-team, Toolforge: python3-venv is missing on toolforge - https://phabricator.wikimedia.org/T413683#11493413 (bd808) Declined→Invalid
[18:32:46] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance: [ceph] export number of bad sectors per-disk - https://phabricator.wikimedia.org/T348716#11493414 (fnegri)
[18:41:20] Toolforge: [builds-cli] No obvious way to delete individual `toolforge build` generated artifacts other than `toolforge clean` - https://phabricator.wikimedia.org/T368317#11493467 (fnegri) In progress→Stalled
[18:41:42] cloud-services-team, Toolforge: [builds-cli] No obvious way to delete individual `toolforge build` generated artifacts other than `toolforge clean` - https://phabricator.wikimedia.org/T368317#11493469 (fnegri)
[18:42:03] cloud-services-team, Cloud-VPS (Quota-requests), WikiCite, Wikidata, Wikidata-Query-Service: Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched - https://phabricator.wikimedia.org/T413097#11493470 (Physikerwelt) p:Medium→High Raising priorit...
[18:46:43] cloud-services-team, Cloud-VPS (Quota-requests), WikiCite, Wikidata, Wikidata-Query-Service: Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched - https://phabricator.wikimedia.org/T413097#11493508 (taavi) p:High→Medium Resetting priority. Re...
[18:52:10] cloud-services-team, Cloud-VPS: [cookbook,ceph] bootstrap_and_add ceph cookbook failed to add a new single osd 66 on host cloudcephosd1004 - https://phabricator.wikimedia.org/T402516#11493556 (fnegri)
[18:54:50] Cloud Services Proposals, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25), Cloud-Services-Origin-Team, and 3 others: [builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push... - https://phabricator.wikimedia.org/T194332#11493570
[18:55:33] Cloud Services Proposals, cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 25), Cloud-Services-Origin-Team, and 3 others: [builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push... - https://phabricator.wikimedia.org/T194332#11493573
[18:58:09] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: updatetools is frequently timing out - https://phabricator.wikimedia.org/T413431#11493580 (fnegri) This happened a few more times during the holidays, and I'm no longer sure the timeout happens during the LDAP step, as I did see one example where it hap...
[18:59:49] cloud-services-team, Cloud-VPS (Quota-requests), WikiCite, Wikidata, Wikidata-Query-Service: Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched - https://phabricator.wikimedia.org/T413097#11493594 (Physikerwelt) That's a problem. >>! In T413097#11...
[19:02:51] cloud-services-team (FY2025/2026-Q3-Q4), PAWS: paws-nfs-1 attempts invalid NFS mounts - https://phabricator.wikimedia.org/T413786#11493599 (fnegri)
[19:14:59] Tool-Pageviews, Data-Engineering: Generate 2025 topviews yearly datasets - https://phabricator.wikimedia.org/T413393#11493666 (MusikAnimal) Open→In progress
[19:16:25] cloud-services-team, Toolforge (Toolforge iteration 25), Patch-For-Review: [harbor,infra] Find a way to manage toolforge project policies with code - https://phabricator.wikimedia.org/T360509#11493673 (fnegri) Stalled→Open > What is this stalled on? Nothing, I just wanted to show that nobody...
[20:03:04] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 80.31% full for project tools-logging - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[21:28:28] FIRING: InstanceDown: Project tools instance tools-k8s-etcd-23 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:30:10] cloud-services-team, Toolforge: updatetools should alert instead of emailing - https://phabricator.wikimedia.org/T413099#11493994 (bd808) The emails are created by `toolforge jobs run --emails onfailure`. It would certainly be neat if this system supported integration with prometheus alerts. That could p...
[21:34:25] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.012 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[22:18:26] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:32:17] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[22:32:22] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[22:34:42] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate True, for hosts list: ['cloudvirtlocal1002']
[22:35:33] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate True, for hosts list: ['cloudvirtlocal1002']
[22:38:23] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[22:38:28] RESOLVED: InstanceDown: Project tools instance tools-k8s-etcd-23 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:39:18] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[22:39:22] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[22:40:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[22:40:39] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[22:43:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,nova
[22:43:26] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:49:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,nova
[22:49:43] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[22:49:47] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[22:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:49:59] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[22:51:08] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[22:52:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[22:59:44] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[22:59:49] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[23:01:40] !log andrew@cloudcumin1001 tools END (ERROR) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=97)
[23:01:57] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[23:02:07] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:04:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:10:20] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:14:00] FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:14:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:16:22] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[23:16:27] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[23:17:27] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:18:34] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[23:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:19:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:20:41] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:20:57] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[23:32:00] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[23:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[23:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown