[00:26:14] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1159436 (owner: L10n-bot)
[00:26:45] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1159435 (owner: L10n-bot)
[00:28:54] (Abandoned) Abijeet Patro: Localisation updates from https://translatewiki.net. [labs/tools/intuition-web] - https://gerrit.wikimedia.org/r/1156333 (owner: L10n-bot)
[01:10:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1h 1m 10s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[02:20:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1h 7m 46s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[08:04:05] Toolforge (Toolforge iteration 21), Documentation: [components-api] Add admin documentation page - https://phabricator.wikimedia.org/T394280#10921581 (dcaro) In progress→Resolved
[08:07:17] Toolforge (Toolforge iteration 21): [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10921585 (dcaro) The multiproc fix did the trick, closing
[08:07:21] Toolforge (Toolforge iteration 21): [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10921587 (dcaro) In progress→Resolved
[08:08:10] Toolforge (Toolforge iteration 21): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10921595 (dcaro)
All alerts are now in place in both prometheus, and the stats are correct, closing.
[08:08:16] Toolforge (Toolforge iteration 21): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10921597 (dcaro) In progress→Resolved
[08:11:12] Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10921610 (taavi) >>! In T395837#10920138, @Mhurd wrote: > Having said all that, if our project had access to 40GB flavors, everything would //just work// as-is, with no other H...
[08:26:46] cloud-services-team, Cloud-VPS, collaboration-services, GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10921655 (Jelto) Unfortunately the volume is still stuck to the deleted instance and this is blocking {T396622}.
[08:29:20] (update) dcaro: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994) (owner: chuckonwumelu)
[08:46:11] (PS1) Klausman: hiera: Add pseudosecrets for MT Thanos-Swift access also for staging [labs/private] - https://gerrit.wikimedia.org/r/1160032
[08:46:17] (CR) Klausman: [V:+2 C:+2] hiera: Add pseudosecrets for MT Thanos-Swift access also for staging [labs/private] - https://gerrit.wikimedia.org/r/1160032 (owner: Klausman)
[09:29:57] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10921977 (fnegri) p:Medium→High
[09:31:12] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - 
https://phabricator.wikimedia.org/T394372#10921993 (fnegri)
[09:34:02] (update) dcaro: [components.deployment.create] add force-build and force-run option [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T389044) (owner: raymond-ndibe)
[09:36:00] (open) taavi: toolforge: Install real `become` from misctools [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/248
[09:36:05] (update) taavi: toolforge: Install real `become` from misctools [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/248
[09:37:21] (approved) dcaro: [components.deployment.create] add force-build and force-run option [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T389044) (owner: raymond-ndibe)
[09:37:33] (merge) dcaro: [components.deployment.create] add force-build and force-run option [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/33 (https://phabricator.wikimedia.org/T389044) (owner: raymond-ndibe)
[09:41:24] (approved) dcaro: toolforge: Install real `become` from misctools [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/248 (owner: taavi)
[09:45:00] (open) dcaro: d/changelog: bump to 0.0.7 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/37 (https://phabricator.wikimedia.org/T389044)
[09:52:13] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[09:52:17] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:56:01] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922103 (fnegri)
[09:56:36] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922106 (fnegri) clouddb1017 upgraded and repooled. I restored the default wmt-pt-kill timeout in clouddb1013.
[09:58:20] !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[09:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:06:18] cloud-services-team, Infrastructure-Foundations, LDAP: Make ldap-ro service available over IPv6 - https://phabricator.wikimedia.org/T397149 (taavi) NEW
[10:07:08] (approved) dcaro: d/changelog: bump to 0.0.7 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/37 (https://phabricator.wikimedia.org/T389044)
[10:07:11] (merge) dcaro: d/changelog: bump to 0.0.7 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/37 (https://phabricator.wikimedia.org/T389044)
[10:17:11] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922186 (fnegri)
[10:19:20] cloud-services-team, Cloud-VPS, collaboration-services, GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10922203 (fnegri) cc @Andrew who has fixed similar issues in the past.
[10:28:52] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922239 (fnegri)
[10:29:02] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922241 (fnegri) clouddb1014 upgraded and repooled.
[10:30:28] FIRING: InstanceDown: Project tools instance tools-prometheus-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:32:52] cloud-services-team, Infrastructure-Foundations, IPv6, LDAP: Make ldap-ro service available over IPv6 - https://phabricator.wikimedia.org/T397149#10922281 (taavi)
[10:33:03] cloud-services-team, Cloud-VPS, Epic, IPv6: Replace remaining IPv4 NAT exemptions by IPv6 adoption - https://phabricator.wikimedia.org/T396986#10922282 (taavi)
[10:36:05] (open) dcaro: generate config [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/89
[10:37:02] (update) dcaro: generate config [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T394753)
[10:37:10] (open) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[10:37:12] (close) dcaro: generate config [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/89 (https://phabricator.wikimedia.org/T394753)
[10:41:36] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[10:46:34] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[10:48:00] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[10:51:08] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[11:00:45] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922394 (fnegri)
[11:00:58] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10922395 (fnegri) clouddb1018 upgraded and repooled.
[11:30:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:41:21] PAWS: Update list of PAWS admins - https://phabricator.wikimedia.org/T397165 (taavi) NEW
[11:42:04] PAWS: create a dynamic banner - microtask for T388234: httpss://github.com/Jemeelah1/Dynamic-Banner - https://phabricator.wikimedia.org/T389577#10922600 (taavi) How is this related to #PAWS?
[11:43:05] PAWS: Move nfs off of puppet? 
- https://phabricator.wikimedia.org/T359622#10922612 (taavi) →Duplicate dup:T383403
[11:43:06] PAWS: Automate deploy, or move away from nfs paws - https://phabricator.wikimedia.org/T383403#10922614 (taavi)
[11:57:13] cloud-services-team, Cloud-VPS, Patch-For-Review: Store logs for Cloud VPS egress NAT mappings - https://phabricator.wikimedia.org/T273734#10922646 (taavi) The patches from above is working almost as expected. Main problem at the moment is that the logs are also being logged by journald, which we don...
[12:21:56] FIRING: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:31:56] RESOLVED: SystemdUnitDown: The service unit kiwix-mirror-update.service is in failed status on host clouddumps1001. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:32:29] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:00:07] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185 (taavi) NEW
[13:19:29] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:23:50] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[13:27:11] (open) dcaro: generate: add new subcommand [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38
[13:27:11] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[13:27:16] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:35:03] (update) dcaro: config: add endpoint to generate 
sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[13:35:57] (update) dcaro: generate: add new subcommand [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38
[13:59:31] (approved) addshore: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044) (owner: dcaro)
[14:00:41] (approved) dcaro: logging: Init component [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480) (owner: taavi)
[14:09:11] (update) dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044)
[14:09:51] (update) dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044)
[14:25:38] (merge) dcaro: deploy_task: force reruning when there was a build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/86 (https://phabricator.wikimedia.org/T389044)
[14:28:33] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.119-20250617142601-78c0f80f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/817 (https://phabricator.wikimedia.org/T389044)
[14:29:10] !log dcaro@acme toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[14:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:33:38] !log dcaro@acme toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[14:33:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:34:17] (approved) dcaro: components-api: bump to 0.0.119-20250617142601-78c0f80f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/817 (https://phabricator.wikimedia.org/T389044) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:34:21] (merge) dcaro: components-api: bump to 0.0.119-20250617142601-78c0f80f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/817 (https://phabricator.wikimedia.org/T389044) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:35:00] Toolforge (Toolforge iteration 21), Patch-For-Review: [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10923744 (dcaro) In progress→Resolved
[15:31:06] (open) addshore: Add top level list command [repos/cloud/toolforge/toolforge-gen-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli/-/merge_requests/3 (https://phabricator.wikimedia.org/T393275)
[15:31:23] (close) addshore: Add top level list command [repos/cloud/toolforge/toolforge-gen-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-gen-cli/-/merge_requests/1 (https://phabricator.wikimedia.org/T393275) (owner: tarrow)
[16:04:42] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume 
storage quota for zuul project - https://phabricator.wikimedia.org/T397098#10924470 (Dzahn)
[16:07:21] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924481 (Dzahn) Would it be easier and more consistent to point all domains to the main Wikimedia NS servers as previously came up in another ticket? (...
[16:12:50] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924503 (taavi) >>! In T397185#10924480, @Dzahn wrote: > Would it be easier and more consistent to point all domains to the main Wikimedia NS servers a...
[16:19:52] Cloud Services Proposals, cloud-services-team, Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10924578 (taavi) Open→Resolved a:taavi This was discussed in today's Toolforge monthly meeting. The summary of the agreement there...
[16:31:49] (update) taavi: logging: Add basic rate limiting and retention config [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/807 (https://phabricator.wikimedia.org/T386480)
[16:31:53] (update) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[16:33:22] (merge) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[16:33:27] (update) taavi: logging: Add basic rate limiting and retention config [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/807 (https://phabricator.wikimedia.org/T386480)
[16:33:35] (merge) taavi: logging: Add basic rate limiting and retention config [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/807 (https://phabricator.wikimedia.org/T386480)
[16:52:47] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10924738 (Dzahn) Gotcha. Thanks for adding that.
[17:02:01] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[17:03:10] cloud-services-team, DC-Ops, ops-codfw, SRE: cloudcephosd200[567] service implementation - https://phabricator.wikimedia.org/T397237 (Andrew) NEW
[17:04:00] (update) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902)
[17:04:50] Tool-paulina: Create staging environment - https://phabricator.wikimedia.org/T393279#10924796 (Pepe_piton) In progress→Resolved
[17:06:02] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade to v16 - https://phabricator.wikimedia.org/T306820#10924818 (Andrew) Notes from a discussion today: After everything is on Bullseye, we can upgrade to 16,...
[17:07:21] (merge) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902)
[17:08:57] Toolforge (Toolforge iteration 21): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10924827 (dcaro)
[17:09:37] (update) dcaro: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62
[17:10:03] (update) dcaro: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62
[17:10:29] Toolforge (Toolforge iteration 21): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10924836 (dcaro)
[17:11:34] Toolforge (Toolforge iteration 21): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10924842 (dcaro)
[17:20:35] Tool-paulina: Improve country dropdown - https://phabricator.wikimedia.org/T393278#10924899 (DidiCoronel) Feature implemented on Authors search but not on Works search yet.
[17:27:25] Tool-paulina: Authors page info displays wrong public domain message - https://phabricator.wikimedia.org/T397241 (DidiCoronel) NEW
[17:29:30] Tool-paulina: Authors page info displays wrong public domain message - https://phabricator.wikimedia.org/T397241#10924959 (DidiCoronel) a:DidiCoronel
[17:29:50] (approved) taavi: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62 (owner: dcaro)
[17:29:52] (update) taavi: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62 (owner: dcaro)
[17:30:44] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storage quota for zuul project - https://phabricator.wikimedia.org/T397098#10924966 (dcaro) @bd808 I'm guessing that there's nothing else left to do? (let me know)
[17:31:17] cloud-services-team, DC-Ops, ops-codfw, SRE: cloudcephosd200[567] service implementation - https://phabricator.wikimedia.org/T397237#10924971 (dcaro) p:Triage→High
[17:31:40] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: cloudcephosd200[567] service implementation - https://phabricator.wikimedia.org/T397237#10924978 (dcaro)
[17:31:48] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: cloudcephosd200[567] service implementation - https://phabricator.wikimedia.org/T397237#10924980 (dcaro)
[17:32:43] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storage quota for zuul project - https://phabricator.wikimedia.org/T397098#10924996 (bd808) >>! In T397098#10924966, @dcaro wrote: > @bd808 I'm guessing that there's nothing else lef...
[17:34:14] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10925006 (ssingh) You can add them if required even though my reading of the other tasks and RFC indicates it is optional. That being said, why only v6...
[17:34:36] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10925010 (dcaro) p:Triage→High
[17:35:22] FIRING: HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:37:43] !log andrew@cloudcumin1001 zuul START - Cookbook wmcs.openstack.quota_increase (T397098)
[17:37:46] T397098: Increase volume storage quota for zuul project - https://phabricator.wikimedia.org/T397098
[17:37:50] !log andrew@cloudcumin1001 zuul END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T397098)
[17:38:14] !log andrew@cloudcumin1001 zuul START - Cookbook wmcs.openstack.quota_increase (T397098)
[17:38:19] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10925037 (dcaro) Hmm.... this has been failing for a while, the logs of the service seem to show that there's s...
[17:38:21] !log andrew@cloudcumin1001 zuul END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397098)
[17:38:53] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storage quota for zuul project - https://phabricator.wikimedia.org/T397098#10925041 (Andrew) Open→Resolved a:Andrew
[17:40:22] RESOLVED: HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:46:19] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10925061 (Andrew) When I was reimaging cloudcephmon nodes I temporarily got things into a state where ceph api...
[17:47:23] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[17:54:26] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10925115 (taavi) >>! In T397185#10925006, @ssingh wrote: > You can add them if required even though my reading of the other tasks and RFC indicates it i...
[18:01:49] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[18:20:28] cloud-services-team, Toolforge, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10925226 (Addshore) >>! In T321919#10724833, @aborrero wrote: > I like this idea, and the semantics tha...
[18:59:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:02:51] PROBLEM - Host cloudcephosd1014 is DOWN: PING CRITICAL - Packet loss = 100%
[19:03:19] RECOVERY - Host cloudcephosd1014 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms
[19:05:31] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0)
[19:06:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:06:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:08:35] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[19:11:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - 
https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:13:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:18:47] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[19:19:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0)
[19:23:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:30:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds
[19:37:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[19:38:45] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[19:45:45] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994) [19:51:43] PROBLEM - Host cloudcephosd1020 is DOWN: PING CRITICAL - Packet loss = 100% [19:53:21] (03update) 10chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994) [19:53:31] RECOVERY - Host cloudcephosd1020 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [19:55:04] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0) [19:57:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:16:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds [20:19:51] PROBLEM - Host cloudcephosd1022 is DOWN: PING CRITICAL - Packet loss = 100% [20:21:19] RECOVERY - Host cloudcephosd1022 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [20:22:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:25:59] PROBLEM - Host cloudcephosd1023 is DOWN: PING CRITICAL - Packet loss = 100% [20:26:43] RECOVERY - Host cloudcephosd1023 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [20:31:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,cinder [20:31:42] 
!log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,cinder [20:31:53] PROBLEM - Host cloudcephosd1024 is DOWN: PING CRITICAL - Packet loss = 100% [20:32:51] RECOVERY - Host cloudcephosd1024 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [20:35:04] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.upgrade_osds (exit_code=0) [20:35:50] 10Cloud-VPS (Quota-requests): Pixel project "disk40" flavor, and perhaps a few more cores? - https://phabricator.wikimedia.org/T395837#10925620 (10Mhurd) 05Open→03Declined >>! In T395837#10921610, @taavi wrote: >>>! In T395837#10920138, @Mhurd wrote: >> Having said all that, if our project had access to... [20:50:16] (03open) 10hashar: gerrit-channels: introduce #wikimedia-zuul [toolforge-repos/wikibugs2] - 10https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/58 [20:50:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:54:49] (03merge) 10bd808: gerrit-channels: introduce #wikimedia-zuul [toolforge-repos/wikibugs2] - 10https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/58 (owner: 10hashar) [21:02:27] (03open) 10bd808: gitlab: Configure repos to report to #wikimedia-zuul [toolforge-repos/wikibugs2] - 10https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/59 [21:04:01] (03merge) 10bd808: gitlab: Configure repos to report to #wikimedia-zuul [toolforge-repos/wikibugs2] - 10https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/59 [21:33:01] 10Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991#10925819 (10Krinkle) 
I assume the reason it uses Nunjucks is to share the template files between Python and JS. Is that right? That is - are there cases where the same exact template file is used on both sides? Or is... [21:51:39] 06cloud-services-team, 10Cloud-VPS, 06collaboration-services, 10GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10925872 (10Andrew) I am going to explore the possibility that this is https://www.reddit.com/r/openstack/comments/1e9... [23:38:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:43:51] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown