[01:10:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1h 1m 17s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[01:48:04] Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266 (Mhurd) NEW
[01:50:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1h 13m 55s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[02:18:28] FIRING: InstanceDown: Project tools instance tools-prometheus-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:11:35] Cloud-Services: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272 (Mhurd) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project...
[03:12:30] cloud-services-team: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272#10926333 (Mhurd)
[03:13:08] cloud-services-team, Horizon: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272#10926334 (Mhurd)
[03:14:58] cloud-services-team, Horizon: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272#10926347 (Mhurd)
[04:23:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-8 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[04:26:59] FIRING: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails
[04:31:59] RESOLVED: JobsEmailerNoEmails: No emails sent in the last hour - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/JobsEmailerNoEmails - https://prometheus-alerts.wmcloud.org/?q=alertname%3DJobsEmailerNoEmails
[06:14:36] (PS1) Giuseppe Lavagetto: Add stub api tokens for hiddenparma [labs/private] - https://gerrit.wikimedia.org/r/1160477
[07:27:47] cloud-services-team, Cloud-VPS (Quota-requests), Continuous-Integration-Infrastructure (Zuul upgrade): Increase volume storage quota for zuul project - https://phabricator.wikimedia.org/T397098#10926707 (dcaro) Oh, I got confused by the tofu patches, lately I'm not sure if it's tofu or cookbooks...
[07:37:20] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280 (A_smart_kitten) NEW
[07:37:42] cloud-services-team, Horizon: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272#10926759 (dcaro) This might be a side-effect of the latest upgrade, from the cloudweblogs: ` root@cloudweb1004:~# docker logs -f --tail 100 openstack-dashboard.service ... [Wed Jun 18 07:...
[07:40:59] supertassu opened https://github.com/toolforge/quarry/pull/89
[07:42:35] (CR) Giuseppe Lavagetto: [V:+2 C:+2] Add stub api tokens for hiddenparma [labs/private] - https://gerrit.wikimedia.org/r/1160477 (owner: Giuseppe Lavagetto)
[07:42:58] cloud-services-team, Horizon: Horizon proxy tab Edit buttons not working - https://phabricator.wikimedia.org/T397272#10926775 (taavi) https://gerrit.wikimedia.org/r/c/openstack/horizon/wmf-proxy-dashboard/+/1153732 looks very related.
[08:01:34] (PS1) David Caro: views: fix access to non-initialized self.instance_tuples [openstack/horizon/wmf-proxy-dashboard] - https://gerrit.wikimedia.org/r/1160677 (https://phabricator.wikimedia.org/T397272)
[08:23:44] (open) dcaro: README: use makrdown for nice presentation in gitlab [repos/cloud/cloud-vps/horizon/deploy] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/horizon/deploy/-/merge_requests/4
[08:44:13] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10926989 (fnegri)
[08:59:34] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927015 (fnegri)
[09:05:26] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927031 (fnegri)
[09:10:46] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927034 (fnegri)
[09:18:32] (PS1) Giuseppe Lavagetto: Use an actual user for the fake api tokens [labs/private] - https://gerrit.wikimedia.org/r/1160706
[09:18:42] (CR) Giuseppe Lavagetto: [V:+2 C:+2] Use an actual user for the fake api tokens [labs/private] - https://gerrit.wikimedia.org/r/1160706 (owner: Giuseppe Lavagetto)
[09:20:59] supertassu closed https://github.com/toolforge/quarry/pull/89
[09:25:24] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927072 (fnegri)
[09:26:41] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Patch-For-Review: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927074 (fnegri) In progress→Resolved clouddb1016 and clouddb1020 are both upgraded and repooled. This task...
[09:28:39] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10927081 (fnegri)
[09:33:41] (open) dcaro: makefile: support podman [repos/cloud/cloud-vps/horizon/deploy] (use_markdown) - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/horizon/deploy/-/merge_requests/5
[09:48:37] (update) dcaro: makefile: support podman [repos/cloud/cloud-vps/horizon/deploy] (use_markdown) - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/horizon/deploy/-/merge_requests/5
[09:55:45] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10927172 (dcaro) It's still there :/, will take a look in a bit if nobody beats me to it
[10:18:13] (CR) Reedy: [C:+2] Change Serbocroatian to Serbo-Croatian [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1153275 (https://phabricator.wikimedia.org/T395915) (owner: Acamicamacaraca)
[10:18:59] (Merged) jenkins-bot: Change Serbocroatian to Serbo-Croatian [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1153275 (https://phabricator.wikimedia.org/T395915) (owner: Acamicamacaraca)
[10:23:56] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10927357 (dcaro) It does seem to do the backup correctly (did not try a restore though): ` root@cloudbackup1001...
[10:24:30] (update) dcaro: README: use makrdown for nice presentation in gitlab [repos/cloud/cloud-vps/horizon/deploy] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/horizon/deploy/-/merge_requests/4
[11:15:09] Cloud-VPS (Project-requests): Request creation of lemmy VPS project - https://phabricator.wikimedia.org/T396948#10927585 (Gryllida) @Aklapper which ones are already there in addition to ... mailing list, irc, and phab?
[11:47:15] Cloud-VPS (Project-requests): Request creation of lemmy VPS project - https://phabricator.wikimedia.org/T396948#10927767 (Aklapper) See https://www.mediawiki.org/wiki/Communication
[11:51:15] Tool-quickcategories, gadget-Cat-a-lot: Allow passing actions and title URL parameters to QuickCategories batch creation from PagePile - https://phabricator.wikimedia.org/T397320 (adiba_anjum) NEW
[11:58:15] Tool-quickcategories, gadget-Cat-a-lot: Allow passing actions and title URL parameters to QuickCategories batch creation from PagePile - https://phabricator.wikimedia.org/T397320#10927814 (adiba_anjum)
[12:18:39] (approved) dcaro: shared: Provision storage buckets for Loki [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/49 (https://phabricator.wikimedia.org/T396574) (owner: taavi)
[12:19:14] (merge) taavi: shared: Provision storage buckets for Loki [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/49 (https://phabricator.wikimedia.org/T396574)
[12:50:59] (update) taavi: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850) (owner: dcaro)
[12:51:00] (approved) taavi: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850) (owner: dcaro)
[13:00:20] (merge) dcaro: events: added a timeout and retry for k8s watch [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/24 (https://phabricator.wikimedia.org/T396850)
[13:07:26] Quarry: quarry: Upgrade Python libraries - https://phabricator.wikimedia.org/T397331 (taavi) NEW
[13:09:58] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[13:10:21] (update) chuckonwumelu: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990)
[13:11:29] Quarry: quarry: Use a proper Python package manager - https://phabricator.wikimedia.org/T397332 (taavi) NEW
[13:13:35] (approved) fnegri: GET the latest deployment for a particular tool [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/87 (https://phabricator.wikimedia.org/T394990) (owner: chuckonwumelu)
[13:13:48] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-emailer: bump to 0.0.60-20250618130041-27cd8c3d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/818 (https://phabricator.wikimedia.org/T396850)
[13:19:34] supertassu opened https://github.com/toolforge/quarry/pull/90
[13:28:52] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[13:37:14] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[13:45:04] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[14:00:13] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928384 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2005.codfw.wmnet with OS bullseye
[14:00:43] Cloud-VPS (Project-requests): Request creation of toolsbeta-logging VPS project - https://phabricator.wikimedia.org/T397339 (taavi) NEW
[14:01:31] Cloud-VPS (Project-requests): Request creation of toolsbeta-logging VPS project - https://phabricator.wikimedia.org/T397339#10928404 (taavi)
[14:01:36] cloud-services-team, Toolforge: Provision object storage volumes for Loki - https://phabricator.wikimedia.org/T396574#10928405 (taavi)
[14:04:35] Cloud-VPS (Project-requests): Request creation of toolsbeta-logging VPS project - https://phabricator.wikimedia.org/T397339#10928420 (Andrew) +1 approved Implementing this isn't really a clinic duty thing, it'll be managed by opentofu.
[14:04:38] cloud-services-team, Cloud-VPS: Create OpenStack role that allows object storage access only - https://phabricator.wikimedia.org/T396594#10928421 (taavi) Open→Resolved a:Andrew I haven't actually tried this out but in theory this is done.
[14:06:24] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928429 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2006.codfw.wmnet with OS bullseye
[14:09:25] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[14:14:42] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10928454 (ssingh) Ah, thanks for the context. Looking at the domains in question, all of them are delegated at Markmonitor, which explains why they have...
[14:16:04] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[14:20:34] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928484 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host cloudcephosd2007.codfw.wmnet with OS bullseye
[14:22:57] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[14:23:35] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[14:35:45] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[14:36:51] (approved) dcaro: jobs-emailer: bump to 0.0.60-20250618130041-27cd8c3d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/818 (https://phabricator.wikimedia.org/T396850) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:36:54] (merge) dcaro: jobs-emailer: bump to 0.0.60-20250618130041-27cd8c3d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/818 (https://phabricator.wikimedia.org/T396850) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[14:43:45] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[14:47:21] Toolforge (Toolforge iteration 21), good first task, Patch-For-Review: [components-cli] make `toolforge components deployment show` show the latest deployment if no id passed - https://phabricator.wikimedia.org/T394994#10928558 (Chuckonwumelu) Open→In progress
[14:55:44] cloud-services-team (FY2024/2025-Q3-Q4), Quarry: Deploy prometheus-redis-exporter - https://phabricator.wikimedia.org/T396771#10928594 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/90
[14:55:58] supertassu closed https://github.com/toolforge/quarry/pull/90
[15:02:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph}'
[15:03:17] (open) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[15:04:54] Toolforge (Toolforge iteration 21), Patch-For-Review: [jobs-emailer] stops processing k8s events - https://phabricator.wikimedia.org/T396850#10928642 (dcaro) In progress→Resolved
[15:07:58] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[15:23:20] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10928684 (ssingh) I checked with Brandon and he confirmed that while not strictly required, we can add the AAAA records here so let me know if you want...
[15:26:05] (update) dcaro: config: add endpoint to generate sample config [repos/cloud/toolforge/components-api] (create_runtime) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/90 (https://phabricator.wikimedia.org/T394753)
[15:29:19] andrew@cloudcumin1001 safe_reboot (PID 3520234) is awaiting input
[15:35:20] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928756 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudcephosd2006.codfw.wmnet with OS bullseye ex...
[15:36:20] (open) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[15:37:10] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928762 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudcephosd2007.codfw.wmnet with OS bullseye ex...
[15:38:27] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928772 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2007.codfw.wmnet with OS bullseye
[15:44:31] andrew@cloudcumin1001 safe_reboot (PID 3520234) is awaiting input
[15:44:41] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928800 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host cloudcephosd2005.codfw.wmnet with OS bullseye ex...
[15:48:53] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928813 (Jhancock.wm)
[15:58:39] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[15:59:45] andrew@cloudcumin1001 safe_reboot (PID 3520234) is awaiting input
[16:16:41] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10928897 (Dzahn) a:Dzahn
[16:20:11] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10928914 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2005.codfw.wmnet with OS bullseye
[16:23:56] FIRING: SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:29:45] siddharthvp opened https://github.com/toolforge/quarry/pull/91
[16:37:41] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10929005 (Jhancock.wm) @Andrew i got these to the point where the image is on them, but for some reason it's not syncing with the puppetdb. Could you chec...
[16:50:11] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10929057 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host cloudcephosd2005.codfw.wmnet with OS bullseye ex...
[16:51:27] Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266#10929059 (Mhurd)
[16:57:00] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10929065 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host cloudcephosd2007.codfw.wmnet with OS bullseye ex...
[17:10:38] Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991#10929092 (SD0001) Nunjucks is only used in JS. We can remove the manual build step by making it a part of docker build. I don't think there's a need to remove the built step altogether as it comes at the cost of ha...
[17:22:35] Cloud-Services, ContentSecurityPolicy, Continuous-Integration-Infrastructure (Zuul upgrade): Object storage web service CSP does not allow inline images - https://phabricator.wikimedia.org/T397351 (hashar) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the lis...
[17:24:55] cloud-services-team, Data-Services, ContentSecurityPolicy, Continuous-Integration-Infrastructure (Zuul upgrade): Object storage web service CSP does not allow inline images - https://phabricator.wikimedia.org/T397351#10929174 (hashar)
[17:27:57] Quarry: quarry: Drop manual frontend build process - https://phabricator.wikimedia.org/T396991#10929192 (SD0001) https://github.com/toolforge/quarry/pull/91
[17:28:41] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10929194 (Andrew) We can't tell if those postgresql errors matter. They seem to have been happening since at le...
[17:30:58] cloud-services-team, Cloud-VPS, ContentSecurityPolicy, Continuous-Integration-Infrastructure (Zuul upgrade): Object storage web service CSP does not allow inline images - https://phabricator.wikimedia.org/T397351#10929198 (taavi) a:taavi
[17:33:47] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10929205 (Andrew) @Jhancock.wm I will have a look. I've also just noticed that the names for these servers is wrong, everything should be -dev. I'll updat...
[17:37:14] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph}'
[17:40:09] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10929229 (Andrew)
[17:43:02] (CR) David Caro: [V:+1] "Tested in my local using horizon/deploy (see https://phabricator.wikimedia.org/F62379446)" [openstack/horizon/wmf-proxy-dashboard] - https://gerrit.wikimedia.org/r/1160677 (https://phabricator.wikimedia.org/T397272) (owner: David Caro)
[17:58:03] cloud-services-team, Cloud-VPS, ContentSecurityPolicy, Continuous-Integration-Infrastructure (Zuul upgrade), Patch-For-Review: Object storage web service CSP does not allow inline images - https://phabricator.wikimedia.org/T397351#10929296 (taavi) Open→Resolved a:taavi→hash...
[18:02:45] cloud-services-team: SystemdUnitDown The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://phabricator.wikimedia.org/T397105#10929315 (dcaro) We also did a backup and restore of a volume and it worked well 🎉 so probably harmless
[18:18:56] FIRING: SystemdUnitDown: The systemd unit nova-fullstack.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:19:01] cloud-services-team: SystemdUnitDown The systemd unit nova-fullstack.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T397357 (phaultfinder) NEW
[18:27:00] cloud-services-team, Cloud-VPS, ContentSecurityPolicy, Continuous-Integration-Infrastructure (Zuul upgrade), Patch-For-Review: Object storage web service CSP does not allow inline images - https://phabricator.wikimedia.org/T397351#10929381 (hashar) That is wonderful thank you.
[18:33:56] RESOLVED: SystemdUnitDown: The service unit nova-fullstack.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:39:26] RESOLVED: SystemdUnitDown: The systemd unit nova-fullstack.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:43:15] Tool-quickcategories, gadget-Cat-a-lot: Allow passing actions and title URL parameters to QuickCategories batch creation from PagePile - https://phabricator.wikimedia.org/T397320#10929408 (LucasWerkmeister) Open→Resolved a:adiba_anjum Merged and deployed – thanks a lot!
[18:43:22] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:47:56] FIRING: SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:52:56] RESOLVED: SystemdUnitDown: The service unit cinder-api.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:54:55] Tool-quickcategories, gadget-Cat-a-lot: Allow passing actions and title URL parameters to QuickCategories batch creation from PagePile - https://phabricator.wikimedia.org/T397320#10929449 (LucasWerkmeister) And also [announced on Mastodon](https://wikis.world/@LucasWerkmeister/114705868117557526) bec...
[18:59:52] RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[19:12:11] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[19:15:43] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[19:23:25] (update) chuckonwumelu: show: Display latest deployment if no deploy_id included [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/36 (https://phabricator.wikimedia.org/T394994)
[19:23:52] Tool-quickcategories, gadget-Cat-a-lot: Allow passing actions and title URL parameters to QuickCategories batch creation from PagePile - https://phabricator.wikimedia.org/T397320#10929496 (adiba_anjum) That's awesome! thank you so much for merging and deploying it. Super cool that it even made it to...
[20:27:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[20:31:41] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10929741 (Jhancock.wm) actually, i think that would have been it. i usually only get that error when it's not in the site.pp file. my bad
[20:39:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[20:53:52] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: cloudcephosd200[567]-dev service implementation - https://phabricator.wikimedia.org/T397237#10929828 (Andrew)
[21:24:09] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10929935 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2005.codfw.wmnet with OS bul...
[21:34:19] (PS1) Dzahn: add fake password for phab test db admin user [labs/private] - https://gerrit.wikimedia.org/r/1161051 (https://phabricator.wikimedia.org/T377889)
[21:34:43] (CR) Dzahn: [V:+2 C:+2] add fake password for phab test db admin user [labs/private] - https://gerrit.wikimedia.org/r/1161051 (https://phabricator.wikimedia.org/T377889) (owner: Dzahn)
[21:35:07] (PS2) Dzahn: add fake password for phab test db admin user [labs/private] - https://gerrit.wikimedia.org/r/1161051 (https://phabricator.wikimedia.org/T377889)
[21:35:17] (CR) Dzahn: [V:+2] add fake password for phab test db admin user [labs/private] - https://gerrit.wikimedia.org/r/1161051 (https://phabricator.wikimedia.org/T377889) (owner: Dzahn)
[22:12:20] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930109 (Dzahn) T377236 explains how this was done in the past
[22:14:36] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930113 (Dzahn) ` root@phabricator-bullseye:/srv/phab/phabricator/bin# sudo ./user approve --user a_smart_kitten__test Usage Exception: User acco...
[22:21:28] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930116 (Dzahn) Open→In progress
[22:42:42] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930158 (Dzahn) Ok.. so for the "verify" step after this the email address needs to be verified, not the user name. To get the email address, go...
[22:46:04] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930160 (Dzahn) finally, give the users the elevated rights: ` root@phabricator-bullseye:/srv/phab/phabricator/bin# sudo ./user empower --user...
[22:48:19] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930164 (Dzahn) @A_smart_kitten This should work now! If there are any problems please reopen and contact my team since I will be out until end of...
[22:48:28] VPS-project-Phabricator, collaboration-services: Requesting manual activation of phabricator.wmcloud.org accounts - https://phabricator.wikimedia.org/T397280#10930165 (Dzahn) In progress→Resolved
[22:51:25] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10930166 (bd808) https://ldap.toolforge.org/user/zuuldevopsbot was not added to the bastion project when it was added to the zuul project. I will add it manua...
[23:51:12] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10930246 (bd808) >>! In T379550#10930166, @bd808 wrote: > https://ldap.toolforge.org/user/zuuldevopsbot was not added to the bastion project when it was added...