[00:05:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[00:11:06] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2010-dev.codfw.wmnet'
[00:29:07] (CR) CI reject: [V:-1] build: Updating lodash to 4.17.23 [labs/striker] - https://gerrit.wikimedia.org/r/1229802 (owner: Libraryupgrader)
[01:31:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2010-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:39:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2010-dev.codfw.wmnet'
[01:49:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2010-dev.codfw.wmnet'
[02:07:56] FIRING: [2x] SystemdUnitDown: The service unit remove_dangling_cinder_snapshots.service is in failed status on host cloudbackup1001-dev. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:27:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2010-dev.codfw.wmnet'
[02:39:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2010-dev.codfw.wmnet'
[03:20:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:47:36] (open) samwilson: Default to language 'mul' if none can be determined [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/6 (https://phabricator.wikimedia.org/T389911)
[04:02:56] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:50:17] (open) raymond-ndibe: support exposing continuous jobs to the internet [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/262 (https://phabricator.wikimedia.org/T388092)
[04:56:57] (update) raymond-ndibe: Allow to force ARM kubernetes stack [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/303 (owner: volans)
[04:57:09] (update) raymond-ndibe: support exposing continuous jobs to the internet [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/262 (https://phabricator.wikimedia.org/T388092)
[05:24:28] (open) raymond-ndibe: support publishing continuous jobs to the internet [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/142 (https://phabricator.wikimedia.org/T388092)
[07:22:18] cloud-services-team, Toolforge: Toolforge SSH login: connection closed after publickey authentication - https://phabricator.wikimedia.org/T415239 (JacobHung) NEW
[08:03:11] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:55:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[09:00:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[09:24:34] (PS1) Neriah: frontend: Add dark mode support via prefers-color-scheme [labs/codesearch] - https://gerrit.wikimedia.org/r/1230251
[09:25:39] (PS2) Neriah: frontend: Add dark mode support via prefers-color-scheme [labs/codesearch] - https://gerrit.wikimedia.org/r/1230251
[09:28:33] (CR) Neriah: "recheck" [labs/codesearch] - https://gerrit.wikimedia.org/r/1227712 (https://phabricator.wikimedia.org/T414776) (owner: Akaza24)
[09:29:27] (CR) CI reject: [V:-1] Fix request timeout issue by adding timeouts to Flask requests [labs/codesearch] - https://gerrit.wikimedia.org/r/1227712 (https://phabricator.wikimedia.org/T414776) (owner: Akaza24)
[09:37:33] Cloud-VPS (Quota-requests): Increase quota for wikiqlever - https://phabricator.wikimedia.org/T414983#11544307 (Physikerwelt) >>! In T414983#11542790, @Andrew wrote: > +1 this is just fine if the project winds up succeeding and having users. Please try to fail fast and let us know if you wind up not using th...
[09:59:32] (PS3) Akaza24: Fix request timeout issue by adding timeouts to Flask requests [labs/codesearch] - https://gerrit.wikimedia.org/r/1227712 (https://phabricator.wikimedia.org/T414776)
[11:06:15] (update) taavi: server: Allowlist permitted annotations [repos/cloud/toolforge/ingress-admission] (push-pqmrmzqtznvs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/35 (https://phabricator.wikimedia.org/T415192)
[12:03:11] FIRING: [2x] SystemdUnitDown: The systemd unit remove_dangling_cinder_snapshots.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:32:44] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/29
[12:54:27] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/29 (owner: l10n-bot)
[12:54:31] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/29 (owner: l10n-bot)
[13:27:40] cloud-services-team, SRE: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#11544977 (jijiki)
[13:38:05] cloud-services-team, Cloud-VPS, SRE: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#11545004 (taavi)
[13:52:04] cloud-services-team, Toolforge: Toolforge SSH login: connection closed after publickey authentication - https://phabricator.wikimedia.org/T415239#11545060 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[13:53:25] cloud-services-team, Toolforge: Toolforge SSH login: connection closed after publickey authentication - https://phabricator.wikimedia.org/T415239#11545069 (fnegri) Looks like for some reason the user is missing the `project-tools` LDAP group: https://ldap.toolforge.org/user/JacobHung
[14:01:04] cloud-services-team, Toolforge: Toolforge SSH login: connection closed after publickey authentication - https://phabricator.wikimedia.org/T415239#11545113 (fnegri) I don't see the user request in https://toolsadmin.wikimedia.org/tools/membership/, @JacobHung did you [[ https://wikitech.wikimedia.org/wiki...
[14:03:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices2005-dev.codfw.wmnet'
[14:10:17] FIRING: JobUnavailable: Reduced availability for job pdns in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:11:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudservices2005-dev.codfw.wmnet'
[14:14:05] FIRING: [2x] HostBGPDown: BGP session for cloudservices2005-dev (172.20.5.9) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[14:15:17] RESOLVED: [2x] JobUnavailable: Reduced availability for job pdns in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:17:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudservices2004-dev.codfw.wmnet'
[14:19:05] RESOLVED: [2x] HostBGPDown: BGP session for cloudservices2005-dev (172.20.5.9) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[14:24:02] FIRING: [2x] JobUnavailable: Reduced availability for job pdns in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:25:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudservices2004-dev.codfw.wmnet'
[14:26:35] FIRING: [4x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[14:27:47] RESOLVED: [3x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:30:42] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[14:31:35] RESOLVED: [4x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[14:33:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for all services
[15:10:06] cloud-services-team, Toolforge: Toolforge NFS tracing misses some dumps events - https://phabricator.wikimedia.org/T415199#11545498 (Volans) a:Volans
[15:32:16] (open) pepepiton: Oauth implementation [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/177
[15:36:21] cloud-services-team, Toolforge: Toolforge NFS tracing misses some dumps events - https://phabricator.wikimedia.org/T415199#11545637 (Volans) Thanks for the report, that's why I was asking if we knew of tools using for example dumps. So it looks like the pre-existing code had an optimization to exclude an...
[15:47:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet2006-dev.codfw.wmnet'
[15:56:29] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudnet2006-dev.codfw.wmnet'
[15:58:23] PROBLEM - Check nf_conntrack usage in neutron netns on cloudnet2006-dev is CRITICAL: CRITICAL: no netns defined? https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:00:21] (update) fnegri: Use the image name as provided by builds-api [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T403322) (owner: damian)
[16:11:37] (update) pepepiton: Oauth implementation [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/177
[16:15:19] (merge) pepepiton: Oauth implementation [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/177
[16:41:39] Cloud-VPS (Quota-requests): Increase quota for wikiqlever - https://phabricator.wikimedia.org/T414983#11545959 (Physikerwelt) >>! In T414983#11545851, @taavi wrote: >> A single 32GB instance is not sufficient to handle the load > > What data do you have to believe that the proposed quota increase is enough...
[16:57:30] (approved) fnegri: Cleanup [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/34 (owner: taavi)
[16:57:30] (update) fnegri: Cleanup [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/34 (owner: taavi)
[17:02:28] RECOVERY - Check nf_conntrack usage in neutron netns on cloudnet2006-dev is OK: OK: everything is apparently fine https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[17:03:49] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet2006-dev.codfw.wmnet'
[17:03:56] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=97) on host 'cloudnet2006-dev.codfw.wmnet'
[17:04:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudnet2005-dev.codfw.wmnet'
[17:14:01] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudnet2005-dev.codfw.wmnet'
[17:31:02] (update) fnegri: runtime::diff_with_running_job: temp conditional to force job version upgrade from v1 -> v2 [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/259 (https://phabricator.wikimedia.org/T359649) (owner: raymond-ndibe)
[17:33:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt2004-dev.codfw.wmnet'
[17:38:11] (update) taavi: server: Allowlist permitted annotations [repos/cloud/toolforge/ingress-admission] (push-pqmrmzqtznvs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/35 (https://phabricator.wikimedia.org/T415192)
[17:40:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt2004-dev.codfw.wmnet'
[17:49:49] (merge) taavi: Cleanup [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/34
[17:49:51] (update) taavi: server: Allowlist permitted annotations [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/35 (https://phabricator.wikimedia.org/T415192)
[17:55:15] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: ingress-admission: bump to 0.0.76-20260122175000-402ddb48 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1112
[17:57:03] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
[18:01:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt2005-dev.codfw.wmnet'
[18:05:54] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
[18:07:52] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
[18:08:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt2005-dev.codfw.wmnet'
[18:16:32] Tool-campwiz-nxt, Google-Summer-of-Code (Google Summer of Code (2026)): GSoC 2026: CampWiz NxT Redesign - https://phabricator.wikimedia.org/T414269#11546479 (LGoto) Hi @Nokib_Sarkar Reminder to please see my previous [[ https://phabricator.wikimedia.org/T414269#11513123 | comment ]] and update this task...
[18:16:59] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
[18:18:50] (merge) taavi: ingress-admission: bump to 0.0.76-20260122175000-402ddb48 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1112 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[18:25:54] (approved) fnegri: Use the image name as provided by builds-api [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T403322) (owner: damian)
[18:26:01] (update) fnegri: Use the image name as provided by builds-api [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T403322) (owner: damian)
[18:26:39] (update) fnegri: Use the image name as provided by builds-api [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T403322) (owner: damian)
[18:27:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack on host 'cloudvirt2006-dev.codfw.wmnet'
[18:33:31] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) on host 'cloudvirt2006-dev.codfw.wmnet'
[18:34:14] (approved) fnegri: server: Allowlist permitted annotations [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/35 (https://phabricator.wikimedia.org/T415192) (owner: taavi)
[18:36:38] (open) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[18:37:29] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[18:37:52] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[18:39:49] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[18:40:26] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[19:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:11:25] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[19:21:27] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[20:35:30] (update) raymond-ndibe: images::from_url_or_name: match variants of the same image [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/260 (https://phabricator.wikimedia.org/T414978)
[20:38:08] (open) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] (match_variants_of_the_same_image) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263
[21:42:56] FIRING: SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:52:56] FIRING: [2x] SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:56:18] Tool-bridgebot: Investigate switching to the matterbridge-org/matterbridge fork - https://phabricator.wikimedia.org/T415313 (bd808) NEW
[22:02:07] PROBLEM - Memcached on cloudcontrol1011 is CRITICAL: connect to address 10.64.151.8 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[22:02:07] PROBLEM - Memcached on cloudcontrol1006 is CRITICAL: connect to address 10.64.150.6 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[22:02:22] FIRING: [3x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[22:02:56] FIRING: [2x] SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:03:07] RECOVERY - Memcached on cloudcontrol1011 is OK: TCP OK - 0.000 second response time on 10.64.151.8 port 11211 https://wikitech.wikimedia.org/wiki/Memcached
[22:03:07] RECOVERY - Memcached on cloudcontrol1006 is OK: TCP OK - 0.000 second response time on 10.64.150.6 port 11211 https://wikitech.wikimedia.org/wiki/Memcached
[22:07:22] RESOLVED: [7x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[22:07:56] FIRING: [6x] SystemdUnitDown: The service unit designate-producer.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:12:56] RESOLVED: SystemdUnitDown: The service unit drain_rabbitmq_notification_error.service is in failed status on host cloudrabbit1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:20:12] Tool-bridgebot: Investigate switching to the matterbridge-org/matterbridge fork - https://phabricator.wikimedia.org/T415313#11546991 (bd808) I think the thing to try here would be a branch of https://gitlab.wikimedia.org/toolforge-repos/bridgebot that pulls in a treeish from https://github.com/matterbridge-o...
[22:25:11] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:32:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:42:19] Tools, Release-Engineering-Team: versions % doesn't add up to 100% - https://phabricator.wikimedia.org/T415318 (Reedy) NEW
[22:42:25] Tools, Release-Engineering-Team: versions % doesn't add up to 100% - https://phabricator.wikimedia.org/T415318#11547069 (Reedy) p:Triage→Low
[22:43:21] Tools, Release-Engineering-Team: versions % doesn't add up to 100% - https://phabricator.wikimedia.org/T415318#11547070 (Reedy)
[22:46:57] (open) reedy: index.php: Show wiki % to 2 decimal places [toolforge-repos/versions] - https://gitlab.wikimedia.org/toolforge-repos/versions/-/merge_requests/8 (https://phabricator.wikimedia.org/T415318)
[22:52:00] FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[22:57:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:27:11] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978)
[23:28:29] (update) aghirelli: Draft: feat(linter): add new rules [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/10
[23:32:07] Toolforge (Toolforge iteration 25), Patch-For-Review: [jobs-api] check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#11547195 (Raymond_Ndibe) Stalled→Invalid
[23:32:12] Toolforge (Toolforge iteration 25), Patch-For-Review: [jobs-api] check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#11547198 (Raymond_Ndibe) Invalid→Resolved
[23:33:22] Toolforge (Toolforge iteration 25), Patch-For-Review: [jobs-api] check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#11547200 (JJMC89) Resolved→Invalid
[23:34:21] Toolforge (Toolforge iteration 25): Replace job image variants with webservice image variants - https://phabricator.wikimedia.org/T415322 (Raymond_Ndibe) NEW
[23:36:14] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] (match_variants_of_the_same_image) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322)
[23:36:50] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] (match_variants_of_the_same_image) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322)
[23:37:21] (open) raymond-ndibe: values.yaml: replace job image variants with webservice image variants [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/18 (https://phabricator.wikimedia.org/T415322)
[23:41:05] (update) raymond-ndibe: support publishing continuous jobs to the internet [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/142 (https://phabricator.wikimedia.org/T388092)
[23:42:31] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] (match_variants_of_the_same_image) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322)
[23:50:15] (update) aghirelli: Draft: feat(linter): add new rules [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/10
[23:51:56] (update) aghirelli: Draft: feat(linter): add new rules [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/10
[23:55:39] cloud-services-team, Toolforge: "toolforge build logs" shows error when logs have expired - https://phabricator.wikimedia.org/T415324 (LucasWerkmeister) NEW
[23:55:48] cloud-services-team, Toolforge: "toolforge build logs" shows error when logs have expired - https://phabricator.wikimedia.org/T415324#11547259 (LucasWerkmeister)
[23:56:30] (update) raymond-ndibe: values.yaml: replace job image variants with webservice image variants [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/18 (https://phabricator.wikimedia.org/T415322)
[23:58:10] (update) raymond-ndibe: support publishing continuous jobs to the internet [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/142 (https://phabricator.wikimedia.org/T388092)