[00:05:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 82.35% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:09:48] <wikibugs>	 (update) raymond-ndibe: Draft: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[00:11:29] <wikibugs>	 (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[00:13:09] <wikibugs>	 (update) raymond-ndibe: [maintain-harbor.jobs] manage policies and robot accounts [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/47 (https://phabricator.wikimedia.org/T360509)
[00:55:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 80.33% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[04:50:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[04:54:52] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[05:32:00] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[05:32:12] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T399144 (phaultfinder) NEW
[06:12:59] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10990367 (elukey) @Jclark-ctr IIUC it was a temporary failure right?
[08:12:06] <wikibugs>	 cloud-services-team, Cloud-VPS: Prevent creation of VMs on the old ipv4 network - https://phabricator.wikimedia.org/T399127#10990858 (dcaro) >>! In T399127#10990055, @bd808 wrote: >>>! In T396936#10945434, @bd808 wrote: >>>>! In T396936#10937326, @taavi wrote: >>> If Magnum doesn't support dual-stack clu...
[08:16:45] <wikibugs>	 (merge) dcaro: packaging: change name to match the rest of clis [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/110
[08:27:38] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10990902 (fnegri) It's indeed `diffscan02` (`172.16.3.44`) causing the spike around 00:00 UTC:  ` Wed Jul  9 11:57:44 PM UTC 2025 conntrack v1.4.7 (conntrack-tools): 13424 flow entries...
[08:56:28] <wikibugs>	 (open) arthurtaylor: Add deprecation notice [toolforge-repos/wcsg-workboard-log] - https://gitlab.wikimedia.org/toolforge-repos/wcsg-workboard-log/-/merge_requests/1
[08:56:40] <wikibugs>	 (merge) arthurtaylor: Add deprecation notice [toolforge-repos/wcsg-workboard-log] - https://gitlab.wikimedia.org/toolforge-repos/wcsg-workboard-log/-/merge_requests/1
[09:11:30] <wikibugs>	 (open) dcaro: d/changelog: bump to 16.1.15 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/111 (https://phabricator.wikimedia.org/T399080)
[09:26:53] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
[09:26:58] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
[09:30:30] <wikibugs>	 (PS1) David Caro: toolforge.component.deploy: rename jobs-cli [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1167829
[09:30:50] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
[09:31:14] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
[09:32:15] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[10:00:29] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10991125 (fnegri) The alert started firing daily on 2025-06-27 because the limit changed:  {F63744349}
[10:13:09] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4): WMCS Offboarding: Chuck Onwumelu - https://phabricator.wikimedia.org/T399068#10991147 (fnegri) Resolved→In progress
[10:15:19] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4): WMCS Offboarding: Chuck Onwumelu - https://phabricator.wikimedia.org/T399068#10991168 (fnegri)
[10:17:06] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4): Create WMCS offboarding checklist - https://phabricator.wikimedia.org/T398972#10991175 (fnegri) In progress→Resolved Marking as Resolved, feel free to edit the template directly if you find something is missing.
[10:22:59] <wikibugs>	 cloud-services-team, Toolforge, MediaWiki-Action-API: APIhighlimits doesn't work on my bot (with bot password) since July 15, 2020 - https://phabricator.wikimedia.org/T258057#10991207 (Aklapper)
[10:37:00] <jinxer-wm>	 RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[10:42:31] <wikibugs>	 (open) dcaro: toolforge_get_version: use the new jobs-cli package name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/878
[10:44:53] <wikibugs>	 (approved) dcaro: toolforge_get_version: use the new jobs-cli package name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/878
[10:44:59] <wikibugs>	 (merge) dcaro: toolforge_get_version: use the new jobs-cli package name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/878
[10:45:00] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
[10:45:26] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
[11:25:03] <icinga-wm>	 PROBLEM - Host cloudcephosd1007 is DOWN: PING CRITICAL - Packet loss = 100%
[11:28:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[11:30:05] <icinga-wm>	 RECOVERY - Host cloudcephosd1007 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[11:31:44] <wikibugs>	 cloud-services-team, Infrastructure-Foundations, netops, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#10991485 (cmooney) Stalled→Resolved a:cmooney I am going to close this one (please ping me if that is hasty!) as I've o...
[11:32:28] <wmcs-alerts>	 FIRING: InstanceDown: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:44:01] <wikibugs>	 (approved) dcaro: jobs-cli: use the new package name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/255
[11:44:26] <wikibugs>	 (merge) dcaro: jobs-cli: use the new package name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/255
[11:45:44] <wikibugs>	 (open) dcaro: ansible: use the new misctools package name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/256
[11:46:24] <wikibugs>	 (approved) dcaro: ansible: use the new misctools package name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/256
[11:49:50] <wm-bot2>	 !log dcaro@hephaestus cloudinfra START - Cookbook wmcs.openstack.cloudvirt.vm_console
[11:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[11:51:00] <wm-bot2>	 !log dcaro@hephaestus cloudinfra END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[11:51:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[11:51:04] <wm-bot2>	 !log dcaro@hephaestus cloudinfra START - Cookbook wmcs.openstack.cloudvirt.vm_console
[11:51:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[11:52:02] <wm-bot2>	 !log dcaro@hephaestus cloudinfra END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[11:52:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[11:52:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:52:39] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10991577 (cmooney) I created the below task to continue the discussion of how we set up the interfaces for these hosts, and cop...
[12:12:40] <wikibugs>	 (approved) dcaro: d/changelog: bump to 16.1.15 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/111 (https://phabricator.wikimedia.org/T399080)
[12:12:44] <wikibugs>	 (merge) dcaro: d/changelog: bump to 16.1.15 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/111 (https://phabricator.wikimedia.org/T399080)
[12:15:57] <wikibugs>	 (CR) David Caro: [C:+2] "Used to deploy the latest toolforge-misctools-cli and jobs-cli, merging" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[12:16:20] <wikibugs>	 (CR) David Caro: [C:+2] "Used to deploy jobs-cli, needed now, merging" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1167829 (owner: David Caro)
[12:17:14] <wikibugs>	 (merge) dcaro: ansible: use the new misctools package name [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/256
[12:20:46] <wikibugs>	 (Merged) jenkins-bot: toolforge.component.deploy: support multiarch packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1166830 (https://phabricator.wikimedia.org/T398016) (owner: David Caro)
[12:21:21] <wikibugs>	 (Merged) jenkins-bot: toolforge.component.deploy: rename jobs-cli [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1167829 (owner: David Caro)
[12:25:13] <wikibugs>	 Toolforge (Toolforge iteration 21), Patch-For-Review: [lima-kilo,misctools] no arm64 version for mac-os based installations - https://phabricator.wikimedia.org/T398016#10991647 (dcaro) Open→In progress
[12:25:16] <wikibugs>	 Toolforge (Toolforge iteration 21), Patch-For-Review: [lima-kilo,misctools] no arm64 version for mac-os based installations - https://phabricator.wikimedia.org/T398016#10991650 (dcaro) In progress→Resolved
[12:27:12] <wikibugs>	 (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/41
[12:27:12] <wikibugs>	 (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/4
[12:34:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[12:35:29] <wikibugs>	 Toolforge (Toolforge iteration 21): [clis] standardize the package names - https://phabricator.wikimedia.org/T399080#10991666 (dcaro)
[12:37:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[12:42:13] <icinga-wm>	 PROBLEM - Host cloudcephosd1007 is DOWN: PING CRITICAL - Packet loss = 100%
[12:42:41] <icinga-wm>	 RECOVERY - Host cloudcephosd1007 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[12:42:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[12:47:28] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[12:58:35] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_from_codfw_ip6) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:58:43] <wikibugs>	 cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T399189 (phaultfinder) NEW
[12:58:53] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:00:54] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[13:01:50] <wmcs-alerts>	 FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:02:03] <wikibugs>	 Tools, Pywikibot, Pywikibot-Scripts: Implement a webservice at toolforge.org based on create_isbn_edition script - https://phabricator.wikimedia.org/T379488#10991840 (Xqt) Open→Declined See T398140
[13:02:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:03:10] <wmcs-alerts>	 FIRING: ProjectProxyMainProxyDown: Proxy service address is unreachable - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MainProxyDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[13:03:11] <icinga-wm>	 PROBLEM - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 324 bytes in 60.006 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:03:35] <jinxer-wm>	 RESOLVED: [3x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_from_codfw_ip6) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:03:39] <icinga-wm>	 RECOVERY - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 26.026 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[13:03:53] <wmcs-alerts>	 FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:06:49] <wmcs-alerts>	 FIRING: [4x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:07:56] <jinxer-wm>	 RESOLVED: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:08:09] <wm-bot2>	 !log dcaro@hephaestus project-proxy START - Cookbook wmcs.openstack.cloudvirt.vm_console
[13:08:10] <wmcs-alerts>	 RESOLVED: ProjectProxyMainProxyDown: Proxy service address is unreachable - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MainProxyDown  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyDown
[13:08:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[13:08:14] <wm-bot2>	 !log dcaro@hephaestus project-proxy END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
[13:08:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[13:08:53] <wmcs-alerts>	 RESOLVED: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:11:50] <wmcs-alerts>	 RESOLVED: [4x] ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_admin_beta_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:13:31] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10991863 (fnegri) The value for nf_conntrack_max was increased in {T355222} and again in {T373816}. The current value set in `modules/profile/manifests/openstack/base/nova/compute/servi...
[13:33:29] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10991952 (fnegri) Grafana shows the value changed on all cloudvirts, but not at the same time. I think the value failed to be reapplied after the latest reboot of cloudvirts:  ` root@cl...
[13:43:00] <wikibugs>	 (PS1) David Caro: tools-webservice: remove as it moved to gitlab [labs/codesearch] - https://gerrit.wikimedia.org/r/1167865
[13:44:44] <wikibugs>	 (merge) dcaro: deploy: allow retrieving a deploy with a token [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/109
[13:45:10] <wikibugs>	 (update) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[13:47:14] <wikibugs>	 (merge) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[13:47:48] <wikibugs>	 (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.135-20250710134503-c7e0923f [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T398485)
[13:50:06] <wikibugs>	 (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.136-20250710134726-f76face5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485)
[13:50:09] <wikibugs>	 (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.136-20250710134726-f76face5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485)
[13:57:10] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10992160 (Jclark-ctr) >>! In T394333#10990367, @elukey wrote: > @Jclark-ctr IIUC it was a temporary failure right?  yes that wa...
[13:59:28] <wmcs-alerts>	 FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance tools-puppetserver-01 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[14:00:32] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10992170 (fnegri) The last change to the value was actually in {T387179} with patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/1124821 setting the value to `33554432`, which i...
[14:01:04] <wikibugs>	 (update) dcaro: toolconfig: make config_version explicitly nullable [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/108
[14:02:57] <wikibugs>	 (merge) dcaro: toolconfig: make config_version explicitly nullable [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/108
[14:05:11] <wikibugs>	 (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.137-20250710140310-6d0932f6 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485)
[14:05:14] <wikibugs>	 (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.137-20250710140310-6d0932f6 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485)
[14:06:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:24:28] <wmcs-alerts>	 RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance tools-puppetserver-01 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[14:29:01] <wikibugs>	 cloud-services-team, Toolforge: toolsbeta paging - https://phabricator.wikimedia.org/T396038#10992349 (dcaro) Was opened due to {T398715} (adding for context)
[14:31:52] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10992376 (fnegri) `sysctl --system` does fix the issue:  ` root@cloudvirt1067:~# cat /proc/sys/net/nf_conntrack_max 524288  root@cloudvirt1067:~# sysctl --system  root@cloudvirt1067:~#...
[14:54:02] <wikibugs>	 cloud-services-team, Cloud-VPS: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212 (fnegri) NEW
[14:55:06] <wikibugs>	 cloud-services-team: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#10992502 (fnegri) I created a subtask {T399212} to address the root cause of the alert described by this task.
[14:57:23] <wikibugs>	 cloud-services-team, Cloud-VPS: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212#10992507 (fnegri) I'm not sure what is loading the `nf_conntrack` module, because the module is loaded eventually, and I can apply the values with `sysctl --system`:  ` root@cloud...
[15:00:47] <wikibugs>	 cloud-services-team, Cloud-VPS: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212#10992512 (fnegri) Maybe it's loaded by openvswitch:  ` root@cloudvirt1067:~# lsmod |grep nf_conntrack nf_conntrack_netlink    57344  0 nfnetlink              20480  3 nfnetlink_ct...
[15:03:46] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[15:03:47] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.reactivate (exit_code=97)
[15:03:51] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[15:05:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:07:26] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[15:07:28] <wikibugs>	 (approved) dcaro: components-api: bump to 0.0.137-20250710140310-6d0932f6 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:07:34] <wikibugs>	 (merge) dcaro: components-api: bump to 0.0.137-20250710140310-6d0932f6 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/879 (https://phabricator.wikimedia.org/T397724 https://phabricator.wikimedia.org/T398485) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[15:11:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:36:35] <wikibugs>	 (Abandoned) Wandji collins: Merge branch 'main' of https://gerrit.wikimedia.org/r/labs/tools/wdaudiolex-be [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132142 (owner: UnknownStrange)
[16:40:30] <wikibugs>	 (Abandoned) Wandji collins: Remove duplicate routes copy file [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132112 (https://phabricator.wikimedia.org/T386326) (owner: Juniorbesong)
[16:43:48] <wikibugs>	 (CR) Eugene233: "Change seems to be abandoned. Is this still relevant? Then it could be restored." [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132112 (https://phabricator.wikimedia.org/T386326) (owner: Juniorbesong)
[16:45:57] <wikibugs>	 cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: [ceph] Upgrade to v16 - https://phabricator.wikimedia.org/T306820#10992899 (Andrew) ceph eqiad11 is now running 16.2.15 on all nodes. One mon and one OSD are on bookworm...
[17:04:19] <wikibugs>	 cloud-services-team, Toolforge (Toolforge iteration 21): [components-api,beta] CI pipelines should wait until Toolforge deployment is 100% successful - https://phabricator.wikimedia.org/T398485#10992948 (dcaro) This should be already available: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_yo...
[17:04:42] <wikibugs>	 cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure (Zuul upgrade): ZuulDevOpsBot user can create but not delete a cluster template - https://phabricator.wikimedia.org/T396932#10992949 (bd808) Open→Invalid The tofu automation using ZuulDevOpsBot's credentials was able to...
[17:54:16] <wikibugs>	 cloud-services-team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: cloudcephosd10[48-51] service implementation - https://phabricator.wikimedia.org/T395910#10993075 (cmooney) We may need to hold off on this for now.  The requirement for jumbo frames poses a difficulty for the plan as the parent i...
[18:45:15] <wikibugs>	 (CR) Wandji collins: "I am doing a cleanup; this patch appeared multiple times, and the change is outdated." [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132112 (https://phabricator.wikimedia.org/T386326) (owner: Juniorbesong)
[18:56:24] <wikibugs>	 (Abandoned) Wandji collins: T388196 BE - Route to add best match audio file to lexeme [labs/tools/wdaudiolex-be] - https://gerrit.wikimedia.org/r/1132178 (owner: UnknownStrange)
[19:02:36] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10993245 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host cloudcephosd1048.eq...
[19:10:29] <wikibugs>	 cloud-services-team, DC-Ops, ops-eqiad, SRE: "SSD firmware fetch from DELL website not yet implemented" - https://phabricator.wikimedia.org/T399234 (Andrew) NEW
[19:34:46] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of Clipi VPS project - https://phabricator.wikimedia.org/T399237 (IhsaanKhan) NEW
[19:41:12] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of Clipi VPS project - https://phabricator.wikimedia.org/T399237#10993382 (JJMC89) a:IhsaanKhan→None
[19:46:21] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install cloudcephosd10[48-51] - https://phabricator.wikimedia.org/T394333#10993386 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host cloudcephosd1048.eqiad....
[19:50:08] <wikibugs>	 (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/41 (owner: l10n-bot)
[19:50:10] <wikibugs>	 (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/41 (owner: l10n-bot)
[19:56:30] <wikibugs>	 cloud-services-team, Toolforge: Missing bash completion for `become` - https://phabricator.wikimedia.org/T399238 (LucasWerkmeister) NEW
[19:58:24] <wikibugs>	 cloud-services-team, Toolforge: Missing bash completion for `become` - https://phabricator.wikimedia.org/T399238#10993408 (LucasWerkmeister) CCing @dcaro based on [the latest commits in that package](https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/commits/main) (and because Taavi, the...
[19:58:41] <wikibugs>	 (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/4 (owner: l10n-bot)
[19:58:44] <wikibugs>	 (03merge) 10lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - 10https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/4 (owner: 10l10n-bot)
[20:04:41] <wikibugs>	 06cloud-services-team, 10Toolforge: Missing bash completion for `become` - https://phabricator.wikimedia.org/T399238#10993422 (10bd808) I wonder if https://gitlab.wikimedia.org/repos/cloud/toolforge/misctools-cli/-/commit/0e495dbffcfe7c14cc9b92b6a5cf937dbf795e0a needed to update more `debian/*` files when rena...
[20:09:35] <wikibugs>	 06cloud-services-team, 10Toolforge: Missing bash completion for `become` - https://phabricator.wikimedia.org/T399238#10993440 (10LucasWerkmeister) Maybe… I don’t know how bash-completion files are installed in Debian packages, but they vanished from the installed files in 1.49.2:  `lang=shell-session lucaswerk...
[20:40:29] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993548 (10RobH)
[20:44:53] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993561 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1035.eqia...
[21:28:58] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:29:16] <wikibugs>	 06cloud-services-team, 10Toolforge: Missing bash completion for `become` - https://phabricator.wikimedia.org/T399238#10993703 (10bd808) My theory would be that `debian/misctools.bash-completion` and `debian/misctools.lintian-overrides` need to be renamed to `debian/toolforge-misctools-cli.bash-completion` and...
[21:31:48] <wikibugs>	 06cloud-services-team, 06DC-Ops, 10ops-eqiad, 06SRE: "SSD firmware fetch from DELL website not yet implemented" - https://phabricator.wikimedia.org/T399234#10993706 (10RobH) 05Open→03Resolved a:03RobH IRC Update:  The file it was looking for didn't exist on the cumin1003 host, but does on cumin20...
[21:32:43] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993712 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1035.eqiad.wm...
[21:38:58] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:47:27] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[21:48:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[22:16:55] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993757 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1036.eqia...
[22:54:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[22:55:19] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[22:59:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[23:00:17] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[23:03:16] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993799 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin2002 for host cloudcephosd1036.eqiad.wm...
[23:59:24] <wikibugs>	 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE, 13Patch-For-Review: SSD firmware update for cloudcephosd10[35-41] - https://phabricator.wikimedia.org/T396651#10993838 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin2002 for host cloudcephosd1037.eqia...