[00:01:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[00:06:56] FIRING: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:27:28] 10Toolforge (Quota-requests): Request increased build quota for toc Toolforge tool - https://phabricator.wikimedia.org/T398780#11007417 (10Kanashimi) I try `cpu: 0.25` and it works. Thank you.
[01:01:56] RESOLVED: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:09:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:59:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:00:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:05:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:20:56] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:01:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-71 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:21:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-71 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:30:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[06:35:34] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[08:24:54] 10cloud-services-team (FY2025/26-Q1): WMCS Offboarding: Chuck Onwumelu - https://phabricator.wikimedia.org/T399068#11008159 (10fnegri)
[08:26:20] 10cloud-services-team (FY2025/26-Q1): WMCS Offboarding: Chuck Onwumelu - https://phabricator.wikimedia.org/T399068#11008166 (10fnegri) > Try sending an email in Gmail and verify that the account shows up as "deactivated" This method does not work anymore. I used to see some people with a gray "deactivated" icon...
[08:26:34] 10cloud-services-team (FY2025/26-Q1): WMCS Offboarding: Chuck Onwumelu - https://phabricator.wikimedia.org/T399068#11008177 (10fnegri) 05In progress→03Resolved
[09:44:04] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1073.eqiad.wmnet}' (T399212)
[09:44:11] T399212: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212
[09:46:46] PROBLEM - Host cloudvirt1073 is DOWN: PING CRITICAL - Packet loss = 100%
[09:47:50] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1073.eqiad.wmnet}' (T399212)
[09:47:58] RECOVERY - Host cloudvirt1073 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms
[09:49:00] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212#11008413 (10fnegri) Merged the patch above and tested that it works by rebooting cloudvirt1073. Before reboot: ` fnegri@cloudvirt1073:~$ sudo cat /proc/sys/net/nf_c...
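(For context on the T399212 fix discussed above: a minimal sketch of the runtime-plus-persistent sysctl approach, with a placeholder value and drop-in file name; the actual limit and the merged Puppet patch are in the task.)
  # check the current value, as done in the task comment
  $ cat /proc/sys/net/netfilter/nf_conntrack_max
  # raise it immediately, without a reboot (placeholder value)
  $ sudo sysctl -w net.netfilter.nf_conntrack_max=1048576
  # persist it across reboots via a sysctl.d drop-in, then reload all sysctl config
  $ echo 'net.netfilter.nf_conntrack_max = 1048576' | sudo tee /etc/sysctl.d/99-nf-conntrack.conf
  $ sudo sysctl --system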
[09:49:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1073 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[09:52:14] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS: nf_conntrack_max is not set at boot in cloudvirts - https://phabricator.wikimedia.org/T399212#11008428 (10fnegri) 05In progress→03Resolved I applied the setting on all other cloudvirts without reboot, by running: ` sudo cumin 'cloudvirt*' 'sysctl --sy...
[09:58:04] 06cloud-services-team, 10Cloud-VPS: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#11008444 (10fnegri) 05In progress→03Resolved The alert stopped firing after I updated the nf_conntrack setting for cloudvirt1067 in T399212#10992507 In {T399212} I...
[09:58:15] 06cloud-services-team, 10Cloud-VPS: MaxConntrack Max conntrack at 95.11% on cloudvirt1067:9100 - https://phabricator.wikimedia.org/T399050#11008446 (10fnegri)
[10:06:45] 10Toolforge (Toolforge iteration 22): [lima-kilo] foxtrot ldap docker image is using buster and fails to build - https://phabricator.wikimedia.org/T399701 (10dcaro) 03NEW
[10:09:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1073 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[10:20:59] 06cloud-services-team, 10Cloud-VPS: [wmcs-cookbooks] cloudvirt.safe_reboot triggers NeutronAgentDown alert - https://phabricator.wikimedia.org/T399705 (10fnegri) 03NEW
[11:51:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:29:44] (03open) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[13:36:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[13:43:21] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[13:44:06] 06cloud-services-team, 10Cloud-VPS: Add k8s_admin, k8s_developer, and k8s_viewer roles expected by default Magnum config for Kubernetes auth using Keystone auth - https://phabricator.wikimedia.org/T399488#11009297 (10Andrew) 05Open→03Resolved a:03Andrew ` openstack role list | grep k8s | 70215d932207...
[13:47:56] 06cloud-services-team, 10Toolforge, 07Documentation: Compile the frequently used webpage design snippets for Tools authors - https://phabricator.wikimedia.org/T202949#11009308 (10TBurmeister)
[13:53:11] FIRING: [2x] ProjectProxyMainProxyCertificateExpiry: Certificate for proxy on proxy-5 is about to expire (10d 23h 29m 52s to expiration) - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyCertificateExpiry
[13:53:37] 06cloud-services-team, 10Striker: Make it possible to maintain Toolforge tools via an easy-to-use web interface instead of a command-line one - https://phabricator.wikimedia.org/T332480#11009328 (10TBurmeister)
[13:56:45] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[14:04:13] (03open) 10raymond-ndibe: [typing] use native types where possible [repos/cloud/toolforge/maintain-harbor] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/50
[14:05:19] (03CR) 10Essa237: [C:03+1] eliminate shared productions duplicates [labs/tools/WdTmCollab] - 10https://gerrit.wikimedia.org/r/1169783 (owner: 10Jacob4code)
[14:05:22] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[14:06:07] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[14:06:46] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[14:25:12] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[15:00:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:03:45] (03open) 10dcaro: foxtrot_ldap: fix bug when accounts already exist [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/257
[15:20:04] (03update) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[15:20:52] (03update) 10dcaro: foxtrot_ldap: fix bug when accounts already exist [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/257
[15:34:02] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (admin role) unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731 (10bd808) 03NEW
[15:45:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:59:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[15:59:34] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[16:02:48] (03approved) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[16:02:50] (03merge) 10dcaro: container: use bitnami/openldap [repos/cloud/toolforge/foxtrot-ldap] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/foxtrot-ldap/-/merge_requests/9
[16:09:28] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11009980 (10fnegri)
[16:10:12] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1992 slow ops - https://phabricator.wikimedia.org/T399315#11009989 (10fnegri)
[16:10:20] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11009990 (10fnegri)
[16:10:21] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1992 slow ops - https://phabricator.wikimedia.org/T399315#11009991 (10fnegri) 05Open→03Resolved a:03fnegri
[16:10:30] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 987 slow ops - https://phabricator.wikimedia.org/T399309#11009995 (10fnegri)
[16:10:44] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11009996 (10fnegri)
[16:10:46] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 987 slow ops - https://phabricator.wikimedia.org/T399309#11009997 (10fnegri) 05Open→03Resolved a:03fnegri
[16:10:54] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 367 slow ops - https://phabricator.wikimedia.org/T399299#11010000 (10fnegri) 05Open→03Resolved
[16:11:01] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 367 slow ops - https://phabricator.wikimedia.org/T399299#11010001 (10fnegri)
[16:11:09] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010002 (10fnegri)
[16:11:14] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 5134 slow ops - https://phabricator.wikimedia.org/T399288#11010003 (10fnegri)
[16:11:22] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010004 (10fnegri)
[16:11:24] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 5134 slow ops - https://phabricator.wikimedia.org/T399288#11010005 (10fnegri) 05Open→03Resolved a:03fnegri
[16:11:44] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 847 slow ops - https://phabricator.wikimedia.org/T399287#11010008 (10fnegri)
[16:11:51] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010009 (10fnegri)
[16:11:52] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 847 slow ops - https://phabricator.wikimedia.org/T399287#11010010 (10fnegri) 05Open→03Resolved a:03fnegri
[16:11:55] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1 slow ops - https://phabricator.wikimedia.org/T399284#11010013 (10fnegri)
[16:12:02] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1678 slow ops - https://phabricator.wikimedia.org/T399267#11010015 (10fnegri)
[16:12:04] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010014 (10fnegri)
[16:12:08] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 908 slow ops - https://phabricator.wikimedia.org/T399262#11010017 (10fnegri)
[16:12:10] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010016 (10fnegri)
[16:12:15] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010018 (10fnegri)
[16:12:17] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1451 slow ops - https://phabricator.wikimedia.org/T399260#11010019 (10fnegri)
[16:12:24] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010020 (10fnegri)
[16:12:26] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 30 slow ops - https://phabricator.wikimedia.org/T399255#11010021 (10fnegri)
[16:12:33] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010022 (10fnegri)
[16:19:40] 06cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T399360#11010060 (10fnegri) 05Open→03Resolved a:03fnegri cloudcephosd1013 had a hard drive failure, see {T399366}. cloudcephosd1036 had a single error message logged on 2025-07-11 15:56 UTC during the outage ({T399281}). Unfortuna...
[16:19:53] 06cloud-services-team: KernelErrors - https://phabricator.wikimedia.org/T399360#11010066 (10fnegri)
[16:20:02] 10cloud-services-team (FY2025/26-Q1), 10Cloud-VPS, 10Toolforge (Toolforge iteration 22): 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281#11010067 (10fnegri)
[16:20:25] 06cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T399189#11010068 (10fnegri) 05Open→03Resolved a:03fnegri
[16:22:38] 06cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T399189#11010078 (10fnegri) 05Resolved→03Open IPv6 was unavailable for about 1 hour. Reopening as maybe it's worth a quick investigation. {F64824760}
[16:22:53] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 30 slow ops - https://phabricator.wikimedia.org/T399255#11010086 (10fnegri) 05Open→03Resolved a:03fnegri
[16:22:58] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1451 slow ops - https://phabricator.wikimedia.org/T399260#11010090 (10fnegri) 05Open→03Resolved a:03fnegri
[16:23:01] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 908 slow ops - https://phabricator.wikimedia.org/T399262#11010093 (10fnegri) 05Open→03Resolved a:03fnegri
[16:23:07] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1678 slow ops - https://phabricator.wikimedia.org/T399267#11010096 (10fnegri) 05Open→03Resolved a:03fnegri
[16:23:13] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 1 slow ops - https://phabricator.wikimedia.org/T399284#11010099 (10fnegri) 05Open→03Resolved a:03fnegri
[16:23:18] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 847 slow ops - https://phabricator.wikimedia.org/T399287#11010102 (10fnegri) 05Resolved→03Open
[16:23:32] 06cloud-services-team: CephSlowOps Ceph cluster in eqiad has 847 slow ops - https://phabricator.wikimedia.org/T399287#11010105 (10fnegri) 05Open→03Resolved
[16:29:26] 06cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T399189#11010133 (10fnegri) Actually only 5 minutes, the previous graph had a reporting artifact due to using `avg_over_time[1h]`: {F64825930}
[16:32:00] 06cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T399189#11010151 (10fnegri) 05Open→03Resolved The probe failed 3 times on IPv6 and one time over IPv4 over the past week. {F64826057} I'm resolving for now, if it happens again a new task will be opened and we can investigate more.
[16:41:20] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (admin role) unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731#11010170 (10bd808) Poking around in Puppet to see the Ope...
[17:08:09] (03update) 10dcaro: foxtrot_ldap: fix bug when accounts already exist [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/257
[17:17:24] 10Toolforge (Toolforge iteration 22): [lima-kilo] foxtrot ldap docker image is using buster and fails to build - https://phabricator.wikimedia.org/T399701#11010293 (10dcaro) 05Open→03Resolved p:05Triage→03High a:03dcaro Merged the foxtrot-ldap upgrade code, tested with lima-kilo and it's working, I'...
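(A quick illustration of the `avg_over_time[1h]` reporting artifact mentioned in the 16:29:26 comment on T399189; the metric name, label, and Prometheus host below are assumptions based on a standard blackbox-exporter setup, not taken from the task.)
  # A probe that is down for 5 of the trailing 60 minutes averages out to ~0.92,
  # so a 1h-averaged panel smears a short outage across a full hour:
  #   avg_over_time(probe_success[1h]) = 55/60 ≈ 0.92
  # Querying the raw series instead shows the real ~5-minute gap:
  $ curl -sG 'https://prometheus.example.org/api/v1/query' \
      --data-urlencode 'query=probe_success{module="http_admin_toolforge_org_ip4"}'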
[17:43:05] 10Cloud-VPS (Quota-requests), 06Moderator-Tools-Team, 10Wikilink-Tool: Request to increase Object Storage capacity - Wikilink project - https://phabricator.wikimedia.org/T399746 (10Scardenasmolinar) 03NEW
[17:46:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[17:48:07] (03update) 10dcaro: foxtrot_ldap: fix bug when accounts already exist [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/257
[17:49:05] (03update) 10dcaro: foxtrot_ldap: fix bug when accounts already exist [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/257
[18:25:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:30:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:32:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[18:32:03] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.reactivate (exit_code=99)
[18:33:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.reactivate
[18:34:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[18:38:00] PROBLEM - Host cloudcephosd1006 is DOWN: PING CRITICAL - Packet loss = 100%
[18:38:30] RECOVERY - Host cloudcephosd1006 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[18:44:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.reactivate (exit_code=0)
[18:46:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:51:18] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (formerly 'projectadmin') unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731#11010683 (10Andrew)
[18:56:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:01:56] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:22:47] (03open) 10lucaswerkmeister: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/59 (https://phabricator.wikimedia.org/T374780)
[19:23:03] 06cloud-services-team, 10Tool-quickcategories, 10Toolforge, 13Patch-For-Review: Relax restrictions on toolforge envvar names - https://phabricator.wikimedia.org/T374780#11010762 (10LucasWerkmeister) ^ Let’s try the low-hanging fruit (already backed by T374780#10162297) first then, I guess.
[19:23:22] (03update) 10lucaswerkmeister: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/59 (https://phabricator.wikimedia.org/T374780)
[19:25:52] (03update) 10lucaswerkmeister: openapi: Allow lowercase ASCII letters too [repos/cloud/toolforge/envvars-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/59 (https://phabricator.wikimedia.org/T374780)
[19:42:57] 10Tool-archive-externa-links, 10Datasets-Archiving, 10Pywikibot, 10Wikidata: Creation of the "ArchivingBot" for Automatic URL Archiving on Wikidata - https://phabricator.wikimedia.org/T389599#11010872 (10paulwiki)
[20:03:32] 10Cloud-VPS (Debian Buster Deprecation): Cloud VPS "auditlogging" project Buster deprecation - https://phabricator.wikimedia.org/T367522#11010895 (10Southparkfan) >>! In T367522#10999101, @Aklapper wrote: > @Andrew / @Southparkfan: Can this ticket be resolved by now, or is there more to do? The VM still exists....
[20:20:57] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (formerly 'projectadmin') unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731#11010935 (10bd808)
[20:20:58] 06cloud-services-team, 10Cloud-VPS: Review our handling of keystone 'member' role (previously known as 'projectadmin') - https://phabricator.wikimedia.org/T396016#11010936 (10bd808)
[20:34:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[20:39:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[21:44:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[21:49:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-44 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[22:13:22] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (formerly 'projectadmin') unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731#11011261 (10Andrew) I have confirmed that th...
[22:27:02] 06cloud-services-team, 10Cloud-VPS, 10Continuous-Integration-Infrastructure (Zuul upgrade): Cloud VPS project member (formerly 'projectadmin') unable to grant k8s_admin, k8s_developer, k8s_viewer via `openstack role add` - https://phabricator.wikimedia.org/T399731#11011323 (10bd808) >>! In T399731#11011261,...
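(For reference on T399731 above: a sketch of the kind of grant being attempted, using placeholder project and user names; the roles themselves were created in T399488, and per the task title the `role add` step fails when run by a regular project member rather than a cloud admin.)
  # list the k8s_* roles created in T399488
  $ openstack role list | grep k8s
  # attempt to grant one of them to a user within a project (placeholder names)
  $ openstack role add --project my-project --user some-user k8s_admin
  # verify the assignment
  $ openstack role assignment list --project my-project --user some-user --names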