[00:06:56] FIRING: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:49:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[00:55:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.995% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:01:56] RESOLVED: SystemdUnitDown: The service unit logrotate.service is in failed status on host cloudgw1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudgw1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:12:11] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:19:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[02:53:03] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudnet2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[02:59:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:09:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:11:57] RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:15:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.952% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:23:34] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
[03:29:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:31:28] RESOLVED: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[03:34:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:35:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-24 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:35:56] RESOLVED: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[03:39:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:40:28] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-24 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[03:41:44] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
[04:29:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:29:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:34:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-12 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:54:18] (CR) Eugene233: [C:+2] Updated README Elaborate steps to run tool on local setup [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1166450 (owner: Jacob4code)
[04:55:59] (Merged) jenkins-bot: Updated README Elaborate steps to run tool on local setup [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1166450 (owner: Jacob4code)
[05:53:57] (open) theleekycauldron: Date header fixes [toolforge-repos/leekbot] - https://gitlab.wikimedia.org/toolforge-repos/leekbot/-/merge_requests/1
[05:54:18] (merge) theleekycauldron: Date header fixes [toolforge-repos/leekbot] - https://gitlab.wikimedia.org/toolforge-repos/leekbot/-/merge_requests/1
[06:53:04] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudnet2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:53:04] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on cloudnet2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[13:53:10] FIRING: [2x] ProjectProxyMainProxyCertificateExpiry: Certificate for proxy on proxy-5 is about to expire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProjectProxyMainProxyCertificateExpiry
[17:30:49] Tools: Catanalysis tool gives an error message for Interslavic Wikipedia in the Incubator - https://phabricator.wikimedia.org/T399377#10998437 (Aklapper) Open→Invalid Hi @IJzeren_Jan, thanks for taking the time to report this. Please follow the instructions on https://meta.toolforge.org/catanalys...
[17:33:50] Tools: Catanalysis: Not possible to specifically analyze Incubator project Wp/isv - https://phabricator.wikimedia.org/T399371#10998446 (Aklapper)
[21:20:34] cloud-services-team, Toolforge, Kubernetes: Unable to load Toolforge job: ERROR: TjfCliError: Unknown error (403 Client Error: Forbidden for url - https://phabricator.wikimedia.org/T399417 (Multichill) NEW
[21:31:31] Cloud-VPS (Project-requests): Request creation of voterlists VPS project - https://phabricator.wikimedia.org/T399418 (SD0001) NEW