[00:22:16] Toolforge (Toolforge iteration 19), Epic: [cicd] Streamline toolforge cli deployment and external contributor ci flows - https://phabricator.wikimedia.org/T392524#10808952 (Raymond_Ndibe) a:Raymond_Ndibe
[00:22:17] Toolforge (Toolforge iteration 19), Epic: [cicd] Streamline toolforge cli deployment and external contributor ci flows - https://phabricator.wikimedia.org/T392524#10808953 (Raymond_Ndibe) a:Raymond_Ndibe→None
[00:23:19] Toolforge (Toolforge iteration 19), Epic: [cicd] Streamline toolforge cli deployment and external contributor ci flows - https://phabricator.wikimedia.org/T392524#10808954 (Raymond_Ndibe) a:Raymond_Ndibe
[00:25:10] cloud-services-team, Toolforge (Toolforge iteration 19), Patch-For-Review: [jobs-api] Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#10808958 (Raymond_Ndibe) Open→In progress
[01:14:34] FIRING: DiskSpace: Disk space clouddumps1001:9100:/ 3.643% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[01:19:25] PROBLEM - Disk space on clouddumps1001 is CRITICAL: DISK CRITICAL - free space: / 16324 MB (3% inode=99%): /tmp 16324 MB (3% inode=99%): /var/tmp 16324 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=clouddumps1001&var-datasource=eqiad+prometheus/ops
[01:59:25] RECOVERY - Disk space on clouddumps1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=clouddumps1001&var-datasource=eqiad+prometheus/ops
[01:59:34] RESOLVED: DiskSpace: Disk space clouddumps1001:9100:/ 3.156% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:08:34] FIRING: DiskSpace: Disk space clouddumps1001:9100:/ 5.601% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:38:34] RESOLVED: DiskSpace: Disk space clouddumps1001:9100:/ 5.604% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:53:34] cloud-services-team, Toolforge: Ssh to toolforge failing with "Connection closed by 185.15.56.62 port 22" - https://phabricator.wikimedia.org/T393829 (Alien333) NEW
[08:11:32] cloud-services-team, Toolforge: Ssh to toolforge failing with "Connection closed by 185.15.56.62 port 22" - https://phabricator.wikimedia.org/T393829#10809158 (taavi) →Duplicate dup:T393732
[08:11:36] cloud-services-team, Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10809160 (taavi)
[08:15:18] cloud-services-team, Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10809167 (taavi) Based on an IRC discussion yesterday, I've disabled Puppet on tools-bastion-13 and hand-updated the sssd config to use codfw LDAP replicas in the hopes that those...
[08:21:32] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: project-proxy-puppetmaster-01.project-proxy.eqiad.wmflabs is about to expire in 15d 18h 14m 54s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[14:32:15] cloud-services-team, Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10809254 (taavi) Seemingly right now processing sudo rules is the main thing that's failing. A few things come to my mind: * The most obvious thing is to raise the timeout (`ldap_s...
[17:41:42] cloud-services-team, Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10809396 (LucasWerkmeister) FWIW, even though systemd complains if you try to restart `sssd-sudo.service` – `lang=shell-session root@tools-bastion-13:~# systemctl restart sssd-sud...
[18:18:07] (merge) eliza189: Eliza data update [toolforge-repos/miss-search] (linkhere_branch) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/3
[19:05:23] cloud-services-team, Data-Services, Wikifunctions, Abstract Wikipedia team (25Q4 (Apr–Jun)), Essential-Work: Make wikifunctionsclient_usage table available on cloud wiki replicas - https://phabricator.wikimedia.org/T392475#10809459 (DSantamaria) @Jdforrester-WMF, do we need to do something he...