[00:29:40] FIRING: [7x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:11:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:29:40] FIRING: [7x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:29:40] FIRING: [7x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:55:32] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11547901 (ayounsi) Probably the same root cause as {T412143}. We might need to temporarily ignore those log messages until JTAC provides us with a permanent fix.
[10:04:33] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548046 (Gehel)
[10:53:56] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548385 (Marostegui) Would it be possible to truncate it for now?
[11:08:57] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548408 (ayounsi) >>! In T415270#11548385, @Marostegui wrote: > Would it be possible to truncate it for now? Yep
[11:11:12] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548413 (Marostegui) Open→Resolved a:Marostegui Thanks ` cumin2024@db1213.eqiad.wmnet[librenms]> truncate table syslog; Query OK, 0 rows affected (0....
[11:12:34] FIRING: DiskSpace: Disk space config-master1001:9100:/ 6.793% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=config-master1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:14:33] looking at config-master1001
[11:15:21] should recover soon
[11:17:34] RESOLVED: DiskSpace: Disk space config-master1001:9100:/ 6.703% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=config-master1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[11:24:25] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:35:46] netops, Infrastructure-Foundations, Traffic: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11548459 (cmooney) p:Medium→Low Moving this to low priority as the issue appears to be resolved. However I'm keeping it open as I want to follo...
[12:19:25] FIRING: [10x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:24:25] FIRING: [10x] SystemdUnitFailed: gitlab-package-puller.service on apt-staging2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:30:10] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548597 (cmooney) So looking at dse-k8s-worker1013 it has now been up for 1 day 18 hours, yet we st...
[13:20:28] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548743 (JAllemandou) It seems that the `dse-k8s-worker1019` still has the problem: {F71597128}
[13:24:25] FIRING: [10x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:25] FIRING: [10x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:39:56] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11549260 (jhathaway) @Nardog Microsoft appears to have resolved the issue, are you receiving ticket updates now?
[17:29:25] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:29:40] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed