[02:34:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:45:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[07:56:37] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11544162 (JAllemandou) >>! In T414460#11542216, @ops-monitoring-bot wrote: > Roll-reboot of nodes in...
[08:45:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[10:34:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:43:36] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11544445 (ops-monitoring-bot) Roll-reboot of nodes in dse-eqiad cluster started by btullis: * dse-k8...
[10:46:13] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11544459 (BTullis) >>! In T414460#11544162, @JAllemandou wrote: >>>! In T414460#11542216, @ops-monit...
[11:29:28] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: Support listing pooled / active authdns hosts (rather than all) - https://phabricator.wikimedia.org/T375014#11544573 (Volans) @MLechvien-WMF I haven't work on this since the original unplanned effort that generated the above patc...
[14:34:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:44:54] XioNoX: topranks: anything I can help with including asking Traffic for reviews for the incident training stuff for netops?
[14:45:12] or in general with anything else. (sadly I won't be there but I at least want to be helpful from here)
[14:45:43] it’s a shame alright :(
[14:46:21] don't worry, the beer I owe carries over
[14:46:22] I’m just afk right now, will ping you when I’m back (Arzhel is off today btw )
[14:46:57] no worries at all, but please let me know if we can help
[14:50:03] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11545354 (A_smart_kitten)
[15:05:06] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270 (Marostegui) NEW
[15:06:35] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11545478 (Marostegui) p:Triage→High
[15:09:02] netops, DBA, Infrastructure-Foundations, observability, SRE: librenms.syslog table size - https://phabricator.wikimedia.org/T349362#11545494 (Marostegui)
[15:09:17] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11545495 (Marostegui)
[15:30:15] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11545586 (cmooney) > At T349362#9267648 we discussed having just 15 days of retention, is that still the case? If it is, maybe we need to reduce it a bit further? T...
[15:32:25] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11545610 (Marostegui) Thank you!
[15:41:42] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11545670 (Aklapper) What does https://phabricator.wikimedia.org/mail/query/inbox/ show for you?
[16:10:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[16:12:46] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11545817 (jhathaway) @Nardog we changed our DMARC policy on wikimedia.org to quarantine on 2026-01-20, so that is probably the cause of the deliverability issu...
[17:04:25] FIRING: [3x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:10:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[17:13:41] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[18:50:44] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11546581 (cmooney) >>! In T412525#11528293, @Jclark-ctr wrote: > @cmooney i have disconnected all the switches @Jclark-ctr I'm ha...
[19:18:41] RESOLVED: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[19:23:38] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11546641 (VRiley-WMF) Hey @cmooney, I was able to check C2 and confirm there is a cable from the managment port on the switch to th...
[19:24:25] FIRING: [5x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:26:03] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11546643 (VRiley-WMF) Please disreguard, I thought we were talking about lsw1-c2
[19:36:04] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11546665 (Reedy)
[20:08:41] FIRING: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[20:29:25] FIRING: [7x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:46:30] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11546770 (cmooney) Thanks for the help with this one guys! All the switches have been reset to factory defaults. So they can be re...
[21:08:41] RESOLVED: NetboxPhysicalHosts: Netbox - Report parity errors between PuppetDB and Netbox for physical devices. - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxPhysicalHosts
[22:15:51] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11546971 (jhathaway) @Nardog this appears to be an issue with Microsoft's email platform, they are throttling our mail server. I have opened a ticket with the...
[23:26:55] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11547179 (jhathaway) p:Triage→High
[23:27:13] Mail, Infrastructure-Foundations, Phabricator: Phabricator task notification emails not delivered - https://phabricator.wikimedia.org/T415265#11547180 (jhathaway) a:jhathaway