[00:05:25] RESOLVED: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -4d 11h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[02:29:49] FIRING: DiskSpace: Disk space build2001:9100:/ 0.5978% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[03:14:35] RESOLVED: DiskSpace: Disk space build2001:9100:/ 1.968% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=build2001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[05:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -4d 15h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[08:43:25] FIRING: SystemdUnitFailed: netbox_ganeti_eqsin_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -4d 19h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[10:57:57] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11902014 (cmooney) Just an update here. The patch to map CS1 into low-priority has been rolled out across the network...
[11:41:05] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11902144 (cmooney) So it seems the reason for this is some ferm complexity. When puppet signals a 'refresh' to it it asks ferm itself to re...
[12:43:40] FIRING: SystemdUnitFailed: netbox_ganeti_eqsin_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -4d 23h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[16:37:38] Mail, Infrastructure-Foundations: Update the documentation on mail alias maintenance - https://phabricator.wikimedia.org/T425798 (LSobanski) NEW
[16:43:40] FIRING: SystemdUnitFailed: netbox_ganeti_eqsin_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:44:46] Mail, Infrastructure-Foundations, Documentation: Update the documentation on mail alias maintenance - https://phabricator.wikimedia.org/T425798#11903139 (A_smart_kitten)
[17:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -5d 3h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[18:56:30] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux: BFD broken with default homer configuration - https://phabricator.wikimedia.org/T425813 (cmooney) NEW p:Triage→High
[20:43:40] FIRING: SystemdUnitFailed: netbox_ganeti_eqsin_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:15:22] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -5d 7h 20m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry