[01:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[05:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[08:33:13] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10994459 (MoritzMuehlenhoff) >>! In T378028#10993697, @Dzahn wrote: > But another question comes to mind.. and that is.. do VRTS machi...
[09:13:55] FIRING: MaxConntrack: Max conntrack at 86.41% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[09:15:59] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10994585 (Arnoldokoth) @Dzahn We used to run it on VMs but we kept running into resource issues (especially with `clamav`) even after...
[09:16:39] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Netbox: remove old cr2-codfw Switch Control Board inventory items - https://phabricator.wikimedia.org/T398940#10994586 (ayounsi) We can remove them from Netbox if they're not in the device anymore. and add them to the spare tracking...
[09:18:55] RESOLVED: MaxConntrack: Max conntrack at 84.14% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[09:37:03] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10994674 (MoritzMuehlenhoff) That said, if you strictly need a physical host for the tests, you could use puppetserver2003. I decommed...
[09:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:27:55] netops, Infrastructure-Foundations, SRE: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#10994796 (cmooney)
[10:35:52] debmonitor now checks all kubernetes clusters \o/
[11:56:55] FIRING: MaxConntrack: Max conntrack at 80.53% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[12:01:55] RESOLVED: MaxConntrack: Max conntrack at 80.53% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[13:22:30] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10995390 (Arnoldokoth) Thanks @MoritzMuehlenhoff We'll consider that... But I'm doubtful we "strictly" need to test this on hardware....
[13:43:28] FIRING: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki1001:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:47:04] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[13:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[13:53:28] RESOLVED: SystemdUnitCrashLoop: node-bgpalerter.service crashloop on rpki1001:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:13:52] Mail, collaboration-services, Infrastructure-Foundations, SRE, and 2 others: Replace Exim on VRTS servers with Postfix - https://phabricator.wikimedia.org/T378028#10995529 (Dzahn) Thanks all. I am not sure though if the request was for "temp testing setup" or just for "a new system to replace the...
[14:47:04] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[15:16:45] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Netbox: remove old cr2-codfw Switch Control Board inventory items - https://phabricator.wikimedia.org/T398940#10995790 (RobH) >>! In T398940#10994586, @ayounsi wrote: > We can remove them from Netbox if they're not in the device anym...
[17:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[21:49:03] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, Traffic: eqsin purged consumers lag - https://phabricator.wikimedia.org/T399221#10996713 (cmooney) Arelion came back to say they did move a path but that they see CRC errors inbound from us in codfw: ` 2025-07-11 19:48 Hello Team, We ha...
[21:49:14] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, Traffic: eqsin purged consumers lag - https://phabricator.wikimedia.org/T399221#10996717 (cmooney) p:Triage→High
[21:50:07] FIRING: [8x] PuppetConstantChange: Puppet performing a change on every puppet run on puppetmaster1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:49:56] FIRING: MaxConntrack: Max conntrack at 83.28% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[23:54:55] RESOLVED: MaxConntrack: Max conntrack at 83.39% on krb1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack