[17:06:16] FIRING: [2x] PfwCoreBGPDown: Fundraising Firewall core BGP session down between pfw1-codfw and (null) (10.195.0.248) - group VPN - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DPfwCoreBGPDown
[17:14:59] uh oh re. ^^^
[17:15:37] we're seeing cross-datacenter traffic down, probably correlated
[17:35:46] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143 (colewhite) NEW
[17:36:17] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11444783 (colewhite)
[17:37:41] RESOLVED: PfwCoreBGPDown: Fundraising Firewall core BGP session down between pfw1-eqiad and (null) (10.195.0.249) - group VPN - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=eqiad&var-device=pfw1-eqiad:9804&var-bgp_group=VPN&var-bgp_neighbor=(null) - https://alerts.wikimedia.org/?q=alertname%3DPfwCoreBGPDow
[18:21:37] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11444984 (ayounsi) Looks like a repeat of {T398433} but for a different switch. As it's not in a VXLAN fabric, we should look at upgrading that one switch.
[18:32:44] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux ARP resolution bug on v24.10.x+ - https://phabricator.wikimedia.org/T409178#11445009 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2a98251c-6798-469c-a3de-57fcfb13969f) set by cmooney@cumin1003 for 2:00:00 on 17 host(s) and t...
[18:39:23] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11445028 (colewhite) List of affected hosts: ` asw1-b3-magru asw1-b4-magru asw1-bw27-esams asw1-by27-esams cloudsw1-b1-codfw lsw1-a2-codfw lsw1-a3-codfw lsw1-a4-codfw lsw1-a...
[18:48:12] netops, Infrastructure-Foundations, SRE: Nokia SR-Linux ARP resolution bug on v24.10.x+ - https://phabricator.wikimedia.org/T409178#11445044 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1aee9e7e-d36b-4c56-8cac-746f48098c6f) set by cmooney@cumin1003 for 2:00:00 on 2 host(s) and th...
[18:48:28] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11445046 (ayounsi) Thanks, that's a lot !! I guess we will have to start 2026 with a switch upgrade campaign...
[18:55:21] netops, Infrastructure-Foundations, Observability-Logging: ~5k/logs/sec from netdev - https://phabricator.wikimedia.org/T412143#11445078 (ayounsi) a:ayounsi
[20:56:25] netops, Infrastructure-Foundations, SRE: Nokia: how to approach schema differences in SR-Linux versions - https://phabricator.wikimedia.org/T412157 (cmooney) NEW p:Triage→High