[00:05:16] <jinxer-wm>	 RESOLVED: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[00:07:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[00:11:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in eqsin - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[00:40:06] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1221052
[00:40:06] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1221052 (owner: TrainBranchBot)
[00:41:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:46:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[00:46:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in eqsin - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[00:53:45] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1221052 (owner: TrainBranchBot)
[01:00:47] <logmsgbot>	 !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:07:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[01:07:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[01:10:01] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1221054
[01:10:01] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1221054 (owner: TrainBranchBot)
[01:12:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:17:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::7a4f:9b00:174e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:32:25] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1221054 (owner: TrainBranchBot)
[02:32:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:37:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:43:20] <icinga-wm>	 PROBLEM - HTTPS-status-wikimedia-org on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate status.wikimedia.org valid until 2025-12-28 19:04:50 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Wikitech-static
[02:45:18] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 216737368 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:45:26] <icinga-wm>	 PROBLEM - HTTPS-status-wikimedia-org on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate status.wikimedia.org valid until 2025-12-28 19:04:50 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Wikitech-static
[02:45:52] <icinga-wm>	 PROBLEM - HTTPS-wikitech-static on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate status.wikimedia.org valid until 2025-12-28 19:04:50 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Wikitech-static
[02:46:18] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 3963816 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:48:00] <icinga-wm>	 PROBLEM - HTTPS-status-wikimedia-org on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate status.wikimedia.org valid until 2025-12-28 19:04:50 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Wikitech-static
[02:48:32] <icinga-wm>	 PROBLEM - HTTPS-wikitech-static on wikitech-static.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate status.wikimedia.org valid until 2025-12-28 19:04:50 +0000 (expires in 2 days) https://wikitech.wikimedia.org/wiki/Wikitech-static
[02:52:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:55:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:00:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:20:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:25:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[03:52:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[04:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[04:15:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:16:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:22:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[04:26:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:28:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:33:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:35:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:40:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:43:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:03:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:08:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:09:13] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:18:08] <icinga-wm>	 PROBLEM - Host ncredir7003 is DOWN: CRITICAL - Time to live exceeded (10.140.2.3)
[05:18:08] <icinga-wm>	 PROBLEM - Host ncredir7004 is DOWN: CRITICAL - Time to live exceeded (10.140.2.8)
[05:18:10] <icinga-wm>	 PROBLEM - Host asw1-b3-magru is DOWN: CRITICAL - Time to live exceeded (195.200.68.130)
[05:18:42] <icinga-wm>	 RECOVERY - Host ncredir7003 is UP: PING OK - Packet loss = 0%, RTA = 137.09 ms
[05:18:42] <icinga-wm>	 RECOVERY - Host ncredir7004 is UP: PING OK - Packet loss = 0%, RTA = 137.12 ms
[05:18:48] <icinga-wm>	 RECOVERY - Host asw1-b3-magru is UP: PING OK - Packet loss = 0%, RTA = 139.70 ms
[05:22:48] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: CRITICAL - Time to live exceeded (195.200.68.101)
[05:22:48] <icinga-wm>	 PROBLEM - Host doh7003 is DOWN: CRITICAL - Time to live exceeded (195.200.68.98)
[05:23:04] <icinga-wm>	 RECOVERY - Host doh7003 is UP: PING OK - Packet loss = 0%, RTA = 137.01 ms
[05:23:14] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 137.02 ms
[05:34:13] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:38:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:43:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:03:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:04:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:19:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:19:36] <wikibugs>	 (PS4) Pppery: Rm frequency.json file [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217871 (https://phabricator.wikimedia.org/T412652)
[06:22:16] <wikibugs>	 (PS5) Pppery: Rm "translations" into English [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651)
[06:25:11] <wikibugs>	 (PS2) Pppery: Rename raw.json to en-x-raw.json [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217874 (https://phabricator.wikimedia.org/T412646)
[07:00:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:13:20] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 80%, RTA = 8944.27 ms
[07:13:38] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[07:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[08:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251226T0800)
[08:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:16:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[09:03:38] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] Add an internal translation file for this repo's own strings [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217873 (https://phabricator.wikimedia.org/T412651) (owner: Pppery)
[10:40:32] <icinga-wm>	 PROBLEM - Host install7002 is DOWN: CRITICAL - Time to live exceeded (195.200.68.100)
[10:40:36] <icinga-wm>	 PROBLEM - Host prometheus7002 is DOWN: CRITICAL - Time to live exceeded (10.140.2.5)
[10:41:08] <icinga-wm>	 PROBLEM - Host ncredir7003 is DOWN: CRITICAL - Time to live exceeded (10.140.2.3)
[10:41:08] <icinga-wm>	 PROBLEM - Host mr1-magru is DOWN: CRITICAL - Time to live exceeded (195.200.68.132)
[10:41:12] <icinga-wm>	 PROBLEM - Host ncredir7004 is DOWN: CRITICAL - Time to live exceeded (10.140.2.8)
[10:41:24] <icinga-wm>	 RECOVERY - Host install7002 is UP: PING OK - Packet loss = 0%, RTA = 137.77 ms
[10:41:26] <icinga-wm>	 RECOVERY - Host mr1-magru is UP: PING OK - Packet loss = 0%, RTA = 137.09 ms
[10:41:28] <icinga-wm>	 RECOVERY - Host prometheus7002 is UP: PING OK - Packet loss = 0%, RTA = 137.09 ms
[10:41:28] <icinga-wm>	 RECOVERY - Host ncredir7003 is UP: PING OK - Packet loss = 0%, RTA = 137.29 ms
[10:41:44] <icinga-wm>	 RECOVERY - Host ncredir7004 is UP: PING OK - Packet loss = 0%, RTA = 137.21 ms
[10:47:20] <icinga-wm>	 PROBLEM - Host durum7004 is DOWN: CRITICAL - Time to live exceeded (10.140.2.7)
[10:48:02] <icinga-wm>	 RECOVERY - Host durum7004 is UP: PING OK - Packet loss = 0%, RTA = 137.09 ms
[10:51:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:53:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:59:17] <LBLaiSiNanHai>	 hey
[10:59:35] <LBLaiSiNanHai>	 why does mediawiki-content-history-completeness-v1 have a negative availability?
[11:00:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:08:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:13:43] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] Rm "translations" into English [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651) (owner: Pppery)
[11:37:10] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] Rm frequency.json file [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217871 (https://phabricator.wikimedia.org/T412652) (owner: Pppery)
[11:40:38] <wikibugs>	 (PS6) Pppery: Rm "translations" into English [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651)
[11:41:07] <wikibugs>	 (PS3) Pppery: Rename raw.json to en-x-raw.json [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217874 (https://phabricator.wikimedia.org/T412646)
[11:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[11:51:36] <wikibugs>	 (CR) Aklapper: "[This may need another round of arc liberate first]" [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651) (owner: Pppery)
[12:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251226T0800)
[12:00:04] <jouncebot>	 jelto, arnoldokoth, mutante, and arnaudb: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) GitLab version upgrades deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251226T1200).
[12:05:53] <wikibugs>	 SRE-swift-storage, Commons: Commons file not found - https://phabricator.wikimedia.org/T413507#11484861 (Aklapper) @Jeff_G: Please add the #sre-swift-storage project tag for missing files. Thanks.
[12:06:09] <wikibugs>	 SRE-swift-storage, Commons: Commons file not found - https://phabricator.wikimedia.org/T413507#11484862 (Aklapper)
[12:08:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[12:08:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[12:09:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[12:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[12:15:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:05:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[14:26:28] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:09:13] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:34:13] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[16:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[16:15:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:17:20] <icinga-wm>	 PROBLEM - Ubuntu mirror in sync with upstream on mirror1001 is CRITICAL: /srv/mirrors/ubuntu is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[16:27:25] <jinxer-wm>	 FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[17:20:17] <wikibugs>	 (CR) Pppery: "I ran arc liberate and it didn't change anything." [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651) (owner: Pppery)
[17:29:31] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[17:46:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[17:49:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[17:54:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:04:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:09:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:11:20] <icinga-wm>	 RECOVERY - Ubuntu mirror in sync with upstream on mirror1001 is OK: /srv/mirrors/ubuntu is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[18:12:25] <jinxer-wm>	 RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[18:14:16] <jinxer-wm>	 RESOLVED: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:15:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:16:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[18:20:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1275:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1275 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[18:25:40] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1275:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1275 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[18:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:09:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7002 is DOWN: CRITICAL - Time to live exceeded (10.140.1.4)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7010 is DOWN: CRITICAL - Time to live exceeded (10.140.1.8)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7005 is DOWN: CRITICAL - Time to live exceeded (10.140.0.5)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7015 is DOWN: CRITICAL - Time to live exceeded (10.140.0.10)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7003 is DOWN: CRITICAL - Time to live exceeded (10.140.0.4)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7012 is DOWN: CRITICAL - Time to live exceeded (10.140.1.9)
[19:10:46] <icinga-wm>	 PROBLEM - Host cp7008 is DOWN: CRITICAL - Time to live exceeded (10.140.1.7)
[19:10:56] <icinga-wm>	 RECOVERY - Host cp7010 is UP: PING OK - Packet loss = 0%, RTA = 137.95 ms
[19:10:58] <icinga-wm>	 RECOVERY - Host cp7005 is UP: PING OK - Packet loss = 0%, RTA = 137.88 ms
[19:10:58] <icinga-wm>	 RECOVERY - Host cp7002 is UP: PING OK - Packet loss = 0%, RTA = 137.91 ms
[19:10:58] <icinga-wm>	 RECOVERY - Host cp7008 is UP: PING OK - Packet loss = 0%, RTA = 137.95 ms
[19:10:58] <icinga-wm>	 RECOVERY - Host cp7015 is UP: PING OK - Packet loss = 0%, RTA = 137.98 ms
[19:11:12] <icinga-wm>	 RECOVERY - Host cp7003 is UP: PING OK - Packet loss = 0%, RTA = 138.69 ms
[19:11:12] <icinga-wm>	 RECOVERY - Host cp7012 is UP: PING OK - Packet loss = 0%, RTA = 138.72 ms
[19:14:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[19:23:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=upload&var-origin=swift.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:25:06] <claime>	 !incidents
[19:25:06] <sirenbot>	 7241 (UNACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[19:25:06] <sirenbot>	 7239 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[19:25:06] <sirenbot>	 7238 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[19:25:11] <claime>	 !ack 7241
[19:25:12] <sirenbot>	 7241 (ACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[19:30:22] <andrewbogott>	 !log test message
[19:30:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:40] <logmsgbot>	 !log cgoubert@cumin1003 START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
[19:38:51] <jinxer-wm>	 FIRING: [3x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:39:29] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] "Hmm, now that I `git clean -fx` etc I also don't see any differences anymore, sorry! LGTM, thanks!" [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217872 (https://phabricator.wikimedia.org/T412651) (owner: Pppery)
[19:43:45] <logmsgbot>	 !log cgoubert@cumin1003 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
[19:43:51] <jinxer-wm>	 FIRING: [3x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[19:48:51] <jinxer-wm>	 RESOLVED: [2x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:50:45] <wikibugs>	 (CR) Aklapper: [V:+2 C:+2] "lgtm and seems to behave as expected" [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1217874 (https://phabricator.wikimedia.org/T412646) (owner: Pppery)
[19:54:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-eqdfw and fe80::a6e1:1a00:1a6f:d3a3 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[19:59:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-eqdfw and fe80::a6e1:1a00:1a6f:d3a3 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqdfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[20:04:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=upload&var-origin=swift.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[20:05:12] <claime>	 !incidents
[20:05:13] <sirenbot>	 7242 (ACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[20:05:13] <sirenbot>	 7241 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[20:05:13] <sirenbot>	 7239 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[20:09:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[20:11:01] <wikibugs>	 (PS6) Pppery: Rename various locales so their translations can actually be found [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1215393
[20:12:16] <wikibugs>	 (CR) Pppery: "Rebased, generated the library map again." [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1215393 (owner: Pppery)
[20:14:51] <jinxer-wm>	 FIRING: [3x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[20:18:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[20:18:53] <wikibugs>	 SRE-swift-storage, Commons: Commons file not found - https://phabricator.wikimedia.org/T413507#11485084 (TheDJ) > That could explain why the uploader wants it deleted.  which uploader ? how is the uploader relevant ?  Seems the filepage was deleted today ?
[20:32:18] <icinga-wm>	 PROBLEM - Host cp7013 is DOWN: CRITICAL - Time to live exceeded (10.140.0.9)
[20:32:48] <icinga-wm>	 PROBLEM - Host doh7003 is DOWN: PING CRITICAL - Packet loss = 100%
[20:32:48] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: PING CRITICAL - Packet loss = 100%
[20:33:04] <icinga-wm>	 RECOVERY - Host doh7003 is UP: PING OK - Packet loss = 0%, RTA = 136.98 ms
[20:33:06] <icinga-wm>	 RECOVERY - Host cp7013 is UP: PING OK - Packet loss = 0%, RTA = 136.50 ms
[20:33:14] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 136.96 ms
[20:35:12] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[20:35:17] <jinxer-wm>	 FIRING: [4x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[20:38:41] <claime>	 !incidents
[20:38:41] <sirenbot>	 7242 (ACKED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[20:38:41] <sirenbot>	 7243 (ACKED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[20:38:42] <sirenbot>	 7241 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[20:40:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[20:40:20] <jinxer-wm>	 RESOLVED: [3x] ATSBackendErrorsHigh: ATS: elevated 5xx errors from swift.discovery.wmnet in drmrs #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging  - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[20:43:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[21:04:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[21:08:48] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: CRITICAL - Time to live exceeded (195.200.68.101)
[21:08:52] <icinga-wm>	 PROBLEM - Host mr1-magru is DOWN: CRITICAL - Time to live exceeded (195.200.68.132)
[21:09:30] <icinga-wm>	 RECOVERY - Host mr1-magru is UP: PING OK - Packet loss = 0%, RTA = 137.21 ms
[21:09:40] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 137.07 ms
[21:31:12] <wikibugs>	 (CR) ArielGlenn: "Sorry for these very late comments, I was far behind on reading these." [deployment-charts] - https://gerrit.wikimedia.org/r/1217516 (https://phabricator.wikimedia.org/T410379) (owner: Daniel Kinzler)
[21:39:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[21:55:00] <wikibugs>	 (CR) ArielGlenn: rest gateway: add smoke tests (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1215605 (owner: Daniel Kinzler)
[22:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:34:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[22:35:00] <icinga-wm>	 PROBLEM - Host wikikube-worker1053 is DOWN: PING CRITICAL - Packet loss = 100%
[22:35:24] <icinga-wm>	 RECOVERY - Host wikikube-worker1053 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[22:50:34] <wikibugs>	 (PS1) Zabe: Pin imagelinks migration to old schema [mediawiki-config] - https://gerrit.wikimedia.org/r/1221096 (https://phabricator.wikimedia.org/T299953)
[22:53:27] <wikibugs>	 (PS1) Zabe: BETA: Set imagelinks migration to write both [mediawiki-config] - https://gerrit.wikimedia.org/r/1221098 (https://phabricator.wikimedia.org/T299953)
[22:53:51] <wikibugs>	 (PS2) Zabe: BETA: Set imagelinks migration to write both [mediawiki-config] - https://gerrit.wikimedia.org/r/1221098 (https://phabricator.wikimedia.org/T413526)
[22:58:14] <wikibugs>	 SRE-swift-storage, Commons: Commons file not found - https://phabricator.wikimedia.org/T413507#11485173 (Peachey88) //Sound of Broken Record:// If the request for deletion for images relates to something being broken, or tasks about image issues getting filed, it would be really nice that it gets flagged...
[23:02:06] <icinga-wm>	 PROBLEM - Host wikikube-worker1053 is DOWN: PING CRITICAL - Packet loss = 100%
[23:03:24] <icinga-wm>	 RECOVERY - Host wikikube-worker1053 is UP: PING WARNING - Packet loss = 33%, RTA = 508.52 ms
[23:35:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[23:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown