[00:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[00:20:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[00:21:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[00:40:03] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1221104
[00:40:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1221104 (owner: 10TrainBranchBot)
[00:44:44] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 75%, RTA = 5579.10 ms
[00:45:10] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING WARNING - Packet loss = 0%, RTA = 942.22 ms
[00:46:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[00:52:25] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1221104 (owner: 10TrainBranchBot)
[01:00:40] <logmsgbot>	 !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:10:02] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl2003.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[01:10:18] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1221105
[01:10:18] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1221105 (owner: 10TrainBranchBot)
[01:11:02] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[01:24:51] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 24m 10s)
[01:27:54] <icinga-wm>	 PROBLEM - Host ncredir7003 is DOWN: CRITICAL - Time to live exceeded (10.140.2.3)
[01:28:08] <icinga-wm>	 RECOVERY - Host ncredir7003 is UP: PING OK - Packet loss = 0%, RTA = 137.02 ms
[01:32:28] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1221105 (owner: 10TrainBranchBot)
[01:48:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:03:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:04:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:24:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:25:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[02:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[02:31:12] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 50%, RTA = 2917.26 ms
[02:31:36] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[02:35:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:23:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:34:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[03:49:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:51:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:01:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:06:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[04:26:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:27:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:52:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[04:53:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:03:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:09:13] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:33:32] <icinga-wm>	 PROBLEM - Host wikikube-worker1053 is DOWN: PING CRITICAL - Packet loss = 77%, RTA = 8443.39 ms
[05:34:13] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:34:22] <icinga-wm>	 RECOVERY - Host wikikube-worker1053 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[05:53:14] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 100%
[05:54:10] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[06:04:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:14:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:21:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[06:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:36:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[07:23:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:35:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[07:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[08:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[09:20:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:21:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[10:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:23:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:30:51] <jinxer-wm>	 FIRING: TransitPeeringTransportOutSaturation: Transit, peering or transport OUT traffic above 90% capacity - cr3-eqsin:xe-0/1/3 (Peering: Equinix (Wikimedia-SG1-IX-00 Singapore, MAC filter) {#1016}) #page - https://w.wiki/Gbyf - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr3-eqsin:9804 - https://alerts.wikimedia.org/?q=alertname%3DTransitPeeringTransportOutSaturation
[11:32:22] <elukey>	 o/
[11:32:50] <elukey>	 acked the incidnet
[11:32:53] <elukey>	 *incident
[11:33:48] <moritzm>	 I acked it in the VO app earlier, not sure if that propagated yet
[11:33:50] <icinga-wm>	 PROBLEM - Host wikikube-worker1016 is DOWN: PING CRITICAL - Packet loss = 100%
[11:34:18] <icinga-wm>	 RECOVERY - Host wikikube-worker1016 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms
[11:34:39] <elukey>	 moritzm: o/
[11:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[11:51:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[12:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[12:14:19] <jasmine_>	 !incidents
[12:14:20] <sirenbot>	 7244 (ACKED)  TransitPeeringTransportOutSaturation network sre (cr3-eqsin:9804 Peering: Equinix (Wikimedia-SG1-IX-00 Singapore, MAC filter) {#1016} xe-0/1/3 gnmi eqsin)
[12:14:20] <sirenbot>	 7243 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[12:14:20] <sirenbot>	 7242 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[12:14:21] <sirenbot>	 7241 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[12:30:51] <jinxer-wm>	 RESOLVED: TransitPeeringTransportOutSaturation: Transit, peering or transport OUT traffic above 90% capacity - cr3-eqsin:xe-0/1/3 (Peering: Equinix (Wikimedia-SG1-IX-00 Singapore, MAC filter) {#1016}) #page - https://w.wiki/Gbyf - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr3-eqsin:9804 - https://alerts.wikimedia.org/?q=alertname%3DTransitPeeringTransportOutSaturation
[12:50:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[13:03:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:10:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[13:20:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[13:23:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[13:33:47] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/x0zp5851 on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[14:10:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[14:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:09:13] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:34:14] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[16:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[16:13:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[16:16:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[16:21:10] <icinga-wm>	 PROBLEM - Ensure traffic_server is running for instance backend on cp7004 is CRITICAL: PROCS CRITICAL: 3 processes with args /usr/bin/traffic_server -M --httpport 3128 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[16:22:10] <icinga-wm>	 RECOVERY - Ensure traffic_server is running for instance backend on cp7004 is OK: PROCS OK: 1 process with args /usr/bin/traffic_server -M --httpport 3128 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[16:31:05] <wikibugs>	 (03PS1) 10Pppery: Delete the translations/ dir before regenerating [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130
[16:31:44] <wikibugs>	 (03PS2) 10Pppery: Delete the translations/ dir before regenerating [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130
[17:11:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[17:14:41] <wikibugs>	 (03PS1) 10Pppery: Handle all format specifiers [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221131 (https://phabricator.wikimedia.org/T413529)
[17:18:50] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): Delete the translations/ dir before regenerating (031 comment) [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130 (owner: 10Pppery)
[17:34:02] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/x0zp5851 on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[17:35:03] <wikibugs>	 (03CR) 10Pppery: Delete the translations/ dir before regenerating (031 comment) [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130 (owner: 10Pppery)
[17:39:45] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): Delete the translations/ dir before regenerating (031 comment) [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130 (owner: 10Pppery)
[17:56:16] <jinxer-wm>	 RESOLVED: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[17:57:16] <jinxer-wm>	 FIRING: ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:07:16] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[18:11:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[18:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[19:04:07] <wikibugs>	 (03PS3) 10Pppery: Delete the translations/ dir before regenerating [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130
[19:05:12] <wikibugs>	 (03CR) 10Pppery: Delete the translations/ dir before regenerating (031 comment) [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130 (owner: 10Pppery)
[19:06:41] <wikibugs>	 (03PS1) 10Pppery: Rebuild library map automatically after generating files [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221162
[19:08:24] <wikibugs>	 (03CR) 10Pppery: "I just realized there's a surprising amount of subtlety here; with this patch the mainloop runs with the library map out of date. Since th" [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221130 (owner: 10Pppery)
[19:09:29] <wikibugs>	 (03CR) 10Pppery: Rebuild library map automatically after generating files (031 comment) [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221162 (owner: 10Pppery)
[19:10:24] <icinga-wm>	 PROBLEM - Host ncredir7003 is DOWN: CRITICAL - Time to live exceeded (10.140.2.3)
[19:10:50] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: PING CRITICAL - Packet loss = 100%
[19:10:50] <icinga-wm>	 PROBLEM - Host doh7003 is DOWN: PING CRITICAL - Packet loss = 100%
[19:10:58] <icinga-wm>	 RECOVERY - Host ncredir7003 is UP: PING OK - Packet loss = 0%, RTA = 137.32 ms
[19:11:06] <icinga-wm>	 RECOVERY - Host doh7003 is UP: PING OK - Packet loss = 0%, RTA = 137.07 ms
[19:11:14] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 137.22 ms
[19:15:36] <icinga-wm>	 PROBLEM - Host ncredir7003 is DOWN: PING CRITICAL - Packet loss = 100%
[19:15:40] <icinga-wm>	 PROBLEM - Host doh7003 is DOWN: CRITICAL - Time to live exceeded (195.200.68.98)
[19:15:40] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: CRITICAL - Time to live exceeded (195.200.68.101)
[19:15:50] <icinga-wm>	 RECOVERY - Host ncredir7003 is UP: PING OK - Packet loss = 0%, RTA = 137.20 ms
[19:15:54] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 137.02 ms
[19:16:02] <icinga-wm>	 RECOVERY - Host doh7003 is UP: PING OK - Packet loss = 0%, RTA = 136.91 ms
[19:23:55] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[19:26:40] <icinga-wm>	 PROBLEM - Host doh7003 is DOWN: CRITICAL - Time to live exceeded (195.200.68.98)
[19:26:40] <icinga-wm>	 PROBLEM - Host doh7004 is DOWN: CRITICAL - Time to live exceeded (195.200.68.101)
[19:26:58] <icinga-wm>	 RECOVERY - Host doh7004 is UP: PING OK - Packet loss = 0%, RTA = 137.12 ms
[19:26:58] <icinga-wm>	 RECOVERY - Host doh7003 is UP: PING OK - Packet loss = 0%, RTA = 137.09 ms
[19:35:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:36:55] <jasmine_>	 !incidents
[19:36:56] <sirenbot>	 7245 (UNACKED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[19:36:56] <sirenbot>	 7244 (RESOLVED)  TransitPeeringTransportOutSaturation network sre (cr3-eqsin:9804 Peering: Equinix (Wikimedia-SG1-IX-00 Singapore, MAC filter) {#1016} xe-0/1/3 gnmi eqsin)
[19:36:56] <sirenbot>	 7243 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[19:36:56] <sirenbot>	 7242 (RESOLVED)  ATSBackendErrorsHigh cache_upload sre (swift.discovery.wmnet esams)
[19:37:37] <jasmine_>	 !ack 7245
[19:37:38] <sirenbot>	 7245 (ACKED)  ATSBackendErrorsHigh cache_text sre (rest-gateway-ro.discovery.wmnet esams)
[19:40:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[19:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[19:56:45] <jinxer-wm>	 RESOLVED: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[20:10:26] <jinxer-wm>	 FIRING: [5x] PuppetCertificateAboutToExpire: Puppet CA certificate config-master.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[20:17:36] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 66%, RTA = 2819.33 ms
[20:18:10] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[20:54:28] <icinga-wm>	 PROBLEM - Thanos swift https on thanos-fe1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Thanos
[20:55:26] <icinga-wm>	 RECOVERY - Thanos swift https on thanos-fe1007 is OK: HTTP OK: HTTP/1.1 200 OK - 282 bytes in 7.990 second response time https://wikitech.wikimedia.org/wiki/Thanos
[20:58:18] <wikibugs>	 (03PS9) 10Pppery: Extract strings from US English locale as source strings and apply PLURAL [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1217844 (https://phabricator.wikimedia.org/T412421)
[20:58:18] <wikibugs>	 (03PS2) 10Pppery: Handle all format specifiers [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221131 (https://phabricator.wikimedia.org/T413529)
[21:01:30] <wikibugs>	 (03PS1) 10Pppery: Add a `bin/translatewiki roundtrip` workflow to validate the string-mangling code [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221180 (https://phabricator.wikimedia.org/T413532)
[21:07:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[21:08:19] <rzl>	 o/
[21:12:42] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[21:12:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[21:30:26] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:34:02] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/x0zp5851 on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[21:34:14] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:45:42] <wikibugs>	 (03PS3) 10Pppery: Handle all format specifiers [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221131 (https://phabricator.wikimedia.org/T413529)
[21:46:12] <wikibugs>	 (03PS2) 10Pppery: Add a `bin/translatewiki roundtrip` workflow to validate the string-mangling code [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221180 (https://phabricator.wikimedia.org/T413532)
[22:02:42] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[22:03:41] <wikibugs>	 (03PS1) 10Pppery: Set up `arc lint`, make it pass, update README [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221191 (https://phabricator.wikimedia.org/T413531)
[22:03:58] <wikibugs>	 (03PS2) 10Pppery: Set up `arc lint`, make it pass, update README [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221191 (https://phabricator.wikimedia.org/T413531)
[22:05:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[22:07:31] <jinxer-wm>	 FIRING: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[22:10:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from rest-gateway-ro.discovery.wmnet in esams #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=esams&var-cluster=text&var-origin=rest-gateway-ro.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[22:26:43] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1012:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1012:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[22:32:38] <wikibugs>	 06SRE, 06serviceops: Decide whether to exclude {api,rest}-gateway-ro from ATSBackendErrorsHigh - https://phabricator.wikimedia.org/T413544 (10RLazarus) 03NEW
[22:33:11] <wikibugs>	 (03PS1) 10RLazarus: team-sre: Exclude {api,rest}-gateway-ro from ATSBackendErrorsHigh [alerts] - 10https://gerrit.wikimedia.org/r/1221195 (https://phabricator.wikimedia.org/T413544)
[22:34:39] <wikibugs>	 06SRE, 06serviceops, 13Patch-For-Review: Decide whether to exclude {api,rest}-gateway-ro from ATSBackendErrorsHigh - https://phabricator.wikimedia.org/T413544#11485690 (10RLazarus) Proposing https://gerrit.wikimedia.org/r/1221195 as an interim solution over the break, and once we're back we can either keep i...
[22:39:11] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] "looks good" [alerts] - 10https://gerrit.wikimedia.org/r/1221195 (https://phabricator.wikimedia.org/T413544) (owner: 10RLazarus)
[22:39:42] <wikibugs>	 (03CR) 10RLazarus: [C:03+2] team-sre: Exclude {api,rest}-gateway-ro from ATSBackendErrorsHigh [alerts] - 10https://gerrit.wikimedia.org/r/1221195 (https://phabricator.wikimedia.org/T413544) (owner: 10RLazarus)
[22:41:25] <wikibugs>	 (03Merged) 10jenkins-bot: team-sre: Exclude {api,rest}-gateway-ro from ATSBackendErrorsHigh [alerts] - 10https://gerrit.wikimedia.org/r/1221195 (https://phabricator.wikimedia.org/T413544) (owner: 10RLazarus)
[22:55:26] <wikibugs>	 (03PS3) 10Pppery: Add a `bin/translatewiki roundtrip` workflow to validate the string-mangling code [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221180 (https://phabricator.wikimedia.org/T413532)
[22:58:17] <wikibugs>	 (03PS3) 10Pppery: Set up `arc lint`, make it pass, update README [phabricator/translations] (wmf/stable) - 10https://gerrit.wikimedia.org/r/1221191 (https://phabricator.wikimedia.org/T413531)
[23:29:26] <icinga-wm>	 PROBLEM - Host wikikube-worker1275 is DOWN: PING CRITICAL - Packet loss = 90%, RTA = 9100.04 ms
[23:30:06] <icinga-wm>	 RECOVERY - Host wikikube-worker1275 is UP: PING WARNING - Packet loss = 50%, RTA = 1271.94 ms
[23:32:16] <jinxer-wm>	 RESOLVED: [2x] ErrorBudgetBurn: xlab-standalone-event-system-success-rate-v1 <no value> - https://slo.wikimedia.org/?search=xlab-standalone-event-system-success-rate-v1   - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[23:33:15] <jinxer-wm>	 FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[23:38:15] <jinxer-wm>	 RESOLVED: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[23:41:15] <jinxer-wm>	 FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[23:46:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[23:48:38] <jinxer-wm>	 FIRING: GnmiTargetDown: lsw1-b6-codfw is unreachable through gNMI - https://wikitech.wikimedia.org/wiki/Network_telemetry#Troubleshooting - https://grafana.wikimedia.org/d/eab73c60-a402-4f9b-a4a7-ea489b374458/gnmic - https://alerts.wikimedia.org/?q=alertname%3DGnmiTargetDown
[23:52:15] <jinxer-wm>	 FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate
[23:57:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-api-ext - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook  - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate