[01:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:23:20] PROBLEM - MariaDB sustained replica lag on s8 on db2166 is CRITICAL: 140 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2166&var-port=9104
[03:30:48] FIRING: MysqlReplicationLag: MySQL instance db2166:9104@s8 has too large replication lag (9m 27s). Its replication source is db2161.codfw.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2166&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[03:30:48] FIRING: MysqlReplicationLagPtHeartbeat: MySQL instance db2166:9104 has too large replication lag (9m 50s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2166&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[03:32:14] FIRING: MysqlHostIoPressure: MySQL instance db2166:9100 has too much pressure on its io capabilities: (546.8m) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/rYdddlPWk/node-exporter-full?orgId=1&refresh=1m&viewPanel=323&var-datasource=thanos&var-job=All&var-node=db2166%3A9100 - https://alerts.wikimedia.org/?q=alertname%3DMysqlHostIoPressure
[03:35:48] RESOLVED: MysqlReplicationLag: MySQL instance db2166:9104@s8 has too large replication lag (12m 43s). Its replication source is db2161.codfw.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2166&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[03:35:48] RESOLVED: MysqlReplicationLagPtHeartbeat: MySQL instance db2166:9104 has too large replication lag (12m 43s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2166&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[03:37:15] RESOLVED: MysqlHostIoPressure: MySQL instance db2166:9100 has too much pressure on its io capabilities: (443.1m) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/rYdddlPWk/node-exporter-full?orgId=1&refresh=1m&viewPanel=323&var-datasource=thanos&var-job=All&var-node=db2166%3A9100 - https://alerts.wikimedia.org/?q=alertname%3DMysqlHostIoPressure
[03:40:20] RECOVERY - MariaDB sustained replica lag on s8 on db2166 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2166&var-port=9104
[05:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed