[08:28:17] FYI Last dump for db_inventory at eqiad (db1215) taken on 2026-03-03 00:39:45 is 316 KiB, but the previous one was 118 KiB, a change of +169.0 %
[10:08:48] FIRING: [31x] MysqlReplicationLagPtHeartbeat: MySQL instance db1157:9104 has too large replication lag (11m 16s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:11:15] PROBLEM - MariaDB sustained replica lag on s4 on db1160 is CRITICAL: 731 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1160&var-port=9104
[10:12:11] PROBLEM - MariaDB sustained replica lag on x1 on db1237 is CRITICAL: 559 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1237&var-port=9104
[10:12:19] PROBLEM - MariaDB sustained replica lag on s2 on db1222 is CRITICAL: 638 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104
[10:12:21] PROBLEM - MariaDB sustained replica lag on s1 on db2145 is CRITICAL: 691 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2145&var-port=9104
[10:12:21] PROBLEM - MariaDB sustained replica lag on s1 on db2173 is CRITICAL: 17 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2173&var-port=9104
[10:12:22] PROBLEM - MariaDB sustained replica lag on s1 on db2203 is CRITICAL: 720 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2203&var-port=9104
[10:12:22] PROBLEM - MariaDB sustained replica lag on s3 on db2149 is CRITICAL: 711 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2149&var-port=9104
[10:12:23] PROBLEM - MariaDB sustained replica lag on s1 on db2188 is CRITICAL: 330 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2188&var-port=9104
[10:13:48] FIRING: [174x] MysqlReplicationLagPtHeartbeat: MySQL instance db1156:9104 has too large replication lag (14m 45s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:13:59] PROBLEM - MariaDB sustained replica lag on s3 on db2194 is CRITICAL: 173 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2194&var-port=9104
[10:14:01] RECOVERY - MariaDB sustained replica lag on s1 on db2173 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2173&var-port=9104
[10:14:01] PROBLEM - MariaDB sustained replica lag on s5 on db2171 is CRITICAL: 323.5 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2171&var-port=9104
[10:14:03] FIRING: [174x] MysqlReplicationLagPtHeartbeat: MySQL instance db1156:9104 has too large replication lag (14m 29s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:14:03] PROBLEM - MariaDB sustained replica lag on s6 on db2169 is CRITICAL: 152 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2169&var-port=9104
[10:14:03] PROBLEM - MariaDB sustained replica lag on s6 on db2158 is CRITICAL: 77.5 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2158&var-port=9104
[10:15:01] RECOVERY - MariaDB sustained replica lag on s2 on db1222 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104
[10:15:03] RECOVERY - MariaDB sustained replica lag on s3 on db2149 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2149&var-port=9104
[10:15:36] PROBLEM - MariaDB sustained replica lag on s8 on db1193 is CRITICAL: 914 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1193&var-port=9104
[10:15:38] PROBLEM - MariaDB sustained replica lag on s8 on db1177 is CRITICAL: 97 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1177&var-port=9104
[10:16:02] RECOVERY - MariaDB sustained replica lag on s1 on db2188 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2188&var-port=9104
[10:16:52] RECOVERY - MariaDB sustained replica lag on x1 on db1237 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1237&var-port=9104
[10:17:00] RECOVERY - MariaDB sustained replica lag on s4 on db1160 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1160&var-port=9104
[10:17:01] RECOVERY - MariaDB sustained replica lag on s5 on db2171 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2171&var-port=9104
[10:17:02] RECOVERY - MariaDB sustained replica lag on s1 on db2145 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2145&var-port=9104
[10:17:04] RECOVERY - MariaDB sustained replica lag on s6 on db2169 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2169&var-port=9104
[10:17:04] RECOVERY - MariaDB sustained replica lag on s1 on db2203 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2203&var-port=9104
[10:17:36] RECOVERY - MariaDB sustained replica lag on s8 on db1193 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1193&var-port=9104
[10:18:00] RECOVERY - MariaDB sustained replica lag on s3 on db2194 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2194&var-port=9104
[10:18:06] RECOVERY - MariaDB sustained replica lag on s6 on db2158 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2158&var-port=9104
[10:18:48] FIRING: [160x] MysqlReplicationLagPtHeartbeat: MySQL instance db1156:9104 has too large replication lag (19m 10s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:19:58] PROBLEM - MariaDB sustained replica lag on s7 on db1174 is CRITICAL: 153 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1174&var-port=9104
[10:20:01] PROBLEM - MariaDB sustained replica lag on s7 on db1158 is CRITICAL: 571 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1158&var-port=9104
[10:20:40] RECOVERY - MariaDB sustained replica lag on s8 on db1177 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1177&var-port=9104
[10:20:52] PROBLEM - MariaDB sustained replica lag on s2 on db1156 is CRITICAL: 904 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[10:20:58] RECOVERY - MariaDB sustained replica lag on s7 on db1174 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1174&var-port=9104
[10:21:00] RECOVERY - MariaDB sustained replica lag on s7 on db1158 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1158&var-port=9104
[10:21:39] PROBLEM - MariaDB sustained replica lag on s2 on db1188 is CRITICAL: 508 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1188&var-port=9104
[10:22:53] RECOVERY - MariaDB sustained replica lag on s2 on db1156 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[10:23:48] RESOLVED: [146x] MysqlReplicationLagPtHeartbeat: MySQL instance db1157:9104 has too large replication lag (14m 56s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[10:26:39] RECOVERY - MariaDB sustained replica lag on s2 on db1188 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1188&var-port=9104
[10:27:31] "RESOLVED: [146x]" lolsob
[12:15:16] can ms-fe2021-23 be downtimed? I am guessing those are either new or old ones?
[12:16:27] gone for lunch, but meaning: https://alerts.wikimedia.org/?q=alertname%3DSwift%20https%20backend&q=team%3Dsre&q=%40receiver%3Dirc-spam probably a downtime expired or something
[12:24:40] they're still with DC-Ops per T416243
[12:24:41] T416243: Q3:rack/setup/install ms-fe202[1-4] - https://phabricator.wikimedia.org/T416243
[12:32:05] (I've silenced those alerts for a few days)
[12:52:37] Thank you
[14:25:40] bah, Dell have rejigged the SCSI ids of their disk controllers in config-J
[14:26:41] Oh, no, maybe in fact this system is misconfigured
[14:31:17] yeah, they've gone RAID-0 not JBOD
[14:37:18] JBOD is RAID-0, just 1 RAID-0 per disk :-D
[14:40:14] no, it's really not
[14:46:42] I guess is not, as it would be pass-through vs striped (?)
[14:47:56] presented to the host OS very differently
[14:48:13] (e.g. smartctl will work on a JBOD drive)
[17:49:13] FIRING: SystemdUnitFailed: confd_prometheus_metrics.service on ms-be1096:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:53:54] FIRING: [3x] SystemdUnitFailed: confd_prometheus_metrics.service on ms-be1096:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:59:13] FIRING: [4x] SystemdUnitFailed: confd_prometheus_metrics.service on ms-be1096:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:32:16] PROBLEM - MariaDB sustained replica lag on s1 on db2145 is CRITICAL: 94.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2145&var-port=9104
[23:33:16] RECOVERY - MariaDB sustained replica lag on s1 on db2145 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2145&var-port=9104