[06:08:58] PROBLEM - MariaDB sustained replica lag on s5 on db2157 is CRITICAL: 14.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2157&var-port=9104
[06:09:58] RECOVERY - MariaDB sustained replica lag on s5 on db2157 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2157&var-port=9104
[06:10:03] checking
[06:11:24] It was a small spike that wasn't even captured on the graphs
[09:23:20] there's an alert on db1208 (analytics_meta) - anybody working on it?
[09:35:37] btullis: ^
[10:39:52] Thanks. Not currently working on it, but I can have a look now.
[10:40:40] This is the host from which our backups are taken, so it's not directly service affecting.
[11:19:55] Can I get a +1 to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229538 please? Removing three drained hosts so I can reimage them into per-rack VLANs
[11:23:37] Thanks <3
[11:25:03] marostegui: deployed the switchmaster change now
[11:26:56] thanks!
[11:30:01] Emperor: I have restarted media backups with a new node, but that should only extend backups for 1-2 months. It is catching up now.
[11:30:51] jynus: thanks for the update. It's good for my blood pressure to know there are backups for when I screw up ;-)
[11:31:11] that's the whole point of backups, adding agility
[11:31:54] I will ping here when the backlog clears and we go back to backing up seconds or minutes after upload
[11:32:31] 👍
[11:42:26] and with that we are in excess of 160 million files uploaded and backed up
[12:17:25] FIRING: SystemdUnitFailed: swift_ring_manager.service on ms-fe2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:32:25] RESOLVED: SystemdUnitFailed: swift_ring_manager.service on ms-fe2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:32:56] why did I look at etherpad's database? why...
[18:33:31] just wow