[00:00:30] !log db54 is now replicating from db30
[00:00:32] Logged the message, Master
[00:02:01] LeslieCarr: thanks!
[00:04:34] now i can make use of things like querying ganglia for the slave lag of every single core db in pmtpa and eqiad in json (http://ganglia.wikimedia.org/latest/graph.php?r=hour&title=&vl=&x=&n=&hreg[]=^db\d%2B&mreg[]=mysql_slave_lag&aggregate=1&json=1)
[00:15:32] New patchset: Diederik; "Initial commit, feedback Catrope incorporated." [analytics/udp-filters] (master) - https://gerrit.wikimedia.org/r/2142
[00:25:35] PROBLEM - MySQL Slave Delay on db54 is CRITICAL: CRIT replication delay 3492 seconds
[00:32:51] RECOVERY - MySQL Slave Delay on db54 is OK: OK replication delay 0 seconds
[00:55:21] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[01:28:52] New patchset: Ottomata; "Documentation, abstracting Pipeline class further." [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2149
[01:28:53] New patchset: Ottomata; "Documentation, assertions" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2150
[01:28:55] New patchset: Ottomata; "Phew, rewrite of UserAgentPipeline is coming along." [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2151
[01:38:50] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2151
[01:40:03] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2150
[01:41:33] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2149
[01:41:34] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2151
[01:41:34] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2150
[01:41:34] Change merged: Diederik; [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2149
[02:18:32] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1540s
[02:21:32] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1720s
[02:41:02] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 6s
[02:44:02] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 10s
[03:21:12] RECOVERY - Puppet freshness on virt3 is OK: puppet ran at Sat Jan 28 03:21:04 UTC 2012
[03:23:42] RECOVERY - Puppet freshness on sodium is OK: puppet ran at Sat Jan 28 03:23:38 UTC 2012
[03:50:32] PROBLEM - RAID on sodium is CRITICAL: Connection refused by host
[03:51:32] PROBLEM - Disk space on virt3 is CRITICAL: Connection refused by host
[03:54:24] RECOVERY - RAID on sodium is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[03:54:54] RECOVERY - Disk space on virt3 is OK: DISK OK
[04:19:24] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:21:14] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:41:22] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[08:50:33] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:00:13] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[09:11:26] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[10:06:17] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 404344 MB (3% inode=99%):
[10:09:57] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 382566 MB (3% inode=99%):
[11:04:39] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[11:19:09] RECOVERY - MySQL slave status on es1004 is OK: OK:
[16:20:26] PROBLEM - check_minfraud3 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:25:26] RECOVERY - check_minfraud3 on payments1 is OK: HTTP OK: HTTP/1.1 200 OK - 8644 bytes in 3.451 second response time
[16:49:16] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:17:08] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[19:21:48] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[20:43:00] PROBLEM - Backend Squid HTTP on knsq25 is CRITICAL: Connection refused
[20:48:50] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:15:31] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
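The ganglia URL pasted at 00:04:34 exposes the aggregated mysql_slave_lag series for every host matching ^db\d+ as JSON. Below is a minimal sketch of consuming that endpoint, assuming ganglia-web's json=1 output is a list of series objects, each carrying a host/metric identifier and a "datapoints" array of [value, timestamp] pairs; the exact field names are not confirmed by the log and may differ between ganglia-web versions.

```python
#!/usr/bin/env python3
"""Sketch: print the most recent replication lag per db host from the
aggregated ganglia JSON graph linked in the log (field names assumed)."""

import json
from urllib.request import urlopen

# URL copied from the log; the raw-string segment keeps the literal \d.
URL = ("http://ganglia.wikimedia.org/latest/graph.php"
       r"?r=hour&title=&vl=&x=&n=&hreg[]=^db\d%2B"
       "&mreg[]=mysql_slave_lag&aggregate=1&json=1")

with urlopen(URL, timeout=30) as resp:
    series = json.load(resp)

for s in series:
    # "host_name"/"ds_name" and "datapoints" are guesses at the schema.
    host = s.get("host_name") or s.get("ds_name", "unknown")
    points = [p for p in s.get("datapoints", []) if p and p[0] is not None]
    if points:
        latest_lag, _ts = points[-1]  # assumed [value, timestamp] ordering
        print("%-12s %8.0f s behind master" % (host, latest_lag))
```

The same per-host lag is what the nagios-wm "MySQL Slave Delay" and "Seconds_Behind_Master" alerts above report; pulling it from ganglia just gives the whole core db fleet in one request instead of one check per host.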