[00:11:59] New patchset: Lcarr; "moving es monitoring to nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4498
[00:12:15] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4498
[00:13:05] New patchset: Lcarr; "moving es monitoring to nrpe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4498
[00:13:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4498
[00:13:41] binasher: ^^ look good this time (don't hit submit though… don't want to break monitoring on friday eve)
[00:16:09] i like how you changed db::es to a parameterized class
[00:16:17] looks good!
[00:16:43] oh wait, i didn't properly spec out the variable in the monitoring class
[00:17:01] oh i did
[00:17:03] nm :)
[00:17:05] yay
[00:17:27] that means it is time to head home :)
[00:18:28] have a good weekend
[00:22:17] thanks, you too
[01:25:24] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[01:25:33] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[02:49:54] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 218 seconds
[02:52:54] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[02:52:54] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[03:07:54] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[03:07:54] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[04:18:15] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:25:45] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours
[04:40:36] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours
[05:52:57] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours
[06:25:58] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 182 seconds
[06:26:06] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 185 seconds
[06:48:16] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[07:00:43] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 27 seconds
[07:01:01] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 12 seconds
[09:41:26] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 183 seconds
[09:41:35] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 190 seconds
[09:42:02] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 194 seconds
[09:43:05] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 235 seconds
[12:54:15] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[12:54:15] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[13:09:15] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[13:09:15] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[14:26:32] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours
[14:34:11] RECOVERY - MySQL slave status on es1004 is OK: OK:
[14:41:39] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours
[15:21:24] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 217 seconds
[15:21:51] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 245 seconds
[15:33:42] PROBLEM - MySQL disk space on db1047 is CRITICAL: DISK CRITICAL - free space: /a 68110 MB (3% inode=99%):
[15:53:23] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours
[16:23:14] PROBLEM - jenkins_service_running on aluminium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:24:26] RECOVERY - jenkins_service_running on aluminium is OK: PROCS OK: 3 processes with args jenkins
[16:39:53] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (10961)
[16:40:11] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (10935)
[16:49:08] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[17:19:53] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[17:20:20] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[17:23:20] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[17:47:14] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time
[18:06:35] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds
[18:06:44] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[18:18:35] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[18:18:53] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000
[19:31:51] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 254 seconds
[19:32:00] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 262 seconds
[19:42:39] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds
[19:42:57] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 0 seconds
[19:46:06] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 7 seconds
[19:47:18] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[22:55:25] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[22:55:25] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[23:10:25] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[23:10:25] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[23:12:49] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[23:13:07] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds