[00:08:50] New patchset: Asher; "extend check_mysql_slave_delay to support pt-heartbeat monitoring by server_id in a multi-tier replication tree + nrpe hooks." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1920
[00:09:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1920
[00:15:21] Would someone mind throwing things at formey? svn is running really slow, and I can't even get a login prompt from ssh
[00:15:40] Reedy: laner is on it
[00:15:45] cheers
[00:15:51] This seems to be getting more common
[00:16:15] Usually viewcgi process maxing cpu and causing swapdeath
[00:16:37] load average 31.48
[00:16:40] Awesome
[00:23:45] PROBLEM - Puppet freshness on srv191 is CRITICAL: Puppet has not run in the last 10 hours
[00:26:46] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1920
[00:30:16] Ryan_Lane: https://gerrit.wikimedia.org/r/#change,1920
[00:32:29] Change abandoned: Ryan Lane; "something went wrong." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1920
[00:33:17] New patchset: Asher; "extend check_mysql_slave_delay to support pt-heartbeat monitoring by server_id in a multi-tier replication tree + nrpe hooks." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1921
[00:33:46] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1921
[00:33:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1921
[00:33:49] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1921
[00:33:52] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1921
[00:47:13] Hah, I see mutante is in the credits for the RSS extension
[02:29:07] New patchset: Bhartshorne; "added query duration statistics" [operations/software] (master) - https://gerrit.wikimedia.org/r/1922
[02:29:54] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1922
[02:43:19] PROBLEM - Host sq46 is DOWN: PING CRITICAL - Packet loss = 100%
[02:52:59] RECOVERY - Puppet freshness on srv191 is OK: puppet ran at Sat Jan 14 02:52:36 UTC 2012
[04:15:32] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:16:52] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:39:28] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[07:29:29] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[10:06:24] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 445028 MB (3% inode=99%):
[10:08:24] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 436164 MB (3% inode=99%):
[10:09:34] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[10:37:04] RECOVERY - MySQL slave status on es1004 is OK: OK:
[16:54:29] New review: Petrb; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1918
[17:39:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[20:19:23] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[21:10:20] PROBLEM - LDAP on virt1 is CRITICAL: Connection refused
[21:12:50] PROBLEM - LDAPS on virt1 is CRITICAL: Connection refused
[23:41:42] PROBLEM - Host virt1 is DOWN: PING CRITICAL - Packet loss = 100%
[23:50:02] PROBLEM - Host virt1.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%