[00:00:40] PROBLEM - SSH on srv276 is CRITICAL: Server answer:
[00:00:50] RECOVERY - DPKG on srv262 is OK: All packages OK
[00:30:20] RECOVERY - SSH on srv276 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:30:20] RECOVERY - DPKG on srv276 is OK: All packages OK
[00:32:50] RECOVERY - Disk space on srv276 is OK: DISK OK
[00:38:20] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time
[00:38:30] RECOVERY - RAID on srv276 is OK: OK: no RAID installed
[00:46:59] !log killing long running show_bug.cgi procs on kaulen
[00:46:59] Logged the message, Master
[00:47:01] :D
[00:47:16] I was just about to do that, cheers
[00:47:32] Saves me having to write an email/try and find some ops about
[01:29:49] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:58:45] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Sun Jan 8 02:58:38 UTC 2012
[03:13:13] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[03:21:23] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[03:34:23] RECOVERY - Puppet freshness on db1003 is OK: puppet ran at Sun Jan 8 03:34:11 UTC 2012
[03:35:23] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
[03:49:13] PROBLEM - MySQL replication status on db1025 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1401s
[03:57:34] RECOVERY - MySQL replication status on db1025 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:21:24] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:22:14] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:37:04] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[06:57:08] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[10:06:07] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 442897 MB (3% inode=99%):
[10:09:57] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 414797 MB (3% inode=99%):
[10:24:07] RECOVERY - MySQL slave status on es1004 is OK: OK:
[13:23:06] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[13:31:06] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[13:44:37] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
[14:08:37] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[14:15:52] could someone fix brewster?
[14:15:58] apergos: ^
[14:19:23] it's news to me that there is any squid process on brewster
[14:26:52] I don't see offhand why nagios thinks it should be doing something else
[14:39:17] ah well I see the root of the problem
[14:39:20] but...
[14:51:11] !log cleared out some very large squid logs on brewster (basically all of them), plus lighty logs, disk was full. restarted squid manually
[14:51:13] Logged the message, Master
[14:52:34] RECOVERY - Squid on brewster is OK: TCP OK - 0.002 second response time on port 8080
[14:57:15] !log removed old puppet lockfile on brewster, ran by hand
[14:57:16] Logged the message, Master
[15:11:59] apergos: without it puppet is broken
[15:12:12] I couldn't do puppetd -tv
[15:23:48] on what host?
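
For readers following along: the brewster fix logged above (disk full, very large squid and lighty logs cleared, squid restarted manually, stale puppet lockfile removed, puppet run by hand) roughly corresponds to a sequence like the one below. This is a hedged reconstruction rather than the exact commands that were run; the log paths, init script name, and lockfile location are assumptions based on Debian/Ubuntu and Puppet 2.x defaults of the period, not taken from brewster itself.

    # Rough sketch only -- paths and service names are assumed, not copied from brewster.
    df -h                                         # confirm which filesystem is full
    du -sh /var/log/squid/* /var/log/lighttpd/*   # locate the oversized logs (locations assumed)

    : > /var/log/squid/access.log                 # truncate in place so the running daemon
    : > /var/log/squid/cache.log                  # keeps a valid open file descriptor
    : > /var/log/lighttpd/access.log              # "lighty" logs mentioned in the !log entry

    /etc/init.d/squid restart                     # bring squid back up by hand (init script name assumed)

    # Puppet was wedged by a lockfile left behind by an interrupted run; removing it
    # lets a manual run proceed. The lockfile path is the Puppet 2.x default and may
    # differ on this host.
    rm -f /var/lib/puppet/state/puppetdlock
    puppetd -tv                                   # one-off verbose agent run, as mentioned in the log

Truncating with `: >` rather than deleting keeps squid's open file descriptors valid, which is why the disk space comes back immediately without waiting for a log rotation.
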
[16:43:49] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Sun Jan 8 16:43:48 UTC 2012
[23:00:14] what's going on with this http://en.wikipedia.org/wiki/Special:Contributions/208.80.154.52
[23:03:40] what do you guys use that IP for?
[23:06:11] anyone around?
[23:10:48] Ryan_Lane: you around?
[23:11:06] Prodego, it resolves as being a squid box
[23:11:12] cp1042.wikimedia.org
[23:11:48] well obviously we have some sort of problem if people are editing from it
[23:12:02] this could be an XFF issue, or a misconfiguration issue
[23:12:11] someone needs to investigate this
[23:12:13] I think it's an xff issue
[23:12:14] hangon
[23:12:23] yeah
[23:12:24] it's not listed
[23:13:40] ok, so are all xff headers being respected or something?
[23:13:58] perhaps anyone who is on a host where xff headers are respected can just add their own xff headers as well?
[23:14:30] Are there any other hosts around that IP also showing up as editing?
[23:15:28] that IP is the only one I saw, but I was just shown it and haven't looked
[23:15:30] let me ask
[23:16:35] looks like all its IP neighbours are allocated too
[23:24:23] Prodego, that's all the eqiad-deployed cp10** listed in XFF now
[23:25:23] so that server should be passing IPs correctly now?
[23:25:42] cp1001-cp1042 will be, yes
[23:26:24] so I'll let you know if we see anything more
[23:27:19] thanks Reedy
[23:32:18] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[23:40:18] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:18] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
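
The XFF exchange above boils down to MediaWiki only honouring X-Forwarded-For from proxies it has been told to trust: when a new frontend cache such as cp1042 is missing from that trusted list, edits made through it get attributed to the cache's own address (208.80.154.52) rather than to the client IP carried in the header, which is why the Special:Contributions page fills up with edits "from" the squid box. A quick, hedged way to sanity-check the behaviour from outside is sketched below; the spoofed address and the exact request are illustrative only, and the API call simply reports which IP the wiki attributes to an anonymous request.

    # Confirm what the editing address actually is: reverse DNS points at a frontend cache.
    dig +short -x 208.80.154.52          # -> cp1042.wikimedia.org, per the discussion above

    # For an anonymous request, meta=userinfo returns the IP the wiki attributes to you
    # (the "name" field). If a forged X-Forwarded-For value is echoed back, XFF is being
    # trusted on that path; if your own address comes back, the header was ignored.
    # 192.0.2.1 is a TEST-NET documentation address used purely as an example.
    curl -s -H 'X-Forwarded-For: 192.0.2.1' \
        'http://en.wikipedia.org/w/api.php?action=query&meta=userinfo&format=json'

This also addresses the spoofing worry raised at 23:13:58: a client-supplied X-Forwarded-For should only be believed when the request arrives via a proxy on the trusted list, so adding cp1001-cp1042 to that list restores correct attribution without letting arbitrary clients forge their IP.
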