[00:22:02] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.35:11000 (timeout) [00:23:23] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [00:29:05] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [00:31:56] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [00:56:25] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [01:42:37] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 234 seconds [01:48:10] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 30 seconds [01:50:34] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [01:53:12] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [02:24:24] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [02:25:19] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [02:27:15] PROBLEM - Host mw8 is DOWN: PING CRITICAL - Packet loss = 100% [02:29:30] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [02:57:06] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [03:15:15] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [03:16:36] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [03:20:57] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [03:22:27] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [04:03:21] PROBLEM - Puppet freshness on blondel is CRITICAL: Puppet has not run in the last 10 hours [04:13:24] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [04:15:53] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [04:49:56] PROBLEM - Apache HTTP on srv245 is CRITICAL: Connection refused [04:53:32] PROBLEM - Host srv243 is DOWN: PING CRITICAL - Packet loss = 100% [04:55:38] RECOVERY - Host srv243 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [05:02:14] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.208:11000 (Connection refused) [05:04:17] PROBLEM - NTP on srv208 is CRITICAL: NTP CRITICAL: Offset unknown [05:04:53] PROBLEM - Apache HTTP on srv208 is CRITICAL: Connection refused [05:04:53] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [05:07:17] RECOVERY - NTP on srv208 is OK: NTP OK: Offset -0.0167388916 secs [05:10:53] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (timeout) [05:12:23] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [05:14:11] PROBLEM - Apache HTTP on srv244 is CRITICAL: Connection refused [05:14:47] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [05:15:32] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [05:17:47] PROBLEM - Host srv231 is DOWN: PING CRITICAL - Packet loss = 100% 
[05:18:59] RECOVERY - Host srv231 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [05:22:35] PROBLEM - Apache HTTP on srv231 is CRITICAL: Connection refused [05:25:17] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [05:28:53] PROBLEM - Apache HTTP on srv200 is CRITICAL: Connection refused [05:29:38] RECOVERY - Apache HTTP on srv244 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [05:33:23] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [05:33:50] PROBLEM - Host srv210 is DOWN: PING CRITICAL - Packet loss = 100% [05:36:05] RECOVERY - Host srv210 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms [05:39:05] PROBLEM - Apache HTTP on srv210 is CRITICAL: Connection refused [05:41:11] PROBLEM - Host srv236 is DOWN: PING CRITICAL - Packet loss = 100% [05:42:14] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.236:11000 (Connection refused) [05:43:26] RECOVERY - Host srv236 is UP: PING OK - Packet loss = 0%, RTA = 1.57 ms [05:46:26] PROBLEM - Apache HTTP on srv236 is CRITICAL: Connection refused [05:46:26] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [05:51:05] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) [05:53:41] PROBLEM - Apache HTTP on srv213 is CRITICAL: Connection refused [05:53:41] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [05:58:20] PROBLEM - Host srv264 is DOWN: PING CRITICAL - Packet loss = 100% [05:59:23] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [05:59:41] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) [06:00:08] RECOVERY - Host srv264 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms [06:00:15] * jeremyb wonders if apergos is up yet? [06:00:20] nagios is pretty chatty [06:01:02] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [06:03:35] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.042 second response time [06:04:20] PROBLEM - Apache HTTP on srv264 is CRITICAL: Connection refused [06:06:17] PROBLEM - Host srv258 is DOWN: PING CRITICAL - Packet loss = 100% [06:08:14] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.27:11000 (timeout) 10.0.8.23:11000 (timeout) [06:08:41] RECOVERY - Host srv258 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [06:11:05] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [06:11:50] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [06:12:44] PROBLEM - Apache HTTP on srv258 is CRITICAL: Connection refused [06:13:38] yes. is there an issue? [06:13:59] idk. just nagios seems to be extra flappy [06:14:04] it's not flappy [06:14:30] there's a job running that is rebooting some hosts that have been up a long time, doing them one at a time with a few minutes in between [06:15:23] how does that account for hosts that have alerted several times? [06:15:53] you'll see it be unavailable, then be up but no http, then http back [06:16:30] hrmm.
that is what happened for the particular host i'm looking back at now (236) [06:16:54] I did some manual reboots yesterday so I'm quite familiar with the pattern :-P [06:17:05] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [06:17:06] right, i noticed you did ;) [06:17:14] PROBLEM - Apache HTTP on mw44 is CRITICAL: Connection refused [06:17:40] but there was ~25 mins between HTTP conn refused (which was after ping came back) and HTTP recovery. sounds like a long time to me [06:18:05] that's because of how the recovery works [06:18:07] because puppet is what starts http back up [06:18:15] if someone reboots manually, [06:18:22] it ensures the mediawiki codebase is up to date before it starts apache [06:18:23] they (me) will hop on the host and force a puppet run [06:18:27] so that we rsync mw [06:18:44] and then it will start apache; otherwise you are waiting for the run to happen for up to 30 mins [06:19:02] aha [06:19:11] i didn't realize puppet was involved [06:19:12] in reality, it's not much of a problem [06:19:18] we don't have hosts try to force a puppet run immediately when they come back up, [06:19:33] the reboots are spaced 8 minutes apart [06:19:36] because imagine that several of them drop off and are rebooted simultaneously... [06:19:39] so, at most 3 hosts are down at a time [06:20:00] so there's about 6 more that are down cause they are down (and not the script) [06:20:01] and that's assuming all puppet runs take a full 30 minutes [06:20:03] which isn't true [06:20:07] I'll have a look at those now [06:20:23] yeah, it's possible puppet fails on some [06:20:41] these were down yesterday [06:20:47] I think they died before your script got to em [06:20:59] ssh -l root srv162.mgmt [06:20:59] ssh: connect to host srv162.mgmt port 22: Connection timed out [06:21:06] well that's another one I won't look at then [06:21:16] apergos: there's your good friend srv206 ;) [06:21:33] yeah, my good friend 206 is going to stay like it is [06:21:42] I'm talking about hosts actually down, not ssh-able [06:25:10] !powercycled srv284 (unresponsive at mgmt console) [06:25:20] PROBLEM - Apache HTTP on srv247 is CRITICAL: Connection refused [06:28:20] hmm looks like it doesn't want to come up. not getting out of the bios phase [06:29:23] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.11:11000 (Connection timed out) [06:30:08] PROBLEM - Host srv261 is DOWN: PING CRITICAL - Packet loss = 100% [06:31:02] * jeremyb wonders if there's some graceful way to cycle an entire memcached cluster [06:31:43] e.g. if you could pass the whole contents of the memcached to the new node for a given slot when it takes over (right before booting the old node for that slot)
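A hypothetical sketch of that idea, nothing that exists in the puppet repo: it assumes the key names are piped in from something like libmemcached's memdump (which comes up a few lines below) and that the python-memcached client is available. Flags and TTLs are not carried over, so warming the replacement node this way is only approximate.

    #!/usr/bin/env python
    # Hypothetical warm-up for a replacement memcached slot: read key names on
    # stdin (e.g. from `memdump --servers=OLD`) and copy each key's value from
    # the old node to the new one. TTLs and flags are not preserved.
    import sys

    import memcache  # python-memcached client, assumed to be installed

    def copy_keys(old_server, new_server, keys):
        src = memcache.Client([old_server])
        dst = memcache.Client([new_server])
        copied = 0
        for key in keys:
            key = key.strip()
            if not key:
                continue
            value = src.get(key)
            if value is None:
                continue  # expired or evicted since the key dump was taken
            dst.set(key, value)
            copied += 1
        return copied

    if __name__ == '__main__':
        # e.g.: memdump --servers=10.0.8.15:11000 | warm_new_node.py 10.0.8.15:11000 NEWHOST:11000
        old, new = sys.argv[1], sys.argv[2]
        print('copied %d keys from %s to %s' % (copy_keys(old, new, sys.stdin), old, new))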
[06:32:18] hmm it wants to reinstall [06:32:37] that's pretty odd [06:32:41] RECOVERY - Host srv261 is UP: PING OK - Packet loss = 0%, RTA = 1.46 ms [06:33:44] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [06:34:20] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [06:35:26] hrmm, there's http://docs.libmemcached.org/bin/memdump.html and several other relevant google hits [06:35:50] PROBLEM - Apache HTTP on srv261 is CRITICAL: Connection refused [06:35:55] i guess with the new approach of trusting memcached less it doesn't matter so much [06:36:35] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [06:36:44] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [06:37:56] PROBLEM - Host srv267 is DOWN: PING CRITICAL - Packet loss = 100% [06:38:05] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.17:11000 (Connection timed out) [06:40:02] RECOVERY - Host srv267 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [06:40:56] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [06:44:32] PROBLEM - Apache HTTP on srv267 is CRITICAL: Connection refused [06:45:08] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.37:11000 (timeout) [06:48:08] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [06:48:44] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [06:48:44] RECOVERY - Apache HTTP on srv261 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [06:54:43] PROBLEM - Apache HTTP on srv246 is CRITICAL: Connection refused [06:55:10] PROBLEM - NTP on srv246 is CRITICAL: NTP CRITICAL: Offset unknown [06:59:40] RECOVERY - NTP on srv246 is OK: NTP OK: Offset -0.003497123718 secs [07:06:07] PROBLEM - Apache HTTP on srv232 is CRITICAL: Connection refused [07:08:22] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.207:11000 (Connection timed out) [07:13:46] PROBLEM - Apache HTTP on srv207 is CRITICAL: Connection refused [07:16:01] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [07:17:49] !log powercycled mw8 [07:17:55] Logged the message, Master [07:18:34] RECOVERY - Host mw8 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [07:19:29] !log reinstalled srv284, seems to be up now [07:19:32] Logged the message, Master [07:21:43] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused [07:22:10] PROBLEM - Apache HTTP on srv212 is CRITICAL: Connection refused [07:22:19] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [07:22:35] !log installing upgrades on srv212 [07:22:38] Logged the message, Master [07:25:46] PROBLEM - Host mw35 is DOWN: PING CRITICAL - Packet loss = 100% [07:26:04] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [07:27:16] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [07:27:52]
RECOVERY - Host mw35 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [07:28:54] so mutante if you care I have a summary of the boxes that are still down (not being handled by the reboot script) [07:29:13] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [07:29:52] apergos: ok, sure, where is it [07:30:07] srv206 has an old rt ticket for it, puppet refuses to run on it, there's an odd error, tossing the yaml file didn't help any, see the logs. srv174 and 188 aren't reachable even mgmt port. srv281 has an open ticket for hardware... [07:31:11] ssl3004 isn't reachable even via mgmt port, it *was* until * rebooted it, so that's worrying [07:31:37] PROBLEM - Apache HTTP on mw35 is CRITICAL: Connection refused [07:31:55] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.27:11000 (timeout) 10.0.8.13:11000 (timeout) [07:33:35] srv162 not reachable even on mgmt port. [07:34:05] there's only 266 I haven't looked at yet [07:34:25] apergos: srv162, srv174 - they are gone, decommissioned [07:34:37] this is only for hosts in mediawiki-installation or whatever it's called [07:34:40] if they were in a list, that was outdated i guess [07:34:47] hmm might wanna remove them from that list :-D [07:35:53] i'm installing upgrades on some anyways.. seeing that samba upgrade just doesn't feel good [07:36:04] srv266 says: Severity: Non Recoverable, SEL:CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted from the mgmt console [07:36:43] I'll power cycle it anyways and see what happens [07:37:04] why don't i see that one on nagios..hmm [07:37:14] PROBLEM - Memcached on srv284 is CRITICAL: Connection refused [07:37:21] !log powercycling srv266, had this message on mgmt console: Severity: Non Recoverable, SEL:CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted [07:37:24] Logged the message, Master [07:37:41] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [07:39:20] PROBLEM - Apache HTTP on srv270 is CRITICAL: Connection refused [07:39:47] PROBLEM - Apache HTTP on srv265 is CRITICAL: Connection refused [07:40:23] RECOVERY - Apache HTTP on srv212 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [07:40:59] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [07:41:44] PROBLEM - Host mw36 is DOWN: PING CRITICAL - Packet loss = 100% [07:43:50] RECOVERY - Host mw36 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [07:44:03] err: Could not run Puppet configuration client: Invalid parameter system at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:50 [07:44:08] this on srv266 [07:44:11] awesome [07:44:16] apergos: srv206 - yes, weird puppet error and had hardware errors in last july :p hmm [07:44:18] same as on 206 [07:44:29] both the same puppet error? [07:44:36] yes [07:45:17] I'm saying, isn't it a bit odd that both have the same puppet error?
[07:45:48] if they each have some sort of hardware error I would expect different types of failures [07:45:59] the code it refers to is in define systemuser [07:46:13] there is a user created with "system => true" [07:46:20] it does not like that parameter..uhm [07:47:08] PROBLEM - Apache HTTP on mw36 is CRITICAL: Connection refused [07:50:35] !log upgrading mw36 [07:50:39] Logged the message, Master [07:51:18] apergos: taking srv206 for reinstall [07:52:41] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [07:52:52] ok [07:53:17] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.051 second response time [07:53:53] PROBLEM - Apache HTTP on mw11 is CRITICAL: Connection refused [07:57:29] PROBLEM - Host srv206 is DOWN: PING CRITICAL - Packet loss = 100% [07:58:05] PROBLEM - Host srv265 is DOWN: PING CRITICAL - Packet loss = 100% [07:59:44] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.15:11000 (Connection refused) 10.0.11.34:11000 (timeout) [07:59:45] !log reinstalling srv206 [07:59:48] Logged the message, Master [08:00:38] RECOVERY - Host srv265 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [08:02:35] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [08:03:02] RECOVERY - Host srv206 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms [08:05:35] PROBLEM - Host srv274 is DOWN: PING CRITICAL - Packet loss = 100% [08:07:05] PROBLEM - Memcached on srv206 is CRITICAL: Connection refused [08:07:14] PROBLEM - SSH on srv206 is CRITICAL: Connection refused [08:07:59] RECOVERY - Host srv274 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [08:08:53] !log upgraded mw1,mw2,mw35 [08:08:55] Logged the message, Master [08:10:59] PROBLEM - Apache HTTP on mw1 is CRITICAL: Connection refused [08:12:56] RECOVERY - SSH on srv206 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:13:32] RECOVERY - Apache HTTP on mw11 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.318 second response time [08:13:41] PROBLEM - Apache HTTP on mw2 is CRITICAL: Connection refused [08:15:20] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) [08:17:53] PROBLEM - Apache HTTP on mw33 is CRITICAL: Connection refused [08:18:11] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [08:18:20] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [08:19:44] apergos: anyways, the puppet problem is gone after reinstall.
i can also take 266 then [08:19:50] great [08:22:14] PROBLEM - Host srv276 is DOWN: PING CRITICAL - Packet loss = 100% [08:22:46] !reinstalling srv266 [08:23:44] RECOVERY - Host srv276 is UP: PING OK - Packet loss = 0%, RTA = 2.14 ms [08:23:53] PROBLEM - NTP on srv206 is CRITICAL: NTP CRITICAL: No response from NTP server [08:24:11] RECOVERY - Apache HTTP on srv206 is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.006 seconds [08:28:22] PROBLEM - Apache HTTP on srv276 is CRITICAL: Connection refused [08:29:07] RECOVERY - Apache HTTP on mw1 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.055 second response time [08:29:34] PROBLEM - SSH on srv266 is CRITICAL: Connection refused [08:29:43] PROBLEM - Apache HTTP on srv266 is CRITICAL: Connection refused [08:30:01] PROBLEM - Memcached on srv266 is CRITICAL: Connection refused [08:31:58] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.23:11000 (timeout) [08:32:46] mutante: you probably want to put !log there [08:33:00] !reinstalling :P [08:33:37] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [08:33:55] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.373 second response time [08:34:04] PROBLEM - Apache HTTP on srv287 is CRITICAL: Connection refused [08:34:13] !log reinstalling srv266 [08:34:15] Logged the message, Master [08:34:15] thx petan|wk [08:34:51] btw when ur not busy poke me [08:35:07] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.042 second response time [08:35:16] RECOVERY - SSH on srv266 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:35:16] RECOVERY - NTP on srv206 is OK: NTP OK: Offset 0.007565736771 secs [08:35:34] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time [08:38:16] PROBLEM - Host srv283 is DOWN: PING CRITICAL - Packet loss = 100% [08:39:46] RECOVERY - Host srv283 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [08:40:49] RECOVERY - Apache HTTP on mw2 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.074 second response time [08:42:37] RECOVERY - Apache HTTP on srv266 is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.007 seconds [08:44:07] PROBLEM - Apache HTTP on srv283 is CRITICAL: Connection refused [08:45:37] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.241:11000 (Connection timed out) [08:47:07] PROBLEM - NTP on srv266 is CRITICAL: NTP CRITICAL: Offset unknown [08:48:46] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [08:49:58] PROBLEM - Apache HTTP on srv241 is CRITICAL: Connection refused [08:54:19] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [08:54:28] PROBLEM - Host srv263 is DOWN: PING CRITICAL - Packet loss = 100% [08:56:34] RECOVERY - Host srv263 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [08:57:01] RECOVERY - NTP on srv266 is OK: NTP OK: Offset -0.02310967445 secs [08:59:43] PROBLEM - Apache HTTP on srv263 is CRITICAL: Connection refused [09:00:55] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [09:02:16] PROBLEM - Host srv277 is DOWN: PING CRITICAL - Packet loss = 100% [09:02:34] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.27:11000 (Connection refused) [09:03:55] RECOVERY - 
Host srv277 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms [09:04:04] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [09:06:46] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [09:08:16] PROBLEM - Apache HTTP on srv277 is CRITICAL: Connection refused [09:09:28] PROBLEM - Host srv271 is DOWN: PING CRITICAL - Packet loss = 100% [09:11:43] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.21:11000 (Connection refused) [09:12:10] RECOVERY - Host srv271 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [09:13:05] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [09:19:49] PROBLEM - NTP on mw5 is CRITICAL: NTP CRITICAL: Offset unknown [09:20:07] PROBLEM - Apache HTTP on mw5 is CRITICAL: Connection refused [09:21:30] what apache? it's supposed to be nginx over there anyways [09:21:37] RECOVERY - Apache HTTP on mw5 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.523 second response time [09:21:49] thanks [09:21:55] thanks a whole lot nagios [09:24:01] RECOVERY - NTP on mw5 is OK: NTP OK: Offset 0.04755532742 secs [09:29:34] PROBLEM - Apache HTTP on mw62 is CRITICAL: Connection refused [09:32:52] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [09:38:34] PROBLEM - Apache HTTP on mw59 is CRITICAL: Connection refused [09:42:19] PROBLEM - Host srv282 is DOWN: PING CRITICAL - Packet loss = 100% [09:42:46] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [09:44:16] RECOVERY - Host srv282 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [09:45:01] RECOVERY - Apache HTTP on mw62 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [09:47:25] PROBLEM - Apache HTTP on srv282 is CRITICAL: Connection refused [09:49:58] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [09:50:16] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [09:52:13] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms [09:55:58] PROBLEM - Apache HTTP on mw27 is CRITICAL: Connection refused [10:01:58] PROBLEM - Apache HTTP on mw7 is CRITICAL: Connection refused [10:03:19] RECOVERY - Apache HTTP on mw7 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [10:05:52] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [10:09:28] PROBLEM - Apache HTTP on mw61 is CRITICAL: Connection refused [10:14:09] PROBLEM - Host mw70 is DOWN: PING CRITICAL - Packet loss = 100% [10:15:21] RECOVERY - Host mw70 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [10:18:31] PROBLEM - Apache HTTP on mw70 is CRITICAL: Connection refused [10:20:00] RECOVERY - Apache HTTP on mw70 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.106 second response time [10:21:39] PROBLEM - Host mw10 is DOWN: PING CRITICAL - Packet loss = 100% [10:23:00] RECOVERY - Host mw10 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [10:25:51] RECOVERY - Apache HTTP on mw61 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [10:26:00] PROBLEM - Apache HTTP on mw10 is CRITICAL: Connection refused [10:34:51] PROBLEM - Apache HTTP on mw69 is CRITICAL: Connection refused [10:41:54] RECOVERY - Apache HTTP on mw10 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently 
- 0.043 second response time [10:42:48] PROBLEM - Apache HTTP on mw71 is CRITICAL: Connection refused [10:50:36] PROBLEM - Apache HTTP on mw47 is CRITICAL: Connection refused [10:57:12] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [10:58:06] PROBLEM - Apache HTTP on mw53 is CRITICAL: Connection refused [10:59:09] RECOVERY - Apache HTTP on mw69 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.118 second response time [11:01:33] PROBLEM - Host mw37 is DOWN: PING CRITICAL - Packet loss = 100% [11:03:03] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.24:11000 (timeout) 10.0.11.37:11000 (Connection refused) [11:03:57] RECOVERY - Apache HTTP on mw71 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.114 second response time [11:04:24] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [11:04:24] RECOVERY - Host mw37 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms [11:07:33] PROBLEM - Apache HTTP on mw37 is CRITICAL: Connection refused [11:07:33] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [11:09:57] PROBLEM - Host srv286 is DOWN: PING CRITICAL - Packet loss = 100% [11:11:09] RECOVERY - Host srv286 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms [11:11:27] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.36:11000 (Connection refused) 10.0.8.23:11000 (timeout) [11:12:57] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [11:15:21] PROBLEM - Apache HTTP on srv286 is CRITICAL: Connection refused [11:18:30] PROBLEM - Host mw55 is DOWN: PING CRITICAL - Packet loss = 100% [11:19:15] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [11:20:27] RECOVERY - Host mw55 is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms [11:21:48] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time [11:30:48] PROBLEM - Apache HTTP on mw57 is CRITICAL: Connection refused [11:33:48] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [11:42:21] PROBLEM - Host srv269 is DOWN: PING CRITICAL - Packet loss = 100% [11:44:45] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.22:11000 (timeout) [11:44:54] RECOVERY - Host srv269 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [11:46:15] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [11:47:54] PROBLEM - Apache HTTP on srv269 is CRITICAL: Connection refused [11:54:48] PROBLEM - Apache HTTP on mw67 is CRITICAL: Connection refused [11:57:48] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [12:02:27] PROBLEM - Apache HTTP on mw19 is CRITICAL: Connection refused [12:08:50] RECOVERY - Apache HTTP on mw67 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [12:11:50] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [12:12:08] PROBLEM - Apache HTTP on mw9 is CRITICAL: Connection refused [12:14:59] RECOVERY - Apache HTTP on mw9 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.042 second response time [12:14:59] PROBLEM - Host srv272 is DOWN: PING CRITICAL - Packet loss = 100% [12:16:47] 
RECOVERY - Host srv272 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms [12:20:05] PROBLEM - Apache HTTP on srv272 is CRITICAL: Connection refused [12:25:56] PROBLEM - Apache HTTP on mw41 is CRITICAL: Connection refused [12:28:56] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.042 second response time [12:34:56] PROBLEM - Apache HTTP on mw26 is CRITICAL: Connection refused [12:35:41] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time [12:37:38] PROBLEM - Host mw74 is DOWN: PING CRITICAL - Packet loss = 100% [12:40:11] RECOVERY - Host mw74 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms [12:41:59] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [12:44:05] PROBLEM - Apache HTTP on mw74 is CRITICAL: Connection refused [12:51:17] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [12:51:26] PROBLEM - Apache HTTP on mw73 is CRITICAL: Connection refused [12:54:17] PROBLEM - Host mw68 is DOWN: PING CRITICAL - Packet loss = 100% [12:54:35] hello [12:54:48] hi nosy [12:54:53] does anyone of you know if ms6 in haarlem is still in production? [12:55:04] hello mutante [12:55:07] do you know this? [12:55:49] it looks like it is, not decommissioned [12:56:02] mutante: thanks [12:56:13] don't see open ticket with the name either [12:56:15] k,np [12:56:23] RECOVERY - Host mw68 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [12:57:26] mutante: i just asked because oracle asked wmde to renew a support contract for it [12:57:44] if this host was no longer needed that would be good to know [12:58:18] nosy: ok, well i don't know about the contract but the host is up and running [12:59:50] PROBLEM - Apache HTTP on mw68 is CRITICAL: Connection refused [13:02:41] RECOVERY - Apache HTTP on mw74 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [13:06:49] PROBLEM - Apache HTTP on mw72 is CRITICAL: Connection refused [13:08:10] RECOVERY - Apache HTTP on mw72 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [13:14:46] PROBLEM - Apache HTTP on mw66 is CRITICAL: Connection refused [13:17:46] RECOVERY - Apache HTTP on mw73 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [13:18:49] PROBLEM - Host mw64 is DOWN: PING CRITICAL - Packet loss = 100% [13:20:28] RECOVERY - Apache HTTP on mw66 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [13:20:46] RECOVERY - Host mw64 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [13:22:43] RECOVERY - Apache HTTP on mw68 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.093 second response time [13:24:04] PROBLEM - Apache HTTP on mw64 is CRITICAL: Connection refused [13:38:10] PROBLEM - Apache HTTP on mw22 is CRITICAL: Connection refused [13:44:19] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [13:46:16] PROBLEM - Apache HTTP on mw65 is CRITICAL: Connection refused [13:52:16] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.054 second response time [13:55:16] PROBLEM - Apache HTTP on mw34 is CRITICAL: Connection refused [14:00:26] PROBLEM - Host mw45 is DOWN: PING CRITICAL - Packet loss = 100% [14:01:53] RECOVERY - Apache HTTP on mw65 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.044 second response time [14:02:01] PROBLEM - check_all_memcacheds on spence is 
CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.11.45:11000 (Connection timed out) [14:02:37] PROBLEM - Apache HTTP on mw2 is CRITICAL: Connection refused [14:04:48] PROBLEM - Puppet freshness on blondel is CRITICAL: Puppet has not run in the last 10 hours [14:10:39] PROBLEM - Apache HTTP on mw52 is CRITICAL: Connection refused [14:12:42] hi ops room [14:12:49] who's around to approve my tickets? [14:13:05] Have you brought bribes? [14:13:12] hmmm [14:13:12] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [14:13:16] i have two cookies sitting next to me [14:13:20] you can come and get them? [14:13:28] they are coconut oatmeal raisin! [14:13:31] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [14:13:33] wait i am not myself [14:13:37] ottomata1? brb [14:13:58] thar we go [14:14:15] https://gerrit.wikimedia.org/r/#q,status:open+owner:ottomata,n,z [14:14:26] mark might need to check the git::clone one [14:14:26] um [14:14:38] i guess i really need this one right now [14:14:38] https://gerrit.wikimedia.org/r/#change,6038 [14:19:12] PROBLEM - Apache HTTP on mw14 is CRITICAL: Connection refused [14:22:57] RECOVERY - Apache HTTP on mw2 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.567 second response time [14:23:08] Reedy, how can I bribe you? [14:26:24] PROBLEM - Apache HTTP on mw39 is CRITICAL: Connection refused [14:35:06] PROBLEM - Apache HTTP on mw12 is CRITICAL: Connection refused [14:37:58] RECOVERY - Apache HTTP on mw12 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [14:38:33] apergos? hrrmmmm? [14:38:47] https://gerrit.wikimedia.org/r/#change,6038 [14:39:06] no bribes now [14:39:37] you didn't notice I had committed a change for that shortly after we talked about it eh? :-P [14:39:51] ack! [14:39:52] nope [14:40:03] is that change waiting to be approved too? [14:40:07] yes [14:40:09] :-D [14:40:24] I left it for review figuring that you and/or mark might want to look at it [14:40:40] instead you just committed a new one probably identical :-D [14:40:53] ha, yeah, wanna look at yours [14:40:59] time zone probs! [14:41:08] anyways one comment, in order for this to work the [14:41:14] (you were still in the channel :-P) [14:41:39] the files and dirs will all have to be chowned to backup user. erik as a user isn't defined over there [14:42:00] so for testing you'll want to just have a subdir that is user/group backup [14:42:11] and make sure that your test script can read/write into it ok [14:42:11] ok [14:42:16] yours does not have read only = false [14:42:19] that will be a prob, no? [14:42:34] otherwise we did pretty much the same thing [14:42:35] it doesn't have read only = true [14:42:57] right, hm, what's the default? when I've set them in the past i've had to set it to false to be able to write [14:43:10] but maybe the defaults here are set differently [14:43:23] anyways, once the script is tested then at the last minute when he is ready to switch we can do a chown on the whole dir [14:43:31] RECOVERY - Apache HTTP on mw14 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.072 second response time [14:43:39] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [14:43:50] anyways it is now late enough in the day that I'm not bribeable :-P [14:44:03] rats!
[14:44:06] PROBLEM - Apache HTTP on mw42 is CRITICAL: Connection refused [14:44:13] sorry ;-) [14:44:22] so'k [14:44:32] (also I was on here some of sunday late, so that means I get strict about today's hours) [14:45:18] that is good. [14:50:51] PROBLEM - Apache HTTP on mw54 is CRITICAL: Connection refused [14:58:21] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [14:59:51] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2225 [15:02:24] PROBLEM - Host mw51 is DOWN: PING CRITICAL - Packet loss = 100% [15:04:31] RECOVERY - Host mw51 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [15:04:50] PROBLEM - Apache HTTP on mw51 is CRITICAL: Connection refused [15:07:31] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [15:10:22] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [15:14:35] sooo mayyyybe mutante [15:14:37] will review for us [15:14:52] or maybe notpeter [15:15:01] PROBLEM - Apache HTTP on mw17 is CRITICAL: Connection refused [15:15:30] there are two competing changes [15:15:35] mine and apergos' [15:15:37] either one is fine [15:15:42] whichever you think is better [15:15:44] https://gerrit.wikimedia.org/r/#change,5887 [15:15:48] https://gerrit.wikimedia.org/r/#change,6038 [15:17:52] PROBLEM - Host mw3 is DOWN: PING CRITICAL - Packet loss = 100% [15:20:43] RECOVERY - Host mw3 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [15:26:16] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [15:27:37] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.042 second response time [15:32:07] PROBLEM - Apache HTTP on mw24 is CRITICAL: Connection refused [15:33:37] is there an IRC command to play crickets? [15:34:31] PROBLEM - Host mw13 is DOWN: PING CRITICAL - Packet loss = 100% [15:36:37] RECOVERY - Host mw13 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [15:40:22] PROBLEM - Apache HTTP on mw13 is CRITICAL: Connection refused [15:44:34] RECOVERY - Apache HTTP on mw13 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.666 second response time [15:44:43] PROBLEM - Host srv266 is DOWN: PING CRITICAL - Packet loss = 100% [15:45:43] New patchset: Dzahn; "fixing link creation in HTML tables, part 1" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6121 [15:45:44] New patchset: Dzahn; "fix links pt.2, add lang column to mw, cut off long wiki names, fix version links, error handling, fix sorting by version,.." 
[operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6122 [15:45:44] New patchset: Dzahn; "fix other broken links pt.3 - use base,articlepath,server from API to build correct URLs, do not even try to guess from stats URL" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6123 [15:45:45] New patchset: Dzahn; "add global "license" column in stats tables" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6124 [15:46:13] PROBLEM - Apache HTTP on mw20 is CRITICAL: Connection refused [15:47:08] New review: Dzahn; "(no comment)" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6121 [15:47:10] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6121 [15:48:36] New review: Dzahn; "(no comment)" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6122 [15:48:39] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6122 [15:49:11] New review: Dzahn; "(no comment)" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6123 [15:49:13] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6123 [15:50:43] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time [15:51:37] PROBLEM - Host srv273 is DOWN: PING CRITICAL - Packet loss = 100% [15:53:16] RECOVERY - Host srv273 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [15:54:01] New patchset: Dzahn; "add global "license" column in stats tables. remove whitespace" [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6124 [15:55:18] New review: Dzahn; "(no comment)" [operations/debs/wikistats] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6124 [15:55:20] Change merged: Dzahn; [operations/debs/wikistats] (master) - https://gerrit.wikimedia.org/r/6124 [15:55:49] ah ha, Dzahn is committing and merging so obviously not busy at all [15:56:01] and wants to pick one of these two changes to approve [15:56:02] https://gerrit.wikimedia.org/r/#change,6038 [15:56:06] https://gerrit.wikimedia.org/r/#change,5887 [15:56:25] PROBLEM - Apache HTTP on srv273 is CRITICAL: Connection refused [15:59:45] New review: Dzahn; "duplicate of !change 6038" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/5887 [16:01:49] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [16:03:10] PROBLEM - Apache HTTP on mw56 is CRITICAL: Connection refused [16:08:28] New review: Dzahn; "looks just like Ariel's change 5887 but i just see myself in a "+1 position" right now. wasnt part o..." [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/6038 [16:09:27] I don't care what gets approved, just do it and make it work [16:09:48] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [16:10:21] ja Dzahn [16:10:22] either one is fine [16:10:27] whichever one you want [16:10:34] the only diff is mine has read-only = false [16:10:38] which may or may not be needed? [16:10:52] maybe just approve mine instead just in case it is needed [16:11:01] so we don't have to go through the approval process again if it is [16:11:09] PROBLEM - Apache HTTP on mw16 is CRITICAL: Connection refused [16:15:50] New review: Dzahn; "ok, taking this one because it has the ticket link and info." 
[operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6038 [16:15:53] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6038 [16:16:13] thank you! [16:18:30] PROBLEM - Apache HTTP on mw43 is CRITICAL: Connection refused [16:23:05] New patchset: Ottomata; "misc/statistcs.pp - changing paths to correct htpasswd protected directories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6127 [16:23:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6127 [16:25:33] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [16:26:04] New patchset: Ottomata; "misc/statistcs.pp - changing paths to correct htpasswd protected directories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6127 [16:26:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6127 [16:27:27] New patchset: Ottomata; "misc/statistcs.pp - changing paths to correct htpasswd protected directories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6127 [16:27:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6127 [16:28:24] RECOVERY - Apache HTTP on mw16 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.020 second response time [16:28:33] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [16:31:06] New review: Dzahn; "is there a " in front of RewriteCond on purpose or typo? see inline" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/6127 [16:31:42] PROBLEM - Host srv289 is DOWN: PING CRITICAL - Packet loss = 100% [16:33:39] RECOVERY - Host srv289 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [16:37:16] New review: Dzahn; "well you are not changing it here and the new pathes make sense." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6127 [16:37:19] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6127 [16:40:36] New review: Ottomata; "(no comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6127 [16:41:44] ottomata: ah, ok, thanks for info. ok for now? i'll be heading out soon [16:42:04] yup, looking good [16:42:05] thank you! [16:42:41] yw, ok. handing over to US timezone then:) [17:10:17] New patchset: Pyoungmeister; "adding per instance logging for udp2log" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6129 [17:10:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6129 [17:14:48] New patchset: Pyoungmeister; "adding per instance logging for udp2log" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6129 [17:15:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6129 [17:15:45] PROBLEM - jenkins_service_running on aluminium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:16:28] ottomata diederik: I'm changing some nagios stuffs for udp2log. 
if you see anything weird, take it with a grain of salt and let me know :) [17:16:38] ok [17:16:49] ok cool [17:17:25] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6129 [17:17:28] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6129 [17:23:41] PROBLEM - SSH on aluminium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:24:00] New patchset: Pyoungmeister; "forgot to add per-instance naming for nrpe checks in last commit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6132 [17:24:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6132 [17:24:41] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6132 [17:24:44] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6132 [17:25:02] RECOVERY - SSH on aluminium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:39:18] PROBLEM - SSH on aluminium is CRITICAL: Server answer: [17:46:49] PROBLEM - Apache HTTP on mw15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:35] drdee_: monitoring actually works again now :) [17:49:49] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [17:50:34] RECOVERY - jenkins_service_running on aluminium is OK: PROCS OK: 3 processes with args jenkins [17:50:34] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/teahouse.log, have not been written to in 6 hours [17:51:28] RECOVERY - SSH on aluminium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:56:52] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [17:59:32] notpeter, just checking, all is well? [18:02:30] New patchset: Ryan Lane; "Only set the member attribute map before precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6136 [18:02:48] New patchset: Ryan Lane; "Change the automount timeout to 2 hours" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5927 [18:03:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6136 [18:03:05] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5927 [18:03:05] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5927 [18:03:05] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5927 [18:03:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5927 [18:03:18] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6136 [18:03:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6136 [18:03:45] ottomata: it looks like one of the filters on locke is crashing [18:03:55] /a/squid/urjc.awk [18:04:13] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [18:04:23] or... is having some kinda issues [18:04:40] ottomata: the monitoring before was reading an old conf file [18:04:49] so the alerts that exist now are actually correct [18:05:05] this one?
[18:05:06] # Universidad Rey Juan Carlos [18:05:06] # Contact: Antonio José Reinoso Peinado [18:05:06] # Backup contact: Jesus M. Gonzalez-Barahona [18:05:06] pipe 100 awk -f /a/squid/urjc.awk | log2udp -h wikilogs.libresoft.es -p 10514 [18:05:22] yes [18:05:25] it isn't messing with the machine or other filters though [18:05:29] the nagios check is just for pattern matching [18:05:35] how does the monitoring work? [18:05:40] is it checking that file? [18:06:00] hmm, no wait, that is sending out to udp again [18:06:08] look at puppet:files/nagios/ check_udp2log_procs [18:06:23] it's just some pattern matching against the config file [18:06:49] the /etc/udp2log/squid file? [18:07:01] ah i see [18:07:04] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [18:07:21] yeah, conf files are in /etc/udp2log [18:07:26] and named after the instance name [18:08:27] ahhhhggg, why don't people comment code! shell scripts even [18:08:37] why do I have to decipher a regexp and gerp myself? [18:08:38] agghh [18:08:54] gerp gerp gerp [18:08:58] because we love you :-D [18:09:01] hah [18:09:16] it just looks at the name of the filter app [18:09:20] in a non-commented line [18:09:20] because it builds character? [18:09:30] and looks at the output of ps for it [18:10:27] oh to see if it is running? [18:10:38] yeah [18:15:10] PROBLEM - jenkins_service_running on aluminium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:15:19] PROBLEM - Exim SMTP on aluminium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:16:21] !log depooled & rebooting ssl1 [18:16:24] Logged the message, Master [18:16:40] RECOVERY - Exim SMTP on aluminium is OK: SMTP OK - 0.213 sec. response time [18:18:37] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [18:20:25] has anyone looked at mw45? [18:21:20] no. is ryan's script done and it's still down? [18:21:27] yes [18:21:28] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [18:21:34] or, his script looks done [18:21:37] PROBLEM - LVS HTTPS on wiktionary-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:37] PROBLEM - LVS HTTPS on wikisource-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:37] PROBLEM - LVS HTTPS on wikibooks-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:37] PROBLEM - LVS HTTPS on wikipedia-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:44] eehhhh [18:21:46] PROBLEM - LVS HTTPS on wikiversity-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:46] PROBLEM - LVS HTTPS on wikiquote-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:21:52] * apergos twitches [18:21:53] !log rebuilding db57 again, this time with more correct raid level!
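Going back to the check_udp2log_procs exchange above (18:05-18:10), here is a rough Python rendering of the logic as it is described there. The real check is a shell script under puppet:files/nagios/, and treating every absolute path on a non-comment "pipe" line as a filter name is an assumption of this sketch, not necessarily how that script matches.

    #!/usr/bin/env python
    # Rough sketch of the idea behind check_udp2log_procs as described above:
    # pull filter program paths out of non-comment "pipe" lines in the
    # instance's config under /etc/udp2log/ and make sure each one appears in
    # `ps` output. The path-based matching here is an assumption.
    import subprocess
    import sys

    def filters_in_config(path):
        filters = set()
        with open(path) as conf:
            for line in conf:
                line = line.strip()
                if not line or line.startswith('#') or not line.startswith('pipe'):
                    continue
                for token in line.split():
                    if token.startswith('/'):  # e.g. /a/squid/urjc.awk
                        filters.add(token)
        return filters

    def main(instance='squid'):
        # conf files live in /etc/udp2log and are named after the instance
        wanted = filters_in_config('/etc/udp2log/%s' % instance)
        ps_args = subprocess.check_output(['ps', '-eo', 'args']).decode()
        missing = sorted(f for f in wanted if f not in ps_args)
        if missing:
            print('CRITICAL: filters absent: ' + ', '.join(missing))
            return 2
        print('OK: all filters present')
        return 0

    if __name__ == '__main__':
        sys.exit(main(*sys.argv[1:]))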
[18:21:53] mw45 is actually dead I think [18:21:55] Logged the message, notpeter [18:22:00] !log rebooting mw45 [18:22:02] Logged the message, Master [18:22:04] we can just switch it in mc.php [18:22:13] PROBLEM - LVS HTTPS on upload.esams.wikimedia.org is CRITICAL: Connection refused [18:22:28] let's see if it comes back [18:22:31] PROBLEM - LVS HTTPS on mediawiki-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:22:31] PROBLEM - LVS HTTPS on wikimedia-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:22:38] argh [18:22:40] PROBLEM - LVS HTTPS on wikinews-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:22:40] PROBLEM - SSH on aluminium is CRITICAL: Server answer: [18:22:49] PROBLEM - LVS HTTPS on bits.esams.wikimedia.org is CRITICAL: Connection refused [18:22:49] PROBLEM - LVS HTTPS on foundation-lb.esams.wikimedia.org is CRITICAL: Connection refused [18:22:49] PROBLEM - HTTPS on ssl3001 is CRITICAL: Connection refused [18:23:04] I didn't do anything in esams yet, but Ryan is speculating that it might be ensure => latest in puppet [18:23:43] RECOVERY - LVS HTTPS on upload.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 638 bytes in 0.441 seconds [18:23:43] that sounds plausible [18:23:49] argh [18:23:52] RECOVERY - LVS HTTPS on mediawiki-lb.esams.wikimedia.org is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.446 second response time [18:23:52] RECOVERY - LVS HTTPS on wikimedia-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 80114 bytes in 0.876 seconds [18:23:55] nginx wasn't running on ssl3001 [18:24:01] RECOVERY - LVS HTTPS on wikinews-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 70920 bytes in 0.884 seconds [18:24:06] I *hate* ensure => latest [18:24:10] RECOVERY - LVS HTTPS on bits.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 3972 bytes in 0.440 seconds [18:24:10] RECOVERY - LVS HTTPS on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39069 bytes in 0.705 seconds [18:24:12] heh [18:24:14] I thought ryan got that back in shape earlier today [18:24:14] me too [18:24:17] ssl3001 I mean [18:24:17] well, feel free to change it [18:24:19] RECOVERY - HTTPS on ssl3001 is OK: OK - Certificate will expire on 07/19/2016 16:14. [18:24:23] especially with packages that restart things on install [18:24:28] RECOVERY - LVS HTTPS on wikibooks-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 43439 bytes in 0.778 seconds [18:24:28] RECOVERY - LVS HTTPS on wikisource-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 43375 bytes in 0.781 seconds [18:24:28] RECOVERY - LVS HTTPS on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 59962 bytes in 0.816 seconds [18:24:28] RECOVERY - LVS HTTPS on wikipedia-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 64065 bytes in 0.887 seconds [18:24:34] apergos: it's likely due to the package update [18:24:37] RECOVERY - LVS HTTPS on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 50128 bytes in 0.775 seconds [18:24:37] RECOVERY - LVS HTTPS on wikiquote-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 52107 bytes in 0.778 seconds [18:24:39] ah that would be it [18:24:41] grrrr [18:24:43] yeah, things that will auto-dos our site should probably be done away with...
[18:24:43] binasher: which is in Debian policy, so most of the packages… [18:24:46] so, https wasn't likely down for everyone [18:25:10] the health checks failed because https uses the sh scheduler [18:25:13] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [18:25:18] ssl3001 being out is enough to bring the site down for some people [18:25:22] RECOVERY - Host mw45 is UP: PING OK - Packet loss = 0%, RTA = 1.34 ms [18:25:23] yes [18:25:24] it is [18:25:34] ssl3003 wasn't restarted yet [18:25:35] which is suboptimal... [18:25:44] until lvs depools and their connections to ssl3001 die [18:25:50] still has "old" nginx that runs since yesterday [18:25:51] then they'll be connected to another server [18:25:58] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [18:26:26] mw45 came back up but doesn't have the current kernel [18:26:45] we only rebooted hosts [18:26:47] we didn't patch them [18:26:52] PROBLEM - Puppet freshness on srv206 is CRITICAL: Puppet has not run in the last 10 hours [18:26:52] PROBLEM - Host db57 is DOWN: PING CRITICAL - Packet loss = 100% [18:26:54] I didn't want to do that on a friday [18:27:50] paravoid: is it generally something you can disable with debconf? [18:27:53] !log power cycling aluminium which faceplanted [18:27:56] Logged the message, Master [18:28:40] PROBLEM - Apache HTTP on mw45 is CRITICAL: Connection refused [18:28:56] jeremyb: in Debian you can do it with having a /usr/local/bin/policy-rc.d that has #!/bin/sh\nexit 101 [18:29:05] jeremyb: that invoke-rc.d calls first to check [18:29:15] !log rebooting mw45 for kernel upgrade [18:29:17] no idea what's the interaction with upstart though [18:29:17] Logged the message, Master [18:29:25] errr... upstart [18:29:52] RECOVERY - SSH on aluminium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:30:10] RECOVERY - jenkins_service_running on aluminium is OK: PROCS OK: 3 processes with args jenkins [18:31:33] so, we really have to do an upgrade/reboot cycle on all of our machines [18:34:28] yeah [18:34:37] !log pooled back ssl1; depooling ssl3 and rebooting [18:34:40] Logged the message, Master [18:35:06] !log aluminium gets kernel update, yayyyyyyy! 
[18:35:09] Logged the message, Master [18:35:20] hahaha [18:35:52] PROBLEM - Host ssl3 is DOWN: PING CRITICAL - Packet loss = 100% [18:38:01] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [18:38:28] RECOVERY - Host ssl3 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms [18:38:37] PROBLEM - NTP on ssl3 is CRITICAL: NTP CRITICAL: Offset unknown [18:38:46] RECOVERY - Host db57 is UP: PING OK - Packet loss = 16%, RTA = 0.20 ms [18:40:16] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:41:55] PROBLEM - MySQL Slave Running on db57 is CRITICAL: Connection refused by host [18:41:55] PROBLEM - mysqld processes on db57 is CRITICAL: Connection refused by host [18:41:55] PROBLEM - MySQL Idle Transactions on db57 is CRITICAL: Connection refused by host [18:42:22] PROBLEM - MySQL Recent Restart on db57 is CRITICAL: Connection refused by host [18:42:22] PROBLEM - MySQL disk space on db57 is CRITICAL: Connection refused by host [18:42:22] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [18:42:25] !log grosley gets new kernel + reboot [18:42:28] Logged the message, Master [18:42:31] PROBLEM - MySQL Slave Delay on db57 is CRITICAL: Connection refused by host [18:42:40] PROBLEM - MySQL Replication Heartbeat on db57 is CRITICAL: Connection refused by host [18:42:58] RECOVERY - NTP on ssl3 is OK: NTP OK: Offset 0.04797685146 secs [18:43:16] PROBLEM - SSH on db57 is CRITICAL: Connection refused [18:43:16] PROBLEM - Full LVS Snapshot on db57 is CRITICAL: Connection refused by host [18:46:07] PROBLEM - Host db57 is DOWN: PING CRITICAL - Packet loss = 100% [18:46:16] PROBLEM - SSH on grosley is CRITICAL: Connection refused [18:46:52] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:47:19] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:47:28] RECOVERY - SSH on db57 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:47:37] RECOVERY - Host db57 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms [18:48:04] PROBLEM - Exim SMTP on grosley is CRITICAL: Connection refused [18:49:07] RECOVERY - SSH on grosley is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:49:25] RECOVERY - Exim SMTP on grosley is OK: SMTP OK - 0.011 sec. response time [18:50:03] !log rebooting ssl1001 [18:50:06] Logged the message, Master [18:51:49] RECOVERY - Full LVS Snapshot on db57 is OK: OK no full LVM snapshot volumes [18:51:58] RECOVERY - MySQL Slave Running on db57 is OK: OK replication [18:51:58] RECOVERY - MySQL Idle Transactions on db57 is OK: OK longest blocking idle transaction sleeps for seconds [18:52:16] RECOVERY - MySQL disk space on db57 is OK: DISK OK [18:52:16] RECOVERY - MySQL Recent Restart on db57 is OK: OK seconds since restart [18:52:16] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [18:52:34] RECOVERY - MySQL Slave Delay on db57 is OK: OK replication delay seconds [18:52:43] RECOVERY - MySQL Replication Heartbeat on db57 is OK: OK replication delay seconds [18:56:38] New patchset: Ryan Lane; "Try another way to call the function" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6140 [18:56:55] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6140 [18:57:01] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6140 [18:57:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6140 [18:59:18] !log starting innobackupex from db1034 to db57 for new s2 slave [18:59:21] Logged the message, notpeter [19:00:07] !log rebooting ssl1002 [19:00:10] Logged the message, Master [19:00:49] PROBLEM - MySQL Replication Heartbeat on db1002 is CRITICAL: CRIT replication delay 205 seconds [19:00:49] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 205 seconds [19:01:34] PROBLEM - MySQL Replication Heartbeat on db1034 is CRITICAL: CRIT replication delay 251 seconds [19:01:52] PROBLEM - MySQL Slave Delay on db1034 is CRITICAL: CRIT replication delay 269 seconds [19:02:28] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:02:55] RECOVERY - MySQL Replication Heartbeat on db1034 is OK: OK replication delay 0 seconds [19:03:13] RECOVERY - MySQL Slave Delay on db1034 is OK: OK replication delay 0 seconds [19:03:40] RECOVERY - MySQL Replication Heartbeat on db1002 is OK: OK replication delay 0 seconds [19:03:40] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds [19:06:36] !log rebooting ssl1003 [19:06:39] Logged the message, Master [19:08:10] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [19:14:07] !log rebooting ssl1004 [19:14:10] Logged the message, Master [19:16:34] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:17:32] New patchset: Ryan Lane; "nslcd will fail to start with improper permissions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6146 [19:17:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6146 [19:17:51] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6146 [19:17:54] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6146 [19:19:25] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [19:21:22] PROBLEM - Router interfaces on mr1-pmtpa is CRITICAL: CRITICAL: host 10.1.2.3, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0BRfe-0/0/1: down - csw5-pmtpa:8/23:BR [19:22:17] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.197, [19:22:41] paravoid: re: the db heartbeat alerts, all of the above dbs are in s2 and there were some expensive watchlist / link queries. that alert is definitely something to look into if it doesn't clear up in a few minutes. 
when it's all of the slaves for a shard in eqiad, it can be a sign that there's something wrong with the secondary master [19:23:18] which is db1034 in this case, the one that was lagged the most [19:24:13] PROBLEM - BGP status on cr1-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.196, [19:25:07] RECOVERY - BGP status on cr1-eqiad is OK: OK: host 208.80.154.196, sessions up: 10, down: 0, shutdown: 0 [19:25:07] RECOVERY - BGP status on cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 25, down: 0, shutdown: 1 [19:25:25] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:25:43] PROBLEM - Router interfaces on mr1-pmtpa is CRITICAL: CRITICAL: host 10.1.2.3, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0BRfe-0/0/1: down - csw5-pmtpa:8/23:BR [19:26:46] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [19:27:22] binasher: when you have some spare time, I'd really like to hear a few things about our db setups [19:27:30] binasher: esp. since apparently I'll be involved into db replication for labs [19:27:48] FYI, i'm trying to figure out the udp2log process alert right now [19:28:01] that filter is flapping, not really sure why, having a hard time finding output as to why [19:28:39] if a udp2log process is flapping, what is restarting it? [19:28:43] udp2log itself? [19:29:01] PROBLEM - Varnish traffic logger on cp1027 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:29:01] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:29:10] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:29:28] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:29:37] PROBLEM - Varnish traffic logger on cp1031 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:30:04] PROBLEM - Varnish traffic logger on cp1035 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:30:05] ottomata: that usually means to process is segfaulting [19:30:22] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 3 processes with command name varnishncsa [19:31:16] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:31:19] hm, ok [19:31:30] udp2log is the parent proc? [19:31:50] or hm, epoll [19:31:52] (reading source) [19:32:29] RECOVERY - Varnish traffic logger on cp1031 is OK: PROCS OK: 3 processes with command name varnishncsa [19:32:29] PROBLEM - Host ssl3001 is DOWN: PING CRITICAL - Packet loss = 100% [19:33:12] ah no, i see it [19:33:12] paravoid: can't the labs DBs just be an extra slave off the eqiad intermediaries? [19:33:13] RECOVERY - Varnish traffic logger on cp1027 is OK: PROCS OK: 3 processes with command name varnishncsa [19:33:13] RECOVERY - Host ssl3001 is UP: PING OK - Packet loss = 0%, RTA = 109.32 ms [19:33:14] fork() [19:33:22] paravoid: seen noc.wm.o/dbtree/ i assume? [19:33:39] jeremyb: I have and no, the current plan is to use something like tungsten [19:33:49] so that we can filter out sensitive things [19:33:55] huh, i'll have to google [19:34:05] oh, instead of views? 
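Aside on the heartbeat alerts binasher explains above: when every s2 slave in eqiad lags at once, the suspect is the box they all replicate from (db1034 here) rather than the slaves themselves. A rough way to confirm that from the shell, assuming direct access to the hosts (host list taken from the alerts above, credentials omitted):

    # Sketch: compare lag across the s2 eqiad slaves; if they all show roughly
    # the same Seconds_Behind_Master, the delay is upstream of them.
    for h in db1002 db1018 db1034; do
        printf '%s: ' "$h"
        mysql -h "$h" -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}'
    done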
[19:34:06] and lower the barrier for access to volunteers [19:34:20] i wonder how that compares to trainwreck [19:34:52] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa [19:38:10] PROBLEM - Host payments3 is DOWN: PING CRITICAL - Packet loss = 100% [19:39:31] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa [19:39:37] ottomata: udp2log is the parent, it should automatically respawn filters after they crash [19:39:58] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [19:40:52] RECOVERY - Host payments3 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [19:41:28] RECOVERY - Varnish traffic logger on cp1035 is OK: PROCS OK: 3 processes with command name varnishncsa [19:45:13] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:48:47] !log upgraded & rebooted ssl3001, ssl3002, ssl3003 [19:48:49] Logged the message, Master [19:49:16] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors - [19:49:25] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors - [19:49:34] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [19:57:15] !log payments cluster gets kernel updates and reboots [19:57:17] Logged the message, Master [19:58:07] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [19:59:11] !log restarting nagios to get rid of some old checks [19:59:13] Logged the message, notpeter [19:59:29] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [20:01:26] growl [20:01:34] i can't reproduce on my local [20:01:41] really not sure what is wrong with that filter [20:01:44] PROBLEM - BGP status on cr2-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.197, [20:01:50] its just an awk pipe [20:01:54] pretty simple [20:02:00] ottomata: yeah. it's werid that it would be fialing [20:02:01] gr [20:02:08] go fenari go "load average: 30.70, 14.26, 6.74" [20:02:11] PROBLEM - BGP status on cr1-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.196, [20:02:35] Jeff_Green: you can do better than 30... [20:02:38] PROBLEM - BGP status on csw2-esams is CRITICAL: CRITICAL: No response from remote host 91.198.174.244, [20:02:41] i can see udp2log contantly respawning the proc [20:02:47] notpeter: i never knew vi could party so hard [20:02:57] ottomata: is that a new filter or one that's been around a while? 
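For readers following the flapping-filter thread: udp2log is the parent process and respawns each configured filter child whenever it exits, which is why /a/squid/urjc.awk keeps vanishing from the process table and coming back. A sketch of the sort of filter line involved and a crude way to watch the churn; the config format shown is an assumption, only the filter path comes from the alerts themselves:

    # Hypothetical udp2log filter entry (format assumed: pipe <sampling factor> <command>):
    #   pipe 10 /usr/bin/awk -f /a/squid/urjc.awk

    # Watch the filter's PID change to confirm udp2log keeps respawning a dying child:
    while true; do date; pgrep -fl 'urjc.awk'; sleep 5; done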
[20:02:58] no information as to why it is dying though [20:03:13] robla: I believe been around for a while [20:04:17] PROBLEM - HTTP on fenari is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:04:36] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: CRITICAL: No response from remote host 208.80.152.196, [20:04:44] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:04:44] PROBLEM - Swift HTTP on ms-fe2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:04:44] PROBLEM - LVS HTTP on ms-fe.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:05:06] robla: i think its been around a while, but notpeter just fixed the nagios monitoring [20:05:11] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: No response from remote host 208.80.152.197, [20:05:21] apparently nagios was reading the wrong config file [20:05:24] and wasn't checking for this proc [20:05:27] so its probably been broken for a while [20:05:30] don't know how long [20:05:47] RECOVERY - BGP status on csw2-esams is OK: OK: host 91.198.174.244, sessions up: 4, down: 0, shutdown: 0 [20:05:52] ottomata: two weeks or so [20:06:19] I fowled up monitoring slightly when I switched the the instance-based udp2log setup [20:06:26] ah, right [20:06:26] cool [20:06:41] PROBLEM - Router interfaces on mr1-pmtpa is CRITICAL: CRITICAL: host 10.1.2.3, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0BRfe-0/0/1: down - csw5-pmtpa:8/23:BR [20:07:12] just got pages, what's breaking ? [20:07:35] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: No response from remote host 208.80.152.196 for 1.3.6.1.2.1.2.2.1.7 with snmp version 2 [20:08:20] PROBLEM - BGP status on csw1-esams is CRITICAL: CRITICAL: No response from remote host 91.198.174.247, [20:08:37] LeslieCarr: it would seem we're pushing so much data through your routers that we're breaking them? xD [20:09:19] 3rd time! [20:09:34] LeslieCarr: IIRC m ark mentioned it to you before, uplink of the rack with the NFS box is too small [20:09:44] notpeter, robla, since this filter isn't working…should I disable it? [20:09:46] yes it is [20:09:50] nfs1 has a narrow pipe, I think nfs2 has a wider pipe [20:09:53] so it saturates stuff [20:09:59] PROBLEM - Router interfaces on mr1-pmtpa is CRITICAL: CRITICAL: No response from remote host 10.1.2.3 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 [20:10:01] So maybe we should switch NFS masters or whatever the right terminology is? [20:10:07] I think it's the upload for teh rack, not the box [20:10:09] speaking of that, woosters get the procurement ticket going? :) [20:10:12] it's upload for the rack [20:10:12] So that nfs-home (10.0.5.8) will point to nfs2 rather than 1 [20:10:17] PROBLEM - BGP status on csw2-esams is CRITICAL: CRITICAL: No response from remote host 91.198.174.244, [20:10:24] Reedy: nfs1 and nfs2 are in different racks AFAIK [20:10:26] PROBLEM - SSH on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:10:27] PROBLEM - MySQL Idle Transactions on db31 is CRITICAL: CRIT longest blocking idle transaction sleeps for 605 seconds [20:10:27] RECOVERY - BGP status on csw1-esams is OK: OK: host 91.198.174.247, sessions up: 5, down: 0, shutdown: 0 [20:10:43] That doesn't fix the issue if the uplink is the same [20:10:44] PROBLEM - Router interfaces on cr2-pmtpa is CRITICAL: CRITICAL: No response from remote host 208.80.152.197 for 1.3.6.1.2.1.2.2.1.7 with snmp version 2 [20:10:53] Do we all need to smile sweetly at woosters? 
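On the monitoring gap notpeter mentions above (nagios reading the wrong config, so the filter went unchecked for a couple of weeks): the varnishncsa and swift alerts in this log are ordinary process-count checks, and the same stock plugin can cover a udp2log filter. A sketch only; the threshold and regex are illustrative, not the production check:

    # Alert unless exactly one process has the filter path in its argument list.
    /usr/lib/nagios/plugins/check_procs -c 1:1 \
        --ereg-argument-array='/a/squid/urjc\.awk'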
[20:11:06] just say pls ..;-) [20:11:14] ottomata: no idea [20:11:16] You might be interested to hear that pretty much the entire ops dept is out to lunch [20:11:22] (reedy) [20:11:22] excpt me [20:11:31] but I am really really afk [20:11:32] Except for woosters plus the remote people [20:11:36] i'm am, but only metaphorically speaking [20:11:37] ottomata: I guess so [20:11:40] unless I have to be here (11 pm etc) [20:11:47] RECOVERY - SSH on mw1106 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [20:11:47] PROBLEM - SSH on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:11:56] PROBLEM - MySQL Idle Transactions on db35 is CRITICAL: CRIT longest blocking idle transaction sleeps for 624 seconds [20:12:04] robla, notpeter, i will disable and email the contact for the filter [20:12:06] i'm here now [20:12:08] Not as if any of them can do much about it [20:12:10] apergos: Nah I think we just need LeslieCarr to charm the network into obedience [20:12:14] woosters: pleeeeeaaaasssseeee [20:12:14] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: No response from remote host 10.65.0.1 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 [20:12:15] haha [20:12:18] i can't do magic [20:12:20] ottomata: sounds liek a reasonable plan [20:12:24] woosters: I'm bringing more sugar next week ;) [20:12:34] You're a network person, anything you do is magic to us :) [20:12:58] All you have to do to give LeslieCarr a heart attack is go around unplugging cables :D [20:13:05] eeep!!! [20:13:09] RECOVERY - SSH on magnesium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [20:13:11] i have nightmares of that [20:13:42] Do you have nightmares of your peers having a row too? [20:14:19] ok, but srsly, what's killing nfs? [20:14:29] PROBLEM - Router interfaces on mr1-pmtpa is CRITICAL: CRITICAL: host 10.1.2.3, interfaces up: 32, down: 1, dormant: 0, excluded: 0, unused: 0BRfe-0/0/1: down - csw5-pmtpa:8/23:BR [20:14:29] RECOVERY - BGP status on csw2-esams is OK: OK: host 91.198.174.244, sessions up: 4, down: 0, shutdown: 0 [20:14:47] RECOVERY - HTTP on fenari is OK: HTTP OK HTTP/1.1 200 OK - 4416 bytes in 0.109 seconds [20:14:49] notpeter: Well all app servers are pulling a new MW version from it, and the network is acting up at the same time [20:14:56] RECOVERY - MySQL Idle Transactions on db35 is OK: OK longest blocking idle transaction sleeps for 0 seconds [20:14:56] RECOVERY - MySQL Idle Transactions on db31 is OK: OK longest blocking idle transaction sleeps for 0 seconds [20:14:56] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [20:14:56] RECOVERY - BGP status on cr1-sdtpa is OK: OK: host 208.80.152.196, sessions up: 8, down: 0, shutdown: 0 [20:14:58] ah [20:15:00] it's not acting up [20:15:01] Yay fenari is back [20:15:05] RECOVERY - Router interfaces on cr2-pmtpa is OK: OK: host 208.80.152.197, interfaces up: 89, down: 0, dormant: 0, excluded: 0, unused: 0 [20:15:07] it just can't physically put any more throughput [20:15:15] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.010 seconds [20:15:15] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.213 seconds [20:15:15] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 10.65.0.1, interfaces up: 32, down: 0, dormant: 0, excluded: 0, unused: 0 [20:15:15] RECOVERY - LVS HTTP on ms-fe.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 3.448 seconds [20:15:15] RECOVERY - BGP status on 
cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 25, down: 0, shutdown: 1 [20:15:20] all the app servers are simultaneously pulling??? [20:15:20] the scap was the last straw I guess [20:15:23] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 75, down: 0, dormant: 0, excluded: 0, unused: 0 [20:15:32] RECOVERY - BGP status on cr1-eqiad is OK: OK: host 208.80.154.196, sessions up: 10, down: 0, shutdown: 0 [20:15:36] I don't think it's quite that bad [20:15:38] poor thing [20:15:59] PROBLEM - Host srv206 is DOWN: PING CRITICAL - Packet loss = 100% [20:16:16] Lol [20:16:17] New patchset: Ottomata; "Disabling Universidad Rey Juan Carlos urjc filter until further notice." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6155 [20:16:22] nfs2 has a better uplink [20:16:25] notpeter, if you approve that will disable it [20:16:31] notpeter that rack only has a 1g uplink [20:16:32] Yeah, no one has put a fan option on sync-common-file [20:16:35] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6155 [20:16:45] LeslieCarr: gotcha [20:16:49] ottomata: ok [20:17:06] send an email please to the enduser [20:17:13] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6155 [20:17:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6155 [20:17:35] enduser being engineering apergos ? [20:17:39] no [20:17:49] our contact there [20:17:58] apergos: oh crossed lines [20:18:07] New patchset: Reedy; "Make sync-common-file use -F30" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6156 [20:18:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6156 [20:18:30] hurray for that breaking a db dump I was almost done with ;? [20:18:32] ;? [20:18:37] :/ [20:18:50] do I have anything relyingon nfs? NO (at least, not /home) [20:18:54] thank god [20:19:05] Reedy: Were you running sync-common-file / sync-dir on the entire tree? Is that what took stuff down? [20:19:13] No [20:19:18] on php-1.20wmf2/cache [20:19:44] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 20.5429046923 (gt 8.0) [20:19:50] We've been saying that maybe a sync of an entirely new tree needs to be like -F1 [20:20:00] It would literally take 30x longer of course [20:20:05] Sure, but that didn't cause any problems earlier [20:20:06] But it wouldn't saturate the uplink [20:20:08] Oh [20:20:15] Is it the cache dir sync that caused the problems? [20:20:19] It was *just* pushing the l10n cache out [20:20:26] many non small files [20:20:26] right [20:20:32] torrent :-) [20:20:35] is that *just* or "just" :) [20:20:36] Ah, OK [20:20:46] So now that that has a forklimit, we should be fine in the future [20:20:51] reedy@fenari:/home/wikipedia/common/php-1.20wmf2/cache/l10n$ du --si [20:20:52] 600M . [20:21:02] 600M * numberofapaches [20:21:18] which makes lots [20:21:49] robla: [20:21:51] robla: https://gerrit.wikimedia.org/r/6156 [20:24:20] How many boxes are in the mediawiki-installation group? 
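Reedy's -F30 change above is about capping the fan-out so the whole app server fleet doesn't pull php-1.20wmf2 from NFS at once. A sketch of the idea, assuming the wrapper fans out with dsh and that -F is its fork limit; the command path is an assumption, and the group name is the one asked about just above:

    # Sketch, not the real sync-common-file: only ~30 hosts rsync from NFS at a time.
    dsh -g mediawiki-installation -F 30 -- /usr/bin/sync-common

    # Back-of-envelope for why the unthrottled sync hurt: ~600 MB of l10n cache
    # times ~200 app servers is ~120 GB through one 1 Gbit/s rack uplink (~125 MB/s),
    # i.e. the link stays saturated for roughly a quarter of an hour:
    echo $(( 600 * 200 / 125 / 60 )) minutes   # ≈ 16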
[20:25:01] * robla shrugs [20:25:07] 207 [20:25:21] 200 [20:25:22] wc -l is taking a lot longer on fenari right about now [20:25:25] If you discount comments [20:25:36] Lol [20:25:37] fine, wise guy :-P [20:25:43] roan, always a step ahead [20:25:43] So over 120GB of files from the NFS box [20:25:44] Either way the # of cycles you're looking at is like 7 [20:25:49] * Reedy strokes his chin [20:26:14] with the lag I have, yer lucky I even got the path to the file to take [20:26:56] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2575* [20:28:26] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2388 [20:28:26] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.29603193798 [20:28:53] !log restarting, once again, innobackupex from db1034 to db57 for new s2 slave after fenari crash killed my screen [20:28:56] Logged the message, notpeter [20:31:16] RoanKattouw Reedy fyi, scap failed on searchidx1001, in case that affects your math [20:31:26] PROBLEM - udp2log processes for locke on locke is CRITICAL: CRITICAL: filters absent: /a/squid/urjc.awk, [20:31:26] (although the actual failure itself doesn't matter too much) [20:31:28] mutante: around [20:31:28] I was after rough figures [20:31:30] ? [20:31:56] hexmode: he's off for walspurgistnacht and may day [20:32:03] ah [20:32:38] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2625* [20:33:15] PROBLEM - Host storage3 is DOWN: PING CRITICAL - Packet loss = 100% [20:34:17] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2275 [20:34:35] RECOVERY - Host storage3 is UP: PING OK - Packet loss = 16%, RTA = 0.86 ms [20:35:26] backup [20:36:01] oop [20:36:06] not meant to type in irc room [20:37:08] RECOVERY - udp2log processes for locke on locke is OK: OK: all filters present [20:38:02] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: (Return code of 255 is out of bounds) [20:38:20] PROBLEM - mysqld processes on storage3 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:40:51] How do we make bits/varnish flush its cached for some 404'd files? [20:41:39] Reedy: purge-varnish 'partial path' [20:41:46] That'll purge /.*$1.*/ [20:41:56] tim said the documented command didn't work [20:42:03] but idk which command that was [20:42:07] It does now [20:42:09] reedy@fenari:/home/wikipedia/common$ purge-varnish 'https://bits.wikimedia.org/static-1.20wmf2' [20:42:09] root@sq67's password: [20:42:14] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:42:14] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 5s [20:42:14] You may or may not need to .... yeah [20:42:16] be root [20:42:23] Mind running it for me? 
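purge-varnish is an in-house wrapper, so only its usage is visible here; as a rough sketch of what invalidating the stale static-1.20wmf2 objects looks like on a single Varnish 3 frontend (older 2.x installs call the command purge rather than ban, and the admin address and secret file below are illustrative):

    # Sketch: ban anything in this instance's cache whose URL mentions the new
    # static path; a wrapper like purge-varnish would loop this over the pool.
    varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret \
        'ban req.url ~ static-1.20wmf2'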
[20:42:24] Let me run that [20:42:33] Ok done [20:42:34] cached stuff before symlinks were pushed [20:42:41] RECOVERY - mysqld processes on storage3 is OK: PROCS OK: 1 process with command name mysqld [20:43:03] Doesn't look to have done it [20:43:53] !log removing ssl4 from pool, stopping puppet on ssl4, adding 3rd udp2log host for testing, restarting nginx [20:43:56] Logged the message, notpeter [20:44:41] notpeter: Ryan was just telling me & CT that we ignore nginx's log output anyway [20:44:55] because of missing sequence numbers (I think) [20:45:00] hurray [20:45:10] (didn't know about that…) [20:45:24] so it's good to have the third line there for consistency, but it won't do any good re: statistics [20:45:27] well.. then... that's... frustrating [20:45:43] unless we fix the seq number problem... [20:46:30] well, then I'm not going to push this out atm [20:46:49] and.. hey look! oxygen's ready to go! surprise! [20:47:11] like, a week ago! [20:47:59] weeeeeeee [20:49:24] oh yeah? [20:49:24] yay! [20:50:21] !log db1025 and storage3 get new kernels and reboot [20:50:24] Logged the message, Master [20:50:27] I still need to figure out why the packet loss monitoring isn't working [20:50:30] but yeah [20:50:32] go to town [20:50:47] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:51:14] PROBLEM - Host db1025 is DOWN: PING CRITICAL - Packet loss = 100% [20:51:34] cool, looks like you are busy with other things [20:51:48] but i am having trouble with an rsync module we just set up on dataset2 [20:51:55] it seems to be up, but I can't write [20:52:02] got a min to check it out? [20:53:56] RECOVERY - Host db1025 is UP: PING OK - Packet loss = 0%, RTA = 26.47 ms [20:59:20] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [21:02:56] PROBLEM - Host db1008 is DOWN: PING CRITICAL - Packet loss = 100% [21:04:35] RECOVERY - Host db1008 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [21:07:02] !log db1008 gets kernel update and reboot [21:07:05] Logged the message, Master [21:10:29] ottomata: I can take a look [21:10:57] ottomata: also, I will have packetloss monitoring up and running on oxygen after I ask ben a couple of questions [21:12:43] ok cool [21:13:44] ottomata: where is this rsync code? [21:14:33] New patchset: Pyoungmeister; "adding packet loss filter to oxygen udp2log filter file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6202 [21:14:46] the module is [21:14:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6202 [21:15:04] [pagecounts-ez] in rsync.conf.downloadprimary [21:15:04] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6202 [21:15:07] and from stat1 [21:15:07] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6202 [21:15:08] i am trying [21:15:17] rsync -rvv /tmp/a/test_file dataset2::pagecounts-ez/test_file [21:15:48] and getting [21:15:48] rsync: mkstemp "/.test_file.j1FGPw" (in pagecounts-ez) failed: Permission denied (13) [21:17:11] who are you running it as? 
[21:19:11] i've tried as me, as root, and as backup [21:19:11] same results [21:20:08] maybe try dataset2:/data/xmldatadumps/public/other/pagecounts-ez [21:20:26] er [21:21:04] maybe make a test dir in there [21:21:04] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:21:38] Can someone run a purge command for bits varnish for me as Roan isn't now online... purge-varnish 'static-1.20wmf2' [21:22:12] well, hm it is an rsync module, which is usually accessed using :: double colon syntax, trying... [21:22:26] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:23:08] yeah, notpeter that asks for my pw [21:23:56] it is trying to go over ssh [21:23:56] no key relation. gotcha [21:25:04] ottomata: the ownership on that dir on dataset2 are wackadooo [21:25:20] hmm [21:25:25] the rsync module is set to use backup:backup [21:25:49] drwxr-xr-x 5 523 root 73 2012-04-24 20:28 . [21:25:50] who owns the dir? what are perms [21:26:02] agh [21:26:02] someone told me to use backup [21:26:02] what are other files in there? [21:26:02] ezachte and/or some number uids? [21:26:20] there is no 523 user any more [21:26:23] aye that is erik [21:26:25] hmmm [21:26:57] can we chgrp to backup [21:26:57] and make it 775? [21:27:18] or mabye [21:27:18] even better would be chgrp to wikidev [21:27:18] and change the rsync module to use that gid [21:28:08] yeah, we can do backup:wikidev [21:28:12] most of it is 523:wikidev currently [21:28:31] shall I chown -R backup:wikidev * ? [21:31:42] yeah, do it [21:32:17] +chmod g+w [21:32:17] -R [21:32:27] done [21:37:35] hmm [21:37:35] same [21:37:35] rsync: mkstemp "/.test_file.3NwtCv" (in pagecounts-ez) failed: Permission denied (13) [21:40:01] weird [21:42:30] if you change the module to uid = root and restart rsyncd [21:42:46] does it work?
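A likely culprit for the mkstemp error persisting: the test file is being created at the module root, and chown -R backup:wikidev * changes the directory's contents but not the directory itself, which the earlier listing showed as 523:root and mode 755. The daemon writes as the module's configured uid/gid, not as whoever runs the client, so the module identity and the ownership of that top-level directory have to agree. A sketch of the fix plus the sort of [pagecounts-ez] stanza being discussed; only the module name and path appear in the log, the rest of the stanza is assumed:

    # Chown the directory itself, not just its contents, so the module's
    # uid/gid can create files at the module root:
    chown -R backup:wikidev /data/xmldatadumps/public/other/pagecounts-ez
    chmod g+w /data/xmldatadumps/public/other/pagecounts-ez

    # Hypothetical rsyncd.conf stanza on dataset2 (daemon modules are read-only
    # unless "read only = no" is set):
    #   [pagecounts-ez]
    #       path = /data/xmldatadumps/public/other/pagecounts-ez
    #       uid = backup
    #       gid = wikidev
    #       read only = no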