[00:52:40] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [00:53:07] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [01:03:41] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [01:03:41] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [01:03:41] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [01:03:41] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [01:03:41] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [01:03:42] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [01:40:35] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 239 seconds [01:41:29] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 293 seconds [01:41:56] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [01:42:23] PROBLEM - Host ekrem is DOWN: CRITICAL - Host Unreachable (208.80.152.178) [01:43:00] Does anyone on the channel atm have shell access? [01:45:05] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 27 seconds [01:45:05] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (39726) [01:45:32] PROBLEM - check_job_queue on neon is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , zhwiki (40484) [01:45:59] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 19 seconds [02:35:50] Jasper_Deng: did you need something? [02:36:12] (i'm assuming from the way you asked that it's not warranted but people can be paged. etc.) [02:58:53] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [03:03:41] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [03:04:34] jeremyb: yep [03:04:40] two things actually, now [03:04:43] Jasper_Deng: what's up? [03:04:58] jeremyb: 1: are you able to do anything about http://status.wikimedia.org/8777/156492/IRC-RecentChanges? [03:05:37] ergh [03:05:41] 2: Are you able to do anything about the particular account at http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#2001:0:0:0:0:0:0:1? [03:09:02] Jasper_Deng: so, have we scanned other wikis for similar names yet? [03:09:08] nope [03:09:12] * Jasper_Deng has known of no such scan [03:10:32] would you be interested in doing something about that? maybe with hoo? or i guess it would be a fairly simple matter to query the toolserver replicas [03:10:38] i don't have a TS account so i can't do it myself [03:11:09] * Jasper_Deng doesn't know how to do SQL queries on Toolserver [03:11:35] well you need an account there first [03:11:50] you could ask in #wikimedia-toolserver and see if someone's willing to do it for you [03:11:53] maybe Brooke [03:12:22] ugh [03:12:58] or if they really can't log in at all and they didn't complain themselves it can probably wait a couple days [03:13:11] (it's been so long already) [03:13:38] Bugzilla? [03:14:24] sure [03:14:48] still may speed things up if there's some idea of which wikis are effected and to what extent [03:16:31] I don't know how to do that [03:16:36] * Jasper_Deng needs to first compose a suitable regex [03:17:13] well you could query first for `like '%:%'` and then narrow it down from there [03:17:23] i rarely see : in usernames [03:17:49] anyway, for the other issue: i tested myself from 2 different connections. both with netcat and irssi. definitely a problem there [03:17:58] that is a bigger emergency [03:18:04] Yep. [03:18:11] 15 01:42:23 <+nagios-wm> PROBLEM - Host ekrem is DOWN: CRITICAL - Host Unreachable (208.80.152.178) [03:18:18] ^^^ that's the problem [03:18:22] can anything be done? [03:18:29] i can poke some people maybe [03:18:48] idk, I think sysadmins are off work today [03:18:51] (for today*) [03:19:43] Jasper_Deng: they're on all day and all night if it's serious enough [03:20:10] * Jasper_Deng doesn't know who to poke [03:20:25] i think i can figure that out [03:49:44] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [03:50:04] nagios-wm: quiet! [03:50:21] more screaming about ekrem, less about ms-be[67]. kthx [03:53:24] yo [03:53:29] jeremyb: [03:53:33] don't really know anything much [03:53:41] nagios complained, watchmouse looks bad [03:53:52] tried irssi and netcat both from 2 different connections [03:53:55] just get timeouts [03:53:59] jeremyb: what time is it your side of the earth? ;) [03:54:17] 15 01:42:23 <+nagios-wm> PROBLEM - Host ekrem is DOWN: CRITICAL - Host Unreachable (208.80.152.178) [03:54:21] T3rminat0r: UTC [03:54:25] ok [03:54:32] so that's a little over 2 hrs back [03:54:37] yea. [03:55:08] !log powercycling ekrem [03:55:15] it looks like the last line of an aborted reboot or something [03:55:18] Logged the message, Mistress of the network gear. [03:55:37] hrmmm? as in some odd runlevel? [03:55:56] yeah [03:55:59] like unresponsive [03:56:09] hm [03:56:19] hi paravoid! [03:56:21] long time, no see [03:56:51] sda is fsck'ing [03:57:13] LeslieCarr: just been too long? [03:57:14] RECOVERY - Host ekrem is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [03:57:20] yeah [03:57:22] 227 days [03:59:16] ok, ekrem should be up [03:59:17] yay [03:59:35] LeslieCarr: still econnrefused [03:59:46] really? i just ssh'ed in [03:59:46] instead of timeout like before [03:59:54] well 6667 not 22 [04:00:00] ah [04:00:14] yes triple checked [04:00:22] weird [04:00:28] well, i used 22 ... [04:00:55] (6667 is IRC in case you've forgetten. that's the ircd box) [04:01:30] yeah [04:01:32] thanks [04:02:05] hrm, updating puppet but wonder if irc doesn't auto start [04:02:36] oh, maybe it doesn't just fix itself [04:04:18] paravoid: http://wikitech.wikimedia.org/view/IRCD [04:04:20] just used that [04:04:34] Jasper_Deng: please sanity check [04:04:36] !log started ircd using the instructions here - http://wikitech.wikimedia.org/view/IRCD on ekrem [04:04:37] LeslieCarr: i'm in [04:04:45] I think puppet would have started it too [04:04:46] jeremyb: ? [04:04:46] Logged the message, Mistress of the network gear. [04:04:48] not seeing any edits [04:04:52] there's an ensure => running in puppet at least [04:04:56] in #en.wikipedia at least [04:05:01] The bots need to be restarted [04:05:10] Jasper_Deng: first the ircd has to work [04:05:11] yeah, not sure if it's trying to run the correct thing though [04:05:23] oh need to do /usr/local/bin/start-ircbot [04:05:54] jeremyb: is it working now ? [04:06:34] i can get into 6667 [04:06:41] LeslieCarr: looks broken [04:06:51] grrr, y u break ekrem ? [04:06:58] LeslieCarr: MOTD lists an example channel and that channel doesn't give me edits [04:07:22] :( [04:07:34] the corresponding onwiki RC feed has edits this minute [04:07:37] (enwiki) [04:12:24] paravoid: any ideas ? [04:13:01] * jeremyb goes to get a cold beverage ;) [04:15:07] * jeremyb goes to look at the way labs has it set up (for beta) [04:15:20] LeslieCarr: it seems to be recovering per http://status.wikimedia.org/8777/156492/IRC-RecentChanges [04:16:02] hrm, now is it actually up though :) [04:16:19] Jasper_Deng: i think that's just TCP, not functionality [04:16:26] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [04:16:29] Jasper_Deng: are you able to connect to it and see edits? [04:16:37] what's the channel to check for the irc feed? [04:16:48] LeslieCarr: where? [04:16:54] jeremyb: I can't test it, because bots need to be restarted [04:17:03] LeslieCarr: there's bots that consume the feed and spit it back out again on a different network [04:17:14] Jasper_Deng: what are you talking about? [04:17:19] Jasper_Deng: can't you pretend to be a bot? [04:17:32] * Jasper_Deng does not know how to impersonate a bot [04:17:43] yay! works now [04:17:48] LeslieCarr: did you change something? [04:18:14] did i change something ? [04:18:24] i didn't, did you paravoid ? [04:18:33] I did [04:18:38] killed the bot, let it restart [04:19:28] yay [04:19:36] ok, closing laptop again [04:20:08] LeslieCarr: k, thanks, enjoy... err, you're a ghost [04:20:58] heh [04:22:28] paravoid: you too of course for finishing it off ;) [05:07:44] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [05:29:11] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 198 seconds [05:29:20] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 204 seconds [05:32:20] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds [05:32:38] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds [05:34:26] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [05:46:26] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 181 seconds [05:49:35] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds [06:04:44] PROBLEM - Squid on brewster is CRITICAL: Connection refused [06:28:44] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 185 seconds [06:29:56] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 238 seconds [06:33:14] RECOVERY - Squid on brewster is OK: TCP OK - 0.002 second response time on port 8080 [06:33:50] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [06:33:50] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:36:05] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 183 seconds [06:36:23] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 197 seconds [06:49:12] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds [06:49:39] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 7 seconds [07:36:18] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 184 seconds [07:37:03] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [07:37:39] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 236 seconds [07:40:39] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 8 seconds [07:40:48] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 13 seconds [07:49:57] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [08:01:48] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 187 seconds [08:03:09] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 198 seconds [08:06:18] PROBLEM - MySQL Replication Heartbeat on db1042 is CRITICAL: CRIT replication delay 181 seconds [08:07:39] PROBLEM - MySQL Slave Delay on db1042 is CRITICAL: CRIT replication delay 197 seconds [08:10:39] RECOVERY - MySQL Slave Delay on db1042 is OK: OK replication delay 0 seconds [08:10:48] RECOVERY - MySQL Replication Heartbeat on db1042 is OK: OK replication delay 0 seconds [10:31:39] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [10:40:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [11:05:06] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [11:43:03] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [11:48:36] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [13:00:00] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [13:07:57] PROBLEM - SSH on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:08:15] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:08:51] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.209:11000 (timeout) [13:09:18] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:09:36] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [13:13:30] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [13:18:09] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.14:11000 (timeout) [13:23:24] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:33] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:33] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:42] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:51] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:18] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:27] PROBLEM - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:27] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:27] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:27] PROBLEM - Apache HTTP on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:36] PROBLEM - Apache HTTP on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:36] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:54] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:54] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:24:54] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:03] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:03] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:12] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:21] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:21] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:30] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:30] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:30] PROBLEM - SSH on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:30] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:39] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:39] PROBLEM - Memcached on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:39] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:48] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out [13:25:48] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:48] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:48] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:48] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:49] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:49] PROBLEM - SSH on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:57] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:57] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:58] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:58] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:06] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:06] PROBLEM - Apache HTTP on mw53 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:06] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:06] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:06] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:07] PROBLEM - Apache HTTP on srv207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:07] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:15] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:15] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:24] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:24] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:24] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:42] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:42] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:42] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:42] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:42] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:43] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:51] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:51] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:26:51] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:00] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:00] PROBLEM - Apache HTTP on srv194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:00] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:00] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:00] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:01] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:09] RECOVERY - Memcached on srv260 is OK: TCP OK - 0.001 second response time on port 11000 [13:27:09] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:09] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:09] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:18] RECOVERY - SSH on srv267 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:27:18] PROBLEM - Apache HTTP on srv233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:18] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:27] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.0 504 Gateway Time-out [13:27:27] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:27] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:27] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:27] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:28] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:28] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:29] PROBLEM - SSH on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:29] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:30] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:36] PROBLEM - Apache HTTP on srv244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:36] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:36] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:45] PROBLEM - SSH on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:45] PROBLEM - Apache HTTP on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:45] PROBLEM - Apache HTTP on srv261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:45] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:45] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:46] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:46] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:54] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:54] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:54] PROBLEM - Apache HTTP on srv210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:54] PROBLEM - Apache HTTP on srv205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:54] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:27:55] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:03] PROBLEM - Apache HTTP on srv203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:03] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:03] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:03] PROBLEM - Apache HTTP on srv246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:12] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:12] PROBLEM - Apache HTTP on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:12] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:12] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:12] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out [13:28:21] PROBLEM - SSH on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:21] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:21] PROBLEM - Apache HTTP on srv243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:21] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:30] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:30] PROBLEM - Apache HTTP on srv231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:30] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:30] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:30] PROBLEM - Apache HTTP on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:31] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:48] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:48] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:48] PROBLEM - Apache HTTP on srv247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:48] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:48] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:49] PROBLEM - Apache HTTP on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:49] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:50] PROBLEM - Apache HTTP on srv201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:50] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:58] PROBLEM - Apache HTTP on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:58] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:58] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:58] PROBLEM - Apache HTTP on srv278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:28:58] RECOVERY - SSH on srv272 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:29:06] PROBLEM - Apache HTTP on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:29:06] PROBLEM - SSH on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:29:15] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:29:24] Lots of people reporting issues with en.wikipedia.org. [13:29:31] :o [13:29:43] http://status.wikimedia.org/ seems to indicate some issues. [13:29:51] PROBLEM - SSH on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:30:05] Not just en.wp, but should be all [13:30:09] RECOVERY - SSH on srv264 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:30:18] * aude panics [13:31:29] hi LeslieCarr [13:31:30] PROBLEM - SSH on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:31:41] hey, so is shit dying ? [13:31:57] PROBLEM - SSH on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:31:57] PROBLEM - Memcached on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:32:02] oh memcache [13:32:04] nagios-wm seems to think so [13:32:05] :P [13:32:06] RECOVERY - SSH on srv280 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:32:13] LeslieCarr: yes [13:32:14] edits still come through though [13:32:15] RECOVERY - SSH on srv262 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:32:24] PROBLEM - Memcached on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:32:27] Error 503 Service Unavailable on mobile [13:32:32] Varnish cache server [13:32:42] enwiki not coming up for me at all [13:32:51] RECOVERY - SSH on srv269 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:33:30] fuck, like 10 of the memcache servers are down [13:33:36] PROBLEM - SSH on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:33:41] :( [13:33:45] RECOVERY - Memcached on srv272 is OK: TCP OK - 0.003 second response time on port 11000 [13:34:07] here [13:34:30] RECOVERY - SSH on srv270 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:34:33] hi [13:34:39] hey [13:34:39] PROBLEM - SSH on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:34:46] notpeter: something is craycray with memcache, huge loads on the memcache servers and memcache isn't responding on a lot of them [13:35:06] RECOVERY - Memcached on srv275 is OK: TCP OK - 8.999 second response time on port 11000 [13:36:09] PROBLEM - SSH on srv261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:36:09] i'm looking at srv288 now ... [13:36:32] srv288 is frozen, [13:36:36] !log rebooting srv288 [13:36:41] oh wait no [13:36:44] responding now [13:36:45] Logged the message, Mistress of the network gear. [13:37:00] just very slow [13:37:30] RECOVERY - SSH on srv288 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:37:36] it's back for me [13:37:39] RECOVERY - SSH on srv261 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:37:49] not the mobile [13:38:03] and could just be something i have cached [13:38:06] RECOVERY - SSH on srv286 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:38:15] PROBLEM - Memcached on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:24] not working really [13:38:33] PROBLEM - SSH on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:38:51] PROBLEM - Memcached on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:39:18] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:39:30] so, i restarted apache2 on srv288 and that seems to have alleviated the unresponsive issue on that machine [13:39:45] RECOVERY - SSH on srv275 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:39:47] good [13:39:54] PROBLEM - Memcached on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:40:12] RECOVERY - Memcached on srv286 is OK: TCP OK - 0.001 second response time on port 11000 [13:40:40] i am moving on to srv280 [13:41:06] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61569 bytes in 2.499 seconds [13:41:09] mark/notpeter do you remember what the root cause was last time we had all the memcache machines going crazy ? [13:41:18] memcached is not going crazy [13:41:30] the boxes are going into swap [13:41:58] * aude is getting some pages to load, but very slow and some not loading [13:42:18] PROBLEM - SSH on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:42:41] this feels like the same problem we had a few months back ? trying to remember the root cause from that one... [13:43:12] PROBLEM - SSH on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:14] june 20th email - Incident report: appservers slowness/outage [13:44:15] PROBLEM - SSH on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:18] that thread [13:44:33] RECOVERY - SSH on srv263 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:44:33] PROBLEM - SSH on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:42] PROBLEM - Backend Squid HTTP on sq72 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:42] PROBLEM - SSH on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:44:42] PROBLEM - SSH on srv289 is CRITICAL: Server answer: [13:45:54] PROBLEM - SSH on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:45:54] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:45:54] RECOVERY - Memcached on srv260 is OK: TCP OK - 2.997 second response time on port 11000 [13:46:21] PROBLEM - SSH on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:47:15] RECOVERY - SSH on srv271 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:47:33] RECOVERY - Memcached on srv275 is OK: TCP OK - 2.998 second response time on port 11000 [13:47:42] RECOVERY - SSH on srv286 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:47:42] RECOVERY - SSH on srv280 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:48:36] PROBLEM - SSH on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:48:45] PROBLEM - SSH on srv261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:30] PROBLEM - Memcached on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:50:15] PROBLEM - SSH on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:50:15] RECOVERY - SSH on srv261 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:50:15] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48275 bytes in 7.406 seconds [13:50:42] RECOVERY - SSH on srv275 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:51:09] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [13:51:45] RECOVERY - SSH on srv258 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:51:45] PROBLEM - SSH on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:52:01] tail -f sampled-1000.log | grep Main_Page\? [13:52:03] RECOVERY - SSH on srv283 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:52:10] wrong window [13:52:30] RECOVERY - Memcached on srv262 is OK: TCP OK - 0.002 second response time on port 11000 [13:52:39] PROBLEM - SSH on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:15] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:53:24] PROBLEM - Memcached on srv258 is CRITICAL: Connection refused [13:53:51] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61566 bytes in 8.287 seconds [13:54:09] RECOVERY - SSH on srv286 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:54:45] RECOVERY - SSH on srv288 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [13:54:54] RECOVERY - Memcached on srv258 is OK: TCP OK - 0.003 second response time on port 11000 [13:55:12] PROBLEM - SSH on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:12] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:14] :-) [13:55:39] PROBLEM - SSH on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:56:06] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61565 bytes in 2.648 seconds [13:57:11] !log rebooting srv270 (unresponsive) [13:57:20] Logged the message, Mistress of the network gear. [13:58:21] PROBLEM - SSH on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:30] PROBLEM - SSH on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:57] PROBLEM - Memcached on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:59:15] PROBLEM - Host srv270 is DOWN: PING CRITICAL - Packet loss = 100% [13:59:51] RECOVERY - SSH on srv283 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:00:27] RECOVERY - Memcached on srv274 is OK: TCP OK - 0.001 second response time on port 11000 [14:00:45] RECOVERY - Host srv270 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [14:00:54] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:01:03] RECOVERY - SSH on srv270 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:01:12] PROBLEM - SSH on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:02:06] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:03:03] !log powercycling srv259 (unresponsive) [14:03:12] Logged the message, Mistress of the network gear. [14:03:27] i don't think powercycling boxes is very helpful leslie [14:03:36] PROBLEM - Host srv262 is DOWN: PING CRITICAL - Packet loss = 100% [14:03:43] they're just in swap deeath [14:04:12] RECOVERY - SSH on srv288 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:04:46] ok [14:04:57] PROBLEM - Memcached on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:05:24] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48087 bytes in 2.186 seconds [14:05:24] PROBLEM - Host srv259 is DOWN: PING CRITICAL - Packet loss = 100% [14:05:33] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61565 bytes in 0.967 seconds [14:05:42] PROBLEM - Memcached on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:06:00] RECOVERY - Apache HTTP on srv288 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.296 second response time [14:06:00] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.652 second response time [14:06:00] RECOVERY - Apache HTTP on srv243 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.060 second response time [14:06:00] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39958 bytes in 6.162 seconds [14:06:09] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61379 bytes in 0.153 seconds [14:06:09] RECOVERY - SSH on srv259 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:06:09] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.765 second response time [14:06:09] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.464 second response time [14:06:09] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.495 second response time [14:06:10] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.229 second response time [14:06:10] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.588 second response time [14:06:11] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.521 second response time [14:06:11] RECOVERY - Apache HTTP on srv280 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.276 second response time [14:06:12] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.719 second response time [14:06:12] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.622 second response time [14:06:18] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.917 second response time [14:06:18] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.216 second response time [14:06:18] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.062 second response time [14:06:18] RECOVERY - Host srv259 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [14:06:18] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.833 second response time [14:06:27] RECOVERY - Memcached on srv275 is OK: TCP OK - 0.003 second response time on port 11000 [14:06:27] RECOVERY - Apache HTTP on srv209 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [14:06:27] RECOVERY - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39768 bytes in 0.191 seconds [14:06:27] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.679 second response time [14:06:27] RECOVERY - Apache HTTP on srv194 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [14:06:28] RECOVERY - Backend Squid HTTP on sq72 is OK: HTTP OK HTTP/1.0 200 OK - 1249 bytes in 0.003 seconds [14:06:28] RECOVERY - Apache HTTP on srv233 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.295 second response time [14:06:29] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.615 second response time [14:06:29] RECOVERY - Apache HTTP on srv225 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.090 second response time [14:06:36] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.279 second response time [14:06:36] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.497 second response time [14:06:36] RECOVERY - Apache HTTP on srv268 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.224 second response time [14:06:36] RECOVERY - Apache HTTP on srv198 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.175 second response time [14:06:36] RECOVERY - Apache HTTP on srv226 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.273 second response time [14:06:37] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61570 bytes in 1.054 seconds [14:06:45] RECOVERY - SSH on srv262 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:06:45] RECOVERY - Apache HTTP on srv228 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [14:06:45] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [14:06:45] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [14:06:45] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [14:06:46] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [14:06:46] RECOVERY - Apache HTTP on srv212 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [14:06:47] RECOVERY - Apache HTTP on srv201 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [14:06:54] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61370 bytes in 0.161 seconds [14:06:54] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [14:06:54] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [14:06:54] RECOVERY - Apache HTTP on srv285 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [14:06:54] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [14:06:55] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [14:06:55] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [14:06:56] RECOVERY - Host srv262 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms [14:07:03] RECOVERY - Apache HTTP on srv261 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [14:07:03] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.289 second response time [14:07:03] RECOVERY - Apache HTTP on srv244 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.279 second response time [14:07:03] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.294 second response time [14:07:12] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [14:07:12] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.021 second response time [14:07:12] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.283 second response time [14:07:12] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.281 second response time [14:07:12] RECOVERY - Memcached on srv289 is OK: TCP OK - 2.999 second response time on port 11000 [14:07:21] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [14:07:21] RECOVERY - Apache HTTP on srv202 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [14:07:21] RECOVERY - Apache HTTP on srv205 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [14:07:21] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [14:07:21] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [14:07:22] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [14:07:22] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [14:07:23] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.997 second response time [14:07:30] RECOVERY - Apache HTTP on srv203 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time [14:07:30] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [14:07:39] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.292 second response time [14:07:39] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.292 second response time [14:07:39] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.296 second response time [14:07:39] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61785 bytes in 0.429 seconds [14:07:40] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.036 second response time [14:07:40] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48275 bytes in 2.636 seconds [14:07:48] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.294 second response time [14:07:48] RECOVERY - Apache HTTP on srv195 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.288 second response time [14:07:57] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.074 second response time [14:08:06] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.683 second response time [14:08:15] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.704 second response time [14:08:15] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.524 second response time [14:08:24] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.737 second response time [14:08:24] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.686 second response time [14:08:42] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48282 bytes in 2.466 seconds [14:08:42] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.015 second response time [14:08:51] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.893 second response time [14:09:09] Recovery! :) [14:09:09] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.456 second response time [14:09:09] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.638 second response time [14:09:18] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.458 second response time [14:09:18] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.437 second response time [14:09:45] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 0.913 seconds [14:09:54] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.151 second response time [14:10:03] Hydriz: don't say that... [14:10:03] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.644 second response time [14:10:03] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.693 second response time [14:10:12] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.195 second response time [14:10:19] hmm? [14:10:38] Don't count the chickens before they're hatched. [14:10:39] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 1.893 seconds [14:10:57] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 0.147 seconds [14:11:06] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.995 second response time [14:12:09] PROBLEM - SSH on srv260 is CRITICAL: Server answer: [14:12:36] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:12:45] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.574 second response time [14:12:54] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.530 second response time [14:12:54] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.320 second response time [14:13:03] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.946 second response time [14:13:12] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.206 second response time [14:13:48] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.251 second response time [14:13:57] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:06] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:06] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:24] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:33] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:42] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:51] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:51] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [14:15:54] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:21] RECOVERY - Apache HTTP on srv262 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [14:16:39] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:24] RECOVERY - SSH on srv274 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:17:24] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:24] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:42] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.826 second response time [14:17:42] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:42] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:42] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:51] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.422 second response time [14:17:51] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:51] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:51] PROBLEM - Apache HTTP on mw53 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:18:18] PROBLEM - Host srv275 is DOWN: PING CRITICAL - Packet loss = 100% [14:18:27] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:18:27] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:18:45] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.853 second response time [14:18:45] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.332 second response time [14:18:54] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.789 second response time [14:18:54] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.028 second response time [14:18:54] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.090 second response time [14:18:54] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.268 second response time [14:19:03] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.386 second response time [14:19:03] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.732 second response time [14:19:03] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.861 second response time [14:19:03] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.977 second response time [14:19:03] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.004 second response time [14:19:12] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.169 second response time [14:19:12] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [14:19:12] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [14:19:21] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.265 second response time [14:19:21] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.331 second response time [14:19:30] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.589 second response time [14:19:30] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.234 second response time [14:19:39] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.031 second response time [14:19:39] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.506 second response time [14:19:48] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.094 second response time [14:19:48] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.585 second response time [14:19:48] RECOVERY - Apache HTTP on mw32 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.801 second response time [14:19:48] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.374 second response time [14:19:57] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.825 second response time [14:19:57] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.030 second response time [14:20:06] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.637 second response time [14:20:06] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.640 second response time [14:20:06] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.069 second response time [14:20:06] RECOVERY - Apache HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.742 second response time [14:20:15] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.989 second response time [14:20:25] RECOVERY - SSH on srv275 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:20:25] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [14:20:25] RECOVERY - Host srv275 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [14:20:42] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.581 second response time [14:20:42] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.994 second response time [14:21:18] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:21:36] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.681 second response time [14:21:54] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.050 second response time [14:22:10] LeslieCarr: what was the problem? [14:22:23] swapdeath on application servers [14:22:30] ok [14:22:36] can i tweet that? ;) [14:22:45] aude: i think we're not out of the woods? [14:22:46] or in english? [14:22:52] jeremyb: really? [14:23:07] sadly yes [14:23:11] :( [14:23:15] PROBLEM - Apache HTTP on srv242 is CRITICAL: Connection refused [14:23:17] * aude was the first to tweet the problem [14:24:19] * jeremyb crowns aude. maybe chatter -> #-tech [14:24:22] it's a symptom, not a cause [14:24:38] heh [14:25:03] RECOVERY - SSH on srv289 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:26:24] RECOVERY - Apache HTTP on srv289 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [14:27:09] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [14:31:48] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:31:57] PROBLEM - Apache HTTP on srv267 is CRITICAL: Connection refused [14:31:57] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.37:11000 (Connection timed out) 10.0.8.14:11000 (timeout) 10.0.2.235:11000 (timeout) 10.0.8.23:11000 (timeout) 10.0.8.38:11000 (timeout) [14:32:06] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:06] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:24] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:33] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:33] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:42] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:42] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:42] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:42] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:51] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:51] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:51] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:51] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:32:51] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:00] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:00] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:00] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:00] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:00] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:01] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.028 second response time [14:33:01] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:09] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:18] PROBLEM - Apache HTTP on mw53 is CRITICAL: Connection refused [14:33:18] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:27] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:27] PROBLEM - Apache HTTP on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:27] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:27] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:27] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:28] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.445 second response time [14:33:36] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:36] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:36] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:36] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:36] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:37] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:45] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:45] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:45] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:45] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:45] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:54] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:54] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:03] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:03] PROBLEM - Memcached on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:12] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:21] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:21] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 0.154 seconds [14:34:30] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:30] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:30] PROBLEM - SSH on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:57] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.461 second response time [14:34:57] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61805 bytes in 1.824 seconds [14:35:33] RECOVERY - Memcached on srv288 is OK: TCP OK - 0.002 second response time on port 11000 [14:37:21] RECOVERY - SSH on srv287 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:37:58] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.831 second response time [14:38:06] PROBLEM - Puppet freshness on ekrem is CRITICAL: Puppet has not run in the last 10 hours [14:38:15] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:15] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:33] PROBLEM - SSH on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:33] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.042 second response time [14:38:42] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:36] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:54] PROBLEM - Apache HTTP on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:54] PROBLEM - SSH on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:54] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:54] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:03] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:03] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:03] PROBLEM - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:12] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39961 bytes in 9.900 seconds [14:40:12] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:40:30] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:06] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.037 second response time [14:41:15] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:15] PROBLEM - Apache HTTP on srv243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:24] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:25] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:25] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:25] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:25] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:33] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:33] PROBLEM - Apache HTTP on srv207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:33] PROBLEM - Apache HTTP on srv194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:33] PROBLEM - Apache HTTP on srv247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:42] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:42] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:42] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:42] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:42] PROBLEM - Memcached on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:43] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:43] PROBLEM - Apache HTTP on srv278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:51] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:51] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:51] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:51] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:00] PROBLEM - Apache HTTP on srv233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:00] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:09] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:18] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:18] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:45] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:45] PROBLEM - Apache HTTP on srv205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:45] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:42:45] PROBLEM - Apache HTTP on srv203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:43:03] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:43:03] RECOVERY - SSH on srv276 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:43:48] PROBLEM - SSH on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:06] PROBLEM - SSH on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:15] PROBLEM - Apache HTTP on srv246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:24] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61377 bytes in 0.198 seconds [14:44:51] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:51] PROBLEM - SSH on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:51] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:51] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:44:51] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:00] PROBLEM - Apache HTTP on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:00] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:00] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:00] PROBLEM - Apache HTTP on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:09] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:09] PROBLEM - Apache HTTP on srv201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:09] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:18] PROBLEM - Apache HTTP on srv261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:18] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:18] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:18] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:18] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:19] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:19] PROBLEM - Apache HTTP on srv210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:20] PROBLEM - Apache HTTP on srv244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:20] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:21] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:27] PROBLEM - Apache HTTP on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:27] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:27] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:27] PROBLEM - Apache HTTP on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:36] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:36] PROBLEM - SSH on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:36] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:45] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:45] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:54] PROBLEM - Apache HTTP on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:54] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:54] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:45:54] PROBLEM - Apache HTTP on srv231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:12] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:21] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:48] RECOVERY - Apache HTTP on srv261 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.785 second response time [14:46:48] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.028 second response time [14:46:48] RECOVERY - Apache HTTP on srv275 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.355 second response time [14:46:48] RECOVERY - Apache HTTP on srv225 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.029 second response time [14:46:57] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.325 second response time [14:46:57] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.304 second response time [14:46:57] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.378 second response time [14:46:57] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.700 second response time [14:46:57] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.650 second response time [14:47:06] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.244 second response time [14:47:06] RECOVERY - SSH on srv271 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:47:06] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.978 second response time [14:47:15] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61567 bytes in 1.091 seconds [14:47:15] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 4.236 seconds [14:47:15] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.430 second response time [14:47:15] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.376 second response time [14:47:15] RECOVERY - Apache HTTP on srv262 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.999 second response time [14:47:16] RECOVERY - Apache HTTP on srv202 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.499 second response time [14:47:24] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.193 second response time [14:47:24] RECOVERY - Apache HTTP on srv203 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.793 second response time [14:47:24] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.022 second response time [14:47:24] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.059 second response time [14:47:24] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.096 second response time [14:47:33] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.116 second response time [14:47:33] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.474 second response time [14:47:33] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.081 second response time [14:47:42] RECOVERY - SSH on srv280 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:47:42] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.484 second response time [14:47:42] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 2.847 seconds [14:47:42] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.331 second response time [14:47:51] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.938 second response time [14:47:51] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.814 second response time [14:47:51] RECOVERY - Apache HTTP on srv195 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.528 second response time [14:47:51] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.423 second response time [14:48:00] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.110 second response time [14:48:00] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.049 second response time [14:48:00] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.598 second response time [14:48:00] RECOVERY - Apache HTTP on srv268 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.270 second response time [14:48:00] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.597 second response time [14:48:01] RECOVERY - Memcached on srv276 is OK: TCP OK - 8.996 second response time on port 11000 [14:48:01] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.963 second response time [14:48:02] RECOVERY - Apache HTTP on srv289 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.662 second response time [14:48:02] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.574 second response time [14:48:03] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.384 second response time [14:48:03] RECOVERY - Apache HTTP on srv226 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.353 second response time [14:48:04] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.774 second response time [14:48:04] RECOVERY - Apache HTTP on srv209 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.819 second response time [14:48:05] RECOVERY - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39768 bytes in 7.037 seconds [14:48:09] RECOVERY - Apache HTTP on srv198 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.956 second response time [14:48:09] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.037 second response time [14:48:09] RECOVERY - Apache HTTP on srv228 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.492 second response time [14:48:09] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.061 second response time [14:48:09] RECOVERY - Apache HTTP on srv201 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.924 second response time [14:48:18] RECOVERY - Apache HTTP on srv233 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.026 second response time [14:48:18] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.712 second response time [14:48:18] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.738 second response time [14:48:18] RECOVERY - Apache HTTP on srv244 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.253 second response time [14:48:18] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.465 second response time [14:48:19] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.039 second response time [14:48:27] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.984 second response time [14:48:27] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.891 second response time [14:48:27] RECOVERY - Apache HTTP on srv212 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.799 second response time [14:48:45] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.859 second response time [14:48:45] RECOVERY - Apache HTTP on srv243 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.041 second response time [14:48:45] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.752 second response time [14:48:54] RECOVERY - Apache HTTP on srv205 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.154 second response time [14:49:12] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [14:49:12] RECOVERY - Apache HTTP on srv194 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [14:49:12] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48277 bytes in 1.891 seconds [14:49:21] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [14:49:21] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.303 second response time [14:49:21] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.827 second response time [14:49:21] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48275 bytes in 1.682 seconds [14:49:30] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.439 second response time [14:49:30] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [14:49:30] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [14:49:30] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.800 second response time [14:49:30] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 0.141 seconds [14:49:39] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.249 second response time [14:49:39] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.923 second response time [14:49:39] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.800 second response time [14:49:48] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.720 second response time [14:49:48] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.688 second response time [14:49:48] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.736 second response time [14:49:48] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.404 second response time [14:49:48] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.495 second response time [14:49:57] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.689 second response time [14:49:57] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.184 second response time [14:50:06] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.821 second response time [14:50:15] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.925 second response time [14:50:24] RECOVERY - SSH on srv288 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:50:33] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61803 bytes in 5.038 seconds [14:50:42] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.854 second response time [14:50:51] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.128 second response time [14:51:00] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39958 bytes in 0.786 seconds [14:51:18] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.885 second response time [14:51:18] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 1.058 seconds [14:51:27] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.738 second response time [14:51:36] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.054 second response time [14:51:36] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.115 second response time [14:52:12] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.383 second response time [14:52:30] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.515 second response time [14:52:48] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.265 second response time [14:52:48] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.749 second response time [14:53:15] PROBLEM - Memcached on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:53:42] RECOVERY - Apache HTTP on srv288 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.798 second response time [14:54:00] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:00] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:09] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:18] PROBLEM - SSH on srv280 is CRITICAL: Server answer: [14:54:27] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:36] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:45] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:45] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:45] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:54:45] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:21] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:30] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:39] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:57] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.662 second response time [14:55:57] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:57] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:57] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:55:57] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:33] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:42] PROBLEM - SSH on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:18] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:18] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:36] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:36] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:36] PROBLEM - SSH on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:58:48] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:58:57] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.093 second response time [14:58:57] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48275 bytes in 1.721 seconds [14:58:57] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.828 second response time [14:59:06] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:06] PROBLEM - Memcached on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:06] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.409 second response time [14:59:06] PROBLEM - Memcached on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:06] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.181 second response time [14:59:15] RECOVERY - SSH on srv285 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:59:24] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.860 second response time [14:59:24] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.313 second response time [14:59:24] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.451 second response time [14:59:33] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.022 second response time [14:59:33] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.645 second response time [15:00:00] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.187 second response time [15:00:00] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61556 bytes in 8.779 seconds [15:00:00] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.045 second response time [15:00:18] RECOVERY - Apache HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.330 second response time [15:00:18] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.173 second response time [15:00:18] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.517 second response time [15:00:27] RECOVERY - Memcached on srv271 is OK: TCP OK - 0.004 second response time on port 11000 [15:00:36] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.480 second response time [15:00:36] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.524 second response time [15:00:45] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.935 second response time [15:00:54] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.391 second response time [15:01:21] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.368 second response time [15:01:48] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.527 second response time [15:01:48] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.661 second response time [15:01:57] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.624 second response time [15:01:57] RECOVERY - SSH on srv280 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:01:57] RECOVERY - Apache HTTP on mw32 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.506 second response time [15:02:06] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.993 second response time [15:02:06] RECOVERY - Memcached on srv280 is OK: TCP OK - 0.004 second response time on port 11000 [15:02:06] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.024 second response time [15:02:15] RECOVERY - Memcached on srv269 is OK: TCP OK - 2.993 second response time on port 11000 [15:02:42] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.471 second response time [15:03:00] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.846 second response time [15:03:09] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.990 second response time [15:03:27] RECOVERY - Apache HTTP on srv285 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [15:03:27] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.792 second response time [15:03:36] RECOVERY - SSH on srv269 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:03:36] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.286 second response time [15:03:45] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.051 second response time [15:04:12] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.742 second response time [15:04:12] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.672 second response time [15:04:30] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [15:04:39] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.653 second response time [15:04:39] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [15:06:36] RECOVERY - Apache HTTP on srv280 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [15:06:36] PROBLEM - Memcached on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:07:30] PROBLEM - Memcached on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:07:57] RECOVERY - Memcached on srv271 is OK: TCP OK - 0.011 second response time on port 11000 [15:08:42] RECOVERY - SSH on srv271 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:09:00] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [15:10:39] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [15:12:09] RECOVERY - Memcached on srv286 is OK: TCP OK - 8.996 second response time on port 11000 [15:12:36] PROBLEM - Memcached on srv271 is CRITICAL: Connection refused [15:14:06] RECOVERY - Memcached on srv271 is OK: TCP OK - 0.009 second response time on port 11000 [15:22:57] RECOVERY - SSH on srv286 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:23:33] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [15:24:45] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [16:15:54] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [16:35:06] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [16:35:06] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [16:42:09] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.17:11000 (Connection timed out) 10.0.2.228:11000 (timeout) 10.0.8.26:11000 (Connection timed out) 10.0.2.207:11000 (timeout) [16:43:32] ugh [16:51:36] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:52:57] PROBLEM - SSH on srv277 is CRITICAL: Server answer: [16:53:24] PROBLEM - Memcached on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:54:45] RECOVERY - Memcached on srv277 is OK: TCP OK - 2.996 second response time on port 11000 [17:00:36] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.706 second response time [17:00:45] RECOVERY - SSH on srv277 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:12:09] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:12:45] PROBLEM - SSH on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:13:03] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:13:30] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.937 second response time [17:13:39] PROBLEM - SSH on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:14:15] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:16:37] hey [17:16:41] hey [17:16:52] 15 16:42:09 <+nagios-wm> PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.8.17:11000 (Connection timed out) 10.0.2.228:11000 (timeout) 10.0.8.26:11000 (Connection timed out) 10.0.2.207:11000 (timeout) [17:16:57] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:17:06] that hasn't recovered yet. it's been ~35 mins [17:17:11] eep yeah [17:17:19] and some apaches are SSH/HTTP critical [17:17:19] we should really make memcache failures page us [17:17:20] anyways [17:17:40] watchmouse turned yellow 20ish mins ago [17:18:18] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [17:18:27] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:18:54] PROBLEM - SSH on srv273 is CRITICAL: Server answer: [17:19:05] from trying to login to srv271 this feels like the same issue... [17:21:18] PROBLEM - Memcached on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:22:57] RECOVERY - Memcached on srv273 is OK: TCP OK - 0.009 second response time on port 11000 [17:23:33] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:24:02] * jeremyb wonders what the difference is between 'Connection timed out' and 'timeout'. i guess one is connect timeout and one is successful connect and then later timeout [17:24:45] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.281 second response time [17:24:45] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:30] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:39] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:48] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:48] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:25:57] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:26:15] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:26:24] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:26:51] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:26:51] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:00] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:09] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - Apache HTTP on mw45 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:27] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:36] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:36] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:36] PROBLEM - Apache HTTP on mw53 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:45] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:45] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:54] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:03] PROBLEM - Apache HTTP on mw19 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:03] PROBLEM - SSH on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:03] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:03] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:12] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:12] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:12] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:12] PROBLEM - Apache HTTP on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - SSH on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:21] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:30] PROBLEM - Apache HTTP on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:30] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - SSH on srv277 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:39] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:48] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.986 second response time [17:28:48] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:48] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:48] PROBLEM - Apache HTTP on srv280 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:48] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.296 second response time [17:28:57] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 0.771 seconds [17:28:57] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:06] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:06] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:06] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:15] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:24] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:24] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.454 second response time [17:29:33] RECOVERY - SSH on srv263 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:29:33] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:42] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:29:42] PROBLEM - SSH on srv265 is CRITICAL: Server answer: [17:30:00] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.879 second response time [17:30:09] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:45] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:30:54] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:03] PROBLEM - Apache HTTP on srv278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:12] PROBLEM - SSH on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:30] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:57] RECOVERY - SSH on srv264 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:31:57] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.146 second response time [17:31:57] PROBLEM - Apache HTTP on srv243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:15] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61394 bytes in 9.927 seconds [17:32:15] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:24] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.733 second response time [17:32:33] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.300 second response time [17:32:33] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:43] RECOVERY - SSH on srv273 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:32:43] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:00] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:18] RECOVERY - SSH on srv277 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:33:27] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:36] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:33:54] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:03] PROBLEM - Apache HTTP on srv244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:03] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:12] PROBLEM - Apache HTTP on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:13] PROBLEM - Memcached on srv226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:13] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:21] PROBLEM - Memcached on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:21] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:21] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:30] PROBLEM - Apache HTTP on srv205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:30] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:30] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:30] PROBLEM - Apache HTTP on srv203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:39] PROBLEM - Apache HTTP on srv194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:39] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:39] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [17:34:39] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:40] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:40] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:48] PROBLEM - Apache HTTP on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:34:57] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:15] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out [17:35:15] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:42] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:42] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:35:51] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:36:00] PROBLEM - SSH on srv289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:00] RECOVERY - Memcached on srv288 is OK: TCP OK - 3.009 second response time on port 11000 [17:36:09] PROBLEM - SSH on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:36] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:36] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:36] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:36] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:36] PROBLEM - Apache HTTP on srv246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:45] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:45] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:45] PROBLEM - Apache HTTP on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:36:54] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:03] PROBLEM - Apache HTTP on srv207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:03] PROBLEM - Apache HTTP on srv233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:03] PROBLEM - SSH on srv275 is CRITICAL: Server answer: [17:37:03] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:03] PROBLEM - Apache HTTP on srv261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:12] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:12] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:12] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:12] PROBLEM - Apache HTTP on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:12] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:13] PROBLEM - Apache HTTP on srv231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:13] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:21] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:21] RECOVERY - SSH on srv289 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:37:39] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:39] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:48] jeremyb: yea, your guess is most likely right. (a "socket timed out" means, it managed to open it, but couldn't get any data from it. timeout means no open socket after 10 secs [17:38:06] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [17:38:15] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:38:25] not necessarily couldn't get any. just had more to get and stopped waiting. i think [17:39:03] jeremyb: well, it waited for 10 seconds, but got no open socket by that time, yea [17:39:09] RECOVERY - SSH on srv276 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:39:51] T3rminat0r: no, the other one. got a socket and some data back [17:39:59] T3rminat0r: and then gave up on a read() [17:40:04] yea [17:40:07] also possible [17:40:12] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.535 second response time [17:40:12] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61394 bytes in 7.401 seconds [17:40:13] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.299 second response time [17:40:21] RECOVERY - Apache HTTP on srv262 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.785 second response time [17:40:21] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.579 second response time [17:40:21] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.066 second response time [17:40:27] too lazy to read nagios docs now ;) [17:40:30] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.516 second response time [17:40:30] PROBLEM - Backend Squid HTTP on cp1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:30] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.205 second response time [17:40:30] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.180 second response time [17:40:30] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.282 second response time [17:40:39] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.979 second response time [17:40:39] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.807 second response time [17:40:40] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61565 bytes in 2.928 seconds [17:40:40] RECOVERY - Apache HTTP on srv285 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.100 second response time [17:40:40] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.541 second response time [17:40:40] RECOVERY - Apache HTTP on srv203 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.113 second response time [17:40:44] T3rminat0r: i think nagios docs are quite irrelevant [17:40:48] RECOVERY - Apache HTTP on srv194 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.456 second response time [17:40:48] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.393 second response time [17:40:48] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.187 second response time [17:40:48] RECOVERY - Apache HTTP on srv268 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.309 second response time [17:40:48] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.587 second response time [17:40:53] T3rminat0r: anyway, not relevant to the outage [17:40:57] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.365 second response time [17:40:57] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.178 second response time [17:41:06] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.828 second response time [17:41:06] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [17:41:06] RECOVERY - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39768 bytes in 1.123 seconds [17:41:13] jeremyb: true, I ment for the difference of timeout/timed out) [17:41:15] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [17:41:15] RECOVERY - Apache HTTP on srv209 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.606 second response time [17:41:15] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61379 bytes in 2.172 seconds [17:41:15] RECOVERY - Apache HTTP on srv243 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.124 second response time [17:41:15] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [17:41:15] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.928 second response time [17:41:15] RECOVERY - Apache HTTP on srv225 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.525 second response time [17:41:16] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [17:41:16] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61370 bytes in 2.171 seconds [17:41:17] RECOVERY - Apache HTTP on srv288 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.864 second response time [17:41:24] RECOVERY - Apache HTTP on srv195 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.466 second response time [17:41:24] RECOVERY - Apache HTTP on srv198 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.246 second response time [17:41:33] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.127 second response time [17:41:33] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.277 second response time [17:41:33] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.623 second response time [17:41:33] RECOVERY - Apache HTTP on srv202 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.826 second response time [17:41:33] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.873 second response time [17:41:33] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.284 second response time [17:41:33] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.762 second response time [17:41:42] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.047 second response time [17:41:42] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.541 second response time [17:41:51] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.949 second response time [17:41:51] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.518 second response time [17:41:51] PROBLEM - SSH on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:41:51] RECOVERY - Apache HTTP on srv228 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.212 second response time [17:41:51] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.569 second response time [17:41:51] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.113 second response time [17:41:51] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.670 second response time [17:42:00] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.903 second response time [17:42:00] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39961 bytes in 7.915 seconds [17:42:00] RECOVERY - Backend Squid HTTP on cp1012 is OK: HTTP OK HTTP/1.0 200 OK - 1258 bytes in 0.056 seconds [17:42:00] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 9.733 seconds [17:42:09] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.956 second response time [17:42:09] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 6.003 seconds [17:42:09] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 4.644 seconds [17:42:18] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48272 bytes in 2.189 seconds [17:42:27] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.412 second response time [17:42:36] PROBLEM - SSH on srv267 is CRITICAL: Server answer: [17:42:45] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.410 second response time [17:42:54] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.324 second response time [17:42:54] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 1.656 seconds [17:43:21] RECOVERY - Apache HTTP on srv201 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [17:43:21] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.123 second response time [17:43:21] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48282 bytes in 3.111 seconds [17:43:39] RECOVERY - Apache HTTP on srv205 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.117 second response time [17:43:39] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.088 second response time [17:44:06] PROBLEM - SSH on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:44:15] PROBLEM - Memcached on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:44:42] PROBLEM - Memcached on srv212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:09] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:18] RECOVERY - SSH on srv280 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:45:18] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:27] PROBLEM - Apache HTTP on srv285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:27] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:45] PROBLEM - Apache HTTP on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:30] RECOVERY - SSH on srv275 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:46:30] PROBLEM - Memcached on srv280 is CRITICAL: Connection refused [17:46:39] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:40] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:46:57] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:06] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:06] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:07] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:15] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:33] PROBLEM - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:33] PROBLEM - Apache HTTP on srv246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:33] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:42] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:43] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:43] PROBLEM - Apache HTTP on srv195 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:51] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:51] PROBLEM - Apache HTTP on srv259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:51] PROBLEM - Apache HTTP on srv243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:51] PROBLEM - Apache HTTP on srv207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:51] PROBLEM - Apache HTTP on srv213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:00] RECOVERY - Memcached on srv280 is OK: TCP OK - 0.001 second response time on port 11000 [17:48:00] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:00] PROBLEM - Apache HTTP on srv196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:00] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - SSH on srv263 is CRITICAL: Server answer: [17:48:09] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - Apache HTTP on srv262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - Apache HTTP on srv231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:09] PROBLEM - Apache HTTP on srv200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:10] PROBLEM - Apache HTTP on srv282 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:10] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:11] PROBLEM - Apache HTTP on srv208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:11] PROBLEM - Apache HTTP on srv228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:18] PROBLEM - Apache HTTP on srv201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:19] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:19] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:27] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:27] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:27] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - Apache HTTP on srv242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:36] PROBLEM - Apache HTTP on srv203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:37] PROBLEM - Apache HTTP on srv194 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:45] PROBLEM - Apache HTTP on srv205 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:45] PROBLEM - Apache HTTP on srv278 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:45] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:48:45] PROBLEM - Apache HTTP on srv268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:03] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:12] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:12] PROBLEM - Apache HTTP on srv225 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:12] PROBLEM - Apache HTTP on srv197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:12] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:21] PROBLEM - Apache HTTP on srv198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:21] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:21] PROBLEM - Apache HTTP on srv288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:21] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.784 second response time [17:49:21] RECOVERY - Apache HTTP on srv196 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.717 second response time [17:49:30] RECOVERY - Apache HTTP on srv202 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.713 second response time [17:49:30] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.222 second response time [17:49:30] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:30] PROBLEM - Apache HTTP on srv238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:49:30] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.741 second response time [17:49:39] RECOVERY - Apache HTTP on srv262 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.511 second response time [17:49:39] RECOVERY - Apache HTTP on srv233 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.990 second response time [17:49:39] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.598 second response time [17:49:39] RECOVERY - Apache HTTP on srv208 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.825 second response time [17:49:39] RECOVERY - Apache HTTP on srv231 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.978 second response time [17:49:39] RECOVERY - SSH on srv263 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:49:39] RECOVERY - Apache HTTP on srv201 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.937 second response time [17:49:40] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.033 second response time [17:49:40] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 4.690 seconds [17:49:41] RECOVERY - Apache HTTP on srv210 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.046 second response time [17:49:41] RECOVERY - Apache HTTP on srv286 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.225 second response time [17:49:42] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48279 bytes in 6.686 seconds [17:49:48] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39961 bytes in 8.151 seconds [17:49:48] RECOVERY - Apache HTTP on srv200 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.345 second response time [17:49:57] RECOVERY - Apache HTTP on srv285 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:49:57] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.104 second response time [17:49:57] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.807 second response time [17:49:57] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.197 second response time [17:49:57] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.744 second response time [17:50:06] RECOVERY - Apache HTTP on srv242 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [17:50:06] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.897 second response time [17:50:06] RECOVERY - Apache HTTP on srv203 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.471 second response time [17:50:06] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [17:50:06] RECOVERY - Apache HTTP on srv194 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.823 second response time [17:50:06] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.957 second response time [17:50:06] RECOVERY - Apache HTTP on srv205 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.128 second response time [17:50:15] RECOVERY - Apache HTTP on srv268 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:50:15] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:50:15] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [17:50:15] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.458 second response time [17:50:24] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.205 second response time [17:50:24] PROBLEM - NTP on srv226 is CRITICAL: NTP CRITICAL: No response from NTP server [17:50:33] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [17:50:33] RECOVERY - Apache HTTP on srv225 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [17:50:33] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:50:33] RECOVERY - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39768 bytes in 1.920 seconds [17:50:42] RECOVERY - Apache HTTP on srv280 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.070 second response time [17:50:42] RECOVERY - Apache HTTP on srv209 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.110 second response time [17:50:42] RECOVERY - Apache HTTP on srv198 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:50:42] RECOVERY - Apache HTTP on srv288 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.022 second response time [17:50:42] RECOVERY - Apache HTTP on srv246 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [17:50:42] RECOVERY - Apache HTTP on srv195 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.640 second response time [17:50:42] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.027 second response time [17:50:51] RECOVERY - Apache HTTP on srv243 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [17:50:51] RECOVERY - Apache HTTP on srv259 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:50:51] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 1.161 seconds [17:50:51] RECOVERY - Apache HTTP on srv207 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:50:51] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [17:50:51] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:50:51] PROBLEM - SSH on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:50:52] RECOVERY - Apache HTTP on srv213 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.919 second response time [17:50:52] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61372 bytes in 2.390 seconds [17:51:00] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.691 second response time [17:51:00] RECOVERY - Apache HTTP on srv244 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [17:51:00] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.037 second response time [17:51:09] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [17:51:09] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [17:51:09] RECOVERY - Apache HTTP on srv228 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:51:09] RECOVERY - Apache HTTP on srv269 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [17:51:18] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [17:51:18] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [17:51:27] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61562 bytes in 3.059 seconds [17:51:36] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 1.402 seconds [17:51:36] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 3.081 seconds [17:51:54] RECOVERY - Apache HTTP on srv275 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.136 second response time [17:52:21] RECOVERY - SSH on srv264 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:52:21] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61377 bytes in 2.215 seconds [17:52:48] RECOVERY - Apache HTTP on srv282 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.527 second response time [17:52:48] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.736 second response time [17:53:06] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [17:53:06] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61567 bytes in 3.246 seconds [17:54:09] PROBLEM - Apache HTTP on srv229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:09] RECOVERY - Apache HTTP on srv238 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.814 second response time [17:54:18] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.988 second response time [17:54:27] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 0.408 seconds [17:54:28] PROBLEM - SSH on srv289 is CRITICAL: Server answer: [17:54:36] PROBLEM - SSH on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:36] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:54:36] RECOVERY - Apache HTTP on srv277 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.922 second response time [17:55:01] ? [17:55:10] we still having an outage [17:55:11] ? [17:55:21] RECOVERY - SSH on srv267 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:55:30] RECOVERY - Apache HTTP on srv229 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time [17:55:39] RECOVERY - Apache HTTP on srv261 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.024 second response time [17:56:15] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.352 second response time [17:56:15] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48273 bytes in 2.815 seconds [17:56:24] PROBLEM - SSH on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:56:24] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:56:26] Ryan_Lane: yes [17:56:35] Ryan_Lane: well again rather than still [17:57:00] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:57:28] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:57:45] PROBLEM - SSH on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:57:45] RECOVERY - SSH on srv283 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [17:57:45] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 7.039 seconds [17:58:21] PROBLEM - Memcached on srv267 is CRITICAL: Connection refused [17:58:30] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.458 second response time [17:58:57] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:57] PROBLEM - Apache HTTP on mw56 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:06] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.491 second response time [17:59:24] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:24] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:25] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:59:51] RECOVERY - Memcached on srv267 is OK: TCP OK - 0.004 second response time on port 11000 [18:00:27] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:00:45] PROBLEM - NTP on srv237 is CRITICAL: NTP CRITICAL: No response from NTP server [18:00:45] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 5.154 seconds [18:00:54] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.477 second response time [18:01:03] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [18:01:21] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:01:30] PROBLEM - Apache HTTP on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:01:57] PROBLEM - SSH on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:02:15] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 4.132 seconds [18:02:24] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.071 second response time [18:03:09] PROBLEM - SSH on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:18] PROBLEM - Apache HTTP on srv209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:18] PROBLEM - Memcached on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:18] PROBLEM - NTP on srv212 is CRITICAL: NTP CRITICAL: No response from NTP server [18:03:36] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:45] PROBLEM - Apache HTTP on srv241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:54] PROBLEM - Apache HTTP on srv272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:54] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:04:57] PROBLEM - Apache HTTP on srv202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:04:57] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:04:57] RECOVERY - SSH on srv275 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:04:57] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61379 bytes in 2.167 seconds [18:05:15] RECOVERY - SSH on srv289 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:05:24] RECOVERY - SSH on srv273 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:05:24] PROBLEM - SSH on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:33] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:33] PROBLEM - Apache HTTP on srv240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:42] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:42] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:42] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:05:51] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.660 second response time [18:06:09] PROBLEM - Apache HTTP on srv260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:18] PROBLEM - SSH on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:18] RECOVERY - Apache HTTP on srv209 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.045 second response time [18:06:18] RECOVERY - Memcached on srv272 is OK: TCP OK - 2.999 second response time on port 11000 [18:06:18] RECOVERY - Apache HTTP on srv202 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.819 second response time [18:06:27] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:27] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 9.828 seconds [18:06:27] PROBLEM - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:36] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.396 second response time [18:06:36] PROBLEM - Apache HTTP on srv274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:45] PROBLEM - Apache HTTP on srv236 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:54] PROBLEM - Apache HTTP on srv227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:03] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.110 second response time [18:07:13] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:13] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:13] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:08:33] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48280 bytes in 3.157 seconds [18:08:42] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48273 bytes in 2.737 seconds [18:09:36] RECOVERY - LVS HTTP IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39768 bytes in 1.410 seconds [18:09:36] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.882 second response time [18:09:45] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.486 second response time [18:10:03] RECOVERY - SSH on srv260 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:10:12] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39965 bytes in 2.525 seconds [18:10:21] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39961 bytes in 2.747 seconds [18:10:21] RECOVERY - Apache HTTP on srv240 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.721 second response time [18:10:21] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 8.800 seconds [18:10:30] PROBLEM - Apache HTTP on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:10:30] PROBLEM - Apache HTTP on srv232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:10:30] RECOVERY - Apache HTTP on srv289 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.799 second response time [18:10:30] PROBLEM - Apache HTTP on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:10:39] RECOVERY - Apache HTTP on srv275 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.538 second response time [18:11:06] PROBLEM - Apache HTTP on srv283 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:15] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:24] RECOVERY - Apache HTTP on srv227 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.373 second response time [18:11:33] PROBLEM - Apache HTTP on srv245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:42] PROBLEM - Apache HTTP on srv235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:00] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:00] PROBLEM - Apache HTTP on srv239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:27] RECOVERY - SSH on srv274 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:12:45] RECOVERY - Apache HTTP on srv283 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.064 second response time [18:12:45] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.928 second response time [18:12:54] RECOVERY - Apache HTTP on srv245 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [18:12:54] RECOVERY - Apache HTTP on srv241 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [18:12:54] PROBLEM - SSH on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:54] PROBLEM - SSH on srv258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:13:12] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61565 bytes in 2.905 seconds [18:13:12] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61569 bytes in 3.135 seconds [18:13:12] RECOVERY - Apache HTTP on srv235 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.736 second response time [18:13:30] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:13:30] RECOVERY - Apache HTTP on srv232 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.825 second response time [18:13:48] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 9.350 seconds [18:14:24] RECOVERY - SSH on srv258 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:15:00] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:15:18] PROBLEM - Apache HTTP on srv230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:15:45] RECOVERY - SSH on srv263 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:16:03] PROBLEM - Apache HTTP on srv233 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:16:30] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39968 bytes in 2.026 seconds [18:16:30] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.671 second response time [18:16:30] RECOVERY - Apache HTTP on srv258 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.771 second response time [18:17:06] RECOVERY - SSH on srv272 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:17:06] RECOVERY - Apache HTTP on srv260 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.656 second response time [18:17:24] RECOVERY - Apache HTTP on srv274 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.063 second response time [18:17:33] RECOVERY - Apache HTTP on srv233 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [18:17:51] PROBLEM - SSH on srv270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:18:00] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:18:09] PROBLEM - Apache HTTP on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:19:21] RECOVERY - SSH on srv270 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:19:30] PROBLEM - Apache HTTP on srv271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:21:09] RECOVERY - Apache HTTP on srv239 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:21:09] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:03] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.160 second response time [18:22:21] PROBLEM - SSH on srv265 is CRITICAL: Server answer: [18:22:30] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61572 bytes in 3.304 seconds [18:22:48] PROBLEM - Apache HTTP on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:24:18] RECOVERY - Apache HTTP on srv270 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:25:21] PROBLEM - SSH on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:25:48] RECOVERY - Apache HTTP on srv272 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.754 second response time [18:25:58] PROBLEM - Apache HTTP on srv263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:25:58] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:25:58] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:26:15] PROBLEM - Apache HTTP on srv275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:27:45] RECOVERY - Apache HTTP on srv275 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.940 second response time [18:28:21] PROBLEM - Apache HTTP on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:30] PROBLEM - SSH on srv287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:30] PROBLEM - SSH on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:30] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:39] RECOVERY - Apache HTTP on srv271 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [18:29:06] PROBLEM - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:15] PROBLEM - Memcached on srv211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:29:42] !log logs stopped on flourine between 18:11 and 18:27 UTC [18:29:52] Logged the message, Mistress of the network gear. [18:30:27] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 1.590 seconds [18:30:36] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 0.507 seconds [18:30:36] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:30:54] PROBLEM - Apache HTTP on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:31:39] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:31:39] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:32:06] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 61572 bytes in 7.592 seconds [18:32:15] PROBLEM - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:33:36] RECOVERY - LVS HTTP IPv4 on wiktionary-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 61565 bytes in 2.919 seconds [18:34:39] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:34:39] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 1.134 seconds [18:35:04] New patchset: Mark Bergsma; "Reduce MaxClientsPerRequest to 4 to mitigate memleak" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23870 [18:35:33] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:35:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23870 [18:36:07] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23870 [18:36:54] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.593 second response time [18:36:54] RECOVERY - LVS HTTPS IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39965 bytes in 1.914 seconds [18:37:18] !log Reduced MaxRequestsPerClient from 300 to 4 in apache.conf [18:37:27] Logged the message, Master [18:38:06] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.827 second response time [18:38:15] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:38:33] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 0.139 seconds [18:39:36] PROBLEM - SSH on srv265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:40:21] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.799 second response time [18:40:58] PROBLEM - Apache HTTP on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:40:58] RECOVERY - SSH on srv265 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:41:33] PROBLEM - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:33] PROBLEM - Apache HTTP on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:51] PROBLEM - SSH on srv279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:18] RECOVERY - SSH on srv287 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:42:27] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.093 second response time [18:42:36] PROBLEM - SSH on srv273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:45] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48270 bytes in 7.627 seconds [18:42:54] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:54] PROBLEM - Apache HTTP on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:54] RECOVERY - LVS HTTP IPv4 on foundation-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 39961 bytes in 6.777 seconds [18:43:03] RECOVERY - Apache HTTP on srv265 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.254 second response time [18:43:12] RECOVERY - SSH on srv279 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:43:39] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.510 second response time [18:43:57] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [18:43:57] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.619 second response time [18:43:57] PROBLEM - Apache HTTP on srv234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:44:06] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:44:06] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.309 second response time [18:44:15] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [18:44:42] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.785 second response time [18:44:51] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:45:00] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.753 second response time [18:45:00] PROBLEM - Apache HTTP on mw18 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:45:27] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48089 bytes in 5.415 seconds [18:46:12] PROBLEM - SSH on srv264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:46:21] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:46:30] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.708 second response time [18:46:48] RECOVERY - Apache HTTP on srv287 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.051 second response time [18:47:06] PROBLEM - NTP on srv211 is CRITICAL: NTP CRITICAL: No response from NTP server [18:47:15] PROBLEM - Apache HTTP on mw41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:47:24] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.006 second response time [18:48:00] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.649 second response time [18:48:27] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:36] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.052 second response time [18:48:45] PROBLEM - Apache HTTP on mw20 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:45] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:45] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.582 second response time [18:48:54] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:48:54] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.959 second response time [18:48:54] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.875 second response time [18:49:03] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:49:03] RECOVERY - SSH on srv264 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:49:12] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 48282 bytes in 2.107 seconds [18:49:30] RECOVERY - Apache HTTP on srv279 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.025 second response time [18:49:30] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:49:48] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:50:15] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.639 second response time [18:51:00] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 63189 bytes in 7.260 seconds [18:51:18] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:51:36] PROBLEM - Apache HTTP on srv276 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:03] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:48] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:06] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.939 second response time [18:53:24] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:33] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:33] PROBLEM - Apache HTTP on mw53 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:42] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:56:51] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:56:51] PROBLEM - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:57:09] RECOVERY - Apache HTTP on srv230 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [18:58:12] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.esams.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48275 bytes in 1.884 seconds [18:59:24] PROBLEM - Apache HTTP on srv267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:59:30] New patchset: Faidon; "Keep memcached memory out of swap" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23872 [18:59:51] RECOVERY - LVS HTTPS IPv4 on foundation-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 39775 bytes in 0.183 seconds [18:59:51] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.291 second response time [19:00:18] PROBLEM - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:00:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23872 [19:01:03] RECOVERY - SSH on srv263 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:01:39] RECOVERY - LVS HTTP IPv4 on wikiversity-lb.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.0 200 OK - 48082 bytes in 0.964 seconds [19:01:57] RECOVERY - SSH on srv237 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:01:57] RECOVERY - Memcached on srv237 is OK: TCP OK - 0.001 second response time on port 11000 [19:02:51] RECOVERY - Apache HTTP on srv264 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time [19:03:00] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.432 second response time [19:03:00] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.950 second response time [19:03:00] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.873 second response time [19:03:00] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.810 second response time [19:03:09] RECOVERY - Apache HTTP on mw19 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.958 second response time [19:03:18] RECOVERY - Apache HTTP on mw41 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.482 second response time [19:03:18] RECOVERY - Apache HTTP on srv263 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [19:03:18] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:27] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.360 second response time [19:03:27] RECOVERY - SSH on srv212 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:03:27] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.746 second response time [19:03:27] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.180 second response time [19:03:27] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.716 second response time [19:03:27] RECOVERY - Apache HTTP on mw32 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.006 second response time [19:03:28] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.336 second response time [19:03:36] RECOVERY - Memcached on srv211 is OK: TCP OK - 0.000 second response time on port 11000 [19:03:36] RECOVERY - Apache HTTP on mw18 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.376 second response time [19:03:36] RECOVERY - Apache HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.011 second response time [19:03:36] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.852 second response time [19:03:45] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.135 second response time [19:03:45] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.906 second response time [19:03:54] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.898 second response time [19:04:03] RECOVERY - Memcached on srv212 is OK: TCP OK - 0.003 second response time on port 11000 [19:04:03] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [19:04:03] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [19:04:03] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.092 second response time [19:04:03] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.737 second response time [19:04:03] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.582 second response time [19:04:12] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [19:04:12] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [19:04:12] RECOVERY - Apache HTTP on mw45 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.046 second response time [19:04:12] RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [19:04:12] PROBLEM - Host srv273 is DOWN: PING CRITICAL - Packet loss = 100% [19:04:21] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.027 second response time [19:04:21] RECOVERY - Apache HTTP on mw53 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.048 second response time [19:04:21] RECOVERY - SSH on srv211 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:04:21] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.277 second response time [19:04:21] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [19:04:21] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [19:04:21] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.469 second response time [19:04:30] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [19:04:30] RECOVERY - Apache HTTP on mw56 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.034 second response time [19:04:30] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.912 second response time [19:04:30] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.031 second response time [19:04:30] RECOVERY - SSH on srv226 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:04:30] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time [19:04:30] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [19:04:43] !log Power cycled srv212, 211, 226, 273, 237 [19:04:48] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.140 second response time [19:04:52] Logged the message, Master [19:04:57] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 61399 bytes in 0.186 seconds [19:04:58] RECOVERY - Memcached on srv226 is OK: TCP OK - 0.011 second response time on port 11000 [19:05:06] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [19:05:24] RECOVERY - Apache HTTP on srv267 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.598 second response time [19:06:00] RECOVERY - SSH on srv273 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:06:09] RECOVERY - NTP on srv237 is OK: NTP OK: Offset -0.04341959953 secs [19:06:09] RECOVERY - Host srv273 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [19:06:34] New patchset: Tim Starling; "Swapping out broken mc servers" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23873 [19:06:48] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23873 [19:07:03] RECOVERY - NTP on srv212 is OK: NTP OK: Offset 0.03188371658 secs [19:07:03] RECOVERY - NTP on srv211 is OK: NTP OK: Offset -0.07390582561 secs [19:08:42] RECOVERY - NTP on srv226 is OK: NTP OK: Offset 0.004644751549 secs [19:18:54] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [19:20:24] RECOVERY - Apache HTTP on srv212 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [19:22:21] RECOVERY - Apache HTTP on srv211 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time [19:28:57] RECOVERY - Apache HTTP on srv226 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [19:31:03] RECOVERY - Apache HTTP on srv273 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [19:33:18] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.044 second response time [19:53:42] PROBLEM - check google safe browsing for wikimedia.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:42] PROBLEM - check google safe browsing for wiktionary.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:42] PROBLEM - check google safe browsing for wikinews.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:54:09] PROBLEM - check google safe browsing for wikisource.org on google is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:03] RECOVERY - check google safe browsing for wikimedia.org on google is OK: HTTP OK HTTP/1.0 200 OK - 0.155 second response time [19:55:03] RECOVERY - check google safe browsing for wiktionary.org on google is OK: HTTP OK HTTP/1.0 200 OK - 0.155 second response time [19:55:03] RECOVERY - check google safe browsing for wikinews.org on google is OK: HTTP OK HTTP/1.0 200 OK - 0.157 second response time [19:55:30] RECOVERY - check google safe browsing for wikisource.org on google is OK: HTTP OK HTTP/1.0 200 OK - 0.178 second response time [19:57:47] oooh, interesting. not seen that check before. will have to read it sometime [20:41:06] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [20:50:06] PROBLEM - Apache HTTP on srv250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:52:12] PROBLEM - SSH on srv250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:52:12] PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Could not connect: 10.0.2.250:11000 (Connection timed out) [20:52:30] PROBLEM - Memcached on srv250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:00] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [21:06:00] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [21:06:00] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [21:06:00] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [21:06:00] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [21:06:00] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [21:10:03] PROBLEM - NTP on srv250 is CRITICAL: NTP CRITICAL: No response from NTP server [21:43:57] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [23:01:03] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [23:52:03] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours