[01:06:25] 2014/03/17 01:01 CRIT wolfsbane Load avg. CRITICAL - load average: 81.31, 69.34, 34.76
[01:11:25] 2014/03/17 01:10 WARN ortelius Load avg. WARNING - load average: 17.61, 23.23, 14.04
[01:13:25] 2014/03/17 01:12 CRIT ortelius Load avg. CRITICAL - load average: 33.21, 23.12, 14.90
[02:23:29] 2014/03/17 02:23 WARN ortelius Load avg. WARNING - load average: 0.54, 1.79, 18.76
[02:23:29] 2014/03/17 02:23 WARN wolfsbane Load avg. WARNING - load average: 0.96, 1.95, 19.70
[02:27:29] 2014/03/17 02:27 OK ortelius Load avg. OK - load average: 0.52, 1.10, 14.55
[02:28:30] 2014/03/17 02:28 OK wolfsbane Load avg. OK - load average: 0.59, 1.09, 14.32
[03:07:32] 2014/03/17 03:06 OK wolfsbane / DISK OK - free space: / 6316 MB (21% inode=93%):
[03:07:32] 2014/03/17 03:06 OK wolfsbane /tmp DISK OK - free space: / 6316 MB (21% inode=93%):
[19:04:27] 2014/03/17 18:57 WARN wolfsbane / DISK WARNING - free space: / 6269 MB (20% inode=93%):
[19:04:27] 2014/03/17 18:57 WARN wolfsbane /tmp DISK WARNING - free space: / 6266 MB (20% inode=93%):
[19:57:55] o7
[19:58:02] maintenane tonight
[19:58:52] we will update yarrow, nightshade, turnera, sage, thyme, z-dat-s2-b, z-dat-s5-b
[19:59:42] i will start updating turnera
[20:02:30] I will start with nightshade
[20:04:37] and yarrow
[20:07:16] turnera gets a new kernel...i will have to reboot...
[20:07:25] ldap will be away during reboot
[20:07:26] same with nightshade and yarrow
[20:07:35] not regarding ldap, but kernel
[20:10:51] 2014/03/17 20:09 OK cassia SMTP SMTP OK - 0.183 sec. response time
[20:10:51] 2014/03/17 20:10 OK cassia SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:10:51] 2014/03/17 20:09 WARN nightshade APT APT WARNING: 0 packages available for upgrade (0 critical updates). warnings detected, errors detected. run with -v for information.
[20:10:51] 2014/03/17 20:09 OK z-dat-s3-a SMTP SMTP OK - 0.002 sec. response time
[20:10:51] 2014/03/17 20:09 OK z-dat-s3-a SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:10:51] 2014/03/17 20:09 OK z-dat-s4-a SMTP SMTP OK - 0.003 sec. response time
[20:10:51] 2014/03/17 20:09 OK z-dat-s4-a SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:10:51] 2014/03/17 20:09 OK z-dat-s6-a SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:10:51] 2014/03/17 20:09 OK z-dat-s7-a SMTP SMTP OK - 0.223 sec. response time
[20:10:51] 2014/03/17 20:09 OK z-dat-s7-a SSH SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[20:11:36] 2014/03/17 20:10 OK nightshade APT APT OK: 0 packages available for upgrade (0 critical updates).
[20:11:36] 2014/03/17 20:11 ?? ortelius Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:11:36] 2014/03/17 20:10 ?? willow Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:11:36] 2014/03/17 20:11 ?? wolfsbane Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:11:36] 2014/03/17 20:10 OK yarrow APT APT OK: 0 packages available for upgrade (0 critical updates).
[20:11:36] 2014/03/17 20:10 OK z-dat-s6-a SMTP SMTP OK - 0.216 sec. response time
[20:13:58] rebooting yarrow - no users logged in, so no special warning here
[20:14:55] Danny_B , jem- : you have session open on nightshade which is scheduled to be rebooted - please save your work, if you have anything open and prepare for reboot - thanks!
[20:15:34] amette: thx, sec, i'll ping you
[20:15:36] 2014/03/17 20:15 CRIT yarrow / Connection refused or timed out
[20:15:36] 2014/03/17 20:14 CRIT yarrow /tmp Connection refused or timed out
[20:15:36] 2014/03/17 20:14 CRIT yarrow /var Timeout while attempting connection
[20:15:36] 2014/03/17 20:14 CRIT yarrow /var/tmp Timeout while attempting connection
[20:15:36] 2014/03/17 20:14 CRIT yarrow APT Timeout while attempting connection
[20:15:36] 2014/03/17 20:14 CRIT yarrow Environment IPMI Connection refused or timed out
[20:15:36] 2014/03/17 20:15 CRIT yarrow Load avg. Connection refused or timed out
[20:15:37] 2014/03/17 20:15 CRIT yarrow NTP CRITICAL - Socket timeout after 10 seconds
[20:15:37] 2014/03/17 20:15 CRIT yarrow PING CRITICAL - Host Unreachable (yarrow)
[20:15:38] 2014/03/17 20:15 CRIT yarrow SMTP No route to host
[20:15:38] 2014/03/17 20:14 CRIT yarrow SRaid Connection refused or timed out
[20:15:39] 2014/03/17 20:14 CRIT yarrow Sensors Connection refused or timed out
[20:16:36] 2014/03/17 20:15 CRIT yarrow SSH No route to host
[20:17:36] 2014/03/17 20:10 ?? nightshade Sun Grid Engine execd Cannot execute /sge/GE/bin/linux-x64/qhost
[20:19:36] 2014/03/17 20:19 OK yarrow / DISK OK - free space: / 1581 MB (88% inode=94%):
[20:19:37] 2014/03/17 20:18 OK yarrow /tmp DISK OK - free space: /tmp 4093 MB (96% inode=99%):
[20:19:37] 2014/03/17 20:18 OK yarrow /var DISK OK - free space: /var 12259 MB (91% inode=94%):
[20:19:37] 2014/03/17 20:18 OK yarrow /var/tmp DISK OK - free space: /var/tmp 827 MB (97% inode=99%):
[20:19:37] 2014/03/17 20:18 OK yarrow APT APT OK: 0 packages available for upgrade (0 critical updates).
[20:19:37] 2014/03/17 20:18 OK yarrow Environment IPMI ok: temperature ok fan ok voltage ok chassis ok
[20:19:37] 2014/03/17 20:19 OK yarrow Load avg. OK - load average: 0.99, 0.38, 0.14
[20:19:38] 2014/03/17 20:19 OK yarrow PING PING OK - Packet loss = 0%, RTA = 0.12 ms
[20:19:38] 2014/03/17 20:19 OK yarrow SMTP SMTP OK - 0.107 sec. response time
[20:19:39] 2014/03/17 20:18 OK yarrow SRaid OK md0 status=[UU].
[20:19:39] 2014/03/17 20:18 OK yarrow SSH SSH OK - OpenSSH_5.5p1 Debian-6+squeeze4 (protocol 2.0)
[20:19:40] 2014/03/17 20:18 OK yarrow Sensors sensor ok
[20:19:40] 2014/03/17 20:18 OK yarrow aliasd TCP OK - 0.001 second response time on port 984 [500 Not found.]
[20:20:45] amette: shoot
[20:21:09] Danny_B: cool, thanks! Still waiting a bit, if jem- turns up...
[20:21:31] amette: assuming just a few mins, right?
[20:21:36] 2014/03/17 20:18 ?? yarrow Sun Grid Engine execd Cannot execute /sge/GE/bin/linux-x64/qhost
[20:22:02] Danny_B: yup, correct - then hopefully no problems with turnera LDAP this time and everything should be back to normal soon
[20:22:36] 2014/03/17 20:21 CRIT nightshade Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:22:36] 2014/03/17 20:21 CRIT willow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:22:38] hopefully no probs with ldap will apear
[20:22:55] Danny_B: i have hopes :)
[20:23:03] jem- has no other processes than sshd open, so I'll go for reboot of nightshade now
[20:23:17] so do i, nosy1 ;-)
[20:23:36] 2014/03/17 20:22 CRIT ortelius Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:24:36] 2014/03/17 20:19 WARN yarrow NTP NTP WARNING: Server has the LI_ALARM bit set, Offset -0.099619 secs
[20:25:36] 2014/03/17 20:21 CRIT nightshade SSH CRITICAL - Socket timeout after 10 seconds
[20:26:36] 2014/03/17 20:26 CRIT nightshade / Connection refused or timed out
[20:26:37] 2014/03/17 20:26 CRIT nightshade /tmp Connection refused or timed out
[20:26:37] 2014/03/17 20:25 CRIT nightshade Environment IPMI Timeout while attempting connection
[20:26:37] 2014/03/17 20:26 CRIT nightshade NTP CRITICAL - Socket timeout after 10 seconds
[20:26:37] 2014/03/17 20:26 CRIT nightshade PING CRITICAL - Host Unreachable (nightshade)
[20:26:37] 2014/03/17 20:26 CRIT nightshade SMTP No route to host
[20:27:36] 2014/03/17 20:26 CRIT nightshade /var Connection refused or timed out
[20:27:37] 2014/03/17 20:26 CRIT nightshade /var/tmp Connection refused or timed out
[20:27:37] 2014/03/17 20:26 CRIT nightshade APT Connection refused or timed out
[20:27:37] 2014/03/17 20:27 CRIT nightshade Load avg. Connection refused or timed out
[20:27:37] 2014/03/17 20:26 CRIT nightshade Sensors Connection refused or timed out
[20:27:37] 2014/03/17 20:26 CRIT nightshade aliasd Connection refused or timed out
[20:29:37] 2014/03/17 20:29 OK nightshade /tmp DISK OK - free space: /tmp 4317 MB (96% inode=99%):
[20:29:37] 2014/03/17 20:29 OK nightshade SSH SSH OK - OpenSSH_5.5p1 Debian-6+squeeze4 (protocol 2.0)
[20:30:37] 2014/03/17 20:29 OK nightshade / DISK OK - free space: / 1592 MB (89% inode=94%):
[20:30:37] 2014/03/17 20:29 OK nightshade /var DISK OK - free space: /var 9353 MB (69% inode=47%):
[20:30:37] 2014/03/17 20:29 OK nightshade /var/tmp DISK OK - free space: /var/tmp 872 MB (98% inode=99%):
[20:30:37] 2014/03/17 20:29 OK nightshade APT APT OK: 0 packages available for upgrade (0 critical updates).
[20:30:37] 2014/03/17 20:29 OK nightshade Environment IPMI ok: temperature ok fan ok voltage ok chassis ok
[20:30:37] 2014/03/17 20:30 OK nightshade Load avg. OK - load average: 1.18, 0.57, 0.21
[20:30:37] 2014/03/17 20:30 OK nightshade NTP NTP OK: Offset -0.043325 secs
[20:30:38] 2014/03/17 20:30 OK nightshade PING PING OK - Packet loss = 0%, RTA = 0.22 ms
[20:30:38] 2014/03/17 20:29 OK nightshade Sensors sensor ok
[20:30:39] 2014/03/17 20:29 ?? nightshade Sun Grid Engine execd Cannot execute /sge/GE/bin/linux-x64/qhost
[20:30:39] 2014/03/17 20:29 OK nightshade aliasd TCP OK - 0.002 second response time on port 984 [500 Not found.]
[20:30:40] 2014/03/17 20:30 ?? ortelius Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:30:40] 2014/03/17 20:29 ?? willow Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:31:37] 2014/03/17 20:30 CRIT willow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:33:38] 2014/03/17 20:32 ?? willow Sun Grid Engine execd Cannot execute /sge/GE/bin/sol-amd64/qstat
[20:34:20] amette: nosy1 no cron on restart again, as usual
[20:34:21] bash: /home/danny_b/bots/listwatcher/run: No such file or directory
[20:34:24] danny_b@nightshade:/var/spool/cron$
[20:34:26] etc...
[20:34:37] 2014/03/17 20:34 CRIT yarrow NTP NTP CRITICAL: Server not synchronized, Offset unknown
[20:34:52] oh i go look
[20:35:59] Danny_B: root@nightshade:/home/rnosy# ls /home/danny_b/bots/listwatcher/run
[20:35:59] /home/danny_b/bots/listwatcher/run
[20:36:02] works for me?
[20:36:16] Danny_B: how is it for you now?
[20:37:38] 2014/03/17 20:37 WARN wolfsbane Sun Grid Engine execd NRPE: Unable to read output
[20:37:38] 2014/03/17 20:37 WARN yarrow NTP NTP WARNING: Server has the LI_ALARM bit set, Offset -0.000189 secs
[20:38:44] nosy1: the deal is home is not mounted when reboot cron is running
[20:39:07] Danny_B: is it still not there for you?
[20:39:38] 2014/03/17 20:38 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[20:39:43] it is running now. i had to start it manually
[20:39:51] all my cron @ reboot stuff
[20:40:52] interesting
[20:41:18] quota: Cannot resolve mountpoint path /sge: Stale NFS file handle
[20:41:38] 2014/03/17 20:41 WARN ortelius Sun Grid Engine execd NRPE: Unable to read output
[20:43:38] 2014/03/17 20:42 CRIT nightshade Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:43:38] 2014/03/17 20:42 CRIT willow Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:43:38] Danny_B: I know...currently fiddling with it
[20:44:38] 2014/03/17 20:43 CRIT ortelius Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:44:38] 2014/03/17 20:43 CRIT wolfsbane Sun Grid Engine execd CHECK_NRPE: Socket timeout after 30 seconds.
[20:45:33] just letting you know...
[20:46:05] Danny_B: thx
[20:47:39] 2014/03/17 20:46 ?? nightshade Sun Grid Engine execd Cannot execute /sge/GE/bin/linux-x64/qhost
[20:47:40] 2014/03/17 20:47 WARN ortelius Sun Grid Engine execd NRPE: Unable to read output
[20:47:40] 2014/03/17 20:46 WARN willow Sun Grid Engine execd NRPE: Unable to read output
[20:47:40] 2014/03/17 20:47 WARN wolfsbane Sun Grid Engine execd NRPE: Unable to read output
[20:48:28] going to reboot turnera now
[20:51:07] * amette crossing fingers
[20:59:11] ldap has problems again
[20:59:15] i am checking
[21:00:55] web down
[21:04:07] ldap should be there again
[21:04:14] Danny_B: i guess its back up now too
[21:04:22] how does it look for you?
[21:05:13] yup
[21:18:51] z-dat-s5-b and z-dat-s2-b done too
[21:27:04] sage scheduled for reboot - no users logged in, so just sayin'...
[21:37:20] s5-user will restart with sage
[21:38:04] sage reboots
[21:56:13] sage rebooted
[21:58:51] ok maintenance done
[21:58:53] sge coming back
[21:59:03] * amette checks out, too
[21:59:30] indeed i go
[21:59:35] *wave*
[23:35:05] Platonides * [Toolserver-l] hawthorn problems: broken sge mount and crond