[00:01:52] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL
[00:02:32] Load avg. on willow is WARNING: WARNING - load average: 26.62, 16.80, 13.67
[00:03:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.725586/1.95, alarm hl:np_load_avg=2.037598/2.0, alarm hl:mem_free=997.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.725586/2.3, alarm hl:np_load_long=1.694824/2.5, alarm hl:cpu=83.600000/98, alarm hl:mem_free=997.000000M/200M, alarm hl:available=1/0
[00:04:12] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[00:04:33] Load avg. on willow is OK: OK - load average: 12.80, 14.79, 13.29
[00:04:44] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[00:06:23] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 299310 MB (5% inode=33%):
[00:08:34] Load avg. on willow is WARNING: WARNING - load average: 14.01, 15.63, 14.05
[00:17:53] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL
[00:19:22] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.145996/1.95, alarm hl:np_load_avg=1.890625/2.0, alarm hl:mem_free=445.000000M/350M, alarm hl:available=1/0
[00:29:54] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[00:31:13] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:36:13] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[00:50:33] Load avg. on willow is WARNING: WARNING - load average: 16.34, 14.67, 13.39
[00:51:33] / on wolfsbane is WARNING: DISK WARNING - free space: / 5995 MB (20% inode=93%):
[00:51:33] Load avg. on willow is OK: OK - load average: 12.86, 13.92, 13.20
[00:57:23] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL
[01:01:54] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL
[01:02:33] Load avg. on willow is WARNING: WARNING - load average: 17.03, 15.28, 13.70
[01:04:54] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[01:06:33] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 301073 MB (5% inode=33%):
[01:16:10] Hello, is any TS admin or user available for some explanation?
[01:16:24] Hazard-SJ, explanation for what?
[01:17:16] I'd like to know about the interfaces etc. provided, whether only SSH is used and not FTP and so on.
[01:17:54] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL
[01:27:03] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[01:30:54] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[01:31:23] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:36:16] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[01:36:26] Hazard-SJ: What?
[01:36:32] Just ask your question.
[01:36:56] Hazard-SJ: both ssh and scp are available
[01:40:02] I'm not at all familiar with SCP, but barely with SSH.
[01:47:32] Would I have to work on that first?
[01:47:40] Joan, Betacommand: ^
[01:48:10] Hazard-SJ: I'd start by figuring out what you want to do.
[01:48:24] Then work on figuring out how to do it.
[01:48:36] Learning random technologies doesn't seem like a good use of time.
[01:48:45] Do you need SSH? SCP? SFTP? Dunno. What are you trying to do?
[01:48:46] Hazard-SJ: what OS is your home unit
[01:48:56] I want to run cron jobs via Toolserver
[01:49:04] Windows 7
[01:49:26] So you just need to SSH in to the Toolserver.
[01:49:31] And edit your crontab.
[01:49:36] Sounds like Betacommand can help.
[01:49:42] :)
[01:49:54] Hazard-SJ: look into putty and winSCP
[01:50:06] that will give you ssh and file transfer ability
[01:50:12] Betacommand: I already have PuTTY
[01:50:38] Hazard-SJ: then you have commandline access to the toolserver if you have an account
[01:51:05] WinSCP will let you transfer files
[01:51:15] I don't have an account (yet), but will request one soon.
[01:51:36] / on wolfsbane is WARNING: DISK WARNING - free space: / 5748 MB (19% inode=93%):
[01:51:41] The rest is getting to know Linux/commandline
[01:52:37] So Betacommand, I need both PuTTY and WinSCP, or is just PuTTY okay?
[01:53:16] Both
[01:53:43] Putty handles ssh and WinSCP takes care of file transferring
[01:57:37] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL
[02:02:16] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL
[02:05:16] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[02:05:55] Betacommand: I installed WinSCP and I'm having "fun" with it so far.
[02:07:10] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 300999 MB (5% inode=33%):
[02:11:39] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[02:11:56] ewwww, get the windows out of my scrollback
[02:13:23] jeremyb: for some of us, it is the only option
[02:13:46] Betacommand: live cd, VM
[02:14:18] jeremyb: some things just don't work as well in a VM
[02:18:19] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL
[02:27:29] Sun Grid Engine execd on willow is WARNING: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.898926/2.3, alarm hl:np_load_long=1.658691/2.5, alarm hl:cpu=99.600000/98, alarm hl:mem_free=980.000000M/200M, alarm hl:available=1/0
[02:28:29] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[02:31:19] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[02:31:29] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:36:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[02:52:10] / on wolfsbane is WARNING: DISK WARNING - free space: / 5704 MB (19% inode=93%):
[02:55:56] (created) [ACCAPP-505] Cron jobs for Hazard-Bot; Account Approval; New Account <https://jira.toolserver.org/browse/ACCAPP-505> (Hazard-SJ)
[02:57:39] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL
[03:02:20] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL
[03:03:09] Load avg. on willow is WARNING: WARNING - load average: 16.15, 17.31, 15.32
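For readers following the exchange above: once a Toolserver account exists, the workflow Joan and Betacommand describe amounts to an SSH login followed by editing the crontab, with WinSCP (or scp) only needed for copying files. A minimal sketch, assuming a login host of login.toolserver.org and an account named hazard-bot (both placeholders, not confirmed in this log):

    # Log in over SSH; PuTTY does the same thing with a GUI on Windows
    ssh hazard-bot@login.toolserver.org

    # Edit the personal crontab on the login host
    crontab -e

    # Illustrative crontab entry: run a bot script hourly and keep a log
    # (the paths are hypothetical)
    15 * * * * $HOME/bots/update.sh >> $HOME/logs/update.log 2>&1

    # WinSCP is just a graphical SCP/SFTP client; from a command line the
    # equivalent upload would be:
    scp update.sh hazard-bot@login.toolserver.org:bots/

Note that on the Toolserver, cron jobs were normally handed off to the Sun Grid Engine via a wrapper such as the qcronsub helper mentioned later in this log rather than run directly on the login host; the plain entry above only illustrates the crontab syntax.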
[03:03:29] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.020508/1.95, alarm hl:np_load_avg=2.165039/2.0, alarm hl:mem_free=902.000000M/350M, alarm hl:available=1/0
[03:05:20] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[03:05:29] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[03:07:09] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 299093 MB (5% inode=33%):
[03:08:29] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.716309/1.95, alarm hl:np_load_avg=2.082520/2.0, alarm hl:mem_free=1371.000000M/350M, alarm hl:available=1/0
[03:10:10] Load avg. on willow is OK: OK - load average: 9.54, 14.21, 14.83
[03:15:10] Load avg. on willow is WARNING: WARNING - load average: 11.75, 15.06, 15.21
[03:19:20] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL
[03:31:20] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[03:31:29] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:36:20] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[03:50:09] Load avg. on willow is WARNING: WARNING - load average: 15.36, 17.29, 16.66
[03:52:10] / on wolfsbane is WARNING: DISK WARNING - free space: / 5600 MB (18% inode=93%):
[03:57:39] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL
[04:00:10] Load avg. on willow is OK: OK - load average: 10.61, 13.02, 14.82
[04:02:29] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL
[04:03:10] Load avg. on willow is WARNING: WARNING - load average: 14.75, 16.40, 16.07
[04:05:30] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[04:08:10] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 299058 MB (5% inode=33%):
[04:09:09] Load avg. on willow is OK: OK - load average: 9.88, 13.24, 14.79
[04:19:29] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL
[04:21:59] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[04:26:29] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure
[04:31:29] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:31:29] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[04:33:19] Load avg. on willow is WARNING: WARNING - load average: 13.97, 15.39, 14.75
[04:34:19] Load avg.
on willow is OK: OK - load average: 11.86, 14.50, 14.47 [04:37:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [04:46:40] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [04:52:10] / on wolfsbane is WARNING: DISK WARNING - free space: / 5446 MB (18% inode=93%): [04:55:20] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:20] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:20] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:20] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:20] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:20] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:21] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:30] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:30] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:30] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:30] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:30] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:30] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:40] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:40] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:55:40] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:40] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:40] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:40] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:40] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:41] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:41] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:42] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:42] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:55:43] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:56:00] SMF on z-dat-s3-a is OK: OK - all services online [04:56:00] SMF on z-dat-s7-a is OK: OK - all services online [04:56:00] SMF on z-dat-s4-a is OK: OK - all services online [04:56:00] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [04:56:10] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [04:56:10] SMTP on hyacinth is OK: SMTP OK - 0.002 sec. response time [04:56:10] Load avg. on z-dat-s7-a is OK: OK - load average: 1.62, 1.80, 2.42 [04:56:10] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 94755 MB (23% inode=99%): [04:56:10] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2163 MB (99% inode=99%): [04:56:11] Load avg. on z-dat-s4-a is OK: OK - load average: 1.62, 1.80, 2.42 [04:56:11] SMTP on z-dat-s3-a is OK: SMTP OK - 0.004 sec. 
response time [04:56:12] / on z-dat-s4-a is OK: DISK OK - free space: / 8341 MB (27% inode=85%): [04:56:12] / on z-dat-s6-a is OK: DISK OK - free space: / 8341 MB (27% inode=85%): [04:56:13] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 165138 MB (17% inode=99%): [04:56:13] / on z-dat-s7-a is OK: DISK OK - free space: / 8341 MB (27% inode=85%): [04:56:14] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2113 MB (99% inode=99%): [04:56:14] Load avg. on z-dat-s3-a is OK: OK - load average: 1.79, 1.84, 2.43 [04:56:15] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 82436 MB (20% inode=99%): [04:56:30] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [04:56:31] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [04:58:00] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [05:02:22] Load avg. on willow is WARNING: WARNING - load average: 19.46, 16.20, 14.81 [05:03:01] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [05:06:02] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:08:10] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298991 MB (5% inode=33%): [05:09:30] Load avg. on willow is OK: OK - load average: 11.22, 14.16, 14.41 [05:14:30] Load avg. on willow is WARNING: WARNING - load average: 16.25, 14.84, 14.53 [05:20:01] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [05:27:03] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [05:27:03] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:31:50] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:32:00] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [05:38:00] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [05:47:03] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.788086/1.10, alarm hl:np_load_long=0.743164/1.55, alarm hl:mem_free=15618.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.788086/1.00, alarm hl:np_load_long=0.743164/1.50, alarm hl:mem_free=15618.000000M/600M, alarm hl:available=1/0 [05:51:02] Hydriz Wikipedia * Re: [Toolserver-l] Interwiki bot MMP planning [05:52:11] / on wolfsbane is WARNING: DISK WARNING - free space: / 5346 MB (17% inode=93%): [05:53:01] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [05:58:12] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [05:59:01] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.262695/1.10, alarm hl:np_load_long=1.046875/1.55, alarm hl:mem_free=15632.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.262695/1.00, alarm hl:np_load_long=1.046875/1.50, alarm hl:mem_free=15632.000000M/600M, alarm hl:available=1/0 [06:03:01] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [06:06:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [06:08:31] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298897 MB (5% inode=33%): 
[06:20:01] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [06:28:01] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [06:31:51] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:32:01] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [06:37:11] [[CommonsHelper2]] 10https://wiki.toolserver.org/w/index.php?diff=7194&oldid=5914&rcid=9586 * Krinkle * (-74) () [06:38:00] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [06:43:32] Load avg. on willow is CRITICAL: CRITICAL - load average: 19.27, 20.43, 21.09 [06:46:31] Load avg. on willow is WARNING: WARNING - load average: 16.68, 17.93, 19.92 [06:52:21] / on wolfsbane is WARNING: DISK WARNING - free space: / 5138 MB (17% inode=93%): [06:56:02] K. Peachey * Re: [Toolserver-l] Interwiki bot MMP planning [06:58:31] Load avg. on willow is CRITICAL: CRITICAL - load average: 32.80, 23.25, 20.65 [06:59:01] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [07:03:01] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [07:07:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:08:31] Load avg. on willow is WARNING: WARNING - load average: 14.29, 18.61, 19.87 [07:08:31] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298828 MB (5% inode=33%): [07:20:01] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [07:28:01] Sun Grid Engine execd on willow is CRITICAL: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.359863/2.3, alarm hl:np_load_long=2.232910/2.5, alarm hl:cpu=99.400000/98, alarm hl:mem_free=1135.000000M/200M, alarm hl:available=1/0: medium-sol@willow in error state: QERROR as result of job 1997008s failure [07:31:50] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:32:02] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [07:36:31] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.12, 20.33, 18.85 [07:37:31] Load avg. on willow is WARNING: WARNING - load average: 28.59, 21.55, 19.38 [07:38:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [07:42:53] is the Free Image Search Tool working? [07:52:21] / on wolfsbane is WARNING: DISK WARNING - free space: / 4833 MB (16% inode=93%): [07:55:31] Load avg. on willow is CRITICAL: CRITICAL - load average: 35.35, 21.39, 19.12 [07:56:32] Load avg. on willow is WARNING: WARNING - load average: 23.34, 20.73, 19.04 [07:59:11] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [08:02:01] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:03:10] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [08:03:52] [[List of Wikimedia bots]] ! 10https://wiki.toolserver.org/w/index.php?diff=7195&oldid=6932&rcid=9587 * 31.59.199.129 * (+76) (/* {{#language:ar}} */ ) [08:08:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [08:08:32] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298790 MB (5% inode=33%): [08:10:02] hi! I'm unable to establish a TUSC account.. 
[08:10:18] TUSC can't find my talk page edit.. [08:11:51] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [08:20:11] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [08:29:01] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [08:31:21] / on wolfsbane is OK: DISK OK - free space: / 8218 MB (27% inode=93%): [08:31:51] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:32:11] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [08:34:32] Load avg. on willow is OK: OK - load average: 9.52, 12.83, 14.76 [08:38:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [08:59:11] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [09:02:32] Load avg. on willow is WARNING: WARNING - load average: 18.33, 17.46, 15.11 [09:03:11] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [09:08:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:08:32] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298674 MB (5% inode=33%): [09:09:42] Load avg. on willow is OK: OK - load average: 10.63, 14.05, 14.41 [09:21:11] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [09:24:42] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:24:51] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:51] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:24:51] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:25:11] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:25:31] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:25:31] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:25:31] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:25:41] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:25:41] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:25:41] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:25:41] SMF on hyacinth is OK: OK - all services online [09:26:01] SMF on z-dat-s6-a is OK: OK - all services online [09:26:01] SMF on z-dat-s3-a is OK: OK - all services online [09:26:01] SMF on z-dat-s4-a is OK: OK - all services online [09:26:31] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:29:01] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [09:32:01] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:32:41] Load avg. on willow is WARNING: WARNING - load average: 14.29, 15.08, 13.97 [09:33:10] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [09:33:41] Load avg. on willow is OK: OK - load average: 10.41, 13.81, 13.59 [09:39:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [09:50:41] Load avg. on willow is WARNING: WARNING - load average: 17.67, 15.39, 14.13 [09:55:30] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:56:02] SMF on z-dat-s4-a is OK: OK - all services online [09:59:02] DaB. * Re: [Toolserver-l] Interwiki bot MMP planning [09:59:10] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [10:03:11] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [10:08:22] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:08:41] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298592 MB (5% inode=33%): [10:12:31] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:41] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:41] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:41] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:50] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:51] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:12:52] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:52] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:52] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:12:52] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:13:22] SMTP on hyacinth is OK: SMTP OK - 0.031 sec. response time [10:13:22] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:13:31] SMTP on z-dat-s4-a is OK: SMTP OK - 0.003 sec. response time [10:13:31] SMTP on z-dat-s7-a is OK: SMTP OK - 0.003 sec. response time [10:13:31] SMTP on z-dat-s6-a is OK: SMTP OK - 0.012 sec. response time [10:13:41] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:13:42] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:13:42] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:13:42] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:13:42] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:21:21] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [10:29:22] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [10:32:00] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:33:21] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [10:39:15] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [10:56:55] Load avg. on willow is WARNING: WARNING - load average: 11.85, 15.16, 14.45 [10:57:56] Load avg. on willow is OK: OK - load average: 9.53, 13.97, 14.08 [10:59:25] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [11:00:15] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.10, alarm hl:np_load_long=0.765625/1.55, alarm hl:mem_free=13833.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.00, alarm hl:np_load_long=0.765625/1.50, alarm hl:mem_free=13833.000000M/600M, alarm hl:available=1/0 [11:02:55] Load avg. 
on willow is WARNING: WARNING - load average: 17.48, 18.88, 16.23 [11:03:25] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [11:08:24] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:08:56] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298301 MB (5% inode=33%): [11:21:15] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [11:21:25] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [11:24:14] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.072266/1.00, alarm hl:np_load_long=1.105469/1.50, alarm hl:mem_free=14435.000000M/600M, alarm hl:available=1/0 [11:30:25] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1997008s failure [11:32:05] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:33:24] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [11:37:15] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [11:39:15] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [11:58:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:59:34] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [12:03:26] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.129395/1.95, alarm hl:np_load_avg=2.208496/2.0, alarm hl:mem_free=1167.000000M/350M, alarm hl:available=1/0 [12:03:35] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [12:08:25] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:09:06] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298227 MB (5% inode=33%): [12:10:44] [[Interwiki bot MMP planning]] ! 10https://wiki.toolserver.org/w/index.php?diff=7196&oldid=7193&rcid=9588 * Grimlock * (+488) () [12:16:06] Load avg. on willow is WARNING: WARNING - load average: 13.63, 18.66, 18.03 [12:21:35] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [12:22:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:27:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.629883/1.95, alarm hl:np_load_avg=2.032227/2.0, alarm hl:mem_free=1183.000000M/350M, alarm hl:available=1/0 [12:28:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:31:05] Load avg. on willow is CRITICAL: CRITICAL - load average: 34.50, 21.20, 18.39 [12:32:06] Load avg. on willow is WARNING: WARNING - load average: 19.82, 19.49, 17.96 [12:32:15] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:33:35] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [12:39:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [12:42:55] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:43:15] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:43:15] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:43:34] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [12:44:05] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:44:05] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:46:15] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.206055/1.10, alarm hl:np_load_long=0.815430/1.55, alarm hl:mem_free=14623.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.206055/1.00, alarm hl:np_load_long=0.815430/1.50, alarm hl:mem_free=14623.000000M/600M, alarm hl:available=1/0 [12:47:14] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [12:51:05] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:15] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:15] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:15] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:16] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:25] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:25] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:25] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:26] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:27] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:27] Load avg. on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:28] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:35] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:35] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:35] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:35] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:35] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:45] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:45] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:45] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:45] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:54] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:54] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:55] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:55] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:51:55] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:51:55] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:52:05] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:52:14] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:52:25] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [12:52:26] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [12:52:45] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [12:52:45] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [12:52:45] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [12:52:55] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [12:53:45] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [12:53:45] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 223.000000 [12:53:45] MySQL on z-dat-s3-a is OK: Uptime: 923584 Threads: 35 Questions: 1005386243 Slow queries: 68801 Opens: 9753785 Flush tables: 1 Open tables: 16384 Queries per second avg: 1088.570 [12:53:45] MySQL slave on z-dat-s3-a is OK: Uptime: 923584 Threads: 35 Questions: 1005386244 Slow queries: 68801 Opens: 9753785 Flush tables: 1 Open tables: 16384 Queries per second avg: 1088.570 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 296 [12:53:55] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 94430 MB (23% inode=99%): [12:53:55] MySQL on z-dat-s7-a is OK: Uptime: 923587 Threads: 9 Questions: 306589512 Slow queries: 28447 Opens: 2638674 Flush tables: 1 Open tables: 6423 Queries per second avg: 331.955 [12:53:55] SMF on z-dat-s7-a is OK: OK - all services online [12:53:55] / on z-dat-s3-a is OK: DISK OK - free space: / 8339 MB (27% inode=85%): [12:53:55] Load avg. on z-dat-s7-a is OK: OK - load average: 0.12, 0.98, 1.60 [12:53:56] Load avg. on z-dat-s4-a is OK: OK - load average: 0.12, 0.98, 1.60 [12:53:56] / on z-dat-s7-a is OK: DISK OK - free space: / 8339 MB (27% inode=85%): [12:53:57] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 162994 MB (16% inode=99%): [12:53:57] / on z-dat-s6-a is OK: DISK OK - free space: / 8339 MB (27% inode=85%): [12:53:58] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 2350 MB (99% inode=99%): [12:53:58] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 82106 MB (20% inode=99%): [12:53:59] Load avg. on z-dat-s3-a is OK: OK - load average: 0.21, 0.99, 1.60 [12:53:59] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2351 MB (99% inode=99%): [12:54:00] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:54:15] SMF on z-dat-s4-a is OK: OK - all services online [12:54:16] SMF on z-dat-s6-a is OK: OK - all services online [12:54:16] SMF on hyacinth is OK: OK - all services online [12:54:16] SMF on z-dat-s3-a is OK: OK - all services online [12:54:25] SMTP on hyacinth is OK: SMTP OK - 0.098 sec. response time [12:54:25] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 162993 MB (16% inode=99%): [12:54:26] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2290 MB (99% inode=99%): [12:54:26] SMTP on z-dat-s3-a is OK: SMTP OK - 0.112 sec. response time [12:54:26] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2293 MB (99% inode=99%): [12:54:45] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [13:00:05] Load avg. 
on willow is OK: OK - load average: 7.59, 11.21, 14.54 [13:00:25] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.316406/1.10, alarm hl:np_load_long=0.316406/1.55, alarm hl:mem_free=445.000000M/500M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.316406/1.00, alarm hl:np_load_long=0.316406/1.50, alarm hl:mem_free=445.000000M/600M, alarm hl:available=1/0 [13:00:25] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [13:03:06] Load avg. on willow is WARNING: WARNING - load average: 12.36, 15.21, 15.89 [13:03:35] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [13:07:25] Sun Grid Engine execd on wolfsbane is OK: testqueue@wolfsbane OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [13:08:25] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:10:06] Load avg. on willow is OK: OK - load average: 9.94, 13.37, 14.95 [13:10:06] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 298164 MB (5% inode=33%): [13:21:35] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [13:32:16] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:34:25] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [13:37:44] / on wolfsbane is WARNING: DISK WARNING - free space: / 6283 MB (20% inode=93%): [13:39:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [13:57:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.120117/1.95, alarm hl:np_load_avg=1.743164/2.0, alarm hl:mem_free=317.000000M/350M, alarm hl:available=1/0 [14:00:34] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [14:03:05] Load avg. on willow is WARNING: WARNING - load average: 12.66, 15.41, 14.80 [14:03:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [14:03:35] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [14:04:05] Load avg. on willow is OK: OK - load average: 10.83, 14.48, 14.52 [14:08:36] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:08:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.063965/1.95, alarm hl:np_load_avg=2.157227/2.0, alarm hl:mem_free=294.000000M/350M, alarm hl:available=1/0 [14:09:04] Load avg. 
on willow is WARNING: WARNING - load average: 11.41, 15.79, 15.30 [14:10:15] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297965 MB (5% inode=33%): [14:22:58] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [14:25:54] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Unavailable and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [14:33:55] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default svc:/application/management/common-agent-container-2:default [14:35:06] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [14:38:40] / on wolfsbane is WARNING: DISK WARNING - free space: / 5850 MB (19% inode=93%): [14:40:15] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [14:48:52] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:52] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:53] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:48:53] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:49:37] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:38] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:39] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:39] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:39] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:39] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:14] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2404 MB (99% inode=99%): [14:50:14] SMF on z-dat-s6-a is OK: OK - all services online [14:50:14] SMF on z-dat-s3-a is OK: OK - all services online [14:50:14] Load avg. on z-dat-s4-a is OK: OK - load average: 0.17, 0.99, 1.40 [14:50:14] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 82040 MB (20% inode=99%): [14:50:14] Load avg. on z-dat-s7-a is OK: OK - load average: 0.17, 0.99, 1.40 [14:50:15] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2400 MB (99% inode=99%): [14:50:15] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2401 MB (99% inode=99%): [14:50:15] SMF on z-dat-s7-a is OK: OK - all services online [14:50:16] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:16] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:50:26] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:50:26] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:50:26] SMTP on z-dat-s7-a is OK: SMTP OK - 0.207 sec. 
response time [14:50:26] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:50:26] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:50:26] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [14:50:27] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [14:50:35] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [14:58:39] That wasn't spammy at all. [15:01:08] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [15:05:24] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [15:09:15] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:10:54] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297903 MB (5% inode=33%): [15:24:18] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [15:27:13] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Unavailable and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [15:35:14] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [15:36:04] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:39:54] / on wolfsbane is WARNING: DISK WARNING - free space: / 5514 MB (18% inode=93%): [15:41:56] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [15:46:39] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.092774/1.00, alarm hl:np_load_long=0.844727/1.50, alarm hl:mem_free=14908.000000M/600M, alarm hl:available=1/0 [15:48:00] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [16:01:56] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [16:05:51] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [16:09:23] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:11:14] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297827 MB (5% inode=33%): [16:14:22] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.136719/1.10, alarm hl:np_load_long=1.121094/1.55, alarm hl:mem_free=14419.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.136719/1.00, alarm hl:np_load_long=1.121094/1.50, alarm hl:mem_free=14419.000000M/600M, alarm hl:available=1/0 [16:24:51] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [16:28:12] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Unavailable and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [16:28:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [16:35:48] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [16:37:41] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default svc:/application/management/common-agent-container-2:default [16:40:39] / on wolfsbane is WARNING: DISK WARNING - free space: / 5230 MB (17% inode=93%): [16:42:30] SMF on damiana is CRITICAL: ERROR - 
maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [16:46:23] Merlissimo: around ? [16:46:36] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.418945/1.10, alarm hl:np_load_long=1.166016/1.55, alarm hl:mem_free=14602.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.418945/1.00, alarm hl:np_load_long=1.166016/1.50, alarm hl:mem_free=14602.000000M/600M, alarm hl:available=1/0 [16:47:29] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [16:52:44] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.735351/1.95, alarm hl:np_load_avg=2.359863/2.0, alarm hl:mem_free=1055.000000M/350M, alarm hl:available=1/0 [17:00:53] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:02:24] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.025391/1.00, alarm hl:np_load_long=1.019531/1.50, alarm hl:mem_free=14887.000000M/600M, alarm hl:available=1/0 [17:02:24] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [17:03:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.402832/1.95, alarm hl:np_load_avg=3.058105/2.0, alarm hl:mem_free=779.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.402832/2.3, alarm hl:np_load_long=2.724609/2.5, alarm hl:cpu=74.300000/98, alarm hl:mem_free=779.000000M/200M, alarm hl:available=1/0 [17:06:59] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [17:11:56] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297789 MB (5% inode=33%): [17:12:34] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:17:52] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:24:51] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [17:29:02] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Unavailable and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [17:36:16] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [17:37:42] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:42:23] / on wolfsbane is WARNING: DISK WARNING - free space: / 4833 MB (16% inode=93%): [17:44:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [17:53:15] / on wolfsbane is OK: DISK OK - free space: / 6474 MB (21% inode=93%): [17:53:55] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [17:57:49] Sun Grid Engine execd on wolfsbane is OK: testqueue@wolfsbane OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [18:02:31] it's just me who is getting SGE errors? [18:03:30] Alchimista: nope [18:03:36] Alchimista: i'm getting some too… [18:03:38] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [18:04:29] Toto_Azero|away: do you know if anyone has already reported it? 
[18:05:04] dunno… but i think not [18:08:53] 3(created) [TS-1375] Hard disk failure in turnera; Toolserver; Bug <10https://jira.toolserver.org/browse/TS-1375> (Marlen Caemmerer) [18:10:28] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [18:10:56] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:12:59] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [18:13:38] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297640 MB (5% inode=33%): [18:25:52] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [18:31:45] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Unavailable and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [18:34:01] Sun Grid Engine execd on wolfsbane is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:36:55] / on wolfsbane is WARNING: DISK WARNING - free space: / 6272 MB (20% inode=93%): [18:36:55] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [18:37:10] Sun Grid Engine execd on wolfsbane is OK: testqueue@wolfsbane OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [18:39:23] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:40:08] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [18:44:38] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [19:00:30] Load avg. on willow is CRITICAL: CRITICAL - load average: 19.97, 29.09, 27.67 [19:01:52] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [19:02:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:04:04] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL [19:12:17] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [19:14:59] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297566 MB (5% inode=33%): [19:23:42] / on wolfsbane is OK: DISK OK - free space: / 8730 MB (29% inode=93%): [19:25:12] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=4.165527/1.95, alarm hl:np_load_avg=3.753418/2.0, alarm hl:mem_free=1436.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=4.165527/2.3, alarm hl:np_load_long=3.956543/2.5, alarm hl:cpu=54.900000/98, alarm hl:mem_free=1436.000000M/200M, alarm hl:available=1/0 [19:27:02] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [19:32:28] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [19:32:28] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [19:32:28] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [19:32:41] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 22.69, 21.88, 26.71
[19:32:41] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[19:32:41] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.029785/1.95, alarm hl:np_load_avg=2.579590/2.0, alarm hl:mem_free=1686.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.029785/2.3, alarm hl:np_load_long=3.292480/2.5, alarm hl:cpu=59.100000/98, alarm hl:mem_free=1686.000000M/200M, alarm hl:available=1/0
[19:33:05] hello all
[19:33:08] @replag
[19:33:08] DaBPunkt: s5-rr-a: 13s [+0.00 s/s]; s5-user: 13s [+0.00 s/s]
[19:33:08] DiskSuite on turnera is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[19:34:41] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.544922/1.10, alarm hl:np_load_long=1.043945/1.55, alarm hl:mem_free=14668.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.544922/1.00, alarm hl:np_load_long=1.043945/1.50, alarm hl:mem_free=14668.000000M/600M, alarm hl:available=1/0
[19:37:10] Sun Grid Engine execd on ortelius is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[19:37:53] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL
[19:38:48] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.007812/1.10, alarm hl:np_load_long=1.223633/1.55, alarm hl:mem_free=14595.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.007812/1.00, alarm hl:np_load_long=1.223633/1.50, alarm hl:mem_free=14595.000000M/600M, alarm hl:available=1/0
[19:40:04] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[19:44:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[19:49:42] DaBPunkt: are you aware that SGE is having some problems? I'm getting emails of cron/qcronsub errors
[19:50:43] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[19:52:21] Alchimista: no, but I will look at it now
[19:53:00] DaBPunkt: other people had mentioned it, do you need a copy of the error?
[19:54:38] Alchimista: yes please
[19:55:16] looks like the nfs-service has a problem
[19:55:19] it is very slow
[19:55:55] http://pastebin.com/yZuHapaM
[19:57:00] tnx
[19:57:11] i had a warning day 5: Segmentation Fault - core dumped
[19:57:15] turnera seems to be the problem
[19:57:32] I will take a look in the syslog and reboot it afterwards
[19:59:32] Load avg. on willow is WARNING: WARNING - load average: 8.23, 13.83, 19.88
[19:59:44] by the way, who takes care of py*related things?
[20:00:52] (created) [MNT-1234] Reboot turnera; Maintenance; Emergency work <https://jira.toolserver.org/browse/MNT-1234> (DaB.)
[20:02:54] (commented) [MNT-1234] Reboot turnera <https://jira.toolserver.org/browse/MNT-1234> (DaB.)
[20:04:16] FC 0/15 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/15:UP: 1 int NOK : CRITICAL
[20:05:59] /home will be away for a moment
[20:08:37] Da
[20:08:52] DaBPunkt: turnera is having a hard disk problem
[20:08:57] Load avg. on willow is OK: OK - load average: 8.50, 9.43, 14.86
[20:09:41] anyone know why TS pages have been jumping in and out of functionality all day?
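Two of the checks being discussed here can be reproduced from a shell on the Toolserver. The @replag figure corresponds to the replication-lag probe the monitoring bot runs elsewhere in this log (SELECT ts_rc_age(), roughly the age in seconds of the newest replicated recentchanges row), and the SGE trouble Alchimista reports can be inspected with the standard Grid Engine client tools. A rough sketch, assuming the usual SGE binaries are on the path and using sql-s5-rr as a stand-in database hostname (not verified in this log):

    # Inspect Grid Engine queue and job state; 'E' marks queues in error,
    # 'a' marks queues over their load-alarm thresholds
    qstat -f

    # Details for a specific job, e.g. the one named in the QERROR alerts
    qstat -j 1997008

    # Replication lag as the bot reports it (hostname is a placeholder)
    mysql -h sql-s5-rr -e 'SELECT ts_rc_age();'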
[20:10:11] like I can't get to phpMyAdmin right now [20:10:33] correction, I just got it back [20:10:36] but still [20:12:47] FC 0/12 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/12:UP: 1 int NOK : CRITICAL [20:15:47] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297466 MB (5% inode=33%): [20:18:06] NTP on turnera is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:18:48] SMTP on turnera is CRITICAL: Connection refused [20:18:48] / on turnera is CRITICAL: Connection refused by host [20:18:48] SSH on turnera is CRITICAL: Connection refused [20:18:48] /tmp on turnera is CRITICAL: Connection refused by host [20:18:48] ts-array5 on turnera is CRITICAL: Connection refused by host [20:18:48] Free Memory on turnera is CRITICAL: Connection refused by host [20:18:48] Load avg. on turnera is CRITICAL: Connection refused by host [20:18:57] FMA on turnera is CRITICAL: ERROR - unexpected output from snmpwalk [20:18:57] Environment IPMI on turnera is CRITICAL: Connection refused by host [20:27:48] FC 0/8 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/8:UP: 1 int NOK : CRITICAL [20:32:56] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:33:47] Free Memory on damiana is CRITICAL: CRITICAL - 3.7% (156496 kB) free! [20:35:47] Free Memory on damiana is WARNING: WARNING - 5.2% (218352 kB) free! [20:38:47] Free Memory on damiana is CRITICAL: CRITICAL - 4.1% (171896 kB) free! [20:38:48] FC 0/11 on fsw1-n1-oe16-esams.mgmt is CRITICAL: FC port 0/11:UP: 1 int NOK : CRITICAL [20:38:57] DiskSuite on turnera is CRITICAL: Connection refused by host [20:42:35] nosy: did you inform sebastian? [20:42:47] DaBPunkt: why? [20:43:02] i opened a ticket at oracle [20:43:11] ah, still under guarantee? [20:43:14] yes [20:46:47] 3(created) [ACCAPP-506] ArticleNet: Development of a grafical tool to show relations between WP articles; Account Approval; New Account <10https://jira.toolserver.org/browse/ACCAPP-506> (Claus Colloseus) [20:47:42] hi all ! [20:48:00] willow does not accept my SSH key anymore [20:48:05] any problem ? [20:48:16] should I try another server to connect ?? [20:48:37] yes authentication should not work [20:48:40] Grimlock-fr: no, we have a problem with a HA-node [20:48:49] ok [20:48:54] please wait some minutes [20:48:58] no pb [20:49:11] It's late, I will try again tomorrow :) [20:49:15] DaBPunkt: damiana went away too, probably too little memory [20:49:38] I have to think about my "scripts" organisation, by the way [20:49:44] so... [20:49:49] Thanks a lot ! [20:50:08] DaBPunkt, I don't think it should be falling back to keyboard-interactive [20:50:12] ChuispastonBot will restart later ! [20:50:56] wb tsbot [20:50:56] nosy: I know. I have to fix turnera's boot-problems first [20:51:05] SMF on turnera is CRITICAL: Connection refused by host [20:51:16] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [20:51:23] DaBPunkt: we now have access to turnera.mgmt [20:51:27] since yesterday [20:51:32] nosy: I know [20:53:04] I deleted the defective metadb-entries and am now rebooting turnera again [20:53:06] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.385254/1.95, alarm hl:np_load_avg=0.842773/2.0, alarm hl:mem_free=272.000000M/350M, alarm hl:available=1/0 [20:55:16] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:00:46] Load avg.
on turnera is OK: OK - load average: 0.88, 0.63, 0.30 [21:01:06] SMTP on turnera is OK: SMTP OK - 0.478 sec. response time [21:01:15] /tmp on turnera is OK: DISK OK - free space: /tmp 11165 MB (99% inode=99%): [21:01:15] SSH on turnera is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [21:01:25] ts-array5 on turnera is OK: 2/2 paths are active [21:01:25] Environment IPMI on turnera is OK: ok: temperature ok fan ok voltage ok chassis ok [21:01:35] FMA on turnera is OK: OK [21:01:46] NTP on turnera is OK: NTP OK: Offset -0.062538 secs [21:03:05] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.383301/1.95, alarm hl:np_load_avg=1.404785/2.0, alarm hl:mem_free=312.000000M/350M, alarm hl:available=1/0 [21:04:23] Load avg. on turnera is OK: OK - load average: 0.16, 0.40, 0.28 [21:04:29] NTP on turnera is OK: NTP OK: Offset -0.024298 secs [21:05:30] FC 0/13 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 49 during synchronization. [21:05:30] FC 0/22 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 51 during synchronization. [21:05:40] FC 0/23 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 53 during synchronization. [21:05:40] FC 0/14 [hyacinth] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 54 during synchronization. [21:05:40] eth0 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 55 during synchronization. [21:05:40] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.637695/1.95, alarm hl:np_load_avg=1.403320/2.0, alarm hl:mem_free=253.000000M/350M, alarm hl:available=1/0 [21:05:49] FC 0/3 [cassia] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 56 during synchronization. [21:05:49] FC 0/15 [fsw1-n1-oe16-esams] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 57 during synchronization. [21:05:50] FC 0/4 [hemlock] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 58 during synchronization. [21:05:50] FC 0/16 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 59 during synchronization. [21:06:00] FC 0/5 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 60 during synchronization. [21:06:00] FC 0/17 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 61 during synchronization. [21:06:09] FC 0/18 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 62 during synchronization. [21:06:09] FC 0/6 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 63 during synchronization. [21:06:09] FC 0/0 [far1-n1-oe16-esams A2] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 64 during synchronization. 
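A note on the turnera disk failure and the metadb cleanup mentioned above: the "submirror ... Needs" alerts from the DiskSuite check and the deleted metadb entries correspond to the usual Solaris Volume Manager repair workflow. A minimal sketch of the commands involved, assuming SVM/DiskSuite on turnera; the device and slice names below are purely illustrative and do not come from the log:

    metastat -c                       # concise mirror overview; broken submirrors show "Needs maintenance"
    metadb -i                         # list state database replicas and their error flags
    metadb -d -f /dev/dsk/c0t1d0s7    # drop replicas that lived on the failed disk (forced delete)
    metareplace -e d10 c0t1d0s0       # after the disk is replaced, re-enable and resync the submirror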
[21:06:09] FC 0/1 [far1-n1-oe16-esams B2] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 65 during synchronization. [21:06:10] FC 0/19 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 66 during synchronization. [21:06:10] FC 0/7 [daphne] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 67 during synchronization. [21:06:19] FC 0/8 [adenia] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 68 during synchronization. [21:06:19] FC 0/10 [rosemary] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 69 during synchronization. [21:06:19] FC 0/2 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 70 during synchronization. [21:06:29] FC 0/11 [fsw1-n1-oe16-esams] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 71 during synchronization. [21:06:29] FC 0/9 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 72 during synchronization. [21:06:30] FC 0/20 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 73 during synchronization. [21:06:30] FC 0/12 [thyme] on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 74 during synchronization. [21:06:30] FC 0/21 on fsw2-n1-oe16-esams.mgmt is UNKNOWN: ERROR opening session: Received usmStatsUnknownUserNames.0 Report-PDU with value 77 during synchronization. [21:08:17] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:08:58] looks like I booted the wrong snapshot. will try again [21:12:49] DaBPunkt: Stuff in my crontab that uses SGE to (re)start long-running jobs if they are down is breaking [21:12:56] it is starting it every minute, unconditionally [21:13:00] I haven't changed that line in months [21:13:17] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.309082/1.95, alarm hl:np_load_avg=1.294922/2.0, alarm hl:mem_free=170.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.309082/2.3, alarm hl:np_load_long=1.210938/2.5, alarm hl:cpu=91.300000/98, alarm hl:mem_free=170.000000M/200M, alarm hl:available=1/0 [21:13:19] dozens of irc bot dupes coming online [21:13:38] on willow [21:15:27] / on turnera is OK: DISK OK - free space: / 48105 MB (66% inode=95%): [21:15:27] Free Memory on turnera is OK: OK - 87.6% (3665708 kB) free. [21:16:27] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 297387 MB (5% inode=33%): [21:18:17] NTP on turnera is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.008078 secs [21:19:26] I will now move nfs back to turnera to free some ram on damiana [21:19:46] DaBPunkt: did you turn off SGE for the time being? no problem, just let me know [21:20:05] [[Special:Log/newusers]] create 10 * Cosmetologistsalary * (New user account) [21:20:25] Krinkle: no [21:20:43] hm.. because the tasks put into SGE aren't being started. [21:20:53] maybe there is a long queue? checking..
[21:21:04] (I killed all the dupe bots, but none are coming back) [21:21:22] job-ID prior name user state submit/start at queue slots ja-task-ID [21:21:22] ----------------------------------------------------------------------------------------------------------------- [21:21:23] 2003805 0.04537 dbbot_wm krinkle qw 05/08/2012 21:13:01 1 [21:22:18] MySQL slave on thyme is CRITICAL: (Return code of 139 is out of bounds) [21:22:20] qstat --help does not document the states. What does 'qw' mean? the queue column is empty [21:22:27] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [21:22:58] 3(commented) [MNT-1234] Reboot turnera <10https://jira.toolserver.org/browse/MNT-1234> (DaB.) [21:23:07] ah, "pending jobs" [21:23:08] okay [21:23:48] everything *should* be ok again [21:24:57] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [21:26:29] DaBPunkt: when I do `qstat -f` it shows it is still stuck under "no queue - PENDING" [21:26:39] usually it starts within seconds [21:27:34] Krinkle: I guess several jobs started at the same time. But I can kill the pending job if you like [21:28:24] thanks (qdel, right?) - but I worry about the lots and lots of others with similar setups [21:28:34] and my own under other MMP accounts [21:28:51] [[User:Cosmetologistsalary]] !NM 10https://wiki.toolserver.org/w/index.php?oldid=7197&rcid=9590 * Cosmetologistsalary * (+263) (Created page with "What just does a cosmetologist do on a daily basis? Discover out by finding out what the standard cosmetology work description is. My Website: [http://www.cosmetologistsalaryr...") [21:29:31] [[Special:Log/delete]] delete 10 * Dab * (deleted "[[02User:Cosmetologistsalary10]]": SPAM) [21:29:43] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 296249 MB (5% inode=33%): [21:29:43] Free Memory on turnera is OK: OK - 49.2% (2059568 kB) free. [21:29:43] Load avg. on turnera is OK: OK - load average: 0.52, 0.47, 0.37 [21:29:51] NTP on turnera is WARNING: NTP WARNING: Server has the LI_ALARM bit set, Offset -0.013356 secs [21:47:07] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 296218 MB (5% inode=33%): [21:47:07] Free Memory on turnera is OK: OK - 38.3% (1603220 kB) free. [21:47:07] Load avg. on turnera is OK: OK - load average: 0.46, 0.47, 0.44 [21:47:15] NTP on turnera is OK: NTP OK: Offset 0.005198 secs [21:47:15] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:54:16] DaBPunkt: Assuming you didn't fix my entry manually, it is working now :) [21:57:19] Krinkle: qstat -j shows you the reason why a job is pending [21:57:28] ok [21:59:06] Krinkle: what happened exactly? you used qcronsub and more than one job with the same name was submitted at the same time? [21:59:26] I use what was recommended to "us" instead of Phoenix [21:59:30] cronsub in crontab [21:59:34] every minute [21:59:39] Krinkle, give him the cron line [21:59:57] one moment, I'm updating a tool from github on ts [22:00:08] * * * * * cronsub -l -s dbbot_wm $HOME/bots/dbbot-wm-start.sh [22:00:14] that one [22:00:37] Phoenix? [22:00:39] it doesn't start it if it is already started (either -l or -s does that, I don't know) [22:00:40] Phoenix Wright. [22:00:47] Phoenix Wright. That's Wright. [22:01:11] -s is the name [22:01:56] or not... [22:01:58] Krinkle: i'll change cronsub to use qcronsub. [22:02:04] ? [22:02:14] Please don't change stuff unless you're sure it doesn't break any existing use cases.
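For readers following the pending-job discussion above: "qw" in the qstat output means queued/waiting, and the stuck job can be inspected and removed with the standard Grid Engine client commands. A short sketch, using the job ID from the qstat listing above purely as an example:

    qstat -u "$USER"     # list your own jobs; state "qw" = queued/waiting, "r" = running
    qstat -f             # full queue listing; jobs with no assigned queue appear as PENDING
    qstat -j 2003805     # show the job's details, including why the scheduler keeps it pending
    qdel 2003805         # remove the stuck job from the queue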
[22:02:21] This is working fine and I don't care about anything else [22:02:31] Platonides: if you are rewriting it you should use qcronsub directly [22:02:40] -s seems to be "-j y -o $HOME/${JOBNAME}.out" [22:02:46] and -l "-l h_rt=INFINITY" [22:03:10] I don't see what stops it from launching a new one [22:03:11] Krinkle: cronsub is the "old way" which is still working as before, qcronsub is my new version [22:03:25] Merlissimo: I missed the announcement, right ? [22:03:25] unless it's the default for a named job [22:03:33] Krinkle, it was in toolserver-l [22:03:41] Okay :) [22:03:53] There are a lot of mailing lists :) [22:03:53] Krinkle: yes, but cronsub should still work [22:04:29] I'm glad it is fixed now, but any idea how the same job ended up dozens of times in qstat ? [22:04:38] (and all running) [22:05:32] cronsub uses qstat to check if the job is running, but qstat failed because turnera was down. qcronsub uses a jsv. [22:07:08] so it shouldn't assume not running if the server is down, right? [22:07:19] that sounds like a bug [22:08:16] DaBPunkt: I changed /opt/local/bin/cronsub some time ago, but the changes are lost. How can I change it permanently? [22:09:08] Merlissimo: I will check if it is in puppet tomorrow [22:10:03] according to the source it's in puppet, but that is new [22:20:25] good night, ts [22:32:54] Load avg. on willow is WARNING: WARNING - load average: 13.69, 17.68, 15.01 [22:33:24] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.666504/1.95, alarm hl:np_load_avg=2.195801/2.0, alarm hl:mem_free=893.000000M/350M, alarm hl:available=1/0 [22:34:24] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:34:55] Load avg. on willow is OK: OK - load average: 7.90, 14.35, 14.09 [22:44:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.000488/1.95, alarm hl:np_load_avg=2.023438/2.0, alarm hl:mem_free=637.000000M/350M, alarm hl:available=1/0 [22:44:55] Load avg. on willow is WARNING: WARNING - load average: 12.47, 15.28, 14.67 [22:46:36] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:46:55] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [22:47:14] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:47:45] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [22:48:05] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 296077 MB (5% inode=33%): [23:20:35] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.737305/1.95, alarm hl:np_load_avg=2.419434/2.0, alarm hl:mem_free=977.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.737305/2.3, alarm hl:np_load_long=2.296875/2.5, alarm hl:cpu=99.700000/98, alarm hl:mem_free=977.000000M/200M, alarm hl:available=1/0 [23:20:56] Load avg. on willow is WARNING: WARNING - load average: 18.98, 19.04, 18.31 [23:30:55] Load avg. on willow is CRITICAL: CRITICAL - load average: 51.54, 25.95, 20.59 [23:32:56] Load avg. on willow is WARNING: WARNING - load average: 15.57, 21.01, 19.40 [23:43:04] Load avg.
on willow is CRITICAL: CRITICAL - load average: 40.57, 25.40, 21.24 [23:47:06] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [23:47:15] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:47:35] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:47:54] DiskSuite on turnera is CRITICAL: CRITICAL - submirror d42 of mirror d40 is Needs and submirror d32 of mirror d30 is Needs and submirror d22 of mirror d20 is Needs and submirror d12 of mirror d10 is Needs [23:48:06] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 295975 MB (5% inode=33%): [23:59:36] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
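The duplicate-job incident discussed above (cronsub asking qstat whether the job is already running, and a failing qstat being mistaken for "job not running") can be illustrated with a minimal wrapper sketch. This is not the real /opt/local/bin/cronsub; the job-name handling is simplified, and the flag expansions only follow what was guessed in the channel (-l as "-l h_rt=INFINITY", -s as "-j y -o $HOME/$JOBNAME.out"):

    #!/bin/sh
    # Sketch of a cronsub-style wrapper: submit a named SGE job unless a job with that
    # name already shows up in qstat. Simplified: the job name is the first argument.
    JOBNAME="$1"; shift

    # Bug illustrated: if qstat itself fails (for example because its host is down),
    # the pipeline produces no output, the test passes, and cron resubmits the job
    # every minute - which is how the dozens of duplicate bots appeared.
    if ! qstat -u "$USER" 2>/dev/null | grep -qw "$JOBNAME"; then
        qsub -N "$JOBNAME" -l h_rt=INFINITY -j y -o "$HOME/$JOBNAME.out" "$@"
    fi

    # A more defensive wrapper would check qstat's exit status and refuse to submit
    # when the job state cannot be determined; a server-side check (such as the JSV
    # that qcronsub is said to use) avoids trusting client-side qstat output at all.

With the crontab line quoted in the channel, such a wrapper runs once a minute, which is harmless only as long as the "already running" check is reliable.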