[00:00:14] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:00:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.984375/1.95, alarm hl:np_load_avg=1.944824/2.0, alarm hl:mem_free=238.000000M/350M, alarm hl:available=1/0
[00:02:31] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[00:04:14] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[00:06:12] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 408693 MB (7% inode=41%):
[00:16:47] Are there any tool roots around?
[00:17:17] @replag
[00:17:17] euphoria[afk]: s1-rr-a: 9h 43m 41s [+0.07 s/s]; s1-user: 9h 43m 41s [+0.07 s/s]; s3-rr-a: 38s [+0.00 s/s]; s3-user: 38s [+0.00 s/s]; s6-rr-a: 12s [-0.00 s/s]; s6-user: 12s [-0.00 s/s]
[00:17:23] Guess not
[00:17:28] :/
[00:18:13] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.327149/1.10, alarm hl:np_load_long=0.774414/1.55, alarm hl:mem_free=16573.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.327149/1.00, alarm hl:np_load_long=0.774414/1.50, alarm hl:mem_free=16573.000000M/600M, alarm hl:available=1/0
[00:18:13] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[00:18:21] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[00:19:12] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[00:19:12] Load avg. on willow is WARNING: WARNING - load average: 18.88, 20.43, 19.79
[00:20:30] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[00:25:21] Load avg. on willow is CRITICAL: CRITICAL - load average: 21.41, 20.49, 20.03
[00:29:23] [[Special:Log/newusers]] create * Coffee3384 * (New user account)
[00:30:13] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.10, alarm hl:np_load_long=0.885742/1.55, alarm hl:mem_free=16229.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.00, alarm hl:np_load_long=0.885742/1.50, alarm hl:mem_free=16229.000000M/600M, alarm hl:available=1/0
[00:36:21] [[User:Coffee3384]] !NM https://wiki.toolserver.org/w/index.php?oldid=7109&rcid=9384 * Coffee3384 * (+323) (Created page with "I'm a reasonably avid blogger and article author and luxuriate in talking about new things that I use. I've been trying to find a brand new coffee machine and got here throughout...")
[00:38:12] Load avg. on willow is WARNING: WARNING - load average: 19.33, 19.52, 19.98
[00:40:30] [[Special:Log/delete]] delete * MZMcBride * (deleted "[[User:Coffee3384]]": spam)
[00:40:48] [[Special:Log/block]] block * MZMcBride * (blocked [[User:Coffee3384]] with an expiry time of infinite (account creation disabled): inappropriate behavior)
[00:41:12] Load avg. on willow is CRITICAL: CRITICAL - load average: 21.32, 20.05, 20.09
[00:41:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[00:46:14] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 36411
[00:48:21] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 36518.000000
[00:58:21] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[01:00:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:01:22] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.962402/1.95, alarm hl:np_load_avg=2.629395/2.0, alarm hl:mem_free=407.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.962402/2.3, alarm hl:np_load_long=2.544434/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=407.000000M/150M, alarm hl:available=1/0
[01:03:31] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[01:05:15] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[01:07:11] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 410208 MB (7% inode=41%):
[01:17:13] Load avg. on willow is CRITICAL: CRITICAL - load average: 17.84, 19.54, 20.23
[01:18:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[01:20:31] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[01:24:13] Load avg. on willow is WARNING: WARNING - load average: 18.96, 19.48, 20.00
[01:25:20] Load avg. on willow is CRITICAL: CRITICAL - load average: 20.56, 19.77, 20.06
[01:26:11] Load avg. on willow is WARNING: WARNING - load average: 19.22, 19.56, 19.97
[01:41:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[01:47:12] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 39487
[01:48:21] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 39550.000000
[01:49:10] is anyone else here?
[01:53:21] SSH on adenia is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:54:21] SSH on adenia is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[01:55:12] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.193360/1.10, alarm hl:np_load_long=0.893555/1.55, alarm hl:mem_free=17967.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.193360/1.00, alarm hl:np_load_long=0.893555/1.50, alarm hl:mem_free=17967.000000M/600M, alarm hl:available=1/0
[01:57:11] Load avg. on willow is WARNING: WARNING - load average: 16.91, 16.97, 17.96
[01:58:12] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[01:58:31] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[02:00:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:01:00] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[02:01:31] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.675781/1.95, alarm hl:np_load_avg=2.349121/2.0, alarm hl:mem_free=635.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.675781/2.3, alarm hl:np_load_long=2.307617/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=635.000000M/150M, alarm hl:available=1/0
[02:03:31] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[02:06:13] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[02:07:11] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 410072 MB (7% inode=41%):
[02:15:11] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[02:15:23] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.148438/1.10, alarm hl:np_load_long=0.933594/1.55, alarm hl:mem_free=17637.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.148438/1.00, alarm hl:np_load_long=0.933594/1.50, alarm hl:mem_free=17637.000000M/600M, alarm hl:available=1/0
[02:16:12] Load avg. on willow is WARNING: WARNING - load average: 21.50, 21.10, 19.98
[02:18:12] Load avg. on willow is CRITICAL: CRITICAL - load average: 21.11, 20.84, 20.01
[02:18:22] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[02:20:31] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[02:21:42] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129934 MB (10% inode=99%):
[02:26:41] /sql on cassia is OK: DISK OK - free space: /sql 130546 MB (11% inode=99%):
[02:36:01] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[02:41:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[02:48:12] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 42612
[02:48:22] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 42620.000000
[02:59:22] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[03:00:12] Load avg. on willow is WARNING: WARNING - load average: 24.64, 20.00, 19.73
[03:00:23] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:01:12] Load avg. on willow is CRITICAL: CRITICAL - load average: 27.25, 22.17, 20.55
[03:01:42] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[03:01:42] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=3.363770/1.95, alarm hl:np_load_avg=2.772461/2.0, alarm hl:mem_free=526.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=3.363770/2.3, alarm hl:np_load_long=2.570801/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=526.000000M/150M, alarm hl:available=1/0
[03:02:31] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[03:03:31] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[03:06:22] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[03:07:12] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 408244 MB (7% inode=40%):
[03:18:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[03:29:21] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.025391/1.00, alarm hl:np_load_long=0.902344/1.50, alarm hl:mem_free=17409.000000M/600M, alarm hl:available=1/0
[03:30:21] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[03:35:22] Load avg. on willow is WARNING: WARNING - load average: 19.86, 19.54, 20.00
[03:36:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 22.08, 20.13, 20.17
[03:39:21] Load avg. on willow is WARNING: WARNING - load average: 18.49, 19.46, 19.91
[03:42:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[03:48:21] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 45620.000000
[03:49:11] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 45660
[03:53:21] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.10, alarm hl:np_load_long=0.955078/1.55, alarm hl:mem_free=16902.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.00, alarm hl:np_load_long=0.955078/1.50, alarm hl:mem_free=16902.000000M/600M, alarm hl:available=1/0
[03:54:22] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[03:59:32] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[04:00:23] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:02:32] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[04:02:43] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.390137/1.95, alarm hl:np_load_avg=2.358887/2.0, alarm hl:mem_free=129.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.390137/2.3, alarm hl:np_load_long=2.399414/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=129.000000M/150M, alarm hl:available=1/0
[04:03:24] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.618164/1.10, alarm hl:np_load_long=1.036133/1.55, alarm hl:mem_free=16874.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.618164/1.00, alarm hl:np_load_long=1.036133/1.50, alarm hl:mem_free=16874.000000M/600M, alarm hl:available=1/0
[04:03:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[04:06:23] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[04:07:11] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 408199 MB (7% inode=40%):
[04:11:31] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[04:12:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[04:18:22] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[04:24:22] Load avg. on willow is WARNING: WARNING - load average: 18.87, 18.68, 18.91
[04:26:42] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129966 MB (10% inode=99%):
[04:30:22] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[04:31:21] Load avg. on willow is WARNING: WARNING - load average: 18.71, 18.14, 18.50
[04:35:22] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[04:40:32] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[04:42:43] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[04:43:22] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.243164/1.10, alarm hl:np_load_long=1.166992/1.55, alarm hl:mem_free=16994.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.243164/1.00, alarm hl:np_load_long=1.166992/1.50, alarm hl:mem_free=16994.000000M/600M, alarm hl:available=1/0
[04:45:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[04:48:21] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 48520.000000
[04:49:11] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 48559
[04:52:02] DeltaQuad * [Toolserver-l] Toolserver having a bad day?
[04:52:51] /sql on cassia is OK: DISK OK - free space: /sql 132117 MB (11% inode=99%):
[04:59:31] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[05:00:23] Load avg. on willow is WARNING: WARNING - load average: 21.30, 20.08, 19.65
[05:00:33] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[05:01:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 24.11, 21.21, 20.08
[05:02:33] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[05:03:22] Load avg. on willow is WARNING: WARNING - load average: 19.57, 20.50, 19.96
[05:03:31] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.422363/1.95, alarm hl:np_load_avg=2.564453/2.0, alarm hl:mem_free=410.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.422363/2.3, alarm hl:np_load_long=2.494629/2.5, alarm hl:cpu=99.500000/98, alarm hl:mem_free=410.000000M/150M, alarm hl:available=1/0
[05:07:13] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 408163 MB (7% inode=40%):
[05:07:23] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[05:12:47] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[05:16:23] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[05:18:32] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[05:22:22] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.084961/1.00, alarm hl:np_load_long=1.104492/1.50, alarm hl:mem_free=15489.000000M/600M, alarm hl:available=1/0
[05:22:30] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[05:22:41] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[05:23:22] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[05:23:32] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[05:25:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[05:32:06] The query killer died apparently
[05:34:21] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.074219/1.00, alarm hl:np_load_long=1.043945/1.50, alarm hl:mem_free=16739.000000M/600M, alarm hl:available=1/0
[05:39:22] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[05:41:01] [[User:Ferrinojv1647]] M https://wiki.toolserver.org/w/index.php?diff=7110&oldid=7108&rcid=9387 * Dispenser * (-74) (Removed possible spam link)
[05:42:20] Load avg. on willow is CRITICAL: CRITICAL - load average: 19.54, 23.75, 22.66
[05:42:51] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[05:48:31] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 51408.000000
[05:50:12] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 51497
[05:59:21] Load avg. on willow is WARNING: WARNING - load average: 17.88, 18.12, 19.91
[05:59:41] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[06:00:41] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[06:03:41] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.317383/1.95, alarm hl:np_load_avg=2.305664/2.0, alarm hl:mem_free=687.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.317383/2.3, alarm hl:np_load_long=2.454102/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=687.000000M/150M, alarm hl:available=1/0
[06:08:12] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 408071 MB (7% inode=40%):
[06:11:22] Load avg. on willow is CRITICAL: CRITICAL - load average: 25.37, 20.64, 20.03
[06:13:42] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[06:16:23] Load avg. on willow is WARNING: WARNING - load average: 19.09, 20.04, 19.97
[06:18:32] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[06:23:31] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[06:23:43] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[06:27:52] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129890 MB (10% inode=99%):
[06:42:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[06:48:21] Free Memory on damiana is WARNING: WARNING - 7.0% (291676 kB) free!
[06:48:31] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 54386.000000
[06:50:12] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 54472
[06:55:00] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:00] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:11] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:23] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:23] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:23] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:23] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:43] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:43] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:44] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:44] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:45] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:45] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:55:46] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:56:01] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:56:01] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:56:01] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[06:56:11] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out)
[06:56:21] / on z-dat-s4-a is OK: DISK OK - free space: / 8450 MB (28% inode=85%):
[06:56:21] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 175743 MB (18% inode=99%):
[06:56:21] Load avg. on z-dat-s4-a is OK: OK - load average: 1.36, 1.89, 2.40
[06:56:21] Load avg. on z-dat-s3-a is OK: OK - load average: 1.36, 1.89, 2.40
[06:56:21] Load avg. on z-dat-s6-a is OK: OK - load average: 1.36, 1.89, 2.40
[06:56:22] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 97420 MB (24% inode=99%):
[06:56:22] / on z-dat-s3-a is OK: DISK OK - free space: / 8450 MB (28% inode=85%):
[06:56:23] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 1943 MB (99% inode=99%):
[06:56:23] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 175743 MB (18% inode=99%):
[06:56:24] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 1942 MB (99% inode=99%):
[06:56:24] / on z-dat-s7-a is OK: DISK OK - free space: / 8450 MB (28% inode=85%):
[06:56:25] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 1946 MB (99% inode=99%):
[06:56:25] / on z-dat-s6-a is OK: DISK OK - free space: / 8450 MB (28% inode=85%):
[06:56:26] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 101505 MB (25% inode=99%):
[06:56:40] SMTP on hyacinth is OK: SMTP OK - 0.004 sec. response time
[06:56:40] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[06:56:51] SMTP on z-dat-s7-a is OK: SMTP OK - 0.004 sec. response time
[06:56:52] SMTP on z-dat-s6-a is OK: SMTP OK - 0.003 sec. response time
[06:59:42] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[07:00:42] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[07:02:44] @replag
[07:02:45] matthewrbowker: s1-rr-a: 15h 18m 23s [+0.83 s/s]; s1-user: 15h 18m 24s [+0.83 s/s]; s2-user: 13s [+0.00 s/s]; s3-rr-a: 1m 24s [+0.00 s/s]; s3-user: 1m 24s [+0.00 s/s]
[07:02:52] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[07:03:42] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.335938/1.95, alarm hl:np_load_avg=2.263184/2.0, alarm hl:mem_free=793.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.335938/2.3, alarm hl:np_load_long=2.274414/2.5, alarm hl:cpu=96.300000/98, alarm hl:mem_free=793.000000M/150M, alarm hl:available=1/0
[07:05:00] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 1.599 second response time
[07:05:12] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.557 second response time
[07:05:42] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[07:06:52] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.124 second response time
[07:07:10] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.025 second response time
[07:07:43] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[07:09:10] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407964 MB (7% inode=40%):
[07:11:51] /sql on cassia is OK: DISK OK - free space: /sql 131554 MB (11% inode=99%):
[07:12:21] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[07:13:21] Load avg. on willow is WARNING: WARNING - load average: 19.63, 19.26, 18.68
[07:13:31] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[07:14:41] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[07:17:31] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[07:19:31] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default
[07:20:32] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[07:22:21] Free Memory on damiana is CRITICAL: CRITICAL - 5.0% (211300 kB) free!
[07:35:43] Load avg. on wolfsbane is CRITICAL: Connection refused by host
[07:35:53] Free Memory on damiana is OK: OK - 64.2% (2687592 kB) free.
[07:36:23] / on wolfsbane is CRITICAL: Connection refused by host
[07:36:23] /tmp on wolfsbane is CRITICAL: Connection refused by host
[07:36:23] Load avg. on willow is OK: OK - load average: 14.30, 12.58, 14.88
[07:36:33] Cluster on wolfsbane is CRITICAL: Connection refused by host
[07:36:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host
[07:37:04] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[07:42:03] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.199219/1.95, alarm hl:np_load_avg=1.946289/2.0, alarm hl:mem_free=305.000000M/350M, alarm hl:available=1/0
[07:42:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[07:45:23] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.152344/1.10, alarm hl:np_load_long=0.814453/1.55, alarm hl:mem_free=16921.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.152344/1.00, alarm hl:np_load_long=0.814453/1.50, alarm hl:mem_free=16921.000000M/600M, alarm hl:available=1/0
[07:46:24] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK
[07:48:53] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 57405.000000
[07:49:03] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK
[07:49:23] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.013672/1.00, alarm hl:np_load_long=0.863281/1.50, alarm hl:mem_free=17510.000000M/600M, alarm hl:available=1/0
[07:50:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 57472
[07:51:04] SMF on wolfsbane is CRITICAL: Connection refused by host
[07:53:24] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host
[07:58:23] Load avg. on willow is WARNING: WARNING - load average: 17.50, 17.75, 16.96
[07:59:43] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates).
[08:00:53] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[08:09:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407888 MB (7% inode=40%):
[08:09:39] [[Special:Log/newusers]] create * Duplifinder * (New user account)
[08:10:24] Load avg. on willow is CRITICAL: CRITICAL - load average: 30.91, 22.68, 19.28
[08:11:03] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=3.864258/1.95, alarm hl:np_load_avg=2.834961/2.0, alarm hl:mem_free=63.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=3.864258/2.3, alarm hl:np_load_long=2.410156/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=63.000000M/150M, alarm hl:available=1/0
[08:14:43] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[08:20:13] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default
[08:26:03] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:26:34] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:26:34] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:26:43] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[08:27:23] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[08:27:23] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[08:27:23] Load avg. on willow is WARNING: WARNING - load average: 18.50, 19.04, 19.97
[08:31:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 24.31, 20.24, 20.11
[08:32:24] Load avg. on willow is WARNING: WARNING - load average: 19.77, 19.67, 19.91
[08:34:24] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:34:24] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:34:34] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:34] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:34] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:34] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:34:34] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:34:34] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:34] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:35] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:43] Load avg. on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:54] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:54] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:34:54] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:03] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:03] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:03] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:13] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:13] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:13] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:13] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:23] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[08:35:44] Load avg. on wolfsbane is CRITICAL: Connection refused by host
[08:35:54] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out)
[08:36:04] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out)
[08:36:04] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out)
[08:36:14] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)
[08:36:14] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out)
[08:36:24] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out)
[08:36:24] / on wolfsbane is CRITICAL: Connection refused by host
[08:36:24] /tmp on wolfsbane is CRITICAL: Connection refused by host
[08:36:33] / on z-dat-s6-a is OK: DISK OK - free space: / 8445 MB (28% inode=85%):
[08:36:33] / on z-dat-s3-a is OK: DISK OK - free space: / 8445 MB (28% inode=85%):
[08:36:33] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 176483 MB (18% inode=99%):
[08:36:33] SMF on z-dat-s6-a is OK: OK - all services online
[08:36:34] SMF on z-dat-s7-a is OK: OK - all services online
[08:36:34] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 257.000000
[08:36:34] Cluster on wolfsbane is CRITICAL: Connection refused by host
[08:36:34] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2157 MB (99% inode=99%):
[08:36:35] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 176480 MB (18% inode=99%):
[08:36:44] MySQL slave on z-dat-s3-a is OK: Uptime: 4575982 Threads: 19 Questions: 5204170583 Slow queries: 254103 Opens: 39329129 Flush tables: 1 Open tables: 16384 Queries per second avg: 1137.279 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 274
[08:36:44] MySQL slave on z-dat-s6-a is OK: Uptime: 1516561 Threads: 17 Questions: 359833204 Slow queries: 90559 Opens: 3850358 Flush tables: 2 Open tables: 2879 Queries per second avg: 237.269 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 305
[08:36:44] MySQL slave on z-dat-s7-a is OK: Uptime: 5008403 Threads: 13 Questions: 1171899299 Slow queries: 145982 Opens: 9560766 Flush tables: 1 Open tables: 7466 Queries per second avg: 233.986 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 324
[08:36:44] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host
[08:36:44] Load avg. on z-dat-s7-a is OK: OK - load average: 0.18, 0.80, 1.30
[08:36:45] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2152 MB (99% inode=99%):
[08:36:45] SMF on hyacinth is OK: OK - all services online
[08:36:46] Load avg. on z-dat-s3-a is OK: OK - load average: 0.23, 0.80, 1.30
[08:36:46] MySQL on z-dat-s3-a is OK: Uptime: 4575989 Threads: 19 Questions: 5204171690 Slow queries: 254103 Opens: 39329202 Flush tables: 1 Open tables: 16384 Queries per second avg: 1137.278
[08:36:47] MySQL on z-dat-s7-a is OK: Uptime: 5008409 Threads: 10 Questions: 1171899819 Slow queries: 145986 Opens: 9560766 Flush tables: 1 Open tables: 7466 Queries per second avg: 233.986
[08:36:53] / on hyacinth is OK: DISK OK - free space: / 8445 MB (28% inode=85%):
[08:37:03] /tmp on hyacinth is OK: DISK OK - free space: /tmp 2203 MB (99% inode=99%):
[08:37:03] SMF on z-dat-s3-a is OK: OK - all services online
[08:37:03] SMF on z-dat-s4-a is OK: OK - all services online
[08:37:12] Load avg.
on hyacinth is OK: OK - load average: 1.82, 1.14, 1.40 [08:37:12] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [08:37:12] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:37:13] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:42:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [08:49:33] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:49:45] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:49:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 60420.000000 [08:50:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 60436 [08:52:03] SMF on wolfsbane is CRITICAL: Connection refused by host [08:53:03] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [08:53:24] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.045 second response time [08:53:24] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [08:53:33] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.018 second response time [08:55:23] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:24] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:44] Load avg. on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:03] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:03] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:03] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:03] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[08:56:12] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:13] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:13] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:13] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:13] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:13] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:56:14] Load avg. on hyacinth is OK: OK - load average: 0.30, 0.81, 1.28 [08:56:32] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 176700 MB (18% inode=99%): [08:56:43] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 1950 MB (99% inode=99%): [08:56:44] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 1951 MB (99% inode=99%): [08:56:44] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 101435 MB (25% inode=99%): [08:56:44] Load avg. on z-dat-s3-a is OK: OK - load average: 1.41, 1.02, 1.34 [08:56:44] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 1971 MB (99% inode=99%): [08:56:44] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 1969 MB (99% inode=99%): [08:56:44] Load avg. on z-dat-s6-a is OK: OK - load average: 1.43, 1.03, 1.34 [08:56:44] Load avg. on z-dat-s4-a is OK: OK - load average: 1.43, 1.03, 1.34 [08:56:45] Load avg. on z-dat-s7-a is OK: OK - load average: 1.43, 1.03, 1.34 [08:59:23] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.139649/1.10, alarm hl:np_load_long=0.855469/1.55, alarm hl:mem_free=17044.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.139649/1.00, alarm hl:np_load_long=0.855469/1.50, alarm hl:mem_free=17044.000000M/600M, alarm hl:available=1/0 [08:59:44] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). 
[08:59:53] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130048 MB (10% inode=99%): [09:00:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [09:00:44] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:00:53] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:01:32] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:53] /sql on cassia is OK: DISK OK - free space: /sql 130165 MB (11% inode=99%): [09:03:03] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.202637/1.95, alarm hl:np_load_avg=2.220215/2.0, alarm hl:mem_free=458.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.202637/2.3, alarm hl:np_load_long=2.162598/2.5, alarm hl:cpu=98.500000/98, alarm hl:mem_free=458.000000M/150M, alarm hl:available=1/0 [09:09:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407800 MB (7% inode=40%): [09:13:22] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.092774/1.00, alarm hl:np_load_long=0.931640/1.50, alarm hl:mem_free=16973.000000M/600M, alarm hl:available=1/0 [09:14:43] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:17:47] [[User:Tparis/Index]] !N https://wiki.toolserver.org/w/index.php?oldid=7111&rcid=9389 * Forstbirdo * (+43) (Created page with "Vidu Esperanta Vikipedio, uzanto:Forstbirdo") [09:20:13] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [09:31:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 30.09, 24.88, 20.61 [09:32:22] [[Special:Log/newusers]] create * Flightatt865 * (New user account) [09:35:44] Load avg.
on wolfsbane is CRITICAL: Connection refused by host [09:37:23] / on wolfsbane is CRITICAL: Connection refused by host [09:37:23] /tmp on wolfsbane is CRITICAL: Connection refused by host [09:37:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [09:37:44] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [09:39:19] hi all [09:39:29] why the heck is the toolserver down >_> [09:40:13] server: yarrow DOWN Re-Setup in progress. [09:40:52] and yet wolfsbane isn't working either [09:41:05] oh phewey, no one else is here to notice and fix it :( [09:42:21] Magog_the_Ogre: there is no 'the toolserver', and I have no idea why you would want to do anything with yarrow or wolfsbane directly [09:42:24] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:29] (as in: there are multiple servers) [09:42:33] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:44] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:42:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:42:53] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:53] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:42:58] login (=willow) works and http://toolserver.org also works [09:43:01] um ,because none of the toolservers are up right now? [09:43:03] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:43:07] login is working fine [09:43:11] the web interface is down [09:43:13] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:43:16] http://toolserver.org/~magog/fileinfo.php [09:43:33] NTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:35] http://toolserver.org/~magnus/commonshelper.php [09:43:37] how is that 'down'? 
[09:43:43] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [09:43:46] I see a web page [09:43:55] Internal Server Error (500) [09:43:55] The server encountered an internal error and is unable to complete your request at this time. If the problem persists, please contact the owner of the tool you are trying to use and inform them of this error, quoting the following information: [09:43:55] Request host: toolserver.org [09:43:55] Request path: GET /zeus-php-handler [09:43:55] HTTP server at toolserver.org - ts-admins [at] toolserver [dot] org [09:44:17] another error: Bad Gateway [09:44:17] An error occurred while communicating with another application or an upstream server. [09:44:17] There may be more information about this error in the server's error logs. [09:44:17] If you have any queries about this error, please e-mail ts-admins@toolserver.org. [09:44:17] Back to toolserver.org homepage [09:44:18] [09:44:20] [ Powered by Zeus Web Server ] [09:44:24] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:44:24] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:44:43] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [09:44:43] http://wolfsbane.toolserver.org/~magnus/commonshelper.php works for me [09:44:53] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:44:53] Load avg. on z-dat-s3-a is OK: OK - load average: 0.06, 0.94, 1.43 [09:44:53] SMTP on hyacinth is OK: SMTP OK - 3.617 sec. 
response time [09:44:53] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 247.000000 [09:44:53] MySQL on z-dat-s7-a is OK: Uptime: 5012494 Threads: 5 Questions: 1172116779 Slow queries: 146082 Opens: 9560796 Flush tables: 1 Open tables: 7466 Queries per second avg: 233.839 [09:44:54] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [09:44:54] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:45:03] MySQL slave on z-dat-s7-a is OK: Uptime: 5012505 Threads: 5 Questions: 1172120674 Slow queries: 146082 Opens: 9560796 Flush tables: 1 Open tables: 7466 Queries per second avg: 233.839 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 153 [09:45:04] ...just came back up [09:45:13] SMTP on z-dat-s4-a is OK: SMTP OK - 0.031 sec. response time [09:45:13] SMTP on z-dat-s7-a is OK: SMTP OK - 0.004 sec. response time [09:45:13] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:45:23] NTP on hyacinth is OK: NTP OK: Offset -0.002743 secs [09:45:24] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [09:45:24] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.195312/1.10, alarm hl:np_load_long=1.318359/1.55, alarm hl:mem_free=17003.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.195312/1.00, alarm hl:np_load_long=1.318359/1.50, alarm hl:mem_free=17003.000000M/600M, alarm hl:available=1/0 [09:45:24] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:47:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [09:47:45] ...and it's down again [09:49:23] Load avg. 
on willow is WARNING: WARNING - load average: 15.68, 18.45, 19.84 [09:49:28] probably resource issues then [09:49:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 63494.000000 [09:50:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 63507 [09:50:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 21.67, 19.26, 20.02 [09:50:57] the server is very slow indeed [09:51:05] =wolfsbane [09:51:23] Load avg. on willow is WARNING: WARNING - load average: 17.20, 18.42, 19.67 [09:52:11] valhallasw@wolfsbane:~$ top [09:52:11] -bash: fork: Not enough space [09:53:03] SMF on wolfsbane is CRITICAL: Connection refused by host [09:54:01] HEH [09:54:03] *heh [09:54:10] I wonder what's causing it to spasm [09:54:22] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [10:00:03] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [10:00:43] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). 
[10:00:53] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:03:13] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.967285/1.95, alarm hl:np_load_avg=2.056152/2.0, alarm hl:mem_free=921.000000M/350M, alarm hl:available=1/0 [10:03:54] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129667 MB (10% inode=99%): [10:09:53] /sql on cassia is OK: DISK OK - free space: /sql 130158 MB (11% inode=99%): [10:10:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407357 MB (7% inode=40%): [10:14:53] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:15:53] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130101 MB (10% inode=99%): [10:20:12] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [10:25:38] hmm. web server is slow for static pages too. [10:31:44] Sun Grid Engine execd on willow is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:32:14] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.791016/1.95, alarm hl:np_load_avg=2.591309/2.0, alarm hl:mem_free=81.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.791016/2.3, alarm hl:np_load_long=2.418945/2.5, alarm hl:cpu=99.300000/98, alarm hl:mem_free=81.000000M/150M, alarm hl:available=1/0 [10:35:54] Load avg. on wolfsbane is CRITICAL: Connection refused by host [10:37:23] / on wolfsbane is CRITICAL: Connection refused by host [10:37:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [10:38:23] /tmp on wolfsbane is CRITICAL: Connection refused by host [10:38:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [10:43:02] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [10:44:39] soo... 
doesn't solaris have an OOM killer like linux? [10:47:52] (commented) [ACCAPP-488] Solving links to disambiguation pages in Wikipedia via "Personalized" Crowdsourcing <https://jira.toolserver.org/browse/ACCAPP-488> (Amr Ebaid) [10:49:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 66331.000000 [10:50:14] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 66345 [10:50:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 27.36, 21.89, 20.30 [10:53:13] SMF on wolfsbane is CRITICAL: Connection refused by host [10:54:23] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [10:58:23] Load avg. on willow is WARNING: WARNING - load average: 17.96, 19.56, 19.86 [11:00:54] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [11:00:54] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:01:23] Load avg.
on willow is CRITICAL: CRITICAL - load average: 22.67, 20.33, 20.05 [11:03:33] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:04:54] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130103 MB (10% inode=99%): [11:05:54] /sql on cassia is OK: DISK OK - free space: /sql 130144 MB (11% inode=99%): [11:10:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407271 MB (7% inode=40%): [11:12:54] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130117 MB (10% inode=99%): [11:14:24] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.033 second response time [11:14:54] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:18:33] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:20:23] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [11:32:24] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.359375/1.95, alarm hl:np_load_avg=2.511719/2.0, alarm hl:mem_free=202.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.359375/2.3, alarm hl:np_load_long=2.522949/2.5, alarm hl:cpu=99.400000/98, alarm hl:mem_free=202.000000M/150M, alarm hl:available=1/0 [11:35:55] Load avg. on wolfsbane is CRITICAL: Connection refused by host [11:35:55] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:37:23] / on wolfsbane is CRITICAL: Connection refused by host [11:38:23] /tmp on wolfsbane is CRITICAL: Connection refused by host [11:38:32] Cluster on wolfsbane is CRITICAL: Connection refused by host [11:39:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [11:43:03] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:43:33] Load avg. 
on willow is WARNING: WARNING - load average: 16.76, 16.59, 18.02 [11:44:23] /tmp on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:44:33] Cluster on wolfsbane is OK: CLUSTER OK ! [11:44:43] Environment IPMI on wolfsbane is OK: ok: temperature ok fan ok voltage ok chassis ok [11:44:53] Load avg. on wolfsbane is OK: OK - load average: 0.32, 0.33, 0.34 [11:45:24] SMF on wolfsbane is OK: OK - all services online [11:45:24] / on wolfsbane is OK: DISK OK - free space: / 9541 MB (31% inode=93%): [11:45:24] /tmp on wolfsbane is OK: DISK OK - free space: /tmp 78 MB (27% inode=99%): [11:46:33] Sun Grid Engine execd on wolfsbane is UNKNOWN: NRPE: Unable to read output [11:49:12] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:49:33] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:50:13] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [11:50:42] Sun Grid Engine execd on wolfsbane is UNKNOWN: NRPE: Unable to read output [11:50:53] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 69041.000000 [11:51:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 69056 [11:51:53] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130089 MB (10% inode=99%): [11:57:22] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.005 second response time [11:57:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.006 second response time [11:58:53] /sql on cassia is OK: DISK OK - free space: /sql 130210 MB (11% inode=99%): [12:00:53] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). 
[12:00:54] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:01:33] /tmp on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:01:33] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:01:54] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:01:55] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129830 MB (10% inode=99%): [12:04:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [12:05:24] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.10, alarm hl:np_load_long=0.847656/1.55, alarm hl:mem_free=16713.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.00, alarm hl:np_load_long=0.847656/1.50, alarm hl:mem_free=16713.000000M/600M, alarm hl:available=1/0 [12:05:43] Cluster on wolfsbane is UNKNOWN: NRPE: Unable to read output [12:05:55] Environment IPMI on wolfsbane is OK: ok: temperature ok fan ok voltage ok chassis ok [12:06:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [12:06:34] Cluster on wolfsbane is CRITICAL: Connection refused by host [12:10:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407203 MB (7% inode=40%): [12:14:53] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:16:54] Load avg. on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:17:53] Load avg. 
on wolfsbane is OK: OK - load average: 0.24, 0.29, 0.36 [12:20:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [12:20:24] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.453125/1.10, alarm hl:np_load_long=0.905273/1.55, alarm hl:mem_free=16536.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.453125/1.00, alarm hl:np_load_long=0.905273/1.50, alarm hl:mem_free=16536.000000M/600M, alarm hl:available=1/0 [12:20:54] Load avg. on wolfsbane is CRITICAL: Connection refused by host [12:21:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [12:22:24] / on wolfsbane is CRITICAL: Connection refused by host [12:25:03] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:03] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:13] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:14] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:14] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:14] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:25] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:25] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:25:25] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:25] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:25] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:25] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:25] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:26] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:34] NTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:34] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:34] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:25:43] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:43] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:53] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:53] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:53] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:53] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:25:54] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:03] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:26:04] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:13] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:23] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:26:23] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [12:26:23] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [12:26:23] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [12:26:42] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [12:26:42] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [12:27:25] hmm, no admins around and ts has serious issues :-/ [12:27:43] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 101232 MB (25% inode=99%): [12:27:43] / on z-dat-s7-a is OK: DISK OK - free space: / 8444 MB (28% inode=85%): [12:27:43] / on z-dat-s3-a is OK: DISK OK - free space: / 8444 MB (28% inode=85%): [12:27:43] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2130 MB (99% inode=99%): [12:27:43] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 176043 MB (18% inode=99%): [12:27:43] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2137 MB (99% inode=99%): [12:27:43] / on z-dat-s6-a is OK: DISK OK - free space: / 8444 MB (28% inode=85%): [12:27:44] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 176043 MB (18% inode=99%): [12:27:44] SMF on z-dat-s3-a is OK: OK - all services online [12:27:45] SMF on z-dat-s7-a is OK: OK - all services online [12:27:45] SMF on z-dat-s6-a is OK: OK - all services online [12:27:46] MySQL on z-dat-s7-a is OK: Uptime: 5022272 Threads: 10 Questions: 1173837958 Slow queries: 146425 Opens: 9580460 Flush tables: 1 Open tables: 7439 Queries per second avg: 233.726 [12:28:02] MySQL slave on z-dat-s3-a is OK: Uptime: 4589865 Threads: 22 Questions: 5221313122 Slow queries: 254927 Opens: 39572942 Flush tables: 1 Open tables: 16384 Queries per second avg: 1137.574 Slave IO: Yes Slave SQL: Yes 
Seconds Behind Master: 273 [12:28:12] SMTP on z-dat-s6-a is OK: SMTP OK - 0.003 sec. response time [12:28:12] SMTP on z-dat-s7-a is OK: SMTP OK - 0.011 sec. response time [12:28:12] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:28:13] SMTP on z-dat-s3-a is OK: SMTP OK - 0.015 sec. response time [12:28:13] SMTP on z-dat-s4-a is OK: SMTP OK - 0.037 sec. response time [12:28:13] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:28:13] SMF on z-dat-s4-a is OK: OK - all services online [12:28:22] NTP on hyacinth is OK: NTP OK: Offset -0.000808 secs [12:28:22] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:28:23] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [12:28:23] / on z-dat-s4-a is OK: DISK OK - free space: / 8444 MB (28% inode=85%): [12:29:06] Danny_B|backup: we could just clobber the people running bots on the web server [12:30:05] yeah, those iw bots, they should have been killed already [12:30:22] totoazero - 661 MB RAM [12:30:25] for iw bot [12:30:38] ceradon 561 MB [12:30:52] another ceradon 508 [12:31:02] why ppl run more iw bots? [12:31:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [12:32:13] SMF on wolfsbane is CRITICAL: Connection refused by host [12:32:22] /tmp on wolfsbane is CRITICAL: Connection refused by host [12:32:25] Daniel_WMDE: is there anything you could do? 
[12:33:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.307129/1.95, alarm hl:np_load_avg=2.373535/2.0, alarm hl:mem_free=150.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.307129/2.3, alarm hl:np_load_long=2.267578/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=150.000000M/150M, alarm hl:available=1/0 [12:35:05] let's kill all iw bots now [12:35:22] johang@wolfsbane:~$ ps -A -o user,rss |sort |awk 'BEGIN { K = NULL; V = 0; N = 0; }; { if (K == $1) { V += $2; N += 1; } else { print K, V, N; K = $1; V = $2; N = 1; } }; END { print K, V, N; }' |sort -k 2 -n -r |head -n 10 [12:35:33] [[Roots]] N https://wiki.toolserver.org/w/index.php?oldid=7112&rcid=9391 * Dispenser * (+35) (Redirected page to [[System administrators]]) [12:35:33] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.814 second response time [12:35:37] cobi 1311552 144 [12:35:37] para 820268 109 [12:35:43] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.884 second response time [12:35:46] 144 processes, 1.3GB [12:36:23] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.007 second response time [12:36:53] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 2.910 second response time [12:37:29] about the same on willow [12:37:31] reza 903556 113 [12:37:31] beria 814340 94 [12:37:33] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [12:37:36] Reedy: Still got root? [12:37:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.315 second response time [12:43:03] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:43:33] Load avg.
on willow is WARNING: WARNING - load average: 17.17, 17.14, 17.49 [12:43:33] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:51:12] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 71601 [12:51:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 71619.000000 [12:57:13] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:57:43] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [13:00:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [13:00:54] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [13:00:54] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:03:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.194336/1.95, alarm hl:np_load_avg=2.165527/2.0, alarm hl:mem_free=439.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.194336/2.3, alarm hl:np_load_long=2.143555/2.5, alarm hl:cpu=99.600000/98, alarm hl:mem_free=439.000000M/150M, alarm hl:available=1/0 [13:03:23] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.202148/1.10, alarm hl:np_load_long=0.755860/1.55, alarm hl:mem_free=16730.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.202148/1.00, alarm hl:np_load_long=0.755860/1.50, alarm hl:mem_free=16730.000000M/600M, alarm hl:available=1/0 [13:06:23] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [13:10:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407201 MB (7% inode=40%): [13:10:54] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129824 MB (10% inode=99%): 
[13:15:53] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:20:53] Load avg. on wolfsbane is CRITICAL: Connection refused by host [13:21:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [13:22:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [13:22:53] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 5.970 second response time [13:23:23] / on wolfsbane is CRITICAL: Connection refused by host [13:31:54] (commented) [UTRS-98] Account creation is impossible <https://jira.toolserver.org/browse/UTRS-98> (Martijn Hoekstra) [13:32:23] /tmp on wolfsbane is CRITICAL: Connection refused by host [13:32:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [13:32:53] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.640 second response time [13:33:12] SMF on wolfsbane is CRITICAL: Connection refused by host [13:33:56] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 7.680 second response time [13:34:53] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.938 second response time [13:37:34] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [13:43:02] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:43:34] Load avg. 
on willow is WARNING: WARNING - load average: 15.70, 15.61, 16.55 [13:51:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 72209 [13:51:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [13:51:53] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 72197.000000 [13:54:34] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.796387/1.95, alarm hl:np_load_avg=1.857910/2.0, alarm hl:mem_free=213.000000M/350M, alarm hl:available=1/0 [13:57:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [14:00:53] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [14:00:53] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:10:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 407023 MB (7% inode=40%): [14:16:53] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:19:33] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.161133/1.95, alarm hl:np_load_avg=2.222656/2.0, alarm hl:mem_free=161.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.161133/2.3, alarm hl:np_load_long=2.164551/2.5, alarm hl:cpu=98.700000/98, alarm hl:mem_free=161.000000M/150M, alarm hl:available=1/0 [14:21:53] Load avg. 
on wolfsbane is CRITICAL: Connection refused by host [14:22:22] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [14:23:22] / on wolfsbane is CRITICAL: Connection refused by host [14:23:42] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [14:32:33] /tmp on wolfsbane is CRITICAL: Connection refused by host [14:32:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [14:33:13] SMF on wolfsbane is CRITICAL: Connection refused by host [14:37:34] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [14:37:44] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:37:54] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:41:34] /sql on z-dat-s4-a is WARNING: DISK WARNING - free space: /sql 39674 MB (9% inode=99%): [14:43:13] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [14:44:34] Load avg. on willow is WARNING: WARNING - load average: 17.75, 17.91, 17.56 [14:50:33] /sql on z-dat-s4-a is CRITICAL: DISK CRITICAL - free space: /sql 21944 MB (5% inode=99%): [14:51:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 71198 [14:51:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 71187.000000 [14:52:33] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 53395 MB (13% inode=99%): [14:53:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.012 second response time [14:53:51] @replag [14:53:56] Krinkle: s1-rr-a: 19h 45m 35s [-]; s1-user: 19h 45m 35s [-]; s2-user: 22m 12s [-] [14:54:02] Krinkle * Re: [Toolserver-l] Toolserver having a bad day? 
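The Nagios checks in this log report s1 replag in seconds (for example, `ts_rc_age()` returned 71187), while the `@replag` bot answers in hours, minutes, and seconds. A minimal Python sketch of that conversion follows; the function name `format_replag` is hypothetical and is not taken from any Toolserver tool, and the output format only imitates what the bot prints here (zero-valued units omitted, as in `1h 9s`).

```python
def format_replag(seconds):
    """Render a lag in seconds as e.g. '19h 46m 27s', skipping zero units."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)   # whole hours, leftover seconds
    minutes, secs = divmod(rem, 60)    # whole minutes, leftover seconds
    units = ((hours, "h"), (minutes, "m"), (secs, "s"))
    parts = [f"{value}{unit}" for value, unit in units if value]
    return " ".join(parts) or "0s"     # an all-zero lag prints as '0s'


# The 14:51:54 check value from this log.
print(format_replag(71187.0))
```

This puts the 14:51:54 check (71187 seconds) at roughly 19 h 46 m, in line with the 19h 45m 35s the bot reported for s1 two minutes later.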
[14:55:34] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.006 second response time [15:00:54] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [15:00:54] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:10:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 406918 MB (7% inode=40%): [15:16:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:16:54] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:21:43] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:21:53] Load avg. on wolfsbane is CRITICAL: Connection refused by host [15:21:53] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:22] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [15:23:23] / on wolfsbane is CRITICAL: Connection refused by host [15:23:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [15:25:33] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.277 second response time [15:25:34] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.968750/1.95, alarm hl:np_load_avg=1.881836/2.0, alarm hl:mem_free=362.000000M/350M, alarm hl:available=1/0 [15:28:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:28:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.055 second response time [15:32:33] /tmp on wolfsbane is CRITICAL: Connection refused by host [15:32:33] Cluster on wolfsbane is CRITICAL: Connection refused by host [15:33:14] SMF on wolfsbane is CRITICAL: Connection refused by host [15:37:34] Sun Grid Engine execd on wolfsbane is CRITICAL: 
Connection refused by host [15:37:53] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:38:43] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:38:55] [[Special:Log/newusers]] create * Fma12 * (New user account) [15:43:23] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:44:34] Load avg. on willow is WARNING: WARNING - load average: 15.38, 15.38, 15.40 [15:47:02] Shubinator * Re: [Toolserver-l] Toolserver having a bad day? [15:51:13] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 69936 [15:51:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 69912.000000 [15:52:34] Load avg. on willow is OK: OK - load average: 13.24, 14.41, 14.94 [16:00:54] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [16:01:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:05:33] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.187 second response time [16:08:35] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.522 second response time [16:08:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.033 second response time [16:09:43] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 9.527 second response time [16:10:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 406798 MB (7% inode=40%): [16:11:53] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:12:33] Load avg. on willow is WARNING: WARNING - load average: 15.35, 14.80, 14.67 [16:14:34] Load avg. 
on willow is OK: OK - load average: 14.77, 14.88, 14.72 [16:16:54] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:18:23] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1951.000000 [16:18:43] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.262 second response time [16:21:54] Load avg. on wolfsbane is CRITICAL: Connection refused by host [16:22:23] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [16:23:23] / on wolfsbane is CRITICAL: Connection refused by host [16:23:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [16:32:34] /tmp on wolfsbane is CRITICAL: Connection refused by host [16:32:34] Load avg. on willow is WARNING: WARNING - load average: 16.96, 15.50, 14.77 [16:32:34] Cluster on wolfsbane is CRITICAL: Connection refused by host [16:33:12] SMF on wolfsbane is CRITICAL: Connection refused by host [16:38:34] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [16:40:43] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 9.949 second response time [16:41:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:43:22] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:44:54] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.454 second response time [16:46:24] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3627.000000 [16:48:03] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 8.412 second response time [16:48:56] @replag [16:48:56] Joan: s1-rr-a: 18h 51m 35s [-0.47 s/s]; s1-user: 18h 51m 35s [-0.47 s/s]; s2-user: 31m 2s [+0.08 s/s]; s2-user-c: 1h 9s [-]; s3-rr-a: 33s [-]; s3-user: 33s [-]; s5-user-c: 1h 9s [-] [16:49:23] s4 replag 
on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3590.000000 [16:51:23] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 67805 [16:51:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 67793.000000 [16:57:33] Load avg. on willow is WARNING: WARNING - load average: 13.10, 14.77, 15.25 [16:58:34] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.051 second response time [16:59:34] Load avg. on willow is OK: OK - load average: 13.10, 14.23, 14.99 [17:00:54] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [17:01:02] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:01:45] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:02:34] Load avg. on willow is WARNING: WARNING - load average: 14.91, 14.73, 15.06 [17:09:33] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.092 second response time [17:10:23] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1757.000000 [17:10:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 406681 MB (7% inode=40%): [17:10:53] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.109 second response time [17:13:22] ohi guyz, password reminder doesn't work [17:14:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:16:33] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.865234/1.95, alarm hl:np_load_avg=2.011230/2.0, alarm hl:mem_free=195.000000M/350M, alarm hl:available=1/0 [17:17:54] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:19:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:20:51] @replag [17:20:51] Joan: 
s1-rr-a: 18h 54m 58s [+0.11 s/s]; s1-user: 18h 54m 58s [+0.11 s/s]; s2-user-c: 2m 42s [-1.80 s/s]; s3-rr-a: 15s [-0.01 s/s]; s3-user: 15s [-0.01 s/s]; s5-user-c: 2m 42s [-1.80 s/s] [17:21:53] Load avg. on wolfsbane is CRITICAL: Connection refused by host [17:22:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [17:22:34] Load avg. on willow is OK: OK - load average: 12.66, 14.01, 15.00 [17:23:22] / on wolfsbane is CRITICAL: Connection refused by host [17:24:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [17:28:54] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.842 second response time [17:31:43] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 6.597 second response time [17:32:35] /tmp on wolfsbane is CRITICAL: Connection refused by host [17:32:35] Cluster on wolfsbane is CRITICAL: Connection refused by host [17:33:24] SMF on wolfsbane is CRITICAL: Connection refused by host [17:34:54] /sql on cassia is OK: DISK OK - free space: /sql 130537 MB (11% inode=99%): [17:37:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.735840/1.95, alarm hl:np_load_avg=1.740234/2.0, alarm hl:mem_free=236.000000M/350M, alarm hl:available=1/0 [17:39:02] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129296 MB (10% inode=99%): [17:39:34] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [17:43:23] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:45:03] /sql on cassia is OK: DISK OK - free space: /sql 130170 MB (11% inode=99%): [17:45:34] Load avg. 
on willow is WARNING: WARNING - load average: 19.00, 16.05, 15.09 [17:49:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:51:23] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 67683 [17:51:54] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 67684.000000 [17:52:34] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.994629/1.95, alarm hl:np_load_avg=2.004883/2.0, alarm hl:mem_free=627.000000M/350M, alarm hl:available=1/0 [17:54:34] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.356445/1.10, alarm hl:np_load_long=0.938477/1.55, alarm hl:mem_free=16811.000000M/500M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.356445/1.00, alarm hl:np_load_long=0.938477/1.50, alarm hl:mem_free=16811.000000M/600M, alarm hl:available=1/0 [17:55:34] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:57:49] dab [17:58:03] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130064 MB (10% inode=99%): [18:01:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:01:03] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). 
[18:02:03] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 6.258 second response time [18:06:04] /sql on cassia is OK: DISK OK - free space: /sql 130463 MB (11% inode=99%): [18:06:43] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.591 second response time [18:07:44] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 8.606 second response time [18:08:05] @replag all [18:08:05] Joan: s1-rr-a: 18h 40m 29s [-0.31 s/s]; s1-user: 18h 40m 29s [-0.31 s/s]; s2-user: 2s [-0.39 s/s]; s2-user-c: 0s [-0.06 s/s]; s3-rr-a: 24s [+0.00 s/s]; s3-user: 24s [+0.00 s/s]; s4-rr-a: 0s [-]; s4-user: 0s [-] [18:08:06] Joan: s5-rr-a: 0s [-]; s5-user: 0s [-]; s5-user-c: 0s [-0.06 s/s]; s6-rr-a: 1s [-]; s6-user: 1s [-]; s7-rr-a: 0s [-]; s7-user: 0s [-] [18:09:03] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130057 MB (10% inode=99%): [18:10:34] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 406594 MB (7% inode=40%): [18:15:33] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.009766/1.95, alarm hl:np_load_avg=2.175781/2.0, alarm hl:mem_free=217.000000M/350M, alarm hl:available=1/0 [18:18:03] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [18:21:54] Load avg. 
on wolfsbane is CRITICAL: Connection refused by host [18:22:23] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [18:23:23] / on wolfsbane is CRITICAL: Connection refused by host [18:24:43] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [18:25:34] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:26:02] /sql on cassia is OK: DISK OK - free space: /sql 130261 MB (11% inode=99%): [18:29:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.120605/1.95, alarm hl:np_load_avg=1.950684/2.0, alarm hl:mem_free=696.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.120605/2.3, alarm hl:np_load_long=2.012207/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=696.000000M/150M, alarm hl:available=1/0 [18:29:54] (created) [MAGNUS-316] link to ESIS library in cas.php to be changed; Magnus' tools; Minor Bug <https://jira.toolserver.org/browse/MAGNUS-316> (Gianluigi Gamba) [18:32:34] /tmp on wolfsbane is CRITICAL: Connection refused by host [18:33:24] SMF on wolfsbane is CRITICAL: Connection refused by host [18:33:34] Cluster on wolfsbane is CRITICAL: Connection refused by host [18:34:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:38:54] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.572 second response time [18:39:33] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [18:39:33] Load avg. on willow is OK: OK - load average: 12.23, 13.44, 14.98 [18:40:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:43:43] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:46:34] Load avg. 
on willow is WARNING: WARNING - load average: 16.66, 15.25, 15.19 [18:46:34] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.128418/1.95, alarm hl:np_load_avg=1.910156/2.0, alarm hl:mem_free=522.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.128418/2.3, alarm hl:np_load_long=1.900391/2.5, alarm hl:cpu=98.400000/98, alarm hl:mem_free=522.000000M/150M, alarm hl:available=1/0 [18:49:53] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.604 second response time [18:51:53] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 65939.000000 [18:52:23] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 65932 [18:52:33] Load avg. on willow is OK: OK - load average: 13.67, 14.53, 14.91 [18:59:35] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.021484/1.00, alarm hl:np_load_long=0.780273/1.50, alarm hl:mem_free=17272.000000M/600M, alarm hl:available=1/0 [19:00:33] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [19:01:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:01:03] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [19:05:33] Load avg. 
on willow is WARNING: WARNING - load average: 17.20, 15.46, 14.93 [19:07:44] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 2.741 second response time [19:08:29] [[User:Flightatt865]] !NM https://wiki.toolserver.org/w/index.php?oldid=7113&rcid=9393 * Flightatt865 * (+250) (Created page with "I created this page about flight attendant salary to provide orientation about the salaries ranges for a stewards My Webpage: [http://www.medical-assistant-salary-online.org/fl...") [19:09:44] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.673 second response time [19:10:43] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 406494 MB (7% inode=40%): [19:10:43] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.005 second response time [19:14:54] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 8.124 second response time [19:18:02] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:21:54] Load avg. on wolfsbane is CRITICAL: Connection refused by host [19:22:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [19:23:23] / on wolfsbane is CRITICAL: Connection refused by host [19:23:58] [[Special:Log/newusers]] create * Caixaktn23 * (New user account) [19:24:44] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [19:27:10] [[User:Flightatt865]] ! 
https://wiki.toolserver.org/w/index.php?diff=7114&oldid=7113&rcid=9395 * 82.61.20.46 * (+16) (delete) [19:32:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:32:34] /tmp on wolfsbane is CRITICAL: Connection refused by host [19:33:44] Cluster on wolfsbane is CRITICAL: Connection refused by host [19:34:03] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130096 MB (10% inode=99%): [19:34:23] SMF on wolfsbane is CRITICAL: Connection refused by host [19:37:04] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.572 second response time [19:38:03] [[User:Caixaktn23]] !NM https://wiki.toolserver.org/w/index.php?oldid=7115&rcid=9396 * Caixaktn23 * (+503) (Created page with "Por favor, Asegúrese de Visitar página web acerca de la Caixa, la Caixa particulares, bancos y contenedores de España. El sitio de Blog ser considerado como un Zona el lugar...") [19:38:05] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 4.898 second response time [19:38:43] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.019 second response time [19:39:33] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [19:40:04] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.046 second response time [19:43:44] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:44:32] What on Earth... 
[19:44:43] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.297363/1.95, alarm hl:np_load_avg=2.122070/2.0, alarm hl:mem_free=195.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.297363/2.3, alarm hl:np_load_long=2.078613/2.5, alarm hl:cpu=100.000000/98, alarm hl:mem_free=195.000000M/150M, alarm hl:available=1/0 [19:45:09] [[Special:Log/delete]] delete * MZMcBride * (deleted "[[User:Flightatt865]]": spam) [19:45:25] [[Special:Log/block]] block * MZMcBride * (blocked [[User:Flightatt865]] with an expiry time of infinite (account creation disabled): inappropriate behavior) [19:45:47] [[Special:Log/delete]] delete * MZMcBride * (deleted "[[User:Caixaktn23]]": spam) [19:46:00] [[Special:Log/block]] block * MZMcBride * (blocked [[User:Caixaktn23]] with an expiry time of infinite (account creation disabled): inappropriate behavior) [19:46:03] /sql on cassia is OK: DISK OK - free space: /sql 130460 MB (11% inode=99%): [19:50:23] SMF on willow is OK: OK - all services online [19:51:03] /sql on cassia is WARNING: DISK WARNING - free space: /sql 129869 MB (10% inode=99%): [19:52:04] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 64918.000000 [19:52:23] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 64892 [19:52:34] hey guys [19:52:36] what's up [19:52:48] I logged on about 12 hours ago to report yarrow was having issues [19:52:52] now willow is having issues [19:52:59] -bash: fork: Not enough space [19:53:03] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:57:13] willow and wolfsbane is out of memory basically [19:57:52] due to people running various bots that hog all memory. [19:58:52] I looked at top and ps aux yesterday. [19:58:59] There was some strangeness. [19:59:12] No DaB. or nosy, I guess? 
[19:59:37] You'd think with like 300 accounts, you could find a few sysadmins willing to volunteer some time as roots... [19:59:40] I remember the last time this happened there was something I ran that was able to pinpoint the greatest transgressor [20:00:11] Well, there may be a few programs using a lot of memory. But what I was seeing was a lot of small processes using a bit of memory. [20:00:20] a lot --> 100+ in at least one case. [20:00:51] user darafsh and user reza each have 100 processes @ 900 MB at willow for example [20:01:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:01:03] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [20:01:14] I don't even know how this works, because I have a process that runs daily, and once or twice a month, it just dies an ugly out-of-memory death [20:01:28] I have to run it on my local machine [20:01:43] johang: Yeah... needs a root with a cluebat. [20:01:45] so I don't know how they manage to run something with enough memory to kill the TS [20:02:04] on wolfsbane user cobi has 140 processes @ 1.2GB (I thought 1GB was the limit!) [20:02:05] maybe it's a PHP thing, or maybe my program uses an outstandingly high amount of memory (it does) [20:02:14] wth [20:02:24] / on wolfsbane is OK: DISK OK - free space: / 8325 MB (27% inode=93%): [20:02:34] /tmp on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:02:37] he is using up almost 200GB of RAM then?! [20:02:44] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:02:44] Cluster on wolfsbane is OK: CLUSTER OK ! [20:02:47] no, total 1.2GB [20:02:53] Environment IPMI on wolfsbane is OK: ok: temperature ok fan ok voltage ok chassis ok [20:03:03] Load avg. 
on wolfsbane is OK: OK - load average: 1.72, 1.35, 1.07 [20:03:33] /tmp on wolfsbane is OK: DISK OK - free space: /tmp 319 MB (74% inode=99%): [20:03:34] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state [20:05:28] oh whew [20:05:44] Load avg. on willow is WARNING: WARNING - load average: 16.83, 16.66, 16.89 [20:05:50] that's high but not pants-shittingly high [20:06:00] (pardon my French) [20:06:05] lol [20:06:24] 1.2GB is 15% of all memory on wolfsbane [20:06:42] question [20:06:48] Load is actually better today than it was yesterday. [20:06:57] But it's still fairly high. [20:07:07] earlier today I wouldn't even run "ls" [20:07:11] if we have a bot running on a one-time basis, but over a long time, should we be using job scheduling for that? [20:07:22] Magog_the_Ogre: yes. [20:10:03] /sql on cassia is OK: DISK OK - free space: /sql 130475 MB (11% inode=99%): [20:11:43] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 403818 MB (7% inode=40%): [20:13:03] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130052 MB (10% inode=99%): [20:17:43] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [20:23:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [20:25:23] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:26:23] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [20:26:44] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:27:12] Environment IPMI on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
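The per-user figures quoted in this discussion (total resident memory and number of processes per user) come from grouping `ps -A -o user,rss` output by user. A rough Python sketch of that aggregation is below, assuming the two-column `user rss` layout with RSS in kilobytes. The function name `tally_rss` and the sample users are made up for illustration and are not data from the Toolserver hosts.

```python
from collections import defaultdict


def tally_rss(ps_output, top=10):
    """Sum RSS (KB) and count processes per user from 'user rss' lines.

    Returns up to `top` tuples (user, total_rss_kb, process_count),
    largest memory consumer first.
    """
    totals = defaultdict(lambda: [0, 0])  # user -> [rss_kb, process_count]
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not "<user> <number>".
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        user, rss_kb = fields[0], int(fields[1])
        totals[user][0] += rss_kb
        totals[user][1] += 1
    ranked = sorted(
        ((user, kb, count) for user, (kb, count) in totals.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:top]


# Made-up sample input, in the shape `ps -A -o user,rss` prints.
SAMPLE = """USER RSS
alice 1000
bob 500
alice 2500
"""

for user, kb, count in tally_rss(SAMPLE):
    print(user, kb, count)
```

On a live host the input could be captured with `subprocess.run(["ps", "-A", "-o", "user,rss"], capture_output=True, text=True).stdout`. Grouping in a dictionary avoids the need to pre-sort the input, which the shell pipeline in the log relies on.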
[20:27:53] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:28:04] Environment IPMI on adenia is OK: ok: temperature ok fan ok voltage ok chassis ok [20:28:13] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:28:34] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state [20:32:04] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:32:33] Sun Grid Engine execd on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:32:44] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.972168/1.95, alarm hl:np_load_avg=1.887695/2.0, alarm hl:mem_free=589.000000M/350M, alarm hl:available=1/0 [20:33:44] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [20:34:44] Load avg. on willow is OK: OK - load average: 12.92, 14.21, 14.96 [20:37:04] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [20:37:44] Cluster on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:38:44] Cluster on wolfsbane is OK: CLUSTER OK ! [20:40:44] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.022 second response time [20:42:05] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.791 second response time [20:43:03] Load avg. on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [20:43:12] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:22] SMF on wolfsbane is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[20:43:54] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:43:54] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:44:04] /sql on cassia is OK: DISK OK - free space: /sql 130642 MB (11% inode=99%): [20:44:12] Load avg. on wolfsbane is OK: OK - load average: 0.33, 0.41, 0.61 [20:46:44] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.209961/1.95, alarm hl:np_load_avg=1.842285/2.0, alarm hl:mem_free=454.000000M/350M, alarm hl:available=1/0 [20:47:44] Load avg. on willow is WARNING: WARNING - load average: 15.13, 14.56, 14.38 [20:48:44] Load avg. on willow is OK: OK - load average: 14.82, 14.60, 14.41 [20:51:59] [[Special:Log/newusers]] create * Caixakhn43 * (New user account) [20:52:03] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 63548.000000 [20:52:04] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:52:23] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 63525 [20:53:04] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:58:03] toolserver.org HTTP on wolfsbane is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.774 second response time [20:59:44] Cluster on wolfsbane is CRITICAL: Connection refused by host [20:59:53] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [21:00:04] Load avg. on wolfsbane is CRITICAL: Connection refused by host [21:00:22] / on wolfsbane is CRITICAL: Connection refused by host [21:00:32] /tmp on wolfsbane is CRITICAL: Connection refused by host [21:01:04] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:01:04] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [21:02:44] Load avg. 
on willow is WARNING: WARNING - load average: 16.25, 15.60, 14.82 [21:05:23] SMF on wolfsbane is CRITICAL: Connection refused by host [21:05:56] toolserver.org HTTP on ortelius is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 3.225 second response time [21:07:53] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.445 second response time [21:11:43] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 403692 MB (7% inode=40%): [21:17:32] Sun Grid Engine execd on wolfsbane is CRITICAL: Connection refused by host [21:23:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [21:26:39] [[User:Caixakhn43]] !NM https://wiki.toolserver.org/w/index.php?oldid=7116&rcid=9402 * Caixakhn43 * (+515) (Created page with "Por favor, Asegúrese de Visitar web acerca de la Caixa, la Caixa particulares, bancos instituciones bancarias las instituciones financieras y contenedores de España. 
El sitio...") [21:44:03] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [21:44:53] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.978516/1.95, alarm hl:np_load_avg=2.013184/2.0, alarm hl:mem_free=538.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.978516/2.3, alarm hl:np_load_long=2.047363/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=538.000000M/150M, alarm hl:available=1/0 [21:46:35] @replag [21:46:35] Joan: s1-rr-a: 16h 50m 2s [-0.51 s/s]; s1-user: 16h 50m 2s [-0.51 s/s] [21:52:04] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 60291.000000 [21:52:22] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 60262 [21:53:04] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:59:43] Cluster on wolfsbane is CRITICAL: Connection refused by host [22:00:04] Environment IPMI on wolfsbane is CRITICAL: Connection refused by host [22:00:04] Load avg. on wolfsbane is CRITICAL: Connection refused by host [22:00:23] / on wolfsbane is CRITICAL: Connection refused by host [22:00:33] /tmp on wolfsbane is CRITICAL: Connection refused by host [22:01:04] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [22:01:12] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:02:43] Load avg. on willow is WARNING: WARNING - load average: 16.71, 16.96, 16.86 [22:03:03] Environment IPMI on wolfsbane is OK: ok: temperature ok fan ok voltage ok chassis ok [22:03:03] Load avg. on wolfsbane is OK: OK - load average: 1.46, 1.09, 0.92 [22:03:22] / on wolfsbane is OK: DISK OK - free space: / 7858 MB (26% inode=93%): [22:03:33] /tmp on wolfsbane is OK: DISK OK - free space: /tmp 987 MB (89% inode=99%): [22:03:44] Cluster on wolfsbane is OK: CLUSTER OK ! 
[22:06:23] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [22:09:13] /sql on cassia is WARNING: DISK WARNING - free space: /sql 130046 MB (10% inode=99%): [22:11:42] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 402297 MB (7% inode=40%): [22:14:25] [[Special:Log/delete]] delete * MZMcBride * (deleted "[[User:Caixakhn43]]": spam) [22:15:08] [[Special:Log/block]] block * MZMcBride * (blocked [[User:Caixakhn43]] with an expiry time of infinite (account creation disabled): inappropriate behavior) [22:17:33] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state [22:17:53] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:20:54] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.929688/1.95, alarm hl:np_load_avg=1.921387/2.0, alarm hl:mem_free=223.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.929688/2.3, alarm hl:np_load_long=2.013184/2.5, alarm hl:cpu=99.200000/98, alarm hl:mem_free=223.000000M/150M, alarm hl:available=1/0 [22:23:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [22:36:53] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:44:53] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.052246/1.95, alarm hl:np_load_avg=1.940918/2.0, alarm hl:mem_free=491.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.052246/2.3, alarm hl:np_load_long=1.958496/2.5, alarm hl:cpu=99.900000/98, alarm hl:mem_free=491.000000M/150M, alarm hl:available=1/0 [22:45:02] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [22:49:14] SMF on 
willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [22:50:13] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:52:13] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 57989.000000 [22:52:24] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 57986 [22:58:53] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:01:14] APT on yarrow is CRITICAL: APT CRITICAL: 9 packages available for upgrade (9 critical updates). [23:01:14] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:01:53] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.219238/1.95, alarm hl:np_load_avg=2.029785/2.0, alarm hl:mem_free=615.000000M/350M, alarm hl:available=1/0 [23:02:43] Load avg. on willow is WARNING: WARNING - load average: 16.45, 16.09, 16.12 [23:07:23] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver [23:11:43] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 400934 MB (7% inode=40%): [23:15:14] SMF on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[23:16:14] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:17:44] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state [23:23:29] @replag [23:23:29] Joan: s1-rr-a: 15h 38m 11s [-0.74 s/s]; s1-user: 15h 38m 11s [-0.74 s/s]; s3-rr-a: 21s [-0.00 s/s]; s3-user: 21s [-0.00 s/s] [23:23:44] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default offline: svc:/system/cluster/scsymon-srv:default [23:45:03] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [23:52:14] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 54502.000000 [23:52:22] MySQL slave on thyme is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 54485