[00:00:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.16, 19.79, 17.00 [00:09:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.670410/1.75, alarm hl:np_load_avg=1.995117/2.0, alarm hl:mem_free=130.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.670410/1.9, alarm hl:np_load_long=2.025391/2.25, alarm hl:mem_free=130.000000M/200M, alarm hl:available=1/0 [00:13:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [00:20:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [00:20:48] Load avg. on willow is WARNING: WARNING - load average: 21.85, 20.15, 18.51 [00:21:38] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:25:08] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [00:32:27] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 174363.000000 [00:37:00] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [00:37:18] SSH on adenia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:39:18] SSH on adenia is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [00:39:38] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [00:41:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [00:48:01] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:51:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.170410/1.75, alarm hl:np_load_avg=2.074219/2.0, alarm hl:mem_free=780.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.170410/1.9, alarm hl:np_load_long=2.117188/2.25, alarm hl:mem_free=780.000000M/200M, alarm hl:available=1/0 [01:10:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 27.78, 22.72, 20.06 [01:11:49] Load avg. on willow is WARNING: WARNING - load average: 21.90, 22.04, 20.00 [01:12:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 21.16, 21.88, 20.07 [01:13:38] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [01:20:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [01:21:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:25:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [01:32:28] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 177963.000000 [01:37:59] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [01:39:38] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [01:41:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [01:43:19] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.275391/1.10, alarm hl:np_load_long=0.657227/1.55, alarm hl:mem_free=19463.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.275391/1.00, alarm hl:np_load_long=0.657227/1.50, alarm hl:mem_free=19463.000000M/350M, alarm hl:available=1/0 [01:46:18] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [01:50:48] Load avg. on willow is WARNING: WARNING - load average: 24.92, 19.83, 18.82 [01:52:07] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.202148/1.75, alarm hl:np_load_avg=2.337891/2.0, alarm hl:mem_free=271.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.202148/1.9, alarm hl:np_load_long=2.311523/2.25, alarm hl:mem_free=271.000000M/200M, alarm hl:available=1/0 [02:00:48] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 30.61, 21.77, 19.51 [02:01:49] Load avg. on willow is WARNING: WARNING - load average: 19.75, 20.27, 19.12 [02:10:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 35.04, 22.95, 20.23 [02:13:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [02:20:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [02:21:38] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:25:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [02:32:28] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 181564.000000 [02:38:20] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [02:39:38] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [02:41:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [02:49:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 19.52, 20.14, 20.47 [02:52:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.259277/1.75, alarm hl:np_load_avg=2.515137/2.0, alarm hl:mem_free=361.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.259277/1.9, alarm hl:np_load_long=2.559570/2.25, alarm hl:mem_free=361.000000M/200M, alarm hl:available=1/0 [02:57:48] Load avg. on willow is WARNING: WARNING - load average: 16.77, 18.97, 19.92 [03:00:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 30.54, 21.83, 20.70 [03:04:49] Load avg. 
on willow is WARNING: WARNING - load average: 16.52, 19.18, 19.86 [03:14:38] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [03:20:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [03:21:38] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:25:28] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [03:32:29] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 185165.000000 [03:38:20] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [03:39:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [03:41:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [03:50:48] Load avg. on willow is WARNING: WARNING - load average: 21.50, 18.23, 17.99 [03:52:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.301758/1.75, alarm hl:np_load_avg=2.248047/2.0, alarm hl:mem_free=344.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.301758/1.9, alarm hl:np_load_long=2.240234/2.25, alarm hl:mem_free=344.000000M/200M, alarm hl:available=1/0 [04:00:48] Load avg. on willow is CRITICAL: CRITICAL - load average: 34.55, 22.71, 19.49 [04:01:48] Load avg. on willow is WARNING: WARNING - load average: 23.23, 21.57, 19.29 [04:10:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 31.95, 22.92, 20.27 [04:14:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [04:20:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [04:21:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:25:49] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [04:30:28] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.448242/1.10, alarm hl:np_load_long=0.797852/1.55, alarm hl:mem_free=18926.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.448242/1.00, alarm hl:np_load_long=0.797852/1.50, alarm hl:mem_free=18926.000000M/350M, alarm hl:available=1/0 [04:30:48] Load avg. on willow is WARNING: WARNING - load average: 26.17, 21.16, 19.93 [04:31:30] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:32:29] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 188766.000000 [04:37:48] Load avg. on willow is CRITICAL: CRITICAL - load average: 23.69, 21.10, 20.12 [04:38:49] Load avg. on willow is WARNING: WARNING - load average: 19.12, 20.26, 19.88 [04:39:20] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [04:39:42] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [04:40:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 25.31, 21.70, 20.43 [04:41:40] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [04:53:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.046875/1.75, alarm hl:np_load_avg=2.139648/2.0, alarm hl:mem_free=586.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.046875/1.9, alarm hl:np_load_long=2.331543/2.25, alarm hl:mem_free=586.000000M/200M, alarm hl:available=1/0 [05:05:51] Load avg. on willow is WARNING: WARNING - load average: 17.89, 18.05, 18.37 [05:14:40] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:20:39] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [05:21:40] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:25:57] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [05:30:50] Load avg. on willow is CRITICAL: CRITICAL - load average: 30.91, 23.16, 20.62 [05:32:39] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 192376.000000 [05:33:49] Load avg. on willow is WARNING: WARNING - load average: 19.26, 20.52, 19.99 [05:36:49] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 21.53, 20.48, 20.02 [05:40:18] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [05:40:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [05:41:49] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [05:43:39] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.411133/1.10, alarm hl:np_load_long=0.759765/1.55, alarm hl:mem_free=19217.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.411133/1.00, alarm hl:np_load_long=0.759765/1.50, alarm hl:mem_free=19217.000000M/350M, alarm hl:available=1/0 [05:45:39] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [05:53:40] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.347168/1.75, alarm hl:np_load_avg=2.776367/2.0, alarm hl:mem_free=179.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.347168/1.9, alarm hl:np_load_long=2.685059/2.25, alarm hl:mem_free=179.000000M/200M, alarm hl:available=1/0 [05:58:50] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 18.35, 21.30, 21.41 [06:14:49] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [06:20:40] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [06:21:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:25:59] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [06:32:40] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 195975.000000 [06:40:29] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [06:41:39] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [06:41:48] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [06:54:40] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.839355/1.75, alarm hl:np_load_avg=3.083008/2.0, alarm hl:mem_free=197.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.839355/1.9, alarm hl:np_load_long=3.002930/2.25, alarm hl:mem_free=197.000000M/200M, alarm hl:available=1/0 [06:55:38] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.018555/1.00, alarm hl:np_load_long=0.748047/1.50, alarm hl:mem_free=19452.000000M/350M, alarm hl:available=1/0 [06:57:38] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [06:59:08] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 16.35, 21.18, 22.85 [07:14:49] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:21:39] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [07:21:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:26:59] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:32:48] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 199580.000000 [07:37:02] Sun Grid Engine execd on wolfsbane is WARNING: short-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=1.184570/1.10, alarm hl:np_load_long=0.599609/1.55, alarm hl:mem_free=2064.000000M/300M, alarm hl:available=1/0: medium-sol@wolfsbane exceedes load threshold: alarm hl:np_load_short=1.184570/1.00, alarm hl:np_load_long=0.599609/1.50, alarm hl:mem_free=2064.000000M/350M, alarm hl:available=1/0 [07:38:02] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane OK: medium-sol@wolfsbane OK [07:40:32] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [07:42:02] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [07:42:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [07:54:12] Load avg. 
on willow is WARNING: WARNING - load average: 13.03, 18.00, 19.82 [07:54:41] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.625488/1.75, alarm hl:np_load_avg=2.208496/2.0, alarm hl:mem_free=627.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.625488/1.9, alarm hl:np_load_long=2.458496/2.25, alarm hl:mem_free=627.000000M/200M, alarm hl:available=1/0 [08:03:02] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.762695/1.10, alarm hl:np_load_long=0.906250/1.55, alarm hl:mem_free=19525.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.762695/1.00, alarm hl:np_load_long=0.906250/1.50, alarm hl:mem_free=19525.000000M/350M, alarm hl:available=1/0 [08:05:02] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [08:15:04] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [08:22:02] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [08:22:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:27:11] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [08:30:40] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [08:33:02] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 203193.000000 [08:33:12] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 30.07, 24.66, 20.83 [08:41:31] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [08:42:01] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [08:43:01] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [08:44:02] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:08:41] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:16:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:23:02] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [09:23:02] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:27:11] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:30:41] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [09:33:02] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 206794.000000 [09:33:12] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 31.04, 30.40, 30.71 [09:41:31] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [09:42:02] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [09:43:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [09:44:02] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.234375/1.10, alarm hl:np_load_long=0.850586/1.55, alarm hl:mem_free=19053.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.234375/1.00, alarm hl:np_load_long=0.850586/1.50, alarm hl:mem_free=19053.000000M/350M, alarm hl:available=1/0 [09:45:02] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [10:00:12] Load avg. on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:01:11] Load avg. on willow is CRITICAL: CRITICAL - load average: 36.15, 31.12, 30.20 [10:16:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:23:03] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [10:23:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:25:21] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:21] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:22] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:22] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:31] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[10:25:42] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:51] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:25:51] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:26:02] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:02] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:02] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:12] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:26:12] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:12] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:12] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:13] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:13] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:13] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:13] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:31] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:31] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:31] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [10:26:32] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:32] / on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:32] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:32] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:32] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[10:26:32] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:33] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:33] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:34] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:26:41] SMF on z-dat-s7-a is OK: OK - all services online [10:26:41] / on z-dat-s6-a is OK: DISK OK - free space: / 8520 MB (28% inode=85%): [10:26:41] Load avg. on z-dat-s3-a is OK: OK - load average: 0.43, 1.41, 2.02 [10:26:41] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 3177 MB (99% inode=99%): [10:26:41] SMF on z-dat-s6-a is OK: OK - all services online [10:26:41] Load avg. on z-dat-s4-a is OK: OK - load average: 0.44, 1.42, 2.02 [10:26:41] SMF on z-dat-s3-a is OK: OK - all services online [10:26:42] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 187028 MB (19% inode=99%): [10:27:01] Load avg. on z-dat-s7-a is OK: OK - load average: 0.93, 1.46, 2.03 [10:27:02] SMTP on z-dat-s3-a is OK: SMTP OK - 0.003 sec. response time [10:27:02] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 107093 MB (26% inode=99%): [10:27:02] / on z-dat-s3-a is OK: DISK OK - free space: / 8520 MB (28% inode=85%): [10:27:02] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 3148 MB (99% inode=99%): [10:27:02] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 103862 MB (25% inode=99%): [10:27:03] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 3161 MB (99% inode=99%): [10:27:03] Load avg. 
on z-dat-s6-a is OK: OK - load average: 1.24, 1.52, 2.04 [10:27:04] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 187023 MB (19% inode=99%): [10:27:04] SMF on z-dat-s4-a is OK: OK - all services online [10:27:05] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 3175 MB (99% inode=99%): [10:27:05] / on z-dat-s7-a is OK: DISK OK - free space: / 8520 MB (28% inode=85%): [10:27:06] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:27:11] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [10:27:11] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:27:12] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:27:12] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:27:12] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [10:27:31] SMTP on hyacinth is OK: SMTP OK - 0.003 sec. response time [10:30:41] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [10:33:02] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 210394.000000 [10:41:31] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [10:42:03] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [10:42:53] (commented) [GEOHACK-22] Nokia link badly needs revising ! <https://jira.toolserver.org/browse/GEOHACK-22> (Ville S) [10:43:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [10:49:01] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:01:11] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 29.80, 29.00, 28.31 [11:08:50] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [11:13:02] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.371094/1.10, alarm hl:np_load_long=0.833985/1.55, alarm hl:mem_free=18779.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.371094/1.00, alarm hl:np_load_long=0.833985/1.50, alarm hl:mem_free=18779.000000M/350M, alarm hl:available=1/0 [11:16:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:20:02] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [11:23:01] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.623047/1.10, alarm hl:np_load_long=1.121094/1.55, alarm hl:mem_free=18873.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.623047/1.00, alarm hl:np_load_long=1.121094/1.50, alarm hl:mem_free=18873.000000M/350M, alarm hl:available=1/0 [11:23:12] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:24:01] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [11:27:22] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:27:41] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[11:28:11] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [11:30:40] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [11:34:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 214055.000000 [11:41:31] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [11:42:12] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [11:43:11] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [12:01:12] Load avg. on willow is CRITICAL: CRITICAL - load average: 32.08, 30.40, 28.96 [12:11:38] [[Special:Log/newusers]] create 10 * Muhammed * (New user account) [12:16:11] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:24:01] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [12:24:12] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:27:22] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:30:41] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [12:34:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 217655.000000 [12:41:42] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [12:42:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [12:42:27] news about nightshade? [12:44:11] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [13:02:23] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 35.47, 31.96, 29.16 [13:16:22] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:24:01] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [13:24:01] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:24:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:28:31] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [13:31:41] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [13:34:00] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 221262.000000 [13:42:41] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [13:43:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [13:45:10] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:02:23] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 35.64, 29.29, 28.58 [14:16:22] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:24:00] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [14:24:31] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:27:42] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [14:31:42] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in error state: QERROR as result of job 1878653s failure: longrun-sol@willow in error state: QERROR as result of job 1878653s failure [14:34:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 224861.000000 [14:40:12] could someone please `git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git` in an arbitrary (temp) dir on willow and tell me what happens? [14:40:21] see https://bugzilla.wikimedia.org/35709 [14:42:43] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [14:43:42] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [14:45:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:59:01] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.162109/1.10, alarm hl:np_load_long=0.841797/1.55, alarm hl:mem_free=18413.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.162109/1.00, alarm hl:np_load_long=0.841797/1.50, alarm hl:mem_free=18413.000000M/350M, alarm hl:available=1/0 [15:00:01] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [15:02:41] Danny_B|backup: see me above. please test [15:02:44] Load avg. 
on willow is CRITICAL: CRITICAL - load average: 30.17, 31.00, 30.20 [15:02:44] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=4.055664/1.75, alarm hl:np_load_avg=3.931641/2.0, alarm hl:mem_free=362.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=4.055664/1.9, alarm hl:np_load_long=3.792969/2.25, alarm hl:mem_free=362.000000M/200M, alarm hl:available=1/0 [15:03:53] bbaib [15:03:57] bbiab* [15:16:42] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:24:00] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [15:24:42] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:27:51] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:28:42] Load avg. on willow is WARNING: WARNING - load average: 15.68, 16.30, 19.66 [15:29:54] (commented) [ACCAPP-481] I am active user in http://fa.wiktionary.org and i want this account to run bot on this wiki also updating it's statics. <https://jira.toolserver.org/browse/ACCAPP-481> (Abdollah Abdollahi) [15:30:42] Load avg. on willow is CRITICAL: CRITICAL - load average: 26.06, 18.78, 20.10 [15:32:42] Load avg. 
on willow is WARNING: WARNING - load average: 19.14, 18.76, 19.94 [15:34:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 228461.000000 [15:42:45] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [15:43:06] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.250000/1.10, alarm hl:np_load_long=0.850586/1.55, alarm hl:mem_free=18590.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.250000/1.00, alarm hl:np_load_long=0.850586/1.50, alarm hl:mem_free=18590.000000M/350M, alarm hl:available=1/0 [15:43:46] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [15:45:06] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [15:45:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [15:54:05] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [15:57:05] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.723633/1.75, alarm hl:np_load_avg=2.011230/2.0, alarm hl:mem_free=1210.000000M/350M, alarm hl:available=1/0 [16:15:16] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:16:45] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:18:16] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.829101/1.75, alarm hl:np_load_avg=2.034180/2.0, alarm hl:mem_free=863.000000M/350M, alarm hl:available=1/0 [16:23:15] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:24:15] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [16:24:45] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:28:25] FMA on yarrow is 
CRITICAL: ERROR - unexpected output from snmpwalk [16:33:35] Load avg. on willow is WARNING: WARNING - load average: 14.62, 17.05, 17.05 [16:34:16] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 232075.000000 [16:42:55] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [16:43:55] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [16:44:25] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.167969/1.10, alarm hl:np_load_long=0.761719/1.55, alarm hl:mem_free=18908.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.167969/1.00, alarm hl:np_load_long=0.761719/1.50, alarm hl:mem_free=18908.000000M/350M, alarm hl:available=1/0 [16:45:25] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [16:45:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [16:59:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.952149/1.75, alarm hl:np_load_avg=2.092773/2.0, alarm hl:mem_free=642.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.952149/1.9, alarm hl:np_load_long=2.112305/2.25, alarm hl:mem_free=642.000000M/200M, alarm hl:available=1/0 [17:08:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:11:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.053711/1.75, alarm hl:np_load_avg=2.102051/2.0, alarm hl:mem_free=288.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.053711/1.9, alarm hl:np_load_long=2.126953/2.25, alarm hl:mem_free=288.000000M/200M, alarm hl:available=1/0 [17:14:26] Sun Grid Engine execd on ortelius is WARNING: 
medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.049805/1.00, alarm hl:np_load_long=0.870117/1.50, alarm hl:mem_free=18650.000000M/350M, alarm hl:available=1/0 [17:15:25] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:16:55] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:17:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [17:24:15] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [17:24:56] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:28:35] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:30:34] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.61, 21.15, 18.32 [17:31:35] Load avg. on willow is WARNING: WARNING - load average: 19.54, 19.61, 17.95 [17:34:15] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 235675.000000 [17:42:55] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [17:43:55] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [17:45:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:00:35] Load avg. on willow is CRITICAL: CRITICAL - load average: 33.75, 21.77, 18.94 [18:01:36] Load avg. 
on willow is WARNING: WARNING - load average: 23.59, 20.99, 18.84 [18:02:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.399902/1.75, alarm hl:np_load_avg=2.523438/2.0, alarm hl:mem_free=543.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.399902/1.9, alarm hl:np_load_long=2.329590/2.25, alarm hl:mem_free=543.000000M/200M, alarm hl:available=1/0 [18:16:55] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [18:24:55] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:25:14] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [18:28:34] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:31:57] (created) [ACCAPP-490] Run SQL queries on Wikipedia editor stats for Wikimedia Foundation related research; Account Approval; New Account <https://jira.toolserver.org/browse/ACCAPP-490> (Maryana Pinchuk) [18:34:25] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 239280.000000 [18:41:41] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:54] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:41:54] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:02] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:02] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:42:02] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:42:02] Environment IPMI on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [18:42:25] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[18:42:33] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [18:42:33] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:42] Load avg. on z-dat-s7-a is OK: OK - load average: 0.28, 1.51, 2.10 [18:42:43] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 103540 MB (25% inode=99%): [18:42:43] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [18:42:43] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [18:42:43] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [18:42:52] Environment IPMI on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [18:42:52] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [18:42:53] SMF on hyacinth is OK: OK - all services online [18:43:22] SMTP on z-dat-s7-a is OK: SMTP OK - 0.133 sec. response time [18:43:32] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [18:43:53] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [18:44:02] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [18:45:32] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:46:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.335449/1.75, alarm hl:np_load_avg=2.044922/2.0, alarm hl:mem_free=249.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.335449/1.9, alarm hl:np_load_long=2.100586/2.25, alarm hl:mem_free=249.000000M/200M, alarm hl:available=1/0 [19:01:42] Load avg. 
on willow is WARNING: WARNING - load average: 19.04, 18.57, 17.62 [19:17:52] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:25:23] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [19:25:53] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:28:42] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:35:23] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 242942.000000 [19:43:53] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [19:44:02] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [19:45:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [19:46:33] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.854492/1.75, alarm hl:np_load_avg=2.573730/2.0, alarm hl:mem_free=366.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.854492/1.9, alarm hl:np_load_long=2.385254/2.25, alarm hl:mem_free=366.000000M/200M, alarm hl:available=1/0 [20:01:42] Load avg. on willow is WARNING: WARNING - load average: 22.44, 19.61, 18.94 [20:03:57] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:08:47] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [20:10:53] (commented) [ACCAPP-461] Analysing the development of Bots <https://jira.toolserver.org/browse/ACCAPP-461> (DaB.) [20:18:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:24:07] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[20:25:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [20:26:07] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:29:37] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:35:29] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 246544.000000 [20:44:07] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [20:44:08] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [20:45:37] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [20:46:37] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.577148/1.75, alarm hl:np_load_avg=2.330566/2.0, alarm hl:mem_free=129.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.577148/1.9, alarm hl:np_load_long=2.316406/2.25, alarm hl:mem_free=129.000000M/200M, alarm hl:available=1/0 [21:02:38] Load avg. 
on willow is WARNING: WARNING - load average: 17.62, 18.16, 18.00 [21:19:17] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:20:38] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:23:37] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.556152/1.75, alarm hl:np_load_avg=2.234863/2.0, alarm hl:mem_free=298.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.556152/1.9, alarm hl:np_load_long=2.226562/2.25, alarm hl:mem_free=298.000000M/200M, alarm hl:available=1/0 [21:26:28] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [21:27:07] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:29:47] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [21:35:38] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 250156.000000 [21:39:47] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:44:07] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [21:44:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [21:45:48] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [22:01:48] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.371094/1.75, alarm hl:np_load_avg=2.321289/2.0, alarm hl:mem_free=616.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.371094/1.9, alarm hl:np_load_long=2.230469/2.25, alarm hl:mem_free=616.000000M/200M, alarm hl:available=1/0 [22:02:48] Load avg. 
on willow is WARNING: WARNING - load average: 17.57, 18.23, 17.77 [22:19:16] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:24:51] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:26:37] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [22:27:17] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:30:07] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [22:32:48] Load avg. on willow is OK: OK - load average: 8.77, 12.38, 14.78 [22:35:50] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 253761.000000 [22:36:48] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.335938/1.10, alarm hl:np_load_long=0.824219/1.55, alarm hl:mem_free=18699.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.335938/1.00, alarm hl:np_load_long=0.824219/1.50, alarm hl:mem_free=18699.000000M/350M, alarm hl:available=1/0 [22:37:48] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [22:44:06] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [22:44:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [22:45:57] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [22:47:57] Load avg. 
on willow is WARNING: WARNING - load average: 14.64, 15.41, 15.10 [22:47:57] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.150391/1.75, alarm hl:np_load_avg=1.988770/2.0, alarm hl:mem_free=840.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.150391/1.9, alarm hl:np_load_long=1.907226/2.25, alarm hl:mem_free=840.000000M/200M, alarm hl:available=1/0 [22:51:58] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [22:52:57] Load avg. on willow is OK: OK - load average: 7.97, 13.32, 14.53 [23:01:56] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.518555/1.75, alarm hl:np_load_avg=2.297852/2.0, alarm hl:mem_free=604.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.518555/1.9, alarm hl:np_load_long=2.041016/2.25, alarm hl:mem_free=604.000000M/200M, alarm hl:available=1/0 [23:15:58] Load avg. 
on willow is WARNING: WARNING - load average: 17.41, 17.30, 16.79 [23:19:18] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:22:58] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:25:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.507324/1.75, alarm hl:np_load_avg=2.115234/2.0, alarm hl:mem_free=607.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.507324/1.9, alarm hl:np_load_long=2.082031/2.25, alarm hl:mem_free=607.000000M/200M, alarm hl:available=1/0 [23:26:38] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [23:27:18] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:29:58] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:30:18] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [23:35:50] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 257368.000000 [23:44:19] MySQL slave on rosemary is CRITICAL: (Return code of 139 is out of bounds) [23:44:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [23:46:00] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default