[00:04:49] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [00:13:57] MySQL slave on thyme is WARNING: No slaves defined [00:13:57] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.432617/1.75, alarm hl:np_load_avg=1.586914/2.0, alarm hl:mem_free=139.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.432617/1.9, alarm hl:np_load_long=1.711426/2.25, alarm hl:mem_free=139.000000M/200M, alarm hl:available=1/0 [00:37:57] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:40:56] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [00:41:04] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 962758.000000 [00:41:58] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.784180/1.75, alarm hl:np_load_avg=1.717774/2.0, alarm hl:mem_free=241.000000M/350M, alarm hl:available=1/0 [00:46:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:48:34] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[00:51:58] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [00:52:04] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [00:52:58] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:53:17] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [00:59:18] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [01:04:57] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [01:13:58] MySQL slave on thyme is WARNING: No slaves defined [01:32:59] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.839355/1.75, alarm hl:np_load_avg=1.686524/2.0, alarm hl:mem_free=376.000000M/350M, alarm hl:available=1/0 [01:34:59] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [01:40:56] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [01:41:16] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 966365.000000 [01:44:57] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.677734/1.75, alarm hl:np_load_avg=1.619141/2.0, alarm hl:mem_free=307.000000M/350M, alarm hl:available=1/0 [01:51:58] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [01:52:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [01:52:59] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:59:26] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [02:01:17] Load avg. on willow is WARNING: WARNING - load average: 19.26, 16.51, 14.70 [02:04:58] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [02:09:17] Load avg. 
on willow is OK: OK - load average: 12.32, 14.46, 14.47 [02:13:58] MySQL slave on thyme is WARNING: No slaves defined [02:16:59] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.125000/1.10, alarm hl:np_load_long=0.742188/1.55, alarm hl:mem_free=20360.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.125000/1.00, alarm hl:np_load_long=0.742188/1.50, alarm hl:mem_free=20360.000000M/350M, alarm hl:available=1/0 [02:17:59] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [02:40:57] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [02:41:50] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 969994.000000 [02:52:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [02:52:16] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [02:53:07] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:59:57] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [03:02:10] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 123259 MB (12% inode=99%): [03:05:17] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [03:14:07] MySQL slave on thyme is WARNING: No slaves defined [03:28:17] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.462891/1.10, alarm hl:np_load_long=0.898438/1.55, alarm hl:mem_free=19471.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.462891/1.00, alarm hl:np_load_long=0.898438/1.50, alarm hl:mem_free=19471.000000M/350M, alarm hl:available=1/0 [03:29:17] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [03:41:15] s1 replag on thyme is CRITICAL: QUERY 
CRITICAL: Unknown database enwiki_p [03:42:47] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 973655.000000 [03:52:17] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [03:52:26] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [03:53:18] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:00:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [04:06:15] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [04:14:07] MySQL slave on thyme is WARNING: No slaves defined [04:36:19] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.145508/1.10, alarm hl:np_load_long=0.764648/1.55, alarm hl:mem_free=20297.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.145508/1.00, alarm hl:np_load_long=0.764648/1.50, alarm hl:mem_free=20297.000000M/350M, alarm hl:available=1/0 [04:37:17] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:41:18] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [04:42:49] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 977256.000000 [04:52:17] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [04:52:48] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [04:53:25] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:01:16] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [05:06:16] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [05:14:18] MySQL slave on thyme is WARNING: No slaves defined [05:17:18] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.339355/1.75, alarm 
hl:np_load_avg=1.501465/2.0, alarm hl:mem_free=219.000000M/350M, alarm hl:available=1/0 [05:19:19] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [05:42:18] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [05:43:47] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 980916.000000 [05:52:18] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:53:25] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:53:46] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [06:01:27] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [06:06:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [06:06:29] @replag [06:06:30] matthewrbowker: s1-rr-a: 1w 4d 8h 51m 25s [+1.00 s/s]; s1-rr-a-c: 10s [+0.00 s/s]; s1-user: 1w 4d 8h 51m 25s [+1.00 s/s]; s2-user-c: 10s [+0.00 s/s]; s3-rr-a: 26s [-0.00 s/s]; s3-user: 26s [-0.00 s/s]; s4-rr-a: 10s [+0.00 s/s]; s4-user: 10s [+0.00 s/s] [06:06:31] matthewrbowker: s5-user-c: 10s [+0.00 s/s] [06:10:26] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.236328/1.75, alarm hl:np_load_avg=1.623535/2.0, alarm hl:mem_free=243.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.236328/1.9, alarm hl:np_load_long=1.477539/2.25, alarm hl:mem_free=243.000000M/200M, alarm hl:available=1/0 [06:13:25] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [06:15:15] MySQL slave on thyme is WARNING: No slaves defined [06:42:29] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [06:43:46] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 984518.000000 [06:52:26] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default 
[06:53:35] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:54:46] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:01:47] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [07:06:45] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [07:10:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.095215/1.75, alarm hl:np_load_avg=1.583496/2.0, alarm hl:mem_free=235.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.095215/1.9, alarm hl:np_load_long=1.423828/2.25, alarm hl:mem_free=235.000000M/200M, alarm hl:available=1/0 [07:13:28] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [07:15:28] MySQL slave on thyme is WARNING: No slaves defined [07:16:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.864258/1.75, alarm hl:np_load_avg=1.713379/2.0, alarm hl:mem_free=206.000000M/350M, alarm hl:available=1/0 [07:36:05] Load avg. on willow is CRITICAL: CRITICAL - load average: 30.10, 19.81, 14.93 [07:38:05] Load avg. 
on willow is WARNING: WARNING - load average: 24.66, 21.81, 16.34 [07:42:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.050293/1.75, alarm hl:np_load_avg=2.595703/2.0, alarm hl:mem_free=458.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.050293/1.9, alarm hl:np_load_long=2.179199/2.25, alarm hl:mem_free=458.000000M/200M, alarm hl:available=1/0 [07:43:27] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [07:43:55] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 988124.000000 [07:52:36] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:52:36] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [07:53:46] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:54:55] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:56:14] Environment IPMI on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:57:15] Environment IPMI on adenia is OK: ok: temperature ok fan ok voltage ok chassis ok [07:57:36] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.003906/1.75, alarm hl:np_load_avg=1.852051/2.0, alarm hl:mem_free=631.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.003906/1.9, alarm hl:np_load_long=1.945312/2.25, alarm hl:mem_free=631.000000M/200M, alarm hl:available=1/0 [08:01:47] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [08:02:37] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [08:03:37] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[08:06:46] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [08:15:38] MySQL slave on thyme is WARNING: No slaves defined [08:28:05] Load avg. on willow is OK: OK - load average: 11.50, 13.74, 14.79 [08:31:14] Environment IPMI on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:31:56] Environment IPMI on adenia is OK: ok: temperature ok fan ok voltage ok chassis ok [08:32:06] Load avg. on willow is WARNING: WARNING - load average: 15.50, 15.72, 15.39 [08:36:06] Load avg. on willow is OK: OK - load average: 13.20, 14.50, 14.95 [08:43:38] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [08:43:56] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 991726.000000 [08:52:45] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [08:53:47] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:55:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [08:58:15] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [08:59:05] Load avg. 
on willow is WARNING: WARNING - load average: 17.43, 16.93, 16.09 [09:01:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.657715/1.75, alarm hl:np_load_avg=2.432129/2.0, alarm hl:mem_free=208.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.657715/1.9, alarm hl:np_load_long=2.151367/2.25, alarm hl:mem_free=208.000000M/200M, alarm hl:available=1/0 [09:02:46] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [09:07:45] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [09:16:36] MySQL slave on thyme is WARNING: No slaves defined [09:34:45] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [09:38:45] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.123047/1.75, alarm hl:np_load_avg=1.981934/2.0, alarm hl:mem_free=487.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.123047/1.9, alarm hl:np_load_long=2.015625/2.25, alarm hl:mem_free=487.000000M/200M, alarm hl:available=1/0 [09:44:37] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [09:44:56] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 995388.000000 [09:49:45] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [09:52:46] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:54:45] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:55:05] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:59:06] Load avg. on willow is WARNING: WARNING - load average: 10.97, 13.74, 15.18 [10:00:05] Load avg. 
on willow is OK: OK - load average: 10.70, 13.18, 14.89 [10:02:56] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [10:03:06] Load avg. on willow is WARNING: WARNING - load average: 16.27, 14.98, 15.35 [10:07:46] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [10:16:36] MySQL slave on thyme is WARNING: No slaves defined [10:22:05] Load avg. on willow is OK: OK - load average: 11.26, 13.77, 14.89 [10:43:35] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:45:08] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 998996.000000 [10:45:35] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [10:52:56] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:53:16] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:54:55] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:55:16] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:03:06] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [11:03:06] Load avg. 
on willow is WARNING: WARNING - load average: 15.51, 15.50, 14.58 [11:07:55] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [11:14:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.606934/1.75, alarm hl:np_load_avg=2.393066/2.0, alarm hl:mem_free=399.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.606934/1.9, alarm hl:np_load_long=2.103516/2.25, alarm hl:mem_free=399.000000M/200M, alarm hl:available=1/0 [11:17:35] MySQL slave on thyme is WARNING: No slaves defined [11:18:46] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:23:47] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.192383/1.75, alarm hl:np_load_avg=1.948730/2.0, alarm hl:mem_free=716.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.192383/1.9, alarm hl:np_load_long=1.962402/2.25, alarm hl:mem_free=716.000000M/200M, alarm hl:available=1/0 [11:26:46] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [11:28:06] Load avg. on willow is OK: OK - load average: 11.87, 13.92, 15.00 [11:33:07] Load avg. on willow is WARNING: WARNING - load average: 17.18, 15.91, 15.54 [11:45:16] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1002601.000000 [11:45:35] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [11:53:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:55:06] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:55:25] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:01:16] Load avg. on willow is CRITICAL: CRITICAL - load average: 33.20, 22.32, 18.14 [12:02:16] Load avg. 
on willow is WARNING: WARNING - load average: 27.04, 22.58, 18.50 [12:03:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [12:08:05] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [12:14:46] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.153320/1.75, alarm hl:np_load_avg=2.326172/2.0, alarm hl:mem_free=321.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.153320/1.9, alarm hl:np_load_long=2.292969/2.25, alarm hl:mem_free=321.000000M/200M, alarm hl:available=1/0 [12:17:36] MySQL slave on thyme is WARNING: No slaves defined [12:33:56] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:36:55] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.927734/1.75, alarm hl:np_load_avg=1.979980/2.0, alarm hl:mem_free=397.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.927734/1.9, alarm hl:np_load_long=2.057129/2.25, alarm hl:mem_free=397.000000M/200M, alarm hl:available=1/0 [12:39:06] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [12:43:35] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:44:15] Load avg. on willow is OK: OK - load average: 11.18, 13.00, 14.87 [12:46:17] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1006262.000000 [12:46:35] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [12:52:16] Load avg. 
on willow is WARNING: WARNING - load average: 17.81, 16.79, 15.82 [12:53:15] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [12:54:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:55:06] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:55:25] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:03:18] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [13:08:06] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [13:17:35] MySQL slave on thyme is WARNING: No slaves defined [13:19:17] Load avg. on willow is OK: OK - load average: 11.85, 13.77, 14.95 [13:30:07] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.118164/1.10, alarm hl:np_load_long=0.832031/1.55, alarm hl:mem_free=20099.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=2.118164/1.00, alarm hl:np_load_long=0.832031/1.50, alarm hl:mem_free=20099.000000M/350M, alarm hl:available=1/0 [13:30:17] Load avg. on willow is WARNING: WARNING - load average: 15.29, 13.53, 13.97 [13:32:17] Load avg. 
on willow is OK: OK - load average: 13.58, 14.76, 14.50 [13:39:06] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.828613/1.75, alarm hl:np_load_avg=1.765137/2.0, alarm hl:mem_free=317.000000M/350M, alarm hl:available=1/0 [13:43:05] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [13:46:45] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [13:47:16] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1009927.000000 [13:53:07] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [13:54:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:55:06] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:55:16] Load avg. on willow is WARNING: WARNING - load average: 17.27, 15.94, 15.82 [13:55:26] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:56:07] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.205566/1.75, alarm hl:np_load_avg=2.038574/2.0, alarm hl:mem_free=428.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.205566/1.9, alarm hl:np_load_long=1.994629/2.25, alarm hl:mem_free=428.000000M/200M, alarm hl:available=1/0 [14:03:25] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [14:08:06] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 104152 MB (10% inode=99%): [14:08:06] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:17:47] MySQL slave on thyme is WARNING: No slaves defined [14:29:06] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [14:32:06] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.446777/1.75, 
alarm hl:np_load_avg=2.280273/2.0, alarm hl:mem_free=263.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.446777/1.9, alarm hl:np_load_long=2.207520/2.25, alarm hl:mem_free=263.000000M/200M, alarm hl:available=1/0 [14:42:16] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [14:46:46] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [14:47:24] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1013532.000000 [14:54:16] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:55:17] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:55:25] Load avg. on willow is WARNING: WARNING - load average: 13.91, 14.00, 15.64 [14:55:46] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:03:35] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [15:08:16] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [15:15:16] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.836426/1.75, alarm hl:np_load_avg=2.176758/2.0, alarm hl:mem_free=163.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.836426/1.9, alarm hl:np_load_long=2.142090/2.25, alarm hl:mem_free=163.000000M/200M, alarm hl:available=1/0 [15:18:46] MySQL slave on thyme is WARNING: No slaves defined [15:47:00] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [15:47:29] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1017137.000000 [15:50:10] @replag [15:50:10] IWorld: s1-rr-a: 1w 4d 18h 35m 6s [+1.00 s/s]; s1-user: 1w 4d 18h 35m 6s [+1.00 s/s]; s2-user: 11s [+0.00 s/s] [15:53:29] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm 
hl:np_load_short=1.214844/1.10, alarm hl:np_load_long=0.784180/1.55, alarm hl:mem_free=20135.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.214844/1.00, alarm hl:np_load_long=0.784180/1.50, alarm hl:mem_free=20135.000000M/350M, alarm hl:available=1/0 [15:54:29] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:54:29] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [15:55:28] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:55:36] Load avg. on willow is WARNING: WARNING - load average: 20.82, 17.92, 17.20 [15:55:59] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:03:59] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [16:04:28] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 135477 MB (13% inode=99%): [16:08:27] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [16:15:27] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.664062/1.75, alarm hl:np_load_avg=2.323242/2.0, alarm hl:mem_free=158.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.664062/1.9, alarm hl:np_load_long=2.198730/2.25, alarm hl:mem_free=158.000000M/200M, alarm hl:available=1/0 [16:18:57] MySQL slave on thyme is WARNING: No slaves defined [16:27:30] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:27:52] (commented) [MNT-1225] Growing replag on S1 due to a database migration at WMF <https://jira.toolserver.org/browse/MNT-1225> (Cyberpower678) [16:32:38] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.923340/1.75, alarm hl:np_load_avg=2.063477/2.0, alarm hl:mem_free=638.000000M/350M, alarm hl:available=1/0:
longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.923340/1.9, alarm hl:np_load_long=2.064453/2.25, alarm hl:mem_free=638.000000M/200M, alarm hl:available=1/0 [16:33:39] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:43:37] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.279297/1.10, alarm hl:np_load_long=0.724610/1.55, alarm hl:mem_free=20279.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.279297/1.00, alarm hl:np_load_long=0.724610/1.50, alarm hl:mem_free=20279.000000M/350M, alarm hl:available=1/0 [16:44:38] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [16:47:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [16:48:27] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1020801.000000 [16:54:38] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [16:55:37] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:55:37] Load avg. 
on willow is WARNING: WARNING - load average: 15.74, 15.98, 16.12 [16:56:07] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:57:38] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.903809/1.75, alarm hl:np_load_avg=1.920899/2.0, alarm hl:mem_free=369.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.903809/1.9, alarm hl:np_load_long=1.982910/2.25, alarm hl:mem_free=369.000000M/200M, alarm hl:available=1/0 [17:03:52] (commented) [ACCAPP-485] UTRS developer <https://jira.toolserver.org/browse/ACCAPP-485> (TParis) [17:04:58] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [17:08:38] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [17:18:58] MySQL slave on thyme is WARNING: No slaves defined [17:48:38] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1024410.000000 [17:48:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [17:54:48] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:55:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:55:49] Load avg. on willow is WARNING: WARNING - load average: 14.82, 15.30, 16.60 [17:56:17] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:57:48] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.871094/1.75, alarm hl:np_load_avg=1.896973/2.0, alarm hl:mem_free=252.000000M/350M, alarm hl:available=1/0 [18:00:48] Load avg. on willow is CRITICAL: CRITICAL - load average: 32.18, 20.32, 18.02 [18:01:48] Load avg.
on willow is WARNING: WARNING - load average: 23.84, 20.25, 18.15 [18:05:08] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [18:08:48] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:10:47] Load avg. on willow is CRITICAL: CRITICAL - load average: 26.64, 22.91, 20.16 [18:19:08] MySQL slave on thyme is WARNING: No slaves defined [18:29:49] Load avg. on willow is WARNING: WARNING - load average: 17.92, 19.31, 19.99 [18:30:49] Load avg. on willow is CRITICAL: CRITICAL - load average: 31.02, 22.88, 21.21 [18:34:49] Load avg. on willow is WARNING: WARNING - load average: 16.26, 19.09, 19.98 [18:38:18] Hello. [18:48:47] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1028020.000000 [18:49:07] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [18:55:00] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [18:55:59] Load avg. on willow is WARNING: WARNING - load average: 16.49, 17.83, 18.69 [18:55:59] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:56:38] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:57:57] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.983887/1.75, alarm hl:np_load_avg=2.171387/2.0, alarm hl:mem_free=426.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.983887/1.9, alarm hl:np_load_long=2.304199/2.25, alarm hl:mem_free=426.000000M/200M, alarm hl:available=1/0 [19:05:18] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [19:08:59] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [19:19:17] MySQL slave on thyme is WARNING: No slaves defined [19:30:59] Load avg. on willow is CRITICAL: CRITICAL - load average: 29.42, 21.59, 20.00 [19:31:59] Load avg. 
on willow is WARNING: WARNING - load average: 22.53, 21.00, 19.89 [19:48:47] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1031621.000000 [19:49:17] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [19:50:57] Load avg. on willow is CRITICAL: CRITICAL - load average: 24.77, 21.52, 20.13 [19:51:59] Load avg. on willow is WARNING: WARNING - load average: 19.76, 20.66, 19.91 [19:55:59] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:55:59] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:56:37] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:57:58] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.117676/1.75, alarm hl:np_load_avg=2.335449/2.0, alarm hl:mem_free=452.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.117676/1.9, alarm hl:np_load_long=2.408203/2.25, alarm hl:mem_free=452.000000M/200M, alarm hl:available=1/0 [20:00:58] Load avg. on willow is CRITICAL: CRITICAL - load average: 26.61, 21.54, 20.20 [20:06:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [20:08:38] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [20:09:08] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [20:19:17] MySQL slave on thyme is WARNING: No slaves defined [20:31:37] * Dispenser pokes nosy for a status update [20:36:09] Load avg. 
on willow is WARNING: WARNING - load average: 15.55, 16.99, 18.25 [20:48:57] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1035229.000000 [20:49:18] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [20:56:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:56:08] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:56:38] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:58:08] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.051758/1.75, alarm hl:np_load_avg=2.155273/2.0, alarm hl:mem_free=344.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.051758/1.9, alarm hl:np_load_long=2.250488/2.25, alarm hl:mem_free=344.000000M/200M, alarm hl:available=1/0 [20:58:18] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [21:06:18] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [21:09:09] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [21:19:18] MySQL slave on thyme is WARNING: No slaves defined [21:36:20] Load avg. 
on willow is WARNING: WARNING - load average: 17.52, 19.48, 19.25 [21:49:07] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1038833.000000 [21:49:20] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [21:56:20] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:56:21] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:56:48] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [21:58:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.756836/1.75, alarm hl:np_load_avg=2.100098/2.0, alarm hl:mem_free=245.000000M/350M, alarm hl:available=1/0 [22:06:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [22:09:21] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [22:14:54] @replag [22:14:54] Thehelpfulone: s1-rr-a: 1w 5d 59m 50s [+1.00 s/s]; s1-user: 1w 5d 59m 50s [+1.00 s/s]; s3-rr-a: 28s [+0.00 s/s]; s3-user: 28s [+0.00 s/s] [22:19:19] MySQL slave on thyme is WARNING: No slaves defined [22:36:13] holy kerschnitzels, why is replag on s1 so high [22:36:18] it's like two weeks high [22:36:20] Load avg. on willow is WARNING: WARNING - load average: 16.98, 16.08, 16.19 [22:37:32] enthdegree: https://jira.toolserver.org/browse/MNT-1227 [22:39:50] Dispenser: thanks! 
[22:49:07] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1042434.000000 [22:49:21] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [22:56:27] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:56:57] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [22:57:19] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:58:20] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.970703/1.75, alarm hl:np_load_avg=2.069336/2.0, alarm hl:mem_free=277.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.970703/1.9, alarm hl:np_load_long=2.064453/2.25, alarm hl:mem_free=277.000000M/200M, alarm hl:available=1/0 [23:06:27] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [23:10:28] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [23:19:20] MySQL slave on thyme is WARNING: No slaves defined [23:37:18] Load avg. 
on willow is WARNING: WARNING - load average: 17.55, 17.82, 18.39 [23:49:08] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 1046035.000000 [23:50:19] s1 replag on thyme is CRITICAL: QUERY CRITICAL: Unknown database enwiki_p [23:56:28] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:56:57] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [23:57:28] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:58:29] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.048340/1.75, alarm hl:np_load_avg=2.330566/2.0, alarm hl:mem_free=251.000000M/350M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.048340/1.9, alarm hl:np_load_long=2.348145/2.25, alarm hl:mem_free=251.000000M/200M, alarm hl:available=1/0