[00:02:49] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.409180/1.8, alarm hl:np_load_avg=1.367188/2.3, alarm hl:mem_free=194.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=1.409180/1.9, alarm hl:np_load_long=1.186524/2.25, alarm hl:mem_free=194.000000M/200M, alarm hl:available=1/0
[00:19:08] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 274 bytes in 12.457 second response time
[00:23:47] nacht ts
[00:24:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:24:58] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[00:25:08] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[00:25:28] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[00:26:18] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453962 MB (8% inode=44%):
[00:26:48] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[00:26:48] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[00:27:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[00:30:48] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.910156/1.8, alarm hl:np_load_avg=1.376465/2.3, alarm hl:mem_free=214.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=1.910156/1.9, alarm hl:np_load_long=1.308105/2.25, alarm hl:mem_free=214.000000M/200M, alarm hl:available=1/0
[00:36:17] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[00:38:39] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 273 bytes in 10.801 second response time
[00:45:49] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[00:54:39] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 30733 MB (3% inode=99%):
[00:57:32] Load avg. on cassia is WARNING: WARNING - load average: 15.27, 13.95, 12.07
[00:59:30] Load avg. on cassia is OK: OK - load average: 14.78, 14.27, 12.43
[01:12:38] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 271 bytes in 10.830 second response time
[01:24:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:25:31] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[01:25:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[01:26:31] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 456165 MB (8% inode=44%):
[01:26:49] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[01:27:30] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[01:41:29] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=2.204102/1.10, alarm hl:np_load_long=0.976562/1.55, alarm hl:mem_free=18047.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=2.204102/1.00, alarm hl:np_load_long=0.976562/1.50, alarm hl:mem_free=18047.000000M/300M, alarm hl:available=1/0
[01:47:32] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[02:12:49] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.666504/1.8, alarm hl:np_load_avg=1.572266/2.3, alarm hl:mem_free=194.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=1.666504/1.9, alarm hl:np_load_long=1.331543/2.25, alarm hl:mem_free=194.000000M/200M, alarm hl:available=1/0
[02:14:31] Sun Grid Engine execd on ortelius is WARNING: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.012695/1.00, alarm hl:np_load_long=0.861328/1.50, alarm hl:mem_free=18387.000000M/300M, alarm hl:available=1/0
[02:15:30] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[02:24:49] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:26:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 454959 MB (8% inode=44%):
[02:26:32] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[02:26:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[02:26:57] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[02:27:31] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[02:29:30] Sun Grid Engine execd on ortelius is WARNING: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.083984/1.00, alarm hl:np_load_long=0.958985/1.50, alarm hl:mem_free=18088.000000M/300M, alarm hl:available=1/0
[02:38:09] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[02:48:08] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.075195/1.8, alarm hl:np_load_avg=1.221191/2.3, alarm hl:mem_free=228.000000M/300M, alarm hl:available=1/0
[02:55:08] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[03:04:59] good night :)
[03:25:01] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:26:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 454245 MB (8% inode=44%):
[03:26:32] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[03:26:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[03:27:10] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[03:27:33] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[03:38:34] Sun Grid Engine execd on ortelius is WARNING: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.023438/1.00, alarm hl:np_load_long=0.948242/1.50, alarm hl:mem_free=18041.000000M/300M, alarm hl:available=1/0
[03:39:33] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[03:42:32] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.10, alarm hl:np_load_long=0.964844/1.55, alarm hl:mem_free=17946.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.101562/1.00, alarm hl:np_load_long=0.964844/1.50, alarm hl:mem_free=17946.000000M/300M, alarm hl:available=1/0
[04:06:42] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.353516/1.10, alarm hl:np_load_long=1.240234/1.55, alarm hl:mem_free=18011.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.353516/1.00, alarm hl:np_load_long=1.240234/1.50, alarm hl:mem_free=18011.000000M/300M, alarm hl:available=1/0
[04:08:11] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.327637/1.8, alarm hl:np_load_avg=1.314453/2.3, alarm hl:mem_free=257.000000M/300M, alarm hl:available=1/0
[04:09:10] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[04:09:43] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[04:12:11] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.300293/1.8, alarm hl:np_load_avg=1.350586/2.3, alarm hl:mem_free=246.000000M/300M, alarm hl:available=1/0
[04:17:11] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 273 bytes in 17.367 second response time
[04:25:01] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:26:42] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 454207 MB (8% inode=44%):
[04:26:42] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[04:26:42] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[04:27:20] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[04:27:42] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[04:30:17] [[Wiki server assignments]] ! https://wiki.toolserver.org/w/index.php?diff=6850&oldid=6805&rcid=9025 * 91.198.174.202 * (+0) (updated page)
[04:32:11] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[04:36:20] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 273 bytes in 18.798 second response time
[04:38:21] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[04:47:42] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.230469/1.10, alarm hl:np_load_long=0.927735/1.55, alarm hl:mem_free=18179.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.230469/1.00, alarm hl:np_load_long=0.927735/1.50, alarm hl:mem_free=18179.000000M/300M, alarm hl:available=1/0
[04:48:43] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[04:54:42] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.163086/1.10, alarm hl:np_load_long=0.943360/1.55, alarm hl:mem_free=18301.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.163086/1.00, alarm hl:np_load_long=0.943360/1.50, alarm hl:mem_free=18301.000000M/300M, alarm hl:available=1/0
[04:59:21] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[05:10:35] Hm.. anyone know if it should be possible (and if so how) to ssh from a web server to a loginserver?
[05:11:02] The cvn MMT runs bots on willow (they're fairly large so shouldn't run on a web server), and they are started/killed via a PHP control panel
[05:11:21] right now this process is very ugly by using a mysql database and a minutely cronjob
[05:11:41] php adds a row and minutely cronjob on willow pops the row and executes it
[05:12:01] would be much nicer if the php script could execute it directly on willow
[05:12:38] I noticed today that 'ssh wolfsbane' works directly from willow, no need to verify within TS. That's very nice. But I'm not sure how to get ssh going without ssh2 PECL, or is there a way to get it via regular exec() ?
[05:16:11] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=2.381348/1.8, alarm hl:np_load_avg=1.809570/2.3, alarm hl:mem_free=174.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=2.381348/1.9, alarm hl:np_load_long=1.520996/2.25, alarm hl:mem_free=174.000000M/200M, alarm hl:available=1/0
[05:20:12] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[05:23:11] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=0.891113/1.8, alarm hl:np_load_avg=1.284180/2.3, alarm hl:mem_free=253.000000M/300M, alarm hl:available=1/0
[05:25:02] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[05:26:42] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 454149 MB (8% inode=44%):
[05:26:42] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[05:26:42] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[05:27:20] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[05:27:42] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[05:31:39] hm.. I'll give proc_open a try
[05:34:11] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[05:59:21] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[06:03:11] oh ssh takes an argument for command -_-
[06:03:13] duh!
[06:26:01] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[06:26:51] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[06:26:51] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[06:27:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[06:27:42] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 454102 MB (8% inode=44%):
[06:27:52] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[06:32:42] Load avg. on willow is WARNING: WARNING - load average: 14.57, 15.86, 13.56
[06:35:32] Load avg. on willow is OK: OK - load average: 12.61, 14.51, 13.42
[06:53:20] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 274 bytes in 17.858 second response time
[07:02:33] Load avg. on willow is WARNING: WARNING - load average: 25.90, 21.32, 16.07
[07:03:11] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=2.466797/1.8, alarm hl:np_load_avg=2.537598/2.3, alarm hl:mem_free=315.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=2.466797/1.9, alarm hl:np_load_long=1.986816/2.25, alarm hl:mem_free=315.000000M/200M, alarm hl:available=1/0
[07:05:33] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[07:06:31] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 274 bytes in 18.250 second response time
[07:16:11] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[07:17:31] Load avg. on willow is OK: OK - load average: 11.21, 13.50, 14.90
[07:26:11] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[07:26:51] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[07:26:52] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[07:27:21] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[07:27:52] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453989 MB (8% inode=44%):
[07:27:52] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[07:32:41] Load avg. on willow is WARNING: WARNING - load average: 15.16, 14.26, 13.90
[07:33:12] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.866211/1.8, alarm hl:np_load_avg=1.777344/2.3, alarm hl:mem_free=740.000000M/300M, alarm hl:available=1/0
[07:33:41] Load avg. on willow is OK: OK - load average: 11.09, 13.28, 13.58
[07:34:11] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[07:52:31] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[08:01:21] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 274 bytes in 9.077 second response time
[08:26:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[08:26:52] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[08:27:01] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[08:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[08:27:53] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453882 MB (8% inode=44%):
[08:28:01] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[09:09:02] Dr. Trigon * Re: [Toolserver-l] [Wikitech-l] 403: User account expired toolserver.org/~soxred93
[09:19:02] Magnus Manske * Re: [Toolserver-l] [Wikitech-l] 403: User account expired toolserver.org/~soxred93
[09:26:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[09:27:02] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[09:27:02] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[09:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[09:28:02] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[09:28:51] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453753 MB (8% inode=44%):
[09:31:41] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:31:52] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:31:52] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:31:52] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:32:02] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:12] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:32:12] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:12] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:12] Load avg. on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:12] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:21] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:21] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:21] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
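A minimal Python sketch of where the 05:10 to 06:03 discussion ended up: ssh accepts the remote command as trailing arguments, so a control panel could invoke it directly instead of queueing a MySQL row for a minutely cron job on willow. It assumes key-based ssh from the calling host to willow already works without any prompt (the log only confirms that `ssh wolfsbane` works from willow); the helper names `build_ssh_argv` and `run_remote` and the `./start-bot` script are hypothetical. Python is used for illustration only; in PHP the per-argument quoting role is played by `escapeshellarg()` before the string is handed to `proc_open()` or `exec()`.

```python
import shlex
import subprocess
from typing import List, Tuple


def build_ssh_argv(host: str, remote_argv: List[str]) -> List[str]:
    """Build an argv list that runs remote_argv on host via ssh."""
    # ssh joins its trailing arguments with spaces and hands the result to the
    # remote user's shell, so each remote argument is quoted individually to
    # keep spaces and shell metacharacters literal.
    remote_command = " ".join(shlex.quote(arg) for arg in remote_argv)
    # BatchMode=yes makes ssh fail instead of prompting for a password or
    # passphrase, which a non-interactive caller could never answer.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_remote(host: str, remote_argv: List[str],
               timeout: float = 30.0) -> Tuple[int, str, str]:
    """Run a command on host; return (exit status, stdout, stderr)."""
    # Passing a list (no shell=True) means no local shell re-parses anything.
    result = subprocess.run(
        build_ssh_argv(host, remote_argv),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

For example, `run_remote("willow", ["./start-bot", "CVN bot 1"])` would execute `./start-bot 'CVN bot 1'` on willow and return its exit status and output. Quoting each argument separately matters because the remote side parses the joined command string through a shell again.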
[09:32:21] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:32:21] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:21] Load avg. on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:22] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:31] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:32] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:33] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:42] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:52] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:52] SMF on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:32:52] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:33:04] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out)
[09:33:04] MySQL on z-dat-s4-a is CRITICAL: (Service Check Timed Out)
[09:33:13] MySQL slave on z-dat-s4-a is CRITICAL: (Service Check Timed Out)
[09:33:13] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out)
[09:33:51] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:34:11] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out)
[09:34:13] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out)
[09:34:13] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out)
[09:34:22] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out)
[09:35:12] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out)
[09:35:42] Load avg. on hyacinth is OK: OK - load average: 0.03, 0.58, 1.27
[09:35:51] Load avg. on z-dat-s4-a is OK: OK - load average: 0.03, 0.57, 1.26
[09:35:52] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 118648 MB (29% inode=99%):
[09:35:53] / on z-dat-s4-a is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:35:53] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:35:53] SMF on z-dat-s3-a is OK: OK - all services online
[09:35:54] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:35:54] SMF on z-dat-s7-a is OK: OK - all services online
[09:35:54] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 2842 MB (99% inode=99%):
[09:35:57] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 2842 MB (99% inode=99%):
[09:35:57] SMTP on hyacinth is OK: SMTP OK - 9.826 sec. response time
[09:35:57] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:35:57] MySQL on z-dat-s4-a is OK: Uptime: 1118825 Threads: 10 Questions: 43315161 Slow queries: 18212 Opens: 25253 Flush tables: 1 Open tables: 493 Queries per second avg: 38.714
[09:35:58] MySQL on z-dat-s3-a is OK: Uptime: 1209938 Threads: 19 Questions: 1570083229 Slow queries: 101046 Opens: 11447444 Flush tables: 1 Open tables: 16384 Queries per second avg: 1297.655
[09:35:58] MySQL slave on z-dat-s3-a is OK: Uptime: 1209938 Threads: 18 Questions: 1570083230 Slow queries: 101046 Opens: 11447444 Flush tables: 1 Open tables: 16384 Queries per second avg: 1297.655 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 390
[09:35:58] / on z-dat-s6-a is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:35:59] Load avg. on z-dat-s6-a is OK: OK - load average: 0.08, 0.57, 1.25
[09:35:59] MySQL slave on z-dat-s7-a is OK: Uptime: 1642360 Threads: 4 Questions: 395472392 Slow queries: 60334 Opens: 2935000 Flush tables: 1 Open tables: 6706 Queries per second avg: 240.795 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 393
[09:36:13] / on hyacinth is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:36:13] SMF on hyacinth is OK: OK - all services online
[09:36:13] /tmp on hyacinth is OK: DISK OK - free space: /tmp 2928 MB (100% inode=99%):
[09:36:14] SMTP on z-dat-s4-a is OK: SMTP OK - 0.002 sec. response time
[09:36:14] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[09:36:21] SMF on z-dat-s4-a is OK: OK - all services online
[09:36:22] SMF on z-dat-s6-a is OK: OK - all services online
[09:36:32] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:36:42] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok
[09:58:42] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:58:42] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:58:42] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:58:42] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:58:52] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:58:52] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:58:52] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:59:02] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:12] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:12] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:13] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:13] SMF on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:14] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:14] Load avg. on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:21] Load avg. on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:22] Load avg. on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:22] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:22] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:22] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[09:59:22] /tmp on hyacinth is OK: DISK OK - free space: /tmp 3078 MB (100% inode=99%):
[09:59:22] / on hyacinth is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:59:31] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0
[09:59:32] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:59:42] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok
[09:59:42] / on z-dat-s4-a is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:59:42] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 117935 MB (29% inode=99%):
[09:59:42] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:59:42] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:59:42] /sql on z-dat-s3-a is OK: DISK OK - free space: /sql 202502 MB (20% inode=99%):
[09:59:42] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0)
[09:59:43] SMF on z-dat-s7-a is OK: OK - all services online
[09:59:43] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 3097 MB (99% inode=99%):
[09:59:44] Load avg. on z-dat-s7-a is OK: OK - load average: 1.02, 1.47, 1.67
[09:59:51] Load avg. on z-dat-s3-a is OK: OK - load average: 1.07, 1.47, 1.67
[09:59:51] Load avg. on z-dat-s4-a is OK: OK - load average: 1.07, 1.47, 1.67
[09:59:51] / on z-dat-s7-a is OK: DISK OK - free space: / 11643 MB (38% inode=87%):
[09:59:51] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 3097 MB (99% inode=99%):
[09:59:52] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 3097 MB (99% inode=99%):
[10:20:54] (created) [MNT-1221] Hard disk failure in daphne; Maintenance; Minor work <https://jira.toolserver.org/browse/MNT-1221> (Marlen Caemmerer)
[10:26:22] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[10:27:12] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[10:27:12] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[10:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[10:28:13] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[10:28:51] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453667 MB (8% inode=44%):
[10:42:21] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=2.064453/1.8, alarm hl:np_load_avg=1.752441/2.3, alarm hl:mem_free=714.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=2.064453/1.9, alarm hl:np_load_long=1.415527/2.25, alarm hl:mem_free=714.000000M/200M, alarm hl:available=1/0
[10:42:41] Load avg. on willow is WARNING: WARNING - load average: 16.20, 14.28, 11.55
[10:44:41] Load avg. on willow is OK: OK - load average: 14.86, 14.41, 11.94
[10:47:41] Load avg. on willow is WARNING: WARNING - load average: 14.50, 15.11, 12.72
[10:48:21] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[10:52:21] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=2.242188/1.8, alarm hl:np_load_avg=2.173828/2.3, alarm hl:mem_free=891.000000M/300M, alarm hl:available=1/0: longrun@willow exceedes load threshold: alarm hl:np_load_short=2.242188/1.9, alarm hl:np_load_long=1.779785/2.25, alarm hl:mem_free=891.000000M/200M, alarm hl:available=1/0
[11:20:40] Load avg. on willow is WARNING: WARNING - load average: 20.07, 17.84, 17.40
[11:24:22] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[11:26:22] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[11:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[11:28:12] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[11:28:12] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[11:28:22] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[11:28:52] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453421 MB (8% inode=44%):
[11:36:41] Load avg. on willow is OK: OK - load average: 9.52, 12.66, 14.72
[12:27:21] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[12:27:31] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[12:28:12] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[12:28:21] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[12:29:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 453309 MB (8% inode=44%):
[12:29:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[12:42:12] Sun Grid Engine execd on ortelius is WARNING: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.027344/1.00, alarm hl:np_load_long=0.838867/1.50, alarm hl:mem_free=17420.000000M/300M, alarm hl:available=1/0
[12:45:12] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[12:55:02] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 30709 MB (3% inode=99%):
[13:12:51] [[Toolserver notice]] https://wiki.toolserver.org/w/index.php?diff=6851&oldid=3413&rcid=9026 * Krinkle * (+5) (use 'ts-' namespace, use the classname instead of ID in CSS)
[13:14:22] [[Toolserver notice]] https://wiki.toolserver.org/w/index.php?diff=6852&oldid=6851&rcid=9027 * Krinkle * (+15) (Not sure if it can contain html, but unless intended better to escape. Not just security but also readability (e.g "foo < 5; bar > 10" might get corrupted if not escaped properly))
[13:27:31] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[13:27:41] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[13:28:11] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[13:28:21] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[13:29:02] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452744 MB (8% inode=44%):
[13:29:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[14:02:57] (updated) [ACCAPP-473] Can I have a nickname here? <https://jira.toolserver.org/browse/ACCAPP-473> (Bernhard Westmark)
[14:27:51] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[14:28:21] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[14:28:31] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[14:29:21] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[14:29:21] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[14:30:02] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452561 MB (8% inode=44%):
[14:42:21] APT on yarrow is WARNING: APT WARNING: 0 packages available for upgrade (0 critical updates). warnings detected, errors detected. run with -v for information.
[14:47:21] APT on yarrow is OK: APT OK: 0 packages available for upgrade (0 critical updates).
[14:53:07] toolserver: dab * r1139 /trunk/TSStatus/ (2 files in 2 dirs):
[14:53:07] toolserver: -Added more service-types.
[14:53:07] toolserver: -Added a info-icon.
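Krinkle's 13:14:22 edit summary argues for escaping the Toolserver notice text rather than inserting it as raw HTML. A minimal Python sketch of that point using the standard library's `html.escape`; the `render_notice` helper and the `ts-notice` class name are hypothetical (the 13:12:51 summary only says a `ts-` prefixed class replaced an ID).

```python
import html


def render_notice(text: str) -> str:
    """Wrap plain notice text in a container, escaping it first."""
    # html.escape converts &, <, > (and, by default, both quote characters)
    # to entities, so the text is displayed literally instead of parsed as markup.
    return '<div class="ts-notice">' + html.escape(text) + "</div>"


print(render_notice("foo < 5; bar > 10"))
# prints: <div class="ts-notice">foo &lt; 5; bar &gt; 10</div>
```

With escaping applied, the notice renders as the literal characters it contains, whatever those characters are.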
[15:27:51] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[15:28:22] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[15:28:31] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[15:29:22] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[15:29:31] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[15:30:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452343 MB (8% inode=44%):
[15:38:22] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.832031/1.10, alarm hl:np_load_long=0.902344/1.55, alarm hl:mem_free=18139.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.832031/1.00, alarm hl:np_load_long=0.902344/1.50, alarm hl:mem_free=18139.000000M/300M, alarm hl:available=1/0
[15:48:26] [[Special:Log/newusers]] create * SammMaust * (New user account)
[15:49:04] [[Web Based Weight loss plan Reduces Pounds]] !N https://wiki.toolserver.org/w/index.php?oldid=6853&rcid=9029 * SammMaust * (+4151) (Created page with "Just the culinary gurus in Hell's Cooking area understand what to carry out with all those! For many swift and simple ideas to drop excess fat quick, it is best to consult with y...")
[15:54:51] [[Special:Log/delete]] delete * Dab * (deleted "[[Web Based Weight loss plan Reduces Pounds]]": SPAM)
[15:55:16] [[Special:Log/block]] block * Dab * (blocked [[User:SammMaust]] with an expiry time of infinite (account creation disabled): Inserting nonsense/gibberish into pages)
[16:03:22] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[16:07:46] toolserver: dab * r1140 /trunk/TSStatus/WEB-INF/ (4 files in 3 dirs):
[16:07:46] toolserver: -Updated web.xml to tomcat7-syntax.
[16:07:46] toolserver: -Began to introduce master/slave-system.
[16:15:32] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_short=1.104004/1.8, alarm hl:np_load_avg=0.911621/2.3, alarm hl:mem_free=231.000000M/300M, alarm hl:available=1/0
[16:16:31] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK
[16:27:52] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[16:28:22] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[16:28:31] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[16:29:21] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[16:29:31] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[16:30:01] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452613 MB (8% inode=44%):
[16:57:21] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.454101/1.10, alarm hl:np_load_long=1.079101/1.55, alarm hl:mem_free=17893.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.454101/1.00, alarm hl:np_load_long=1.079101/1.50, alarm hl:mem_free=17893.000000M/300M, alarm hl:available=1/0
[16:59:21] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[17:25:19] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.733399/1.10, alarm hl:np_load_long=0.993164/1.55, alarm hl:mem_free=18250.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.733399/1.00, alarm hl:np_load_long=0.993164/1.50, alarm hl:mem_free=18250.000000M/300M, alarm hl:available=1/0
[17:28:00] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[17:28:29] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[17:29:18] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[17:29:29] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[17:29:29] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[17:29:37] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[17:30:08] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452423 MB (8% inode=44%):
[18:21:18] Sun Grid Engine execd on ortelius is WARNING: short@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.10, alarm hl:np_load_long=0.911133/1.55, alarm hl:mem_free=18220.000000M/300M, alarm hl:available=1/0: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.168945/1.00, alarm hl:np_load_long=0.911133/1.50, alarm hl:mem_free=18220.000000M/300M, alarm hl:available=1/0
[18:22:19] Sun Grid Engine execd on ortelius is OK: short@ortelius OK: all.q@ortelius OK
[18:28:08] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[18:29:28] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[18:29:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[18:30:09] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452254 MB (8% inode=44%):
[18:30:28] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[18:30:28] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[18:31:26] hi DaBPunkt
[18:53:19] Sun Grid Engine execd on ortelius is WARNING: all.q@ortelius exceedes load threshold: alarm hl:np_load_short=1.113281/1.00, alarm hl:np_load_long=0.885742/1.50, alarm hl:mem_free=17412.000000M/300M, alarm hl:available=1/0
[18:56:19] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius disabled: all.q@ortelius OK
[19:06:28] Sun Grid Engine execd on wolfsbane is WARNING: all.q@wolfsbane exceedes load threshold: alarm hl:np_load_short=0.210938/1.00, alarm hl:np_load_long=0.228027/1.50, alarm hl:mem_free=298.000000M/300M, alarm hl:available=1/0
[19:07:28] Sun Grid Engine execd on wolfsbane is OK: short-sol@wolfsbane disabled: all.q@wolfsbane OK: medium-sol@wolfsbane disabled
[19:28:08] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[19:28:28] hello
[19:28:44] i will do a maintenance of s1 in a few minutes
[19:29:18] rosemary will get a san volume for commons if mysql is fine with it
[19:29:28] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[19:29:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[19:29:58] i will have to stop mysql on rosemary for moving commons
[19:30:38] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[19:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[19:31:08] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 452088 MB (8% inode=44%):
[19:31:55] (created) [MNT-1222] maintenance Wed 14. March; Maintenance; Minor work <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[19:33:53] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[19:34:03] nosy or DaBPunkt: I've got a problem, my SGE jobs aren't running, all my cronsub's stay in queue with state qw forever
[19:34:18] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[19:34:27] This is happening since around 13:45 UTC today
[19:34:28] jem-: yes, that's intended. We have a maintenance to do there
[19:34:42] Oh
[19:34:54] DaBPunkt: because of the update?
[19:35:00] yes
[19:35:01] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[19:35:01] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[19:35:01] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[19:35:03] sure this is since 13:45?
[19:35:38] If that goes for me, nosy, yes, quite sure
[19:35:54] nosy: all jobs running longer than 19:15 were not started today
[19:35:55] we stopped the accepting around 15 o'clock UTC
[19:35:59] http://es.wikipedia.org/wiki/Especial:Contribuciones/Jembot
[19:36:17] ok
[19:36:41] at 18:45 it was still possible to submit a job having a runtime less than half an hour
[19:37:03] is wikipedia staging an update?
[19:37:22] JRWR: how do you mean?
[19:37:55] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[19:37:56] we here at toolserver are trying to move a database to a san volume and to update sge
[19:38:50] JRWR: what do you mean by update? you mean an update of the mediawiki? or database software or something?
[19:56:09] /aux0 on hemlock is OK: DISK OK - free space: /aux0 674894 MB (12% inode=54%):
[20:01:55] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[20:11:08] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:11:58] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.057 second response time
[20:13:28] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: all.q@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[20:13:38] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[20:22:55] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[20:28:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[20:28:18] Sun Grid Engine execd on ortelius is CRITICAL: short-sol@ortelius in unknown state: all.q@ortelius in unknown state: medium-sol@ortelius in unknown state
[20:28:28] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[20:28:52] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[20:29:28] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[20:29:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[20:30:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[20:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[20:33:54] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[20:34:18] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[20:36:00] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[20:36:00] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[20:36:00] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[20:53:53] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[20:54:35] [[Special:Log/newusers]] create * Gnuinux * (New user account)
[20:56:33] [[User:Gnuinux]] !N https://wiki.toolserver.org/w/index.php?oldid=6854&rcid=9033 * Gnuinux * (+1) (Created page with ".")
[20:58:13] [[User talk:Gnuinux]] !N https://wiki.toolserver.org/w/index.php?oldid=6855&rcid=9034 * Gnuinux * (+1) (Created page with ".")
[21:01:39] Sun Grid Engine execd on willow is CRITICAL: medium-sol@willow in unknown state: longrun-sol@willow in unknown state
[21:13:29] Sun Grid Engine execd on wolfsbane is CRITICAL: short-sol@wolfsbane in unknown state: medium-sol@wolfsbane in unknown state
[21:13:39] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[21:28:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[21:28:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[21:28:28] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[21:28:28] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[21:28:38] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[21:29:38] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[21:30:38] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[21:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[21:30:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[21:34:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[21:36:58] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[21:36:59] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[21:36:59] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146)
[22:01:09] /msg NickServ REGISTER bunyip79 melbournejosh@gmail.com
[22:02:07] net: you might want to change that password
[22:03:58] MySQL on rosemary is OK: Uptime: 16 Threads: 2 Questions: 11 Slow queries: 0 Opens: 20 Flush tables: 1 Open tables: 13 Queries per second avg: 0.687
[22:03:58] MySQL slave on rosemary is OK: Uptime: 16 Threads: 2 Questions: 13 Slow queries: 0 Opens: 20 Flush tables: 1 Open tables: 13 Queries per second avg: 0.812 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[22:06:58] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[22:08:58] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146)
[22:12:09] /sql on rosemary is OK: DISK OK - free space: /sql 321789 MB (33% inode=99%):
[22:12:58] MySQL on rosemary is OK: Uptime: 53 Threads: 5 Questions: 1690 Slow queries: 6 Opens: 93 Flush tables: 1 Open tables: 86 Queries per second avg: 31.886
[22:13:39] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[22:17:55] Merlissimo: hello, i'm having problems with qsub, i believe they've started this afternoon, i'm now having this error message: http://pastebin.com/Mnv5yJ7G
[22:18:26] Alchimista: thats subject of maintenance currently
[22:18:55] oh...syntax error probably not
[22:19:30] but sge is currently in maintenance state
[22:20:02] Ah, ok then, i'll wait. any idea of how much time it will take? i'll change the cronie tab, to prevent new submissions
[22:20:43] hm...seems between 5 and 60 minutes
[22:22:10] ok, so just make your magic ;)
[22:26:58] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3573.000000
[22:28:09] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[22:28:27] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[22:28:28] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[22:28:38] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[22:28:39] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[22:28:42] i dont, dab and merl are doing the magic
[22:29:28] merl for the most parts
[22:29:39] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[22:29:58] s4 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1800.000000
[22:30:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[22:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[22:30:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[22:31:09] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 580832 MB (10% inode=50%):
[22:34:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 8653.000000
[22:36:52] (commented) [MNT-1222] maintenance Wed 14. March <https://jira.toolserver.org/browse/MNT-1222> (DaB.)
[23:06:58] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 4829
[23:13:39] SMF on wolfsbane is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[23:28:18] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk
[23:28:28] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[23:28:29] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[23:28:59] are the databases locked down currently? I'm getting errors that a database on sql-s1-user is in readonly mode
[23:29:38] SMF on ortelius is CRITICAL: ERROR - maintenance: svc:/application/sge/execd:toolserver
[23:29:39] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[23:29:49] * Hersfold missed the bit in the topic that says "Status: Maintenance" and assumes that's why, never mind
[23:30:39] SMF on damiana is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[23:30:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default
[23:30:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default
[23:30:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1
[23:31:09] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 568607 MB (10% inode=49%):
[23:31:54] Hersfold: http://status.toolserver.org/ few things are down and i'm getting an email every 5 mins about it
[23:32:06] yeah just looked there
[23:35:18] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3967.000000
[23:44:02] Platonides * [Toolserver-l] /opt/local/bin/cronsub[4]: /sge62/default/common/settings.sh
[23:44:18] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3579.000000
[23:44:59] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3527
[23:48:09] /sql on thyme is WARNING: DISK WARNING - free space: /sql 197880 MB (20% inode=99%):
[23:49:08] /sql on thyme is OK: DISK OK - free space: /sql 201684 MB (21% inode=99%):
[23:51:02] [[w:en:User:Madman]] * Re: [Toolserver-l] /opt/local/bin/cronsub[4]: /sge62/default/common/settings.sh
[23:57:58] MySQL slave on rosemary is OK: Uptime: 6352 Threads: 8 Questions: 2473005 Slow queries: 423 Opens: 397 Flush tables: 1 Open tables: 356 Queries per second avg: 389.326 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1757
[23:58:18] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1711.000000