[00:03:05] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [00:04:06] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 470204 MB (8% inode=44%): [00:07:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 79755 [00:12:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.907715/1.8, alarm hl:np_load_avg=1.626465/2.3, alarm hl:mem_free=134.000000M/300M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.907715/1.9, alarm hl:np_load_long=1.530762/2.25, alarm hl:mem_free=134.000000M/200M, alarm hl:available=1/0 [00:12:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [00:13:32] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 273 bytes in 10.196 second response time [00:20:24] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:26:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [00:27:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.080566/1.8, alarm hl:np_load_avg=1.203613/2.3, alarm hl:mem_free=248.000000M/300M, alarm hl:available=1/0 [00:29:42] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 184470.000000 [00:30:11] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 184501.000000 [00:34:23] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [00:47:23] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.826172/1.8, alarm hl:np_load_avg=0.818359/2.3, alarm hl:mem_free=276.000000M/300M, alarm hl:available=1/0 [00:51:24] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: 
longrun-sol@willow OK [00:56:23] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [00:56:23] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [00:58:02] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [00:58:23] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:58:23] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [01:02:22] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.777832/1.8, alarm hl:np_load_avg=0.778809/2.3, alarm hl:mem_free=268.000000M/300M, alarm hl:available=1/0 [01:05:02] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 469563 MB (8% inode=44%): [01:08:01] MySQL slave on rosemary is CRITICAL: (Service Check Timed Out) [01:12:27] Any roots around to restart the query-killer? [01:13:54] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [01:14:22] MySQL on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [01:21:03] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 272 bytes in 10.293 second response time [01:29:55] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [01:30:22] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 188108.000000 [01:56:52] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [01:57:23] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [01:58:12] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [01:58:54] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:58:54] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 
[02:05:12] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 469109 MB (8% inode=44%): [02:08:41] MySQL slave on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [02:09:34] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [02:14:03] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [02:14:33] MySQL on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [02:15:22] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 273 bytes in 10.749 second response time [02:21:33] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [02:30:05] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [02:30:33] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 191718.000000 [02:47:22] fisheye.toolserver.org on web.amaranth is OK: HTTP OK: HTTP/1.1 200 OK - 272 bytes in 10.072 second response time [02:48:53] 3(commented) [MNT-1225] Growing replag on S1 due to a database migration at WMF <10https://jira.toolserver.org/browse/MNT-1225> (Dispenser) [02:50:34] @replag [02:50:34] matthewrbowker: s1-rr-a: 2d 5h 35m 30s [+1.00 s/s]; s1-user: 2d 5h 35m 30s [+1.00 s/s]; s2-user: 11s [-0.00 s/s]; s3-rr-a: 5m 11s [+0.02 s/s]; s3-user: 5m 11s [+0.02 s/s]; s7-rr-a: 10s [+0.00 s/s]; s7-user: 10s [+0.00 s/s] [02:53:05] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 195978 MB (20% inode=99%): [02:57:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [02:57:33] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [02:58:22] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [02:59:06] RAID on daphne is CRITICAL: 
ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [02:59:52] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:05:29] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 466863 MB (8% inode=44%): [03:08:44] MySQL slave on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [03:10:21] ^ Dispenser: LOL, you're not the only one with the problem :P [03:12:52] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.000976/1.00, alarm hl:np_load_long=0.829102/1.50, alarm hl:mem_free=21200.000000M/300M, alarm hl:available=1/0 [03:13:09] While I don't have as many tools as Magnus, the polish on mine ensures that my users are quite vocal about it [03:14:13] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [03:14:42] MySQL on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [03:14:52] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [03:18:07] Dispenser: That's both a good thing and a bad thing. 
[03:30:15] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [03:30:43] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 195329.000000 [03:57:14] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [03:57:44] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [03:58:44] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [03:59:14] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [04:00:15] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:06:23] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 466424 MB (8% inode=44%): [04:08:43] MySQL slave on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [04:14:12] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [04:14:44] MySQL on rosemary is CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [04:30:43] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 198933.000000 [04:31:12] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: User nagios has exceeded the max_user_connections resource (current value: 15) [04:44:52] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.090820/1.00, alarm hl:np_load_long=0.833008/1.50, alarm hl:mem_free=21340.000000M/300M, alarm hl:available=1/0 [04:45:52] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [04:51:01] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 272 bytes in 19.633 second response time [04:55:26] SMF on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[04:55:52] SMF on hyacinth is OK: OK - all services online [04:57:24] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [04:57:53] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [04:58:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [04:59:24] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [05:00:24] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:01:02] fisheye.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 273 bytes in 20.444 second response time [05:03:24] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.115234/1.8, alarm hl:np_load_avg=0.978516/2.3, alarm hl:mem_free=229.000000M/300M, alarm hl:available=1/0 [05:04:02] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 273 bytes in 19.587 second response time [05:05:45] SMF on rosemary is CRITICAL: ERROR - maintenance: svc:/application/ts/mysql-51:default [05:06:23] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [05:06:23] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 464294 MB (8% inode=44%): [05:06:54] SMF on rosemary is OK: OK - all services online [05:08:43] MySQL slave on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [05:13:25] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.423340/1.8, alarm hl:np_load_avg=1.221191/2.3, alarm hl:mem_free=281.000000M/300M, alarm hl:available=1/0 [05:14:43] MySQL on rosemary is CRITICAL: Cant connect to MySQL server on rosemary (146) [05:15:12] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on rosemary (146) [05:19:43] MySQL on rosemary is OK: Uptime: 777 Threads: 50 Questions: 427 Slow queries: 2 Opens: 67 Flush tables: 1 Open 
tables: 57 Queries per second avg: 0.549 [05:19:43] MySQL slave on rosemary is OK: Uptime: 779 Threads: 52 Questions: 439 Slow queries: 2 Opens: 67 Flush tables: 1 Open tables: 57 Queries per second avg: 0.563 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [05:20:12] s4 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 839.000000 [05:26:53] 3(commented) [MAGNUS-308] Catscan broken on plwiki <10https://jira.toolserver.org/browse/MAGNUS-308> (bulwersator) [05:30:42] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 202536.000000 [05:31:24] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 202568.000000 [05:39:02] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 273 bytes in 17.784 second response time [05:47:11] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [05:47:54] 3(commented) [MNT-1225] Growing replag on S1 due to a database migration at WMF <10https://jira.toolserver.org/browse/MNT-1225> (Marlen Caemmerer) [05:48:02] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 272 bytes in 18.504 second response time [05:56:12] fisheye.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 273 bytes in 20.902 second response time [05:57:32] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [05:58:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [05:58:53] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [05:59:33] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [05:59:57] 3(issue comment edited) [MNT-1225] Growing replag on S1 due to a database migration at WMF <10https://jira.toolserver.org/browse/MNT-1225> (Dispenser) [06:00:33] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:05:59] 3(resolved) [OSM-3] Ptolemy 
Postgres crashed twice <10https://jira.toolserver.org/browse/OSM-3> (Marlen Caemmerer) [06:06:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 464232 MB (8% inode=44%): [06:07:53] 3(commented) [OSM-6] mapnik tirex-backend occasionally failing <10https://jira.toolserver.org/browse/OSM-6> (Marlen Caemmerer) [06:13:53] 3(commented) [OSM-9] Periodic "Connection refused" on tirex mod_tile socket <10https://jira.toolserver.org/browse/OSM-9> (Marlen Caemmerer) [06:17:11] fisheye.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 274 bytes in 20.661 second response time [06:22:56] 3(created) [MAGNUS-309] Make URLs protocol relative; Magnus' tools; Minor Bug <10https://jira.toolserver.org/browse/MAGNUS-309> (Dispenser) [06:30:52] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 206139.000000 [06:32:23] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 206229.000000 [06:57:43] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [06:58:02] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [06:59:02] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [06:59:43] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [07:00:43] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:02:22] Load avg. 
on willow is WARNING: WARNING - load average: 24.50, 16.47, 11.11 [07:03:32] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.636719/1.8, alarm hl:np_load_avg=2.082031/2.3, alarm hl:mem_free=292.000000M/300M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.636719/1.9, alarm hl:np_load_long=1.430176/2.25, alarm hl:mem_free=292.000000M/200M, alarm hl:available=1/0 [07:06:45] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 464185 MB (8% inode=44%): [07:11:23] Load avg. on willow is CRITICAL: CRITICAL - load average: 36.99, 23.36, 16.25 [07:12:22] Load avg. on willow is WARNING: WARNING - load average: 26.66, 22.97, 16.55 [07:17:22] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [07:21:33] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [07:26:22] Load avg. on willow is OK: OK - load average: 9.11, 12.39, 14.96 [07:31:02] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 209750.000000 [07:33:23] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 209889.000000 [07:57:45] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [07:58:16] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [07:59:26] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [07:59:46] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [08:00:45] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:01:54] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[08:07:04] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 464064 MB (8% inode=44%): [08:17:37] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [08:31:15] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 213361.000000 [08:33:36] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 213501.000000 [08:58:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [08:58:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [08:59:35] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [09:00:04] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [09:01:03] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:01:25] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:07:04] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 464004 MB (8% inode=44%): [09:11:45] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:11:45] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:54] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:11:55] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:55] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:55] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:55] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:56] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:11:56] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:12:16] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:12:16] / on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:12:35] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [09:12:45] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:12:45] s4 replag on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [09:12:46] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:12:54] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [09:13:15] / on hyacinth is OK: DISK OK - free space: / 11633 MB (38% inode=87%): [09:13:15] /tmp on hyacinth is OK: DISK OK - free space: /tmp 3044 MB (100% inode=99%): [09:13:15] s4 replag on z-dat-s4-a is OK: QUERY OK: SELECT ts_rc_age() returned 232.000000 [09:13:15] MySQL slave on z-dat-s7-a is OK: Uptime: 2332201 Threads: 18 Questions: 533183869 Slow queries: 77923 Opens: 3913369 Flush tables: 1 Open tables: 6919 Queries per second avg: 228.618 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 274 [09:13:16] MySQL on z-dat-s7-a is OK: Uptime: 2332202 Threads: 18 Questions: 533183939 Slow queries: 77923 Opens: 3913369 Flush tables: 1 Open tables: 6919 Queries per second avg: 228.618 [09:13:26] SMF on z-dat-s6-a is OK: OK - all services online [09:13:26] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:13:26] MySQL on z-dat-s3-a is OK: Uptime: 1899791 Threads: 15 Questions: 2083768294 Slow queries: 123420 Opens: 14559159 Flush tables: 1 Open tables: 16384 Queries per second avg: 1096.840 [09:13:26] MySQL slave on z-dat-s3-a is OK: Uptime: 1899791 Threads: 15 Questions: 2083768296 Slow queries: 123420 Opens: 14559159 Flush tables: 1 Open tables: 16384 Queries per second avg: 1096.840 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 246 [09:13:35] SMTP on z-dat-s7-a is OK: SMTP OK - 0.004 sec. response time [09:13:45] SMTP on hyacinth is OK: SMTP OK - 0.003 sec. response time [09:13:46] SMTP on z-dat-s4-a is OK: SMTP OK - 0.004 sec. response time [09:13:46] SMTP on z-dat-s3-a is OK: SMTP OK - 0.002 sec. 
response time [09:13:46] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:13:46] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:13:46] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:17:46] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [09:32:15] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 217026.000000 [09:33:46] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 217112.000000 [09:55:25] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:55:44] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:44] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] / on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] /sql on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] / on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] / on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:45] SMF on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:55:55] SMF on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[09:56:15] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:56:15] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [09:56:15] /sql on z-dat-s6-a is OK: DISK OK - free space: /sql 198417 MB (20% inode=99%): [09:56:15] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [09:56:15] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:56:16] / on z-dat-s7-a is OK: DISK OK - free space: / 11633 MB (38% inode=87%): [09:56:16] /sql on z-dat-s4-a is OK: DISK OK - free space: /sql 113965 MB (28% inode=99%): [09:56:17] / on z-dat-s4-a is OK: DISK OK - free space: / 11633 MB (38% inode=87%): [09:56:17] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 2890 MB (99% inode=99%): [09:56:18] SMF on z-dat-s3-a is OK: OK - all services online [09:56:24] / on z-dat-s6-a is OK: DISK OK - free space: / 11633 MB (38% inode=87%): [09:56:24] MySQL slave on z-dat-s3-a is OK: Uptime: 1902367 Threads: 11 Questions: 2084778537 Slow queries: 123509 Opens: 14560117 Flush tables: 1 Open tables: 16384 Queries per second avg: 1095.886 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 110 [09:56:24] MySQL on z-dat-s3-a is OK: Uptime: 1902367 Threads: 11 Questions: 2084778539 Slow queries: 123509 Opens: 14560117 Flush tables: 1 Open tables: 16384 Queries per second avg: 1095.886 [09:56:24] SMF on z-dat-s6-a is OK: OK - all services online [09:58:04] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [09:58:24] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [09:59:34] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [10:00:04] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [10:01:04] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:06:56] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.854004/1.8, alarm 
hl:np_load_avg=0.840820/2.3, alarm hl:mem_free=250.000000M/300M, alarm hl:available=1/0 [10:07:04] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 463914 MB (8% inode=44%): [10:13:55] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [10:16:57] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.894043/1.8, alarm hl:np_load_avg=0.866211/2.3, alarm hl:mem_free=246.000000M/300M, alarm hl:available=1/0 [10:17:55] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [10:23:39] [[Talk:User-store]] ! 10https://wiki.toolserver.org/w/index.php?diff=6914&oldid=6620&rcid=9108 * Nemobis * (+970) (/* Stats compression */ new section) [10:24:05] /aux0 on hemlock is OK: DISK OK - free space: /aux0 624338 MB (11% inode=52%): [10:26:57] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:32:24] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 220635.000000 [10:33:55] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 220721.000000 [10:47:02] [[Template:Hidden]] !N 10https://wiki.toolserver.org/w/index.php?oldid=6915&rcid=9109 * Nemobis * (+30) (Redirected page to [[Template:Dropdown]]) [10:47:15] [[Template:Collapsed]] !N 10https://wiki.toolserver.org/w/index.php?oldid=6916&rcid=9110 * Nemobis * (+30) (Redirected page to [[Template:Dropdown]]) [10:48:03] [[Talk:User-store]] !M 10https://wiki.toolserver.org/w/index.php?diff=6917&oldid=6914&rcid=9111 * Nemobis * (+31) (/* Update */ collpase) [10:51:25] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [10:56:58] /sql on thyme is CRITICAL: DISK CRITICAL - free space: /sql 85338 MB (8% inode=99%): [10:58:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [10:58:45] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [11:00:05] FMA on 
yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [11:00:05] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [11:01:07] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:07:55] 3(created) [MNT-1226] Adjusted munin runs on hemlock; Maintenance; Minor work <10https://jira.toolserver.org/browse/MNT-1226> (Marlen Caemmerer) [11:07:58] 3(resolved) [MNT-1226] Adjusted munin runs on hemlock <10https://jira.toolserver.org/browse/MNT-1226> (Marlen Caemmerer) [11:18:35] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [11:32:45] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 224253.000000 [11:34:05] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 224339.000000 [11:38:38] [[~hippietrail/apiproxy.fcgi]] ! 10https://wiki.toolserver.org/w/index.php?diff=6918&oldid=2257&rcid=9112 * Bailey Helton * (+56) () [11:38:40] [[Ghel]] ! 10https://wiki.toolserver.org/w/index.php?diff=6919&oldid=6199&rcid=9113 * Bailey Helton * (+60) () [11:38:42] [[Editcounter]] ! 
10https://wiki.toolserver.org/w/index.php?diff=6920&oldid=3515&rcid=9114 * Bailey Helton * (+63) () [11:58:25] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [11:59:06] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [12:00:26] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [12:00:35] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [12:01:26] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:18:55] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [12:32:46] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 227859.000000 [12:33:56] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.215820/1.10, alarm hl:np_load_long=0.808594/1.55, alarm hl:mem_free=21521.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.215820/1.00, alarm hl:np_load_long=0.808594/1.50, alarm hl:mem_free=21521.000000M/300M, alarm hl:available=1/0 [12:34:16] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 227942.000000 [12:34:56] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [12:43:56] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.10, alarm hl:np_load_long=0.869140/1.55, alarm hl:mem_free=21192.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.222656/1.00, alarm hl:np_load_long=0.869140/1.50, alarm hl:mem_free=21192.000000M/300M, alarm hl:available=1/0 [12:46:56] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[12:58:48] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [12:59:07] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [13:00:47] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [13:00:56] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [13:01:48] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [13:04:15] @replag [13:04:16] DaBPunkt: s1-rr-a: 2d 15h 49m 11s [+1.00 s/s]; s1-user: 2d 15h 49m 11s [+1.00 s/s] [13:16:59] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [13:19:15] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [13:32:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 231467.000000 [13:34:45] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 231578.000000 [13:58:58] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [13:59:17] DaBPunkt: How is ts_replag implemented? If I query a wiki that updates every hour, will it give me the replag from the most frequently updated wiki? [13:59:25] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:00:56] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [14:01:35] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [14:01:56] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:02:23] Dispenser: the table/view ts_replag? That is the time-diff between the last edit of a wiki and now [14:07:36] DaBPunkt: So I connect to angwiktionary_p on the S3 cluster, which has a replication lag of 50 sec, and ts_replag gives me 10,356 sec [14:10:15] Dispenser: the REAL replag of s3 is 0 secs. at the moment. 
The problem is that the users cannot see the real replag, so every replag-tool that is run by users uses a time-diff between a wiki and now(). The problem is now to CHOOSE the right wiki to do this, because if there is no activity on the wiki, the replag would (wrongly) increase. [14:13:38] Dispenser: angwiktionary_p for example would NOT be a good wiki to choose for a cluster, because the last edit there was yesterday, 18 o'clock [14:17:41] The idea was that someone else would select such a wiki and change (e.g. itwiki strike) [14:18:59] in a tool? [14:19:41] in the view [14:19:57] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [14:20:14] Dispenser: the view gives you the replag for a wiki, not of a cluster [14:20:51] So tool authors could just SELECT ts_replag FROM toolserver.replag and not worry [14:22:44] [[Replication lag]] 10https://wiki.toolserver.org/w/index.php?diff=6921&oldid=6327&rcid=9115 * Dispenser * (+36) (/* Determining lag by wiki */ More efficient query. As EXPLAIN puts it "Select tables optimized away") [14:33:07] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 235074.000000 [14:33:31] Dispenser: there is no simple way to get the replag of a cluster and store it somewhere; and there is also the problem if there are more than 1 cluster on a server (for example, commons is nearly everywhere) [14:34:57] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 235188.000000 [14:48:26] [[Code snippets]] 10https://wiki.toolserver.org/w/index.php?diff=6922&oldid=5972&rcid=9116 * Dispenser * (-30) (/* Calculate replag */ More efficient query. 
As EXPLAIN puts it "Select tables optimized away") [14:53:55] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 172513 MB (17% inode=99%): [14:59:05] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [14:59:36] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [14:59:40] I hate to ask because I know it's been asked a million times already and I don't know where else to work besides the "status" page, but any update on when the sql-s1-rr will return? [14:59:54] err sql-s1-user rather [15:01:06] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [15:01:08] return? [15:01:18] It's locked up in read-only mode [15:01:45] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [15:01:46] as soon as the schema update is finished [15:01:52] Alright, thanks [15:02:06] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:02:10] Also, could you tell me what project to submit a ticket to, to get my display name on JIRA changed? [15:04:06] TParis: I can try to do it now. What is the old name and what should the new name be? [15:20:56] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [15:24:52] (created) [TS-1334] When writing to userdb: The MySQL server is running with the --read-only option; Toolserver: Databases; Bug <https://jira.toolserver.org/browse/TS-1334> (Aaron Halfaker) [15:26:03] Aaron Halfaker * [Toolserver-l] The MySQL server is running with the --read-only option [15:31:56] (resolved) [TS-1334] When writing to userdb: The MySQL server is running with the --read-only option <https://jira.toolserver.org/browse/TS-1334> (DaB.) 
[15:34:06] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 238734.000000 [15:35:21] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 238805.000000 [15:36:56] (commented) [TS-1334] When writing to userdb: The MySQL server is running with the --read-only option <https://jira.toolserver.org/browse/TS-1334> (Aaron Halfaker) [15:46:53] (commented) [TS-1334] When writing to userdb: The MySQL server is running with the --read-only option <https://jira.toolserver.org/browse/TS-1334> (DaB.) [15:50:56] (commented) [TS-1334] When writing to userdb: The MySQL server is running with the --read-only option <https://jira.toolserver.org/browse/TS-1334> (Aaron Halfaker) [15:56:19] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.626953/1.8, alarm hl:np_load_avg=0.605957/2.3, alarm hl:mem_free=223.000000M/300M, alarm hl:available=1/0 [15:59:19] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [15:59:59] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [16:00:52] (created) [PATHOSCHILD-7] The top part of the scrolldown; Pathoschild's tools; Trivial Improvement <https://jira.toolserver.org/browse/PATHOSCHILD-7> (Ankit Maity) [16:01:19] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [16:01:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [16:02:07] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [16:02:28] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:02:57] (updated) [PATHOSCHILD-7] The top part of the scrolldown is blank <https://jira.toolserver.org/browse/PATHOSCHILD-7> (Ankit Maity) [16:21:37] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [16:34:18] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() 
returned 242350.000000 [16:35:20] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 242410.000000 [16:59:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:00:08] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [17:01:38] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [17:02:20] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [17:02:38] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:07:38] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.537109/1.10, alarm hl:np_load_long=0.895508/1.55, alarm hl:mem_free=21223.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.537109/1.00, alarm hl:np_load_long=0.895508/1.50, alarm hl:mem_free=21223.000000M/300M, alarm hl:available=1/0 [17:17:38] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:17:46] @replag [17:17:47] DaBPunkt: s1-rr-a: 2d 20h 2m 42s [+1.00 s/s]; s1-user: 2d 20h 2m 42s [+1.00 s/s]; s6-rr-a: 20s [+0.00 s/s]; s6-user: 20s [+0.00 s/s] [17:21:47] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [17:30:04] [[Special:Log/newusers]] create * Jackie * (New user account) [17:34:19] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 245949.000000 [17:36:19] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 246070.000000 [17:39:06] [[User talk:Jackie]] !N https://wiki.toolserver.org/w/index.php?oldid=6923&rcid=9118 * Jackie * (+37) (soft redirect) [17:42:57] [[User:Jackie]] !N https://wiki.toolserver.org/w/index.php?oldid=6924&rcid=9119 * Jackie * (+118) (own) [17:51:44] [[Заглавная страница]] ! 
https://wiki.toolserver.org/w/index.php?diff=6925&oldid=5827&rcid=9120 * Jackie * (+99) (fix/add links) [17:52:32] DaBPunkt: Is it too much trouble if we want to move p_unblock off sql-s1-user after this? Who do we submit a ticket to for that? [17:54:45] TParis: you can move it to wherever you like (it belongs to sql if it is cluster-independent) [17:55:37] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=3.679688/1.10, alarm hl:np_load_long=1.730469/1.55, alarm hl:mem_free=21374.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=3.679688/1.00, alarm hl:np_load_long=1.730469/1.50, alarm hl:mem_free=21374.000000M/300M, alarm hl:available=1/0 [17:56:29] Alright, thanks. [17:58:47] @replag jsmith: s1-sec-c: 13s [-0.01 s/s]; s2/s5-pri-c: 14m 44s [+0.00 s/s]; s3-rr: 60s [+0.00 s/s]; s3-user: 60s [+0.00 s/s]; s4-rr: 14m 44s [+0.00 s/s]; s4-user: 13s [-0.02 s/s] [17:58:47] kondicherry: replag Show current replag [17:59:18] @replag [17:59:38] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [17:59:39] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [17:59:42] ? 
[18:00:18] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [18:02:28] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [18:02:39] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:02:39] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [18:21:57] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [18:34:31] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 249556.000000 [18:36:48] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 249695.000000 [18:44:50] Sun Grid Engine execd on ortelius is WARNING: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.009766/1.00, alarm hl:np_load_long=0.928711/1.50, alarm hl:mem_free=21420.000000M/300M, alarm hl:available=1/0 [18:45:48] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [18:59:49] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [19:00:17] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [19:02:51] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:02:51] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [19:02:58] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [19:22:28] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [19:35:33] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 253226.000000 [19:36:59] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 253307.000000 [19:41:59] RAID on adenia is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[20:00:07] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [20:00:18] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [20:03:08] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:03:08] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [20:03:18] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [20:04:09] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.844727/1.8, alarm hl:np_load_avg=0.795410/2.3, alarm hl:mem_free=285.000000M/300M, alarm hl:available=1/0 [20:05:11] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [20:08:07] Sun Grid Engine execd on ortelius is WARNING: short-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.906250/1.10, alarm hl:np_load_long=0.937500/1.55, alarm hl:mem_free=20972.000000M/300M, alarm hl:available=1/0: medium-sol@ortelius exceedes load threshold: alarm hl:np_load_short=1.906250/1.00, alarm hl:np_load_long=0.937500/1.50, alarm hl:mem_free=20972.000000M/300M, alarm hl:available=1/0 [20:10:08] Sun Grid Engine execd on ortelius is OK: short-sol@ortelius OK: medium-sol@ortelius OK [20:11:38] RAID on adenia is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [20:22:47] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [20:26:03] Russell Blau * [Toolserver-l] s1 replication question [20:35:38] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 256831.000000 [20:37:57] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 256970.000000 [20:47:03] DaB. 
* Re: [Toolserver-l] s1 replication question [21:00:08] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [21:00:37] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [21:03:10] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:03:17] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [21:03:45] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [21:08:18] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.013672/1.8, alarm hl:np_load_avg=0.926269/2.3, alarm hl:mem_free=181.000000M/300M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.013672/1.9, alarm hl:np_load_long=0.841309/2.25, alarm hl:mem_free=181.000000M/200M, alarm hl:available=1/0 [21:10:18] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:13:17] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=0.939941/1.8, alarm hl:np_load_avg=0.966797/2.3, alarm hl:mem_free=281.000000M/300M, alarm hl:available=1/0 [21:23:17] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [21:35:59] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 260446.000000 [21:38:08] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 260575.000000 [21:38:17] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.884277/1.8, alarm hl:np_load_avg=1.433105/2.3, alarm hl:mem_free=324.000000M/300M, alarm hl:available=1/0 [21:39:17] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [21:43:17] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.821777/1.8, alarm 
hl:np_load_avg=1.656738/2.3, alarm hl:mem_free=261.000000M/300M, alarm hl:available=1/0 [22:00:27] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [22:00:57] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [22:02:47] Load avg. on willow is WARNING: WARNING - load average: 14.53, 15.21, 13.82 [22:03:18] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:03:19] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [22:04:03] Marlen Caemmerer * Re: [Toolserver-l] s1 replication question [22:04:18] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [22:07:47] Load avg. on willow is OK: OK - load average: 13.18, 14.85, 14.14 [22:12:47] Load avg. on willow is WARNING: WARNING - load average: 18.48, 17.07, 15.26 [22:23:47] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [22:36:58] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 264108.000000 [22:38:08] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 264178.000000 [22:43:47] Load avg. 
on willow is OK: OK - load average: 12.93, 14.37, 14.92 [22:44:27] night ts [22:57:38] /sql on thyme is CRITICAL: DISK CRITICAL - free space: /sql 66438 MB (6% inode=99%): [23:00:28] SMF on willow is CRITICAL: ERROR - maintenance: svc:/network/puppetmasterd:default [23:00:56] (commented) [MNT-1196] ptolemy is not longer accessable by ipv6 <https://jira.toolserver.org/browse/MNT-1196> (Marlen Caemmerer) [23:01:08] SMF on damiana is CRITICAL: ERROR - maintenance: svc:/network/ldap/client:default [23:03:27] SMF on turnera is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:03:27] RAID on daphne is CRITICAL: ERROR - TOTAL: 2: FAILED: 0: DEGRADED: 1 [23:04:28] FMA on yarrow is CRITICAL: ERROR - unexpected output from snmpwalk [23:05:47] Load avg. 
on willow is WARNING: WARNING - load average: 17.51, 15.57, 14.79 [23:13:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=2.687988/1.8, alarm hl:np_load_avg=2.281738/2.3, alarm hl:mem_free=467.000000M/300M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=2.687988/1.9, alarm hl:np_load_long=2.023926/2.25, alarm hl:mem_free=467.000000M/200M, alarm hl:available=1/0 [23:18:27] Sun Grid Engine execd on willow is OK: medium-sol@willow OK: longrun-sol@willow OK [23:21:28] Sun Grid Engine execd on willow is WARNING: medium-sol@willow exceedes load threshold: alarm hl:np_load_short=1.946289/1.8, alarm hl:np_load_avg=1.958496/2.3, alarm hl:mem_free=608.000000M/300M, alarm hl:available=1/0: longrun-sol@willow exceedes load threshold: alarm hl:np_load_short=1.946289/1.9, alarm hl:np_load_long=1.955078/2.25, alarm hl:mem_free=608.000000M/200M, alarm hl:available=1/0 [23:23:47] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [23:24:47] Load avg. on willow is OK: OK - load average: 10.91, 13.51, 14.79 [23:37:02] s1 replag on thyme is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 267712.000000 [23:38:10] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 267781.000000