[00:03:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [00:07:33] nighty ! [00:12:30] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.090332/2.00, alarm hl:mem_free=201.000000M/250M [00:18:30] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [00:27:19] MySQL slave on z-dat-s7-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1942 [00:48:00] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:48:00] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [00:48:19] MySQL slave on z-dat-s7-a is OK: Uptime: 536960 Threads: 17 Questions: 109444308 Slow queries: 28791 Opens: 1278243 Flush tables: 1 Open tables: 3457 Queries per second avg: 203.822 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1778 [00:48:29] SSH on nightshade.mgmt is CRITICAL: Server answer: [00:50:49] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62364 MB (6% inode=99%): [00:55:00] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 170442 MB (3% inode=23%): [01:03:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [01:27:10] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:10] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:27:20] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:20] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:29] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:29] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:27:30] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:27:30] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[01:27:49] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [01:27:59] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 3722 MB (99% inode=99%): [01:27:59] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 3723 MB (99% inode=99%): [01:27:59] SMTP on z-dat-s7-a is OK: SMTP OK - 0.036 sec. response time [01:28:10] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:28:10] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:28:19] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:28:20] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:43:59] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1969 [01:45:59] MySQL slave on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [01:46:30] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:46:49] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:46:59] /v/sql on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:46:59] /tmp on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:10] SMTP on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:10] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:10] SMTP on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:10] SMTP on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:20] /tmp on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:20] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:20] /tmp on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:20] Load avg. on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[01:47:30] SSH on hyacinth is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:30] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:30] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:30] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:30] /tmp on z-dat-s4-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:31] /tmp on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:47:31] /v/sql on hyacinth is OK: DISK OK - free space: /v/sql 222711 MB (23% inode=99%): [01:47:49] /tmp on hyacinth is OK: DISK OK - free space: /tmp 3666 MB (99% inode=99%): [01:47:49] Load avg. on hyacinth is OK: OK - load average: 0.10, 1.03, 1.32 [01:47:59] MySQL on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [01:47:59] MySQL slave on z-dat-s7-a is CRITICAL: (Service Check Timed Out) [01:48:09] SMTP on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:48:09] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:48:09] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [01:48:21] MySQL on z-dat-s6-a is CRITICAL: (Service Check Timed Out) [01:48:21] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:48:30] MySQL slave on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [01:48:30] MySQL on z-dat-s3-a is CRITICAL: (Service Check Timed Out) [01:48:30] SSH on nightshade.mgmt is CRITICAL: Server answer: [01:48:59] MySQL slave on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [01:48:59] MySQL on z-dat-s4-a is CRITICAL: (Service Check Timed Out) [01:50:39] MySQL on z-dat-s7-a is OK: Uptime: 540706 Threads: 17 Questions: 110009460 Slow queries: 29243 Opens: 1279177 Flush tables: 1 Open tables: 3482 Queries per second avg: 203.455 [01:50:39] MySQL slave on z-dat-s7-a is OK: Uptime: 540706 Threads: 17 Questions: 110009460 Slow 
queries: 29243 Opens: 1279177 Flush tables: 1 Open tables: 3482 Queries per second avg: 203.455 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 662 [01:50:49] MySQL slave on z-dat-s6-a is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2379 [01:50:49] MySQL on z-dat-s6-a is OK: Uptime: 3035192 Threads: 19 Questions: 526806781 Slow queries: 209005 Opens: 5915495 Flush tables: 2 Open tables: 2899 Queries per second avg: 173.566 [01:50:49] /tmp on z-dat-s7-a is OK: DISK OK - free space: /tmp 3568 MB (99% inode=99%): [01:50:49] /sql on z-dat-s7-a is OK: DISK OK - free space: /sql 131120 MB (32% inode=99%): [01:50:49] /tmp on z-dat-s3-a is OK: DISK OK - free space: /tmp 3643 MB (99% inode=99%): [01:50:49] MySQL slave on z-dat-s4-a is OK: Uptime: 2952415 Threads: 9 Questions: 124419846 Slow queries: 30982 Opens: 20116 Flush tables: 1 Open tables: 431 Queries per second avg: 42.141 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 311 [01:50:49] MySQL on z-dat-s4-a is OK: Uptime: 2952415 Threads: 9 Questions: 124419846 Slow queries: 30982 Opens: 20116 Flush tables: 1 Open tables: 431 Queries per second avg: 42.141 [01:50:50] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62274 MB (6% inode=99%): [01:50:59] MySQL slave on z-dat-s3-a is OK: Uptime: 1768284 Threads: 22 Questions: 1796991808 Slow queries: 215447 Opens: 24768927 Flush tables: 2 Open tables: 16384 Queries per second avg: 1016.234 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 630 [01:50:59] MySQL on z-dat-s3-a is OK: Uptime: 1768284 Threads: 21 Questions: 1796991810 Slow queries: 215447 Opens: 24768927 Flush tables: 2 Open tables: 16384 Queries per second avg: 1016.234 [01:50:59] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [01:50:59] /tmp on z-dat-s4-a is OK: DISK OK - free space: /tmp 3658 MB (99% inode=99%): [01:50:59] /tmp on z-dat-s6-a is OK: DISK OK - free space: /tmp 3658 MB (99% inode=99%): [01:51:00] SMTP on z-dat-s7-a is OK: 
SMTP OK - 0.148 sec. response time [01:51:00] SMTP on z-dat-s6-a is OK: SMTP OK - 0.005 sec. response time [01:51:01] SMTP on z-dat-s3-a is OK: SMTP OK - 0.076 sec. response time [01:51:01] SMTP on hyacinth is OK: SMTP OK - 0.271 sec. response time [01:51:09] SMTP on z-dat-s4-a is OK: SMTP OK - 0.112 sec. response time [01:51:19] SSH on hyacinth is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:51:20] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:51:20] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:51:21] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:51:21] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [01:51:29] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [01:54:58] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 170224 MB (3% inode=23%): [02:03:51] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [02:33:50] MySQL slave on z-dat-s6-a is OK: Uptime: 3037775 Threads: 12 Questions: 526922021 Slow queries: 209214 Opens: 5915501 Flush tables: 2 Open tables: 2902 Queries per second avg: 173.456 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1727 [02:48:10] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:48:10] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [02:48:39] SSH on nightshade.mgmt is CRITICAL: Server answer: [02:51:49] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62188 MB (6% inode=99%): [02:54:59] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 169777 MB (3% inode=22%): [03:27:30] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:40] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:59] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:28:19] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [03:28:29] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [03:28:39] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [03:33:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [03:42:00] toolserver.org HTTP on wolfsbane is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 239 bytes in 7.914 second response time [03:43:00] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.034 second response time [03:48:09] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:48:10] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [03:48:39] SSH on nightshade.mgmt is CRITICAL: Server answer: [03:49:39] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:49:59] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:50:09] /v/sql on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[03:50:39] /v/sql on hyacinth is OK: DISK OK - free space: /v/sql 222965 MB (23% inode=99%): [03:50:39] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [03:52:00] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62114 MB (6% inode=99%): [03:54:59] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 169526 MB (3% inode=22%): [04:48:20] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:48:20] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [04:48:39] SSH on nightshade.mgmt is CRITICAL: Server answer: [04:53:00] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61805 MB (6% inode=99%): [04:55:00] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 169264 MB (3% inode=22%): [05:02:09] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:59] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.030 second response time [05:03:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [05:42:09] Environment on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[05:42:49] Environment on hyacinth is OK: ok: temperature ok fan ok voltage ok chassis ok [05:45:39] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.164062/2.00, alarm hl:mem_free=238.000000M/250M [05:46:40] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [05:48:40] SSH on nightshade.mgmt is CRITICAL: Server answer: [05:49:19] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:49:19] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [05:54:00] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61928 MB (6% inode=99%): [05:55:00] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166748 MB (3% inode=22%): [05:57:40] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.181152/2.00, alarm hl:mem_free=241.000000M/250M [06:02:20] Load avg. on willow is WARNING: WARNING - load average: 15.33, 12.56, 10.64 [06:03:19] Load avg. on willow is OK: OK - load average: 11.37, 11.89, 10.53 [06:03:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [06:05:59] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [06:17:19] Load avg. on willow is WARNING: WARNING - load average: 19.45, 15.05, 12.36 [06:22:39] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_avg=1.574219/1.50, alarm hl:mem_free=195.000000M/100M: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.509766/2.00, alarm hl:mem_free=195.000000M/250M [06:27:39] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [06:48:19] Load avg. 
on willow is WARNING: WARNING - load average: 17.00, 13.94, 11.91 [06:48:49] SSH on nightshade.mgmt is CRITICAL: Server answer: [06:48:49] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_avg=1.735840/1.50, alarm hl:mem_free=555.000000M/100M [06:49:19] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:49:20] Load avg. on willow is OK: OK - load average: 10.88, 12.71, 11.60 [06:49:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [06:52:49] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [06:54:00] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61902 MB (6% inode=99%): [06:55:59] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166657 MB (3% inode=22%): [07:00:50] Sun Grid Engine execd on willow is WARNING: all.q@willow exceedes load threshold: alarm hl:np_load_avg=1.582031/1.50, alarm hl:mem_free=245.000000M/100M: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.465820/2.00, alarm hl:mem_free=245.000000M/250M [07:03:50] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [07:05:59] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [07:20:59] /aux0 on daphne is CRITICAL: DISK CRITICAL - free space: /aux0 68083 MB (7% inode=99%): [07:48:50] SSH on nightshade.mgmt is CRITICAL: Server answer: [07:49:30] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:50:19] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [07:54:59] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61832 MB (6% inode=99%): [07:55:59] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166497 MB (3% inode=22%): [08:06:12] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [08:22:42] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [08:23:32] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:33:52] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [08:49:32] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:49:41] SSH on nightshade.mgmt is CRITICAL: Server answer: [08:50:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [08:53:21] /tmp on willow is WARNING: DISK WARNING - free space: /tmp 100 MB (19% inode=99%): [08:55:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62717 MB (6% inode=99%): [08:55:41] SSH on z-dat-s3-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:41] SSH on z-dat-s6-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:41] RAID on hyacinth is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:55:41] SSH on z-dat-s7-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:55:41] SSH on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:56:12] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166371 MB (3% inode=22%): [08:56:12] RAID on hyacinth is OK: OK - TOTAL: 2: FAILED: 0: DEGRADED: 0 [08:56:21] /tmp on willow is OK: DISK OK - free space: /tmp 131 MB (25% inode=99%): [08:56:32] SSH on z-dat-s3-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:56:32] SSH on z-dat-s6-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:56:32] SSH on z-dat-s7-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [08:56:32] SSH on z-dat-s4-a is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [09:07:12] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [09:33:52] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [09:42:21] SMTP on z-dat-s4-a is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:12] SMTP on z-dat-s4-a is OK: SMTP OK - 0.003 sec. 
response time [09:49:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:49:41] SSH on nightshade.mgmt is CRITICAL: Server answer: [09:50:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [09:53:42] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.039551/2.00, alarm hl:mem_free=175.000000M/250M [09:55:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62650 MB (6% inode=99%): [09:57:13] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166251 MB (3% inode=22%): [10:00:42] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [10:04:41] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.174316/2.00, alarm hl:mem_free=139.000000M/250M [10:07:12] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [10:28:41] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.082520/2.00, alarm hl:mem_free=124.000000M/250M [10:36:41] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [10:45:42] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=1.117188/2.00, alarm hl:mem_free=197.000000M/250M [10:46:41] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [10:49:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:49:41] SSH on nightshade.mgmt is CRITICAL: Server answer: [10:50:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [10:55:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62591 MB (6% inode=99%): [10:58:11] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 166146 MB (3% inode=22%): [11:03:51] Virtual disks on far1-n1-oe16-esams.mgmt is 
CRITICAL: (Service Check Timed Out) [11:07:12] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [11:13:41] Sun Grid Engine execd on willow is WARNING: longrun@willow exceedes load threshold: alarm hl:np_load_long=0.905273/2.00, alarm hl:mem_free=173.000000M/250M [11:15:42] Sun Grid Engine execd on willow is OK: all.q@willow OK: longrun@willow OK [11:49:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:49:41] SSH on nightshade.mgmt is CRITICAL: Server answer: [11:50:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [11:55:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62526 MB (6% inode=99%): [11:58:11] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 165813 MB (3% inode=22%): [12:03:51] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [12:08:11] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [12:49:31] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:49:41] SSH on nightshade.mgmt is CRITICAL: Server answer: [12:51:21] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [12:55:11] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62447 MB (6% inode=99%): [12:58:11] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 165668 MB (3% inode=22%): [13:03:51] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [13:08:11] toolserver.org HTTP on wolfsbane is CRITICAL: Connection refused [13:38:36] hello all. Happy new year :-) [13:40:47] DaBPunkt thank you! happy new year to you and your family! [13:50:20] (created) [MNT-1167] Nagios reports that toolserver.org on wolsfbane is down; Maintenance; Emergency work <https://jira.toolserver.org/browse/MNT-1167> (DaB.)
[13:52:27] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.014 second response time [13:56:31] looks like there is a problem with the www-ha-server [14:03:09] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.010 second response time [14:03:09] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62361 MB (6% inode=99%): [14:03:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [14:03:17] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [14:03:34] ok, should be fixed now [14:04:20] (updated) [MNT-1167] Nagios reports that toolserver.org on wolsfbane is down <https://jira.toolserver.org/browse/MNT-1167> (DaB.) [14:05:45] time for breakfast. [14:33:47] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [15:00:05] DaB. * [Toolserver-l] Happy New Year 2012 [15:02:37] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 165294 MB (3% inode=22%): [15:02:38] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:03:17] SSH on nightshade.mgmt is CRITICAL: Server answer: [15:03:17] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [15:04:19] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62278 MB (6% inode=99%): [15:33:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [16:02:38] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 165149 MB (3% inode=22%): [16:02:38] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:03:19] SSH on nightshade.mgmt is CRITICAL: Server answer: [16:03:19] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [16:04:19] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62184 MB (6% inode=99%): [16:33:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL:
(Service Check Timed Out) [16:57:22] [[Special:Log/newusers]] create * Aristotys * (New user account) [16:59:55] [[User talk:Aristotys]] !N https://wiki.toolserver.org/w/index.php?oldid=6503&rcid=8544 * Aristotys * (+91) (Created page with ".GHUNSHYAM KUMAR (ÅSHU) ROGBEER.--~~~~") [17:00:17] [[User:Aristotys]] !N https://wiki.toolserver.org/w/index.php?oldid=6504&rcid=8545 * Aristotys * (+1) (Created page with ".") [17:01:13] [[Työkalupalvelin]] !N https://wiki.toolserver.org/w/index.php?oldid=6505&rcid=8546 * Aristotys * (+1) (Created page with ".") [17:02:47] [[User:Misza13]] !N https://wiki.toolserver.org/w/index.php?oldid=6506&rcid=8547 * Aristotys * (+1) (Created page with ".") [17:02:50] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 164997 MB (3% inode=22%): [17:02:50] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:03:19] SSH on nightshade.mgmt is CRITICAL: Server answer: [17:03:19] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [17:04:19] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62104 MB (6% inode=99%): [18:03:19] SSH on nightshade.mgmt is CRITICAL: Server answer: [18:03:19] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:03:48] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 164793 MB (3% inode=22%): [18:03:49] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [18:03:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [18:04:19] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61993 MB (6% inode=99%): [18:15:22] [[Special:Log/block]] block * Betacommand * (blocked [[User:Aristotys]] with an expiry time of infinite (account creation disabled): Inserting nonsense/gibberish into pages) [18:15:56] [[Special:Log/delete]] delete * Betacommand * (deleted "[[User
talk:Aristotys]]": content was: ".GHUNSHYAM KUMAR (ÅSHU) ROGBEER.--[[User:Aristotys|Aristotys]] 16:59, 1 January 2012 (UTC)" (and the only contributor was "[[Special:Contributions/Aristotys|Aristotys]]")) [18:15:57] [[Special:Log/delete]] delete * Betacommand * (deleted "[[User:Aristotys]]": content was: "." (and the only contributor was "[[Special:Contributions/Aristotys|Aristotys]]")) [18:15:58] [[Special:Log/delete]] delete * Betacommand * (deleted "[[Työkalupalvelin]]": content was: "." (and the only contributor was "[[Special:Contributions/Aristotys|Aristotys]]")) [18:15:58] [[Special:Log/delete]] delete * Betacommand * (deleted "[[User:Misza13]]": content was: "." (and the only contributor was "[[Special:Contributions/Aristotys|Aristotys]]")) [19:03:29] SSH on nightshade.mgmt is CRITICAL: Server answer: [19:03:29] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:03:50] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 164640 MB (3% inode=22%): [19:03:51] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [19:03:51] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [19:05:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61901 MB (6% inode=99%): [19:21:59] /aux0 on daphne is CRITICAL: DISK CRITICAL - free space: /aux0 66486 MB (6% inode=99%): [19:34:21] (commented) [TS-1253] Approval is needed for installation of software <https://jira.toolserver.org/browse/TS-1253> (DaB.) [19:34:26] (assigned) [TS-1253] Approval is needed for installation of software <https://jira.toolserver.org/browse/TS-1253> (DaB.) [19:52:28] zzz [20:00:21] (commented) [TS-867] Nicer URL for hillshading tiles <https://jira.toolserver.org/browse/TS-867> [20:02:20] (assigned) [TS-867] Nicer URL for hillshading tiles <https://jira.toolserver.org/browse/TS-867> (DaB.)
[20:03:29] SSH on nightshade.mgmt is CRITICAL: Server answer: [20:03:29] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:04:48] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 164442 MB (3% inode=22%): [20:04:49] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [20:05:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61910 MB (6% inode=99%): [20:09:21] (assigned) [TS-867] Nicer URL for hillshading tiles <https://jira.toolserver.org/browse/TS-867> (DaB.) [20:11:20] (commented) [TS-867] Nicer URL for hillshading tiles <https://jira.toolserver.org/browse/TS-867> (DaB.) [20:29:46] hi all [20:33:49] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [20:51:22] (commented) [TS-867] Nicer URL for hillshading tiles <https://jira.toolserver.org/browse/TS-867> (Kai Krueger) [20:53:25] (commented) [TS-1253] Approval is needed for installation of software <https://jira.toolserver.org/browse/TS-1253> (Mono) [20:58:14] @replag [20:58:15] matthewrbowker: s3-rr: 45s [+0.00 s/s]; s3-user: 45s [+0.00 s/s] [21:01:24] (resolved) [TS-1253] Approval is needed for installation of software <https://jira.toolserver.org/browse/TS-1253> (DaB.)
[21:03:38] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:04:27] SSH on nightshade.mgmt is CRITICAL: Server answer: [21:04:49] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 164412 MB (3% inode=22%): [21:04:50] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [21:05:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 61715 MB (6% inode=99%): [21:30:38] /tmp on willow is CRITICAL: DISK CRITICAL - free space: /tmp 0 MB (0% inode=99%): [21:33:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [22:04:32] SSH on nightshade.mgmt is CRITICAL: Server answer: [22:04:38] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:04:55] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 150641 MB (2% inode=20%): [22:05:13] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [22:05:17] why is everything continuously said to be critical? [22:05:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62548 MB (6% inode=99%): [22:25:21] (commented) [TS-971] Openstreetmap -Tiles expiring is not running <https://jira.toolserver.org/browse/TS-971> (Kai Krueger) [22:30:38] /tmp on willow is CRITICAL: DISK CRITICAL - free space: /tmp 0 MB (0% inode=99%): [22:38:10] oh oh [22:39:38] /tmp on willow is OK: DISK OK - free space: /tmp 319 MB (62% inode=99%): [23:00:22] (created) [MNT-1168] Delete old osm-titles.; Maintenance; Minor work <https://jira.toolserver.org/browse/MNT-1168> (DaB.)
[23:03:48] Virtual disks on far1-n1-oe16-esams.mgmt is CRITICAL: (Service Check Timed Out) [23:04:39] SSH on nightshade.mgmt is CRITICAL: Server answer: [23:04:40] SMF on turnera.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:05:17] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 62439 MB (6% inode=99%): [23:05:50] /aux0 on hemlock is CRITICAL: DISK CRITICAL - free space: /aux0 153946 MB (2% inode=21%): [23:06:10] SMF on damiana.esi is CRITICAL: ERROR - offline: svc:/system/cluster/scsymon-srv:default [23:07:45] DaBPunkt: Are you using modification or access time to decide if a file can be deleted? [23:08:04] modification [23:08:19] I doubt that the access time is set [23:08:23] It might be better to use access time in future [23:09:13] I can change it, but it will not make a huge difference. Modern filesystems don't set the access time anymore these days, only when the file is changed [23:10:43] stat /tiles/hikebike/0/0/0/0/0/0.meta gives me quite a different access and modification date [23:11:30] but I don't know how accurate it is, as indeed access time is often more of a heuristic [23:11:52] Btw, is it possible to delete directories that are empty? [23:13:24] (commented) [MNT-1168] Delete old osm-titles. <https://jira.toolserver.org/browse/MNT-1168> (DaB.) [23:13:45] Thanks [23:19:31] apmon: should be possible, yes [23:21:10] should I? [23:22:11] yes, I think that would be useful [23:23:00] Both to get a better overview of where tiles are and it might help performance of some of the expiry tools [23:27:21] (created) [MNT-1169] Delete empty osm-dirs; Maintenance; Minor work <https://jira.toolserver.org/browse/MNT-1169> (DaB.)
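Editor's note: the cleanup discussed above (modification time versus access time, plus removing empty directories) can be sketched roughly as follows. This is only an illustration, not the script DaB. actually ran; the function names `is_stale` and `empty_dirs` and the age threshold parameter are hypothetical.

```python
import os
import time


def is_stale(path, max_age_days, use_atime=False):
    """Return True if the chosen timestamp of `path` is older than max_age_days.

    use_atime=False checks the modification time (what the chat says is used);
    use_atime=True checks the access time (the suggested alternative).
    """
    st = os.stat(path)
    timestamp = st.st_atime if use_atime else st.st_mtime
    return (time.time() - timestamp) > max_age_days * 86400


def empty_dirs(root):
    """Yield directories under `root` that currently contain no entries at all."""
    # Bottom-up walk, so leaf directories are reported before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if not dirnames and not filenames:
            yield dirpath
```

Whether the access time is meaningful depends on how the filesystem is mounted, which is the uncertainty both speakers raise, so the two modes can give very different answers for the same file.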
[23:28:51] running [23:28:59] great [23:31:21] DaBPunkt: I have the following question regarding performance of /tiles/ on hemlock due to tile expiry: [23:32:10] mod_tile, the apache module used to serve map tiles, checks the modification time of the tile served and compares it to the "planet import time". [23:32:29] If it is older, it decides it is outdated and re-renders the map tile. [23:33:12] The expiry works by calculating a list of changed tiles during the diff import into the postgresql database. [23:34:07] These changed tiles then get their modification times set back to sometime in 2000, which is definitely older than the "planet import time" and thus mod_tile considers them as outdated and needing re-rendering [23:34:30] (what a hack ;)) [23:34:43] Yes, it is... ;-) [23:36:01] Now the question: This expiry has to stat and modify timestamps of a very large number of files. The last hourly diff tried to touch I think something like 100000 files for 10 or so styles each [23:36:37] Given that /tiles/ is on a shared nfs mount, is this kind of disk activity an issue for others? [23:37:21] Of those 100.000 files the vast majority of files don't exist as they have never been rendered before. [23:38:07] You can see the stats of a single run at /home/project/o/s/m/osm/tools/diff-import/logs/expiry.log in case you are interested [23:39:41] apmon: the shared mount is mostly used for big files like dumps so I doubt the impact is great. Of course you should not do too many disk accesses in a given time [23:40:24] 100.000 in 1 h sounds too much [23:40:59] Yes, I am trying to see if I can get it down [23:43:30] sooner or later the OSM-render will need its own disc-array; maybe yarrow's old one or so. [23:53:46] apmon: out of curiosity: what is the first sub-dir behind the level for?
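Editor's note: the expiry trick apmon describes above (set a changed tile's modification time back to some point in 2000 so that it compares as older than the planet import time) might look roughly like this. It is a sketch under assumptions, not the actual expiry tool: the sentinel value, `expire_tiles` and `needs_render` are hypothetical names, and the real check lives inside mod_tile.

```python
import os

# Assumed sentinel: 2000-01-01 00:00:00 UTC. The chat only says "sometime in 2000".
EXPIRED_MTIME = 946684800


def expire_tiles(paths):
    """Mark existing tile files as outdated; return (touched, missing) counts."""
    touched = 0
    missing = 0
    for path in paths:
        try:
            st = os.stat(path)
            # Keep the access time, move only the modification time back.
            os.utime(path, (st.st_atime, EXPIRED_MTIME))
            touched += 1
        except FileNotFoundError:
            # Most candidate tiles were never rendered, so they are simply skipped.
            missing += 1
    return touched, missing


def needs_render(path, planet_import_time):
    """Mimic the described check: a tile older than the import time is outdated."""
    return os.stat(path).st_mtime < planet_import_time
```

Each candidate path costs at least one metadata call on the NFS mount even when the file is missing, which is why the number of paths per hourly run matters for the other users of the share.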
[23:55:17] The directory structure is a slightly odd hash [23:56:04] It takes the X and Y coordinates of the tiles, then interleaves them 4 bits at a time, and then chops them into 8-bit blocks per directory level [23:56:10] oh, I thought that it was level/0/longitude/latitude (or switch) [23:56:25] switch → swapped [23:56:55] ok [23:57:17] The web url is /style/z/x/y.png, but the on-disk structure is a hashed version to spread it more evenly across directories [23:57:37] It also combines 64 tiles into a single metatile to reduce the number of files on disk
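Editor's note: the hashed layout described above can be sketched as follows. This assumes 8 x 8 metatiles (64 tiles each) and five hash components after the zoom level, which matches the shape of the path quoted earlier in the log (/tiles/hikebike/0/0/0/0/0/0.meta); the function name `metatile_path` is made up for this illustration and the real logic lives in mod_tile.

```python
METATILE = 8  # assumed: 8 x 8 = 64 tiles share one metatile file


def metatile_path(tile_dir, style, z, x, y):
    """Map tile coordinates (z, x, y) to the on-disk metatile path."""
    # Round x and y down to the top-left tile of the containing metatile.
    mask = METATILE - 1
    x &= ~mask
    y &= ~mask

    # Take 4 bits of x and 4 bits of y at a time and pack them into one
    # 8-bit value; each value becomes one directory level (or the file name).
    parts = []
    for _ in range(5):
        parts.append(((x & 0x0F) << 4) | (y & 0x0F))
        x >>= 4
        y >>= 4

    # The most significant bits come first in the path.
    parts.reverse()
    return "{}/{}/{}/{}/{}/{}/{}/{}.meta".format(tile_dir, style, z, *parts)
```

With these assumptions, `metatile_path("/tiles", "hikebike", 0, 0, 0)` gives `/tiles/hikebike/0/0/0/0/0/0.meta`, and all 64 tiles inside one 8 x 8 block map to the same file, so neighbouring tiles stay together while different blocks spread across directories.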