[00:00:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [00:06:21] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [00:07:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [00:10:33] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 85 MB (19% inode=98%): [00:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [00:24:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [00:24:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [00:26:13] s4 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 4765.000000 [00:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18873.000000 [00:32:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 18922 [00:33:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [00:33:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [00:34:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 565085 MB (10% inode=46%): [00:34:32] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 93 MB (20% inode=98%): [00:39:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [00:39:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[00:40:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [00:40:32] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 22454.000000 [00:44:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [00:46:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [00:51:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [00:52:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [00:53:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [01:00:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [01:02:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:20:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 566687 MB (10% inode=46%): [01:20:33] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 78 MB (18% inode=98%): [01:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [01:24:12] s4 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3502.000000 [01:25:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [01:30:12] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20788.000000 [01:31:13] s4 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1797.000000 [01:33:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 20799 [01:33:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[01:33:34] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [01:34:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 566653 MB (10% inode=46%): [01:34:33] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 93 MB (20% inode=98%): [01:39:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [01:39:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [01:40:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [01:40:32] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21229.000000 [01:45:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [01:45:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [01:46:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [01:46:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [01:52:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [01:52:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [01:53:54] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [01:59:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 566626 MB (10% inode=46%): [01:59:33] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 75 MB (17% inode=98%): [02:00:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [02:14:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [02:14:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:20:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:20:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:20:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:20:33] SSH on hemlock is CRITICAL: Server answer: [02:20:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [02:20:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:25:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [02:28:21] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[02:29:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:30:12] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21287.000000 [02:33:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 21230 [02:40:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [02:40:33] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21667.000000 [02:42:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:45:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [02:46:14] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [02:49:21] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [02:50:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [02:52:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [02:52:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [02:54:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [02:55:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[02:57:53] SMTP on hemlock is CRITICAL: Connection refused [03:01:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [03:10:42] SSH on hemlock is OK: SSH OK - OpenSSH_5.8p2-hpn13v11 (protocol 2.0) [03:11:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564640 MB (10% inode=46%): [03:11:33] /tmp on hemlock is WARNING: DISK WARNING - free space: /tmp 61 MB (14% inode=98%): [03:11:53] / on hemlock is CRITICAL: Connection refused by host [03:12:03] Load avg. on hemlock is CRITICAL: Connection refused by host [03:12:03] /home on hemlock is CRITICAL: Connection refused by host [03:12:03] Environment IPMI on hemlock is CRITICAL: Connection refused by host [03:12:33] /aux0 on hemlock is CRITICAL: Connection refused by host [03:12:33] /tmp on hemlock is CRITICAL: Connection refused by host [03:12:33] CAM on hemlock is CRITICAL: Connection refused by host [03:12:53] / on hemlock is OK: DISK OK - free space: / 6390 MB (32% inode=89%): [03:13:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:13:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:13:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:13:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564640 MB (10% inode=46%): [03:13:33] /tmp on hemlock is OK: DISK OK - free space: /tmp 94 MB (21% inode=98%): [03:13:33] CAM on hemlock is OK: OK - cam detected no new errors [03:14:02] Load avg. 
on hemlock is OK: OK - load average: 0.03, 0.09, 0.31 [03:14:03] /home on hemlock is OK: DISK OK - free space: /home 18998 MB (37% inode=87%): [03:14:13] Environment IPMI on hemlock is OK: ok: temperature ok fan ok voltage ok chassis ok [03:15:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [03:21:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:21:42] SSH on hemlock is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:21:54] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:25:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:25:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [03:26:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:30:14] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 21336.000000 [03:33:03] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 21263 [03:33:34] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:35:24] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:39:34] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:40:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:40:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[03:40:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [03:40:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [03:40:33] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20518.000000 [03:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:43:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:45:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [03:47:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [03:52:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [03:53:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [03:54:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [03:55:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [03:57:54] SMTP on hemlock is CRITICAL: Connection refused [04:01:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [04:08:44] Web servers are not responding [04:08:52] operation time out and/or 504 gateway timeout [04:09:10] wolfsbane, ortelius and HTTPS [04:09:12] all three [04:09:19] http://ortelius.toolserver.org/~cvn/index.html [04:09:26] http://wolfsbane.toolserver.org/~cvn/index.html [04:09:32] https://toolserver.org/~cvn/index.html [04:10:00] (unrelated replag) [04:10:01] @replag [04:10:01] Krinkle: s1-rr-a: 5h 45m 56s [+0.00 s/s]; s1-user: 5h 45m 56s [+0.00 s/s]; s2-user: 10h 42m 33s [+0.10 s/s]; s2-user-c: 4h 31m 21s [-0.55 s/s]; s3-rr-a: 1m 50s [-0.03 s/s]; s3-user: 1m 50s [-0.03 s/s]; s5-user-c: 4h 31m 19s 
[-0.55 s/s] [04:18:21] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:18:25] Krinkle * [Toolserver-l] Web servers unresponsive [04:19:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [04:21:32] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:22:32] SSH on hemlock is CRITICAL: Server answer: [04:22:52] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:26:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:26:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [04:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 20410.000000 [04:34:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 20165 [04:34:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:40:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [04:40:32] s4 replag on cassia is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 11101.000000 [04:40:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:41:02] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:41:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [04:41:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [04:42:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:44:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [04:47:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [04:49:32] fisheye.toolserver.org on web.amaranth is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - 271 bytes in 20.525 second response time [04:52:17] why is http://toolserver.org/~pathoschild/stalktoy/index.php?target=2000%3A%3A%2F4 throwing a 504? [04:52:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [04:54:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [04:54:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [04:56:07] Brooke:^ [04:56:32] s4 replag on cassia is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3187.000000 [04:58:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [04:58:32] s4 replag on cassia is OK: QUERY OK: SELECT ts_rc_age() returned 1680.000000 [04:58:53] SMTP on hemlock is CRITICAL: Connection refused [05:01:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [05:08:13] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:09:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:15:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[05:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [05:21:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:21:34] Jasper_Deng: toolserver webservers are down for an unknown reason right now [05:22:32] SSH on hemlock is CRITICAL: Server answer: [05:22:52] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:26:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [05:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 19806.000000 [05:34:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 19920 [05:34:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:35:22] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [05:40:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [05:40:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:41:02] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:41:02] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [05:41:02] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[05:42:33] fisheye.toolserver.org on web.amaranth is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 274 bytes in 19.797 second response time [05:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:43:32] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [05:44:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [05:48:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [05:53:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [05:55:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [05:55:52] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [05:58:52] SMTP on hemlock is CRITICAL: Connection refused [05:59:03] Anyone know when everything should be back up and working? [06:00:52] /sql on cassia is CRITICAL: DISK CRITICAL - free space: /sql 8775 MB (0% inode=93%): [06:02:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [06:03:43] I guess nobody knows then?.. [06:03:50] Riley: I think most of the admins are sleeping... [06:04:01] Krinkle already sent something to the mailing list [06:04:13] And you cant exactly file a jira ticket... [06:04:19] Which mailing list? [06:04:20] indeed [06:04:35] http://bit.ly/toolserverLast [06:04:36] Toolserver-; [06:04:39] -l* [06:04:42] Ah. [06:04:46] Well this is wonderful.. [06:05:09] Er Krinkle, does that link go to toolserver.org? [06:05:13] http://lists.wikimedia.org/pipermail/toolserver-l/2012-September/005249.html [06:05:14] yeah.. 
[06:05:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:06:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [06:09:25] legoktm: SHEESH. [06:11:04] Just sent an ACC emailing list email out. Thanks. [06:18:20] <_9xl> @replag [06:18:21] _9xl: s1-rr-a: 5h 22m 28s [-0.18 s/s]; s1-user: 5h 22m 28s [-0.18 s/s]; s2-user: 9h 47m 17s [-0.43 s/s]; s3-rr-a: 1m 26s [-0.00 s/s]; s3-user: 1m 26s [-0.00 s/s] [06:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [06:21:32] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:22:33] SSH on hemlock is CRITICAL: Server answer: [06:22:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:24:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:25:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [06:26:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [06:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 18302.000000 [06:31:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:34:03] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 18069 [06:34:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:41:02] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:41:02] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [06:41:02] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:41:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [06:41:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [06:42:25] ralf * Re: [Toolserver-l] Web servers unresponsive [06:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:43:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [06:44:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:46:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [06:48:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [06:53:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [06:55:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [06:55:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [06:58:53] SMTP on hemlock is CRITICAL: Connection refused [07:02:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [07:05:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [07:22:32] SSH on hemlock is CRITICAL: Server answer: [07:22:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:22:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [07:26:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [07:30:12] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 12316.000000 [07:34:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:35:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 11690 [07:41:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:42:02] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:42:02] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:42:02] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [07:42:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [07:42:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:43:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [07:45:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:45:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[07:46:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [07:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [07:48:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [07:53:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [07:55:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [07:55:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [07:58:53] SMTP on hemlock is CRITICAL: Connection refused [08:02:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [08:20:41] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [08:22:32] SSH on hemlock is CRITICAL: Server answer: [08:22:32] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:23:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:26:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [08:29:56] NFS server hemlock not responding still trying [08:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 7086.000000 [08:33:55] DanielK_WMDE: any idea why the toolserver is down or anything we can do to fix? (or who can fix?) [08:34:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[08:35:02] MySQL slave on rosemary is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 6621 [08:35:42] seems sick [08:37:45] I think Danny_B|backup said he was working on it [08:38:05] * Danny_B|backup definitely not [08:38:23] i don't think Danny_B|backup can work on it [08:39:26] Sorry guess I misinterpreted your message then [08:41:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:42:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [08:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:43:02] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:43:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:43:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [08:44:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [08:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:46:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [08:48:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[08:49:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [08:49:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [08:53:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [08:55:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [08:55:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [08:59:52] SMTP on hemlock is CRITICAL: Connection refused [09:00:54] /sql on z-dat-s6-a is WARNING: DISK WARNING - free space: /sql 80697 MB (8% inode=98%): [09:00:54] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41551 MB (10% inode=99%): [09:00:55] /sql on z-dat-s3-a is WARNING: DISK WARNING - free space: /sql 80688 MB (8% inode=98%): [09:02:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [09:07:33] Load avg. on yarrow is WARNING: WARNING - load average: 16.00, 15.56, 14.72 [09:13:55] DanielK_WMDE: can you let dab or nosy know somehow, that there are serious issues, pls? [09:15:18] * Damianz loves a 502 first thing in the morning [09:20:41] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [09:22:32] SSH on hemlock is CRITICAL: Server answer: [09:22:32] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:23:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:27:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [09:30:13] s1 replag on rosemary is CRITICAL: QUERY CRITICAL: SELECT ts_rc_age() returned 3858.000000 [09:30:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [09:31:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [09:32:13] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 3575.000000 [09:33:02] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3456 [09:35:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:42:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:42:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:43:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [09:44:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:44:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [09:44:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[09:45:41] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [09:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:46:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [09:49:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [09:50:02] MySQL slave on rosemary is OK: Uptime: 14149956 Threads: 126 Questions: 6180025272 Slow queries: 1850827 Opens: 233510 Flush tables: 6 Open tables: 3817 Queries per second avg: 436.752 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1767 [09:50:12] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1742.000000 [09:52:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 121693 MB (20% inode=99%): [09:54:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [09:55:34] @replag [09:55:34] Quentinv57: s1-rr-a: 29m 3s [-1.32 s/s]; s1-user: 29m 3s [-1.32 s/s]; s2-user: 8h 27m 48s [-0.59 s/s]; s3-rr-a: 2m 2s [+0.01 s/s]; s3-user: 2m 2s [+0.01 s/s] [09:55:50] @help [09:55:50] Type @commands for list of commands. This bot is running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 1.8.20.12 source code licensed under GPL and located at https://github.com/benapetr/wikimedia-bot [09:55:50] Quentinv57: (help [] []) -- This command gives a useful description of what does. is only necessary if the command is in more than one plugin. 
[09:55:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [09:56:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [09:56:30] @commands [09:56:30] Commands: channellist, trusted, trustadd, trustdel, info, statistics-off, statistics-on, statistics-reset, configure, infobot-link, infobot-share-trust+, infobot-share-trust-, infobot-share-off, infobot-share-on, infobot-detail, infobot-off, refresh, infobot-on, drop, whoami, add, reload, suppress-off, suppress-on, help, RC-, recentchanges-on, language, infobot-ignore+, infobot-ignore-, recentchanges-off, logon, logoff, recentchanges-, recentchanges+, RC+ [09:56:55] >.< [09:59:53] SMTP on hemlock is CRITICAL: Connection refused [10:02:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [10:04:02] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1975 [10:04:13] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 1978.000000 [10:07:32] Load avg. on yarrow is WARNING: WARNING - load average: 18.26, 17.72, 16.79 [10:08:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [10:22:33] SSH on hemlock is CRITICAL: Server answer: [10:22:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:23:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:24:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[10:26:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:26:30] Quentinv57: what are you trying to do? [10:27:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [10:35:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:42:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:43:16] closedmouth, I was trying to know what's causing the problem, but it does not matter much because I cannot fix it [10:43:34] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [10:43:38] as said in #wikimedia-stewards, without Toolserver working I can't do much of my steward work [10:43:51] bad situation [10:44:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:44:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:44:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [10:45:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [10:46:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [10:48:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[10:49:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [10:49:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [10:52:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 107186 MB (17% inode=99%): [10:54:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [10:55:54] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [10:56:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [10:59:54] SMTP on hemlock is CRITICAL: Connection refused [11:02:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [11:05:02] MySQL slave on rosemary is WARNING: SLOW_SLAVE WARNING: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2989 [11:05:13] s1 replag on rosemary is WARNING: QUERY WARNING: SELECT ts_rc_age() returned 2976.000000 [11:05:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:07:33] Load avg. on yarrow is WARNING: WARNING - load average: 20.16, 19.61, 18.75 [11:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [11:22:32] SSH on hemlock is CRITICAL: Server answer: [11:22:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:23:55] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[11:25:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [11:27:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [11:31:03] MySQL slave on rosemary is OK: Uptime: 14156017 Threads: 124 Questions: 6182694546 Slow queries: 1850950 Opens: 233522 Flush tables: 6 Open tables: 3815 Queries per second avg: 436.753 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1677 [11:31:13] s1 replag on rosemary is OK: QUERY OK: SELECT ts_rc_age() returned 1648.000000 [11:34:55] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 110978 MB (11% inode=99%): [11:36:36] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:42:34] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:44:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:44:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [11:45:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [11:45:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[11:46:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:46:34] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [11:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [11:49:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [11:52:54] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 105696 MB (17% inode=99%): [11:54:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [11:55:55] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [11:56:14] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [11:59:54] SMTP on hemlock is CRITICAL: Connection refused [12:00:33] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 20.92, 20.27, 20.06 [12:03:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [12:17:57] @replag [12:17:57] Yellowcard: s2-user: 1h 23m 43s [-2.98 s/s]; s3-rr-a: 50s [-0.01 s/s]; s3-user: 50s [-0.01 s/s]; s7-rr-a: 14s [-]; s7-user: 14s [-] [12:19:23] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:19:44] hi [12:20:24] is anyone looking after the toolserver? [12:20:41] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [12:21:12] ??? [12:21:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [12:22:34] SSH on hemlock is CRITICAL: Server answer: [12:22:34] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[12:24:10] nobody is here at the moment [12:24:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:24:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:25:53] Lady: It has been about six hours now, and I have seen no sign that anyone has worked on it. [12:26:30] mmmhhh ... [12:26:48] for me since 4:50 CEST ;) [12:26:49] since neither DaB. nor nosy is in the chat, nothing is going to happen here either [12:27:09] what a ..... mess [12:27:16] river was here, but apparently doesn't do anything anymore [12:27:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [12:27:34] as far as I know river no longer has root rights [12:27:38] aha [12:27:50] I've left DaB. a note about it on his talk page [12:28:09] I have created a ticket [12:28:16] I read somewhere that there is now a paid server admin? [12:28:37] *don't remember exactly where* [12:28:58] I happened to read it on journal.toolserver.org today [12:29:06] that's probably moot by now [12:29:12] oh, no, nosy does that part-time [12:30:08] ahh, since Steef389 is here: thanks for the WW vote ;-( [12:30:41] part-time? [12:31:12] yes, on the side [12:31:41] oh, that was ancient, I see just now [12:31:55] from 2010 [12:32:19] there is surely something more recent on the mailing list too [12:32:20] the things people do "on the side" at WP ;-) [12:32:34] I don't read that [12:32:42] I can't cope with it [12:32:43] or on the WMDE blog [12:32:46] *ashamed* [12:32:55] aw *comfort* [12:34:36] now I feel better again already [12:36:34] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:40:11] the probability that anything will still happen today is probably close to 0 [12:42:34] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [12:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:44:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:44:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [12:45:04] nothing left to do but wait [12:45:18] I'm off again then *waves* [12:45:26] *waves* [12:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:46:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:46:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [12:46:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [12:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [12:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [12:53:52] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 104342 MB (17% inode=99%): [12:55:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [12:56:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [12:57:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [12:59:54] SMTP on hemlock is CRITICAL: Connection refused [13:00:33] Load avg. 
on yarrow is CRITICAL: CRITICAL - load average: 22.57, 22.13, 21.97 [13:04:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [13:07:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [13:23:32] SSH on hemlock is CRITICAL: Server answer: [13:23:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:24:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:27:20] no info in the topic, so, I will ask :) what happened with toolserver? some tools are not responding... [13:28:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:28:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [13:29:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:29:18] ;) [13:30:35] * Damianz thinks he might just disable his bot monitoring until the web stuff re-appears so he doesn't get spammed all day [13:31:13] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:37:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:42:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:43:34] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:44:02] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [13:45:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [13:46:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:46:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [13:46:23] toolserver down? [13:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [13:46:42] jopp [13:46:54] mmovchin: yes, for almost 8 hours now [13:47:11] oh no :( [13:47:14] whats the problem? [13:47:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [13:48:18] i think it's file system [13:48:51] ok [13:49:01] and where are the admins? :) [13:49:18] have a nice weekend [13:49:23] Drunk, naked, tied to a tree.... or it's a sunday [13:49:28] lol [13:49:41] heh [13:49:48] it's Oktoberfest here in munich :D [13:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [13:50:19] so it's all three, have you seen them there? ;) [13:50:36] no, I haven't [13:50:36] mmovchin: if i've been watching everything close enough, it's another memory overload. [13:50:37] I should get around to finishing updating my code so it can fallback to raping the wikipedia api rather than using the sql replicas.... which sucks perf wise but ~30seconds slow per edit vs dead isn't really an argument [13:50:58] but now I'm going to Wies'n [13:51:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [13:51:17] ok thanks [13:51:42] but sql should work? [13:52:30] Uses a php based api which just times out and throws a 502 as the queries are horrid and eventually plan on going away so are abstracted from the main code. 
[13:52:54] giftpflanze: sql @ z-dat-s4-a is not going to work, others might, I dunno [13:53:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 103000 MB (16% inode=99%): [13:55:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [13:56:07] @replag [13:56:07] liangent: No replag found. Use "@replag all" to see all lags. [13:56:21] @replag all [13:56:21] liangent: s1-rr-a: 1s [-0.12 s/s]; s1-user: 1s [-0.12 s/s]; s2-user: 16s [-0.28 s/s]; s2-user-c: 1s [-0.46 s/s]; s3-rr-a: 19s [-0.00 s/s]; s3-user: 19s [-0.00 s/s]; s4-rr-a: 1s [-]; s4-user: 1s [-0.00 s/s] [13:56:22] liangent: s5-rr-a: 3s [-0.00 s/s]; s5-user: 3s [-0.00 s/s]; s5-user-c: 1s [-0.46 s/s]; s6-rr-a: 2s [-0.00 s/s]; s6-user: 2s [-0.00 s/s]; s7-rr-a: 4s [-0.00 s/s]; s7-user: 4s [-0.00 s/s] [13:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [13:57:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [14:00:54] SMTP on hemlock is CRITICAL: Connection refused [14:01:32] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 25.60, 24.59, 24.18 [14:05:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [14:09:53] /sql on rosemary is CRITICAL: DISK CRITICAL - free space: /sql 106184 MB (10% inode=99%): [14:10:53] /sql on rosemary is WARNING: DISK WARNING - free space: /sql 111074 MB (11% inode=99%): [14:12:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:14:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:14:17] DeltaQuad: opening a jira ticket is mostly the fastest way to contact ts admins. I think you know more details (e.g. since when ..) than me. Can you create the ticket? 
[14:14:52] Merlissimo: IIRC someone said earlier that one was already open, but I never checked/found where [14:15:04] that may be unrelated [14:15:54] giftpflanze: what was it about again? [14:16:03] user-store [14:16:12] i only found one about user store [14:16:12] https://jira.toolserver.org/browse/TS-1519 [14:16:42] hmm k [14:17:26] I've never done this before, where am I supposed to file it? [14:18:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:18:28] under TS i guess [14:18:47] yes [14:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [14:22:21] from time to time i get "cannot allocate xxx bytes; core dumped" [14:22:31] and cron seems to be down as well [14:23:35] SSH on hemlock is CRITICAL: Server answer: [14:23:35] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:24:43] maybe it's a problem with hemlock? [14:24:52] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:24:55] trying to access its mounts hangs [14:28:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [14:28:43] https://jira.toolserver.org/browse/TS-1520 [14:29:19] it looked like wolfsbane, hemlock, and ortelius were all having issues earlier. [14:29:20] s4-a was announced to be down [14:29:22] i think [14:29:33] giftpflanze: I thought it was supposed to be back up [14:29:57] ya it is, we just lost logs [14:30:18] for like two days worth...*sigh* [14:31:49] giftpflanze: submit-crontab is stored on hemlock [14:32:15] ok, that's an explanation :) [14:32:52] * Merlissimo found an unkown server ;-) [14:33:09] ? [14:33:37] with linux 2.6.32 [14:37:10] nnightshade ? 
[14:38:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:38:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:39:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:39:23] Platonides: 8 letters [14:40:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [14:41:33] nightshade has 10 and yarrow 6 :S [14:41:40] what server did you discover? [14:42:10] and how did you do it? [14:42:26] i shouldn't post this name here. If all can know it Dab. would have posted it. perhaps a test server [14:43:02] clematis and hawthorn have 8 letters but they are still solaris [14:43:02] i only checked on which server user-store is mounted [14:43:07] hemlock [14:43:30] Platonides: you can login to hemlock? [14:43:33] that's 7 letters and you can't ssh in since it's almost-down [14:43:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:43:46] no, ssh_exchange_identification: Connection closed by remote host [14:44:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:45:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:45:13] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:46:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [14:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [14:47:03] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. 
Check the remote server logs for error messages. [14:47:03] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [14:47:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [14:48:45] Platonides: i just read the wikipedia article about that name. you can eat its fruits ;-) [14:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [14:51:48] the misterious server? [14:52:02] yes [14:52:05] our servers are poisonus! [14:52:20] never heard of that name before [14:53:54] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 101919 MB (16% inode=99%): [14:55:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [14:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [14:57:54] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [14:59:16] rosemary maybe? [14:59:40] it'd fit the naming requisites [14:59:47] although I can't login there [15:01:32] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 28.52, 27.53, 27.17 [15:01:53] SMTP on hemlock is CRITICAL: Connection refused [15:02:28] I iterated the server list, but the only two Linux servers were nightshade and yarrow: for i in $(cut -f1 -d, /etc/ssh/ssh_known_hosts| grep -v '^#' | grep -v mgmt | grep -v scs-oe16-esams.esi | uniq); do echo $i && ssh -n $i uname -a; done [15:06:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [15:06:39] no, rosemary is Solaris, too [15:10:19] as are hawthorn and clematis [15:20:41] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. 
[15:23:14] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:24:32] SSH on hemlock is CRITICAL: Server answer: [15:24:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:24:54] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:26:12] Sun Grid Engine execd on willow is CRITICAL: Connection refused by host [15:28:13] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output [15:28:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [15:29:22] Sun Grid Engine execd on willow is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:38:32] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:43:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:44:33] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:45:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [15:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds [15:47:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [15:48:02] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [15:48:02] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[15:50:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output [15:54:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 100858 MB (16% inode=99%): [15:56:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146) [15:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output [15:57:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output [16:01:33] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 30.58, 29.58, 29.18 [16:02:52] SMTP on hemlock is CRITICAL: Connection refused [16:06:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7 [16:20:42] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds. [16:24:32] SSH on hemlock is CRITICAL: Server answer: [16:24:33] CAM on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [16:25:53] / on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [16:28:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default [16:39:33] /aux0 on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [16:43:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:32] /tmp on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. [16:46:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:46:03] Load avg. on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages. 
[16:46:34] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[16:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[16:47:33] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[16:48:01] /home on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[16:48:13] Environment IPMI on hemlock is UNKNOWN: CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
[16:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[16:54:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 99776 MB (16% inode=99%):
[16:56:28] sql-s4-user is down it looks like...
[16:56:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[16:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[16:57:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[17:01:33] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 32.54, 31.53, 31.12
[17:02:53] SMTP on hemlock is CRITICAL: Connection refused
[17:06:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[17:11:36] getting fork errors on willow
[17:12:42] top shows a lot of things from yasbot, including interwiki.py's
[17:12:52] * DarkoNeko sighs
[17:12:53] / on hemlock is CRITICAL: Connection refused by host
[17:13:03] /home on hemlock is CRITICAL: Connection refused by host
[17:13:03] Environment IPMI on hemlock is CRITICAL: Connection refused by host
[17:13:03] Load avg. on hemlock is CRITICAL: Connection refused by host
[17:13:06] yasbot causing problems? it was recently blocked on enwiki
[17:13:14] FMA on hemlock is CRITICAL: ERROR - unexpected output from snmpwalk
[17:13:33] /tmp on hemlock is CRITICAL: Connection refused by host
[17:13:33] /aux0 on hemlock is CRITICAL: Connection refused by host
[17:13:33] CAM on hemlock is CRITICAL: Connection refused by host
[17:15:10] I don't have adequate tools to pinpoint whether someone is really flooding more processes than the others, but I see him a lot in "top"
[17:15:33] what surprises me is I thought that interwiki.py was pretty much banned
[17:17:27] probably remembering wrong
[17:18:28] interwiki bots have a mmp now
[17:18:29] can't even do a "man ps" lol
[17:18:35] a mmp ?
[17:19:16] multi-maintainer project
[17:19:18] https://wiki.toolserver.org/view/Interwiki_bot_MMP_planning
[17:20:03] ooh
[17:20:26] i believe yasbot is violating the interwiki rule then
[17:20:28] i think
[17:20:51] Hi, I keep on getting segmentation faults on a lot of my cron jobs, am I the only one?
[17:20:53] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[17:21:06] no, it's been hell since around a week ago
[17:21:18] well, not segfault, a lot of mine haven't run at all.
[17:21:33] is cron running?
[17:21:35] apparently someone flooded the cron system with new tasks, or at least that's my theory
[17:21:48] there wouldn't be so many failures if we had just reached the limit
[17:22:06] Well, the resto of jobs seem to be working well
[17:22:09] rest*
[17:23:35] frwiki's welcoming system took quite a huge hit
[17:25:33] SSH on hemlock is CRITICAL: Connection refused
[17:26:02] Environment IPMI on willow is CRITICAL: Connection refused by host
[17:26:52] /tmp on willow is CRITICAL: Connection refused by host
[17:26:53] / on willow is CRITICAL: Connection refused by host
[17:26:54] Load avg. on willow is CRITICAL: Connection refused by host
[17:28:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[17:29:00] aarghl
[17:30:12] Sun Grid Engine execd on willow is CRITICAL: Connection refused by host
[17:30:18] beria has... 87 welcome.py running ? apparently with the same params ? :o
[17:31:04] Heh.
[17:31:11] she's on IRC, will try to contact
[17:31:38] quite possible some breakage in cron, crond shouldn't restart a process already started
[17:33:53] I'm not sure how to interpret the "time" parameter on top
[17:34:42] cpu time used
[17:34:54] STIME is start time
[17:35:19] ah, gotta find that one then, thanks phe, noommos
[17:35:58] they've started about 6 of the welcome processes today
[17:36:30] they're run without -break and so continue running endlessly ?
[17:38:11] not sure
[17:38:57] *check the help* yes, if not specified the script will loop. but I may be missing the end of the command line
[17:41:04] beria doesn't seem to be around her comp
[17:41:57] should we try a jira/request/something to get most of those killed as a temp measure ? I'm not sure who to ping
[17:42:18] tsadmins aren't around
[17:42:28] there's gotta be an emergency mail somewhere
[17:43:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[17:46:54] phe: cron does not care about other processes and always starts a new one
[17:47:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:47:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[17:47:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[17:48:12] there should be only one instance of each bot running
[17:48:42] with the exception you can run more if you are by the computer and present on this channel
[17:48:49] so anytime necessary you can kill it
[17:49:35] Danny_B|backup: which bot?
[17:49:45] any
[17:50:12] running 8 or more clones of the same bot with same params is hella unfriendly for others
[17:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[17:50:35] (that's what i just quickly noticed via top)
[17:50:36] bryan has 10 TsLogBot processes running on sge
[17:51:46] what is hamsang.py? a pywikipedia script?
[17:52:00] i have 3 instances of my bot running, but not more than one on each server
[17:53:11] noommos: you can read the source
[17:54:12] script written by reza
[17:55:53] /sql on ptolemy is WARNING: DISK WARNING - free space: /sql 98677 MB (16% inode=99%):
[17:56:13] mail sent to the ts-admins address
[17:56:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[17:56:33] ah wait, what did I miss ? *reads backlog*
[17:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[17:58:54] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[18:00:36] DarkoNeko: i've sent mail to them earlier today
[18:00:40] yasbot is also stuck on the imagecaptcha at ptwiki
[18:00:41] now that's a friendly name, stupidTS
[18:00:43] So is any part of TS working?
[18:00:44] Danny_B|backup, ah, thanks
[18:00:53] /sql on cassia is CRITICAL: DISK CRITICAL - free space: /sql 8593 MB (0% inode=93%):
[18:00:58] stupidTS, you sure want us to feel nice talking to you
[18:01:09] he's just running interwiki bots without even looking at local bot policies
[18:01:32] Load avg. on yarrow is CRITICAL: CRITICAL - load average: 34.55, 33.57, 33.15
[18:01:46] Better?
[18:01:53] the command line of TS is accessible, with a lot of scripts running
[18:02:06] web access doesn't seem to be back on, probably related to the issue above
[18:02:08] much better, thanks :)
[18:02:15] sge scheduler is still working ;-)
[18:02:40] DarkoNeko: Thank you.
[18:02:47] noommos: it's not an interwiki bot (although i haven't read the source in detail)
[18:02:49] ETA when it's all back up?
[18:02:53] SMTP on hemlock is CRITICAL: Connection refused
[18:03:08] Merlissimo: one of his other processes
[18:03:37] Riley: 5-10 min after ETA of ts-admin back ;-)
[18:03:38] we've found some problematic scripts that will hopefully be gotten rid of/corrected whenever a TS admin sees our mails ; I'm not sure if it will correct everything but that can only help
[18:03:54] Merlissimo: Thanks.
[18:04:16] Riley: do you have an ETA for admin back?
[18:04:32] I wish I could.
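
The duplicate-process problem discussed above (cron starts a new copy of a job even when the previous run is still alive, which is how 87 copies of welcome.py can pile up) is commonly avoided with a lock file. A minimal sketch in Python, assuming a Unix host where `fcntl.flock` is available; the function name and the lock path are hypothetical and not taken from any Toolserver script:

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    lock_file = open(lock_path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    # The caller must keep this object alive; closing it releases the lock.
    return lock_file
```

A cron-started bot would call this first and exit quietly when it returns None, so a run that never terminates blocks new copies instead of accumulating them. The lock is released automatically when the process exits, so a crashed run does not leave a stale lock behind.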
[18:05:41] brb, dinner
[18:05:48] DarkoNeko; web issues are related to hemlock
[18:06:13] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[18:08:26] ah, ok
[18:10:15] does anybody know how to test if user-store is available?
[18:10:45] Try and use it?
[18:11:16] just hangs
[18:11:25] no, i need a simple indicator/sensor test
[18:11:58] Try and write some file, use a timeout and just let SIGALRM denote fail
[18:12:06] beria's welcome.py issue is getting taken care of by its owner
[18:12:20] i have written a load sensor for sge before but that fails because it takes too long. that's why i have set fs-user-store resource to 0 manually this morning
[18:12:54] / on hemlock is CRITICAL: Connection refused by host
[18:13:13] FMA on hemlock is CRITICAL: ERROR - unexpected output from snmpwalk
[18:13:32] /tmp on hemlock is CRITICAL: Connection refused by host
[18:13:33] /aux0 on hemlock is CRITICAL: Connection refused by host
[18:13:33] CAM on hemlock is CRITICAL: Connection refused by host
[18:14:02] Load avg. on hemlock is CRITICAL: Connection refused by host
[18:14:02] /home on hemlock is CRITICAL: Connection refused by host
[18:14:02] Environment IPMI on hemlock is CRITICAL: Connection refused by host
[18:14:31] my current sensor simply tries to read a file from that file system, but this takes longer than 15 seconds
[18:15:54] If it takes longer than 15 seconds it should be a fail
[18:20:52] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[18:25:33] SSH on hemlock is CRITICAL: Connection refused
[18:26:02] Environment IPMI on willow is CRITICAL: Connection refused by host
[18:26:53] /tmp on willow is CRITICAL: Connection refused by host
[18:26:54] / on willow is CRITICAL: Connection refused by host
[18:26:54] Load avg. on willow is CRITICAL: Connection refused by host
[18:28:33] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[18:30:13] Sun Grid Engine execd on willow is CRITICAL: Connection refused by host
[18:32:39] hello all
[18:33:26] DaB. * Re: [Toolserver-l] Web servers unresponsive
[18:33:43] DaBPunkt: Thanks for the email update
[18:37:06] Riley: no problem. I was busy the whole day, away from my pc so I didn't notice the problem
[18:37:23] I guess I should re-enable the sms-alert
[18:38:00] Whatever gets ACC back up :D
[18:38:14] how does that work?
[18:38:28] giftpflanze: what?
[18:38:32] the alert
[18:38:49] email2sms-bridge
[18:38:55] oh
[18:39:41] I disabled it after the sms-space on my mobile phone was full because z-dat-s* had a hiccup…
[18:43:42] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[18:47:33] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[18:47:34] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[18:48:03] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:50:13] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[18:50:32] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.003 second response time
[18:52:52] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.007 second response time
[18:53:38] ok, it should work now
[18:54:07] \o\ hello
[18:54:52] /sql on ptolemy is OK: DISK OK - free space: /sql 220052 MB (36% inode=99%):
[18:56:16] somehow i lost my data on user-store
[18:56:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[18:56:34] user-store is unavailable at the moment
[18:56:39] so is the backup
[18:57:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[18:58:41] thx, DaBPunkt
[18:59:32] Load avg. on yarrow is WARNING: WARNING - load average: 0.00, 5.66, 19.41
[18:59:52] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[19:03:33] Load avg. on yarrow is OK: OK - load average: 0.00, 2.54, 14.98
[19:06:22] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[19:08:25] DaB. * Re: [Toolserver-announce] [Toolserver-l] Web servers unresponsive
[19:08:36] what happened to toolserver?
[19:08:42] it's down
[19:09:28] Nirvanchik: see the last few messages at http://lists.wikimedia.org/pipermail/toolserver-l/2012-September/date.html first ?
[19:09:37] and then ask questions?
[19:14:27] * Damianz pats DaB
[19:14:39] Now just nip back 9 hours and do the same :P
[19:21:52] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[19:22:21] the good news is: the array is still visible from hemlock. So I guess we need no colo-visit
[19:23:00] "If it really does not matter for you when your task starts, then take the
[19:23:00] position of the first letter of your user-name and add 2 ("dab" > "d" > 4 > 6)" cool instruction) I'll follow this
[19:24:52] (if I ever get to the toolserver)
[19:25:54] thanks everybody, good night,
[19:26:02] Environment IPMI on willow is CRITICAL: Connection refused by host
[19:26:53] /tmp on willow is CRITICAL: Connection refused by host
[19:27:14] Nirvanchik: test
[19:27:28] jeremyb: test
[19:27:39] sorry, I'm off
[19:27:52] Load avg. on willow is CRITICAL: Connection refused by host
[19:27:52] / on willow is CRITICAL: Connection refused by host
[19:29:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[19:30:12] Sun Grid Engine execd on willow is CRITICAL: Connection refused by host
[19:46:41] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[19:48:31] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[19:48:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[19:50:21] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[19:57:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[19:58:13] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[19:58:39] DaBPunkt: i still have issues with /mnt/user-store
[19:58:52] Free Memory on damiana is WARNING: WARNING - 6.5% (548052 kB) free!
[19:59:01] no one said that it's available again
[20:00:52] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[20:03:25] Platonides * Re: [Toolserver-l] Web servers unresponsive
[20:07:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[20:11:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564640 MB (10% inode=46%):
[20:11:42] rebooting hemlock
[20:16:52] \o/
[20:18:02] toolserver.org HTTP on wolfsbane is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:18:41] toolserver.org HTTP on ortelius is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:20:20] failed. Restored previous setup and rebooted again to bring back the web-service
[20:20:25] Hersfold * Re: [Toolserver-l] Web servers unresponsive
[20:20:26] need a short moment
[20:21:52] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[20:26:01] Environment IPMI on willow is CRITICAL: Connection refused by host
[20:26:33] toolserver.org HTTP on ortelius is WARNING: HTTP WARNING: HTTP/1.1 200 OK - 239 bytes in 0.594 second response time
[20:26:52] toolserver.org HTTP on wolfsbane is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.005 second response time
[20:27:32] toolserver.org HTTP on ortelius is OK: HTTP OK: HTTP/1.1 200 OK - 239 bytes in 0.003 second response time
[20:27:41] back
[20:27:51] /tmp on willow is CRITICAL: Connection refused by host
[20:27:52] Load avg. on willow is CRITICAL: Connection refused by host
[20:28:52] / on willow is CRITICAL: Connection refused by host
[20:29:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[20:30:13] Sun Grid Engine execd on willow is CRITICAL: Connection refused by host
[20:34:57] nighty~
[20:41:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564628 MB (10% inode=46%):
[20:42:35] yay, user-store is up
[20:43:04] giftpflanze: partly: qstat -F fs-user-store
[20:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[20:48:25] Hersfold * Re: [Toolserver-l] Web servers unresponsive
[20:48:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[20:48:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[20:51:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[20:57:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[20:58:52] Free Memory on damiana is WARNING: WARNING - 6.4% (540240 kB) free!
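
The availability test suggested earlier in the channel (write a file, use a timeout, and let the alarm signal denote failure) could look roughly like the sketch below. It is only an illustration: the function and exception names are hypothetical, the 15-second default comes from the figure mentioned in the discussion, and a filesystem that blocks uninterruptibly may not be interrupted by the signal at all, in which case running the probe in a child process would be the safer variant.

```python
import os
import signal


class ProbeTimeout(Exception):
    """Raised by the alarm handler when the probe takes too long."""


def _on_alarm(signum, frame):
    raise ProbeTimeout()


def filesystem_responds(directory, timeout_seconds=15):
    """Return True if a small file can be written, read back and removed in time."""
    probe_path = os.path.join(directory, ".probe-%d" % os.getpid())
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_seconds)
    try:
        with open(probe_path, "w") as handle:
            handle.write("ok")
        with open(probe_path) as handle:
            content = handle.read()
        os.remove(probe_path)
        return content == "ok"
    except (ProbeTimeout, OSError):
        # Either the filesystem hung past the deadline or the I/O failed outright.
        return False
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```

A load sensor could call `filesystem_responds("/mnt/user-store")` and report the resource as 0 whenever it returns False. The function has to run in the main thread, because Python only allows signal handlers to be installed there.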
[20:59:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[21:00:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[21:01:51] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41575 MB (10% inode=99%):
[21:01:52] /sql on z-dat-s6-a is WARNING: DISK WARNING - free space: /sql 81539 MB (8% inode=98%):
[21:01:53] /sql on z-dat-s3-a is WARNING: DISK WARNING - free space: /sql 81539 MB (8% inode=98%):
[21:07:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[21:12:53] /tmp on willow is OK: DISK OK - free space: / 40359 MB (38% inode=99%):
[21:12:53] / on willow is OK: DISK OK - free space: / 40359 MB (38% inode=99%):
[21:12:53] Load avg. on willow is OK: OK - load average: 7.49, 6.63, 5.77
[21:13:12] Environment IPMI on willow is OK: ok: temperature ok fan ok voltage ok chassis ok
[21:13:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[21:21:53] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[21:29:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[21:42:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564285 MB (10% inode=46%):
[21:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[21:48:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[21:48:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[21:52:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[21:54:25] DaB. * Re: [Toolserver-announce] [Toolserver-l] Web servers unresponsive
[21:57:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[21:58:52] Free Memory on damiana is WARNING: WARNING - 6.7% (558600 kB) free!
[21:59:12] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[22:01:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[22:07:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[22:09:01] DaBPunkt: Do the toolserver webservers normally automatically create E-Tags for the content they serve up for caching?
[22:12:55] apmon: At least I didn't change anything in the webserver program
[22:13:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[22:14:00] It looks like http://toolserver.org/~osm/libs/openlayers/latest/OpenLayers.js isn't generating E-Tag or cache headers
[22:14:43] and it did before?
[22:14:51] Given that that is very large (nearly 1Mb) and is loaded when you click on "Karte " in e.g. the German Wikipedia, it would be good if it could be cached
[22:14:58] DaBPunkt: No idea, probably not
[22:15:34] I don't think it has anything to do with the outage. I just happened to notice it now, as my connection is slow and so it took ages to load the map in wikipedia
[22:17:05] And I was trying to figure out why it is quite so slow. Being 1Mb is presumably the main reason. But increased caching might help mitigate that.
[22:19:02] apmon: AFAIS apache uses the FileETag directive to configure the etags. But I can't find anything for zeus
[22:22:52] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:27:36] Do you know if enabling etags through .htaccess works?
[22:28:02] no, but you could just try
[22:28:18] OK, thanks, will do
[22:29:31] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[22:41:22] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:42:33] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 564180 MB (10% inode=46%):
[22:44:02] I was able to enable cache-control: max-age from .htaccess, but not etag.
[22:44:50] They are mentioned in the Zeus 7.1 manual, but in the Zeus 4.3 manual they are only mentioned as missing from the Perl API
[22:44:52] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41505 MB (10% inode=99%):
[22:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[22:48:22] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:48:22] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:48:22] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[22:48:32] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[22:48:32] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[22:52:12] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[22:53:53] /sql on z-dat-s6-a is WARNING: DISK WARNING - free space: /sql 81947 MB (8% inode=98%):
[22:53:53] /sql on z-dat-s3-a is WARNING: DISK WARNING - free space: /sql 81926 MB (8% inode=98%):
[22:57:32] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[22:59:21] Sun Grid Engine execd on wolfsbane is WARNING: NRPE: Unable to read output
[22:59:52] Free Memory on damiana is WARNING: WARNING - 7.0% (586764 kB) free!
[23:01:53] SRaid on nightshade is CRITICAL: NRPE: Unable to read output
[23:01:53] Free Memory on damiana is OK: OK - 7.1% (591396 kB) free.
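
Whether a URL is served with ETag or Cache-Control headers, as asked in the exchange above, can be checked from a script rather than by eye. A small sketch using only the Python standard library; both function names are hypothetical, and it assumes the server answers HEAD requests:

```python
import urllib.request


def caching_summary(headers):
    """Pick the caching-related entries out of a mapping of response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "etag": lowered.get("etag"),
        "cache_control": lowered.get("cache-control"),
        "last_modified": lowered.get("last-modified"),
    }


def fetch_caching_summary(url, timeout_seconds=10):
    """Send a HEAD request and summarise the caching headers of the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return caching_summary(dict(response.headers.items()))
```

Calling `fetch_caching_summary` on the OpenLayers.js URL before and after an .htaccess change would show which of the three headers the server actually sends; a value of None means the header is absent.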
[23:07:12] FMA on amaranth is CRITICAL: Failed components: hc://:product-id=SUN-FIRE-X4150:server-id=amaranth:chassis-id=0819QAR1D1:serial=518545072303039020:part=72T256520HFD3SB:revision=--/motherboard=0/memory-controller=1/dram-channel=2/dimm=3/rank=7
[23:07:53] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41479 MB (10% inode=99%):
[23:11:22] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:12:52] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41475 MB (10% inode=99%):
[23:13:12] Sun Grid Engine execd on willow is WARNING: NRPE: Unable to read output
[23:22:51] Environment IPMI on thyme is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:29:32] SMF on web.amaranth is CRITICAL: ERROR - maintenance: svc:/application/jira:default
[23:36:52] Load avg. on ortelius is WARNING: WARNING - load average: 15.95, 14.51, 8.55
[23:42:52] Load avg. on ortelius is OK: OK - load average: 11.95, 14.90, 11.01
[23:43:32] /aux0 on hemlock is WARNING: DISK WARNING - free space: /aux0 562827 MB (10% inode=46%):
[23:43:37] nacht ts
[23:46:42] fisheye.toolserver.org on web.amaranth is CRITICAL: CRITICAL - Socket timeout after 21 seconds
[23:49:31] MySQL slave on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[23:49:31] MySQL on z-dat-s4-a is CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[23:52:22] Sun Grid Engine execd on ortelius is WARNING: NRPE: Unable to read output
[23:55:22] /sql on z-dat-s6-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:55:23] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:55:23] /sql on z-dat-s7-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.
[23:55:53] /sql on z-dat-s6-a is WARNING: DISK WARNING - free space: /sql 81892 MB (8% inode=98%):
[23:55:53] /sql on z-dat-s3-a is WARNING: DISK WARNING - free space: /sql 81892 MB (8% inode=98%):
[23:55:53] /sql on z-dat-s7-a is WARNING: DISK WARNING - free space: /sql 41423 MB (10% inode=99%):
[23:57:33] s4 replag on z-dat-s4-a is CRITICAL: QUERY CRITICAL: Cant connect to MySQL server on z-dat-s4-a (146)
[23:58:22] /sql on z-dat-s3-a is CRITICAL: CHECK_NRPE: Socket timeout after 30 seconds.