[00:03:48] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [00:11:17] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [00:16:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [00:41:19] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [00:46:19] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [01:11:21] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [01:14:51] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.64, 7.25, 5.78 [01:16:21] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [01:43:29] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 166 processes [01:44:39] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [01:49:49] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:04:19] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:09:09] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [02:15:34] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [02:20:44] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:43:41] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 24% free memory [02:47:45] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [02:49:15] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.24, 4.88, 4.00 [02:49:32] 06/29/2012 - 02:49:32 - User laner may have been modified in LDAP or locally, updating key in project(s): deployment-prep [02:55:47] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [02:55:57] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 7.31, 6.39, 4.90 [02:59:53] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:00:14] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:48] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:06:48] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:06:48] PROBLEM Current Load is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:06:48] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:06:48] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:09:14] PROBLEM Disk Space is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:09:15] PROBLEM Current Users is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:05] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:05] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:05] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:16] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:19] PROBLEM dpkg-check is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:19] PROBLEM Total Processes is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:24] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:13:33] RECOVERY Disk Space is now: OK on gluster-4 i-000002e4 output: DISK OK [03:13:33] RECOVERY Current Users is now: OK on gluster-4 i-000002e4 output: USERS OK - 0 users currently logged in [03:13:33] PROBLEM Free ram is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:15:47] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [03:15:52] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [03:15:52] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 5.41, 9.65, 8.26 [03:16:01] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:16:06] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM dpkg-check is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM Free ram is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM Current Users is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM Disk Space is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM dpkg-check is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:07] PROBLEM Total Processes is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:08] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:11] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:12] PROBLEM Disk Space is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:12] PROBLEM Current Users is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:12] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:17] PROBLEM Total Processes is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:22] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:22] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:16:22] PROBLEM Current Load is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:22] PROBLEM Free ram is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:16:22] PROBLEM Current Load is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [03:18:01] RECOVERY Current Load is now: OK on pybal-precise i-00000289 output: OK - load average: 1.18, 2.54, 3.40 [03:18:01] RECOVERY dpkg-check is now: OK on gluster-4 i-000002e4 output: All packages OK [03:18:01] RECOVERY Total Processes is now: OK on gluster-4 i-000002e4 output: PROCS OK: 84 processes [03:18:06] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK [03:18:06] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 68% free memory [03:18:06] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in [03:18:06] RECOVERY Current Load is now: OK on gluster-4 i-000002e4 output: OK - load average: 0.27, 3.43, 4.96 [03:18:06] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK [03:18:07] RECOVERY Current Load is now: OK on precise-test i-00000231 output: OK - load average: 1.37, 3.56, 3.91 [03:18:07] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 83 processes [03:18:11] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 80% free memory [03:18:11] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 90 processes [03:18:22] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:18:22] PROBLEM Current Load is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:18:22] PROBLEM Disk Space is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:18:22] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:18:28] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [03:19:06] RECOVERY Free ram is now: OK on deployment-apache31 i-000002d4 output: OK: 88% free memory [03:20:28] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [03:20:28] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.95, 11.12, 9.40 [03:20:28] RECOVERY Disk Space is now: OK on deployment-apache31 i-000002d4 output: DISK OK [03:20:28] PROBLEM Current Load is now: WARNING on deployment-apache31 i-000002d4 output: WARNING - load average: 0.70, 4.28, 5.50 [03:20:28] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [03:20:59] RECOVERY Total Processes is now: OK on deployment-apache31 i-000002d4 output: PROCS OK: 125 processes [03:21:04] RECOVERY Current Users is now: OK on deployment-apache31 i-000002d4 output: USERS OK - 0 users currently logged in [03:21:05] RECOVERY Free ram is now: OK on pybal-precise i-00000289 output: OK: 78% free memory [03:21:05] RECOVERY Current Users is now: OK on pybal-precise i-00000289 output: USERS OK - 0 users currently logged in [03:21:05] RECOVERY dpkg-check is now: OK on deployment-apache31 i-000002d4 output: All packages OK [03:21:05] RECOVERY Disk Space is now: OK on pybal-precise i-00000289 output: DISK OK [03:21:05] RECOVERY Current Load is now: OK on mwreview i-000002ae output: OK - load average: 0.34, 2.24, 3.35 [03:21:05] RECOVERY Disk Space is now: OK on configtest-main i-000002dd output: DISK OK [03:21:06] RECOVERY Current Users is now: OK on configtest-main i-000002dd output: USERS OK - 0 users currently logged in [03:21:06] RECOVERY Total Processes is now: OK on redis1 i-000002b6 output: PROCS OK: 88 processes 
[03:21:10] RECOVERY Free ram is now: OK on redis1 i-000002b6 output: OK: 85% free memory [03:21:10] RECOVERY Current Load is now: OK on configtest-main i-000002dd output: OK - load average: 0.45, 3.18, 3.72 [03:21:10] RECOVERY Total Processes is now: OK on pybal-precise i-00000289 output: PROCS OK: 89 processes [03:21:15] RECOVERY dpkg-check is now: OK on pybal-precise i-00000289 output: All packages OK [03:21:21] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 3.15, 4.71, 4.86 [03:21:21] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 85 processes [03:22:05] RECOVERY Current Load is now: OK on maps-tilemill1 i-00000294 output: OK - load average: 0.16, 3.28, 4.27 [03:22:05] RECOVERY Current Load is now: OK on redis1 i-000002b6 output: OK - load average: 0.97, 4.08, 4.17 [03:22:05] RECOVERY Disk Space is now: OK on redis1 i-000002b6 output: DISK OK [03:22:05] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af output: All packages OK [03:24:35] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [03:24:35] RECOVERY Current Load is now: OK on deployment-apache31 i-000002d4 output: OK - load average: 0.27, 1.82, 4.10 [03:26:55] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:29:50] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 20% free memory [03:30:39] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 1.45, 1.93, 4.12 [03:43:14] PROBLEM Free ram is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:47:37] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:49:06] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [03:49:42] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [03:49:42] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:49:47] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:54] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:54] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:56:48] PROBLEM Disk Space is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:56:49] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:56:59] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds [03:57:25] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [03:57:25] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [03:58:19] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:44] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 12% free memory [03:58:55] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:55] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [03:58:55] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [04:00:45] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:00:45] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:00:50] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:02:45] RECOVERY SSH is now: OK on bots-sql2 i-000000af output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [04:02:45] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 68 MB (5% inode=57%): [04:03:00] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:03:05] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:03:05] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:03:05] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:03:05] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:03:13] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:05:17] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 89% free memory [04:05:23] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK [04:07:33] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 14.19, 13.47, 9.13 [04:08:10] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:08:10] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:08:10] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:08:10] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:08:10] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [04:08:15] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:09:36] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK [04:09:36] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in [04:09:36] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.25, 1.97, 3.61 [04:12:00] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK [04:12:01] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [04:12:01] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 88% free memory [04:12:01] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 86 processes [04:12:27] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:12:28] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:12:28] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:12:28] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:12:33] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:13:09] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 4.93, 5.13, 4.81 [04:13:09] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory [04:13:09] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory [04:13:30] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:13:30] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 93% free memory [04:13:30] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 91 processes [04:14:26] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 11% free memory [04:15:00] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:15:09] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 13% free memory [04:15:24] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:15:54] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK [04:15:54] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in [04:15:54] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 110 processes [04:15:59] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK [04:16:09] PROBLEM Current Load is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [04:16:39] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:17:02] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:17:26] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:48] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:20:49] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [04:20:49] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:54] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:54] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:54] PROBLEM Total Processes is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:59] PROBLEM dpkg-check is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:20:59] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK [04:20:59] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 96 processes [04:21:25] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 94% free memory [04:21:46] PROBLEM Free ram is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:21:46] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:21:46] PROBLEM dpkg-check is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:21:46] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:21:46] PROBLEM Total Processes is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:21:51] RECOVERY Current Load is now: OK on rds i-00000207 output: OK - load average: 0.71, 2.49, 4.21 [04:21:51] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [04:21:57] PROBLEM dpkg-check is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:22:49] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 users currently logged in [04:22:49] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [04:22:50] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 1.19, 3.20, 5.90 [04:23:15] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 136 processes [04:24:17] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [04:24:32] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:24:38] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:25:15] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in [04:25:20] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK [04:25:20] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 83% free memory [04:25:54] RECOVERY dpkg-check is now: OK on maps-tilemill1 i-00000294 output: All packages OK [04:25:54] RECOVERY Total Processes is now: OK on maps-tilemill1 i-00000294 output: PROCS OK: 104 processes [04:27:52] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:27:52] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [04:27:57] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:28:27] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 92% free memory [04:29:24] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [04:30:20] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory [04:32:39] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:32:39] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [04:33:36] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [04:35:13] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory [04:37:02] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:02] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:07] PROBLEM Disk Space is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:08] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:08] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [04:37:08] PROBLEM Free ram is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:40:29] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory [04:41:49] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 156 processes [04:41:54] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK [04:41:54] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK [04:41:54] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 0.54, 1.64, 1.98 [04:41:54] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in [04:41:54] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 35% free memory [04:41:58] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:41:58] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:42:03] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [04:42:03] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:43:40] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [04:45:11] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:46:46] RECOVERY Total Processes is now: OK on incubator-bot2 i-00000252 output: PROCS OK: 148 processes [04:51:42] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [04:59:20] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:03:16] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.70, 0.90, 3.25 [05:08:09] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.53, 0.86, 2.61 [05:21:42] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [05:30:32] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [05:41:13] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 18% free memory [05:51:48] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [05:54:09] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [06:00:29] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [06:01:29] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:14:21] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [06:21:56] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [06:31:15] PROBLEM Free ram is now: CRITICAL on gluster-1 i-000002df output: CHECK_NRPE: Socket timeout after 10 seconds. [06:31:16] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:31:26] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. [06:33:37] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [06:35:36] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [06:35:46] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 46.51, 22.26, 10.59 [06:35:46] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 153 processes [06:35:51] PROBLEM Free ram is now: UNKNOWN on gluster-1 i-000002df output: NRPE: Unable to read output [06:35:51] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 166 processes [06:35:57] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [06:35:57] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:06] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:40:06] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:43:07] RECOVERY Disk Space is now: OK on ipv6test1 i-00000282 output: DISK OK [06:43:39] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:15] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:15] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:44:15] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:44:15] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:45:57] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:47:26] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:47:26] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:47:57] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:58:46] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [06:58:46] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:59:21] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:06:11] uh? what just happened. my instance died? [07:06:14] http://ganglia.wmflabs.org/latest/?r=2hr&cs=&ce=&c=integration&h=integration-apache1&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS [07:06:27] since 6:20 started building up something [07:06:29] I wasn't even using it [07:14:18] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [07:14:18] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [07:14:18] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:15:40] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:18:16] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [07:18:16] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [07:22:36] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [07:24:42] PROBLEM Disk Space is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [07:24:42] PROBLEM Total Processes is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [07:24:52] PROBLEM Disk Space is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:24:53] PROBLEM Total Processes is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:25:32] PROBLEM Total Processes is now: WARNING on incubator-bot2 i-00000252 output: PROCS WARNING: 156 processes [07:25:37] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 6.79, 7.66, 7.84 [07:25:37] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in [07:25:37] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK [07:25:42] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 115 processes [07:25:47] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 72% free memory [07:25:47] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK [07:25:47] PROBLEM Current Load is now: WARNING on worker1 i-00000208 output: WARNING - load average: 5.95, 6.84, 7.36 [07:25:52] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [07:25:52] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [07:25:52] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 92% free memory [07:25:52] RECOVERY 
Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 90 processes [07:28:15] PROBLEM Current Load is now: WARNING on gluster-4 i-000002e4 output: WARNING - load average: 9.11, 8.22, 7.22 [07:28:15] PROBLEM Current Load is now: WARNING on labs-nfs1 i-0000005d output: WARNING - load average: 8.34, 8.82, 9.00 [07:28:15] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 9.69, 12.19, 13.61 [07:28:25] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:28:30] PROBLEM Current Load is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:29:36] PROBLEM Current Load is now: CRITICAL on integration-apache1 i-000002eb output: CRITICAL - load average: 10.12, 23.84, 29.67 [07:29:36] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 5.04, 7.01, 8.03 [07:30:04] RECOVERY Total Processes is now: OK on integration-apache1 i-000002eb output: PROCS OK: 104 processes [07:32:34] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 5.32, 5.95, 5.99 [07:32:34] PROBLEM Current Users is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:34] PROBLEM SSH is now: CRITICAL on bots-sql2 i-000000af output: CRITICAL - Socket timeout after 10 seconds [07:32:34] PROBLEM dpkg-check is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:34] PROBLEM Total Processes is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM Current Load is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM Current Users is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:32:39] PROBLEM Disk Space is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM Free ram is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM dpkg-check is now: CRITICAL on incubator-bot2 i-00000252 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM Current Users is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:39] PROBLEM Disk Space is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:40] PROBLEM Free ram is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:40] PROBLEM Total Processes is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:44] PROBLEM dpkg-check is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:44] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:32:50] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:33:34] PROBLEM Current Load is now: WARNING on ee-prototype i-0000013d output: WARNING - load average: 4.22, 5.26, 5.83 [07:33:34] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 4.12, 4.80, 5.81 [07:33:34] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CRITICAL - load average: 5.71, 7.80, 21.17 [07:33:34] PROBLEM Current Load is now: WARNING on hugglewiki i-000000aa output: WARNING - load average: 7.43, 6.73, 7.21 [07:34:00] PROBLEM Current Load is now: WARNING on grail i-000002c6 output: WARNING - load average: 1.16, 4.66, 6.10 [07:34:00] PROBLEM Current Load is now: WARNING on wep i-000000c2 output: WARNING - load average: 5.05, 5.94, 5.92 [07:34:00] PROBLEM Current Load is now: WARNING on kripke i-00000268 output: WARNING - load average: 7.43, 6.90, 6.69 [07:34:00] PROBLEM Current Load is now: WARNING on ganglia-test2 i-00000250 output: WARNING - load average: 6.00, 6.77, 6.24 [07:34:00] PROBLEM Current Load is now: WARNING on swift-fe1 i-000001d2 output: WARNING - load average: 4.36, 5.20, 5.62 [07:34:00] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 7.05, 5.86, 7.60 [07:34:36] PROBLEM Current Load is now: WARNING on redis1 i-000002b6 output: WARNING - load average: 5.49, 5.71, 6.20 [07:34:36] PROBLEM Disk Space is now: WARNING on ipv6test1 i-00000282 output: DISK WARNING - free space: / 69 MB (5% inode=57%): [07:34:36] PROBLEM Current Load is now: WARNING on ipv6test1 i-00000282 output: WARNING - load average: 0.44, 4.35, 5.85 [07:35:07] PROBLEM Free ram is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:08] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:18] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:35:23] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:28] PROBLEM Current Load is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:28] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:28] PROBLEM Current Load is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:37:56] RECOVERY Current Load is now: OK on pybal-precise i-00000289 output: OK - load average: 1.18, 3.24, 4.74 [07:37:56] RECOVERY Current Users is now: OK on pybal-precise i-00000289 output: USERS OK - 0 users currently logged in [07:37:56] RECOVERY Disk Space is now: OK on pybal-precise i-00000289 output: DISK OK [07:37:56] RECOVERY Free ram is now: OK on pybal-precise i-00000289 output: OK: 78% free memory [07:37:56] RECOVERY Total Processes is now: OK on pybal-precise i-00000289 output: PROCS OK: 83 processes [07:38:01] RECOVERY dpkg-check is now: OK on pybal-precise i-00000289 output: All packages OK [07:38:01] RECOVERY Total Processes is now: OK on gluster-4 i-000002e4 output: PROCS OK: 97 processes [07:38:06] RECOVERY Disk Space is now: OK on grail i-000002c6 output: DISK OK [07:38:07] PROBLEM Current Load is now: WARNING on wikisource-web i-000000fe output: WARNING - load average: 1.32, 3.32, 5.24 [07:38:11] RECOVERY Current Users is now: OK on bots-sql2 i-000000af output: USERS OK - 0 users currently logged in [07:38:12] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 2.80, 4.67, 6.66 [07:38:12] PROBLEM Current Load is now: WARNING on psm-precise i-000002f2 output: WARNING - load average: 8.26, 10.87, 10.22 [07:38:12] PROBLEM Current Users is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:38:12] PROBLEM Total Processes is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:17] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:21] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:21] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:41] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 10.54, 9.87, 8.12 [07:38:55] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: PROCS CRITICAL: 205 processes [07:39:00] RECOVERY Current Load is now: OK on ee-prototype i-0000013d output: OK - load average: 0.42, 2.27, 4.37 [07:39:00] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 15.95, 10.67, 18.28 [07:39:05] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [07:39:05] RECOVERY Current Load is now: OK on grail i-000002c6 output: OK - load average: 0.24, 1.83, 4.46 [07:39:05] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [07:39:05] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af output: All packages OK [07:39:05] RECOVERY Current Load is now: OK on ipv6test1 i-00000282 output: OK - load average: 3.88, 3.14, 4.82 [07:39:05] PROBLEM Current Load is now: WARNING on configtest-main i-000002dd output: WARNING - load average: 8.35, 9.38, 8.07 [07:39:05] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:10] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:39:10] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:10] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:10] PROBLEM Total Processes is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:16] PROBLEM dpkg-check is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:16] PROBLEM Current Load is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:16] PROBLEM Free ram is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:16] PROBLEM Free ram is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:16] PROBLEM Disk Space is now: CRITICAL on deployment-apache31 i-000002d4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:39:47] PROBLEM Current Load is now: CRITICAL on redis1 i-000002b6 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:40:25] pfff [07:40:51] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:01] PROBLEM Current Users is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:01] PROBLEM Disk Space is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:01] PROBLEM dpkg-check is now: CRITICAL on gluster-4 i-000002e4 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:01] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:07] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:41:07] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:07] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:07] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [07:41:07] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [07:42:01] PROBLEM Total Processes is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:42:06] PROBLEM Current Users is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:42:06] PROBLEM Disk Space is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:09] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 5.23, 5.83, 6.59 [07:43:09] RECOVERY SSH is now: OK on bots-sql2 i-000000af output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [07:43:10] RECOVERY dpkg-check is now: OK on deployment-apache31 i-000002d4 output: All packages OK [07:43:10] RECOVERY Current Load is now: OK on wikisource-web i-000000fe output: OK - load average: 0.55, 1.54, 3.86 [07:43:10] RECOVERY Disk Space is now: OK on bots-sql2 i-000000af output: DISK OK [07:43:10] RECOVERY Total Processes is now: OK on bots-sql2 i-000000af output: PROCS OK: 87 processes [07:43:14] PROBLEM Current Load is now: WARNING on mwreview i-000002ae output: WARNING - load average: 5.07, 6.96, 7.92 [07:43:14] PROBLEM Current Load is now: WARNING on rds i-00000207 output: WARNING - load average: 6.03, 5.33, 5.18 [07:43:15] PROBLEM Current Load is now: WARNING on integration-apache1 i-000002eb output: WARNING - load average: 12.63, 12.72, 19.20 [07:43:26] PROBLEM Current Users is now: CRITICAL on 
mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:26] PROBLEM Disk Space is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:26] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:43:26] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:19] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.92, 2.14, 4.80 [07:44:29] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 7.44, 9.45, 12.16 [07:44:29] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in [07:44:29] RECOVERY Total Processes is now: OK on maps-tilemill1 i-00000294 output: PROCS OK: 111 processes [07:44:34] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 80% free memory [07:44:34] RECOVERY dpkg-check is now: OK on maps-tilemill1 i-00000294 output: All packages OK [07:44:35] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 198 processes [07:44:39] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 2.70, 2.99, 4.43 [07:44:45] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:44:45] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 139 processes [07:44:50] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK [07:44:55] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK [07:44:55] PROBLEM Current Load is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:45:10] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:45:28] RECOVERY Current Load is now: OK on labs-nfs1 i-0000005d output: OK - load average: 7.00, 3.93, 4.65 [07:45:28] RECOVERY Current Load is now: OK on wep i-000000c2 output: OK - load average: 0.12, 1.25, 3.42 [07:45:28] RECOVERY Current Load is now: OK on kripke i-00000268 output: OK - load average: 0.27, 1.95, 4.38 [07:45:28] RECOVERY Current Load is now: OK on swift-fe1 i-000001d2 output: OK - load average: 0.18, 1.24, 3.36 [07:45:28] RECOVERY Current Load is now: OK on bots-apache1 i-000000b0 output: OK - load average: 0.06, 1.08, 4.05 [07:45:28] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [07:45:29] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [07:45:29] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 5.08, 7.63, 10.58 [07:45:30] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 138 processes [07:46:01] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:46:07] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:46:17] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:47:24] RECOVERY Current Users is now: OK on integration-apache1 i-000002eb output: USERS OK - 0 users currently logged in [07:47:24] RECOVERY Disk Space is now: OK on integration-apache1 i-000002eb output: DISK OK [07:49:10] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 4.06, 4.90, 6.08 [07:49:10] RECOVERY Current Users is now: OK on incubator-bot2 i-00000252 output: USERS OK - 0 users currently logged in [07:49:10] RECOVERY Disk Space is now: OK on incubator-bot2 i-00000252 output: DISK OK [07:49:10] RECOVERY Free ram is now: OK on incubator-bot2 i-00000252 output: OK: 33% free memory [07:49:10] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK [07:49:15] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [07:49:15] PROBLEM Current Load is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [07:50:05] PROBLEM Current Load is now: WARNING on psm-precise i-000002f2 output: WARNING - load average: 0.68, 2.73, 6.02 [07:50:05] RECOVERY Current Users is now: OK on deployment-apache31 i-000002d4 output: USERS OK - 0 users currently logged in [07:50:05] RECOVERY Total Processes is now: OK on deployment-apache31 i-000002d4 output: PROCS OK: 124 processes [07:50:10] PROBLEM Current Load is now: WARNING on deployment-apache31 i-000002d4 output: WARNING - load average: 1.39, 3.98, 7.91 [07:50:10] RECOVERY Free ram is now: OK on deployment-apache31 i-000002d4 output: OK: 90% free memory [07:50:10] RECOVERY Disk Space is now: OK on deployment-apache31 i-000002d4 output: DISK OK [07:50:10] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:50:15] PROBLEM Total Processes is now: CRITICAL on ganglia-test2 i-00000250 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:51:37] PROBLEM Current Load is now: WARNING on gluster-4 i-000002e4 output: WARNING - load average: 3.92, 6.13, 6.72 [07:51:37] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 84 processes [07:51:42] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 94% free memory [07:51:42] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK [07:51:47] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 88% free memory [07:51:47] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [07:51:48] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 3.63, 3.73, 4.28 [07:51:48] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK [07:51:48] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK [07:51:48] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 91 processes [07:51:53] RECOVERY Current Users is now: OK on gluster-4 i-000002e4 output: USERS OK - 0 users currently logged in [07:51:53] RECOVERY Disk Space is now: OK on gluster-4 i-000002e4 output: DISK OK [07:51:53] RECOVERY dpkg-check is now: OK on gluster-4 i-000002e4 output: All packages OK [07:52:03] PROBLEM Total Processes is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:52:08] PROBLEM Current Users is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:52:08] PROBLEM Disk Space is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [07:52:08] PROBLEM dpkg-check is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. 
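The service notifications in this stretch all share one shape, so the storm can be summarised mechanically. The following is a minimal sketch, assuming the layout `[HH:MM:SS] PROBLEM|RECOVERY <service> is now: <STATE> on <host> <instance> output: <text>` as inferred from the log itself; `ALERT_RE`, `parse_alerts` and `timeouts_by_host` are illustrative names, not part of Nagios or of the bot that posts these lines. Host notifications (`PROBLEM host: ... is DOWN`) and chat lines are skipped on purpose.

```python
import re
from collections import Counter

# Assumed layout of a service notification, inferred from the log lines:
# [HH:MM:SS] PROBLEM|RECOVERY <service> is now: <STATE> on <host> <instance> output: <text>
ALERT_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\] "
    r"(?P<kind>PROBLEM|RECOVERY) "
    # No brackets allowed in the service name, so a match cannot
    # run across the timestamp of the next entry.
    r"(?P<service>[^\[\]]+?) is now: (?P<state>[A-Z]+) "
    r"on (?P<host>\S+) (?P<instance>\S+) "
    r"output: (?P<output>.*?)"
    # The output ends where the next timestamp (or the text) ends.
    r"(?=\s*\[\d{2}:\d{2}:\d{2}\]|\s*$)"
)


def parse_alerts(text):
    """Return one dict per service notification found in text."""
    return [m.groupdict() for m in ALERT_RE.finditer(text)]


def timeouts_by_host(text):
    """Count NRPE socket-timeout notifications per host."""
    return Counter(
        alert["host"]
        for alert in parse_alerts(text)
        if "Socket timeout" in alert["output"]
    )
```

Feeding the whole log through `timeouts_by_host` and calling `.most_common()` on the result would list the instances that stopped answering NRPE most often, which is one way to see which hosts the 07:00 load spike hit.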
[07:52:37] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK [07:52:37] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in [07:52:37] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 94% free memory [07:52:37] RECOVERY dpkg-check is now: OK on mobile-testing i-00000271 output: All packages OK [07:52:37] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 0.39, 2.30, 4.66 [07:54:25] RECOVERY Current Load is now: OK on rds i-00000207 output: OK - load average: 1.03, 3.43, 4.54 [07:54:25] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [07:54:44] RECOVERY Current Load is now: OK on hugglewiki i-000000aa output: OK - load average: 0.27, 1.72, 3.88 [07:55:53] RECOVERY Total Processes is now: OK on configtest-main i-000002dd output: PROCS OK: 96 processes [07:55:58] RECOVERY Current Users is now: OK on configtest-main i-000002dd output: USERS OK - 0 users currently logged in [07:55:58] RECOVERY Disk Space is now: OK on configtest-main i-000002dd output: DISK OK [07:55:58] RECOVERY dpkg-check is now: OK on configtest-main i-000002dd output: All packages OK [07:56:03] RECOVERY Current Load is now: OK on ganglia-test2 i-00000250 output: OK - load average: 0.41, 2.54, 4.52 [07:56:03] PROBLEM Current Load is now: WARNING on redis1 i-000002b6 output: WARNING - load average: 0.13, 2.88, 5.07 [07:59:52] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.12, 1.00, 3.65 [07:59:52] RECOVERY Current Load is now: OK on deployment-apache31 i-000002d4 output: OK - load average: 0.47, 0.94, 4.31 [08:00:42] PROBLEM Puppet freshness is now: CRITICAL on gerrit i-000000ff output: Puppet has not run in last 20 hours [08:01:22] RECOVERY Current Load is now: OK on configtest-main i-000002dd output: OK - load average: 0.08, 1.21, 3.79 [08:09:46] RECOVERY Current Load 
is now: OK on bots-cb i-0000009e output: OK - load average: 0.44, 0.66, 4.08 [08:16:27] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [08:17:42] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 2.14, 1.33, 3.36 [08:18:14] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:23:08] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.76, 1.26, 2.77 [08:48:03] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [08:49:33] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [08:58:26] !log deployment-prep restarted jobrunner service (had a wrong path pointing to common-backup) [08:58:28] Logged the message, Master [09:02:41] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.12, 6.00, 6.03 [09:11:21] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [09:15:21] PROBLEM Free ram is now: UNKNOWN on psm-precise i-000002f2 output: NRPE: Unable to read output [09:15:21] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 161 processes [09:16:21] PROBLEM Free ram is now: UNKNOWN on gluster-4 i-000002e4 output: NRPE: Unable to read output [09:18:21] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [09:20:51] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [09:26:05] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [09:30:26] PROBLEM Total Processes is now: WARNING on ganglia-test2 i-00000250 output: PROCS WARNING: 192 processes [09:48:23] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [09:51:43] 
PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:18:27] PROBLEM host: incubator-apache is DOWN address: i-00000211 CRITICAL - Host Unreachable (i-00000211) [10:21:27] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [10:22:37] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [10:26:03] New review: Hashar; "This patch is no more needed and can be abandoned. We are now deploying the misc script on 'beta' ba..." [operations/puppet] (test); V: 0 C: -2; - https://gerrit.wikimedia.org/r/6118 [10:26:17] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [10:29:46] Change abandoned: Dzahn; "thanks for clarifying this is not needed anymore. abandoned." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/6118 [10:32:30] !log incubator Rebooting incubator-apache, seems to be out of memory (See console log: http://pastebin.com/TqJ1ud2b) [10:32:31] Logged the message, Master [10:32:52] RECOVERY host: incubator-apache is UP address: i-00000211 PING OK - Packet loss = 0%, RTA = 1.05 ms [10:34:12] RECOVERY HTTP is now: OK on incubator-apache i-00000211 output: HTTP OK: HTTP/1.1 200 OK - 473 bytes in 0.010 second response time [10:34:12] RECOVERY SSH is now: OK on incubator-apache i-00000211 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [10:34:12] RECOVERY Free ram is now: OK on incubator-apache i-00000211 output: OK: 89% free memory [10:34:12] RECOVERY Disk Space is now: OK on incubator-apache i-00000211 output: DISK OK [10:34:12] RECOVERY Current Load is now: OK on incubator-apache i-00000211 output: OK - load average: 1.52, 1.37, 0.67 [10:34:13] RECOVERY dpkg-check is now: OK on incubator-apache i-00000211 output: All packages OK [10:36:33] RECOVERY Current Users is now: OK on incubator-apache i-00000211 output: USERS OK - 1 users 
currently logged in
[10:36:33] RECOVERY Total Processes is now: OK on incubator-apache i-00000211 output: PROCS OK: 130 processes
[10:53:32] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[11:02:38] 06/29/2012 - 11:02:38 - Creating a home directory for smeyer at /export/keys/smeyer
[11:03:42] 06/29/2012 - 11:03:42 - Updating keys for smeyer at /export/keys/smeyer
[11:22:34] 06/29/2012 - 11:22:34 - Created a home directory for smeyer in project(s): wikidata-dev
[11:23:31] 06/29/2012 - 11:23:31 - User smeyer may have been modified in LDAP or locally, updating key in project(s): wikidata-dev
[11:24:22] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[11:30:34] 06/29/2012 - 11:30:33 - Created a home directory for smeyer in project(s): bastion
[11:31:33] 06/29/2012 - 11:31:32 - User smeyer may have been modified in LDAP or locally, updating key in project(s): bastion
[11:34:49] hello there. we have a problem at wikidata.
[11:35:04] we need to give access to Silke_WMDE_
[11:35:20] * Silke_WMDE_ waves
[11:35:59] My labsconsole and gerrit accounts work alright, but I cannot connect to bastion via ssh
[11:38:57] Silke_WMDE_: Jens_WMDE: smeyere?
[11:39:01] err, smeyer*
[11:39:24] yes
[11:39:26] jeremyb: yep
[11:39:51] so, separate issue: she needs to be granted some global roles
[11:39:53] https://labsconsole.wikimedia.org/wiki/Special:NovaRole
[11:40:25] but the first blocker is she needs to be added as a member of the bastion group
[11:40:32] s/group/project/
[11:40:52] https://labsconsole.wikimedia.org/wiki/Special:NovaProject
[11:41:15] maybe mutante is still around?
[11:41:26] you can't do that?
[11:41:34] i can't do much
[11:41:40] :)
[11:41:45] Silke_WMDE_: which method did you use to set up ssh?
[11:42:17] although... 29 11:30:33 < labs-home-wm> 06/29/2012 - 11:30:33 - Created a home directory for smeyer in project(s): bastion
[11:42:19] ssh config file
[11:42:20] how odd
[11:42:31] Silke_WMDE_: ProxyCommand?
[11:42:57] ohhh, wait i was searching for shell name instead of wiki username
[11:43:04] she's Silke Meyer
[11:43:19] and she's in bastion already
[11:43:32] jeremyb: I added her to bastion
[11:43:48] Abraham_WMDE: i see now
[11:44:00] so, i can probably fix this without ops: it's a problem w/ Silke_WMDE_'s local config
[11:44:34] pastebin your conf? run `ssh -vvv bastion1.pmtpa.wmflabs echo foo` and pastebin that?
[11:44:38] Silke_WMDE_
[11:44:47] ok moment
[11:45:26] what OS?
[11:46:40] oh. http://paste.debian.net/176909/
[11:47:12] and conf?
[11:48:25] with just "bastion" instead of TLD http://paste.debian.net/176910/
[11:48:59] conf, conf, i want conf!
[11:49:13] conf: http://paste.debian.net/176911/
[11:49:55] huh?!
[11:49:59] where did that come from?
[11:50:30] the conf? it's basically mine
[11:50:43] well yours shouldn't work either
[11:50:56] ha! turns out it does!
[11:50:56] eh? no need to add to any global roles
[11:51:04] that's handled automatically
[11:51:22] is Silke_WMDE_ in the bastion project?
[11:51:26] Ryan_Lane1: then why do i have a global role? anyway, that's not the current issue
[11:51:29] jeremyb: I can assure you that my conf works fine for me.
[11:51:30] Ryan_Lane: yes
[11:51:34] Ryan_Lane: I added her to bastion
[11:51:44] it's negative nscd cache, then
[11:51:53] gimme a sec
[11:52:08] Silke_WMDE_: can you just copy the config from https://labsconsole.wikimedia.org/wiki/Help:Access#Using_ProxyCommand_ssh_option ?
[11:52:12] now try
[11:52:36] \o/
[11:52:39] works!
[11:52:40] Ryan_Lane: :*
[11:52:45] thx!
[11:52:48] yw
[11:52:50] oooh
[11:53:28] so it was authorization not authentication failing i guess
[11:53:37] yes
[11:54:11] Silke_WMDE_: still, Jens_WMDE's config sucks. what happens if leslie does another renumbering?
[11:54:22] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0)
[11:54:39] (also there's a bunch of redundancy in there that doesn't belong)
[11:54:41] i know i know
[11:54:59] another renumbering?
[11:55:09] there's an IP address
[11:55:11] Ryan_Lane: IP allocation
[11:55:24] Ryan_Lane: he's not using DNS, has IP hardcoded in ssh config
[11:55:28] gonna be a serious pain in the ass if our IP space changes
[11:55:34] hah
[11:55:49] orrrrrrrrr....
[11:55:58] okay okay, i'll change it.
[11:56:16] well, I mean for me, if leslie changes it
[11:56:28] I'd fight pretty hard not to have it changed
[11:56:45] that said, we may need to take your IP away to reuse it for something at some point
[11:56:47] like we did for bastion
[11:56:55] Ryan_Lane: shouldn't you be airborne?
[11:57:02] tomorrow, it seems
[11:57:14] ahhh, that's what off by one meant
[11:57:39] http://pastebin.com/5HNVxa4k happy nao? :)
[11:58:04] Jens_WMDE: no
[11:58:19] Jens_WMDE: can you just use the one from the wiki?
[11:58:22] :'(
[11:58:27] okay okay
[11:58:32] PROBLEM Current Users is now: CRITICAL on bastion-restricted1 i-0000019b output: USERS CRITICAL - 13 users currently logged in
[11:58:41] 13 - unlucky for some
[12:00:31] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: CHECK_NRPE: Socket timeout after 10 seconds.
[12:01:29] Damianz: what's 13?
[12:01:39] PROBLEM Current Users is now: CRITICAL on bastion-restricted1 i-0000019b output: USERS CRITICAL - 13 users currently logged in
[12:01:52] oh...
[12:02:04] * jeremyb runs away
[12:05:22] PROBLEM Total Processes is now: WARNING on psm-precise i-000002f2 output: PROCS WARNING: 161 processes
[12:07:18] jeremyb: thanks, updated the conf
[12:07:28] me too
[12:07:40] off to get f00d now.
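The complaint in the exchange above is that an ssh config with a hardcoded IP address breaks as soon as the address space is renumbered, whereas a DNS name keeps working. A small check along those lines could look like the sketch below; `hardcoded_ips` is a hypothetical helper written for illustration, and it only does naive token splitting rather than real ssh_config parsing.

```python
import ipaddress


def hardcoded_ips(ssh_config_text):
    """Return (line_number, token) pairs for IP literals in an ssh config.

    Hypothetical helper: it splits each non-comment line on whitespace
    and '=' and reports every token that parses as an IPv4/IPv6 address.
    """
    hits = []
    for lineno, line in enumerate(ssh_config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        for token in stripped.replace("=", " ").split():
            # Drop a leading "user@" and a trailing ":port" if present.
            candidate = token.rsplit("@", 1)[-1]
            if candidate.count(":") == 1:
                candidate = candidate.split(":")[0]
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not an IP literal (hostname, option name, ...)
            hits.append((lineno, token))
    return hits
```

Run against a config that uses only host names, the function returns an empty list; any hit marks a line worth replacing with a DNS name, such as the ones in the wiki's ProxyCommand example.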
[12:24:33] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:30:20] hashar: I've disabled public join access to the labs swarm [12:30:36] testswarm-browserstack now joins to /run/browserstack/?run_token=*** [12:30:48] so you need a token to join the swarm isn't it? [12:31:02] would most probably save us from false positivies [12:31:16] !log testswarm config: client.requireRunToken = true; [12:31:18] Logged the message, Master [12:31:23] !log integration testswarm/config: client.requireRunToken = true; [12:31:24] Logged the message, Master [12:31:32] hashar: yep [12:32:20] that feature is mostly for swarms like ours and jquery's that can rely on automated joining, and thus worth the trade off to block possible malice users, fake user agents, broken DDL windows, toolbars that inject javascript and what not [12:32:24] I've seen it all :P [12:32:36] and linux browsers that pretend to be Safari or Chrome [12:32:43] but have additional bugs [12:54:41] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [12:55:31] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 18% free memory [13:00:28] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 46% free memory [13:24:10] Ryan_Lane: for your information, we have the instance project now exported as an env variable. And /etc/wmflabs-instancename :-D [13:25:02] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [13:25:03] cool [13:26:20] <^demon> Ryan_Lane: So, I've decided having code reviews with no comments is *really* annoying on IRC. Who cares if someone did a +1 but didn't say anything? Or a +2, they're probably about to merge. 
[13:26:24] <^demon> So I killed it: https://gerrit.wikimedia.org/r/#/c/13146/ [13:26:51] sweet [13:29:55] ^demon: deployed [13:30:03] <^demon> Yay, less IRC spam :) [13:37:40] <^demon> Ryan_Lane: I'm trying to schedule the gerrit upgrade for next week. How's wednesday afternoon work for you? Looks to be pretty open--deployments seem to wrap up around 1pm. [13:37:52] should be good for me [13:38:41] <^demon> Ok, I'll put us down for 2-3pm PDT on Wed. [13:38:53] great [13:39:18] <^demon> Oh wait, that's July 4. [13:39:19] <^demon> duh [13:39:25] <^demon> No wonder we're light on deployments [13:40:20] hahaha [13:40:21] yeah [13:40:22] bad day [13:41:26] <^demon> Monday is open deployment wise, but people are usually busy catching up from the weekend still. [13:41:44] <^demon> Granted, "people" here is just you and me. [13:41:45] monday is a bad day [13:42:01] lol [13:42:07] The 4th is an awesome day :D [13:42:11] I'll probably still be jetlagged [13:43:26] <^demon> Is 4pm sf time on thursday ok by you? 
[13:43:40] sure [13:43:49] <^demon> Ok, we'll do that [13:55:06] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:02:56] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.40, 3.83, 4.85 [14:25:06] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [14:33:56] PROBLEM Disk Space is now: WARNING on deployment-nfs-memc i-000000d7 output: DISK WARNING - free space: /mnt 1147 MB (5% inode=51%): [14:39:11] grg [14:44:06] PROBLEM Puppet freshness is now: CRITICAL on wikistats-01 i-00000042 output: Puppet has not run in last 20 hours [14:54:00] RECOVERY Disk Space is now: OK on deployment-nfs-memc i-000000d7 output: DISK OK [14:55:10] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:02:00] PROBLEM Disk Space is now: WARNING on deployment-nfs-memc i-000000d7 output: DISK WARNING - free space: /mnt 1147 MB (5% inode=51%): [15:02:44] !log deployment-prep deleted .nfs** files in /mnt/export/upload6/ [15:02:45] Logged the message, Master [15:03:20] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [15:03:20] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [15:07:37] !log deployment-prep Removing thumbnails that not have been access for the last 15 days : sudo find . -atime +15 -wholename '*/thumb/*' -exec rm {} \; [15:07:39] Logged the message, Master [15:08:10] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 19% free memory [15:08:10] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [15:08:37] Ryan_Lane: do we have a way to manually resize /dev/vdb on an instance? 
Would like to avoid having to rebuild the 'beta' instance that is holding files :-) [15:08:49] nope [15:09:03] use project storage? [15:09:07] or is this for a database? [15:15:01] Ryan_Lane: that is for files fetched from commons and generated thumbnails [15:15:12] currently on /dev/vdb on a labs instance named deployment-nfs-memc [15:15:24] use project storage [15:15:47] I thought we did not want to use /data/project for some reason [15:15:51] will definitely migrate there :) [15:16:10] it doesn't work for databases [15:16:28] would we be available to change owner rights there and play with files just like if they were locals? [15:16:42] yes [15:16:47] it's shared storage like any other kind [15:18:41] that will remove one instance. yeah! [15:20:30] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [15:25:10] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [15:55:17] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:14:16] <^demon> Ryan_Lane: I'm trying to reconfigure the gerrit labs install to use puppet, but I don't see any of the gerrit classes when trying to configure my instance. [16:14:34] they probably don't exist [16:14:41] you'll need to add it for your project [16:14:55] via Manage puppet groups [16:15:10] <^demon> Ah ok. [16:25:57] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:27:42] 06/29/2012 - 16:27:41 - Creating a home directory for jforrester at /export/keys/jforrester [16:28:40] 06/29/2012 - 16:28:40 - Updating keys for jforrester at /export/keys/jforrester [16:33:04] RECOVERY Puppet freshness is now: OK on gerrit i-000000ff output: puppet ran at Fri Jun 29 16:32:58 UTC 2012 [16:54:12] andrewbogott: so… how do I use devstack again? [16:54:20] I'm pretty confused [16:54:44] What step are you confused at? 
[16:54:50] well, it's running [16:54:53] I've made my changes [16:55:02] but I have no clue where it sticks the credentials [16:55:17] what changes? Code changes? [16:55:22] yeah [16:55:27] I added bgp support to nova [16:55:32] for floating IPs [16:56:07] it tells me it made a user and password [16:56:10] but it doesn't seem to work [16:56:25] and you'd think it would stick it into a file or something [16:56:42] Ok, so, backing up... [16:56:51] a) You started out with a clean machine [16:56:54] yep [16:56:57] b) You checked out devstack [16:57:00] c) you ran stack.sh [16:57:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [16:57:10] d) you edited some of the nova code that stach.sh checked out in /opt/stack/nova [16:57:12] e) ??? [16:57:21] killall -u stack [16:57:26] stack.sh [16:57:31] that worked fine [16:57:40] my code is live [16:57:42] Ok, sounds good so far. [16:57:45] but I can't run the nova command [16:57:47] Ah! [16:57:57] OK, you have scripts in your devstack dir that will set up credentials &c. [16:58:03] oh? [16:58:22] $ source openrc [16:58:26] $ source localrc [16:58:42] (Also, possibly $source stackrc, I can never remember if open calls stack or if stack calls open) [16:58:47] Then you should be ready to roll. [16:58:56] hm [16:58:56] no [16:59:02] You must provide a username, eithervia --username or via env[NOVA_USERNAME] [16:59:20] Lemme get to where you are. [16:59:23] What command are you running? [16:59:28] nova list [16:59:54] ok, give me a minute. [17:00:32] I sorta thought that 'username' had been renamed to 'os_username', for one thing. [17:00:38] heh [17:00:40] The default usernames in devstack are 'test' and 'admin' [17:00:52] But, in a minute or so I'll be able to try this myself. [17:00:53] says admin and demo to me [17:00:56] launching everything... [17:01:29] I don't see NOVA_USERNAME defined in any of the RC files [17:03:19] ah. 
got that working [17:03:31] I just sourced openrc and localrc [17:03:39] and now nova list is working for me [17:03:54] hm [17:04:01] I wonder why that stuff is missing from mine [17:04:04] I know that when I do something that requires admin credentials I do nova --os_username admin blahblah [17:04:32] I don't have NOVA_USERNAME defined but I do have OS_USERNAME [17:05:09] ah. I see [17:05:15] strange that the nova command is requiring it [17:05:23] I wonder if that's a regression [17:05:31] because OS_USERNAME is indeed set [17:05:42] same with OS_PASSWORD [17:05:42] If you want to send me your patch, I can doublecheck. [17:05:56] oh, it's not a regression in my patch [17:06:04] mine only affects the nova-network service [17:06:11] it's totally backend [17:06:12] I may have a moderately mismatched suite of tools on the VM I'm using now. [17:06:15] e.g. an older keystone. [17:06:19] heh [17:06:26] I can update. [17:06:46] I think I'm good to go now [17:07:05] What did you change? [17:07:08] oh [17:07:11] motherfucker [17:07:25] fucking hp cloud [17:07:31] heh [17:07:49] I used the same vm to run hp cloud's custom version of the nova client [17:08:00] no wonder [17:08:29] why they need a custom version, who knows [17:08:38] You're probably skirting some arg rename... that'd do it. [17:10:13] thankfully there's a devstack excercise that'll test my changes :) [17:11:06] hm [17:11:20] even after getting rid of that package [17:11:26] now I get a different error [17:11:30] Could not find any suitable endpoint. Correct region? [17:11:30] ERROR: [17:11:50] uuuugggghhhh [17:12:33] this is why I like fresh vms for everything [17:12:58] I advice starting over right now! It'll only cost you an extra 10 minutes. [17:13:04] s/advice/advise/ [17:13:11] well, this is a local vm [17:13:39] I'm pretty sure I figured out the issue :) [17:14:46] yep [17:14:48] working now [17:16:33] Cool. I'm bound for lunch, back online in a few. 
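Editor's note on the credentials confusion above: the client in use asked for NOVA_USERNAME while the devstack rc files had exported OS_USERNAME and OS_PASSWORD. A quick way to see which of those names are present in the current shell is sketched below. The function name `check_creds` is hypothetical, the variable names are the ones mentioned in the log, and the helper only inspects the environment; it does not call nova or keystone and never prints the values.

```shell
#!/bin/sh
# Hypothetical helper: report which credential variables are set and non-empty.
# Variable names are taken from the log; values are never printed.
check_creds() {
    for name in OS_USERNAME OS_PASSWORD NOVA_USERNAME; do
        # Indirect variable lookup that works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name is set"
        else
            echo "$name is unset"
        fi
    done
}
```

After sourcing `openrc` in the devstack directory, running `check_creds` shows at a glance whether the shell has the OS_* names, the NOVA_* names, or both, which helps tell a stale environment apart from a mismatched client.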
[17:16:54] ok [17:27:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [17:40:32] 06/29/2012 - 17:40:32 - Created a home directory for erosen in project(s): reportcard [17:41:31] 06/29/2012 - 17:41:30 - User erosen may have been modified in LDAP or locally, updating key in project(s): reportcard [17:48:24] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [17:48:34] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [17:53:14] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 17% free memory [17:53:14] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [17:57:04] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:01:54] Ryan_Lane: can you add mlitn to the svn user group on ee-prototype? [18:02:19] eh? [18:02:36] is that on tesla? [18:03:16] not sure, I always get to it by sshing to ee-prototype from bastion [18:03:25] this is in labs? [18:03:31] yeah, sorry [18:03:47] http://ee-prototype.wmflabs.org [18:03:53] ok [18:04:07] so, everything in labs is centralized [18:04:36] it confuses me when you want me to do something within a specific project or instance [18:04:46] ah, sorry [18:04:55] no worries [18:05:19] that's an LDAP group [18:05:21] I just added him [18:05:26] yay, thanks! [18:05:31] you'll probably need to clear the nscd cache on the instance [18:05:34] nscd -i group [18:05:37] nscd -i passwd [18:05:43] ok [18:06:04] if there's any way I can handle that in the future, just let me know [18:06:14] adding people to groups I mean [18:09:31] nope. 
only ops can, generally [18:09:51] you can add people to projects you are in [18:09:57] or roles you are in [18:10:05] but groups like wmf, svn, and ops are special [18:11:18] andrewbogott: all this bash code makes me want to cry [18:11:18] heh [18:11:48] There's a project to rewrite everything in python. Not sure about how far along it is. [18:11:54] yeah [18:11:58] By 'everything' I mean 'devstack' [18:12:05] I heard about that like 8 months ago [18:12:06] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.31, 6.12, 5.33 [18:27:06] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [18:33:36] PROBLEM Total Processes is now: CRITICAL on bastion-restricted1 i-0000019b output: PROCS CRITICAL: 892 processes [18:48:23] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [18:57:43] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:03:45] PROBLEM Current Load is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:04:25] PROBLEM Current Users is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:05:05] PROBLEM Disk Space is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:05:45] PROBLEM Free ram is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:06:55] PROBLEM Total Processes is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:07:35] PROBLEM dpkg-check is now: CRITICAL on signwriting-ase i-000002f5 output: Connection refused by host [19:10:45] PROBLEM Free ram is now: UNKNOWN on signwriting-ase i-000002f5 output: NRPE: Unable to read output [19:11:55] RECOVERY Total Processes is now: OK on signwriting-ase i-000002f5 output: PROCS OK: 77 processes [19:12:35] RECOVERY dpkg-check is now: OK on signwriting-ase 
i-000002f5 output: All packages OK [19:13:45] RECOVERY Current Load is now: OK on signwriting-ase i-000002f5 output: OK - load average: 0.12, 1.04, 0.86 [19:14:25] RECOVERY Current Users is now: OK on signwriting-ase i-000002f5 output: USERS OK - 0 users currently logged in [19:15:05] RECOVERY Disk Space is now: OK on signwriting-ase i-000002f5 output: DISK OK [19:27:43] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [19:40:42] andrewbogott: if I need to run a new command as root, where do I need to add that? [19:41:18] I know the root-wrap stuff changed [19:41:30] Ryan_Lane1: Thierry just changed that code, and I haven't read the latest patch yet. It's here... https://review.openstack.org/#/c/8747/ [19:41:35] ah [19:41:35] heh [19:41:59] Do you want to add it as a one-off, or actually include it in your patch? [19:42:05] include it [19:42:15] I need to run exabgp with some environment before it [19:42:47] I'm about to need to know how to do this anyway, so lemme read the new code and I'll get back to you. [19:43:12] ok [19:43:14] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [19:43:20] I found where I need to add it :) [19:43:28] etc/nova/rootwrap.d [19:48:08] Yep, looks straightforward. And much less annoying than it was before. [19:48:17] yeah. 
seems easy [19:57:44] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [20:15:34] 06/29/2012 - 20:15:33 - Created a home directory for slevinski in project(s): signwriting [20:15:42] 06/29/2012 - 20:15:41 - Creating a home directory for slevinski at /export/keys/slevinski [20:16:31] 06/29/2012 - 20:16:31 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:16:39] 06/29/2012 - 20:16:39 - Updating keys for slevinski at /export/keys/slevinski [20:19:33] 06/29/2012 - 20:19:33 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:19:46] 06/29/2012 - 20:19:46 - Updating keys for slevinski at /export/keys/slevinski [20:20:33] 06/29/2012 - 20:20:33 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:20:40] 06/29/2012 - 20:20:39 - Updating keys for slevinski at /export/keys/slevinski [20:22:34] 06/29/2012 - 20:22:33 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:22:40] 06/29/2012 - 20:22:40 - Updating keys for slevinski at /export/keys/slevinski [20:23:33] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:23:33] 06/29/2012 - 20:23:33 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:23:40] 06/29/2012 - 20:23:39 - Updating keys for slevinski at /export/keys/slevinski [20:28:23] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [20:28:23] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [20:31:13] PROBLEM Free ram is now: CRITICAL on etherpad-lite i-000002de output: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:36:03] PROBLEM Free ram is now: UNKNOWN on etherpad-lite i-000002de output: NRPE: Unable to read output [20:43:32] 06/29/2012 - 20:43:32 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:43:39] 06/29/2012 - 20:43:38 - Updating keys for slevinski at /export/keys/slevinski [20:47:19] PROBLEM Free ram is now: CRITICAL on configtest-main i-000002dd output: CHECK_NRPE: Socket timeout after 10 seconds. [20:51:32] 06/29/2012 - 20:51:32 - User slevinski may have been modified in LDAP or locally, updating key in project(s): signwriting [20:51:40] 06/29/2012 - 20:51:40 - Updating keys for slevinski at /export/keys/slevinski [20:52:03] PROBLEM Free ram is now: UNKNOWN on configtest-main i-000002dd output: NRPE: Unable to read output [20:58:23] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:18:54] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [21:23:24] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: Warning: 18% free memory [21:28:46] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [21:58:44] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:59:34] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:03:34] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [22:13:14] PROBLEM Puppet freshness is now: CRITICAL on psm-precise i-000002f2 output: Puppet has not run in last 20 hours [22:29:34] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [22:53:37] PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3594 MB (20% inode=75%): [22:59:37] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:03:22] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [23:07:59] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.80, 6.61, 6.47 [23:17:19] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Socket timeout after 10 seconds. [23:22:09] PROBLEM Free ram is now: UNKNOWN on integration-apache1 i-000002eb output: NRPE: Unable to read output [23:29:39] PROBLEM host: nginx-dev2 is DOWN address: i-000002f0 CRITICAL - Host Unreachable (i-000002f0) [23:38:48] PROBLEM Free ram is now: CRITICAL on wikistats-history-01 i-000002e2 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:39:41] PROBLEM Current Load is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM Current Users is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM Disk Space is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:41] PROBLEM dpkg-check is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[23:40:41] PROBLEM Total Processes is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:40:48] PROBLEM Free ram is now: CRITICAL on incubator-bot0 i-00000296 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:31] PROBLEM Current Users is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:31] PROBLEM Disk Space is now: CRITICAL on dumps-2 i-000002d8 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:41] PROBLEM Current Load is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:41:41] PROBLEM Current Users is now: CRITICAL on pediapress-ocg2 i-00000234 output: CHECK_NRPE: Socket timeout after 10 seconds. [23:43:31] PROBLEM Free ram is now: UNKNOWN on wikistats-history-01 i-000002e2 output: NRPE: Unable to read output [23:44:31] RECOVERY Current Load is now: OK on incubator-bot0 i-00000296 output: OK - load average: 0.55, 2.33, 1.57 [23:45:31] RECOVERY Current Users is now: OK on incubator-bot0 i-00000296 output: USERS OK - 0 users currently logged in [23:45:31] RECOVERY Disk Space is now: OK on incubator-bot0 i-00000296 output: DISK OK [23:45:31] RECOVERY dpkg-check is now: OK on incubator-bot0 i-00000296 output: All packages OK [23:45:31] RECOVERY Total Processes is now: OK on incubator-bot0 i-00000296 output: PROCS OK: 88 processes [23:45:36] RECOVERY Free ram is now: OK on incubator-bot0 i-00000296 output: OK: 87% free memory [23:46:21] RECOVERY Current Users is now: OK on dumps-2 i-000002d8 output: USERS OK - 0 users currently logged in [23:46:21] RECOVERY Disk Space is now: OK on dumps-2 i-000002d8 output: DISK OK [23:46:31] RECOVERY Current Load is now: OK on pediapress-ocg2 i-00000234 output: OK - load average: 0.37, 1.87, 1.34 [23:46:31] RECOVERY Current Users is now: OK on pediapress-ocg2 i-00000234 output: USERS OK - 0 users currently logged in