[01:56:06] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [02:16:19] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.15, 6.39, 5.53 [02:46:03] RECOVERY Total Processes is now: OK on mwreview-test4 i-000002b2 output: PROCS OK: 80 processes [02:46:08] RECOVERY dpkg-check is now: OK on mwreview-test4 i-000002b2 output: All packages OK [02:46:31] RECOVERY Current Load is now: OK on mwreview-test4 i-000002b2 output: OK - load average: 0.31, 0.16, 0.09 [02:48:34] RECOVERY Current Users is now: OK on mwreview-test4 i-000002b2 output: USERS OK - 0 users currently logged in [02:48:34] RECOVERY Disk Space is now: OK on mwreview-test4 i-000002b2 output: DISK OK [02:48:34] RECOVERY Free ram is now: OK on mwreview-test4 i-000002b2 output: OK: 89% free memory [03:00:20] 05/31/2012 - 03:00:20 - Updating keys for laner at /export/home/deployment-prep/laner [03:07:23] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:09:47] PROBLEM dpkg-check is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:09:57] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:09:57] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:10:02] PROBLEM Current Users is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:10:03] PROBLEM Disk Space is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:10:03] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:10:21] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 14.43, 13.40, 8.10 [03:11:02] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK [03:11:37] PROBLEM Free ram is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:11:57] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:30] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:30] PROBLEM Current Users is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:30] PROBLEM Current Load is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:12:30] PROBLEM Total Processes is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [03:14:13] RECOVERY dpkg-check is now: OK on incubator-bot1 i-00000251 output: All packages OK [03:14:14] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.95, 3.07, 2.97 [03:14:14] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 118 processes [03:14:19] RECOVERY Current Users is now: OK on incubator-bot1 i-00000251 output: USERS OK - 0 users currently logged in [03:14:19] RECOVERY Disk Space is now: OK on incubator-bot1 i-00000251 output: DISK OK [03:14:19] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 54% free memory [03:16:18] RECOVERY Free ram is now: OK on upload-wizard i-0000021c output: OK: 87% free memory [03:16:48] RECOVERY Current Load is now: OK on mwreview i-000002ae output: OK - load average: 0.29, 2.59, 2.12 [03:17:18] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [03:17:18] RECOVERY Current Users is now: OK on upload-wizard i-0000021c output: USERS OK - 0 
users currently logged in [03:17:18] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.23, 2.39, 1.99 [03:17:18] RECOVERY Total Processes is now: OK on upload-wizard i-0000021c output: PROCS OK: 98 processes [03:30:16] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.66, 1.62, 3.29 [03:35:21] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.63, 1.22, 2.61 [03:51:33] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory [03:53:21] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:21] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:21] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [03:53:21] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [03:54:18] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [03:54:18] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:57:53] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory [03:57:53] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 13% free memory [03:59:07] RECOVERY Disk Space is now: OK on reportcard2 i-000001ea output: DISK OK [03:59:07] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [03:59:07] RECOVERY Total Processes is now: OK on reportcard2 i-000001ea output: PROCS OK: 84 processes [03:59:12] RECOVERY Free ram is now: OK on reportcard2 i-000001ea output: OK: 85% free memory [03:59:34] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.19, 1.66, 1.55 [03:59:34] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 0.27, 1.80, 1.50 [04:04:46] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 13% free memory [04:06:49] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:11:50] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory [04:17:51] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory [04:18:26] 05/31/2012 - 04:18:25 - Updating keys for hydriz at /export/home/bastion/hydriz [04:18:37] 05/31/2012 - 04:18:36 - Updating keys for hydriz at /export/home/incubator/hydriz [04:18:39] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:18:44] 05/31/2012 - 04:18:44 - Updating keys for hydriz at /export/home/dumps/hydriz [04:19:12] 05/31/2012 - 04:19:12 - Updating keys for hydriz at /export/home/bots/hydriz [04:23:20] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory [04:23:20] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 96% free memory [04:30:21] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory [04:35:15] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [05:45:16] PROBLEM Puppet freshness is now: CRITICAL on nova-precise1 i-00000236 output: Puppet has not run in last 20 hours [05:50:16] PROBLEM Puppet freshness is now: CRITICAL on nova-essex-test i-000001f9 output: Puppet has not run in last 20 hours [06:00:21] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours [06:46:11] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [06:46:11] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 6.79, 8.86, 6.85 [06:46:27] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:47:45] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:47:45] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM Disk Space is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:50:52] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:52] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:51:59] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:51:59] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:51:59] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:40] PROBLEM Total Processes is now: CRITICAL on mwreview i-000002ae output: Network is unreachable [06:52:46] PROBLEM Free ram is now: CRITICAL on mwreview i-000002ae output: Network is unreachable [06:52:51] PROBLEM Current Load is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:51] PROBLEM Current Users is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:51] PROBLEM Free ram is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:52:51] PROBLEM Total Processes is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:07] PROBLEM dpkg-check is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:07] PROBLEM Current Users is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:07] PROBLEM Disk Space is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:53:07] PROBLEM Free ram is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:07] PROBLEM Total Processes is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:12] PROBLEM Current Load is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:13] PROBLEM Free ram is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:13] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:13] PROBLEM Disk Space is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:13] PROBLEM Current Users is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:13] PROBLEM Current Users is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:14] PROBLEM Disk Space is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:40] PROBLEM Current Load is now: WARNING on bots-apache1 i-000000b0 output: WARNING - load average: 11.97, 10.07, 6.81 [06:53:40] PROBLEM dpkg-check is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:41] PROBLEM Current Load is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:41] PROBLEM Total Processes is now: CRITICAL on rds i-00000207 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:53:48] PROBLEM HTTP is now: CRITICAL on bots-apache1 i-000000b0 output: CRITICAL - Socket timeout after 10 seconds [06:53:48] PROBLEM Disk Space is now: CRITICAL on mobile-testing i-00000271 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:00:06] PROBLEM Current Load is now: WARNING on worker1 i-00000208 output: WARNING - load average: 4.56, 5.68, 5.70 [07:00:06] RECOVERY Disk Space is now: OK on migration1 i-00000261 output: DISK OK [07:00:06] PROBLEM Current Load is now: WARNING on migration1 i-00000261 output: WARNING - load average: 4.94, 6.98, 5.99 [07:00:06] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 89% free memory [07:00:06] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK [07:00:07] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 78% free memory [07:00:08] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 94 processes [07:00:11] RECOVERY Total Processes is now: OK on worker1 i-00000208 output: PROCS OK: 98 processes [07:01:49] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [07:03:22] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in [07:03:51] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 4.27, 5.14, 4.76 [07:03:51] RECOVERY dpkg-check is now: OK on reportcard2 i-000001ea output: All packages OK [07:04:06] PROBLEM Current Load is now: WARNING on mwreview i-000002ae output: WARNING - load average: 7.12, 7.78, 6.03 [07:04:06] RECOVERY Current Users is now: OK on mwreview i-000002ae output: USERS OK - 0 users currently logged in [07:04:06] RECOVERY Disk Space is now: OK on mwreview i-000002ae output: DISK OK [07:04:07] RECOVERY Free ram is now: OK on mwreview i-000002ae output: OK: 76% free memory [07:04:07] RECOVERY Total Processes is now: OK on mwreview i-000002ae output: PROCS OK: 119 processes [07:04:12] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK [07:04:12] RECOVERY HTTP is now: OK on bots-apache1 i-000000b0 output: HTTP OK: HTTP/1.1 200 OK - 1666 bytes in 6.500 second response time [07:05:44] PROBLEM HTTP 
is now: CRITICAL on mailman-01 i-00000235 output: CRITICAL - Socket timeout after 10 seconds [07:06:36] PROBLEM Current Load is now: CRITICAL on incubator-bot1 i-00000251 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:02] PROBLEM Current Load is now: WARNING on aggregator-test3 i-00000293 output: WARNING - load average: 0.76, 4.44, 5.41 [07:08:08] PROBLEM Current Load is now: WARNING on incubator-bot2 i-00000252 output: WARNING - load average: 2.67, 5.29, 5.69 [07:08:08] PROBLEM Free ram is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:08] PROBLEM Disk Space is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:08] PROBLEM Disk Space is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:08] PROBLEM Total Processes is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:13] PROBLEM dpkg-check is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:13] PROBLEM dpkg-check is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:08:13] PROBLEM SSH is now: CRITICAL on ipv6test1 i-00000282 output: CRITICAL - Socket timeout after 10 seconds [07:09:26] PROBLEM dpkg-check is now: CRITICAL on ipv6test1 i-00000282 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:26] PROBLEM dpkg-check is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:26] PROBLEM Total Processes is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:36] PROBLEM Current Users is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:09:36] PROBLEM Current Load is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:36] PROBLEM Disk Space is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:36] PROBLEM Total Processes is now: CRITICAL on precise-test i-00000231 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:55] PROBLEM Current Users is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:55] PROBLEM Free ram is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:55] PROBLEM Current Load is now: CRITICAL on pybal-precise i-00000289 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:55] PROBLEM Current Load is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:56] PROBLEM Current Users is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:09:56] PROBLEM Free ram is now: CRITICAL on maps-tilemill1 i-00000294 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:12:49] RECOVERY Current Users is now: OK on ganglia-test5 i-000002a7 output: USERS OK - 0 users currently logged in [07:12:49] RECOVERY Disk Space is now: OK on ganglia-test5 i-000002a7 output: DISK OK [07:12:49] PROBLEM Current Load is now: WARNING on ganglia-test5 i-000002a7 output: WARNING - load average: 7.94, 9.17, 8.59 [07:12:49] RECOVERY dpkg-check is now: OK on ganglia-test5 i-000002a7 output: All packages OK [07:12:49] RECOVERY Free ram is now: OK on ganglia-test5 i-000002a7 output: OK: 67% free memory [07:12:49] RECOVERY Total Processes is now: OK on ganglia-test5 i-000002a7 output: PROCS OK: 206 processes [07:13:31] PROBLEM Current Load is now: CRITICAL on mwreview i-000002ae output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:14:29] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 1.13, 3.11, 4.54 [07:14:30] PROBLEM HTTP is now: WARNING on mailman-01 i-00000235 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 498 bytes in 0.012 second response time [07:14:30] RECOVERY Current Load is now: OK on aggregator-test3 i-00000293 output: OK - load average: 3.20, 3.38, 4.53 [07:14:30] RECOVERY Current Load is now: OK on incubator-bot2 i-00000252 output: OK - load average: 4.80, 3.78, 4.67 [07:14:30] RECOVERY Disk Space is now: OK on pybal-precise i-00000289 output: DISK OK [07:14:31] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK [07:14:31] RECOVERY Total Processes is now: OK on maps-tilemill1 i-00000294 output: PROCS OK: 111 processes [07:14:35] RECOVERY Free ram is now: OK on precise-test i-00000231 output: OK: 76% free memory [07:14:35] RECOVERY dpkg-check is now: OK on maps-tilemill1 i-00000294 output: All packages OK [07:14:40] RECOVERY dpkg-check is now: OK on precise-test i-00000231 output: All packages OK [07:14:40] RECOVERY Current Users is now: OK on precise-test i-00000231 output: USERS OK - 0 users currently logged in [07:14:45] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:15:19] PROBLEM Current Load is now: WARNING on incubator-bot1 i-00000251 output: WARNING - load average: 3.84, 5.36, 5.72 [07:16:05] PROBLEM Current Load is now: WARNING on maps-tilemill1 i-00000294 output: WARNING - load average: 1.80, 5.53, 6.48 [07:16:05] RECOVERY Current Users is now: OK on maps-tilemill1 i-00000294 output: USERS OK - 0 users currently logged in [07:16:05] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 86% free memory [07:16:15] RECOVERY Free ram is now: OK on rds i-00000207 output: OK: 92% free memory [07:16:15] RECOVERY Current Load is now: OK on rds i-00000207 output: OK - load average: 0.69, 3.01, 4.57 [07:16:15] RECOVERY Disk Space is now: OK on rds i-00000207 output: DISK OK [07:16:15] RECOVERY Total Processes is now: OK on rds i-00000207 output: PROCS OK: 84 processes [07:16:20] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [07:16:20] RECOVERY Current Users is now: OK on pybal-precise i-00000289 output: USERS OK - 0 users currently logged in [07:16:20] RECOVERY Total Processes is now: OK on pybal-precise i-00000289 output: PROCS OK: 111 processes [07:16:25] RECOVERY dpkg-check is now: OK on pybal-precise i-00000289 output: All packages OK [07:16:25] RECOVERY Free ram is now: OK on pybal-precise i-00000289 output: OK: 85% free memory [07:16:25] PROBLEM Current Load is now: WARNING on pybal-precise i-00000289 output: WARNING - load average: 6.09, 6.42, 6.20 [07:16:25] PROBLEM Current Load is now: WARNING on precise-test i-00000231 output: WARNING - load average: 3.72, 5.45, 6.07 [07:16:25] RECOVERY Disk Space is now: OK on precise-test i-00000231 output: DISK OK [07:16:25] RECOVERY Total Processes is now: OK on precise-test i-00000231 output: PROCS OK: 88 processes [07:16:41] PROBLEM Current Load is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:17:19] PROBLEM Current Load is now: WARNING on bots-2 i-0000009c output: WARNING - load average: 3.18, 5.26, 5.31 [07:17:19] PROBLEM Current Load is now: WARNING on upload-wizard i-0000021c output: WARNING - load average: 4.86, 6.38, 5.19 [07:18:20] PROBLEM Current Load is now: WARNING on ipv6test1 i-00000282 output: WARNING - load average: 2.61, 5.22, 5.26 [07:18:25] PROBLEM Free ram is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:18:25] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:18:25] PROBLEM Total Processes is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:19:08] RECOVERY Current Load is now: OK on incubator-bot1 i-00000251 output: OK - load average: 0.68, 2.65, 4.47 [07:19:13] RECOVERY SSH is now: OK on ipv6test1 i-00000282 output: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [07:19:30] PROBLEM Free ram is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [07:19:30] PROBLEM Total Processes is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:21:08] RECOVERY Current Load is now: OK on maps-tilemill1 i-00000294 output: OK - load average: 0.09, 2.10, 4.72 [07:21:08] RECOVERY Current Load is now: OK on precise-test i-00000231 output: OK - load average: 0.46, 2.83, 4.81 [07:21:08] RECOVERY dpkg-check is now: OK on ipv6test1 i-00000282 output: All packages OK [07:21:08] RECOVERY Current Load is now: OK on bots-apache1 i-000000b0 output: OK - load average: 3.08, 3.47, 4.98 [07:21:08] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 1.58, 5.82, 8.89 [07:21:08] RECOVERY Current Users is now: OK on mobile-testing i-00000271 output: USERS OK - 0 users currently logged in [07:21:08] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 67% free memory [07:21:10] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK [07:21:10] RECOVERY Total Processes is now: OK on mobile-testing i-00000271 output: PROCS OK: 227 processes [07:22:03] RECOVERY Current Load is now: OK on upload-wizard i-0000021c output: OK - load average: 0.08, 2.59, 3.91 [07:22:03] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 1.59, 3.07, 4.34 [07:23:23] RECOVERY Current Load is now: OK on ipv6test1 i-00000282 output: OK - load average: 0.05, 2.03, 3.87 [07:23:45] RECOVERY dpkg-check is now: OK on mobile-testing i-00000271 output: All packages OK [07:26:11] RECOVERY Current Load is now: OK on pybal-precise i-00000289 output: OK - load average: 0.07, 1.38, 3.74 [07:28:13] Spam spam spam spam. Lovely spam! Wonderful spam! 
[07:36:16] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 1.60, 1.38, 3.98 [07:38:05] so the stupid cluster is dead again :( [07:38:46] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.04, 1.33, 3.90 [07:43:42] !log deployment-prep hashar: restarted udp2log daemon on -feed (15 to 20 python processes there [07:43:49] Logged the message, Master [07:47:27] !log deployment-prep hashar: restarting squid, seems stalled [07:47:28] Logged the message, Master [07:48:41] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.59, 0.85, 2.43 [07:49:08] TBloemink: don't see any spam [07:49:28] Flood flood flood flood. Lovely flood! Wonderful flood! [07:49:29] :P [07:53:29] !log deployment-prep hashar: restart failed, puppet dead, various squid related process in zombie mode --> rebooting deployment-squid [07:53:31] Logged the message, Master [07:53:53] that is the lazy way to fix stuff [07:59:34] PROBLEM host: deployment-squid is DOWN address: i-000000dc CRITICAL - Host Unreachable (i-000000dc) [08:00:07] !log deployment-prep Rebooting -squid using nova web interface [08:00:09] Logged the message, Master [08:09:49] RECOVERY host: deployment-squid is UP address: i-000000dc PING OK - Packet loss = 0%, RTA = 0.43 ms [08:16:28] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [08:16:36] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours [08:16:36] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on nova-precise1 i-00000236 output: Puppet has not run in last 20 hours [08:16:46] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on nova-essex-test i-000001f9 output: Puppet has not run in last 20 hours [08:17:36] ACKNOWLEDGEMENT Puppet freshness is now: CRITICAL on localpuppet2 i-0000029b output: Puppet has not run in last 20 hours 
[08:18:57] !log deployment-prep Squid answering again :-D [08:18:58] Logged the message, Master [08:51:14] !log mingle dbtest server created and updated files in place [08:51:15] Logged the message, Master [08:57:14] !log shop-analytics set up main with apache,php and certs, assigned IP [08:57:15] Logged the message, Master [09:03:52] PROBLEM HTTP is now: CRITICAL on shop-analytics-main i-000001e6 output: Connection refused [09:06:54] * Jamesofur sighs [09:07:45] Ryan_Lane: around or just auto joining? :) [09:07:57] Jamesofur: depends ;) [09:08:25] I'm trying to run puppet to install apache but it's getting an error about duplicate definitions on the remote server [09:08:39] what do you have defined? [09:08:40] jamesur@shop-analytics-main:/etc/init.d$ sudo puppetd -tv [09:08:40] info: Loading facts in projectgid [09:08:42] info: Loading facts in default_gateway [09:08:43] info: Loading facts in projectgid [09:08:45] info: Loading facts in default_gateway [09:08:46] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Package[apache2] is already defined in file /etc/puppet/manifests/webserver.pp at line 69; cannot redefine at /etc/puppet/manifests/webserver.pp:29 on node i-000001e6.pmtpa.wmflabs [09:08:48] warning: Not using cache on failed catalog [09:08:49] use pastebin ;) [09:08:49] err: Could not retrieve catalog; skipping run [09:08:56] yeah that would help wouldn't it [09:09:12] Jamesofur: what classes/variables do you have set? [09:09:21] some classes conflict [09:09:55] I don't really have anything set yet, it's the first time I've used the instance and was just trying to get things set up so configured it for apache, php and the certificates [09:10:52] then tried to run puppet to pull them in [09:11:23] Jamesofur: right, but which classes? [09:11:33] some conflict... [09:13:51] sorry, still figuring out the context/terminology for puppet. classes for examples as in the webserver part of web server::php5 ? 
[09:16:38] Jamesofur: in the configure interface [09:16:59] there's groups, in the groups are classes and variables [09:17:02] classes have checkboxes [09:17:08] which ones do you have checked? [09:17:38] ok sorry, php5, apache2 and certificates::star_wmflabs_org and certificates::star_wmflabs [09:17:56] do we only need one of the web server ones? [09:17:57] I'm betting php5 and apache2 are conflicting [09:18:04] just define the php5 one [09:18:10] * Jamesofur nods, lets try that makes sense [09:18:25] our apache puppet stuff is 100% fucked up [09:18:52] Ryan_Lane: there we go :) that did it [09:19:01] what I get for assumeing [09:19:07] assuming too [09:19:23] well, it's not a bad assumption [09:19:49] * Jamesofur nods [09:21:23] there we go, perfect [09:23:07] !log shop-analytics puppet run with Ryan's help, base landing page up and live on wmflabs in basic config for now more tomorrow. [09:23:08] Logged the message, Master [09:23:29] Ryan_Lane: thanks for the help, I really wanted to get that done before bed [09:23:36] yw [09:24:20] RECOVERY HTTP is now: OK on shop-analytics-main i-000001e6 output: HTTP OK: HTTP/1.1 200 OK - 8733 bytes in 0.015 second response time [09:25:23] damn right [10:01:56] paravoid: can you write up some documentation and send a post to labs-l about the puppet stuff? 
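Note on the 09:08 to 09:18 exchange: the "Duplicate definition" error appears when two selected classes both declare the same resource, here Package[apache2]. The sketch below is a toy model of that rule, not Puppet's actual compiler. The class names `webserver::apache2` and `webserver::php5`, and everything listed inside `CLASS_RESOURCES`, are assumptions made for illustration; the log only confirms that the conflict went away once the php5 class alone was selected.

```python
class DuplicateDefinition(Exception):
    """Raised when the same resource is declared twice in one catalog."""


def compile_catalog(selected_classes, class_resources):
    """Collect resources from every selected class, rejecting duplicates.

    class_resources maps a class name to the (type, title) pairs it declares.
    Returns a dict mapping each resource to the class that declared it.
    """
    declared_by = {}
    for cls in selected_classes:
        for resource in class_resources[cls]:
            if resource in declared_by:
                rtype, title = resource
                raise DuplicateDefinition(
                    f"{rtype}[{title}] is already defined in "
                    f"{declared_by[resource]}; cannot redefine in {cls}"
                )
            declared_by[resource] = cls
    return declared_by


# Hypothetical class contents: both web server classes declare apache2.
CLASS_RESOURCES = {
    "webserver::apache2": [("Package", "apache2")],
    "webserver::php5": [
        ("Package", "apache2"),
        ("Package", "libapache2-mod-php5"),
    ],
    "certificates::star_wmflabs": [("File", "/etc/ssl/certs/star.wmflabs.pem")],
}

if __name__ == "__main__":
    # Selecting both web server classes reproduces the conflict.
    try:
        compile_catalog(["webserver::php5", "webserver::apache2"], CLASS_RESOURCES)
    except DuplicateDefinition as err:
        print("err:", err)
    # Selecting only php5 (the fix suggested in the channel) compiles cleanly.
    print(sorted(compile_catalog(["webserver::php5"], CLASS_RESOURCES)))
```

Under these assumptions, unchecking the apache2 class works because the php5 class already pulls in the apache2 package on its own.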
[10:57:30] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.99, 4.95, 5.00 [11:05:30] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.95, 5.50, 5.19 [11:35:35] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.78, 4.76, 4.99 [12:36:40] 05/31/2012 - 12:36:40 - Updating keys for vvv at /export/home/opengrok/vvv [12:37:08] 05/31/2012 - 12:37:08 - Updating keys for vvv at /export/home/openstack/vvv [12:37:16] 05/31/2012 - 12:37:16 - Updating keys for vvv at /export/home/bastion/vvv [12:37:40] o.0 [12:38:05] * vvv wondwers why [12:38:08] I wonder if openstack can handle utf8 yet... we could make some awesome usernames. [12:38:50] Like the evil smilie? [12:39:04] Yeah :D [12:42:21] Damianz: it will when we upgrade [12:42:51] They fix it in essex? [12:43:04] yeah, we pushed in some fixes for that too [12:43:16] Cool [12:44:21] New patchset: Andrew Bogott; "Link /var/www/core to /var/www/wiki" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9509 [12:44:37] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9509 [12:44:37] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/9509 [12:49:41] I'm kinda sad oracle are murdering OCFS2 for rhel6, was gonna be my gluster fix but maybe I'll just try gluster again on my webserver and hope small files != stupid read latency again :( [12:53:43] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.59, 6.07, 6.07 [12:56:14] PROBLEM dpkg-check is now: CRITICAL on mwreview-test4 i-000002b2 output: DPKG CRITICAL dpkg reports broken packages [12:56:17] * Damianz throws something at whoever lagged io out again [12:56:31] * Reedy blames Damianz [12:56:45] * Hydriz watches the object fly to the culprit [12:57:10] does anyone know how nagios get the new monitor_service 
commands ? [12:57:17] is it reloaded automatically by puppet ? [12:57:17] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=deployment-prep&style=detail [12:57:25] apaches show with a warning 403 Forbidden there [12:57:46] hasher: Can you fix beta so that it does not use the bit.wikimedia.org/static-trunk url? [12:57:49] It uses some c++ parsing the wiki pages IIRC [12:57:49] though that should be fixed https://gerrit.wikimedia.org/r/#/c/9391/1/manifests/apaches.pp,unified [12:58:17] Hydriz: what for ? [12:58:21] * Damianz points Reedy towards lcarr and wonders if we can have some huge pipes :D [12:58:28] its showing not found messages for the images [12:58:32] Hydriz: don't we want to run beta from master ? [12:58:43] but the directory does not exist [12:58:51] which images ? which directory? [12:59:02] https://bits.wikimedia.org/static-trunk/skins/vector/images/search-ltr.png?303-4 [12:59:07] our search icon... [12:59:33] Ryan_Lane: do you have any idea how nagios on wmflabs works? I added a new monitor_service yesterday but it is not showing up / fixing the issue on deployment-prep [12:59:42] it doesn't use puppet [13:00:04] ohhh [13:00:13] it does a semantic search, getting back a json object [13:00:16] for all instances [13:00:32] it then sees which puppet classes are used, and makes monitors based on that [13:00:38] we added yesterday a new class apaches::monitoring::labs <-- https://gerrit.wikimedia.org/r/#/c/9391/1/manifests/apaches.pp,unified [13:00:55] so it looks like the new monitor_service is not going to be applied :D [13:01:00] right [13:01:14] RECOVERY dpkg-check is now: OK on mwreview-test4 i-000002b2 output: All packages OK [13:01:28] Hydriz: well it should be available off the regular url, something like en.wikimedia.beta.wmflabs.org/w/skins/vector/images/search-ltr.png or something similar [13:01:45] Hydriz: I don't want to have beta mix up stuff with a wmf branch :-] [13:02:07] yeah, but its not showing images... 
[13:02:21] unless we get our own bit.wikimedia.beta.wmflabs.org [13:02:25] *bits [13:02:55] Can we kill idiots with fire? [13:03:05] Hydriz: open a bug please :-D [13:03:16] * Hydriz facepalms [13:03:16] Hydriz: I will definitely implement bit.wikimedia.beta.wmflabs.org at some point [13:03:31] meanwhile, I need to investigate the issue, and find out what is wrong [13:03:35] * Hydriz hates making bugs [13:03:39] I am not going to do that now [13:03:49] bits would be good from a point of varnish [13:04:05] (bringing daughter to doctor, then have to catch my train to Paris to then fly to Berlin) :-D [13:04:06] so maybe [13:04:11] this evening I will have a look [13:04:22] or Friday afternoon [13:04:28] Berlin is nice, Berlin > Doctor [13:04:28] might just end up creating bits hehe [13:04:41] Daughter >>>>> my needs :-] [13:04:56] Nah, gotta fend for yourself at some point :D [13:05:21] Hydriz: bugs are good cause they provide a record of what we do. This way someone can refer to the bug number to query about progress, find out who did what etc… :D [13:05:36] Hydriz: though I understand it is a bit boring to create one line bugs [13:05:43] is there a bot to help? [13:05:56] to create bugs ? [13:06:01] not that I know of [13:06:01] Would be nice [13:06:10] yeah, I wish for one [13:06:14] well [13:06:17] packing my luggage [13:06:25] bringing daughter. See you later tonight maybe [13:51:59] A newer build of the Ubuntu lucid server image is available. 
[13:51:59] It is named 'release' and has build serial '20120403' [13:52:00] heh [13:52:49] lol [13:57:49] 05/31/2012 - 13:57:49 - Creating a home directory for faidon at /export/home/mwreview/faidon [13:58:44] 05/31/2012 - 13:58:44 - Updating keys for faidon at /export/home/mwreview/faidon [14:00:48] 05/31/2012 - 14:00:48 - Creating a home directory for bharris at /export/home/mwreview/bharris [14:00:48] 05/31/2012 - 14:00:48 - Creating a home directory for erik at /export/home/mwreview/erik [14:01:56] 05/31/2012 - 14:01:56 - Updating keys for bharris at /export/home/mwreview/bharris [14:01:56] 05/31/2012 - 14:01:56 - Updating keys for erik at /export/home/mwreview/erik [14:06:02] Reedy: yeah, we're using an old version [14:09:24] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 5% free memory [14:10:09] cool [14:10:15] thats really low [14:14:28] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 13% free memory [14:17:31] * Thehelpfulone blames Hydriz [14:17:38] using up all the ram [14:17:40] :( [14:17:48] I don't use the bots project lol [14:17:51] !nagios [14:17:52] http://nagios.wmflabs.org/nagios3 [14:27:27] YuviPanda: interesting cloak, how did you get that one? [14:27:47] Being at a conference [14:28:08] I whois'd myself for the first time [14:28:17] gj [14:28:24] ty [14:28:41] Reedy: 9_9 [14:57:23] petan|wk: How do we remove an alias from the wm-bot? [14:58:04] !blah unalias [14:58:13] oh I see lol [14:58:23] thanks! :) [14:58:26] !blah alias wm-bot [14:58:27] Created new alias for this key [14:58:29] !blah unalias [14:58:30] Alias removed! 
[14:58:55] Needs documentation though [14:59:07] I only see operator commands, not normal-user commands [15:41:11] PROBLEM Puppet freshness is now: CRITICAL on blamemaps-m1small i-000002a1 output: Puppet has not run in last 20 hours [16:18:45] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282 output: Warning: 19% free memory [16:33:37] RECOVERY Free ram is now: OK on ipv6test1 i-00000282 output: OK: 22% free memory [16:43:21] PROBLEM Puppet freshness is now: CRITICAL on mailman-01 i-00000235 output: Puppet has not run in last 20 hours [17:02:08] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282 output: Warning: 19% free memory [17:14:08] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 4.68, 4.68, 5.00 [17:36:42] Thehelpfulone: is there some sort of #cvn-labs yet? [17:38:18] not yet, once petan makes the RC feed [17:38:44] Ok [18:44:24] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 5.64, 5.57, 5.19 [18:54:03] PROBLEM Disk Space is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [18:54:43] PROBLEM Free ram is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [18:55:53] PROBLEM Total Processes is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [18:56:30] PROBLEM dpkg-check is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [18:58:02] PROBLEM Current Load is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [18:58:31] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 22.46, 16.73, 11.33 [18:58:31] PROBLEM Current Users is now: CRITICAL on build1 i-000002b3 output: Connection refused by host [20:04:50] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 5% free memory [20:09:40] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 6% free memory [20:15:50] PROBLEM Disk Space is now: 
WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3555 MB (20% inode=77%): [20:20:50] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [20:24:30] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 2.69, 2.39, 4.80 [20:38:49] PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 3613 MB (20% inode=77%): [20:49:29] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 1.45, 3.73, 4.66 [20:49:49] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 5% free memory [21:06:25] 05/31/2012 - 21:06:25 - Updating keys for akhanna at /export/home/outreach/akhanna [21:14:25] PROBLEM Total Processes is now: CRITICAL on mwreview-test5 i-000002b4 output: Connection refused by host [21:15:05] PROBLEM dpkg-check is now: CRITICAL on mwreview-test5 i-000002b4 output: Connection refused by host [21:16:15] PROBLEM Current Load is now: CRITICAL on mwreview-test5 i-000002b4 output: Connection refused by host [21:16:55] PROBLEM Current Users is now: CRITICAL on mwreview-test5 i-000002b4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:17:35] PROBLEM Disk Space is now: CRITICAL on mwreview-test5 i-000002b4 output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:18:18] PROBLEM Free ram is now: CRITICAL on mwreview-test5 i-000002b4 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[21:21:21] RECOVERY Current Load is now: OK on mwreview-test5 i-000002b4 output: OK - load average: 0.91, 0.98, 0.82 [21:21:58] RECOVERY Current Users is now: OK on mwreview-test5 i-000002b4 output: USERS OK - 1 users currently logged in [21:22:39] RECOVERY Disk Space is now: OK on mwreview-test5 i-000002b4 output: DISK OK [21:23:01] !log shop-analytics updated pushed to site [21:23:10] Logged the message, Master [21:24:28] RECOVERY Total Processes is now: OK on mwreview-test5 i-000002b4 output: PROCS OK: 90 processes [21:28:41] RECOVERY Free ram is now: OK on mwreview-test5 i-000002b4 output: OK: 87% free memory [21:33:18] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:34:21] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:34:21] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [21:35:13] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 9% free memory [21:38:15] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 3.03, 4.43, 2.61 [21:39:20] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [21:39:20] RECOVERY Disk Space is now: OK on worker1 i-00000208 output: DISK OK [21:46:11] RECOVERY dpkg-check is now: OK on mwreview-test5 i-000002b4 output: All packages OK [22:03:00] PROBLEM Total Processes is now: CRITICAL on mwreview-test5 i-000002b4 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:11:01] PROBLEM Current Load is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:11:33] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 5% free memory [22:11:58] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:16:20] PROBLEM Puppet freshness is now: CRITICAL on aggregator-test3 i-00000293 output: Puppet has not run in last 20 hours [22:17:00] RECOVERY Current Load is now: OK on reportcard2 i-000001ea output: OK - load average: 0.04, 1.09, 1.18 [22:21:20] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 3.22, 3.51, 3.28 [22:21:20] PROBLEM Puppet freshness is now: CRITICAL on dumps-2 i-00000257 output: Puppet has not run in last 20 hours [22:21:25] PROBLEM Puppet freshness is now: CRITICAL on bots-cb i-0000009e output: Puppet has not run in last 20 hours [22:26:27] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 1.89, 2.16, 2.72 [22:36:41] OMG labs is unusably slow [22:36:55] I'm getting multi-second ping times from bastion to my instance [22:37:10] That's on an INTERNAL, VIRTUAL network, those ping times should be <1ms [22:37:54] Oh and now things are responsive again, weird [22:44:51] RoanKattouw: deployment-prep is worse ;) [22:46:14] PROBLEM Current Users is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:46:15] PROBLEM Disk Space is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:46:48] RoanKattouw: Should we kidnap paravoid and hold him hostage until Ryan fixes it? [22:47:05] Maybe I should try not running my PHP code off of /mnt [22:47:29] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 3.87, 6.77, 5.04 [22:47:29] PROBLEM Current Load is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:29] PROBLEM Current Users is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:29] PROBLEM Disk Space is now: CRITICAL on worker1 i-00000208 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:47:29] PROBLEM Free ram is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:30] PROBLEM Total Processes is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:47:34] PROBLEM dpkg-check is now: CRITICAL on ganglia-test5 i-000002a7 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:21] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 27.51, 17.83, 8.88 [22:54:31] PROBLEM Current Load is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:31] PROBLEM Total Processes is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:54:36] PROBLEM dpkg-check is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:55:19] PROBLEM Current Users is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. [22:55:19] PROBLEM Free ram is now: CRITICAL on migration1 i-00000261 output: CHECK_NRPE: Socket timeout after 10 seconds. 
[22:59:10] RECOVERY Current Load is now: OK on migration1 i-00000261 output: OK - load average: 0.31, 1.79, 1.52 [22:59:10] RECOVERY Total Processes is now: OK on migration1 i-00000261 output: PROCS OK: 90 processes [22:59:15] RECOVERY dpkg-check is now: OK on migration1 i-00000261 output: All packages OK [23:00:18] RECOVERY Current Users is now: OK on migration1 i-00000261 output: USERS OK - 0 users currently logged in [23:00:23] RECOVERY Free ram is now: OK on migration1 i-00000261 output: OK: 78% free memory [23:08:59] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 1.79, 2.71, 4.56 [23:18:23] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282 output: Warning: 19% free memory [23:23:18] RECOVERY Free ram is now: OK on ipv6test1 i-00000282 output: OK: 20% free memory [23:26:25] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.00, 6.69, 5.40 [23:31:27] PROBLEM Current Load is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [23:36:38] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.06, 7.03, 6.17