[00:46:32] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 11% free memory
[00:51:32] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 26% free memory
[01:01:12] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 7.27, 6.60, 5.40
[01:31:12] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.91, 4.05, 4.73
[01:54:25] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 6.56, 6.01, 5.31
[01:58:32] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 19% free memory
[02:44:32] RECOVERY Puppet freshness is now: OK on deployment-web2 i-00000125 output: puppet ran at Sat Apr 21 02:44:10 UTC 2012
[02:49:42] RECOVERY Puppet freshness is now: OK on deployment-web4 i-00000163 output: puppet ran at Sat Apr 21 02:49:27 UTC 2012
[02:58:12] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[03:02:12] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[03:12:22] PROBLEM Free ram is now: WARNING on deployment-web4 i-00000163 output: Warning: 18% free memory
[03:17:23] RECOVERY Free ram is now: OK on deployment-web4 i-00000163 output: OK: 21% free memory
[03:44:49] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[03:47:19] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory
[03:49:49] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 16% free memory
[04:00:49] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[04:04:49] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:09:50] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:09:50] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 4% free memory
[04:12:20] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:14:50] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[04:17:20] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 92% free memory
[04:20:50] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 3% free memory
[04:25:50] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[06:43:34] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 5.40, 6.27, 5.45
[07:03:34] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.20, 3.08, 4.55
[07:21:43] New patchset: Dzahn; "wikistats - define a logdir" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5500
[07:21:56] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/5500
[07:22:25] New review: Dzahn; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5500
[07:22:28] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5500
[07:24:20] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[09:43:22] PROBLEM Free ram is now: WARNING on deployment-web4 i-00000163 output: Warning: 19% free memory
[09:48:22] RECOVERY Free ram is now: OK on deployment-web4 i-00000163 output: OK: 22% free memory
[10:43:22] PROBLEM Free ram is now: WARNING on deployment-web4 i-00000163 output: Warning: 19% free memory
[10:44:35] Change on mediawiki a page Wikimedia Labs/status was modified, changed by 115.252.142.55 link https://www.mediawiki.org/w/index.php?diff=528146 edit summary: /* 2012-04-21 */ new section
[10:44:42] PROBLEM Disk Space is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:44:42] PROBLEM Total Processes is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:44:47] PROBLEM Current Users is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:44:47] PROBLEM Free ram is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:44:47] PROBLEM dpkg-check is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:45:02] PROBLEM Current Load is now: CRITICAL on bots-cb i-0000009e output: CHECK_NRPE: Socket timeout after 10 seconds.
[10:49:32] RECOVERY Disk Space is now: OK on bots-cb i-0000009e output: DISK OK
[10:49:32] RECOVERY Current Users is now: OK on bots-cb i-0000009e output: USERS OK - 0 users currently logged in
[10:49:32] RECOVERY Total Processes is now: OK on bots-cb i-0000009e output: PROCS OK: 102 processes
[10:49:37] RECOVERY Free ram is now: OK on bots-cb i-0000009e output: OK: 68% free memory
[10:49:37] RECOVERY dpkg-check is now: OK on bots-cb i-0000009e output: All packages OK
[10:49:52] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 2.58, 19.21, 13.66
[11:09:56] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.33, 0.59, 3.90
[12:27:16] PROBLEM Free ram is now: WARNING on deployment-web6 i-000001d9 output: Warning: 19% free memory
[12:57:16] RECOVERY Free ram is now: OK on deployment-web6 i-000001d9 output: OK: 20% free memory
[12:59:16] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[13:03:19] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[13:18:39] PROBLEM Free ram is now: WARNING on incubator-bots2 i-00000119 output: Warning: 19% free memory
[13:25:19] PROBLEM Free ram is now: WARNING on deployment-web6 i-000001d9 output: Warning: 19% free memory
[14:19:32] why may I be rejected from connecting to the instance I just created?
[14:19:38] ie. Permission denied (publickey).
[14:20:01] (yes, I'm tunneling through bastion)
[14:26:39] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[14:27:19] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 1.51, 4.36, 2.41
[14:27:35] your keys may not be updated Platonides?
[14:27:36] it doesn't look right either that it is renewing the dhcp leases each 50 seconds...
[14:27:51] Thehelpfulone, I can connect perfectly to bastion
[14:27:59] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 3.29, 17.07, 10.19
[14:32:19] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.23, 1.76, 1.80
[14:42:59] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.25, 1.07, 4.00
[14:56:39] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[15:13:44] PROBLEM Free ram is now: CRITICAL on incubator-apache i-00000211 output: Connection refused by host
[15:14:51] PROBLEM Free ram is now: CRITICAL on incubator-bots2 i-00000119 output: CHECK_NRPE: Socket timeout after 10 seconds.
[15:15:01] PROBLEM Total Processes is now: CRITICAL on incubator-apache i-00000211 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:16:01] PROBLEM dpkg-check is now: CRITICAL on incubator-apache i-00000211 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:17:01] PROBLEM Current Load is now: CRITICAL on incubator-apache i-00000211 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:17:41] PROBLEM Current Users is now: CRITICAL on incubator-apache i-00000211 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:18:24] RECOVERY Total Processes is now: OK on incubator-bots2 i-00000119 output: PROCS OK: 123 processes
[15:18:29] PROBLEM Disk Space is now: CRITICAL on incubator-apache i-00000211 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[15:18:39] PROBLEM Current Load is now: CRITICAL on incubator-bots2 i-00000119 output: CRITICAL - load average: 526.15, 1496.76, 803.32
[15:18:49] RECOVERY Free ram is now: OK on incubator-apache i-00000211 output: OK: 93% free memory
[15:19:19] RECOVERY Free ram is now: OK on incubator-bots2 i-00000119 output: OK: 94% free memory
[15:19:59] RECOVERY Total Processes is now: OK on incubator-apache i-00000211 output: PROCS OK: 98 processes
[15:20:59] RECOVERY dpkg-check is now: OK on incubator-apache i-00000211 output: All packages OK
[15:21:31] !log incubator Resolved severely overloaded instance incubator-bots2; was in the process of killing all bots on that server to decrease the number of processes running on it (from 23k+ to <200)
[15:21:33] Logged the message, Master
[15:21:59] RECOVERY Current Load is now: OK on incubator-apache i-00000211 output: OK - load average: 0.06, 0.37, 0.36
[15:22:39] RECOVERY Current Users is now: OK on incubator-apache i-00000211 output: USERS OK - 1 users currently logged in
[15:22:50] 23k bots? :O
[15:23:09] RECOVERY Disk Space is now: OK on incubator-apache i-00000211 output: DISK OK
[15:23:38] lol no
[15:23:50] it's only 5, but I had a script to keep them running
[15:24:02] and it has been running for almost a month now
[15:25:59] well, 23k copies of the same 5 bots
[15:26:29] yeah, kinda
[15:26:39] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[15:27:54] !log incubator Initialising instance incubator-apache to replace broken incubator-web
[15:27:55] Logged the message, Master
[15:35:31] Change on mediawiki a page Wikimedia Labs/status was modified, changed by Sumanah link https://www.mediawiki.org/w/index.php?diff=528162 edit summary: Undo revision 528146 by [[Special:Contributions/115.252.142.55|115.252.142.55]] ([[User talk:115.252.142.55|talk]])
[15:56:39] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[15:59:26] ping Ryan_Lane
[16:12:19] PROBLEM Free ram is now: CRITICAL on bots-3 i-000000e5 output: Critical: 2% free memory
[16:13:39] PROBLEM Free ram is now: WARNING on deployment-web4 i-00000163 output: Warning: 19% free memory
[16:17:19] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 49% free memory
[16:18:39] RECOVERY Free ram is now: OK on deployment-web4 i-00000163 output: OK: 22% free memory
[16:18:39] PROBLEM Current Load is now: WARNING on incubator-bots2 i-00000119 output: WARNING - load average: 0.00, 0.02, 16.72
[16:26:39] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[16:38:40] RECOVERY Current Load is now: OK on incubator-bots2 i-00000119 output: OK - load average: 0.00, 0.02, 4.61
[16:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[17:13:40] PROBLEM Free ram is now: WARNING on deployment-web4 i-00000163 output: Warning: 18% free memory
[17:25:12] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[17:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[17:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[18:04:30] PROBLEM Free ram is now: WARNING on deployment-web2 i-00000125 output: Warning: 19% free memory
[18:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[18:30:20] RECOVERY Free ram is now: OK on deployment-web6 i-000001d9 output: OK: 20% free memory
[18:43:20] PROBLEM Free ram is now: WARNING on deployment-web6 i-000001d9 output: Warning: 19% free memory
[18:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[19:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[19:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[20:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[20:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[21:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[21:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[22:03:20] RECOVERY Free ram is now: OK on deployment-web6 i-000001d9 output: OK: 20% free memory
[22:26:20] PROBLEM Free ram is now: WARNING on deployment-web6 i-000001d9 output: Warning: 19% free memory
[22:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[22:31:10] PROBLEM Puppet freshness is now: CRITICAL on mobile-feeds i-000000c1 output: Puppet has not run in last 20 hours
[22:56:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[23:00:10] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[23:04:10] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[23:26:40] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%
[23:56:41] PROBLEM host: grail is DOWN address: i-00000210 PING CRITICAL - Packet loss = 100%