[00:17:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[00:25:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[00:47:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[02:11:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:12:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[02:18:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[02:19:56] !log LocalisationUpdate completed (1.24wmf17) at 2014-08-24 02:18:52+00:00
[02:20:08] Logged the message, Master
[02:26:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[02:32:16] !log LocalisationUpdate completed (1.24wmf18) at 2014-08-24 02:31:13+00:00
[02:32:22] Logged the message, Master
[02:46:24] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:48:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[02:49:14] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail
[02:49:15] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.011 second response time
[02:49:24] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[02:49:44] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[02:50:44] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:59:04] PROBLEM - HTTP 5xx req/min on labmon1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[02:59:44] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[03:01:14] PROBLEM - puppet last run on amssq58 is CRITICAL: CRITICAL: Epic puppet fail
[03:02:44] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:02:44] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 6 failures
[03:02:44] PROBLEM - puppet last run on amslvs4 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:06:44] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[03:08:14] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[03:14:04] RECOVERY - HTTP 5xx req/min on labmon1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:14:44] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[03:17:44] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[03:17:44] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[03:18:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 24 03:17:46 UTC 2014 (duration 17m 45s)
[03:18:58] Logged the message, Master
[03:19:44] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[03:21:14] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[04:00:12] (PS1) Ori.livneh: hhvm: dedupe mysql config key [operations/puppet] - https://gerrit.wikimedia.org/r/156016
[04:06:00] (CR) Ori.livneh: "Yes, the apache class should go." [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn)
[04:09:33] (PS3) Ori.livneh: mw-rc-irc - use apache::site [operations/puppet] - https://gerrit.wikimedia.org/r/153843 (owner: Dzahn)
[04:19:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[04:27:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[04:32:53] (CR) Ori.livneh: "Reverting the change in its entirety does more than is necessary for fixing the problem -- it's enough to remove the 'include ::diamond' l" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis)
[04:37:04] (PS3) Ori.livneh: Don't include ::diamond in apache::monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis)
[04:38:21] (CR) Ori.livneh: "PS3 cherry-picked in labs" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis)
[04:38:59] (CR) Hoo man: [C: -1] "This still needs a beta<>production switch AFAIS" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/155753 (owner: 01tonythomas)
[04:39:57] (CR) Ori.livneh: [C: -1] "https://gerrit.wikimedia.org/r/#/c/154401/ is the proper fix; modules should be realm-agnostic." [operations/puppet] - https://gerrit.wikimedia.org/r/154401 (owner: Hashar)
[04:43:25] ori: Wrong link
[04:43:27] :P
[04:43:39] You just want to use defined() over there I guess
[04:44:17] yes, re: wrong link (and thanks for pointing it out); no, re: defined()
[04:44:45] (CR) Ori.livneh: "https://gerrit.wikimedia.org/r/#/c/152943/ , rather" [operations/puppet] - https://gerrit.wikimedia.org/r/154401 (owner: Hashar)
[04:45:06] puppet fails over duplicate definitions for a reason
[04:45:47] it could just as well tolerate them silently, when the declarations don't actually contradict one another, which is usually the case
[04:45:56] but it doesn't; it goes out of its way to report that as an error
[04:46:48] the reason it does that has to do with the design philosophy of puppet, which stipulates you should declare things in one place
[04:47:28] because doing otherwise is redundant and most likely a sign of confusion or carelessness
[04:47:52] yeah, true
[04:48:32] just using defined means you get *some* definition which seems awry
[04:48:34] so that makes defined() something of a hack in my book, and it doesn't even work unless all declarations of the relevant resource are enclosed in a defined() check
[04:49:09] or rather, in such cases it works irregularly, since it depends on the parse order, which in puppet (again, by design) is undefined
[04:49:31] yeah, so you just get *something*
[04:49:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[04:49:42] right
[04:50:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:51:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[04:52:02] hi, by the way. hope my response wasn't too cranky. i'm fighting a ferocious headache.
[05:10:55] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:11:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:13:54] ori: Not an issue... I'm half asleep right now (although it's only 1:13am)
[05:14:00] well, "only"
[05:50:24] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:50:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[05:51:24] RECOVERY - mailman_ctl on sodium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl
[05:51:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[06:20:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[06:27:25] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Epic puppet fail
[06:27:45] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Epic puppet fail
[06:27:54] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: Epic puppet fail
[06:28:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[06:28:34] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:44] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:44] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:04] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:14] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:14] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:54] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] PROBLEM - puppet last run on hooft is CRITICAL: CRITICAL: Epic puppet fail
[06:45:04] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:45:45] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:45:45] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:46:14] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:46:34] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:46:44] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
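The defined() exchange above (04:43 to 04:49) is easier to follow with a concrete manifest. The following is a minimal sketch only, not the actual operations/puppet code; the class names role_a and role_b and the path /etc/example.conf are made up for illustration, while file, defined() and the duplicate-declaration behaviour are standard Puppet. The real change being discussed, https://gerrit.wikimedia.org/r/152943, takes the approach ori describes: drop the extra declaration so the resource ends up declared in exactly one place.

    # Two hypothetical classes that each declare the same resource.  If a node
    # pulls in both, catalog compilation fails with a duplicate-declaration
    # error; Puppet reports this deliberately rather than merging the two.
    class role_a {
      file { '/etc/example.conf':
        ensure  => file,
        content => "setting=1\n",
      }
    }

    class role_b {
      # The defined() "fix" ori calls a hack: only declare the resource if no
      # previously parsed declaration exists.  It only avoids the error if
      # every declaration of the resource is guarded the same way, and which
      # declaration wins depends on parse order, which Puppet leaves undefined.
      if ! defined(File['/etc/example.conf']) {
        file { '/etc/example.conf':
          ensure  => file,
          content => "setting=1\n",
        }
      }
    }

Note that the error applies to defined resources and resource-like class declarations; plain "include" of a class is idempotent and can safely appear in several places, which is why declaring things once and including them elsewhere avoids the problem without defined() guards.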
[06:46:55] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:47:34] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:50:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[06:51:15] RECOVERY - Disk space on ms1004 is OK: DISK OK
[06:51:44] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:54:15] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 485 MB (2% inode=94%): /var/lib/ureadahead/debugfs 485 MB (2% inode=94%):
[07:41:54] (PS1) Matanya: syslog: qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/156022
[08:10:55] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[08:11:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[08:21:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[08:29:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[08:50:54] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[08:51:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[08:51:54] RECOVERY - mailman_qrunner on sodium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[10:22:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[10:30:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[10:41:54] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 37 data above and 0 below the confidence bounds
[10:52:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[11:40:25] (Abandoned) Hashar: diamond dupe def with apache::monitoring on labs [operations/puppet] - https://gerrit.wikimedia.org/r/154401 (owner: Hashar)
[12:02:12] !log Removed IPv6 subnet 2620:0:860:2::/64 from cr2-pmtpa:irb.101
[12:02:19] Logged the message, Master
[12:07:30] (PS1) Mark Bergsma: Remove pmtpa IPv6 subnets [operations/dns] - https://gerrit.wikimedia.org/r/156032
[12:08:23] (CR) Mark Bergsma: [C: 2] Remove pmtpa IPv6 subnets [operations/dns] - https://gerrit.wikimedia.org/r/156032 (owner: Mark Bergsma)
[12:09:14] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Epic puppet fail
[12:16:43] (PS1) Mark Bergsma: Allocate codfw IPv6 subnets (public & private) [operations/dns] - https://gerrit.wikimedia.org/r/156033
[12:19:05] (CR) Mark Bergsma: [C: 2] Allocate codfw IPv6 subnets (public & private) [operations/dns] - https://gerrit.wikimedia.org/r/156033 (owner: Mark Bergsma)
[12:23:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[12:28:14] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[12:31:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[12:53:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[13:05:35] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:23:34] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[14:00:33] (CR) Andrew Bogott: Random stab at getting wikitech config in here. (6 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/155789 (owner: Andrew Bogott)
[14:22:14] aude: are you around ?
[14:24:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[14:27:17] godog: when you have time, more file disappearing issue: https://upload.wikimedia.org/wikipedia/ja/4/4e/Kado-itaiji.PNG
[14:32:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[14:54:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[15:24:24] PROBLEM - Disk space on elastic1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19719 MB (3% inode=99%):
[15:26:24] PROBLEM - Disk space on elastic1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19893 MB (3% inode=99%):
[15:49:04] PROBLEM - Disk space on elastic1012 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19543 MB (3% inode=99%):
[15:59:04] PROBLEM - Disk space on elastic1012 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19980 MB (3% inode=99%):
[16:05:04] PROBLEM - Disk space on elastic1012 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20092 MB (3% inode=99%):
[16:07:24] PROBLEM - Disk space on elastic1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 19752 MB (3% inode=99%):
[16:17:04] PROBLEM - Disk space on elastic1012 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20204 MB (3% inode=99%):
[16:25:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[16:33:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[16:37:38] How's the quake impact in SF itself? I see lots of news reports and photos about pretty bad situations in e.g. Sonoma, but no real story on any lesser impact down in SF itself (e.g. power outages, etc).
[16:52:24] PROBLEM - Disk space on elastic1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20159 MB (3% inode=99%):
[16:55:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[16:57:24] PROBLEM - Disk space on elastic1002 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 20057 MB (3% inode=99%):
[17:06:04] RECOVERY - Disk space on elastic1012 is OK: DISK OK
[17:07:24] RECOVERY - Disk space on elastic1002 is OK: DISK OK
[17:54:35] (PS1) Ori.livneh: Update ~ori bash env [operations/puppet] - https://gerrit.wikimedia.org/r/156045
[17:55:28] (CR) Ori.livneh: [C: 2 V: 2] Update ~ori bash env [operations/puppet] - https://gerrit.wikimedia.org/r/156045 (owner: Ori.livneh)
[18:26:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[18:34:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[18:56:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[19:18:15] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: Epic puppet fail
[19:24:37] "epic puppet fail"
[19:37:15] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[20:27:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[20:35:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[20:50:24] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:52:14] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[20:55:24] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:56:14] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[20:57:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[22:21:05] (PS1) Ori.livneh: apache: when sourcing env-enabled/*, redirect stdout to stderr [operations/puppet] - https://gerrit.wikimedia.org/r/156060
[22:28:34] PROBLEM - Puppet freshness on elastic1015 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:08:52 UTC
[22:36:34] PROBLEM - Puppet freshness on elastic1013 is CRITICAL: Last successful Puppet run was Sat 23 Aug 2014 06:16:33 UTC
[22:42:54] <^d> Aw, puppet's not happy on elastic boxen?
[22:53:24] PROBLEM - RAID on analytics1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:56:02] <^d> Notice: Run of Puppet configuration client already in progress; skipping (/var/lib/puppet/state/agent_catalog_run.lock exists)
[22:56:06] <^d> Boo :\
[22:56:14] RECOVERY - RAID on analytics1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[22:56:47] <^d> I see no puppet run.
[22:57:13] <^d> can one just nuke the lock file?
[22:58:05] <^d> Ahh, full disk.
[22:58:34] PROBLEM - Puppet freshness on analytics1003 is CRITICAL: Last successful Puppet run was Fri 22 Aug 2014 20:33:50 UTC
[23:13:25] RECOVERY - Disk space on elastic1013 is OK: DISK OK
[23:16:25] RECOVERY - Disk space on elastic1015 is OK: DISK OK
[23:17:12] <^d> !log slow indexing log going pretty bonanzas on elastic101[35]. Probably others too? Filling /var/log.
[23:17:19] Logged the message, Master
[23:24:35] RECOVERY - Puppet freshness on elastic1015 is OK: puppet ran at Sun Aug 24 23:24:31 UTC 2014
[23:25:14] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[23:28:04] RECOVERY - Puppet freshness on elastic1013 is OK: puppet ran at Sun Aug 24 23:27:59 UTC 2014
[23:28:14] RECOVERY - puppet last run on elastic1013 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures