[00:12:32] ori: MWRealm.sh in ops-mw-config seems to use wikimedia-realm and wikimedia-site
[00:12:37] https://github.com/search?q=wikimedia-realm+@wikimedia&type=Code&ref=searchresults
[00:12:41] defaults to pmtpa otherwise :P
[00:13:01] also Jenkins requires the file
[00:13:02] https://github.com/search?utf8=%E2%9C%93&q=wikimedia-site+%40wikimedia&type=Code&ref=searchresults
[00:13:49] It's not required to be absent, so it'll be fine I suppose
[00:16:56] Jenkins slaves are both inside and outside labs; it requires the site separate from the realm.
[00:42:51] (PS1) Ori.livneh: Remove usage of MWRealm.sh [puppet] - https://gerrit.wikimedia.org/r/250289
[01:04:42] PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:06:33] RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[01:12:13] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:14:12] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[01:55:53] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 9.09% of data above the critical threshold [500.0]
[02:00:02] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:03:37] (PS1) Ori.livneh: include mediawiki::multimedia on all application servers [puppet] - https://gerrit.wikimedia.org/r/250291 (https://phabricator.wikimedia.org/T35186)
[02:21:11] !log l10nupdate@tin Synchronized php-1.27.0-wmf.4/cache/l10n: l10nupdate for 1.27.0-wmf.4 (duration: 06m 36s)
[02:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:24:46] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.4) at 2015-11-01 02:24:46+00:00
[02:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:43:15] (CR) Ori.livneh: [C: 2] Remove usage of MWRealm.sh [puppet] - https://gerrit.wikimedia.org/r/250289 (owner: Ori.livneh)
[02:44:52] (PS1) Ori.livneh: Delete MWRealm.sh; now unused [mediawiki-config] - https://gerrit.wikimedia.org/r/250292
[02:45:35] (CR) Ori.livneh: [C: 2] Delete MWRealm.sh; now unused [mediawiki-config] - https://gerrit.wikimedia.org/r/250292 (owner: Ori.livneh)
[02:45:40] (Merged) jenkins-bot: Delete MWRealm.sh; now unused [mediawiki-config] - https://gerrit.wikimedia.org/r/250292 (owner: Ori.livneh)
[03:10:55] (PS1) Ori.livneh: Revert "Don't commit interwiki.cdb anymore" [mediawiki-config] - https://gerrit.wikimedia.org/r/250294
[03:35:23] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:42] PROBLEM - puppet last run on mw1034 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:04] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:03:22] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:05:33] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25)
[05:07:22] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits.
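The realm/site lookup discussed above reads two small marker files and falls back to a default when they are absent. A minimal sketch of that pattern, assuming /etc/wikimedia-realm and /etc/wikimedia-site each hold a single word (the file paths and the "production" fallback are assumptions; the pmtpa site default comes from the conversation, and this is not the actual MWRealm.sh):

```python
# Hypothetical sketch of a realm/site lookup, not the real MWRealm.sh.
def read_setting(path, default):
    """Return the stripped contents of `path`, or `default` if the file is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default

# Site is tracked separately from realm because, per the discussion,
# Jenkins slaves run both inside and outside labs.
realm = read_setting('/etc/wikimedia-realm', 'production')  # assumed path/fallback
site = read_setting('/etc/wikimedia-site', 'pmtpa')         # "defaults to pmtpa otherwise"
```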
[05:10:02] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 10.34% of data above the critical threshold [100000000.0]
[05:11:32] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:25:49] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Nov 1 05:25:49 UTC 2015 (duration 25m 48s)
[05:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:36:34] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[05:52:54] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 14.81% of data above the critical threshold [100000000.0]
[06:26:28] (Restored) Ori.livneh: Redirect most noc.wikimedia.org/conf URLs to Diffusion [puppet] - https://gerrit.wikimedia.org/r/224214 (owner: Alex Monk)
[06:26:53] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[06:27:58] (CR) Ori.livneh: "Legoktm, MaxSem -- I know that change is bewildering and terrifying, but /conf is going away and will be replaced with a link to http://gi" [puppet] - https://gerrit.wikimedia.org/r/224214 (owner: Alex Monk)
[06:29:43] PROBLEM - puppet last run on mw2052 is CRITICAL: CRITICAL: puppet fail
[06:29:52] (CR) Legoktm: "It definitely doesn't: T83702." [puppet] - https://gerrit.wikimedia.org/r/224214 (owner: Alex Monk)
[06:30:53] PROBLEM - puppet last run on mw2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:02] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:13] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:34] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:35] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:32] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:32] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:54] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:34] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:24] PROBLEM - puppet last run on wtp2013 is CRITICAL: CRITICAL: puppet fail
[06:56:13] RECOVERY - puppet last run on mw2016 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:56:13] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:56:23] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:44] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:56:52] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:57:22] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:53] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:33] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:34] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
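The labstore1003 saturation alerts above come from a graphite-style check that fires when more than a set fraction of recent datapoints exceeds a fixed threshold. A minimal sketch of that logic (the function name, sample data, and the 10% trigger are assumptions, not the production check):

```python
# Hypothetical sketch of a "% of data above the critical threshold" check.
def percent_above(datapoints, threshold):
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if v > threshold) / len(values)

# Modeled on the labstore1003 alert: CRITICAL when more than 10% of
# samples sit above 100000000.0 (bits/s); the samples are made up.
samples = [98e6, 101e6, 97e6, 104e6, 95e6, 99e6, 96e6, 93e6, 102e6]
if percent_above(samples, 100000000.0) > 10.0:
    print('CRITICAL: Incoming network saturation')
else:
    print('OK: Less than 10.00% above the threshold')
```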
[06:58:42] RECOVERY - puppet last run on mw2052 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[07:04:33] RECOVERY - puppet last run on wtp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:29:24] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[10:42:02] (Abandoned) Prtksxna: TextExtracts: Add classes and elements to the exclusion list [mediawiki-config] - https://gerrit.wikimedia.org/r/126226 (https://bugzilla.wikimedia.org/63164) (owner: Prtksxna)
[11:22:22] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:49:03] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[11:54:23] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:54:44] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:54:44] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:08:32] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[12:12:44] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[12:13:44] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.001 second response time on port 9042
[12:16:02] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[12:16:22] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[12:18:13] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:21:23] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:25:14] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[12:30:52] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:37:13] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[12:39:53] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:41:12] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[12:45:12] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[12:47:52] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.013 second response time on port 9042
[12:48:23] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:50:34] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:15:44] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[13:18:22] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[13:18:53] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[13:23:34] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.002 second response time on port 9042
[13:26:43] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
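The "Puppet last ran 6 hours ago" alert above is a freshness check rather than a failure count: it measures the age of the most recent completed run. A sketch of such a check against the agent's last-run summary file (the exact plugin logic and the six-hour threshold are assumptions; only the file path is puppet's standard location):

```python
# Hypothetical puppet-freshness check, modeled on the alert text above.
import os
import time

LAST_RUN = '/var/lib/puppet/state/last_run_summary.yaml'  # standard agent path

age = time.time() - os.path.getmtime(LAST_RUN)  # raises OSError if never run
if age > 6 * 3600:
    print('CRITICAL: Puppet last ran %d hours ago' % (age // 3600))
else:
    print('OK: Puppet last run %d minutes ago' % (age // 60))
```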
[13:28:23] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[13:29:13] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[13:29:23] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: Puppet has 1 failures
[13:33:23] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:34:33] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.001 second response time on port 9042
[13:37:02] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[13:37:32] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[13:43:42] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[13:44:44] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:53:13] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:54:32] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[13:57:53] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.000 second response time on port 9042
[13:59:03] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:16:23] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[14:17:56] terbium doesn't have unzip installed?
[14:19:32] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[14:21:53] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:22:23] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[14:22:56] worked around it with python's zipfile module
[14:24:12] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[14:25:22] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[14:27:52] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:29:34] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:31:32] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[14:32:13] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 3.008 second response time on port 9042
[14:33:22] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[14:36:23] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:37:02] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:37:43] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[14:38:12] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[14:43:33] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:44:12] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
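The unzip workaround mentioned above needs nothing beyond the standard library. A sketch of it with a placeholder archive name, since the actual file isn't named in the log:

```python
# Extract an archive with the stdlib zipfile module when the unzip
# binary is unavailable; "archive.zip" and "extracted/" are placeholders.
import zipfile

with zipfile.ZipFile('archive.zip') as zf:
    zf.extractall('extracted/')
```

The same module can also be driven from the shell as `python -m zipfile -e archive.zip extracted/`, which is about the closest drop-in for a missing unzip binary.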
[14:45:52] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[14:47:02] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[14:51:14] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:56:54] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail
[14:59:04] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.004 second response time on port 9042
[15:03:24] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:23:32] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[15:23:32] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[15:23:53] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[15:24:42] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[15:26:02] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[15:29:03] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:30:12] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:33:52] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[15:35:42] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:36:23] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:38:03] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[15:40:32] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.006 second response time on port 9042
[15:41:02] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[15:41:42] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[15:43:33] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:46:02] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[15:46:33] PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: puppet fail
[15:47:02] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[15:47:03] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:52:22] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[15:52:23] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:54:52] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.997 second response time on port 9042
[15:57:52] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[15:57:53] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
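The flapping "Analytics Cassanda CQL query interface" alerts above behave like a plain TCP connect probe against Cassandra's native-protocol port 9042, as the "TCP OK - ... second response time" output suggests. A minimal sketch of such a probe (the hostname is an assumption modeled on other hosts in the log, and this is not the actual Icinga plugin):

```python
# Hypothetical TCP connect probe in the spirit of the check above.
import socket
import time

def check_tcp(host, port, timeout=10.0):
    """Attempt a TCP connection and report response time or a timeout."""
    start = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 'TCP OK - %.3f second response time on port %d' % (
                time.time() - start, port)
    except OSError:
        return 'CRITICAL: Connection timed out'

print(check_tcp('aqs1002.eqiad.wmnet', 9042))  # hostname assumed
```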
[16:00:32] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[16:05:44] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.014 second response time on port 9042
[16:10:02] PROBLEM - Restbase endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:10:34] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:15:22] RECOVERY - puppet last run on wtp2020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:41:42] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[16:44:43] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[16:44:43] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[16:50:02] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:50:02] PROBLEM - Restbase endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:51:44] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[16:54:44] RECOVERY - Restbase endpoints health on aqs1001 is OK: All endpoints are healthy
[16:57:22] PROBLEM - Restbase endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:59:03] RECOVERY - Restbase endpoints health on aqs1002 is OK: All endpoints are healthy
[16:59:43] !log restarting cassandra on aqs1002; was out of heap space
[16:59:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:01:02] RECOVERY - Restbase endpoints health on aqs1003 is OK: All endpoints are healthy
[17:07:03] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.005 second response time on port 9042
[19:13:02] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail
[19:23:42] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: puppet fail
[19:37:33] PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:39:23] RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[19:39:53] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:52:15] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:29:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[20:34:33] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:34:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[20:38:03] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[20:39:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[20:44:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures
[20:49:52] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 99 seconds ago with 0 failures
A few Wikimedians, m" [puppet] - 10https://gerrit.wikimedia.org/r/224214 (owner: 10Alex Monk) [23:29:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:34:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:39:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:44:52] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 1 failures [23:49:52] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 104 seconds ago with 0 failures