[00:11:14] operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1732320 (Tgr) Yes, a couple hours ago. We should write to mediawiki-announce, wait a week or so as a courtesy, and then dro...
[02:30:24] !log l10nupdate@tin Synchronized php-1.27.0-wmf.2/cache/l10n: l10nupdate for 1.27.0-wmf.2 (duration: 08m 09s)
[02:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:35:14] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.2) at 2015-10-17 02:35:14+00:00
[02:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:58:10] !log l10nupdate@tin Synchronized php-1.27.0-wmf.3/cache/l10n: l10nupdate for 1.27.0-wmf.3 (duration: 08m 14s)
[02:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:59:18] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:02:59] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.3) at 2015-10-17 03:02:58+00:00
[03:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:26:37] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:31:58] PROBLEM - MariaDB Slave Lag: s4 on db1019 is CRITICAL: CRITICAL slave_sql_lag Seconds_Behind_Master: 314
[03:33:39] RECOVERY - MariaDB Slave Lag: s4 on db1019 is OK: OK slave_sql_lag Seconds_Behind_Master: 52
[04:24:17] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[04:27:38] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 1 below the confidence bounds
[05:06:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[05:54:12] (PS1) Legoktm: zuul: Add zuul-test-repo helper script [puppet] - https://gerrit.wikimedia.org/r/247031
[06:11:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 16 data above and 9 below the confidence bounds
[06:14:48] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: puppet fail
[06:27:04] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Oct 17 06:27:04 UTC 2015 (duration 27m 3s)
[06:27:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:30:07] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: puppet fail
[06:30:08] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:18] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: puppet fail
[06:30:37] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:38] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:57] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:18] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:28] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:37] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:37] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:37] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:38] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:39] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:48] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:58] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:57] PROBLEM - puppet last run on pybal-test2003 is CRITICAL: CRITICAL: puppet fail
[06:41:57] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:55:58] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:56:28] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:29] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:56:37] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:38] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:56:58] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:57:19] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:57:28] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:47] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:48] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:57:57] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:08] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:58:19] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:38] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:58:48] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[07:04:57] RECOVERY - puppet last run on pybal-test2003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[07:37:58] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[08:06:29] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[08:09:57] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 4 below the confidence bounds
[08:38:29] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 7 below the confidence bounds
[08:40:57] PROBLEM - puppet last run on mw1103 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:43:29] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds
[08:47:18] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[08:48:59] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[09:07:47] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[09:34:18] PROBLEM - HTTP on krypton is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:35:17] PROBLEM - grafana.wikimedia.org on krypton is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:35:18] PROBLEM - grafana-admin.wikimedia.org on krypton is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:48:59] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[09:59:55] (CR) Hashar: "I have a similar shortcut on my laptop, it is definitely useful." [puppet] - https://gerrit.wikimedia.org/r/247031 (owner: Legoktm)
[10:31:59] operations, Mail: Google Mail marking Phabricator and Gerrit notification emails as spam - https://phabricator.wikimedia.org/T115416#1732681 (Nemo_bis) >>! In T115416#1731857, @faidon wrote: >>>! In T115416#1724286, @Nemo_bis wrote: >> Well, gerrit and Phabricator emails are certainly very bad. Multiple bu...
[10:37:57] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 8 below the confidence bounds
[10:38:18] PROBLEM - NTP on krypton is CRITICAL: NTP CRITICAL: No response from NTP server
[11:32:08] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[11:33:48] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[11:38:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 7 below the confidence bounds
[11:42:07] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 6 below the confidence bounds
[11:55:28] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[12:10:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[12:12:28] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail
[12:27:47] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 16 data above and 6 below the confidence bounds
[12:39:38] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:51:18] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[13:06:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds
[13:14:58] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[13:32:18] PROBLEM - YARN NodeManager Node-State on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:39:08] RECOVERY - YARN NodeManager Node-State on analytics1034 is OK: OK: YARN NodeManager analytics1034.eqiad.wmnet:8041 Node-State: RUNNING
[14:05:55] !log reboot krypton, unable to ssh and no console (VM) iowait through the roof
[14:05:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:07:48] RECOVERY - grafana.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 522 bytes in 0.012 second response time
[14:07:48] RECOVERY - grafana-admin.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 534 bytes in 0.011 second response time
[14:08:18] RECOVERY - HTTP on krypton is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 504 bytes in 0.004 second response time
[14:26:47] RECOVERY - NTP on krypton is OK: NTP OK: Offset -0.003682971001 secs
[15:25:53] operations, Wikimedia-General-or-Unknown, Patch-For-Review: Can't see any page, special:RandomPage gives database error - https://phabricator.wikimedia.org/T115505#1732814 (Aklapper) Note: https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28technical%29&oldid=686184426#Blanked_articles...
[16:48:48] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:50:27] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[18:12:18] operations, Traffic, Wikimedia-General-or-Unknown, HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1732880 (yuvipanda) Also where should we do the redirect? I guess any requests to w.wiki/(.*) should redirect to meta.wikimedia.org/w/index.php?ti...
[18:13:29] operations, Traffic, Wikimedia-General-or-Unknown, HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1732881 (yuvipanda) Err, by redirect I mean 'route to' from varnish, not an actual redirect.
[18:30:31] (CR) Hashar: [C: 1] zuul: Add zuul-test-repo helper script [puppet] - https://gerrit.wikimedia.org/r/247031 (owner: Legoktm)
[18:31:29] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[18:31:58] ori: ^ this has been happening at least 2-3 days every day and you did get asked to be poked for it :P
[18:34:57] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[18:37:32] I think I filed a task about it YuviPanda
[18:42:27] PROBLEM - WDQS HTTP on wdqs1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 393 bytes in 0.008 second response time
[18:42:39] PROBLEM - WDQS SPARQL on wdqs1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 393 bytes in 0.001 second response time
[18:43:42] oops sorry didn't disable notifications, this is expected maintenance ^
[19:14:17] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[19:25:58] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.005 second response time on port 9042
[19:31:19] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[19:33:39] PROBLEM - puppet last run on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:33:39] PROBLEM - RAID on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:07] PROBLEM - salt-minion processes on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:07] PROBLEM - Disk space on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:08] PROBLEM - Disk space on Hadoop worker on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:18] PROBLEM - SSH on analytics1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:34:28] PROBLEM - DPKG on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:28] PROBLEM - configured eth on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:38] PROBLEM - YARN NodeManager Node-State on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:39] PROBLEM - Check size of conntrack table on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:48] PROBLEM - Hadoop DataNode on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:34:48] PROBLEM - dhclient process on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:35:08] PROBLEM - Hadoop NodeManager on analytics1034 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:46:33] <_joe_> uhm, can someone else take a look? I'm basically sleeping
[19:47:58] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.006 second response time on port 9042
[19:58:58] I'll take a look
[20:02:39] !log powercycle analytics1034, no console no ssh
[20:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:03:17] PROBLEM - NTP on analytics1034 is CRITICAL: NTP CRITICAL: No response from NTP server
[20:05:27] RECOVERY - Hadoop NodeManager on analytics1034 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[20:05:48] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 35 minutes ago with 0 failures
[20:05:48] RECOVERY - RAID on analytics1034 is OK: OK: optimal, 13 logical, 14 physical
[20:06:08] RECOVERY - Disk space on analytics1034 is OK: DISK OK
[20:06:08] RECOVERY - salt-minion processes on analytics1034 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:06:08] RECOVERY - Disk space on Hadoop worker on analytics1034 is OK: DISK OK
[20:06:28] RECOVERY - SSH on analytics1034 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[20:06:37] RECOVERY - DPKG on analytics1034 is OK: All packages OK
[20:06:37] RECOVERY - configured eth on analytics1034 is OK: OK - interfaces up
[20:06:38] RECOVERY - YARN NodeManager Node-State on analytics1034 is OK: OK: YARN NodeManager analytics1034.eqiad.wmnet:8041 Node-State: RUNNING
[20:06:48] RECOVERY - Check size of conntrack table on analytics1034 is OK: OK: nf_conntrack is 0 % full
[20:06:57] RECOVERY - Hadoop DataNode on analytics1034 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[20:06:58] RECOVERY - dhclient process on analytics1034 is OK: PROCS OK: 0 processes with command name dhclient
[20:09:58] RECOVERY - NTP on analytics1034 is OK: NTP OK: Offset 0.00170981884 secs
[20:10:49] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail
[20:23:08] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[20:26:38] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[20:36:18] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[21:02:37] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[21:05:49] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.003 second response time on port 9042
[21:10:57] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[21:14:18] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.997 second response time on port 9042
[21:28:02] (PS1) Yuvipanda: k8s: Turn off verbose logging for kube-proxy [puppet] - https://gerrit.wikimedia.org/r/247057
[21:29:08] (PS2) Yuvipanda: k8s: Turn off verbose logging for kube-proxy [puppet] - https://gerrit.wikimedia.org/r/247057
[21:29:18] (CR) Yuvipanda: [C: 2 V: 2] k8s: Turn off verbose logging for kube-proxy [puppet] - https://gerrit.wikimedia.org/r/247057 (owner: Yuvipanda)
[21:39:39] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:41:17] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[21:43:42] (CR) JanZerebecki: [C: 1] "I use this all the time, from your home dir :) , this is a better place for it." [puppet] - https://gerrit.wikimedia.org/r/247031 (owner: Legoktm)
[23:02:48] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: puppet fail
[23:17:38] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[23:19:18] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[23:31:09] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[23:31:27] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out
[23:34:39] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.001 second response time on port 9042
[23:39:47] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection refused
[23:39:50] PROBLEM - Analytics Cassandra database on aqs1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon
[23:42:58] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[23:44:37] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[23:49:28] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[23:56:27] RECOVERY - Analytics Cassandra database on aqs1002 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon