[00:00:00] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[00:05:09] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[00:05:09] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[00:05:19] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:05:59] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[00:06:19] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[00:06:19] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures
[00:06:38] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[00:06:39] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[00:06:39] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[00:06:39] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[00:06:51] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[00:15:49] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:15:49] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:09] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:10] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:29] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:29] PROBLEM - Hadoop DataNode on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:29] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:30] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:40] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:59] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:16:59] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:17:08] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:27:19] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[00:27:19] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[00:27:48] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[00:27:48] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures
[00:27:59] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[00:27:59] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[00:27:59] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[00:28:00] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[00:28:18] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[00:28:28] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[00:28:28] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[00:28:39] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:15:29] !log applied Ibd302e1 to terbium for debugging broken wikidata rdf dumps
[01:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:21:38] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:21:38] PROBLEM - Hadoop DataNode on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:21:48] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:22:40] (PS1) Alex Monk: labs IP aliasing: Print error and continue when not able to get instances for a project [puppet] - https://gerrit.wikimedia.org/r/286263 (https://phabricator.wikimedia.org/T133946)
[01:23:58] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:27:28] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:27:49] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:27:57] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:17] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:18] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:47] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:57] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:28:57] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:30:40] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[01:30:47] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[01:30:47] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 34 minutes ago with 0 failures
[01:35:50] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:37:41] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:39:01] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[01:44:50] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:45:01] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:46:50] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[01:50:02] Operations, Labs, Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2253068 (Krenair) While looking at {T99072} I found another: ```krenair@tools-bastion-03:~$ host 10.68.16.97 97.16.68.10.in-addr.arpa domain name pointer ci-jessi...
[01:52:50] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:54:41] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[01:55:10] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 59 minutes ago with 0 failures
[01:55:10] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[01:55:11] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: CRITICAL: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: LOST
[01:55:11] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[01:55:32] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[01:55:32] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:00:41] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:20] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:20] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:20] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:21] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:50] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:01:50] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:08:01] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 227, down: 0, dormant: 0, excluded: 0, unused: 0
[02:11:30] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[02:11:30] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[02:11:30] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:11:30] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[02:11:40] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[02:11:51] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[02:12:11] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[02:12:20] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[02:12:51] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 17 minutes ago with 0 failures
[02:12:51] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[02:13:00] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[02:17:22] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:20:40] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: CRITICAL: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: LOST
[02:20:50] PROBLEM - Apache HTTP on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:02] PROBLEM - configured eth on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:22:11] PROBLEM - HHVM rendering on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:31] PROBLEM - RAID on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:22:51] PROBLEM - nutcracker port on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:10] PROBLEM - Check size of conntrack table on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:11] PROBLEM - nutcracker process on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:21] PROBLEM - SSH on mw1134 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:23:22] PROBLEM - DPKG on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:32] PROBLEM - Disk space on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:32] PROBLEM - puppet last run on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:51] PROBLEM - HHVM processes on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:51] PROBLEM - dhclient process on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:23:52] PROBLEM - salt-minion processes on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:11] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:11] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:11] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:21] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:25:40] PROBLEM - Hadoop DataNode on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:00] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:11] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:40] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:40] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:40] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:26:41] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:27:01] RECOVERY - nutcracker process on mw1134 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[02:28:00] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80759 MB (15% inode=99%)
[02:28:23] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[02:28:23] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 33 minutes ago with 0 failures
[02:28:30] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[02:28:31] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[02:28:51] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:28:51] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[02:28:51] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[02:28:52] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[02:29:02] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[02:29:30] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[02:29:42] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[02:29:51] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[02:33:11] PROBLEM - nutcracker process on mw1134 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:41] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:41] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:41] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:51] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:11] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:11] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:11] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:11] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:21] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:41] PROBLEM - Hadoop DataNode on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:00] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:11] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:36:20] RECOVERY - RAID on mw1134 is OK: OK: no RAID installed
[02:36:32] RECOVERY - nutcracker port on mw1134 is OK: TCP OK - 0.000 second response time on port 11212
[02:36:32] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[02:36:32] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[02:36:40] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 41 minutes ago with 0 failures
[02:36:40] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[02:36:41] RECOVERY - Apache HTTP on mw1134 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.821 second response time
[02:36:51] RECOVERY - Check size of conntrack table on mw1134 is OK: OK: nf_conntrack is 0 % full
[02:37:00] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:37:00] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[02:37:00] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[02:37:00] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[02:37:00] RECOVERY - nutcracker process on mw1134 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[02:37:02] RECOVERY - SSH on mw1134 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[02:37:10] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[02:37:11] RECOVERY - DPKG on mw1134 is OK: All packages OK
[02:37:20] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 44 minutes ago with 0 failures
[02:37:20] RECOVERY - Disk space on mw1134 is OK: DISK OK
[02:37:22] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[02:37:41] RECOVERY - dhclient process on mw1134 is OK: PROCS OK: 0 processes with command name dhclient
[02:37:41] RECOVERY - HHVM processes on mw1134 is OK: PROCS OK: 6 processes with command name hhvm
[02:37:42] RECOVERY - salt-minion processes on mw1134 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:37:42] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[02:37:43] RECOVERY - configured eth on mw1134 is OK: OK - interfaces up
[02:38:00] RECOVERY - HHVM rendering on mw1134 is OK: HTTP OK: HTTP/1.1 200 OK - 66385 bytes in 0.337 second response time
[02:39:52] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[03:49:45] (PS1) Papaul: DHCP: changing the install to trusty to test since jessie is not detecting the disks Bug: T132976 [puppet] - https://gerrit.wikimedia.org/r/286266 (https://phabricator.wikimedia.org/T132976)
[03:52:26] (Abandoned) Papaul: DHCP: changing the install to trusty to test since jessie is not detecting the disks Bug: T132976 [puppet] - https://gerrit.wikimedia.org/r/286266 (https://phabricator.wikimedia.org/T132976) (owner: Papaul)
[04:03:00] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:09] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:21] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:30] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:49] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:59] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:03:59] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:05:20] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures
[04:05:31] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[04:05:39] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[04:06:00] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:06:19] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[04:06:39] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[04:06:50] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[04:50:20] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:50:20] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:50:40] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:50:41] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:52:09] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[04:52:09] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:52:20] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[04:52:30] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[05:16:18] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80692 MB (15% inode=99%)
[05:23:57] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80735 MB (15% inode=99%)
[05:54:47] RECOVERY - Disk space on elastic1017 is OK: DISK OK
[05:59:05] (CR) Glaisher: add jamwiki to langlist, InitialiseSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: Dzahn)
[06:16:55] !log restarting elasticsearch server elastic1028.eqiad.wmnet (T110236)
[06:16:56] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[06:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:29:57] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: puppet fail
[06:30:38] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:58] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail
[06:32:27] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:46] !log restarting elasticsearch server elastic1029.eqiad.wmnet (T110236)
[06:32:47] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[06:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:33:27] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:43:47] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 225, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]
[06:47:37] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 227, down: 0, dormant: 0, excluded: 0, unused: 0
[06:56:27] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:56:37] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:47] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:57:26] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:58:17] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:15:55] !log restarting elasticsearch server elastic1030.eqiad.wmnet (T110236)
[07:15:56] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[07:16:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:42:59] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 225, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]
[07:44:51] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 227, down: 0, dormant: 0, excluded: 0, unused: 0
[07:52:40] PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:54:29] RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[08:28:16] !log restarting elasticsearch server elastic1031.eqiad.wmnet (T110236)
[08:28:17] T110236: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236
[08:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:40:31] (CR) Alex Monk: "also needs to be in various dblists" [mediawiki-config] - https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: Dzahn)
[10:08:33] PROBLEM - puppet last run on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:08:42] PROBLEM - salt-minion processes on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:02] PROBLEM - DPKG on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:02] PROBLEM - Disk space on Hadoop worker on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:03] PROBLEM - YARN NodeManager Node-State on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:14] PROBLEM - Disk space on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:15] PROBLEM - Hadoop DataNode on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:23] PROBLEM - RAID on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:33] PROBLEM - Hadoop NodeManager on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:09:43] PROBLEM - configured eth on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:10:12] PROBLEM - dhclient process on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:10:22] PROBLEM - Check size of conntrack table on analytics1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:12:43] RECOVERY - DPKG on analytics1047 is OK: All packages OK
[10:12:43] RECOVERY - Disk space on Hadoop worker on analytics1047 is OK: DISK OK
[10:13:13] RECOVERY - Hadoop NodeManager on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[10:13:44] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table bawiktionary.hitcounter doesnt exist on query. Default database: information_schema. Query: DELETE FROM bawiktionary.hitcounter
[10:13:54] RECOVERY - Check size of conntrack table on analytics1047 is OK: OK: nf_conntrack is 0 % full
[10:14:12] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures
[10:14:54] RECOVERY - Disk space on analytics1047 is OK: DISK OK
[10:14:58] * volans looking at dbstore1001 ^^^
[10:17:03] RECOVERY - RAID on analytics1047 is OK: OK: optimal, 13 logical, 14 physical
[10:17:13] RECOVERY - configured eth on analytics1047 is OK: OK - interfaces up
[10:17:43] RECOVERY - dhclient process on analytics1047 is OK: PROCS OK: 0 processes with command name dhclient
[10:18:12] RECOVERY - salt-minion processes on analytics1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:18:34] RECOVERY - YARN NodeManager Node-State on analytics1047 is OK: OK: YARN NodeManager analytics1047.eqiad.wmnet:8041 Node-State: RUNNING
[10:18:43] RECOVERY - Hadoop DataNode on analytics1047 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[10:19:00] Operations, Discovery, Discovery-Search-Sprint, Elasticsearch, Patch-For-Review: Use unicast instead of multicast for node communication - https://phabricator.wikimedia.org/T110236#2253327 (Gehel) First restart to enable unicast completed on eqiad and codfw. Second restart to come...
[10:19:34] !log restarted slave on dbstore1001 skipping missing database T132837
[10:19:35] T132837: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837
[10:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:21:33] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[10:25:15] I'll expect it to break again later for the other missing ones, I'll keep an eye on it to fix it
[10:30:41] that was my fault, I reimaged db1038 on thursday
[10:31:08] yep, no problem :)
[10:45:30] !log Reset slave on sanitarium:3311 due to corrupted relay log after skipping query for duplicate key T132416
[10:45:31] T132416: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416
[10:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:47:47] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2252547 (Mtherwjs) jam.wikipedia.org redirected https://incubator.wikimedia.org/w/index.php?title=Wp/jam/Mien_Piej&redirectfrom=infopage
[11:02:36] Operations, DBA: Implement slave_run_triggers_for_rbr at sanitarium for labs filtering - https://phabricator.wikimedia.org/T121207#2253354 (Volans)
[11:40:02] PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table zh_cnwiki.hitcounter doesnt exist on query. Default database: information_schema. Query: DELETE FROM zh_cnwiki.hitcounter
[11:44:33] already fixed, was the last one...
[11:45:43] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[13:41:58] !log disabled puppet on analytics1047 and scheduled downtime for the host, IO errors in the dmesg for /dev/sdd. Stopped also Hadoop daemons to remove it from the cluster temporarily (not sure how to do it properly, will write docs).
[13:42:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:55:46] Operations, ops-eqiad, DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2253604 (elukey)
[14:15:48] (CR) Dereckson: add jamwiki to langlist, InitialiseSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: Dzahn)
[14:35:34] Puppet, Beta-Cluster-Infrastructure: Set up puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#2253666 (Aklapper)
[14:55:45] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253697 (Mtherwjs)
[15:02:58] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 660 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5211633 keys - replication_delay is 660
[15:08:47] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5202858 keys - replication_delay is 0
[15:30:02] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253719 (Glaisher) Could someone provide the translations for the namespace names? If possible, please provide them in the format below with the E...
[15:40:07] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 13Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253738 (10Dereckson)
[15:46:57] (03PS1) 10Dereckson: Add jam.wikipedia to RESTBase and Labs dnsrecursor [puppet] - 10https://gerrit.wikimedia.org/r/286278 (https://phabricator.wikimedia.org/T134017)
[15:48:07] (03CR) 10jenkins-bot: [V: 04-1] Add jam.wikipedia to RESTBase and Labs dnsrecursor [puppet] - 10https://gerrit.wikimedia.org/r/286278 (https://phabricator.wikimedia.org/T134017) (owner: 10Dereckson)
[15:53:39] (03PS2) 10Dereckson: Add jam.wikipedia to RESTBase and Labs dnsrecursor [puppet] - 10https://gerrit.wikimedia.org/r/286278 (https://phabricator.wikimedia.org/T134017)
[16:07:26] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 688 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5211203 keys - replication_delay is 688
[16:08:15] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 13Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2252547 (10Dereckson) Another l10n effort is needed for the upload wizard: https://commons.wikimedia.org/wiki/Special:UploadWizard?uselang=jam
[16:13:29] (03PS5) 10Dereckson: Initialize configuration for jam.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: 10Dzahn)
[16:15:16] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5206022 keys - replication_delay is 0
[16:15:35] (03CR) 10Dereckson: "PS5: HD logos, removed regular l10n namespaces, dblists" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: 10Dzahn)
[16:16:05] (03CR) 10Dereckson: "Changes planned. I forgot the HD logos in InitialiseSettings.php." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: 10Dzahn)
[16:17:23] (03PS6) 10Dereckson: Initialize configuration for jam.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: 10Dzahn)
[16:17:56] (03CR) 10Dereckson: "PS6: HD logo in config too (yes for optipng)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286258 (https://phabricator.wikimedia.org/T134017) (owner: 10Dzahn)
[16:20:25] moar jam
[16:24:20] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 13Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253783 (10MF-Warburg) I'm fine with requiring the namespace translations (is there no way anymore to translate them through twn?), but the upload w...
[16:32:17] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 13Patch-For-Review: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2253793 (10Dereckson) Sure, we agree, this is not a blocker.
[18:09:38] PROBLEM - aqs endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:10:09] PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:10:19] PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:50] RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:12:09] RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[18:15:18] RECOVERY - aqs endpoints health on aqs1001 is OK: All endpoints are healthy
[18:17:38] PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: /pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end} (Get per article page views) is CRITICAL: Test Get per article page views returned the unexpected status 500 (expecting: 200)
[18:19:28] RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:35:16] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: puppet fail
[19:00:07] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[19:27:06] (03PS1) 10Urbanecm: Add interface editor user group on pswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286285 (https://phabricator.wikimedia.org/T133472)
[19:38:38] (03PS1) 10Urbanecm: Enable user signature in VE in plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286286 (https://phabricator.wikimedia.org/T133978)
[20:19:30] (03PS1) 10Urbanecm: Enable Visual Editor on all namespaces of plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286287 (https://phabricator.wikimedia.org/T133980)
[20:19:49] (03CR) 10jenkins-bot: [V: 04-1] Enable Visual Editor on all namespaces of plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286287 (https://phabricator.wikimedia.org/T133980) (owner: 10Urbanecm)
[20:23:28] (03PS2) 10Urbanecm: Enable Visual Editor on all namespaces of plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/286287 (https://phabricator.wikimedia.org/T133980)
[20:56:22] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[20:58:22] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5222062 keys - replication_delay is 0
[23:49:03] PROBLEM - DPKG on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:49:11] PROBLEM - Disk space on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:49:22] PROBLEM - RAID on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:49:41] PROBLEM - configured eth on furud is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[23:49:42] PROBLEM - puppet last run on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:50:01] PROBLEM - dhclient process on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:50:32] PROBLEM - salt-minion processes on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:50:33] PROBLEM - Check size of conntrack table on furud is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.