[00:43:35] <grrrit-wm>	 (PS35) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[00:46:37] <grrrit-wm>	 (PS36) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[00:48:04] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for LabeledSectionTransclusion [mediawiki-config] - https://gerrit.wikimedia.org/r/281237 (https://phabricator.wikimedia.org/T119117)
[00:50:11] <grrrit-wm>	 (PS37) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[00:52:40] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for SpamBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/281239 (https://phabricator.wikimedia.org/T119117)
[00:57:28] <grrrit-wm>	 (PS38) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[01:02:18] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for TitleBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/281240 (https://phabricator.wikimedia.org/T119117)
[01:10:47] <icinga-wm>	 PROBLEM - Apache HTTP on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:10:57] <icinga-wm>	 PROBLEM - HHVM rendering on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:11:07] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:07] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:26] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:11:26] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:59] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:07] <icinga-wm>	 PROBLEM - RAID on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:26] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:26] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:38] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:56] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:56] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:13:52] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for Quiz [mediawiki-config] - https://gerrit.wikimedia.org/r/281242 (https://phabricator.wikimedia.org/T119117)
[01:14:38] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for FundraisingTranslateWorkflow [mediawiki-config] - https://gerrit.wikimedia.org/r/281243 (https://phabricator.wikimedia.org/T119117)
[01:17:17] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for Gadgets [mediawiki-config] - https://gerrit.wikimedia.org/r/281244 (https://phabricator.wikimedia.org/T119117)
[01:26:38] <grrrit-wm>	 (PS39) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[01:35:57] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[01:36:27] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 65.52% of data above the critical threshold [5000000.0]
[01:36:38] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[01:36:38] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[01:36:47] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[01:39:35] <grrrit-wm>	 (PS40) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[01:41:49] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:42:27] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:42:46] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:42:46] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:44:36] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[01:44:37] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[01:44:37] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[01:44:57] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[01:44:57] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[01:46:51] <grrrit-wm>	 (PS41) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[01:48:58] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:48:58] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[01:49:39] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[01:50:51] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for MwEmbedSupport [mediawiki-config] - https://gerrit.wikimedia.org/r/281246 (https://phabricator.wikimedia.org/T119117)
[01:53:47] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[01:53:48] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[01:54:54] <grrrit-wm>	 (PS42) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403
[01:55:29] <icinga-wm>	 RECOVERY - RAID on mw1114 is OK: OK: no RAID installed
[02:21:17] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[02:34:07] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:07] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:36] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:47] <icinga-wm>	 PROBLEM - RAID on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:06] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:06] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:18] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:36] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:37] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:47] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:47] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:38:19] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[02:38:57] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[02:39:17] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[02:39:26] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[02:39:26] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[02:39:26] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[02:39:47] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[02:39:47] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[02:40:37] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[02:40:46] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[02:44:17] <icinga-wm>	 RECOVERY - RAID on mw1114 is OK: OK: no RAID installed
[02:49:57] <icinga-wm>	 PROBLEM - RAID on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:16] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:16] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:37] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:56] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:56] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:58] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:50:58] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:51:16] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:51:16] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:51:38] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:14:35] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.19) (duration: 56m 43s)
[03:14:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:14:46] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[03:14:46] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[03:15:07] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[03:15:26] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[03:15:26] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[03:15:26] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[03:15:27] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[03:15:37] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:15:37] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[03:16:06] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[03:16:26] <icinga-wm>	 RECOVERY - RAID on mw1114 is OK: OK: no RAID installed
[03:22:26] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:22:26] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:22:37] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:22:37] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:22:47] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:23:27] <icinga-wm>	 PROBLEM - RAID on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:23:38] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:23:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:24:06] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:24:36] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:24:56] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:46:16] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[03:46:17] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[03:46:38] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[03:46:46] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[03:47:06] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[03:47:08] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[03:47:16] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[03:47:17] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[03:47:17] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[03:47:26] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:47:26] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[03:47:47] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[03:48:07] <icinga-wm>	 RECOVERY - RAID on mw1114 is OK: OK: no RAID installed
[03:52:38] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:52:38] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:52:47] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:52:47] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:53:16] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:53:28] <icinga-wm>	 PROBLEM - RAID on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:53:46] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:53:47] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:54:07] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:54:26] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:54:26] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:01:21] <grrrit-wm>	 (PS43) Ladsgroup: ores: Scap3 deployment configurations [puppet] - https://gerrit.wikimedia.org/r/280403 (https://phabricator.wikimedia.org/T130404)
[04:09:26] <icinga-wm>	 PROBLEM - RAID on db1052 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[04:18:26] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[04:18:26] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[04:18:46] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[04:18:57] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[04:18:57] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[04:18:58] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[04:18:58] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[04:19:07] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:19:07] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[04:19:36] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[04:23:46] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:23:46] <icinga-wm>	 PROBLEM - configured eth on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:07] <icinga-wm>	 PROBLEM - HHVM processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:27] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:27] <icinga-wm>	 PROBLEM - nutcracker port on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:37] <icinga-wm>	 PROBLEM - salt-minion processes on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:24:37] <icinga-wm>	 PROBLEM - SSH on mw1114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:24:57] <icinga-wm>	 PROBLEM - Disk space on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:26:08] <icinga-wm>	 PROBLEM - nutcracker process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:26:08] <icinga-wm>	 PROBLEM - DPKG on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:33:26] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[04:38:47] <icinga-wm>	 PROBLEM - dhclient process on mw1114 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:45:27] <icinga-wm>	 PROBLEM - puppet last run on pybal-test2001 is CRITICAL: CRITICAL: puppet fail
[04:50:06] <icinga-wm>	 RECOVERY - RAID on mw1114 is OK: OK: no RAID installed
[04:50:07] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1114 is OK: OK: nf_conntrack is 0 % full
[04:50:07] <icinga-wm>	 RECOVERY - configured eth on mw1114 is OK: OK - interfaces up
[04:50:37] <icinga-wm>	 RECOVERY - HHVM processes on mw1114 is OK: PROCS OK: 25 processes with command name hhvm
[04:50:48] <icinga-wm>	 RECOVERY - nutcracker process on mw1114 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[04:50:48] <icinga-wm>	 RECOVERY - DPKG on mw1114 is OK: All packages OK
[04:50:57] <icinga-wm>	 RECOVERY - dhclient process on mw1114 is OK: PROCS OK: 0 processes with command name dhclient
[04:50:57] <icinga-wm>	 RECOVERY - nutcracker port on mw1114 is OK: TCP OK - 0.000 second response time on port 11212
[04:50:57] <icinga-wm>	 RECOVERY - salt-minion processes on mw1114 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:50:57] <icinga-wm>	 RECOVERY - SSH on mw1114 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[04:51:26] <icinga-wm>	 RECOVERY - Disk space on mw1114 is OK: DISK OK
[05:13:37] <icinga-wm>	 RECOVERY - puppet last run on pybal-test2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:29:37] <icinga-wm>	 PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:47] <icinga-wm>	 PROBLEM - Ubuntu mirror in sync with upstream on carbon is CRITICAL: /srv/mirrors/ubuntu is over 12 hours old.
[06:30:27] <icinga-wm>	 PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:28] <icinga-wm>	 PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:47] <icinga-wm>	 PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: puppet fail
[06:31:06] <icinga-wm>	 PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:37] <icinga-wm>	 PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:37] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:38] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:34:38] <icinga-wm>	 PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:46] <icinga-wm>	 PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:57] <icinga-wm>	 PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:43:57] <icinga-wm>	 RECOVERY - Ubuntu mirror in sync with upstream on carbon is OK: /srv/mirrors/ubuntu is over 0 hours old.
[06:49:59] <grrrit-wm>	 (PS1) 01tonythomas: Add Newsletter extension to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/281249 (https://phabricator.wikimedia.org/T127297)
[06:55:37] <grrrit-wm>	 (CR) Addshore: [C: 1] Add Newsletter extension to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/281249 (https://phabricator.wikimedia.org/T127297) (owner: 01tonythomas)
[06:55:57] <icinga-wm>	 RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:37] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:56:58] <icinga-wm>	 RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:06] <icinga-wm>	 RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:18] <icinga-wm>	 RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:57:27] <icinga-wm>	 RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:46] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:57:47] <icinga-wm>	 RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:58] <icinga-wm>	 RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:37] <icinga-wm>	 RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:33:08] <icinga-wm>	 PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: puppet fail
[08:01:26] <icinga-wm>	 RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:10:32] <icinga-wm>	 ACKNOWLEDGEMENT - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - The requested table is empty or does not exist Faidon Liambotis Not set up yet
[08:10:58] <icinga-wm>	 RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 0.131 second response time
[08:11:16] <icinga-wm>	 RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 64613 bytes in 0.326 second response time
[08:31:00] <grrrit-wm>	 (PS1) Jforrester: Enable VisualEditor on the Project ('Wikipedya') of htwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281263 (https://phabricator.wikimedia.org/T130177)
[08:37:57] <icinga-wm>	 RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[08:52:05] <grrrit-wm>	 (PS1) Merlijn van Deen: Install python-numpy and python-pandas [puppet] - https://gerrit.wikimedia.org/r/281271
[08:52:29] <valhallasw`cloud>	 YuviPanda: ^
[08:53:42] <grrrit-wm>	 (PS2) Yuvipanda: Install python-numpy and python-pandas [puppet] - https://gerrit.wikimedia.org/r/281271 (owner: Merlijn van Deen)
[08:53:49] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] Install python-numpy and python-pandas [puppet] - https://gerrit.wikimedia.org/r/281271 (owner: Merlijn van Deen)
[08:53:51] <valhallasw`cloud>	 <3
[08:54:21] <YuviPanda>	 valhallasw`cloud: :D thanks! going to bed really soon tho
[08:54:32] <valhallasw`cloud>	 there's probably some opsen here that I can get to revert stuff
[08:58:15] * YuviPanda nods
[08:58:23] <YuviPanda>	 valhallasw`cloud: maybe one day we can get bd808 root
[08:58:59] <bd808>	 YuviPanda: heh. the last time I asked that was a very loud NO! but that was a couple of years ago
[09:01:11] <YuviPanda>	 :)
[09:03:28] <valhallasw`cloud>	 puppet is horribly slow
[09:03:38] <valhallasw`cloud>	 or maybe it's just all of the exec hosts 
[09:03:42] <valhallasw`cloud>	 or nfs, or all of them :{
[09:06:33] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Tracking: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) (tracking) - https://phabricator.wikimedia.org/T10217#2174690 (Aklapper) p:High>Normal The [[ https://www....
[09:14:39] <grrrit-wm>	 (PS3) Mobrovac: Scap3: chown the target root dir if owned by root [puppet] - https://gerrit.wikimedia.org/r/279415
[09:25:58] <valhallasw`cloud>	 YuviPanda: what on earth. Puppet is still running.
[09:26:28] <grrrit-wm>	 (PS4) Mobrovac: Scap3: chown the target root dir if owned by root [puppet] - https://gerrit.wikimedia.org/r/279415
[09:34:39] <grrrit-wm>	 (CR) Mobrovac: "https://puppet-compiler.wmflabs.org/2284/ is happy" [puppet] - https://gerrit.wikimedia.org/r/279415 (owner: Mobrovac)
[10:14:18] <grrrit-wm>	 (PS2) Jforrester: Enable VisualEditor Beta Feature on Wikisources, Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/280828
[10:14:20] <grrrit-wm>	 (PS1) Jforrester: Enable VisualEditor Beta Feature on Labs enwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/281283
[10:15:08] <James_F>	 RoanKattouw: ^^
[10:15:29] <grrrit-wm>	 (CR) Catrope: [C: 2] Enable VisualEditor Beta Feature on Labs enwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/281283 (owner: Jforrester)
[10:16:12] <grrrit-wm>	 (Merged) jenkins-bot: Enable VisualEditor Beta Feature on Labs enwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/281283 (owner: Jforrester)
[10:24:17] <grrrit-wm>	 (PS15) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[10:26:28] <grrrit-wm>	 (PS5) Mobrovac: Scap3: chown the target root dir if owned by root [puppet] - https://gerrit.wikimedia.org/r/279415
[10:27:21] <grrrit-wm>	 (CR) Mobrovac: "@Ottomata, done in PS15" [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371) (owner: Mobrovac)
[10:32:36] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[10:33:06] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[10:33:28] <icinga-wm>	 PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 25.93% of data above the critical threshold [100000000.0]
[10:37:43] <paravoid>	 RoanKattouw: ^^^^
[10:37:53] <RoanKattouw>	 Oh, oops, sorry
[10:37:54] <RoanKattouw>	 Will fix
[10:39:37] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[10:57:15] <wikibugs>	 Operations, ops-eqiad: db1052 degraded RAID - https://phabricator.wikimedia.org/T131701#2175325 (Volans)
[10:58:08] <icinga-wm>	 ACKNOWLEDGEMENT - RAID on db1052 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) Volans T131701
[10:58:27] <icinga-wm>	 RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[11:22:07] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:26:28] <grrrit-wm>	 (PS1) Ori.livneh: Load the Newsletter extension on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/281288
[11:29:17] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:31:30] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] "beta only" [mediawiki-config] - https://gerrit.wikimedia.org/r/281288 (owner: Ori.livneh)
[11:31:55] <grrrit-wm>	 (Merged) jenkins-bot: Load the Newsletter extension on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/281288 (owner: Ori.livneh)
[11:33:57] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[11:34:08] <logmsgbot>	 !log ori@tin Synchronized wmf-config/CommonSettings-labs.php: I3ffe65b8: Load the Newsletter extension on the beta cluster (1/2) (duration: 00m 34s)
[11:34:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:34:42] <logmsgbot>	 !log ori@tin Synchronized wmf-config/InitialiseSettings-labs.php: I3ffe65b8: Load the Newsletter extension on the beta cluster (2/2) (duration: 00m 33s)
[11:34:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:35:04] <wikibugs>	 Operations, ops-codfw, DC-Ops: db2018 failed disk (degraded RAID) - https://phabricator.wikimedia.org/T128057#2062646 (Volans) @Papaul @RobH: any news on this?  In particular for **db2017** (failed), **db2018** (failed) and **db2023** (predicted failure), that are masters in codfw, it would be better t...
[11:35:16] <grrrit-wm>	 (PS1) Ori.livneh: Also add Newsletter to extension-list-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/281291
[11:35:39] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] Also add Newsletter to extension-list-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/281291 (owner: Ori.livneh)
[11:36:11] <grrrit-wm>	 (Merged) jenkins-bot: Also add Newsletter to extension-list-labs [mediawiki-config] - https://gerrit.wikimedia.org/r/281291 (owner: Ori.livneh)
[11:37:02] <logmsgbot>	 !log ori@tin Synchronized wmf-config/extension-list-labs: I0d081186: Also add Newsletter to extension-list-labs (duration: 00m 27s)
[11:37:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:45:07] <icinga-wm>	 PROBLEM - cassandra-a service on restbase2004 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[11:45:26] <icinga-wm>	 PROBLEM - cassandra-b service on restbase2004 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed
[11:45:47] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.192.32.137:9042 on restbase2004 is CRITICAL: Connection refused
[12:02:57] <icinga-wm>	 RECOVERY - Disk space on restbase2004 is OK: DISK OK
[12:02:57] <icinga-wm>	 RECOVERY - cassandra-a service on restbase2004 is OK: OK - cassandra-a is active
[12:03:17] <icinga-wm>	 RECOVERY - cassandra-b service on restbase2004 is OK: OK - cassandra-b is active
[12:03:37] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.192.32.137:9042 on restbase2004 is OK: TCP OK - 0.040 second response time on port 9042
[12:10:37] <wikibugs>	 Operations, Project-Admins: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#2175693 (Aklapper) Open>Resolved Resolving as per last comment.
[12:14:41] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Tracking: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) (tracking) - https://phabricator.wikimedia.org/T10217#2175710 (Kaihsu) It has not suddenly become urgent. For th...
[12:29:00] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for ExtensionDistributor [mediawiki-config] - https://gerrit.wikimedia.org/r/281303 (https://phabricator.wikimedia.org/T119117)
[12:35:25] <wikibugs>	 Operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Tracking: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) (tracking) - https://phabricator.wikimedia.org/T10217#2175793 (Liuxinyu970226) >>! In T10217#2175710, @Kaihsu wr...
[12:49:34] <wikibugs>	 Operations, ops-eqiad: db1052 degraded RAID - https://phabricator.wikimedia.org/T131701#2175871 (Cmjohnson) a:Cmjohnson
[12:50:09] <grrrit-wm>	 (PS1) Jforrester: Let Beta Cluster Commons do upload-from-URL from production Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/281307
[12:53:24] <grrrit-wm>	 (CR) Reedy: "Do we need to set wgCopyUploadProxy?" [mediawiki-config] - https://gerrit.wikimedia.org/r/281307 (owner: Jforrester)
[12:58:57] <grrrit-wm>	 (CR) Jforrester: "I don't know." [mediawiki-config] - https://gerrit.wikimedia.org/r/281307 (owner: Jforrester)
[13:01:02] <grrrit-wm>	 (CR) Reedy: [C: 2] "One way to find out" [mediawiki-config] - https://gerrit.wikimedia.org/r/281307 (owner: Jforrester)
[13:01:13] <Reedy>	 |
[13:01:13] * Reedy reserves the right to say "I told you so"
[13:01:26] <grrrit-wm>	 (Merged) jenkins-bot: Let Beta Cluster Commons do upload-from-URL from production Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/281307 (owner: Jforrester)
[13:01:48] <grrrit-wm>	 (PS1) Dereckson: Use extension registration for GlobalBlocking [mediawiki-config] - https://gerrit.wikimedia.org/r/281311 (https://phabricator.wikimedia.org/T119117)
[13:02:36] <logmsgbot>	 !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: labs copy upload thing. noooooop for prod (duration: 00m 28s)
[13:02:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:07:23] <James_F>	 Reedy: Uh-huh.
[13:08:46] <James_F>	 Reedy: "Copy uploads are not available from this domain."
[13:08:55] <Reedy>	 Whitelist?
[13:09:08] <Reedy>	 Oh
[13:09:12] <Reedy>	 proxy being stupid?
[13:09:21] <James_F>	 Probably.
[13:11:43] <Reedy>	 James_F: I think it's explicitly blacklisted
[13:11:52] <Luke081515>	 Hello, can someone tell me, which name the coursecoordinator group has?
[13:12:00] <Luke081515>	 I need it for a patch
[13:14:36] <James_F>	 Reedy: But it's specified for testwiki in prod…
[13:14:43] <Dereckson>	 Luke081515: ep-coordinator the group
[13:14:45] * James_F checks if it works there or is just bitrot.
[13:15:01] <Dereckson>	 Luke081515: see https://www.mediawiki.org/wiki/Extension:Education_Program/Preferences#Course_coordinators
[13:15:05] <Luke081515>	 thanks
[13:16:32] <James_F>	 Reedy: It works in prod. https://test.wikipedia.org/wiki/File:VisualEditor-logo.svg just now…
[13:16:51] <Reedy>	 James_F: See if flickr type thing works?
[13:16:57] <James_F>	 Kk.
[13:17:01] <Reedy>	 Just to see if it's not just internal stuff
[13:19:38] <James_F>	 Gah. Slow Internet is slow.
[13:20:39] <grrrit-wm>	 (PS1) Luke081515: Two permission changes at cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281314 (https://phabricator.wikimedia.org/T131684)
[13:20:42] <James_F>	 Reedy: Flickr via UploadWizard seemed to work fine (it errored about the licence, which suggests it got it).
[13:21:57] <Reedy>	 Mmm
[13:22:48] <James_F>	 Beta Cluster is magic.
[13:25:27] <Reedy>	 '+testwiki' => array( 'upload.wikimedia.org' ),
[13:25:57] <James_F>	 '+commonswiki' => array( 'upload.wikimedia.org' )
[13:26:38] <logmsgbot>	 !log reedy@tin Synchronized wmf-config/InitialiseSettings-labs.php: labs copy upload thing. noooooop for prod (duration: 00m 33s)
[13:26:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:26:58] <James_F>	 Oh. Hah.
[13:27:01] <James_F>	 That might help.
[13:27:05] <Reedy>	 Well, no it won't
[13:27:09] <Reedy>	 That's prod not labs :P
[13:27:14] <Reedy>	 But jenkins might not have done it yet
[13:27:31] <James_F>	 Yup, that works.
[13:27:32] <James_F>	 http://commons.wikimedia.beta.wmflabs.org/wiki/File:VisualEditor-logo.svg
[13:27:41] <James_F>	 Ha. Point.
[13:27:44] <James_F>	 It still now works.
[13:27:45] <James_F>	 :-D
[13:28:09] <Reedy>	 haha
[13:28:12] <Reedy>	 sweet
[13:29:27] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0]
[13:30:07] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:31:07] <icinga-wm>	 PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:40:07] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[13:55:38] <icinga-wm>	 RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[13:56:36] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[14:00:12] <wikibugs>	 Operations, ops-eqiad: db1052 degraded RAID - https://phabricator.wikimedia.org/T131701#2176137 (Volans) @Cmjohnson: given the role of this DB, please sync with me before performing the disk swap.  Btw, do we have a spare available?
[16:53:59] <grrrit-wm>	 (PS1) Urbanecm: Permission change at test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281318
[16:59:36] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0]
[17:00:16] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1012 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0]
[17:13:47] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[17:14:27] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1012 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[18:26:22] <Danny_B>	 503 Service Temporarily Unavailable 
[18:26:26] <Danny_B>	 https://www.mediawiki.org/wiki/File:Mediawiki-vagrant-screenshot.png
[18:42:37] <grrrit-wm>	 (PS1) ArielGlenn: dumps: fix up one more directory reference for cron jobs [puppet] - https://gerrit.wikimedia.org/r/281321
[18:44:08] <grrrit-wm>	 (CR) ArielGlenn: [C: 2] dumps: fix up one more directory reference for cron jobs [puppet] - https://gerrit.wikimedia.org/r/281321 (owner: ArielGlenn)
[19:08:59] <Krenair>	 Danny_B, WFM
[19:09:07] <icinga-wm>	 PROBLEM - HHVM rendering on mw1146 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.017 second response time
[19:09:36] <icinga-wm>	 PROBLEM - Apache HTTP on mw1146 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.004 second response time
[19:13:01] <Krenair>	 what's up with mw1146?
[19:13:08] <Krenair>	 Danny_B, okay, I see problems with at least one host
[19:14:17] <icinga-wm>	 RECOVERY - HHVM rendering on mw1146 is OK: HTTP OK: HTTP/1.1 200 OK - 66551 bytes in 1.027 second response time
[19:14:47] <icinga-wm>	 RECOVERY - Apache HTTP on mw1146 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.068 second response time
[19:14:51] <Krenair>	 well... either me restarting apache2+hhvm did the trick, or that was a coincidence
[19:16:05] <hoo>	 Krenair: !log it
[19:16:13] <Krenair>	 yeah I was writing my !log as you sent that
[19:16:19] <Krenair>	 !log mw1146 began to respond with 503s to all requests, tried restarting apache2/hhvm and shortly afterwards it started working again
[19:16:23] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:16:32] <hoo>	 Ok :)
[19:25:34] <grrrit-wm>	 (PS2) Luke081515: Permission change at test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281318 (https://phabricator.wikimedia.org/T131037) (owner: Urbanecm)
[19:26:31] <grrrit-wm>	 (CR) Luke081515: [C: 1] Permission change at test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281318 (https://phabricator.wikimedia.org/T131037) (owner: Urbanecm)
[19:57:03] <grrrit-wm>	 (Abandoned) Tim Landscheidt: Tools: Source python-socketio-client for Trusty from backports [puppet] - https://gerrit.wikimedia.org/r/238662 (https://phabricator.wikimedia.org/T91874) (owner: Tim Landscheidt)
[20:30:21] <grrrit-wm>	 (PS1) ArielGlenn: delay dumps full run start by one more day this month [puppet] - https://gerrit.wikimedia.org/r/281323
[20:31:45] <grrrit-wm>	 (CR) ArielGlenn: [C: 2] delay dumps full run start by one more day this month [puppet] - https://gerrit.wikimedia.org/r/281323 (owner: ArielGlenn)
[20:55:40] <grrrit-wm>	 (PS3) Dereckson: Permission change at test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/281318 (https://phabricator.wikimedia.org/T131037) (owner: Urbanecm)
[20:56:46] <grrrit-wm>	 (CR) Dereckson: [C: 1] "PS3: adding a reference to the task number to ease tracking" [mediawiki-config] - https://gerrit.wikimedia.org/r/281318 (https://phabricator.wikimedia.org/T131037) (owner: Urbanecm)
[21:03:26] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:17:27] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:15:27] <ajr>	 Hello ops people! I have a request from an enwiki admin to delete https://en.wikipedia.org/wiki/User_talk:Essjay as part of a history merge; can the servers survive that?
[22:15:43] <ajr>	 it has 5,631 revisions at last count so it shouldn't be *too* strenuous I don't think
[22:17:17] <ajr>	 YuviPanda, are you the person to ask about this? Or some !keyword to get sysadmin attention?
[22:18:37] <ori>	 there were database problems as recently as a few hours ago
[22:19:23] <ori>	 it's almost certainly fine, I think you can go ahead.
[22:19:30] <ajr>	 awesome, thanks :)
[22:24:22] <ajr>	 deleted, nothing seems to be broken yet...
[22:28:02] <ajr>	 and undeleting...
[22:32:56] <ajr>	 it worked and the site is still up! hooray
[22:33:00] <ajr>	 thanks again :)
[22:34:04] <Krinkle>	 !log mwscript deleteEqualMessages.php --wiki eswiki
[22:34:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:37:25] <Krinkle>	 !log mwscript deleteEqualMessages.php --wiki eswiki (T45917)
[22:37:26] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[22:37:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:37:29] <Krinkle>	 !log mwscript deleteEqualMessages.php --wiki eswiki --lang-code ca (T45917)
[22:37:30] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[22:37:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:52:36] <wikibugs>	 Operations, Continuous-Integration-Config, Patch-For-Review: Switch CI from jsduck deb package to a gemfile/bundler system - https://phabricator.wikimedia.org/T109005#2176878 (Krinkle) I'd like to offer an alternative to adding a Gemfile everywhere.  Repositories should not have multiple test entry po...
[23:05:22] <grrrit-wm>	 (PS1) Krinkle: Remove inaccessible symlinks at /w/extensions and /w/skins [mediawiki-config] - https://gerrit.wikimedia.org/r/281379 (https://phabricator.wikimedia.org/T99096)