[00:03:59] <grrrit-wm>	 (PS1) Dzahn: installserver: let bast4001 use carbon [puppet] - https://gerrit.wikimedia.org/r/284992
[00:04:34] <grrrit-wm>	 (CR) Dzahn: "nothing works, install12001 -> timeout, install1001 -> no tftp serving" [puppet] - https://gerrit.wikimedia.org/r/284992 (owner: Dzahn)
[00:04:39] <grrrit-wm>	 (PS7) 20after4: Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211)
[00:04:48] <grrrit-wm>	 (PS2) Dzahn: installserver: let bast4001 use carbon [puppet] - https://gerrit.wikimedia.org/r/284992
[00:07:15] <Jamesofur>	 !log deactivate phabricator account for ktr101 per global ban
[00:07:19] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:08:10] <grrrit-wm>	 (CR) Dzahn: [C: 2] installserver: let bast4001 use carbon [puppet] - https://gerrit.wikimedia.org/r/284992 (owner: Dzahn)
[00:29:22] <Krenair>	 since when have you been able to do that Jamesofur?
[00:30:00] <grrrit-wm>	 (PS8) 20after4: Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211)
[00:30:03] <Jamesofur>	 Krenair: couple months?
[00:30:22] * Jamesofur was given the keys a bit ago for similar purposes by the gods of phabricator
[00:32:41] <icinga-wm>	 RECOVERY - Host bast4001 is UP: PING OK - Packet loss = 0%, RTA = 74.75 ms
[00:32:53] <grrrit-wm>	 (PS9) 20after4: Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211)
[00:33:14] <wikibugs>	 Operations, Performance-Team, Traffic, Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2231806 (BBlack) If you have time and want to do it (next week!), by all means go for it, I have lots else to keep me busy indefinitely :)  My basic plan was try it for an hour on a...
[00:37:31] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[00:40:00] <grrrit-wm>	 (PS2) Dzahn: install_server: move tftp role to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/284793 (https://phabricator.wikimedia.org/T132757)
[00:40:31] <grrrit-wm>	 (CR) Dzahn: [C: 2] "no-op on carbon, install2001, bast4001 ..." [puppet] - https://gerrit.wikimedia.org/r/284793 (https://phabricator.wikimedia.org/T132757) (owner: Dzahn)
[00:43:41] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[00:55:32] <grrrit-wm>	 (PS10) 20after4: Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211)
[00:56:38] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211) (owner: 20after4)
[00:59:21] <grrrit-wm>	 (PS11) 20after4: Automate the generation deployment keys (keyholder-managed ssh keys) [puppet] - https://gerrit.wikimedia.org/r/284418 (https://phabricator.wikimedia.org/T133211)
[01:08:00] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[01:26:00] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[01:32:26] <grrrit-wm>	 (CR) Catrope: [C: 1] Use notify-type-availability due to Echo change [mediawiki-config] - https://gerrit.wikimedia.org/r/284515 (https://phabricator.wikimedia.org/T132820) (owner: Mattflaschen)
[01:37:33] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[01:43:33] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[02:14:57] <grrrit-wm>	 (PS3) Catrope: Use notify-type-availability due to Echo change [mediawiki-config] - https://gerrit.wikimedia.org/r/284515 (https://phabricator.wikimedia.org/T132820) (owner: Mattflaschen)
[02:17:23] <grrrit-wm>	 (CR) Catrope: "PS3 adds back compat code that we will need because we will have some wikis running the new code and the rest running the old code for a f" [mediawiki-config] - https://gerrit.wikimedia.org/r/284515 (https://phabricator.wikimedia.org/T132820) (owner: Mattflaschen)
[02:22:49] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.21) (duration: 10m 27s)
[02:22:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:37:36] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[02:43:37] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[03:07:57] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[03:14:06] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[04:37:29] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[04:43:29] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[06:07:45] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[06:13:53] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[06:30:23] <icinga-wm>	 PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: puppet fail
[06:31:04] <icinga-wm>	 PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:14] <icinga-wm>	 PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:15] <icinga-wm>	 PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:32:37] <icinga-wm>	 PROBLEM - puppet last run on mw2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:49] <icinga-wm>	 PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:37] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:08] <icinga-wm>	 PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:48] <icinga-wm>	 PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:47] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[06:39:38] <icinga-wm>	 PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:43:47] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[06:55:47] <icinga-wm>	 RECOVERY - puppet last run on mw2016 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:56:07] <icinga-wm>	 RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:58] <icinga-wm>	 RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:57:08] <icinga-wm>	 RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:09] <icinga-wm>	 RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:57:17] <icinga-wm>	 RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:57:17] <icinga-wm>	 RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:38] <icinga-wm>	 RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:57:38] <icinga-wm>	 RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:28] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:06:52] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[07:12:51] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[07:40:47] <icinga-wm>	 PROBLEM - puppet last run on mw2059 is CRITICAL: CRITICAL: puppet fail
[07:46:10] <wikibugs>	 Operations: Investigate Ubuntu fork of ttf-indic-fonts and bring it in Jessie - https://phabricator.wikimedia.org/T103328#1387212 (KartikMistry) fonts-indic should be fine (I know I'm late to comment but, still).
[07:52:48] <icinga-wm>	 PROBLEM - RAID on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:53:08] <icinga-wm>	 PROBLEM - configured eth on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:53:27] <icinga-wm>	 PROBLEM - DPKG on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:53:28] <icinga-wm>	 PROBLEM - dhclient process on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:53:38] <icinga-wm>	 PROBLEM - salt-minion processes on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:53:47] <icinga-wm>	 PROBLEM - puppet last run on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:54:09] <icinga-wm>	 PROBLEM - Disk space on hassaleh is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:54:58] <icinga-wm>	 RECOVERY - configured eth on hassaleh is OK: OK - interfaces up
[07:55:09] <icinga-wm>	 RECOVERY - DPKG on hassaleh is OK: All packages OK
[07:55:17] <icinga-wm>	 RECOVERY - dhclient process on hassaleh is OK: PROCS OK: 0 processes with command name dhclient
[07:55:28] <icinga-wm>	 RECOVERY - salt-minion processes on hassaleh is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:55:37] <icinga-wm>	 RECOVERY - puppet last run on hassaleh is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures
[07:55:58] <icinga-wm>	 RECOVERY - Disk space on hassaleh is OK: DISK OK
[07:56:38] <icinga-wm>	 RECOVERY - RAID on hassaleh is OK: OK: no RAID installed
[08:08:58] <icinga-wm>	 RECOVERY - puppet last run on mw2059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:27:08] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed Filippo Giunchedi bootstrapping new hardware
[08:37:07] <icinga-wm>	 RECOVERY - cassandra-a service on restbase1015 is OK: OK - cassandra-a is active
[08:43:17] <icinga-wm>	 PROBLEM - cassandra-a service on restbase1015 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[08:59:08] <icinga-wm>	 PROBLEM - configured eth on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:17] <icinga-wm>	 PROBLEM - salt-minion processes on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:17] <icinga-wm>	 PROBLEM - Disk space on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:31] <icinga-wm>	 PROBLEM - dhclient process on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:37] <icinga-wm>	 PROBLEM - Check size of conntrack table on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:48] <icinga-wm>	 PROBLEM - puppet last run on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:00:17] <icinga-wm>	 PROBLEM - DPKG on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:00:39] <icinga-wm>	 PROBLEM - RAID on meitnerium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:02:58] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:40:24] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure, Labs, Patch-For-Review: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2232007 (hashar) Seems `10-self.conf` varies the hostname fqdn :( ```...
[09:49:35] <wikibugs>	 Operations, OfflineContentGenerator, Patch-For-Review, codfw-rollout: Unable to download PDF files of articles - https://phabricator.wikimedia.org/T133136#2232021 (Aklapper)
[09:57:24] <wikibugs>	 Operations, Traffic: Seeing desktop text cache while browsing mobile sites - https://phabricator.wikimedia.org/T133441#2232043 (Southparkfan)
[10:05:55] <icinga-wm>	 PROBLEM - NTP on meitnerium is CRITICAL: NTP CRITICAL: No response from NTP server
[10:06:44] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[10:27:14] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:29:14] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:06:33] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[11:08:34] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:26:54] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[11:33:25] <icinga-wm>	 PROBLEM - puppet last run on graphite2001 is CRITICAL: CRITICAL: puppet fail
[11:37:23] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:43:34] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[11:45:34] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:51:34] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[11:57:45] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:59:55] <icinga-wm>	 RECOVERY - puppet last run on graphite2001 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[12:06:50] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[12:14:50] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:17:10] <icinga-wm>	 PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:20:42] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[12:22:41] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:41:54] <icinga-wm>	 RECOVERY - puppet last run on mw2103 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[12:42:54] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[12:44:54] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:50:46] <grrrit-wm>	 (PS1) Urbanecm: Add Subject namespace to hiwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/285008 (https://phabricator.wikimedia.org/T133440)
[13:01:04] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[13:05:14] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:11:14] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 657 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5183040 keys - replication_delay is 657
[13:11:14] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[13:19:15] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5137373 keys - replication_delay is 0
[13:27:23] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:32:53] <grrrit-wm>	 (PS1) Urbanecm: Enable DynamicPageList extension on tewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T133032)
[13:57:51] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[14:12:11] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:18:21] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[14:24:22] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:30:31] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[14:34:41] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:40:43] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[14:42:51] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:55:01] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[15:11:23] <icinga-wm>	 PROBLEM - Host ns2-v4 is DOWN: PING CRITICAL - Packet loss = 100%
[15:11:23] <icinga-wm>	 PROBLEM - Host eeden is DOWN: PING CRITICAL - Packet loss = 100%
[15:13:03] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:13:14] <icinga-wm>	 RECOVERY - Host eeden is UP: PING OK - Packet loss = 0%, RTA = 83.77 ms
[15:14:44] <icinga-wm>	 RECOVERY - Host ns2-v4 is UP: PING OK - Packet loss = 0%, RTA = 82.83 ms
[15:16:54] <icinga-wm>	 PROBLEM - HHVM rendering on mw1142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:17:44] <icinga-wm>	 PROBLEM - Apache HTTP on mw1142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:18:54] <icinga-wm>	 PROBLEM - HHVM processes on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:19:04] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[15:19:23] <icinga-wm>	 PROBLEM - nutcracker port on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:19:24] <icinga-wm>	 PROBLEM - configured eth on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:19:44] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:19:45] <icinga-wm>	 PROBLEM - dhclient process on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:20:03] <icinga-wm>	 PROBLEM - RAID on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:20:03] <icinga-wm>	 PROBLEM - nutcracker process on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:20:04] <icinga-wm>	 PROBLEM - puppet last run on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:20:25] <icinga-wm>	 PROBLEM - SSH on mw1142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:21:54] <icinga-wm>	 RECOVERY - nutcracker process on mw1142 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[15:23:04] <icinga-wm>	 PROBLEM - Disk space on cp1008 is CRITICAL: DISK CRITICAL - free space: / 342 MB (3% inode=84%)
[15:24:55] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:28:05] <icinga-wm>	 PROBLEM - nutcracker process on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:28:15] <icinga-wm>	 PROBLEM - DPKG on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:28:33] <icinga-wm>	 PROBLEM - salt-minion processes on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:28:35] <icinga-wm>	 PROBLEM - Disk space on mw1142 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:31:14] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[15:33:34] <icinga-wm>	 PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: puppet fail
[15:38:54] <wikibugs>	 Operations, MobileFrontend, Traffic: Seeing desktop text cache while browsing mobile sites - https://phabricator.wikimedia.org/T133441#2232357 (BBlack) p:Triage>High This doesn't seem to be a (varnish) cache effect.  I get the same behavior on your test URLs when forcing cache misses.  If thi...
[15:41:02] <bblack>	 anyone who's around, there's something fishy going on with mobile-vs-desktop rendering, noted on foundation wiki only so far: ^
[15:43:33] <icinga-wm>	 RECOVERY - HHVM processes on mw1142 is OK: PROCS OK: 6 processes with command name hhvm
[15:43:44] <icinga-wm>	 RECOVERY - nutcracker port on mw1142 is OK: TCP OK - 0.000 second response time on port 11212
[15:43:45] <icinga-wm>	 RECOVERY - configured eth on mw1142 is OK: OK - interfaces up
[15:44:04] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1142 is OK: OK: nf_conntrack is 0 % full
[15:44:13] <icinga-wm>	 RECOVERY - dhclient process on mw1142 is OK: PROCS OK: 0 processes with command name dhclient
[15:44:24] <icinga-wm>	 RECOVERY - nutcracker process on mw1142 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[15:44:24] <icinga-wm>	 RECOVERY - RAID on mw1142 is OK: OK: no RAID installed
[15:44:33] <icinga-wm>	 RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 58 minutes ago with 0 failures
[15:44:37] <SPF|Cloud>	 bblack: it's completely random. I'm trying to reproduce this on other wikis now, but haven't found anything yet
[15:44:43] <icinga-wm>	 RECOVERY - DPKG on mw1142 is OK: All packages OK
[15:44:53] <icinga-wm>	 RECOVERY - salt-minion processes on mw1142 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:44:54] <icinga-wm>	 RECOVERY - SSH on mw1142 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[15:44:55] <icinga-wm>	 RECOVERY - Disk space on mw1142 is OK: DISK OK
[15:45:33] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[15:46:33] <bblack>	 SPF|Cloud: it helps to evade the varnish caches by appending random query strings to the URL, e.g. /wiki/Foo?a3gt89h45h=oeairglj
[15:47:04] <SPF|Cloud>	 Using the Firefox extension for appending X-Wikimedia-Debug should work too
[15:47:12] <SPF|Cloud>	 (I guess?)
[15:47:15] <bblack>	 yeah :)
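A minimal sketch of the cache-evasion trick bblack describes above: append a random, otherwise meaningless query parameter so that each request URL is unique and the cache is unlikely to have it stored. The function name `cache_bust` and the `example.org` URLs are hypothetical and chosen only for illustration; nothing here comes from Wikimedia tooling.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_bust(url: str) -> str:
    """Return `url` with one extra random query parameter appended.

    Hypothetical helper: existing query parameters are preserved, and a
    random key/value pair is added so the full URL differs on every call.
    """
    parts = urlsplit(url)
    # Keep whatever parameters the URL already carries, including blank ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Random hex strings stand in for the "a3gt89h45h=oeairglj" style pair.
    query.append((secrets.token_hex(5), secrets.token_hex(4)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Prints the URL with an added random parameter; output varies per run.
    print(cache_bust("https://example.org/wiki/Foo"))
```

Whether a given cache layer actually treats the extra parameter as part of the cache key depends on its configuration, so this is only a sketch of the general approach.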
[15:50:43] <icinga-wm>	 PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: puppet fail
[15:51:22] <grrrit-wm>	 (PS1) Ladsgroup: ores: fix staging configs [puppet] - https://gerrit.wikimedia.org/r/285010
[15:53:43] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[16:00:04] <icinga-wm>	 RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:01:54] <wikibugs>	 Operations, MobileFrontend, Traffic: Seeing desktop text cache while browsing mobile sites - https://phabricator.wikimedia.org/T133441#2232375 (Southparkfan) Yeah, this doesn't seem to be a Varnish problem:  I tried the following to force the mobile version of the site on a non mobile-site (nl.wikipe...
[16:10:42] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[16:22:42] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[16:24:03] <icinga-wm>	 PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 13.79% of data above the critical threshold [100000000.0]
[16:39:03] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[16:46:32] <icinga-wm>	 RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[16:49:04] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[16:55:04] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[17:02:41] <grrrit-wm>	 (PS1) Yuvipanda: [WIP] MySQL backend for storing roles / hiera data for labs [puppet] - https://gerrit.wikimedia.org/r/285014 (https://phabricator.wikimedia.org/T133412)
[17:03:44] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [WIP] MySQL backend for storing roles / hiera data for labs [puppet] - https://gerrit.wikimedia.org/r/285014 (https://phabricator.wikimedia.org/T133412) (owner: Yuvipanda)
[17:07:23] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[17:21:42] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[17:27:43] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[17:29:26] <wikibugs>	 Operations, Ops-Access-Requests: Access Request - https://phabricator.wikimedia.org/T133464#2232536 (Zppix)
[17:39:14] <wikibugs>	 Operations, Ops-Access-Requests: Access Request - https://phabricator.wikimedia.org/T133464#2232536 (JanZerebecki) Which permission is required for that?
[17:40:02] <wikibugs>	 Operations, Ops-Access-Requests: Access Request - https://phabricator.wikimedia.org/T133464#2232554 (JanZerebecki) Context: T131132
[18:15:03] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[18:21:04] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[18:23:12] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[18:27:14] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:24] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Server answer
[18:30:43] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:32:29] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1003 is OK: All endpoints are healthy
[18:33:00] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[18:33:10] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:35:00] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1001 is OK: All endpoints are healthy
[18:41:06] <wikibugs>	 Operations, Ops-Access-Requests: Access Request - https://phabricator.wikimedia.org/T133464#2232625 (Krenair) Open>Invalid This is not a server access request, there just needs to be code patches submitted to gerrit, and approved by someone with +2 in the appropriate repository. (+2 rights are re...
[18:58:50] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[19:07:39] <icinga-wm>	 PROBLEM - SSH on meitnerium is CRITICAL: Connection timed out
[19:08:18] <icinga-wm>	 RECOVERY - salt-minion processes on meitnerium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[19:08:28] <icinga-wm>	 RECOVERY - Check size of conntrack table on meitnerium is OK: OK: nf_conntrack is 0 % full
[19:08:39] <icinga-wm>	 RECOVERY - DPKG on meitnerium is OK: All packages OK
[19:08:59] <icinga-wm>	 RECOVERY - Disk space on meitnerium is OK: DISK OK
[19:09:19] <icinga-wm>	 RECOVERY - dhclient process on meitnerium is OK: PROCS OK: 0 processes with command name dhclient
[19:09:24] <_joe_>	 !log rebooted meitnerium
[19:09:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:09:38] <icinga-wm>	 RECOVERY - SSH on meitnerium is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[19:09:48] <icinga-wm>	 RECOVERY - RAID on meitnerium is OK: OK: no RAID installed
[19:09:49] <icinga-wm>	 RECOVERY - configured eth on meitnerium is OK: OK - interfaces up
[19:11:58] <icinga-wm>	 RECOVERY - puppet last run on meitnerium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:26:50] <icinga-wm>	 RECOVERY - NTP on meitnerium is OK: NTP OK: Offset -0.001602768898 secs
[19:46:47] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[19:48:46] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5133073 keys - replication_delay is 0
[20:13:16] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 623 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5135044 keys - replication_delay is 623
[20:21:18] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5134385 keys - replication_delay is 0
[20:33:46] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[20:35:48] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5134402 keys - replication_delay is 0
[20:45:47] <icinga-wm>	 RECOVERY - Disk space on cp1008 is OK: DISK OK
[20:49:36] <icinga-wm>	 PROBLEM - Varnish HTCP daemon on cp1008 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 114 (vhtcpd), args vhtcpd
[20:51:17] <icinga-wm>	 PROBLEM - Varnishkafka log producer on cp1008 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[20:54:50] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Reset administrator password for nlcheckuser-l mailing list - https://phabricator.wikimedia.org/T133449#2232733 (MarcoAurelio) p:Triage>Normal
[21:02:25] <bblack>	 re-downtimed cp1008, its long-term downtime had expired heh
[22:22:29] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2064 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.88 seconds
[22:22:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 325.17 seconds
[22:23:19] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 368.13 seconds
[22:24:38] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2064 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[22:24:39] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 0.27 seconds
[22:25:28] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 0.22 seconds
[22:47:11] <grrrit-wm>	 (PS1) Nuria: Read values inbound in X-Analytics header (pageview and preview) [puppet] - https://gerrit.wikimedia.org/r/285051 (https://phabricator.wikimedia.org/T133204)
[22:48:20] <grrrit-wm>	 (PS2) Nuria: Read values inbound in X-Analytics header (pageview and preview) [puppet] - https://gerrit.wikimedia.org/r/285051 (https://phabricator.wikimedia.org/T133204)
[23:11:38] <icinga-wm>	 PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [100000000.0]
[23:16:51] <grrrit-wm>	 (PS2) BryanDavis: logstash: Make truncated MediaWiki json easier to find [puppet] - https://gerrit.wikimedia.org/r/278315
[23:18:04] <grrrit-wm>	 (PS8) BBlack: letsencrypt module guts + acme-setup script [puppet] - https://gerrit.wikimedia.org/r/283988 (https://phabricator.wikimedia.org/T132812)
[23:18:06] <grrrit-wm>	 (PS8) BBlack: create letsencrypt module, install acme-tiny [puppet] - https://gerrit.wikimedia.org/r/283761 (https://phabricator.wikimedia.org/T132812) (owner: Dzahn)
[23:19:34] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] letsencrypt module guts + acme-setup script [puppet] - https://gerrit.wikimedia.org/r/283988 (https://phabricator.wikimedia.org/T132812) (owner: BBlack)
[23:26:55] <grrrit-wm>	 (PS9) BBlack: letsencrypt module guts + acme-setup script [puppet] - https://gerrit.wikimedia.org/r/283988 (https://phabricator.wikimedia.org/T132812)
[23:34:55] <icinga-wm>	 RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[23:35:28] <grrrit-wm>	 (PS10) BBlack: letsencrypt module guts + acme-setup script [puppet] - https://gerrit.wikimedia.org/r/283988 (https://phabricator.wikimedia.org/T132812)
[23:44:24] <icinga-wm>	 RECOVERY - Varnish HTCP daemon on cp1008 is OK: PROCS OK: 1 process with UID = 114 (vhtcpd), args vhtcpd
[23:44:25] <icinga-wm>	 RECOVERY - Varnishkafka log producer on cp1008 is OK: PROCS OK: 1 process with command name varnishkafka
[23:51:03] <grrrit-wm>	 (PS11) BBlack: letsencrypt module guts + acme-setup script [puppet] - https://gerrit.wikimedia.org/r/283988 (https://phabricator.wikimedia.org/T132812)