[00:06:44] RECOVERY - puppet last run on mw1071 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[01:40:21] (PS1) Alex Monk: Remove integration-puppetmaster from the labs monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/244948
[02:25:32] !log l10nupdate@tin Synchronized php-1.27.0-wmf.2/cache/l10n: l10nupdate for 1.27.0-wmf.2 (duration: 06m 17s)
[02:25:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:34] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.2) at 2015-10-11 02:28:34+00:00
[02:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:33:16] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:05:23] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail
[03:20:24] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[03:25:24] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[03:34:43] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:34:44] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:15] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:36:54] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:34] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[04:01:35] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[04:01:35] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[04:02:05] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[04:12:37] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[04:21:41] (CR) Alex Monk: "Also, should this be marked against T108063?" [puppet] - https://gerrit.wikimedia.org/r/243357 (owner: Alex Monk)
[04:40:33] PROBLEM - puppet last run on mw2029 is CRITICAL: CRITICAL: puppet fail
[04:57:41] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Oct 11 04:57:40 UTC 2015 (duration 57m 39s)
[04:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:09:14] RECOVERY - puppet last run on mw2029 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[05:17:28] (CR) Tim Landscheidt: "a) Yes, that would fix T108063." [puppet] - https://gerrit.wikimedia.org/r/243357 (owner: Alex Monk)
[05:33:35] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[05:52:13] PROBLEM - Disk space on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:53:44] RECOVERY - Disk space on cp1059 is OK: DISK OK
[06:05:35] PROBLEM - puppet last run on cp1059 is CRITICAL: CRITICAL: puppet fail
[06:29:14] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: puppet fail
[06:30:43] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:55] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:23] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:33] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:43] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:54] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:14] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:15] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:35] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:47] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:24] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:33] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:13] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:15] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:34] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:55] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:15] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:14] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:50:44] (CR) MarcoAurelio: [C: -1] Add three groups to itwikiversity, and allow sysops to add or remove users to them (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (owner: Gerrit Patch Uploader)
[06:51:25] PROBLEM - Varnish traffic logger - multicast_relay on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:51:34] PROBLEM - Varnish traffic logger - erbium on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:53:03] RECOVERY - Varnish traffic logger - multicast_relay on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-multicast_relay.pid, UID = 111 (varnishlog)
[06:53:04] RECOVERY - Varnish traffic logger - erbium on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-erbium.pid, UID = 111 (varnishlog)
[06:56:13] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:56:24] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:56:25] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:54] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:57:03] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:57:04] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:57:05] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:57:23] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:44] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:57:45] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:53] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:58:03] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:58:14] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:14] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:16] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:58:24] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:44] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:04] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:05:15] PROBLEM - Varnish traffic logger - erbium on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:25] RECOVERY - Varnish traffic logger - erbium on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-erbium.pid, UID = 111 (varnishlog)
[07:12:09] (PS1) Ori.livneh: cirrus tests: skip if full MediaWiki install is not availale [mediawiki-config] - https://gerrit.wikimedia.org/r/244949
[07:12:31] (CR) Ori.livneh: [C: 2] cirrus tests: skip if full MediaWiki install is not availale [mediawiki-config] - https://gerrit.wikimedia.org/r/244949 (owner: Ori.livneh)
[07:12:37] (Merged) jenkins-bot: cirrus tests: skip if full MediaWiki install is not availale [mediawiki-config] - https://gerrit.wikimedia.org/r/244949 (owner: Ori.livneh)
[07:13:02] (PS2) Ori.livneh: Add MEDIAWIKI_DBLIST_DIR define, set to MEDIAWIKI_STAGING_DIR by default [mediawiki-config] - https://gerrit.wikimedia.org/r/244740
[07:17:53] (PS3) Ori.livneh: Add MEDIAWIKI_DBLIST_DIR define, set to MEDIAWIKI_STAGING_DIR by default [mediawiki-config] - https://gerrit.wikimedia.org/r/244740
[07:18:50] (CR) Ori.livneh: [C: 2] Add MEDIAWIKI_DBLIST_DIR define, set to MEDIAWIKI_STAGING_DIR by default [mediawiki-config] - https://gerrit.wikimedia.org/r/244740 (owner: Ori.livneh)
[07:18:56] (Merged) jenkins-bot: Add MEDIAWIKI_DBLIST_DIR define, set to MEDIAWIKI_STAGING_DIR by default [mediawiki-config] - https://gerrit.wikimedia.org/r/244740 (owner: Ori.livneh)
[07:23:27] (PS1) Ori.livneh: Set MEDIAWIKI_DBLIST_DIR to /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244950
[07:23:35] (CR) Ori.livneh: [C: 2] Set MEDIAWIKI_DBLIST_DIR to /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244950 (owner: Ori.livneh)
[07:23:41] (Merged) jenkins-bot: Set MEDIAWIKI_DBLIST_DIR to /srv/mediawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244950 (owner: Ori.livneh)
[07:50:34] PROBLEM - RAID on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:52:04] RECOVERY - RAID on cp1059 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[07:53:24] PROBLEM - salt-minion processes on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:54:54] RECOVERY - salt-minion processes on cp1059 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:56:45] PROBLEM - Freshness of OCSP Stapling files on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:58:23] RECOVERY - Freshness of OCSP Stapling files on cp1059 is OK: OK
[08:03:44] PROBLEM - Confd vcl based reload on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:05:54] PROBLEM - Varnish traffic logger - multicast_relay on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:06:55] RECOVERY - Confd vcl based reload on cp1059 is OK: reload-vcl successfully ran 93h, 52 minutes ago.
[08:07:25] RECOVERY - Varnish traffic logger - multicast_relay on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-multicast_relay.pid, UID = 111 (varnishlog)
[08:12:14] PROBLEM - Confd vcl based reload on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:17:44] PROBLEM - DPKG on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:20:33] RECOVERY - Confd vcl based reload on cp1059 is OK: reload-vcl successfully ran 94h, 6 minutes ago.
[08:20:43] PROBLEM - Freshness of OCSP Stapling files on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:43] RECOVERY - DPKG on cp1059 is OK: All packages OK
[08:23:54] RECOVERY - Freshness of OCSP Stapling files on cp1059 is OK: OK
[08:24:44] PROBLEM - Confd template for /etc/varnish/directors.frontend.vcl on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:25:44] PROBLEM - Confd vcl based reload on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:15] PROBLEM - Disk space on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:26:23] RECOVERY - Confd template for /etc/varnish/directors.frontend.vcl on cp1059 is OK: No errors detected
[08:27:24] PROBLEM - Varnish HTCP daemon on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:27:58] PROBLEM - RAID on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:28:55] PROBLEM - IPsec on cp3017 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[08:29:04] RECOVERY - Confd vcl based reload on cp1059 is OK: reload-vcl successfully ran 94h, 14 minutes ago.
[08:29:13] PROBLEM - IPsec on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:29:14] PROBLEM - configured eth on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:29:34] RECOVERY - RAID on cp1059 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[08:31:03] RECOVERY - configured eth on cp1059 is OK: OK - interfaces up
[08:31:23] RECOVERY - Disk space on cp1059 is OK: DISK OK
[08:31:33] PROBLEM - Varnish traffic logger - multicast_relay on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:32:43] RECOVERY - Varnish HTCP daemon on cp1059 is OK: PROCS OK: 1 process with UID = 114 (vhtcpd), args vhtcpd
[08:36:05] RECOVERY - IPsec on cp1059 is OK: Strongswan OK - 24 ESP OK
[08:36:13] PROBLEM - configured eth on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:36:13] PROBLEM - Confd vcl based reload on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:36:36] RECOVERY - Varnish traffic logger - multicast_relay on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-multicast_relay.pid, UID = 111 (varnishlog)
[08:37:44] RECOVERY - Confd vcl based reload on cp1059 is OK: reload-vcl successfully ran 94h, 23 minutes ago.
[08:37:45] RECOVERY - configured eth on cp1059 is OK: OK - interfaces up
[08:37:54] PROBLEM - Varnish HTCP daemon on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:39:33] RECOVERY - Varnish HTCP daemon on cp1059 is OK: PROCS OK: 1 process with UID = 114 (vhtcpd), args vhtcpd
[08:39:57] PROBLEM - RAID on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:41:33] RECOVERY - RAID on cp1059 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[08:41:43] PROBLEM - Varnish traffic logger - multicast_relay on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:44:34] PROBLEM - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v4
[08:44:34] PROBLEM - service on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:04] RECOVERY - service on cp1059 is OK: OK - confd is active
[08:46:15] PROBLEM - Confd vcl based reload on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:46:37] PROBLEM - DPKG on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:47:35] RECOVERY - IPsec on cp3017 is OK: Strongswan OK - 8 ESP OK
[08:47:55] PROBLEM - Freshness of OCSP Stapling files on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:14] RECOVERY - Varnish traffic logger - multicast_relay on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-multicast_relay.pid, UID = 111 (varnishlog)
[08:48:23] RECOVERY - DPKG on cp1059 is OK: All packages OK
[08:49:34] RECOVERY - IPsec on cp2003 is OK: Strongswan OK - 8 ESP OK
[08:49:34] RECOVERY - Freshness of OCSP Stapling files on cp1059 is OK: OK
[08:51:13] RECOVERY - Confd vcl based reload on cp1059 is OK: reload-vcl successfully ran 94h, 37 minutes ago.
[08:51:44] PROBLEM - Disk space on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:55:14] PROBLEM - Confd template for /etc/varnish/directors.frontend.vcl on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:55:54] PROBLEM - IPsec on cp2015 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v4
[08:56:24] PROBLEM - service on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:59:23] RECOVERY - IPsec on cp2015 is OK: Strongswan OK - 8 ESP OK
[09:00:23] RECOVERY - Confd template for /etc/varnish/directors.frontend.vcl on cp1059 is OK: No errors detected
[09:01:25] RECOVERY - service on cp1059 is OK: OK - confd is active
[09:03:23] PROBLEM - Freshness of OCSP Stapling files on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:03:43] RECOVERY - Disk space on cp1059 is OK: DISK OK
[09:03:44] PROBLEM - Varnish traffic logger - erbium on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:04:54] RECOVERY - Freshness of OCSP Stapling files on cp1059 is OK: OK
[09:05:15] RECOVERY - Varnish traffic logger - erbium on cp1059 is OK: PROCS OK: 1 process with args varnishncsa-erbium.pid, UID = 111 (varnishlog)
[09:09:04] PROBLEM - IPsec on cp2009 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[09:12:24] RECOVERY - IPsec on cp2009 is OK: Strongswan OK - 8 ESP OK
[09:21:54] PROBLEM - service on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:22:59] (PS1) MarcoAurelio: Enable Extension:ShortURL on bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244953 (https://phabricator.wikimedia.org/T62956)
[09:23:24] RECOVERY - service on cp1059 is OK: OK - confd is active
[09:26:53] PROBLEM - Varnishkafka log producer on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:28:25] RECOVERY - Varnishkafka log producer on cp1059 is OK: PROCS OK: 3 processes with command name varnishkafka
[09:49:14] PROBLEM - DPKG on cp1059 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:52:23] RECOVERY - DPKG on cp1059 is OK: All packages OK
[10:36:54] PROBLEM - puppet last run on mw2204 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:05:25] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:08:24] operations, ops-eqiad: cp1059 has network issues - https://phabricator.wikimedia.org/T114870#1717913 (BBlack) I set cp1059 into downtime for a week in icinga, as it's been spamming IRC with random check failures, probably from the network port instability.
[11:11:12] (PS6) Glaisher: Add patrol, autopatrol, flood group to itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[11:11:40] (CR) Glaisher: "Please follow https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines" [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[11:18:23] (CR) Steinsplitter: "Line 8808: whitespace missing" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[11:45:23] (CR) KartikMistry: [C: 1] Fix nbwiki to nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244736 (owner: Amire80)
[11:49:53] PROBLEM - puppet last run on wtp2010 is CRITICAL: CRITICAL: puppet fail
[12:17:42] operations, Labs, Database, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1718004 (Glaisher)
[12:18:34] RECOVERY - puppet last run on wtp2010 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[12:24:26] (PS7) Gerrit Patch Uploader: Add patrol, autopatrol, flood group to itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930)
[12:24:28] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[12:25:47] (CR) Luke081515: "standardization of names flooder => flood like at T115200" [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[12:45:53] PROBLEM - IPsec on cp3015 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[12:47:33] RECOVERY - IPsec on cp3015 is OK: Strongswan OK - 8 ESP OK
[12:59:54] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: puppet fail
[13:10:44] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[13:11:51] (PS8) Gerrit Patch Uploader: Add patrol, autopatrol, flood group to itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930)
[13:11:53] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[13:19:05] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:26:53] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[13:39:45] (CR) Alex Monk: [C: 1] Enable Extension:ShortURL on bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/244953 (https://phabricator.wikimedia.org/T62956) (owner: MarcoAurelio)
[14:05:24] PROBLEM - puppet last run on mw2210 is CRITICAL: CRITICAL: puppet fail
[14:18:14] PROBLEM - IPsec on cp3016 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[14:21:24] RECOVERY - IPsec on cp3016 is OK: Strongswan OK - 8 ESP OK
[14:27:29] (PS1) Alex Monk: Format Tmax in slow queries page as a number [software/tendril] - https://gerrit.wikimedia.org/r/244964
[14:32:23] RECOVERY - puppet last run on mw2210 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[14:56:40] (CR) JanZerebecki: "This is not in https-everywhere and it has no relation to a domain that needs to be https only. So we could leave the http variant working" [dns] - https://gerrit.wikimedia.org/r/244103 (owner: Dzahn)
[15:38:17] (CR) Alex Monk: [C: 1] Add patrol, autopatrol, flood group to itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[16:12:23] PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: puppet fail
[16:14:24] PROBLEM - IPsec on cp3018 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[16:14:40] (PS1) 01tonythomas: Make mx1001/mx2001 to HTTP POST to meta.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/245128 (https://phabricator.wikimedia.org/T114984)
[16:16:04] RECOVERY - IPsec on cp3018 is OK: Strongswan OK - 8 ESP OK
[16:26:55] PROBLEM - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[16:32:13] RECOVERY - IPsec on cp2003 is OK: Strongswan OK - 8 ESP OK
[16:39:14] RECOVERY - puppet last run on cp2020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:58:09] (PS1) Alex Monk: ssh-key-ldap-lookup: Don't print whole list of keys as one line before printing each individual one [puppet] - https://gerrit.wikimedia.org/r/245137
[17:35:10] operations, RESTBase: uneven load on restbase workers - https://phabricator.wikimedia.org/T113579#1718205 (mobrovac) Open>Invalid a:mobrovac This needs revisiting in case the status quo remains with incresed load. Resolving for now as invalid.
[17:45:13] PROBLEM - IPsec on cp3015 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[17:46:53] RECOVERY - IPsec on cp3015 is OK: Strongswan OK - 8 ESP OK
[18:42:25] PROBLEM - IPsec on cp3017 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[18:43:14] PROBLEM - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v4
[18:44:05] RECOVERY - IPsec on cp3017 is OK: Strongswan OK - 8 ESP OK
[18:44:53] RECOVERY - IPsec on cp2003 is OK: Strongswan OK - 8 ESP OK
[18:50:45] (CR) Florianschmidtwelzow: [C: 1] Add patrol, autopatrol, flood group to itwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[19:19:34] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 0 below the confidence bounds
[19:26:14] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 14 data above and 0 below the confidence bounds
[19:57:17] operations, Wikimedia-DNS, Wikimedia-Video: Please set up a CNAME for videoserver.wikimedia.org to Video Editing Server - https://phabricator.wikimedia.org/T99216#1718412 (brion) Ok I suspect what we're going to end up doing is moving the user interface to MediaWiki-integrated code and just use a back...
[20:06:33] PROBLEM - IPsec on cp4020 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v4
[20:09:54] RECOVERY - IPsec on cp4020 is OK: Strongswan OK - 8 ESP OK
[20:11:55] PROBLEM - IPsec on cp2015 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[20:16:54] RECOVERY - IPsec on cp2015 is OK: Strongswan OK - 8 ESP OK
[20:36:34] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[20:41:34] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[20:55:04] PROBLEM - IPsec on cp4012 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[20:58:24] RECOVERY - IPsec on cp4012 is OK: Strongswan OK - 8 ESP OK
[21:27:43] PROBLEM - IPsec on cp2003 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[21:28:03] (PS2) Yuvipanda: ldap: Remove leftover debugging in ldap lookup script [puppet] - https://gerrit.wikimedia.org/r/245137 (owner: Alex Monk)
[21:28:09] (PS3) Yuvipanda: ldap: Remove leftover debugging in ldap lookup script [puppet] - https://gerrit.wikimedia.org/r/245137 (owner: Alex Monk)
[21:28:25] (CR) Yuvipanda: [C: 2 V: 2] ldap: Remove leftover debugging in ldap lookup script [puppet] - https://gerrit.wikimedia.org/r/245137 (owner: Alex Monk)
[21:29:28] RECOVERY - IPsec on cp2003 is OK: Strongswan OK - 8 ESP OK
[21:33:09] (PS2) Yuvipanda: Remove integration-puppetmaster from the labs monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/244948 (owner: Alex Monk)
[21:33:19] (CR) Yuvipanda: [C: 2 V: 2] Remove integration-puppetmaster from the labs monitoring hosts [puppet] - https://gerrit.wikimedia.org/r/244948 (owner: Alex Monk)
[21:35:45] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
[21:36:29] (CR) Alex Monk: [C: 1] "listing for deployment during the coming week" [mediawiki-config] - https://gerrit.wikimedia.org/r/243517 (https://phabricator.wikimedia.org/T114613) (owner: Alex Monk)
[21:36:50] (CR) Alex Monk: [C: 1] "listing for deployment during the coming week" [mediawiki-config] - https://gerrit.wikimedia.org/r/244378 (https://phabricator.wikimedia.org/T113593) (owner: Alex Monk)
[21:40:01] (CR) Alex Monk: "I may have been mistaken in how these settings work" [mediawiki-config] - https://gerrit.wikimedia.org/r/244140 (https://phabricator.wikimedia.org/T114873) (owner: TTO)
[21:40:54] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
[21:44:04] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [5000000.0]
[21:44:24] (PS1) Alex Monk: Don't require QuickSurveys to use HTTPS links in labs [mediawiki-config] - https://gerrit.wikimedia.org/r/245188 (https://phabricator.wikimedia.org/T114485)
[21:45:54] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
[21:49:04] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [5000000.0]
[21:52:34] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 2 below the confidence bounds
[21:57:24] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[21:57:35] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 2 below the confidence bounds
[22:02:53] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 2 below the confidence bounds
[22:16:50] (PS1) Yuvipanda: Call utcnow on datetime class, not module [software/tools-manifest] - https://gerrit.wikimedia.org/r/245192 (https://phabricator.wikimedia.org/T115225)
[22:17:09] (CR) jenkins-bot: [V: -1] Call utcnow on datetime class, not module [software/tools-manifest] - https://gerrit.wikimedia.org/r/245192 (https://phabricator.wikimedia.org/T115225) (owner: Yuvipanda)
[22:20:50] (PS1) Ori.livneh: MWWikiversions::readDbListFile(): normalize all paths [mediawiki-config] - https://gerrit.wikimedia.org/r/245193
[22:21:23] (CR) Ori.livneh: [C: 2] MWWikiversions::readDbListFile(): normalize all paths [mediawiki-config] - https://gerrit.wikimedia.org/r/245193 (owner: Ori.livneh)
[22:21:30] (Merged) jenkins-bot: MWWikiversions::readDbListFile(): normalize all paths [mediawiki-config] - https://gerrit.wikimedia.org/r/245193 (owner: Ori.livneh)
[22:22:44] PROBLEM - IPsec on cp3017 is CRITICAL: Strongswan CRITICAL - ok: 8 connecting: (unnamed)
[22:24:24] RECOVERY - IPsec on cp3017 is OK: Strongswan OK - 8 ESP OK
[22:26:46] (CR) MarcoAurelio: Add patrol, autopatrol, flood group to itwikiversity (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/244896 (https://phabricator.wikimedia.org/T114930) (owner: Gerrit Patch Uploader)
[22:27:55] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[22:30:28] (PS2) Yuvipanda: Call utcnow on datetime class, not module [software/tools-manifest] - https://gerrit.wikimedia.org/r/245192 (https://phabricator.wikimedia.org/T115225)
[22:30:54] Coren: ^
[22:35:06] (CR) Yuvipanda: [C: 2] Call utcnow on datetime class, not module [software/tools-manifest] - https://gerrit.wikimedia.org/r/245192 (https://phabricator.wikimedia.org/T115225) (owner: Yuvipanda)
[22:35:26] (Merged) jenkins-bot: Call utcnow on datetime class, not module [software/tools-manifest] - https://gerrit.wikimedia.org/r/245192 (https://phabricator.wikimedia.org/T115225) (owner: Yuvipanda)
[22:36:43] (PS1) MarcoAurelio: Naming standardization from 'flooder' to 'flood' [mediawiki-config] - https://gerrit.wikimedia.org/r/245194 (https://phabricator.wikimedia.org/T115200)
[22:41:23] (PS1) Ori.livneh: MWWikiversions::readDbListFile() update callers for Ie6c2fd3129dd [mediawiki-config] - https://gerrit.wikimedia.org/r/245195
[22:41:28] (CR) jenkins-bot: [V: -1] MWWikiversions::readDbListFile() update callers for Ie6c2fd3129dd [mediawiki-config] - https://gerrit.wikimedia.org/r/245195 (owner: Ori.livneh)
[22:42:03] (PS2) Ori.livneh: MWWikiversions::readDbListFile() update callers for Ie6c2fd3129dd [mediawiki-config] - https://gerrit.wikimedia.org/r/245195
[22:42:20] (CR) Ori.livneh: [C: 2] MWWikiversions::readDbListFile() update callers for Ie6c2fd3129dd [mediawiki-config] - https://gerrit.wikimedia.org/r/245195 (owner: Ori.livneh)
[22:42:28] (Merged) jenkins-bot: MWWikiversions::readDbListFile() update callers for Ie6c2fd3129dd [mediawiki-config] - https://gerrit.wikimedia.org/r/245195 (owner: Ori.livneh)
[22:43:39] (CR) Luke081515: [C: 1] Naming standardization from 'flooder' to 'flood' [mediawiki-config] - https://gerrit.wikimedia.org/r/245194 (https://phabricator.wikimedia.org/T115200) (owner: MarcoAurelio)
[22:44:54] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[23:10:09] (PS9) Ori.livneh: Move *.dblist to dblists/ [mediawiki-config] - https://gerrit.wikimedia.org/r/175007
[23:11:25] PROBLEM - IPsec on cp4011 is CRITICAL: Strongswan CRITICAL - ok: 7 connecting: cp1059_v6
[23:12:03] (PS10) Ori.livneh: Move *.dblist to dblists/ [mediawiki-config] - https://gerrit.wikimedia.org/r/175007
[23:12:48] (CR) Ori.livneh: [C: 2] Move *.dblist to dblists/ [mediawiki-config] - https://gerrit.wikimedia.org/r/175007 (owner: Ori.livneh)
[23:12:54] (Merged) jenkins-bot: Move *.dblist to dblists/ [mediawiki-config] - https://gerrit.wikimedia.org/r/175007 (owner: Ori.livneh)
[23:13:05] RECOVERY - IPsec on cp4011 is OK: Strongswan OK - 8 ESP OK
[23:38:35] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: puppet fail