[01:51:05] (PS1) Alex Monk: openstack: remove unused volume class, update default version [puppet] - https://gerrit.wikimedia.org/r/311304
[01:56:24] (PS1) Alex Monk: openstack: Add basic monitoring for HTTP services [puppet] - https://gerrit.wikimedia.org/r/311306 (https://phabricator.wikimedia.org/T42022)
[01:57:35] (CR) jenkins-bot: [V: -1] openstack: Add basic monitoring for HTTP services [puppet] - https://gerrit.wikimedia.org/r/311306 (https://phabricator.wikimedia.org/T42022) (owner: Alex Monk)
[02:01:32] (PS2) Alex Monk: openstack: Add basic monitoring for HTTP services [puppet] - https://gerrit.wikimedia.org/r/311306 (https://phabricator.wikimedia.org/T42022)
[02:15:24] (PS1) Alex Monk: openstack: mitaka files/templates: fix puppet header to give correct path [puppet] - https://gerrit.wikimedia.org/r/311309
[02:26:51] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.18) (duration: 10m 18s)
[02:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:32:49] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep 18 02:32:49 UTC 2016 (duration 5m 58s)
[02:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:37:14] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[acme-setup-acme-apt]
[03:13:13] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:18:54] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:37:44] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:45:56] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:53:24] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/lib/nagios/plugins/check_raid]
[04:17:56] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:31:06] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ganglia/conf.d/apache_status.pyconf]
[04:49:47] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[04:49:56] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[04:50:17] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[04:52:16] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[04:55:26] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:59:48] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1000.0]
[05:35:27] PROBLEM - puppet last run on mc2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:39:06] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[05:41:35] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[05:41:54] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[05:46:35] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[05:58:47] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1000.0]
[06:02:29] RECOVERY - puppet last run on mc2008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:07:54] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:11:09] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[06:13:59] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[06:26:07] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:32:37] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:38:19] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:43:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:52:41] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Not Available - 531 bytes in 0.036 second response time
[07:00:39] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1000.0]
[07:12:39] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3677 bytes in 0.025 second response time
[07:17:50] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:30:11] PROBLEM - puppet last run on mw2217 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:35:13] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[07:37:36] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:44:58] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[07:52:21] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:57:14] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:02:25] RECOVERY - puppet last run on mw2121 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[08:12:17] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[08:14:28] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/field.sh]
[08:22:10] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[08:29:42] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [1000.0]
[08:29:53] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [1000.0]
[08:38:03] Operations, Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2646272 (Volans)
[08:39:15] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:44:36] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:46:55] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[08:49:13] Operations, Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2646296 (Volans) I'd tend to exclude also: * Ansible because through their Yaml configuration files is not possible to achieve our use cases and their [[http://docs.ansibl...
[08:56:16] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:59:17] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[09:14:25] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[09:21:07] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[09:26:28] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[09:28:36] PROBLEM - puppet last run on cp1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:30:16] <_joe_> !log varnish-backend-restart on cp1048
[09:30:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:35:26] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[09:47:54] <_joe_> !log varnish-backend-restart on cp1063
[09:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:48:17] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:48:38] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[09:53:18] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[09:55:38] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[09:55:49] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[varnish]
[09:55:57] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[varnish]
[09:56:08] <_joe_> ?
[09:56:09] PROBLEM - Varnishkafka log producer on cp1063 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[09:58:58] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[09:59:53] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:01:20] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[10:02:15] <_joe_> all the RAID errors are from swift cluster overloading
[10:02:31] <_joe_> which is expected as we had to restart a few eqiad backends
[10:03:39] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:03:40] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:11:01] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:11:12] RECOVERY - Varnishkafka log producer on cp1063 is OK: PROCS OK: 1 process with command name varnishkafka
[10:15:42] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:20:53] RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:24:04] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[10:28:17] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[10:28:35] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[10:30:56] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:31:11] (PS1) Ema: varnish-backend-restart: fix service invocation [puppet] - https://gerrit.wikimedia.org/r/311326
[10:35:24] (CR) Ema: [C: 2] varnish-backend-restart: fix service invocation [puppet] - https://gerrit.wikimedia.org/r/311326 (owner: Ema)
[10:35:48] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[10:36:08] PROBLEM - HP RAID on ms-be1024 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[10:37:47] PROBLEM - puppet last run on cp2009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[generate_varnishkafka_webrequest_gmond_pyconf]
[10:40:09] RECOVERY - puppet last run on cp2009 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[10:40:40] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0]
[10:41:13] (PS1) Urbanecm: Add WT namespace alias to NS_PROJECT in mywiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/311327 (https://phabricator.wikimedia.org/T140998)
[10:41:29] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[10:47:41] RECOVERY - puppet last run on cp1073 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[10:48:01] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:49:20] !log repooling varnish on cp1074
[10:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:49:39] !log repooling varnish on cp1073
[10:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:50:11] !log repooling varnish on cp1072
[10:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:50:43] !log repooling varnish on cp1071
[10:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:51:20] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[10:52:36] !log repooling varnish on cp1064
[10:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:53:28] !log repooling varnish on cp1062
[10:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:54:12] !log repooling varnish on cp1050
[10:54:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:58:23] RECOVERY - HP RAID on ms-be1024 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[10:58:25] !log varnish-backend restart on cp3044
[10:58:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:59:59] !log varnish-backend restart on cp3037
[11:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:00:34] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[11:03:11] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[11:08:04] PROBLEM - Varnishkafka log producer on cp3044 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[11:10:34] RECOVERY - Varnishkafka log producer on cp3044 is OK: PROCS OK: 1 process with command name varnishkafka
[11:17:45] !log repooling varnish-be in codfw
[11:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:17:54] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:31:28] PROBLEM - puppet last run on elastic2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:35:37] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:56:17] RECOVERY - puppet last run on elastic2007 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[13:05:51] (PS1) BBlack: cache_upload: increase FE size limit to 2MB [puppet] - https://gerrit.wikimedia.org/r/311330
[13:06:44] (CR) BBlack: [C: 2 V: 2] cache_upload: increase FE size limit to 2MB [puppet] - https://gerrit.wikimedia.org/r/311330 (owner: BBlack)
[13:14:49] PROBLEM - Varnishkafka log producer on cp1064 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[13:19:51] RECOVERY - Varnishkafka log producer on cp1064 is OK: PROCS OK: 1 process with command name varnishkafka
[13:29:03] !log disabling puppet on cp1074, to experiment with vhtcpd regex filter
[13:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:51:39] (PS1) BBlack: cache_upload: vhtcpd host regex filter [puppet] - https://gerrit.wikimedia.org/r/311332
[13:53:29] (CR) jenkins-bot: [V: -1] cache_upload: vhtcpd host regex filter [puppet] - https://gerrit.wikimedia.org/r/311332 (owner: BBlack)
[13:55:05] (PS2) BBlack: cache_upload: vhtcpd host regex filter [puppet] - https://gerrit.wikimedia.org/r/311332
[13:57:33] (CR) BBlack: [C: 2] cache_upload: vhtcpd host regex filter [puppet] - https://gerrit.wikimedia.org/r/311332 (owner: BBlack)
[14:07:53] (PS1) BBlack: htcppurger: quoting bugfix for host_regex [puppet] - https://gerrit.wikimedia.org/r/311333
[14:08:10] (CR) BBlack: [C: 2 V: 2] htcppurger: quoting bugfix for host_regex [puppet] - https://gerrit.wikimedia.org/r/311333 (owner: BBlack)
[14:17:18] !log restarting varnish backend on cp1073 (503 LRU_Fail pattern, has been up a few days...)
[14:17:19] PROBLEM - puppet last run on cp2025 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/varnish-backend-restart]
[14:17:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:18:09] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[14:18:19] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[14:30:46] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:30:56] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:39:57] RECOVERY - puppet last run on cp2025 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:42:44] !log restarting upload varnish backend: cp2022
[14:42:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:52:12] !log restarting upload varnish backend: cp1049
[14:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:00:23] PROBLEM - Varnishkafka log producer on cp1072 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[15:06:37] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[15:09:07] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[15:12:38] RECOVERY - Varnishkafka log producer on cp1072 is OK: PROCS OK: 1 process with command name varnishkafka
[15:13:00] !log restarting upload varnish backend: cp2005
[15:13:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:13:29] PROBLEM - Varnishkafka log producer on cp1071 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[15:15:59] RECOVERY - Varnishkafka log producer on cp1071 is OK: PROCS OK: 1 process with command name varnishkafka
[15:22:30] (PS1) BBlack: cache_upload: FE size limit 1MB [puppet] - https://gerrit.wikimedia.org/r/311336
[15:23:03] (CR) BBlack: [C: 2 V: 2] cache_upload: FE size limit 1MB [puppet] - https://gerrit.wikimedia.org/r/311336 (owner: BBlack)
[15:29:21] (PS4) BBlack: cache_upload: one-hit-wonder experiment, hit/2+ [puppet] - https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187)
[15:29:44] (PS5) BBlack: cache_upload: two-hit-wonder experiment, hit/2+ [puppet] - https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187)
[15:43:11] !log restarting upload varnish backend: cp2017
[15:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:05:39] PROBLEM - puppet last run on rdb2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:07:44] PROBLEM - puppet last run on db2053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:33:01] RECOVERY - puppet last run on rdb2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:35:05] RECOVERY - puppet last run on db2053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:35:54] !log restarting upload varnish backend: cp2011
[16:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:48:31] (CR) BBlack: [C: 2] cache_upload: two-hit-wonder experiment, hit/2+ [puppet] - https://gerrit.wikimedia.org/r/308995 (https://phabricator.wikimedia.org/T144187) (owner: BBlack)
[16:59:04] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[17:01:54] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[17:06:08] !log restart upload varnish backend: cp2026
[17:06:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:21:40] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[17:23:05] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:24:06] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[17:32:13] !log restart upload varnish backend: cp2020
[17:32:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:32:47] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:42:08] !log restart upload varnish backend: cp1071
[17:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:47:58] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:48:57] PROBLEM - Varnishkafka log producer on cp1062 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[17:51:31] RECOVERY - Varnishkafka log producer on cp1062 is OK: PROCS OK: 1 process with command name varnishkafka
[17:57:19] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[17:58:42] !log restart upload varnish backend: cp2008
[17:58:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:03:14] @seen odder
[18:03:14] Steinsplitter: Last time I saw odder they were quitting the network with reason: Quit: leaving N/A at 8/21/2016 10:28:08 AM (28d7h35m6s ago)
[18:05:54] Guys, we are having thumbnail issues on Commons... 5/200 (2.5%) of thumbnails displayed on category pages and file histories are "Missing"... see #wikimedia-tech
[18:06:17] !log restart upload varnish backend: cp1050 (already in LRU_Fail)
[18:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:07:18] (That is 120px thumbnails)
[18:23:30] (PS1) BBlack: fix another possible netmapper-1.3+v4 FE crash [puppet] - https://gerrit.wikimedia.org/r/311338
[18:24:11] (CR) BBlack: [C: 2 V: 2] fix another possible netmapper-1.3+v4 FE crash [puppet] - https://gerrit.wikimedia.org/r/311338 (owner: BBlack)
[18:26:48] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:29:28] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:48:39] PROBLEM - Varnishkafka log producer on cp3049 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[18:49:38] Operations, Commons, Multimedia: Deploy a PHP and HHVM patch (Exif values retrieved incorrectly if they appear before IFD) - https://phabricator.wikimedia.org/T140419#2646724 (Aklapper)
[19:00:18] Operations, DNS, Domains, Traffic, and 2 others: Point wikipedia.in to 180.179.52.130 instead of URL forward - https://phabricator.wikimedia.org/T144508#2646737 (Aklapper) @Naveenpf: Not **here** in T144508. This task is only about pointing wikipedia.in to 180.179.52.130. This task is **not** abo...
[19:03:29] RECOVERY - Varnishkafka log producer on cp3049 is OK: PROCS OK: 1 process with command name varnishkafka
[19:28:15] !log restart up
[19:28:15] All status_type @ All / upload (non-PURGE)
[19:28:15] 13:3014:0014:3015:0015:3016:0016:3017:0017:3018:0018:3019:00025 K50 K75 K100 K125 Krate per second
[19:28:18] get109.8 Kpost962head598options64connect0put0trace0delete0
[19:28:20] All status_type @ All / upload (PURGE)
[19:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:28:23] 13:3014:0014:3015:0015:3016:0016:3017:0017:3018:0018:3019:00050 K100 K150 K200 Krate per second
[19:28:26] purge84.0 K
[19:28:28] bleh
[19:28:57] !log restart upload backend: cp1064 (already in LRU_Fail, caught early)
[19:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:30:44] Operations, Performance-Team, Thumbor: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2646791 (Gilles) That sucks, it's not a temp file I deliberately create, it seems to be something ffmpeg creates on its own... Why didn't "timeout" kill it after a mi...
[19:32:50] Operations, Performance-Team, Thumbor: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2646807 (Gilles) Looking at the error mediawiki returns when trying to render the same thumbnail, the symptoms are identical to the case found in T145612
[19:33:49] PROBLEM - Varnishkafka log producer on cp1050 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[19:38:50] RECOVERY - Varnishkafka log producer on cp1050 is OK: PROCS OK: 1 process with command name varnishkafka
[19:58:36] !log restart upload backend: cp1074 (stats indicate LRU_Fail imminent)
[19:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:22:14] !log restart upload backend: cp3039
[20:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:32:19] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[20:37:21] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:37:48] !log restart upload backend: cp3036
[20:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:00:12] PROBLEM - Varnishkafka log producer on cp3034 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[21:02:43] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:03:53] PROBLEM - Varnishkafka log producer on cp1062 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[21:12:38] RECOVERY - Varnishkafka log producer on cp3034 is OK: PROCS OK: 1 process with command name varnishkafka
[21:21:10] RECOVERY - Varnishkafka log producer on cp1062 is OK: PROCS OK: 1 process with command name varnishkafka
[21:27:27] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:10:03] PROBLEM - puppet last run on mw2239 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:23:02] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/spice-html5/spice_sec_auto.html]
[22:35:16] RECOVERY - puppet last run on mw2239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:45:45] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures