[03:25:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 786.06 seconds
[03:56:38] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 192.20 seconds
[05:01:44] <wikibugs>	 (PS1) Gergő Tisza: Enable loginOnly mode for local auth provider on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/409638 (https://phabricator.wikimedia.org/T57420)
[05:57:09] <icinga-wm>	 PROBLEM - HP RAID on ms-be1018 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging
[05:57:13] <icinga-wm>	 ACKNOWLEDGEMENT - HP RAID on ms-be1018 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, 2I:4:1, 2I:4:2 - Controller: OK - Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T186988
[05:57:17] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ms-be1018 - https://phabricator.wikimedia.org/T186988#3961204 (ops-monitoring-bot)
[07:12:58] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[07:13:19] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[07:49:29] <wikibugs>	 (PS1) Gergő Tisza: Re-enable cron job for purging ReadingLists data [puppet] - https://gerrit.wikimedia.org/r/409645
[07:49:58] <wikibugs>	 (PS2) Gergő Tisza: Re-enable cron job for purging ReadingLists data [puppet] - https://gerrit.wikimedia.org/r/409645 (https://phabricator.wikimedia.org/T181107)
[07:50:11] <wikibugs>	 (PS3) Gergő Tisza: Re-enable cron job for purging ReadingLists data [puppet] - https://gerrit.wikimedia.org/r/409645 (https://phabricator.wikimedia.org/T181107)
[07:57:19] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[07:57:59] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[08:00:59] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[08:01:19] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[08:11:09] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[08:11:28] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[08:14:18] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[08:14:28] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[08:21:18] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[08:21:29] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[08:30:28] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[08:30:38] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[08:33:28] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[08:33:38] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[08:42:17] <wikibugs>	 (CR) Jayprakash12345: [C: +1] "please do it as soon as possible, :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/408045 (https://phabricator.wikimedia.org/T185347) (owner: Urbanecm)
[09:59:28] <icinga-wm>	 PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[10:00:19] <icinga-wm>	 RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy
[14:04:19] <icinga-wm>	 PROBLEM - Disk space on mx2001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/scan is not accessible: Permission denied
[14:06:43] <moritzm>	 !log installing exim4 security updates on MXs
[14:06:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:19] <icinga-wm>	 RECOVERY - Disk space on mx2001 is OK: DISK OK
[14:56:35] <wikibugs>	 (CR) Anomie: [C: +1] "Seems sane. Haven't tested." [mediawiki-config] - https://gerrit.wikimedia.org/r/409638 (https://phabricator.wikimedia.org/T57420) (owner: Gergő Tisza)
[15:56:18] <icinga-wm>	 PROBLEM - puppet last run on analytics1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:21:18] <icinga-wm>	 RECOVERY - puppet last run on analytics1060 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[16:51:59] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1302 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:18] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1302 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:59:28] <icinga-wm>	 PROBLEM - puppet last run on db1077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:24:28] <icinga-wm>	 RECOVERY - puppet last run on db1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:40:41] <wikibugs>	 (CR) 星耀晨曦: "Is there a SWAT deployer to deploy this patch? If this patch is no problem, please deploy it." [mediawiki-config] - https://gerrit.wikimedia.org/r/406487 (https://phabricator.wikimedia.org/T184866) (owner: 星耀晨曦)
[17:45:27] <wikibugs>	 Operations, Wikidata: Badges not displaying on trwiki - https://phabricator.wikimedia.org/T186815#3961602 (Sjoerddebruin)
[18:13:38] <icinga-wm>	 PROBLEM - Host dbproxy1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:18:48] <icinga-wm>	 RECOVERY - Host dbproxy1001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.03 ms
[18:48:36] <wikibugs>	 (PS4) ArielGlenn: split up flow dumps into stubs and content passes [dumps] - https://gerrit.wikimedia.org/r/355077 (https://phabricator.wikimedia.org/T164262)
[18:48:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] split up flow dumps into stubs and content passes [dumps] - https://gerrit.wikimedia.org/r/355077 (https://phabricator.wikimedia.org/T164262) (owner: ArielGlenn)
[20:22:58] <icinga-wm>	 PROBLEM - Varnish HTTP text-backend - port 3128 on cp4029 is CRITICAL: connect to address 10.128.0.129 and port 3128: Connection refused
[20:23:59] <icinga-wm>	 RECOVERY - Varnish HTTP text-backend - port 3128 on cp4029 is OK: HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.157 second response time
[20:30:41] <wikibugs>	 (PS1) Gergő Tisza: Increase ReadingLists item limit to 5k [mediawiki-config] - https://gerrit.wikimedia.org/r/409712 (https://phabricator.wikimedia.org/T186296)
[20:35:39] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 38.68, 34.78, 32.23
[21:16:48] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 40.04, 34.31, 32.08
[21:20:48] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 39.51, 34.52, 32.50
[21:32:48] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 40.11, 34.02, 32.69
[21:36:49] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 36.55, 32.90, 32.34
[22:07:50] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 43.79, 35.65, 31.85
[22:51:12] <wikibugs>	 Operations, DBA, Performance-Team, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication and sharding handling - https://phabricator.wikimedia.org/T133523#3961878 (jcrespo) See my latest comments on: T167784#3961866  >  The third one is a bigger question regarding active-acti...
[23:22:38] <wikibugs>	 (PS2) Gergő Tisza: Increase ReadingLists item limit to 5k [mediawiki-config] - https://gerrit.wikimedia.org/r/409712 (https://phabricator.wikimedia.org/T186296)
[23:43:59] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 20.28, 22.65, 23.93
[23:51:28] <wikibugs>	 (CR) Jcrespo: "count(*) are not accelerated by indexes, at least not significantly." [mediawiki-config] - https://gerrit.wikimedia.org/r/409712 (https://phabricator.wikimedia.org/T186296) (owner: Gergő Tisza)