[01:43:41] PROBLEM - MegaRAID on db1072 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[01:43:42] ACKNOWLEDGEMENT - MegaRAID on db1072 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T199636
[01:43:53] Operations, ops-eqiad: Degraded RAID on db1072 - https://phabricator.wikimedia.org/T199636 (ops-monitoring-bot)
[03:27:41] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 888.36 seconds
[03:41:41] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type=create_container https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:42:42] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[03:56:02] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 154.00 seconds
[04:43:31] PROBLEM - HHVM rendering on mw2218 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:44:22] RECOVERY - HHVM rendering on mw2218 is OK: HTTP OK: HTTP/1.1 200 OK - 75395 bytes in 0.307 second response time
[05:40:39] Operations, ops-eqiad, DBA: Degraded RAID on db1072 - https://phabricator.wikimedia.org/T199636 (Marostegui) Can we get this disk replaced? Thanks!
[05:40:53] Operations, ops-eqiad, DBA: Degraded RAID on db1072 - https://phabricator.wikimedia.org/T199636 (Marostegui) p:Triage>Normal
[05:41:06] Operations, ops-eqiad, DBA: Degraded RAID on db1072 - https://phabricator.wikimedia.org/T199636 (Marostegui) a:Cmjohnson
[07:24:11] PROBLEM - High CPU load on API appserver on mw1229 is CRITICAL: CRITICAL - load average: 51.54, 35.78, 22.88
[07:27:01] PROBLEM - High CPU load on API appserver on mw1286 is CRITICAL: CRITICAL - load average: 58.32, 45.47, 28.74
[07:32:31] PROBLEM - High CPU load on API appserver on mw1286 is CRITICAL: CRITICAL - load average: 48.50, 44.55, 33.06
[07:32:52] RECOVERY - High CPU load on API appserver on mw1229 is OK: OK - load average: 6.33, 21.35, 23.01
[07:45:41] RECOVERY - High CPU load on API appserver on mw1286 is OK: OK - load average: 11.12, 21.98, 28.91
[11:28:41] PROBLEM - Host cp3033 is DOWN: PING CRITICAL - Packet loss = 100%
[11:29:01] RECOVERY - Host cp3033 is UP: PING OK - Packet loss = 0%, RTA = 83.65 ms
[11:47:32] PROBLEM - Host cp3033 is DOWN: PING CRITICAL - Packet loss = 100%
[11:53:31] PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:53:41] PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:53:41] PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:53:41] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:53:42] PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:53:51] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:53:51] PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:53:52] PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:54:02] PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:54:11] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:54:11] PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:54:12] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:54:12] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:54:12] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[11:54:21] PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 52 not-conn: cp3033_v4, cp3033_v6
[11:54:31] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 64 not-conn: cp3033_v4, cp3033_v6
[12:03:41] PROBLEM - cpjobqueue endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:03:51] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/base (Get base CSS) timed out before a response was received: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) timed out before a response was received: /_info (retrieve service info) timed out before a response was received: /{domain}/v1/page/metadata/{title}{/revision}{
[12:03:51] ended metadata for Video article on English Wikipedia) timed out before a response was received
[12:03:51] PROBLEM - apertium apy on scb2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:03:52] PROBLEM - mathoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:04:01] PROBLEM - configured eth on scb2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:04:11] PROBLEM - DPKG on scb2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:04:11] PROBLEM - SSH on scb2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:04:12] PROBLEM - pdfrender on scb2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:04:12] PROBLEM - eventstreams on scb2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:04:21] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/announcements (Retrieve announcements) timed out before a response was received
[12:04:52] RECOVERY - mathoid endpoints health on scb2001 is OK: All endpoints are healthy
[12:05:01] RECOVERY - configured eth on scb2001 is OK: OK - interfaces up
[12:05:02] RECOVERY - SSH on scb2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[12:05:11] RECOVERY - DPKG on scb2001 is OK: All packages OK
[12:05:11] RECOVERY - pdfrender on scb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time
[12:05:12] RECOVERY - eventstreams on scb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 1066 bytes in 0.101 second response time
[12:05:22] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy
[12:05:42] RECOVERY - cpjobqueue endpoints health on scb2001 is OK: All endpoints are healthy
[12:05:52] RECOVERY - apertium apy on scb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 5996 bytes in 0.074 second response time
[12:06:01] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy
[15:34:22] PROBLEM - puppet last run on scb2004 is CRITICAL: CRITICAL: Puppet has 37 failures. Last run 5 minutes ago with 37 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[cpjobqueue/deploy],Exec[chown /srv/deployment/cpjobqueue for deploy-service],Package[recommendation-api/deploy]
[15:59:51] RECOVERY - puppet last run on scb2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:47:44] (PS1) Framawiki: Create Reconstruction NS at frwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/445929 (https://phabricator.wikimedia.org/T199631)
[21:53:32] (PS2) Framawiki: Create Reconstruction NS at frwikt [mediawiki-config] - https://gerrit.wikimedia.org/r/445929 (https://phabricator.wikimedia.org/T199631)
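
The 01:43 MegaRAID alert on db1072 (and the auto-filed T199636) comes from a RAID check that wraps LSI's MegaCli utility to inspect logical drive state. The sketch below shows one way such a degraded-LD detector could look; it is not the actual WMF RAID check or handler, and the MegaCli binary path and output-parsing regex are assumptions.

#!/usr/bin/env python3
"""Illustrative degraded-LD detector for MegaRAID controllers, in the spirit
of the 01:43 db1072 alert. NOT the WMF RAID check/handler: the MegaCli path
and the output parsing are assumptions."""
import re
import subprocess
import sys

MEGACLI = "/usr/sbin/megacli"  # assumption: location of the MegaCli binary


def logical_drive_states():
    # -LDInfo -Lall -aAll lists every logical drive on every adapter;
    # -NoLog keeps MegaCli from writing its own log file.
    out = subprocess.run([MEGACLI, "-LDInfo", "-Lall", "-aAll", "-NoLog"],
                         capture_output=True, text=True, check=True).stdout
    # MegaCli prints one "State : <value>" line per logical drive.
    return re.findall(r"^State\s*:\s*(\S+)", out, re.MULTILINE)


def main():
    states = logical_drive_states()
    bad = [s for s in states if s != "Optimal"]
    if bad:
        # Mirror the message shape seen in the log above.
        print("CRITICAL: %d failed LD(s) (%s)" % (len(bad), ", ".join(bad)))
        sys.exit(2)
    print("OK: %d logical drive(s) Optimal" % len(states))
    sys.exit(0)


if __name__ == "__main__":
    main()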
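
For the dbstore1002 slave-lag alerts at 03:27 and 03:56: a lag check of this kind compares the replica's reported lag against warning/critical thresholds. Below is a minimal sketch of such a probe, assuming pymysql and a .my.cnf with credentials; the thresholds are placeholders and this is not the actual WMF MariaDB check.

#!/usr/bin/env python3
"""Minimal replication-lag probe, illustrative only. Host, credentials file
and thresholds are assumptions, not the WMF production check."""
import sys
import pymysql

HOST = "dbstore1002.eqiad.wmnet"   # assumption: host named in the alert
WARN_S = 300.0                     # assumption: warning threshold (seconds)
CRIT_S = 600.0                     # assumption: critical threshold (seconds)


def slave_lag_seconds(host):
    conn = pymysql.connect(host=host, read_default_file="~/.my.cnf",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
    finally:
        conn.close()
    if not status or status["Seconds_Behind_Master"] is None:
        return None  # not a replica, or replication is broken
    return float(status["Seconds_Behind_Master"])


def main():
    lag = slave_lag_seconds(HOST)
    if lag is None:
        print("CRITICAL slave_sql_lag Replication broken or not configured")
        sys.exit(2)
    state, code = "OK", 0
    if lag >= CRIT_S:
        state, code = "CRITICAL", 2
    elif lag >= WARN_S:
        state, code = "WARNING", 1
    # Mirror the message shape seen in the 03:27 / 03:56 entries above.
    print("%s slave_sql_lag Replication lag: %.2f seconds" % (state, lag))
    sys.exit(code)


if __name__ == "__main__":
    main()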
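
The "CRITICAL - Socket timeout after 10 seconds" entries (mw2218 at 04:43, several scb2001 services around 12:04) and the Mobileapps/Restbase LVS endpoint alerts all boil down to HTTP probes with a 10-second budget. The following is a sketch of that kind of probe, not the real Icinga plugins or service-checker; the example URL, host and port are placeholder assumptions.

#!/usr/bin/env python3
"""Sketch of an HTTP health probe with a 10-second timeout, illustrating the
kind of check behind the alerts above. URLs/ports are placeholder assumptions."""
import sys
import requests

TIMEOUT_S = 10  # matches the 10-second socket timeout quoted in the alerts


def probe(url):
    try:
        resp = requests.get(url, timeout=TIMEOUT_S)
    except requests.exceptions.Timeout:
        return 2, "CRITICAL - Socket timeout after %d seconds" % TIMEOUT_S
    except requests.exceptions.RequestException as exc:
        return 2, "CRITICAL - %s" % exc
    if resp.status_code != 200:
        return 2, "CRITICAL - HTTP %d" % resp.status_code
    # Mirror the RECOVERY message shape seen in the log above.
    return 0, ("HTTP OK: HTTP/1.1 200 OK - %d bytes in %.3f second response time"
               % (len(resp.content), resp.elapsed.total_seconds()))


if __name__ == "__main__":
    # Placeholder target: the service's /_info endpoint; host and port are
    # assumptions, not the actual production check configuration.
    code, message = probe("http://mobileapps.svc.codfw.wmnet:8888/_info")
    print(message)
    sys.exit(code)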