[00:01:31] <wikibugs>	 (PS1) BryanDavis: toolsdb: Remove stale accounts if present in maintain-dbusers [puppet] - https://gerrit.wikimedia.org/r/418709 (https://phabricator.wikimedia.org/T188680)
[00:06:44] <wikibugs>	 (PS2) BryanDavis: toolsdb: Remove stale accounts if present in maintain-dbusers [puppet] - https://gerrit.wikimedia.org/r/418709 (https://phabricator.wikimedia.org/T188680)
[00:22:05] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy
[00:22:14] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy
[00:22:23] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy
[00:22:53] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1052 bytes in 0.002 second response time
[00:23:23] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[00:23:33] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy
[00:24:24] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy
[00:28:33] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:30:13] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:30:24] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:31:43] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:32:13] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy
[00:33:33] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy
[00:34:04] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1052 bytes in 0.005 second response time
[00:34:33] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy
[00:35:43] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:36:23] <icinga-wm>	 PROBLEM - HHVM rendering on mw2113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:36:34] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy
[00:36:43] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:37:14] <icinga-wm>	 RECOVERY - HHVM rendering on mw2113 is OK: HTTP OK: HTTP/1.1 200 OK - 75700 bytes in 0.312 second response time
[00:37:29] <paladox>	 elukey ^^ (sorry for the ping again :))
[00:38:23] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:38:44] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:40:43] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[00:40:43] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy
[00:41:13] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1052 bytes in 0.002 second response time
[00:43:53] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:46:43] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[00:48:03] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:49:03] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy
[00:49:43] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:50:04] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:50:43] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy
[00:51:03] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy
[00:52:03] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:53:03] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy
[00:53:04] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:53:33] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:55:03] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy
[00:55:04] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[00:56:54] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[00:58:33] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1052 bytes in 0.003 second response time
[00:59:23] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[01:00:13] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[01:02:13] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy
[01:02:43] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:03:23] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[01:03:34] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 on druid-public-broker.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1052 bytes in 0.002 second response time
[01:04:23] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy
[01:05:14] <icinga-wm>	 PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-id}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received
[01:07:13] <icinga-wm>	 RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[01:07:50] <wikibugs>	 (PS1) BryanDavis: wiki replicas: Add spamblacklist to allowed log types [puppet] - https://gerrit.wikimedia.org/r/418710 (https://phabricator.wikimedia.org/T184483)
[01:09:44] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy
[01:09:54] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1006 is OK: OK: no difference between hosts in IPVS/PyBal
[01:09:54] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1003 is OK: OK: no difference between hosts in IPVS/PyBal
[01:10:13] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[01:10:14] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1010 is OK: PYBAL OK - All pools are healthy
[01:13:43] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1010 is OK: OK: no difference between hosts in IPVS/PyBal
[01:20:13] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[01:41:16] <bearND>	 I'd have a patch for this ^. Should I deploy it or do we just ack it? https://gerrit.wikimedia.org/r/418711
[02:01:23] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 35 probes of 297 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[02:02:03] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING WARNING - Packet loss = 64%, RTA = 36.69 ms
[02:06:23] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 9 probes of 297 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[02:39:44] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[03:00:53] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.75 ms
[03:01:03] <icinga-wm>	 PROBLEM - HHVM rendering on mw2182 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:01:53] <icinga-wm>	 RECOVERY - HHVM rendering on mw2182 is OK: HTTP OK: HTTP/1.1 200 OK - 74856 bytes in 0.313 second response time
[03:07:34] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[03:23:23] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 41.12 ms
[03:25:15] <wikibugs>	 (CR) Brian Wolff: [C: 1] wiki replicas: Add spamblacklist to allowed log types [puppet] - https://gerrit.wikimedia.org/r/418710 (https://phabricator.wikimedia.org/T184483) (owner: BryanDavis)
[03:27:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 755.39 seconds
[03:36:55] <icinga-wm>	 PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[04:00:23] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.84 seconds
[04:00:43] <icinga-wm>	 PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:43] <icinga-wm>	 PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:43] <icinga-wm>	 PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:43] <icinga-wm>	 PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:43] <icinga-wm>	 PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:54] <icinga-wm>	 PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:54] <icinga-wm>	 PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:24] <icinga-wm>	 PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:33] <icinga-wm>	 PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:53] <icinga-wm>	 RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:01:56] <icinga-wm>	 PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:56] <icinga-wm>	 PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:56] <icinga-wm>	 PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:01:56] <icinga-wm>	 PROBLEM - puppet last run on mw1312 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:02:13] <icinga-wm>	 PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:02:13] <icinga-wm>	 PROBLEM - puppet last run on cp4025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:03:23] <icinga-wm>	 PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:04:13] <icinga-wm>	 PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:04:23] <icinga-wm>	 PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:04:24] <icinga-wm>	 PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:04:43] <icinga-wm>	 PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:29:13] <icinga-wm>	 RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[04:29:33] <icinga-wm>	 RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[04:29:33] <icinga-wm>	 RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:29:53] <icinga-wm>	 RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:30:43] <icinga-wm>	 RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:30:43] <icinga-wm>	 RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:30:43] <icinga-wm>	 RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:30:43] <icinga-wm>	 RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:30:43] <icinga-wm>	 RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:31:03] <icinga-wm>	 RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:31:03] <icinga-wm>	 RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:31:24] <icinga-wm>	 RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:31:33] <icinga-wm>	 RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:31:53] <icinga-wm>	 RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:31:54] <icinga-wm>	 RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:31:54] <icinga-wm>	 RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:31:54] <icinga-wm>	 RECOVERY - puppet last run on mw1312 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:32:13] <icinga-wm>	 RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[04:32:13] <icinga-wm>	 RECOVERY - puppet last run on cp4025 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:33:23] <icinga-wm>	 RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[05:12:23] <icinga-wm>	 PROBLEM - HHVM rendering on mw1294 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:13:13] <icinga-wm>	 RECOVERY - HHVM rendering on mw1294 is OK: HTTP OK: HTTP/1.1 200 OK - 74747 bytes in 0.144 second response time
[05:22:53] <icinga-wm>	 PROBLEM - MegaRAID on db1073 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[05:22:54] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on db1073 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T189403
[05:22:58] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T189403#4040910 (ops-monitoring-bot)
[05:58:53] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[06:11:23] <wikibugs>	 Operations, ops-eqiad, DBA: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T189403#4040923 (Marostegui) p:Triage>High a:Cmjohnson This is m5 master @cmjohnson do you have an used disk somewhere to replace this one? Thanks!
[06:20:34] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[06:36:23] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 37.71 ms
[06:36:33] <icinga-wm>	 PROBLEM - HHVM rendering on mw2129 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:37:23] <icinga-wm>	 RECOVERY - HHVM rendering on mw2129 is OK: HTTP OK: HTTP/1.1 200 OK - 74809 bytes in 0.296 second response time
[06:45:53] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[07:48:13] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[07:58:34] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[07:58:53] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:08:43] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[08:09:03] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:11:15] <wikibugs>	 (PS1) Elukey: Fix eventlog1002's ipv6 address [dns] - https://gerrit.wikimedia.org/r/418714 (https://phabricator.wikimedia.org/T185667)
[08:16:23] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.73 ms
[08:29:45] <elukey>	 aqs/druid failures that happened at midnight UTC were due to big queries again; my team is aware and we'll work on it starting tomorrow :)
[08:29:49] <elukey>	 thanks paladox for the ping! 
[08:32:16] <wikibugs>	 Operations, Analytics: Replace eventlog1001's IP with eventlog1002's in analytics-in4 - https://phabricator.wikimedia.org/T189408#4040987 (Peachey88)
[08:32:42] <wikibugs>	 Operations, Analytics, netops: Replace eventlog1001's IP with eventlog1002's in analytics-in4 - https://phabricator.wikimedia.org/T189408#4040988 (elukey)
[08:50:38] <elukey>	 !log executed sudo rm /etc/logrotate.d/kafkatee-webrequest-analytics on oxygen/rhenium to stop daily cronspam
[08:50:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:31] <wikibugs>	 Operations, LuaSandbox, MediaWiki-extensions-WikibaseClient, Wikidata, hardware-requests: Strong reduction of computing time at Wikivoyage needed - https://phabricator.wikimedia.org/T189409#4040993 (RolandUnger)
[09:35:53] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[09:36:43] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[09:45:43] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[09:46:03] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[10:36:03] <wikibugs>	 (PS3) Zoranzoki21: Revert "Restrict FlaggedRevs to only operated on NS_MAIN on arwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/418700 (https://phabricator.wikimedia.org/T148603) (owner: Ahmed123)
[11:14:54] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:02:43] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 42.21 ms
[12:33:04] <wikibugs>	 Operations, Ops-Access-Requests: Requesting deployment access for samwilson - https://phabricator.wikimedia.org/T189414#4041118 (Samwilson)
[12:38:46] <wikibugs>	 Operations, Ops-Access-Requests: Requesting access to terbium.eqiad.wmnet for bmansurov - https://phabricator.wikimedia.org/T189285#4037542 (MarcoAurelio) I guess this is what's called `restricted` in the puppet config.
[12:54:23] <icinga-wm>	 PROBLEM - HHVM rendering on mw2192 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:55:14] <icinga-wm>	 RECOVERY - HHVM rendering on mw2192 is OK: HTTP OK: HTTP/1.1 200 OK - 74749 bytes in 0.301 second response time
[13:12:03] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[15:09:02] <wikibugs>	 (CR) Reedy: Disable abusefilter from collecting private data on Beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/416346 (https://phabricator.wikimedia.org/T188862) (owner: MarcoAurelio)
[15:15:34] <icinga-wm>	 PROBLEM - Host wdqs2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:20:23] <icinga-wm>	 PROBLEM - HHVM rendering on mw2204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:21:13] <icinga-wm>	 RECOVERY - HHVM rendering on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 74763 bytes in 0.302 second response time
[15:42:03] <icinga-wm>	 RECOVERY - Host wdqs2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 41.80 ms
[20:33:23] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title} (retrieve structured reference data for the Cat article on English Wikipedia) is WARNING: Test retrieve structured reference data for the Cat article on English Wikipedia responds with unexpected value at path /reference_lists[1]/id =
[22:59:36] <icinga-wm>	 PROBLEM - Host db1069 is DOWN: PING CRITICAL - Packet loss = 100%
[23:28:38] <icinga-wm>	 RECOVERY - Host db1069 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms