[00:23:59] PROBLEM - HHVM jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[00:25:21] RECOVERY - HHVM jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 4.380 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[00:42:45] PROBLEM - HHVM rendering on mw1221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[00:44:03] RECOVERY - HHVM rendering on mw1221 is OK: HTTP OK: HTTP/1.1 200 OK - 76098 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[00:44:35] PROBLEM - Nginx local proxy to videoscaler on mw1310 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[00:44:41] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:45:55] RECOVERY - Nginx local proxy to videoscaler on mw1310 is OK: HTTP OK: HTTP/1.1 200 OK - 288 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[00:45:59] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 327 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[01:00:17] PROBLEM - Nginx local proxy to jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[01:01:33] RECOVERY - Nginx local proxy to jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 288 bytes in 0.010 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[01:53:25] PROBLEM - PHP7 rendering on mw1299 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[01:54:43] RECOVERY - PHP7 rendering on mw1299 is OK: HTTP OK: HTTP/1.1 200 OK - 327 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[01:58:26] (03CR) 10Ladsgroup: "ping :)" [puppet] - 10https://gerrit.wikimedia.org/r/511078 (https://phabricator.wikimedia.org/T113114) (owner: 10Ladsgroup)
[02:22:29] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[02:23:57] RECOVERY - PHP7 rendering on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 327 bytes in 9.179 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[02:42:43] PROBLEM - HHVM jobrunner on mw1295 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[02:44:07] RECOVERY - HHVM jobrunner on mw1295 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 5.869 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[02:46:17] PROBLEM - HHVM jobrunner on mw1311 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[02:47:35] RECOVERY - HHVM jobrunner on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.017 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[02:50:41] PROBLEM - HHVM jobrunner on mw1311 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Jobrunner
[02:51:59] RECOVERY - HHVM jobrunner on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 271 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Jobrunner
[03:34:39] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test]
[04:01:47] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:37:19] PROBLEM - Check systemd state on ms-be2030 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:46:07] RECOVERY - Check systemd state on ms-be2030 is OK: OK - running: The system is fully operational
[06:31:17] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:31:41] PROBLEM - puppet last run on theemin is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:58:29] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:53] RECOVERY - puppet last run on theemin is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:03:13] (03PS1) 10Elukey: role::analytics_cluster::coordinator: remove port druid host [puppet] - 10https://gerrit.wikimedia.org/r/517203
[08:05:09] (03CR) 10Elukey: [C: 03+2] role::analytics_cluster::coordinator: remove port druid host [puppet] - 10https://gerrit.wikimedia.org/r/517203 (owner: 10Elukey)
[08:11:53] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational
[08:13:27] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:13:39] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:13:39] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:13:51] this is probably me
[08:13:59] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:13:59] checking
[08:13:59] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:14:31] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:17:23] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - druid-public-broker_8082: Servers druid1006.eqiad.wmnet, druid1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[08:17:45] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - druid-public-broker_8082: Servers druid1005.eqiad.wmnet, druid1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[08:19:07] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:19:08] yep this is definitely me, fixing a maintenance job for druid caused a problem, and aqs -> druid now is not happy
[08:19:53] PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([druid1006.eqiad.wmnet, druid1004.eqiad.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[08:20:45] PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([druid1005.eqiad.wmnet, druid1004.eqiad.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[08:21:57] !log roll restart of druid brokers on druid100[4-6], stuck after regular data drop maintenance
[08:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:03] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:22:19] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:22:23] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:22:35] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:22:41] what the hell
[08:23:03] the data drop job shouldn't cause this mess :(
[08:23:33] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[08:24:01] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:24:33] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs
[08:24:37] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[08:25:21] RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[08:26:11] RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[09:08:59] PROBLEM - Memory correctable errors -EDAC- on wtp2020 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[09:25:41] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[09:30:01] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[10:19:43] PROBLEM - Router interfaces on mr1-eqsin is CRITICAL: CRITICAL: No response from remote host 103.102.166.128 for 1.3.6.1.2.1.2.2.1.2 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:21:05] RECOVERY - Router interfaces on mr1-eqsin is OK: OK: host 103.102.166.128, interfaces up: 38, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:23:08] 04Critical Alert for device cr1-eqsin.wikimedia.org - Device took too long to poll
[10:25:09] 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr1-eqsin.wikimedia.org recovered from Device took too long to poll
[10:56:27] PROBLEM - EDAC syslog messages on wtp2020 is CRITICAL: 4 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[11:15:49] PROBLEM - EDAC syslog messages on db2084 is CRITICAL: 4.63 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db2084&var-datasource=codfw+prometheus/ops
[12:22:15] 10Operations, 10DBA: db2084 temporary correctable hardware errors - https://phabricator.wikimedia.org/T225884 (10Marostegui)
[12:22:33] ACKNOWLEDGEMENT - EDAC syslog messages on db2084 is CRITICAL: 4.5 ge 4 Marostegui T225884 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db2084&var-datasource=codfw+prometheus/ops
[13:36:35] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[13:40:46] Hmm
[14:03:51] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:20:43] !log running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='AKA MBG' /home/urbanecm/T225886
[14:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:45] 10Operations, 10ops-codfw: wtp2020: correctable memory errors - https://phabricator.wikimedia.org/T205712 (10jijiki) New alarms going off for this one ` [Sun Jun 16 08:30:29 2019] mce: [Hardware Error]: Machine check events logged [Sun Jun 16 08:30:29 2019] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR [Sun Jun...
[14:22:38] ACKNOWLEDGEMENT - EDAC syslog messages on wtp2020 is CRITICAL: 4 ge 4 Effie Mouzeli Task already open for these errors - T205712 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[14:22:38] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on wtp2020 is CRITICAL: 4.001 ge 4 Effie Mouzeli Task already open for these errors - T205712 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[14:38:24] 10Operations, 10Traffic: ATS is currently adding its own server header - https://phabricator.wikimedia.org/T224119 (10Antigng) Also, ATS doesn't change the via header as Varnish does.{F29584602}
[15:04:47] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[15:21:13] PROBLEM - Device not healthy -SMART- on db2043 is CRITICAL: cluster=mysql device=cciss,2 instance=db2043:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2043&var-datasource=codfw+prometheus/ops
[15:46:36] ACKNOWLEDGEMENT - HP RAID on db2043 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:3 - OK: 1I:1:1, 1I:1:2, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T225889 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[15:46:39] 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10ops-monitoring-bot)
[15:49:38] ACKNOWLEDGEMENT - Device not healthy -SMART- on db2043 is CRITICAL: cluster=mysql device=cciss,2 instance=db2043:9100 job=node site=codfw Marostegui T208323 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2043&var-datasource=codfw+prometheus/ops
[15:50:24] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) a:03Papaul @Papaul can we get the disk replaced? Thanks!
[15:50:34] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) p:05Triage→03Normal
[15:50:49] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[15:59:00] RECOVERY - ensure kvm processes are running on cloudvirt1015 is OK: PROCS OK: 7 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:06:11] 10Operations, 10ops-eqiad, 10DC-Ops, 10User-Zppix, 10cloud-services-team (Kanban): VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853 (10Andrew) I've put eight test VMs on 1015, will let them run for a few days and then see if they're still up :)
[16:22:37] (03Abandoned) 10Ori.livneh: Configure forensic logging of Apache requests; enable on beta [puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh)
[19:20:51] 10Operations, 10Domains, 10Traffic, 10WMF-Legal, 10Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (10tramm) >>! In T204056#5260214, @CRoslof wrote: > Domain names with [[ https://en.wikipedia.org/wiki/Country_code_top-level_domain | country code top-level doma...
[20:56:32] PROBLEM - Device not healthy -SMART- on db2058 is CRITICAL: cluster=mysql device=cciss,3 instance=db2058:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2058&var-datasource=codfw+prometheus/ops
[22:09:04] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:10:00] PROBLEM - OSPF status on cr2-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:10:58] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS1299/IPv4: Connect, AS1299/IPv6: Active https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:11:58] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:12:24] RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 414, down: 0, shutdown: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:12:52] RECOVERY - OSPF status on cr2-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:47:58] PROBLEM - Disk space on dbprov1001 is CRITICAL: DISK CRITICAL - free space: /srv 452151 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[23:13:04] PROBLEM - HP RAID on db2058 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:4 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[23:13:06] ACKNOWLEDGEMENT - HP RAID on db2058 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:4 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T225902 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[23:13:10] 10Operations, 10ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10ops-monitoring-bot)
[23:28:24] PROBLEM - Disk space on dbprov1001 is CRITICAL: DISK CRITICAL - free space: /srv 454674 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space
[23:50:06] RECOVERY - Disk space on dbprov1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space