[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201011T0700)
[07:25:59] <Amir1>	 it might be just me but wiki seeems really slow
[07:26:55] <Amir1>	 the graphs look fine
[07:40:27] <icinga-wm>	 PROBLEM - Disk space on an-launcher1002 is CRITICAL: DISK CRITICAL - free space: / 2759 MB (3% inode=88%): /tmp 2759 MB (3% inode=88%): /var/tmp 2759 MB (3% inode=88%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[08:06:18] <elukey>	 checking --^
[08:06:56] <elukey>	 ah lovely a daemon spamming logs
[08:12:42] <wikibugs>	 (03PS1) 10Elukey: profile::hadoop::balancer: add syslog id to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/633349
[08:18:18] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/25818/an-launcher1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/633349 (owner: 10Elukey)
[08:21:47] <icinga-wm>	 RECOVERY - Disk space on an-launcher1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[08:25:04] <wikibugs>	 (03PS1) 10Elukey: Decommission analytics1046 from Hadoop [puppet] - 10https://gerrit.wikimedia.org/r/633350 (https://phabricator.wikimedia.org/T255140)
[08:25:45] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Decommission analytics1046 from Hadoop [puppet] - 10https://gerrit.wikimedia.org/r/633350 (https://phabricator.wikimedia.org/T255140) (owner: 10Elukey)
[10:22:46] <icinga-wm>	 PROBLEM - SSH on ms-be2036 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:24:20] <icinga-wm>	 RECOVERY - SSH on ms-be2036 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[10:24:32] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2036 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:25:28] <icinga-wm>	 PROBLEM - very high load average likely xfs on ms-be2036 is CRITICAL: CRITICAL - load average: 225.96, 380.55, 248.07 https://wikitech.wikimedia.org/wiki/Swift
[10:30:18] <icinga-wm>	 PROBLEM - MD RAID on ms-be2036 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:30:19] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on ms-be2036 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T265208 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[10:30:23] <wikibugs>	 10Operations, 10ops-codfw: Degraded RAID on ms-be2036 - https://phabricator.wikimedia.org/T265208 (10ops-monitoring-bot)
[10:35:00] <icinga-wm>	 PROBLEM - Disk space on ms-be2036 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdb3 is not accessible: Input/output error https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be2036&var-datasource=codfw+prometheus/ops
[10:38:22] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2050 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:44:38] <icinga-wm>	 RECOVERY - very high load average likely xfs on ms-be2036 is OK: OK - load average: 11.91, 15.79, 77.38 https://wikitech.wikimedia.org/wiki/Swift
[10:44:40] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:47:02] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2039 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:52:54] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:55:56] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2051 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:58:04] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:00:24] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2039 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:05:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2039 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:10:56] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2051 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:15:00] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2050 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:32:12] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2024 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:13:12] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:14:28] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2056 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:32:02] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:33:10] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2056 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:33:44] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:57:26] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:41:14] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2052 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:49:50] <icinga-wm>	 PROBLEM - puppet last run on ms-be2036 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:04:12] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2052 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:35:12] <icinga-wm>	 PROBLEM - SSH on ms-be2029 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:41:52] <icinga-wm>	 RECOVERY - SSH on ms-be2029 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:48:16] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:58:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:09:10] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2050 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:14:24] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2050 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:15:14] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2056 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:33:58] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2056 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:51:00] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:52:40] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:02:49] <wikibugs>	 (03PS1) 10Urbanecm: admin: urbanecm's home: Update .gitconfig [puppet] - 10https://gerrit.wikimedia.org/r/633365
[18:43:06] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2052 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:03:48] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2052 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:09:46] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2052 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:44:54] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2029 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:58:36] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2029 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:04:18] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2052 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:33:16] <icinga-wm>	 PROBLEM - SSH on ms-be2056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[22:34:46] <icinga-wm>	 RECOVERY - SSH on ms-be2056 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring