[06:09:38] PROBLEM - Check systemd state on wdqs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:11:28] RECOVERY - Check systemd state on wdqs1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:01:27] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @Lucas_WMDE & @James_F - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:51:13] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @James_F - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[21:26:41] PROBLEM - Blazegraph process -wdqs-categories- on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:26:59] PROBLEM - puppet last run on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[21:27:15] PROBLEM - MD RAID on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[21:27:33] PROBLEM - Query Service HTTP Port on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[21:27:35] PROBLEM - Blazegraph Port for wdqs-categories on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:27:43] PROBLEM - DPKG on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[21:27:47] PROBLEM - Blazegraph Port for wdqs-blazegraph on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:30:01] PROBLEM - Blazegraph process -wdqs-blazegraph- on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:34:39] PROBLEM - Disk space on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1010&var-datasource=eqiad+prometheus/ops
[21:43:55] PROBLEM - configured eth on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[21:46:07] PROBLEM - dhclient process on wdqs1010 is CRITICAL: connect to address 10.64.32.63 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[21:54:36] RECOVERY - DPKG on wdqs1010 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[21:54:40] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs1010 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:54:52] RECOVERY - Blazegraph process -wdqs-blazegraph- on wdqs1010 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:54:56] RECOVERY - MD RAID on wdqs1010 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[21:55:04] RECOVERY - Disk space on wdqs1010 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=wdqs1010&var-datasource=eqiad+prometheus/ops
[21:55:28] RECOVERY - Query Service HTTP Port on wdqs1010 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.019 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[21:55:28] RECOVERY - dhclient process on wdqs1010 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[21:55:34] RECOVERY - Blazegraph Port for wdqs-categories on wdqs1010 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9990 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[21:59:16] RECOVERY - puppet last run on wdqs1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[22:09:53] RECOVERY - Blazegraph process -wdqs-categories- on wdqs1010 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9990 .* blazegraph-service-.*war https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[22:18:36] RECOVERY - configured eth on wdqs1010 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[23:41:33] RECOVERY - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1973 bytes in 0.103 second response time https://phabricator.wikimedia.org/project/view/71/
[23:43:21] PROBLEM - wikidata-alerts grafana alert on icinga1001 is CRITICAL: CRITICAL: Wikidata Alerts ( https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts ) is alerting: Edits: below 30 per minute (for 2 minutes). https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/Alerts https://grafana.wikimedia.org/d/TUJ0V-0Zk/
[23:45:09] RECOVERY - wikidata-alerts grafana alert on icinga1001 is OK: OK: Wikidata Alerts ( https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts ) is not alerting. https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/Alerts https://grafana.wikimedia.org/d/TUJ0V-0Zk/