[02:03:33] RECOVERY - WDQS SPARQL on wdqs2008 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.224 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:15:49] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:20:27] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 688 bytes in 1.053 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[07:53:01] PROBLEM - WDQS SPARQL on wdqs1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[08:19:47] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:44:31] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:56:31] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:15:45] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:27:57] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:47:07] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:59:31] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:17:17] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:30:31] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:50:27] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:51:45] PROBLEM - Query Service HTTP Port on wdqs1011 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[11:02:45] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:22:03] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:34:19] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:46:45] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:11:29] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:18:47] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:38:13] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:45:39] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:09:55] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:14:47] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:27:13] PROBLEM - Check systemd state on wdqs1011 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:29:35] RECOVERY - Check systemd state on wdqs1011 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:41:25] PROBLEM - puppet last run on wdqs1011 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[16:27:23] PROBLEM - SSH on wdqs1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[18:42:09] RECOVERY - SSH on wdqs1011 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[18:43:57] RECOVERY - Query Service HTTP Port on wdqs1011 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.021 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[18:43:57] RECOVERY - puppet last run on wdqs1011 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[18:43:58] RECOVERY - WDQS SPARQL on wdqs1011 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.074 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[22:26:08] ACKNOWLEDGEMENT - WDQS high update lag on wdqs1011 is CRITICAL: 1.65e+04 ge 3600 Gehel catching up on lag after freeze https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen