[03:17:33] PROBLEM - WDQS SPARQL on wdqs1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [03:40:55] PROBLEM - SSH on wdqs2007 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [10:09:02] RECOVERY - WDQS SPARQL on wdqs1013 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.074 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [15:36:15] RECOVERY - Check systemd state on wdqs2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:38:39] RECOVERY - Blazegraph Port for wdqs-blazegraph on wdqs2001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [15:38:39] RECOVERY - Blazegraph process -wdqs-blazegraph- on wdqs2001 is OK: PROCS OK: 1 process with UID = 499 (blazegraph), regex args ^java .* --port 9999 .* blazegraph-service-.*war https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [15:38:39] RECOVERY - Query Service HTTP Port on wdqs2001 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.022 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [15:38:41] RECOVERY - WDQS SPARQL on wdqs2001 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.210 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook [16:00:54] PROBLEM - WDQS high update lag on wdqs2001 is CRITICAL: 1.245e+05 ge 4.32e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:48:36] ACKNOWLEDGEMENT - MD RAID on wdqs2007 is CRITICAL: CRITICAL: State: degraded, Active: 7, Working: 7, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T281504 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [16:59:01] RECOVERY - SSH on wdqs2007 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring