[00:58:48] (PS2) Krinkle: [WIP] Remove use of the $globals tmp cache file [mediawiki-config] - https://gerrit.wikimedia.org/r/579653 (https://phabricator.wikimedia.org/T169821) [01:01:50] RECOVERY - Check systemd state on boron is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [01:24:56] (PS1) Krinkle: Move unconditionally included side-effect free files together [mediawiki-config] - https://gerrit.wikimedia.org/r/579814 [01:24:58] (PS1) Krinkle: [WIP] Remove "Cache-control: no-cache" hack from CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579815 (https://phabricator.wikimedia.org/T189966) [01:25:00] (PS1) Krinkle: [WIP] Clean up MWMultiVersion check in CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/579816 [01:30:23] (PS1) DannyS712: Preemptively revoke administrators' ability to check if 2FA is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/579817 (https://phabricator.wikimedia.org/T209749) [01:31:25] (PS2) DannyS712: Preemptively revoke administrators' ability to check if 2FA is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/579817 (https://phabricator.wikimedia.org/T209749) [01:50:08] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:04:32] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash7-eqiad instance=kafkamon1001:9501 job=burrow partition={0,1,10,11,2,3,4,5,6,7,8,9} site=eqiad topic={logback-info,rsyslog-err,rsyslog-info,rsyslog-notice,udp_localhost-info} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer [02:04:32] to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [03:30:56] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [03:33:34] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [04:19:46] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:20:10] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [05:03:56] ACKNOWLEDGEMENT - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP CDanis Zayo TTN-0003950338 https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [05:03:56] ACKNOWLEDGEMENT - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: CDanis Zayo TTN-0003950338 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:36:50] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [05:39:00] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [06:42:14] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [06:43:02] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [07:01:48] !log Restart logrotate on db1107 [07:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:56] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [09:42:30] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:28:16] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:31:34] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:33:26] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:34:08] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:08:28] PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: 
/{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:13:32] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:19:16] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:21:48] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:36:58] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/caption/addition/{target} (Caption addition suggestions) is CRITICAL: Test Caption addition suggestions returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:39:32] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:52:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:56:50] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) is CRITICAL: Test Caption translation suggestions returned the unexpected status 503 (expecting: 
200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:59:20] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [12:12:58] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22106 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [13:18:44] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 110.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [13:23:24] PROBLEM - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 105.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [13:27:15] !log restarting blazegraph on wdqs1005 T242453 [13:27:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:26] T242453: wdqs1005 stopped to handle updates - https://phabricator.wikimedia.org/T242453 [13:28:28] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 113.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [13:29:32] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good 
article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [13:29:48] PROBLEM - WDQS high update lag on wdqs1005 is CRITICAL: 5.488e+04 ge 4.32e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [13:30:58] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:32:04] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [13:37:22] PROBLEM - Old JVM GC check - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is CRITICAL: 105.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [13:48:38] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [13:58:46] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [14:02:34] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:05:06] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:22:58] RECOVERY - Old JVM GC check - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [14:24:56] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:27:26] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:44:24] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [14:46:52] RECOVERY - Old JVM 
GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 74.24 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [14:47:16] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [15:15:32] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:20:28] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22099 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:32:52] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 113.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [15:40:06] PROBLEM - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 111.3 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [16:08:02] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 101.7 gt 100 
https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [16:34:04] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:36:36] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:37:58] Operations: decom racktables? - https://phabricator.wikimedia.org/T247646 (faidon) Good question! I'd prefer to keep it until approximately 5 years (our server lifecycle timespan) since the migration, for historical queries and in case one of the data migration steps failed. We migrated to Netbox in Septem... 
[16:43:28] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [16:45:58] RECOVERY - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 66.77 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [17:02:20] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:04:52] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:11:48] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [17:19:00] PROBLEM - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 120 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [17:19:26] PROBLEM - Old JVM GC check - 
cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 105.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [17:39:16] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 116.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [18:00:26] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:23:24] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) is CRITICAL: Test Description translation suggestions returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:26:00] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [18:28:14] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22096 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:00:36] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 73.22 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad 
[19:34:16] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 105.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [19:43:04] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:47:24] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:48:10] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:54:06] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [19:54:06] RECOVERY - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [19:54:52] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22103 bytes in 0.258 second response time 
https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [20:09:46] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [20:11:52] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 70.17 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad [20:16:12] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:18:44] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:34:08] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) is CRITICAL: Test Description addition suggestions returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:36:44] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:40:44] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) is 
CRITICAL: Test Caption translation suggestions returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:43:18] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:12:26] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/caption/translation/from/{source}/to/{target} (Caption translation suggestions) is CRITICAL: Test Caption translation suggestions returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:14:58] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:15:24] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/579772 (https://phabricator.wikimedia.org/T247672) (owner: DannyS712) [21:17:42] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:20:16] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:42:37] (PS1) Reedy: Make update-special-pages handle dblist comments [puppet] - https://gerrit.wikimedia.org/r/579876 (https://phabricator.wikimedia.org/T247716) [21:51:49] Operations, Traffic, Wikimedia-General-or-Unknown: Different age of history logged in and out when from the EU (but not SF office) - https://phabricator.wikimedia.org/T246185 (Der-Wir-Ing) It still occurs, today 
on de-wiki https://de.wikipedia.org/w/index.php?title=Wikipedia:Fragen_zur_Wikipedia&oldi... [21:53:01] /quit/quit [21:53:12] Operations, Traffic, Wikimedia-General-or-Unknown: Different age of history logged in and out when from the EU (but not SF office) - https://phabricator.wikimedia.org/T246185 (Reedy) {T169894} seems like a dupe of this too [22:16:38] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) is CRITICAL: Test article.creation.morelike - good article title returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [22:17:18] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:19:10] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [22:19:48] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:44:43] Operations, Traffic, Wikimedia-General-or-Unknown: Different age of history logged in and out when from the EU (but not SF office) - https://phabricator.wikimedia.org/T246185 (DB111) I can reproduce the problem with Win10 Edge (old, 44) and Chromium Edge (80), maybe specific to this browsers? Debug i... 
[23:18:04] PROBLEM - Disk space on dbprov1001 is CRITICAL: DISK CRITICAL - free space: /srv 452855 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=dbprov1001&var-datasource=eqiad+prometheus/ops [23:20:34] RECOVERY - Disk space on dbprov1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=dbprov1001&var-datasource=eqiad+prometheus/ops [23:20:38] !log removed oldest snapshots on dbprov1001 [23:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log