[00:00:06] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:02:54] <wikibugs>	 (03CR) 10Reedy: [C: 03+1] Use MediaWikiServices::getAuthManager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585910 (owner: 10Umherirrender)
[00:04:54] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is OK: HTTP OK: HTTP/1.0 200 OK - 22359 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[00:07:22] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1077 is OK: HTTP OK: HTTP/1.0 200 OK - 22372 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[00:29:22] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:31:10] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:38:32] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:42:12] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:56:50] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:58:38] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:23:32] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[01:27:42] <icinga-wm>	 PROBLEM - Host restbase1025 is DOWN: PING CRITICAL - Packet loss = 100%
[01:30:14] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:30:30] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:30:30] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:30:30] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:30:30] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:30:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:31:14] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:31:24] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[01:31:24] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:31:24] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:31:24] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:31:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:32:14] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:32:14] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:32:40] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:32:58] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:33:06] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[01:33:10] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:33:14] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:34:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:34:06] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:34:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:36:08] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is OK: HTTP OK: HTTP/1.0 200 OK - 22371 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:34:56] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[03:36:48] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[03:36:54] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[03:40:28] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[03:41:46] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:45:26] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:54:34] <icinga-wm>	 PROBLEM - snapshot of s5 in eqiad on db1115 is CRITICAL: snapshot for s5 at eqiad taken more than 3 days ago: Most recent backup 2020-04-09 03:41:37 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:12:58] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:14:50] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:17:18] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:17:20] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:24:40] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:25:46] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:26:28] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:27:36] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:31:50] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[04:34:02] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[04:44:38] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1085 is OK: HTTP OK: HTTP/1.0 200 OK - 22377 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[04:44:58] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is OK: HTTP OK: HTTP/1.0 200 OK - 22373 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[04:55:10] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:57:00] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:06:29] <AntiComposite>	 Just got Request from - via cp1075.eqiad.wmnet, ATS/8.0.6
[05:06:30] <AntiComposite>	 Error: 502, Next Hop Connection Failed at 2020-04-12 05:05:15 GMT on one page and "upstream connect error or disconnect/reset before headers. reset reason: overflow" on another
[05:06:34] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[05:06:40] * AntiComposite takes this as a sign to go to sleep
[05:06:45] <riley>	 AntiComposite: Me as well
[05:06:56] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:06:56] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:06:56] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1391 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:56] <icinga-wm>	 PROBLEM - Apache HTTP on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:56] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:57] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:58] <riley>	 504 Gateway Time-out on enwiki
[05:06:58] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:58] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:58] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:59] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:06:59] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:00] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:00] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:01] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:01] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:08] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1391 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:08] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1397 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:08] <icinga-wm>	 PROBLEM - Apache HTTP on mw1409 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:08] <icinga-wm>	 PROBLEM - Apache HTTP on mw1389 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:08] <icinga-wm>	 PROBLEM - Apache HTTP on mw1411 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:09] <icinga-wm>	 PROBLEM - Apache HTTP on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:10] <icinga-wm>	 PROBLEM - Apache HTTP on mw1384 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1403 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1413 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:12] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1401 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:14] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:14] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1368 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:14] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:14] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1367 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:17] <icinga-wm>	 PROBLEM - Apache HTTP on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:18] <icinga-wm>	 PROBLEM - Apache HTTP on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:18] <icinga-wm>	 PROBLEM - Apache HTTP on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:19] <icinga-wm>	 PROBLEM - Apache HTTP on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:19] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1407 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:20] <icinga-wm>	 PROBLEM - Apache HTTP on mw1264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:21] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:21] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1409 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:22] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1413 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:22] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1397 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:23] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1370 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:23] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1368 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:24] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1373 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:24] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:25] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1389 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:25] <icinga-wm>	 PROBLEM - Apache HTTP on mw1413 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:26] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:26] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:27] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1364 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:27] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1373 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:28] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1409 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:28] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:29] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:29] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:30] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:30] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8188 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:07:31] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:31] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:32] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:32] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:36] <icinga-wm>	 PROBLEM - Apache HTTP on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:36] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1403 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:36] <icinga-wm>	 PROBLEM - Apache HTTP on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:36] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1384 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:38] <icinga-wm>	 PROBLEM - Apache HTTP on mw1369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:40] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1384 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:40] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:40] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1405 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1399 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1370 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:44] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[05:07:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:44] <icinga-wm>	 PROBLEM - Apache HTTP on mw1407 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:46] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1403 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1387 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:46] <icinga-wm>	 PROBLEM - Apache HTTP on mw1399 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:48] <icinga-wm>	 PROBLEM - Apache HTTP on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:49] <icinga-wm>	 PROBLEM - Apache HTTP on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:50] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1265.eqiad.wmnet, mw1331.eqiad.wmnet, mw1395.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1330.eqiad.wmnet, mw1366.eqiad.wmnet, mw1322.eqiad.wmnet, mw1333.eqiad.wmnet, mw1323.eqiad.wmnet, mw1349.eqiad.wmnet, mw1384.eqiad.wmnet, mw1350.eqiad.wmnet, mw1327.eqiad.wmnet, mw1261.eqiad.
[05:07:50] <icinga-wm>	 ad.wmnet, mw1364.eqiad.wmnet, mw1407.eqiad.wmnet, mw1405.eqiad.wmnet, mw1351.eqiad.wmnet, mw1263.eqiad.wmnet, mw1320.eqiad.wmnet, mw1329.eqiad.wmnet, mw1269.eqiad.wmnet, mw1352.eqiad.wmnet, mw1264.eqiad.wmnet, mw1399.eqiad.wmnet, mw1355.eqiad.wmnet, mw1326.eqiad.wmnet, mw1268.eqiad.wmnet, mw1371.eqiad.wmnet, mw1319.eqiad.wmnet, mw1393.eqiad.wmnet, mw1373.eqiad.wmnet, mw1324.eqiad.wmnet, mw1353.eqiad.wmnet, mw1372.eqiad.wmnet, mw1
[05:07:51] <icinga-wm>	 mw1370.eqiad.wmnet, mw1403.eqiad.wmnet, mw1389.eqiad.wmnet, mw1274.eqiad.wmnet, mw1266.eqiad.wmnet, mw1271.eqiad.wmnet, mw1387.eqiad.wmnet, mw1321.eqiad.wmnet, mw1401.eqiad.wmnet, mw139 https://wikitech.wikimedia.org/wiki/PyBal
[05:07:52] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1411 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:52] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:54] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:07:54] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:55] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:56] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:07:56] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1395 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:57] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:07:58] <icinga-wm>	 PROBLEM - Apache HTTP on mw1391 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:58] <icinga-wm>	 PROBLEM - Apache HTTP on mw1368 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:58] <icinga-wm>	 PROBLEM - Apache HTTP on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:58] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1395 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:07:59] <icinga-wm>	 PROBLEM - Apache HTTP on mw1263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:00] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1265.eqiad.wmnet, mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1330.eqiad.wmnet, mw1322.eqiad.wmnet, mw1355.eqiad.wmnet, mw1323.eqiad.wmnet, mw1384.eqiad.wmnet, mw1327.eqiad.wmnet, mw1351.eqiad.wmnet, mw1413.eqiad.wmnet, mw1364.eqiad.wmnet, mw1354.eqiad.wmnet, mw1272.eqiad.wmnet, mw1263
[05:08:00] <icinga-wm>	 274.eqiad.wmnet, mw1405.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1352.eqiad.wmnet, mw1264.eqiad.wmnet, mw1399.eqiad.wmnet, mw1266.eqiad.wmnet, mw1391.eqiad.wmnet, mw1321.eqiad.wmnet, mw1328.eqiad.wmnet, mw1333.eqiad.wmnet, mw1393.eqiad.wmnet, mw1366.eqiad.wmnet, mw1349.eqiad.wmnet, mw1269.eqiad.wmnet, mw1372.eqiad.wmnet, mw1350.eqiad.wmnet, mw1370.eqiad.wmnet, mw1397.eqiad.wmnet, mw1389.eqiad.wmnet, mw1331.eqiad.wmn
[05:08:00] <icinga-wm>	 wmnet, mw1271.eqiad.wmnet, mw1387.eqiad.wmnet, mw1268.eqiad.wmnet, mw1395.eqiad.wmnet, mw1 https://wikitech.wikimedia.org/wiki/PyBal
[05:08:01] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:01] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:02] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:08:02] <icinga-wm>	 PROBLEM - Apache HTTP on mw1370 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:03] <icinga-wm>	 PROBLEM - Apache HTTP on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:04] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1265.eqiad.wmnet, mw1371.eqiad.wmnet, mw1395.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1270.eqiad.wmnet, mw1331.eqiad.wmnet, mw1355.eqiad.wmnet, mw1349.eqiad.wmnet, mw1384.eqiad.wmnet, mw1272.eqiad.wmnet, mw1387.eqiad.wmnet, mw1364.eqiad.wmnet, mw1407.eqiad.wmnet, mw1
[05:08:04] <icinga-wm>	 mw1263.eqiad.wmnet, mw1405.eqiad.wmnet, mw1329.eqiad.wmnet, mw1269.eqiad.wmnet, mw1271.eqiad.wmnet, mw1264.eqiad.wmnet, mw1266.eqiad.wmnet, mw1391.eqiad.wmnet, mw1321.eqiad.wmnet, mw1333.eqiad.wmnet, mw1393.eqiad.wmnet, mw1366.eqiad.wmnet, mw1324.eqiad.wmnet, mw1350.eqiad.wmnet, mw1389.eqiad.wmnet, mw1320.eqiad.wmnet, mw1319.eqiad.wmnet, mw1352.eqiad.wmnet, mw1268.eqiad.wmnet, mw1401.eqiad.wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.
[05:08:04] <icinga-wm>	 ad.wmnet, mw1409.eqiad.wmnet, mw1385.eqiad.wmnet, mw1369.eqiad.wmnet, mw1413.eqiad.wmnet, mw1353.eqiad.wmnet, mw1273.eqiad.wmnet, mw1262.eqiad.wmnet, mw1411.eqiad.wmnet, mw1330.eqiad.wm https://wikitech.wikimedia.org/wiki/PyBal
[05:08:05] <icinga-wm>	 PROBLEM - Apache HTTP on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:05] <icinga-wm>	 PROBLEM - Apache HTTP on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:06] <icinga-wm>	 PROBLEM - Apache HTTP on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:06] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:07] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:10] <icinga-wm>	 PROBLEM - Apache HTTP on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:10] <icinga-wm>	 PROBLEM - Apache HTTP on mw1353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:10] <icinga-wm>	 PROBLEM - Apache HTTP on mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:10] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1399 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:10] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1411 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:11] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1405 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:11] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:12] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:12] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={jmx_wdqs_updater,swagger_check_citoid_cluster_eqiad,swagger_check_mobileapps_cluster_eqiad,swagger_check_restbase_eqiad} site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:08:13] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1364 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:14] <icinga-wm>	 PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[05:08:14] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.codfw.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:08:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1397 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1395 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1385 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:17] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:17] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:18] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:18] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.codfw.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:08:19] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:19] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:22] <rxy>	 oww...
[05:08:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1373 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:22] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1367 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:23] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:23] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:24] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1387 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:24] <icinga-wm>	 PROBLEM - Apache HTTP on mw1366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:25] <icinga-wm>	 PROBLEM - Apache HTTP on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:25] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1389 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:26] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1393 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:27] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:27] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:28] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:08:28] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:29] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:29] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1385 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:30] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1401 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:30] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:31] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:31] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:32] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:32] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1393 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:33] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1367 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:33] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:34] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:34] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1407 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:35] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1387 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:35] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:36] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:38] <icinga-wm>	 PROBLEM - Apache HTTP on mw1393 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:40] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:40] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:40] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:08:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:42] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.eqsin.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:08:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:42] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:42] <icinga-wm>	 PROBLEM - Apache HTTP on mw1372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:43] <icinga-wm>	 PROBLEM - Apache HTTP on mw1364 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:43] <icinga-wm>	 PROBLEM - Apache HTTP on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:43] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:44] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:08:44] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:45] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:45] <icinga-wm>	 PROBLEM - Apache HTTP on mw1401 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:46] <icinga-wm>	 PROBLEM - ATS TLS has reduced HTTP availability #page on icinga1001 is CRITICAL: cluster=cache_text layer=tls https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[05:08:46] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:08:48] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:48] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:48] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:48] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:49] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:49] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:08:50] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1013 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp1081.eqiad.wmnet, cp1085.eqiad.wmnet, cp1089.eqiad.wmnet, cp1075.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled: textlb_443: Servers cp1079.eqiad.wmnet, cp1083.eqiad.wmnet, cp1085.eqiad.wmnet, cp1087.eqiad.wmnet, cp1089.eqiad.wmnet, cp1075.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled: testlb6_4
[05:08:51] <icinga-wm>	 1.eqiad.wmnet, cp1079.eqiad.wmnet, cp1087.eqiad.wmnet, cp1089.eqiad.wmnet, cp1075.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled: textlb6_443: Servers cp1085.eqiad.wmnet, cp1075.eqiad.wmnet, cp1079.eqiad.wmnet, cp1089.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[05:08:51] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:08:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:08:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:08:53] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:08:54] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[05:09:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:09:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:09:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:09:02] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[05:09:06] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:07] <icinga-wm>	 PROBLEM - wiki content on commons #page on commons.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/1118/
[05:09:08] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[05:09:08] <icinga-wm>	 PROBLEM - Apache HTTP on mw1405 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:09:14] <icinga-wm>	 PROBLEM - PHP7 rendering on mw1385 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:09:14] <rxy>	 moritzm: Ops Clinic Duty 
[05:09:14] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 #page on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:15] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:15] <icinga-wm>	 PROBLEM - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Graphoid
[05:09:16] <icinga-wm>	 PROBLEM - Apache HTTP on mw1329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:09:16] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Pr
[05:09:16] <icinga-wm>	 from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[05:09:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:09:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[05:09:22] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:22] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:22] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:22] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:26] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1265.eqiad.wmnet, mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1355.eqiad.wmnet, mw1323.eqiad.wmnet, mw1384.eqiad.wmnet, mw1327.eqiad.wmnet, mw1351.eqiad.wmnet, mw1413.eqiad.wmnet, mw1364.eqiad.wmnet, mw1354.eqiad.wmnet, mw1272.eqiad.wmnet, mw1263.eqiad.wmnet, mw1274
[05:09:26] <icinga-wm>	 405.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1352.eqiad.wmnet, mw1264.eqiad.wmnet, mw1399.eqiad.wmnet, mw1266.eqiad.wmnet, mw1391.eqiad.wmnet, mw1321.eqiad.wmnet, mw1333.eqiad.wmnet, mw1393.eqiad.wmnet, mw1366.eqiad.wmnet, mw1349.eqiad.wmnet, mw1269.eqiad.wmnet, mw1350.eqiad.wmnet, mw1389.eqiad.wmnet, mw1331.eqiad.wmnet, mw1319.eqiad.wmnet, mw1271.eqiad.wmnet, mw1387.eqiad.wmnet, mw1268.eqiad.wmnet, mw1395.eqiad.wmn
[05:09:26] <icinga-wm>	 wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.wmnet, mw1407.eqiad.wmnet, mw1409.eqiad.wmnet, mw1 https://wikitech.wikimedia.org/wiki/PyBal
[05:09:30] <icinga-wm>	 PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[05:09:31] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.eqsin.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:31] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:31] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:32] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:46] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 #page on text-lb.ulsfo.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:09:46] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:50] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:09:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:09:56] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:09:58] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:10:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:10:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:10:00] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:10:06] <icinga-wm>	 PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:10:08] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:08] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:08] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:15] <icinga-wm>	 PROBLEM - LVS HTTP IPv4 #page on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:10:24] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:10:26] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 10.16 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[05:10:34] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:10:40] <icinga-wm>	 PROBLEM - https://phabricator.wikimedia.org #page on phabricator.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Phabricator
[05:10:50] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:52] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:52] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:54] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:54] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:54] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:56] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:56] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:56] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:56] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:56] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:58] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:58] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:10:58] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:00] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:00] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:12] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:16] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:16] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:16] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:20] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:11:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 80 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:26] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:27] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:27] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:28] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:28] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:29] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:29] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:32] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:42] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:42] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:42] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:52] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:11:54] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:11:58] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:12:01] <_joe_>	 wat
[05:12:02] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:12:02] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:12:04] * volans|off here
[05:12:19] <brennen>	 was just about to ask if anybody's around
[05:12:36] <SQL>	 same
[05:12:37] <brennen>	 seeing reports of outage, enwiki not loading here.
[05:12:38] * shdubsh waves
[05:12:40] <icinga-wm>	 PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[05:12:43] <SQL>	 504 gateway time-out
[05:12:48] <icinga-wm>	 RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[05:12:57] <Waggie>	 Yeah, I'm having issues, too.. West Coast US
[05:13:08] <Waggie>	 "upstream connect error or disconnect/reset before headers. reset reason: overflow"
[05:13:10] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:13:11] <Tks4Fish>	 Ptwiki is off from Brazil
[05:13:14] <icinga-wm>	 PROBLEM - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is CRITICAL: CRITICAL - exception while fetching the URL. 502 Server Error: Next Hop Connection Failed for url: https://en.planet.wikimedia.org/ https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
[05:13:24] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 15 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:13:45] <Tks4Fish>	 Commons too, 502'ing
[05:13:46] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:13:52] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:13:59] <Waggie>	 It was lagging earlier.
[05:14:06] <icinga-wm>	 PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[05:14:11] <rxy>	 eqsin is up, but laggy
[05:14:18] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:14:52] <Tks4Fish>	 https://873gear.com/irc/uploads/7b27e682b7bd2cab/IMG_20200412_021416_158.jpg here's the returning error
[05:15:08] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:15:26] <AntiComposite>	 I'm mostly getting the WMF ATS 502s, but also the plain-text upstream connect error or disconnect/reset before headers. reset reason: overflow
[05:15:54] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/71/
[05:16:36] <Waggie>	 Yeah, just got a 502
[05:16:42] <icinga-wm>	 RECOVERY - wiki content on commons #page on commons.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 172039 bytes in 7.800 second response time https://phabricator.wikimedia.org/project/view/1118/
[05:16:44] <AntiComposite>	 Grafana is also 502 for me, from boston area
[05:17:06] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 9880 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:17:08] <Steinsplitter>	 can confirm
[05:17:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:18:01] <Majavah>	 Phab and wikitech working for me in EU, main wikis all down for me
[05:18:10] <Tks4Fish>	 If grafana should work in mobile, it's completely blank to me in Brazil
[05:18:14] <rxy>	 logstash and icinga is up for me
[05:18:19] <rxy>	 JP , eqsin
[05:18:24] <icinga-wm>	 PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase
[05:18:28] <Tks4Fish>	 No error, just emptiness
[05:18:40] <AntiComposite>	 Following https://wikitech-static.wikimedia.org/wiki/Reporting_a_connectivity_issue, CURLs to everything but eqiad work as expected, eqiad 502s
[05:18:52] <icinga-wm>	 PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:18:56] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 6 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:19:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:21:16] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:22:00] <Vort>	 Hello. Does anyone know why ru.wikipedia.org returns 504 Gateway Time-out ?
[05:22:20] <icinga-wm>	 PROBLEM - wiki content on commons #page on commons.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/1118/
[05:22:22] <Majavah>	 Vort: known issue, ops are working on it
[05:22:26] <Vort>	 Thanks
[05:22:30] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={2,3} site=eqiad topic=udp_localhost-err https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=loggi
[05:22:30] <icinga-wm>	 c=All&var-consumer_group=All
[05:23:24] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:23:28] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:23:54] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:24:30] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 3e+04 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:25:02] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:25:30] <icinga-wm>	 PROBLEM - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is CRITICAL: CRITICAL - exception while fetching the URL. 502 Server Error: Next Hop Connection Failed for url: https://en.planet.wikimedia.org/ https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
[05:26:54] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:27:16] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:28:12] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 4 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:28:34] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:28:36] <wikibugs>	 (03PS1) 10BBlack: Strip certain parameters [puppet] - 10https://gerrit.wikimedia.org/r/588133
[05:29:04] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:29:13] <wikibugs>	 (03CR) 10BBlack: [V: 03+2 C: 03+2] Strip certain parameters [puppet] - 10https://gerrit.wikimedia.org/r/588133 (owner: 10BBlack)
[05:30:30] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:31:02] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:31:02] <icinga-wm>	 PROBLEM - Check if active EventStreams endpoint is delivering messages. on icinga1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[05:31:16] <bblack>	 !log pushing https://gerrit.wikimedia.org/r/588133 to cache_text
[05:31:54] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7146 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:32:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:32:26] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:32:56] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:04] <stashbot>	 bblack: Failed to log message to wiki. Somebody should check the error logs.
[05:33:16] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:20] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:20] <icinga-wm>	 RECOVERY - wiki content on commons #page on commons.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 172039 bytes in 0.014 second response time https://phabricator.wikimedia.org/project/view/1118/
[05:33:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:22] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:33:40] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:33:48] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 5 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:33:50] <icinga-wm>	 RECOVERY - check updates on en.planet.wikimedia.org on en.planet.wikimedia.org is OK: OK - Website content is current (602 = 86400) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
[05:33:50] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:50] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 544 bytes in 0.519 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:52] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 3.532 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:52] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 3.659 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:52] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:52] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:52] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:53] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:53] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:54] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 7.203 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:56] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.323 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:56] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.461 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:58] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:33:58] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:00] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 3.667 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:08] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:34:10] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:12] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:12] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:12] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:12] <rxy>	 \o/
[05:34:16] <icinga-wm>	 RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[05:34:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:34:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:34:18] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[05:34:20] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:34:22] <Tks4Fish>	 Hooray
[05:34:22] <icinga-wm>	 RECOVERY - PHP7 rendering on mw1395 is OK: HTTP OK: HTTP/1.1 200 OK - 75850 bytes in 8.620 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[05:34:22] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1395 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 6.528 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:24] <icinga-wm>	 RECOVERY - Apache HTTP on mw1391 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.873 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:24] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:24] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:24] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:24] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:25] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:25] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:26] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:26] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:27] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:27] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:28] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:28] <icinga-wm>	 RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid
[05:34:29] <icinga-wm>	 RECOVERY - Apache HTTP on mw1368 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 8.401 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:29] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1405 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.796 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:30] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15237 bytes in 6.545 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:34:30] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1411 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.294 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:31] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1354 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 7.023 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:31] <icinga-wm>	 RECOVERY - Apache HTTP on mw1351 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.368 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:32] <icinga-wm>	 RECOVERY - Apache HTTP on mw1370 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 7.529 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:32] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1399 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:33] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1366 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.571 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:33] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1272 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 8.002 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:34] <icinga-wm>	 RECOVERY - LVS HTTP IPv4 #page on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 550 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:34:34] <icinga-wm>	 RECOVERY - Apache HTTP on mw1353 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.757 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:35] <icinga-wm>	 RECOVERY - Apache HTTP on mw1349 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.846 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:35] <icinga-wm>	 RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.612 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:36] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 #page on text-lb.codfw.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15236 bytes in 1.506 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:34:36] <icinga-wm>	 RECOVERY - Apache HTTP on mw1274 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.636 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:37] <icinga-wm>	 RECOVERY - Apache HTTP on mw1395 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:37] <icinga-wm>	 RECOVERY - Apache HTTP on mw1265 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:38] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1364 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 4.211 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:38] <icinga-wm>	 RECOVERY - Apache HTTP on mw1385 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.061 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:39] <icinga-wm>	 RECOVERY - Apache HTTP on mw1397 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.201 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:39] <icinga-wm>	 RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[05:34:40] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:34:40] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1333 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 8.646 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:41] <icinga-wm>	 RECOVERY - Apache HTTP on mw1331 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 8.955 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:41] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1268 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:42] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 #page on text-lb.codfw.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15249 bytes in 0.309 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[05:34:42] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 4.743 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:43] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1332 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 5.087 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:43] <icinga-wm>	 RECOVERY - Apache HTTP on mw1322 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.156 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:44] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1324 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 7.191 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:44] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1319 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.832 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:45] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1327 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.908 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:45] <icinga-wm>	 RECOVERY - Apache HTTP on mw1373 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:46] <icinga-wm>	 RECOVERY - Apache HTTP on mw1371 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:46] <icinga-wm>	 RECOVERY - Apache HTTP on mw1350 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[05:34:58] <rxy>	 R.I.P icinga
[05:35:11] <Alaa|away>	 welcome back
[05:35:25] <Waggie>	 Thank you ops staff.
[05:35:36] <JD|cloud>	 big round of applause
[05:35:38] <TheSandDoctor>	 Yes, thank you ops staff! :)
[05:35:39] <elukey>	 Hi everybody, the SRE team is working on the issue, thanks for reporting
[05:35:43] <Tks4Fish>	 Thanks folks :D
[05:35:46] * TheSandDoctor applauds
[05:35:48] <TheSandDoctor>	 thanks folks
[05:35:50] <TheSandDoctor>	 ttyl
[05:35:59] <rxy>	 what is problem? disk full?
[05:36:03] <_joe_>	 can you confirm everything is working for you now?
[05:36:06] <AntiComposite>	 up here
[05:36:16] <elge>	 It's working now from russia
[05:36:28] <Majavah>	 Working
[05:36:29] <Alaa|away>	 arwiki working (Y)
[05:36:35] <JD|cloud>	 I doubt it's disk space
[05:36:37] <Tks4Fish>	 Up here
[05:36:39] <CountCount>	 dewiki working
[05:36:45] <TheSandDoctor>	 can confirm enwiki and commons in BC @_joe_
[05:37:02] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:37:02] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:37:02] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 544 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:37:03] <rxy>	 server: mw1263.eqiad.wmnet  x-cache: cp5010 miss, cp5008 pass  x-cache-status: pass  
[05:37:04] <icinga-wm>	 RECOVERY - ATS TLS has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1
[05:37:09] <TheSandDoctor>	 de and meta too. thanks again folks
[05:37:14] <rxy>	 ok at JP eqsin
[05:37:24] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[05:37:38] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[05:37:44] <icinga-wm>	 RECOVERY - Varnish HTTP text-frontend - port 80 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish
[05:38:00] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[05:38:25] <Wikipedia>	 enwiki up for me
[05:38:38] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:38:38] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:38:40] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[05:38:44] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[05:38:55] <wikibugs>	 Operations: Slow response times and 504 Gateway tomeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Pruem)
[05:39:02] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:39:06] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1737 bytes in 0.098 second response time https://phabricator.wikimedia.org/project/view/71/
[05:40:22] <wikibugs>	 Operations: Slow response times and 504 Gateway tomeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (RhinosF1) This was resolved a few moments ago, are you still getting the issues?
[05:41:12] <wikibugs>	 Operations: Slow response times and 504 Gateway tomeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (colewhite) p:Triage→Unbreak! a:colewhite
[05:42:04] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[05:42:22] <RhinosF1>	 _joe_: ^ see that task just opened
[05:42:24] <wikibugs>	 Operations: Slow response times and 504 Gateway tomeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Pruem) As of now, it seems to have ceased. I'll keep checking.
[05:42:59] <wikibugs>	 Operations: Slow response times and 504 Gateway tomeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (RhinosF1) Open→Resolved p:Unbreak!→Medium a:colewhite→None This was fixed.
[05:43:07] <wikibugs>	 Operations: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Pruem) Resolved→Open p:Medium→Unbreak! a:colewhite
[05:43:19] <wikibugs>	 Operations: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Koavf) Appears resolved. Can others help document at https://wikitech.wikimedia.org/wiki/Incident_documentation/20200412-eqiad_down
[05:43:29] <riley>	 RhinosF1: It was claimed by a WMF staff member
[05:43:42] <Tks4Fish>	 RhinosF1: Cole is a WMF staff member
[05:43:42] <wikibugs>	 Operations: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Koavf) p:Unbreak!→Low
[05:43:45] <riley>	 Don't close a task claimed by someone else
[05:43:56] <wikibugs>	 Operations: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Pruem) Open→Resolved
[05:44:12] <RhinosF1>	 I didn’t realise, there’s no on wiki account connected to see or information in the phab profile
[05:44:50] <Tks4Fish>	 There is the LDAP account, which is a redirect to his WMF account on SUL wikis
[05:45:12] <_joe_>	 hey everyone, don't edit-war on phabricator :D
[05:45:38] <wikibugs>	 Operations: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Joe) >>! In T250025#6049704, @Koavf wrote: > Appears resolved. Can others help document at https://wikitech.wikimedia.org/wiki/Incident_documentation/20200412-eqiad_down  Please don...
[05:45:55] <rxy>	 but phab haven't good mechanism for avoid edit conflicts..
[05:45:57] <riley>	 Too many hands in the pots :)
[05:46:24] <TheSandDoctor>	 :P
[05:46:27] <TheSandDoctor>	 indeed'
[05:48:09] <wikibugs>	 (PS1) BBlack: Fix regex syntax in prev commit [puppet] - https://gerrit.wikimedia.org/r/588134
[05:48:24] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 30.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[05:49:30] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:50:15] <vgutierrez>	 !log restart ats-tls on cp[1077,1081,1083,1085].eqiad.wmnet- T249335
[05:50:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:22] <stashbot>	 T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:50:26] <wikibugs>	 (PS2) BBlack: Fix regex syntax in prev commit [puppet] - https://gerrit.wikimedia.org/r/588134
[05:52:12] <wikibugs>	 (CR) BBlack: [V: +2 C: +2] Fix regex syntax in prev commit [puppet] - https://gerrit.wikimedia.org/r/588134 (owner: BBlack)
[05:53:09] <bblack>	 !log pushing https://gerrit.wikimedia.org/r/588134 to cache_text
[05:53:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:08] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:56:26] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:57:52] <icinga-wm>	 RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.001274 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[06:01:44] <icinga-wm>	 RECOVERY - Check if active EventStreams endpoint is delivering messages. on icinga1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[06:17:36] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[06:20:59] <elukey>	 !log powercycle restbase1025 (not reachable, serial console shows blank, racadm getsel reports errors with DIMM_B2)
[06:21:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:42] <icinga-wm>	 RECOVERY - Host restbase1025 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[06:26:14] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (elukey)
[06:27:03] <wikibugs>	 (PS1) Ema: Rate limit non-API traffic from public clouds [puppet] - https://gerrit.wikimedia.org/r/588135
[06:27:34] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (elukey) Caught during boot:  ` UEFI0106: One or more memory correctable training errors have occurred on memory slot: B2. Remove input power to the system, reseat the DIMM module and restart th...
[06:27:40] <icinga-wm>	 PROBLEM - Host restbase1025 is DOWN: PING CRITICAL - Packet loss = 100%
[06:27:50] <elukey>	 this is me sorry --^
[06:27:52] <icinga-wm>	 RECOVERY - Host restbase1025 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[06:27:55] <elukey>	 host up now
[06:28:55] <volans>	 thanks elukey 
[06:29:21] <elukey>	 volans: I can't ssh to it, I think we should depool it :(
[06:30:05] <elukey>	 ah snap frozen again, yes it is not usable
[06:30:12] <icinga-wm>	 PROBLEM - Host restbase1025 is DOWN: PING CRITICAL - Packet loss = 100%
[06:30:17] <elukey>	 it just rebooted by itself
[06:30:18] <volans>	 elukey: ack, kill it
[06:30:30] <volans>	 if loops the reboot power it dow
[06:30:31] <volans>	 *down
[06:32:11] <elukey>	 !log powerdown restbase1025 - T250027
[06:32:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:17] <stashbot>	 T250027: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027
[06:32:27] <volans>	 elukey: I'd say let's also depool it officially
[06:34:01] <elukey>	 yep doing it
[06:35:34] <logmsgbot>	 !log elukey@puppetmaster1001 conftool action : set/pooled=no; selector: name=restbase1025.eqiad.wmnet
[06:35:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:06] <volans>	 <3 elukey!
[06:36:49] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (elukey) ` elukey@puppetmaster1001:~$ sudo confctl depool --hostname restbase1025.eqiad.wmnet eqiad/restbase/restbase/restbase1025.eqiad.wmnet: pooled changed yes => no eqiad/restbase/restbase-b...
[06:37:05] <elukey>	 ok all done
[06:59:24] <dcausse>	 !log restarting blazegraph on wdqs1004 (T242453)
[06:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:30] <stashbot>	 T242453: Deadlock in blazegraph blocking all queries and updates - https://phabricator.wikimedia.org/T242453
[07:21:51] <_joe_>	 elukey: i think something will be needed for cassandra too, but i'm inclined to leave it for later
[07:58:20] <wikibugs>	 Operations, Cloud-Services, Traffic, Wikimedia-Incident: Requests to production are sometimes timing out or giving empty response - https://phabricator.wikimedia.org/T249035 (Adithyak1997) I don't know whether its related. Yesterday, some of the users including me have faced problems logging into...
[08:52:25] <wikibugs>	 Operations, Cloud-Services, Traffic, Wikimedia-Incident: Requests to production are sometimes timing out or giving empty response - https://phabricator.wikimedia.org/T249035 (Lirazelf) Hi there, was pointed in this direction by the folks at wikidata:project chat - I'm also experiencing issues wit...
[09:28:42] <wikibugs>	 Operations, Mail, Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (Aklapper)
[09:57:28] <wikibugs>	 Operations, ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (elukey) @Eevans adding yourself to this task as FYI :)
[10:18:11] <elukey>	 !log restart wdqs-updater on wdqs1004 (logs show no reports from the past hours, last one were stack traces related to a json decode failure)
[10:18:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:00] <wikibugs>	 (CR) Nikerabbit: "I did the former: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/587251 (which also updated the misleading comment)." [mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: Nikerabbit)
[10:41:40] <wikibugs>	 (Abandoned) Nikerabbit: Restore Beta Cluster logging [mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: Nikerabbit)
[11:01:28] <wikibugs>	 Operations, Wikimedia-Incident: Slow response times and 504 Gateway timeouts accross all wiki projects - https://phabricator.wikimedia.org/T250025 (Peachey88)
[11:11:34] <vgutierrez>	 !log restart ats-tls on cp5008.eqsin.wmnet - T249335
[11:11:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:41] <stashbot>	 T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[11:14:42] <icinga-wm>	 PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:41:44] <icinga-wm>	 RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:47:38] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:54:46] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is OK: HTTP OK: HTTP/1.0 200 OK - 22380 bytes in 2.933 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:32:15] <wikibugs>	 (PS6) Zoranzoki21: robots.txt: Disable indexing user (sub)pages and draft-related pages on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/584615 (https://phabricator.wikimedia.org/T248860)
[14:36:06] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for wikiwomencamp-bounces mainling list - https://phabricator.wikimedia.org/T250035 (AnnaTorres)
[14:40:40] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for wikiwomencamp-bounces mainling list - https://phabricator.wikimedia.org/T250035 (AnnaTorres)
[14:46:02] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for wikiwomencamp-bounces mainling list - https://phabricator.wikimedia.org/T250035 (Reedy) a:AnnaTorres→None
[15:03:01] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for admin of wikiwomencamp mailing list - https://phabricator.wikimedia.org/T250035 (Aklapper)
[15:04:03] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Password reset for admin of wikiwomencamp mailing list - https://phabricator.wikimedia.org/T250035 (Aklapper) @AnnaTorres: Once you have access again, you probably want to remove kherold from the second field "The list administrator email addresses" on https://lists.w...
[15:57:37] <wikibugs>	 Operations, Keyholder, Release-Engineering-Team-TODO, Patch-For-Review, Release-Engineering-Team (Deployment services): Keyholder phab repo duplicate work - https://phabricator.wikimedia.org/T203003 (hashar) I forgot to check for open changes, thank you for the notification. I guess it is tim...
[16:10:06] <wikibugs>	 Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Rebuild helm/helm-diff for buster-wikimedia - https://phabricator.wikimedia.org/T249812 (hashar)
[18:57:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:57:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:57:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:58:10] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:58:20] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[18:58:22] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[18:58:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:58:46] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:59:34] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[18:59:54] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:00:34] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:01:56] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[19:03:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:03:38] <wikibugs>	 (03PS1) 10Andrew Bogott: designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[19:03:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:05:02] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:05:26] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:07:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:07:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[19:09:16] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:13:40] <wikibugs>	 (03PS2) 10Andrew Bogott: designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[19:17:36] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[19:17:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:18:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:18:24] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:21:04] <wikibugs>	 (03PS3) 10Andrew Bogott: designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[19:21:32] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:23:18] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:23:50] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:25:07] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[19:28:12] <wikibugs>	 (03PS4) 10Andrew Bogott: designate: remove second_region_* hiera values [puppet] - 10https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[19:29:16] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:31:10] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:32:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:34:50] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:38:28] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:41:48] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:41:50] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:41:50] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:42:06] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:42:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:43:32] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:43:54] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[19:44:10] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:45:28] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:45:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:45:42] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:45:44] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[19:50:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:51:02] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:52:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[19:52:52] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:58:34] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:02:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:03:12] <wikibugs>	 (03PS1) 10Andrew Bogott: (WIP) Designate: use a list of designate hosts in hiera [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[20:03:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:04:04] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:05:54] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:07:06] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] (WIP) Designate: use a list of designate hosts in hiera [puppet] - 10https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott)
[20:09:38] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[20:09:42] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:10:08] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:10:22] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:11:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:11:18] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:11:20] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:11:28] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:11:52] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:12:08] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:06] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:13:18] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:13:26] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:14:54] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:14:56] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:15:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:18:46] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:18:54] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:22:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:24:18] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:27:10] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10Framawiki) >>! In T249535#6048762, @Nattes wrote: > Hi I dont know if this is the riight pace to ask. So...
[20:30:28] <wikibugs>	 10Operations, 10Growth-Team, 10StructuredDiscussions: Flow failing with Error contacting the Parsoid/RESTBase server (HTTP 400) - https://phabricator.wikimedia.org/T249997 (10Framawiki)
[20:32:00] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:33:24] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[20:33:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:34:52] <wikibugs>	 10Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration, 10Wikimedia-Site-requests: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648 (10Framawiki) The task title is pretty clear now. It's about redirecting everything non-names...
[20:35:04] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:10] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m
[20:35:24] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:40] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:35:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:36:19] <wikibugs>	 10Operations, 10DNS, 10Traffic, 10Wikimedia-Apache-configuration, 10Wikimedia-Site-requests: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648 (10RhinosF1) >>! In T249648#6050573, @Framawiki wrote: > The task title is pretty clear now....
[20:36:52] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:36:58] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:37:14] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:38:40] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:38:48] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET
[20:42:32] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:46:10] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[20:47:46] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:47:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:47:56] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:48:28] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:48:40] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:49:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:50:16] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:51:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:51:30] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:52:12] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:53:28] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:55:33] <wikibugs>	 10Operations, 10MediaWiki-Cache, 10Traffic, 10Performance-Team (Radar): Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (10Krinkle)
[20:57:06] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:58:02] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:58:56] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:59:46] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:59:48] <wikibugs>	 Operations, MediaWiki-Cache, Traffic, Performance-Team (Radar): Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (Krinkle)
[21:00:51] <wikibugs>	 Operations, Core Platform Team, MediaWiki-Cache, Traffic, Performance-Team (Radar): Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (Krinkle) As part of my focus on stability/sustainability, I'd like to try taking this on as part of the Perf Team....
[21:01:02] <wikibugs>	 Operations, Core Platform Team, MediaWiki-Cache, Traffic, Performance-Team (Radar): Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835 (Krinkle) a:Krinkle
[21:01:57] <wikibugs>	 (PS2) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:02:32] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:05:04] <icinga-wm>	 PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[21:06:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[21:06:48] <icinga-wm>	 RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[21:08:00] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:09:41] <wikibugs>	 (PS3) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:11:40] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:17:10] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[21:26:05] <wikibugs>	 (PS5) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[21:26:07] <wikibugs>	 (PS4) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:26:09] <wikibugs>	 (PS1) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588175
[21:28:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588175 (owner: Andrew Bogott)
[21:30:05] <wikibugs>	 (PS2) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588175
[21:30:07] <wikibugs>	 (PS6) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[21:30:09] <wikibugs>	 (PS5) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:30:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588175 (owner: Andrew Bogott)
[21:31:53] <wikibugs>	 Operations, Growth-Team, StructuredDiscussions, VisualEditor: Flow failing with Error contacting the Parsoid/RESTBase server (HTTP 400) - https://phabricator.wikimedia.org/T249997 (Framawiki) I've received an email via OTRS regarding this error on frwiki from an editor that was using #visualedito...
[21:33:06] <wikibugs>	 (PS7) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[21:33:08] <wikibugs>	 (PS6) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:33:10] <wikibugs>	 (PS1) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588176
[21:35:58] <wikibugs>	 (Abandoned) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588175 (owner: Andrew Bogott)
[21:44:53] <wikibugs>	 (PS2) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588176
[21:44:55] <wikibugs>	 (PS8) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[21:44:57] <wikibugs>	 (PS7) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:45:28] <wikibugs>	 Operations, DNS, Traffic, Wikimedia-Apache-configuration, Wikimedia-Site-requests: redirect sco.wiktionary.org/wiki/(.*?) -> sco.wikipedia.org/wiki/Define:$1 - https://phabricator.wikimedia.org/T249648 (Bugreporter) Note sco.wiktionary.org/wiki/ and sco.wiktionary.org should be redirected a v...
[21:49:29] <wikibugs>	 (CR) jerkins-bot: [V: -1] designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588176 (owner: Andrew Bogott)
[21:51:04] <wikibugs>	 (PS3) Andrew Bogott: designate: change api_base_uri to proper HA endpoint [puppet] - https://gerrit.wikimedia.org/r/588176
[21:51:06] <wikibugs>	 (PS9) Andrew Bogott: designate: remove second_region_* hiera values [puppet] - https://gerrit.wikimedia.org/r/588163 (https://phabricator.wikimedia.org/T249941)
[21:51:08] <wikibugs>	 (PS8) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[21:55:11] <wikibugs>	 (CR) Andrew Bogott: "pcc run:  https://puppet-compiler.wmflabs.org/compiler1002/21856/cloudservices1003.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/588176 (owner: Andrew Bogott)
[22:03:52] <wikibugs>	 (PS9) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[22:37:10] <wikibugs>	 (PS10) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[22:47:50] <wikibugs>	 (PS11) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[22:50:27] <wikibugs>	 Operations, MediaWiki-General, Traffic: Requests with utf-8 in the URL return a outdated page revision - https://phabricator.wikimedia.org/T23027 (Krinkle)
[22:51:42] <wikibugs>	 (PS12) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[22:53:37] <wikibugs>	 (PS13) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[22:56:48] <wikibugs>	 (PS14) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:01:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[23:06:33] <wikibugs>	 (PS15) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:11:48] <wikibugs>	 (PS16) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:14:23] <wikibugs>	 Operations, Performance-Team, Traffic, Wikimedia-Incident: General GET/POST limiting in MediaWiki - https://phabricator.wikimedia.org/T115088 (Krinkle) Open→Resolved a:Krinkle >>! In T20489#6050810, @Krinkle wrote: > […]  For the concern of general load and concurrency (not individual...
[23:16:52] <wikibugs>	 (PS17) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:22:28] <wikibugs>	 (PS18) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:26:04] <wikibugs>	 (PS19) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:32:29] <wikibugs>	 (PS20) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:37:11] <wikibugs>	 (PS21) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:47:09] <wikibugs>	 (PS22) Andrew Bogott: Designate: use a list of designate hosts in hiera [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941)
[23:53:00] <wikibugs>	 (CR) Andrew Bogott: "sample pcc run:  https://puppet-compiler.wmflabs.org/compiler1003/21872/" [puppet] - https://gerrit.wikimedia.org/r/588169 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)