[02:28:53] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241534 (Peachey88) [03:43:42] (CR) DannyS712: [C: +1] Use editautopatrolprotected right for pages protected for autopatrollers [mediawiki-config] - https://gerrit.wikimedia.org/r/529043 (https://phabricator.wikimedia.org/T230103) (owner: Urbanecm) [05:39:59] (CR) Andrew Bogott: [C: +2] extdist: Drop pre-stretch support [puppet] - https://gerrit.wikimedia.org/r/560957 (owner: Legoktm) [06:07:37] (PS1) Andrew Bogott: nova: remove hotpatch of nova/api/manager.py [puppet] - https://gerrit.wikimedia.org/r/561117 (https://phabricator.wikimedia.org/T198950) [06:08:30] (CR) Andrew Bogott: [C: +2] nova: remove hotpatch of nova/api/manager.py [puppet] - https://gerrit.wikimedia.org/r/561117 (https://phabricator.wikimedia.org/T198950) (owner: Andrew Bogott) [06:18:10] (CR) Andrew Bogott: [V: +2 C: +2] Add Cloud VPS global root key for Alex Monk [labs/private] - https://gerrit.wikimedia.org/r/560868 (owner: Alex Monk) [06:31:23] (PS1) Andrew Bogott: nfs-mounts: expose the dumps mount in the gratitude project [puppet] - https://gerrit.wikimedia.org/r/561119 (https://phabricator.wikimedia.org/T240737) [06:32:28] (CR) Andrew Bogott: [C: +2] nfs-mounts: expose the dumps mount in the gratitude project [puppet] - https://gerrit.wikimedia.org/r/561119 (https://phabricator.wikimedia.org/T240737) (owner: Andrew Bogott) [07:54:05] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 3.352e+04 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:54:53] PROBLEM - PHP7 rendering on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:54:57] PROBLEM - Nginx local proxy 
to apache on mw1263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:01] PROBLEM - Apache HTTP on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:01] PROBLEM - Apache HTTP on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:01] PROBLEM - Apache HTTP on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:01] PROBLEM - Apache HTTP on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:01] PROBLEM - Apache HTTP on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:03] PROBLEM - Nginx local proxy to apache on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:03] PROBLEM - PHP7 rendering on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:03] PROBLEM - Apache HTTP on mw1248 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:03] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method= [07:55:05] PROBLEM - Nginx local proxy to apache on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers [07:55:07] PROBLEM - Nginx local proxy to apache on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:07] PROBLEM - Nginx local proxy to apache on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:07] PROBLEM - Nginx local proxy to apache on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:07] PROBLEM - PHP7 rendering on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:09] PROBLEM - PHP7 rendering on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:11] PROBLEM - Nginx local proxy to apache on mw1246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:11] PROBLEM - Apache HTTP on mw1258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:11] PROBLEM - PHP7 rendering on mw1247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:11] PROBLEM - PHP7 rendering on mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:15] PROBLEM - PHP7 rendering on mw1248 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:15] PROBLEM - PHP7 rendering on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering 
[07:55:15] PROBLEM - Apache HTTP on mw1239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:17] PROBLEM - Apache HTTP on mw1264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:17] PROBLEM - Apache HTTP on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:19] PROBLEM - PHP7 rendering on mw1273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:19] PROBLEM - Apache HTTP on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:19] PROBLEM - Apache HTTP on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:21] PROBLEM - Apache HTTP on mw1247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:21] PROBLEM - Apache HTTP on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:21] PROBLEM - PHP7 rendering on mw1253 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:21] PROBLEM - Apache HTTP on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:21] PROBLEM - Apache HTTP on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:21] PROBLEM - PHP7 rendering on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:21] PROBLEM - PHP7 rendering on mw1261 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:23] PROBLEM - PHP7 rendering on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:23] PROBLEM - PHP7 rendering on mw1250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:23] PROBLEM - Apache HTTP on mw1329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:23] PROBLEM - Nginx local proxy to apache on mw1239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:24] PROBLEM - Nginx local proxy to apache on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:25] PROBLEM - Apache HTTP on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:25] PROBLEM - Apache HTTP on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:25] PROBLEM - Nginx local proxy to apache on mw1248 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:26] PROBLEM - Apache HTTP on mw1240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:26] PROBLEM - Apache HTTP on mw1244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:27] PROBLEM - Nginx local proxy to apache on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:27] PROBLEM - Nginx local proxy 
to apache on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:28] PROBLEM - Nginx local proxy to apache on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:28] PROBLEM - Nginx local proxy to apache on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:29] PROBLEM - Nginx local proxy to apache on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:29] PROBLEM - Nginx local proxy to apache on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:30] PROBLEM - PHP7 rendering on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:30] PROBLEM - PHP7 rendering on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:31] PROBLEM - Nginx local proxy to apache on mw1241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:31] PROBLEM - PHP7 rendering on mw1320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:32] PROBLEM - PHP7 rendering on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:32] PROBLEM - PHP7 rendering on mw1239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:33] PROBLEM - PHP7 rendering on mw1270 is CRITICAL: CRITICAL - Socket timeout after 10 
seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:35] PROBLEM - PHP7 rendering on mw1332 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:35] PROBLEM - Nginx local proxy to apache on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:37] PROBLEM - Apache HTTP on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:41] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro [07:55:41] PROBLEM - Apache HTTP on mw1253 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:41] PROBLEM - Apache HTTP on mw1254 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:41] PROBLEM - Nginx local proxy to apache on mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:41] PROBLEM - PHP7 rendering on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:41] PROBLEM - Apache HTTP on mw1245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:43] PROBLEM - Nginx local proxy to apache on mw1247 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers [07:55:43] PROBLEM - Apache HTTP on mw1238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:43] PROBLEM - Apache HTTP on mw1324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:43] PROBLEM - PHP7 rendering on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:45] PROBLEM - PHP7 rendering on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:45] PROBLEM - Apache HTTP on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:45] PROBLEM - Nginx local proxy to apache on mw1330 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:45] PROBLEM - Nginx local proxy to apache on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:45] PROBLEM - Nginx local proxy to apache on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:47] PROBLEM - PHP7 rendering on mw1249 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:47] PROBLEM - Apache HTTP on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:47] PROBLEM - Nginx local proxy to apache on mw1253 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:49] PROBLEM - Nginx local proxy to apache on mw1267 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:49] PROBLEM - Nginx local proxy to apache on mw1244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:53] PROBLEM - PHP7 rendering on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:53] PROBLEM - Apache HTTP on mw1319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:53] PROBLEM - PHP7 rendering on mw1240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:53] PROBLEM - PHP7 rendering on mw1238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:55] PROBLEM - PHP7 rendering on mw1242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:55] PROBLEM - Nginx local proxy to apache on mw1254 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:57] PROBLEM - Nginx local proxy to apache on mw1272 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:57] PROBLEM - Nginx local proxy to apache on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:59] PROBLEM - Apache HTTP on mw1271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:59] PROBLEM - Apache HTTP on mw1265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:59] PROBLEM - Apache HTTP on 
mw1331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:55:59] PROBLEM - PHP7 rendering on mw1246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:55:59] PROBLEM - PHP7 rendering on mw1329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:01] PROBLEM - Apache HTTP on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:01] PROBLEM - Apache HTTP on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:01] PROBLEM - Apache HTTP on mw1326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:01] PROBLEM - Apache HTTP on mw1321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:02] PROBLEM - Apache HTTP on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:03] PROBLEM - Nginx local proxy to apache on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:05] PROBLEM - Nginx local proxy to apache on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:05] PROBLEM - PHP7 rendering on mw1327 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:05] PROBLEM - PHP7 rendering on mw1262 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:10] PROBLEM - LVS HTTPS 
IPv4 #page on text-lb.eqsin.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:56:11] PROBLEM - Nginx local proxy to apache on mw1251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:11] PROBLEM - PHP7 rendering on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:11] PROBLEM - PHP7 rendering on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:13] PROBLEM - Nginx local proxy to apache on mw1265 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:13] PROBLEM - Nginx local proxy to apache on mw1264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:13] PROBLEM - PHP7 rendering on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:17] PROBLEM - LVS HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:56:17] PROBLEM - Apache HTTP on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:17] PROBLEM - Apache HTTP on mw1251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:17] PROBLEM - Nginx local proxy to apache on mw1243 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:19] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1083 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:56:19] PROBLEM - Nginx local proxy to apache on mw1323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:19] PROBLEM - Nginx local proxy to apache on mw1242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:19] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:56:23] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro [07:56:23] PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m [07:56:27] PROBLEM - LVS HTTPS IPv6 #page on text-lb.codfw.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:56:27] PROBLEM - Nginx local proxy to apache on mw1238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers 
[07:56:27] PROBLEM - Apache HTTP on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:27] PROBLEM - Nginx local proxy to apache on mw1240 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:30] PROBLEM - LVS HTTPS IPv6 #page on text-lb.eqsin.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:56:30] PROBLEM - Apache HTTP on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:31] PROBLEM - Apache HTTP on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:31] PROBLEM - PHP7 rendering on mw1269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:31] PROBLEM - Apache HTTP on mw1263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:31] PROBLEM - Apache HTTP on mw1246 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:31] PROBLEM - Apache HTTP on mw1242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:33] PROBLEM - Nginx local proxy to apache on mw1258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:33] PROBLEM - Nginx local proxy to apache on mw1255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:33] PROBLEM - PHP7 rendering on mw1258 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:37] PROBLEM - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:56:39] PROBLEM - Nginx local proxy to apache on mw1245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:39] PROBLEM - Nginx local proxy to apache on mw1252 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:39] PROBLEM - PHP7 rendering on mw1255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:41] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:41] PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:41] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:41] PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:41] PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: 
/en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:42] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:42] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:43] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:43] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:44] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro [07:56:44] PROBLEM - PHP7 rendering on mw1251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:45] PROBLEM - Nginx local proxy to apache on mw1249 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:49] PROBLEM - Nginx local proxy to apache on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:49] PROBLEM - Nginx local proxy to apache on mw1250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:49] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:56:51] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1275.eqiad.wmnet, mw1265.eqiad.wmnet, mw1331.eqiad.wmnet, mw1242.eqiad.wmnet, mw1240.eqiad.wmnet, mw1253.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1238.eqiad.wmnet, mw1319.eqiad.wmnet, mw1323.eqiad.wmnet, mw1249.eqiad.wmnet, mw1327.eqiad.wmnet, mw1261.eqiad.wmnet, mw1243.eqiad.wmnet, mw1245.eqiad.wmnet, mw1 [07:56:51] mw1258.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1271.eqiad.wmnet, mw1264.eqiad.wmnet, mw1250.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1256.eqiad.wmnet, mw1333.eqiad.wmnet, mw1241.eqiad.wmnet, mw1324.eqiad.wmnet, mw1255.eqiad.wmnet, mw1257.eqiad.wmnet, mw1244.eqiad.wmnet, mw1274.eqiad.wmnet, mw1268.eqiad.wmnet, mw1325.eqiad.wmnet, mw1254.eqiad.wmnet, mw1248.eqiad.wmnet, mw1252.eqiad.wmnet, mw1328.eqiad. 
[07:56:51] ad.wmnet, mw1330.eqiad.wmnet, mw1247.eqiad.wmnet, mw1239.eqiad.wmnet are marked down but pooled: appservers-https_443: Servers mw1265.eqiad.wmnet, mw1256.eqiad.wmnet, mw1242.eqiad.wmnet https://wikitech.wikimedia.org/wiki/PyBal [07:56:51] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - appservers-https_443: Servers mw1275.eqiad.wmnet, mw1265.eqiad.wmnet, mw1256.eqiad.wmnet, mw1242.eqiad.wmnet, mw1240.eqiad.wmnet, mw1246.eqiad.wmnet, mw1253.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1331.eqiad.wmnet, mw1333.eqiad.wmnet, mw1323.eqiad.wmnet, mw1249.eqiad.wmnet, mw1327.eqiad.wmnet, mw1261.eqiad.wmnet, mw1243.eqiad. [07:56:51] ad.wmnet, mw1272.eqiad.wmnet, mw1263.eqiad.wmnet, mw1258.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1271.eqiad.wmnet, mw1264.eqiad.wmnet, mw1250.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1268.eqiad.wmnet, mw1319.eqiad.wmnet, mw1241.eqiad.wmnet, mw1324.eqiad.wmnet, mw1255.eqiad.wmnet, mw1257.eqiad.wmnet, mw1251.eqiad.wmnet, mw1244.eqiad.wmnet, mw1274.eqiad.wmnet, mw1321.eqiad.wmnet, mw1269.eqiad.wmnet, mw1 [07:56:51] mw1254.eqiad.wmnet, mw1248.eqiad.wmnet, mw1238.eqiad.wmnet, mw1252.eqiad.wmnet, mw1328.eqiad.wmnet, mw1270.eqiad.wmnet, mw1239.eqiad.wmnet, mw1273.eqiad.wmnet, mw1262.eqiad.wmnet, mw133 https://wikitech.wikimedia.org/wiki/PyBal [07:56:51] PROBLEM - PHP7 rendering on mw1268 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:52] PROBLEM - PHP7 rendering on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:52] PROBLEM - PHP7 rendering on mw1252 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:53] PROBLEM - PHP7 rendering on mw1265 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:53] PROBLEM - PHP7 rendering on mw1263 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:54] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase [07:56:55] PROBLEM - Apache HTTP on mw1250 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:55] PROBLEM - graphoid endpoints health on scb1001 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [07:56:55] PROBLEM - Nginx local proxy to apache on mw1328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:56] PROBLEM - High average POST latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST [07:56:57] PROBLEM - PHP7 rendering on mw1254 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:57] PROBLEM - PHP7 rendering on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering 
[07:56:57] PROBLEM - PHP7 rendering on mw1245 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:56:58] PROBLEM - Apache HTTP on mw1255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:59] PROBLEM - Nginx local proxy to apache on mw1266 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:56:59] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:56:59] PROBLEM - Apache HTTP on mw1252 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:57:01] PROBLEM - Apache HTTP on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:57:01] PROBLEM - PHP7 rendering on mw1244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:57:01] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [07:57:03] PROBLEM - PHP7 rendering on mw1264 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:57:03] PROBLEM - PHP7 rendering on mw1333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:57:05] PROBLEM - Apache HTTP on mw1249 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:57:05] PROBLEM - Nginx local proxy to apache on mw1329 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:57:08] PROBLEM - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:57:10] PROBLEM - LVS HTTPS IPv4 #page on text-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:57:10] PROBLEM - graphoid endpoints health on scb1003 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [07:57:12] PROBLEM - LVS HTTPS IPv4 #page on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:57:14] PROBLEM - LVS HTTPS IPv6 #page on text-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:57:14] PROBLEM - Nginx local proxy to apache on mw1275 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [07:57:15] PROBLEM - LVS HTTPS IPv4 #page on text-lb.codfw.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:57:16] PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:57:16] PROBLEM - Varnish HTTP text-frontend - port 80 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:57:16] PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1081 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:57:23] neo- [07:57:24] PROBLEM - graphoid endpoints health on scb1004 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [07:57:26] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [07:57:26] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [07:57:28] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro [07:57:44] PROBLEM - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Graphoid [07:57:44] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:57:48] PROBLEM - LVS HTTPS IPv6 #page on text-lb.ulsfo.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems 
[07:57:48] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:57:58] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [07:58:00] PROBLEM - graphoid endpoints health on scb1002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [07:58:02] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:02] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:05] PROBLEM - LVS HTTP IPv4 #page on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [07:58:05] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8724 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:58:12] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:22] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [07:58:22] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is 
CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [07:58:23] Unable to connect to wikis - has this been reported yet? [07:58:24] Request from - via cp1085.eqiad.wmnet, ATS/8.0.5 [07:58:28] PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:58:30] PROBLEM - wiki content on commons #page on commons.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/1118/ [07:58:32] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:32] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:36] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:36] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:36] PROBLEM - Varnish HTTP text-frontend - port 3127 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:38] PROBLEM - PyBal backends health check on lvs1013 is CRITICAL: PYBAL CRITICAL - CRITICAL - testlb_443: Servers cp1081.eqiad.wmnet, cp1083.eqiad.wmnet, cp1085.eqiad.wmnet, cp1087.eqiad.wmnet, cp1075.eqiad.wmnet, cp1079.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled: textlb_443: Servers cp1081.eqiad.wmnet, cp1083.eqiad.wmnet, cp1085.eqiad.wmnet, cp1087.eqiad.wmnet, cp1075.eqiad.wmnet, cp1079.eqiad.wmnet, 
cp1089.eqiad.wmn [07:58:38] wmnet are marked down but pooled: testlb6_443: Servers cp1081.eqiad.wmnet, cp1083.eqiad.wmnet, cp1085.eqiad.wmnet, cp1079.eqiad.wmnet, cp1089.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled: textlb6_443: Servers cp1083.eqiad.wmnet, cp1085.eqiad.wmnet, cp1087.eqiad.wmnet, cp1075.eqiad.wmnet, cp1079.eqiad.wmnet, cp1089.eqiad.wmnet, cp1077.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [07:58:39] PROBLEM - https://phabricator.wikimedia.org #page on phabricator.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Phabricator [07:58:44] PROBLEM - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase [07:58:48] PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:58:48] PROBLEM - restbase endpoints health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:58:48] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:58:48] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) timed out before a 
response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [07:58:51] DannyS712: apergos is aware I believe [07:58:58] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:58] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:58] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:58] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:58] PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:59] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:58:59] I'm here [07:59:00] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:06] Okay, phab isn't loading so I wasn't sure [07:59:07] PROBLEM - ATS TLS has reduced HTTP availability #page on icinga1001 is CRITICAL: cluster=cache_text layer=tls https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1 [07:59:10] PROBLEM - Varnish HTTP text-frontend - port 3127 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:20] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Varnish [07:59:20] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:24] Not just cluster/sul - wikitech too [07:59:27] there are a couple other folks around too [07:59:28] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:28] PROBLEM - Varnish HTTP text-frontend - port 3127 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:28] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:28] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:28] PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:29] PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:36] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:36] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:38] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:46] PROBLEM - Varnish HTTP text-frontend - port 3127 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Varnish [07:59:46] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:49] phab and wikis do work for me [07:59:52] RECOVERY - Apache HTTP on mw1325 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.325 second response time https://wikitech.wikimedia.org/wiki/Application_servers [07:59:54] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:54] PROBLEM - Varnish HTTP text-frontend - port 3121 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:54] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [07:59:59] interesting [08:00:14] PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1265.eqiad.wmnet, mw1256.eqiad.wmnet, mw1242.eqiad.wmnet, mw1240.eqiad.wmnet, mw1253.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1331.eqiad.wmnet, mw1319.eqiad.wmnet, mw1323.eqiad.wmnet, mw1249.eqiad.wmnet, mw1327.eqiad.wmnet, mw1328.eqiad.wmnet, mw1243.eqiad.wmnet, mw1245.eqiad.wmnet, mw1272.eqiad.wmnet, mw1263 [08:00:14] 258.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1271.eqiad.wmnet, mw1264.eqiad.wmnet, mw1250.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1321.eqiad.wmnet, mw1333.eqiad.wmnet, mw1241.eqiad.wmnet, mw1324.eqiad.wmnet, mw1255.eqiad.wmnet, mw1257.eqiad.wmnet, mw1239.eqiad.wmnet, mw1244.eqiad.wmnet, mw1238.eqiad.wmnet, mw1268.eqiad.wmnet, mw1269.eqiad.wmnet, mw1325.eqiad.wmnet, mw1274.eqiad.wmnet, mw1254.eqiad.wmn [08:00:14] wmnet, mw1252.eqiad.wmnet, mw1261.eqiad.wmnet, mw1270.eqiad.wmnet, mw1330.eqiad.wmnet, mw1 
https://wikitech.wikimedia.org/wiki/PyBal [08:00:20] PROBLEM - Varnish HTTP text-frontend - port 3125 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:00:20] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:00:24] <_joe_> hey here I am [08:00:26] RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET [08:00:36] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:00:44] PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([mw1265.eqiad.wmnet, mw1256.eqiad.wmnet, mw1242.eqiad.wmnet, mw1240.eqiad.wmnet, mw1246.eqiad.wmnet, mw1253.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1331.eqiad.wmnet, mw1319.eqiad.wmnet, mw1323.eqiad.wmnet, mw1249.eqiad.wmnet, mw1327.eqiad.wmnet, mw1328.eqiad.wmnet, mw1243.eqiad.wmnet, mw1245.eqiad.wmnet, mw1272 [08:00:44] 263.eqiad.wmnet, mw1258.eqiad.wmnet, mw1329.eqiad.wmnet, mw1320.eqiad.wmnet, mw1271.eqiad.wmnet, mw1264.eqiad.wmnet, mw1250.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1321.eqiad.wmnet, mw1333.eqiad.wmnet, mw1241.eqiad.wmnet, mw1324.eqiad.wmnet, mw1255.eqiad.wmnet, mw1257.eqiad.wmnet, mw1251.eqiad.wmnet, mw1239.eqiad.wmnet, mw1244.eqiad.wmnet, mw1238.eqiad.wmnet, mw1268.eqiad.wmnet, mw1269.eqiad.wmnet, mw1325.eqiad.wmn [08:00:44] wmnet, mw1254.eqiad.wmnet, mw1248.eqiad.wmnet, mw1252.eqiad.wmnet, mw1261.eqiad.wmnet, mw1 
https://wikitech.wikimedia.org/wiki/PyBal [08:00:58] RECOVERY - Varnish HTTP text-frontend - port 80 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:01:00] PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:10] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:10] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:10] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:22] PROBLEM - Varnish HTTP text-frontend - port 3120 on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:22] PROBLEM - Varnish HTTP text-frontend - port 3123 on cp1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:24] PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:01:44] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ReadTimeoutError(HTTPSConnectionPool(host=text-lb.eqiad.wikimedia.org, port=443): Read timed out. 
(read timeout=15),): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase
[08:01:58] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[08:02:18] PROBLEM - Varnish HTTP text-frontend - port 3122 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[08:02:32] PROBLEM - Check if active EventStreams endpoint is delivering messages. on icinga1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[08:03:04] PROBLEM - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/project/view/71/
[08:03:18] RECOVERY - PHP7 rendering on mw1325 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.424 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[08:04:28] PROBLEM - Apache HTTP on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[08:05:40] * addshore reads up
[08:06:15] Uh oh, metawiki is so slow for me
[08:06:17] Is this related?
[08:06:26] RECOVERY - Apache HTTP on mw1322 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.953 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[08:06:27] yes, there are known issues
[08:06:47] * rileyh points to the status
[08:07:54] PROBLEM - PHP7 rendering on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[08:08:08] RECOVERY - Apache HTTP on mw1325 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.940 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[08:08:32] PROBLEM - Varnish HTTP text-frontend - port 80 on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[08:08:36] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish
[08:08:58] RECOVERY - Nginx local proxy to apache on mw1322 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.808 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[08:09:16] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by ReadTimeoutError(HTTPSConnectionPool(host=text-lb.eqiad.wikimedia.org, port=443): Read timed out.
(read timeout=15),): /api/rest_v1/?spec https://wikitech.wikimedia.org/wiki/RESTBase [08:09:24] PROBLEM - Nginx local proxy to apache on mw1325 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [08:09:40] RECOVERY - PHP7 rendering on mw1325 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.347 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:11:02] PROBLEM - Apache HTTP on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [08:11:12] RECOVERY - Nginx local proxy to apache on mw1325 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:11:15] RECOVERY - LVS HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15286 bytes in 9.600 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:12:10] RECOVERY - Varnish HTTP text-frontend - port 80 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:13:06] RECOVERY - Apache HTTP on mw1257 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.473 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:13:18] RECOVERY - Nginx local proxy to apache on mw1258 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.539 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:13:18] RECOVERY - PHP7 rendering on mw1258 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.878 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:13:22] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/graph/png/{title}/{revision}/{graph_id} (Get a graph from Graphoid) is 
CRITICAL: Test Get a graph from Graphoid returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/RESTBase [08:13:27] Zoranzoki21: Same. it's loading slow for me too [08:13:36] (or not at all) [08:13:42] I just enabled mwdebug and works correctly [08:13:51] folks are looking into the issue [08:14:44] I guess we're not bad luck [08:15:04] PROBLEM - PHP opcache health on mw1261 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [08:15:08] PROBLEM - Varnish HTTP text-frontend - port 3124 on cp1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:15:38] RECOVERY - Apache HTTP on mw1258 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.305 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:16:04] RECOVERY - Apache HTTP on mw1254 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.767 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:16:38] PROBLEM - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-m [08:16:41] PROBLEM - LVS HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:17:10] PROBLEM - Varnish HTTP text-frontend - port 3126 on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Varnish [08:17:12] RECOVERY - PHP7 rendering on mw1254 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 
bytes in 9.722 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:17:44] PROBLEM - Apache HTTP on mw1257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [08:18:02] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8295 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:18:10] PROBLEM - Nginx local proxy to apache on mw1322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [08:18:16] RECOVERY - Apache HTTP on mw1322 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.274 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:18:24] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 3.812 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:18:46] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:02] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [08:19:08] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:12] RECOVERY - graphoid endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [08:19:12] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29 [08:19:16] RECOVERY - PHP7 rendering on mw1322 is OK: 
HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.316 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:19:16] RECOVERY - graphoid endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [08:19:18] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [08:19:20] PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: eqiad rsync status alert, rsync status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/ [08:19:22] RECOVERY - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [08:19:24] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:28] RECOVERY - Apache HTTP on mw1256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.156 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:19:32] RECOVERY - Apache HTTP on mw1257 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.647 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:19:36] RECOVERY - Graphoid LVS eqiad on graphoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Graphoid [08:19:44] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 6.221 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:46] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [08:19:46] RECOVERY - 
Varnish HTTP text-frontend - port 3126 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:48] RECOVERY - graphoid endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [08:19:48] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:52] RECOVERY - Nginx local proxy to apache on mw1322 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.726 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:19:52] RECOVERY - Nginx local proxy to apache on mw1253 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.716 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:19:52] RECOVERY - Nginx local proxy to apache on mw1254 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:19:54] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:58] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 3.445 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:58] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:58] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:19:58] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes 
in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:00] RECOVERY - Nginx local proxy to apache on mw1257 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:04] RECOVERY - Apache HTTP on mw1326 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.933 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:06] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps [08:20:08] RECOVERY - Nginx local proxy to apache on mw1269 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 9.370 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:08] RECOVERY - Apache HTTP on mw1328 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 9.531 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:08] RECOVERY - PHP7 rendering on mw1256 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.137 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:08] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:08] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:10] RECOVERY - PHP7 rendering on mw1262 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.260 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:10] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [08:20:10] RECOVERY - PHP7 rendering on mw1327 is 
OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 9.898 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:12] RECOVERY - wiki content on commons #page on commons.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 170432 bytes in 0.018 second response time https://phabricator.wikimedia.org/project/view/1118/ [08:20:12] RECOVERY - Nginx local proxy to apache on mw1323 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.996 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:12] RECOVERY - restbase endpoints health on restbase1027 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:14] RECOVERY - PHP7 rendering on mw1267 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 8.330 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:14] RECOVERY - PHP7 rendering on mw1243 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 8.763 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:14] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:14] RECOVERY - Nginx local proxy to apache on mw1264 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 6.577 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:18] RECOVERY - Nginx local proxy to apache on mw1243 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 6.752 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:18] RECOVERY - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1964 bytes in 0.069 second response time https://phabricator.wikimedia.org/project/view/71/ [08:20:18] uh oh [08:20:19] RECOVERY - LVS 
HTTPS IPv4 #page on text-lb.ulsfo.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15287 bytes in 8.071 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:20:19] RECOVERY - Apache HTTP on mw1251 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 8.107 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:20] RECOVERY - Nginx local proxy to apache on mw1238 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 3.303 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:22] RECOVERY - Apache HTTP on mw1263 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 3.869 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:23] RECOVERY - LVS HTTPS IPv6 #page on text-lb.eqsin.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15300 bytes in 4.685 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:20:24] RECOVERY - Nginx local proxy to apache on mw1242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 6.626 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:24] RECOVERY - Apache HTTP on mw1268 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 5.193 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:24] RECOVERY - Nginx local proxy to apache on mw1240 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 5.642 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:24] RECOVERY - Apache HTTP on mw1269 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 6.364 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:24] RECOVERY - PHP7 rendering on mw1269 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 6.652 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:25] RECOVERY - 
LVS HTTPS IPv6 #page on text-lb.codfw.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15301 bytes in 6.823 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:20:25] RECOVERY - Apache HTTP on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 6.979 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:26] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [08:20:29] RECOVERY - https://phabricator.wikimedia.org #page on phabricator.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 36976 bytes in 0.128 second response time https://wikitech.wikimedia.org/wiki/Phabricator [08:20:29] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:30] RECOVERY - PyBal backends health check on lvs1013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:20:30] RECOVERY - Apache HTTP on mw1242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.841 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:32] RECOVERY - Nginx local proxy to apache on mw1255 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:32] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:32] RECOVERY - Apache HTTP on mw1246 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.449 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:34] RECOVERY - Nginx local proxy to apache on mw1252 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.063 second 
response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:34] RECOVERY - Nginx local proxy to apache on mw1245 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.080 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:34] RECOVERY - PHP7 rendering on mw1255 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.171 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:37] RECOVERY - LVS HTTPS IPv6 #page on text-lb.esams.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15300 bytes in 3.199 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:20:42] RECOVERY - PHP7 rendering on mw1251 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 1.148 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:42] RECOVERY - restbase endpoints health on restbase1024 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:42] RECOVERY - restbase endpoints health on restbase1018 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:42] RECOVERY - restbase endpoints health on restbase1025 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:42] RECOVERY - restbase endpoints health on restbase1021 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:42] RECOVERY - restbase endpoints health on restbase1019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:42] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:43] RECOVERY - Restrouter LVS eqiad on restrouter.svc.eqiad.wmnet is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/RESTBase [08:20:43] RECOVERY - restbase endpoints health on restbase1022 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:44] RECOVERY - restbase endpoints health on restbase1017 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:44] RECOVERY - restbase endpoints health on restbase1026 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:45] RECOVERY - restbase endpoints health on restbase1016 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:45] RECOVERY - Nginx local proxy to apache on mw1249 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.040 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:46] RECOVERY - Nginx local proxy to apache on mw1250 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:46] RECOVERY - restbase endpoints health on restbase1020 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:47] RECOVERY - Nginx local proxy to apache on mw1256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.059 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:47] RECOVERY - restbase endpoints health on restbase1023 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:48] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:48] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [08:20:49] RECOVERY - restbase endpoints health on 
restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [08:20:49] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [08:20:50] RECOVERY - PHP7 rendering on mw1328 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:50] RECOVERY - PHP7 rendering on mw1263 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.135 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:51] RECOVERY - PHP7 rendering on mw1252 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.188 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:51] RECOVERY - PHP7 rendering on mw1268 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.247 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:52] RECOVERY - PHP7 rendering on mw1272 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.356 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:52] RECOVERY - PHP7 rendering on mw1265 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.472 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:53] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:20:53] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [08:20:54] RECOVERY - Apache HTTP on mw1250 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:54] RECOVERY - Varnish 
HTTP text-frontend - port 3123 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 544 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:55] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:20:55] RECOVERY - Nginx local proxy to apache on mw1328 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:56] RECOVERY - PHP7 rendering on mw1257 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.115 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:56] RECOVERY - PHP7 rendering on mw1245 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.157 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:20:57] RECOVERY - graphoid endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/graphoid [08:20:57] RECOVERY - Apache HTTP on mw1255 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:58] RECOVERY - Apache HTTP on mw1252 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:58] RECOVERY - Nginx local proxy to apache on mw1263 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:20:59] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase [08:20:59] RECOVERY - Nginx local proxy to apache on mw1266 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.413 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers [08:21:00] RECOVERY - Apache HTTP on mw1332 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:00] RECOVERY - Apache HTTP on mw1327 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.074 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:01] RECOVERY - PHP7 rendering on mw1244 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.350 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:01] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:02] RECOVERY - Apache HTTP on mw1330 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.193 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:02] RECOVERY - Apache HTTP on mw1273 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 2.019 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:03] RECOVERY - Apache HTTP on mw1249 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:03] RECOVERY - Apache HTTP on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 2.410 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:04] RECOVERY - Apache HTTP on mw1248 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:04] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:05] RECOVERY - PHP7 rendering 
on mw1264 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:05] RECOVERY - Nginx local proxy to apache on mw1329 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:06] RECOVERY - Nginx local proxy to apache on mw1327 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.082 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:06] RECOVERY - Nginx local proxy to apache on mw1273 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.904 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:07] RECOVERY - PHP7 rendering on mw1333 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:07] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:08] RECOVERY - PHP7 rendering on mw1274 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.232 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:08] RECOVERY - Nginx local proxy to apache on mw1332 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:09] RECOVERY - Nginx local proxy to apache on mw1324 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:09] RECOVERY - LVS HTTPS IPv4 #page on text-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15284 bytes in 0.509 second response time 
https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:10] RECOVERY - LVS HTTPS IPv4 #page on text-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15286 bytes in 0.561 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:10] RECOVERY - LVS HTTPS IPv6 #page on text-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15297 bytes in 0.108 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:11] RECOVERY - PHP7 rendering on mw1324 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:11] RECOVERY - LVS HTTPS IPv4 #page on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 14667 bytes in 1.332 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:12] RECOVERY - Nginx local proxy to apache on mw1271 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.593 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:13] RECOVERY - Nginx local proxy to apache on mw1275 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:13] RECOVERY - PHP7 rendering on mw1247 is OK: HTTP OK: HTTP/1.1 200 OK - 79418 bytes in 0.103 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:15] RECOVERY - PHP7 rendering on mw1331 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.120 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:17] RECOVERY - Nginx local proxy to apache on mw1246 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.754 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:18] RECOVERY - LVS HTTPS IPv4 #page on 
text-lb.codfw.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15287 bytes in 2.891 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:21] RECOVERY - Apache HTTP on mw1239 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:21] RECOVERY - Apache HTTP on mw1264 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.035 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:21] RECOVERY - PHP7 rendering on mw1248 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.122 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:21] RECOVERY - PHP7 rendering on mw1241 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.124 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:21] RECOVERY - Apache HTTP on mw1272 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.043 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:23] RECOVERY - PHP7 rendering on mw1273 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 1.975 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:23] RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 2.636 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:23] RECOVERY - Apache HTTP on mw1274 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.494 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:25] RECOVERY - Apache HTTP on mw1247 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:25] RECOVERY - Apache HTTP on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 626 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:25] RECOVERY - Apache HTTP on mw1323 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:25] RECOVERY - PHP7 rendering on mw1321 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.116 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:25] RECOVERY - PHP7 rendering on mw1253 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.157 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:26] RECOVERY - Apache HTTP on mw1320 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.207 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:26] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [08:21:27] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:27] RECOVERY - PHP7 rendering on mw1261 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.471 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:29] RECOVERY - Apache HTTP on mw1329 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:29] RECOVERY - Nginx local proxy to apache on mw1239 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:29] RECOVERY - PHP7 rendering on mw1250 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.124 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:29] RECOVERY - Nginx local proxy to apache on mw1248 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.036 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:30] RECOVERY - Apache HTTP on mw1240 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:30] RECOVERY - Apache HTTP on mw1241 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:31] RECOVERY - Apache HTTP on mw1244 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.047 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:31] RECOVERY - Nginx local proxy to apache on mw1321 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:32] RECOVERY - Nginx local proxy to apache on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.100 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:32] RECOVERY - PHP7 rendering on mw1266 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.150 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:33] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:33] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:34] RECOVERY - Nginx local proxy to apache on mw1319 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.054 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers [08:21:34] RECOVERY - Nginx local proxy to apache on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.904 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:35] RECOVERY - Nginx local proxy to apache on mw1320 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.275 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:35] RECOVERY - PHP7 rendering on mw1326 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.604 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:36] RECOVERY - Nginx local proxy to apache on mw1333 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.926 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:36] RECOVERY - Nginx local proxy to apache on mw1262 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 2.506 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:37] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:37] RECOVERY - Varnish HTTP text-frontend - port 3127 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:38] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 544 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:38] RECOVERY - Nginx local proxy to apache on mw1268 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:39] RECOVERY - Nginx local proxy to apache on mw1241 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.076 second response 
time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:39] RECOVERY - PHP7 rendering on mw1239 is OK: HTTP OK: HTTP/1.1 200 OK - 79418 bytes in 0.108 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:40] RECOVERY - PHP7 rendering on mw1332 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.142 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:40] RECOVERY - PHP7 rendering on mw1320 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.332 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:41] RECOVERY - PHP7 rendering on mw1330 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.788 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:41] RECOVERY - PHP7 rendering on mw1270 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.882 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:42] RECOVERY - Apache HTTP on mw1243 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:42] RECOVERY - LVS HTTPS IPv6 #page on text-lb.ulsfo.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15297 bytes in 0.458 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:43] RECOVERY - Apache HTTP on mw1253 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:43] RECOVERY - Nginx local proxy to apache on mw1331 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:45] RECOVERY - Apache HTTP on mw1245 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.046 second 
response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:45] RECOVERY - Apache HTTP on mw1324 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:45] RECOVERY - Nginx local proxy to apache on mw1247 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:45] RECOVERY - Apache HTTP on mw1238 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:46] RECOVERY - PHP7 rendering on mw1319 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:46] RECOVERY - PHP7 rendering on mw1323 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:49] RECOVERY - Nginx local proxy to apache on mw1330 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.178 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:51] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton [08:21:51] RECOVERY - PHP7 rendering on mw1249 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.117 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:51] RECOVERY - Nginx local proxy to apache on mw1326 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:53] RECOVERY - Apache HTTP on mw1262 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.613 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers [08:21:53] RECOVERY - Nginx local proxy to apache on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:53] RECOVERY - PHP7 rendering on mw1271 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 4.439 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:55] RECOVERY - Nginx local proxy to apache on mw1244 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.066 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1085 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:21:57] RECOVERY - Apache HTTP on mw1319 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/Application_servers 
[08:21:58] RECOVERY - LVS HTTP IPv4 #page on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 14672 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:21:59] RECOVERY - PHP7 rendering on mw1275 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.127 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:59] RECOVERY - PHP7 rendering on mw1240 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.130 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:21:59] RECOVERY - PHP7 rendering on mw1238 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.125 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:22:00] RECOVERY - PHP7 rendering on mw1242 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.179 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:22:01] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:01] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 744 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:22:03] RECOVERY - Apache HTTP on mw1331 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:03] RECOVERY - Apache HTTP on mw1265 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:03] RECOVERY - Nginx local proxy to apache on mw1272 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.052 second 
response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:03] RECOVERY - PHP7 rendering on mw1329 is OK: HTTP OK: HTTP/1.1 200 OK - 79419 bytes in 0.129 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:22:05] RECOVERY - Nginx local proxy to apache on mw1274 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 629 bytes in 1.226 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:05] RECOVERY - PHP7 rendering on mw1246 is OK: HTTP OK: HTTP/1.1 200 OK - 79420 bytes in 2.878 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:22:07] RECOVERY - Apache HTTP on mw1271 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 3.570 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:09] RECOVERY - Apache HTTP on mw1275 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.042 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:09] RECOVERY - Apache HTTP on mw1321 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:09] RECOVERY - Apache HTTP on mw1333 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.590 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:11] RECOVERY - Varnish HTTP text-frontend - port 3127 on cp1081 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:15] RECOVERY - Nginx local proxy to apache on mw1251 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.057 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:17] Operations, Wikimedia-Incident: All Wikimedia sites are returning 504 error - https://phabricator.wikimedia.org/T241573 (AlexisJazz) 
Open→Resolved [08:22:18] RECOVERY - LVS HTTPS IPv4 #page on text-lb.eqsin.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 15284 bytes in 1.497 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [08:22:21] RECOVERY - Nginx local proxy to apache on mw1265 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/Application_servers [08:22:23] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:23] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:25] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:22:29] RECOVERY - High average GET latency for mw requests on api_appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=api_appserver&var-method=GET [08:22:29] RECOVERY - Varnish HTTP text-frontend - port 3127 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:29] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:29] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:29] RECOVERY - Varnish HTTP text-frontend - port 3124 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:30] RECOVERY - Varnish HTTP text-frontend - port 3126 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:33] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1087 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:33] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1075 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:33] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1079 is OK: HTTP OK: HTTP/1.1 200 OK - 542 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:40] Operations, Wikimedia-Incident: All Wikimedia sites are returning 504 error - https://phabricator.wikimedia.org/T241573 (AlexisJazz) Dunno who fixed it, but thanks! 
[08:22:41] RECOVERY - Varnish HTTP text-frontend - port 3127 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:41] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:45] RECOVERY - Varnish HTTP text-frontend - port 3120 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:45] RECOVERY - Varnish HTTP text-frontend - port 3121 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:22:45] RECOVERY - Varnish HTTP text-frontend - port 3122 on cp1089 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:23:02] RECOVERY - ATS TLS has reduced HTTP availability #page on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=13&fullscreen&refresh=1m&orgId=1 [08:23:05] RECOVERY - Varnish HTTP text-frontend - port 3123 on cp1083 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:23:09] RECOVERY - Varnish HTTP text-frontend - port 3125 on cp1077 is OK: HTTP OK: HTTP/1.1 200 OK - 543 bytes in 0.000 second response time https://wikitech.wikimedia.org/wiki/Varnish [08:23:37] RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [08:24:01] RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [08:24:25] Operations, Wikimedia-Incident: All Wikimedia sites are returning 504 error - https://phabricator.wikimedia.org/T241573 (ArielGlenn) Resolved→Open Not yet resolved. Re-opening for now. [08:24:45] RECOVERY - High average POST latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST [08:24:57] (PS1) Ema: vcl: rewrite cache busting Main_Page tests [puppet] - https://gerrit.wikimedia.org/r/561126 [08:26:41] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [08:27:11] (CR) Giuseppe Lavagetto: [C: -1] vcl: rewrite cache busting Main_Page tests (1 comment) [puppet] - https://gerrit.wikimedia.org/r/561126 (owner: Ema) [08:27:30] (PS2) Ema: vcl: rewrite cache busting Main_Page tests [puppet] - https://gerrit.wikimedia.org/r/561126 [08:28:12] (CR) Giuseppe Lavagetto: [C: +1] vcl: rewrite cache busting Main_Page tests [puppet] - https://gerrit.wikimedia.org/r/561126 (owner: Ema) [08:28:47] (CR) Alexandros Kosiaris: [C: +1] vcl: rewrite cache busting Main_Page tests [puppet] - https://gerrit.wikimedia.org/r/561126 (owner: Ema) [08:29:14] (CR) Ema: [C: +2] vcl: rewrite cache busting Main_Page tests [puppet] - https://gerrit.wikimedia.org/r/561126 (owner: Ema) [08:31:33] RECOVERY - PHP opcache health on mw1261 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [08:33:09] RECOVERY - Check if active EventStreams endpoint is delivering messages. on icinga1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration [08:34:38] Operations, Wikimedia-Incident: All Wikimedia sites are returning 504 error - https://phabricator.wikimedia.org/T241573 (ArielGlenn) p:Unbreak!→High Measures for remediation applied, lowering the priority of this task for the moment but not yet closing. 
[08:34:55] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:40:13] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:38:00] (PS1) Ema: varnish: add vcl_ec2_nets.py [puppet] - https://gerrit.wikimedia.org/r/561136 [09:45:53] (PS2) Ema: varnish: add vcl_ec2_nets.py [puppet] - https://gerrit.wikimedia.org/r/561136 [09:52:15] (CR) Jcrespo: "While I could merge this right away, there is a procedure, unless this is an emergency or there are privacy concerns, I would suggest to c" [puppet] - https://gerrit.wikimedia.org/r/560972 (owner: Ladsgroup) [10:05:11] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 31841056 and 7 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [10:06:57] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [10:23:04] (Abandoned) DCausse: [wdqs] enable async imports on wdqs1005 and wdqs2001 [puppet] - https://gerrit.wikimedia.org/r/559847 (https://phabricator.wikimedia.org/T238045) (owner: DCausse) [10:39:13] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:39:15] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:40:41] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:40:59] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:42:27] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:42:49] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:44:39] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:45:57] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:46:25] PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:46:27] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:47:45] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:48:09] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:49:57] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:50:08] (CR) Jbond: [C: -1] "thanks for the work some minor fixes" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/559944 (owner: CDanis) [10:51:47] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:51:47] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:51:49] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:53:35] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:53:45] Operations, Puppet, Patch-For-Review: puppet-merge can't accept an explicit SHA1 for an --ops merge - https://phabricator.wikimedia.org/T241277 (jbond) p:Triage→Normal [10:54:49] (PS3) Jbond: puppet-merge.py: SHA1 or explicit FETCH_HEAD is mandatory [puppet] - https://gerrit.wikimedia.org/r/559944 (https://phabricator.wikimedia.org/T241277) (owner: CDanis) [10:55:23] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:55:54] Operations, Puppet, Patch-For-Review: puppet-merge can't accept an explicit SHA1 for an --ops merge - https://phabricator.wikimedia.org/T241277 (jbond) >>! In T241277#5762041, @jcrespo wrote: > @CDanis Is this something you plan to work on? Otherwise, who do you need help with? I am trying to triage... 
[10:57:07] PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [10:58:53] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:00:39] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:00:43] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:05:24] (CR) Jbond: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/554825 (https://phabricator.wikimedia.org/T221083) (owner: Jbond) [11:23:57] PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:25:43] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [11:32:04] Operations, Core Platform Team, Discovery, Recommendation-API, and 3 others: flapping monitoring for recommendation_api on scb - https://phabricator.wikimedia.org/T178445 (jcrespo) This is 
flapping very frequently, but with a 500, not a 429 (scb1002 only, for example, twice per hour). Should I cl... [11:33:57] (PS1) Arturo Borrero Gonzalez: aptrepo: include more packages for thirdparty/openstack-pike-stretch [puppet] - https://gerrit.wikimedia.org/r/561140 (https://phabricator.wikimedia.org/T241347) [11:43:18] (CR) Arturo Borrero Gonzalez: [C: +2] aptrepo: include more packages for thirdparty/openstack-pike-stretch [puppet] - https://gerrit.wikimedia.org/r/561140 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez) [11:45:41] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241534 (jcrespo) a:Papaul This being a software raid please coordinate with @fgiunchedi . [11:45:52] !log importing more packages into stretch-wikimedia/thirdparty/openstack-pike-stretch (T241347) [11:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:01] T241347: upgrade cloud-vps openstack to Openstack version 'Pike' - https://phabricator.wikimedia.org/T241347 [11:47:58] Operations, ops-codfw, SRE-swift-storage: Degraded RAID on ms-be2035 - https://phabricator.wikimedia.org/T241535 (jcrespo) a:fgiunchedi See also T241534, I am not sure exactly what was the problem detected. 
[12:00:01] (PS1) Arturo Borrero Gonzalez: aptrepo: fix missing update for stretch-wikimedia (openstack-stretch-pike) [puppet] - https://gerrit.wikimedia.org/r/561141 (https://phabricator.wikimedia.org/T241347) [12:00:53] (CR) Arturo Borrero Gonzalez: [C: +2] aptrepo: fix missing update for stretch-wikimedia (openstack-stretch-pike) [puppet] - https://gerrit.wikimedia.org/r/561141 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez) [12:01:01] (CR) Andrew Bogott: [C: +1] aptrepo: fix missing update for stretch-wikimedia (openstack-stretch-pike) [puppet] - https://gerrit.wikimedia.org/r/561141 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez) [12:10:43] (PS1) Arturo Borrero Gonzalez: aptrepro: openstack-pike-stretch: import python-oslo.middleware too [puppet] - https://gerrit.wikimedia.org/r/561142 (https://phabricator.wikimedia.org/T241347) [12:11:40] (CR) Arturo Borrero Gonzalez: [C: +2] aptrepro: openstack-pike-stretch: import python-oslo.middleware too [puppet] - https://gerrit.wikimedia.org/r/561142 (https://phabricator.wikimedia.org/T241347) (owner: Arturo Borrero Gonzalez) [12:59:15] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={cloud_dev_pdns,cloud_dev_pdns_rec} site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:01:01] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:18:36] (PS1) Andrew Bogott: pike designate.conf: specify pymysql driver for db connections [puppet] - https://gerrit.wikimedia.org/r/561148 (https://phabricator.wikimedia.org/T241348) [13:18:38] (PS1) Andrew Bogott: pike deisgnate.conf: update config for deprecated/renamed options [puppet] - https://gerrit.wikimedia.org/r/561149 (https://phabricator.wikimedia.org/T241348) [13:20:32] (CR) Andrew Bogott: [C: +2] pike designate.conf: specify pymysql driver for db connections [puppet] - https://gerrit.wikimedia.org/r/561148 (https://phabricator.wikimedia.org/T241348) (owner: Andrew Bogott) [13:20:46] (CR) Andrew Bogott: [C: +2] pike deisgnate.conf: update config for deprecated/renamed options [puppet] - https://gerrit.wikimedia.org/r/561149 (https://phabricator.wikimedia.org/T241348) (owner: Andrew Bogott) [13:31:16] (PS1) Andrew Bogott: designate.conf: move a couple more config options for Pike [puppet] - https://gerrit.wikimedia.org/r/561151 (https://phabricator.wikimedia.org/T241348) [13:32:54] (CR) Andrew Bogott: [C: +2] designate.conf: move a couple more config options for Pike [puppet] - https://gerrit.wikimedia.org/r/561151 (https://phabricator.wikimedia.org/T241348) (owner: Andrew Bogott) [13:52:27] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [13:54:15] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:16:22] (PS1) Ema: varnish: dummy acl cloud_nets 
[labs/private] - https://gerrit.wikimedia.org/r/561153 [14:22:29] (CR) Ema: [C: +2] varnish: add vcl_ec2_nets.py [puppet] - https://gerrit.wikimedia.org/r/561136 (owner: Ema) [14:23:00] (CR) Ema: [V: +2 C: +2] varnish: dummy acl cloud_nets [labs/private] - https://gerrit.wikimedia.org/r/561153 (owner: Ema) [14:31:08] (PS1) Ema: vcl: stricter rate limiting of cloud IPs [puppet] - https://gerrit.wikimedia.org/r/561156 [14:37:53] (PS1) Ema: Revert "vcl: rewrite cache busting Main_Page tests" [puppet] - https://gerrit.wikimedia.org/r/561157 [14:40:57] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:42:17] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:42:45] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [14:44:05] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [15:07:21] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:37:43] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:58:51] Operations, Traffic, observability: cp1083: ats-tls and varnish-fe crashed due to insufficient memory - https://phabricator.wikimedia.org/T241593 (ema) [15:58:56] Operations, Traffic, observability: cp1083: ats-tls and varnish-fe crashed due to insufficient memory - https://phabricator.wikimedia.org/T241593 (ema) p:Triage→High [16:00:28] !log cp1083: restart ats-tls and varnish-fe after crashes - T241593 [16:00:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:36] T241593: cp1083: ats-tls and varnish-fe crashed due to insufficient memory - https://phabricator.wikimedia.org/T241593 [16:02:59] RECOVERY - traffic_server tls process restarted on cp1083 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqiad+prometheus/ops&var-instance=cp1083&var-layer=tls [16:17:37] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:19:09] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:19:25] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All
endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:19:25] PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:22:59] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:24:31] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:24:45] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [16:26:31] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:04:05] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:07:51] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 
(expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:09:29] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:09:39] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:09:39] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:11:27] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:50:29] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [17:54:03] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:18:09] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:19:57] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:25:22] Hi, Can someone having +2 rights on integration/config merge https://gerrit.wikimedia.org/r/#/c/integration/config/+/560982/? [19:51:51] PROBLEM - configured eth on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [19:51:57] PROBLEM - Check systemd state on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:52:15] PROBLEM - dhclient process on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [19:53:07] PROBLEM - Check size of conntrack table on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [19:53:11] PROBLEM - Disk space on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=analytics-tool1001&var-datasource=eqiad+prometheus/ops [19:53:19] PROBLEM - Check whether ferm is active by checking the default input chain on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [19:53:35] PROBLEM - DPKG on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [19:55:45] PROBLEM - recommendation_api endpoints health on scb2001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the 
unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:56:03] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:56:27] PROBLEM - puppet last run on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [19:57:31] RECOVERY - recommendation_api endpoints health on scb2001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [19:58:41] PROBLEM - SSH on analytics-tool1001 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:01:05] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:01:19] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:02:53] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:03:05] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:03:13] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:08:39] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:10:27] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [20:11:07] RECOVERY - SSH on analytics-tool1001 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:21:03] PROBLEM - Check the NTP synchronisation status of timesyncd on analytics-tool1001 is CRITICAL: connect to address 10.64.36.110 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP [20:33:55] RECOVERY - Check size of conntrack table on analytics-tool1001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [20:33:59] RECOVERY - Disk space on analytics-tool1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=analytics-tool1001&var-datasource=eqiad+prometheus/ops [20:34:09] RECOVERY - Check whether ferm is active by checking the default input chain on analytics-tool1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [20:34:25] RECOVERY - DPKG on analytics-tool1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [20:34:27] RECOVERY - configured eth on analytics-tool1001 is OK: OK - 
interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [20:34:35] RECOVERY - Check systemd state on analytics-tool1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:34:53] RECOVERY - dhclient process on analytics-tool1001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [20:36:49] RECOVERY - puppet last run on analytics-tool1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [20:51:51] RECOVERY - Check the NTP synchronisation status of timesyncd on analytics-tool1001 is OK: OK: synced at Mon 2019-12-30 20:51:49 UTC. https://wikitech.wikimedia.org/wiki/NTP [21:34:03] PROBLEM - recommendation_api endpoints health on scb2006 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:34:13] PROBLEM - recommendation_api endpoints health on scb1001 is CRITICAL: /{domain}/v1/article/creation/translation/{source}{/seed} (article.creation.translation - bad seed) is CRITICAL: Test article.creation.translation - bad seed returned the unexpected status 500 (expecting: 404) https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:35:51] RECOVERY - recommendation_api endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:36:01] RECOVERY - recommendation_api endpoints health on scb1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [21:49:22] (CR) Jbond: puppet-merge.py: SHA1 or explicit FETCH_HEAD is mandatory
(2 comments) [puppet] - https://gerrit.wikimedia.org/r/559944 (https://phabricator.wikimedia.org/T241277) (owner: CDanis)