[00:13:50] RECOVERY - SSH on db2082.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [00:56:12] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [00:57:02] Operations, ops-eqiad: Degraded RAID on kafka-jumbo1001 - https://phabricator.wikimedia.org/T251586 (ops-monitoring-bot) [00:59:46] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 80905 bytes in 0.450 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [01:03:36] PROBLEM - SSH on ganeti2004.mgmt is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [01:32:14] !log bmansurov@deploy1001 Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service [01:32:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:36:46] !log bmansurov@deploy1001 Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s) [01:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:38:30] Operations, Research: recommendation api's test on scb nodes are flapping - https://phabricator.wikimedia.org/T247732 (bmansurov) Open→Resolved a:bmansurov All changes have been deployed. Feel free to re-open when you see the issue again. 
[01:43:21] (PS1) Reedy: Replace AuthManagerStatsdHandler with namespaced class [mediawiki-config] - https://gerrit.wikimedia.org/r/593652 [01:43:44] (CR) Reedy: [C: -1] ".30 needs to be everywhere and stable" [mediawiki-config] - https://gerrit.wikimedia.org/r/593652 (owner: Reedy) [01:47:05] (PS1) Reedy: Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 [01:51:40] (PS2) Reedy: Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 [01:52:08] (PS3) Reedy: Replace stringified class names with ::class [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 [02:16:42] (CR) Krinkle: [C: +1] Replace stringified class names with ::class (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/593654 (owner: Reedy) [02:21:42] PROBLEM - PHP opcache health on mw2139 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [02:24:17] I just got an error about the database being locked for maintenance on Commons. Is that expected? [02:24:25] "WARNING: The database has been locked for maintenance, so you will not be able to save your edits right now." [02:27:11] chaomodus: ^ [02:33:15] (PS1) Ottomata: refinery::job::camus - Make sure check defaults to true [puppet] - https://gerrit.wikimedia.org/r/593659 [02:37:17] interesting [02:37:32] Seems to be OK now. Just saw it once. 
[02:37:47] (CR) Ottomata: [C: +2] refinery::job::camus - Make sure check defaults to true [puppet] - https://gerrit.wikimedia.org/r/593659 (owner: Ottomata) [02:38:08] RECOVERY - PHP opcache health on mw2139 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [02:40:27] okay cool, it was working when i tested just now too [05:13:20] Operations: FY2019-20 Q4 (or later) codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (Marostegui) [05:21:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:22:54] kaldari: It is not expected, but happens sometimes when MW believes all the replicas are lagged (but they are not) [05:23:34] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:27:06] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 200 OK - 80859 bytes in 0.355 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:28:05] (PS1) Marostegui: Revert "install_server: Reimage labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/593665 [05:32:36] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:33:04] wow those fatals really increased [05:33:30] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5009 gt 5000 
https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:35:58] (CR) Marostegui: [C: +2] Revert "install_server: Reimage labsdb1011" [puppet] - https://gerrit.wikimedia.org/r/593665 (owner: Marostegui) [05:38:08] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:40:56] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5344 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:41:52] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:45:44] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:47:24] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80859 bytes in 0.385 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:48:14] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8459 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:49:58] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 
seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:51:10] (CR) Marostegui: "I think the parsercache stand by host should follow the same rules as its siblings." [puppet] - https://gerrit.wikimedia.org/r/593527 (https://phabricator.wikimedia.org/T172489) (owner: Jcrespo) [05:52:12] PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:53:30] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80859 bytes in 0.925 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:55:44] RECOVERY - PHP7 rendering on mw1351 is OK: HTTP OK: HTTP/1.1 200 OK - 80845 bytes in 0.096 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [05:56:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:08:30] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5117 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:11:34] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:12:40] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:14:20] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80849 bytes in 0.317 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:16:50] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 80849 bytes in 0.360 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:21:40] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:23:20] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 80849 bytes in 0.350 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:26:04] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:26:54] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:27:40] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:27:42] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 80849 bytes in 0.365 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:32:10] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80849 bytes in 0.475 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:34:02] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6830 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:46:44] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5677 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:53:04] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:54:06] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5258 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:54:32] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:55:00] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [06:56:12] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.192 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:56:38] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 1.738 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [06:57:46] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5743 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:02:20] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:05:04] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:05:04] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7366 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:06:44] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:07:26] PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:07:50] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:09:06] RECOVERY - PHP7 rendering on mw1351 is OK: HTTP OK: HTTP/1.1 200 OK - 80800 bytes in 0.108 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:11:00] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:12:42] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.309 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:13:24] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:15:06] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.612 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:19:54] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:21:34] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.532 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:22:50] <_joe_> ugh [07:27:00] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6275 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:28:12] _joe_ here if needed [07:28:23] I can check memcached [07:31:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:31:48] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:31:56] PROBLEM - PHP7 rendering on mw1353 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:33:20] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:33:36] RECOVERY - PHP7 rendering on mw1353 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.337 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:34:20] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6708 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:35:02] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 80803 bytes in 1.014 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:35:14] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:38:18] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:38:22] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:38:58] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 80802 bytes in 1.259 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:40:04] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.136 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:40:42] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:41:08] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [07:41:50] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80801 bytes in 0.222 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:42:58] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [07:44:35] !log Copy wikireplica dump from labsdb1009 to labsdb1011 - T249188 [07:44:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:44:38] T249188: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 [07:53:28] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:54:24] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7316 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [07:59:06] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [07:59:54] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 7244 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:00:48] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80860 bytes in 2.367 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:04:34] PROBLEM - PHP7 rendering on mw1351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:05:07] (PS1) Giuseppe Lavagetto: 
mediawiki::web::vhost: add the ability to define go away conditions [puppet] - https://gerrit.wikimedia.org/r/593727 [08:05:09] (PS1) Giuseppe Lavagetto: mediawiki::web::site: add a simple rule to go away conditions [puppet] - https://gerrit.wikimedia.org/r/593728 [08:08:04] RECOVERY - PHP7 rendering on mw1351 is OK: HTTP OK: HTTP/1.1 200 OK - 80859 bytes in 0.281 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:08:43] (CR) Vgutierrez: mediawiki::web::site: add a simple rule to go away conditions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593728 (owner: Giuseppe Lavagetto) [08:09:18] (CR) Vgutierrez: mediawiki::web::vhost: add the ability to define go away conditions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593727 (owner: Giuseppe Lavagetto) [08:10:16] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [08:12:33] (CR) Marostegui: "Missing quote: https://puppet-compiler.wmflabs.org/compiler1002/22249/mw1351.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/593728 (owner: Giuseppe Lavagetto) [08:14:12] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:15:56] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 80835 bytes in 0.622 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:17:40] PROBLEM - High average GET latency for mw requests on appserver in eqiad on 
icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [08:18:10] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6885 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:22:44] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:24:59] (CR) Giuseppe Lavagetto: mediawiki::web::site: add a simple rule to go away conditions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593728 (owner: Giuseppe Lavagetto) [08:25:03] (CR) Giuseppe Lavagetto: mediawiki::web::vhost: add the ability to define go away conditions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593727 (owner: Giuseppe Lavagetto) [08:25:16] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:25:16] (PS2) Giuseppe Lavagetto: mediawiki::web::vhost: add the ability to define go away conditions [puppet] - https://gerrit.wikimedia.org/r/593727 [08:25:18] (PS2) Giuseppe Lavagetto: mediawiki::web::site: add a simple rule to go away conditions [puppet] - https://gerrit.wikimedia.org/r/593728 [08:26:20] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 200 OK - 80836 bytes in 1.979 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:28:25] (CR) Giuseppe Lavagetto: [C: +1] 
"https://puppet-compiler.wmflabs.org/compiler1002/22251/mw1331.eqiad.wmnet/index.html no actual changes happen" [puppet] - https://gerrit.wikimedia.org/r/593727 (owner: Giuseppe Lavagetto) [08:28:41] (CR) Giuseppe Lavagetto: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/22251/mw1331.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/593727 (owner: Giuseppe Lavagetto) [08:28:52] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 80836 bytes in 1.201 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:29:16] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6863 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:31:58] (PS3) Giuseppe Lavagetto: mediawiki::web::vhost: add the ability to define go away conditions [puppet] - https://gerrit.wikimedia.org/r/593727 [08:32:00] (PS3) Giuseppe Lavagetto: mediawiki::web::site: add a simple rule to go away conditions [puppet] - https://gerrit.wikimedia.org/r/593728 [08:34:40] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8647 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:35:02] (CR) Giuseppe Lavagetto: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/22253/mw1331.eqiad.wmnet/index.html seems to DTRT" [puppet] - https://gerrit.wikimedia.org/r/593728 (owner: Giuseppe Lavagetto) [08:37:10] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:37:36] PROBLEM - PHP7 rendering on mw1350 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:37:49] <_joe_> !log depooling mw1352 [08:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:18] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6145 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:38:50] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 80835 bytes in 0.178 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:39:03] <_joe_> !log repool mw1352 [08:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:39:18] RECOVERY - PHP7 rendering on mw1350 is OK: HTTP OK: HTTP/1.1 200 OK - 80836 bytes in 2.041 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:39:38] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [08:41:44] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:43:28] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80836 bytes in 3.790 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:44:44] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:45:44] <_joe_> !log repooling mw1409 [08:45:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:03] (CR) Vgutierrez: [C: +1] mediawiki::web::vhost: add the ability to define go away conditions [puppet] - https://gerrit.wikimedia.org/r/593727 (owner: Giuseppe Lavagetto) [08:46:22] (CR) Vgutierrez: [C: +1] mediawiki::web::site: add a simple rule to go away conditions [puppet] - https://gerrit.wikimedia.org/r/593728 (owner: Giuseppe Lavagetto) [08:47:18] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:48:07] <_joe_> !log repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled [08:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:58] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 80836 bytes in 1.626 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:50:04] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:50:04] !log oblivian@cumin1001 conftool action : set/weight=10; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet [08:50:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:16] <_joe_> let's see if this fixes the problem [08:50:24] <_joe_> if that's the case, it's a volumetric issue [08:50:31] <_joe_> if not, it's a deeper network issue [08:51:44] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 80835 bytes in 0.808 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [08:52:02] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [08:53:36] Operations, ops-eqiad: Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T251579 (Peachey88) [08:53:38] Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T251563 (Peachey88) [08:53:59] Operations, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T251579 (Peachey88) [08:54:19] !log oblivian@cumin1001 conftool action : set/pooled=no:weight=30; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet [08:54:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:54:49] <_joe_> !log depooled all servers in the app pool in rack D1 [08:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:54:50] D1: 
Initial commit - https://phabricator.wikimedia.org/D1 [08:55:19] <_joe_> lol [08:55:44] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:00:16] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 477 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [09:23:38] PROBLEM - PHP opcache health on mw2147 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [09:33:10] 10Operations, 10SRE-Access-Requests: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10Aklapper) [OT] @leila: Related: Could you update the status and/or assignee of https://phabricator.wikimedia.org/maniphest/query/5gpN.lPseKRf/#R ? Thanks in advance! :) [09:36:40] 10Operations, 10Research: recommendation api's test on scb nodes are flapping - https://phabricator.wikimedia.org/T247732 (10elukey) 05Resolved→03Open @bmansurov one thing that I'd consider is changing the health check for the recommendation API service, and possibly not fire a request to mediawiki but jus... 
[09:42:30] (03PS1) 10Dzahn: admins: remove access for lexnasser, MOU expired [puppet] - 10https://gerrit.wikimedia.org/r/593730 [09:45:44] RECOVERY - PHP opcache health on mw2147 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [09:53:46] 10Operations, 10Wikimedia-Mailing-lists: The Wikiml-l is not archiving mail from August 2019 - https://phabricator.wikimedia.org/T251554 (10Aklapper) @jayantanth: Hi, do you know for sure that others on the mailing list received your recent emails, via the mailing list (not because of being CCed, etc)? [09:53:48] (03CR) 10Dzahn: [C: 03+2] admins: remove access for lexnasser, MOU expired [puppet] - 10https://gerrit.wikimedia.org/r/593730 (owner: 10Dzahn) [09:53:59] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Aklapper) [09:55:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [09:59:16] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22720 bytes in 0.273 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:06:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:15:48] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Dzahn) @jayantanth The list is set to "moderated" and therefore your messages are being held in moderation queue. The moderation queue is REALLY full. At the same time t... 
[10:15:48] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22723 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:16:38] (03PS1) 10ArielGlenn: CodeReview tables are now available for public download. [puppet] - 10https://gerrit.wikimedia.org/r/593731 (https://phabricator.wikimedia.org/T243055) [10:17:26] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10jayantanth) I have tried today, three times to send the mails to "Wikiml-l". I have already subscribed to this mailing list long ago. I haven't received any error or bounce... [10:20:11] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Dzahn) @jayantanth Under "Membership Management" there is at the bottom "Set everyone's moderation bit, including those members not currently visible" and that is set to ON... [10:21:17] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Dzahn) Please mail wikiml-l-owner@lists.wikimedia.org to talk to the list admins about changing that setting and cleaning up the moderation queue. [10:24:47] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10jayantanth) >>! In T251554#6099619, @Dzahn wrote: > @jayantanth Under "Membership Management" there is at the bottom "Set everyone's moderation bit, including those members... 
[10:24:55] 10Operations, 10SRE-Access-Requests: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10Dzahn) a:03Dzahn [10:27:05] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Dzahn) @jayantanth Yes, as i said above, the list admins can change these settings. Somebody did it on purpose. Or they confused what "moderation bit" means and thought it... [10:30:05] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10Dzahn) It does not just affect you, it affects the majority of the list subscribers, but not all of them. [10:32:50] (03PS1) 10Dzahn: admins: remove shell access for jmorgan [puppet] - 10https://gerrit.wikimedia.org/r/593733 (https://phabricator.wikimedia.org/T251560) [10:36:48] (03PS1) 10Dzahn: admin: remove cachemiscpuppet alias from my .bash_profile [puppet] - 10https://gerrit.wikimedia.org/r/593734 [10:38:08] (03CR) 10Dzahn: [C: 03+2] admins: remove shell access for jmorgan [puppet] - 10https://gerrit.wikimedia.org/r/593733 (https://phabricator.wikimedia.org/T251560) (owner: 10Dzahn) [10:39:44] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:42:47] (03PS1) 10Ema: ATS: add SystemTap probe for cacheable responses [puppet] - 10https://gerrit.wikimedia.org/r/593735 (https://phabricator.wikimedia.org/T251537) [10:43:47] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10RhinosF1) Sad to see this ticket but can someone also ask OIT to lock https://meta.wikimedia.org/wiki/Special:CentralAuth?target=Jmorgan%20(WMF) [10:45:14] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests 
on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22736 bytes in 7.323 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:47:58] RhinosF1: https://wikimedia.zendesk.com/hc/en-us/requests/new btw [10:49:59] mutante: I did not know about that. Will submit. [10:50:39] cool :) [10:50:58] They usually do that themselves [10:52:32] {{done}} [10:52:47] hauskatze: hasn’t been done yet [10:53:48] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10Dzahn) @leila Done! He was in analytics-privatedata-users but no other groups. I merged the change above and ran puppet on all bastion hosts and all 95 "analytics-all... [10:53:55] Although I forget they’ll be in SF won’t they, probably need to be awake for it [10:54:00] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10Dzahn) 05Open→03Resolved [10:54:50] mutante: I guess you can quickly use that offboard-user.py script as well to check for leftovers [10:57:19] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 - https://phabricator.wikimedia.org/T251554 (10jayantanth) Thanks. I have emailed to list admin. 
[10:57:29] that will happen when OIT triggers the offboarding workflow [10:59:33] (03CR) 10Dzahn: [C: 03+2] admin: remove cachemiscpuppet alias from my .bash_profile [puppet] - 10https://gerrit.wikimedia.org/r/593734 (owner: 10Dzahn) [11:06:32] (03PS4) 10Dzahn: remove bromine.eqiad, vega.codfw and webserver_misc_static role [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) [11:08:31] 10Operations, 10LDAP-Access-Requests: Add uid=srodlund,ou=people,dc=wikimedia,dc=org to cn=wmf,ou=groups,dc=wikimedia,dc=org - https://phabricator.wikimedia.org/T251163 (10Dzahn) [11:10:00] 10Operations, 10LDAP-Access-Requests: Add uid=srodlund,ou=people,dc=wikimedia,dc=org to cn=wmf,ou=groups,dc=wikimedia,dc=org - https://phabricator.wikimedia.org/T251163 (10Dzahn) a:03Dzahn The right Phabricator tag for this is LDAP-Access-Requests. It does not require all the things for a shell access reques... [11:17:45] (03PS1) 10Dzahn: admin: add Sarah Rodlund to ldap_only_admins [puppet] - 10https://gerrit.wikimedia.org/r/593738 (https://phabricator.wikimedia.org/T251163) [11:25:04] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/22255/" [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:27:48] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:28:55] (03CR) 10Dzahn: [C: 03+2] remove bromine.eqiad, vega.codfw and webserver_misc_static role [puppet] - 10https://gerrit.wikimedia.org/r/593218 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:31:12] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [11:31:13] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [11:31:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:14] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:19] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [11:31:19] 10Operations, 10serviceops, 10Patch-For-Review: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10ops-monitoring-bot) Icinga downtime for 1 day, 0:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: decom ` bromine.eqiad.wmnet ` [11:31:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:20] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [11:31:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:29] 10Operations, 10serviceops, 10Patch-For-Review: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10ops-monitoring-bot) Icinga downtime for 1 day, 0:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: decom ` vega.codfw.wmnet ` [11:32:52] (03CR) 10Dzahn: [C: 03+2] admin: add Sarah Rodlund to ldap_only_admins [puppet] - 10https://gerrit.wikimedia.org/r/593738 (https://phabricator.wikimedia.org/T251163) (owner: 10Dzahn) [11:33:09] (03PS2) 10Dzahn: admin: add Sarah Rodlund to ldap_only_admins [puppet] - 10https://gerrit.wikimedia.org/r/593738 (https://phabricator.wikimedia.org/T251163) [11:36:50] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:38:20] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 58.04 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [11:38:38] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22735 
bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:39:55] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Add uid=srodlund,ou=people,dc=wikimedia,dc=org to cn=wmf,ou=groups,dc=wikimedia,dc=org - https://phabricator.wikimedia.org/T251163 (10Dzahn) 05Open→03Resolved Done! @srodlund You are now a member of the wmf group. Login to piwiki should work for y... [11:40:20] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22732 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:41:03] 10Operations, 10Analytics, 10LDAP-Access-Requests: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10Dzahn) @DZierten Could you also give as an expiration date please? For contractors we need to add that to the repository on our side. Tha... [11:42:02] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 94.43 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [11:42:21] 10Operations, 10LDAP-Access-Requests: Add Eamedina to `wmf` LDAF group - https://phabricator.wikimedia.org/T251358 (10Dzahn) Hi @eamedina Could you let us know which tools you need access to? 
[11:50:06] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [11:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:53] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [11:50:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:00] 10Operations, 10serviceops, 10Patch-For-Review: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `vega.codfw.wmnet` - vega.codfw.wmnet (**PASS**) - Downtimed host on Icinga... [11:51:11] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission [11:51:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:07] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [11:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:13] 10Operations, 10serviceops, 10Patch-For-Review: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `bromine.eqiad.wmnet` - bromine.eqiad.wmnet (**PASS**) - Downtimed host on Ici... [11:52:26] (03PS2) 10Dzahn: decom bromine.eqiad.wmnet and vega.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/593221 (https://phabricator.wikimedia.org/T247650) [11:54:29] (03CR) 10Dzahn: [C: 03+2] decom bromine.eqiad.wmnet and vega.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/593221 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:55:44] 10Operations, 10serviceops, 10Patch-For-Review: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10Dzahn) 05Open→03Resolved Done! bromine and vega are gone and the services on it have been fully merged into miscweb1002/2002 which are on buster. 
[11:55:47] 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10Dzahn) [11:55:56] 10Operations, 10serviceops: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10Dzahn) [11:55:56] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5256 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [11:59:18] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:00:58] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81092 bytes in 0.622 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:05:42] !log notebook1003 - puppet was failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet T251560 [12:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:05:45] T251560: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 [12:07:01] !log notebook1004 - puppet was failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". 
killing 29038, running puppet T251560 [12:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:47] mutante: thanks :) [12:10:16] ACKNOWLEDGEMENT - Host 208.80.153.83 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn some ongoing work in wmcs but no downtime [12:10:39] elukey: yw, just happened to notice in icinga as warn [12:11:05] i ran puppet on analytics-all-eqiad earlier [12:14:06] 10Operations, 10netops, 10serviceops: Investigate D1 appservers<->memcache TKOs - https://phabricator.wikimedia.org/T251601 (10ayounsi) p:05Triage→03High [12:14:49] 10Operations, 10SRE-Access-Requests: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10elukey) Created T251600 to check leftovers on analytics nodes. [12:16:47] 10Operations, 10netops, 10serviceops: Investigate D1 appservers<->memcache TKOs - https://phabricator.wikimedia.org/T251601 (10ayounsi) [12:20:12] !log mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw [12:20:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:08] !log mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw [12:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:56] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:31:02] PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:32:20] RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 200 OK - 81118 bytes in 8.664 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:36:58] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:41:30] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81092 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:42:42] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22726 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:44:29] (03PS1) 10Zoranzoki21: [WIP] Initial config for awawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) [12:51:07] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 (moderation enabled but nobody moderates, hence no emails get delivered) - https://phabricator.wikimedia.org/T251554 (10Aklapper) [12:51:10] (03CR) 10Zoranzoki21: "Review is welcome, and help with rebasing of InitialiseSettings.php also. 
I done git pull before editing InitialiseSettings.php, and looks" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [12:52:02] PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:53:44] RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 200 OK - 81092 bytes in 0.400 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [12:54:24] PROBLEM - PHP opcache health on mw2310 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:54:36] PROBLEM - PHP opcache health on mw2314 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:56:02] 10Operations, 10netops, 10serviceops: Investigate D1 appservers<->memcache TKOs - https://phabricator.wikimedia.org/T251601 (10ayounsi) [12:56:12] RECOVERY - PHP opcache health on mw2310 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:57:50] PROBLEM - PHP opcache health on mw2376 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:57:50] PROBLEM - PHP opcache health on mw2315 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:58:16] RECOVERY - PHP opcache health on mw2314 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:59:14] PROBLEM - PHP opcache health on mw2301 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:59:40] PROBLEM - PHP opcache health on mw2307 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [12:59:40] RECOVERY - PHP opcache health on mw2315 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:00:37] the codfw ones are not new but due to restarts earlier and just went from WARN to CRIT about now and i am slowly restarting them [13:01:12] !log rolling restart of ats-tls in text@esams - T249335 [13:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:15] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 [13:01:42] PROBLEM - PHP opcache health on mw2300 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:02:02] PROBLEM - PHP opcache health on mw2306 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:03:26] PROBLEM - PHP opcache health on mw2304 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:03:46] PROBLEM - PHP opcache health on mw2303 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:05:04] PROBLEM - PHP opcache health on mw2308 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:05:14] RECOVERY - PHP opcache health on mw2307 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health 
[13:05:26] RECOVERY - PHP opcache health on mw2300 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:05:40] RECOVERY - PHP opcache health on mw2303 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:06:18] 10Operations, 10LDAP-Access-Requests: Add Eamedina to `wmf` LDAF group - https://phabricator.wikimedia.org/T251358 (10Aklapper) @eamedina: Hi, please read and follow https://phabricator.wikimedia.org/project/profile/1564/ and provide all information required - thanks! [13:06:27] !log holger@mwmaint1002 Starting renameInvalidUsernames.php as part of T219279 [13:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:30] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [13:06:54] RECOVERY - PHP opcache health on mw2308 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:07:36] RECOVERY - PHP opcache health on mw2306 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:08:28] RECOVERY - PHP opcache health on mw2301 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:09:00] RECOVERY - PHP opcache health on mw2304 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:11:30] PROBLEM - PHP opcache health on mw2200 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:12:36] RECOVERY - PHP opcache health on mw2376 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:12:54] PROBLEM - PHP 
opcache health on mw2204 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:13:16] PROBLEM - PHP opcache health on mw2201 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:13:16] PROBLEM - PHP opcache health on mw2207 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:14:40] PROBLEM - PHP opcache health on mw2206 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:14:42] PROBLEM - PHP opcache health on mw2208 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:14:48] PROBLEM - PHP opcache health on mw2202 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:15:06] PROBLEM - PHP opcache health on mw2205 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:15:06] PROBLEM - PHP opcache health on mw2203 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:15:12] RECOVERY - PHP opcache health on mw2200 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:15:38] PROBLEM - PHP opcache health on mw2209 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:16:32] RECOVERY - PHP opcache health on mw2206 is OK: OK: 
opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:16:34] RECOVERY - PHP opcache health on mw2208 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:16:58] RECOVERY - PHP opcache health on mw2205 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:17:32] RECOVERY - PHP opcache health on mw2209 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:18:27] RECOVERY - PHP opcache health on mw2204 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:18:48] RECOVERY - PHP opcache health on mw2201 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:18:48] RECOVERY - PHP opcache health on mw2207 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:18:48] RECOVERY - PHP opcache health on mw2203 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:20:12] PROBLEM - PHP opcache health on mw2310 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:20:18] RECOVERY - PHP opcache health on mw2202 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:20:44] PROBLEM - PHP opcache health on mw2313 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:21:50] PROBLEM - PHP opcache health on mw2315 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:22:04] RECOVERY - PHP opcache health on mw2310 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:22:34] RECOVERY - PHP opcache health on mw2313 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:22:56] PROBLEM - PHP opcache health on mw2312 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:23:10] PROBLEM - PHP opcache health on mw2325 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:23:44] RECOVERY - PHP opcache health on mw2315 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:23:56] PROBLEM - PHP opcache health on mw2320 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:24:26] PROBLEM - PHP opcache health on mw2327 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:25:16] PROBLEM - PHP opcache health on mw2322 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:25:20] PROBLEM - PHP opcache health on mw2328 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:26:20] PROBLEM - PHP opcache health on mw2321 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:26:40] RECOVERY 
- PHP opcache health on mw2312 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:26:52] RECOVERY - PHP opcache health on mw2325 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:26:52] annoying but all recover after restarts and codfw-only [13:27:10] RECOVERY - PHP opcache health on mw2328 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:27:36] RECOVERY - PHP opcache health on mw2320 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:28:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly warm up db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json [13:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:07] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 [13:28:34] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [13:28:56] RECOVERY - PHP opcache health on mw2322 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:30:00] RECOVERY - PHP opcache health on mw2327 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:30:04] RECOVERY - PHP opcache health on mw2321 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:30:18] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81082 bytes in 0.754 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [13:33:36] PROBLEM - PHP opcache health on mw2318 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:36:52] PROBLEM - PHP opcache health on mw2319 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:36:56] PROBLEM - PHP opcache health on mw2317 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:40:00] PROBLEM - PHP opcache health on mw2136 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:40:34] PROBLEM - PHP opcache health on mw2300 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:42:24] PROBLEM - PHP opcache health on mw2302 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:42:54] PROBLEM - PHP opcache health on mw2305 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:43:12] (03CR) 10RhinosF1: [C: 04-1] "It looks to me as if the last few commits to IS.php aren’t included. If I’m reading correctly, they were all done last night. 
Could you ha" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [13:43:44] PROBLEM - PHP opcache health on mw2375 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:43:50] PROBLEM - PHP opcache health on mw2308 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:44:10] RECOVERY - PHP opcache health on mw2319 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:44:14] RECOVERY - PHP opcache health on mw2300 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:44:32] PROBLEM - PHP opcache health on mw2306 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:45:20] (03PS1) 10Ottomata: refinery::job::test::camus - set check -> $monitoring_enabled [puppet] - 10https://gerrit.wikimedia.org/r/593744 [13:45:36] PROBLEM - PHP opcache health on mw2371 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:45:56] PROBLEM - PHP opcache health on mw2304 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:46:10] RECOVERY - PHP opcache health on mw2317 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:46:26] RECOVERY - PHP opcache health on mw2306 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:47:07] !log marostegui@cumin1001 dbctl commit 
(dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json [13:47:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:10] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 [13:47:44] PROBLEM - PHP opcache health on mw2376 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:48:06] PROBLEM - PHP opcache health on mw2311 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:48:10] PROBLEM - PHP opcache health on mw2314 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:49:20] RECOVERY - PHP opcache health on mw2371 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:49:34] RECOVERY - PHP opcache health on mw2376 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:49:34] PROBLEM - PHP opcache health on mw2315 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:49:56] RECOVERY - PHP opcache health on mw2311 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:50:00] RECOVERY - PHP opcache health on mw2314 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:50:18] RECOVERY - PHP opcache health on mw2305 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:50:28] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (10JHedden) The RAID card took drive 9 offline again during the virtual disk rebuild. We cannot update the SATA drive firmware until all the devices a... [13:51:10] RECOVERY - PHP opcache health on mw2375 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:51:40] RECOVERY - PHP opcache health on mw2302 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:52:02] RECOVERY - PHP opcache health on mw2318 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:52:14] (03CR) 10Zoranzoki21: "> Patch Set 1: Code-Review-1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [13:53:17] (03CR) 10RhinosF1: [C: 04-1] "> > Patch Set 1: Code-Review-1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [13:53:32] PROBLEM - PHP opcache health on mw2206 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:53:39] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, two comments inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) (owner: 10Cwhite) [13:55:00] RECOVERY - PHP opcache health on mw2308 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:55:10] RECOVERY - PHP opcache health on mw2315 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:55:14] RECOVERY - PHP opcache health on mw2304 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:55:28] PROBLEM - PHP opcache health on mw2204 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:55:50] PROBLEM - PHP opcache health on mw2201 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:56:40] RECOVERY - PHP opcache health on mw2136 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:57:14] RECOVERY - PHP opcache health on mw2206 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:57:18] RECOVERY - PHP opcache health on mw2204 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:59:27] (03CR) 10Ottomata: [C: 03+2] refinery::job::test::camus - set check -> $monitoring_enabled [puppet] - 10https://gerrit.wikimedia.org/r/593744 (owner: 10Ottomata) [13:59:34] RECOVERY - PHP opcache health on mw2201 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:59:49] 10Operations, 10DBA: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Marostegui) Added the slot on the deployment's page [13:59:53] 10Operations, 10DBA: Upgrade and restart s5 and s6 primary DB master: Tue 5th May - https://phabricator.wikimedia.org/T251154 (10Marostegui) Added the slot on the deployment's page [14:00:04] (03CR) 10Ottomata: "@elukey I didn't rerun the checker to make sure the _IMPORTED flags were created on the test 
cluster. Let me know if I should." [puppet] - 10https://gerrit.wikimedia.org/r/593744 (owner: 10Ottomata) [14:01:06] PROBLEM - PHP opcache health on mw2202 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:02:32] PROBLEM - PHP opcache health on mw2177 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:02:56] RECOVERY - PHP opcache health on mw2202 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:04:24] RECOVERY - PHP opcache health on mw2177 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:04:38] (03CR) 10Jhedden: [C: 03+1] keystone: make the max_active_keys a bit smarter [puppet] - 10https://gerrit.wikimedia.org/r/593626 (owner: 10Andrew Bogott) [14:04:40] PROBLEM - PHP opcache health on mw2320 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:04:44] PROBLEM - PHP opcache health on mw2326 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:05:04] PROBLEM - PHP opcache health on mw2323 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:05:10] PROBLEM - PHP opcache health on mw2327 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:05:58] PROBLEM - PHP opcache health on mw2322 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health 
[14:06:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json [14:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:14] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 [14:06:30] RECOVERY - PHP opcache health on mw2320 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:06:32] PROBLEM - PHP opcache health on mw2324 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:06:34] RECOVERY - PHP opcache health on mw2326 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:06:56] RECOVERY - PHP opcache health on mw2323 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:07:02] RECOVERY - PHP opcache health on mw2327 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:07:04] PROBLEM - PHP opcache health on mw2321 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:07:43] (03CR) 10Andrew Bogott: [C: 03+2] keystone: make the max_active_keys a bit smarter [puppet] - 10https://gerrit.wikimedia.org/r/593626 (owner: 10Andrew Bogott) [14:08:24] RECOVERY - PHP opcache health on mw2324 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:08:50] PROBLEM - PHP opcache health on mw2140 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:08:56] RECOVERY - PHP opcache health on mw2321 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:09:04] PROBLEM - PHP opcache health on mw2141 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:09:42] RECOVERY - PHP opcache health on mw2322 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:10:18] PROBLEM - PHP opcache health on mw2146 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:10:40] RECOVERY - PHP opcache health on mw2140 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:10:50] PROBLEM - PHP opcache health on mw2145 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:12:10] RECOVERY - PHP opcache health on mw2146 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:12:40] RECOVERY - PHP opcache health on mw2145 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:12:44] RECOVERY - PHP opcache health on mw2141 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:13:02] PROBLEM - PHP opcache health on mw2138 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:14:08] PROBLEM - PHP opcache health on mw2314 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:14:54] RECOVERY - PHP opcache health on mw2138 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:15:50] PROBLEM - PHP opcache health on mw2210 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:16:24] PROBLEM - PHP opcache health on mw2211 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:16:26] PROBLEM - PHP opcache health on mw2215 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:16:32] PROBLEM - PHP opcache health on mw2219 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:17:10] PROBLEM - PHP opcache health on mw2216 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:17:40] PROBLEM - PHP opcache health on mw2218 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:18:13] !log holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of T219279 [14:18:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:16] T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 [14:19:26] PROBLEM - PHP opcache health on mw2300 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:19:30] RECOVERY 
- PHP opcache health on mw2218 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:19:30] RECOVERY - PHP opcache health on mw2210 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:19:38] RECOVERY - PHP opcache health on mw2314 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:19:46] PROBLEM - PHP opcache health on mw2352 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:20:04] Operations, DBA: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (Marostegui) Day before: - Install the 10.1.43-2 package on both masters (db1123 and db1086) Maintenance day: - Silence all hosts in s3 and s7 - Set read only on s3 and s7: ` dbctl -... 
[14:20:32] PROBLEM - PHP opcache health on mw2334 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:20:32] PROBLEM - PHP opcache health on mw2331 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:20:42] PROBLEM - PHP opcache health on mw2369 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:21:04] PROBLEM - PHP opcache health on mw2370 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:21:52] PROBLEM - PHP opcache health on mw2358 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:21:58] RECOVERY - PHP opcache health on mw2211 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:22:00] RECOVERY - PHP opcache health on mw2215 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:22:06] RECOVERY - PHP opcache health on mw2219 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:22:14] 10Operations, 10Analytics: systemd::syslog conf should use :programname equals instead of startswith - https://phabricator.wikimedia.org/T251606 (10Ottomata) [14:22:36] PROBLEM - PHP opcache health on mw2356 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:22:42] PROBLEM - PHP opcache health on mw2371 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:22:44] RECOVERY - PHP opcache health on mw2216 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:23:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json [14:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:58] T232446: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 [14:24:22] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 900 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [14:24:38] PROBLEM - PHP opcache health on mw2330 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:26:36] (03CR) 10Zoranzoki21: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [14:27:24] PROBLEM - PHP opcache health on mw2372 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:27:32] PROBLEM - PHP opcache health on mw2374 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:27:40] PROBLEM - PHP opcache health on mw2221 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:28:06] RECOVERY - PHP opcache health on mw2356 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:28:22] PROBLEM - PHP opcache health on mw2176 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:28:24] PROBLEM - PHP opcache health on mw2376 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:28:44] PROBLEM - PHP opcache health on mw2223 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:28:50] (03CR) 10Zoranzoki21: "git log shows me this: https://pastebin.com/rq4L691c" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [14:29:12] PROBLEM - PHP opcache health on mw2305 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:29:14] PROBLEM - PHP opcache health on mw2222 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:30:22] PROBLEM - PHP opcache health on mw2304 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:31:02] PROBLEM - PHP opcache health on mw2200 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:31:56] PROBLEM - PHP opcache health on mw2308 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:33:34] RECOVERY - PHP opcache health on mw2369 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:34:12] PROBLEM - PHP opcache health on mw2206 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:34:36] PROBLEM - PHP opcache health on mw2207 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:36:26] PROBLEM - PHP opcache health on mw2201 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:36:26] PROBLEM - PHP opcache health on mw2203 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:36:48] PROBLEM - PHP opcache health on mw2171 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:37:20] RECOVERY - PHP opcache health on mw2371 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:37:34] PROBLEM - PHP opcache health on mw2285 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:37:54] PROBLEM - PHP opcache health on mw2283 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:37:56] PROBLEM - PHP opcache health on mw2196 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:38:20] PROBLEM - PHP opcache health on mw2286 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:38:22] RECOVERY - PHP opcache health on mw2222 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:38:42] PROBLEM - PHP opcache health on mw2292 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:38:48] PROBLEM - PHP opcache health on mw2209 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:39:44] RECOVERY - PHP opcache health on mw2223 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:39:56] PROBLEM - PHP opcache health on mw2296 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:40:44] PROBLEM - PHP opcache health on mw2325 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:40:44] RECOVERY - PHP opcache health on mw2334 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:40:54] PROBLEM - PHP opcache health on mw2297 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:06] PROBLEM - PHP opcache health on mw2298 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:14] RECOVERY - PHP opcache health on mw2370 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:14] PROBLEM - PHP opcache health on mw2299 
is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:30] PROBLEM - PHP opcache health on mw2320 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:38] RECOVERY - PHP opcache health on mw2196 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:41:52] PROBLEM - PHP opcache health on mw2295 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:42:02] RECOVERY - PHP opcache health on mw2358 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:42:52] PROBLEM - PHP opcache health on mw2328 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:42:56] RECOVERY - PHP opcache health on mw2308 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:43:24] PROBLEM - PHP opcache health on mw2326 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:43:44] PROBLEM - PHP opcache health on mw2323 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:43:54] PROBLEM - PHP opcache health on mw2294 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:44:02] RECOVERY - PHP opcache health on mw2374 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:44:06] RECOVERY - PHP opcache health on mw2221 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:44:10] RECOVERY - PHP opcache health on mw2171 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:44:36] PROBLEM - PHP opcache health on mw2322 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:44:41] i can see the actual opcache_hit_rate on a couple of these. it is just slightly under the threshold and then goes back up [14:44:50] RECOVERY - PHP opcache health on mw2330 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:45:12] PROBLEM - PHP opcache health on mw2324 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:45:34] RECOVERY - PHP opcache health on mw2295 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:45:42] RECOVERY - PHP opcache health on mw2286 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:45:44] RECOVERY - PHP opcache health on mw2372 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:45:44] PROBLEM - PHP opcache health on mw2321 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:46:26] PROBLEM - PHP opcache health on mw2144 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:47:20] RECOVERY - PHP opcache health on mw2352 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:47:30] PROBLEM - PHP opcache health on mw2140 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:47:30] RECOVERY - PHP opcache health on mw2203 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:48:07] (03PS2) 10DannyS712: [WIP] Initial config for awawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [14:48:17] RECOVERY - PHP opcache health on mw2144 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:48:34] PROBLEM - PHP opcache health on mw2143 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:49:20] (03CR) 10DannyS712: "> It looks to me as if the last few commits to IS.php aren’t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [14:49:30] PROBLEM - PHP opcache health on mw2145 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:49:32] PROBLEM - PHP opcache health on mw2141 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:49:35] (03CR) 10DannyS712: "> > It looks to me as if the last few commits to IS.php aren’t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) 
(owner: 10Zoranzoki21) [14:50:08] PROBLEM - PHP opcache health on mw2137 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:50:24] PROBLEM - PHP opcache health on mw2244 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:50:34] RECOVERY - PHP opcache health on mw2304 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:51:12] PROBLEM - PHP opcache health on mw2142 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:51:16] (03CR) 10RhinosF1: [C: 03+1] "Thanks Danny" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593743 (https://phabricator.wikimedia.org/T251371) (owner: 10Zoranzoki21) [14:51:16] RECOVERY - PHP opcache health on mw2200 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:51:38] PROBLEM - PHP opcache health on mw2138 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:51:48] RECOVERY - PHP opcache health on mw2331 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:52:22] PROBLEM - PHP opcache health on mw2139 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:52:36] PROBLEM - PHP opcache health on mw2146 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:53:50] RECOVERY - PHP opcache health on mw2322 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:54:10] RECOVERY - PHP opcache health on mw2285 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:54:27] RECOVERY - PHP opcache health on mw2206 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:54:54] RECOVERY - PHP opcache health on mw2207 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:00] RECOVERY - PHP opcache health on mw2376 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:57:06] RECOVERY - PHP opcache health on mw2292 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:03] 10Operations, 10ops-eqiad, 10cloud-services-team (Kanban): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T251579 (10Zoranzoki21) [14:58:11] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (10Zoranzoki21) [14:58:14] RECOVERY - PHP opcache health on mw2283 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:14] PROBLEM - PHP opcache health on mw2214 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:26] RECOVERY - PHP opcache health on mw2296 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:26] PROBLEM - PHP opcache health on mw2212 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health 
[14:58:32] RECOVERY - PHP opcache health on mw2323 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:36] RECOVERY - PHP opcache health on mw2201 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:44] RECOVERY - PHP opcache health on mw2145 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:46] PROBLEM - PHP opcache health on mw2215 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:46] PROBLEM - PHP opcache health on mw2211 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:46] RECOVERY - PHP opcache health on mw2141 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:54] PROBLEM - PHP opcache health on mw2219 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:59:04] PROBLEM - PHP opcache health on mw2217 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:59:06] 10Operations, 10Analytics, 10LDAP-Access-Requests: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10Nuria) I think Antonio is a full time employee. 
@DZierten to confirm [14:59:10] RECOVERY - PHP opcache health on mw2325 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:59:22] RECOVERY - PHP opcache health on mw2297 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:59:28] RECOVERY - PHP opcache health on mw2328 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:59:30] PROBLEM - PHP opcache health on mw2216 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:00:30] RECOVERY - PHP opcache health on mw2294 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:00:54] RECOVERY - PHP opcache health on mw2209 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:01:12] RECOVERY - PHP opcache health on mw2137 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:01:20] RECOVERY - PHP opcache health on mw2216 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:01:27] (03PS4) 10DannyS712: Remove "Create a book" link on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/561403 (https://phabricator.wikimedia.org/T241683) [15:01:46] RECOVERY - PHP opcache health on mw2320 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:01:50] RECOVERY - PHP opcache health on mw2326 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:03:14] RECOVERY - PHP opcache health on mw2298 is OK: OK: opcache is healthy 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:03:20] RECOVERY - PHP opcache health on mw2299 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:03:26] RECOVERY - PHP opcache health on mw2139 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:04:06] RECOVERY - PHP opcache health on mw2142 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:04:06] RECOVERY - PHP opcache health on mw2140 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:04:10] RECOVERY - PHP opcache health on mw2305 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:04:12] RECOVERY - PHP opcache health on mw2321 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:05:08] RECOVERY - PHP opcache health on mw2176 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:08:16] RECOVERY - PHP opcache health on mw2138 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:09:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:10:40] RECOVERY - PHP opcache health on mw2244 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:11:06] (03PS4) 10Reedy: Replace stringified class names with ::class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593654 [15:12:54] RECOVERY - PHP opcache health on mw2146 is OK: 
OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:13:32] RECOVERY - PHP opcache health on mw2215 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:13:52] RECOVERY - PHP opcache health on mw2217 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:15:06] RECOVERY - PHP opcache health on mw2212 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:16:36] RECOVERY - PHP opcache health on mw2300 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:18:10] RECOVERY - PHP opcache health on mw2143 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:19:00] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22709 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:21:31] (03PS1) 10QEDK: toolforge: Increase HSTS max-age directive to one month [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) [15:26:23] (03PS6) 10Dzahn: httpbb: add tests for miscweb sites [puppet] - 10https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) [15:27:58] (03CR) 10Dzahn: httpbb: add tests for miscweb sites (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [15:30:27] (03PS1) 10Vgutierrez: Release 8.0.7-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/593749 [15:31:01] 10Operations, 10Toolforge, 10Traffic, 10HTTPS, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367 
(10QEDK) >>! In T102367#6064559, @bd808 wrote: > We have left the POST loophole open for more than a year. Now that we have introduced [[https://wi... [15:31:14] RECOVERY - PHP opcache health on mw2324 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:32:06] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:33:19] (03PS2) 10Vgutierrez: Release 8.0.7-1wm2 [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/593749 [15:33:46] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22723 bytes in 0.270 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:35:04] (03PS2) 10QEDK: domainproxy: Increase HSTS max-age directive to one month [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) [15:37:34] RECOVERY - PHP opcache health on mw2219 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:39:01] (03Abandoned) 10Thcipriani: Gerrit: apache proxy not pooled [puppet] - 10https://gerrit.wikimedia.org/r/579601 (https://phabricator.wikimedia.org/T246763) (owner: 10Thcipriani) [15:40:14] (03PS3) 10QEDK: toolforge: Increase HSTS max-age directive to one month [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) [15:44:15] 10Operations, 10Analytics, 10LDAP-Access-Requests: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10DZierten) He is a full time employee. Thanks for checking. Do you know where I can change his status? 
Thanks [15:44:42] RECOVERY - PHP opcache health on mw2211 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:50:12] (03PS1) 10Cwhite: toil: mitigate monthly acct cronspam [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) [15:51:40] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [15:52:04] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [15:53:37] (03CR) 10jerkins-bot: [V: 04-1] toil: mitigate monthly acct cronspam [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) (owner: 10Cwhite) [15:58:15] (03PS6) 10Cwhite: mtail: add flag to install mtail from apt component [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) [15:58:54] (03CR) 10Cwhite: mtail: add flag to install mtail from apt component (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) (owner: 10Cwhite) [15:59:34] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [16:02:54] (03PS2) 10Cwhite: toil: mitigate monthly acct cronspam [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) 
[16:03:04] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.990 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [16:06:07] (03CR) 10jerkins-bot: [V: 04-1] toil: mitigate monthly acct cronspam [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) (owner: 10Cwhite) [16:06:14] RECOVERY - PHP opcache health on mw2214 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [16:12:56] 10Operations, 10ops-eqiad, 10DC-Ops: () rack/setup/install - https://phabricator.wikimedia.org/T251614 (10RobH) [16:13:22] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10RobH) [16:13:34] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10RobH) [16:14:31] (03PS3) 10Cwhite: toil: mitigate monthly acct cronspam [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) [16:26:54] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:27:23] 10Operations, 10ops-eqiad, 10DC-Ops: (Due Date: ASAP) rack/setup/install replacement msw-c6-eqiad - https://phabricator.wikimedia.org/T251616 (10RobH) [16:27:34] 10Operations, 10ops-eqiad, 10DC-Ops: (Due Date: ASAP) rack/setup/install replacement msw-c6-eqiad - https://phabricator.wikimedia.org/T251616 (10RobH) [16:31:01] 10Operations, 10ops-eqiad, 10DC-Ops: (Due Date: TBD) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (10RobH) [16:31:26] 10Operations, 10SRE-Access-Requests: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10leila) @Dzahn thanks a lot! 
@Aklapper thanks for flagging those. I'll need roughly a week to address those tasks. @RhinosF1 thanks for flagging. I'll follow up with OIT. @elukey I'm follo... [16:33:00] 10Operations, 10SRE-Access-Requests: Revoke production access for jmorgan - https://phabricator.wikimedia.org/T251560 (10leila) >>! In T251560#6099639, @RhinosF1 wrote: > Sad to see this ticket but can someone also ask OIT to lock https://meta.wikimedia.org/wiki/Special:CentralAuth?target=Jmorgan%20(WMF) > >... [16:35:24] PROBLEM - PHP7 rendering on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [16:35:51] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (10RobH) [16:35:53] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (10RobH) [16:37:14] RECOVERY - PHP7 rendering on mw1361 is OK: HTTP OK: HTTP/1.1 200 OK - 81271 bytes in 9.736 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [16:37:42] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22717 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:38:18] 10Operations, 10ops-eqiad, 10DC-Ops: (NEED BY: TBD) rack/setup/install thanos-fe100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T251620 (10RobH) [16:39:18] (03CR) 10BryanDavis: [C: 04-1] toolforge: Increase HSTS max-age directive to one month (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) (owner: 10QEDK) [16:41:57] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 
(10RobH) [16:42:06] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 (10RobH) [16:43:56] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus200[34] - https://phabricator.wikimedia.org/T251622 (10RobH) [16:44:04] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus200[34] - https://phabricator.wikimedia.org/T251622 (10RobH) [16:49:01] 10Operations, 10Toolforge, 10Traffic, 10HTTPS, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367 (10bd808) >>! In T102367#6100153, @QEDK wrote: > Also, is there any way for the nginx proxy to hide the header if a tool chooses to send their own?... [16:49:52] (03PS7) 10Elukey: Refactor the exporter to support metrics specs via config file [software/druid_exporter] - 10https://gerrit.wikimedia.org/r/592261 [16:52:23] (03CR) 10Elukey: "Very interesting: the exporter seems to return weird results (like histograms without labels etc..) with the prometheus-client version in " [software/druid_exporter] - 10https://gerrit.wikimedia.org/r/592261 (owner: 10Elukey) [17:00:00] 10Operations, 10Toolforge, 10Traffic, 10HTTPS, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367 (10QEDK) >>! In T102367#6100515, @bd808 wrote: >>>! In T102367#6100153, @QEDK wrote: >> Also, is there any way for the nginx proxy to hide the head... 
[17:00:45] (03CR) 10RLazarus: [C: 03+1] httpbb: add tests for miscweb sites [puppet] - 10https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [17:03:14] * Krinkle would like it if we had Supercritical as an Icinga status [17:03:24] (03PS1) 10Andrew Bogott: apt: install gnupg1 before trying to set up keys for a repo [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) [17:05:50] (03PS4) 10QEDK: toolforge: Increase HSTS max-age directive to one month [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) [17:07:41] Krinkle: that sounds crazy [17:08:04] Supercritical would be worrying when it went off [17:11:12] (03CR) 10QEDK: "> Patch Set 3: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) (owner: 10QEDK) [17:12:04] RhinosF1: https://en.wikipedia.org/wiki/Supercritical_fluid ; https://www.youtube.com/watch?v=JslxPjrMzqY&t=8m40s [17:13:52] (03PS2) 10Krinkle: Increase wmgMemoryLimit from 660MB to 666MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592761 (owner: 10Ladsgroup) [17:14:14] Krinkle: ah [17:14:31] Sounds dangerous that [17:17:20] RhinosF1: In a nutshell, it's any gas-liquid hybrid. 
Created by any liquid (like water) warmed up so it becomes a gas, except it's under pressure so it remains solid/liquid, kind of like rain that goes up instead of down :D [17:17:50] Oh god, interesting though [17:18:44] 10Operations, 10ops-eqiad, 10DC-Ops: (NEED BY: TBD) rack/setup/install thanos-fe100[123].eqiad.wmnet - https://phabricator.wikimedia.org/T251620 (10RobH) [17:21:57] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (10RobH) [17:21:59] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be100[123] - https://phabricator.wikimedia.org/T251618 (10RobH) [17:23:06] (03PS2) 10Andrew Bogott: apt: install gnupg1 before trying to set up keys for a repo [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) [17:30:49] (03CR) 10Muehlenhoff: "One more thing inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) (owner: 10Cwhite) [17:31:34] PROBLEM - PHP7 rendering on mw1357 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:32:19] (03CR) 10Krinkle: [C: 04-1] noc.wikimedia.org: highlight.php should not append .txt to dblist URLs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591459 (https://phabricator.wikimedia.org/T250852) (owner: 10Urbanecm) [17:33:14] RECOVERY - PHP7 rendering on mw1357 is OK: HTTP OK: HTTP/1.1 200 OK - 81235 bytes in 0.421 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:35:42] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install rdb200[78] - https://phabricator.wikimedia.org/T251626 (10RobH) [17:35:55] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install rdb200[78] - https://phabricator.wikimedia.org/T251626 (10RobH) [17:38:04] 
10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) rack/setup/install cloudvirt103[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (10RobH) [17:38:08] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:38:14] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) rack/setup/install cloudvirt103[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T251627 (10RobH) [17:39:20] (03PS1) 10RLazarus: maintenance: Migrate initsitestats to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/593772 (https://phabricator.wikimedia.org/T211250) [17:40:32] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5806 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:43:30] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81270 bytes in 1.257 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:43:50] (03PS1) 10RLazarus: maintenance: Migrate startupregistrystats to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/593774 (https://phabricator.wikimedia.org/T211250) [17:46:00] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6291 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:49:52] PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:51:32] RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.604 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [17:51:54] (03CR) 
10BryanDavis: [C: 03+1] toolforge: Increase HSTS max-age directive to one month [puppet] - 10https://gerrit.wikimedia.org/r/593747 (https://phabricator.wikimedia.org/T102367) (owner: 10QEDK) [17:55:04] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5317 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:59:06] 10Operations, 10Toolforge, 10Traffic, 10HTTPS, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367 (10bd808) >>! In T102367#6100529, @QEDK wrote: > > I don't disagree that we should manage it at the proxy layer, I meant to ask that if a tool was... [18:10:44] (03PS3) 10Andrew Bogott: apt: install gnupg before trying to set up keys for a repo [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) [18:21:31] (03CR) 10CRusnov: "LGTM, did you test with PCC?" [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [18:28:00] 10Operations, 10ops-eqiad, 10DC-Ops, 10netops: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (10RobH) [18:28:15] 10Operations, 10ops-eqiad, 10DC-Ops, 10netops: (Need By: TBD) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (10RobH) [18:30:59] 10Operations, 10ops-eqiad, 10serviceops: mw1280 correctable memory errors logged in getsel - https://phabricator.wikimedia.org/T251077 (10Cmjohnson) @wiki_willy This server is out of warranty. If the DIMM has already been replaced on this server then most likely the system board or CPU is failing. My recom... 
[18:32:19] 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: scb1001: Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T250482 (10Cmjohnson) 05Open→03Stalled I am stalling this task until it's ready for decom [18:36:28] 10Operations, 10ops-eqiad, 10serviceops: mw1280 correctable memory errors logged in getsel - https://phabricator.wikimedia.org/T251077 (10wiki_willy) @elukey - looks like we have another year left before the end of the 5yr life cycle mark. Let us know if have enough in production to able to decom this host,... [18:39:40] 10Operations, 10ops-eqiad, 10serviceops: mw1280 correctable memory errors logged in getsel - https://phabricator.wikimedia.org/T251077 (10Dzahn) This server has already had CPU and RAM replaced and crashed multiple times. Also see: T195734, T240187, T218006 [18:43:57] (03PS1) 10RLazarus: maintenance: Migrate updatequerypages to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/593797 (https://phabricator.wikimedia.org/T211250) [18:45:15] (03CR) 10CRusnov: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [18:47:01] (03CR) 10jerkins-bot: [V: 04-1] maintenance: Migrate updatequerypages to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/593797 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [18:49:34] (03PS2) 10RLazarus: maintenance: Migrate updatequerypages to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/593797 (https://phabricator.wikimedia.org/T211250) [18:51:54] PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [18:54:26] ^ looking [18:57:04] !log restart blazegraph on wdqs1006 - T242453 [18:57:04] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:07] T242453: Deadlock in blazegraph blocking all queries and updates - https://phabricator.wikimedia.org/T242453 [18:57:22] RECOVERY - Query Service HTTP Port on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [18:57:23] (03CR) 10RLazarus: "PCC for this one, because it's messy: https://puppet-compiler.wmflabs.org/compiler1001/22259/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593797 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [18:58:47] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22733 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:59:56] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[123] - https://phabricator.wikimedia.org/T251634 (10RobH) [19:00:14] PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:00:23] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[123] - https://phabricator.wikimedia.org/T251634 (10RobH) [19:01:08] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:02:18] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install thanos-be200[1-4] - https://phabricator.wikimedia.org/T251634 (10RobH) [19:02:52] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81237 bytes in 2.605 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering 
[19:03:50] RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 200 OK - 81236 bytes in 0.152 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:03:56] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install thanos-fe200[123] - https://phabricator.wikimedia.org/T251635 (10RobH) [19:04:06] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install thanos-fe200[123] - https://phabricator.wikimedia.org/T251635 (10RobH) [19:07:36] (03CR) 10Andrew Bogott: "This is a bit scary since it touches every host in the fleet. pcc shows a diff on every host since just defining the resource in puppet s" [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:12:32] PROBLEM - PHP7 rendering on mw1356 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:14:12] RECOVERY - PHP7 rendering on mw1356 is OK: HTTP OK: HTTP/1.1 200 OK - 81234 bytes in 0.374 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:15:54] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5328 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:17:52] PROBLEM - PHP7 rendering on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:19:32] RECOVERY - PHP7 rendering on mw1361 is OK: HTTP OK: HTTP/1.1 200 OK - 81234 bytes in 0.830 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:22:48] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install db213[6-9] - https://phabricator.wikimedia.org/T251639 (10RobH) [19:22:56] 
10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install db213[6-9] - https://phabricator.wikimedia.org/T251639 (10RobH) [19:23:36] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:23:38] 10Operations, 10Security-Team, 10Patch-For-Review, 10User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (10chasemp) @faidon it doesn't seem like that case was surfaced actually. @Jbond would know for sure as he handling the tr... [19:25:18] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81236 bytes in 2.568 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:30:32] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6037 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:31:12] (03CR) 10CRusnov: [C: 03+1] "after some irc discussion it feels like this is not terribly dangerous" [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:33:52] PROBLEM - PHP7 rendering on mw1356 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:35:34] RECOVERY - PHP7 rendering on mw1356 is OK: HTTP OK: HTTP/1.1 200 OK - 81268 bytes in 0.614 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:37:58] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 5045 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:38:52] 
(03CR) 10CDanis: [C: 03+1] apt: install gnupg before trying to set up keys for a repo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:40:34] PROBLEM - PHP7 rendering on mw1362 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:42:16] RECOVERY - PHP7 rendering on mw1362 is OK: HTTP OK: HTTP/1.1 200 OK - 81270 bytes in 1.366 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:42:30] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:44:06] PROBLEM - PHP7 rendering on mw1360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:44:14] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22716 bytes in 0.276 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:44:34] (03CR) 10Jforrester: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/593797 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [19:45:30] PROBLEM - PHP7 rendering on mw1357 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:47:10] RECOVERY - PHP7 rendering on mw1357 is OK: HTTP OK: HTTP/1.1 200 OK - 81268 bytes in 0.716 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:47:47] RECOVERY - PHP7 rendering on mw1360 is OK: HTTP OK: HTTP/1.1 200 OK - 81270 bytes in 8.656 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:48:50] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:50:34] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81270 bytes in 3.657 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:55:02] PROBLEM - PHP7 rendering on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:56:46] RECOVERY - PHP7 rendering on mw1361 is OK: HTTP OK: HTTP/1.1 200 OK - 81268 bytes in 0.769 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [19:56:57] !log rzl@cumin1001 conftool action : set/pooled=no; selector: name=mw13(5[6-9]|6[0-2]).eqiad.wmnet [19:56:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:06] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 122 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [20:03:35] (03PS1) 10Jforrester: contint: On stretch, use the docker we have [puppet] - 10https://gerrit.wikimedia.org/r/593806 (https://phabricator.wikimedia.org/T236675) [20:04:52] (03CR) 10Andrew Bogott: [C: 03+2] apt: install gnupg before trying to set up keys for a repo [puppet] - 10https://gerrit.wikimedia.org/r/593769 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [20:14:50] 10Operations, 10MediaWiki-extensions-CodeReview, 10Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (10Jdforrester-WMF) >>! In T243056#6095651, @Krinkle wrote: >>>! 
In T243056#6095566, @Jdforrester-WMF wrote: >> Why no... [20:15:59] (03PS1) 10Cwhite: profile: temporarily disable alerts on cloud_dev_pdns* for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/593809 (https://phabricator.wikimedia.org/T251294) [20:18:56] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: 0.01152 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [20:20:25] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install db213[6-9] - https://phabricator.wikimedia.org/T251639 (10RobH) [20:20:40] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install db213[6-9] - https://phabricator.wikimedia.org/T251639 (10Papaul) [20:21:28] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install db213[6-9] - https://phabricator.wikimedia.org/T251639 (10RobH) a:05Papaul→03jcrespo @jcrespo or @Marostegui: The racking details from the ordering task only list 4 hosts, but we ended up ordering 5. So the racking info lists 1 per... [20:23:02] (03PS7) 10Cwhite: mtail: add flag to install mtail from apt component [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) [20:24:09] (03CR) 10Cwhite: mtail: add flag to install mtail from apt component (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) (owner: 10Cwhite) [20:27:59] jouncebot: refresh [20:28:00] I refreshed my knowledge about deployments. [20:28:03] jouncebot: now [20:28:03] For the next 10 hour(s) and 31 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200501T0700) [20:28:09] jouncebot: next [20:28:09] In 10 hour(s) and 31 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200502T0700) [20:28:13] Perfect. [20:28:30] (03PS1) 10Andrew Bogott: apt::repository: remove redundant package require [puppet] - 10https://gerrit.wikimedia.org/r/593812 [20:28:45] (03CR) 10Herron: [C: 03+1] profile: temporarily disable alerts on cloud_dev_pdns* for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/593809 (https://phabricator.wikimedia.org/T251294) (owner: 10Cwhite) [20:29:13] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install cloudceph200[123]-dev - https://phabricator.wikimedia.org/T250846 (10Papaul) The task says "The 2 OS disks should be a software RAID 1," Can anyone please provide which software RAID1? Thanks [20:31:13] (03CR) 10Cwhite: [C: 03+2] profile: temporarily disable alerts on cloud_dev_pdns* for maintenance [puppet] - 10https://gerrit.wikimedia.org/r/593809 (https://phabricator.wikimedia.org/T251294) (owner: 10Cwhite) [20:31:56] (03CR) 10Andrew Bogott: [C: 03+2] apt::repository: remove redundant package require [puppet] - 10https://gerrit.wikimedia.org/r/593812 (owner: 10Andrew Bogott) [20:32:14] (03PS1) 10Cwhite: Revert "profile: temporarily disable alerts on cloud_dev_pdns* for maintenance" [puppet] - 10https://gerrit.wikimedia.org/r/593815 [20:39:05] 10Operations, 10observability: Icinga refresh hardware selection (2020) - https://phabricator.wikimedia.org/T251644 (10herron) p:05Triage→03Medium [20:39:23] James_F: if you're about, can you tell me more about that "check experimental" on https://gerrit.wikimedia.org/r/593797 ? it looks like it failed, do I need to do anything about that? 
[20:40:29] (won't be merging that until next week anyway, so not urgent) [20:41:21] (03PS1) 10Cwhite: profile: escape exclaimation point [puppet] - 10https://gerrit.wikimedia.org/r/593817 (https://phabricator.wikimedia.org/T251294) [20:41:56] 10Operations, 10observability: Icinga refresh hardware selection (2020) - https://phabricator.wikimedia.org/T251644 (10herron) icinga1001 current CPU config is a pair of `Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz` which have a max turbo freq of `3GHz` [20:45:15] (03CR) 10Cwhite: [C: 03+2] profile: escape exclaimation point [puppet] - 10https://gerrit.wikimedia.org/r/593817 (https://phabricator.wikimedia.org/T251294) (owner: 10Cwhite) [20:46:12] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01088 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [20:52:15] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:54:49] (03PS1) 10Andrew Bogott: Openstack Apt repos: allow gnupg to be installed before custom repos are set up [puppet] - 10https://gerrit.wikimedia.org/r/593821 (https://phabricator.wikimedia.org/T251294) [20:55:38] (03CR) 10Andrew Bogott: [C: 03+2] Openstack Apt repos: allow gnupg to be installed before custom repos are set up [puppet] - 10https://gerrit.wikimedia.org/r/593821 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [20:59:53] 10Operations, 10Security-Team, 10Patch-For-Review, 10User-jbond: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP - https://phabricator.wikimedia.org/T244792 (10HMarcus) Thank you @faidon and @chasemp . No, this was not surfaced, and it's helpful to understand this dependancy. As... 
[21:06:07] (03PS1) 10Andrew Bogott: Openstack Apt repos: allow gnupg to be installed before custom repos are set up [puppet] - 10https://gerrit.wikimedia.org/r/593824 [21:06:20] rzl: Hey, the check experimental was for me to test a CI infrastructure change (moving the puppet CI jobs over from a jessie docker runner to a stretch one). [21:07:09] (03CR) 10Andrew Bogott: [C: 03+2] Openstack Apt repos: allow gnupg to be installed before custom repos are set up [puppet] - 10https://gerrit.wikimedia.org/r/593824 (owner: 10Andrew Bogott) [21:07:29] rzl: It failed because the puppet manifest for CI specifies a version of docker that we don't actually provide in apt.wikimedia.org any more, I believe. I have a patch to move to the version upstream actually gives us, on the basis that it's probably fine: https://gerrit.wikimedia.org/r/c/593806 [21:07:53] ahh got it, thanks for the explanation [21:08:42] The puppet-specific CI agent is the penultimate jessie instance in CI's WMCS project, so… oops. :-) [21:10:11] 10Operations, 10observability: Icinga refresh hardware selection (2020) - https://phabricator.wikimedia.org/T251644 (10herron) Looking at CPUs, `Intel(r) Xeon(R) E-2288G CPU @3.70 GHz, Max Turbo Frequency of 5.00 GHz, 16M Cache` looks like a good option, but from what I can tell means an `R340` chassis. I can... [21:11:08] well, we're down to 68 prod jessie hosts, so... 😬 [21:16:17] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [21:16:41] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO, 10Release-Engineering-Team (CI & Testing services): Assess whether we should still disable seccomp in Docker for CI - https://phabricator.wikimedia.org/T249729 (10hashar) >>! In T249729#6092568, @MoritzMuehlenhoff wrote... 
[21:23:07] PROBLEM - BGP status on cr3-knams is CRITICAL: BGP CRITICAL - AS13030/IPv6: Idle - Init7, AS13030/IPv4: Idle - Init7 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [21:24:57] RECOVERY - BGP status on cr3-knams is OK: BGP OK - up: 13, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [21:26:34] (03PS2) 10Cwhite: Revert "profile: temporarily disable alerts on cloud_dev_pdns* for maintenance" [puppet] - 10https://gerrit.wikimedia.org/r/593815 (https://phabricator.wikimedia.org/T251294) [21:30:39] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={2,3} site=eqiad topic=udp_localhost-info https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logg [21:30:39] ic=All&var-consumer_group=All [21:31:33] RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.005762 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [21:40:30] (03PS1) 10Mstyles: increment extra plugin to 6.5.4-wmf-9 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 [21:50:12] (03PS1) 10Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/593836 [21:51:41] (03CR) 10jerkins-bot: [V: 04-1] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/593836 (owner: 10Paladox) [21:52:49] (03CR) 10CRusnov: [C: 03+2] reports cables: Add extra regexp to support more active interfaces [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/589750 (owner: 10CRusnov) [21:53:57] (03PS2) 10CRusnov: netbox-script-proxy: 
Fix uwsgi configuration [puppet] - 10https://gerrit.wikimedia.org/r/593031 [21:54:43] (03CR) 10CRusnov: "> Patch Set 1: Code-Review+1" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/593031 (owner: 10CRusnov) [21:54:54] (03PS3) 10CRusnov: netbox-script-proxy: Fix uwsgi configuration [puppet] - 10https://gerrit.wikimedia.org/r/593031 [22:02:26] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10herron) [22:06:58] (03PS1) 10Paladox: Remove 4 plugins [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/593852 [22:19:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:21:46] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22714 bytes in 0.288 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:25:24] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:34:16] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:35:36] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 81251 bytes in 0.644 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:37:36] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22720 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [22:39:48] 
10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10RKemper) My Phabricator user (`Rkemper`) is up. I held off on enabling 2FA per the documentation. Still need to... [22:44:44] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:48:12] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 81251 bytes in 0.371 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:53:21] 10Operations, 10ops-codfw, 10procurement: codfw: Next Gen test rack - https://phabricator.wikimedia.org/T251570 (10Peachey88) [22:54:54] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:56:36] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 81251 bytes in 0.584 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [22:59:19] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10RKemper) Pre-emptively added user-committed-identity-hash here: https://meta.wikimedia.org/wiki/User:RKemper_(WMF... 
[23:00:24] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:04:04] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 81253 bytes in 8.658 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:06:54] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [23:07:26] PROBLEM - PHP7 rendering on mw1359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:08:48] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:09:10] RECOVERY - PHP7 rendering on mw1359 is OK: HTTP OK: HTTP/1.1 200 OK - 81251 bytes in 0.427 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:10:32] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 81251 bytes in 0.593 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:17:12] PROBLEM - PHP7 rendering on mw1352 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:18:10] PROBLEM - PHP7 rendering on mw1355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:18:52] RECOVERY - PHP7 rendering on mw1352 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.634 
second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:19:00] PROBLEM - PHP7 rendering on mw1357 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:19:50] RECOVERY - PHP7 rendering on mw1355 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.796 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:20:44] RECOVERY - PHP7 rendering on mw1357 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.840 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:10] PROBLEM - PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:52] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.387 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:48] PROBLEM - PHP7 rendering on mw1354 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:25:22] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001:9501 job=burrow partition={4,5} site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometh [23:25:22] er=logging-eqiad&var-topic=All&var-consumer_group=All [23:26:38] RECOVERY - PHP7 rendering on mw1354 is OK: HTTP OK: HTTP/1.1 200 OK - 81271 bytes in 7.433 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:32:26] PROBLEM - 
PHP7 rendering on mw1349 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:34:04] RECOVERY - PHP7 rendering on mw1349 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.363 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:34:22] PROBLEM - PHP7 rendering on mw1361 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:36:04] RECOVERY - PHP7 rendering on mw1361 is OK: HTTP OK: HTTP/1.1 200 OK - 81269 bytes in 0.376 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:55:47] Hello, can you check IP 94.145.77.241 if there are accounts [23:55:51] It is George Reevs [23:56:16] He will come in Serbia and (censored) me if I touch "changes" made by him [23:57:04] @stewards [23:57:07] @steward [23:57:11] Oops [23:57:14] Wrong channel [23:57:56] Sorry.. :)