[00:59:32] PROBLEM - PHP opcache health on scandium is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[01:01:24] RECOVERY - PHP opcache health on scandium is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[02:30:16] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[03:35:54] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 112.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[04:00:06] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 131.2 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[04:46:38] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[04:59:40] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 50.85 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[06:40:48] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[07:26:50] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[08:13:44] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 114.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[08:24:24] these alarms are not incredibly problematic, but they indicate that a couple of nodes are spending a ton of time in old GC runs --^
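
The "Rate of JVM GC Old generation-s runs" checks above fire when old-generation collections on a node happen too often (critical above 100, warning above 80, per the recovery messages). For a quick manual spot check on an affected host, something like the sketch below could be used; it assumes the JDK's jstat tool is available on the host, treats the FGC column as a rough proxy for old-generation collections, and makes no claim about the exact window behind the alert threshold, which is not shown in the log.

```python
#!/usr/bin/env python3
"""Rough manual spot check of old-generation GC activity on a JVM.

Sketch only: the production alert reads a Prometheus metric; this just samples
`jstat -gcutil` twice and extrapolates a collections-per-hour figure.
"""
import subprocess
import sys
import time


def full_gc_count(pid: int) -> int:
    """Return the cumulative full-GC count (FGC column) for the given JVM pid."""
    out = subprocess.check_output(["jstat", "-gcutil", str(pid)], text=True)
    header, values = out.strip().splitlines()[:2]
    cols = dict(zip(header.split(), values.split()))
    return int(float(cols["FGC"]))


def old_gc_rate_per_hour(pid: int, window_s: int = 300) -> float:
    """Sample FGC twice, window_s seconds apart, and extrapolate to runs/hour."""
    start = full_gc_count(pid)
    time.sleep(window_s)
    end = full_gc_count(pid)
    return (end - start) * 3600.0 / window_s


if __name__ == "__main__":
    pid = int(sys.argv[1])  # pid of the Elasticsearch JVM on the affected host
    rate = old_gc_rate_per_hour(pid)
    # The alert thresholds above are 100 (critical) / 80 (warning); the exact
    # time window they apply to is not shown in the log.
    print(f"old-generation GC runs/hour (extrapolated): {rate:.1f}")
```
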
[09:05:32] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[09:09:36] PROBLEM - snapshot of s3 in codfw on db1115 is CRITICAL: snapshot for s3 at codfw taken more than 3 days ago: Most recent backup 2020-04-02 08:53:40 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[09:44:10] PROBLEM - Check systemd state on cp3050 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:53:28] RECOVERY - Check systemd state on cp3050 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:55:32] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[11:06:28] (Abandoned) MarcoAurelio: offboard-user: Include new security subprojects [puppet] - https://gerrit.wikimedia.org/r/576440 (owner: MarcoAurelio)
[11:18:24] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 131.2 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[11:30:40] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 69.15 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[11:59:08] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[12:15:48] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[12:37:52] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:01:38] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22410 bytes in 0.258 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
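
The "Ensure traffic_exporter binds on port 9322 and responds to HTTP requests" check above is essentially an HTTP probe with a 10-second timeout. A minimal sketch of an equivalent manual probe follows; the exact URL path the Icinga check fetches is not shown in the log, so fetching the root path here is an assumption.

```python
#!/usr/bin/env python3
"""Minimal manual version of the "traffic_exporter binds on port 9322" probe.

Sketch only: the real check's path and plugin options are not shown in the log.
"""
import sys
import urllib.error
import urllib.request


def probe(host: str, port: int = 9322, timeout: float = 10.0) -> int:
    """Return 0 (OK) if the exporter answers over HTTP, 2 (CRITICAL) otherwise."""
    url = f"http://{host}:{port}/"  # root path is an assumption
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            print(f"OK: HTTP {resp.getcode()} - {len(body)} bytes")
            return 0
    except (urllib.error.URLError, OSError) as exc:
        print(f"CRITICAL: {exc}")
        return 2


if __name__ == "__main__":
    # usage: probe_traffic_exporter.py <host fqdn>
    sys.exit(probe(sys.argv[1]))
```
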
[13:56:28] Operations, Mail: Wiki email not delievered to GMail - https://phabricator.wikimedia.org/T243937 (Reedy)
[13:56:48] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:06:47] Operations, Mail: Wiki email not delievered to GMail - https://phabricator.wikimedia.org/T243937 (Aklapper) @Huji: As there is [sometimes a backlog](https://grafana.wikimedia.org/d/nULM0E1Wk/mailman?orgId=1), did that message ever arrive later? Also, I guess you did check spam folders?
[14:07:48] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22407 bytes in 0.275 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:13:32] Operations, Mail: Wiki email not delievered to GMail - https://phabricator.wikimedia.org/T243937 (Platonides) I don't think that graph is the right one, André. It may provide approximate data (both are emails), but I think list email is even sent from a completely different relay. Also, @Huji the issue...
[14:13:53] /26/101
[14:13:56] Operations, Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (Platonides)
[14:16:37] Operations, Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (Reedy) FWIW, @Matanya is reporting that another onwiki user isn't getting password reset emails to their gmail account (no bounce records)
[14:16:59] Operations, Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (Aklapper) Thanks. If Yahoo is involved, then quite often Yahoo is the problem (there are quite some tasks about Yahoo Mail here).
[14:29:48] Operations, MediaWiki-Cache, Page Content Service, Product-Infrastructure-Team-Backlog, and 3 others: esams cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (CDanis)
[15:12:41] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/584913 (owner: 4nn1l2)
[15:34:58] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:36:58] (PS1) Andrew Bogott: Keystone: Drop in a backported fix to token_formatters.py [puppet] - https://gerrit.wikimedia.org/r/586105 (https://phabricator.wikimedia.org/T248635)
[15:44:29] (PS2) Andrew Bogott: Keystone: Drop in a backported fix to token_formatters.py [puppet] - https://gerrit.wikimedia.org/r/586105 (https://phabricator.wikimedia.org/T248635)
[15:46:20] (CR) Andrew Bogott: [C: +2] Keystone: Drop in a backported fix to token_formatters.py [puppet] - https://gerrit.wikimedia.org/r/586105 (https://phabricator.wikimedia.org/T248635) (owner: Andrew Bogott)
[16:02:40] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22420 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[16:14:57] (PS1) CDanis: update vhtcpd exporter to match reality [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346)
[16:19:42] (PS2) CDanis: update vhtcpd exporter to match reality [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346)
[16:20:07] (PS3) CDanis: update vhtcpd exporter to match reality [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346)
[16:25:36] Operations, Traffic, observability, Patch-For-Review: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date with reality - https://phabricator.wikimedia.org/T249346 (CDanis) Updated version parsing your new output above: {P10893}
[16:26:48] Operations, Traffic, observability, Patch-For-Review: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date with reality - https://phabricator.wikimedia.org/T249346 (CDanis) And parsing current output from `cp3052`: {P10895}
[16:29:00] (CR) CDanis: [C: +2] "I'm self-+2'ing because I want to start tracking the esams backlog ASAP." [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346) (owner: CDanis)
[16:29:18] (CR) CDanis: [C: +2] "> Patch Set 3: Code-Review+2" [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346) (owner: CDanis)
[16:36:33] Operations, Domains, Traffic, WMF-Legal: wikipedia.lol - https://phabricator.wikimedia.org/T88861 (Dzahn) Ok, fine with me. Thanks!
[16:44:10] PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
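
The Debian mirror check above simply reports how stale /srv/mirrors/debian has become. A rough sketch of a freshness check of that shape follows; how the production check actually derives the age (for example from a Debian trace file) is not shown in the log, so file modification time is used here as an assumption, and the 14-hour threshold only mirrors the failing message above.

```python
#!/usr/bin/env python3
"""Sketch of a mirror-freshness check like the sodium alert above.

Assumption: freshness is judged from the modification time of a path inside
the mirror; the real check's data source and threshold are not shown in the log.
"""
import os
import sys
import time


def age_hours(path: str) -> float:
    """Hours since `path` was last modified."""
    return (time.time() - os.path.getmtime(path)) / 3600.0


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/srv/mirrors/debian"
    hours = age_hours(path)
    status = "CRITICAL" if hours > 14 else "OK"
    print(f"{status}: {path} is over {int(hours)} hours old.")
```
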
[16:45:46] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[16:46:58] Operations, Traffic, observability, Patch-For-Review: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date with reality - https://phabricator.wikimedia.org/T249346 (CDanis) Open→Resolved a:CDanis https://grafana.wikimedia.org/d/wBCQKHjWz/vhtcpd?orgId=1&var-datasour...
[16:52:32] Operations, MediaWiki-Cache, Page Content Service, Product-Infrastructure-Team-Backlog, and 4 others: esams cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (CDanis) It isn't just esams that often has a backlog: looking at the past 10-20 min...
[17:17:44] Operations, netops: review fastnetmon thresholds after sensible flow table sizes rollout - https://phabricator.wikimedia.org/T249454 (CDanis)
[17:24:48] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 65.08 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[17:55:11] (PS1) Jhedden: openstack: update nova-placement healthcheck in codfwdev1 [puppet] - https://gerrit.wikimedia.org/r/586118 (https://phabricator.wikimedia.org/T249453)
[17:58:21] (CR) jerkins-bot: [V: -1] openstack: update nova-placement healthcheck in codfwdev1 [puppet] - https://gerrit.wikimedia.org/r/586118 (https://phabricator.wikimedia.org/T249453) (owner: Jhedden)
[18:03:05] (PS2) Jhedden: openstack: update nova-placement healthcheck in codfwdev1 [puppet] - https://gerrit.wikimedia.org/r/586118 (https://phabricator.wikimedia.org/T249453)
[18:04:07] (PS3) Jhedden: openstack: update nova-placement healthcheck in codfwdev1 [puppet] - https://gerrit.wikimedia.org/r/586118 (https://phabricator.wikimedia.org/T249453)
[18:34:51] Operations, MediaWiki-Cache, Page Content Service, Product-Infrastructure-Team-Backlog, and 3 others: esams cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (bearND) Yes, I think it's more than just esams since the merged in task I created (...
[18:37:06] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[18:49:08] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[18:49:58] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22409 bytes in 0.262 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:14:38] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:16:28] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[19:19:38] Operations, serviceops, Continuous-Integration-Config, Regression, Wikimedia-Incident: operations-apache-config-lint replacement doesn't check syntax - https://phabricator.wikimedia.org/T114801 (Krinkle)
[19:26:18] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[19:30:49] (PS1) Andrew Bogott: codf1dev db server: increase max connections by a lot [puppet] - https://gerrit.wikimedia.org/r/586135 (https://phabricator.wikimedia.org/T249453)
[19:34:22] (CR) Andrew Bogott: [C: +2] codf1dev db server: increase max connections by a lot [puppet] - https://gerrit.wikimedia.org/r/586135 (https://phabricator.wikimedia.org/T249453) (owner: Andrew Bogott)
[19:36:26] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 140.3 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[19:44:14] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=
[19:47:52] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:02:38] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:04:30] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:12:08] PROBLEM - recommendation_api endpoints health on scb2002 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:13:52] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:13:58] RECOVERY - recommendation_api endpoints health on scb2002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:19:26] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:25:00] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:28:04] PROBLEM - recommendation_api endpoints health on scb1004 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:28:10] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[20:28:14] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 48.81 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[20:28:40] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:29:54] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/description/translation/from/{source}/to/{target} (Description translation suggestions) timed out before a response was received: /{domain}/v1/description/addition/{target} (Description addition suggestions) timed out before a response was received: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:31:42] RECOVERY - recommendation_api endpoints health on scb1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:34:14] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 54 probes of 547 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:35:26] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api
[20:40:12] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 38 probes of 547 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:41:36] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[20:43:26] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:51:21] Someone’s just reported that there’s a significant delay in watchlist notification emails (they’re outlook/hotmail) - could it be linked to T243937
[20:51:21] T243937: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937
[20:54:30] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is OK: HTTP OK: HTTP/1.0 200 OK - 22337 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:58:39] RF1: we could send some test emails
[20:58:40] Operations, Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (Krenair) I just tried to reset password and the emails appeared in gmail immediately.
[20:59:02] Krenair is already testing, it seems :P
[20:59:16] I just did a one-off password reset attempt
[20:59:17] * Krenair shrugs
[20:59:20] it looked fine to me
[21:00:13] try Special:UserMail me
[21:00:16] It’s the first report, left them a link to that task.
[21:00:40] weird that they claim outlook is slow as well
[21:01:02] it could be something on wmf side...
[21:01:13] or simply people that is unable to look at the spam folfer
[21:01:15] *folder
[21:01:28] Could be
[21:01:43] Emails from the mailing lists seem fine
[21:02:00] I think those use a different relay
[21:02:21] But I only get daily emails for notifications as I’d get one every few seconds otherwise
[21:02:55] I'm not even sure how I have notifications set up
[21:05:47] Platonides, did you get an email?
[21:05:59] I got a sender's copy
[21:06:11] nope
[21:06:18] * Platonides digs deeper
[21:06:38] it's there now
[21:06:46] 21:05 (0 minutes ago)
[21:06:54] fast enough for email
[21:07:03] LGTM
[21:07:56] 0 seconds of delay according to Received: headers
[21:08:54] so it's pretty good :P
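
The "0 seconds of delay according to Received: headers" check above can be reproduced by comparing the timestamps that each relay stamps into the Received: headers of the raw message. A small sketch follows, assuming the message has been saved to a local .eml file and that every hop includes a timezone offset in its date.

```python
#!/usr/bin/env python3
"""Estimate per-hop mail delay from Received: headers, as discussed above.

Sketch only: reads a raw message from a .eml file and prints the time spent
between consecutive relay hops (Received: headers are newest-first).
"""
import sys
from email import message_from_binary_file
from email.utils import parsedate_to_datetime


def hop_delays(path: str):
    with open(path, "rb") as fh:
        msg = message_from_binary_file(fh)
    # The date is the part after the last ';' in each Received: header.
    # Assumes every hop's date carries a timezone offset.
    stamps = [parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
              for h in msg.get_all("Received", []) if ";" in h]
    stamps.reverse()  # oldest hop first
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for i, secs in enumerate(hop_delays(sys.argv[1]), start=1):
        print(f"hop {i}: {secs:.0f}s")
```
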
[21:25:24] PROBLEM - Host mw2323 is DOWN: PING CRITICAL - Packet loss = 100%
[21:28:00] RECOVERY - Host mw2323 is UP: PING OK - Packet loss = 0%, RTA = 36.11 ms
[21:53:59] (PS1) Riley: Update ti.wiki logo from English version to version located at [[File:Wikipedia-logo-v2-ti.svg]] [mediawiki-config] - https://gerrit.wikimedia.org/r/586149
[21:54:01] (PS1) Legoktm: planet: Add new techblog.wikimedia.org feed [puppet] - https://gerrit.wikimedia.org/r/586147
[21:54:03] (PS1) Legoktm: planet: Remove dead blog.wikimedia.org feeds [puppet] - https://gerrit.wikimedia.org/r/586148
[21:54:05] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [mediawiki-config] - https://gerrit.wikimedia.org/r/586149 (owner: Riley)
[21:54:19] thats excessive
[21:54:44] we can take back the welcome if you'd like ;)
[21:55:11] (CR) jerkins-bot: [V: -1] planet: Remove dead blog.wikimedia.org feeds [puppet] - https://gerrit.wikimedia.org/r/586148 (owner: Legoktm)
[21:55:50] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[21:56:19] (PS2) Legoktm: planet: Remove dead blog.wikimedia.org feeds [puppet] - https://gerrit.wikimedia.org/r/586148
[21:56:44] riley ohh, you want to upload an image?
[21:57:17] paladox: Yes, for two wikis
[21:57:19] legoktm: LOL
[21:57:47] Ok, you'll need to use git on the commandline (file upload support will be coming in gerrit 3.2).
[21:57:52] riley do you have git installed?
[21:58:24] (Abandoned) Riley: Update ti.wiki logo from English version to version located at [[File:Wikipedia-logo-v2-ti.svg]] [mediawiki-config] - https://gerrit.wikimedia.org/r/586149 (owner: Riley)
[21:58:38] I do, just need to update SSH
[21:59:26] You add your ssh key to https://gerrit.wikimedia.org/r/#/settings/ssh-keys
[21:59:49] riley have you created ~/.gitconfig too?
[22:03:22] Yup
[22:03:25] And now done for ssh key
[22:37:34] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 109.5 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[22:46:00] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 16.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[22:46:14] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/suggest/source/{title}/{to} (Suggest a source title to use for translation) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[22:47:58] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[22:56:48] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[22:58:34] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[23:03:54] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[23:06:02] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[23:11:18] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[23:11:34] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[23:14:44] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[23:20:48] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:22:38] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:25:42] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is OK: HTTP OK: HTTP/1.0 200 OK - 22336 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[23:57:32] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37