[00:00:01] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
[00:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:07:12] !log ryankemper@cumin2001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
[00:07:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:07:33] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:08:09] !log T267927 Reload of `wdqs2003` complete
[00:08:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:59] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, relocating_shards: 0, status: green, number_of_in_flight_fetch: 0, timed_out: False, delayed_unassigned_shards: 0, active_shards_percent_as_number: 100.0, number_of_data_nodes: 6, active_primary_shards: 895, number_of_nodes: 6, task_max_waiting_in_queue_millis:
[00:08:59] ards: 0, active_shards: 1793, initializing_shards: 0, number_of_pending_tasks: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:09:10] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
[00:09:11] T267927: Reload wikidata journal from fresh dumps - https://phabricator.wikimedia.org/T267927
[00:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:11:15] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
[00:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:34] PROBLEM - puppet last run on wdqs2008 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:14:30] PROBLEM - puppet last run on wdqs2003 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:14:54] !log T267927 `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
[00:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:02] T267927: Reload wikidata journal from fresh dumps - https://phabricator.wikimedia.org/T267927
[00:16:30] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1403.eqiad.wmnet'] ` and were **ALL** s...
[00:16:59] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1402.eqiad.wmnet'] ` and were **ALL** s...
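The recurring "ElasticSearch health check for shards on 9200" flaps above come from a probe that fetches http://localhost:9200/_cluster/health and alerts when the request does not answer within its 4-second read timeout. A minimal manual equivalent, run on the affected cloudelastic host, is sketched below; it assumes only that curl is available there, everything else (endpoint, timeout, expected fields) is taken from the log lines themselves.

  # query the same cluster-health endpoint the check polls, with the same 4s limit
  curl --silent --max-time 4 'http://localhost:9200/_cluster/health?pretty'
  # a healthy cluster reports status: green and active_shards_percent_as_number: 100.0,
  # matching the RECOVERY messages logged once the flap clears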
[00:17:48] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 108 probes of 622 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:18:47] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
[00:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:56] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
[00:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:56] RECOVERY - puppet last run on wdqs2003 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:20:31] (PS2) Ryan Kemper: wdqs: int can't take in float as string [cookbooks] - https://gerrit.wikimedia.org/r/680095 (https://phabricator.wikimedia.org/T280108)
[00:22:51] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
[00:22:53] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 51 probes of 622 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:23:38] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
[00:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:11] RECOVERY - puppet last run on wdqs2008 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:32:15] (CR) Ryan Kemper: wdqs: int can't take in float as string (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/680095 (https://phabricator.wikimedia.org/T280108) (owner: Ryan Kemper)
[00:48:48] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
[00:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:48] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Deployment services), and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1307.eqiad.wmnet'] ` and were **ALL** s...
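The `conftool action : set/pooled=no|yes` SAL entries above record mw1402/mw1403 (and later mw1307) being depooled for their reimages and repooled afterwards. A sketch of the corresponding confctl invocation as typically issued from the cumin host follows; the selector and value come straight from the log, while the sudo wrapper and exact quoting are assumptions.

  sudo confctl select 'name=mw1403.eqiad.wmnet' set/pooled=no    # depool before the reimage
  sudo confctl select 'name=mw1403.eqiad.wmnet' set/pooled=yes   # repool once the host is healthy again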
[00:53:20] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
[00:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:15:02] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:15:44] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[03:17:12] PROBLEM - WDQS SPARQL on wdqs1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:44:00] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[03:50:22] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_netflow_failure_flags.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:53:26] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[04:02:56] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[04:08:40] PROBLEM - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:10:06] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[04:10:52] RECOVERY - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:28:56] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[04:31:10] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: active_shards: 1793, initializing_shards: 0, delayed_unassigned_shards: 0, unassigned_shards: 0, relocating_shards: 0, active_primary_shards: 895, task_max_waiting_in_queue_millis: 0, cluster_name: cloudelastic-chi-eqiad, status: green, number_of_data_nodes: 6, number_of_pending_tasks: 0, number_of
[04:31:10] 0, number_of_nodes: 6, active_shards_percent_as_number: 100.0, timed_out: False https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:11:58] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:14:14] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: task_max_waiting_in_queue_millis: 0, active_primary_shards: 895, active_shards: 1793, number_of_in_flight_fetch: 0, number_of_pending_tasks: 0, active_shards_percent_as_number: 100.0, unassigned_shards: 0, delayed_unassigned_shards: 0, number_of_nodes: 6, number_of_data_nodes: 6, initializing_shard
[05:14:14] e: cloudelastic-chi-eqiad, relocating_shards: 0, status: green, timed_out: False https://wikitech.wikimedia.org/wiki/Search%23Administration
[05:27:18] SRE, Wikimedia-SVG-rendering: Adding new font for CJK media display - https://phabricator.wikimedia.org/T280432 (NFSL2001)
[05:27:58] SRE, Wikimedia-SVG-rendering: Adding new font for CJK media display - https://phabricator.wikimedia.org/T280432 (NFSL2001)
[05:40:04] SRE, Wikimedia-Mailing-lists: Create a mailing list for ptwikinews - https://phabricator.wikimedia.org/T280408 (Ladsgroup) Hi, if you can wait for two to three weeks, we will get mailman3 up and running soon and you can enjoy a much more modern system (see https://lists-next.wikimedia.org). This is by no...
[06:38:38] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[06:40:54] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: number_of_data_nodes: 6, number_of_pending_tasks: 0, status: green, unassigned_shards: 0, cluster_name: cloudelastic-chi-eqiad, active_shards_percent_as_number: 100.0, number_of_nodes: 6, task_max_waiting_in_queue_millis: 0, active_primary_shards: 895, number_of_in_flight_fetch: 0, relocating_shard
[06:40:54] ssigned_shards: 0, active_shards: 1793, initializing_shards: 0, timed_out: False https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:19:02] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:19:40] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:24:26] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: active_primary_shards: 895, relocating_shards: 0, cluster_name: cloudelastic-chi-eqiad, delayed_unassigned_shards: 0, number_of_pending_tasks: 0, timed_out: False, number_of_in_flight_fetch: 0, number_of_nodes: 6, active_shards_percent_as_number: 100.0, initializing_shards: 0, unassigned_shards: 0,
[07:24:26] ctive_shards: 1793, number_of_data_nodes: 6, task_max_waiting_in_queue_millis: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[07:33:00] PROBLEM - Stale file for node-exporter textfile in eqiad on alert1001 is CRITICAL: cluster=misc file=puppet_agent.prom instance=otrs1001 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Stale_file_for_node-exporter_textfile https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[08:05:34] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:07:50] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: delayed_unassigned_shards: 0, active_shards: 1793, task_max_waiting_in_queue_millis: 0, cluster_name: cloudelastic-chi-eqiad, number_of_nodes: 6, timed_out: False, relocating_shards: 0, initializing_shards: 0, number_of_pending_tasks: 0, unassigned_shards: 0, active_shards_percent_as_number: 100.0,
[08:07:50] umber_of_data_nodes: 6, number_of_in_flight_fetch: 0, active_primary_shards: 895 https://wikitech.wikimedia.org/wiki/Search%23Administration
[08:09:20] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[08:14:08] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 104.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[09:13:23] SRE, Wikimedia-Mailing-lists, Patch-For-Review: Install mailman3 and mailman2 at the same time on the cloud - https://phabricator.wikimedia.org/T278612 (Ladsgroup) So after several changes in puppetmaster of mailman in the cloud, it works now: https://polymorphic.lists.wmcloud.org/hyperkitty/hyperkit...
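The "Check systemd state" alerts on an-launcher1002 in this stretch flip between "running: The system is fully operational" and "degraded: The following units failed: monitor_refine_netflow_failure_flags.service", i.e. they track overall systemd health on the host. A short sketch of inspecting that state by hand with stock systemd commands, nothing Wikimedia-specific assumed:

  systemctl is-system-running          # prints "degraded" while any unit is in the failed state
  systemctl list-units --state=failed  # lists the failed unit(s), e.g. monitor_refine_netflow_failure_flags.service
  sudo systemctl reset-failed          # clears the failed markers once the underlying job has been fixed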
[09:39:16] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[09:41:34] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:07:46] RECOVERY - WDQS SPARQL on wdqs1012 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.058 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[10:24:57] SRE, Wikimedia-Incident: Uncached wiki requests partially unavailable due to excessive request rates from a bot - https://phabricator.wikimedia.org/T280232 (Schlurcher) >> From my talk page a suggestion was to check maxlag. I'm checking maxlag, and at the time I ultimately shut down the bot, maxlag event...
[10:27:29] SRE, MediaWiki-General, Browser-Support-Apple-Safari: File:Chessboard480.svg not visible on safari when size is fixed at 208px - https://phabricator.wikimedia.org/T280439 (Daimona)
[10:37:04] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:41:52] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: status: green, number_of_pending_tasks: 0, task_max_waiting_in_queue_millis: 0, initializing_shards: 0, active_shards: 1793, number_of_in_flight_fetch: 0, number_of_data_nodes: 6, unassigned_shards: 0, delayed_unassigned_shards: 0, cluster_name: cloudelastic-chi-eqiad, active_primary_shards: 895, r
[10:41:52] 0, timed_out: False, number_of_nodes: 6, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:03:28] SRE, Wikimedia-Mailing-lists: Create a mailing list for ptwikinews - https://phabricator.wikimedia.org/T280408 (Edu) @Ladsgroup I'll wait. Thanks.
[12:10:12] PROBLEM - Host elastic2043 is DOWN: PING CRITICAL - Packet loss = 100%
[12:12:10] RECOVERY - Host elastic2043 is UP: PING WARNING - Packet loss = 66%, RTA = 31.89 ms
[12:18:16] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:20:34] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: number_of_pending_tasks: 0, timed_out: False, status: green, task_max_waiting_in_queue_millis: 0, initializing_shards: 0, active_shards: 1793, active_shards_percent_as_number: 100.0, number_of_nodes: 6, number_of_data_nodes: 6, relocating_shards: 0, unassigned_shards: 0, cluster_name: cloudelastic-
[12:20:34] _primary_shards: 895, number_of_in_flight_fetch: 0, delayed_unassigned_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:10:59] (CR) Volans: "I might be missing context here but it seems to me that it would be much simpler to just write a cookbook (or modify the existing downtime" [puppet] - https://gerrit.wikimedia.org/r/680376 (owner: David Caro)
[14:36:58] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:39:16] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: timed_out: False, unassigned_shards: 0, active_shards: 1793, relocating_shards: 0, active_primary_shards: 895, active_shards_percent_as_number: 100.0, cluster_name: cloudelastic-chi-eqiad, delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, status: green, number_
[14:39:16] ializing_shards: 0, number_of_data_nodes: 6, task_max_waiting_in_queue_millis: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[15:06:34] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[15:08:48] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[15:20:16] PROBLEM - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is CRITICAL: 0.7937 gt 0.3 https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[15:21:44] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[15:22:26] PROBLEM - Mediawiki CirrusSearch pool counter rejections rate on alert1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Pool_Counter_rejections_%28search_is_currently_too_busy%29 https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1
[15:27:12] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1060-production-search-eqiad on elastic1060 is CRITICAL: 288.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-eqiad&var-instance=elastic1060&panelId=37
[15:32:22] RECOVERY - Some MediaWiki servers are running out of idle PHP-FPM workers in api_appserver at eqiad on alert1001 is OK: All metrics within thresholds. https://bit.ly/wmf-fpmsat https://grafana.wikimedia.org/d/fRn9VEPMz/application-servers-use-dashboard-wip?orgId=1
[15:36:07] (PS1) Daimona Eaytoy: Stop setting $wgAbuseFilterParserClass [mediawiki-config] - https://gerrit.wikimedia.org/r/680753 (https://phabricator.wikimedia.org/T239990)
[15:38:40] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[15:41:46] RECOVERY - Mediawiki CirrusSearch pool counter rejections rate on alert1001 is OK: OK: Less than 1.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Pool_Counter_rejections_%28search_is_currently_too_busy%29 https://grafana.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=4&orgId=1
[16:03:50] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1005 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[16:06:08] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1005 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: timed_out: False, number_of_pending_tasks: 0, cluster_name: cloudelastic-chi-eqiad, relocating_shards: 0, active_shards_percent_as_number: 100.0, number_of_data_nodes: 6, number_of_in_flight_fetch: 0, initializing_shards: 0, number_of_nodes: 6, status: green, task_max_waiting_in_queue_millis: 0, ac
[16:06:08] , delayed_unassigned_shards: 0, unassigned_shards: 0, active_primary_shards: 895 https://wikitech.wikimedia.org/wiki/Search%23Administration
[16:06:44] Hi, I want to add my tool to CI on Gerrit. I know I have to add config in zuul/layout.yaml. In the past, I only added tox-docker for my Python app. This time I am using React, which needs `yarn test` or `npm run test`. So what template name should I use?
[16:08:28] RECOVERY - Memory correctable errors -EDAC- on thumbor2001 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor2001&var-datasource=codfw+prometheus/ops
[16:16:22] !log cleaning SuccuBot's watchlist in wikidatawiki
[16:16:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:16] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1060-production-search-eqiad on elastic1060 is OK: (C)100 gt (W)80 gt 61.02 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-eqiad&var-instance=elastic1060&panelId=37
[16:42:20] SRE, Wikimedia-Mailing-lists: Create a mailing list for ptwikinews - https://phabricator.wikimedia.org/T280408 (Ladsgroup) Thanks.
[16:49:32] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 69 probes of 623 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:55:46] PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Search%23Administration
[16:55:58] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 51 probes of 623 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:58:00] RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 673 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:03:46] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:06:04] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: status: green, active_shards_percent_as_number: 100.0, number_of_data_nodes: 6, active_shards: 1793, active_primary_shards: 895, initializing_shards: 0, unassigned_shards: 0, delayed_unassigned_shards: 0, timed_out: False, number_of_nodes: 6, number_of_pending_tasks: 0, relocating_shards: 0, number
[18:06:04] ch: 0, cluster_name: cloudelastic-chi-eqiad, task_max_waiting_in_queue_millis: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[18:49:10] (CR) Tacsipacsi: "Shouldn’t this patchset be tagged with T273317 and T275322? gerritbot doesn’t follow `Depends-On`, so there are no notifications on Phabri" [mediawiki-config] - https://gerrit.wikimedia.org/r/679938 (owner: Gergő Tisza)
[20:14:14] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:19:02] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: status: green, cluster_name: cloudelastic-chi-eqiad, relocating_shards: 0, task_max_waiting_in_queue_millis: 0, active_shards: 1793, unassigned_shards: 0, active_primary_shards: 895, delayed_unassigned_shards: 0, initializing_shards: 0, active_shards_percent_as_number: 100.0, number_of_in_flight_fe
[20:19:02] _nodes: 6, number_of_pending_tasks: 0, number_of_data_nodes: 6, timed_out: False https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:37:28] (CR) Jforrester: [C: +1] "Ha, we could have removed the IS setting a while ago. Oops." [mediawiki-config] - https://gerrit.wikimedia.org/r/680753 (https://phabricator.wikimedia.org/T239990) (owner: Daimona Eaytoy)
[20:39:26] (CR) Gergő Tisza: "> Shouldn’t this patchset be tagged with T273317 and T275322? gerritbot doesn’t follow `Depends-On`, so there are no notifications on Phab" [mediawiki-config] - https://gerrit.wikimedia.org/r/679938 (owner: Gergő Tisza)
[21:19:06] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 307.1 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1001&panelId=37
[21:56:56] (CR) Tacsipacsi: "Yes, there are quite a number of FlaggedRevs tickets, probably a number of them having the same root cause. I’m also lost, but that’s what" [mediawiki-config] - https://gerrit.wikimedia.org/r/679938 (owner: Gergő Tisza)
[22:08:58] (PS7) Ahmon Dancy: WIP [mediawiki-config] - https://gerrit.wikimedia.org/r/680405
[22:15:42] (PS8) Ahmon Dancy: WIP [mediawiki-config] - https://gerrit.wikimedia.org/r/680405
[22:19:10] (PS9) Ahmon Dancy: WIP [mediawiki-config] - https://gerrit.wikimedia.org/r/680405
[22:43:18] PROBLEM - ElasticSearch health check for shards on 9200 on cloudelastic1006 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[22:53:02] RECOVERY - ElasticSearch health check for shards on 9200 on cloudelastic1006 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: active_primary_shards: 895, task_max_waiting_in_queue_millis: 0, active_shards: 1793, delayed_unassigned_shards: 0, number_of_in_flight_fetch: 0, number_of_nodes: 6, relocating_shards: 0, timed_out: False, initializing_shards: 0, cluster_name: cloudelastic-chi-eqiad, status: green, number_of_pendin
[22:53:02] r_of_data_nodes: 6, active_shards_percent_as_number: 100.0, unassigned_shards: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration