[00:03:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:03:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:03:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:03:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:12:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:12:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:20:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:20:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:25:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:25:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:26:13] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1009 is CRITICAL: CRITICAL - commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), dewiki_content_1764707191[4](2026-05-01T00:25:20.169Z), commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.
[00:26:13] <icinga-wm>	 ommonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:21.933Z), commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30
[00:26:13] <icinga-wm>	 5-01T02:38:20.027Z), commonswiki_file_1764602342[31](2026-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wi https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:26:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:26:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:27:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:27:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:28:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:28:20] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:31:13] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1007 is CRITICAL: CRITICAL - commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), dewiki_content_1764707191[4](2026-05-01T00:25:20.169Z), commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.
[00:31:13] <icinga-wm>	 ommonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:21.933Z), commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30
[00:31:13] <icinga-wm>	 5-01T02:38:20.027Z), commonswiki_file_1764602342[31](2026-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wi https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:31:13] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1012 is CRITICAL: CRITICAL - commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), dewiki_content_1764707191[4](2026-05-01T00:25:20.169Z), commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.
[00:31:13] <icinga-wm>	 ommonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:21.933Z), commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30
[00:31:13] <icinga-wm>	 5-01T02:38:20.027Z), commonswiki_file_1764602342[31](2026-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wi https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:31:13] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1011 is CRITICAL: CRITICAL - commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), dewiki_content_1764707191[4](2026-05-01T00:25:20.169Z), commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.
[00:31:14] <icinga-wm>	 ommonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:21.933Z), commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30
[00:31:14] <icinga-wm>	 5-01T02:38:20.027Z), commonswiki_file_1764602342[31](2026-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wi https://wikitech.wikimedia.org/wiki/Search%23Administration
[00:31:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:31:15] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:32:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:32:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:36:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:36:00] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[00:38:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:38:00] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:53:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[00:53:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[00:54:02] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[00:54:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:03:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:03:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:04:02] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:04:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:06:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[01:10:03] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1282069
[01:10:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1282069 (owner: 10TrainBranchBot)
[01:20:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:20:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:20:54] <wikibugs>	 (03Merged) 10jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - 10https://gerrit.wikimedia.org/r/1282069 (owner: 10TrainBranchBot)
[01:21:02] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:21:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:25:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:25:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:26:02] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:26:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:26:21] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:26:26] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:26:38] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:26:44] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[01:31:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:31:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:32:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:32:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:40:53] <icinga-wm>	 PROBLEM - Host mr1-magru.oob is DOWN: PING CRITICAL - Packet loss = 100%
[01:41:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:41:00] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:41:08] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:41:14] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:42:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:42:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:46:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:46:00] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:46:17] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:46:23] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:48:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:48:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[01:50:58] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[01:51:00] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[01:51:00] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[01:51:05] <icinga-wm>	 RECOVERY - Host mr1-magru.oob is UP: PING OK - Packet loss = 0%, RTA = 117.15 ms
[01:51:20] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[01:51:26] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[02:00:41] <logmsgbot>	 !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:07:17] <logmsgbot>	 !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
[02:09:21] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:20:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[02:20:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[02:20:53] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[02:20:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[02:34:20] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:35:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[02:35:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[02:36:02] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[02:36:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[02:38:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:47:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[02:47:15] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[02:47:32] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[02:47:38] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[03:00:29] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.58.139 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[03:12:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[03:12:15] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[03:12:32] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[03:12:38] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[03:53:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[03:53:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[03:58:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[03:58:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[04:04:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[04:04:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[04:09:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[04:09:20] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[04:10:37] <icinga-wm>	 PROBLEM - Ensure acme-chief-backend is running only in the active node on acmechief2002 is CRITICAL: PROCS CRITICAL: 2 processes with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief
[04:11:37] <icinga-wm>	 RECOVERY - Ensure acme-chief-backend is running only in the active node on acmechief2002 is OK: PROCS OK: 1 process with args acme-chief-backend https://wikitech.wikimedia.org/wiki/Acme-chief
[04:24:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[04:24:15] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[04:24:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[04:24:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[04:42:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[04:42:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[04:43:02] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[04:43:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[04:47:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[04:47:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[04:48:02] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[04:48:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[05:06:00] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[05:09:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[05:09:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[05:09:32] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[05:09:38] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[05:19:23] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: Degraded RAID on db2157 - https://phabricator.wikimedia.org/T425242#11885164 (Marostegui) p:Triage→Medium @Jhancock.wm can we swap this disk? It can be done anytime. Thanks!
[05:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:50:59] <jinxer-wm>	 FIRING: [4x] CoreBGPDown: Core BGP session down between cr1-drmrs and cr2-eqiad (185.15.58.138) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[05:54:09] <wikibugs>	 (PS1) Marostegui: db2149: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282076 (https://phabricator.wikimedia.org/T424792)
[05:54:56] <wikibugs>	 (CR) Marostegui: [C:+2] db2149: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282076 (https://phabricator.wikimedia.org/T424792) (owner: Marostegui)
[05:54:57] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
[05:55:03] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
[05:55:42] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
[05:57:10] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
[05:58:15] <wikibugs>	 (PS1) Marostegui: db1188: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282077 (https://phabricator.wikimedia.org/T424615)
[05:58:36] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
[05:58:41] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1188: Reimage to Trixie
[05:58:55] <wikibugs>	 (CR) Marostegui: [C:+2] db1188: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282077 (https://phabricator.wikimedia.org/T424615) (owner: Marostegui)
[06:02:50] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie
[06:05:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie
[06:09:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[06:09:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[06:09:22] <wikibugs>	 (PS1) Marostegui: db1212.yaml: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282078 (https://phabricator.wikimedia.org/T424792)
[06:09:36] <marostegui>	 !log Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 T424792
[06:09:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:38] <stashbot>	 T424792: Migrate s3 section to Debian Trixie - https://phabricator.wikimedia.org/T424792
[06:10:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[06:10:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[06:10:29] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie
[06:10:30] <wikibugs>	 (CR) Marostegui: [C:+2] db1212.yaml: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282078 (https://phabricator.wikimedia.org/T424792) (owner: Marostegui)
[06:11:18] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie
[06:11:23] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie
[06:11:41] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie
[06:13:52] <wikibugs>	 (CR) Ayounsi: [C:+2] ganeti.addnode: run ImportPuppetDB script after node addition [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175) (owner: Ayounsi)
[06:15:26] <logmsgbot>	 marostegui@cumin1003 reimage (PID 3690838) is awaiting input
[06:17:58] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
[06:18:05] <wikibugs>	 (Merged) jenkins-bot: ganeti.addnode: run ImportPuppetDB script after node addition [cookbooks] - https://gerrit.wikimedia.org/r/1117554 (https://phabricator.wikimedia.org/T381175) (owner: Ayounsi)
[06:19:13] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
[06:21:49] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie
[06:22:41] <wikibugs>	 (PS1) Ayounsi: eqsin durum hcaptcha-proxy: don't peer with core routers [puppet] - https://gerrit.wikimedia.org/r/1282080 (https://phabricator.wikimedia.org/T421863)
[06:25:25] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
[06:25:51] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Homer trying to delete BGP peerings for VMs on new Eqiad ganeti nodes - https://phabricator.wikimedia.org/T381175#11885209 (ayounsi) Open→Resolved I think we're all good here, the issue has been tackled in 2 different ways and...
[06:25:52] <wikibugs>	 (PS1) Marostegui: Revert "db1212.yaml: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282082
[06:25:57] <wikibugs>	 (PS1) Marostegui: Revert "db1188: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282083
[06:26:02] <wikibugs>	 (PS1) Marostegui: Revert "db2149: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282084
[06:28:22] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Epic: [tracking] Don't keep on the public vlans hosts that don't require it - https://phabricator.wikimedia.org/T317177#11885212 (ayounsi)
[06:28:23] <wikibugs>	 SRE, collaboration-services, Wikimedia-Mailing-lists: Figure out plan for mailman IP situation - https://phabricator.wikimedia.org/T278495#11885213 (ayounsi)
[06:29:40] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
[06:33:49] <wikibugs>	 (CR) A smart kitten: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/815306 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[06:34:43] <wikibugs>	 (CR) A smart kitten: "(very sorry, misclicked)" [puppet] - https://gerrit.wikimedia.org/r/815306 (https://phabricator.wikimedia.org/T305847) (owner: Filippo Giunchedi)
[06:37:04] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
[06:38:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:42:31] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1282080 (https://phabricator.wikimedia.org/T421863) (owner: Ayounsi)
[06:43:29] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
[06:47:36] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie
[06:49:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[06:49:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[06:50:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[06:50:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[06:52:14] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet with OS trixie
[06:54:57] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db1188: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282083 (owner: Marostegui)
[06:55:04] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie
[06:55:19] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db2149: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282084 (owner: Marostegui)
[06:56:01] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie
[06:59:05] <wikibugs>	 SRE, Observability-Alerting: performance.discovery.wmnet - https://phabricator.wikimedia.org/T425299 (MoritzMuehlenhoff) NEW
[06:59:23] <wikibugs>	 SRE, Observability-Alerting: ATS backend errors for performance.discovery.wmnet should not page - https://phabricator.wikimedia.org/T425299#11885240 (MoritzMuehlenhoff)
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T0700).
[07:00:05] <jouncebot>	 xxb and nya_1F616EMO: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:05:21] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie
[07:06:14] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db1212.yaml: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282082 (owner: Marostegui)
[07:08:04] <wikibugs>	 (PS2) JMeybohm: Update rsyslog image to trixie and rsyslog 8.2504.0-1 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200)
[07:08:20] <wikibugs>	 (CR) JMeybohm: "Oh, yes. Sorry. 8.2504.0-1 is the version shipped with trixie - so updating the base image to trixie will update rsyslog" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200) (owner: JMeybohm)
[07:11:53] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie
[07:16:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
[07:20:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
[07:22:57] <wikibugs>	 (PS1) Elukey: profile::kafka::mirror: remove Icinga-based monitoring [puppet] - https://gerrit.wikimedia.org/r/1282092
[07:24:13] <wikibugs>	 (PS1) Marostegui: db2147: Decommission [puppet] - https://gerrit.wikimedia.org/r/1282094 (https://phabricator.wikimedia.org/T424226)
[07:26:01] <wikibugs>	 (CR) JMeybohm: [C:+1] kafka-main: set main-codfw cluster brokers to Confluent distro 77 (3.7) [puppet] - https://gerrit.wikimedia.org/r/1278832 (https://phabricator.wikimedia.org/T419216) (owner: Jasmine)
[07:28:09] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet
[07:28:12] <wikibugs>	 (CR) Marostegui: [C:+2] db2147: Decommission [puppet] - https://gerrit.wikimedia.org/r/1282094 (https://phabricator.wikimedia.org/T424226) (owner: Marostegui)
[07:28:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[07:28:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[07:29:02] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[07:29:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[07:33:02] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.dns.netbox
[07:33:03] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet
[07:33:20] <wikibugs>	 (CR) Blake: [C:+1] "Done" [docker-images/production-images] - https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200) (owner: JMeybohm)
[07:34:21] <wikibugs>	 (CR) JMeybohm: "Do we need to absent these before dropping the code?" [puppet] - https://gerrit.wikimedia.org/r/1282092 (owner: Elukey)
[07:34:30] <nya_1F616EMO>	 Oh sorry, missed the window again
[07:35:05] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, May 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - https://gerrit.wikimedia.org/r/1281965 (https://phabricator.wikimedia.org/T420165) (owner: 1F616EMO)
[07:35:29] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[07:35:39] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[07:37:01] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet
[07:38:44] <logmsgbot>	 marostegui@cumin1003 decommission (PID 3706129) is awaiting input
[07:38:46] <moritzm>	 !log installing Linux 6.12.85 on trixie hosts
[07:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:30] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie
[07:41:00] <wikibugs>	 (CR) JMeybohm: [V:+2 C:+2] Update rsyslog image to trixie and rsyslog 8.2504.0-1 [docker-images/production-images] - https://gerrit.wikimedia.org/r/1280313 (https://phabricator.wikimedia.org/T418200) (owner: JMeybohm)
[07:41:26] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie
[07:42:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[07:42:48] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
[07:42:48] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:42:49] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet
[07:43:31] <wikibugs>	 (CR) Elukey: "Should be easy enough to clean them up manually with a quick pass afterwards." [puppet] - https://gerrit.wikimedia.org/r/1282092 (owner: Elukey)
[07:43:36] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2147.codfw.wmnet - https://phabricator.wikimedia.org/T424226#11885319 (Marostegui) a:Marostegui→Jhancock.wm
[07:43:50] <wikibugs>	 (PS1) Ayounsi: CoreRouterInterfaceDropPercent: fix ping disable [alerts] - https://gerrit.wikimedia.org/r/1282099
[07:43:52] <wikibugs>	 ops-codfw, DBA, DC-Ops, decommission-hardware: decommission db2147.codfw.wmnet - https://phabricator.wikimedia.org/T424226#11885324 (Marostegui) Ready for DC-Ops
[07:44:12] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[07:44:18] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[07:44:54] <dcausse>	 !log T425301: stopping writes on cloudelastic 
[07:44:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:56] <stashbot>	 T425301: The cloudelastic chi cluster is red - https://phabricator.wikimedia.org/T425301
[07:46:12] <wikibugs>	 (CR) Ayounsi: CoreRouterInterfaceDropPercent: fix ping disable (2 comments) [alerts] - https://gerrit.wikimedia.org/r/1282099 (owner: Ayounsi)
[07:46:24] <wikibugs>	 (PS1) Marostegui: db1182: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282101 (https://phabricator.wikimedia.org/T424615)
[07:46:44] <wikibugs>	 (CR) Ayounsi: [C:+2] eqsin durum hcaptcha-proxy: don't peer with core routers [puppet] - https://gerrit.wikimedia.org/r/1282080 (https://phabricator.wikimedia.org/T421863) (owner: Ayounsi)
[07:46:58] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie
[07:47:03] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie
[07:47:06] <wikibugs>	 (CR) Marostegui: [C:+2] db1182: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282101 (https://phabricator.wikimedia.org/T424615) (owner: Marostegui)
[07:47:20] <marostegui>	 XioNoX: good to merge?
[07:47:26] <XioNoX>	 marostegui: yup, thx
[07:47:30] <marostegui>	 XioNoX: de rien
[07:47:31] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie
[07:48:26] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[07:48:39] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[07:51:22] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
[07:51:41] <logmsgbot>	 marostegui@cumin1003 reimage (PID 3707205) is awaiting input
[07:51:43] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
[07:53:26] <wikibugs>	 (CR) Gkyziridis: [C:+2] eventstreams: Configure new stream for revertrisk-multilingual model. [deployment-charts] - https://gerrit.wikimedia.org/r/1281431 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[07:54:51] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by urbanecm@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1277256 (https://phabricator.wikimedia.org/T407106) (owner: HakanIST)
[07:55:22] <logmsgbot>	 marostegui@cumin1003 reimage (PID 3707205) is awaiting input
[07:55:36] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie
[07:55:44] <wikibugs>	 (Merged) jenkins-bot: Add sva to wmgExtraLanguageNames [mediawiki-config] - https://gerrit.wikimedia.org/r/1277256 (https://phabricator.wikimedia.org/T407106) (owner: HakanIST)
[07:55:47] <wikibugs>	 (Merged) jenkins-bot: eventstreams: Configure new stream for revertrisk-multilingual model. [deployment-charts] - https://gerrit.wikimedia.org/r/1281431 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[07:55:54] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
[07:56:12] <logmsgbot>	 !log urbanecm@deploy1003 Started scap sync-world: Backport for [[gerrit:1277256|Add sva to wmgExtraLanguageNames (T407106)]]
[07:56:14] <stashbot>	 T407106: Add label and monolingual language code sva - https://phabricator.wikimedia.org/T407106
[07:57:17] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie
[07:57:56] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm, h2o: Backport for [[gerrit:1277256|Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:57:59] <wikibugs>	 (PS1) Elukey: aptrepo: add otelcol-contrib thirdparty config for Trixie [puppet] - https://gerrit.wikimedia.org/r/1282106 (https://phabricator.wikimedia.org/T416452)
[07:58:25] <wikibugs>	 (PS2) JMeybohm: Test updated rsyslog image on mw-experimental and mw-web canary [deployment-charts] - https://gerrit.wikimedia.org/r/1280324 (https://phabricator.wikimedia.org/T418200)
[07:58:25] <wikibugs>	 (PS1) JMeybohm: mw: Remove references to rsyslogd image [deployment-charts] - https://gerrit.wikimedia.org/r/1282107 (https://phabricator.wikimedia.org/T418200)
[07:59:48] <logmsgbot>	 !log urbanecm@deploy1003 urbanecm, h2o: Continuing with deployment
[08:00:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[08:01:03] <wikibugs>	 (CR) Blake: [C:+1] Test updated rsyslog image on mw-experimental and mw-web canary [deployment-charts] - https://gerrit.wikimedia.org/r/1280324 (https://phabricator.wikimedia.org/T418200) (owner: JMeybohm)
[08:01:40] <moritzm>	 !log installing Linux 6.1.170 on bookworm hosts
[08:01:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
[08:02:24] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [staging] START helmfile.d/services/eventstreams: sync
[08:02:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: netbox_ganeti_codfw_test_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:02:31] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [staging] DONE helmfile.d/services/eventstreams: sync
[08:02:35] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[08:02:46] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
[08:02:56] <jinxer-wm>	 FIRING: CirrusConsumerCloudelasticFlinkJobNotRunning: ...
[08:02:56] <jinxer-wm>	 cirrus_streaming_updater_cloudelastic_consumer in eqiad (k8s) is not running - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerCloudelasticFlinkJobNotRunning
[08:03:37] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [eqiad] START helmfile.d/services/eventstreams: sync
[08:04:10] <logmsgbot>	 !log urbanecm@deploy1003 Finished scap sync-world: Backport for [[gerrit:1277256|Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s)
[08:04:12] <stashbot>	 T407106: Add label and monolingual language code sva - https://phabricator.wikimedia.org/T407106
[08:04:17] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
[08:06:28] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
[08:06:49] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
[08:08:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[08:08:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[08:08:53] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[08:08:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[08:09:00] <wikibugs>	 SRE, Data-Platform-SRE, Infrastructure-Foundations, Epic, Patch-For-Review: Migrate Docker images running in Production away from Bullseye - https://phabricator.wikimedia.org/T416452#11885455 (elukey)
[08:11:32] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
[08:15:28] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
[08:16:01] <wikibugs>	 (CR) JMeybohm: [C:-1] k8s: Remove support for k8s versions before 1.31 (6 comments) [puppet] - https://gerrit.wikimedia.org/r/1278370 (https://phabricator.wikimedia.org/T423251) (owner: Blake)
[08:17:17] <wikibugs>	 (PS1) Marostegui: Revert "db1182: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282270
[08:17:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: netbox_ganeti_codfw_test_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:17:50] <wikibugs>	 (CR) JMeybohm: [C:+1] "Ok, fine by me!" [puppet] - https://gerrit.wikimedia.org/r/1282092 (owner: Elukey)
[08:18:16] <wikibugs>	 (PS6) Hashar: Add new class, labs_lvm_ephemeral [puppet] - https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: Andrew Bogott)
[08:19:55] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
[08:20:12] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
[08:20:19] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:20:25] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json
[08:20:57] <wikibugs>	 (CR) Elukey: [C:+2] profile::kafka::mirror: remove Icinga-based monitoring [puppet] - https://gerrit.wikimedia.org/r/1282092 (owner: Elukey)
[08:23:12] <wikibugs>	 (CR) Ayounsi: [C:+1] QoS: Map packets marked with DSCP CS1 into low-prirority class [homer/public] - https://gerrit.wikimedia.org/r/1279334 (https://phabricator.wikimedia.org/T424640) (owner: Cathal Mooney)
[08:23:56] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
[08:24:34] <wikibugs>	 (CR) Ayounsi: [C:+1] Network QoS: adjust configuration to mark low-priority traffic as CS1 [puppet] - https://gerrit.wikimedia.org/r/1279339 (https://phabricator.wikimedia.org/T424640) (owner: Cathal Mooney)
[08:26:19] <wikibugs>	 (CR) Hashar: "I have created a Puppet prefix config which disable assignment of the ephemeral disk to /srv." [puppet] - https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: Andrew Bogott)
[08:28:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json
[08:32:01] <moritzm>	 !log installing Linux 5.10.251-3 on bullseye hosts
[08:32:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:05] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db1182: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282270 (owner: Marostegui)
[08:33:42] <wikibugs>	 (CR) Ayounsi: "Realistically it won't change much, but it's the new "clean" way of running gNMIc as a daemon." [puppet] - https://gerrit.wikimedia.org/r/1278390 (https://phabricator.wikimedia.org/T416360) (owner: Ayounsi)
[08:33:51] <wikibugs>	 (CR) Ayounsi: [C:+2] gNMIc: use collect mode [puppet] - https://gerrit.wikimedia.org/r/1278390 (https://phabricator.wikimedia.org/T416360) (owner: Ayounsi)
[08:34:35] <wikibugs>	 (PS4) Daniel Kinzler: rest-gateway: generalize class overrides [deployment-charts] - https://gerrit.wikimedia.org/r/1278376 (https://phabricator.wikimedia.org/T424828)
[08:35:03] <icinga-wm>	 PROBLEM - Host cloudelastic1007 is DOWN: PING CRITICAL - Packet loss = 100%
[08:37:15] <icinga-wm>	 RECOVERY - Host cloudelastic1007 is UP: PING OK - Packet loss = 0%, RTA = 0.33 ms
[08:37:43] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie
[08:38:07] <wikibugs>	 (PS1) Santiago Faci: Test Kitchen UI: Deploy v1.3.1 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1282277 (https://phabricator.wikimedia.org/T419511)
[08:38:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json
[08:42:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cloudelastic1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:42:25] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet
[08:42:31] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet
[08:42:43] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1001.eqiad.wmnet
[08:42:48] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet
[08:42:54] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet
[08:43:00] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet
[08:43:06] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet
[08:43:12] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet
[08:43:18] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet
[08:43:24] <logmsgbot>	 !log jmm@cumin2002 DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet
[08:44:28] <wikibugs>	 (PS1) Gkyziridis: ml-services: Deploy the latest version of revertrisk-multilingual model on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1282278 (https://phabricator.wikimedia.org/T415892)
[08:44:35] <wikibugs>	 (CR) Elukey: [C:+1] kafka-main: set main-codfw cluster brokers to Confluent distro 77 (3.7) [puppet] - https://gerrit.wikimedia.org/r/1278832 (https://phabricator.wikimedia.org/T419216) (owner: Jasmine)
[08:45:43] <jinxer-wm>	 RESOLVED: CoreBGPDown: Core BGP session down between cr1-drmrs and  (2a02:ec80:600:fe01::1) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=drmrs&var-device=cr1-drmrs:9804&var-bgp_group=Confed_eqiad&var-bgp_neighbor= - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[08:47:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cloudelastic1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:48:22] <wikibugs>	 (PS1) Gerrit maintenance bot: mariadb: Promote db1223 to s3 master [puppet] - https://gerrit.wikimedia.org/r/1282279 (https://phabricator.wikimedia.org/T425318)
[08:48:28] <wikibugs>	 (PS1) Gerrit maintenance bot: wmnet: Update s3-master alias [dns] - https://gerrit.wikimedia.org/r/1282280 (https://phabricator.wikimedia.org/T425318)
[08:49:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json
[08:49:17] <wikibugs>	 (CR) Gkyziridis: [C:+2] ml-services: Deploy the latest version of revertrisk-multilingual model on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1282278 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[08:50:45] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie
[08:51:36] <wikibugs>	 (Merged) jenkins-bot: ml-services: Deploy the latest version of revertrisk-multilingual model on prod. [deployment-charts] - https://gerrit.wikimedia.org/r/1282278 (https://phabricator.wikimedia.org/T415892) (owner: Gkyziridis)
[08:55:52] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[08:56:05] <logmsgbot>	 !log gkyziridis@deploy1003 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
[08:59:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json
[08:59:23] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
[08:59:31] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json
[09:03:34] <wikibugs>	 (PS2) Muehlenhoff: Assign the hcaptcha::proxy role to  hcaptcha-proxy5003/5004 [puppet] - https://gerrit.wikimedia.org/r/1280353 (https://phabricator.wikimedia.org/T421863)
[09:06:13] <wikibugs>	 (PS1) Muehlenhoff: Assign bastion role to bast5005 [puppet] - https://gerrit.wikimedia.org/r/1282285 (https://phabricator.wikimedia.org/T421863)
[09:06:22] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683#11885878 (ayounsi) Nice!  We can also filter out the `.16386`, `.16384`, `.16385`,  `.16383`, `.32769` - weird juniper... a...
[09:07:10] <wikibugs>	 (CR) Ayounsi: [C:+1] Assign bastion role to bast5005 [puppet] - https://gerrit.wikimedia.org/r/1282285 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[09:08:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json
[09:10:41] <wikibugs>	 (PS1) Slyngshede: P:idp webauthn, with database backend [puppet] - https://gerrit.wikimedia.org/r/1282286 (https://phabricator.wikimedia.org/T372892)
[09:12:53] <wikibugs>	 (PS2) Slyngshede: P:idp webauthn, with database backend [puppet] - https://gerrit.wikimedia.org/r/1282286 (https://phabricator.wikimedia.org/T372892)
[09:13:57] <wikibugs>	 (CR) Slyngshede: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282286 (https://phabricator.wikimedia.org/T372892) (owner: Slyngshede)
[09:15:11] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events
[09:15:50] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db2187: Fixing events
[09:16:26] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events
[09:18:39] <wikibugs>	 (Abandoned) Slyngshede: P:idp experimental webauthn [puppet] - https://gerrit.wikimedia.org/r/1091237 (https://phabricator.wikimedia.org/T311236) (owner: Slyngshede)
[09:18:54] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json
[09:18:58] <wikibugs>	 ops-eqiad, SRE, DC-Ops, observability, Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11885960 (elukey) Tried to upgrade the BIOS, and then reset the BMC as suggested by the UI. It seems taking a long time, I'll come back later to check!
[09:23:50] <wikibugs>	 (CR) Ladsgroup: [C:+1] "Will deploy later today." [mediawiki-config] - https://gerrit.wikimedia.org/r/1282060 (owner: Chlod Alejandro)
[09:27:57] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Assign bastion role to bast5005 [puppet] - https://gerrit.wikimedia.org/r/1282285 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[09:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:29:02] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json
[09:31:27] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] P:kubernetes: deployment_server: Remove kafka cluster IPv6 flag [puppet] - https://gerrit.wikimedia.org/r/1270281 (owner: Majavah)
[09:36:08] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie
[09:37:15] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[09:37:16] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[09:37:42] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[09:37:54] <logmsgbot>	 !log javiermonton@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[09:39:10] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json
[09:39:30] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
[09:39:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json
[09:41:00] <wikibugs>	 (CR) JavierMonton: [C:+1] alerts: update runbook link for mw-page-html-feature-counts-change-enrich [alerts] - https://gerrit.wikimedia.org/r/1281017 (https://phabricator.wikimedia.org/T424225) (owner: AKhatun)
[09:43:23] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org
[09:48:03] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json
[09:49:29] <wikibugs>	 (CR) JMeybohm: [C:+1] "I probably should have asked that here:" [deployment-charts] - https://gerrit.wikimedia.org/r/1226814 (https://phabricator.wikimedia.org/T414439) (owner: Clément Goubert)
[09:49:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org
[09:57:50] <wikibugs>	 (PS1) Marostegui: db1162: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282293 (https://phabricator.wikimedia.org/T424615)
[09:58:10] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json
[10:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1000)
[10:00:50] <wikibugs>	 (CR) Marostegui: [C:+2] db1162: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/1282293 (https://phabricator.wikimedia.org/T424615) (owner: Marostegui)
[10:01:22] <wikibugs>	 (PS1) Muehlenhoff: Add bast5005 to bastion firewall service [puppet] - https://gerrit.wikimedia.org/r/1282294 (https://phabricator.wikimedia.org/T421863)
[10:01:37] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie
[10:01:42] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie
[10:01:59] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie
[10:02:57] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie
[10:03:18] <wikibugs>	 (PS1) Elukey: profile::pki: add Puppet CA's public key to client_auth_CA.pem [puppet] - https://gerrit.wikimedia.org/r/1282295 (https://phabricator.wikimedia.org/T424549)
[10:04:21] <wikibugs>	 (PS2) Muehlenhoff: Add bast5005 to bastion firewall service [puppet] - https://gerrit.wikimedia.org/r/1282294 (https://phabricator.wikimedia.org/T421863)
[10:06:48] <wikibugs>	 (CR) Elukey: [C:+2] profile::pki: add Puppet CA's public key to client_auth_CA.pem [puppet] - https://gerrit.wikimedia.org/r/1282295 (https://phabricator.wikimedia.org/T424549) (owner: Elukey)
[10:08:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to /var/cache/conftool/dbconfig/20260504-100818-fceratto.json
[10:15:49] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
[10:16:10] <logmsgbot>	 !log marostegui@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
[10:16:29] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db2187: repool after maintenance
[10:18:27] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json
[10:18:47] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
[10:18:56] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json
[10:22:14] <wikibugs>	 (PS1) Elukey: role::aux_k8s::master: setup IPIP encapsulation settings [puppet] - https://gerrit.wikimedia.org/r/1282298 (https://phabricator.wikimedia.org/T420439)
[10:22:16] <wikibugs>	 (PS1) Elukey: role::aux_k8s::worker: add IPIP encapsulation settings [puppet] - https://gerrit.wikimedia.org/r/1282299 (https://phabricator.wikimedia.org/T420439)
[10:24:30] <wikibugs>	 (PS2) Elukey: role::aux_k8s::master: setup IPIP encapsulation settings [puppet] - https://gerrit.wikimedia.org/r/1282298 (https://phabricator.wikimedia.org/T420439)
[10:24:30] <wikibugs>	 (PS2) Elukey: role::aux_k8s::worker: add IPIP encapsulation settings [puppet] - https://gerrit.wikimedia.org/r/1282299 (https://phabricator.wikimedia.org/T420439)
[10:26:47] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet
[10:26:49] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[10:27:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json
[10:30:39] <wikibugs>	 (PS1) JavierMonton: stream: mw-page-html-content-change-enrich [deployment-charts] - https://gerrit.wikimedia.org/r/1282300 (https://phabricator.wikimedia.org/T425336)
[10:32:32] <logmsgbot>	 jmm@cumin2002 makevm (PID 779169) is awaiting input
[10:32:35] <wikibugs>	 (PS1) Marostegui: Revert "db1162: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282301
[10:34:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
[10:34:34] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
[10:34:35] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[10:34:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors
[10:34:39] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors
[10:35:15] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
[10:35:21] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
[10:36:31] <wikibugs>	 (CR) Marostegui: [C:+2] Revert "db1162: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1282301 (owner: Marostegui)
[10:37:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json
[10:38:22] <logmsgbot>	 jmm@cumin2002 makevm (PID 779169) is awaiting input
[10:38:38] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm
[10:38:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:38:55] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11886179 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host durum5003.eqsin.wmnet with OS bookworm
[10:39:01] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie
[10:40:53] <wikibugs>	 (CR) Mszwarc: [C:+1] Move privileged global and local group handling to WikimediaCustomizations [mediawiki-config] - https://gerrit.wikimedia.org/r/1271969 (https://phabricator.wikimedia.org/T418507) (owner: Bartosz Dziewoński)
[10:42:02] <logmsgbot>	 !log marostegui@cumin1003 START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie
[10:42:14] <moritzm>	 !log installing postgresql-17 security updates
[10:42:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:27] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 318 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 759, active_shards: 1215, relocating_shards: 0, initializing_shards: 8, unassigned_shards: 
[10:42:27] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 79.2563600782779 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:43:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1010 is CRITICAL: CRITICAL - elasticsearch inactive shards 307 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 760, active_shards: 1226, relocating_shards: 0, initializing_shards: 9, unassigned_shards: 
[10:43:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 79.97390737116764 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:43:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1009 is CRITICAL: CRITICAL - elasticsearch inactive shards 307 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 760, active_shards: 1226, relocating_shards: 0, initializing_shards: 9, unassigned_shards: 
[10:43:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 79.97390737116764 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:43:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch inactive shards 306 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1227, relocating_shards: 0, initializing_shards: 8, unassigned_shards: 
[10:43:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.03913894324853 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:43:31] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1011 is CRITICAL: CRITICAL - elasticsearch inactive shards 306 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1227, relocating_shards: 0, initializing_shards: 8, unassigned_shards: 
[10:43:31] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.03913894324853 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:43:37] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch inactive shards 305 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1228, relocating_shards: 0, initializing_shards: 8, unassigned_shards: 
[10:43:37] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.10437051532942 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:45:25] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Patch-For-Review: Cookbook for rack depool - https://phabricator.wikimedia.org/T327300#11886199 (ayounsi)
[10:46:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1010 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1327, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 201, delayed_unassigned_
[10:46:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 86.56229615133725 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:46:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1329, relocating_shards: 0, initializing_shards: 3, unassigned_shards: 201, delayed_unassigned_
[10:46:27] <icinga-wm>	 0, number_of_pending_tasks: 3, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 164, active_shards_percent_as_number: 86.69275929549902 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:46:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1329, relocating_shards: 0, initializing_shards: 3, unassigned_shards: 201, delayed_unassigned_
[10:46:27] <icinga-wm>	 0, number_of_pending_tasks: 3, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 224, active_shards_percent_as_number: 86.69275929549902 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:46:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1009 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1330, relocating_shards: 0, initializing_shards: 2, unassigned_shards: 201, delayed_unassigned_
[10:46:27] <icinga-wm>	 0, number_of_pending_tasks: 2, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 94, active_shards_percent_as_number: 86.7579908675799 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:46:31] <wikibugs>	 (CR) Ayounsi: [C:+1] Add bast5005 to bastion firewall service [puppet] - https://gerrit.wikimedia.org/r/1282294 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[10:46:31] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1011 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1334, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 194, delayed_unassigned_
[10:46:31] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.01891715590345 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:46:37] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1338, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 190, delayed_unassigned_
[10:46:37] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.279843444227 https://wikitech.wikimedia.org/wiki/Search%23Administration
[10:47:32] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json
[10:48:09] <moritzm>	 !log installing bash updates from trixie point release
[10:48:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:37] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Trixie 13.3 point update - https://phabricator.wikimedia.org/T414179#11886203 (MoritzMuehlenhoff)
[10:53:47] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add bast5005 to bastion firewall service [puppet] - https://gerrit.wikimedia.org/r/1282294 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[10:57:40] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json
[10:58:00] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
[10:58:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json
[11:01:57] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance
[11:03:37] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:03:37] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1009 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:04:25] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1010 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:04:25] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:04:25] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:04:26] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:04:31] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1011 is CRITICAL: CRITICAL - elasticsearch inactive shards 301 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 761, active_shards: 1232, relocating_shards: 0, initializing_shards: 10, unassigned_shards:
[11:04:31] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 80.36529680365297 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:05:26] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json
[11:06:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1010 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1344, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 184, delayed_unassigned_
[11:06:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.67123287671232 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:06:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1009 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1344, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 184, delayed_unassigned_
[11:06:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.67123287671232 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:06:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1344, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 184, delayed_unassigned_
[11:06:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.67123287671232 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:06:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1344, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 184, delayed_unassigned_
[11:06:26] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.67123287671232 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:06:31] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1011 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1351, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 177, delayed_unassigned_
[11:06:31] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 88.12785388127854 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:06:37] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1356, relocating_shards: 0, initializing_shards: 4, unassigned_shards: 173, delayed_unassigned_
[11:06:37] <icinga-wm>	 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 88.45401174168298 https://wikitech.wikimedia.org/wiki/Search%23Administration
[11:09:36] <wikibugs>	 sre-alert-triage: Alert in need of triage: Kafka MirrorMaker main-codfw_to_main-eqiad dropped message count in last 30m (instance alert1002) - https://phabricator.wikimedia.org/T425339 (LSobanski) NEW
[11:10:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=eqiad%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:11:25] <wikibugs>	 sre-alert-triage: Alert in need of triage: Kafka MirrorMaker main-codfw_to_main-eqiad dropped message count in last 30m (instance alert1002) - https://phabricator.wikimedia.org/T425339#11886249 (LSobanski) Alerts mention both main and jumbo so tagging both #serviceops_new and #data-platform-sre
[11:11:43] <wikibugs>	 sre-alert-triage, Data-Platform-SRE, ServiceOps new: Alert in need of triage: Kafka MirrorMaker main-codfw_to_main-eqiad dropped message count in last 30m (instance alert1002) - https://phabricator.wikimedia.org/T425339#11886261 (LSobanski)
[11:13:18] <wikibugs>	 (PS1) Muehlenhoff: redis::master: Remove obsolete code only used for old ferm service [puppet] - https://gerrit.wikimedia.org/r/1282308 (https://phabricator.wikimedia.org/T419976)
[11:15:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:15:34] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json
[11:19:21] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282308 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[11:20:15] <jinxer-wm>	 FIRING: MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?var-datasource=codfw%20prometheus/ops&viewPanel=19 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:25:15] <jinxer-wm>	 RESOLVED: [2x] MediaWikiMemcachedHighErrorRate: MediaWiki memcached error rate is elevated globally - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiMemcachedHighErrorRate
[11:25:43] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json
[11:25:49] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
[11:26:14] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
[11:26:20] <wikibugs>	 (PS1) Majavah: P:redis::master: Pass ports as an array to firewall [puppet] - https://gerrit.wikimedia.org/r/1282311
[11:27:27] <logmsgbot>	 !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie
[11:28:23] <wikibugs>	 (PS2) Majavah: P:redis::master: Pass ports as an array to firewall [puppet] - https://gerrit.wikimedia.org/r/1282311
[11:29:47] <wikibugs>	 (PS2) Muehlenhoff: redis::master: Remove obsolete code only used for old ferm service [puppet] - https://gerrit.wikimedia.org/r/1282308 (https://phabricator.wikimedia.org/T419976)
[11:30:44] <wikibugs>	 (PS3) Majavah: P:redis::master: Pass ports as an array to firewall [puppet] - https://gerrit.wikimedia.org/r/1282311
[11:31:00] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Trixie 13.3 point update - https://phabricator.wikimedia.org/T414179#11886343 (MoritzMuehlenhoff)
[11:33:55] <wikibugs>	 (PS4) Majavah: P:redis::master: Pass ports as an array to firewall [puppet] - https://gerrit.wikimedia.org/r/1282311
[11:34:41] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/8504/console" [puppet] - https://gerrit.wikimedia.org/r/1282311 (owner: Majavah)
[11:35:51] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json
[11:36:12] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
[11:36:20] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to /var/cache/conftool/dbconfig/20260504-113620-fceratto.json
[11:36:57] <wikibugs>	 (PS1) Muehlenhoff: redis::master: Pass ports as an array, not a string [puppet] - https://gerrit.wikimedia.org/r/1282315 (https://phabricator.wikimedia.org/T419976)
[11:43:54] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations: Add ganeti105[5678] - https://phabricator.wikimedia.org/T424680#11886374 (MoritzMuehlenhoff) p:Triage→Medium
[11:44:01] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json
[11:45:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm
[11:45:49] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet
[11:46:02] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11886375 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host durum5003.eqsin.wmnet with OS bookworm completed: - durum500...
[11:47:01] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282315 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[11:47:16] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet
[11:47:18] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[11:51:03] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
[11:54:08] <logmsgbot>	 jmm@cumin2002 makevm (PID 833063) is awaiting input
[11:54:08] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json
[11:55:12] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
[11:55:13] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:55:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors
[11:55:17] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors
[11:55:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
[11:55:56] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
[11:58:57] <logmsgbot>	 jmm@cumin2002 makevm (PID 833063) is awaiting input
[12:02:56] <jinxer-wm>	 FIRING: CirrusConsumerCloudelasticFlinkJobNotRunning: ...
[12:02:56] <jinxer-wm>	 cirrus_streaming_updater_cloudelastic_consumer in eqiad (k8s) is not running - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerCloudelasticFlinkJobNotRunning
[12:03:34] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm
[12:03:49] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11886387 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host durum5004.eqsin.wmnet with OS bookworm
[12:03:58] <wikibugs>	 (PS1) Gehel: feat(sysctl): priority is optional on sysctl::conffile [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301)
[12:03:59] <wikibugs>	 (PS1) Gehel: perf(opensearch): increase 'vm.max_map_count' to 1048576 [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301)
[12:04:16] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json
[12:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1009 is CRITICAL: CRITICAL - elasticsearch inactive shards 269 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1264, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 
[12:04:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.45270711024135 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1010 is CRITICAL: CRITICAL - elasticsearch inactive shards 268 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1265, relocating_shards: 0, initializing_shards: 4, unassigned_shards: 
[12:04:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.51793868232224 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:04:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch inactive shards 267 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1266, relocating_shards: 0, initializing_shards: 4, unassigned_shards: 
[12:04:25] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.58317025440313 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:04:26] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 267 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1266, relocating_shards: 0, initializing_shards: 4, unassigned_shards: 
[12:04:26] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.58317025440313 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:04:31] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1011 is CRITICAL: CRITICAL - elasticsearch inactive shards 265 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1268, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 
[12:04:31] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.7136333985649 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:04:39] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch inactive shards 251 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1282, relocating_shards: 0, initializing_shards: 1, unassigned_shards: 
[12:04:39] <icinga-wm>	 ayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 83.62687540769733 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1009 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1329, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 199, delayed_unassigned_
[12:05:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 86.69275929549902 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1010 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1329, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 199, delayed_unassigned_
[12:05:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 86.69275929549902 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1331, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 197, delayed_unassigned_
[12:05:25] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 86.8232224396608 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:25] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1331, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 197, delayed_unassigned_
[12:05:26] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 86.8232224396608 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:31] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1011 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1335, relocating_shards: 0, initializing_shards: 5, unassigned_shards: 193, delayed_unassigned_
[12:05:31] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.08414872798434 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:05:37] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1339, relocating_shards: 0, initializing_shards: 4, unassigned_shards: 190, delayed_unassigned_
[12:05:37] <icinga-wm>	 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 87.34507501630789 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:06:13] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1008 is CRITICAL: CRITICAL - commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.137Z), commonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:
[12:06:13] <icinga-wm>	 , commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30](2026-05-01T02:38:20.027Z), commonswiki_file_1764602342
[12:06:13] <icinga-wm>	 6-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wikidatawiki_content_1764707176[17](2026-05-01T02:27:03.39 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:06:34] <wikibugs>	 (PS2) Gehel: perf(opensearch): increase 'vm.max_map_count' to 1048576 [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301)
[12:06:53] <wikibugs>	 (CR) Gehel: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[12:09:20] <wikibugs>	 (PS2) Muehlenhoff: redis::master: Pass ports as an array, not a string [puppet] - https://gerrit.wikimedia.org/r/1282315 (https://phabricator.wikimedia.org/T419976)
[12:11:36] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282315 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[12:13:41] <wikibugs>	 (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1282325 (owner: L10n-bot)
[12:14:09] <wikibugs>	 (PS1) Dpogorzelski: lvs: expose grpc port on ml-serve staging [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049)
[12:14:24] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json
[12:14:34] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
[12:14:42] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json
[12:15:33] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good, I'll abandon https://gerrit.wikimedia.org/r/c/operations/puppet/+/1282315" [puppet] - https://gerrit.wikimedia.org/r/1282311 (owner: Majavah)
[12:16:15] <icinga-wm>	 PROBLEM - OpenSearch unassigned shard check - 9200 on cloudelastic1010 is CRITICAL: CRITICAL - commonswiki_content_1764807130[0](2026-05-01T01:20:36.409Z), commonswiki_content_1764807130[0](2026-05-01T01:16:04.312Z), commonswiki_file_1764602342[0](2026-05-01T02:27:03.399Z), commonswiki_file_1764602342[13](2026-05-01T02:38:50.137Z), commonswiki_file_1764602342[16](2026-05-01T02:39:50.267Z), commonswiki_file_1764602342[22](2026-05-01T02:34:
[12:16:15] <icinga-wm>	 , commonswiki_file_1764602342[23](2026-05-01T02:38:50.136Z), commonswiki_file_1764602342[24](2026-05-01T02:24:59.235Z), commonswiki_file_1764602342[25](2026-05-01T01:21:30.468Z), commonswiki_file_1764602342[25](2026-05-01T00:25:20.171Z), commonswiki_file_1764602342[26](2026-05-01T02:31:54.444Z), commonswiki_file_1764602342[28](2026-05-01T02:44:34.982Z), commonswiki_file_1764602342[30](2026-05-01T02:38:20.027Z), commonswiki_file_1764602342
[12:16:15] <icinga-wm>	 6-05-01T02:44:34.987Z), wikidatawiki_content_1764707176[7](2026-05-01T02:34:54.891Z), wikidatawiki_content_1764707176[16](2026-05-01T02:26:46.436Z), wikidatawiki_content_1764707176[17](2026-05-01T02:27:03.39 https://wikitech.wikimedia.org/wiki/Search%23Administration
[12:21:56] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json
[12:25:54] <wikibugs>	 (PS2) Dpogorzelski: lvs: expose grpc port on ml-serve staging [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049)
[12:27:22] <wikibugs>	 (PS3) Dpogorzelski: lvs: expose grpc port on ml-serve staging [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049)
[12:30:29] <wikibugs>	 (PS2) Gehel: feat(sysctl): priority is optional on sysctl::conffile [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301)
[12:30:29] <wikibugs>	 (PS3) Gehel: perf(opensearch): increase 'vm.max_map_count' to 1048576 [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301)
[12:30:42] <wikibugs>	 (CR) Gehel: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[12:32:04] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json
[12:32:08] <wikibugs>	 (CR) Bartosz Wójtowicz: [C:+1] lvs: expose grpc port on ml-serve staging [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049) (owner: Dpogorzelski)
[12:40:39] <jinxer-wm>	 FIRING: TransitBGPDown: Transit BGP session down between cr2-esams and KPN (139.156.127.121) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=esams&var-device=cr2-esams:9804&var-bgp_group=Transit4&var-bgp_neighbor=KPN - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[12:41:30] <wikibugs>	 (PS1) Elukey: Revert "profile::kafka::mirror: remove Icinga-based monitoring" [puppet] - https://gerrit.wikimedia.org/r/1282335
[12:42:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json
[12:43:40] <wikibugs>	 (CR) Elukey: [C:+2] Revert "profile::kafka::mirror: remove Icinga-based monitoring" [puppet] - https://gerrit.wikimedia.org/r/1282335 (owner: Elukey)
[12:45:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
[12:45:39] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and KPN (139.156.127.121) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[12:49:48] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] P:redis::master: Pass ports as an array to firewall [puppet] - https://gerrit.wikimedia.org/r/1282311 (owner: Majavah)
[12:50:14] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
[12:50:35] <wikibugs>	 (PS1) Elukey: profile::prometheus::alerts: fix alerts titles [puppet] - https://gerrit.wikimedia.org/r/1282337
[12:50:39] <jinxer-wm>	 RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and KPN (139.156.127.121) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[12:51:07] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
[12:52:19] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json
[12:52:40] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
[12:52:48] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json
[12:54:48] <wikibugs>	 (PS1) JMeybohm: Revert "envoy: Allow configuring delayed_closed_timeout" [puppet] - https://gerrit.wikimedia.org/r/1282338 (https://phabricator.wikimedia.org/T271421)
[12:54:52] <wikibugs>	 (PS1) JMeybohm: Revert "envoy: Allow disabling circuit breakers" [puppet] - https://gerrit.wikimedia.org/r/1282339 (https://phabricator.wikimedia.org/T271421)
[12:54:56] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Allow disabling x-request-id generation" [puppet] - https://gerrit.wikimedia.org/r/1282340 (https://phabricator.wikimedia.org/T271421)
[12:55:00] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Allow setting http2 protocol options" [puppet] - https://gerrit.wikimedia.org/r/1282341 (https://phabricator.wikimedia.org/T271421)
[12:55:04] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Allow configuring TLS handshake timeout" [puppet] - https://gerrit.wikimedia.org/r/1282342 (https://phabricator.wikimedia.org/T271421)
[12:55:07] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Support TLS min/max version config" [puppet] - https://gerrit.wikimedia.org/r/1282343 (https://phabricator.wikimedia.org/T271421)
[12:55:11] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Support alpn_protocols configuration" [puppet] - https://gerrit.wikimedia.org/r/1282344 (https://phabricator.wikimedia.org/T271421)
[12:55:15] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Provide support for UDS upstreams" [puppet] - https://gerrit.wikimedia.org/r/1282345 (https://phabricator.wikimedia.org/T271421)
[12:55:16] <wikibugs>	 (PS2) Mmartorana: Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366)
[12:55:19] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Add STEK configuration support" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421)
[12:55:23] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: global_tlsparams" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421)
[12:55:27] <wikibugs>	 (PS1) JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[12:57:59] <wikibugs>	 (PS1) Elukey: role::pki: remove the 'discovery' intermediate's config [puppet] - https://gerrit.wikimedia.org/r/1282350 (https://phabricator.wikimedia.org/T420993)
[12:59:19] <wikibugs>	 (PS1) Muehlenhoff: Assign the durum role for durum5003/5004 [puppet] - https://gerrit.wikimedia.org/r/1282351 (https://phabricator.wikimedia.org/T421863)
[12:59:20] <dcausse>	 !log T425301: resuming writes on cloudelastic 
[12:59:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:22] <stashbot>	 T425301: The cloudelastic chi cluster is red - https://phabricator.wikimedia.org/T425301
[12:59:24] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[12:59:29] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[12:59:31] <wikibugs>	 (PS2) Elukey: role::pki: remove the 'discovery' intermediate's config [puppet] - https://gerrit.wikimedia.org/r/1282350 (https://phabricator.wikimedia.org/T420993)
[12:59:38] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282350 (https://phabricator.wikimedia.org/T420993) (owner: Elukey)
[12:59:46] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json
[13:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: May I have your attention please! UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1300)
[13:00:05] <jouncebot>	 manfredi and nya_1F616EMO: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:09] <nya_1F616EMO>	 o/
[13:00:14] <wikibugs>	 (Abandoned) Muehlenhoff: redis::master: Pass ports as an array, not a string [puppet] - https://gerrit.wikimedia.org/r/1282315 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[13:00:25] <manfredi>	 I'm around
[13:01:17] <nya_1F616EMO>	 Let's pray for a deployer to appear
[13:02:34] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Provide support for UDS upstreams" [puppet] - https://gerrit.wikimedia.org/r/1282345 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[13:02:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[13:02:56] <jinxer-wm>	 RESOLVED: CirrusConsumerCloudelasticFlinkJobNotRunning: ...
[13:03:02] <jinxer-wm>	 cirrus_streaming_updater_cloudelastic_consumer in eqiad (k8s) is not running - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerCloudelasticFlinkJobNotRunning
[13:03:06] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Add STEK configuration support" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[13:03:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:04:21] <wikibugs>	 (PS1) Muehlenhoff: redis::master: Remove obsolete code only used for old ferm service [puppet] - https://gerrit.wikimedia.org/r/1282353 (https://phabricator.wikimedia.org/T419976)
[13:04:22] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: global_tlsparams" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[13:05:44] <wikibugs>	 (PS1) Sbisson: ArticleGuidance: enable on simple english [mediawiki-config] - https://gerrit.wikimedia.org/r/1282354 (https://phabricator.wikimedia.org/T425351)
[13:06:42] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[13:07:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUns
[13:08:16] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Trixie 13.4 point update - https://phabricator.wikimedia.org/T420240#11886597 (MoritzMuehlenhoff)
[13:09:09] <wikibugs>	 (CR) Mmartorana: "recheck" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:09:19] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282353 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[13:09:44] <wikibugs>	 (Abandoned) Muehlenhoff: redis::master: Remove obsolete code only used for old ferm service [puppet] - https://gerrit.wikimedia.org/r/1282308 (https://phabricator.wikimedia.org/T419976) (owner: Muehlenhoff)
[13:09:48] <stephanebisson>	 Is anyone doing a deployment? I have a last minute addition to this window if time allows.
[13:09:54] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json
[13:10:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm
[13:10:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet
[13:10:13] <nya_1F616EMO>	 stephanebisson: No deployers showed up so far
[13:10:20] <wikibugs>	 SRE, Ganeti, Infrastructure-Foundations, Patch-For-Review: Migrating eqsin to routed Ganeti - https://phabricator.wikimedia.org/T421863#11886600 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host durum5004.eqsin.wmnet with OS bookworm completed: - durum500...
[13:10:33] <stephanebisson>	 manfredi are you around?
[13:10:41] <manfredi>	 yes
[13:10:52] <nya_1F616EMO>	 o/
[13:11:02] <stephanebisson>	 manfredi can you deploy yourself or do you want me to?
[13:11:29] <manfredi>	 I would appreciate it if you deployed for me, thanks
[13:11:48] <stephanebisson>	 Can they both go at the same time?
[13:11:55] <manfredi>	 yes
[13:12:21] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:12:22] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281504 (https://phabricator.wikimedia.org/T420007) (owner: Mmartorana)
[13:12:40] <stephanebisson>	 manfredi will you be able to test with the WikimediaDebug browser extension?
[13:12:47] <manfredi>	 yes
[13:13:00] <moritzm>	 !log installing jaraco.context security updates
[13:13:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:44] <wikibugs>	 (PS1) Elukey: sre.hardware.upgrade-firmware: remove unused code [cookbooks] - https://gerrit.wikimedia.org/r/1282356 (https://phabricator.wikimedia.org/T425327)
[13:16:18] <wikibugs>	 (CR) AKhatun: [C:+2] alerts: update runbook link for mw-page-html-feature-counts-change-enrich [alerts] - https://gerrit.wikimedia.org/r/1281017 (https://phabricator.wikimedia.org/T424225) (owner: AKhatun)
[13:18:08] <wikibugs>	 (Merged) jenkins-bot: alerts: update runbook link for mw-page-html-feature-counts-change-enrich [alerts] - https://gerrit.wikimedia.org/r/1281017 (https://phabricator.wikimedia.org/T424225) (owner: AKhatun)
[13:19:01] <wikibugs>	 (Merged) jenkins-bot: Use js promise for email confirmation banner [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281504 (https://phabricator.wikimedia.org/T420007) (owner: Mmartorana)
[13:20:03] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json
[13:21:32] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering, Data-Engineering-Radar: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11886631 (AnnieKim_WMDE) Hello! Is there anything else I can or need to provide?
[13:21:48] <wikibugs>	 (CR) CI reject: [V:-1] Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:21:50] <wikibugs>	 (PS2) Elukey: sre.hardware.upgrade-firmware: remove unused code [cookbooks] - https://gerrit.wikimedia.org/r/1282356 (https://phabricator.wikimedia.org/T425327)
[13:23:04] <wikibugs>	 (CR) Sbisson: "recheck" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:23:24] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[13:23:59] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[13:24:18] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:24:50] * nya_1F616EMO peeks
[13:26:24] <nya_1F616EMO>	 stephanebisson, manfredi: Any progress on the two patches?
[13:28:05] <stephanebisson>	 One of them failed in CI so we're trying again
[13:28:10] <nya_1F616EMO>	 Ah
[13:28:27] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:28:34] <wikibugs>	 (CR) Klausman: "We should also add a reviewer from serviceops, I've pinged Reuven." [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049) (owner: Dpogorzelski)
[13:29:03] <wikibugs>	 (CR) JMeybohm: [V:+1] "PCC SUCCESS (CORE_DIFF 68 NOOP 5 DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compil" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[13:29:17] <wikibugs>	 (PS4) Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929)
[13:29:35] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[13:29:44] <nya_1F616EMO>	 I will join as Emojiwiki on my laptop
[13:30:11] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json
[13:30:31] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[13:30:35] <Emojiwiki>	 o/ I am nya_1F616EMO
[13:30:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json
[13:31:51] <manfredi>	 CI is failing for tests which look unrelated to the patch
[13:32:43] <wikibugs>	 (PS1) Muehlenhoff: Add install5004 [puppet] - https://gerrit.wikimedia.org/r/1282358 (https://phabricator.wikimedia.org/T421863)
[13:32:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[13:33:03] <wikibugs>	 (CR) CI reject: [V:-1] sre.hosts.provision: add workaround for root user on X14 supermicros [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929) (owner: Elukey)
[13:35:11] <wikibugs>	 (CR) CI reject: [V:-1] Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[13:37:07] <stephanebisson>	 It failed again with the same error.
[13:37:48] <stephanebisson>	 manfredi is there any value in deploying "Use js promise for email confirmation banner" but not "Email confirmation banner: Remove obsolete arm_b variant"?
[13:38:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1009 is CRITICAL: CRITICAL - elasticsearch inactive shards 322 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 755, active_shards: 1211, relocating_shards: 0, initializing_shards: 25, unassigned_shards:
[13:38:25] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 50, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 89252, active_shards_percent_as_number: 78.99543378995433 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:38:25] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1010 is CRITICAL: CRITICAL - elasticsearch inactive shards 322 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 755, active_shards: 1211, relocating_shards: 0, initializing_shards: 25, unassigned_shards:
[13:38:25] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 50, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 89272, active_shards_percent_as_number: 78.99543378995433 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:38:27] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 322 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 755, active_shards: 1211, relocating_shards: 0, initializing_shards: 25, unassigned_shards:
[13:38:27] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 50, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 91256, active_shards_percent_as_number: 78.99543378995433 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:38:29] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch inactive shards 315 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 758, active_shards: 1218, relocating_shards: 0, initializing_shards: 18, unassigned_shards:
[13:38:29] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 47, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 91975, active_shards_percent_as_number: 79.45205479452055 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:38:31] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1011 is CRITICAL: CRITICAL - elasticsearch inactive shards 312 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 760, active_shards: 1221, relocating_shards: 0, initializing_shards: 15, unassigned_shards:
[13:38:31] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 32, number_of_in_flight_fetch: 6, task_max_waiting_in_queue_millis: 92314, active_shards_percent_as_number: 79.6477495107632 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:38:37] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch inactive shards 310 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 760, active_shards: 1223, relocating_shards: 0, initializing_shards: 22, unassigned_shards:
[13:38:37] <icinga-wm>	 layed_unassigned_shards: 0, number_of_pending_tasks: 34, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 99277, active_shards_percent_as_number: 79.77821265492499 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:39:30] <manfredi>	 stephanebisson I think the value is limited if we deploy only one
[13:40:04] <wikibugs>	 (Abandoned) Muehlenhoff: idm: Unconditionally use Envoy [puppet] - https://gerrit.wikimedia.org/r/1279095 (owner: Muehlenhoff)
[13:40:06] <stephanebisson>	 OK, I can revert the other one and let you investigate and try again another time
[13:40:11] <wikibugs>	 (CR) Ayounsi: [C:+1] Add install5004 [puppet] - https://gerrit.wikimedia.org/r/1282358 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[13:40:16] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[13:40:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1010 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1313, relocating_shards: 0, initializing_shards: 22, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 28991, active_shards_percent_as_number: 85.64905414220483 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1009 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1313, relocating_shards: 0, initializing_shards: 22, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 29033, active_shards_percent_as_number: 85.64905414220483 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1314, relocating_shards: 0, initializing_shards: 21, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 30933, active_shards_percent_as_number: 85.71428571428571 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:31] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1011 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1314, relocating_shards: 0, initializing_shards: 21, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 33764, active_shards_percent_as_number: 85.71428571428571 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:31] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1314, relocating_shards: 0, initializing_shards: 21, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 33947, active_shards_percent_as_number: 85.71428571428571 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:37] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 763, active_shards: 1314, relocating_shards: 0, initializing_shards: 21, unassigned_shards: 198, delayed_unassigned_shards: 0, number_of_pending_tasks: 6, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 40782, active_shards_percent_as_number: 85.71428571428571 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:40:47] <wikibugs>	 (PS3) Muehlenhoff: Avoid false positive alerts after Ganeti master failover [puppet] - https://gerrit.wikimedia.org/r/1272701
[13:40:49] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json
[13:41:02] <wikibugs>	 (PS1) Sbisson: Revert "Use js promise for email confirmation banner" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1282362
[13:41:35] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1281965 (https://phabricator.wikimedia.org/T420165) (owner: 1F616EMO)
[13:41:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[13:41:45] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[13:41:53] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[13:41:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[13:42:39] <wikibugs>	 (CR) Sbisson: [C:+2] Revert "Use js promise for email confirmation banner" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1282362 (owner: Sbisson)
[13:42:51] <wikibugs>	 (Merged) jenkins-bot: zhwikinews: (1/2) revert 20th anniversary logo change (config) [mediawiki-config] - https://gerrit.wikimedia.org/r/1281965 (https://phabricator.wikimedia.org/T420165) (owner: 1F616EMO)
[13:42:51] <manfredi>	 stephanebisson: Looks like a scribunto/luaSandbox test failure which seems unrelated to this patch. I think it’s safe to proceed, but up to you
[13:43:25] <logmsgbot>	 !log sbisson@deploy1003 Started scap sync-world: Backport for [[gerrit:1281965|zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]]
[13:43:28] <stashbot>	 T420165: Requesting temporary logo change for zh.wikinews.org - https://phabricator.wikimedia.org/T420165
[13:43:44] <stephanebisson>	 Emojiwiki will you be able to test?
[13:43:53] <wikibugs>	 (PS5) Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929)
[13:43:54] <Emojiwiki>	 testing
[13:44:32] <stephanebisson>	 Not ready yet.
[13:44:50] <Emojiwiki>	 ah
[13:44:58] <Emojiwiki>	 sorry
[13:45:03] <Emojiwiki>	 but I'm ready at any time
[13:45:08] <logmsgbot>	 !log sbisson@deploy1003 1f616emo, sbisson: Backport for [[gerrit:1281965|zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:45:23] <wikibugs>	 (PS1) JHathaway: WIP: Puppet 8 legacy fact removal [puppet] - https://gerrit.wikimedia.org/r/1282364
[13:45:43] <stephanebisson>	 Emojiwiki ready for testing
[13:46:03] <wikibugs>	 (Merged) jenkins-bot: Revert "Use js promise for email confirmation banner" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1282362 (owner: Sbisson)
[13:46:05] <wikibugs>	 (CR) Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929) (owner: Elukey)
[13:46:12] <wikibugs>	 (CR) CI reject: [V:-1] WIP: Puppet 8 legacy fact removal [puppet] - https://gerrit.wikimedia.org/r/1282364 (owner: JHathaway)
[13:46:27] <Emojiwiki>	 stephanebisson: Works via k8s-mwdebug
[13:46:38] <logmsgbot>	 !log sbisson@deploy1003 1f616emo, sbisson: Continuing with deployment
[13:47:07] <wikibugs>	 (CR) Elukey: sre.hosts.provision: add workaround for root user on X14 supermicros (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929) (owner: Elukey)
[13:47:25] <wikibugs>	 (PS2) Sbisson: ArticleGuidance: enable on simple english [mediawiki-config] - https://gerrit.wikimedia.org/r/1282354 (https://phabricator.wikimedia.org/T425351)
[13:47:30] <Emojiwiki>	 stephanebisson: The revert patch was split in two due to cached response concerns. When should I deploy the next change?
[13:47:50] <stephanebisson>	 What is the other patch?
[13:48:05] <Emojiwiki>	 https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1281967/1
[13:48:11] <Emojiwiki>	 removes all the assets from the config repo
[13:48:35] <stephanebisson>	 Can you schedule it for the next window?
[13:48:47] <wikibugs>	 (CR) Elukey: "Tested with kafka-logging1007 from https://phabricator.wikimedia.org/T418929. If you are thinking "lemme test this with one of the other n" [cookbooks] - https://gerrit.wikimedia.org/r/1266257 (https://phabricator.wikimedia.org/T418929) (owner: Elukey)
[13:49:17] <Emojiwiki>	 stephanebisson: The UTC late window is 4 am in my timezone, so I will go for the next UTC morning one
[13:49:37] <stephanebisson>	 That works
[13:49:46] <Emojiwiki>	 thanks, gotta schedule it
[13:50:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops, observability, Patch-For-Review: Q4:rack/setup/install kafka-logging100[6-8] - https://phabricator.wikimedia.org/T418929#11886740 (elukey) >>! In T418929#11885960, @elukey wrote: > Tried to upgrade the BIOS, and then reset the BMC as suggested by the UI. It seems takin...
[13:50:56] <logmsgbot>	 !log sbisson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1281965|zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s)
[13:50:57] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json
[13:50:58] <stashbot>	 T420165: Requesting temporary logo change for zh.wikinews.org - https://phabricator.wikimedia.org/T420165
[13:51:07] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add install5004 [puppet] - https://gerrit.wikimedia.org/r/1282358 (https://phabricator.wikimedia.org/T421863) (owner: Muehlenhoff)
[13:51:37] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Read timed out. (read timeout=4) https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:51:54] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1282354 (https://phabricator.wikimedia.org/T425351) (owner: Sbisson)
[13:51:56] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, May 05 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-it" [mediawiki-config] - https://gerrit.wikimedia.org/r/1281967 (https://phabricator.wikimedia.org/T420165) (owner: 1F616EMO)
[13:52:20] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
[13:52:22] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[13:52:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 765, active_shards: 1378, relocating_shards: 0, initializing_shards: 11, unassigned_shards: 144, delayed_unassigned_shards: 0, number_of_pending_tasks: 31, number_of_in_flight_fetch: 35, task_max_waiting_in_queue_millis: 105716, active_shards_percent_as_number: 89.88910632746249 https://wikitech.wikimedia.org/wiki/Search%23Administration
[13:52:52] <wikibugs>	 (Merged) jenkins-bot: ArticleGuidance: enable on simple english [mediawiki-config] - https://gerrit.wikimedia.org/r/1282354 (https://phabricator.wikimedia.org/T425351) (owner: Sbisson)
[13:53:17] <wikibugs>	 (CR) Eevans: [V:+2 C:+2] Update aqs host list [software/logstash-logback-encoder] - https://gerrit.wikimedia.org/r/1281605 (https://phabricator.wikimedia.org/T412830) (owner: Eevans)
[13:53:18] <logmsgbot>	 !log sbisson@deploy1003 Started scap sync-world: Backport for [[gerrit:1282354|ArticleGuidance: enable on simple english (T425351)]]
[13:53:21] <stashbot>	 T425351: Enable the Article Guidance extension to Simple English Wikipedia - https://phabricator.wikimedia.org/T425351
[13:54:04] <dcausse>	 !log T425301: stopping writes again on cloudelastic, cluster unstable 
[13:54:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:06] <stashbot>	 T425301: The cloudelastic chi cluster is red - https://phabricator.wikimedia.org/T425301
[13:55:00] <logmsgbot>	 !log sbisson@deploy1003 sbisson: Backport for [[gerrit:1282354|ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:55:33] <logmsgbot>	 !log sbisson@deploy1003 sbisson: Continuing with deployment
[13:55:47] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[13:55:58] <logmsgbot>	 !log dcausse@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[13:56:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
[13:56:44] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
[13:56:44] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:56:45] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
[13:56:48] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
[13:57:06] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[13:59:09] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: Degraded RAID on db2157 - https://phabricator.wikimedia.org/T425242#11886769 (Jhancock.wm) @Marostegui it's been replaced. got to skip the dell line since it's out of warranty. lemme know if it all looks good to you.
[13:59:31] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops: Degraded RAID on db2157 - https://phabricator.wikimedia.org/T425242#11886770 (Jhancock.wm) a:Jhancock.wm
[13:59:40] <logmsgbot>	 !log sbisson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282354|ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s)
[13:59:43] <stashbot>	 T425351: Enable the Article Guidance extension to Simple English Wikipedia - https://phabricator.wikimedia.org/T425351
[14:00:01] <wikibugs>	 (CR) Elukey: [C:+1] cumin: use aqs1016 as canary alias [puppet] - https://gerrit.wikimedia.org/r/1281602 (https://phabricator.wikimedia.org/T412830) (owner: Eevans)
[14:00:14] <logmsgbot>	 !log slyngshede@cumin1003 START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, T408892]
[14:00:17] <stashbot>	 T408892: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892
[14:00:20] <logmsgbot>	 !log slyngshede@cumin1003 END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, T408892]
[14:00:44] <logmsgbot>	 !log slyngshede@cumin1003 conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh T408892]
[14:01:05] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json
[14:01:37] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11886785 (SLyngshede-WMF) Minor error in command, should have been:    ` $ ssh cumin1003.eqiad.wmnet $ sudo cookbook sre.dns.admin depool ulsfo -t T408892 -r...
[14:02:27] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9600 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch http://localhost:9600/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9600): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f02b42d1550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:02:27] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f069a7cd550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:02:33] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9400 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch http://localhost:9400/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9400): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f38c11cd550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:02:48] <logmsgbot>	 jmm@cumin2002 makevm (PID 918199) is awaiting input
[14:03:14] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11886801 (SLyngshede-WMF) Depooling command output, for the records:   ` slyngshede@cumin1003:~$ sudo cookbook sre.dns.admin depool ulsfo -t T408892 -r "New...
[14:04:02] <logmsgbot>	 herron@cumin1003 reimage (PID 3973968) is awaiting input
[14:04:16] <manfredi>	 stephanebisson: should we give it another try? 
[14:04:24] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie
[14:04:52] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.move-vlan for host kafka-logging2001
[14:05:25] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: opensearch_2@cloudelastic-omega-eqiad.service on cloudelastic1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:06:29] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: red, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 762, active_shards: 1354, relocating_shards: 0, initializing_shards: 15, unassigned_shards: 164, delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 3, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 88.3235485975212 https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:06:39] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9400 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-omega-eqiad: cluster_name: cloudelastic-omega-eqiad, status: yellow, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 825, active_shards: 1613, relocating_shards: 0, initializing_shards: 3, unassigned_shards: 35, delayed_unassigned_shards: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 97.69836462749849 https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:07:20] <wikibugs>	 (PS1) Herron: kafka-logging2001: update IP and prep for trixie [puppet] - https://gerrit.wikimedia.org/r/1282369 (https://phabricator.wikimedia.org/T422816)
[14:07:48] <wikibugs>	 (CR) Mmartorana: "recheck" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[14:07:54] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.dns.netbox
[14:08:27] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9600 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-psi-eqiad: cluster_name: cloudelastic-psi-eqiad, status: green, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 816, active_shards: 1632, relocating_shards: 2, initializing_shards: 0, unassigned_shards: 0, delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:10:25] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitFailed: opensearch_2@cloudelastic-omega-eqiad.service on cloudelastic1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:11:13] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json
[14:12:56] <jinxer-wm>	 FIRING: CirrusConsumerCloudelasticFlinkJobNotRunning: ...
[14:12:56] <jinxer-wm>	 cirrus_streaming_updater_cloudelastic_consumer in eqiad (k8s) is not running - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerCloudelasticFlinkJobNotRunning
[14:13:16] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
[14:13:22] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
[14:13:22] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:13:22] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:13:26] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[14:13:27] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001
[14:14:32] <wikibugs>	 (PS3) Gehel: feat(sysctl): priority is optional on sysctl::conffile [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301)
[14:14:32] <wikibugs>	 (PS4) Gehel: perf(opensearch): increase 'vm.max_map_count' to 1048576 [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301)
[14:14:37] <wikibugs>	 (CR) Gehel: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[14:15:40] <wikibugs>	 (PS1) Majavah: P:zookeeper: Allow WMCS to use cloud-private FQDNs [puppet] - https://gerrit.wikimedia.org/r/1282372 (https://phabricator.wikimedia.org/T422646)
[14:16:31] <logmsgbot>	 herron@cumin1003 reimage (PID 3973968) is awaiting input
[14:16:36] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001
[14:16:36] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001
[14:16:36] <wikibugs>	 (PS2) DCausse: search: add alt. completion indices to test keyword tokenizer (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/1269465 (https://phabricator.wikimedia.org/T420427)
[14:16:55] <wikibugs>	 (CR) Ebernhardson: [C:+1] search: add alt. completion indices to test keyword tokenizer (2/2) [mediawiki-config] - https://gerrit.wikimedia.org/r/1269465 (https://phabricator.wikimedia.org/T420427) (owner: DCausse)
[14:17:58] <wikibugs>	 (CR) CI reject: [V:-1] P:zookeeper: Allow WMCS to use cloud-private FQDNs [puppet] - https://gerrit.wikimedia.org/r/1282372 (https://phabricator.wikimedia.org/T422646) (owner: Majavah)
[14:18:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[14:19:50] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (DIFF 8 NOOP 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compile" [puppet] - https://gerrit.wikimedia.org/r/1282372 (https://phabricator.wikimedia.org/T422646) (owner: Majavah)
[14:20:17] <wikibugs>	 (PS2) Majavah: P:zookeeper: Allow WMCS to use cloud-private FQDNs [puppet] - https://gerrit.wikimedia.org/r/1282372 (https://phabricator.wikimedia.org/T422646)
[14:20:59] <wikibugs>	 SRE, observability, Patch-For-Review: Observability: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets - https://phabricator.wikimedia.org/T422816#11886853 (herron)
[14:24:55] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (DIFF 8 NOOP 1 CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compile" [puppet] - https://gerrit.wikimedia.org/r/1282372 (https://phabricator.wikimedia.org/T422646) (owner: Majavah)
[14:25:31] <logmsgbot>	 !log pt1979@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh
[14:25:45] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11886863 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6733bed9-572f-4b81-9a71-76b2217ca3b5) set by pt1979@cumin1003 for 4:00:00 on 4 hos...
[14:28:03] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=logging-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[14:28:45] <logmsgbot>	 !log pt1979@cumin1003 DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh
[14:29:17] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[14:29:23] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[14:29:35] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[14:29:41] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[14:30:05] <jouncebot>	 Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1430)
[14:30:37] <logmsgbot>	 !log pt1979@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh
[14:30:44] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11886897 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ea06e422-63a1-4feb-89ac-13f0b89b4956) set by pt1979@cumin1003 for 4:00:00 on 5 hos...
[14:33:26] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
[14:33:35] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json
[14:33:37] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[14:34:44] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
[14:35:23] <wikibugs>	 (PS1) Papaul: Add BGP peering from core routers to switches [homer/public] - https://gerrit.wikimedia.org/r/1282374 (https://phabricator.wikimedia.org/T408892)
[14:36:41] <wikibugs>	 SRE, cloud-services-team, Infrastructure-Foundations, Toolforge: Adjust WMCS Gitlab CI/CD repo to stop using mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423596#11886916 (MoritzMuehlenhoff) p:Triage→Medium
[14:36:43] <wikibugs>	 SRE, dev-images, Infrastructure-Foundations, Release-Engineering-Team (Priority Backlog 📥): Rebuild dev-images using a base image without mirrors.wikimedia.org in the apt sources - https://phabricator.wikimedia.org/T423972#11886917 (MoritzMuehlenhoff) p:Triage→Medium
[14:37:59] <wikibugs>	 SRE: Please add Google Search Console domain verification for wikimediafoundation.org - https://phabricator.wikimedia.org/T424976#11886923 (SCherukuwada) Open→Resolved a:SCherukuwada Ah, I wasn't aware this was already set up. Thank you. Closing this task.
[14:39:03] <wikibugs>	 ops-codfw, SRE, DC-Ops: Power Supply - PS1 Status - issue on wikikube-worker2371:9290 - https://phabricator.wikimedia.org/T425225#11886930 (Jhancock.wm) Open→Resolved a:Jhancock.wm loose cable
[14:39:10] <wikibugs>	 (CR) Ladsgroup: [C:-1] "I think we should do this one by one for two reasons:" [mediawiki-config] - https://gerrit.wikimedia.org/r/1281479 (https://phabricator.wikimedia.org/T421796) (owner: Zabe)
[14:39:20] <wikibugs>	 ops-codfw, SRE, DC-Ops: Power Supply - PS2 Status - issue on wikikube-worker2372:9290 - https://phabricator.wikimedia.org/T425227#11886934 (Jhancock.wm) Open→Resolved a:Jhancock.wm loose cable
[14:39:35] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
[14:41:13] <wikibugs>	 (PS1) Bking: cirrussearch: install atop utility [puppet] - https://gerrit.wikimedia.org/r/1282377 (https://phabricator.wikimedia.org/T424852)
[14:41:17] <logmsgbot>	 !log pt1979@cumin1003 START - Cookbook sre.hosts.remove-downtime for 7 hosts
[14:41:21] <logmsgbot>	 !log pt1979@cumin1003 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
[14:41:43] <wikibugs>	 (CR) CI reject: [V:-1] cirrussearch: install atop utility [puppet] - https://gerrit.wikimedia.org/r/1282377 (https://phabricator.wikimedia.org/T424852) (owner: Bking)
[14:42:14] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json
[14:42:17] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[14:42:44] <wikibugs>	 (CR) Papaul: [C:+2] Add BGP peering from core routers to switches [homer/public] - https://gerrit.wikimedia.org/r/1282374 (https://phabricator.wikimedia.org/T408892) (owner: Papaul)
[14:44:06] <wikibugs>	 (Merged) jenkins-bot: Add BGP peering from core routers to switches [homer/public] - https://gerrit.wikimedia.org/r/1282374 (https://phabricator.wikimedia.org/T408892) (owner: Papaul)
[14:44:32] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 241 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1292, relocating_shards: 0, initializing_shards: 6, unassigned_shard
[14:44:32] <icinga-wm>	 delayed_unassigned_shards: 147, number_of_pending_tasks: 5, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 4487, active_shards_percent_as_number: 84.2791911285062 https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:45:09] <wikibugs>	 (PS1) Ladsgroup: Close Gun Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282381 (https://phabricator.wikimedia.org/T421796)
[14:45:32] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1350, relocating_shards: 0, initializing_shards: 11, unassigned_shards: 172, delayed_unassig
[14:45:32] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 88.06262230919765 https://wikitech.wikimedia.org/wiki/Search%23Administration
[14:45:48] <wikibugs>	 (PS2) Bking: cirrussearch: install atop utility [puppet] - https://gerrit.wikimedia.org/r/1282377 (https://phabricator.wikimedia.org/T424852)
[14:46:30] <wikibugs>	 (CR) Bking: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282377 (https://phabricator.wikimedia.org/T424852) (owner: Bking)
[14:47:36] <wikibugs>	 (PS2) JMeybohm: Revert "envoy: Allow configuring delayed_closed_timeout" [puppet] - https://gerrit.wikimedia.org/r/1282338 (https://phabricator.wikimedia.org/T271421)
[14:47:36] <wikibugs>	 (PS2) JMeybohm: Revert "envoy: Allow disabling circuit breakers" [puppet] - https://gerrit.wikimedia.org/r/1282339 (https://phabricator.wikimedia.org/T271421)
[14:47:36] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Allow disabling x-request-id generation" [puppet] - https://gerrit.wikimedia.org/r/1282340 (https://phabricator.wikimedia.org/T271421)
[14:47:36] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Allow setting http2 protocol options" [puppet] - https://gerrit.wikimedia.org/r/1282341 (https://phabricator.wikimedia.org/T271421)
[14:47:37] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Allow configuring TLS handshake timeout" [puppet] - https://gerrit.wikimedia.org/r/1282342 (https://phabricator.wikimedia.org/T271421)
[14:47:39] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Support TLS min/max version config" [puppet] - https://gerrit.wikimedia.org/r/1282343 (https://phabricator.wikimedia.org/T271421)
[14:47:43] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Support alpn_protocols configuration" [puppet] - https://gerrit.wikimedia.org/r/1282344 (https://phabricator.wikimedia.org/T271421)
[14:47:48] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Provide support for UDS upstreams" [puppet] - https://gerrit.wikimedia.org/r/1282345 (https://phabricator.wikimedia.org/T271421)
[14:47:52] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Add STEK configuration support" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421)
[14:47:55] <Amir1>	 jouncebot: nowandnext
[14:47:55] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1430)
[14:47:55] <jouncebot>	 In 0 hour(s) and 42 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1530)
[14:47:56] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: global_tlsparams" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421)
[14:48:00] <wikibugs>	 (PS2) JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[14:49:01] <wikibugs>	 (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[14:50:16] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1282381 (https://phabricator.wikimedia.org/T421796) (owner: Ladsgroup)
[14:52:22] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json
[14:53:03] <jinxer-wm>	 RESOLVED: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-codfw in codfw - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-kafka_cluster=logging-codfw - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[14:53:07] <wikibugs>	 (CR) Zabe: "kk" [mediawiki-config] - https://gerrit.wikimedia.org/r/1281479 (https://phabricator.wikimedia.org/T421796) (owner: Zabe)
[14:53:18] <wikibugs>	 (Abandoned) Zabe: Close Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1281479 (https://phabricator.wikimedia.org/T421796) (owner: Zabe)
[14:56:22] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Provide support for UDS upstreams" [puppet] - https://gerrit.wikimedia.org/r/1282345 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[14:56:40] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Add STEK configuration support" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[14:57:34] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: global_tlsparams" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[14:58:36] <wikibugs>	 (Merged) jenkins-bot: Close Gun Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282381 (https://phabricator.wikimedia.org/T421796) (owner: Ladsgroup)
[14:58:41] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie
[14:58:50] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282381|Close Gun Wikinews (T421796)]]
[14:58:53] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[14:59:34] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[14:59:55] <wikibugs>	 (CR) Eevans: [C:+2] cumin: use aqs1016 as canary alias [puppet] - https://gerrit.wikimedia.org/r/1281602 (https://phabricator.wikimedia.org/T412830) (owner: Eevans)
[15:00:25] <wikibugs>	 (PS3) JMeybohm: Revert "envoyproxy: Provide support for UDS upstreams" [puppet] - https://gerrit.wikimedia.org/r/1282345 (https://phabricator.wikimedia.org/T271421)
[15:00:25] <wikibugs>	 (PS3) JMeybohm: Revert "envoyproxy: Add STEK configuration support" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421)
[15:00:26] <wikibugs>	 (PS3) JMeybohm: Revert "envoyproxy: global_tlsparams" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421)
[15:00:26] <wikibugs>	 (PS3) JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[15:00:35] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1282381|Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:01:22] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[15:02:30] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
[15:05:00] <wikibugs>	 (PS3) Mmartorana: Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366)
[15:05:35] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282381|Close Gun Wikinews (T421796)]] (duration: 06m 45s)
[15:05:39] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[15:06:43] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[15:07:20] <wikibugs>	 (PS2) Eevans: airflow-main: remove obsolete hosts (from commented entry) [deployment-charts] - https://gerrit.wikimedia.org/r/1281587 (https://phabricator.wikimedia.org/T412830)
[15:07:20] <wikibugs>	 (PS2) Eevans: revise-tone-task-generator: updated list of aqs cassandra nodes [deployment-charts] - https://gerrit.wikimedia.org/r/1281588 (https://phabricator.wikimedia.org/T412830)
[15:07:20] <wikibugs>	 (PS2) Eevans: _aqs2-common_: updated aqs node list [deployment-charts] - https://gerrit.wikimedia.org/r/1281589 (https://phabricator.wikimedia.org/T412830)
[15:08:06] <wikibugs>	 (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282346 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[15:08:55] <wikibugs>	 (CR) CI reject: [V:-1] Revert "envoyproxy: Add dual stack cert support" [puppet] - https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[15:09:49] <wikibugs>	 (PS1) CDanis: systemd::timer::job: validate monotonic triggers with calendar specs [puppet] - https://gerrit.wikimedia.org/r/1282382 (https://phabricator.wikimedia.org/T295284)
[15:09:59] <logmsgbot>	 elukey@cumin1003 provision (PID 4020290) is awaiting input
[15:10:02] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[15:10:13] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[15:10:57] <papaul>	 !log ongoing switch refresh in ULSFO
[15:10:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:01] <wikibugs>	 (CR) Dpogorzelski: lvs: expose grpc port on ml-serve staging (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1282328 (https://phabricator.wikimedia.org/T424049) (owner: Dpogorzelski)
[15:11:39] <wikibugs>	 (CR) Muehlenhoff: "This doesn't sound like the right solution? This define installs the config into /etc/sysctl.d/, which takes precedence over the `/usr/lib" [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[15:12:11] <wikibugs>	 (PS1) Bking: standard_packages: prevent atop package from automatic purges [puppet] - https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[15:12:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
[15:12:41] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[15:13:02] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:13:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
[15:13:06] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
[15:13:12] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
[15:13:27] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, May 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-2" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[15:15:05] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[15:15:51] <jinxer-wm>	 FIRING: ATSBackendErrorsHigh: ATS: elevated 5xx errors from eventgate-logging-external.discovery.wmnet in codfw #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=codfw&var-cluster=text&var-origin=eventgate-logging-external.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHigh
[15:16:05] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[15:16:10] <herron>	 this again eh
[15:16:30] <elukey>	 herron: yeah probably :D
[15:16:33] <jhathaway>	 o/
[15:17:04] <wikibugs>	 (PS1) Ladsgroup: Close Greek Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282384 (https://phabricator.wikimedia.org/T421796)
[15:17:25] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/admin 'sync'.
[15:17:34] <icinga-wm>	 RECOVERY - Dell PowerEdge or Supermicro Broadcom RAID Controller on db2157 is OK: communication: 0 OK : controller: 0 OK : physical_disk: 0 OK : virtual_disk: 0 OK : bbu: 0 OK : enclosure: 0 OK https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
[15:17:50] <icinga-wm>	 PROBLEM - Host ps1-23-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[15:17:52] <icinga-wm>	 PROBLEM - Host ps1-22-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[15:17:57] <wikibugs>	 (PS1) Mmartorana: Revert^2 "Use js promise for email confirmation banner" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1282385
[15:17:59] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'sync'.
[15:18:34] <icinga-wm>	 RECOVERY - Host ps1-23-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 72.12 ms
[15:18:34] <icinga-wm>	 RECOVERY - Host ps1-22-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 75.71 ms
[15:18:47] <elukey>	 synced the outstanding changes, but it was only kafka-logging2001 afaics
[15:18:50] <wikibugs>	 (CR) Gehel: "I have a different read of that man page:" [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[15:19:11] <jinxer-wm>	 FIRING: [8x] GanetiBGPDown: BGP session down between ganeti4005 and cr3-ulsfo - group Ganeti4 - https://wikitech.wikimedia.org/wiki/Ganeti#GanetiBGPDown  - https://alerts.wikimedia.org/?q=alertname%3DGanetiBGPDown
[15:19:11] <herron>	 thanks elukey, yeah sounds right. I guess this should get run with each host
[15:19:43] <elukey>	 in theory no, eventgate should have a list of hostnames to check and fallback to those 
[15:19:47] <logmsgbot>	 elukey@cumin1003 provision (PID 4026055) is awaiting input
[15:20:07] <logmsgbot>	 !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[15:20:37] <herron>	 this is the theory I was working off as well but it paged
[15:20:50] <wikibugs>	 (CR) Dzahn: [C:+2] scap.cfg.erb: Remove unused canary_service setting [puppet] - https://gerrit.wikimedia.org/r/1281606 (owner: Ahmon Dancy)
[15:20:51] <jinxer-wm>	 RESOLVED: ATSBackendErrorsHigh: ATS: elevated 5xx errors from eventgate-logging-external.discovery.wmnet in codfw #page - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server#Debugging - https://grafana.wikimedia.org/d/1T_4O08Wk/ats-backends-origin-servers-overview?orgId=1&viewPanel=12&var-site=codfw&var-cluster=text&var-origin=eventgate-logging-external.discovery.wmnet - https://alerts.wikimedia.org/?q=alertname%3DATSBackendErrorsHi
[15:21:11] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, May 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-2" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1282385 (owner: Mmartorana)
[15:22:18] <icinga-wm>	 PROBLEM - Host bast4006 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:26] <icinga-wm>	 PROBLEM - Host doh4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:26] <icinga-wm>	 PROBLEM - Host doh4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:30] <icinga-wm>	 PROBLEM - Host durum4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:30] <icinga-wm>	 PROBLEM - Host hcaptcha-proxy4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:30] <icinga-wm>	 PROBLEM - Host hcaptcha-proxy4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:30] <icinga-wm>	 PROBLEM - Host install4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:31] <logmsgbot>	 !log fceratto@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
[15:22:32] <icinga-wm>	 PROBLEM - Host durum4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:39] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Depooling db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
[15:22:40] <icinga-wm>	 PROBLEM - Host tcp-proxy4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:40] <icinga-wm>	 PROBLEM - Host tcp-proxy4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:42] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[15:22:48] <icinga-wm>	 PROBLEM - Host ncredir4004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:22:58] <wikibugs>	 (CR) Muehlenhoff: "I checked atop in bullseye and it still defaults to "-R", which would reintroduce the past error. But I also checked bookworm and trixie a" [puppet] - https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: Bking)
[15:23:00] <icinga-wm>	 PROBLEM - Host ps1-23-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:02] <icinga-wm>	 PROBLEM - Host ps1-22-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:12] <icinga-wm>	 PROBLEM - Host netflow4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:12] <icinga-wm>	 PROBLEM - Host ncredir4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:12] <icinga-wm>	 PROBLEM - Host prometheus4003 is DOWN: PING CRITICAL - Packet loss = 100%
[15:23:50] <icinga-wm>	 PROBLEM - VRRP status on cr3-ulsfo is CRITICAL: VRRP CRITICAL - 2 inconsistent interfaces, 0 misconfigured interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
[15:24:50] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
[15:26:01] <wikibugs>	 (PS1) Eevans: decommission aqs101[0-2,4-5] [puppet] - https://gerrit.wikimedia.org/r/1282386 (https://phabricator.wikimedia.org/T425357)
[15:26:44] <wikibugs>	 (CR) Dzahn: "sorry, I can't really review this or have knowledge of it" [puppet] - https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: Andrew Bogott)
[15:29:11] <jinxer-wm>	 RESOLVED: [8x] GanetiBGPDown: BGP session down between ganeti4005 and cr3-ulsfo - group Ganeti4 - https://wikitech.wikimedia.org/wiki/Ganeti#GanetiBGPDown  - https://alerts.wikimedia.org/?q=alertname%3DGanetiBGPDown
[15:30:04] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1282060 (owner: Chlod Alejandro)
[15:30:04] <jouncebot>	 jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1530).
[15:30:40] <elukey>	 herron, jhathaway - the only suspicion that I have in mind is that eventgate may keep long tcp sessions until explicitly roll-restarted, so one attempt could be to roll restart the pods and see the next reimage 
[15:31:12] <wikibugs>	 (Merged) jenkins-bot: Make errorpages responsive [mediawiki-config] - https://gerrit.wikimedia.org/r/1282060 (owner: Chlod Alejandro)
[15:31:29] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282060|Make errorpages responsive]]
[15:32:13] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
[15:32:36] <logmsgbot>	 !log elukey@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
[15:32:47] <elukey>	 herron, jhathaway done
[15:33:00] <elukey>	 `helmfile -e codfw --state-values-set roll_restart=1 sync`
[15:33:10] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup, chlod: Backport for [[gerrit:1282060|Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[15:33:23] <jhathaway>	 thanks elukey 
[15:33:30] <logmsgbot>	 !log ayounsi@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
[15:33:39] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11887131 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=241a7848-479d-48b2-8824-9a08c17249ab) set by ayounsi@cumin1003 for 20:00:00 on 39...
[15:34:15] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup, chlod: Continuing with deployment
[15:34:18] <wikibugs>	 sre-alert-triage, Data-Platform-SRE, ServiceOps new: Alert in need of triage: Kafka MirrorMaker main-codfw_to_main-eqiad dropped message count in last 30m (instance alert1002) - https://phabricator.wikimedia.org/T425339#11887133 (JMeybohm) We already tried to remove the icinga alerts completely: http...
[15:34:18] <wikibugs>	 (CR) CI reject: [V:-1] Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[15:34:24] <wikibugs>	 (CR) Gehel: "Check my tests:" [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[15:34:59] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
[15:35:17] <wikibugs>	 (CR) Federico Ceratto: "The deletions match the description." [puppet] - https://gerrit.wikimedia.org/r/1282386 (https://phabricator.wikimedia.org/T425357) (owner: Eevans)
[15:35:21] <wikibugs>	 (CR) Federico Ceratto: [C:+1] decommission aqs101[0-2,4-5] [puppet] - https://gerrit.wikimedia.org/r/1282386 (https://phabricator.wikimedia.org/T425357) (owner: Eevans)
[15:35:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:37:43] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Q3 :rack/setup/install cloudvirt refresh - https://phabricator.wikimedia.org/T425088#11887152 (elukey) It seems to me that the BMC is not getting an IP address, but for cloudvirt1078 I see:  ` elukey@install1005:~$ sudo journalctl -u isc-dhcp-server.service --since '2 hours ag...
[15:38:29] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282060|Make errorpages responsive]] (duration: 06m 59s)
[15:39:02] <wikibugs>	 (PS2) Elukey: Add Wikifunctions' evaluator ingress endpoints to service.yaml [puppet] - https://gerrit.wikimedia.org/r/1280433 (https://phabricator.wikimedia.org/T424193)
[15:40:22] <wikibugs>	 (CR) Elukey: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1280433 (https://phabricator.wikimedia.org/T424193) (owner: Elukey)
[15:41:21] <wikibugs>	 (CR) Jcrespo: "Particularly, I would read https://phabricator.wikimedia.org/T192551#4157551 which provided 3 recommended ways of solving the issue, and #" [puppet] - https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: Bking)
[15:41:38] <Amir1>	 jouncebot: nowandnext
[15:41:38] <jouncebot>	 For the next 0 hour(s) and 18 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1530)
[15:41:38] <jouncebot>	 In 1 hour(s) and 18 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1700)
[15:41:38] <jouncebot>	 In 1 hour(s) and 18 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1700)
[15:42:51] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[15:43:50] <wikibugs>	 (CR) Elukey: [C:+2] profile::prometheus::alerts: fix alerts titles [puppet] - https://gerrit.wikimedia.org/r/1282337 (owner: Elukey)
[15:44:39] <wikibugs>	 (CR) JMeybohm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421) (owner: JMeybohm)
[15:45:07] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
[15:51:00] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops, decommission-hardware: decommission db2146.codfw.wmnet - https://phabricator.wikimedia.org/T424189#11887199 (Jhancock.wm) Open→Resolved
[15:51:28] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops, decommission-hardware: decommission db2147.codfw.wmnet - https://phabricator.wikimedia.org/T424226#11887205 (Jhancock.wm) Open→Resolved
[15:52:50] <wikibugs>	 (PS1) Elukey: admin_ng: move cfssl-issuer on ml-staging-codfw to pki1002 [deployment-charts] - https://gerrit.wikimedia.org/r/1282389
[15:52:51] <wikibugs>	 ops-codfw, SRE, DC-Ops, decommission-hardware: decommission db2141.codfw.wmnet - https://phabricator.wikimedia.org/T424327#11887214 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[15:53:32] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops, decommission-hardware: decommission pc2012.codfw.wmnet - https://phabricator.wikimedia.org/T424201#11887224 (Jhancock.wm) Open→Resolved
[15:54:24] <wikibugs>	 ops-codfw, SRE, DBA, DC-Ops, decommission-hardware: decommission pc2011.codfw.wmnet - https://phabricator.wikimedia.org/T424012#11887241 (Jhancock.wm) Open→Resolved
[15:55:09] <wikibugs>	 (PS1) AKhatun: stream: change source to only eqiad in mw-page-html-content-change-enrich [deployment-charts] - https://gerrit.wikimedia.org/r/1282390 (https://phabricator.wikimedia.org/T425362)
[15:55:15] <logmsgbot>	 !log fceratto@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
[15:55:18] <stashbot>	 T419635: Drop il_to column from imagelinks table in wmf production - https://phabricator.wikimedia.org/T419635
[15:56:09] <wikibugs>	 (03PS2) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[15:56:45] <wikibugs>	 (03PS1) 10Elukey: Move pki.discovery.wmnet's eqiad endpoint to pki1002 [puppet] - 10https://gerrit.wikimedia.org/r/1282391 (https://phabricator.wikimedia.org/T416664)
[15:57:09] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282384 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[15:58:08] <wikibugs>	 (03Merged) 10jenkins-bot: Close Greek Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282384 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[15:58:24] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282384|Close Greek Wikinews (T421796)]]
[15:58:26] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[15:58:31] <wikibugs>	 (03PS2) 10Elukey: Move pki.discovery.wmnet's eqiad endpoint to pki1002 [puppet] - 10https://gerrit.wikimedia.org/r/1282391 (https://phabricator.wikimedia.org/T416664)
[15:59:00] <wikibugs>	 (03CR) 10DCausse: [C:03+1] "should be ready to go, happy to help with the deploy if you want" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468) (owner: 10Neriah)
[16:00:05] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1282384|Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:00:29] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[16:00:33] <wikibugs>	 (03Abandoned) 10Elukey: admin_ng: move cfssl-issuer on ml-staging-codfw to pki1002 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282389 (owner: 10Elukey)
[16:01:17] <wikibugs>	 (03PS4) 10JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[16:01:26] <wikibugs>	 (03CR) 10Bking: "@jaime" [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[16:03:08] <wikibugs>	 (03CR) 10Jcrespo: "As long as it doesn't go into the "it is installed everywhere" side, and you are aware of the performance impact, no problems from me." [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[16:04:43] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282384|Close Greek Wikinews (T421796)]] (duration: 06m 19s)
[16:04:46] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[16:05:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "envoyproxy: Add dual stack cert support" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: 10JMeybohm)
[16:09:21] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:09:23] <wikibugs>	 (03CR) 10Neriah: "Yes, I'd be happy to. It's a bit hard for me to adjust to the deployment schedules...😊" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468) (owner: 10Neriah)
[16:09:27] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421) (owner: 10JMeybohm)
[16:17:11] <wikibugs>	 (03PS5) 10JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[16:20:47] <wikibugs>	 (03CR) 10Ottomata: [C:03+1] stream: change source to only eqiad in mw-page-html-content-change-enrich [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282390 (https://phabricator.wikimedia.org/T425362) (owner: 10AKhatun)
[16:21:16] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Revert "envoyproxy: Add dual stack cert support" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: 10JMeybohm)
[16:23:09] <wikibugs>	 (03PS1) 10Dzahn: gerrit: replace RSA ssh host key with new ed25519 key [puppet] - 10https://gerrit.wikimedia.org/r/1282395 (https://phabricator.wikimedia.org/T240266)
[16:23:53] <wikibugs>	 (03CR) 10Dzahn: "also see details at https://phabricator.wikimedia.org/T240266#11887287" [puppet] - 10https://gerrit.wikimedia.org/r/1282395 (https://phabricator.wikimedia.org/T240266) (owner: 10Dzahn)
[16:26:24] <wikibugs>	 (03CR) 10AKhatun: [C:03+2] stream: change source to only eqiad in mw-page-html-content-change-enrich [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282390 (https://phabricator.wikimedia.org/T425362) (owner: 10AKhatun)
[16:28:40] <wikibugs>	 (03Merged) 10jenkins-bot: stream: change source to only eqiad in mw-page-html-content-change-enrich [deployment-charts] - 10https://gerrit.wikimedia.org/r/1282390 (https://phabricator.wikimedia.org/T425362) (owner: 10AKhatun)
[16:32:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[16:32:55] <wikibugs>	 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 13Patch-Needs-Improvement: Some SAL log entries (e.g. switchdc, scap backport) are getting cut off because long lines are being split over IRC - https://phabricator.wikimedia.org/T285709#11887352 (10A_smart_kitten)
[16:33:06] <wikibugs>	 (03CR) 10Muehlenhoff: standard_packages: prevent atop package from automatic purges (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[16:33:31] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:33:44] <logmsgbot>	 !log ebernhardson@deploy1003 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
[16:33:49] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:34:21] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:52] <logmsgbot>	 !log ebernhardson@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:35:41] <wikibugs>	 (03CR) 10Hashar: "I forgot this morning: I removed this change from the local Puppet server since that caused Puppet agent to fail on the instances." [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[16:35:52] <wikibugs>	 (03PS2) 10CDanis: haproxy: webrequest: capture ratelimiting headers [puppet] - 10https://gerrit.wikimedia.org/r/1279465 (https://phabricator.wikimedia.org/T419736)
[16:35:55] <wikibugs>	 (03CR) 10CDanis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1279465 (https://phabricator.wikimedia.org/T419736) (owner: 10CDanis)
[16:37:57] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[16:38:05] <jinxer-wm>	 RESOLVED: CirrusConsumerCloudelasticFlinkJobNotRunning: ...
[16:38:11] <jinxer-wm>	 cirrus_streaming_updater_cloudelastic_consumer in eqiad (k8s) is not running - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerCloudelasticFlinkJobNotRunning
[16:39:23] <wikibugs>	 (03PS1) 10HakanIST: sectionCollapsing: Scroll to fragment target on init [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282397 (https://phabricator.wikimedia.org/T425290)
[16:39:50] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:40:05] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[16:42:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUns
[16:43:31] <wikibugs>	 (03PS2) 10HakanIST: sectionCollapsing: Scroll to fragment target on init [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282397 (https://phabricator.wikimedia.org/T425290)
[16:50:05] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sectionCollapsing: Scroll to fragment target on init [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282397 (https://phabricator.wikimedia.org/T425290) (owner: 10HakanIST)
[16:51:59] <wikibugs>	 (03CR) 10JMeybohm: [V:03+1] "PCC SUCCESS (DIFF 48 CORE_DIFF 2 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compil" [puppet] - 10https://gerrit.wikimedia.org/r/1282347 (https://phabricator.wikimedia.org/T271421) (owner: 10JMeybohm)
[17:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1700)
[17:00:05] <jouncebot>	 ryankemper: #bothumor My software never has bugs. It just develops random features. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T1700).
[17:00:19] <wikibugs>	 (03PS2) 10CDanis: systemd::timer::job: validate monotonic triggers with calendar specs [puppet] - 10https://gerrit.wikimedia.org/r/1282382 (https://phabricator.wikimedia.org/T295284)
[17:00:45] <wikibugs>	 (03PS6) 10JMeybohm: Revert "envoyproxy: Add dual stack cert support" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421)
[17:01:42] <wikibugs>	 (03PS3) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:02:58] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:02:58] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:03:06] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:03:25] <wikibugs>	 (03PS4) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:03:31] <jinxer-wm>	 FIRING: Outbound discards: Alert for device asw2-b-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[17:03:43] <icinga-wm>	 PROBLEM - Host cr4-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[17:03:43] <icinga-wm>	 PROBLEM - Host cr4-ulsfo IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[17:04:06] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:04:09] <wikibugs>	 (03PS5) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:04:20] <jinxer-wm>	 FIRING: CirrusSearchSaneitizerFixRateTooHigh: MediaWiki CirrusSearch Saneitizer is fixing an abnormally high number of documents in cloudelastic - https://wikitech.wikimedia.org/wiki/Search/CirrusStreamingUpdater#San(e)itizing - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?viewPanel=59&orgId=1&from=now-6M&to=now&var-search_cluster=cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchSaneitizerFixRateT
[17:04:34] <jhathaway>	 !incidents
[17:04:34] <sirenbot>	 7898 (ACKED)  Host cr4-ulsfo
[17:04:35] <sirenbot>	 7897 (RESOLVED)  ATSBackendErrorsHigh cache_text sre (eventgate-logging-external.discovery.wmnet codfw)
[17:04:35] <sirenbot>	 7894 (RESOLVED)  [5x] ATSBackendErrorsHigh cache_text sre (performance.discovery.wmnet)
[17:04:41] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:04:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:05:00] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:05:08] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:05:11] <icinga-wm>	 RECOVERY - Host cr4-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 71.49 ms
[17:05:43] <jinxer-wm>	 FIRING: [6x] CoreBGPDown: Core BGP session down between cr1-codfw and cr4-ulsfo (198.35.26.129) - group Confed_ulsfo - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[17:05:53] <wikibugs>	 (03PS3) 10CDanis: systemd::timer::job: silently translate calendar keywords on monotonic triggers [puppet] - 10https://gerrit.wikimedia.org/r/1282382 (https://phabricator.wikimedia.org/T295284)
[17:06:05] <wikibugs>	 (03PS6) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:06:07] <jhathaway>	 XioNoX: fyi just got a page for cr4-ulsfo being down, missed silence I assume?
[17:06:36] <XioNoX>	 jhathaway: nah, cr3 and cr4 are not supposed to be impacted by the maintenance
[17:06:46] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:06:50] <XioNoX>	 but there is no impact, as ulsfo is depooled
[17:06:58] <XioNoX>	 papaul: ^
[17:07:08] <jhathaway>	 got it, thanks
[17:07:31] <papaul>	 XioNoX: ack
[17:08:44] <icinga-wm>	 RECOVERY - Host cr4-ulsfo IPv6 is UP: PING OK - Packet loss = 0%, RTA = 71.51 ms
[17:09:51] <jinxer-wm>	 RESOLVED: CoreRouterInterfaceDown: Core router interface down - cr1-codfw:xe-1/1/1:0 (Transport: cr4-ulsfo:xe-0/1/1 (Lumen, 442550294) {#12252_12295-1}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:10:43] <jinxer-wm>	 RESOLVED: [7x] CoreBGPDown: Core BGP session down between cr1-codfw and cr4-ulsfo (198.35.26.129) - group Confed_ulsfo - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[17:10:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[17:10:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[17:11:01] <wikibugs>	 (03CR) 10JMeybohm: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282348 (https://phabricator.wikimedia.org/T271421) (owner: 10JMeybohm)
[17:15:48] <wikibugs>	 (03PS7) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:16:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:16:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[17:17:14] <wikibugs>	 (03PS4) 10CDanis: systemd::timer::job: support `hourly` & friends [puppet] - 10https://gerrit.wikimedia.org/r/1282382 (https://phabricator.wikimedia.org/T295284)
[17:18:31] <jinxer-wm>	 FIRING: [2x] Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[17:19:46] <wikibugs>	 (03PS8) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:20:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:23:01] <wikibugs>	 (03PS9) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:23:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-eqiad:et-1/1/2 (Transport: cr1-codfw:et-1/0/2 (Arelion, IC-374549) {#20231106}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:24:13] <wikibugs>	 (03PS10) 10Bking: standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551)
[17:25:37] <wikibugs>	 06SRE: Please add Google Search Console domain verification for wikimediafoundation.org - https://phabricator.wikimedia.org/T424976#11887477 (10Aklapper) →14Duplicate dup:03T404974
[17:25:40] <wikibugs>	 06SRE, 06Traffic: [Search Console Verification DNS Request] - {{wikimediafoundation.org}} - https://phabricator.wikimedia.org/T404974#11887479 (10Aklapper)
[17:26:48] <wikibugs>	 (03CR) 10Bking: standard_packages: prevent atop package from automatic purges (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:28:26] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:28:51] <jinxer-wm>	 FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[17:30:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[17:30:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[17:31:38] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:31:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[17:32:42] <wikibugs>	 (03PS1) 10Bking: cloudelastic: explicitly disable security plugin [puppet] - 10https://gerrit.wikimedia.org/r/1282399 (https://phabricator.wikimedia.org/T424852)
[17:34:07] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:36:01] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282399 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[17:37:59] <wikibugs>	 (03CR) 10Bking: [C:03+2] standard_packages: prevent atop package from automatic purges [puppet] - 10https://gerrit.wikimedia.org/r/1282383 (https://phabricator.wikimedia.org/T192551) (owner: 10Bking)
[17:38:01] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:38:03] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:38:05] <logmsgbot>	 !log jclark@cumin1003 START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:38:54] <wikibugs>	 (03CR) 10Ryan Kemper: [C:03+1] cloudelastic: explicitly disable security plugin [puppet] - 10https://gerrit.wikimedia.org/r/1282399 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[17:41:09] <wikibugs>	 (03CR) 10Bking: [C:03+2] cloudelastic: explicitly disable security plugin [puppet] - 10https://gerrit.wikimedia.org/r/1282399 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[17:41:16] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:41:19] <wikibugs>	 (03CR) 10Mmartorana: "recheck" [core] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: 10Mmartorana)
[17:46:40] <icinga-wm>	 PROBLEM - OSPF status on cr1-drmrs is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[17:47:20] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:47:39] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:49:14] <logmsgbot>	 !log jclark@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
[17:53:26] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Q3 :rack/setup/install cloudvirt refresh - https://phabricator.wikimedia.org/T425088#11887548 (10Jclark-ctr) @elukey It looks like the servers powered themselves off. I power-cycled them again. They’re still failing, but they’re getting farther in the provisioning process.  ` Ru...
[17:53:48] <wikibugs>	 (03CR) 10Mmartorana: [C:03+1] "This backport previously passed CI and has a verified in the history. The original patch is already merged on the train, and this change d" [core] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: 10Mmartorana)
[18:02:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[18:02:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[18:06:01] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[18:06:26] <dancy>	 jouncebot now
[18:06:27] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 53 minute(s)
[18:06:52] <logmsgbot>	 !log dancy@deploy1003 Installing scap version "4.260.0" for 2 host(s)
[18:07:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[18:07:50] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[18:08:45] <logmsgbot>	 !log dancy@deploy1003 Installation of scap version "4.260.0" completed for 2 hosts
[18:09:20] <logmsgbot>	 !log dancy@deploy1003 Started scap sync-world: testing
[18:10:30] <logmsgbot>	 !log dancy@deploy1003 dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:10:43] <wikibugs>	 (03PS1) 10Andrew Bogott: Initial entries for cloudvirt1077-1080 [puppet] - 10https://gerrit.wikimedia.org/r/1282402 (https://phabricator.wikimedia.org/T425088)
[18:10:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[18:11:00] <logmsgbot>	 !log dancy@deploy1003 dancy: Rolling back deployment
[18:11:15] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[18:11:24] <logmsgbot>	 !log dancy@deploy1003 Finished scap sync-world: testing (duration: 02m 04s)
[18:14:11] <wikibugs>	 (03CR) 10HakanIST: "recheck" [extensions/MobileFrontend] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282397 (https://phabricator.wikimedia.org/T425290) (owner: 10HakanIST)
[18:16:23] <wikibugs>	 (03PS1) 10Ladsgroup: Close Albanian Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282405 (https://phabricator.wikimedia.org/T421796)
[18:17:00] <wikibugs>	 10ops-esams, 06SRE, 06Commons, 06DC-Ops, and 3 others: ESAMS serving an older revision of some overwritten files - https://phabricator.wikimedia.org/T425216#11887577 (10AlexisJazz) https://commons.wikimedia.org/wiki/File:Hana_Vagnerov%C3%A1_v_Show_Jana_Krause_19._5._2021_upout%C3%A1vka_10.png and https://c...
[18:18:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-codfw:et-1/0/2 (Transport: cr1-eqiad:et-1/1/2 (Arelion, IC-374549) {#12267}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[18:20:23] <Amir1>	 jouncebot: nowandnext
[18:20:23] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 39 minute(s)
[18:20:23] <jouncebot>	 In 1 hour(s) and 39 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2000)
[18:20:56] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282405 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[18:21:54] <wikibugs>	 (03Merged) 10jenkins-bot: Close Albanian Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282405 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[18:22:11] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282405|Close Albanian Wikinews (T421796)]]
[18:22:15] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[18:23:53] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1282405|Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:25:23] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, May 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-2" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468) (owner: 10Neriah)
[18:25:24] <icinga-wm>	 PROBLEM - Host asw2-ulsfo is DOWN: PING CRITICAL - Packet loss = 100%
[18:27:08] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[18:31:28] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282405|Close Albanian Wikinews (T421796)]] (duration: 09m 17s)
[18:31:31] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[18:31:52] <icinga-wm>	 PROBLEM - Host mr1-ulsfo IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[18:38:49] <wikibugs>	 (03PS1) 10Ladsgroup: Close Limburgish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282407 (https://phabricator.wikimedia.org/T421796)
[18:42:25] <wikibugs>	 (03CR) 10Neriah: [C:03+1] Close Limburgish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282407 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[18:48:33] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] Initial entries for cloudvirt1077-1080 [puppet] - 10https://gerrit.wikimedia.org/r/1282402 (https://phabricator.wikimedia.org/T425088) (owner: 10Andrew Bogott)
[18:52:26] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282407 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[18:53:19] <wikibugs>	 (03Merged) 10jenkins-bot: Close Limburgish Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282407 (https://phabricator.wikimedia.org/T421796) (owner: 10Ladsgroup)
[18:53:26] <wikibugs>	 (03PS7) 10Andrew Bogott: Add new class, labs_lvm_ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258)
[18:53:27] <wikibugs>	 (03PS6) 10Andrew Bogott: Remove profile::wmcs::lvm [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258)
[18:53:27] <wikibugs>	 (03PS1) 10Andrew Bogott: labs_lvm: use ensure_packages so this can coexist with other lvm rules [puppet] - 10https://gerrit.wikimedia.org/r/1282408
[18:53:35] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282407|Close Limburgish Wikinews (T421796)]]
[18:53:38] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[18:55:17] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1282407|Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:55:29] <wikibugs>	 (03PS1) 10Bking: cloudelastic: remove systemd override that uses PrivateMounts [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852)
[18:55:40] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[18:55:41] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[18:56:28] <wikibugs>	 (03PS8) 10Andrew Bogott: Add new class, labs_lvm_ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258)
[18:56:28] <wikibugs>	 (03PS7) 10Andrew Bogott: Remove profile::wmcs::lvm [puppet] - 10https://gerrit.wikimedia.org/r/1282007 (https://phabricator.wikimedia.org/T422258)
[18:57:46] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[18:59:51] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282407|Close Limburgish Wikinews (T421796)]] (duration: 06m 16s)
[18:59:53] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[19:00:57] <wikibugs>	 (03PS2) 10Bking: cloudelastic: Disable systemd override that uses PrivateMounts [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852)
[19:01:20] <wikibugs>	 (03CR) 10Andrew Bogott: "I think this is good now, I tested it on 10 hosts:" [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[19:01:30] <wikibugs>	 (03CR) 10CI reject: [V:04-1] cloudelastic: Disable systemd override that uses PrivateMounts [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[19:02:28] <wikibugs>	 (03PS3) 10Bking: cloudelastic: Disable systemd override that uses PrivateMounts [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852)
[19:02:58] <cjming>	 jouncebot: nowandnext
[19:02:58] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 57 minute(s)
[19:02:58] <jouncebot>	 In 0 hour(s) and 57 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2000)
[19:03:29] <wikibugs>	 (03PS1) 10Neriah: Close Hebrew Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282410 (https://phabricator.wikimedia.org/T421796)
[19:05:43] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[19:06:18] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[19:06:25] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[19:07:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[19:07:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[19:08:33] <wikibugs>	 (03CR) 10Andrew Bogott: "well that paste didn't work at all, but I get vd-second-local-disk on /dev/sda for several VMs." [puppet] - 10https://gerrit.wikimedia.org/r/1282006 (https://phabricator.wikimedia.org/T422258) (owner: 10Andrew Bogott)
[19:11:07] <wikibugs>	 (03CR) 10Ladsgroup: "Thanks. I need to clean their DPL stuff first. Give me a bit. It'd be also better to also remove unneeded stuff from IS.php too (like RC p" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282410 (https://phabricator.wikimedia.org/T421796) (owner: 10Neriah)
[19:15:14] <wikibugs>	 (03CR) 10Bking: [C:03+2] cloudelastic: Disable systemd override that uses PrivateMounts [puppet] - 10https://gerrit.wikimedia.org/r/1282409 (https://phabricator.wikimedia.org/T424852) (owner: 10Bking)
[19:23:08] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
[19:23:11] <stashbot>	 T424852: Investigate performance issues in cloudelastic - https://phabricator.wikimedia.org/T424852
[19:23:25] <logmsgbot>	 !log root@deploy1003 helmfile [eqiad] START helmfile.d/admin 'sync'.
[19:23:36] <logmsgbot>	 !log bking@cumin2002 END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
[19:23:40] <logmsgbot>	 !log root@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'sync'.
[19:27:22] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie
[19:27:28] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1010 is CRITICAL: CRITICAL - elasticsearch inactive shards 270 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1263, relocating_shards: 0, initializing_shards: 25, unassigned_shar
[19:27:28] <icinga-wm>	  delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.38747553816047 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:27:28] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1009 is CRITICAL: CRITICAL - elasticsearch inactive shards 270 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1263, relocating_shards: 0, initializing_shards: 25, unassigned_shar
[19:27:28] <icinga-wm>	  delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.38747553816047 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:27:28] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9600 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch http://localhost:9600/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9600): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7feec57f9550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitec
[19:27:28] <icinga-wm>	 dia.org/wiki/Search%23Administration
[19:27:32] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1007 is CRITICAL: CRITICAL - elasticsearch inactive shards 268 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1265, relocating_shards: 0, initializing_shards: 25, unassigned_shar
[19:27:32] <icinga-wm>	  delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.51793868232224 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:27:38] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1011 is CRITICAL: CRITICAL - elasticsearch inactive shards 267 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1266, relocating_shards: 0, initializing_shards: 25, unassigned_shar
[19:27:38] <icinga-wm>	  delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.58317025440313 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:27:38] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9400 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch http://localhost:9400/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9400): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f521aaad550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitec
[19:27:38] <icinga-wm>	 dia.org/wiki/Search%23Administration
[19:27:43] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.move-vlan for host kafka-logging1005
[19:27:43] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005
[19:27:46] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1012 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f55ec8c9550: Failed to establish a new connection: [Errno 111] Connection refused)) https://wikitec
[19:27:46] <icinga-wm>	 dia.org/wiki/Search%23Administration
[19:27:46] <icinga-wm>	 PROBLEM - OpenSearch health check for shards on 9200 on cloudelastic1008 is CRITICAL: CRITICAL - elasticsearch inactive shards 266 threshold =0.15 breach: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1267, relocating_shards: 0, initializing_shards: 25, unassigned_shar
[19:27:46] <icinga-wm>	  delayed_unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 82.64840182648402 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:28:29] <inflatador>	 ^^ sorry for the noise, just suppressed these alerts
[19:28:46] <logmsgbot>	 !log bking@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting
[19:31:28] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1010 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1308, relocating_shards: 0, initializing_shards: 24, unassigned_shards: 201, delayed_unassig
[19:31:28] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 85.32289628180038 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:31:28] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1009 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1308, relocating_shards: 0, initializing_shards: 24, unassigned_shards: 201, delayed_unassig
[19:31:28] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 85.32289628180038 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:31:32] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1007 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1308, relocating_shards: 0, initializing_shards: 24, unassigned_shards: 201, delayed_unassig
[19:31:32] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 85.32289628180038 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:31:36] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1011 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1311, relocating_shards: 0, initializing_shards: 24, unassigned_shards: 198, delayed_unassig
[19:31:36] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 85.51859099804305 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:31:46] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1008 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 5, number_of_data_nodes: 5, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 766, active_shards: 1312, relocating_shards: 0, initializing_shards: 24, unassigned_shards: 197, delayed_unassig
[19:31:46] <icinga-wm>	 ds: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 85.58382257012394 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:32:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[19:32:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[19:35:15] <wikibugs>	 (03PS1) 10Herron: kafka-logging1005: prep for trixie [puppet] - 10https://gerrit.wikimedia.org/r/1282412 (https://phabricator.wikimedia.org/T417001)
[19:35:44] <jinxer-wm>	 FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:36:29] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9600 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-psi-eqiad: cluster_name: cloudelastic-psi-eqiad, status: green, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 818, active_shards: 1641, relocating_shards: 2, initializing_shards: 0, unassigned_shards: 0, delayed_unassigned_
[19:36:29] <icinga-wm>	 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:37:07] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
[19:37:10] <stashbot>	 T424852: Investigate performance issues in cloudelastic - https://phabricator.wikimedia.org/T424852
[19:38:37] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9400 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-omega-eqiad: cluster_name: cloudelastic-omega-eqiad, status: green, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 827, active_shards: 1660, relocating_shards: 0, initializing_shards: 0, unassigned_shards: 0, delayed_unassig
[19:38:37] <icinga-wm>	 ds: 0, number_of_pending_tasks: 1, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 100.0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:38:49] <icinga-wm>	 RECOVERY - OpenSearch health check for shards on 9200 on cloudelastic1012 is OK: OK - elasticsearch status cloudelastic-chi-eqiad: cluster_name: cloudelastic-chi-eqiad, status: yellow, timed_out: False, number_of_nodes: 6, number_of_data_nodes: 6, discovered_master: True, discovered_cluster_manager: True, active_primary_shards: 768, active_shards: 1408, relocating_shards: 0, initializing_shards: 23, unassigned_shards: 111, delayed_unassig
[19:38:49] <icinga-wm>	 ds: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, task_max_waiting_in_queue_millis: 0, active_shards_percent_as_number: 91.30998702983139 https://wikitech.wikimedia.org/wiki/Search%23Administration
[19:39:03] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[19:40:25] <logmsgbot>	 !log bking@cumin2002 END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
[19:42:18] <logmsgbot>	 !log herron@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage
[19:44:47] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.netbox
[19:46:09] <wikibugs>	 (03PS2) 10Ladsgroup: Close Hebrew Wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1282410 (https://phabricator.wikimedia.org/T421796) (owner: 10Neriah)
[19:48:18] <wikibugs>	 (03PS1) 10Andrew Bogott: cloudvirt1077-1080: use efi in pressed [puppet] - 10https://gerrit.wikimedia.org/r/1282413 (https://phabricator.wikimedia.org/T425088)
[19:48:52] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage
[19:49:31] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003"
[19:49:37] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003"
[19:49:37] <logmsgbot>	 !log ayounsi@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:50:18] <wikibugs>	 (03CR) 10Andrew Bogott: [C:03+2] cloudvirt1077-1080: use efi in pressed [puppet] - 10https://gerrit.wikimedia.org/r/1282413 (https://phabricator.wikimedia.org/T425088) (owner: 10Andrew Bogott)
[19:50:58] <logmsgbot>	 !log ayounsi@cumin1003 START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors
[19:51:00] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[19:51:18] <logmsgbot>	 !log ayounsi@cumin1003 END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors
[19:54:10] <Amir1>	 jouncebot: nowandnext
[19:54:10] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 5 minute(s)
[19:54:10] <jouncebot>	 In 0 hour(s) and 5 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2000)
[19:58:57] <wikibugs>	 (03PS3) 10Neriah: Enable Hebrew keyboard DWIM for namespace resolution on hewikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468)
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: Your horoscope predicts another UTC late backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2000).
[20:00:05] <jouncebot>	 toyofuku, manfredi, and Neriah: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:50] <toyofuku>	 Here!
[20:00:59] <toyofuku>	 I will be deploying momentarily
[20:01:16] <cjming>	 o/
[20:01:25] <cjming>	 i can help deploy for those needing a deployer
[20:01:32] <manfredi>	 Hey, I am around
[20:01:36] <manfredi>	 thanks
[20:01:36] <toyofuku>	 <3
[20:01:36] <Neriah>	 Hi
[20:02:07] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by toyofuku@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1277667 (https://phabricator.wikimedia.org/T421776) (owner: 10Stoyofuku-wmf)
[20:03:07] <wikibugs>	 (03Merged) 10jenkins-bot: Enable the reading list beta feature survey on all wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1277667 (https://phabricator.wikimedia.org/T421776) (owner: 10Stoyofuku-wmf)
[20:03:42] <logmsgbot>	 !log toyofuku@deploy1003 Started scap sync-world: Backport for [[gerrit:1277667|Enable the reading list beta feature survey on all wikipedias (T421776)]]
[20:03:45] <stashbot>	 T421776: Enable the beta feature survey - https://phabricator.wikimedia.org/T421776
[20:05:25] <logmsgbot>	 !log toyofuku@deploy1003 toyofuku: Backport for [[gerrit:1277667|Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:05:44] <toyofuku>	 testing now
[20:06:50] <logmsgbot>	 !log toyofuku@deploy1003 toyofuku: Continuing with deployment
[20:06:55] <toyofuku>	 Tests looked good
[20:07:14] <logmsgbot>	 !log herron@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie
[20:11:03] <logmsgbot>	 !log toyofuku@deploy1003 Finished scap sync-world: Backport for [[gerrit:1277667|Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s)
[20:11:06] <stashbot>	 T421776: Enable the beta feature survey - https://phabricator.wikimedia.org/T421776
[20:11:14] <toyofuku>	 swiggity swag
[20:11:19] <toyofuku>	 over to the next person!
[20:11:31] <cjming>	 nice
[20:12:33] <cjming>	 manfredi: i can deploy your patches - the 1st one tho - will need to pass CI before we can do anything - shall i continue with your 2nd backport?
[20:12:41] <manfredi>	 ok
[20:14:45] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cjming@deploy1003 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282385 (owner: 10Mmartorana)
[20:16:08] <wikibugs>	 (03Merged) 10jenkins-bot: Revert^2 "Use js promise for email confirmation banner" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1282385 (owner: 10Mmartorana)
[20:16:26] <logmsgbot>	 !log cjming@deploy1003 Started scap sync-world: Backport for [[gerrit:1282385|Revert^2 "Use js promise for email confirmation banner"]]
[20:17:55] <cjming>	 manfredi: since ^^ merged do you want to try rebasing your 1st patch to see if it can pass CI?
[20:18:07] <logmsgbot>	 !log cjming@deploy1003 mmartorana, cjming: Backport for [[gerrit:1282385|Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:18:27] <manfredi>	 It's a failing test not related to my patch at all
[20:18:32] <manfredi>	 How am I supposed to fix this?
[20:20:57] <wikibugs>	 (03PS4) 10Mmartorana: Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366)
[20:22:55] <cjming>	 let's see what happens
[20:28:44] <cjming>	 hmm - not passing
[20:28:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[20:31:12] <cjming>	 looks like this patch was recently merged - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/1281829
[20:31:21] <manfredi>	 cjming: i have been dealing with this, it was verified in the past days, and the main patch is already merged on the train 
[20:32:00] <manfredi>	 Would it be possible to ignore it and merge it anyway?
[20:32:22] <cjming>	 we can try
[20:32:32] <manfredi>	 I appreciate it
[20:32:44] <cjming>	 i wonder if the Scribunto patch needs to be backported as well
[20:33:07] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - 10https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: 10Mmartorana)
[20:33:36] <cjming>	 huh - looks like it was backported
[20:33:42] <cjming>	 ok - let's try it
[20:34:25] <cjming>	 oh whoops - btw manfredi - your 2nd patch - is it ok to sync?
[20:34:30] <manfredi>	 yes
[20:34:36] <logmsgbot>	 !log cjming@deploy1003 mmartorana, cjming: Continuing with deployment
[20:34:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[20:34:51] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[20:34:53] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[20:34:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[20:36:31] <jinxer-wm>	 FIRING: Traffic bill over quota: Alert for device cr2-magru.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[20:38:46] <logmsgbot>	 !log cjming@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282385|Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s)
[20:39:53] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[20:39:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[20:40:11] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[20:40:17] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[20:42:17] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy1003 using scap backport" [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[20:46:39] <wikibugs>	 (CR) CI reject: [V:-1] Email confirmation banner: Remove obsolete arm_b variant [core] (wmf/1.46.0-wmf.26) - https://gerrit.wikimedia.org/r/1281501 (https://phabricator.wikimedia.org/T421366) (owner: Mmartorana)
[20:47:25] <cjming>	 manfredi: as i suspected - it won't merge w/o passing tests
[20:47:38] <cjming>	 https://spiderpig.wikimedia.org/jobs/1885
[20:47:47] <Neriah>	 not passing
[20:48:17] <Neriah>	 oops
[20:48:21] <Neriah>	 Is it possible to deploy my change until the issue is resolved?
[20:49:06] <manfredi>	 cjming: so no way to deploy this today?
[20:49:14] <cjming>	 manfredi: i would reach out to the engineers who worked on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/1281829
[20:49:15] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[20:49:21] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[20:49:32] <cjming>	 manfredi: i don't think so - not until CI passes - i don't know of a way to bypass
[20:50:12] <icinga-wm>	 RECOVERY - Host ps1-22-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 72.89 ms
[20:50:12] <icinga-wm>	 RECOVERY - Host ps1-23-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 72.11 ms
[20:50:17] <icinga-wm>	 RECOVERY - Host asw2-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 71.76 ms
[20:50:21] <cjming>	 Neriah: do you need a deployer?
[20:51:02] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - cloudelasticlb6_8443: Servers cloudelastic1011.eqiad.wmnet, cloudelastic1010.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9643: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1010.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb_8643: Servers cloudelastic1007.eqiad.wmnet,
[20:51:02] <icinga-wm>	 astic1010.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb_9243: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1010.eqiad.wmnet, cloudelastic1008.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9243: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1010.eqiad.wmnet, cloudelastic1008.eqiad.wmnet are marked down but pooled: cloudelasticlb6_8243: Servers cloudelastic1011.eqiad.wmnet, cloudelas
[20:51:02] <icinga-wm>	 eqiad.wmnet, cloudelastic1008.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9443: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1011.eqiad.wmnet, cloudelastic1010.eqiad.wm https://wikitech.wikimedia.org/wiki/PyBal
[20:51:02] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1018 is CRITICAL: PYBAL CRITICAL - CRITICAL - cloudelasticlb6_8443: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1011.eqiad.wmnet, cloudelastic1010.eqiad.wmnet are marked down but pooled: cloudelasticlb_9243: Servers cloudelastic1010.eqiad.wmnet, cloudelastic1008.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb_8643: Servers cloudelastic1007.eqiad.wmnet, 
[20:51:02] <icinga-wm>	 stic1010.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9243: Servers cloudelastic1010.eqiad.wmnet, cloudelastic1008.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb6_8243: Servers cloudelastic1010.eqiad.wmnet, cloudelastic1008.eqiad.wmnet, cloudelastic1009.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9643: Servers cloudelastic1007.eqiad.wmnet, cloudelas
[20:51:03] <cjming>	 which patch?
[20:51:03] <icinga-wm>	 eqiad.wmnet, cloudelastic1010.eqiad.wmnet are marked down but pooled: cloudelasticlb6_9443: Servers cloudelastic1007.eqiad.wmnet, cloudelastic1010.eqiad.wmnet, cloudelastic1009.eqiad.wm https://wikitech.wikimedia.org/wiki/PyBal
[20:51:25] <Neriah>	 cjming: ya
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9443: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Omega Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8443: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9243: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Omega Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8443: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9643: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8243: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:28] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8243: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:29] <icinga-wm>	 PROBLEM - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9643: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:29] <icinga-wm>	 PROBLEM - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9443: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:30] <icinga-wm>	 PROBLEM - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 9243: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:30] <icinga-wm>	 PROBLEM - WMF Cloud -Psi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8643: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:31] <icinga-wm>	 PROBLEM - WMF Cloud -Psi Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is CRITICAL: connect to address 208.80.154.241 and port 8643: Connection refused https://wikitech.wikimedia.org/wiki/Search%23Administration
[20:51:39] <cjming>	 Neriah: which patch?
[20:51:40] <Neriah>	 https://gerrit.wikimedia.org/r/c/1276432/
[20:51:57] <Neriah>	 https://spiderpig.wikimedia.org/?backport=1276432
[20:53:13] <cjming>	 Neriah: do you know if the dependent patch - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1258413 was backported to 1.46.0-wmf.26 or does it not matter?
[20:53:58] <Neriah>	 It shouldn't matter
[20:54:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[20:54:15] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[20:54:19] <manfredi>	 cjming: Is 1282385 deployed? 
[20:54:43] <cjming>	 manfredi: yes - that should be live
[20:55:06] <cjming>	 manfredi: i'm going to proceed with the next patch - lmk if you're able to sort out CI issues with your 1st patch and we can retry
[20:55:41] <manfredi>	 Ok thank you
[20:56:31] <jinxer-wm>	 RESOLVED: Traffic bill over quota: Alert for device cr2-magru.wikimedia.org - Traffic bill over quota   - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[20:57:09] <Neriah>	 cjming: oops, I thought you meant something else
[20:57:16] <cjming>	 https://www.irccloud.com/pastebin/mKCP065T/
[20:57:54] <Neriah>	 the change you asked about was deployed in 1.46.0-wmf.26
[20:58:47] <cjming>	 ya - not sure why spiderpig is telling us it needs to be backported to wmf.24
[20:58:57] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[21:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: That opportune time for a Weekly Security deployment window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2100).
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 750 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Omega Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 750 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 746 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 746 bytes in 0.013 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Omega Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Omega Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:28] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:29] <icinga-wm>	 RECOVERY - WMF Cloud -Psi Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:29] <icinga-wm>	 RECOVERY - WMF Cloud -Psi Cluster- - Prod MW AppServer Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:00:30] <icinga-wm>	 RECOVERY - WMF Cloud -Psi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 746 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:02:32] <cjming>	 Neriah: I'm inclined to make sure the dependency is in the target release branches before your config patch can be deployed
[21:03:15] <wikibugs>	 (CR) Cwhite: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1282320 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[21:03:40] <Neriah>	 cjming: Um, I didn't understand
[21:04:41] <jinxer-wm>	 FIRING: CirrusSearchSaneitizerFixRateTooHigh: MediaWiki CirrusSearch Saneitizer is fixing an abnormally high number of documents in cloudelastic - https://wikitech.wikimedia.org/wiki/Search/CirrusStreamingUpdater#San(e)itizing - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?viewPanel=59&orgId=1&from=now-6M&to=now&var-search_cluster=cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchSaneitizerFixRateT
[21:04:47] <cjming>	 Neriah: i think https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1258413 needs to be backported first before your config patch should be deployed
[21:05:13] <cjming>	 i only see that change in master, not in wmf.26 or prior
[21:05:47] <Neriah>	 https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/refs/heads/wmf/1.46.0-wmf.26/profiles/SecondTryProfiles.config.php#73
[21:06:49] <wikibugs>	 (Abandoned) Santiago Faci: Test Kitchen UI: Deploy v1.3.1 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1282277 (https://phabricator.wikimedia.org/T419511) (owner: Santiago Faci)
[21:07:08] <cjming>	 ok - i'm going to err on the side of rolling forward - the msg says it should be in wmf.24 but maybe it's fine if the dependency is in wmf.26
[21:07:47] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cjming@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468) (owner: Neriah)
[21:08:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUns
[21:08:51] <wikibugs>	 (Merged) jenkins-bot: Enable Hebrew keyboard DWIM for namespace resolution on hewikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1276432 (https://phabricator.wikimedia.org/T412468) (owner: Neriah)
[21:08:52] <wikibugs>	 (PS1) Bking: Revert "cloudelastic: Disable systemd override that uses PrivateMounts" [puppet] - https://gerrit.wikimedia.org/r/1282416
[21:09:00] <wikibugs>	 (CR) Bking: [V:+2 C:+2] Revert "cloudelastic: Disable systemd override that uses PrivateMounts" [puppet] - https://gerrit.wikimedia.org/r/1282416 (owner: Bking)
[21:09:07] <logmsgbot>	 !log cjming@deploy1003 Started scap sync-world: Backport for [[gerrit:1276432|Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]]
[21:09:10] <stashbot>	 T412468: DWIM mapping does not support namespaces - https://phabricator.wikimedia.org/T412468
[21:09:45] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[21:09:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[21:10:02] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[21:10:08] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[21:10:26] <wikibugs>	 (PS1) Ryan Kemper: Revert "cloudelastic: explicitly disable security plugin" [puppet] - https://gerrit.wikimedia.org/r/1282417
[21:10:48] <logmsgbot>	 !log cjming@deploy1003 cjming, neriah: Backport for [[gerrit:1276432|Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:11:15] <cjming>	 Neriah: on test servers - lmk if/when i can sync
[21:11:35] <Neriah>	 testing now
[21:12:07] <wikibugs>	 (CR) Bking: [C:+2] Revert "cloudelastic: explicitly disable security plugin" [puppet] - https://gerrit.wikimedia.org/r/1282417 (owner: Ryan Kemper)
[21:14:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterSetWeightedTagsTooLow: ...
[21:14:50] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is setting too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterSetWeightedTagsTooLow
[21:14:53] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterClearWeightedTagsTooLow: ...
[21:14:59] <jinxer-wm>	 CirrusSearch consumer-cloudelastic@eqiad is clearing too few weighted tags - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/fe251f4f-f6cf-4010-8d78-5f482255b16f/cirrussearch-update-pipeline-weighted-tags?orgId=1&var-tag_prefix=All&var-search_cluster_site=eqiad&var-search_cluster=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterClearWeightedTagsTooLow
[21:16:10] <Neriah>	 cjming: Tests looked good
[21:16:15] <cjming>	 cool - syncing
[21:16:18] <logmsgbot>	 !log cjming@deploy1003 cjming, neriah: Continuing with deployment
[21:18:45] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterRateTooLow: CirrusSearch update rate from flink-app-consumer-cloudelastic is critically low - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/jKqki4MSk/cirrus-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterRateTooLow
[21:18:46] <jinxer-wm>	 FIRING: [2x] Outbound discards: Alert for device asw2-a-eqiad.mgmt.eqiad.wmnet - Outbound discards   - https://alerts.wikimedia.org/?q=alertname%3DOutbound+discards
[21:19:03] <jinxer-wm>	 RESOLVED: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster logging-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=logging-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[21:20:28] <logmsgbot>	 !log cjming@deploy1003 Finished scap sync-world: Backport for [[gerrit:1276432|Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s)
[21:20:30] <jinxer-wm>	 FIRING: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUnsta
[21:20:37] <stashbot>	 T412468: DWIM mapping does not support namespaces - https://phabricator.wikimedia.org/T412468
[21:20:51] <cjming>	 Neriah: should be live
[21:20:57] <Neriah>	 nice
[21:21:07] <Neriah>	 Thank you :D!
[21:21:30] <cjming>	 yw!
[21:25:15] <jinxer-wm>	 RESOLVED: CirrusStreamingUpdaterFlinkJobUnstable: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusStreamingUpdaterFlinkJobUns
[21:28:27] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[21:32:24] <logmsgbot>	 !log cwhite@deploy1003 Started deploy [statsv/statsv@152de49]: fix logging
[21:32:36] <logmsgbot>	 !log cwhite@deploy1003 Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s)
[21:32:43] <wikibugs>	 (PS1) Santiago Faci: Test Kitchen UI: Deploy v1.3.2 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1282419 (https://phabricator.wikimedia.org/T424958)
[21:37:27] <wikibugs>	 (CR) Clare Ming: [C:+2] Test Kitchen UI: Deploy v1.3.2 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1282419 (https://phabricator.wikimedia.org/T424958) (owner: Santiago Faci)
[21:39:22] <wikibugs>	 (Merged) jenkins-bot: Test Kitchen UI: Deploy v1.3.2 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1282419 (https://phabricator.wikimedia.org/T424958) (owner: Santiago Faci)
[21:42:17] <wikibugs>	 (PS1) Santiago Faci: Test Kitchen UI: Deploy v1.3.2 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1282420 (https://phabricator.wikimedia.org/T419511)
[21:42:52] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
[21:43:15] <logmsgbot>	 !log sfaci@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
[21:47:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[21:47:53] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[21:47:58] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[21:53:34] <wikibugs>	 (CR) JHathaway: "I found the current manpage text pretty difficult to grok, ironically" [puppet] - https://gerrit.wikimedia.org/r/1282319 (https://phabricator.wikimedia.org/T425301) (owner: Gehel)
[22:03:28] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - SSL Expiry on cloudelastic.wikimedia.org is OK: OK - Certificate cloudelastic.wikimedia.org will expire on Sun 05 Jul 2026 07:49:09 AM GMT +0000. https://wikitech.wikimedia.org/wiki/Search%23Administration
[22:03:28] <icinga-wm>	 RECOVERY - WMF Cloud -Chi Cluster- - Public Internet Port - HTTPS on cloudelastic.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 746 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Search%23Administration
[22:05:02] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[22:06:45] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[22:06:50] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[22:06:55] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[22:07:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[22:08:02] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1018 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[22:08:47] <logmsgbot>	 !log akhatun@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
[22:12:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[22:12:50] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[22:14:55] <wikibugs>	 (CR) RLazarus: [C:+1] Add Wikifunctions' evaluator ingress endpoints to service.yaml [puppet] - https://gerrit.wikimedia.org/r/1280433 (https://phabricator.wikimedia.org/T424193) (owner: Elukey)
[22:19:19] <wikibugs>	 (CR) RLazarus: [C:+1] profile::services_proxy::envoy: add wikifunctions eval endpoints [puppet] - https://gerrit.wikimedia.org/r/1280435 (https://phabricator.wikimedia.org/T424193) (owner: Elukey)
[22:19:27] <wikibugs>	 (CR) RLazarus: [C:+1] Turn Wikifunctions evaluator endpoints to production state [puppet] - https://gerrit.wikimedia.org/r/1280434 (https://phabricator.wikimedia.org/T424193) (owner: Elukey)
[22:19:31] <wikibugs>	 (PS1) Papaul: Add bgp from mr to core switches [homer/public] - https://gerrit.wikimedia.org/r/1282427 (https://phabricator.wikimedia.org/T408892)
[22:21:17] <wikibugs>	 (CR) RLazarus: "Time to dust this off?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1275467 (https://phabricator.wikimedia.org/T423311) (owner: RLazarus)
[22:23:03] <wikibugs>	 (CR) Papaul: [C:+2] Add bgp from mr to core switches [homer/public] - https://gerrit.wikimedia.org/r/1282427 (https://phabricator.wikimedia.org/T408892) (owner: Papaul)
[22:31:17] <wikibugs>	 (PS1) Dzahn: tcpproxy: add support for gitlab-ssh [puppet] - https://gerrit.wikimedia.org/r/1282428
[22:34:54] <wikibugs>	 (PS2) Dzahn: tcpproxy: add support for gitlab-ssh [puppet] - https://gerrit.wikimedia.org/r/1282428
[22:39:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[22:39:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[22:42:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[22:43:39] <jinxer-wm>	 FIRING: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and KPN (139.156.127.121) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[22:46:16] <icinga-wm>	 RECOVERY - Host mr1-ulsfo IPv6 is UP: PING OK - Packet loss = 0%, RTA = 71.79 ms
[22:47:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[22:52:36] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11888002 (Papaul)
[22:53:59] <wikibugs>	 (PS1) Dzahn: delete mwmaint.discovery.wmnet [dns] - https://gerrit.wikimedia.org/r/1282430
[23:00:05] <jouncebot>	 Deploy window Readers deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260504T2300)
[23:06:40] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1282410 (https://phabricator.wikimedia.org/T421796) (owner: Neriah)
[23:06:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[23:07:32] <wikibugs>	 (Merged) jenkins-bot: Close Hebrew Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282410 (https://phabricator.wikimedia.org/T421796) (owner: Neriah)
[23:07:49] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282410|Close Hebrew Wikinews (T421796)]]
[23:07:57] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[23:09:43] <logmsgbot>	 !log ladsgroup@deploy1003 neriah, ladsgroup: Backport for [[gerrit:1282410|Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[23:10:22] <logmsgbot>	 !log ladsgroup@deploy1003 neriah, ladsgroup: Continuing with deployment
[23:10:31] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11888032 (Papaul)
[23:11:45] <jinxer-wm>	 RESOLVED: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[23:14:34] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282410|Close Hebrew Wikinews (T421796)]] (duration: 06m 45s)
[23:14:37] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[23:24:45] <jinxer-wm>	 RESOLVED: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[23:24:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[23:33:45] <jinxer-wm>	 FIRING: CirrusConsumerFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): fetch error rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerFetchErrorRate
[23:35:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: netbox_ganeti_ulsfo02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:39:56] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1282431
[23:39:56] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1282431 (owner: TrainBranchBot)
[23:43:46] <wikibugs>	 (PS1) Ladsgroup: Close Bosnian Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282432 (https://phabricator.wikimedia.org/T421796)
[23:44:45] <jinxer-wm>	 FIRING: CirrusConsumerRerenderFetchErrorRate: cirrus_streaming_updater_consumer_cloudelastic_eqiad in eqiad (k8s): ...
[23:44:45] <jinxer-wm>	 fetch error (rerenders) rate too high - https://wikitech.wikimedia.org/wiki/Search#Streaming_Updater - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?var-datasource=eqiad+prometheus%2Fk8s&var-namespace=cirrus-streaming-updater&var-helm_release=consumer-cloudelastic - https://alerts.wikimedia.org/?q=alertname%3DCirrusConsumerRerenderFetchErrorRate
[23:45:32] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1282432 (https://phabricator.wikimedia.org/T421796) (owner: Ladsgroup)
[23:46:25] <wikibugs>	 (Merged) jenkins-bot: Close Bosnian Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282432 (https://phabricator.wikimedia.org/T421796) (owner: Ladsgroup)
[23:46:42] <logmsgbot>	 !log ladsgroup@deploy1003 Started scap sync-world: Backport for [[gerrit:1282432|Close Bosnian Wikinews (T421796)]]
[23:46:45] <stashbot>	 T421796: Close 31 editions of Wikinews on 2026-05-04 (make them read-only) - https://phabricator.wikimedia.org/T421796
[23:48:24] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Backport for [[gerrit:1282432|Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[23:48:39] <jinxer-wm>	 RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr2-esams and KPN (139.156.127.121) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
[23:49:16] <logmsgbot>	 !log ladsgroup@deploy1003 ladsgroup: Continuing with deployment
[23:50:59] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1282431 (owner: TrainBranchBot)
[23:53:27] <logmsgbot>	 !log ladsgroup@deploy1003 Finished scap sync-world: Backport for [[gerrit:1282432|Close Bosnian Wikinews (T421796)]] (duration: 06m 45s)
[23:58:13] <wikibugs>	 (PS1) Ladsgroup: Close Catalan Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/1282434 (https://phabricator.wikimedia.org/T421796)