[00:00:03] <wikibugs>	 (PS1) Bvibber: Respect wgThumbnailSteps when generating thumbs [extensions/Popups] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211280 (https://phabricator.wikimedia.org/T411013)
[00:01:03] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [core] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211277 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[00:01:13] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
[00:01:21] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211278 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[00:01:29] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
[00:01:39] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Popups] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211279 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[00:01:52] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
[00:01:52] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [extensions/Popups] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211280 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[00:02:07] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
[00:02:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
[00:02:55] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[00:04:06] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11407956 (Papaul)
[00:04:12] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
[00:04:34] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
[00:05:01] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[00:05:27] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[00:05:46] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
[00:06:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
[00:07:00] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/termbox: apply
[00:07:31] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
[00:07:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/toolhub: apply
[00:08:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
[00:09:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[00:09:23] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11407960 (Papaul) @RobH I updated the task description with all the connections that we need for phase 1 in December. Please don't forget the Cable IDs. Please...
[00:09:26] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[00:09:51] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
[00:10:12] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
[00:10:49] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[00:11:40] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[00:12:09] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] START helmfile.d/services/zotero: apply
[00:12:37] <logmsgbot>	 !log rzl@deploy2002 helmfile [eqiad] DONE helmfile.d/services/zotero: apply
[00:14:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[00:14:31] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[00:14:35] <swfrench-wmf>	 jouncebot: nowandnext
[00:14:35] <jouncebot>	 No deployments scheduled for the next 6 hour(s) and 45 minute(s)
[00:14:35] <jouncebot>	 In 6 hour(s) and 45 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T0700)
[00:14:55] <wikibugs>	 (CR) Scott French: "Thanks for the review!" [deployment-charts] - https://gerrit.wikimedia.org/r/1208037 (https://phabricator.wikimedia.org/T405955) (owner: Scott French)
[00:14:57] <wikibugs>	 (CR) Scott French: [C:+2] mw-*: clean up 8.3 migration rollingUpdate and timeout tweaks [deployment-charts] - https://gerrit.wikimedia.org/r/1208037 (https://phabricator.wikimedia.org/T405955) (owner: Scott French)
[00:15:22] <swfrench-wmf>	 FYI, I'll be running a helmfile-only scap deployment in a few minutes, once the above is merged.
[00:16:55] <wikibugs>	 (Merged) jenkins-bot: mw-*: clean up 8.3 migration rollingUpdate and timeout tweaks [deployment-charts] - https://gerrit.wikimedia.org/r/1208037 (https://phabricator.wikimedia.org/T405955) (owner: Scott French)
[00:19:53] <logmsgbot>	 !log swfrench@deploy2002 Started scap sync-world: Helmfile-only deployment to clean up migration overrides - T405955
[00:19:58] <stashbot>	 T405955: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955
[00:20:13] <denisse>	 !log Upgrading envoy on Grafana hosts -  T405808
[00:20:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:20:18] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[00:20:56] <denisse>	 !log Upgrading envoy on prometheus1005.eqiad.wmnet -  T405808
[00:21:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:17] <logmsgbot>	 !log swfrench@deploy2002 Finished scap sync-world: Helmfile-only deployment to clean up migration overrides - T405955 (duration: 04m 10s)
[00:22:46] <swfrench-wmf>	 all done on my end :)
[00:23:06] <denisse>	 !log Upgrading envoy on prometheus hosts -  T405808
[00:23:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:08] <rzl>	 more envoys!
[00:24:18] <denisse>	 !log Upgrading envoy on prometheus::pop hosts -  T405808
[00:24:21] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/apertium: apply
[00:24:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:54] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/apertium: apply
[00:26:11] <denisse>	 !log Upgrading envoy on Graphite hosts -  T405808
[00:26:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:26:16] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[00:28:13] <wikibugs>	 (PS1) Cwhite: admin: add new ssh key for cwhite [puppet] - https://gerrit.wikimedia.org/r/1211284
[00:28:27] <denisse>	 !log Upgrading envoy on 'logstash1023.eqiad.wmnet' -  T405808
[00:28:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:28:35] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop: apply
[00:29:07] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop: apply
[00:29:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
[00:30:04] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
[00:30:27] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/chart-renderer: apply
[00:30:42] <denisse>	 !log Upgrading envoy on logstash hosts -  T405808
[00:30:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:30:52] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
[00:31:15] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[00:31:39] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[00:32:00] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
[00:32:14] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
[00:32:38] <denisse>	 !log Upgrading envoy on 'titan1001.eqiad.wmnet' -  T405808
[00:32:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:32:43] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[00:33:09] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/data-gateway: apply
[00:33:21] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
[00:33:43] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/developer-portal: apply
[00:33:50] <denisse>	 !log Upgrading envoy on titan hosts -  T405808
[00:33:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
[00:34:03] <wikibugs>	 (CR) Dzahn: [C:+1] "This seems good to go but was waiting for you to confirm." [puppet] - https://gerrit.wikimedia.org/r/1196792 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[00:34:19] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/device-analytics: apply
[00:34:32] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
[00:35:27] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/echostore: apply
[00:36:23] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/echostore: apply
[00:36:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/edit-analytics: apply
[00:37:04] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
[00:37:07] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-api-ext-next_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-api-ext-next_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[00:37:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/editor-analytics: apply
[00:38:08] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
[00:38:46] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
[00:39:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
[00:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[00:39:48] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
[00:40:19] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211287
[00:40:20] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211287 (owner: TrainBranchBot)
[00:40:23] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
[00:40:54] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
[00:41:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: httpbb_kubernetes_mw-api-ext-next_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:41:30] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
[00:42:09] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-main: apply
[00:42:40] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
[00:43:46] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventstreams: apply
[00:44:32] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
[00:44:46] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
[00:45:52] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
[00:46:45] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/geo-analytics: apply
[00:47:00] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
[00:47:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/image-suggestion: apply
[00:47:36] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
[00:47:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/ipoid: apply
[00:48:16] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/ipoid: apply
[00:48:43] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/kartotherian: apply
[00:49:47] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
[00:50:47] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/mathoid: apply
[00:51:15] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/mathoid: apply
[00:52:10] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/media-analytics: apply
[00:52:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
[00:53:29] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/miscweb: apply
[00:54:00] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1211287 (owner: TrainBranchBot)
[00:55:21] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/miscweb: apply
[00:55:54] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/mobileapps: apply
[00:56:34] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
[00:57:36] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
[00:57:40] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[00:58:17] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/page-analytics: apply
[00:58:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
[00:59:24] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/proton: apply
[01:00:34] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/proton: apply
[01:00:52] <logmsgbot>	 !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:00:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/push-notifications: apply
[01:01:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
[01:01:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[01:01:57] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[01:02:26] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/recommendation-api: apply
[01:02:54] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
[01:03:29] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/sessionstore: apply
[01:03:45] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
[01:04:41] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox: apply
[01:05:21] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox: apply
[01:05:43] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
[01:05:59] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
[01:06:29] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-media: apply
[01:06:43] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
[01:07:07] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
[01:07:29] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
[01:09:04] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
[01:09:34] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
[01:10:21] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[01:10:25] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/shellbox-video: apply
[01:10:26] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[01:10:48] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211294
[01:10:48] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211294 (owner: TrainBranchBot)
[01:10:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
[01:11:55] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
[01:12:25] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
[01:12:40] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/termbox: apply
[01:13:22] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
[01:13:48] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 55s)
[01:13:56] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/toolhub: apply
[01:14:32] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/toolhub: apply
[01:15:21] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[01:15:32] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[01:16:34] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
[01:16:52] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
[01:17:11] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[01:17:53] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[01:19:05] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] START helmfile.d/services/zotero: apply
[01:19:28] <logmsgbot>	 !log rzl@deploy2002 helmfile [codfw] DONE helmfile.d/services/zotero: apply
[01:20:47] <logmsgbot>	 !log rzl@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
[01:21:00] <wikibugs>	 (PS1) Samuel (WMF): Set $wgRateLimits['hcaptchaedit'] for edit attempt log [mediawiki-config] - https://gerrit.wikimedia.org/r/1211295 (https://phabricator.wikimedia.org/T406865)
[01:21:17] <logmsgbot>	 !log rzl@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
[01:21:31] <logmsgbot>	 !log rzl@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
[01:22:00] <logmsgbot>	 !log rzl@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
[01:22:21] <rzl>	 done for the day!
[01:22:28] <swfrench-wmf>	 \i/
[01:22:32] <rzl>	 \i/
[01:26:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:31:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr1-eqiad and fe80::7a4f:9b00:d4e:7c0c - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:34:36] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1211294 (owner: TrainBranchBot)
[02:20:03] <jinxer-wm>	 FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster  - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[02:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[02:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:35:03] <jinxer-wm>	 RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster  - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[03:41:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:41:33] <jinxer-wm>	 FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster  - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[03:51:33] <jinxer-wm>	 RESOLVED: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster  - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures
[03:58:09] <wikibugs>	 (CR) Andrew Bogott: [C:+1] P:ldap:client:ldaptui use OS packages for ldaptui [puppet] - https://gerrit.wikimedia.org/r/1211084 (owner: Slyngshede)
[03:59:21] <wikibugs>	 (CR) Andrew Bogott: [C:+1] interface::tagged: Remove legacy_vlan_naming option [puppet] - https://gerrit.wikimedia.org/r/1208307 (owner: Majavah)
[03:59:37] <jinxer-wm>	 FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[04:26:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:26:32] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot (apply updates) - ryankemper@cumin2002 - T410573
[04:26:36] <stashbot>	 T410573: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573
[04:31:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-drmrs and fe80::ee38:7300:1ae8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-drmrs:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[05:08:59] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:12:13] <wikibugs>	 (CR) Brennen Bearnes: "I didn't wind up deploying this backport for last week's wmf.3 train. I'm AFK most of this week and I think at this point it probably isn'" [extensions/GlobalPreferences] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1207950 (https://phabricator.wikimedia.org/T410551) (owner: Brennen Bearnes)
[05:12:51] <icinga-wm>	 PROBLEM - Backup freshness on backup1014 is CRITICAL: Stale-full only: 1 (gerrit2003), Fresh: 141 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[05:12:56] <wikibugs>	 (CR) Brennen Bearnes: "CCing jnuche for awareness as train conductor." [extensions/GlobalPreferences] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1207950 (https://phabricator.wikimedia.org/T410551) (owner: Brennen Bearnes)
[05:27:43] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2092 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:28:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2092:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:33:59] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:37:43] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2092 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[05:38:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2092:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:05:53] <icinga-wm>	 PROBLEM - Host cirrussearch2093 is DOWN: PING CRITICAL - Packet loss = 100%
[06:06:10] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T410589)', diff saved to https://phabricator.wikimedia.org/P85658 and previous config saved to /var/cache/conftool/dbconfig/20251126-060609-ladsgroup.json
[06:06:15] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[06:09:15] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
[06:10:24] <wikibugs>	 (PS1) Marostegui: clouddb1022: Remove note [puppet] - https://gerrit.wikimedia.org/r/1211343
[06:11:16] <wikibugs>	 (CR) Marostegui: [C:+2] clouddb1022: Remove note [puppet] - https://gerrit.wikimedia.org/r/1211343 (owner: Marostegui)
[06:14:19] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[06:14:38] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[06:14:46] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1158 (T410531)', diff saved to https://phabricator.wikimedia.org/P85659 and previous config saved to /var/cache/conftool/dbconfig/20251126-061445-marostegui.json
[06:14:51] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[06:16:57] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T410531)', diff saved to https://phabricator.wikimedia.org/P85660 and previous config saved to /var/cache/conftool/dbconfig/20251126-061656-marostegui.json
[06:17:46] <wikibugs>	 (CR) Arnaudb: [C:+2] gerrit: remove localbackup logic from failover [cookbooks] - https://gerrit.wikimedia.org/r/1210386 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[06:18:21] <icinga-wm>	 RECOVERY - Host cirrussearch2093 is UP: PING OK - Packet loss = 0%, RTA = 30.39 ms
[06:20:53] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2093 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:21:17] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P85661 and previous config saved to /var/cache/conftool/dbconfig/20251126-062116-ladsgroup.json
[06:23:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: push_cross_cluster_settings_9200.service on cirrussearch2093:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:24:14] <wikibugs>	 (Merged) jenkins-bot: gerrit: remove localbackup logic from failover [cookbooks] - https://gerrit.wikimedia.org/r/1210386 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[06:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[06:32:05] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P85662 and previous config saved to /var/cache/conftool/dbconfig/20251126-063204-marostegui.json
[06:36:24] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P85663 and previous config saved to /var/cache/conftool/dbconfig/20251126-063624-ladsgroup.json
[06:38:25] <jinxer-wm>	 RESOLVED: [7x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2081:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:40:53] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2093 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[06:42:09] <moritzm>	 !log upgrade Envoy on puppetboard* T405808
[06:42:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:14] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[06:47:13] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P85664 and previous config saved to /var/cache/conftool/dbconfig/20251126-064712-marostegui.json
[06:47:32] <wikibugs>	 (CR) Muehlenhoff: "Permission management is different on Cloud VPS (via Nova) and doesn't use the POSIX groups defines in profile::admin." [puppet] - https://gerrit.wikimedia.org/r/1211181 (owner: Muehlenhoff)
[06:51:31] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T410589)', diff saved to https://phabricator.wikimedia.org/P85665 and previous config saved to /var/cache/conftool/dbconfig/20251126-065131-ladsgroup.json
[06:51:37] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[06:51:47] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
[06:51:55] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1188 (T410589)', diff saved to https://phabricator.wikimedia.org/P85666 and previous config saved to /var/cache/conftool/dbconfig/20251126-065154-ladsgroup.json
[06:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T0700)
[07:02:20] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T410531)', diff saved to https://phabricator.wikimedia.org/P85667 and previous config saved to /var/cache/conftool/dbconfig/20251126-070219-marostegui.json
[07:02:25] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:02:35] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[07:02:43] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1170 (T410531)', diff saved to https://phabricator.wikimedia.org/P85668 and previous config saved to /var/cache/conftool/dbconfig/20251126-070243-marostegui.json
[07:08:23] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T410531)', diff saved to https://phabricator.wikimedia.org/P85669 and previous config saved to /var/cache/conftool/dbconfig/20251126-070822-marostegui.json
[07:08:29] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:09:22] <wikibugs>	 (CR) Arnaudb: "A bit more details relevant to this patch:" [puppet] - https://gerrit.wikimedia.org/r/1196792 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[07:16:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2100:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:18:15] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in x1 T408663', diff saved to https://phabricator.wikimedia.org/P85670 and previous config saved to /var/cache/conftool/dbconfig/20251126-071815-marostegui.json
[07:18:20] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[07:18:58] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in x3 T408663', diff saved to https://phabricator.wikimedia.org/P85671 and previous config saved to /var/cache/conftool/dbconfig/20251126-071857-marostegui.json
[07:19:47] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s3 T408663', diff saved to https://phabricator.wikimedia.org/P85672 and previous config saved to /var/cache/conftool/dbconfig/20251126-071947-marostegui.json
[07:19:49] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211179 (owner: Muehlenhoff)
[07:20:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s2 T408663', diff saved to https://phabricator.wikimedia.org/P85673 and previous config saved to /var/cache/conftool/dbconfig/20251126-072038-marostegui.json
[07:21:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2100:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:21:31] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2100 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:21:42] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s1 T408663', diff saved to https://phabricator.wikimedia.org/P85674 and previous config saved to /var/cache/conftool/dbconfig/20251126-072141-marostegui.json
[07:23:31] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P85675 and previous config saved to /var/cache/conftool/dbconfig/20251126-072330-marostegui.json
[07:24:54] <wikibugs>	 (CR) Slyngshede: [V:+1 C:+2] P:ldap:client:ldaptui use OS packages for ldaptui [puppet] - https://gerrit.wikimedia.org/r/1211084 (owner: Slyngshede)
[07:25:52] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] interface::tagged: Remove legacy_vlan_naming option [puppet] - https://gerrit.wikimedia.org/r/1208307 (owner: Majavah)
[07:26:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2100:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:31:31] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2100 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:38:38] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P85676 and previous config saved to /var/cache/conftool/dbconfig/20251126-073837-marostegui.json
[07:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:51:21] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove dataset-admins [puppet] - https://gerrit.wikimedia.org/r/1211179 (owner: Muehlenhoff)
[07:53:18] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:53:41] <wikibugs>	 (CR) Tbodt: Set up tokwiki namespaces (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: Majavah)
[07:53:46] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1170 (T410531)', diff saved to https://phabricator.wikimedia.org/P85677 and previous config saved to /var/cache/conftool/dbconfig/20251126-075345-marostegui.json
[07:53:51] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:54:02] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[07:54:10] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:55:01] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
[07:55:20] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
[07:55:31] <wikibugs>	 (CR) Brouberol: [C:+2] Setup the growthbook-next DNS names [dns] - https://gerrit.wikimedia.org/r/1211072 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[07:55:39] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 3 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11408330 (Reedy)
[07:55:48] <logmsgbot>	 !log brouberol@dns1004 START - running authdns-update
[07:56:55] <logmsgbot>	 !log brouberol@dns1004 END - running authdns-update
[07:58:00] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[07:58:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1174 (T410531)', diff saved to https://phabricator.wikimedia.org/P85678 and previous config saved to /var/cache/conftool/dbconfig/20251126-075807-marostegui.json
[07:59:24] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T410531)', diff saved to https://phabricator.wikimedia.org/P85679 and previous config saved to /var/cache/conftool/dbconfig/20251126-075924-marostegui.json
[07:59:30] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[07:59:37] <jinxer-wm>	 FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:00:04] <jouncebot>	 Amir1, Urbanecm, and awight: Your horoscope predicts another UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T0800).
[08:00:04] <jouncebot>	 bvibber: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[08:00:18] <bvibber>	 o/
[08:00:36] <bvibber>	 i can spiderpig these myself :)
[08:02:02] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by bvibber@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211277 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:02:02] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by bvibber@deploy2002 using scap backport" [extensions/Popups] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211279 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:02:03] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by bvibber@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211278 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:02:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by bvibber@deploy2002 using scap backport" [extensions/Popups] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211280 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:03:59] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11408341 (brouberol) Naïve q, piggybacking on @Eevans 's response: what about a DNS domain resolving to the node IPs? If we have a recent enough version, we can let the client perform th...
[08:04:51] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: verify and trust server ssh key in join-stack [puppet] - https://gerrit.wikimedia.org/r/1211592 (https://phabricator.wikimedia.org/T411023)
[08:05:31] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Fix relforge Cumin alias [puppet] - https://gerrit.wikimedia.org/r/1210962 (owner: Muehlenhoff)
[08:05:53] <wikibugs>	 (CR) Brouberol: [C:+2] growthbook-next: define a preproduction growthbook instance [deployment-charts] - https://gerrit.wikimedia.org/r/1211065 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[08:06:00] <wikibugs>	 (PS4) Muehlenhoff: sre.ganeti.reboot-vm: Use skip_acked=True [cookbooks] - https://gerrit.wikimedia.org/r/1203483 (https://phabricator.wikimedia.org/T330136)
[08:07:09] <wikibugs>	 (Merged) jenkins-bot: mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS [core] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211277 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:07:14] <wikibugs>	 (Merged) jenkins-bot: mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211278 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:07:16] <wikibugs>	 (Merged) jenkins-bot: Respect wgThumbnailSteps when generating thumbs [extensions/Popups] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211280 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:07:18] <wikibugs>	 (Merged) jenkins-bot: Respect wgThumbnailSteps when generating thumbs [extensions/Popups] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211279 (https://phabricator.wikimedia.org/T411013) (owner: Bvibber)
[08:08:23] <logmsgbot>	 !log bvibber@deploy2002 Started scap sync-world: Backport for [[gerrit:1211277|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211279|Respect wgThumbnailSteps when generating thumbs (T411013)]], [[gerrit:1211278|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211280|Respect wgThumbnailSteps when generating thumbs (T411013)]]
[08:08:25] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2073 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:08:28] <stashbot>	 T411013: Popups should use standard thumbnail sizes - https://phabricator.wikimedia.org/T411013
[08:09:40] <_joe_>	 bvibber: <3 <#
[08:10:26] <wikibugs>	 (PS1) Brouberol: postgresql-growthbook-next: fix typos in helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/1211594 (https://phabricator.wikimedia.org/T410999)
[08:10:46] <logmsgbot>	 !log bvibber@deploy2002 bvibber: Backport for [[gerrit:1211277|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211279|Respect wgThumbnailSteps when generating thumbs (T411013)]], [[gerrit:1211278|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211280|Respect wgThumbnailSteps when generating thumbs (T411013)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:11:25] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2073:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:29] <logmsgbot>	 !log bvibber@deploy2002 bvibber: Continuing with sync
[08:11:33] <bvibber>	 confirmed works!
[08:11:57] <wikibugs>	 (CR) Brouberol: [C:+2] postgresql-growthbook-next: fix typos in helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/1211594 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[08:12:05] <_joe_>	 niiice!
[08:12:19] <bvibber>	 no more 497px wide images ;)
[08:13:04] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
[08:13:09] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
[08:13:13] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2076 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:14:32] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P85680 and previous config saved to /var/cache/conftool/dbconfig/20251126-081431-marostegui.json
[08:14:53] <wikibugs>	 (PS1) Brouberol: growthbook-next: register namespace in the ceph and cloudnative operators [deployment-charts] - https://gerrit.wikimedia.org/r/1211595 (https://phabricator.wikimedia.org/T410999)
[08:15:30] <logmsgbot>	 !log bvibber@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211277|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211279|Respect wgThumbnailSteps when generating thumbs (T411013)]], [[gerrit:1211278|mediawiki.util: Add adjustThumbWidthForSteps for step sizing in JS (T411013)]], [[gerrit:1211280|Respect wgThumbnailSteps when generating thumbs (T411013)]] (duration: 07m 07s)
[08:15:36] <stashbot>	 T411013: Popups should use standard thumbnail sizes - https://phabricator.wikimedia.org/T411013
[08:18:25] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2073 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:19:47] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11408386 (elukey) Hi folks!   >>! In T410075#11407492, @Eevans wrote: >>>! In T410075#11400035, @elukey wrote: >> [ ... ] >>  >> Lemme know :) >  > Ok, so some background: >  > Any node...
[08:21:09] <bvibber>	 whee that was funsies
[08:21:25] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2073:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:23:13] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9600 on cirrussearch2076 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9600 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:26:37] <wikibugs>	 (CR) Jaime Nuche: "Ack, thank you 👍" [extensions/GlobalPreferences] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1207950 (https://phabricator.wikimedia.org/T410551) (owner: Brennen Bearnes)
[08:26:38] <wikibugs>	 (CR) Muehlenhoff: [C:+2] proton: Bump image [deployment-charts] - https://gerrit.wikimedia.org/r/1211082 (owner: Muehlenhoff)
[08:29:29] <wikibugs>	 (CR) Elukey: [C:+2] profile::thanos::swift: add tegola account for staging [puppet] - https://gerrit.wikimedia.org/r/1210599 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[08:29:40] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P85681 and previous config saved to /var/cache/conftool/dbconfig/20251126-082939-marostegui.json
[08:31:36] <logmsgbot>	 !log jmm@deploy2002 helmfile [staging] START helmfile.d/services/proton: apply
[08:32:33] <logmsgbot>	 !log jmm@deploy2002 helmfile [staging] DONE helmfile.d/services/proton: apply
[08:35:04] <wikibugs>	 (CR) Elukey: [C:+2] profile::pyrra::fs::slos::editing: fix citoid's success ratio SLO [puppet] - https://gerrit.wikimedia.org/r/1210608 (https://phabricator.wikimedia.org/T345627) (owner: Elukey)
[08:35:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s4 T408663', diff saved to https://phabricator.wikimedia.org/P85682 and previous config saved to /var/cache/conftool/dbconfig/20251126-083511-marostegui.json
[08:35:16] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[08:35:34] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s4 T408663', diff saved to https://phabricator.wikimedia.org/P85683 and previous config saved to /var/cache/conftool/dbconfig/20251126-083533-marostegui.json
[08:36:00] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] interface::tagged: Remove legacy_vlan_naming option [puppet] - https://gerrit.wikimedia.org/r/1208307 (owner: Majavah)
[08:38:20] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] pontoon: verify and trust server ssh key in join-stack [puppet] - https://gerrit.wikimedia.org/r/1211592 (https://phabricator.wikimedia.org/T411023) (owner: Filippo Giunchedi)
[08:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[08:39:54] <wikibugs>	 (CR) Jaime Nuche: [C:+1] releases::mediawiki: change the time when jenkins is restarted [puppet] - https://gerrit.wikimedia.org/r/1208406 (https://phabricator.wikimedia.org/T410729) (owner: Dzahn)
[08:41:24] <fabfur>	 !log depooling cp7001 to test known-client feature (T406545)
[08:41:29] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=no; selector: name=cp7001.*
[08:41:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:30] <stashbot>	 T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits - https://phabricator.wikimedia.org/T406545
[08:44:47] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T410531)', diff saved to https://phabricator.wikimedia.org/P85684 and previous config saved to /var/cache/conftool/dbconfig/20251126-084447-marostegui.json
[08:44:52] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[08:45:03] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
[08:45:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1181 (T410531)', diff saved to https://phabricator.wikimedia.org/P85685 and previous config saved to /var/cache/conftool/dbconfig/20251126-084510-marostegui.json
[08:46:10] <logmsgbot>	 !log jmm@deploy2002 helmfile [codfw] START helmfile.d/services/proton: apply
[08:46:36] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s5 T408663', diff saved to https://phabricator.wikimedia.org/P85686 and previous config saved to /var/cache/conftool/dbconfig/20251126-084635-marostegui.json
[08:46:41] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[08:47:24] <logmsgbot>	 !log jmm@deploy2002 helmfile [codfw] DONE helmfile.d/services/proton: apply
[08:47:59] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s6 T408663', diff saved to https://phabricator.wikimedia.org/P85687 and previous config saved to /var/cache/conftool/dbconfig/20251126-084758-marostegui.json
[08:48:10] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T410531)', diff saved to https://phabricator.wikimedia.org/P85688 and previous config saved to /var/cache/conftool/dbconfig/20251126-084810-marostegui.json
[08:50:31] <logmsgbot>	 !log jmm@deploy2002 helmfile [eqiad] START helmfile.d/services/proton: apply
[08:51:35] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 3 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11408456 (aaron) Maybe related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198941  In...
[08:51:59] <logmsgbot>	 !log jmm@deploy2002 helmfile [eqiad] DONE helmfile.d/services/proton: apply
[08:52:27] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 3 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11408459 (aaron) @Clement_Goubert and @hnowlan would know more.
[08:52:32] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s7 T408663', diff saved to https://phabricator.wikimedia.org/P85689 and previous config saved to /var/cache/conftool/dbconfig/20251126-085232-marostegui.json
[08:52:37] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[08:53:45] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s8 T408663', diff saved to https://phabricator.wikimedia.org/P85690 and previous config saved to /var/cache/conftool/dbconfig/20251126-085344-marostegui.json
[08:54:02] <elukey>	 !log `elukey@cumin1003:~$ sudo cumin 'thanos-fe*' 'systemctl restart swift-proxy' -b 1 -s 30` - Restart swift proxies to pick up the new tegola_staging account
[08:54:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:18] <wikibugs>	 (PS13) Elukey: Add the sre.hosts.powercycle cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1198928
[08:55:49] <wikibugs>	 (CR) Elukey: "Hey folks, lemme know if you like this or not :)" [cookbooks] - https://gerrit.wikimedia.org/r/1198928 (owner: Elukey)
[08:57:37] <fabfur>	 !log repooling cp7001 (T406545)
[08:57:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:43] <stashbot>	 T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits - https://phabricator.wikimedia.org/T406545
[08:57:45] <logmsgbot>	 !log fabfur@cumin1003 conftool action : set/pooled=yes; selector: name=cp7001.*
[08:58:09] <wikibugs>	 (PS1) Kevin Bazira: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211605 (https://phabricator.wikimedia.org/T410906)
[08:59:10] <wikibugs>	 SRE-tools, Infrastructure-Foundations, serviceops: Add a --rack flag to sre.k8s.pool-depool-node - https://phabricator.wikimedia.org/T410537#11408474 (MLechvien-WMF) a:MLechvien-WMF
[08:59:52] <wikibugs>	 (PS1) Elukey: kubernetes: add maps-staging-codfw IPs [puppet] - https://gerrit.wikimedia.org/r/1211606 (https://phabricator.wikimedia.org/T381565)
[09:00:04] <jouncebot>	 jnuche and brennen: That opportune time for a MediaWiki train - Utc-0+Utc-7 Version deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T0900).
[09:00:16] <jnuche>	 morning, rolling out the train in a few minutes
[09:01:27] <wikibugs>	 (CR) Dpogorzelski: [C:+1] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211605 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[09:01:37] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211606 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[09:02:07] <wikibugs>	 (CR) Elukey: [C:+2] kubernetes: add maps-staging-codfw IPs [puppet] - https://gerrit.wikimedia.org/r/1211606 (https://phabricator.wikimedia.org/T381565) (owner: Elukey)
[09:03:18] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P85691 and previous config saved to /var/cache/conftool/dbconfig/20251126-090317-marostegui.json
[09:03:44] <wikibugs>	 (PS1) TrainBranchBot: group1 to 1.46.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/1211608 (https://phabricator.wikimedia.org/T408274)
[09:03:47] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Initiated by jnuche@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211608 (https://phabricator.wikimedia.org/T408274) (owner: TrainBranchBot)
[09:04:36] <wikibugs>	 (Merged) jenkins-bot: group1 to 1.46.0-wmf.4 [mediawiki-config] - https://gerrit.wikimedia.org/r/1211608 (https://phabricator.wikimedia.org/T408274) (owner: TrainBranchBot)
[09:07:08] <wikibugs>	 (CR) Fabfur: [C:+1] cache::text: enable bots rate limiting on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1211061 (https://phabricator.wikimedia.org/T406555) (owner: Giuseppe Lavagetto)
[09:07:32] <wikibugs>	 (CR) Vgutierrez: [V:+1 C:+2] cache::text: enable bots rate limiting on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1211061 (https://phabricator.wikimedia.org/T406555) (owner: Giuseppe Lavagetto)
[09:08:44] <vgutierrez>	 !log depool cp7001
[09:08:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:50] <wikibugs>	 (CR) Kevin Bazira: [C:+2] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211605 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[09:10:41] <logmsgbot>	 !log jnuche@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.4  refs T408274
[09:10:46] <stashbot>	 T408274: 1.46.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T408274
[09:11:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2111:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:11:40] <phuedx>	 jouncebot nowandnext
[09:11:41] <jouncebot>	 For the next 1 hour(s) and 48 minute(s): MediaWiki train - Utc-0+Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T0900)
[09:11:41] <jouncebot>	 In 1 hour(s) and 48 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1100)
[09:11:46] <wikibugs>	 (Merged) jenkins-bot: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211605 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[09:13:25] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
[09:14:42] <phuedx>	 brennen, jnuche: There's a change riding the train that is causing ~250,000 validation errors every 15 minutes on the mediawiki.api_request event stream. I have a fix for it, which I can backport and deploy
[09:14:57] <phuedx>	 I'll update the train blockers task in a moment
[09:15:06] <wikibugs>	 (PS1) Elukey: services: move tegola and kartotherian to the new staging db [deployment-charts] - https://gerrit.wikimedia.org/r/1211609 (https://phabricator.wikimedia.org/T409528)
[09:15:13] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2111 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:15:18] <jnuche>	 phuedx: ack, thank you
[09:17:59] <vgutierrez>	 !log repool cp7001
[09:18:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:26] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P85692 and previous config saved to /var/cache/conftool/dbconfig/20251126-091825-marostegui.json
[09:19:16] <wikibugs>	 (PS1) Volans: toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313)
[09:19:42] <wikibugs>	 (PS1) Phuedx: Hooks: Only add global logging context for pageviews [extensions/MetricsPlatform] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211611 (https://phabricator.wikimedia.org/T409965)
[09:19:58] <wikibugs>	 (PS2) Phuedx: Hooks: Only add global logging context for pageviews [extensions/MetricsPlatform] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211611 (https://phabricator.wikimedia.org/T409965)
[09:21:00] <wikibugs>	 (PS2) Elukey: services: move tegola and kartotherian to the new staging db [deployment-charts] - https://gerrit.wikimedia.org/r/1211609 (https://phabricator.wikimedia.org/T409528)
[09:21:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2111:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:24:47] <wikibugs>	 (CR) Elukey: Add a staging-specific stream for Maps tiles change (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[09:25:13] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2111 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[09:25:26] <wikibugs>	 (CR) Volans: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:29:31] <wikibugs>	 (CR) Volans: "PCC results at:" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:29:35] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good" [deployment-charts] - https://gerrit.wikimedia.org/r/1211609 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[09:29:54] <wikibugs>	 (CR) Muehlenhoff: "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1208406 (https://phabricator.wikimedia.org/T410729) (owner: Dzahn)
[09:30:08] <wikibugs>	 (CR) Elukey: [C:+2] services: move tegola and kartotherian to the new staging db [deployment-charts] - https://gerrit.wikimedia.org/r/1211609 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[09:30:37] <wikibugs>	 (CR) Btullis: [C:+1] growthbook-next: register namespace in the ceph and cloudnative operators [deployment-charts] - https://gerrit.wikimedia.org/r/1211595 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:31:11] <wikibugs>	 (CR) Santiago Faci: [C:+1] Hooks: Only add global logging context for pageviews [extensions/MetricsPlatform] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211611 (https://phabricator.wikimedia.org/T409965) (owner: Phuedx)
[09:31:18] <phuedx>	 jnuche: The cherry-pick is ready to be backported whenever :)
[09:31:35] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'sync'.
[09:31:39] <jnuche>	 phuedx: would you do the honors?
[09:32:01] <phuedx>	 jnuche: Can do
[09:32:08] <wikibugs>	 (CR) Majavah: [C:-1] "The monitoring rule will not work as is, otherwise this seems reasonable" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:32:24] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by phuedx@deploy2002 using scap backport" [extensions/MetricsPlatform] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211611 (https://phabricator.wikimedia.org/T409965) (owner: Phuedx)
[09:32:47] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
[09:33:20] <wikibugs>	 (CR) FNegri: toolforge: add ingress for infra-tracing-loki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[09:33:21] <wikibugs>	 (Merged) jenkins-bot: Hooks: Only add global logging context for pageviews [extensions/MetricsPlatform] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211611 (https://phabricator.wikimedia.org/T409965) (owner: Phuedx)
[09:33:33] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1181 (T410531)', diff saved to https://phabricator.wikimedia.org/P85693 and previous config saved to /var/cache/conftool/dbconfig/20251126-093332-marostegui.json
[09:33:38] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[09:33:49] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
[09:33:57] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1191 (T410531)', diff saved to https://phabricator.wikimedia.org/P85694 and previous config saved to /var/cache/conftool/dbconfig/20251126-093356-marostegui.json
[09:33:59] <logmsgbot>	 !log phuedx@deploy2002 Started scap sync-world: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]]
[09:34:08] <stashbot>	 T409965: Enable experiment enrollment in the MediaWiki Action API - https://phabricator.wikimedia.org/T409965
[09:34:08] <stashbot>	 T411074: context.ab_tests global logging context causing validation errors for the mediawiki.api_requests stream - https://phabricator.wikimedia.org/T411074
[09:34:58] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
[09:36:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T410531)', diff saved to https://phabricator.wikimedia.org/P85695 and previous config saved to /var/cache/conftool/dbconfig/20251126-093607-marostegui.json
[09:36:17] <logmsgbot>	 !log phuedx@deploy2002 phuedx: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:37:40] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
[09:38:28] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
[09:39:48] <wikibugs>	 (CR) Brouberol: [C:+2] growthbook-next: register namespace in the ceph and cloudnative operators [deployment-charts] - https://gerrit.wikimedia.org/r/1211595 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:41:26] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[09:41:32] <wikibugs>	 (PS2) Giuseppe Lavagetto: cache-text: enable unidentified client rate limiting on one host [puppet] - https://gerrit.wikimedia.org/r/1211062
[09:42:01] <phuedx>	 Browsing the site looks OK. No new loglines in the logs. I ran a few test MediaWiki Action API queries
[09:42:06] <phuedx>	 And they looked OK too
[09:42:10] <logmsgbot>	 !log phuedx@deploy2002 phuedx: Continuing with sync
[09:42:28] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[09:44:14] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211062 (owner: Giuseppe Lavagetto)
[09:47:08] <wikibugs>	 (PS1) Brouberol: ferretdb-growthbook-next: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1211613 (https://phabricator.wikimedia.org/T410999)
[09:47:27] <logmsgbot>	 !log phuedx@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]] (duration: 13m 29s)
[09:47:34] <stashbot>	 T409965: Enable experiment enrollment in the MediaWiki Action API - https://phabricator.wikimedia.org/T409965
[09:47:34] <stashbot>	 T411074: context.ab_tests global logging context causing validation errors for the mediawiki.api_requests stream - https://phabricator.wikimedia.org/T411074
[09:47:44] <wikibugs>	 (CR) Fabfur: [C:+1] cache-text: enable unidentified client rate limiting on one host [puppet] - https://gerrit.wikimedia.org/r/1211062 (owner: Giuseppe Lavagetto)
[09:47:56] <wikibugs>	 (CR) Federico Ceratto: "This setup is needed at the moment to unblock progress on automation work, and it requires accessing only one flag not exposed otherwise b" [deployment-charts] - https://gerrit.wikimedia.org/r/1211165 (https://phabricator.wikimedia.org/T384810) (owner: Federico Ceratto)
[09:48:09] <phuedx>	 jnuche: I'll monitor the EventGate validation error logs for a while and report back
[09:48:22] <wikibugs>	 (CR) Btullis: [C:+1] ferretdb-growthbook-next: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1211613 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:48:35] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
[09:48:47] <jnuche>	 phuedx: I can see the numbers already going down 🎉 Thanks for the fix, appreciated
[09:48:54] <wikibugs>	 (CR) Btullis: [C:+1] growthbook-next: configure ATS redirection and caching [puppet] - https://gerrit.wikimedia.org/r/1211046 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:49:15] <wikibugs>	 (CR) Brouberol: [C:+2] ferretdb-growthbook-next: define helmfile and values [deployment-charts] - https://gerrit.wikimedia.org/r/1211613 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:49:25] <wikibugs>	 (PS3) Fabfur: cache::text: enable unidentified client rate limiting on one host [puppet] - https://gerrit.wikimedia.org/r/1211062 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[09:50:25] <phuedx>	 jnuche: https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&from=2025-11-26T09:00:00.000Z&to=now&timezone=utc&var-service=eventgate-analytics&var-stream=$__all&var-kafka_broker=$__all&var-kafka_producer_type=$__all&var-dc=000000026&var-site=$__all&refresh=auto&viewPanel=panel-75
[09:50:45] <phuedx>	 Confirmed XD
[09:50:58] <jnuche>	 phuedx: nice :)
[09:51:02] <wikibugs>	 (PS4) Fabfur: cache::text: enable unidentified client rate limiting on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1211062 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[09:51:05] <phuedx>	 I'll close out the blocking task
[09:51:12] <wikibugs>	 (CR) Fabfur: [C:+2] cache::text: enable unidentified client rate limiting on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1211062 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[09:51:15] <wikibugs>	 (CR) Fabfur: [V:+2 C:+2] cache::text: enable unidentified client rate limiting on cp7001 [puppet] - https://gerrit.wikimedia.org/r/1211062 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[09:51:16] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P85696 and previous config saved to /var/cache/conftool/dbconfig/20251126-095115-marostegui.json
[09:52:32] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211062 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[09:53:07] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
[09:53:30] <wikibugs>	 SRE-tools, DC-Ops, Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#11408657 (jcrespo) > The only supported/working way is to stage the firmwares manually on the cumin nodes and use those :(  How?...
[09:53:54] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
[09:54:19] <wikibugs>	 (PS2) Muehlenhoff: Remove maintenance-log-readers [puppet] - https://gerrit.wikimedia.org/r/1211181
[09:55:47] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
[09:55:56] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
[09:56:46] <wikibugs>	 (PS1) Brouberol: ferretdb-growthbook-next: tweak PG secret name [deployment-charts] - https://gerrit.wikimedia.org/r/1211616 (https://phabricator.wikimedia.org/T410999)
[09:57:36] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[09:57:57] <wikibugs>	 (PS4) Majavah: Initial configuration for tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1205954 (https://phabricator.wikimedia.org/T404457)
[09:57:57] <wikibugs>	 (PS4) Majavah: Activate tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1205955 (https://phabricator.wikimedia.org/T404457)
[09:57:58] <wikibugs>	 (PS4) Majavah: Set up tokwiki namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457)
[09:57:58] <wikibugs>	 (PS2) Majavah: Allow account creation on tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1207262 (https://phabricator.wikimedia.org/T404457)
[09:59:23] <wikibugs>	 (CR) Btullis: [C:+1] ferretdb-growthbook-next: tweak PG secret name [deployment-charts] - https://gerrit.wikimedia.org/r/1211616 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[09:59:42] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
[10:01:13] <wikibugs>	 SRE, LDAP-Access-Requests: Access to logstash for OKryva-WMF - https://phabricator.wikimedia.org/T410115#11408684 (Aklapper) @OKryva-WMF: Could you please answer the last comment? Thanks in advance!
[10:04:39] <wikibugs>	 (CR) Brouberol: [C:+2] ferretdb-growthbook-next: tweak PG secret name [deployment-charts] - https://gerrit.wikimedia.org/r/1211616 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[10:04:42] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
[10:05:16] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
[10:06:23] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P85697 and previous config saved to /var/cache/conftool/dbconfig/20251126-100623-marostegui.json
[10:07:22] <wikibugs>	 (PS1) Vgutierrez: cache::text: Include HEAD requests on global unauth ratelimits [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545)
[10:08:07] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[10:09:01] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
[10:10:25] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[10:11:21] <wikibugs>	 (PS1) Muehlenhoff: maps::osm_replica: Explicitly pass the replication password [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565)
[10:13:45] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] START helmfile.d/services/kartotherian: sync
[10:14:03] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] DONE helmfile.d/services/kartotherian: sync
[10:14:34] <wikibugs>	 (PS3) Slyngshede: C:varnish [puppet] - https://gerrit.wikimedia.org/r/1210600
[10:17:19] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[10:17:51] <wikibugs>	 (PS4) Slyngshede: C:varnish [puppet] - https://gerrit.wikimedia.org/r/1210600
[10:18:59] <wikibugs>	 SRE-SLO: Sloth: adapt default month view to quarter view - https://phabricator.wikimedia.org/T409312#11408695 (tappof) Just updated the dashboard: https://grafana.wikimedia.org/goto/PzmXbiWvg?orgId=1  Quarter Error Budget Burn Rate:  * Use timestamps (which always increase and never reset) to define the time...
[10:19:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:19:49] <wikibugs>	 (CR) Elukey: [C:+1] Enable imports on maps-test2001 [puppet] - https://gerrit.wikimedia.org/r/1210587 (https://phabricator.wikimedia.org/T409528) (owner: Muehlenhoff)
[10:20:56] <wikibugs>	 (CR) Elukey: [C:+1] maps::osm_replica: Explicitly pass the replication password [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[10:21:31] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T410531)', diff saved to https://phabricator.wikimedia.org/P85698 and previous config saved to /var/cache/conftool/dbconfig/20251126-102130-marostegui.json
[10:21:36] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[10:21:39] <wikibugs>	 (CR) Vgutierrez: [V:+1] "varnishtests are happy" [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[10:21:46] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
[10:21:54] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1194 (T410531)', diff saved to https://phabricator.wikimedia.org/P85699 and previous config saved to /var/cache/conftool/dbconfig/20251126-102153-marostegui.json
[10:24:06] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T410531)', diff saved to https://phabricator.wikimedia.org/P85700 and previous config saved to /var/cache/conftool/dbconfig/20251126-102405-marostegui.json
[10:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[10:24:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:26:34] <wikibugs>	 (PS1) Brouberol: growthbook-next: add missing ingress certificate [deployment-charts] - https://gerrit.wikimedia.org/r/1211623 (https://phabricator.wikimedia.org/T410999)
[10:27:19] <wikibugs>	 (CR) Btullis: [C:+1] growthbook-next: add missing ingress certificate [deployment-charts] - https://gerrit.wikimedia.org/r/1211623 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[10:30:33] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good (I have no insight on the Redfish code for the actual powercycle, but that part was/is essentially just moved around anyway)" [cookbooks] - https://gerrit.wikimedia.org/r/1198928 (owner: Elukey)
[10:31:41] <wikibugs>	 SRE, Wikimedia Enterprise: Provide auth-less access to Enterprise APIs from WMF Analytics cluster - https://phabricator.wikimedia.org/T403298#11408766 (awight)
[10:32:42] <wikibugs>	 (CR) Muehlenhoff: Add a staging-specific stream for Maps tiles change (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[10:34:00] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Enable imports on maps-test2001 [puppet] - https://gerrit.wikimedia.org/r/1210587 (https://phabricator.wikimedia.org/T409528) (owner: Muehlenhoff)
[10:34:13] <wikibugs>	 ops-eqiad, DC-Ops, Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082 (elukey) NEW
[10:34:14] <wikibugs>	 (CR) Slyngshede: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[10:34:47] <wikibugs>	 (CR) Brouberol: [C:+2] growthbook-next: add missing ingress certificate [deployment-charts] - https://gerrit.wikimedia.org/r/1211623 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[10:35:22] <wikibugs>	 (PS2) Elukey: Add a staging-specific stream for Maps tiles change [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528)
[10:35:33] <wikibugs>	 (CR) Elukey: Add a staging-specific stream for Maps tiles change (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[10:35:56] <wikibugs>	 (PS2) Vgutierrez: cache::text: Include HEAD requests on global unauth ratelimits [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545)
[10:36:25] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[10:36:38] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[10:36:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:37:25] <wikibugs>	 (CR) Brouberol: "`" [puppet] - https://gerrit.wikimedia.org/r/1211046 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[10:37:28] <wikibugs>	 (CR) Brouberol: [C:+2] growthbook-next: configure ATS redirection and caching [puppet] - https://gerrit.wikimedia.org/r/1211046 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[10:38:22] <wikibugs>	 (CR) Slyngshede: [C:+1] cache::text: Include HEAD requests on global unauth ratelimits [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[10:39:13] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P85701 and previous config saved to /var/cache/conftool/dbconfig/20251126-103913-marostegui.json
[10:40:55] <wikibugs>	 (CR) Vgutierrez: [C:+2] cache::text: Include HEAD requests on global unauth ratelimits [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[10:41:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:42:39] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] START helmfile.d/services/thumbor: sync
[10:42:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:42:55] <logmsgbot>	 !log hnowlan@deploy2002 helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
[10:43:31] <wikibugs>	 (PS1) Muehlenhoff: Update secrets for tilerator->tegola rename [labs/private] - https://gerrit.wikimedia.org/r/1211625 (https://phabricator.wikimedia.org/T381565)
[10:47:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[10:48:01] <wikibugs>	 (CR) Muehlenhoff: [V:+2 C:+2] Update secrets for tilerator->tegola rename [labs/private] - https://gerrit.wikimedia.org/r/1211625 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[10:50:11] <wikibugs>	 (PS2) Muehlenhoff: maps::osm_replica: Explicitly pass the replication password [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565)
[10:50:55] <wikibugs>	 (PS5) Slyngshede: C:varnish [puppet] - https://gerrit.wikimedia.org/r/1210600
[10:51:20] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[10:52:02] <wikibugs>	 (PS2) Volans: toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313)
[10:52:02] <wikibugs>	 (PS1) Volans: prometheus: blackbox check http skip tls verify [puppet] - https://gerrit.wikimedia.org/r/1211628
[10:54:07] <wikibugs>	 (PS1) Bartosz Wójtowicz: ml-services: Update image and set topic filtering env vars. [deployment-charts] - https://gerrit.wikimedia.org/r/1211629 (https://phabricator.wikimedia.org/T408538)
[10:54:21] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P85702 and previous config saved to /var/cache/conftool/dbconfig/20251126-105420-marostegui.json
[10:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[10:57:02] <wikibugs>	 (PS1) Elukey: services: set new caching and kafka configuration for Tegola staging [deployment-charts] - https://gerrit.wikimedia.org/r/1211631 (https://phabricator.wikimedia.org/T409528)
[10:58:00] <wikibugs>	 (PS5) Itamar Givon: Report integrity metric from Wikidata dump scripts [dumps] - https://gerrit.wikimedia.org/r/1203410 (https://phabricator.wikimedia.org/T403482) (owner: Silvan Heintze)
[10:58:07] <wikibugs>	 (PS6) Slyngshede: C:varnish::common::errorpage update 404 error message [puppet] - https://gerrit.wikimedia.org/r/1210600 (https://phabricator.wikimedia.org/T381232)
[10:59:23] <wikibugs>	 (PS3) Volans: toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313)
[11:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1100)
[11:00:53] <wikibugs>	 (CR) Volans: "Addressed comments" [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:02:10] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [deployment-charts] - https://gerrit.wikimedia.org/r/1211631 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[11:02:25] <wikibugs>	 (CR) AikoChou: [C:+1] ml-services: Update image and set topic filtering env vars. [deployment-charts] - https://gerrit.wikimedia.org/r/1211629 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[11:02:56] <wikibugs>	 (CR) Bartosz Wójtowicz: [C:+2] ml-services: Update image and set topic filtering env vars. [deployment-charts] - https://gerrit.wikimedia.org/r/1211629 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[11:04:52] <wikibugs>	 (CR) Fabfur: [C:+1] cache::text: Include HEAD requests on global unauth ratelimits [puppet] - https://gerrit.wikimedia.org/r/1211620 (https://phabricator.wikimedia.org/T406545) (owner: Vgutierrez)
[11:05:11] <wikibugs>	 (Merged) jenkins-bot: ml-services: Update image and set topic filtering env vars. [deployment-charts] - https://gerrit.wikimedia.org/r/1211629 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[11:06:25] <logmsgbot>	 !log bwojtowicz@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
[11:07:06] <wikibugs>	 (PS1) Aqu: Add Spurus connection configuration with proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1211634 (https://phabricator.wikimedia.org/T410285)
[11:09:26] <logmsgbot>	 !log bwojtowicz@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
[11:09:28] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T410531)', diff saved to https://phabricator.wikimedia.org/P85704 and previous config saved to /var/cache/conftool/dbconfig/20251126-110928-marostegui.json
[11:09:35] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[11:09:44] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
[11:09:52] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1202 (T410531)', diff saved to https://phabricator.wikimedia.org/P85705 and previous config saved to /var/cache/conftool/dbconfig/20251126-110951-marostegui.json
[11:10:47] <logmsgbot>	 !log bwojtowicz@deploy2002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
[11:12:03] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T410531)', diff saved to https://phabricator.wikimedia.org/P85706 and previous config saved to /var/cache/conftool/dbconfig/20251126-111203-marostegui.json
[11:12:35] <logmsgbot>	 !log jynus@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2014.codfw.wmnet with reason: upgrade and restart
[11:16:13] <wikibugs>	 (CR) Muehlenhoff: [C:+2] maps::osm_replica: Explicitly pass the replication password [puppet] - https://gerrit.wikimedia.org/r/1211621 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[11:16:55] <wikibugs>	 (CR) FNegri: toolforge: add ingress for infra-tracing-loki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:20:15] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
[11:20:21] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
[11:21:06] <icinga-wm>	 PROBLEM - Host cirrussearch2093 is DOWN: PING CRITICAL - Packet loss = 100%
[11:24:24] <logmsgbot>	 !log jynus@cumin2002 dbctl commit (dc=all): 'Depool db2166, perf issue', diff saved to https://phabricator.wikimedia.org/P85708 and previous config saved to /var/cache/conftool/dbconfig/20251126-112422-jynus.json
[11:24:56] <jynus>	 ^ marostegui federico3
[11:25:57] <federico3>	 looking
[11:26:32] <logmsgbot>	 !log ryankemper@cumin2002 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot (apply updates) - ryankemper@cumin2002 - T410573
[11:26:37] <stashbot>	 T410573: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573
[11:27:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P85709 and previous config saved to /var/cache/conftool/dbconfig/20251126-112710-marostegui.json
[11:27:14] <wikibugs>	 (CR) Majavah: toolforge: add ingress for infra-tracing-loki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:28:32] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 3 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11408959 (Clement_Goubert) >>! In T411066#11408456, @aaron wrote: > Maybe related to https://gerrit...
[11:29:19] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 4 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11408962 (Clement_Goubert) p:Triage→High a:Clement_Goubert
[11:31:06] <wikibugs>	 (PS4) Volans: toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313)
[11:31:06] <wikibugs>	 (CR) Volans: toolforge: add ingress for infra-tracing-loki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: Volans)
[11:33:15] <icinga-wm>	 RECOVERY - Host cirrussearch2093 is UP: PING OK - Packet loss = 0%, RTA = 30.43 ms
[11:33:45] <wikibugs>	 (PS2) Volans: prometheus: blackbox check http skip tls verify [puppet] - https://gerrit.wikimedia.org/r/1211628
[11:33:46] <wikibugs>	 (PS5) Volans: toolforge: add ingress for infra-tracing-loki [puppet] - https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313)
[11:33:49] <moritzm>	 !log installing libxslt security updates
[11:33:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:20] <wikibugs>	 (PS1) Bartosz Wójtowicz: ml-services: Separate eqiad and codfw deployments for Revise Tone. [deployment-charts] - https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538)
[11:37:07] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
[11:37:11] <wikibugs>	 (CR) Elukey: [C:+1] "I went through the whole script line-by-line and it makes sense, I just left some comments related to optional nits that you are free to s" [puppet] - https://gerrit.wikimedia.org/r/1205197 (https://phabricator.wikimedia.org/T376949) (owner: JHathaway)
[11:38:20] <wikibugs>	 (PS2) Giuseppe Lavagetto: cache-text: enable auth, bot rate limiting in magru [puppet] - https://gerrit.wikimedia.org/r/1211063
[11:38:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: push_cross_cluster_settings_9200.service on cirrussearch2093:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:38:59] <wikibugs>	 (PS3) Vgutierrez: cache::text: enable auth, bot rate limiting in magru [puppet] - https://gerrit.wikimedia.org/r/1211063 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[11:39:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
[11:39:19] <wikibugs>	 (CR) Vgutierrez: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211063 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[11:39:24] <wikibugs>	 (PS1) Clément Goubert: trafficserver::backend: Fix www-less wikifunctions [puppet] - https://gerrit.wikimedia.org/r/1211641 (https://phabricator.wikimedia.org/T411066)
[11:40:18] <wikibugs>	 (CR) Volans: "PCC is a noop as expected on a couple of random hosts:" [puppet] - https://gerrit.wikimedia.org/r/1211628 (owner: Volans)
[11:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:41:53] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2093 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:42:19] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P85710 and previous config saved to /var/cache/conftool/dbconfig/20251126-114218-marostegui.json
[11:43:53] <wikibugs>	 SRE, MinT, Prod-Kubernetes, serviceops: machinetranslation eqiad pods in state ContainerStatusUnknown - https://phabricator.wikimedia.org/T411058#11409010 (KartikMistry) @RLazarus We deployed MinT lastly on 06 Nov with a37ece7cde26383bba8b3f22519635f3e3b95da5. Is it possible that resource allocat...
[11:51:38] <wikibugs>	 (PS6) Itamar Givon: Report integrity metric from Wikidata dump scripts [dumps] - https://gerrit.wikimedia.org/r/1203410 (https://phabricator.wikimedia.org/T403482) (owner: Silvan Heintze)
[11:51:54] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2093 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:52:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
[11:53:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: push_cross_cluster_settings_9200.service on cirrussearch2093:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:53:38] <wikibugs>	 (CR) Fabfur: [C:+1] "lgtm, merging instead of @joe" [puppet] - https://gerrit.wikimedia.org/r/1211063 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[11:53:40] <wikibugs>	 (CR) Fabfur: [C:+2] cache::text: enable auth, bot rate limiting in magru [puppet] - https://gerrit.wikimedia.org/r/1211063 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[11:54:03] <wikibugs>	 (CR) Silvan Heintze: [C:+1] "LGTM, +1 to the entire chain" [dumps] - https://gerrit.wikimedia.org/r/1203410 (https://phabricator.wikimedia.org/T403482) (owner: Silvan Heintze)
[11:54:04] <wikibugs>	 (PS11) Pmiazga: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578)
[11:54:42] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
[11:56:33] <wikibugs>	 (PS4) Clément Goubert: Cleanup redundant lint-related rest gateway routing config [puppet] - https://gerrit.wikimedia.org/r/1210631 (owner: Aaron Schulz)
[11:56:46] <wikibugs>	 (PS5) Clément Goubert: Cleanup redundant lint-related rest gateway routing config [puppet] - https://gerrit.wikimedia.org/r/1210631 (owner: Aaron Schulz)
[11:57:23] <wikibugs>	 (PS2) Giuseppe Lavagetto: cache-text: enable auth, bot rate-limiting on drmrs [puppet] - https://gerrit.wikimedia.org/r/1211064
[11:57:27] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T410531)', diff saved to https://phabricator.wikimedia.org/P85711 and previous config saved to /var/cache/conftool/dbconfig/20251126-115726-marostegui.json
[11:57:32] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
[11:57:32] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[11:57:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1227 (T410531)', diff saved to https://phabricator.wikimedia.org/P85712 and previous config saved to /var/cache/conftool/dbconfig/20251126-115739-marostegui.json
[11:58:09] <wikibugs>	 (CR) Vgutierrez: [C:+1] trafficserver::backend: Fix www-less wikifunctions [puppet] - https://gerrit.wikimedia.org/r/1211641 (https://phabricator.wikimedia.org/T411066) (owner: Clément Goubert)
[11:58:40] <wikibugs>	 (PS3) Fabfur: cache::text: enable auth, bot rate-limiting on drmrs [puppet] - https://gerrit.wikimedia.org/r/1211064 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[11:59:05] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
[11:59:37] <jinxer-wm>	 FIRING: [3x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[12:00:04] <jouncebot>	 mvolz: Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1200). Please do the needful.
[12:01:23] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
[12:02:04] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
[12:02:40] <wikibugs>	 (CR) Mvolz: [C:+2] citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1211067 (owner: PipelineBot)
[12:02:53] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T410531)', diff saved to https://phabricator.wikimedia.org/P85713 and previous config saved to /var/cache/conftool/dbconfig/20251126-120252-marostegui.json
[12:02:58] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[12:03:33] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] prometheus: blackbox check http skip tls verify [puppet] - https://gerrit.wikimedia.org/r/1211628 (owner: Volans)
[12:04:25] <wikibugs>	 (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1211067 (owner: PipelineBot)
[12:06:30] <claime>	 !log Starting kafka-main rebalance with 30MB/s throttle - T407185
[12:06:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:47] <stashbot>	 T407185: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185
[12:07:37] <wikibugs>	 (PS1) Majavah: wmflib: hosts2ips: Allow passing in IP ranges [puppet] - https://gerrit.wikimedia.org/r/1211650
[12:07:37] <wikibugs>	 (PS1) Majavah: firewall: Use exported resources to fix ordering issues [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089)
[12:07:39] <wikibugs>	 (PS1) Majavah: P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089)
[12:07:54] <wikibugs>	 (PS1) Muehlenhoff: Allow smartctl for datacenter-ops [puppet] - https://gerrit.wikimedia.org/r/1211653 (https://phabricator.wikimedia.org/T395939)
[12:08:05] <wikibugs>	 (CR) Majavah: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211651 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[12:09:34] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:09:55] <wikibugs>	 (CR) Clément Goubert: [C:+2] trafficserver::backend: Fix www-less wikifunctions [puppet] - https://gerrit.wikimedia.org/r/1211641 (https://phabricator.wikimedia.org/T411066) (owner: Clément Goubert)
[12:09:58] <logmsgbot>	 !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:10:23] <wikibugs>	 (PS5) Hnowlan: svg: refuse to generate SVGs larger than a particular size [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/1211630 (https://phabricator.wikimedia.org/T411076)
[12:10:34] <logmsgbot>	 !log root@cumin2002 DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for backup2014.codfw.wmnet: Renew puppet certificate - root@cumin2002
[12:10:42] <wikibugs>	 (PS1) AikoChou: changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538)
[12:10:46] <wikibugs>	 (CR) CI reject: [V:-1] P:wmcs::instance: Convert to firewall wrapper [puppet] - https://gerrit.wikimedia.org/r/1211652 (https://phabricator.wikimedia.org/T411089) (owner: Majavah)
[12:11:14] <wikibugs>	 (PS1) Daniel Kinzler: api-gateway chart: add values-rest-staging.yaml [deployment-charts] - https://gerrit.wikimedia.org/r/1211656
[12:12:03] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1211064 (https://phabricator.wikimedia.org/T406545) (owner: Giuseppe Lavagetto)
[12:12:37] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:12:57] <wikibugs>	 (PS12) Pmiazga: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578)
[12:13:36] <wikibugs>	 (CR) Slyngshede: [C:+1] "Seems reasonable." [puppet] - https://gerrit.wikimedia.org/r/1211653 (https://phabricator.wikimedia.org/T395939) (owner: Muehlenhoff)
[12:15:09] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
[12:16:10] <wikibugs>	 (PS1) Kosta Harlan: hCaptcha: Enable hCaptcha editing in 100% passive mode on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1211658 (https://phabricator.wikimedia.org/T405586)
[12:16:56] <wikibugs>	 (PS1) Kosta Harlan: hCaptcha: Switch frwiki to 99.9% passive mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1211659 (https://phabricator.wikimedia.org/T405586)
[12:17:36] <wikibugs>	 (PS13) Pmiazga: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578)
[12:17:38] <wikibugs>	 (PS1) Kosta Harlan: hCaptcha: Switch enwiki to 99.9% passive mode [mediawiki-config] - https://gerrit.wikimedia.org/r/1211660 (https://phabricator.wikimedia.org/T405586)
[12:18:00] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P85716 and previous config saved to /var/cache/conftool/dbconfig/20251126-121759-marostegui.json
[12:19:03] <jinxer-wm>	 FIRING: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=main-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[12:20:17] <wikibugs>	 (CR) Dreamy Jazz: [C:+1] Set $wgGlobalBlockingAutoblockExemptions (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/1204571 (https://phabricator.wikimedia.org/T409915) (owner: Majavah)
[12:20:24] <logmsgbot>	 !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[12:20:24] <Dreamy_Jazz>	 jouncebot: nowandnext
[12:20:25] <jouncebot>	 For the next 0 hour(s) and 39 minute(s): Services – Citoid / Zotero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1200)
[12:20:25] <jouncebot>	 In 1 hour(s) and 39 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1400)
[12:20:52] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - https://gerrit.wikimedia.org/r/1204571 (https://phabricator.wikimedia.org/T409915) (owner: Majavah)
[12:21:35] <claime>	 The kafka alert is expected
[12:21:56] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:22:03] <claime>	 It's a byproduct of the rebalance, it will subside once it is done (I estimate about 3h or so)
[12:22:25] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:26:03] <wikibugs>	 (PS14) Pmiazga: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578)
[12:26:07] <wikibugs>	 SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, Traffic: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062#11409175 (Vgutierrez) We are now rate-limiting non thumbnail steps requests for cache misses when certain X-Is-Browser thresholds are met
[12:27:03] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s3 T411088', diff saved to https://phabricator.wikimedia.org/P85717 and previous config saved to /var/cache/conftool/dbconfig/20251126-122703-marostegui.json
[12:27:09] <stashbot>	 T411088: Clean up groups config - https://phabricator.wikimedia.org/T411088
[12:27:13] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.network.peering with action 'configure' for AS: 29357
[12:27:36] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
[12:27:51] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
[12:29:02] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
[12:31:14] <wikibugs>	 (PS1) Ladsgroup: Clean up db groups config [mediawiki-config] - https://gerrit.wikimedia.org/r/1211664 (https://phabricator.wikimedia.org/T411088)
[12:31:31] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s1 T411088', diff saved to https://phabricator.wikimedia.org/P85719 and previous config saved to /var/cache/conftool/dbconfig/20251126-123131-marostegui.json
[12:32:14] <wikibugs>	 (CR) CI reject: [V:-1] Clean up db groups config [mediawiki-config] - https://gerrit.wikimedia.org/r/1211664 (https://phabricator.wikimedia.org/T411088) (owner: Ladsgroup)
[12:33:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P85720 and previous config saved to /var/cache/conftool/dbconfig/20251126-123307-marostegui.json
[12:33:49] <wikibugs>	 (PS2) Ladsgroup: Clean up db groups config [mediawiki-config] - https://gerrit.wikimedia.org/r/1211664 (https://phabricator.wikimedia.org/T411088)
[12:34:11] <wikibugs>	 (CR) Gmodena: [C:+2] Add a staging-specific stream for Maps tiles change [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[12:34:29] <wikibugs>	 (CR) Gmodena: [C:+1] Add a staging-specific stream for Maps tiles change [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[12:35:30] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:35:31] <wikibugs>	 SRE, Abstract Wikipedia team, MediaWiki-Action-API, MW-Interfaces-Team, and 4 others: wikifunctions.org API no longer works via that URL (without 'www.') - https://phabricator.wikimedia.org/T411066#11409199 (Clement_Goubert) Open→Resolved Deployed and tested quickly, looks like it's f...
[12:35:36] <logmsgbot>	 !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:36:27] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on ganeti1039 - https://phabricator.wikimedia.org/T410743#11409206 (MoritzMuehlenhoff) dmesg is full of I/O errors for dev/sdb, we should definitely get that drive replaced.
[12:38:30] <logmsgbot>	 !log root@cumin2002 DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for backup2014.codfw.wmnet: Renew puppet certificate - root@cumin2002
[12:39:35] <wikibugs>	 (Abandoned) Bartosz Wójtowicz: ml-services: Add CIDRs enabling pod-to-pod communication. [deployment-charts] - https://gerrit.wikimedia.org/r/1207785 (https://phabricator.wikimedia.org/T408538) (owner: Bartosz Wójtowicz)
[12:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[12:43:23] <logmsgbot>	 !log root@cumin2002 DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for backup2014.codfw.wmnet: Renew puppet certificate - root@cumin2002
[12:43:28] <wikibugs>	 (PS1) Kevin Bazira: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211665 (https://phabricator.wikimedia.org/T410906)
[12:44:42] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s2 T411088', diff saved to https://phabricator.wikimedia.org/P85721 and previous config saved to /var/cache/conftool/dbconfig/20251126-124441-marostegui.json
[12:44:47] <stashbot>	 T411088: Clean up groups config - https://phabricator.wikimedia.org/T411088
[12:45:43] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
[12:45:59] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on ganeti1039 - https://phabricator.wikimedia.org/T410743#11409240 (ops-monitoring-bot) Draining ganeti1039.eqiad.wmnet of running VMs
[12:46:10] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s4 T411088', diff saved to https://phabricator.wikimedia.org/P85722 and previous config saved to /var/cache/conftool/dbconfig/20251126-124609-marostegui.json
[12:46:44] <wikibugs>	 (CR) Dpogorzelski: [C:+1] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211665 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[12:47:01] <wikibugs>	 (CR) Cathal Mooney: [C:+1] "Looks ok to me, but I'll be honest some of the puppetcode is a little complex for me.  Perhaps get Moritz's view on it?" [puppet] - https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726) (owner: JHathaway)
[12:47:35] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
[12:48:15] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1227 (T410531)', diff saved to https://phabricator.wikimedia.org/P85723 and previous config saved to /var/cache/conftool/dbconfig/20251126-124815-marostegui.json
[12:48:20] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[12:48:31] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
[12:48:35] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
[12:48:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1231 (T410531)', diff saved to https://phabricator.wikimedia.org/P85724 and previous config saved to /var/cache/conftool/dbconfig/20251126-124838-marostegui.json
[12:48:49] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Degraded RAID on ganeti1039 - https://phabricator.wikimedia.org/T410743#11409250 (ops-monitoring-bot) Draining ganeti1039.eqiad.wmnet of running VMs
[12:49:57] <wikibugs>	 (CR) Kevin Bazira: [C:+2] ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211665 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[12:50:50] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T410531)', diff saved to https://phabricator.wikimedia.org/P85725 and previous config saved to /var/cache/conftool/dbconfig/20251126-125049-marostegui.json
[12:51:41] <wikibugs>	 (Merged) jenkins-bot: ml-services: update llm model-server image [deployment-charts] - https://gerrit.wikimedia.org/r/1211665 (https://phabricator.wikimedia.org/T410906) (owner: Kevin Bazira)
[12:51:50] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
[12:51:56] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
[12:52:01] <wikibugs>	 (CR) Brouberol: [C:+1] Add Spurus connection configuration with proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1211634 (https://phabricator.wikimedia.org/T410285) (owner: Aqu)
[12:52:20] <wikibugs>	 (PS2) Cathal Mooney: gNMI collect more metrics [puppet] - https://gerrit.wikimedia.org/r/1180101 (https://phabricator.wikimedia.org/T395998) (owner: Ayounsi)
[12:52:21] <wikibugs>	 (CR) Brouberol: [C:+2] Add Spurus connection configuration with proxy settings [deployment-charts] - https://gerrit.wikimedia.org/r/1211634 (https://phabricator.wikimedia.org/T410285) (owner: Aqu)
[12:53:04] <logmsgbot>	 !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
[12:54:27] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] gNMI collect more metrics [puppet] - 10https://gerrit.wikimedia.org/r/1180101 (https://phabricator.wikimedia.org/T395998) (owner: 10Ayounsi)
[12:55:40] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+2] gNMI collect more metrics [puppet] - 10https://gerrit.wikimedia.org/r/1180101 (https://phabricator.wikimedia.org/T395998) (owner: 10Ayounsi)
[12:56:55] <wikibugs>	 (03CR) 10Klausman: "This should work fine. I'll see if I can factor out a few bits into `values.yaml` later." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538) (owner: 10Bartosz Wójtowicz)
[12:57:32] <wikibugs>	 (03PS13) 10Btullis: Add a new spark-support chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833)
[12:57:42] <wikibugs>	 (03PS18) 10Btullis: Add helmfile deployments of the spark-support chart to our two test namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833)
[13:00:31] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM thanks!  We can discuss if it's wise to merge now or we need extra tests first." [software/spicerack] - 10https://gerrit.wikimedia.org/r/1211268 (https://phabricator.wikimedia.org/T409286) (owner: 10JHathaway)
[13:02:03] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s5 T411088', diff saved to https://phabricator.wikimedia.org/P85726 and previous config saved to /var/cache/conftool/dbconfig/20251126-130202-marostegui.json
[13:02:08] <stashbot>	 T411088: Clean up groups config - https://phabricator.wikimedia.org/T411088
[13:02:21] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s6 T411088', diff saved to https://phabricator.wikimedia.org/P85727 and previous config saved to /var/cache/conftool/dbconfig/20251126-130220-marostegui.json
[13:02:38] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s7 T411088', diff saved to https://phabricator.wikimedia.org/P85728 and previous config saved to /var/cache/conftool/dbconfig/20251126-130237-marostegui.json
[13:02:56] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s8 T411088', diff saved to https://phabricator.wikimedia.org/P85729 and previous config saved to /var/cache/conftool/dbconfig/20251126-130255-marostegui.json
[13:03:29] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "That's awesome Jesse thanks so much for working on this." [cookbooks] - 10https://gerrit.wikimedia.org/r/1211269 (https://phabricator.wikimedia.org/T409286) (owner: 10JHathaway)
[13:04:26] <wikibugs>	 (03CR) 10Btullis: Add a new spark-support chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833) (owner: 10Btullis)
[13:06:21] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s3 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85730 and previous config saved to /var/cache/conftool/dbconfig/20251126-130620-marostegui.json
[13:06:26] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[13:06:31] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P85731 and previous config saved to /var/cache/conftool/dbconfig/20251126-130630-marostegui.json
[13:07:58] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s1 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85733 and previous config saved to /var/cache/conftool/dbconfig/20251126-130757-marostegui.json
[13:08:34] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on ganeti1039 - https://phabricator.wikimedia.org/T410743#11409319 (10MoritzMuehlenhoff) Copied the output of dmesg to this paste in case it's needed for the warranty case: https://phabricator.wikimedia.org/P85732
[13:08:56] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s2 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85734 and previous config saved to /var/cache/conftool/dbconfig/20251126-130856-marostegui.json
[13:10:19] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s4 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85735 and previous config saved to /var/cache/conftool/dbconfig/20251126-131018-marostegui.json
[13:11:01] <logmsgbot>	 !log fceratto@cumin1003 START - Cookbook sre.mysql.pool db2166 gradually with 4 steps - Repooling
[13:11:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s5 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85736 and previous config saved to /var/cache/conftool/dbconfig/20251126-131110-marostegui.json
[13:12:53] <wikibugs>	 (03PS14) 10Majavah: P:wmcs::cloudgw: Use interface::route wrapper [puppet] - 10https://gerrit.wikimedia.org/r/1196367
[13:12:53] <wikibugs>	 (03PS4) 10Majavah: P:wmcs::cloudgw: Remove absented resources [puppet] - 10https://gerrit.wikimedia.org/r/1211012
[13:12:53] <wikibugs>	 (03PS1) 10Majavah: hieradata: cloudgw: Move shared data to role file [puppet] - 10https://gerrit.wikimedia.org/r/1211666
[13:12:54] <wikibugs>	 (03PS1) 10Majavah: hieradata: cloudgw: Configure individual v6 networks [puppet] - 10https://gerrit.wikimedia.org/r/1211667 (https://phabricator.wikimedia.org/T411081)
[13:13:05] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s6 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85738 and previous config saved to /var/cache/conftool/dbconfig/20251126-131304-marostegui.json
[13:13:10] <stashbot>	 T408663: Unify weights on hosts that are not in vslow/dumps - https://phabricator.wikimedia.org/T408663
[13:13:30] <wikibugs>	 (03PS1) 10Brouberol: Enable traffic from dse kubepods analytics-test hive/presto [puppet] - 10https://gerrit.wikimedia.org/r/1211669 (https://phabricator.wikimedia.org/T410999)
[13:15:13] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in s7 and s8 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85739 and previous config saved to /var/cache/conftool/dbconfig/20251126-131512-marostegui.json
[13:16:04] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7762/co" [puppet] - 10https://gerrit.wikimedia.org/r/1196367 (owner: 10Majavah)
[13:16:07] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Unify weights in x3 codfw T408663', diff saved to https://phabricator.wikimedia.org/P85740 and previous config saved to /var/cache/conftool/dbconfig/20251126-131606-marostegui.json
[13:18:04] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s3 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85741 and previous config saved to /var/cache/conftool/dbconfig/20251126-131803-marostegui.json
[13:18:05] <wikibugs>	 (03CR) 10Majavah: [V:03+1] "I'm giving up on properly testing this in pontoon thanks to the network driver differences (systemd-networkd vs ifupdown), but the PCC loo" [puppet] - 10https://gerrit.wikimedia.org/r/1196367 (owner: 10Majavah)
[13:18:09] <stashbot>	 T411088: Clean up groups config - https://phabricator.wikimedia.org/T411088
[13:18:22] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s1 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85742 and previous config saved to /var/cache/conftool/dbconfig/20251126-131822-marostegui.json
[13:18:45] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s2 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85743 and previous config saved to /var/cache/conftool/dbconfig/20251126-131844-marostegui.json
[13:19:05] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409367 (10Jclark-ctr) @DPogorzelski-WMF @klausman @elukey is anyone available this morning for me to remove gpu’s?
[13:19:27] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s4 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85744 and previous config saved to /var/cache/conftool/dbconfig/20251126-131926-marostegui.json
[13:19:46] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s5 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85745 and previous config saved to /var/cache/conftool/dbconfig/20251126-131945-marostegui.json
[13:20:07] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s6 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85746 and previous config saved to /var/cache/conftool/dbconfig/20251126-132006-marostegui.json
[13:20:23] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s7 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85747 and previous config saved to /var/cache/conftool/dbconfig/20251126-132023-marostegui.json
[13:20:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove vslow/dump from s8 codfw T411088', diff saved to https://phabricator.wikimedia.org/P85748 and previous config saved to /var/cache/conftool/dbconfig/20251126-132039-marostegui.json
[13:21:35] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on bast2003 - https://phabricator.wikimedia.org/T410195#11409384 (10MoritzMuehlenhoff) >>! In T410195#11380368, @Jhancock.wm wrote: > Is this a false alert? I'm not seeing any issues physically with the server or in the idrac.  >  > If this drive does need to be re...
[13:21:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P85749 and previous config saved to /var/cache/conftool/dbconfig/20251126-132138-marostegui.json
[13:22:07] <wikibugs>	 (03CR) 10Brouberol: Add helmfile deployments of the spark-support chart to our two test namespaces (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: 10Btullis)
[13:23:47] <wikibugs>	 (03PS2) 10Brouberol: Enable traffic from dse kubepods analytics-test hive/presto [puppet] - 10https://gerrit.wikimedia.org/r/1211669 (https://phabricator.wikimedia.org/T410999)
[13:23:48] <wikibugs>	 (03CR) 10Brouberol: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1211669 (https://phabricator.wikimedia.org/T410999) (owner: 10Brouberol)
[13:23:57] <Dreamy_Jazz>	 jouncebot: nowandnext
[13:23:57] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 36 minute(s)
[13:23:58] <jouncebot>	 In 0 hour(s) and 36 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1400)
[13:25:01] <Dreamy_Jazz>	 Starting my backport in the backport window early
[13:25:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204571 (https://phabricator.wikimedia.org/T409915) (owner: 10Majavah)
[13:25:43] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers titan1001.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers titan1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:26:04] <wikibugs>	 (03Merged) 10jenkins-bot: Set $wgGlobalBlockingAutoblockExemptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1204571 (https://phabricator.wikimedia.org/T409915) (owner: 10Majavah)
[13:26:24] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
[13:26:37] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1204571|Set $wgGlobalBlockingAutoblockExemptions (T409915)]]
[13:26:42] <stashbot>	 T409915: GlobalBlocking: Global autoblocking exemption list should allow WMF config to define exemptions - https://phabricator.wikimedia.org/T409915
[13:26:44] <logmsgbot>	 !log dreamyjazz@deploy2002 sync-world failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'mergeMessageFileList.php', '--wiki=aawiki', '--force-version', '1.46.0-wmf.3', '--list-file', '/srv/mediawiki-staging/wmf-config/extension-list', '--output', '/tmp/tmp.S3QSelNe06']' returned non-zero exit status 255. (scap version: 4.228.0) (duration: 00m 07s)
[13:27:11] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
[13:27:43] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:29:40] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409407 (10Jclark-ctr) a:03Jclark-ctr
[13:29:52] <wikibugs>	 (03PS1) 10Dreamy Jazz: Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211671 (https://phabricator.wikimedia.org/T409915)
[13:30:24] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211671 (https://phabricator.wikimedia.org/T409915) (owner: 10Dreamy Jazz)
[13:30:48] <wikibugs>	 (03CR) 10Btullis: [C:03+1] Enable traffic from dse kubepods analytics-test hive/presto [puppet] - 10https://gerrit.wikimedia.org/r/1211669 (https://phabricator.wikimedia.org/T410999) (owner: 10Brouberol)
[13:31:02] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] Enable traffic from dse kubepods analytics-test hive/presto [puppet] - 10https://gerrit.wikimedia.org/r/1211669 (https://phabricator.wikimedia.org/T410999) (owner: 10Brouberol)
[13:31:19] <wikibugs>	 (03Merged) 10jenkins-bot: Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211671 (https://phabricator.wikimedia.org/T409915) (owner: 10Dreamy Jazz)
[13:31:51] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1204571|Set $wgGlobalBlockingAutoblockExemptions (T409915)]], [[gerrit:1211671|Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used (T409915)]]
[13:31:56] <stashbot>	 T409915: GlobalBlocking: Global autoblocking exemption list should allow WMF config to define exemptions - https://phabricator.wikimedia.org/T409915
[13:31:58] <logmsgbot>	 !log dreamyjazz@deploy2002 sync-world failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'mergeMessageFileList.php', '--wiki=aawiki', '--force-version', '1.46.0-wmf.3', '--list-file', '/srv/mediawiki-staging/wmf-config/extension-list', '--output', '/tmp/tmp.0iG7i2ezfh']' returned non-zero exit status 255. (scap version: 4.228.0) (duration: 00m 07s)
[13:33:17] <wikibugs>	 (03PS1) 10Dreamy Jazz: Follow-up: Set $wgGlobalBlockingAutoblockExemptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211673 (https://phabricator.wikimedia.org/T409915)
[13:33:23] <wikibugs>	 (03PS2) 10AikoChou: changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538)
[13:33:48] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211673 (https://phabricator.wikimedia.org/T409915) (owner: 10Dreamy Jazz)
[13:34:35] <wikibugs>	 (03Merged) 10jenkins-bot: Follow-up: Set $wgGlobalBlockingAutoblockExemptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211673 (https://phabricator.wikimedia.org/T409915) (owner: 10Dreamy Jazz)
[13:35:08] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1211671|Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used (T409915)]], [[gerrit:1204571|Set $wgGlobalBlockingAutoblockExemptions (T409915)]], [[gerrit:1211673|Follow-up: Set $wgGlobalBlockingAutoblockExemptions (T409915)]]
[13:36:37] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409426 (10Jclark-ctr) Additionally, this will leave four Radeon PRO WX 9100 GPUs in storage. Should we consider selling them if they’re no longer well supported?
[13:36:46] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1231 (T410531)', diff saved to https://phabricator.wikimedia.org/P85751 and previous config saved to /var/cache/conftool/dbconfig/20251126-133645-marostegui.json
[13:36:52] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[13:37:02] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1253.eqiad.wmnet with reason: Maintenance
[13:37:10] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1253 (T410531)', diff saved to https://phabricator.wikimedia.org/P85752 and previous config saved to /var/cache/conftool/dbconfig/20251126-133709-marostegui.json
[13:37:21] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz, taavi: Backport for [[gerrit:1211671|Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used (T409915)]], [[gerrit:1204571|Set $wgGlobalBlockingAutoblockExemptions (T409915)]], [[gerrit:1211673|Follow-up: Set $wgGlobalBlockingAutoblockExemptions (T409915)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[13:37:27] <stashbot>	 T409915: GlobalBlocking: Global autoblocking exemption list should allow WMF config to define exemptions - https://phabricator.wikimedia.org/T409915
[13:37:54] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz, taavi: Continuing with sync
[13:38:17] <wikibugs>	 (03CR) 10Volans: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1211628 (owner: 10Volans)
[13:39:22] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253 (T410531)', diff saved to https://phabricator.wikimedia.org/P85753 and previous config saved to /var/cache/conftool/dbconfig/20251126-133922-marostegui.json
[13:40:48] <wikibugs>	 (03PS2) 10Bartosz Wójtowicz: ml-services: Separate eqiad and codfw deployments for Revise Tone. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538)
[13:41:59] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211671|Only set $wgGlobalBlockingAutoblockExemptions if GlobalBlocking used (T409915)]], [[gerrit:1204571|Set $wgGlobalBlockingAutoblockExemptions (T409915)]], [[gerrit:1211673|Follow-up: Set $wgGlobalBlockingAutoblockExemptions (T409915)]] (duration: 06m 51s)
[13:42:20] <kostajh>	 Dreamy_Jazz: I have a backport when you're finished
[13:42:26] <Dreamy_Jazz>	 I am done
[13:42:28] <Dreamy_Jazz>	 Over to you
[13:42:56] <wikibugs>	 (03CR) 10Bartosz Wójtowicz: "Sounds good! AFAIK, currently the best we could do is just defining the specific `custom_env` per cluster, but leave the rest in `values.y" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211640 (https://phabricator.wikimedia.org/T408538) (owner: 10Bartosz Wójtowicz)
[13:43:07] <Dreamy_Jazz>	 kostajh:
[13:43:15] <kostajh>	 Dreamy_Jazz: thanks
[13:43:46] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/WikiEditor] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1210614 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:44:23] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210586 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:45:15] <wikibugs>	 (03CR) 10Bartosz Wójtowicz: [C:03+1] changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou)
[13:45:41] <wikibugs>	 (03CR) 10Dpogorzelski: [C:03+1] changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou)
[13:45:50] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210586 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:45:51] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/WikiEditor] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1210614 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:46:54] <wikibugs>	 (03Merged) 10jenkins-bot: MonologChannels: Add WikiEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210586 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:47:33] <wikibugs>	 (03CR) 10AikoChou: [C:03+2] changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou)
[13:49:25] <wikibugs>	 (03Merged) 10jenkins-bot: changeprop: enable pilot wikis for revise-tone-task-generator [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211655 (https://phabricator.wikimedia.org/T408538) (owner: 10AikoChou)
[13:49:27] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "No issue here, though the aggregate looks simpler to the untrained eye.  I guess we can't change the v4 to that though, so agree it's bett" [puppet] - 10https://gerrit.wikimedia.org/r/1211667 (https://phabricator.wikimedia.org/T411081) (owner: 10Majavah)
[13:54:30] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P85755 and previous config saved to /var/cache/conftool/dbconfig/20251126-135429-marostegui.json
[13:54:41] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Allow smartctl for datacenter-ops [puppet] - 10https://gerrit.wikimedia.org/r/1211653 (https://phabricator.wikimedia.org/T395939) (owner: 10Muehlenhoff)
[13:57:00] <logmsgbot>	 !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2166 gradually with 4 steps - Repooling
[13:57:17] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good, one comment inline" [puppet] - 10https://gerrit.wikimedia.org/r/1198155 (https://phabricator.wikimedia.org/T407726) (owner: 10JHathaway)
[13:57:36] <wikibugs>	 (03PS3) 10Slyngshede: Update Meta geo-mapping [dns] - 10https://gerrit.wikimedia.org/r/1206185 (https://phabricator.wikimedia.org/T409735)
[13:58:33] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409488 (10elukey) >>! In T411082#11409426, @Jclark-ctr wrote: > Additionally, this will make four Radeon PRO WX 9100 GPUs in storage. Should we consider selling them if the...
[13:58:43] <wikibugs>	 (03Merged) 10jenkins-bot: Hooks: Log the status message when responseUnknown occurs [extensions/WikiEditor] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1210614 (https://phabricator.wikimedia.org/T410877) (owner: 10Kosta Harlan)
[13:59:10] <wikibugs>	 (03CR) 10Daniel Kinzler: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: 10Pmiazga)
[13:59:17] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1210586|MonologChannels: Add WikiEditor (T410877)]], [[gerrit:1210614|Hooks: Log the status message when responseUnknown occurs (T410877)]]
[13:59:22] <stashbot>	 T410877: WikiEditor: Log unknown codes to Logstash - https://phabricator.wikimedia.org/T410877
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: Time to do the UTC afternoon backport window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1400).
[14:00:05] <jouncebot>	 Dreamy_Jazz and kostajh: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:15] <kostajh>	 finishing up a deploy
[14:00:24] <wikibugs>	 06SRE, 10MinT, 10Prod-Kubernetes, 06serviceops: machinetranslation eqiad pods in state ContainerStatusUnknown - https://phabricator.wikimedia.org/T411058#11409494 (10JMeybohm) `ContainerStatusUnknown` usually happens when a node is down or otherwise in trouble which seems to have been the for the two nodes...
[14:00:25] <wikibugs>	 (03PS3) 10Sbisson: CX3 Build 1.0.0+20251126 [extensions/ContentTranslation] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1211679 (https://phabricator.wikimedia.org/T384485)
[14:00:50] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#dep" [extensions/ContentTranslation] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1211679 (https://phabricator.wikimedia.org/T384485) (owner: 10Sbisson)
[14:01:01] <Lucas_WMDE>	 I can’t deploy, in a meeting
[14:01:16] <Dreamy_Jazz>	 Looks like there is nothing else in the window, so should be fine
[14:01:32] <Dreamy_Jazz>	 Oh actually there is now something :D
[14:01:45] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1210586|MonologChannels: Add WikiEditor (T410877)]], [[gerrit:1210614|Hooks: Log the status message when responseUnknown occurs (T410877)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:02:27] <stephanebisson>	 jouncebot now
[14:02:28] <jouncebot>	 For the next 0 hour(s) and 57 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1400)
[14:02:56] <Dreamy_Jazz>	 stephanebisson: Kosta is currently deploying, you will be next
[14:03:18] <Dreamy_Jazz>	 But may not be a deployer around to deploy (so if you have deploy rights you may need to self deploy)
[14:03:42] <logmsgbot>	 !log elukey@cumin1003 START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1001.eqiad.wmnet
[14:05:12] <stephanebisson>	 Dreamy_Jazz sounds good, thanks
[14:05:54] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[14:07:29] <wikibugs>	 10SRE-SLO, 10Citoid, 10VisualEditor, 06Editing-team (Tracking): Seperate SLO for requests made from Citoid Extension, possible wmf deployed extension only, vs bots etc. - https://phabricator.wikimedia.org/T345627#11409512 (10elukey) @Mvolz all merged, the new dashboard is available [[ https://slo.wikimedia...
[14:07:49] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache::text: enable auth, bot rate-limiting on drmrs [puppet] - 10https://gerrit.wikimedia.org/r/1211064 (https://phabricator.wikimedia.org/T406545) (owner: 10Giuseppe Lavagetto)
[14:08:07] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache::text: enable auth, bot rate-limiting on drmrs [puppet] - 10https://gerrit.wikimedia.org/r/1211064 (https://phabricator.wikimedia.org/T406545) (owner: 10Giuseppe Lavagetto)
[14:08:46] <logmsgbot>	 !log elukey@cumin1003 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1001.eqiad.wmnet
[14:08:59] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409522 (10ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by elukey@cumin1003 depool for host ml-serve1001.eqiad.wmnet completed: - ml-serve1001.eqi...
[14:09:38] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P85757 and previous config saved to /var/cache/conftool/dbconfig/20251126-140937-marostegui.json
[14:09:43] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409529 (10elukey) The host is depooled:  ` elukey@cumin1003:~$ sudo cookbook sre.k8s.pool-depool-node -t T411082 -r "Depool the node to remove old GPUs" --k8s-cluster ml-se...
[14:09:55] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1210586|MonologChannels: Add WikiEditor (T410877)]], [[gerrit:1210614|Hooks: Log the status message when responseUnknown occurs (T410877)]] (duration: 10m 39s)
[14:10:01] <stashbot>	 T410877: WikiEditor: Log unknown codes to Logstash - https://phabricator.wikimedia.org/T410877
[14:10:30] <wikibugs>	 (PS1) Brouberol: growthbook/growthbook-next: differentiate provenance of invite emails [deployment-charts] - https://gerrit.wikimedia.org/r/1211681 (https://phabricator.wikimedia.org/T410999)
[14:10:36] <wikibugs>	 (PS15) Ssingh: trafficserver: Add missing REST Gateway for Beta Cluster [puppet] - https://gerrit.wikimedia.org/r/1182652 (https://phabricator.wikimedia.org/T404387) (owner: Krinkle)
[14:10:47] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11409537 (Gehel) For the cloudelastic* nodes, it should be ok to unplug them for a few minutes. Ideally, we want to...
[14:11:47] <wikibugs>	 (PS1) Elukey: Remove GPU settings from ml-serve1001 [puppet] - https://gerrit.wikimedia.org/r/1211682 (https://phabricator.wikimedia.org/T411082)
[14:12:47] <wikibugs>	 (CR) Ssingh: [C:+2] trafficserver: Add missing REST Gateway for Beta Cluster [puppet] - https://gerrit.wikimedia.org/r/1182652 (https://phabricator.wikimedia.org/T404387) (owner: Krinkle)
[14:13:14] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Machine-Learning-Team, Patch-For-Review: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409542 (elukey) Next steps:  - John to remove the GPUs. - Dawid/Tobias to review/merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1211682...
[14:13:40] <wikibugs>	 (PS4) Slyngshede: Update Meta geo-mapping [dns] - https://gerrit.wikimedia.org/r/1206185 (https://phabricator.wikimedia.org/T409735)
[14:14:03] <icinga-wm>	 PROBLEM - Host ml-serve1001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:14:09] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by sbisson@deploy2002 using scap backport" [extensions/ContentTranslation] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211679 (https://phabricator.wikimedia.org/T384485) (owner: Sbisson)
[14:15:14] <wikibugs>	 (CR) Btullis: [C:+1] "These will be non-functioning email addresses, in terms of receiving email." [deployment-charts] - https://gerrit.wikimedia.org/r/1211681 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[14:16:01] <wikibugs>	 (Merged) jenkins-bot: CX3 Build 1.0.0+20251126 [extensions/ContentTranslation] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211679 (https://phabricator.wikimedia.org/T384485) (owner: Sbisson)
[14:16:33] <logmsgbot>	 !log sbisson@deploy2002 Started scap sync-world: Backport for [[gerrit:1211679|CX3 Build 1.0.0+20251126 (T384485)]]
[14:16:38] <stashbot>	 T384485: Recommendation API: Support pagination for single page collection recommendations - https://phabricator.wikimedia.org/T384485
[14:16:54] <wikibugs>	 (CR) Brouberol: [C:+2] "Yes that should be easy enough." [deployment-charts] - https://gerrit.wikimedia.org/r/1211681 (https://phabricator.wikimedia.org/T410999) (owner: Brouberol)
[14:17:13] <wikibugs>	 SRE-SLO: Sloth: adapt default month view to quarter view - https://phabricator.wikimedia.org/T409312#11409551 (elukey) @herron this task should be good in my opinion for the pilot's goals, we'll may need to tune it a little further if we decide to use Sloth but I wouldn't spend a ton of time on it in Q2. Lem...
[14:17:47] <wikibugs>	 (PS1) Muehlenhoff: Remove obsolete stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1211684 (https://phabricator.wikimedia.org/T381565)
[14:17:51] <wikibugs>	 SRE, Traffic: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097 (ssingh) NEW
[14:17:52] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[14:18:14] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthboo-next: apply
[14:18:32] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
[14:18:47] <logmsgbot>	 !log sbisson@deploy2002 sbisson: Backport for [[gerrit:1211679|CX3 Build 1.0.0+20251126 (T384485)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:18:50] <jinxer-wm>	 FIRING: KubernetesCalicoDown: ml-serve1001.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1001.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:18:56] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
[14:19:23] <wikibugs>	 (CR) Volans: "I've manually aborted the PCC run for puppet5, the puppet7 seems good:" [puppet] - https://gerrit.wikimedia.org/r/1211628 (owner: Volans)
[14:21:22] <logmsgbot>	 !log sbisson@deploy2002 sbisson: Continuing with sync
[14:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[14:24:45] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1253 (T410531)', diff saved to https://phabricator.wikimedia.org/P85758 and previous config saved to /var/cache/conftool/dbconfig/20251126-142445-marostegui.json
[14:24:50] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
[14:24:51] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[14:24:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098 (cmooney) NEW p:Triage→Medium
[14:25:09] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11409583 (cmooney)
[14:25:11] <wikibugs>	 SRE, Infrastructure-Foundations, Patch-For-Review: Nokia L3 bugs [Oct 2025] - https://phabricator.wikimedia.org/T409286#11409584 (cmooney)
[14:25:15] <wikibugs>	 (PS5) Slyngshede: Update Meta geo-mapping [dns] - https://gerrit.wikimedia.org/r/1206185 (https://phabricator.wikimedia.org/T409735)
[14:25:25] <logmsgbot>	 !log sbisson@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211679|CX3 Build 1.0.0+20251126 (T384485)]] (duration: 08m 52s)
[14:25:30] <stashbot>	 T384485: Recommendation API: Support pagination for single page collection recommendations - https://phabricator.wikimedia.org/T384485
[14:26:43] <wikibugs>	 (PS5) Daniel Kinzler: rest-gateway: assign ratelimit class by network range [deployment-charts] - https://gerrit.wikimedia.org/r/1206956 (https://phabricator.wikimedia.org/T410273)
[14:27:31] <icinga-wm>	 RECOVERY - Host ml-serve1001 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[14:27:38] <wikibugs>	 (PS19) Jcrespo: garage: Add a first role and profile [puppet] - https://gerrit.wikimedia.org/r/1207887 (https://phabricator.wikimedia.org/T410020)
[14:27:47] <wikibugs>	 (CR) Jcrespo: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1207887 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[14:31:21] <wikibugs>	 (CR) Btullis: Add helmfile deployments of the spark-support chart to our two test namespaces (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[14:31:51] <icinga-wm>	 PROBLEM - Host sretest1006 is DOWN: PING CRITICAL - Packet loss = 100%
[14:31:52] <wikibugs>	 SRE, LDAP-Access-Requests: Access to logstash for OKryva-WMF - https://phabricator.wikimedia.org/T410115#11409617 (OKryva-WMF) Hi,   ah, i see, yes, just requested permissions for the Logstash from the idm portal.
[14:32:56] <wikibugs>	 (CR) Elukey: ipxe MBR support (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1211269 (https://phabricator.wikimedia.org/T409286) (owner: JHathaway)
[14:33:50] <jinxer-wm>	 RESOLVED: KubernetesCalicoDown: ml-serve1001.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1001.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:35:23] <elukey>	 jouncebot: nowandnext
[14:35:23] <jouncebot>	 For the next 0 hour(s) and 24 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1400)
[14:35:23] <jouncebot>	 In 0 hour(s) and 24 minute(s): Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1500)
[14:37:51] <logmsgbot>	 !log cmooney@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on sretest1006.eqiad.wmnet with reason: changing host to uefi mode boot
[14:38:07] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Move sretest1006 to rack D8 and connect to lswtest-d8-eqiad - https://phabricator.wikimedia.org/T411098#11409646 (Jclark-ctr) a:Jclark-ctr Relocated sretest1006 to D8 U37.   Connected to  lswtest-d8-eqiad Port 1
[14:38:37] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Machine-Learning-Team, Patch-For-Review: Remove old GPUs from ml-serve1001 - https://phabricator.wikimedia.org/T411082#11409649 (Jclark-ctr) removed both gpu.   While system was down updated bios and idrac firmware BIOS Version  2.10.0  to 2.25.0  iDRAC Firmware Versio...
[14:43:27] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Wikidata, and 3 others: Racking request for wdqs10(2[8-9]|3[0-2]) - https://phabricator.wikimedia.org/T410406#11409668 (Jclark-ctr) @bking Did you have any luck with reimage?  or do you need any assistance?
[14:43:33] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[14:44:41] <elukey>	 I am going to quickly backport a change - https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1210598
[14:45:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by elukey@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[14:45:50] <logmsgbot>	 !log cmooney@cumin1003 END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
[14:46:08] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[14:46:39] <wikibugs>	 (Merged) jenkins-bot: Add a staging-specific stream for Maps tiles change [mediawiki-config] - https://gerrit.wikimedia.org/r/1210598 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[14:46:48] <jynus>	 deploying the test garage setup on backup2014
[14:47:10] <logmsgbot>	 !log elukey@deploy2002 Started scap sync-world: Backport for [[gerrit:1210598|Add a staging-specific stream for Maps tiles change (T409528)]]
[14:47:15] <stashbot>	 T409528: Setup a maps staging DB - https://phabricator.wikimedia.org/T409528
[14:48:33] <wikibugs>	 (CR) Jcrespo: [C:+2] garage: Add a first role and profile [puppet] - https://gerrit.wikimedia.org/r/1207887 (https://phabricator.wikimedia.org/T410020) (owner: Jcrespo)
[14:49:28] <logmsgbot>	 !log elukey@deploy2002 elukey: Backport for [[gerrit:1210598|Add a staging-specific stream for Maps tiles change (T409528)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:49:31] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new link ips to lswtest - cmooney@cumin1003"
[14:49:54] <logmsgbot>	 !log elukey@deploy2002 elukey: Continuing with sync
[14:49:54] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new link ips to lswtest - cmooney@cumin1003"
[14:49:54] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[14:50:34] <wikibugs>	 (PS1) Jcrespo: Revert "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211689
[14:50:52] <wikibugs>	 (CR) Jcrespo: [V:+2 C:+2] Revert "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211689 (owner: Jcrespo)
[14:51:32] <jynus>	 I did a quick revert, but maybe I was too fast
[14:51:45] <jinxer-wm>	 FIRING: WidespreadPuppetFailure: Puppet has failed in magru - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:51:56] <jynus>	 no, I should merge the revert
[14:52:16] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409701 (Eevans) >>! In T410075#11408386, @elukey wrote: > Hi folks!  >  >>>! In T410075#11407492, @Eevans wrote: >>>>! In T410075#11400035, @elukey wrote: >>> [ ... ] >>> >  > I totall...
[14:52:36] <jynus>	 blocked on test-prio
[14:53:40] <jynus>	 probably has to do with undef != []
[14:53:52] <logmsgbot>	 !log elukey@deploy2002 Finished scap sync-world: Backport for [[gerrit:1210598|Add a staging-specific stream for Maps tiles change (T409528)]] (duration: 06m 41s)
[14:53:57] <stashbot>	 T409528: Setup a maps staging DB - https://phabricator.wikimedia.org/T409528
[14:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[14:54:49] <jynus>	 oh, I think I know what happened, a weird edge case triggered
[14:55:06] <jynus>	 for non-core sites
[14:56:20] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409705 (Eevans) >>! In T410075#11408341, @brouberol wrote: > Naïve q, piggybacking on @Eevans 's response: what about a DNS domain resolving to the node IPs? If we have a recent enough...
[14:56:45] <jinxer-wm>	 FIRING: [4x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[14:56:56] <jynus>	 ^ fixing it slowly
[14:57:47] <wikibugs>	 (PS14) Btullis: Add a new spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833)
[14:57:47] <wikibugs>	 (PS19) Btullis: Add a deployment of the spark-support chart to our analytics-test namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833)
[14:58:05] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[14:58:14] <wikibugs>	 (CR) Btullis: Add a new spark-support chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[14:58:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:59:00] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409707 (brouberol) Oh, you're right!
[14:59:02] <wikibugs>	 (CR) Btullis: Add a deployment of the spark-support chart to our analytics-test namespace (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[14:59:10] <wikibugs>	 (CR) Btullis: Add a deployment of the spark-support chart to our analytics-test namespace (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[14:59:44] <wikibugs>	 (PS1) Jcrespo: Revert^2 "garage: Add a first role and profile" [puppet] - https://gerrit.wikimedia.org/r/1211693
[14:59:58] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409711 (elukey) @Eevans thanks for the explanation, I kinda assumed that a query to any of the cassandra nodes would have worked as-is, routing the request to the right node (if needed...
[15:00:04] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1500)
[15:00:51] <cscott>	 hey ops -- editing would like us to make a point release of parsoid to help unblock their work on flow deprecation
[15:01:07] <cscott>	 do you mind if i extend the morning backport window a bit to self-deploy a mediawiki-vendor patch?
[15:01:46] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new link ips to lswtest - cmooney@cumin1003"
[15:01:51] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new link ips to lswtest - cmooney@cumin1003"
[15:01:51] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:02:14] <wikibugs>	 (CR) Brouberol: [C:+1] Add a deployment of the spark-support chart to our analytics-test namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:02:18] <wikibugs>	 (CR) Brouberol: [C:+1] Add a new spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:03:25] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:05:49] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409717 (elukey) At this point another alternative for the k8s world could be to have an `externalservice` configured, so that clients will use it to connect to random host and discover...
[15:08:59] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:09:25] <wikibugs>	 (CR) Elukey: [C:+2] services: set new caching and kafka configuration for Tegola staging [deployment-charts] - https://gerrit.wikimedia.org/r/1211631 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[15:10:37] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
[15:11:08] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
[15:20:44] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet
[15:20:51] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet
[15:20:56] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
[15:21:17] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet
[15:21:23] <wikibugs>	 (CR) Jcrespo: [C:-1] "The issue happened when we have not defined mediabackups hash or no defined list of ips, it expects an array, leading to error." [puppet] - https://gerrit.wikimedia.org/r/1211693 (owner: Jcrespo)
[15:21:45] <jinxer-wm>	 FIRING: [4x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[15:22:09] <jynus>	 ^ this is corrected live, but the metrics have some lag
[15:23:45] <wikibugs>	 (PS15) Pmiazga: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578)
[15:25:47] <wikibugs>	 ops-codfw, SRE, DC-Ops: Degraded RAID on bast2003 - https://phabricator.wikimedia.org/T410195#11409770 (Jhancock.wm) got the replacement rolling with dell. SR219265258  they try to fight me every time there's a disk that fails that doesn't show in the idrac. so better to overwhelm them with proof. th...
[15:26:45] <jinxer-wm>	 RESOLVED: [4x] WidespreadPuppetFailure: Puppet has failed in drmrs - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[15:27:26] <logmsgbot>	 !log dpogorzelski@deploy2002 helmfile [eqiad] START helmfile.d/services/changeprop: sync
[15:27:55] <logmsgbot>	 !log dpogorzelski@deploy2002 helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
[15:28:08] <logmsgbot>	 !log dpogorzelski@deploy2002 helmfile [codfw] START helmfile.d/services/changeprop: sync
[15:28:27] <wikibugs>	 SRE, SRE Observability (FY2025/2026-Q3): Add Druid as a Private Grafana Datasource - https://phabricator.wikimedia.org/T410933#11409779 (hnowlan)
[15:28:41] <logmsgbot>	 !log dpogorzelski@deploy2002 helmfile [codfw] DONE helmfile.d/services/changeprop: sync
[15:28:59] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[15:29:28] <wikibugs>	 (PS1) Pmiazga: restbase: Handle JWT passsed in cookies [deployment-charts] - https://gerrit.wikimedia.org/r/1211703
[15:30:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1500)
[15:30:05] <jouncebot>	 Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1530)
[15:31:21] <wikibugs>	 (PS2) Pmiazga: WIP: restbase: Handle JWT passsed in cookies [deployment-charts] - https://gerrit.wikimedia.org/r/1211703
[15:33:59] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:34:02] <wikibugs>	 SRE, Cassandra, Data-Persistence: Discovery of Cassandra cluster nodes - https://phabricator.wikimedia.org/T410075#11409819 (Eevans) >>! In T410075#11409711, @elukey wrote: > @Eevans thanks for the explanation, I kinda assumed that a query to any of the cassandra nodes would have worked as-is, routin...
[15:34:04] <wikibugs>	 (PS3) Pmiazga: WIP: restbase: Handle JWT passsed in cookies [deployment-charts] - https://gerrit.wikimedia.org/r/1211703
[15:37:07] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+1] "LGTM as an MVP, I would also add the codfw hosts long-term." [deployment-charts] - https://gerrit.wikimedia.org/r/1211165 (https://phabricator.wikimedia.org/T384810) (owner: Federico Ceratto)
[15:39:09] <wikibugs>	 (CR) Federico Ceratto: [C:+2] zarcillo: Allow egress to etcd to fetch dbctl values [deployment-charts] - https://gerrit.wikimedia.org/r/1211165 (https://phabricator.wikimedia.org/T384810) (owner: Federico Ceratto)
[15:39:11] <wikibugs>	 (PS1) Vgutierrez: haproxy: Fix user ua_class regex [puppet] - https://gerrit.wikimedia.org/r/1211704
[15:39:38] <logmsgbot>	 !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:40:22] <wikibugs>	 (PS8) Daniel Kinzler: rest-gateway: assign ratelimit class by network range [deployment-charts] - https://gerrit.wikimedia.org/r/1206956 (https://phabricator.wikimedia.org/T410273)
[15:40:38] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11409838 (fnegri) @Jclark-ctr `clouddb10[17-20]` are now depooled, but not downtimed. Can you please downtime them yourself when you migrate them? Otherwise...
[15:41:06] <wikibugs>	 (CR) Btullis: [C:+2] Add a new spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:41:14] <wikibugs>	 (CR) Btullis: [C:+2] Add a deployment of the spark-support chart to our analytics-test namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:43:16] <wikibugs>	 (Merged) jenkins-bot: Add a new spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1195178 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:43:31] <wikibugs>	 (Merged) jenkins-bot: Add a deployment of the spark-support chart to our analytics-test namespace [deployment-charts] - https://gerrit.wikimedia.org/r/1195182 (https://phabricator.wikimedia.org/T406833) (owner: Btullis)
[15:45:45] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[15:46:53] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthboo-next: apply
[15:47:27] <logmsgbot>	 !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[15:48:49] <logmsgbot>	 !log fnegri@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1017-1020].eqiad.wmnet with reason: moving to a new switch
[15:48:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11409855 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=80e83414-993e-4a63-b612-9625174481c7) set by fnegri@cumin1003 for 2:00:00 on 4 ho...
[15:52:06] <cscott>	 hm, gerrit seems unhappy?
[15:52:12] <wikibugs>	 (PS5) Majavah: Initial configuration for tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1205954 (https://phabricator.wikimedia.org/T404457)
[15:52:12] <wikibugs>	 (PS5) Majavah: Activate tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1205955 (https://phabricator.wikimedia.org/T404457)
[15:52:12] <wikibugs>	 (PS5) Majavah: Set up tokwiki namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457)
[15:52:12] <wikibugs>	 (PS3) Majavah: Allow account creation on tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1207262 (https://phabricator.wikimedia.org/T404457)
[15:52:31] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:52:52] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
[15:52:59] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
[15:53:34] <wikibugs>	 (PS1) Brouberol: growthbook: configure proxy environment vars to enable license activation [deployment-charts] - https://gerrit.wikimedia.org/r/1211709 (https://phabricator.wikimedia.org/T411106)
[15:54:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+1] haproxy: Fix user ua_class regex [puppet] - https://gerrit.wikimedia.org/r/1211704 (owner: Vgutierrez)
[15:55:09] <sukhe>	 yeah gerrit should be done
[15:55:11] <sukhe>	 *down
[15:55:17] <sukhe>	 10:52:31 <+jinxer-wm> FIRING: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - 
[15:56:07] <wikibugs>	 (CR) Btullis: [C:+1] growthbook: configure proxy environment vars to enable license activation [deployment-charts] - https://gerrit.wikimedia.org/r/1211709 (https://phabricator.wikimedia.org/T411106) (owner: Brouberol)
[15:56:13] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
[15:56:45] <logmsgbot>	 !log elukey@deploy2002 helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
[15:57:31] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:57:45] <wikibugs>	 (CR) Brouberol: [C:+2] growthbook: configure proxy environment vars to enable license activation [deployment-charts] - https://gerrit.wikimedia.org/r/1211709 (https://phabricator.wikimedia.org/T411106) (owner: Brouberol)
[16:00:05] <jouncebot>	 taavi: gettimeofday() says it's time for New wiki creation. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1600)
[16:00:12] <wikibugs>	 (CR) Vgutierrez: [C:+2] haproxy: Fix user ua_class regex [puppet] - https://gerrit.wikimedia.org/r/1211704 (owner: Vgutierrez)
[16:00:41] <Tamzin>	 toki a, jan Taavi o :) (hi, taavi!)
[16:01:03] <wikibugs>	 (CR) Vgutierrez: "holding..." [puppet] - https://gerrit.wikimedia.org/r/1211704 (owner: Vgutierrez)
[16:01:26] <taavi>	 o/ starting by creating the wiki itself
[16:01:50] <Tamzin>	 lmk when it's up, I wanna see if I can beat my low-user-ID record :P
[16:01:56] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[16:01:57] <Tamzin>	 oh wait you disabled autocreate nvm lol
[16:02:17] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1205954 (https://phabricator.wikimedia.org/T404457) (owner: Majavah)
[16:02:58] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthboo-next: apply
[16:03:09] <wikibugs>	 (Merged) jenkins-bot: Initial configuration for tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1205954 (https://phabricator.wikimedia.org/T404457) (owner: Majavah)
[16:03:26] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
[16:03:41] <logmsgbot>	 !log taavi@deploy2002 Started scap sync-world: Backport for [[gerrit:1205954|Initial configuration for tokwiki (T404457)]]
[16:03:46] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:04:20] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
[16:04:25] <wikibugs>	 (PS1) Btullis: Update the helmfile values paths for the analytics-test spark-support [deployment-charts] - https://gerrit.wikimedia.org/r/1211712 (https://phabricator.wikimedia.org/T406833)
[16:05:47] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[16:05:55] <logmsgbot>	 !log taavi@deploy2002 taavi: Backport for [[gerrit:1205954|Initial configuration for tokwiki (T404457)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:06:50] <logmsgbot>	 !log taavi@deploy2002 taavi: Continuing with sync
[16:07:11] <wikibugs>	 (03CR) 10Effie Mouzeli: [C:03+2] cumin: add aliases for memcached-gutter hosts [puppet] - 10https://gerrit.wikimedia.org/r/1211066 (https://phabricator.wikimedia.org/T408925) (owner: 10Effie Mouzeli)
[16:07:18] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11409981 (10Andrew)
[16:07:20] <logmsgbot>	 !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[16:07:40] <taavi>	 my brain seems to want to be extra careful today and triple-check every single button press
[16:08:15] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "Rollout plan SGTM, I spot-checked PCC and nothing obvious jumped to my eyes" [puppet] - 10https://gerrit.wikimedia.org/r/1196367 (owner: 10Majavah)
[16:08:33] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:09:03] <jinxer-wm>	 RESOLVED: KafkaUnderReplicatedPartitions: Under replicated partitions for Kafka cluster main-eqiad in eqiad - https://wikitech.wikimedia.org/wiki/Kafka/Administration - https://grafana.wikimedia.org/d/000000027/kafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-kafka_cluster=main-eqiad - https://alerts.wikimedia.org/?q=alertname%3DKafkaUnderReplicatedPartitions
[16:09:24] <wikibugs>	 (03CR) 10Brouberol: [C:03+1] Update the helmfile values paths for the analytics-test spark-support [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211712 (https://phabricator.wikimedia.org/T406833) (owner: 10Btullis)
[16:09:27] <wikibugs>	 (03PS2) 10AOkoth: admin: add FIDO ssh key for aokoth [puppet] - 10https://gerrit.wikimedia.org/r/1211201
[16:10:57] <logmsgbot>	 !log taavi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1205954|Initial configuration for tokwiki (T404457)]] (duration: 07m 15s)
[16:11:02] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:11:57] <taavi>	 next up is running addWiki
[16:12:24] <logmsgbot>	 !log taavi@deploy2002 mwscript-k8s job started: extensions/WikimediaMaintenance/maintenance/addWiki.php --wiki=tokwiki  # T404457
[16:13:52] <taavi>	 addWiki is done
[16:14:34] <taavi>	 Tamzin: just the system users (Abuse filter, Maintenance script and MediaWiki default) take the first 3 user ids, so I don't think your record is beatable
[16:14:53] <Tamzin>	 darn. well at least I get the "I Am Number Four" joke on testwikidata :P
[16:15:08] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T410589)', diff saved to https://phabricator.wikimedia.org/P85761 and previous config saved to /var/cache/conftool/dbconfig/20251126-161508-ladsgroup.json
[16:15:13] <Tamzin>	 I think Re.edy has #1 there, though, so something must be different
[16:15:13] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[16:15:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1205955 (https://phabricator.wikimedia.org/T404457) (owner: 10Majavah)
[16:15:27] <taavi>	 I don't think anything will beat my mailman user id :P
[16:16:05] <wikibugs>	 (03Merged) 10jenkins-bot: Activate tokwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1205955 (https://phabricator.wikimedia.org/T404457) (owner: 10Majavah)
[16:16:18] <Tamzin>	 my wife has a 3-letter Minecraft Java username. she's pretty proud of that. (lmk if I'm too much of a distraction :P )
[16:16:34] <logmsgbot>	 !log taavi@deploy2002 Started scap sync-world: Backport for [[gerrit:1205955|Activate tokwiki (T404457)]]
[16:16:39] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:19:10] <logmsgbot>	 !log taavi@deploy2002 taavi: Backport for [[gerrit:1205955|Activate tokwiki (T404457)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:19:19] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/1211628 (owner: 10Volans)
[16:19:25] <wikibugs>	 (03PS1) 10Cwhite: gerrit: block more agressive scrapers [puppet] - 10https://gerrit.wikimedia.org/r/1211713 (https://phabricator.wikimedia.org/T411105)
[16:19:38] <taavi>	 x-wikimedia-debug shows the default "This subdomain is reserved for the creation of a Wikipedia in Toki Pona language" main page
[16:19:58] <taavi>	 logstash looks clean, so syncing
[16:20:01] <logmsgbot>	 !log taavi@deploy2002 taavi: Continuing with sync
[16:20:28] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+1] gerrit: block more agressive scrapers [puppet] - 10https://gerrit.wikimedia.org/r/1211713 (https://phabricator.wikimedia.org/T411105) (owner: 10Cwhite)
[16:21:15] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] gerrit: block more agressive scrapers [puppet] - 10https://gerrit.wikimedia.org/r/1211713 (https://phabricator.wikimedia.org/T411105) (owner: 10Cwhite)
[16:22:10] <wikibugs>	 (03CR) 10Volans: [C:03+2] prometheus: blackbox check http skip tls verify [puppet] - 10https://gerrit.wikimedia.org/r/1211628 (owner: 10Volans)
[16:23:15] <logmsgbot>	 !log andrew@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on cloudweb1004.wikimedia.org with reason: T411025
[16:23:20] <stashbot>	 T411025: eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025
[16:24:01] <logmsgbot>	 !log taavi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1205955|Activate tokwiki (T404457)]] (duration: 07m 27s)
[16:24:06] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:24:24] <taavi>	 https://tok.wikipedia.org/ should load for you all now
[16:24:28] <logmsgbot>	 !log andrew@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on clouddumps1002.wikimedia.org with reason: T411025
[16:25:29] <taavi>	 syncing the namespace config patch next
[16:25:40] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: 10Majavah)
[16:26:44] <wikibugs>	 (03Merged) 10jenkins-bot: Set up tokwiki namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1205956 (https://phabricator.wikimedia.org/T404457) (owner: 10Majavah)
[16:27:15] <logmsgbot>	 !log taavi@deploy2002 Started scap sync-world: Backport for [[gerrit:1205956|Set up tokwiki namespaces (T404457)]]
[16:28:48] <wikibugs>	 (03CR) 10AOkoth: [C:03+2] admin: add FIDO ssh key for aokoth [puppet] - 10https://gerrit.wikimedia.org/r/1211201 (owner: 10AOkoth)
[16:28:48] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11410092 (10fnegri) clouddb10[17-20] are depooled and downtimed, I mistakenly posted comments about those in the pare...
[16:29:28] <logmsgbot>	 !log taavi@deploy2002 taavi: Backport for [[gerrit:1205956|Set up tokwiki namespaces (T404457)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[16:29:33] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:30:16] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P85762 and previous config saved to /var/cache/conftool/dbconfig/20251126-163015-ladsgroup.json
[16:30:19] <logmsgbot>	 !log taavi@deploy2002 taavi: Continuing with sync
[16:31:54] <wikibugs>	 (03PS1) 10Kosta Harlan: hCaptcha: Log the hCaptcha token [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211721 (https://phabricator.wikimedia.org/T411096)
[16:32:15] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] haproxy: Fix user ua_class regex [puppet] - 10https://gerrit.wikimedia.org/r/1211704 (owner: 10Vgutierrez)
[16:34:03] <wikibugs>	 06SRE, 07SRE-Unowned, 10Maps, 13Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11410107 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This is complete. Luca and myself made a total of 122 commits to puppet.git (plus surely a few where m...
[16:35:32] <logmsgbot>	 !log taavi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1205956|Set up tokwiki namespaces (T404457)]] (duration: 08m 17s)
[16:35:37] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[16:36:12] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.network.tls for network device lswtest-d8-eqiad
[16:36:31] <taavi>	 per https://wikitech.wikimedia.org/wiki/Add_a_wiki#Install, running the sites table script
[16:36:40] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lswtest-d8-eqiad
[16:36:46] <logmsgbot>	 !log taavi@deploy2002 mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https  # T404571
[16:36:51] <stashbot>	 T404571: Add Wikidata support for tokwiki - https://phabricator.wikimedia.org/T404571
[16:37:53] <taavi>	 this seems like it'll take a while since it needs to touch all existing wikis
[16:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[16:41:04] <wikibugs>	 (03CR) 10Btullis: [C:03+2] Update the helmfile values paths for the analytics-test spark-support [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211712 (https://phabricator.wikimedia.org/T406833) (owner: 10Btullis)
[16:41:32] <wikibugs>	 (03PS1) 10Hashar: gerrit: block some more scrapers [puppet] - 10https://gerrit.wikimedia.org/r/1211723 (https://phabricator.wikimedia.org/T411105)
[16:42:44] <wikibugs>	 (03Merged) 10jenkins-bot: Update the helmfile values paths for the analytics-test spark-support [deployment-charts] - 10https://gerrit.wikimedia.org/r/1211712 (https://phabricator.wikimedia.org/T406833) (owner: 10Btullis)
[16:43:56] <wikibugs>	 (03CR) 10Elukey: wdqs: add availability sli recording rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1202049 (https://phabricator.wikimedia.org/T393966) (owner: 10Ryan Kemper)
[16:43:58] <taavi>	 update to "take a while": it's done for about a third of the wikis
[16:44:51] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] gerrit: block some more scrapers [puppet] - 10https://gerrit.wikimedia.org/r/1211723 (https://phabricator.wikimedia.org/T411105) (owner: 10Hashar)
[16:45:07] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
[16:45:14] <logmsgbot>	 !log btullis@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
[16:45:16] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.wipe-cache lswtest-d8-eqiad.mgmt.eqiad.wmnet on all recursors
[16:45:19] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lswtest-d8-eqiad.mgmt.eqiad.wmnet on all recursors
[16:45:24] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P85763 and previous config saved to /var/cache/conftool/dbconfig/20251126-164523-ladsgroup.json
[16:46:20] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
[16:47:00] <wikibugs>	 (03PS6) 10Hnowlan: svg: refuse to generate SVGs larger than a particular size [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1211630 (https://phabricator.wikimedia.org/T411076)
[16:47:14] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
[16:47:18] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11410170 (10Andrew)
[16:47:22] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2150 (T410531)', diff saved to https://phabricator.wikimedia.org/P85764 and previous config saved to /var/cache/conftool/dbconfig/20251126-164722-marostegui.json
[16:47:27] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[16:48:10] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet
[16:49:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[16:51:30] <moritzm>	 !log installing Perl security updates
[16:51:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:53] <wikibugs>	 (03CR) 10Volans: "The parent patch has been merged and deployed, this should be ready for a final review." [puppet] - 10https://gerrit.wikimedia.org/r/1211610 (https://phabricator.wikimedia.org/T399313) (owner: 10Volans)
[16:52:39] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
[16:52:44] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet
[16:52:53] <logmsgbot>	 !log fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet
[16:53:10] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T410531)', diff saved to https://phabricator.wikimedia.org/P85765 and previous config saved to /var/cache/conftool/dbconfig/20251126-165309-marostegui.json
[16:53:15] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[16:53:52] <taavi>	 less than 200 wikis to go
[16:54:02] <Tamzin>	 je! (yay)
[16:54:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[16:57:21] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to cassandra-staging-devs group for amastilovic - https://phabricator.wikimedia.org/T410972#11410219 (10Ahoelzl) Thanks for the ping. Approved.
[16:58:05] <taavi>	 ok, that's finally done
[16:58:24] <taavi>	 next up is creating a bunch of empty accounts and then importing the dump
[16:59:29] <hashar>	 .c
[17:00:12] <wikibugs>	 (03PS7) 10Hnowlan: svg: refuse to generate SVGs larger than a particular size [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1211630 (https://phabricator.wikimedia.org/T411076)
[17:00:32] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T410589)', diff saved to https://phabricator.wikimedia.org/P85766 and previous config saved to /var/cache/conftool/dbconfig/20251126-170031-ladsgroup.json
[17:00:37] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[17:00:47] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
[17:00:55] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1197 (T410589)', diff saved to https://phabricator.wikimedia.org/P85768 and previous config saved to /var/cache/conftool/dbconfig/20251126-170054-ladsgroup.json
[17:05:58] <Tamzin>	 taavi: tbodt is saying that the issue with the dump was just with Discord. what is a good alternate way to get it to you? or have we already committed to Plan B at this point?
[17:06:07] <wikibugs>	 (03PS1) 10AOkoth: admin: remove old key for aokoth [puppet] - 10https://gerrit.wikimedia.org/r/1211727
[17:07:33] <taavi>	 Tamzin: I think it's a bit too late to change that, unless you or tbodt see problems with importing the latest daily dump (and dealing with any changes made after that by hand)?
[17:08:19] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P85769 and previous config saved to /var/cache/conftool/dbconfig/20251126-170817-marostegui.json
[17:08:34] <Tamzin>	 taavi: if that's what works for you, let's do that. should only be a minor pain cherry-picking the revs to import
[17:10:11] <taavi>	 great, continuing with what I already have then
[17:12:47] <icinga-wm>	 RECOVERY - Backup freshness on backup1014 is OK: Fresh: 142 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[17:12:57] <wikibugs>	 (03CR) 10CDanis: UEFI: dup partition on MD RAID boxes (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1205197 (https://phabricator.wikimedia.org/T376949) (owner: 10JHathaway)
[17:13:09] <taavi>	 running the import in a dry-run mode first
[17:17:01] <taavi>	 that seems fine, after I found the correct syntax
[17:17:17] <taavi>	 Tamzin: final call for any blockers before doing the proper import
[17:17:47] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1211155 (https://phabricator.wikimedia.org/T410737) (owner: 10Ejegg)
[17:17:59] <Tamzin>	 taavi: good to go!
[17:18:23] <taavi>	 !log taavi@deploy2002 ~ $ mwscript importDump.php --wiki=tokwiki --no-updates --username-prefix="" < /home/taavi/tokwiki/wikipesija-2025-11-26-rewritten.xml # T404573
[17:18:26] <Tamzin>	 only note that's come up so far is we have one username to add to the merge list, but that's for later
[17:18:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:18:28] <stashbot>	 T404573: Import tokwiki from Wikipesija.org - https://phabricator.wikimedia.org/T404573
[17:19:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[17:20:15] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to cassandra-staging-devs group for amastilovic - https://phabricator.wikimedia.org/T410972#11410328 (10KOfori) Hi @RLazarus, this is approved.
[17:20:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[17:21:01] <kostajh>	 taavi: when you're done, I have a patch I'd like to backport
[17:21:50] <taavi>	 kostajh: I just started a maintenance script that'll take a while, so I think we can sneak in a backport now
[17:22:22] <kostajh>	 taavi: ok. It will take about 20-30 minutes, depending on k8s/ci etc. Shall I start?
[17:23:27] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P85770 and previous config saved to /var/cache/conftool/dbconfig/20251126-172325-marostegui.json
[17:23:31] <taavi>	 as long as you can finish before the next window in ~35 minutes, go ahead
[17:23:49] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211721 (https://phabricator.wikimedia.org/T411096) (owner: 10Kosta Harlan)
[17:24:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[17:25:13] <wikibugs>	 (03Merged) 10jenkins-bot: hCaptcha: Log the hCaptcha token [extensions/ConfirmEdit] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211721 (https://phabricator.wikimedia.org/T411096) (owner: 10Kosta Harlan)
[17:25:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[17:25:48] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1211721|hCaptcha: Log the hCaptcha token (T411096)]]
[17:25:53] <stashbot>	 T411096: hCaptcha: Log token in Logstash - https://phabricator.wikimedia.org/T411096
[17:26:53] <Tamzin>	 taavi: good news! there are literally two edits we missed, and they're both by me, and i have offline backups of both of them, so we don't even need to Special:Import anything
[17:27:31] <taavi>	 cool. the import script says it just passed 10% of pages
[17:27:49] <taavi>	 also, you said there was one more user to merge?
[17:28:02] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1211721|hCaptcha: Log the hCaptcha token (T411096)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:28:02] <taavi>	 is that https://meta.wikimedia.org/w/index.php?title=Talk:Requests_for_new_languages/Wikipedia_Toki_Pona_2&curid=13210940&diff=29703752&oldid=29682357 or someone else?
[17:28:29] <kostajh>	 (testing my patch now)
[17:29:57] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[17:30:10] <wikibugs>	 (03PS1) 10RLazarus: admin: Add amastilovic to cassandra-staging-devs [puppet] - 10https://gerrit.wikimedia.org/r/1211729 (https://phabricator.wikimedia.org/T410972)
[17:30:40] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to cassandra-staging-devs group for amastilovic - https://phabricator.wikimedia.org/T410972#11410391 (10RLazarus)
[17:30:55] <logmsgbot>	 !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[17:31:03] <wikibugs>	 10ops-eqiad, 06SRE, 06cloud-services-team, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11410395 (10fnegri) clouddb10[17-20] are now repooled and working fine.
[17:32:03] <Tamzin>	 taavi: 3 more, actually, sorry. knew we'd have stragglers! see https://discord.com/channels/1405134055896383488/1405134285777801287/1443292951345369231
[17:32:46] <logmsgbot>	 !log fceratto@deploy2002 helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
[17:34:03] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211721|hCaptcha: Log the hCaptcha token (T411096)]] (duration: 08m 15s)
[17:34:08] <stashbot>	 T411096: hCaptcha: Log token in Logstash - https://phabricator.wikimedia.org/T411096
[17:35:44] <kostajh>	 taavi: all done
[17:35:53] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11410416 (10RobH) Day 11 Update: * 8 hosts moved, 5 remain out of 308 total hosts. * John did all the moves today working with Andrew. * Migrated 6 of the 8 W...
[17:36:14] <taavi>	 kostajh: thanks!
[17:36:23] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Infrastructure-Foundations, 10netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11410427 (10RobH)
[17:36:40] <wikibugs>	 (03CR) 10Ladsgroup: [C:03+2] svg: refuse to generate SVGs larger than a particular size [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1211630 (https://phabricator.wikimedia.org/T411076) (owner: 10Hnowlan)
[17:37:45] <wikibugs>	 (03PS1) 10Zabe: RestrictionStore: Check for no up to date cascade protections [core] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211731 (https://phabricator.wikimedia.org/T411092)
[17:38:12] <wikibugs>	 (03PS1) 10Eevans: cassandra: GRANTs for new analytics keyspace [puppet] - 10https://gerrit.wikimedia.org/r/1211733 (https://phabricator.wikimedia.org/T410962)
[17:38:26] <taavi>	 Tamzin: fwiw, the import script is saying it's imported ~1600 pages (out of ~7700, so a bit over 20%). my educated guess is that it'll get faster as time goes on (as newer pages generally have less revisions than older ones), but it'll still take a while before it's all done
[17:38:34] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T410531)', diff saved to https://phabricator.wikimedia.org/P85771 and previous config saved to /var/cache/conftool/dbconfig/20251126-173833-marostegui.json
[17:38:35] <taavi>	 just to manage expectation
[17:38:37] <wikibugs>	 (03CR) 10Xcollazo: "@btullis@wikimedia.org, could you +2 if you think this is ready?" [dumps] - 10https://gerrit.wikimedia.org/r/1203410 (https://phabricator.wikimedia.org/T403482) (owner: 10Silvan Heintze)
[17:38:39] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[17:38:50] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
[17:38:58] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2159 (T410531)', diff saved to https://phabricator.wikimedia.org/P85772 and previous config saved to /var/cache/conftool/dbconfig/20251126-173857-marostegui.json
[17:39:25] <Tamzin>	 taavi: that's fine! really appreciate the time you're putting into this. we're all having a gay old time watch-partying in VC :P
[17:39:54] <taavi>	 I'd join if it wasn't for the (very sensible) policy that all deployment coordination must happen here :P
[17:40:08] <wikibugs>	 (03Merged) 10jenkins-bot: svg: refuse to generate SVGs larger than a particular size [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/1211630 (https://phabricator.wikimedia.org/T411076) (owner: 10Hnowlan)
[17:40:15] <wikibugs>	 (03PS1) 10Dreamy Jazz: Add SuggestedInvestigationsRevisionsPager [extensions/CheckUser] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211735 (https://phabricator.wikimedia.org/T410300)
[17:40:27] <Dreamy_Jazz>	 jouncebot: nowandnext
[17:40:27] <jouncebot>	 For the next 0 hour(s) and 19 minute(s): New wiki creation (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1600)
[17:40:27] <jouncebot>	 In 0 hour(s) and 19 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1800)
[17:40:47] <taavi>	 hi
[17:40:54] <Dreamy_Jazz>	 Helo
[17:40:56] <Dreamy_Jazz>	 *Hello
[17:41:14] <taavi>	 Dreamy_Jazz: I'm currently in the middle of a very long maintenance script, so if you want to deploy (and are confident you'll finish before the next window) then go ahead
[17:41:16] <wikibugs>	 (03PS1) 10Fabfur: data: remove old non-fido ssh keys [puppet] - 10https://gerrit.wikimedia.org/r/1211736
[17:41:28] <Dreamy_Jazz>	 Sure thanks
[17:41:34] <Dreamy_Jazz>	 I should be done before the next window
[17:41:47] <wikibugs>	 (03PS1) 10Dreamy Jazz: Add SuggestedInvestigationsRevisionsPager [extensions/CheckUser] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1211737 (https://phabricator.wikimedia.org/T410300)
[17:42:16] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.46.0-wmf.3) - 10https://gerrit.wikimedia.org/r/1211737 (https://phabricator.wikimedia.org/T410300) (owner: 10Dreamy Jazz)
[17:42:16] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by dreamyjazz@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.46.0-wmf.4) - 10https://gerrit.wikimedia.org/r/1211735 (https://phabricator.wikimedia.org/T410300) (owner: 10Dreamy Jazz)
[17:42:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[17:42:24] <Dreamy_Jazz>	 Starting backports now, thanks
[17:42:51] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] data: remove old non-fido ssh keys [puppet] - 10https://gerrit.wikimedia.org/r/1211736 (owner: 10Fabfur)
[17:43:28] <wikibugs>	 (03CR) 10BCornwall: [C:03+1] data: remove old non-fido ssh keys [puppet] - 10https://gerrit.wikimedia.org/r/1211736 (owner: 10Fabfur)
[17:43:36] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] data: remove old non-fido ssh keys [puppet] - 10https://gerrit.wikimedia.org/r/1211736 (owner: 10Fabfur)
[17:44:45] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T410531)', diff saved to https://phabricator.wikimedia.org/P85773 and previous config saved to /var/cache/conftool/dbconfig/20251126-174445-marostegui.json
[17:44:51] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[17:45:46] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] admin: Add amastilovic to cassandra-staging-devs [puppet] - 10https://gerrit.wikimedia.org/r/1211729 (https://phabricator.wikimedia.org/T410972) (owner: 10RLazarus)
[17:47:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[17:52:01] <wikibugs>	 (CR) RLazarus: [C:+2] admin: Add amastilovic to cassandra-staging-devs [puppet] - https://gerrit.wikimedia.org/r/1211729 (https://phabricator.wikimedia.org/T410972) (owner: RLazarus)
[17:52:32] <taavi>	 Tamzin: two thirds done
[17:52:34] <wikibugs>	 (Merged) jenkins-bot: Add SuggestedInvestigationsRevisionsPager [extensions/CheckUser] (wmf/1.46.0-wmf.3) - https://gerrit.wikimedia.org/r/1211737 (https://phabricator.wikimedia.org/T410300) (owner: Dreamy Jazz)
[17:52:36] <wikibugs>	 (Merged) jenkins-bot: Add SuggestedInvestigationsRevisionsPager [extensions/CheckUser] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211735 (https://phabricator.wikimedia.org/T410300) (owner: Dreamy Jazz)
[17:53:13] <logmsgbot>	 !log dreamyjazz@deploy2002 Started scap sync-world: Backport for [[gerrit:1211737|Add SuggestedInvestigationsRevisionsPager (T410300)]], [[gerrit:1211735|Add SuggestedInvestigationsRevisionsPager (T410300)]]
[17:53:26] <Dreamy_Jazz>	 Hmm. Merging took a bit longer than I expected, but should be able to merge without any lengthy testing (as it is a no-op until some private code is deployed)
[17:54:26] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to cassandra-staging-devs group for amastilovic - https://phabricator.wikimedia.org/T410972#11410475 (RLazarus) Open→Resolved a:RLazarus @Ahoelzl @KOfori Thanks both!  @amastilovic This is complete -- please allow up to 30...
[17:55:33] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Backport for [[gerrit:1211737|Add SuggestedInvestigationsRevisionsPager (T410300)]], [[gerrit:1211735|Add SuggestedInvestigationsRevisionsPager (T410300)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[17:56:01] <logmsgbot>	 !log dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
[17:56:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[17:56:31] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[17:59:53] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P85774 and previous config saved to /var/cache/conftool/dbconfig/20251126-175952-marostegui.json
[17:59:57] <taavi>	 7700 (3.14 pages/sec 17.44 revs/sec)
[17:59:57] <taavi>	 Done!
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1800)
[18:00:16] <logmsgbot>	 !log dreamyjazz@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211737|Add SuggestedInvestigationsRevisionsPager (T410300)]], [[gerrit:1211735|Add SuggestedInvestigationsRevisionsPager (T410300)]] (duration: 07m 03s)
[18:00:22] <Dreamy_Jazz>	 Nice. I'm also done just in time!
[18:00:41] <taavi>	 is anyone planning to deploy in this window?
[18:00:43] <logmsgbot>	 !log taavi@deploy2002 mwscript-k8s job started: initSiteStats.php --wiki=tokwiki  # T404573
[18:00:48] <stashbot>	 T404573: Import tokwiki from Wikipesija.org - https://phabricator.wikimedia.org/T404573
[18:01:06] <Dreamy_Jazz>	 If the window isn't being used I have a private code change to deploy, but I can wait till you are done taavi
[18:01:15] <logmsgbot>	 !log taavi@deploy2002 mwscript-k8s job started: rebuildall.php --wiki=tokwiki  # T404573
[18:01:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[18:01:31] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[18:03:28] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1201338 (https://phabricator.wikimedia.org/T290778) (owner: DLynch)
[18:05:04] <taavi>	 Tamzin: fwiw the last script which I started (and is still running) is responsible for building the *links tables, it's much faster (but still slow) to do all of those at the end instead of parsing each imported revision during the import itself
[18:05:26] <Dreamy_Jazz>	 It seems that no one is using this window?
[18:05:43] <Dreamy_Jazz>	 (or at least no one is using it for the intended purpose in the calendar :D )
[18:06:04] <taavi>	 yeah :D
[18:06:27] <Dreamy_Jazz>	 Do you have a need to run scap at all to finish creating the wiki?
[18:06:33] <Tamzin>	 taavi: cool. dw, i'm stickin' around. cracking open a Thai tea, should keep me up another hour or two
[18:06:36] <Dreamy_Jazz>	 If not, I would like to do the private code change
[18:06:43] <Dreamy_Jazz>	 Which needs scap
[18:07:12] <taavi>	 Dreamy_Jazz: I'll have one more patch to deploy at the very end, but I think you can sneak yours in before if you're still fine deploying while I do a bunch of unrelated mediawiki magic
[18:07:26] <Dreamy_Jazz>	 Sure. I'll get started on it now. Thanks
[18:08:04] <taavi>	 (that is the patch that allows account creation again on tokwiki after the import and related user mangling, so while it's not time-critical I still prefer to get it out, say, today and not tomorrow)
[18:08:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[18:09:01] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, November 27 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1201338 (https://phabricator.wikimedia.org/T290778) (owner: DLynch)
[18:09:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[18:09:49] <wikibugs>	 SRE, Observability-Metrics, SRE Observability (FY2025/2026-Q3): Add Druid as a Private Grafana Datasource - https://phabricator.wikimedia.org/T410933#11410553 (RLazarus) (Clinic duty here! Apparently a milestone tag, like [[ https://phabricator.wikimedia.org/project/view/7979/ | SRE Observability (FY...
[18:11:05] <Dreamy_Jazz>	 Started scap for the private code change
[18:11:15] <Dreamy_Jazz>	 Will say when it has finished
[18:13:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[18:14:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[18:15:00] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P85775 and previous config saved to /var/cache/conftool/dbconfig/20251126-181500-marostegui.json
[18:18:52] <Dreamy_Jazz>	 taavi: scap finished and I'm done with any deploys I need to do
[18:18:55] <Dreamy_Jazz>	 Thanks
[18:18:58] <taavi>	 thanks!
[18:19:33] <Dreamy_Jazz>	 !log Deployed private code change for T410300
[18:19:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[18:29:13] <taavi>	 links refresh is done
[18:30:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T410531)', diff saved to https://phabricator.wikimedia.org/P85778 and previous config saved to /var/cache/conftool/dbconfig/20251126-183007-marostegui.json
[18:30:13] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:30:24] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
[18:30:32] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2168 (T410531)', diff saved to https://phabricator.wikimedia.org/P85779 and previous config saved to /var/cache/conftool/dbconfig/20251126-183031-marostegui.json
[18:30:41] <Tamzin>	 taavi: getting the sense you put your thumb on the scales on user ID order :P
[18:33:34] <taavi>	 note to self: using --dry-run will prevent the CA attachment script from working
[18:33:38] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
[18:33:47] <taavi>	 Tamzin: can you check that you can log in on tok.wikipedia.org now?
[18:33:54] <Tamzin>	 i can!
[18:34:00] <Tamzin>	 happened automatically
[18:34:01] <taavi>	 you can check, or you can login?
[18:34:09] <Tamzin>	 and the contribs are there
[18:35:15] <wikibugs>	 (PS1) CDanis: stat hosts: zram: use up to 50% of RAM [puppet] - https://gerrit.wikimedia.org/r/1211744 (https://phabricator.wikimedia.org/T376813)
[18:35:43] <wikibugs>	 (PS1) Cathal Mooney: lswtest: add test switch to eqiad row C/D IBGP cluster [homer/public] - https://gerrit.wikimedia.org/r/1211745 (https://phabricator.wikimedia.org/T409286)
[18:36:01] <taavi>	 !log attach imported tokwiki users to CentralAuth T404573
[18:36:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:06] <stashbot>	 T404573: Import tokwiki from Wikipesija.org - https://phabricator.wikimedia.org/T404573
[18:36:22] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T410531)', diff saved to https://phabricator.wikimedia.org/P85780 and previous config saved to /var/cache/conftool/dbconfig/20251126-183622-marostegui.json
[18:36:28] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[18:36:37] <wikibugs>	 (CR) Eevans: [C:+2] cassandra: GRANTs for new analytics keyspace [puppet] - https://gerrit.wikimedia.org/r/1211733 (https://phabricator.wikimedia.org/T410962) (owner: Eevans)
[18:37:25] <wikibugs>	 SRE, Traffic: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097#11410637 (Raine)
[18:37:31] <wikibugs>	 (CR) Cathal Mooney: [C:+2] lswtest: add test switch to eqiad row C/D IBGP cluster [homer/public] - https://gerrit.wikimedia.org/r/1211745 (https://phabricator.wikimedia.org/T409286) (owner: Cathal Mooney)
[18:37:45] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1207262 (https://phabricator.wikimedia.org/T404457) (owner: Majavah)
[18:37:56] <Tamzin>	 taavi: some reports of others not being able to log in though
[18:38:34] <wikibugs>	 (Merged) jenkins-bot: Allow account creation on tokwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1207262 (https://phabricator.wikimedia.org/T404457) (owner: Majavah)
[18:38:49] <wikibugs>	 (Merged) jenkins-bot: lswtest: add test switch to eqiad row C/D IBGP cluster [homer/public] - https://gerrit.wikimedia.org/r/1211745 (https://phabricator.wikimedia.org/T409286) (owner: Cathal Mooney)
[18:38:55] <Tamzin>	 disregard, both resolved
[18:39:05] <logmsgbot>	 !log taavi@deploy2002 Started scap sync-world: Backport for [[gerrit:1207262|Allow account creation on tokwiki (T404457)]]
[18:39:10] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[18:39:19] <taavi>	 great
[18:40:27] <wikibugs>	 (PS6) CDobbins: sre.loadbalancer: patch to fix reboot action [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240)
[18:40:30] <wikibugs>	 (PS1) BryanDavis: haproxy: Use full URL in UA block message [puppet] - https://gerrit.wikimedia.org/r/1211749
[18:40:30] <wikibugs>	 (PS1) BryanDavis: varnish: Use full URL in UA block message [puppet] - https://gerrit.wikimedia.org/r/1211750
[18:40:36] <Tamzin>	 [[tok:]] links still don't work, is that part of the last step?
[18:41:18] <logmsgbot>	 !log taavi@deploy2002 taavi: Backport for [[gerrit:1207262|Allow account creation on tokwiki (T404457)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:41:45] <taavi>	 yeah, interwiki cache still needs updating
[18:42:30] <Tamzin>	 got it. and will the logo change be tonight, or is that a separate thing?
[18:42:49] <logmsgbot>	 !log taavi@deploy2002 taavi: Continuing with sync
[18:43:12] <wikibugs>	 (PS1) Eevans: data_gateway: upgrade to v1.0.14 [deployment-charts] - https://gerrit.wikimedia.org/r/1211751 (https://phabricator.wikimedia.org/T410962)
[18:43:23] <taavi>	 iirc that'll need a separate #wikimedia-site-requests task these days
[18:44:22] <wikibugs>	 (CR) Ssingh: sre.loadbalancer: patch to fix reboot action (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[18:46:50] <logmsgbot>	 !log taavi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1207262|Allow account creation on tokwiki (T404457)]] (duration: 07m 45s)
[18:46:55] <stashbot>	 T404457: Create Wikipedia Toki Pona - https://phabricator.wikimedia.org/T404457
[18:48:01] <wikibugs>	 (PS1) Majavah: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1211753
[18:48:25] <taavi>	 I'll sync out the interwiki cache, and then I'll be done for tonight
[18:48:30] <wikibugs>	 SRE, Traffic: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup - https://phabricator.wikimedia.org/T411097#11410676 (Raine)
[18:48:59] <wikibugs>	 (PS7) CDobbins: sre.loadbalancer: patch to fix reboot action [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240)
[18:49:11] <wikibugs>	 (CR) CDobbins: sre.loadbalancer: patch to fix reboot action (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[18:49:23] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by taavi@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1211753 (owner: Majavah)
[18:50:08] <logmsgbot>	 !log eevans@deploy2002 helmfile [staging] START helmfile.d/services/data-gateway: apply
[18:50:09] <wikibugs>	 (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1211753 (owner: Majavah)
[18:50:25] <logmsgbot>	 !log eevans@deploy2002 helmfile [staging] DONE helmfile.d/services/data-gateway: apply
[18:50:43] <logmsgbot>	 !log taavi@deploy2002 Started scap sync-world: Backport for [[gerrit:1211753|Update interwiki cache]]
[18:50:48] <wikibugs>	 (CR) Ssingh: sre.loadbalancer: patch to fix reboot action (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[18:51:14] <wikibugs>	 (CR) Eevans: [C:+2] data_gateway: upgrade to v1.0.14 [deployment-charts] - https://gerrit.wikimedia.org/r/1211751 (https://phabricator.wikimedia.org/T410962) (owner: Eevans)
[18:51:29] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P85781 and previous config saved to /var/cache/conftool/dbconfig/20251126-185129-marostegui.json
[18:53:01] <wikibugs>	 (Merged) jenkins-bot: data_gateway: upgrade to v1.0.14 [deployment-charts] - https://gerrit.wikimedia.org/r/1211751 (https://phabricator.wikimedia.org/T410962) (owner: Eevans)
[18:53:04] <wikibugs>	 SRE, DNS, serviceops, Traffic, Language codes: Redirect legacy language codes for Toki Pona to tok.wikipedia.org - https://phabricator.wikimedia.org/T404507#11410708 (Tamzin) This technically wasn't stalled, but there wasn't much reason to get around to it till now, so, noting that T404457 ha...
[18:53:07] <logmsgbot>	 !log taavi@deploy2002 taavi: Backport for [[gerrit:1211753|Update interwiki cache]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[18:53:30] <logmsgbot>	 !log eevans@deploy2002 helmfile [staging] START helmfile.d/services/data-gateway: apply
[18:53:40] <logmsgbot>	 !log taavi@deploy2002 taavi: Continuing with sync
[18:53:48] <logmsgbot>	 !log eevans@deploy2002 helmfile [staging] DONE helmfile.d/services/data-gateway: apply
[18:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[18:54:38] <wikibugs>	 (CR) CI reject: [V:-1] sre.loadbalancer: patch to fix reboot action [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[18:55:27] <Tamzin>	 taavi: thank you so much for everything!
[18:57:24] <wikibugs>	 (CR) Joal: [C:+1] "Awesome! TIL zram!" [puppet] - https://gerrit.wikimedia.org/r/1211744 (https://phabricator.wikimedia.org/T376813) (owner: CDanis)
[18:57:42] <logmsgbot>	 !log taavi@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211753|Update interwiki cache]] (duration: 06m 59s)
[18:57:50] <taavi>	 with that live I'm done for the evening
[19:00:05] <jouncebot>	 jnuche and brennen: #bothumor I � Unicode. All rise for MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1900).
[19:03:04] <wikibugs>	 (PS8) CDobbins: sre.loadbalancer: patch to fix reboot action [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240)
[19:03:15] <wikibugs>	 (CR) CDobbins: sre.loadbalancer: patch to fix reboot action (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240) (owner: CDobbins)
[19:03:42] <wikibugs>	 (PS9) CDobbins: sre.loadbalancer: patch to fix reboot action [cookbooks] - https://gerrit.wikimedia.org/r/1211241 (https://phabricator.wikimedia.org/T395240)
[19:06:37] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P85782 and previous config saved to /var/cache/conftool/dbconfig/20251126-190636-marostegui.json
[19:06:40] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11410757 (Andrew) Open→Resolved
[19:13:07] <wikibugs>	 ops-eqiad, SRE, cloud-services-team, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): eqiad row C/D cloud hosts pending migration - https://phabricator.wikimedia.org/T411025#11410786 (Jclark-ctr) a:Andrew→Jclark-ctr cloudelastic1009 clouddb1017 clouddumps1002 clouddb1018 cloud...
[19:13:58] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot (apply updates) - ryankemper@cumin2002 - T410573
[19:14:03] <stashbot>	 T410573: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573
[19:15:43] <logmsgbot>	 !log andrew@cumin2002 START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
[19:15:47] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[19:19:16] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change IPs for sretest1006 - cmooney@cumin1003"
[19:19:20] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change IPs for sretest1006 - cmooney@cumin1003"
[19:19:20] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[19:19:45] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.wipe-cache sretest1006.eqiad.wmnet on all recursors
[19:19:48] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1006.eqiad.wmnet on all recursors
[19:20:01] <logmsgbot>	 !log cmooney@cumin1003 END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1006.eqiad.wmnet
[19:21:04] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
[19:21:15] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410830 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[19:21:44] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T410531)', diff saved to https://phabricator.wikimedia.org/P85783 and previous config saved to /var/cache/conftool/dbconfig/20251126-192143-marostegui.json
[19:21:49] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[19:22:00] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
[19:22:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2182 (T410531)', diff saved to https://phabricator.wikimedia.org/P85784 and previous config saved to /var/cache/conftool/dbconfig/20251126-192207-marostegui.json
[19:26:35] <wikibugs>	 (CR) Ryan Kemper: wdqs: add availability sli recording rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1202049 (https://phabricator.wikimedia.org/T393966) (owner: Ryan Kemper)
[19:27:53] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T410531)', diff saved to https://phabricator.wikimedia.org/P85785 and previous config saved to /var/cache/conftool/dbconfig/20251126-192752-marostegui.json
[19:27:58] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[19:28:06] <wikibugs>	 (CR) Btullis: [C:+1] "Nice, thanks." [puppet] - https://gerrit.wikimedia.org/r/1211744 (https://phabricator.wikimedia.org/T376813) (owner: CDanis)
[19:28:25] <jinxer-wm>	 FIRING: [6x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2086:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:28:56] <wikibugs>	 (PS5) Ryan Kemper: wdqs: add availability sli recording rules [puppet] - https://gerrit.wikimedia.org/r/1202049 (https://phabricator.wikimedia.org/T393966)
[19:29:37] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[19:32:11] <logmsgbot>	 !log andrew@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
[19:32:13] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2086 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:32:16] <wikibugs>	 (PS1) C. Scott Ananian: Bump wikimedia/parsoid to 0.23.0-a7 [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960)
[19:32:30] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, November 26 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960) (owner: C. Scott Ananian)
[19:34:25] <wikibugs>	 SRE, SRE-swift-storage, Infrastructure Security, Data-Platform-SRE (2025.11.07 - 2025.11.28), and 3 others: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573#11410895 (RKemper)
[19:35:13] <wikibugs>	 (CR) CI reject: [V:-1] Bump wikimedia/parsoid to 0.23.0-a7 [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960) (owner: C. Scott Ananian)
[19:35:14] <wikibugs>	 (PS2) C. Scott Ananian: Bump wikimedia/parsoid to 0.23.0-a7 [vendor] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211759 (https://phabricator.wikimedia.org/T204307)
[19:38:25] <jinxer-wm>	 RESOLVED: [6x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2086:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:38:35] <zabe>	 jouncebot: nowandnext
[19:38:36] <jouncebot>	 For the next 1 hour(s) and 21 minute(s): MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T1900)
[19:38:36] <jouncebot>	 In 1 hour(s) and 21 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T2100)
[19:38:41] <logmsgbot>	 !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
[19:39:11] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good and key verified out of band" [puppet] - https://gerrit.wikimedia.org/r/1211284 (owner: Cwhite)
[19:41:02] <wikibugs>	 (CR) Zabe: [C:+2] RestrictionStore: Check for no up to date cascade protections [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211731 (https://phabricator.wikimedia.org/T411092) (owner: Zabe)
[19:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:42:13] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9400 on cirrussearch2086 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9400 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:43:01] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P85786 and previous config saved to /var/cache/conftool/dbconfig/20251126-194300-marostegui.json
[19:44:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[19:44:31] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[19:49:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[19:49:26] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[19:53:59] <logmsgbot>	 !log cmooney@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie
[19:54:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410946 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[19:54:24] <wikibugs>	 SRE, SRE-swift-storage, Infrastructure Security, Data-Platform-SRE (2025.11.07 - 2025.11.28), and 3 others: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573#11410948 (MoritzMuehlenhoff) @RKemper There's still an a missed host: cirrussearch2084...
[19:55:24] <wikibugs>	 (CR) CI reject: [V:-1] RestrictionStore: Check for no up to date cascade protections [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211731 (https://phabricator.wikimedia.org/T411092) (owner: Zabe)
[19:56:00] <zabe>	 20:54:57 1) Wikibase\Repo\Tests\Api\FormatSnakValueTest::testApiRequest with data set #9 (Closure Object (...))
[19:56:00] <zabe>	 20:54:57 RuntimeException: Could not acquire lock for page ID '1'.
[19:56:03] <zabe>	 Hmm
[19:56:18] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
[19:56:33] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11410949 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host sretest1006.eqiad.wm...
[19:57:03] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Wikidata, and 3 others: Racking request for wdqs10(2[8-9]|3[0-2]) - https://phabricator.wikimedia.org/T410406#11410950 (RKemper) Looks like wdqs1032 and wdqs1029 at minimum might need another reimage
[19:57:30] <wikibugs>	 (CR) Zabe: [C:+2] "retry" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211731 (https://phabricator.wikimedia.org/T411092) (owner: Zabe)
[19:58:08] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P85787 and previous config saved to /var/cache/conftool/dbconfig/20251126-195807-marostegui.json
[20:06:04] <wikibugs>	 (CR) C. Scott Ananian: "recheck" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960) (owner: C. Scott Ananian)
[20:09:04] <wikibugs>	 (Merged) jenkins-bot: RestrictionStore: Check for no up to date cascade protections [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211731 (https://phabricator.wikimedia.org/T411092) (owner: Zabe)
[20:09:43] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1211731|RestrictionStore: Check for no up to date cascade protections (T411092)]]
[20:09:49] <stashbot>	 T411092: InvalidArgumentException: Wikimedia\Rdbms\Platform\SQLPlatform::makeList: empty input for field tl_from - https://phabricator.wikimedia.org/T411092
[20:11:55] <logmsgbot>	 !log zabe@deploy2002 zabe: Backport for [[gerrit:1211731|RestrictionStore: Check for no up to date cascade protections (T411092)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:12:36] <logmsgbot>	 !log zabe@deploy2002 zabe: Continuing with sync
[20:13:01] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1006.eqiad.wmnet with reason: host reimage
[20:13:16] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T410531)', diff saved to https://phabricator.wikimedia.org/P85788 and previous config saved to /var/cache/conftool/dbconfig/20251126-201315-marostegui.json
[20:13:21] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[20:13:31] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
[20:16:12] <logmsgbot>	 !log andrew@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
[20:16:40] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211731|RestrictionStore: Check for no up to date cascade protections (T411092)]] (duration: 06m 56s)
[20:16:45] <stashbot>	 T411092: InvalidArgumentException: Wikimedia\Rdbms\Platform\SQLPlatform::makeList: empty input for field tl_from - https://phabricator.wikimedia.org/T411092
[20:17:35] <icinga-wm>	 PROBLEM - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2084 is CRITICAL: CRITICAL: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:18:21] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
[20:19:22] <logmsgbot>	 !log cmooney@cumin1003 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1006.eqiad.wmnet with reason: host reimage
[20:20:25] <jinxer-wm>	 FIRING: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2084:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:22:06] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
[20:22:13] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2208 (T410531)', diff saved to https://phabricator.wikimedia.org/P85789 and previous config saved to /var/cache/conftool/dbconfig/20251126-202213-marostegui.json
[20:22:18] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[20:24:48] <wikibugs>	 (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1211772
[20:27:35] <icinga-wm>	 RECOVERY - Check unit status of push_cross_cluster_settings_9200 on cirrussearch2084 is OK: OK: Status of the systemd unit push_cross_cluster_settings_9200 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[20:27:40] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208 (T410531)', diff saved to https://phabricator.wikimedia.org/P85790 and previous config saved to /var/cache/conftool/dbconfig/20251126-202739-marostegui.json
[20:27:45] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[20:27:49] <icinga-wm>	 PROBLEM - Check correctness of the icinga configuration on alert1002 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga
[20:30:25] <jinxer-wm>	 RESOLVED: [3x] SystemdUnitFailed: prometheus-wmf-elasticsearch-exporter-9200.service on cirrussearch2084:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:30:33] <wikibugs>	 (PS1) PipelineBot: wikifeeds: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1211775
[20:37:44] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1006.eqiad.wmnet with OS trixie
[20:37:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Eqiad C/D refresh: 2 x test hosts for config validation - https://phabricator.wikimedia.org/T405560#11411029 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host sretest1006.eqiad.wmnet...
[20:39:37] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[20:40:10] <wikibugs>	 (PS1) Andrew Bogott: codfw1dev cloudlb: try 'source' balance method [puppet] - https://gerrit.wikimedia.org/r/1211782 (https://phabricator.wikimedia.org/T410265)
[20:40:12] <wikibugs>	 (PS1) Andrew Bogott: cloudlb: change balance for keystone-admin api to 'source [puppet] - https://gerrit.wikimedia.org/r/1211783
[20:41:38] <wikibugs>	 (CR) Andrew Bogott: [C:+2] codfw1dev cloudlb: try 'source' balance method [puppet] - https://gerrit.wikimedia.org/r/1211782 (https://phabricator.wikimedia.org/T410265) (owner: Andrew Bogott)
[20:42:47] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P85791 and previous config saved to /var/cache/conftool/dbconfig/20251126-204246-marostegui.json
[20:47:49] <wikibugs>	 (PS2) Andrew Bogott: cloudlb: change balance for keystone-admin api to 'source' [puppet] - https://gerrit.wikimedia.org/r/1211783
[20:48:04] <wikibugs>	 (PS1) Andrew Bogott: Revert "codfw1dev cloudlb: try 'source' balance method" [puppet] - https://gerrit.wikimedia.org/r/1211789
[20:53:03] <wikibugs>	 (CR) Andrew Bogott: [C:+2] Revert "codfw1dev cloudlb: try 'source' balance method" [puppet] - https://gerrit.wikimedia.org/r/1211789 (owner: Andrew Bogott)
[20:53:29] <wikibugs>	 (CR) Andrew Bogott: [C:+2] cloudlb: change balance for keystone-admin api to 'source' [puppet] - https://gerrit.wikimedia.org/r/1211783 (owner: Andrew Bogott)
[20:57:55] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P85792 and previous config saved to /var/cache/conftool/dbconfig/20251126-205754-marostegui.json
[21:00:05] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC late backport window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T2100).
[21:00:05] <jouncebot>	 ejegg and cscott: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:04:06] <cscott>	 i can spiderpig
[21:04:13] <cscott>	 do you mind if I go first?
[21:04:52] <zabe>	 I think the other person is not online
[21:05:03] <cscott>	 i win then!
[21:05:58] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cscott@deploy2002 using scap backport" [vendor] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211759 (https://phabricator.wikimedia.org/T204307) (owner: C. Scott Ananian)
[21:05:58] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by cscott@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960) (owner: C. Scott Ananian)
[21:06:13] <wikibugs>	 (PS1) DCausse: cirrus: enable DWIM wrong keyboad second try on all he & ru wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1211799 (https://phabricator.wikimedia.org/T408734)
[21:06:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 99.98% on relforge1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[21:11:55] <jinxer-wm>	 RESOLVED: [2x] MaxConntrack: Max conntrack at 99.98% on relforge1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[21:13:03] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2208 (T410531)', diff saved to https://phabricator.wikimedia.org/P85793 and previous config saved to /var/cache/conftool/dbconfig/20251126-211302-marostegui.json
[21:13:08] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[21:13:19] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
[21:13:27] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2220 (T410531)', diff saved to https://phabricator.wikimedia.org/P85794 and previous config saved to /var/cache/conftool/dbconfig/20251126-211326-marostegui.json
[21:18:52] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220 (T410531)', diff saved to https://phabricator.wikimedia.org/P85795 and previous config saved to /var/cache/conftool/dbconfig/20251126-211851-marostegui.json
[21:18:57] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[21:19:01] <wikibugs>	 (Merged) jenkins-bot: Bump wikimedia/parsoid to 0.23.0-a7 [vendor] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211759 (https://phabricator.wikimedia.org/T204307) (owner: C. Scott Ananian)
[21:19:12] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot (apply updates) - ryankemper@cumin2002 - T410573
[21:19:13] <wikibugs>	 (Merged) jenkins-bot: Bump wikimedia/parsoid to 0.23.0-a7 [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1211760 (https://phabricator.wikimedia.org/T410960) (owner: C. Scott Ananian)
[21:19:16] <stashbot>	 T410573: October 2025 Bullseye reboots: Search Platform-owned hosts - https://phabricator.wikimedia.org/T410573
[21:19:20] <ejegg>	 hi deploy folks, sorry if I missed the config deploy slot
[21:19:31] <ejegg>	 I was hanging out in -releng instead of here
[21:19:45] <logmsgbot>	 !log cscott@deploy2002 Started scap sync-world: Backport for [[gerrit:1211759|Bump wikimedia/parsoid to 0.23.0-a7 (T204307 T373253 T410826 T410960)]], [[gerrit:1211760|Bump wikimedia/parsoid to 0.23.0-a7 (T410960)]]
[21:19:54] <stashbot>	 T204307: Parser Functions should support named parameters - https://phabricator.wikimedia.org/T204307
[21:19:55] <stashbot>	 T373253: Develop semantic / distinct representation for wikifunctions output in Parsoid DOM - https://phabricator.wikimedia.org/T373253
[21:19:55] <stashbot>	 T410826: UnexpectedValueException: Unable to decode data-mw [{"parts":[{"template":{"target":{"wt":"#ifexpr: {{#expr:{{CURRENTMONTH}} = 4}} and {{#expr:{{CURRENTDAY}} = 1}}","function":"ifexpr"},"params":{"1":{"wt":"<div class=\"usermes - https://phabricator.wikimedia.org/T410826
[21:19:56] <stashbot>	 T410960: CTT tasks week of 2025-11-21 - https://phabricator.wikimedia.org/T410960
[21:20:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[21:20:26] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[21:21:21] <ejegg>	 hi RoanKattouw, sorry I was in the wrong channel for the start of the backport window
[21:21:55] <ejegg>	 if i've missed my chance, no worries, I'll reschedule for Monday
[21:21:56] <logmsgbot>	 !log cscott@deploy2002 cscott: Backport for [[gerrit:1211759|Bump wikimedia/parsoid to 0.23.0-a7 (T204307 T373253 T410826 T410960)]], [[gerrit:1211760|Bump wikimedia/parsoid to 0.23.0-a7 (T410960)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:21:58] <cscott>	 no worries, i jumped the queue.  my patches are just about to finish up
[21:22:08] <ejegg>	 oh cool
[21:25:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[21:25:32] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[21:26:08] <wikibugs>	 (PS2) DCausse: cirrus: enable DWIM wrong keyboard second try on all he & ru wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1211799 (https://phabricator.wikimedia.org/T408734)
[21:28:24] <logmsgbot>	 !log cscott@deploy2002 cscott: Continuing with sync
[21:28:40] <cscott>	 ok, tested the parsoid backport and it looks good, continuing
[21:28:58] <cscott>	 ejegg: shouldn't be long now
[21:29:46] <cscott>	 zabe: are you the official deployer on duty for this window?  or is that RoanKattouw / urbanecm / TheresNoTime / kindrobot / cjming ?
[21:30:32] <zabe>	 No I am not, but I can deploy something if needed
[21:32:23] <logmsgbot>	 !log cscott@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211759|Bump wikimedia/parsoid to 0.23.0-a7 (T204307 T373253 T410826 T410960)]], [[gerrit:1211760|Bump wikimedia/parsoid to 0.23.0-a7 (T410960)]] (duration: 12m 38s)
[21:32:33] <stashbot>	 T204307: Parser Functions should support named parameters - https://phabricator.wikimedia.org/T204307
[21:32:33] <stashbot>	 T373253: Develop semantic / distinct representation for wikifunctions output in Parsoid DOM - https://phabricator.wikimedia.org/T373253
[21:32:34] <stashbot>	 T410826: UnexpectedValueException: Unable to decode data-mw [{"parts":[{"template":{"target":{"wt":"#ifexpr: {{#expr:{{CURRENTMONTH}} = 4}} and {{#expr:{{CURRENTDAY}} = 1}}","function":"ifexpr"},"params":{"1":{"wt":"<div class=\"usermes - https://phabricator.wikimedia.org/T410826
[21:32:34] <stashbot>	 T410960: CTT tasks week of 2025-11-21 - https://phabricator.wikimedia.org/T410960
[21:33:59] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P85796 and previous config saved to /var/cache/conftool/dbconfig/20251126-213358-marostegui.json
[21:35:50] <cscott>	 zabe: well i'm done.  ejegg do you need a deployer?
[21:36:16] <ejegg>	 that would be great! It's been a long time since I deployed anything to the main cluster
[21:36:40] <cscott>	 you should try spiderpig, it's great ;)
[21:36:59] <cscott>	 zabe, can you help ejegg out?
[21:37:18] <zabe>	 Sure, I can deploy, unless ejegg wants to try spiderpig
[21:37:20] <jinxer-wm>	 FIRING: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[21:37:33] <ejegg>	 zabe: I think I'll try it next time
[21:37:36] <zabe>	 Alright
[21:37:42] <ejegg>	 thanks!
[21:37:43] <wikibugs>	 (CR) Zabe: [C:+2] Remove fundraiseup domains from donatewiki CSP [mediawiki-config] - https://gerrit.wikimedia.org/r/1211155 (https://phabricator.wikimedia.org/T410737) (owner: Ejegg)
[21:38:20] <jinxer-wm>	 FIRING: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[21:38:36] <wikibugs>	 (Merged) jenkins-bot: Remove fundraiseup domains from donatewiki CSP [mediawiki-config] - https://gerrit.wikimedia.org/r/1211155 (https://phabricator.wikimedia.org/T410737) (owner: Ejegg)
[21:39:39] <logmsgbot>	 !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1211155|Remove fundraiseup domains from donatewiki CSP (T410737)]]
[21:39:44] <stashbot>	 T410737: Remove Fundraiseup from donatewiki CSP - https://phabricator.wikimedia.org/T410737
[21:42:03] <logmsgbot>	 !log zabe@deploy2002 ejegg, zabe: Backport for [[gerrit:1211155|Remove fundraiseup domains from donatewiki CSP (T410737)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:42:41] <zabe>	 ejegg: is this properly testable?
[21:43:06] <ejegg>	 thanks zabe, testing
[21:43:20] <ejegg>	 should be, just checking for a response header
[21:43:27] <zabe>	 fair
[21:43:32] <ejegg>	 lemme just get that debug extension going
[21:44:40] <ejegg>	 yep, looks good on the test server zabe 
[21:44:54] <zabe>	 Nice, syncing
[21:45:02] <logmsgbot>	 !log zabe@deploy2002 ejegg, zabe: Continuing with sync
[21:49:07] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P85797 and previous config saved to /var/cache/conftool/dbconfig/20251126-214906-marostegui.json
[21:50:14] <logmsgbot>	 !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1211155|Remove fundraiseup domains from donatewiki CSP (T410737)]] (duration: 10m 34s)
[21:50:19] <stashbot>	 T410737: Remove Fundraiseup from donatewiki CSP - https://phabricator.wikimedia.org/T410737
[21:50:19] <zabe>	 ejegg: should be live
[21:50:23] <ejegg>	 looking
[21:51:02] <ejegg>	 yep, headers look right w/o the debug extension. Thanks again, zabe!
[21:51:10] <zabe>	 yw
[21:52:20] <jinxer-wm>	 RESOLVED: CirrusSearchFullTextLatencyTooHigh: CirrusSearch full_text 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchFullTextLatencyTooHigh
[21:53:20] <jinxer-wm>	 RESOLVED: CirrusSearchMoreLikeLatencyTooHigh: CirrusSearch more_like 95th percentiles latency is too high (mw@eqiad to dnsdisc) - https://wikitech.wikimedia.org/wiki/Search#Health/Activity_Monitoring - https://grafana.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchMoreLikeLatencyTooHigh
[22:00:01] <wikibugs>	 (PS1) Urbanecm: beta: Enable UserEmailConfirmationUseHTML on betawikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1211809 (https://phabricator.wikimedia.org/T396155)
[22:00:05] <jouncebot>	 Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T2200)
[22:01:10] <wikibugs>	 (PS2) Urbanecm: beta: Enable UserEmailConfirmationUseHTML [mediawiki-config] - https://gerrit.wikimedia.org/r/1211809 (https://phabricator.wikimedia.org/T396155)
[22:02:07] <wikibugs>	 (PS1) Urbanecm: enwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211810 (https://phabricator.wikimedia.org/T410970)
[22:03:51] <wikibugs>	 (PS1) Urbanecm: testwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211812 (https://phabricator.wikimedia.org/T396155)
[22:04:14] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2220 (T410531)', diff saved to https://phabricator.wikimedia.org/P85798 and previous config saved to /var/cache/conftool/dbconfig/20251126-220414-marostegui.json
[22:04:16] <wikibugs>	 SRE, collaboration-services, PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11411348 (Jdrewniak) > @ATitkov Please file a ticket for the security review as normal, and we (Product Safety and Integrity) will expedite a decision (whether t...
[22:04:20] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[22:04:30] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2221.codfw.wmnet with reason: Maintenance
[22:04:38] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85799 and previous config saved to /var/cache/conftool/dbconfig/20251126-220437-marostegui.json
[22:04:52] <wikibugs>	 (PS3) Urbanecm: enwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211810 (https://phabricator.wikimedia.org/T410970)
[22:05:06] <wikibugs>	 (CR) CI reject: [V:-1] enwiki: Enable HTML confirmation email [mediawiki-config] - https://gerrit.wikimedia.org/r/1211810 (https://phabricator.wikimedia.org/T410970) (owner: Urbanecm)
[22:07:05] <wikibugs>	 (PS1) Urbanecm: Enable HTML confirmation email on Wikidata and Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/1211813 (https://phabricator.wikimedia.org/T410971)
[22:10:11] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85800 and previous config saved to /var/cache/conftool/dbconfig/20251126-221010-marostegui.json
[22:10:16] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[22:10:39] <jinxer-wm>	 FIRING: CirrusSearchNodeIndexingNotIncreasing: Elasticsearch instance cirrussearch1112-production-search-eqiad is not indexing - https://wikitech.wikimedia.org/wiki/Search/Elasticsearch_Administration#Indexing_hung_and_not_making_progress - https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&from=now-3d&to=now&viewPanel=57 - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchNodeIndexingNotIncreasing
[22:13:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[22:19:03] <jinxer-wm>	 FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[22:23:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[22:24:03] <jinxer-wm>	 RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[22:24:37] <jinxer-wm>	 FIRING: [6x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[22:25:18] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P85801 and previous config saved to /var/cache/conftool/dbconfig/20251126-222517-marostegui.json
[22:36:56] <wikibugs>	 (CR) Cwhite: [C:+2] admin: add new ssh key for cwhite [puppet] - https://gerrit.wikimedia.org/r/1211284 (owner: Cwhite)
[22:40:26] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P85802 and previous config saved to /var/cache/conftool/dbconfig/20251126-224025-marostegui.json
[22:54:37] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[22:55:33] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2221 (T410531)', diff saved to https://phabricator.wikimedia.org/P85803 and previous config saved to /var/cache/conftool/dbconfig/20251126-225532-marostegui.json
[22:55:38] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[22:55:49] <logmsgbot>	 !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2222.codfw.wmnet with reason: Maintenance
[22:55:57] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2222 (T410531)', diff saved to https://phabricator.wikimedia.org/P85804 and previous config saved to /var/cache/conftool/dbconfig/20251126-225556-marostegui.json
[23:00:04] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251126T2300)
[23:01:24] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222 (T410531)', diff saved to https://phabricator.wikimedia.org/P85805 and previous config saved to /var/cache/conftool/dbconfig/20251126-230123-marostegui.json
[23:01:29] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[23:10:26] <wikibugs>	 (PS1) Cwhite: monitoring: add lswtest-d8-eqiad hostgroup [puppet] - https://gerrit.wikimedia.org/r/1211828 (https://phabricator.wikimedia.org/T411098)
[23:15:35] <wikibugs>	 (CR) Andrea Denisse: [C:+1] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/1211828 (https://phabricator.wikimedia.org/T411098) (owner: Cwhite)
[23:16:04] <wikibugs>	 (CR) Cwhite: [C:+2] monitoring: add lswtest-d8-eqiad hostgroup [puppet] - https://gerrit.wikimedia.org/r/1211828 (https://phabricator.wikimedia.org/T411098) (owner: Cwhite)
[23:16:31] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P85806 and previous config saved to /var/cache/conftool/dbconfig/20251126-231631-marostegui.json
[23:29:37] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[23:31:39] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P85807 and previous config saved to /var/cache/conftool/dbconfig/20251126-233138-marostegui.json
[23:41:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-wikikube_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:46:46] <logmsgbot>	 !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2222 (T410531)', diff saved to https://phabricator.wikimedia.org/P85808 and previous config saved to /var/cache/conftool/dbconfig/20251126-234646-marostegui.json
[23:46:52] <stashbot>	 T410531: Drop rc_type from recentchanges in wmf production - https://phabricator.wikimedia.org/T410531
[23:51:28] <logmsgbot>	 !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
[23:54:11] <logmsgbot>	 !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[23:55:25] <wikibugs>	 (PS1) Cathal Mooney: hierdata/comm.yaml: add lswtest-d8-eqiad temp test device [puppet] - https://gerrit.wikimedia.org/r/1211848
[23:57:12] <wikibugs>	 (CR) Cwhite: [C:+2] "Awesome, thanks!" [puppet] - https://gerrit.wikimedia.org/r/1211848 (owner: Cathal Mooney)
[23:58:01] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: Remove lvs1018 L2 link to ssw1-e1-eqiad - https://phabricator.wikimedia.org/T405499#11411590 (VRiley-WMF) Hey @cmooney It has been reused for that purpose, however it's still being worked on to update the connection in netbox
[23:59:03] <jinxer-wm>	 FIRING: MediaWikiEditFailures: Elevated MediaWiki edit failures (session_loss) for cluster  - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=13 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiEditFailures