[00:03:06] !log vriley@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
[00:03:30] !log vriley@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
[00:03:31] !log vriley@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1003.wikimedia.org with OS trixie
[00:03:46] !log jhancock@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage
[00:03:46] ops-eqiad, SRE, collaboration-services, Continuous-Integration-Infrastructure, and 2 others: eqiad: request for a decom'ed R440 - Config C - https://phabricator.wikimedia.org/T418544#11695586 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1003 for host contint100...
[00:08:37] ops-eqiad, SRE, collaboration-services, Continuous-Integration-Infrastructure, and 2 others: eqiad: request for a decom'ed R440 - Config C - https://phabricator.wikimedia.org/T418544#11695589 (VRiley-WMF) Was able to complete this after speaking with @Jhancock.wm Thank you! @Dzahn It should be c...
[00:21:16] jouncebot: nowandnext
[00:21:17] No deployments scheduled for the next 5 hour(s) and 38 minute(s)
[00:21:17] In 5 hour(s) and 38 minute(s): MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260311T0600)
[00:21:22] (CR) Zabe: [C:+2] Stop setting $wgImageLinksSchemaMigrationStage [mediawiki-config] - https://gerrit.wikimedia.org/r/1250117 (https://phabricator.wikimedia.org/T299953) (owner: Zabe)
[00:22:24] (Merged) jenkins-bot: Stop setting $wgImageLinksSchemaMigrationStage [mediawiki-config] - https://gerrit.wikimedia.org/r/1250117 (https://phabricator.wikimedia.org/T299953) (owner: Zabe)
[00:23:16] zabe: Congratulations! Great work.
[00:23:32] Thanks:)
[00:24:18] !log zabe@deploy2002 Started scap sync-world: Backport for [[gerrit:1250117|Stop setting $wgImageLinksSchemaMigrationStage (T299953)]]
[00:24:22] T299953: Normalize imagelinks table - https://phabricator.wikimedia.org/T299953
[00:26:27] !log zabe@deploy2002 zabe: Backport for [[gerrit:1250117|Stop setting $wgImageLinksSchemaMigrationStage (T299953)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:29:59] !log zabe@deploy2002 zabe: Continuing with sync
[00:30:02] (Abandoned) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1249460 (owner: TrainBranchBot)
[00:33:56] !log zabe@deploy2002 Finished scap sync-world: Backport for [[gerrit:1250117|Stop setting $wgImageLinksSchemaMigrationStage (T299953)]] (duration: 09m 38s)
[00:34:00] T299953: Normalize imagelinks table - https://phabricator.wikimedia.org/T299953
[00:34:14] FIRING: CertAlmostExpired: Certificate for service lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[00:39:09] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1250131
[00:39:09] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1250131 (owner: TrainBranchBot)
[00:44:15] FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[00:47:25] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:51:42] (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1250131 (owner: TrainBranchBot)
[00:56:12] ops-codfw, SRE, SRE-swift-storage, DC-Ops: FY2526 Q3:rack/setup/install ms-be209[56] - https://phabricator.wikimedia.org/T413088#11695622 (Jhancock.wm)
[00:59:04] ops-codfw, SRE, SRE-swift-storage, DC-Ops: FY2526 Q3:rack/setup/install ms-be209[56] - https://phabricator.wikimedia.org/T413088#11695626 (Jhancock.wm) @MatthewVernon both these are having an issue at this step. puppet files might need an adjustment. [12/60, retrying in 360.00s] Attempt to run...
[01:08:10] (CR) Zabe: [C:+2] "retry" [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1250131 (owner: TrainBranchBot)
[01:09:17] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1250136
[01:09:17] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1250136 (owner: TrainBranchBot)
[01:20:57] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1250131 (owner: TrainBranchBot)
[01:26:27] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1250136 (owner: TrainBranchBot)
[02:00:51] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:08:51] !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 07m 59s)
[02:08:55] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:14:15] FIRING: [3x] HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlserve@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[02:33:55] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:59:48] FIRING: [2x] KubernetesCalicoDown: dse-k8s-worker1010.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:30:17] PROBLEM - Ensure traffic_manager is running for instance backend on cp1100 is CRITICAL: PROCS CRITICAL: 3 processes with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:31:17] RECOVERY - Ensure traffic_manager is running for instance backend on cp1100 is OK: PROCS OK: 1 process with args /usr/bin/traffic_manager --nosyslog https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[04:08:43] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647 (Papaul) NEW
[04:09:30] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11695826 (Papaul) p:Triage→High a:cmooney→ayounsi
[04:34:14] FIRING: CertAlmostExpired: Certificate for service lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[04:44:15] FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:47:40] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:09:54] (PS1) TChin: Add stream config for attribution research [mediawiki-config] - https://gerrit.wikimedia.org/r/1250249 (https://phabricator.wikimedia.org/T417050)
[05:10:26] (PS4) Ryan Kemper: Add new active-active discovery service for dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:10:34] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:14:19] (PS5) Ryan Kemper: Add new active-active discovery service for dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:14:27] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:26:33] (PS1) Clare Ming: Remove mpic redirects to test-kitchen [puppet] - https://gerrit.wikimedia.org/r/1250250 (https://phabricator.wikimedia.org/T415845)
[05:28:01] (PS6) Ryan Kemper: Add new active-active discovery service for dse-k8s [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:29:47] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[05:57:09] (CR) Ryan Kemper: "Checked this thoroughly, it's perfect. Made a couple tiny amendments to commit message but otherwise this is ready to ship next Tuesday (o" [puppet] - https://gerrit.wikimedia.org/r/1248605 (https://phabricator.wikimedia.org/T417698) (owner: Bking)
[06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260311T0600)
[06:08:06] (CR) Ayounsi: [C:+2] Add more depool strategies for rack depool cookbook [puppet] - https://gerrit.wikimedia.org/r/1249958 (https://phabricator.wikimedia.org/T327300) (owner: Ayounsi)
[06:14:15] FIRING: [3x] HelmReleaseBadStatus: Helm release kserve/kserve on k8s-mlserve@codfw in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[06:15:22] ops-magru: Inbound errors on interface cr1-magru:xe-0/1/1 (Transport: cr2-eqiad:xe-1/0/1:3 (Telxius, CRT-008508) {#70089}) - https://phabricator.wikimedia.org/T413409#11695960 (ayounsi) a:RobH Rob, could you investigate those as well. Same as {T415743}. Please sync up with us to drain the link ahead of t...
[06:16:00] ops-magru, SRE, Infrastructure-Foundations, netops: cr2-magru <-> asw1-b3-magru link down March 2026 - https://phabricator.wikimedia.org/T418978#11695964 (ayounsi) Awesome, thx!!
[06:33:12] (PS1) Kevin Bazira: ml: add aiter support to vLLM 0.14 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650)
[06:59:48] FIRING: [2x] KubernetesCalicoDown: dse-k8s-worker1010.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[07:00:05] Amir1, Urbanecm, and awight: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260311T0700)
[07:00:05] katherine_g and Msz2001: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:04:28] o/
[07:06:30] o/
[07:06:46] katherine_g: Do you need a deployer?
[07:07:11] Msz2001: I can go ahead- starting now
[07:07:22] ack
[07:07:50] (CR) TrainBranchBot: [C:+2] "Approved by kgraessle@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1247639 (https://phabricator.wikimedia.org/T400727) (owner: Kgraessle)
[07:08:53] (Merged) jenkins-bot: Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1247639 (https://phabricator.wikimedia.org/T400727) (owner: Kgraessle)
[07:09:48] !log kgraessle@deploy2002 Started scap sync-world: Backport for [[gerrit:1247639|Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727)]]
[07:09:50] (PS1) Mszwarc: Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages [mediawiki-config] - https://gerrit.wikimedia.org/r/1250426
[07:09:52] T400727: set AutoModeratorMultiLingualRevertRisk with available wikis - https://phabricator.wikimedia.org/T400727
[07:12:01] !log kgraessle@deploy2002 kgraessle: Backport for [[gerrit:1247639|Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:18:15] !log kgraessle@deploy2002 kgraessle: Continuing with sync
[07:22:12] !log kgraessle@deploy2002 Finished scap sync-world: Backport for [[gerrit:1247639|Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727)]] (duration: 12m 24s)
[07:22:16] T400727: set AutoModeratorMultiLingualRevertRisk with available wikis - https://phabricator.wikimedia.org/T400727
[07:22:46] Msz2001: over to you
[07:23:50] (PS3) Mszwarc: Display list of 2FA-req. groups on AccountSecurity for 2FA-less users [extensions/OATHAuth] (wmf/1.46.0-wmf.19) - https://gerrit.wikimedia.org/r/1249921 (https://phabricator.wikimedia.org/T419422)
[07:24:19] (CR) TrainBranchBot: [C:+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/OATHAuth] (wmf/1.46.0-wmf.19) - https://gerrit.wikimedia.org/r/1249921 (https://phabricator.wikimedia.org/T419422) (owner: Mszwarc)
[07:24:19] (CR) TrainBranchBot: [C:+2] "Approved by mszwarc@deploy2002 using scap backport" [extensions/WikimediaMessages] (wmf/1.46.0-wmf.19) - https://gerrit.wikimedia.org/r/1250066 (https://phabricator.wikimedia.org/T419111) (owner: Mszwarc)
[07:24:36] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, March 11 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployca" [mediawiki-config] - https://gerrit.wikimedia.org/r/1250426 (owner: Mszwarc)
[07:25:29] (PS3) AKhatun: stream: mediawiki.page_edit_type_simple [deployment-charts] - https://gerrit.wikimedia.org/r/1249360 (https://phabricator.wikimedia.org/T351225)
[07:27:45] !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host ncredir4003.ulsfo.wmnet
[07:27:47] !log jmm@cumin2002 START - Cookbook sre.dns.netbox
[07:28:00] (Merged) jenkins-bot: Display list of 2FA-req. groups on AccountSecurity for 2FA-less users [extensions/OATHAuth] (wmf/1.46.0-wmf.19) - https://gerrit.wikimedia.org/r/1249921 (https://phabricator.wikimedia.org/T419422) (owner: Mszwarc)
[07:28:52] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696068 (ayounsi) Using this opportunity to test my WIP rack depool cookbook (only in "show" mode). More info in {T327300} That's the current status of what...
[07:33:17] (PS1) Muehlenhoff: profile::server_depool: Annotate maps [puppet] - https://gerrit.wikimedia.org/r/1250431
[07:34:11] !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
[07:34:17] !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
[07:34:17] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:34:17] !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache ncredir4003.ulsfo.wmnet on all recursors
[07:34:21] (PS2) Muehlenhoff: profile::server_depool: Annotate maps [puppet] - https://gerrit.wikimedia.org/r/1250431
[07:34:21] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir4003.ulsfo.wmnet on all recursors
[07:34:50] !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
[07:34:55] !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
[07:36:00] SRE, Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11696079 (MoritzMuehlenhoff)
[07:36:13] (Merged) jenkins-bot: Send2FAWarningNotifications: Support reading users from file [extensions/WikimediaMessages] (wmf/1.46.0-wmf.19) - https://gerrit.wikimedia.org/r/1250066 (https://phabricator.wikimedia.org/T419111) (owner: Mszwarc)
[07:36:46] !log mszwarc@deploy2002 Started scap sync-world: Backport for [[gerrit:1249921|Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422)]], [[gerrit:1250066|Send2FAWarningNotifications: Support reading users from file (T419111)]]
[07:36:51] T419422: Display a list of 2FA-requiring groups on Special:AccountSecurity if user has no 2FA configured - https://phabricator.wikimedia.org/T419422
[07:36:52] T419111: Send Echo notification to 2FA-less users who are required to have 2FA - https://phabricator.wikimedia.org/T419111
[07:37:56] jmm@cumin2002 makevm (PID 1407536) is awaiting input
[07:38:42] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ncredir4003.ulsfo.wmnet with OS bookworm
[07:40:08] (PS2) AKhatun: topic: mw-page-edit-type-enrich-next [puppet] - https://gerrit.wikimedia.org/r/1249957 (https://phabricator.wikimedia.org/T351225)
[07:42:08] !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
[07:43:02] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696088 (ops-monitoring-bot) Draining ganeti1033.eqiad.wmnet of running VMs
[07:43:37] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
[07:43:40] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696089 (MoritzMuehlenhoff)
[07:44:05] !log jmm@cumin2002 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
[07:44:38] !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1049.eqiad.wmnet
[07:46:41] (CR) Elukey: sre.hosts.provision: allow no-pxe settings for NIC on Supermicro (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/1249973 (https://phabricator.wikimedia.org/T400626) (owner: Elukey)
[07:47:39] (CR) Dpogorzelski: [C:+1] ml: add aiter support to vLLM 0.14 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650) (owner: Kevin Bazira)
[07:47:43] (CR) Elukey: [C:+1] profile::server_depool: Annotate maps [puppet] - https://gerrit.wikimedia.org/r/1250431 (owner: Muehlenhoff)
[07:49:31] (CR) Muehlenhoff: [C:+2] profile::server_depool: Annotate maps [puppet] - https://gerrit.wikimedia.org/r/1250431 (owner: Muehlenhoff)
[07:50:56] (PS1) Arnaudb: mailman: update helo data to use lists1004.w.o [puppet] - https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066)
[07:51:04] (CR) Elukey: ml: add aiter support to vLLM 0.14 image (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650) (owner: Kevin Bazira)
[07:51:45] (CR) Elukey: [C:+2] installserver: update preseed config for ml-serve101[4,5] [puppet] - https://gerrit.wikimedia.org/r/1249984 (https://phabricator.wikimedia.org/T400626) (owner: Elukey)
[07:51:52] (CR) Dpogorzelski: [C:+2] ml: add aiter support to vLLM 0.14 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650) (owner: Kevin Bazira)
[07:51:55] (CR) Dpogorzelski: [V:+2 C:+2] ml: add aiter support to vLLM 0.14 image [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650) (owner: Kevin Bazira)
[07:52:40] SRE, collaboration-services, Wikimedia-Mailing-lists, Patch-For-Review: Put lists.wikimedia.org web interface behind CDN - https://phabricator.wikimedia.org/T286066#11696121 (ABran-WMF) good catch @taavi, thanks! I've sent [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1250433 | a CR ]]...
[07:55:13] (CR) Kevin Bazira: ml: add aiter support to vLLM 0.14 image (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/1250416 (https://phabricator.wikimedia.org/T419650) (owner: Kevin Bazira)
[07:56:08] (CR) Elukey: sre.hosts.provision: add safeguard for typoes in serials (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/1249971 (owner: Ayounsi)
[07:56:08] !log mszwarc@deploy2002 mszwarc: Backport for [[gerrit:1249921|Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422)]], [[gerrit:1250066|Send2FAWarningNotifications: Support reading users from file (T419111)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:56:13] T419422: Display a list of 2FA-requiring groups on Special:AccountSecurity if user has no 2FA configured - https://phabricator.wikimedia.org/T419422
[07:56:13] T419111: Send Echo notification to 2FA-less users who are required to have 2FA - https://phabricator.wikimedia.org/T419111
[07:57:06] !log mszwarc@deploy2002 mszwarc: Continuing with sync
[07:58:13] For the record, there are "Cannot access the database: could not connect to any replica DB server" errors, but they don't seem related to this patch (and they have been appearing earlier as well)
[07:58:40] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
[08:03:56] (PS1) Muehlenhoff: Add netflow4003 [puppet] - https://gerrit.wikimedia.org/r/1250499 (https://phabricator.wikimedia.org/T418993)
[08:04:17] (PS2) Muehlenhoff: Add netflow4003 [puppet] - https://gerrit.wikimedia.org/r/1250499 (https://phabricator.wikimedia.org/T418993)
[08:04:22] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
[08:04:35] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1014.eqiad.wmnet with OS trixie
[08:05:01] !log installing mariadb bugfix updates from Bookworm point release (tools and libraries as packaged in Debian, unrelated to the wmf-mariadb packages)
[08:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:25] (CR) Jelto: "one comment in-line. Beside that this change looks reasonable." [puppet] - https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) (owner: Arnaudb)
[08:07:06] (CR) Mszwarc: [C:+2] "Ahead of deployment, to speed up things" [mediawiki-config] - https://gerrit.wikimedia.org/r/1250426 (owner: Mszwarc)
[08:07:58] (Merged) jenkins-bot: Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages [mediawiki-config] - https://gerrit.wikimedia.org/r/1250426 (owner: Mszwarc)
[08:09:53] !log mszwarc@deploy2002 Finished scap sync-world: Backport for [[gerrit:1249921|Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422)]], [[gerrit:1250066|Send2FAWarningNotifications: Support reading users from file (T419111)]] (duration: 33m 07s)
[08:09:58] T419422: Display a list of 2FA-requiring groups on Special:AccountSecurity if user has no 2FA configured - https://phabricator.wikimedia.org/T419422
[08:09:58] T419111: Send Echo notification to 2FA-less users who are required to have 2FA - https://phabricator.wikimedia.org/T419111
[08:10:17] (CR) Muehlenhoff: installserver: update preseed config for ml-serve101[4,5] (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1249984 (https://phabricator.wikimedia.org/T400626) (owner: Elukey)
[08:10:35] !log mszwarc@deploy2002 Started scap sync-world: Backport for [[gerrit:1250426|Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages]]
[08:11:09] (CR) Brouberol: [C:+2] Bump Airflow image to include missing jars [deployment-charts] - https://gerrit.wikimedia.org/r/1249936 (https://phabricator.wikimedia.org/T415874) (owner: Aqu)
[08:14:47] !log mszwarc@deploy2002 mszwarc: Backport for [[gerrit:1250426|Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:15:16] !log mszwarc@deploy2002 mszwarc: Continuing with sync [08:17:05] !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1014.eqiad.wmnet with reason: host reimage [08:19:16] 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11696204 (10MoritzMuehlenhoff) [08:20:08] (03PS2) 10Arnaudb: mailman: update helo data to use lists1004.w.o [puppet] - 10https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) [08:20:08] (03CR) 10Arnaudb: "good catch! this will be handled by https://gerrit.wikimedia.org/r/c/operations/dns/+/1249310 which I intended to merge today" [puppet] - 10https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) (owner: 10Arnaudb) [08:21:00] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4003.ulsfo.wmnet with OS bookworm [08:21:00] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir4003.ulsfo.wmnet [08:21:16] 06SRE, 06Infrastructure-Foundations, 10netops, 06ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696205 (10MatthewVernon) [08:21:21] !log mszwarc@deploy2002 Finished scap sync-world: Backport for [[gerrit:1250426|Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages]] (duration: 10m 46s) [08:21:38] (03PS1) 10Muehlenhoff: Update netflow collector for ulsfo [homer/public] - 10https://gerrit.wikimedia.org/r/1250505 (https://phabricator.wikimedia.org/T418993) [08:21:46] !log jmm@cumin2002 START - Cookbook sre.ganeti.makevm for new host ncredir4004.ulsfo.wmnet [08:21:47] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply [08:21:48] !log jmm@cumin2002 START - Cookbook sre.dns.netbox [08:21:56] !log UTC morning backport window finished [08:21:58] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:58] (03CR) 10Arnaudb: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) (owner: 10Arnaudb) [08:22:24] 06SRE, 06Infrastructure-Foundations, 10netops, 06ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696207 (10MatthewVernon) Can I check this is 15:00 UTC (particularly given daylight confusion...), please? Once it's done I'll check ms-be1091 [the frontends c... [08:22:29] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply [08:23:26] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1014.eqiad.wmnet with reason: host reimage [08:24:25] (03PS1) 10Muehlenhoff: Add ncredir4003/ncredir4004 [puppet] - 10https://gerrit.wikimedia.org/r/1250506 (https://phabricator.wikimedia.org/T418993) [08:24:58] 06SRE, 10SRE-Access-Requests, 06Data-Engineering, 06Data-Engineering-Radar: Requesting Kerberos access for ben.buchenau - https://phabricator.wikimedia.org/T390734#11696209 (10Ben.buchenau) Hi Andrea! Thanks for picking this up. A good name would be: **wmde_goal_monitoring**. Best, Ben [08:25:21] !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4004.ulsfo.wmnet - jmm@cumin2002" [08:26:28] 06SRE, 06Infrastructure-Foundations, 10netops, 06ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696225 (10MoritzMuehlenhoff) [08:28:26] jmm@cumin2002 makevm (PID 1420291) is awaiting input [08:29:19] (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1249995 (owner: 10Majavah) [08:30:10] 06SRE, 06Infrastructure-Foundations, 10netops, 06ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696229 (10ayounsi) >>! In T419647#11696205, @MatthewVernon wrote: > Can I check this is 15:00 UTC (particularly given daylight confusion...), please? Once it's... [08:30:52] (03CR) 10Muehlenhoff: "If you want to merge, please go ahead! Otherwise I'll do it myself when time permits." [puppet] - 10https://gerrit.wikimedia.org/r/1248385 (https://phabricator.wikimedia.org/T413740) (owner: 10Muehlenhoff) [08:31:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4004.ulsfo.wmnet - jmm@cumin2002" [08:31:48] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [08:31:48] !log jmm@cumin2002 START - Cookbook sre.dns.wipe-cache ncredir4004.ulsfo.wmnet on all recursors [08:31:52] !log jmm@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir4004.ulsfo.wmnet on all recursors [08:32:20] !log jmm@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4004.ulsfo.wmnet - jmm@cumin2002" [08:32:25] !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4004.ulsfo.wmnet - jmm@cumin2002" [08:33:44] (03CR) 10Jelto: [C:03+1] "looks better now, thank you. The PCC diff shows another `helo_data = lists.wikimedia.org`. 
I'm not sure if that needs to be updated as wel" [puppet] - 10https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) (owner: 10Arnaudb) [08:34:14] FIRING: CertAlmostExpired: Certificate for service lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#lsw1-e8-eqiad.mgmt.eqiad.wmnet:32767 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [08:35:26] jmm@cumin2002 makevm (PID 1420291) is awaiting input [08:35:36] !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ncredir4004.ulsfo.wmnet with OS bookworm [08:36:58] (03PS3) 10Ayounsi: sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - 10https://gerrit.wikimedia.org/r/1249971 [08:37:16] (03CR) 10Ayounsi: sre.hosts.provision: add safeguard for typoes in serials (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1249971 (owner: 10Ayounsi) [08:39:30] !log elukey@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" [08:40:23] (03CR) 10Tiziano Fogli: [C:03+2] alertmanager/o11y: add route to handle alerts with severity=task [puppet] - 10https://gerrit.wikimedia.org/r/1249349 (https://phabricator.wikimedia.org/T415317) (owner: 10Tiziano Fogli) [08:40:34] !log elukey@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" [08:40:34] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1014.eqiad.wmnet with OS trixie [08:41:12] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ml-serve1015.eqiad.wmnet with OS trixie [08:41:22] (03CR) 10Ayounsi: [C:03+1] "change lgtm but I don't have the authority to put those hosts to prod" [puppet] - 10https://gerrit.wikimedia.org/r/1250506 
(https://phabricator.wikimedia.org/T418993) (owner: Muehlenhoff) [08:43:13] (CR) Elukey: [C:+1] sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - https://gerrit.wikimedia.org/r/1249971 (owner: Ayounsi) [08:44:15] FIRING: CoreRouterInterfaceDown: Core router interface down - pfw1-codfw:reth2 (fasw1-f5 2x25G) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=pfw1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [08:45:03] ops-eqiad, DC-Ops: Power Supply - Status - issue on wdqs1025:9290 - https://phabricator.wikimedia.org/T419664 (phaultfinder) NEW [08:48:39] FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:52:32] SRE, Infrastructure-Foundations, netops, ServiceOps new: Eqiad: lsw1-d2-eqiad BGP maintenance - https://phabricator.wikimedia.org/T419647#11696297 (MatthewVernon) Ah, I just put `10:00 EST` into `date`. 
You're probably right, but a confirmation would be helpful :) [08:52:52] !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1015.eqiad.wmnet with reason: host reimage [08:54:02] !log installing imagemagick security updates [08:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:55:39] !log jynus@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: mysql upgrade / restart [08:56:31] (PS2) Kgraessle: Enable AutoModerator on Italian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/1192921 (https://phabricator.wikimedia.org/T405152) [08:56:43] (PS3) JavierMonton: stream: mw-page-html-content-change-enrich [deployment-charts] - https://gerrit.wikimedia.org/r/1249959 (https://phabricator.wikimedia.org/T419258) [08:57:34] (CR) Vgutierrez: Add ncredir4003/ncredir4004 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1250506 (https://phabricator.wikimedia.org/T418993) (owner: Muehlenhoff) [08:58:33] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage [08:58:55] !log trueg@deploy2002 helmfile [staging] START helmfile.d/services/SERVICE_NAME: apply [08:58:57] !log trueg@deploy2002 helmfile [staging] DONE helmfile.d/services/SERVICE_NAME: apply [08:59:21] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1015.eqiad.wmnet with reason: host reimage [09:00:10] (CR) TrainBranchBot: [C:+2] "Approved by javiermonton@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1249217 (https://phabricator.wikimedia.org/T419258) (owner: JavierMonton) [09:01:04] (Merged) jenkins-bot: stream: mediawiki.page_html_content_change [mediawiki-config] - https://gerrit.wikimedia.org/r/1249217 (https://phabricator.wikimedia.org/T419258) (owner: JavierMonton) [09:01:28] (CR) 
Muehlenhoff: Add ncredir4003/ncredir4004 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1250506 (https://phabricator.wikimedia.org/T418993) (owner: Muehlenhoff) [09:01:33] !log javiermonton@deploy2002 Started scap sync-world: Backport for [[gerrit:1249217|stream: mediawiki.page_html_content_change (T419258)]] [09:01:35] (CR) Tiziano Fogli: [C:+2] prometheus: add cardinality explosion alerts [alerts] - https://gerrit.wikimedia.org/r/1248866 (https://phabricator.wikimedia.org/T415317) (owner: Tiziano Fogli) [09:01:37] T419258: Adatp HTML pipeline to the new diffs schema - https://phabricator.wikimedia.org/T419258 [09:02:34] (PS2) Trueg: deployment_server: Add wdqs-queryhammer service [puppet] - https://gerrit.wikimedia.org/r/1249918 (https://phabricator.wikimedia.org/T417415) [09:03:03] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage [09:03:36] !log javiermonton@deploy2002 javiermonton: Backport for [[gerrit:1249217|stream: mediawiki.page_html_content_change (T419258)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[09:03:37] !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2095.codfw.wmnet with OS bullseye [09:03:51] ops-codfw, SRE, SRE-swift-storage, DC-Ops: FY2526 Q3:rack/setup/install ms-be209[56] - https://phabricator.wikimedia.org/T413088#11696322 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be2095.codfw.wmnet with OS bullseye [09:05:27] (PS4) Kgraessle: Enable revert risk filters for first batch of wikis: < 1000 monthly edits [mediawiki-config] - https://gerrit.wikimedia.org/r/1247065 (https://phabricator.wikimedia.org/T411485) [09:06:04] !log javiermonton@deploy2002 javiermonton: Continuing with sync [09:07:53] !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2096.codfw.wmnet with OS bullseye [09:08:04] ops-codfw, SRE, SRE-swift-storage, DC-Ops: FY2526 Q3:rack/setup/install ms-be209[56] - https://phabricator.wikimedia.org/T413088#11696333 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be2096.codfw.wmnet with OS bullseye [09:08:58] ops-codfw, SRE, SRE-swift-storage, DC-Ops: FY2526 Q3:rack/setup/install ms-be209[56] - https://phabricator.wikimedia.org/T413088#11696334 (MatthewVernon) Hi @Jhancock.wm I'm afraid this is the problem we've seen with Dell before (but that I hoped they were going to correct), where they send us sy... 
[09:10:01] !log javiermonton@deploy2002 Finished scap sync-world: Backport for [[gerrit:1249217|stream: mediawiki.page_html_content_change (T419258)]] (duration: 08m 28s) [09:10:05] T419258: Adatp HTML pipeline to the new diffs schema - https://phabricator.wikimedia.org/T419258 [09:11:16] (CR) JavierMonton: [C:+2] stream: mw-page-html-content-change-enrich [deployment-charts] - https://gerrit.wikimedia.org/r/1249959 (https://phabricator.wikimedia.org/T419258) (owner: JavierMonton) [09:12:01] (PS1) Muehlenhoff: ncredir: Switch to firewall::service [puppet] - https://gerrit.wikimedia.org/r/1250517 [09:12:01] (PS1) Gkyziridis: ml-services: Deploy new version of edit-check in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1250518 (https://phabricator.wikimedia.org/T419527) [09:13:12] (Merged) jenkins-bot: stream: mw-page-html-content-change-enrich [deployment-charts] - https://gerrit.wikimedia.org/r/1249959 (https://phabricator.wikimedia.org/T419258) (owner: JavierMonton) [09:13:40] (CR) Muehlenhoff: Add ncredir4003/ncredir4004 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1250506 (https://phabricator.wikimedia.org/T418993) (owner: Muehlenhoff) [09:14:30] !log elukey@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003" [09:15:13] !log javiermonton@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply [09:15:16] (PS1) Muehlenhoff: ncredir4003/4004: Change back to ferm [puppet] - https://gerrit.wikimedia.org/r/1250520 (https://phabricator.wikimedia.org/T418993) [09:15:22] !log javiermonton@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply [09:17:22] (CR) Gkyziridis: [C:+2] ml-services: Deploy new version of edit-check in staging [deployment-charts] - 
https://gerrit.wikimedia.org/r/1250518 (https://phabricator.wikimedia.org/T419527) (owner: Gkyziridis) [09:17:35] elukey@cumin1003 reimage (PID 2828195) is awaiting input [09:18:46] (PS4) Ayounsi: sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - https://gerrit.wikimedia.org/r/1249971 [09:19:29] (Merged) jenkins-bot: ml-services: Deploy new version of edit-check in staging [deployment-charts] - https://gerrit.wikimedia.org/r/1250518 (https://phabricator.wikimedia.org/T419527) (owner: Gkyziridis) [09:19:54] (PS5) Ayounsi: sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - https://gerrit.wikimedia.org/r/1249971 [09:19:55] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4004.ulsfo.wmnet with OS bookworm [09:19:55] !log jmm@cumin2002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir4004.ulsfo.wmnet [09:21:00] (Abandoned) Kgraessle: Enable rr-ml AutoModerator CC form on !large wikis Set AutoModeratorMultiLingualRevertRisk with available wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1203498 (https://phabricator.wikimedia.org/T400727) (owner: Kgraessle) [09:21:52] (CR) Vgutierrez: [C:+1] ncredir4003/4004: Change back to ferm [puppet] - https://gerrit.wikimedia.org/r/1250520 (https://phabricator.wikimedia.org/T418993) (owner: Muehlenhoff) [09:22:11] !log gkyziridis@deploy2002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' . 
[09:22:24] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1249522 (https://phabricator.wikimedia.org/T419058) (owner: Scott French) [09:22:59] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1250517 (owner: Muehlenhoff) [09:23:09] (PS2) Muehlenhoff: ncredir: Switch to firewall::service [puppet] - https://gerrit.wikimedia.org/r/1250517 [09:24:10] !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage [09:24:12] (CR) Arnaudb: "this is weird, we have 2 templates in that dir:" [puppet] - https://gerrit.wikimedia.org/r/1250433 (https://phabricator.wikimedia.org/T286066) (owner: Arnaudb) [09:25:12] (PS6) Ayounsi: sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - https://gerrit.wikimedia.org/r/1249971 [09:25:35] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply [09:26:24] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply [09:27:17] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply [09:27:54] !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage [09:28:04] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply [09:28:28] !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2096.codfw.wmnet with reason: host reimage [09:28:32] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply [09:29:10] (PS4) JMeybohm: Remove PSP related code from admin_ng [deployment-charts] - https://gerrit.wikimedia.org/r/1248823 (https://phabricator.wikimedia.org/T273507) [09:29:18] (CR) 
JMeybohm: [C:+2] kubernetes: Don't re-define default admission_plugins [puppet] - https://gerrit.wikimedia.org/r/1248812 (https://phabricator.wikimedia.org/T273507) (owner: JMeybohm) [09:29:43] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply [09:30:02] !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply [09:30:05] (CR) CI reject: [V:-1] sre.hosts.provision: add safeguard for typoes in serials [cookbooks] - https://gerrit.wikimedia.org/r/1249971 (owner: Ayounsi) [09:30:20] (CR) JMeybohm: [C:+2] Remove istio 1.15 wikikube config [deployment-charts] - https://gerrit.wikimedia.org/r/1248822 (https://phabricator.wikimedia.org/T341984) (owner: JMeybohm)