[00:40:13] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1223749
[00:40:13] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1223749 (owner: TrainBranchBot)
[00:52:02] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1223749 (owner: TrainBranchBot)
[01:10:05] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1223750
[01:10:06] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1223750 (owner: TrainBranchBot)
[01:32:36] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1223750 (owner: TrainBranchBot)
[06:16:54] PROBLEM - Exim SMTP on lists1004 is CRITICAL: connect to address 208.80.154.81 and port 25: Connection refused https://wikitech.wikimedia.org/wiki/Exim
[06:20:00] RECOVERY - Exim SMTP on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Exim
[06:49:39] (PS1) Kevin Bazira: ml-services: deploy embeddings isvc to llm ns prod [deployment-charts] - https://gerrit.wikimedia.org/r/1223961 (https://phabricator.wikimedia.org/T412338)
[07:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T0700)
[07:54:29] !log krinkle@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
[07:55:44] !log krinkle@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
[08:00:05] Amir1, Urbanecm, and awight: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T0800)
[08:00:05] No Gerrit patches in the queue for this window AFAICS.
[08:03:53] (PS2) Thcipriani: Yubikey-SSH-FIDO: remove old key for thcipriani [puppet] - https://gerrit.wikimedia.org/r/1220410 (https://phabricator.wikimedia.org/T413416)
[08:05:58] (CR) JMeybohm: [C:+1] etcd: Remove obsolete check [puppet] - https://gerrit.wikimedia.org/r/1223676 (owner: Muehlenhoff)
[08:12:54] (CR) Muehlenhoff: [C:+2] Yubikey-SSH-FIDO: remove old key for thcipriani [puppet] - https://gerrit.wikimedia.org/r/1220410 (https://phabricator.wikimedia.org/T413416) (owner: Thcipriani)
[08:14:38] (PS1) JMeybohm: P:conftool::requestctl_client: update requestctl_cli.original.py [puppet] - https://gerrit.wikimedia.org/r/1224015 (https://phabricator.wikimedia.org/T404591)
[08:18:16] !log installing libsodium security updates
[08:18:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:51] (PS1) Slyngshede: data.yaml: Extend llugo to 2027 [puppet] - https://gerrit.wikimedia.org/r/1224016
[08:20:07] (CR) Dzahn: [C:+2] acme_chief: add certs for wikipedia25.org [puppet] - https://gerrit.wikimedia.org/r/1223194 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[08:28:18] (CR) Elukey: [C:+2] sre.hosts.provision: make some Supermicro checks dynamic [cookbooks] - https://gerrit.wikimedia.org/r/1220311 (https://phabricator.wikimedia.org/T407991) (owner: Elukey)
[08:31:40] (CR) Dzahn: [C:+2] ci/php: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1219875 (owner: Muehlenhoff)
[08:33:14] (CR) Ozge: [C:+2] ml-services: deploy embeddings isvc to llm ns prod [deployment-charts] - https://gerrit.wikimedia.org/r/1223961 (https://phabricator.wikimedia.org/T412338) (owner: Kevin Bazira)
[08:33:38] (PS2) Dzahn: cache-text: add wikipedia25.org to alternate_domains [puppet] - https://gerrit.wikimedia.org/r/1223182 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[08:34:01] (CR) JavierMonton: [C:+2] Configure new stream mediawiki.wmde_page_summary [mediawiki-config] - https://gerrit.wikimedia.org/r/1223688 (https://phabricator.wikimedia.org/T413891) (owner: Awight)
[08:34:54] (Merged) jenkins-bot: Configure new stream mediawiki.wmde_page_summary [mediawiki-config] - https://gerrit.wikimedia.org/r/1223688 (https://phabricator.wikimedia.org/T413891) (owner: Awight)
[08:35:03] (Merged) jenkins-bot: ml-services: deploy embeddings isvc to llm ns prod [deployment-charts] - https://gerrit.wikimedia.org/r/1223961 (https://phabricator.wikimedia.org/T412338) (owner: Kevin Bazira)
[08:45:53] !log jdlrobson@deploy2002 Sync cancelled.
[08:49:54] !log awight@deploy2002 Started scap sync-world: Backport for [[gerrit:1223688|Configure new stream mediawiki.wmde_page_summary (T413891)]]
[08:49:57] T413891: Run new scraper and verify its mechanisms - https://phabricator.wikimedia.org/T413891
[08:50:07] (CR) Filippo Giunchedi: [C:+1] "Neato" [puppet] - https://gerrit.wikimedia.org/r/1219907 (https://phabricator.wikimedia.org/T413193) (owner: Ahmon Dancy)
[08:50:35] (CR) Slyngshede: [V:+1 C:+2] P:idm configuration for Phabricator linking [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775) (owner: Slyngshede)
[08:50:59] Hi, I'm squeaking a stream config change into the backport window now.
[08:51:26] (CR) Dzahn: [V:+1] "what it does: https://puppet-compiler.wmflabs.org/output/1223182/7852/cp7001.magru.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/1223182 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[08:52:14] !log awight@deploy2002 awight: Backport for [[gerrit:1223688|Configure new stream mediawiki.wmde_page_summary (T413891)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:53:13] !log awight@deploy2002 awight: Continuing with sync
[08:53:46] (CR) Dzahn: [V:+1 C:+2] cache-text: add wikipedia25.org to alternate_domains [puppet] - https://gerrit.wikimedia.org/r/1223182 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[08:56:30] (PS1) Muehlenhoff: Add library hint for sodium [puppet] - https://gerrit.wikimedia.org/r/1224018
[08:57:22] !log awight@deploy2002 Finished scap sync-world: Backport for [[gerrit:1223688|Configure new stream mediawiki.wmde_page_summary (T413891)]] (duration: 07m 28s)
[08:57:25] T413891: Run new scraper and verify its mechanisms - https://phabricator.wikimedia.org/T413891
[08:59:03] (CR) Dzahn: "can we have a +1 for this tone? this should be the only one left at this point that we need to test this" [dns] - https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[09:02:47] (CR) Muehlenhoff: [C:+2] Add library hint for sodium [puppet] - https://gerrit.wikimedia.org/r/1224018 (owner: Muehlenhoff)
[09:04:09] (CR) Muehlenhoff: [C:+1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/1224016 (owner: Slyngshede)
[09:04:50] (CR) Slyngshede: [C:+2] data.yaml: Extend llugo to 2027 [puppet] - https://gerrit.wikimedia.org/r/1224016 (owner: Slyngshede)
[09:08:57] (PS1) Slyngshede: P:idm Enable Phabricator social auth [puppet] - https://gerrit.wikimedia.org/r/1224019
[09:10:24] !log restarting bacula-sd daemon on backup1012 T413853
[09:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:27] T413853: Backup freshness: Stale bacula backup for gerrit1003 - https://phabricator.wikimedia.org/T413853
[09:11:52] (CR) Vgutierrez: unlink wikipedia25.org from ncredir, point to geoip text-addrs (1 comment) [dns] - https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[09:14:50] !log kevinbazira@deploy2002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
[09:15:25] (CR) Dzahn: unlink wikipedia25.org from ncredir, point to geoip text-addrs (1 comment) [dns] - https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[09:15:39] SRE, Infrastructure-Foundations: Integrate Bookworm 12.12 point update - https://phabricator.wikimedia.org/T403852#11498774 (MoritzMuehlenhoff)
[09:26:49] (CR) Dzahn: unlink wikipedia25.org from ncredir, point to geoip text-addrs (1 comment) [dns] - https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[09:28:06] (CR) Muehlenhoff: [C:+1] "Looks good, one typo inline" [puppet] - https://gerrit.wikimedia.org/r/1224019 (owner: Slyngshede)
[09:33:52] (PS2) Slyngshede: P:idm Enable Phabricator social auth [puppet] - https://gerrit.wikimedia.org/r/1224019
[09:34:07] (CR) Slyngshede: P:idm Enable Phabricator social auth (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1224019 (owner: Slyngshede)
[09:34:53] (CR) Slyngshede: [C:+2] P:idm Enable Phabricator social auth [puppet] - https://gerrit.wikimedia.org/r/1224019 (owner: Slyngshede)
[09:38:02] !log uploaded Bird 2.18-1~wmf12u1 to component/bird-routed-ganeti for bookworm-wikimedia T413740
[09:38:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:05] T413740: Backport and test Bird 2.18 - https://phabricator.wikimedia.org/T413740
[09:39:34] SRE, cloud-services-team, Cloud-VPS: ceph: test and decide 1 network interface setup - https://phabricator.wikimedia.org/T325531#11498871 (fgiunchedi) Open→Resolved a:fgiunchedi Will be done as part of {T399180}
[09:43:29] (CR) Harroyo-wmf: [C:+1] Write new for CheckUser user agent table migration everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/1223675 (https://phabricator.wikimedia.org/T361196) (owner: Dreamy Jazz)
[09:43:30] (CR) Harroyo-wmf: [C:+1] Write new for CheckUser user agent table migration on group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/1223674 (https://phabricator.wikimedia.org/T361196) (owner: Dreamy Jazz)
[09:43:31] (CR) Harroyo-wmf: [C:+1] Write new for CheckUser user agent table migration on group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/1223673 (https://phabricator.wikimedia.org/T361196) (owner: Dreamy Jazz)
[09:44:13] (CR) Vgutierrez: [C:+1] unlink wikipedia25.org from ncredir, point to geoip text-addrs [dns] - https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: Dzahn)
[09:46:38] (CR) Vgutierrez: [C:+1] ats: gerrit: use LetsEncrypt CA for gerrit [puppet] - https://gerrit.wikimedia.org/r/1215684 (https://phabricator.wikimedia.org/T411895) (owner: CDanis)
[09:52:44] (PS1) Jcrespo: backup: Expand available backup space for repos to 100 TB [puppet] - https://gerrit.wikimedia.org/r/1224023 (https://phabricator.wikimedia.org/T413853)
[09:54:46] (CR) Jcrespo: [C:+2] backup: Expand available backup space for repos to 100 TB [puppet] - https://gerrit.wikimedia.org/r/1224023 (https://phabricator.wikimedia.org/T413853) (owner: Jcrespo)
[09:58:32] !log javiermonton@deploy2002 helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
[09:58:43] !log javiermonton@deploy2002 helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
[09:59:30] (CR) Dzahn: [C:+2] ats: gerrit: use LetsEncrypt CA for gerrit [puppet] - https://gerrit.wikimedia.org/r/1215684 (https://phabricator.wikimedia.org/T411895) (owner: CDanis)
[09:59:56] (CR) Dzahn: [C:+2] "nothing is going to change unless DNS changes" [puppet] - https://gerrit.wikimedia.org/r/1215684 (https://phabricator.wikimedia.org/T411895) (owner: CDanis)
[10:01:32] (PS2) Dzahn: switch gerrit service IP to CDN [dns] - https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895)
[10:02:18] (CR) Dzahn: "in theory it could work now - but we want to first let people opt-in and test" [dns] - https://gerrit.wikimedia.org/r/1215709 (https://phabricator.wikimedia.org/T411895) (owner: Dzahn)
[10:16:09] sre-alert-triage, Data-Platform-SRE (2025.11.07 - 2025.11.28): Alert in need of triage: HDFS topology check (instance an-master1003) - https://phabricator.wikimedia.org/T413742#11499014 (Gehel) p:Triage→High
[10:25:16] (PS1) Volans: Remove support for Python 3.9 [software/cumin] - https://gerrit.wikimedia.org/r/1224029
[10:25:16] (PS1) Volans: Update deprecated type hints [software/cumin] - https://gerrit.wikimedia.org/r/1224030
[10:25:16] (PS1) Volans: transports: refactor State implementation [software/cumin] - https://gerrit.wikimedia.org/r/1224031
[10:25:16] (PS1) Volans: transports: add shortened method to Command class [software/cumin] - https://gerrit.wikimedia.org/r/1224032
[10:25:18] (PS1) Volans: transports: add new API for the execution results [software/cumin] - https://gerrit.wikimedia.org/r/1224033
[10:25:22] (PS1) Volans: tests: fix integration tests error handling [software/cumin] - https://gerrit.wikimedia.org/r/1224034
[10:25:27] (PS1) Volans: clustershell: implement the new API for results [software/cumin] - https://gerrit.wikimedia.org/r/1224035
[10:25:31] (PS1) Volans: tests: add CLI comparison tests [software/cumin] - https://gerrit.wikimedia.org/r/1224036
[10:25:35] (PS1) Volans: tests: fix shellcheck issues [software/cumin] - https://gerrit.wikimedia.org/r/1224037
[10:25:39] (PS1) Volans: clustershell: add command index to output headers [software/cumin] - https://gerrit.wikimedia.org/r/1224038
[10:25:43] (PS1) Volans: doc: update for the new API [software/cumin] - https://gerrit.wikimedia.org/r/1224039
[10:26:01] SRE, Infrastructure-Foundations, netops: Inaccurate stats reported by cr2-codfw - https://phabricator.wikimedia.org/T400205#11499053 (cmooney) Open→Resolved a:cmooney I should have updated here, Juniper advise this is fixed in 23.4R2-S3 and beyond, which was released in November 2025...
[10:27:32] (PS2) Volans: doc: update for the new API [software/cumin] - https://gerrit.wikimedia.org/r/1224039
[10:27:52] (CR) Cathal Mooney: [C:+1] "LGTM!" [homer/public] - https://gerrit.wikimedia.org/r/1223671 (https://phabricator.wikimedia.org/T413826) (owner: Ayounsi)
[10:28:17] SRE-OnFire, Cloud-VPS, cloud-services-team (FY2025/2026-Q3-Q4), Patch-For-Review, Sustainability (Incident Followup): Add external meta-monitoring for metricsinfra - https://phabricator.wikimedia.org/T288053#11499069 (fgiunchedi)
[10:34:00] (CR) CI reject: [V:-1] transports: add new API for the execution results [software/cumin] - https://gerrit.wikimedia.org/r/1224033 (owner: Volans)
[10:34:38] (CR) CI reject: [V:-1] tests: fix integration tests error handling [software/cumin] - https://gerrit.wikimedia.org/r/1224034 (owner: Volans)
[10:34:56] (CR) CI reject: [V:-1] transports: refactor State implementation [software/cumin] - https://gerrit.wikimedia.org/r/1224031 (owner: Volans)
[10:35:01] (CR) CI reject: [V:-1] Remove support for Python 3.9 [software/cumin] - https://gerrit.wikimedia.org/r/1224029 (owner: Volans)
[10:35:04] (CR) CI reject: [V:-1] Update deprecated type hints [software/cumin] - https://gerrit.wikimedia.org/r/1224030 (owner: Volans)
[10:35:24] (CR) CI reject: [V:-1] clustershell: implement the new API for results [software/cumin] - https://gerrit.wikimedia.org/r/1224035 (owner: Volans)
[10:35:37] (CR) CI reject: [V:-1] transports: add shortened method to Command class [software/cumin] - https://gerrit.wikimedia.org/r/1224032 (owner: Volans)
[10:35:43] (CR) CI reject: [V:-1] clustershell: add command index to output headers [software/cumin] - https://gerrit.wikimedia.org/r/1224038 (owner: Volans)
[10:35:46] (CR) CI reject: [V:-1] tests: fix shellcheck issues [software/cumin] - https://gerrit.wikimedia.org/r/1224037 (owner: Volans)
[10:35:46] (CR) CI reject: [V:-1] doc: update for the new API [software/cumin] - https://gerrit.wikimedia.org/r/1224039 (owner: Volans)
[10:35:51] (CR) CI reject: [V:-1] tests: add CLI comparison tests [software/cumin] - https://gerrit.wikimedia.org/r/1224036 (owner: Volans)
[10:36:12] (CR) CI reject: [V:-1] doc: update for the new API [software/cumin] - https://gerrit.wikimedia.org/r/1224039 (owner: Volans)
[10:37:39] (PS2) Volans: Remove support for Python 3.9 [software/cumin] - https://gerrit.wikimedia.org/r/1224029
[10:42:47] (PS1) Tchanders: WIP Check if adding - prevents "no change" CI failure [mediawiki-config] - https://gerrit.wikimedia.org/r/1224045
[10:42:56] (PS1) Muehlenhoff: Mark toluayo as converted to wmfreq [puppet] - https://gerrit.wikimedia.org/r/1224046
[10:45:14] (PS1) Blake: service: add excluded_services helper function [software/spicerack] - https://gerrit.wikimedia.org/r/1224041 (https://phabricator.wikimedia.org/T412211)
[11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1100)
[11:01:03] SRE, Horizon, serviceops, Striker, cloud-services-team (FY2025/2026-Q3-Q4): Move cloudweb to Ganeti VMs and repurpose the servers as wikikube nodes - https://phabricator.wikimedia.org/T392478#11499214 (fgiunchedi)
[11:05:21] !log javiermonton@deploy2002 helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
[11:05:57] !log javiermonton@deploy2002 helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
[11:06:20] SRE, cloud-services-team, Infrastructure-Foundations: ACPI kernel failure on debian installer last step - https://phabricator.wikimedia.org/T357896#11499230 (fgiunchedi) Open→Invalid I'm not aware of recent troubles with this problem
[11:06:29] SRE, Horizon, serviceops, Striker, cloud-services-team (FY2025/2026-Q3-Q4): Move cloudweb to Ganeti VMs and repurpose the servers as wikikube nodes - https://phabricator.wikimedia.org/T392478#11499233 (taavi) See also: {T411783}
[11:07:40] !log javiermonton@deploy2002 helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
[11:08:19] !log javiermonton@deploy2002 helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
[11:08:31] (PS3) Volans: Remove support for Python 3.9 [software/cumin] - https://gerrit.wikimedia.org/r/1224029
[11:08:31] (PS2) Volans: Update deprecated type hints [software/cumin] - https://gerrit.wikimedia.org/r/1224030
[11:08:32] (PS2) Volans: transports: refactor State implementation [software/cumin] - https://gerrit.wikimedia.org/r/1224031
[11:08:32] (PS2) Volans: transports: add shortened method to Command class [software/cumin] - https://gerrit.wikimedia.org/r/1224032
[11:08:33] (PS2) Volans: transports: add new API for the execution results [software/cumin] - https://gerrit.wikimedia.org/r/1224033
[11:08:35] (PS2) Volans: tests: fix integration tests error handling [software/cumin] - https://gerrit.wikimedia.org/r/1224034
[11:08:39] (PS2) Volans: clustershell: implement the new API for results [software/cumin] - https://gerrit.wikimedia.org/r/1224035
[11:08:43] (PS2) Volans: tests: add CLI comparison tests [software/cumin] - https://gerrit.wikimedia.org/r/1224036
[11:08:47] (PS2) Volans: tests: fix shellcheck issues [software/cumin] - https://gerrit.wikimedia.org/r/1224037
[11:08:51] (PS2) Volans: clustershell: add command index to output headers [software/cumin] - https://gerrit.wikimedia.org/r/1224038
[11:08:55] (PS3) Volans: doc: update for the new API [software/cumin] - https://gerrit.wikimedia.org/r/1224039
[11:11:26] (CR) Muehlenhoff: "Just a note that the cloudcumin hosts are still on Bullseye at this point." [software/cumin] - https://gerrit.wikimedia.org/r/1224029 (owner: Volans)
[11:15:13] (PS3) Clément Goubert: rest-gateway: Move REST API Sandbox to mw-rest-php [deployment-charts] - https://gerrit.wikimedia.org/r/1223659 (https://phabricator.wikimedia.org/T396807)
[11:16:19] (CR) Clément Goubert: rest-gateway: Move REST API Sandbox to mw-rest-php (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1223659 (https://phabricator.wikimedia.org/T396807) (owner: Clément Goubert)
[11:21:12] SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, Traffic: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11499254 (MatthewVernon)
[11:22:56] (CR) Volans: "An example usage of the final result can be seen in the last patch of the series where the documentation gets updated:" [software/cumin] - https://gerrit.wikimedia.org/r/1224033 (owner: Volans)
[11:23:07] SRE-swift-storage, Data-Persistence, MediaViewer, Thumbor, Traffic: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971#11499255 (MatthewVernon)
[11:23:32] (CR) Muehlenhoff: [C:+2] Mark toluayo as converted to wmfreq [puppet] - https://gerrit.wikimedia.org/r/1224046 (owner: Muehlenhoff)
[11:24:50] (CR) Volans: "An example usage of the final result can be seen in the last patch of the series where the documentation gets updated:" [software/cumin] - https://gerrit.wikimedia.org/r/1224035 (owner: Volans)
[11:27:52] (PS1) MVernon: Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) [mediawiki-config] - https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062)
[11:30:43] ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1168 - https://phabricator.wikimedia.org/T413704#11499276 (BTullis)
[11:34:36] !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-worker1168.eqiad.wmnet
[11:35:00] ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1168 - https://phabricator.wikimedia.org/T413704#11499281 (ops-monitoring-bot) Host an-worker1168.eqiad.wmnet rebooted by btullis@cumin1003 with reason: Rebooting to allow unmounting failed disk
[11:40:48] ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1168 - https://phabricator.wikimedia.org/T413704#11499293 (BTullis) I cannot unmount the failed drive, so I have commented out the following line from `/etc/fstab` ` #LABEL=hadoop-e /var/lib/hadoop/data/e ext4 defaults,noatime 0 2 ` I...
[11:41:28] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): Degraded RAID on an-worker1198 - https://phabricator.wikimedia.org/T413336#11499294 (BTullis)
[11:44:55] !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1168.eqiad.wmnet
[11:45:05] !log btullis@cumin1003 START - Cookbook sre.hosts.reboot-single for host an-worker1198.eqiad.wmnet
[11:45:32] ops-eqiad, SRE, DC-Ops: Degraded RAID on an-worker1168 - https://phabricator.wikimedia.org/T413704#11499298 (ops-monitoring-bot) Host an-worker1198.eqiad.wmnet rebooted by btullis@cumin1003 with reason: Rebooting to allow unmounting failed disk
[11:46:41] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): Degraded RAID on an-worker1198 - https://phabricator.wikimedia.org/T413336#11499304 (BTullis) I had the same issue on this host as T413704, which is that I couldn't unmount the failed volume. I have commented out this line from...
[11:47:42] !log restart dovecot on mx-out to pick up sodium security updates
[11:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:53] ops-eqiad, SRE, DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11499325 (BTullis) Thanks @Jclark-ctr - You can replace this whenever is convenient.
[11:56:25] (CR) Hnowlan: [C:+1] rest-gateway: Switch redis backend to 6380 instance [deployment-charts] - https://gerrit.wikimedia.org/r/1223651 (https://phabricator.wikimedia.org/T413876) (owner: Clément Goubert)
[11:56:44] (PS1) Muehlenhoff: Add Cumin aliases for new codfw1dev roles [puppet] - https://gerrit.wikimedia.org/r/1224056
[11:57:09] (CR) Hnowlan: [C:+1] rest-gateway: Move REST API Sandbox to mw-rest-php [deployment-charts] - https://gerrit.wikimedia.org/r/1223659 (https://phabricator.wikimedia.org/T396807) (owner: Clément Goubert)
[11:59:28] (PS1) Muehlenhoff: Add Cumin alias for tcpproxy hosts [puppet] - https://gerrit.wikimedia.org/r/1224057 (https://phabricator.wikimedia.org/T408532)
[11:59:38] (PS2) Muehlenhoff: Add Cumin alias for tcpproxy hosts [puppet] - https://gerrit.wikimedia.org/r/1224057 (https://phabricator.wikimedia.org/T408532)
[12:00:05] mvolz: That opportune time for a Services – Citoid / Zotero deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1200).
[12:01:58] (PS1) Fabfur: cache::text: enable rate limit in cache::text [puppet] - https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545)
[12:02:00] (CR) Mvolz: [C:+2] Update zotero [deployment-charts] - https://gerrit.wikimedia.org/r/1223642 (owner: Mvolz)
[12:02:34] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[12:03:54] (Merged) jenkins-bot: Update zotero [deployment-charts] - https://gerrit.wikimedia.org/r/1223642 (owner: Mvolz)
[12:04:22] (Merged) jenkins-bot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/1223214 (owner: PipelineBot)
[12:05:32] PROBLEM - Host an-worker1198 is DOWN: PING CRITICAL - Packet loss = 100%
[12:08:16] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/citoid: apply
[12:08:41] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/citoid: apply
[12:09:47] !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/citoid: apply
[12:10:12] !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/citoid: apply
[12:11:03] !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/citoid: apply
[12:11:48] !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/citoid: apply
[12:17:31] !log mvolz@deploy2002 helmfile [staging] START helmfile.d/services/zotero: apply
[12:18:04] !log mvolz@deploy2002 helmfile [staging] DONE helmfile.d/services/zotero: apply
[12:23:18] (CR) Majavah: [C:+1] Add Cumin aliases for new codfw1dev roles [puppet] - https://gerrit.wikimedia.org/r/1224056 (owner: Muehlenhoff)
[12:25:32] !log mvolz@deploy2002 helmfile [codfw] START helmfile.d/services/zotero: apply
[12:26:00] !log mvolz@deploy2002 helmfile [codfw] DONE helmfile.d/services/zotero: apply
[12:26:09] !log mvolz@deploy2002 helmfile [eqiad] START helmfile.d/services/zotero: apply
[12:26:40] !log mvolz@deploy2002 helmfile [eqiad] DONE helmfile.d/services/zotero: apply
[12:28:56] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1219129 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[12:29:10] (PS1) Peter Fischer: Changed link to CirrusSearch dumps [puppet] - https://gerrit.wikimedia.org/r/1224063 (https://phabricator.wikimedia.org/T413970)
[12:36:09] !incidents
[12:36:10] 7280 (ACKED) ProbeDown sre (10.2.1.16 ip4 zotero:4969 probes/service http_zotero_ip4 codfw)
[12:46:50] Mvolz: everything good with zotero?
[12:52:59] (CR) Ayounsi: [C:+2] Add astein RO user to network devices [homer/public] - https://gerrit.wikimedia.org/r/1223671 (https://phabricator.wikimedia.org/T413826) (owner: Ayounsi)
[12:53:05] (CR) Gehel: [C:+2] Changed link to CirrusSearch dumps [puppet] - https://gerrit.wikimedia.org/r/1224063 (https://phabricator.wikimedia.org/T413970) (owner: Peter Fischer)
[12:55:44] Mvolz: we had a few pages for zotero.svc.codfw.wmnet., they recovered themselves though
[12:56:00] anything to be concerned about? I see you logged some new deploys for it a while back
[12:56:36] (Merged) jenkins-bot: Add astein RO user to network devices [homer/public] - https://gerrit.wikimedia.org/r/1223671 (https://phabricator.wikimedia.org/T413826) (owner: Ayounsi)
[12:57:10] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2148.codfw.wmnet, wikikube-worker2311.codfw.wmnet, wikikube-worker2172.codfw.wmnet, wikikube-worker2036.codfw.wmnet, wikikube-worker2252.codfw.wmnet, wikikube-worker2155.codfw.wmnet, wikikube-worker2113.codfw.wmnet, wikikube-worker2141.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2076.codfw.wmnet, w
[12:57:10] worker2274.codfw.wmnet, wikikube-worker2165.codfw.wmnet, wikikube-worker2215.codfw.wmnet, wikikube-worker2177.codfw.wmnet, wikikube-worker2248.codfw.wmnet, wikikube-worker2157.codfw.wmnet, wikikube-worker2213.codfw.wmnet, wikikube-worker2139.codfw.wmnet, wikikube-worker2315.codfw.wmnet, wikikube-worker2314.codfw.wmnet, wikikube-worker2065.codfw.wmnet, wikikube-worker2060.codfw.wmnet, wikikube-worker2125.codfw.wmnet, wikikube-worker2196.co
[12:57:10] t, wikikube-worker2159.codfw.wmnet, wikikube-worker2255.codfw.wmnet, wikikube-worker2124.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2313.codfw.wmnet, wikikube-worker20 https://wikitech.wikimedia.org/wiki/PyBal
[12:57:10] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2170.codfw.wmnet, wikikube-worker2177.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2248.codfw.wmnet, wikikube-worker2036.codfw.wmnet, wikikube-worker2113.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2076.codfw.wmnet, wikikube-worker2274.codfw.wmnet, wikikube-worker2279.codfw.wmnet, w
[12:57:10] worker2215.codfw.wmnet, wikikube-worker2161.codfw.wmnet, wikikube-worker2213.codfw.wmnet, wikikube-worker2289.codfw.wmnet, wikikube-worker2065.codfw.wmnet, wikikube-worker2281.codfw.wmnet, wikikube-worker2293.codfw.wmnet, wikikube-worker2090.codfw.wmnet, wikikube-worker2206.codfw.wmnet, wikikube-worker2055.codfw.wmnet, wikikube-worker2251.codfw.wmnet, wikikube-worker2062.codfw.wmnet, wikikube-worker2143.codfw.wmnet, wikikube-worker2282.co
[12:57:10] t, wikikube-worker2149.codfw.wmnet, wikikube-worker2059.codfw.wmnet, wikikube-worker2254.codfw.wmnet, wikikube-worker2300.codfw.wmnet, wikikube-worker2285.codfw.wmnet, wikikube-worker22 https://wikitech.wikimedia.org/wiki/PyBal
[12:58:10] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[12:58:51] (PS3) Muehlenhoff: Remove puppetmaster::backend role and related Hiera settings [puppet] - https://gerrit.wikimedia.org/r/1219129 (https://phabricator.wikimedia.org/T365798)
[12:59:10] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:00:54] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1219129 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[13:03:27] (CR) Ladsgroup: "let's update the thumb render map in a separate patch." [mediawiki-config] - https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062) (owner: MVernon)
[13:05:07] !log btullis@cumin1003 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1198.eqiad.wmnet
[13:08:33] (CR) Muehlenhoff: [C:+2] Add Cumin aliases for new codfw1dev roles [puppet] - https://gerrit.wikimedia.org/r/1224056 (owner: Muehlenhoff)
[13:09:10] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2141.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2155.codfw.wmnet, wikikube-worker2136.codfw.wmnet, wikikube-worker2185.codfw.wmnet, wikikube-worker2161.codfw.wmnet, wikikube-worker2165.codfw.wmnet, wikikube-worker2044.codfw.wmnet, wikikube-worker2215.codfw.wmnet, wikikube-worker2248.codfw.wmnet, w
[13:09:10] worker2157.codfw.wmnet, wikikube-worker2139.codfw.wmnet, wikikube-worker2315.codfw.wmnet, wikikube-worker2289.codfw.wmnet, wikikube-worker2041.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2313.codfw.wmnet, wikikube-worker2322.codfw.wmnet, wikikube-worker2272.codfw.wmnet, wikikube-worker2143.codfw.wmnet, wikikube-worker2050.codfw.wmnet, wikikube-worker2282.codfw.wmnet, wikikube-worker2151.codfw.wmnet, wikikube-worker2014.co
[13:09:10] t, wikikube-worker2300.codfw.wmnet, wikikube-worker2156.codfw.wmnet, wikikube-worker2285.codfw.wmnet, wikikube-worker2268.codfw.wmnet, wikikube-worker2075.codfw.wmnet, wikikube-worker22 https://wikitech.wikimedia.org/wiki/PyBal
[13:09:10] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2141.codfw.wmnet, wikikube-worker2033.codfw.wmnet, wikikube-worker2017.codfw.wmnet, wikikube-worker2280.codfw.wmnet, wikikube-worker2155.codfw.wmnet, wikikube-worker2150.codfw.wmnet, wikikube-worker2136.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2320.codfw.wmnet, wikikube-worker2071.codfw.wmnet, w
[13:09:10] worker2165.codfw.wmnet, wikikube-worker2092.codfw.wmnet, wikikube-worker2161.codfw.wmnet, wikikube-worker2198.codfw.wmnet, wikikube-worker2139.codfw.wmnet, wikikube-worker2297.codfw.wmnet, wikikube-worker2289.codfw.wmnet, wikikube-worker2281.codfw.wmnet, wikikube-worker2293.codfw.wmnet, wikikube-worker2255.codfw.wmnet, wikikube-worker2124.codfw.wmnet, wikikube-worker2090.codfw.wmnet, wikikube-worker2206.codfw.wmnet, wikikube-worker2055.co
[13:09:10] t, wikikube-worker2134.codfw.wmnet, wikikube-worker2050.codfw.wmnet, wikikube-worker2329.codfw.wmnet, wikikube-worker2149.codfw.wmnet, wikikube-worker2309.codfw.wmnet, wikikube-worker22 https://wikitech.wikimedia.org/wiki/PyBal
[13:13:10] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:14:10] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:15:34] !log mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 11 echo-subscriptions-email-mention (T406724)
[13:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:37] T406724: Clean up watchlist and user properties of users if they don't log in for certain time - https://phabricator.wikimedia.org/T406724
[13:16:24] RECOVERY - Host an-worker1198 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[13:16:27] SRE, cloud-services-team, Wikimedia-Mailing-lists: aborrero@wikimedia.org still subscribed to ops@lists.wikimedia.org - https://phabricator.wikimedia.org/T413883#11499539 (Andrew) Open→Invalid The alerts stopped after a few hours, so likely he was auto-unsubscribed.
[13:18:01] (03CR) 10Dzahn: [C:03+2] unlink wikipedia25.org from ncredir, point to geoip text-addrs (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/1216843 (https://phabricator.wikimedia.org/T408592) (owner: 10Dzahn) [13:19:07] !log dzahn@dns1004 START - running authdns-update [13:20:10] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2170.codfw.wmnet, wikikube-worker2262.codfw.wmnet, wikikube-worker2141.codfw.wmnet, wikikube-worker2174.codfw.wmnet, wikikube-worker2311.codfw.wmnet, wikikube-worker2202.codfw.wmnet, wikikube-worker2036.codfw.wmnet, wikikube-worker2150.codfw.wmnet, wikikube-worker2161.codfw.wmnet, wikikube-worker2076.codfw.wmnet, w [13:20:10] worker2261.codfw.wmnet, wikikube-worker2071.codfw.wmnet, wikikube-worker2165.codfw.wmnet, wikikube-worker2190.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2278.codfw.wmnet, wikikube-worker2092.codfw.wmnet, wikikube-worker2198.codfw.wmnet, wikikube-worker2273.codfw.wmnet, wikikube-worker2213.codfw.wmnet, wikikube-worker2297.codfw.wmnet, wikikube-worker2314.codfw.wmnet, wikikube-worker2065.codfw.wmnet, wikikube-worker2060.co [13:20:10] t, wikikube-worker2293.codfw.wmnet, wikikube-worker2255.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2090.codfw.wmnet, wikikube-worker2111.codfw.wmnet, wikikube-worker23 https://wikitech.wikimedia.org/wiki/PyBal [13:20:12] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2262.codfw.wmnet, wikikube-worker2278.codfw.wmnet, wikikube-worker2202.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2191.codfw.wmnet, wikikube-worker2172.codfw.wmnet, wikikube-worker2307.codfw.wmnet, wikikube-worker2036.codfw.wmnet, wikikube-worker2280.codfw.wmnet, wikikube-worker2328.codfw.wmnet, w [13:20:12] worker2155.codfw.wmnet, wikikube-worker2113.codfw.wmnet, wikikube-worker2136.codfw.wmnet, 
wikikube-worker2091.codfw.wmnet, wikikube-worker2171.codfw.wmnet, wikikube-worker2076.codfw.wmnet, wikikube-worker2320.codfw.wmnet, wikikube-worker2176.codfw.wmnet, wikikube-worker2215.codfw.wmnet, wikikube-worker2177.codfw.wmnet, wikikube-worker2161.codfw.wmnet, wikikube-worker2273.codfw.wmnet, wikikube-worker2213.codfw.wmnet, wikikube-worker2297.co [13:20:12] t, wikikube-worker2125.codfw.wmnet, wikikube-worker2301.codfw.wmnet, wikikube-worker2293.codfw.wmnet, wikikube-worker2255.codfw.wmnet, wikikube-worker2124.codfw.wmnet, wikikube-worker20 https://wikitech.wikimedia.org/wiki/PyBal [13:20:18] (03PS1) 10Filippo Giunchedi: installserver: use uefi for new cloud hw [puppet] - 10https://gerrit.wikimedia.org/r/1224070 (https://phabricator.wikimedia.org/T412568) [13:20:38] !log dzahn@dns1004 END - running authdns-update [13:21:53] (03CR) 10Muehlenhoff: [C:03+2] Remove puppetmaster::backend role and related Hiera settings [puppet] - 10https://gerrit.wikimedia.org/r/1219129 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [13:21:59] (03PS2) 10MVernon: Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062) [13:21:59] (03PS1) 10MVernon: Stop pregenerating thumbnails [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224071 (https://phabricator.wikimedia.org/T408062) [13:22:10] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:22:10] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:23:21] (03CR) 10Andrew Bogott: [C:03+1] installserver: use uefi for new cloud hw [puppet] - 10https://gerrit.wikimedia.org/r/1224070 (https://phabricator.wikimedia.org/T412568) (owner: 10Filippo Giunchedi) [13:23:43] (03CR) 10MVernon: "OK, done :)" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062) (owner: 10MVernon) [13:24:38] (03CR) 10Filippo Giunchedi: [C:03+2] installserver: use uefi for new cloud hw [puppet] - 10https://gerrit.wikimedia.org/r/1224070 (https://phabricator.wikimedia.org/T412568) (owner: 10Filippo Giunchedi) [13:26:22] jouncebot: nowandnext [13:26:22] No deployments scheduled for the next 0 hour(s) and 33 minute(s) [13:26:22] In 0 hour(s) and 33 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1400) [13:27:07] 10ops-codfw, 06SRE, 06DC-Ops: FY2526 Q3:rack/setup/install cloudgw2004-dev - https://phabricator.wikimedia.org/T413831#11499579 (10Andrew) [13:27:10] (03CR) 10TrainBranchBot: [C:03+2] "Approved by ladsgroup@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062) (owner: 10MVernon) [13:29:47] (03Merged) 10jenkins-bot: Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224051 (https://phabricator.wikimedia.org/T408062) (owner: 10MVernon) [13:30:10] (03PS1) 10JHathaway: Revert "Update zotero" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224073 [13:30:18] !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1224051|Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) (T408062 T412971)]] [13:30:22] T408062: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062 [13:30:23] T412971: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971 [13:30:50] (03CR) 10Cathal Mooney: [C:03+1] "worth doing to see if it restores availability on the codfw cluster" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224073 (owner: 10JHathaway) [13:31:10] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - 
CRITICAL - zotero_4969: Servers wikikube-worker2170.codfw.wmnet, wikikube-worker2306.codfw.wmnet, wikikube-worker2033.codfw.wmnet, wikikube-worker2174.codfw.wmnet, wikikube-worker2202.codfw.wmnet, wikikube-worker2172.codfw.wmnet, wikikube-worker2150.codfw.wmnet, wikikube-worker2113.codfw.wmnet, wikikube-worker2136.codfw.wmnet, wikikube-worker2185.codfw.wmnet, w [13:31:10] worker2158.codfw.wmnet, wikikube-worker2076.codfw.wmnet, wikikube-worker2274.codfw.wmnet, wikikube-worker2044.codfw.wmnet, wikikube-worker2190.codfw.wmnet, wikikube-worker2215.codfw.wmnet, wikikube-worker2311.codfw.wmnet, wikikube-worker2157.codfw.wmnet, wikikube-worker2198.codfw.wmnet, wikikube-worker2322.codfw.wmnet, wikikube-worker2314.codfw.wmnet, wikikube-worker2065.codfw.wmnet, wikikube-worker2293.codfw.wmnet, wikikube-worker2255.co [13:31:10] t, wikikube-worker2124.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2055.codfw.wmnet, wikikube-worker2263.codfw.wmnet, wikikube-worker2062.codfw.wmnet, wikikube-worker21 https://wikitech.wikimedia.org/wiki/PyBal [13:31:10] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_4969: Servers wikikube-worker2170.codfw.wmnet, wikikube-worker2276.codfw.wmnet, wikikube-worker2262.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2172.codfw.wmnet, wikikube-worker2307.codfw.wmnet, wikikube-worker2155.codfw.wmnet, wikikube-worker2185.codfw.wmnet, wikikube-worker2158.codfw.wmnet, wikikube-worker2171.codfw.wmnet, w [13:31:10] worker2320.codfw.wmnet, wikikube-worker2108.codfw.wmnet, wikikube-worker2274.codfw.wmnet, wikikube-worker2071.codfw.wmnet, wikikube-worker2165.codfw.wmnet, wikikube-worker2044.codfw.wmnet, wikikube-worker2190.codfw.wmnet, wikikube-worker2177.codfw.wmnet, wikikube-worker2273.codfw.wmnet, wikikube-worker2213.codfw.wmnet, wikikube-worker2297.codfw.wmnet, wikikube-worker2315.codfw.wmnet, wikikube-worker2322.codfw.wmnet, wikikube-worker2314.co [13:31:10] t, 
wikikube-worker2125.codfw.wmnet, wikikube-worker2196.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2090.codfw.wmnet, wikikube-worker2055.codfw.wmnet, wikikube-worker21 https://wikitech.wikimedia.org/wiki/PyBal [13:31:30] (03PS1) 10Physikerwelt: Switch math rendering for group0 from native to mathjax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1224074 (https://phabricator.wikimedia.org/T413973) [13:31:50] (03CR) 10Cathal Mooney: [C:03+2] Revert "Update zotero" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224073 (owner: 10JHathaway) [13:32:10] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:32:10] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:32:30] !log ladsgroup@deploy2002 mvernon, ladsgroup: Backport for [[gerrit:1224051|Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) (T408062 T412971)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[13:33:07] !log ladsgroup@deploy2002 mvernon, ladsgroup: Continuing with sync [13:35:40] (03PS1) 10Slyngshede: Notification for users to link their Phabricator account [software/bitu] - 10https://gerrit.wikimedia.org/r/1224076 [13:36:12] !log cmooney@deploy2002 helmfile [staging] START helmfile.d/services/zotero: apply [13:36:28] !log cmooney@deploy2002 helmfile [staging] DONE helmfile.d/services/zotero: apply [13:37:20] !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224051|Update config to reflect new standard set of thumbnail sizes (WE 5.4.7) (T408062 T412971)]] (duration: 07m 02s) [13:37:24] T408062: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062 [13:37:24] T412971: Propose a new set of standard thumbnail sizes - https://phabricator.wikimedia.org/T412971 [13:40:12] !log cmooney@deploy2002 helmfile [codfw] START helmfile.d/services/zotero: apply [13:40:41] !log cmooney@deploy2002 helmfile [codfw] DONE helmfile.d/services/zotero: apply [13:42:48] (03CR) 10Elukey: "Added some follow up points, lemme know!" 
[puppet] - 10https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: 10Dpogorzelski) [13:48:53] !log cmooney@deploy2002 helmfile [eqiad] START helmfile.d/services/zotero: apply [13:49:23] !log cmooney@deploy2002 helmfile [eqiad] DONE helmfile.d/services/zotero: apply [13:50:13] 10ops-eqiad, 06SRE, 06DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11499689 (10Jclark-ctr) a:05Jclark-ctr→03BTullis thanks @BTullis drive has been swapped [13:51:46] (03PS2) 10Fabfur: cache::text: enable rate limit in cache::text (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545) [13:53:48] (03PS1) 10Jforrester: wikifunctions: Upgrade evaluators from 2025-12-10-150641 to 2026-01-06-172129 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224078 (https://phabricator.wikimedia.org/T341624) [13:54:03] (03PS1) 10Jforrester: wikifunctions: Upgrade orchestrator from 2025-12-10-133418 to 2026-01-07-132737 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224079 (https://phabricator.wikimedia.org/T341624) [13:55:41] 10ops-eqiad, 06SRE, 06DC-Ops, 06Data-Platform-SRE (2025.11.07 - 2025.11.28): Degraded RAID on an-worker1198 - https://phabricator.wikimedia.org/T413336#11499750 (10Jclark-ctr) a:05Jclark-ctr→03BTullis Thank you! 
@BTullis Replaced Failed Drive [13:56:33] (03CR) 10Fabfur: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur) [13:58:33] (03PS2) 10Jforrester: wikifunctions: Upgrade evaluators from 2025-12-10-150641 to 2026-01-07-132938 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224078 (https://phabricator.wikimedia.org/T341624) [13:58:35] (03PS2) 10Jforrester: wikifunctions: Upgrade orchestrator from 2025-12-10-133418 to 2026-01-07-132737 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1224079 (https://phabricator.wikimedia.org/T341624) [14:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1400). [14:00:05] No Gerrit patches in the queue for this window AFAICS. [14:00:20] indeed [14:00:21] no patches at all [14:00:25] so i'm going to steal the window! [14:00:58] o/ [14:01:00] :o [14:01:03] window thieff! [14:01:23] (03PS1) 10Urbanecm: fix(ConfirmEmail): Save settings after generating tokens [core] (wmf/1.46.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1224081 (https://phabricator.wikimedia.org/T413435) [14:01:29] (03CR) 10Urbanecm: [C:03+2] fix(ConfirmEmail): Save settings after generating tokens [core] (wmf/1.46.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1224081 (https://phabricator.wikimedia.org/T413435) (owner: 10Urbanecm) [14:01:32] (03CR) 10Muehlenhoff: [C:03+2] Stop uploading puppet facts to PCC from puppetmaster1001 [puppet] - 10https://gerrit.wikimedia.org/r/1075187 (https://phabricator.wikimedia.org/T367399) (owner: 10Muehlenhoff) [14:01:32] indeed! [14:03:25] * Lucas_WMDE waves at Mvolz [14:03:37] Hello! 
Sorry about that :) [14:04:22] I wasn’t involved, I just heard some people were looking for you ^^ [14:04:29] hopefully you’ve seen the slack messages or something ^^ [14:05:04] Hey folks, since there are no patches in the window, would it be possible to add a last-second change? [14:05:20] Noting though that I'm not a deployer and so would need a kind soul to actually do the work [14:05:23] Daimona: urbanecm is backporting at the moment, but after that, probably, yeah [14:05:27] I could deploy [14:05:42] or i can ship it at once [14:05:45] Daimona: what's your change? [14:05:51] (03PS4) 10Daimona Eaytoy: Stop setting $wgCampaignEventsEnableContributionTracking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210716 (https://phabricator.wikimedia.org/T410939) [14:06:03] (also I don’t know if Mvolz / jhathaway / topranks might need to deploy anything citoid-y) [14:06:03] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Wednesday, January 07 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210716 (https://phabricator.wikimedia.org/T410939) (owner: 10Daimona Eaytoy) [14:06:12] (03CR) 10Vgutierrez: [C:03+1] cache::text: enable rate limit in cache::text (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur) [14:06:18] They reverted it, feel free to use the window. 
[14:06:19] ^^Added, and thank you :) [14:06:44] (03CR) 10Urbanecm: [C:03+2] Stop setting $wgCampaignEventsEnableContributionTracking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210716 (https://phabricator.wikimedia.org/T410939) (owner: 10Daimona Eaytoy) [14:07:26] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "LGTM, the commit removing the flag should already be fully deployed with the train" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210716 (https://phabricator.wikimedia.org/T410939) (owner: 10Daimona Eaytoy) [14:07:35] (03Merged) 10jenkins-bot: Stop setting $wgCampaignEventsEnableContributionTracking [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1210716 (https://phabricator.wikimedia.org/T410939) (owner: 10Daimona Eaytoy) [14:08:48] (03CR) 10Fabfur: [C:03+2] cache::text: enable rate limit in cache::text (ulsfo) [puppet] - 10https://gerrit.wikimedia.org/r/1224058 (https://phabricator.wikimedia.org/T406545) (owner: 10Fabfur) [14:11:07] !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:11:09] !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance - oblivian@cumin1003 [14:11:38] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers titan1001.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers titan1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [14:11:46] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-web_443: Servers titan1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [14:12:08] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance 
- oblivian@cumin1003 [14:12:10] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:13:32] (03Merged) 10jenkins-bot: fix(ConfirmEmail): Save settings after generating tokens [core] (wmf/1.46.0-wmf.7) - 10https://gerrit.wikimedia.org/r/1224081 (https://phabricator.wikimedia.org/T413435) (owner: 10Urbanecm) [14:13:46] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:14:37] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host wdqs1032.eqiad.wmnet with OS trixie [14:14:38] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:15:38] !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:15:40] !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance - oblivian@cumin1003 [14:16:23] !log urbanecm@deploy2002 Started scap sync-world: Backport for [[gerrit:1224081|fix(ConfirmEmail): Save settings after generating tokens (T413435)]], [[gerrit:1210716|Stop setting $wgCampaignEventsEnableContributionTracking (T410939)]] [14:16:24] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance - oblivian@cumin1003 [14:16:25] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:16:28] T413435: Special:ConfirmEmail does not resend a working email token - https://phabricator.wikimedia.org/T413435 [14:16:28] T410939: Drop feature flag for 
collaborative contributions - https://phabricator.wikimedia.org/T410939 [14:17:39] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1032.eqiad.wmnet with OS trixie [14:17:55] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host wdqs1032.eqiad.wmnet with OS trixie [14:18:29] !log urbanecm@deploy2002 urbanecm, daimona: Backport for [[gerrit:1224081|fix(ConfirmEmail): Save settings after generating tokens (T413435)]], [[gerrit:1210716|Stop setting $wgCampaignEventsEnableContributionTracking (T410939)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:18:41] Daimona: ping if you can verify [14:18:48] !incidents [14:18:49] 7284 (RESOLVED) ProbeDown sre (10.2.2.53 ip4 thanos-query:443 probes/service http_thanos-query_ip4 eqiad) [14:18:49] 7283 (RESOLVED) ProbeDown sre (10.2.1.16 ip4 zotero:4969 probes/service http_zotero_ip4 codfw) [14:18:49] 7282 (RESOLVED) ProbeDown sre (10.2.1.16 ip4 zotero:4969 probes/service http_zotero_ip4 codfw) [14:18:49] 7281 (RESOLVED) ProbeDown sre (10.2.1.16 ip4 zotero:4969 probes/service http_zotero_ip4 codfw) [14:18:49] 7280 (RESOLVED) ProbeDown sre (10.2.1.16 ip4 zotero:4969 probes/service http_zotero_ip4 codfw) [14:19:07] (03PS1) 10Giuseppe Lavagetto: Performance improvements by adding caching [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1224086 [14:19:37] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Performance improvements by adding caching [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1224086 (owner: 10Giuseppe Lavagetto) [14:20:24] urbanecm: Looks good, ty [14:20:30] !log urbanecm@deploy2002 urbanecm, daimona: Continuing with sync [14:20:33] ty, proceeding [14:21:05] (03PS2) 10Giuseppe Lavagetto: Performance improvements by adding caching [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1224086 [14:21:14] (03CR) 10Giuseppe Lavagetto: [V:03+2 C:03+2] Performance 
improvements by adding caching [software/hiddenparma/deploy] - 10https://gerrit.wikimedia.org/r/1224086 (owner: 10Giuseppe Lavagetto) [14:21:43] !log oblivian@cumin1003 START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:21:45] !log oblivian@cumin1003 START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance - oblivian@cumin1003 [14:22:15] (03CR) 10Muehlenhoff: [C:03+2] Remove obsolete keys for PCC 5 instances [puppet] - 10https://gerrit.wikimedia.org/r/1223532 (https://phabricator.wikimedia.org/T367399) (owner: 10Muehlenhoff) [14:22:29] (03PS1) 10Muehlenhoff: Remove now obsolete spec test for Puppet 5 facts upload [puppet] - 10https://gerrit.wikimedia.org/r/1224088 (https://phabricator.wikimedia.org/T365798) [14:22:31] (03PS1) 10Muehlenhoff: Remove puppetmaster::scripts [puppet] - 10https://gerrit.wikimedia.org/r/1224089 (https://phabricator.wikimedia.org/T365798) [14:22:38] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Improve performance - oblivian@cumin1003 [14:22:40] !log oblivian@cumin1003 END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Improve performance - oblivian@cumin1003" [14:23:04] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1032.eqiad.wmnet with OS trixie [14:23:58] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1224089 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [14:24:10] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host wdqs1032.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL [14:24:29] !log urbanecm@deploy2002 Finished scap sync-world: Backport for 
[[gerrit:1224081|fix(ConfirmEmail): Save settings after generating tokens (T413435)]], [[gerrit:1210716|Stop setting $wgCampaignEventsEnableContributionTracking (T410939)]] (duration: 08m 05s) [14:24:33] T413435: Special:ConfirmEmail does not resend a working email token - https://phabricator.wikimedia.org/T413435 [14:24:34] T410939: Drop feature flag for collaborative contributions - https://phabricator.wikimedia.org/T410939 [14:24:56] RECOVERY - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1168 is OK: communication: 0 OK : controller: 0 OK : physical_disk: 0 OK : virtual_disk: 0 OK : bbu: 0 OK : enclosure: 0 OK https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring [14:27:23] 10ops-eqiad, 06SRE, 06DC-Ops: Degraded RAID on an-worker1168 - https://phabricator.wikimedia.org/T413704#11499877 (10Jclark-ctr) Thanks! @BTullis drive has been Swapped [14:31:00] (03PS1) 10Elukey: profile::docker_registry: add the ML instance [puppet] - 10https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) [14:32:27] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1032.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL [14:32:38] 10ops-eqiad, 06SRE, 06DC-Ops, 06serviceops: Q2:rack/setup/install wikikube-worker refresh - https://phabricator.wikimedia.org/T408760#11499905 (10Jclark-ctr) >>! In T408760#11482291, @VRiley-WMF wrote: > wikikube-worker1371 > E8 > U32 > CableID 230304500167 > Port 5 > > wikikube-worker1372 > F8 > U18 > C... 
[14:32:51] (03CR) 10CI reject: [V:04-1] profile::docker_registry: add the ML instance [puppet] - 10https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) (owner: 10Elukey) [14:33:07] 10ops-eqiad, 06SRE, 06DC-Ops: Alert for device lsw1-e8-eqiad.mgmt.eqiad.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T413789#11499910 (10Jclark-ctr) 05Open→03Resolved [14:33:14] 10ops-eqiad, 06SRE, 06DC-Ops: Alert for device lsw1-f8-eqiad.mgmt.eqiad.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T413788#11499911 (10Jclark-ctr) 05Open→03Resolved [14:33:44] PROBLEM - Host titan1002 is DOWN: PING CRITICAL - Packet loss = 100% [14:34:33] (03CR) 10Majavah: [C:03+1] Remove now obsolete spec test for Puppet 5 facts upload [puppet] - 10https://gerrit.wikimedia.org/r/1224088 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [14:35:12] RECOVERY - Host titan1002 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [14:37:24] (03PS2) 10Elukey: profile::docker_registry: add the ML instance [puppet] - 10https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) [14:38:03] urbanecm: hey I’m going to deploy to https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1219234 with SpiderPig (under phuedx’s supervision), all good to go? 
[14:41:30] (03CR) 10TrainBranchBot: [C:03+2] "Approved by bearloga@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1219234 (https://phabricator.wikimedia.org/T396562) (owner: 10Bearloga) [14:42:23] (03Merged) 10jenkins-bot: EventStreamConfig: enrich stream with more headers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1219234 (https://phabricator.wikimedia.org/T396562) (owner: 10Bearloga) [14:42:55] !log bearloga@deploy2002 Started scap sync-world: Backport for [[gerrit:1219234|EventStreamConfig: enrich stream with more headers]] [14:44:30] (03CR) 10Elukey: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) (owner: 10Elukey) [14:45:02] !log bearloga@deploy2002 bearloga: Backport for [[gerrit:1219234|EventStreamConfig: enrich stream with more headers]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:46:19] !log sukhe@cumin1003 START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough [14:46:21] (03CR) 10Elukey: "Hi Scott! While talking with ML I realized that their use case is similar to what we want to do in https://phabricator.wikimedia.org/T4129" [puppet] - 10https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) (owner: 10Elukey) [14:46:33] !log bearloga@deploy2002 bearloga: Continuing with sync [14:49:35] (03CR) 10Zabe: [C:03+2] "Ok maybe I misunderstood something." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/1221678 (owner: 10Zabe) [14:50:33] !log bearloga@deploy2002 Finished scap sync-world: Backport for [[gerrit:1219234|EventStreamConfig: enrich stream with more headers]] (duration: 07m 37s) [14:51:10] (03CR) 10Muehlenhoff: "Looks good, a few wording fixes inline" [software/bitu] - 10https://gerrit.wikimedia.org/r/1224076 (owner: 10Slyngshede) [14:53:08] 06SRE, 06Infrastructure-Foundations, 10Puppet CI, 10Puppet-Infrastructure, 13Patch-For-Review: Default to the Puppet 7 PCC CI test, make it voting and eventually remove the Puppet 5 one - https://phabricator.wikimedia.org/T367399#11499955 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlen... [14:53:52] (03CR) 10Muehlenhoff: [C:03+2] Remove now obsolete spec test for Puppet 5 facts upload [puppet] - 10https://gerrit.wikimedia.org/r/1224088 (https://phabricator.wikimedia.org/T365798) (owner: 10Muehlenhoff) [14:54:17] Verified that the patch is working, event production rate hasn’t changed, logs look clean [14:54:17] (03PS1) 10Jelto: cache-text: add wikipedia25 to enabled_certificates [puppet] - 10https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) [14:54:22] (03CR) 10Majavah: [C:04-1] Notification for users to link their Phabricator account (031 comment) [software/bitu] - 10https://gerrit.wikimedia.org/r/1224076 (owner: 10Slyngshede) [14:54:38] urbanecm: everything looks good with the patch I deployed [14:54:56] bearloga: awesome! 
[14:55:29] (PS2) Jelto: cache-text: add wikipedia25 to enabled_certificates [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592)
[14:56:04] (PS3) Arthur taylor: Enable the MEX / wbui2025 beta feature on wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015)
[14:56:24] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015) (owner: Arthur taylor)
[14:56:40] (CR) Majavah: "This patch is only removing tests from obsolete platforms, and expanding the same tests to run on new platforms (where they still all pass" [puppet] - https://gerrit.wikimedia.org/r/1219149 (owner: Majavah)
[14:57:12] ops-eqiad, SRE, DC-Ops: Degraded RAID on restbase1035 - https://phabricator.wikimedia.org/T413678#11499971 (Jclark-ctr) @Eevans Failed drive has been replaced
[14:58:19] (CR) Elukey: [C:+1] KDC: Remove support for buster [puppet] - https://gerrit.wikimedia.org/r/1219872 (owner: Muehlenhoff)
[14:59:06] !log sukhe@cumin1003 END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
[14:59:57] (CR) Elukey: [C:+1] Remove cookbooks to migrate roles/hosts to Puppet [cookbooks] - https://gerrit.wikimedia.org/r/1219861 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:00:04] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1500)
[15:00:22] PROBLEM - Bird Internet Routing Daemon on doh7003 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[15:00:32] (CR) Elukey: [C:+1] Remove puppetmaster::scripts [puppet] - https://gerrit.wikimedia.org/r/1224089 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[15:00:36] (CR) JMeybohm: "Since it's rather complex in functionality and config we have httpbb tests for the docker registry (`modules/profile/templates/httpbb/dock" [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: Dpogorzelski)
[15:00:55] (PS3) Elukey: sre.hosts.reimage: remove puppet 5 support and default to 7 [cookbooks] - https://gerrit.wikimedia.org/r/1214488 (https://phabricator.wikimedia.org/T408219)
[15:01:17] (CR) Elukey: "@mmuhlenhoff@wikimedia.org Hi! Shall we proceed?" [cookbooks] - https://gerrit.wikimedia.org/r/1214488 (https://phabricator.wikimedia.org/T408219) (owner: Elukey)
[15:01:22] PROBLEM - Bird Internet Routing Daemon on doh7004 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[15:01:27] huh
[15:01:33] SRE, collaboration-services, Patch-For-Review, PES1.3.3 WP25 Easter Eggs: Request: Wikipedia 25 microsite hosting - https://phabricator.wikimedia.org/T408592#11499988 (Jelto) >>! In T408592#11497857, @ATitkov wrote: >> I noticed the website just shows a Wikipedia icon and the website is not usabl...
[15:01:40] (CR) Elukey: [V:+2 C:+2] Release version 0.0.17-1 [docker-images/docker-report] - https://gerrit.wikimedia.org/r/1203871 (owner: Elukey)
[15:02:02] (CR) Dzahn: "found in compiler output: Enabled certificate wikipedia25 isn't available" [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:02:49] (CR) Elukey: "TIL, +1" [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: Dpogorzelski)
[15:02:58] (CR) Ecarg: [C:+2] wikifunctions: Upgrade evaluators from 2025-12-10-150641 to 2026-01-07-132938 [deployment-charts] - https://gerrit.wikimedia.org/r/1224078 (https://phabricator.wikimedia.org/T341624) (owner: Jforrester)
[15:03:34] (CR) Majavah: "the certificate needs adding to `profile::cache::haproxy::available_certificates` in `hieradata/role/common/cache/text.yaml` as well" [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:05:03] (Merged) jenkins-bot: wikifunctions: Upgrade evaluators from 2025-12-10-150641 to 2026-01-07-132938 [deployment-charts] - https://gerrit.wikimedia.org/r/1224078 (https://phabricator.wikimedia.org/T341624) (owner: Jforrester)
[15:05:40] (CR) Clément Goubert: [C:+1] Makefiles: strip trailing whitespace from parameters [deployment-charts] - https://gerrit.wikimedia.org/r/1223723 (owner: Daniel Kinzler)
[15:06:00] (CR) JMeybohm: docker registry: add ml build user password (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: Dpogorzelski)
[15:06:31] !log ecarg@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:07:06] !log ecarg@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:07:35] (PS3) Jelto: cache-text: add wikipedia25 to enabled_certificates [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592)
[15:08:00] !log ecarg@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:08:44] !log ecarg@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:08:50] (CR) Elukey: docker registry: add ml build user password (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: Dpogorzelski)
[15:08:56] !log ecarg@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:09:57] !log ecarg@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:10:27] (CR) Ecarg: [C:+2] wikifunctions: Upgrade orchestrator from 2025-12-10-133418 to 2026-01-07-132737 [deployment-charts] - https://gerrit.wikimedia.org/r/1224079 (https://phabricator.wikimedia.org/T341624) (owner: Jforrester)
[15:10:27] (CR) Jelto: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7856/co" [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:10:50] (CR) Dzahn: [V:+1] "https://puppet-compiler.wmflabs.org/output/1224096/7856/cp3069.esams.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:10:55] (CR) Jelto: [V:+1] "thanks a lot! I added this in patch set 3" [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:12:06] (CR) JMeybohm: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) (owner: Elukey)
[15:12:17] (Merged) jenkins-bot: wikifunctions: Upgrade orchestrator from 2025-12-10-133418 to 2026-01-07-132737 [deployment-charts] - https://gerrit.wikimedia.org/r/1224079 (https://phabricator.wikimedia.org/T341624) (owner: Jforrester)
[15:13:02] (CR) Clément Goubert: [C:+2] rest-gateway: Switch redis backend to 6380 instance [deployment-charts] - https://gerrit.wikimedia.org/r/1223651 (https://phabricator.wikimedia.org/T413876) (owner: Clément Goubert)
[15:13:36] !log ecarg@deploy2002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[15:13:58] !log ecarg@deploy2002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[15:14:28] !log ecarg@deploy2002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[15:14:55] (Merged) jenkins-bot: rest-gateway: Switch redis backend to 6380 instance [deployment-charts] - https://gerrit.wikimedia.org/r/1223651 (https://phabricator.wikimedia.org/T413876) (owner: Clément Goubert)
[15:15:00] !log ecarg@deploy2002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[15:15:13] !log ecarg@deploy2002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[15:15:36] !log cgoubert@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[15:15:43] !log cgoubert@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[15:15:45] !log ecarg@deploy2002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[15:16:48] SRE-SLO, Citoid, VisualEditor, Editing-team (Tracking): Seperate SLO for requests made from Citoid Extension, possible wmf deployed extension only, vs bots etc. - https://phabricator.wikimedia.org/T345627#11500084 (elukey) Resolved→Open @Mvolz Hi! Sorry to re-open but something doesn't ma...
[15:22:26] (CR) Vgutierrez: [C:+1] cache-text: add wikipedia25 to enabled_certificates [puppet] - https://gerrit.wikimedia.org/r/1224096 (https://phabricator.wikimedia.org/T408592) (owner: Jelto)
[15:22:53] (PS1) Fabfur: cache::text: enable some rate limit in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224104 (https://phabricator.wikimedia.org/T406545)
[15:22:55] (PS1) Fabfur: cache::text: enable remaining rate limits in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224105 (https://phabricator.wikimedia.org/T406545)
[15:23:59] !log sukhe@cumin1003 START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
[15:24:07] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224104 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[15:24:26] elukey@cumin1003 provision (PID 3606797) is awaiting input
[15:25:13] SRE, SRE-Access-Requests, Release-Engineering-Team (Radar): Add yubikey ssh keys for thcipriani - https://phabricator.wikimedia.org/T413416#11500142 (thcipriani) Open→Resolved a:MoritzMuehlenhoff
[15:25:52] !log cgoubert@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: sync
[15:25:57] !log cgoubert@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
[15:28:23] (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224105 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[15:29:02] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host wdqs1032.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[15:29:22] RECOVERY - Bird Internet Routing Daemon on doh7003 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[15:30:05] Deploy window Wikifunctions Services UTC Afternoon (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1500)
[15:30:05] Deploy window Test Kitchen Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1530)
[15:31:45] (PS1) Clément Goubert: rest-gateway: Fix staging nutcracker config [deployment-charts] - https://gerrit.wikimedia.org/r/1224111
[15:32:08] (CR) Daniel Kinzler: [C:+2] Makefiles: strip trailing whitespace from parameters [deployment-charts] - https://gerrit.wikimedia.org/r/1223723 (owner: Daniel Kinzler)
[15:32:36] (CR) Vgutierrez: [C:+1] cache::text: enable some rate limit in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224104 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[15:32:39] elukey@cumin1003 provision (PID 3606797) is awaiting input
[15:32:57] !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1032.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
[15:33:59] (CR) Clément Goubert: [C:+2] rest-gateway: Fix staging nutcracker config [deployment-charts] - https://gerrit.wikimedia.org/r/1224111 (owner: Clément Goubert)
[15:34:00] (CR) Scott French: [C:+1] P:conftool::requestctl_client: update requestctl_cli.original.py [puppet] - https://gerrit.wikimedia.org/r/1224015 (https://phabricator.wikimedia.org/T404591) (owner: JMeybohm)
[15:34:03] (Merged) jenkins-bot: Makefiles: strip trailing whitespace from parameters [deployment-charts] - https://gerrit.wikimedia.org/r/1223723 (owner: Daniel Kinzler)
[15:35:03] (CR) JMeybohm: [C:+2] P:conftool::requestctl_client: update requestctl_cli.original.py [puppet] - https://gerrit.wikimedia.org/r/1224015 (https://phabricator.wikimedia.org/T404591) (owner: JMeybohm)
[15:35:43] (Merged) jenkins-bot: rest-gateway: Fix staging nutcracker config [deployment-charts] - https://gerrit.wikimedia.org/r/1224111 (owner: Clément Goubert)
[15:36:25] (PS1) Silvan Heintze: Report # of skipped entities by type [dumps] - https://gerrit.wikimedia.org/r/1224110 (https://phabricator.wikimedia.org/T413869)
[15:37:11] !log cgoubert@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[15:37:21] !log cgoubert@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[15:38:26] (CR) Fabfur: [C:+2] cache::text: enable some rate limit in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224104 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[15:40:22] RECOVERY - Bird Internet Routing Daemon on doh7004 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running
[15:40:31] !log uploaded Bird 2.18-1~wmf12u2 to component/bird-routed-ganeti for bookworm-wikimedia T413740
[15:40:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:34] T413740: Backport and test Bird 2.18 - https://phabricator.wikimedia.org/T413740
[15:41:36] SRE, envoy, serviceops: Upgrade Envoy to v1.35.7 - https://phabricator.wikimedia.org/T410975#11500242 (Clement_Goubert) For the record, [[https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1217611|121761]] updates the rest-gateway to 1.35.7
[15:42:38] !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[15:42:55] !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[15:43:48] (CR) Scott French: [C:+1] "Ah, interesting! Yes, using Ceph from day 1 for their use case sounds like a solid way to put more production miles on it, as long as the " [puppet] - https://gerrit.wikimedia.org/r/1224091 (https://phabricator.wikimedia.org/T412951) (owner: Elukey)
[15:48:31] !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[15:48:53] !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[15:50:26] !issync
[15:50:27] Syncing #wikimedia-operations (requested by taavi)
[15:50:29] Set /cs flags #wikimedia-operations Az1568 -AFRefiorstv
[15:50:31] Set /cs flags #wikimedia-operations wmopbot +t
[15:50:33] Set /cs flags #wikimedia-operations sirenbot +V
[15:51:23] (CR) Jakob: [C:-1] Report # of skipped entities by type (3 comments) [dumps] - https://gerrit.wikimedia.org/r/1224110 (https://phabricator.wikimedia.org/T413869) (owner: Silvan Heintze)
[15:53:10] SRE, LDAP-Access-Requests: Grant Access to wmde for martyn.ranyard - https://phabricator.wikimedia.org/T413994 (Martyn.ranyard) NEW
[15:55:08] (CR) Muehlenhoff: "I need to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1214564 first, will do that tomorrow." [cookbooks] - https://gerrit.wikimedia.org/r/1214488 (https://phabricator.wikimedia.org/T408219) (owner: Elukey)
[16:02:21] !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host wdqs1032.eqiad.wmnet with OS trixie
[16:05:39] (PS1) AikoChou: ml-services: Update image and config for revise-tone-task-generator on staging [deployment-charts] - https://gerrit.wikimedia.org/r/1224117 (https://phabricator.wikimedia.org/T412210)
[16:10:29] (PS1) CDanis: haproxy: proxy mmdb: all drmrs [puppet] - https://gerrit.wikimedia.org/r/1224119
[16:15:31] (CR) Vgutierrez: [C:+1] cache::text: enable remaining rate limits in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224105 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:16:58] SRE-swift-storage, Commons, media-backups: File not found: /v1/AUTH_mw/wikipedia-commons-local-public ... for 3 files - https://phabricator.wikimedia.org/T400567#11500362 (jcrespo) The file has been reuploaded from backups: https://commons.wikimedia.org/wiki/File:2025-11-16_ONEW_concert_032.jpg But...
[16:17:15] (PS2) Fabfur: cache::text: enable remaining rate limits in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224105 (https://phabricator.wikimedia.org/T406545)
[16:18:06] !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1032.eqiad.wmnet with OS trixie
[16:18:18] (CR) Fabfur: [C:+2] cache::text: enable remaining rate limits in cache::text (magru) [puppet] - https://gerrit.wikimedia.org/r/1224105 (https://phabricator.wikimedia.org/T406545) (owner: Fabfur)
[16:20:19] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11500375 (Papaul) @ssingh Hello and Happy New year I just wanted to check with you once again if it is now safe to resume the loopback IP changes on the co...
[16:22:26] !log sukhe@cumin1003 END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
[16:24:28] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[16:24:48] !log sukhe@dns1004 START - running authdns-update
[16:25:41] !log sukhe@dns1004 END - running authdns-update
[16:28:01] (PS5) Dpogorzelski: docker registry: add ml build user password [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524)
[16:30:06] (PS1) Clément Goubert: ratelimit: Update to main branch e9ce92c [docker-images/production-images] - https://gerrit.wikimedia.org/r/1224124
[16:30:06] (CR) Giuseppe Lavagetto: [C:+1] haproxy: proxy mmdb: all drmrs [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[16:31:07] (CR) Clément Goubert: [C:+2] rest-gateway: Move REST API Sandbox to mw-rest-php [deployment-charts] - https://gerrit.wikimedia.org/r/1223659 (https://phabricator.wikimedia.org/T396807) (owner: Clément Goubert)
[16:32:57] (Merged) jenkins-bot: rest-gateway: Move REST API Sandbox to mw-rest-php [deployment-charts] - https://gerrit.wikimedia.org/r/1223659 (https://phabricator.wikimedia.org/T396807) (owner: Clément Goubert)
[16:33:17] !log cgoubert@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[16:33:22] !log cgoubert@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[16:33:34] !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[16:34:12] !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[16:37:03] !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[16:37:16] !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[16:38:33] (CR) Fabfur: [C:+1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[16:38:42] ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11500425 (ssingh) >>! In T408892#11500375, @Papaul wrote: > @ssingh Hello and Happy New year I just wanted to check with you once again if it is now safe t...
[16:41:53] (CR) Hnowlan: [C:+1] ratelimit: Update to main branch e9ce92c [docker-images/production-images] - https://gerrit.wikimedia.org/r/1224124 (owner: Clément Goubert)
[16:47:10] (CR) Scott French: "Thanks, Blake! One question, one optional thought." [software/spicerack] - https://gerrit.wikimedia.org/r/1224041 (https://phabricator.wikimedia.org/T412211) (owner: Blake)
[16:50:32] (CR) Bartosz Wójtowicz: [C:+1] "LGTM!" [deployment-charts] - https://gerrit.wikimedia.org/r/1224117 (https://phabricator.wikimedia.org/T412210) (owner: AikoChou)
[17:04:38] (PS6) Dpogorzelski: docker registry: add ml build user password [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524)
[17:04:49] (CR) Dpogorzelski: "well, to me it's not necessarily important that other users can't push or pull from the ml namespace, the only important thing is that ml-" [puppet] - https://gerrit.wikimedia.org/r/1220352 (https://phabricator.wikimedia.org/T412524) (owner: Dpogorzelski)
[17:06:58] (CR) Blake: service: add excluded_services helper function (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/1224041 (https://phabricator.wikimedia.org/T412211) (owner: Blake)
[17:28:23] (PS1) Eevans: Revert "restbase1035: remove sdc data file directory (device failed)" [puppet] - https://gerrit.wikimedia.org/r/1224139
[17:29:17] (CR) Eevans: [C:+2] Revert "restbase1035: remove sdc data file directory (device failed)" [puppet] - https://gerrit.wikimedia.org/r/1224139 (owner: Eevans)
[17:33:21] !log restarting restbase1035-{a,b,c}/Cassandra to re-add /dev/sdc — T413678
[17:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:24] T413678: Degraded RAID on restbase1035 - https://phabricator.wikimedia.org/T413678
[18:00:05] Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1800)
[18:04:10] (PS2) Clare Ming: extension-list: Add Test Kitchen [mediawiki-config] - https://gerrit.wikimedia.org/r/1216847 (https://phabricator.wikimedia.org/T407806)
[18:04:27] (CR) Clare Ming: "i think it's ok to merge this now?" [mediawiki-config] - https://gerrit.wikimedia.org/r/1216847 (https://phabricator.wikimedia.org/T407806) (owner: Clare Ming)
[18:06:40] (CR) Clare Ming: "does this have to wait on anything?" [puppet] - https://gerrit.wikimedia.org/r/1212435 (https://phabricator.wikimedia.org/T407805) (owner: Brouberol)
[18:08:48] (PS1) CDanis: haproxy: proxy mmdb: add to x-analytics [puppet] - https://gerrit.wikimedia.org/r/1224143
[18:08:56] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[18:23:12] ops-eqiad, SRE, DC-Ops: Degraded RAID on restbase1035 - https://phabricator.wikimedia.org/T413678#11500762 (Eevans) >>! In T413678#11499971, @Jclark-ctr wrote: > @Eevans Failed drive has been replaced Everything looks good on my end; Thanks @Jclark-ctr
[18:23:34] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.11.07 - 2025.11.28): Degraded RAID on an-worker1198 - https://phabricator.wikimedia.org/T413336#11500763 (Jclark-ctr) a:BTullis→Jclark-ctr
[18:23:55] ops-eqiad, SRE, DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11500764 (Jclark-ctr) a:BTullis→Jclark-ctr
[18:24:06] (CR) Scott French: service: add excluded_services helper function (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1224041 (https://phabricator.wikimedia.org/T412211) (owner: Blake)
[18:24:17] ops-eqiad, SRE, DC-Ops: hw troubleshooting: PERC1 battery failure for an-worker1148 - https://phabricator.wikimedia.org/T411919#11500765 (Jclark-ctr) a:Jclark-ctr→BTullis
[18:25:00] RECOVERY - MD RAID on restbase1035 is OK: OK: Active: 9, Working: 9, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[18:26:56] (PS1) SBassett: Set CSP Report Only mode for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867)
[18:27:39] (CR) CI reject: [V:-1] Set CSP Report Only mode for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867) (owner: SBassett)
[18:28:47] (PS1) Btullis: Add a kyuubi service to the spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1224145 (https://phabricator.wikimedia.org/T413977)
[18:29:56] (PS2) SBassett: Set CSP Report Only mode for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867)
[18:30:43] (CR) CI reject: [V:-1] Set CSP Report Only mode for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867) (owner: SBassett)
[18:30:59] (PS3) SBassett: Set CSP Report Only mode for all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867)
[18:31:20] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867) (owner: SBassett)
[18:31:57] (CR) SBassett: [C:-2] "Hold for config deploy during Thursday late backport window" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224144 (https://phabricator.wikimedia.org/T291867) (owner: SBassett)
[18:32:50] (PS23) Daniel Kinzler: api-gateway: Rest-gateway Read `ratelimit_class` and `user_id` from JWT [deployment-charts] - https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: Pmiazga)
[18:36:53] (CR) Scott French: haproxy: proxy mmdb: add to x-analytics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[18:37:47] (PS4) Clare Ming: Deploy TestKitchen to Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/1217360 (https://phabricator.wikimedia.org/T407806)
[18:50:25] ops-eqiad, SRE, DC-Ops: Degraded RAID on restbase1035 - https://phabricator.wikimedia.org/T413678#11500903 (Jclark-ctr) Open→Resolved
[18:51:18] (PS2) Btullis: Add a kyuubi service to the spark-support chart [deployment-charts] - https://gerrit.wikimedia.org/r/1224145 (https://phabricator.wikimedia.org/T413977)
[19:00:05] dduvall and dancy: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) MediaWiki train - Utc-7 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T1900).
[19:05:25] (PS2) CDanis: haproxy: proxy mmdb: add to x-analytics [puppet] - https://gerrit.wikimedia.org/r/1224143
[19:05:25] (PS2) CDanis: haproxy: proxy mmdb: all drmrs [puppet] - https://gerrit.wikimedia.org/r/1224119
[19:05:56] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[19:05:58] (CR) CDanis: haproxy: proxy mmdb: add to x-analytics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[19:06:01] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[19:09:11] (CR) Scott French: [C:+1] haproxy: proxy mmdb: add to x-analytics [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[19:09:40] (PS3) CDanis: haproxy: proxy mmdb: add to x-analytics [puppet] - https://gerrit.wikimedia.org/r/1224143
[19:09:41] (PS3) CDanis: haproxy: proxy mmdb: all drmrs [puppet] - https://gerrit.wikimedia.org/r/1224119
[19:09:53] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[19:13:02] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[19:13:07] (CR) CDanis: [C:+2] haproxy: proxy mmdb: add to x-analytics [puppet] - https://gerrit.wikimedia.org/r/1224143 (owner: CDanis)
[19:14:52] o/
[19:15:20] (PS1) TrainBranchBot: group1 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224163 (https://phabricator.wikimedia.org/T408280)
[19:15:22] (CR) TrainBranchBot: [C:+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224163 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[19:16:13] (Merged) jenkins-bot: group1 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224163 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[19:22:13] !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.10 refs T408280
[19:22:17] T408280: 1.46.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T408280
[19:22:41] (CR) CDanis: [C:+2] haproxy: proxy mmdb: all drmrs [puppet] - https://gerrit.wikimedia.org/r/1224119 (owner: CDanis)
[19:32:15] !log upload corto 1.0.22 to apt repo
[19:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:33:43] oh boy, a slew (630,829) of deprecation errors
[19:34:48] !log rolling back 1.46.0-wmf.10 from group1 due to a large number (630,829) of new deprecation warnings (T408280)
[19:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:51] T408280: 1.46.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T408280
[19:35:01] (PS1) TrainBranchBot: group0 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224166 (https://phabricator.wikimedia.org/T408280)
[19:35:04] (CR) TrainBranchBot: [C:+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224166 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[19:35:56] (Merged) jenkins-bot: group0 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224166 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[19:37:19] (PS2) Jsn.sherman: extension-list: Add PersonalDashboard [mediawiki-config] - https://gerrit.wikimedia.org/r/1217786 (https://phabricator.wikimedia.org/T412528)
[19:37:38] oh shoot, this seems to be a non-backwards compat unserialization issue. the actual errors arise in wmf.7. this is going to be fun
[19:41:29] (PS1) CDanis: haproxy: proxy mmdb: all 🌍 [puppet] - https://gerrit.wikimedia.org/r/1224168
[19:41:54] !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.10 refs T408280
[19:41:57] T408280: 1.46.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T408280
[19:42:28] (PS2) CDanis: haproxy: proxy mmdb: all 🌍 [puppet] - https://gerrit.wikimedia.org/r/1224168
[19:42:29] (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1224168 (owner: CDanis)
[19:45:10] (PS3) Jsn.sherman: InitialiseSettings.php: Add wmgUsePersonalDashboard [mediawiki-config] - https://gerrit.wikimedia.org/r/1217787 (https://phabricator.wikimedia.org/T412528)
[19:45:17] (PS4) Jsn.sherman: InitialiseSettings-labs.php: Deploy PersonalDashboard [mediawiki-config] - https://gerrit.wikimedia.org/r/1217788 (https://phabricator.wikimedia.org/T412528)
[19:45:23] (PS4) Jsn.sherman: CommonSettings-labs: Load PersonalDashbard extension [mediawiki-config] - https://gerrit.wikimedia.org/r/1217789 (https://phabricator.wikimedia.org/T412528)
[19:48:57] SRE, LDAP-Access-Requests: Grant Access to wmde for martyn.ranyard - https://phabricator.wikimedia.org/T413994#11501121 (WMDE-leszek) I approve this request on WMDE's end. Thank you
[19:51:09] (PS1) C. Scott Ananian: Turn off magic ISBN/RFC/PMID links on iawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1224169 (https://phabricator.wikimedia.org/T414019)
[19:54:55] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 08 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1217786 (https://phabricator.wikimedia.org/T412528) (owner: Jsn.sherman)
[19:58:17] dduvall: Parser related?
[20:01:39] Reedy: unclear to me
[20:01:46] I've just seen the task :)
[20:04:06] dduvall: So two options I can see... Backport https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/commit/677ff412e913c6607102a3c8d73b6539c41b16d8 wholesale
[20:04:12] Which will be slow, because i18n changes
[20:04:18] Reedy: thanks for triaging. <3 it's always a good time trying to understand an error that only occurs in older code following unserialization :D
[20:04:29] Or.. just add the property as a noop on the old version
[20:04:50] Daimona: Around? :P
[20:05:08] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:05:09] Reedy: when you say backport, do you mean backport to wmf.7?
[20:05:16] Yeah
[20:05:30] hmm, the noop seems much safer to me (with zero context)
[20:05:30] I think that's overkill though
[20:05:56] In reality, it's just logspam
[20:06:11] it is yeah, just... lots of it
[20:06:16] and that's only on group1
[20:06:46] I'm guessing it's some global-y AF rules that are getting mixed version hits from cache
[20:06:47] i worry about toppling our logging infra with the levels that group2 would generate
[20:06:57] * Reedy looks where the cache key is for this
[20:07:26] Yeah... So that would be the other option in my mind
[20:07:50] Adding some versioning, so both MW versions aren't going to be trying to use the same version from cache
[20:07:56] It's probably excessive though
[20:08:02] * Reedy puts a patch up for the variable
[20:08:23] +1 for the prop addition. thanks for jumping on it
[20:09:48] heh, Zabe having similar thought then
[20:10:15] Ah, didnt see this
[20:10:32] Your patch WFM if CI is ok with it :)
[20:11:48] looks like there's an associated test you neeed to update
[20:11:58] yeah
[20:12:01] which is seemingly expecting very specific cache keys...
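A minimal sketch of the cache-versioning option discussed above, written in Python rather than MediaWiki's PHP. All names here (`make_key`, `SharedCache`, `CACHE_VERSION`, the `abusefilter:v<N>:` key layout) are hypothetical and this is not the actual AbuseFilter patch. The idea: when the schema version is embedded in the cache key, two code versions running side by side during a train (e.g. wmf.7 and wmf.10) each read and write their own entries, so older code never unserializes objects shaped by newer code.

```python
import pickle
from typing import Any, Callable, Dict, Optional

# Hypothetical schema version for cached filter data; bumping it gives
# new code fresh keys instead of entries written by other versions.
CACHE_VERSION = 2


def make_key(name: str, version: int = CACHE_VERSION) -> str:
    """Return a cache key with the schema version baked in (hypothetical layout)."""
    return f"abusefilter:v{version}:{name}"


class SharedCache:
    """In-memory stand-in for a cache shared by every deployed code version."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[Any]:
        raw = self._store.get(key)
        return None if raw is None else pickle.loads(raw)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = pickle.dumps(value)

    def get_with_set_callback(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value, computing and storing it on a miss.

        In this sketch a cached None is treated the same as a miss.
        """
        value = self.get(key)
        if value is None:
            value = compute()
            self.set(key, value)
        return value


if __name__ == "__main__":
    cache = SharedCache()
    # Old code keeps using v1 keys...
    old = cache.get_with_set_callback(make_key("rules", 1), lambda: {"vars": ["a"]})
    # ...while new code uses v2, so it never reads the v1 entry.
    new = cache.get_with_set_callback(make_key("rules", 2), lambda: {"vars": ["a", "b"]})
    print(old, new)
```

The cost of this approach is that the new version starts with a cold cache for those keys until it repopulates them.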
[20:12:14] PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:13:50] thanks for the patch zabe <3
[20:17:14] PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:19:58] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:20:04] (PS1) Daniel Kinzler: rest-gateway: add support for sessionJwt cookies [deployment-charts] - https://gerrit.wikimedia.org/r/1224173
[20:20:30] i'm going to grab a quick lunch. if you all have a cherry-pick ready to go when i get back i can backport and re-roll train. thanks again!
[20:20:41] Reedy: I'm around now. What's borked?
[20:20:49] Daimona: production ;)
[20:21:03] maybe https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/1224171/2 is just enough
[20:21:06] It was an AF issue, so I tried pinging you before digging into it too much. Zabe has a patch up though, just waiting for CI
[20:21:09] Pheew I thought it was something serious
[20:21:28] 800K lines of logspam is a lot :D
[20:22:01] Ah I see, ye olde cache version to bump. Happy there is already a patch up because my brain just can't :D
[20:22:04] RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55565 bytes in 0.080 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:22:04] RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.194 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[20:25:05] * Reedy waits for jerkins
[20:26:39] (PS2) Daniel Kinzler: rest-gateway: add support for sessionJwt cookies [deployment-charts] - https://gerrit.wikimedia.org/r/1224173
[20:50:12] (PS1) CDanis: benthos: webrequest: add res_proxy [puppet] - https://gerrit.wikimedia.org/r/1224178
[20:53:09] (PS1) CDanis: turnilo: webrequest: add res_proxy [puppet] - https://gerrit.wikimedia.org/r/1224181
[21:00:05] RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T2100).
[21:00:05] No Gerrit patches in the queue for this window AFAICS.
[21:00:49] perfect
[21:01:53] (PS1) Reedy: Bump cache key version of FilterLookup::getAllActiveFiltersInGroup [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016)
[21:01:56] (PS2) Zabe: Bump cache key version of FilterLookup::getAllActiveFiltersInGroup [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:01:58] (CR) Reedy: [C:+2] "Copied votes on follow-up patch sets have been updated:" [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:02:06] heh
[21:02:13] you were fater
[21:02:16] *faster
[21:02:29] (PS3) Reedy: MWScript: Remove curl_close() [mediawiki-config] - https://gerrit.wikimedia.org/r/1221218 (https://phabricator.wikimedia.org/T413538)
[21:02:34] (CR) Reedy: [C:+2] MWScript: Remove curl_close() [mediawiki-config] - https://gerrit.wikimedia.org/r/1221218 (https://phabricator.wikimedia.org/T413538) (owner: Reedy)
[21:03:28] (Merged) jenkins-bot: MWScript: Remove curl_close() [mediawiki-config] - https://gerrit.wikimedia.org/r/1221218 (https://phabricator.wikimedia.org/T413538) (owner: Reedy)
[21:06:08] oh nice. Reedy are you doing the backport deploy?
[21:06:28] I was going to as I wasn't sure if you were back from lunch
[21:06:41] yeah, sounds good to me. i just say down :D
[21:06:55] *sat*
[21:07:02] heh
[21:08:59] should be good to try the train again soon after
[21:10:26] (CR) Scott French: [C:+1] benthos: webrequest: add res_proxy [puppet] - https://gerrit.wikimedia.org/r/1224178 (owner: CDanis)
[21:10:33] (CR) Scott French: [C:+1] turnilo: webrequest: add res_proxy [puppet] - https://gerrit.wikimedia.org/r/1224181 (owner: CDanis)
[21:13:52] SRE, serviceops, Wikimedia-Site-requests, Patch-For-Review: Change $wgMaxArticleSize limit from byte-based to character-based - https://phabricator.wikimedia.org/T275319#11501360 (Jeff_G) See also https://commons.wikimedia.org/w/index.php?title=Commons%3AAdministrators%27_noticeboard&diff=1143860...
[21:13:54] (CR) Reedy: Bump cache key version of FilterLookup::getAllActiveFiltersInGroup [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:13:58] (CR) Reedy: [C:+2] Bump cache key version of FilterLookup::getAllActiveFiltersInGroup [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:14:09] * Reedy grumbles
[21:15:25] (Merged) jenkins-bot: Bump cache key version of FilterLookup::getAllActiveFiltersInGroup [extensions/AbuseFilter] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224182 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:15:41] thank god for the testing cache thing
[21:17:23] !log reedy@deploy2002 Started scap sync-world: Backport for [[gerrit:1224182|Bump cache key version of FilterLookup::getAllActiveFiltersInGroup (T414016)]], [[gerrit:1221218|MWScript: Remove curl_close() (T413538)]]
[21:17:28] T414016: PHP Deprecated: Creation of dynamic property MediaWiki\Extension\AbuseFilter\Filter\Flags::$suppressed is deprecated - https://phabricator.wikimedia.org/T414016
[21:17:28] T413538: Deprecation Notice: Function curl_close() is deprecated since 8.5, as it has no effect since PHP 8.0 - https://phabricator.wikimedia.org/T413538
[21:19:30] !log reedy@deploy2002 reedy: Backport for [[gerrit:1224182|Bump cache key version of FilterLookup::getAllActiveFiltersInGroup (T414016)]], [[gerrit:1221218|MWScript: Remove curl_close() (T413538)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:19:50] !log reedy@deploy2002 reedy: Continuing with sync
[21:21:42] Reedy: so tempting to make a Maui from Moana joke there, but i'm just glad it helped :D
[21:21:57] ready to train whenever that completes
[21:22:17] scap seems to be a lot quicker again atm... so should be done in a couple of mins
[21:23:47] the speed of scap mostly depends on the image building and whether i10n cache is rebuilt
[21:23:53] !log reedy@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224182|Bump cache key version of FilterLookup::getAllActiveFiltersInGroup (T414016)]], [[gerrit:1221218|MWScript: Remove curl_close() (T413538)]] (duration: 06m 30s)
[21:23:58] T414016: PHP Deprecated: Creation of dynamic property MediaWiki\Extension\AbuseFilter\Filter\Flags::$suppressed is deprecated - https://phabricator.wikimedia.org/T414016
[21:23:58] T413538: Deprecation Notice: Function curl_close() is deprecated since 8.5, as it has no effect since PHP 8.0 - https://phabricator.wikimedia.org/T413538
[21:24:06] 6.5 mins is pretty cool
[21:24:21] Nice
[21:24:40] Should be good for the train again
[21:25:03] right on. will roll momentarily
[21:27:30] (PS1) TrainBranchBot: group1 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224188 (https://phabricator.wikimedia.org/T408280)
[21:27:33] (CR) TrainBranchBot: [C:+2] "Initiated by dduvall@deploy2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/1224188 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[21:28:20] https://www.mediawiki.org/wiki/MediaWiki_1.46/wmf.10/Changelog was surprisingly smaller than I expected
[21:28:57] (Merged) jenkins-bot: group1 to 1.46.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/1224188 (https://phabricator.wikimedia.org/T408280) (owner: TrainBranchBot)
[21:29:10] yeah, huh. i never know what to expect with the first deploy of the year
[21:29:57] I thought a few of us had been a lot busier over the break than that :D
[21:30:45] not me! i was laaaaazy :D
[21:31:22] Maybe we just filed a lot of bugs for things...
[21:35:06] >[{reqId}] {exception_url} PHP Deprecated: Creation of dynamic property MediaWiki\Extension\AbuseFilter\Filter\Flags::$suppressed is deprecated
[21:35:10] !log dduvall@deploy2002 rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.10 refs T408280
[21:35:13] T408280: 1.46.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T408280
[21:35:15] It's back :|
[21:35:21] or not fixed
[21:35:29] :/
[21:36:21] Oh, that's annoying
[21:36:29] :/
[21:36:30] I'm around to help look into this
[21:37:55] It's a different stack
[21:38:06] Very similar, but not the same
[21:38:20] And they all look to be from API requests
[21:43:41] Logs look to have stopped though now
[21:44:01] Actually no, they have started again
[21:44:02] seems to be going up every time I refresh :P
[21:44:58] Yeah, you are right
[21:45:28] FilterRunner::runForStash
[21:45:34] $this->stashCache->store( $origVars, $cacheData );
[21:46:21] EditStashCache has a version too..
[21:47:05] These seem to be ve stashes?
[21:48:33] But the warning comes before that is called
[21:48:54] Yeah
[21:50:45] T414031 feels prominent enough that I filed it as a blocker
[21:50:46] T414031: Missing space before changes list action links on wmf.10 - https://phabricator.wikimedia.org/T414031
[21:51:21] You've actually filed a dupe ;)
[21:52:08] whoops :P
[21:52:28] Reedy: thinking i'm going to roll it back again unless there's a reason not to
[21:53:06] dduvall: I mean, it is logspam, so nothing functionality wise is broken
[21:53:17] Dreamy_Jazz: zabe: Any ideas before potentially rolling back again?
[21:53:37] Really I'm just going round in circles in the code
[21:53:42] I'm tempted just to add the property to the class for .7
[21:53:55] We could try that
[21:54:05] yeah
[21:54:26] !log bking@clouddumps100{1,2} created /srv/dumps/xmldatadumps/public/other/cirrussearch/DEPRECATED.txt T366248
[21:54:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:54:29] T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script - https://phabricator.wikimedia.org/T366248
[21:54:46] Is someone creating a patch for that? If not, I can create it now
[21:54:58] (PS2) Reedy: Add Flags::$suppressed [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016)
[21:55:08] Dreamy_Jazz: ^
[21:55:21] (CR) Dreamy Jazz: [C:+1] Add Flags::$suppressed [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[21:55:25] +1'd
[21:55:32] heh
[21:56:19] The number of errors is dropping though
[21:56:22] there seems to be gaps
[21:56:30] yeah that is strange
[21:56:31] Yeah it's every 5 minutes for me for some reason
[21:56:45] That 300s == 5 minutes makes it feel something weird caching related
[21:57:18] some cache pollution between versions or something?
[21:57:23] I feel that the number of errors won't drop fully if I am reading the code correctly, because it's a global key
[21:57:27] pollution cycle
[21:57:30] Yeah
[21:57:51] So if the value gets set on a group0 or 1 wiki, then it has the suppressed property
[21:58:06] Probably an argument for why caching objects is probably not a good idea :D
[21:58:21] but the version should vary between .7 and .10 now...
[21:58:21] :D
[21:59:10] The new trace has deferred update stuff in play...
[22:00:05] Deploy window Wikifunctions Services UTC Late (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T2200)
[22:00:47] Should someone deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/1224192 now or are we going to try something else?
[22:01:25] I was waiting for CI
[22:01:37] i can deploy it if someone +2s
[22:01:41] Oh, I see so it can use the cache to avoid re-running
[22:01:47] Yeah let's wait for CI to finish
[22:02:03] As it'll probably be quicker that way round
[22:02:12] i don't think the cache will apply to the cherry-pick however
[22:02:18] (CR) Zabe: [C:+1] Add Flags::$suppressed [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[22:02:31] Oh maybe I mis-remembered them applying to the gate-and-submit version after a successful test?
[22:02:52] oh i see. it's only on wmf.7 yeah makes sense
[22:03:09] adding the property could fix the issue here, but it also feels a bit like a blind shot without actually understanding what the issue is
[22:03:22] It's some caching issue somewhere...
[22:03:43] but it should be a logspam of the dynamic creation
[22:03:44] Dreamy_Jazz: yeah, it's based on an aggregate digest of all `HEAD^{tree}` values
[22:05:21] "but it also feels a bit like a blind shot without actually understanding what the issue is" - Yes, though I feel like the logs appearing means that the new version of the code is being used for wmf.10 (and so the new logic is being evaluated)? i.e. if it was the other way round, then the old Flags class would be kept even though it should have the suppressed property?
[22:06:14] CI has passed BTW on the patch
[22:06:21] Error numbers are dropping
[22:06:35] (CR) Reedy: [C:+2] Add Flags::$suppressed [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[22:06:41] The patch itself isn't going to cause any harm
[22:06:48] k, i can backport it
[22:06:55] By the time it's deployed, the error number may be getting a lot lower still
[22:07:32] (CR) TrainBranchBot: [C:+2] "Approved by dduvall@deploy2002 using scap backport" [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[22:08:02] There's a patch for taavi's UBN now too
[22:08:31] (Merged) jenkins-bot: Add Flags::$suppressed [extensions/AbuseFilter] (wmf/1.46.0-wmf.7) - https://gerrit.wikimedia.org/r/1224192 (https://phabricator.wikimedia.org/T414016) (owner: Reedy)
[22:08:38] that was quick ;)
[22:08:58] Looking at the error logs after the testwiki deploy of wmf.10, it seems that they might have repeated the same issue several hours later
[22:09:04] !log dduvall@deploy2002 Started scap sync-world: Backport for [[gerrit:1224192|Add Flags::$suppressed (T414016)]]
[22:09:07] T414016: PHP Deprecated: Creation of dynamic property MediaWiki\Extension\AbuseFilter\Filter\Flags::$suppressed is deprecated - https://phabricator.wikimedia.org/T414016
[22:09:18] i.e. a spike at 18:00 and then at 06:00 the next morning
[22:10:00] There was not a drop for those periods of time though AFAICS
[22:10:25] Though I guess that was because the cache version was not fixed
[22:11:10] !log dduvall@deploy2002 reedy, dduvall: Backport for [[gerrit:1224192|Add Flags::$suppressed (T414016)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:11:14] Anyway, hopefully that addresses the log spam. The functionality that is being implemented is new, so I don't think there is a risk of having lots of private AbuseFilters exposed
[22:11:38] if this is a symptom of a larger breakage of the functionality
[22:11:50] !log dduvall@deploy2002 reedy, dduvall: Continuing with sync
[22:11:51] SRE-Access-Requests, Release-Engineering-Team: Add yubikey ssh key for dancy - https://phabricator.wikimedia.org/T414032 (dancy) NEW
[22:14:28] (PS1) Ahmon Dancy: Yubikey-SSH-FIDO: add new key for dancy [puppet] - https://gerrit.wikimedia.org/r/1224205 (https://phabricator.wikimedia.org/T414032)
[22:15:16] It's definitely continuing to drop...
[22:15:49] !log dduvall@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224192|Add Flags::$suppressed (T414016)]] (duration: 06m 45s)
[22:15:53] T414016: PHP Deprecated: Creation of dynamic property MediaWiki\Extension\AbuseFilter\Filter\Flags::$suppressed is deprecated - https://phabricator.wikimedia.org/T414016
[22:16:16] whee
[22:16:23] Coming down further...
[22:16:48] Need to give it ~10 mins for the last bulk of errors from the last 15 mins to go away
[22:18:23] seems promising so far
[22:19:21] SRE-Access-Requests, Release-Engineering-Team, Patch-For-Review: Add yubikey ssh key for dancy - https://phabricator.wikimedia.org/T414032#11501643 (dancy)
[22:19:39] Congrats!
[22:19:49] Reedy, Dreamy_Jazz, zabe, Daimona: big thanks for helping
[22:19:51] Welcome back, Train!
[22:20:20] MediaWiki Train 2026, back with a vengeance
[22:20:34] Echo'ing thanks to everyone (won't look at who +2'd that AbuseFilter patch before Christmas :D )
[22:20:51] blame Santa
[22:20:58] haha
[22:21:42] There doesn't seem to be any other errors... I saw 1 occurance of something else, but it's already filed
[22:21:43] deprecation warnings are the new coal
[22:22:42] yeah looks ok from here, though taavi has that UI regression fix
[22:23:49] That's going through on master... it just failed to merge because of a patch before it failing due to browser test related things
[22:25:11] And if it's only going to take ~6 minutes to scap
[22:25:12] happy days
[22:25:23] i can stick around and backport it once it's ready
[22:26:05] we're down to 0 new errors in the last 15 mins
[22:26:05] (PS1) Dduvall: Specials: Add space before page tools in changes lists [core] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224207 (https://phabricator.wikimedia.org/T414031)
[22:26:06] wheee
[22:26:22] \o/
[22:46:22] (CR) Reedy: [C:+2] Specials: Add space before page tools in changes lists [core] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224207 (https://phabricator.wikimedia.org/T414031) (owner: Dduvall)
[22:46:59] (CR) TrainBranchBot: [C:+2] "Approved by dduvall@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224207 (https://phabricator.wikimedia.org/T414031) (owner: Dduvall)
[22:50:50] (Merged) jenkins-bot: Specials: Add space before page tools in changes lists [core] (wmf/1.46.0-wmf.10) - https://gerrit.wikimedia.org/r/1224207 (https://phabricator.wikimedia.org/T414031) (owner: Dduvall)
[22:51:19] !log dduvall@deploy2002 Started scap sync-world: Backport for [[gerrit:1224207|Specials: Add space before page tools in changes lists (T414031)]]
[22:51:22] T414031: Missing space before changes list action links on wmf.10 - https://phabricator.wikimedia.org/T414031
[22:53:26] !log dduvall@deploy2002 dduvall: Backport for [[gerrit:1224207|Specials: Add space before page tools in changes lists (T414031)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[22:53:49] !log dduvall@deploy2002 dduvall: Continuing with sync
[22:57:49] !log dduvall@deploy2002 Finished scap sync-world: Backport for [[gerrit:1224207|Specials: Add space before page tools in changes lists (T414031)]] (duration: 06m 30s)
[22:57:52] T414031: Missing space before changes list action links on wmf.10 - https://phabricator.wikimedia.org/T414031
[23:00:05] Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260107T2300)
[23:01:11] all done
[23:09:56] sweet
[23:45:37] (PS1) Aaron Schulz: Copy rest_v1-wikimedia.json to standard-docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/1224228 (https://phabricator.wikimedia.org/T396807)
[23:55:12] (PS1) Superpes15: [enwikiquote] Add new autopatrolled and patroller usergroups [mediawiki-config] - https://gerrit.wikimedia.org/r/1224232 (https://phabricator.wikimedia.org/T413848)
[23:56:08] (CR) CI reject: [V:-1] [enwikiquote] Add new autopatrolled and patroller usergroups [mediawiki-config] - https://gerrit.wikimedia.org/r/1224232 (https://phabricator.wikimedia.org/T413848) (owner: Superpes15)
[23:58:10] (PS2) Superpes15: [enwikiquote] Add new autopatrolled and patroller usergroups [mediawiki-config] - https://gerrit.wikimedia.org/r/1224232 (https://phabricator.wikimedia.org/T413848)