[00:04:26] FIRING: [9x] SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:15:07] (Abandoned) Krinkle: Use Request-Timeout header to set jobrunner PHP timeouts [mediawiki-config] - https://gerrit.wikimedia.org/r/577642 (https://phabricator.wikimedia.org/T247114) (owner: Ppchelko)
[00:33:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88056 and previous config saved to /var/cache/conftool/dbconfig/20260129-003310-marostegui.json
[00:33:16] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[00:40:22] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550
[00:40:23] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[00:48:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88057 and previous config saved to /var/cache/conftool/dbconfig/20260129-004818-marostegui.json
[00:53:47] (CR) CI reject: [V:-1] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1234550 (owner: TrainBranchBot)
[01:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P88058 and previous config saved to /var/cache/conftool/dbconfig/20260129-010327-marostegui.json
[01:10:31] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) -
https://gerrit.wikimedia.org/r/1234553
[01:10:31] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:18:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2217 (T415786)', diff saved to https://phabricator.wikimedia.org/P88059 and previous config saved to /var/cache/conftool/dbconfig/20260129-011836-marostegui.json
[01:18:42] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[01:18:52] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
[01:19:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88060 and previous config saved to /var/cache/conftool/dbconfig/20260129-011900-marostegui.json
[01:29:54] PROBLEM - MariaDB Replica Lag: m2 on db1217 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2231.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:31:54] RECOVERY - MariaDB Replica Lag: m2 on db1217 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:36:11] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1234553 (owner: TrainBranchBot)
[01:41:00] PROBLEM - MariaDB Replica Lag: m2 on db2160 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 647.38 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[01:44:02] RECOVERY - MariaDB Replica Lag: m2 on db2160 is OK: OK slave_sql_lag Replication lag: 30.65 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[02:01:00] !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[02:13:44]
!log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 12m 44s)
[02:14:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88061 and previous config saved to /var/cache/conftool/dbconfig/20260129-021418-marostegui.json
[02:14:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[02:29:27] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88062 and previous config saved to /var/cache/conftool/dbconfig/20260129-022926-marostegui.json
[02:44:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P88063 and previous config saved to /var/cache/conftool/dbconfig/20260129-024435-marostegui.json
[02:59:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2224 (T415786)', diff saved to https://phabricator.wikimedia.org/P88064 and previous config saved to /var/cache/conftool/dbconfig/20260129-025943-marostegui.json
[02:59:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[03:00:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2229.codfw.wmnet with reason: Maintenance
[03:00:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88065 and previous config saved to /var/cache/conftool/dbconfig/20260129-030008-marostegui.json
[03:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:39:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:49:18] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88066 and previous config saved to /var/cache/conftool/dbconfig/20260129-034917-marostegui.json
[03:49:23] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:02:56] (PS6) Ryan Kemper: opensearch-semantic-search: provision namespaces [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:02:56] (PS2) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:04:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88067 and previous config saved to /var/cache/conftool/dbconfig/20260129-040426-marostegui.json
[04:04:41] FIRING: [8x] SystemdUnitFailed: nginx.service on urldownloader1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:08:34] (PS7) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512
(https://phabricator.wikimedia.org/T414702)
[04:08:34] (PS3) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:08:34] (PS1) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:08:36] (PS1) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS8) Ryan Kemper: opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702)
[04:18:37] (PS4) Ryan Kemper: opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691)
[04:18:37] (PS2) Ryan Kemper: opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702)
[04:18:38] (PS2) Ryan Kemper: opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691)
[04:19:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P88068 and previous config saved to /var/cache/conftool/dbconfig/20260129-041934-marostegui.json
[04:24:08] (CR) Ryan Kemper: "Should be ready for final review now" [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[04:34:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2229 (T415786)', diff saved to https://phabricator.wikimedia.org/P88069 and previous config saved
to /var/cache/conftool/dbconfig/20260129-043443-marostegui.json
[04:34:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[04:48:06] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - kubemaster_6443: Servers wikikube-ctrl2002.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[04:49:06] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[05:09:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:14:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:18:46] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:21:46] PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:22:36] RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Sat 04 Apr 2026 07:22:16 PM GMT +0000.
https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[05:33:14] PROBLEM - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is CRITICAL: 1.096e+05 gt 1e+05 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[05:34:15] FIRING: [3x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:42:42] (CR) Ryan Kemper: Replace elasticsearch lib w/ spicerack APIClient (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/1167299 (https://phabricator.wikimedia.org/T390860) (owner: Ryan Kemper)
[05:49:05] (PS8) Ryan Kemper: hadoop.reboot-workers: make host override smarter [cookbooks] - https://gerrit.wikimedia.org/r/1214664 (https://phabricator.wikimedia.org/T411568)
[06:03:30] (Abandoned) Ryan Kemper: wdqs: Add new endpoints to allowlist [puppet] - https://gerrit.wikimedia.org/r/1201296 (https://phabricator.wikimedia.org/T407407) (owner: Bking)
[06:06:37] (Abandoned) Ryan Kemper: flink-kubernetes-operator: change flink download URL [docker-images/production-images] - https://gerrit.wikimedia.org/r/1008534 (https://phabricator.wikimedia.org/T358879) (owner: Bking)
[06:16:32] (PS3) Bking: wdqs-categories: enable scrapes for jmx exporter [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236)
[06:16:50] (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:17:14] RECOVERY - Kafka MirrorMaker main-codfw_to_main-eqiad max lag in last 10 minutes on alert1002 is OK:
(C)1e+05 gt (W)1e+04 gt 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/d/000000521/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=codfw+prometheus/ops&var-mirror_name=main-codfw_to_main-eqiad
[06:17:18] (CR) Ryan Kemper: "addressed by ps2" [puppet] - https://gerrit.wikimedia.org/r/1118162 (https://phabricator.wikimedia.org/T385236) (owner: Bking)
[06:22:57] (Abandoned) Ryan Kemper: elasticsearch: move to opensearch client [software/spicerack] - https://gerrit.wikimedia.org/r/966492 (https://phabricator.wikimedia.org/T345337) (owner: David Caro)
[06:35:37] (PS1) Marostegui: Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780
[06:35:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db2212: After schema change
[06:36:41] (CR) Marostegui: [C:+2] Revert "db2212: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/1234780 (owner: Marostegui)
[06:38:06] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
[06:38:13] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1159.eqiad.wmnet with reason: Maintenance
[06:38:14] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88071 and previous config saved to /var/cache/conftool/dbconfig/20260129-063813-marostegui.json
[06:38:21] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[06:38:21] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88072 and previous config saved to /var/cache/conftool/dbconfig/20260129-063820-marostegui.json
[06:52:08] (PS1) Gerrit maintenance bot: mariadb: Promote db1173 to s6 master [puppet] -
https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861)
[06:52:38] (PS1) Gerrit maintenance bot: mariadb: Promote db2229 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234788 (https://phabricator.wikimedia.org/T415862)
[06:52:45] (PS1) Gerrit maintenance bot: wmnet: Update s6-master alias [dns] - https://gerrit.wikimedia.org/r/1234789 (https://phabricator.wikimedia.org/T415862)
[06:55:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1173 with weight 0 T415861', diff saved to https://phabricator.wikimedia.org/P88074 and previous config saved to /var/cache/conftool/dbconfig/20260129-065528-marostegui.json
[06:55:38] T415861: Switchover s6 master (db1201 -> db1173) - https://phabricator.wikimedia.org/T415861
[06:55:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s6 T415861
[06:56:02] (CR) Marostegui: [C:+2] mariadb: Promote db1173 to s6 master [puppet] - https://gerrit.wikimedia.org/r/1234787 (https://phabricator.wikimedia.org/T415861) (owner: Gerrit maintenance bot)
[06:57:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1173 to s6 primary T415861', diff saved to https://phabricator.wikimedia.org/P88075 and previous config saved to /var/cache/conftool/dbconfig/20260129-065753-marostegui.json
[06:58:39] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1201 T415861', diff saved to https://phabricator.wikimedia.org/P88076 and previous config saved to /var/cache/conftool/dbconfig/20260129-065838-marostegui.json
[06:58:48] !log Starting s6 eqiad failover from db1201 to db1173 - T415861
[06:58:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:36] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1201.eqiad.wmnet with reason: Schema change on db1201
[07:00:04] Deploy window MediaWiki infrastructure (UTC
early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700)
[07:00:04] marostegui, Amir1, and federico3: #bothumor My software never has bugs. It just develops random features. Rise for Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0700).
[07:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:17:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88078 and previous config saved to /var/cache/conftool/dbconfig/20260129-071724-marostegui.json
[07:17:31] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:21:13] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2212: After schema change
[07:32:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88080 and previous config saved to /var/cache/conftool/dbconfig/20260129-073232-marostegui.json
[07:41:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88081 and previous config saved to /var/cache/conftool/dbconfig/20260129-074130-marostegui.json
[07:41:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[07:47:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P88082 and previous config saved to /var/cache/conftool/dbconfig/20260129-074742-marostegui.json
[07:53:30] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565014 (KartikMistry) I'm still debugging, and probably best way to check with reverting original memory allocation. Patch is co...
[07:54:04] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565016 (KartikMistry) Open→In progress
[07:54:38] SRE, MinT, Prod-Kubernetes, ServiceOps new, and 3 others: Can't deploy machinetranslation due to exceeding resource quotas - https://phabricator.wikimedia.org/T411058#11565017 (KartikMistry) a:KartikMistry
[07:56:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88083 and previous config saved to /var/cache/conftool/dbconfig/20260129-075639-marostegui.json
[07:57:31] Hey folks! My name is Charlie and I found my way here from the "Get involved" page on WikiTech. I just moved on from 13 years with Puppet Labs as a tech lead on their support team. It looks like you folks are using Puppet for this and that and I would love to put my experience to work volunteering if there's anything I could help with.
[07:58:19] Also, if anyone happens to be in Belgium this weekend for Fosdem or CfgMgmtCamp next week, I would love to say hi!
[07:59:08] (CR) Brouberol: [C:+1] opensearch-semantic-search: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1230512 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[07:59:45] (CR) Brouberol: [C:+1] opensearch-semantic-search: deploy eqiad & codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234128 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[07:59:57] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: provision ns [deployment-charts] - https://gerrit.wikimedia.org/r/1234593 (https://phabricator.wikimedia.org/T414702) (owner: Ryan Kemper)
[08:00:05] Amir1, Urbanecm, and awight: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0800).
[08:00:05] No Gerrit patches in the queue for this window AFAICS.
[08:00:12] (CR) Brouberol: [C:+1] opensearch-semantic-search-test: depl eqiad, codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1234594 (https://phabricator.wikimedia.org/T414691) (owner: Ryan Kemper)
[08:01:37] csharpsteen: you'll probably have more of a chance of not getting lost in noise in #wikimedia-sre
[08:02:23] Awesome. Thanks for the pointer!
[08:02:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88084 and previous config saved to /var/cache/conftool/dbconfig/20260129-080251-marostegui.json
[08:02:58] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:03:09] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
[08:03:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
[08:03:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88085 and previous config saved to /var/cache/conftool/dbconfig/20260129-080327-marostegui.json
[08:04:26] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P88086 and previous config saved to /var/cache/conftool/dbconfig/20260129-081148-marostegui.json
[08:26:57] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88087 and previous config saved to /var/cache/conftool/dbconfig/20260129-082656-marostegui.json
[08:27:00] I have a patch to backport
[08:27:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:27:14] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00
on db2171.codfw.wmnet with reason: Maintenance
[08:27:22] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88088 and previous config saved to /var/cache/conftool/dbconfig/20260129-082722-marostegui.json
[08:30:37] (PS1) Kosta Harlan: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354)
[08:30:50] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:31:52] (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:32:54] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[08:35:16] (Merged) jenkins-bot: BlockUtils: Remove x-provenance [extensions/WikimediaEvents] (wmf/1.46.0-wmf.13) - https://gerrit.wikimedia.org/r/1234918 (https://phabricator.wikimedia.org/T415354) (owner: Kosta Harlan)
[08:36:33] !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]]
[08:36:38] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:38:55] !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1234918|BlockUtils: Remove
x-provenance (T415354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[08:40:04] !log kharlan@deploy2002 kharlan: Continuing with sync
[08:42:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88089 and previous config saved to /var/cache/conftool/dbconfig/20260129-084216-marostegui.json
[08:42:22] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[08:44:18] !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1234918|BlockUtils: Remove x-provenance (T415354)]] (duration: 07m 45s)
[08:44:23] T415354: Record CDN/Backend api values in editattemptsblocked schema - https://phabricator.wikimedia.org/T415354
[08:57:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88091 and previous config saved to /var/cache/conftool/dbconfig/20260129-085724-marostegui.json
[09:00:05] brennen and andre: Time to do the MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T0900).
[09:00:25] nah
[09:05:53] SRE, SRE-Access-Requests: Requesting access to deployment for trueg - https://phabricator.wikimedia.org/T415632#11565160 (DSantamaria) Approved!
[09:06:29] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88092 and previous config saved to /var/cache/conftool/dbconfig/20260129-090628-marostegui.json
[09:06:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[09:12:33] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P88093 and previous config saved to /var/cache/conftool/dbconfig/20260129-091232-marostegui.json
[09:21:36] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88094 and previous config saved to /var/cache/conftool/dbconfig/20260129-092135-marostegui.json
[09:27:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T415786)', diff saved to https://phabricator.wikimedia.org/P88095 and previous config saved to /var/cache/conftool/dbconfig/20260129-092741-marostegui.json
[09:27:50] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[09:27:58] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
[09:28:07] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88096 and previous config saved to /var/cache/conftool/dbconfig/20260129-092806-marostegui.json
[09:30:05] (PS1) Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - https://gerrit.wikimedia.org/r/1234927
[09:32:11] (PS2) Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877)
[09:34:15] FIRING: JobUnavailable: Reduced availability for job
thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:36:44] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P88097 and previous config saved to /var/cache/conftool/dbconfig/20260129-093644-marostegui.json
[09:41:16] (PS1) Gerrit maintenance bot: mariadb: Promote db1193 to s8 master [puppet] - https://gerrit.wikimedia.org/r/1234935 (https://phabricator.wikimedia.org/T415879)
[09:42:46] (PS1) Jgiannelos: beta: Fix duplicate definition of site.v1.json [mediawiki-config] - https://gerrit.wikimedia.org/r/1234940
[09:51:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2171 (T415786)', diff saved to https://phabricator.wikimedia.org/P88098 and previous config saved to /var/cache/conftool/dbconfig/20260129-095151-marostegui.json
[09:51:59] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[09:52:08] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
[09:52:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88099 and previous config saved to /var/cache/conftool/dbconfig/20260129-095216-marostegui.json
[10:01:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88100 and previous config saved to /var/cache/conftool/dbconfig/20260129-100158-marostegui.json
[10:02:04] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786
[10:02:30] hi everyone!
can someone help me with this problem https://phabricator.wikimedia.org/T415876 ? on it.wiki the recent deploy broke the modules/templates that handle datetime, triggered by a faulty localization on translatewiki. the changes were reverted but we don't want to wait another week to fix the problem. thanks! [10:17:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88101 and previous config saved to /var/cache/conftool/dbconfig/20260129-101706-marostegui.json [10:17:22] (03PS3) 10Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:18:06] (03PS4) 10Jgiannelos: Remove duplicate definition of site.v1.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1234927 (https://phabricator.wikimedia.org/T415877) [10:28:35] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88103 and previous config saved to /var/cache/conftool/dbconfig/20260129-102834-marostegui.json [10:28:40] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:32:16] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P88104 and previous config saved to /var/cache/conftool/dbconfig/20260129-103215-marostegui.json [10:43:43] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88105 and previous config saved to /var/cache/conftool/dbconfig/20260129-104343-marostegui.json [10:47:25] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1185 (T415786)', diff saved to https://phabricator.wikimedia.org/P88106 and previous config saved to
/var/cache/conftool/dbconfig/20260129-104723-marostegui.json [10:47:33] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [10:47:40] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance [10:47:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88107 and previous config saved to /var/cache/conftool/dbconfig/20260129-104748-marostegui.json [10:58:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P88108 and previous config saved to /var/cache/conftool/dbconfig/20260129-105851-marostegui.json [11:00:05] Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1100) [11:01:17] FIRING: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [11:04:08] FIRING: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [11:04:41] !log root@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None [11:14:00] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2178 (T415786)', diff saved to https://phabricator.wikimedia.org/P88109 and previous config saved to /var/cache/conftool/dbconfig/20260129-111359-marostegui.json 
[11:14:06] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:14:17] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2201.codfw.wmnet with reason: Maintenance [11:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [11:21:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88110 and previous config saved to /var/cache/conftool/dbconfig/20260129-112137-marostegui.json [11:21:45] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:24:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88111 and previous config saved to /var/cache/conftool/dbconfig/20260129-112437-marostegui.json [11:24:47] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [11:24:47] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [11:34:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P88112 and previous config saved to /var/cache/conftool/dbconfig/20260129-113446-marostegui.json [11:36:46] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P88113 and previous config saved to /var/cache/conftool/dbconfig/20260129-113645-marostegui.json [11:44:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226', diff 
saved to https://phabricator.wikimedia.org/P88114 and previous config saved to /var/cache/conftool/dbconfig/20260129-114455-marostegui.json [11:46:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2211.codfw.wmnet with reason: Maintenance [11:47:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88115 and previous config saved to /var/cache/conftool/dbconfig/20260129-114701-marostegui.json [11:47:07] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [11:51:55] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P88116 and previous config saved to /var/cache/conftool/dbconfig/20260129-115154-marostegui.json [11:55:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1226 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P88117 and previous config saved to /var/cache/conftool/dbconfig/20260129-115503-marostegui.json [11:55:12] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [11:55:13] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [11:55:21] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance [12:04:41] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:06:40] (03PS1) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:07:03] !log marostegui@cumin1003 dbctl commit (dc=all): 
'Repooling after maintenance db1200 (T415786)', diff saved to https://phabricator.wikimedia.org/P88118 and previous config saved to /var/cache/conftool/dbconfig/20260129-120702-marostegui.json [12:07:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:07:10] (03CR) 10CI reject: [V:04-1] gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:07:19] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance [12:07:28] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88119 and previous config saved to /var/cache/conftool/dbconfig/20260129-120727-marostegui.json [12:08:07] (03PS2) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:11:18] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7960/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:15:14] (03PS3) 10Jelto: gitlab: set qos to low in rsync server [puppet] - 10https://gerrit.wikimedia.org/r/1234984 [12:17:52] (03CR) 10Jelto: [V:03+1] "PCC SUCCESS (CORE_DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7961/co" [puppet] - 10https://gerrit.wikimedia.org/r/1234984 (owner: 10Jelto) [12:21:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88120 and previous config saved to /var/cache/conftool/dbconfig/20260129-122138-marostegui.json [12:21:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router 
interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [12:36:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88121 and previous config saved to /var/cache/conftool/dbconfig/20260129-123630-marostegui.json [12:36:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [12:36:49] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P88122 and previous config saved to /var/cache/conftool/dbconfig/20260129-123647-marostegui.json [12:51:38] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P88123 and previous config saved to /var/cache/conftool/dbconfig/20260129-125138-marostegui.json [12:51:47] RESOLVED: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:51:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P88124 and previous config saved to /var/cache/conftool/dbconfig/20260129-125157-marostegui.json [12:53:13] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1193 with weight 0 T415879', diff saved to https://phabricator.wikimedia.org/P88125 and previous config saved to /var/cache/conftool/dbconfig/20260129-125312-marostegui.json [12:53:23] T415879: Switchover s8 master (db1209 -> 
db1193) - https://phabricator.wikimedia.org/T415879 [12:53:28] (03CR) 10Marostegui: [C:03+2] mariadb: Promote db1193 to s8 master [puppet] - 10https://gerrit.wikimedia.org/r/1234935 (https://phabricator.wikimedia.org/T415879) (owner: 10Gerrit maintenance bot) [12:53:41] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T415879 [12:56:34] !log Starting s8 eqiad failover from db1209 to db1193 - T415879 [12:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1193 to s8 primary T415879', diff saved to https://phabricator.wikimedia.org/P88126 and previous config saved to /var/cache/conftool/dbconfig/20260129-125700-marostegui.json [12:57:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1209 T415879', diff saved to https://phabricator.wikimedia.org/P88127 and previous config saved to /var/cache/conftool/dbconfig/20260129-125739-marostegui.json [12:58:51] (03PS2) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod [puppet] - 10https://gerrit.wikimedia.org/r/1227851 (https://phabricator.wikimedia.org/T414112) [13:00:05] Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1300) [13:00:21] (03PS1) 10Marostegui: db1209: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1235001 [13:00:48] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1209.eqiad.wmnet with reason: long schema change on db1209 [13:00:59] (03CR) 10Marostegui: [C:03+2] db1209: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1235001 (owner: 10Marostegui) [13:02:07] (03PS3) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod [puppet] - 10https://gerrit.wikimedia.org/r/1227851 
(https://phabricator.wikimedia.org/T414112) [13:02:50] !log Run schema change on old s8 eqiad master (db1209) T411164 T411163 [13:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:58] T411164: Drop rev_sha1 from revision table in wmf production - https://phabricator.wikimedia.org/T411164 [13:02:58] T411163: Drop ar_sha1 from archive table in wmf production - https://phabricator.wikimedia.org/T411163 [13:06:04] (03CR) 10Federico Ceratto: service, trafficserver: Prepare "linked-artifacts" k8s pod (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1227851 (https://phabricator.wikimedia.org/T414112) (owner: 10Federico Ceratto) [13:06:47] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P88129 and previous config saved to /var/cache/conftool/dbconfig/20260129-130646-marostegui.json [13:07:06] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2211 (T415786)', diff saved to https://phabricator.wikimedia.org/P88130 and previous config saved to /var/cache/conftool/dbconfig/20260129-130705-marostegui.json [13:07:11] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:07:24] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2213.codfw.wmnet with reason: Maintenance [13:07:32] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88131 and previous config saved to /var/cache/conftool/dbconfig/20260129-130731-marostegui.json [13:21:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1207 (T415786)', diff saved to https://phabricator.wikimedia.org/P88132 and previous config saved to /var/cache/conftool/dbconfig/20260129-132154-marostegui.json [13:22:04] T415786: Update imagelinks primary key on wmf production - 
https://phabricator.wikimedia.org/T415786 [13:22:15] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance [13:27:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1008.eqiad.wmnet with reason: long schema change [13:28:57] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1245.eqiad.wmnet with reason: long schema change [13:34:15] FIRING: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [13:41:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88133 and previous config saved to /var/cache/conftool/dbconfig/20260129-134153-marostegui.json [13:42:00] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:48:20] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db1230 to s5 master [puppet] - 10https://gerrit.wikimedia.org/r/1235032 (https://phabricator.wikimedia.org/T415893) [13:54:54] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance [13:55:03] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1230 (T415786)', diff saved to https://phabricator.wikimedia.org/P88134 and previous config saved to /var/cache/conftool/dbconfig/20260129-135501-marostegui.json [13:55:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [13:57:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to 
https://phabricator.wikimedia.org/P88135 and previous config saved to /var/cache/conftool/dbconfig/20260129-135702-marostegui.json [14:00:05] Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1400). [14:00:05] No Gerrit patches in the queue for this window AFAICS. [14:00:32] o/ [14:00:40] Lucas_WMDE: if I got a patch together, do you think you'd be willing/able to deploy the backport requested in T415876? [14:00:41] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:00:53] (would just be reverting the changes to datetime/it.json IIUC) [14:01:23] can’t they override that in the MediaWiki: namespace to unbreak their wiki? [14:01:25] but yeah sure [14:01:38] ty :] [14:01:59] given that the changes seem to have been reverted on twn already [14:02:02] i think they have actually (which might make the backport a bit hard to test...)
but I guess they might want to stop doing that sooner rather than later [14:02:14] (re: MediaWiki: namespace changes) [14:02:19] (03PS1) 10Kareid: Test Kitchen UI: Deploy v.1.1.7 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) [14:02:28] ack [14:02:45] ETA 2 mins on patch [14:04:26] FIRING: [9x] SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:04:30] (03PS1) 10A smart kitten: Update Italian datetime messages (from https://translatewiki.net) [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) [14:05:19] (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, January 29 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:05:20] * Lucas_WMDE opens deployment calendar, misreads Test Kitchen as Test Kitten [14:05:42] let's rename it again! [14:05:50] rename ALL the things!
[14:07:12] patch should be ready now; i guess, given the CI success caching, it might be quicker to actually wait for the main test to pass before +2ing it for deployment [14:07:14] * Lucas_WMDE reviews the messages on twn [14:08:46] !log Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/fixFunctionTesterImplementationIssues.php --wiki=wikifunctionswiki` for T399934 [14:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:50] T399934: tests moved to a different function still show on implementations of the original - https://phabricator.wikimedia.org/T399934 [14:09:28] (03CR) 10Lucas Werkmeister (WMDE): [C:03+1] "I checked on TWN that all of these messages match the current on-wiki revision." [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:09:39] (03CR) 10TrainBranchBot: [C:03+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:09:45] * Lucas_WMDE checks wmf.12 [14:10:21] ok everything’s still lowercase there, no further backport needed [14:10:28] (I guess otherwise they would’ve noticed the error sooner anyway) [14:11:10] Yeah, FWICS https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1229450 was first included in wmf.13 [14:12:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P88137 and previous config saved to /var/cache/conftool/dbconfig/20260129-141210-marostegui.json [14:13:01] hm, https://it.wikipedia.org/w/index.php?title=MediaWiki:January&action=history and https://it.wikipedia.org/w/index.php?title=MediaWiki:Jan&action=history both exist since 2005 [14:13:17] ok but https://it.wikipedia.org/wiki/MediaWiki:Monday doesn’t [14:13:33] they were undeleted today FWICS 
[14:13:36] (I also got caught out by that) [14:13:41] ah, I couldn’t see that ^^ [14:14:17] oh, I just realized this will be one of those super long deployments /o\ [14:14:20] due to the touched l10n cache [14:14:23] but no way around it [14:15:02] oh yeah, apologies :/ (and apologies also for the VERY late-notice request) [14:18:53] RESOLVED: [2x] CertAlmostExpired: Certificate for service titan2001:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#titan2001:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:19:15] RESOLVED: JobUnavailable: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:22:54] (FWIW, my perspective on the request that was made in T415876 is something like: the i18n change being reverted here (that's been objected to by folks on the wiki) is a very recent one, AFAICS (from looking at TWN history pages) the previous versions of those messages have been that way since at least 2008, and the updated message would presumably be deployed by next-week's train anyway) [14:22:55] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:23:22] (03Merged) 10jenkins-bot: Update Italian datetime messages (from https://translatewiki.net) [core] (wmf/1.46.0-wmf.13) - 10https://gerrit.wikimedia.org/r/1235035 (https://phabricator.wikimedia.org/T415876) (owner: 10A smart kitten) [14:23:56] !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] [14:24:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1230 (T415786)', diff saved to 
https://phabricator.wikimedia.org/P88138 and previous config saved to /var/cache/conftool/dbconfig/20260129-142428-marostegui.json [14:24:35] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [14:26:24] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, asmartkitten: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [14:26:41] looking [14:26:45] thanks [14:27:04] and yeah IMHO an out-of-cadence sync with TWN is fine; if the messages hadn’t been reverted on TWN already then I’d be more hesitant about the deploy [14:27:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Set db1230 with weight 0 T415893', diff saved to https://phabricator.wikimedia.org/P88139 and previous config saved to /var/cache/conftool/dbconfig/20260129-142716-marostegui.json [14:27:22] T415893: Switchover s5 master (db1210 -> db1230) - https://phabricator.wikimedia.org/T415893 [14:27:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2213 (T415786)', diff saved to https://phabricator.wikimedia.org/P88140 and previous config saved to /var/cache/conftool/dbconfig/20260129-142725-marostegui.json [14:27:32] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2223.codfw.wmnet with reason: Maintenance [14:27:38] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T415893 [14:27:40] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88141 and previous config saved to /var/cache/conftool/dbconfig/20260129-142740-marostegui.json [14:27:48] (03CR) 10Marostegui: [C:03+2] mariadb: Promote db1230 to s5 master [puppet] - 
10https://gerrit.wikimedia.org/r/1235032 (https://phabricator.wikimedia.org/T415893) (owner: 10Gerrit maintenance bot) [14:27:53] well, https://it.wikivoyage.org/wiki/MediaWiki:January has the new lowercase-first version of the message on mwdebug [14:28:34] and so as I might not be able to test on itwiki itself (as they've overridden their messages temporarily), I'd personally call that okay I think [14:29:06] !log Starting s5 eqiad failover from db1210 to db1230 - T415893 [14:29:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Promote db1230 to s5 primary T415893', diff saved to https://phabricator.wikimedia.org/P88142 and previous config saved to /var/cache/conftool/dbconfig/20260129-142953-marostegui.json [14:30:30] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1210 T415893', diff saved to https://phabricator.wikimedia.org/P88143 and previous config saved to /var/cache/conftool/dbconfig/20260129-143030-marostegui.json [14:31:34] Lucas_WMDE: ^ [14:31:59] sorry, got distracted for a second [14:32:07] no worries :) [14:32:12] so long as everything seems okay your end [14:32:20] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1210.eqiad.wmnet with reason: Long schema change [14:32:26] !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, asmartkitten: Continuing with sync [14:32:29] yeah let’s go [14:35:11] FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:36:35] !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1235035|Update Italian datetime messages (from https://translatewiki.net) (T415876)]] (duration: 12m 
39s) [14:36:41] :o [14:36:42] T415876: Revert datetime it localization on it.wiki - https://phabricator.wikimedia.org/T415876 [14:36:47] that’s a lot faster than I expected [14:37:04] same here! [14:37:12] “17 languages rebuilt out of 545” [14:37:26] I wonder if it was faster because it didn’t affect en.json, and therefore didn’t change the many languages that end up copying the English message [14:38:24] thanks again for deploying Lucas_WMDE :) I will try and give a fair amount more notice the next time I ask if something can be deployed... [14:38:33] np, thanks for the backport :) [14:38:37] !log UTC afternoon backport+config window done [14:38:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:15] RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [14:45:31] (03CR) 10Clare Ming: [C:03+2] "looks good !" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) (owner: 10Kareid) [14:47:17] (03Merged) 10jenkins-bot: Test Kitchen UI: Deploy v.1.1.7 release to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1235033 (https://phabricator.wikimedia.org/T415325) (owner: 10Kareid) [15:01:13] 10ops-codfw, 06SRE, 10SRE-swift-storage, 06DC-Ops, 06Infrastructure-Foundations: DHCP failing for at least 2 ms-be servers in codfw - https://phabricator.wikimedia.org/T415189#11566106 (10cmooney) >>! In T415189#11546923, @jhathaway wrote: > Perhaps there is a race condition with that script updating the... 
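(Annotation on the 14:37:12 to 14:37:26 exchange: the guess there is that the sync was quick because only it.json changed, so only languages that fall back to Italian needed a rebuild, whereas a change to en.json would touch nearly every language. The sketch below is a toy model of that guess only, not how scap or MediaWiki's localisation cache actually decides what to rebuild. The fallback table, the language list, and both function names are hypothetical and chosen for illustration.)

```python
def fallback_chain(lang, fallbacks):
    """Return lang followed by each language it falls back to, in order."""
    chain = [lang]
    while chain[-1] in fallbacks:
        nxt = fallbacks[chain[-1]]
        if nxt in chain:  # guard against a cyclic table
            break
        chain.append(nxt)
    return chain


def languages_to_rebuild(changed_lang, all_langs, fallbacks):
    """Languages whose fallback chain includes the changed language."""
    return sorted(
        lang for lang in all_langs
        if changed_lang in fallback_chain(lang, fallbacks)
    )


# Hypothetical, heavily reduced fallback table (assumption, not real config).
FALLBACKS = {"it": "en", "scn": "it", "vec": "it",
             "de": "en", "bar": "de", "fr": "en"}
LANGS = ["en", "it", "scn", "vec", "de", "bar", "fr"]

print(languages_to_rebuild("it", LANGS, FALLBACKS))
print(languages_to_rebuild("en", LANGS, FALLBACKS))
```

(Under this model a change to "it" touches only the Italian subtree, while a change to "en" touches every language in the table, which is the asymmetry being speculated about in the chat.)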
[15:01:58] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88144 and previous config saved to /var/cache/conftool/dbconfig/20260129-150156-marostegui.json [15:02:05] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:08:37] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance [15:08:45] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [15:08:53] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88145 and previous config saved to /var/cache/conftool/dbconfig/20260129-150852-marostegui.json [15:09:03] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:09:15] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:09:53] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1210: After schema change [15:10:44] !log marostegui@cumin1003 START - Cookbook sre.mysql.newpool pool db1201: After schema change [15:17:08] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P88148 and previous config saved to /var/cache/conftool/dbconfig/20260129-151705-marostegui.json [15:18:01] (03PS1) 10Gerrit maintenance bot: mariadb: Promote db2213 to s5 master [puppet] - 10https://gerrit.wikimedia.org/r/1235054 (https://phabricator.wikimedia.org/T415900) 
[15:18:11] (PS1) Gerrit maintenance bot: wmnet: Update s5-master alias [dns] - https://gerrit.wikimedia.org/r/1235055 (https://phabricator.wikimedia.org/T415900) [15:19:15] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate eventstreams-internal.discovery.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire [15:32:17] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P88151 and previous config saved to /var/cache/conftool/dbconfig/20260129-153215-marostegui.json [15:34:15] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:47:26] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2223 (T415786)', diff saved to https://phabricator.wikimedia.org/P88154 and previous config saved to /var/cache/conftool/dbconfig/20260129-154725-marostegui.json [15:47:38] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [15:47:44] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2228.codfw.wmnet with reason: Maintenance [15:47:54] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88155 and previous config saved to /var/cache/conftool/dbconfig/20260129-154751-marostegui.json [15:55:24] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1210: After schema change [15:56:15] !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1201: After schema change
[16:17:42] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88158 and previous config saved to /var/cache/conftool/dbconfig/20260129-161741-marostegui.json [16:17:49] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [16:18:18] (PS1) Dzahn: admin/nagios/wmcs: offboard akosiaris [puppet] - https://gerrit.wikimedia.org/r/1235066 [16:32:52] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P88159 and previous config saved to /var/cache/conftool/dbconfig/20260129-163250-marostegui.json [16:33:09] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [16:33:38] (PS1) Dzahn: cumin/insetup_role_report: send wmcs report to Mark and Levi, not Alex [puppet] - https://gerrit.wikimedia.org/r/1235071 [16:33:52] !log dancy@deploy2002 Installing scap version "4.240.0" for 2 host(s) [16:35:43] !log dancy@deploy2002 Installation of scap version "4.240.0" completed for 2 hosts [16:48:01] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P88160 and previous config saved to /var/cache/conftool/dbconfig/20260129-164800-marostegui.json [16:54:50] (PS1) Bking: apt: mirror opensearch 2 and 3 repos in trixie-wikimedia [puppet] - https://gerrit.wikimedia.org/r/1235075 (https://phabricator.wikimedia.org/T415699) [16:57:02] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88161 and previous config saved to
/var/cache/conftool/dbconfig/20260129-165701-marostegui.json [16:57:09] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:00:05] jhathaway and rzl: That opportune time for a Puppet request window deploy is upon us again. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260129T1700). [17:00:05] No Gerrit patches in the queue for this window AFAICS. [17:03:09] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2228 (T415786)', diff saved to https://phabricator.wikimedia.org/P88162 and previous config saved to /var/cache/conftool/dbconfig/20260129-170308-marostegui.json [17:03:14] T415786: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786 [17:04:53] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance [17:05:04] !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2148 (T415786)', diff saved to https://phabricator.wikimedia.org/P88163 and previous config saved to /var/cache/conftool/dbconfig/20260129-170501-marostegui.json [17:12:11] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P88164 and previous config saved to /var/cache/conftool/dbconfig/20260129-171210-marostegui.json [17:27:19] !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P88166 and previous config saved to /var/cache/conftool/dbconfig/20260129-172718-marostegui.json