[00:00:32] !log jdlrobson@deploy1003 Started scap sync-world: Backport for [[gerrit:1285907|Skin: Correct thumbnail class (T424910)]]
[00:00:35] T424910: Limit Special:Preferences thumbnail option to three options - small, regular and large - https://phabricator.wikimedia.org/T424910
[00:02:22] !log jdlrobson@deploy1003 jdlrobson: Backport for [[gerrit:1285907|Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[00:03:25] FIRING: [25x] SystemdUnitFailed: cfssl-ocsprefresh-Wikimedia_Internal_Root_CA.service on pki1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:03:49] !log jdlrobson@deploy1003 jdlrobson: Continuing with deployment
[00:07:56] !log jdlrobson@deploy1003 Finished scap sync-world: Backport for [[gerrit:1285907|Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s)
[00:08:00] T424910: Limit Special:Preferences thumbnail option to three options - small, regular and large - https://phabricator.wikimedia.org/T424910
[00:08:25] FIRING: [50x] SystemdUnitFailed: cfssl-ocsprefresh-Wikimedia_Internal_Root_CA.service on pki1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:08:51] (PS1) Eevans: echostore: enable host verification (test) [deployment-charts] - https://gerrit.wikimedia.org/r/1285929 (https://phabricator.wikimedia.org/T425308)
[00:11:11] (CR) Eevans: [C:+2] echostore: enable host verification (test) [deployment-charts] - https://gerrit.wikimedia.org/r/1285929 (https://phabricator.wikimedia.org/T425308) (owner: Eevans)
[00:11:37] (done)
[00:12:45] (PS1) Dbrant: Add "get_login_creds" permission to Android app. [mediawiki-config] - https://gerrit.wikimedia.org/r/1285930 (https://phabricator.wikimedia.org/T426010)
[00:13:15] (Merged) jenkins-bot: echostore: enable host verification (test) [deployment-charts] - https://gerrit.wikimedia.org/r/1285929 (https://phabricator.wikimedia.org/T425308) (owner: Eevans)
[00:14:24] !log eevans@deploy1003 helmfile [staging] START helmfile.d/services/echostore: apply
[00:16:41] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:21:25] FIRING: [5x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:24:34] !log eevans@deploy1003 helmfile [staging] DONE helmfile.d/services/echostore: apply
[00:29:38] (PS1) Eevans: echostore: disable TLS host verification [deployment-charts] - https://gerrit.wikimedia.org/r/1285932 (https://phabricator.wikimedia.org/T425308)
[00:30:34] (PS2) Dbrant: Add "get_login_creds" permission to Android app. [mediawiki-config] - https://gerrit.wikimedia.org/r/1285930 (https://phabricator.wikimedia.org/T426010)
[00:31:26] FIRING: [5x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:31:55] (CR) Eevans: [C:+2] echostore: disable TLS host verification [deployment-charts] - https://gerrit.wikimedia.org/r/1285932 (https://phabricator.wikimedia.org/T425308) (owner: Eevans)
[00:34:00] (Merged) jenkins-bot: echostore: disable TLS host verification [deployment-charts] - https://gerrit.wikimedia.org/r/1285932 (https://phabricator.wikimedia.org/T425308) (owner: Eevans)
[00:35:03] !log eevans@deploy1003 helmfile [staging] START helmfile.d/services/echostore: apply
[00:35:09] !log eevans@deploy1003 helmfile [staging] DONE helmfile.d/services/echostore: apply
[00:35:50] !log eevans@deploy1003 helmfile [codfw] START helmfile.d/services/echostore: apply
[00:36:12] !log eevans@deploy1003 helmfile [codfw] DONE helmfile.d/services/echostore: apply
[00:37:23] !log eevans@deploy1003 helmfile [eqiad] START helmfile.d/services/echostore: apply
[00:37:40] !log eevans@deploy1003 helmfile [eqiad] DONE helmfile.d/services/echostore: apply
[00:38:14] (PS3) Dbrant: Add "get_login_creds" permission to Android app. [mediawiki-config] - https://gerrit.wikimedia.org/r/1285930 (https://phabricator.wikimedia.org/T426010)
[01:00:48] (PS5) Jasmine: wikikube: add wikikube-ctrl2006 [puppet] - https://gerrit.wikimedia.org/r/1249321 (https://phabricator.wikimedia.org/T406596)
[01:01:44] (PS6) Jasmine: wikikube: add wikikube-ctrl2006 [puppet] - https://gerrit.wikimedia.org/r/1249321 (https://phabricator.wikimedia.org/T406596)
[01:02:28] (CR) Jasmine: wikikube: add wikikube-ctrl2006 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1249321 (https://phabricator.wikimedia.org/T406596) (owner: Jasmine)
[01:09:13] (PS1) TrainBranchBot: Branch commit for wmf/1.47.0-wmf.2 [core] (wmf/1.47.0-wmf.2) - https://gerrit.wikimedia.org/r/1285933 (https://phabricator.wikimedia.org/T423911)
[01:09:15] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/1.47.0-wmf.2 [core] (wmf/1.47.0-wmf.2) - https://gerrit.wikimedia.org/r/1285933 (https://phabricator.wikimedia.org/T423911) (owner: TrainBranchBot)
[01:09:26] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1285934
[01:09:26] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1285934 (owner: TrainBranchBot)
[01:19:53] (Merged) jenkins-bot: Branch commit for wmf/1.47.0-wmf.2 [core] (wmf/1.47.0-wmf.2) - https://gerrit.wikimedia.org/r/1285933 (https://phabricator.wikimedia.org/T423911) (owner: TrainBranchBot)
[01:22:04] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1285934 (owner: TrainBranchBot)
[01:25:23] FIRING: [2x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in -8d 11h 30m 34s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[01:44:40] FIRING: [2x] SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:00:04] Deploy window Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260512T0200)
[02:01:02] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[02:03:52] ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#11910693 (Papaul)
[02:06:51] ops-eqsin, SRE, DC-Ops, Infrastructure-Foundations, netops: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724#11910694 (Papaul)
[02:07:40] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 06m 38s)
[02:09:22] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:13:20] PROBLEM - Druid historical on an-druid1007 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.druid.cli.Main server historical https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[02:23:40] FIRING: KubernetesRsyslogDown: rsyslog on wikikube-worker1119:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1119 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:28:40] RESOLVED: KubernetesRsyslogDown: rsyslog on wikikube-worker1119:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=wikikube-worker1119 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[02:32:20] RECOVERY - Druid historical on an-druid1007 is OK: PROCS OK: 1 process with command name java, args org.apache.druid.cli.Main server historical https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid
[02:34:22] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:51:00] FIRING: [2x] CoreBGPDown: Core BGP session down between cr3-ulsfo and asw1-23-ulsfo (198.35.26.149) - group Switch - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=ulsfo&var-device=cr3-ulsfo:9804&var-bgp_group=Switch&var-bgp_neighbor=asw1-23-ulsfo - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[03:00:04] Deploy window Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see Heterogeneous deployment/Train deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260512T0300)
[03:01:53] (PS1) TrainBranchBot: testwikis to 1.47.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1285937 (https://phabricator.wikimedia.org/T423911)
[03:01:56] (CR) TrainBranchBot: [C:+2] "Initiated by mwpresync@deploy1003" [mediawiki-config] - https://gerrit.wikimedia.org/r/1285937 (https://phabricator.wikimedia.org/T423911) (owner: TrainBranchBot)
[03:02:48] (Merged) jenkins-bot: testwikis to 1.47.0-wmf.2 [mediawiki-config] - https://gerrit.wikimedia.org/r/1285937 (https://phabricator.wikimedia.org/T423911) (owner: TrainBranchBot)
[03:03:13] !log mwpresync@deploy1003 Started scap sync-world: testwikis to 1.47.0-wmf.2 refs T423911
[03:03:17] T423911: 1.47.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T423911
[03:18:18] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2002 is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[03:28:18] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on deploy2002 is OK: Files ownership is ok. https://wikitech.wikimedia.org/wiki/Monitoring/bad_directory_owner
[03:39:49] !log mwpresync@deploy1003 Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs T423911 (duration: 36m 36s)
[03:39:53] T423911: 1.47.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T423911
[03:46:26] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:00:04] Deploy window Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260512T0400)
[04:08:40] FIRING: [49x] SystemdUnitFailed: cfssl-ocsprefresh-Wikimedia_Internal_Root_CA.service on pki1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:56:26] FIRING: [4x] SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1012:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:57:55] ops-drmrs: Alert for device asw1-b12-drmrs.mgmt.drmrs.wmnet - Port with no description on access switch - https://phabricator.wikimedia.org/T418136#11910824 (phaultfinder)
[05:10:41] FIRING: BFDdown: BFD session down between cr2-esams and fe80::ee38:7300:17e8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[05:15:41] RESOLVED: BFDdown: BFD session down between cr2-esams and fe80::ee38:7300:17e8:9c56 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown