[00:08:33] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1161078 [00:08:34] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1161078 (owner: TrainBranchBot) [00:11:25] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2038 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:12:07] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2033 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:12:29] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4051 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:13:09] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4042 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:14:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1108 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:14:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2037 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:15:49] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2029 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:16:17] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1104 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled 
OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:17:25] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2032 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:17:29] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2039 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:17:41] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1113 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:20:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4038 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:20:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2040 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:20:41] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2036 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:21:33] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1114 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:22:45] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4043 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:23:43] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2034 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP 
response https://wikitech.wikimedia.org/wiki/HTTPS [00:25:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [00:25:11] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1103 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:26:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2031 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:26:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:26:09] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4037 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:26:37] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1105 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:27:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2042 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:27:07] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2030 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:27:25] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4047 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS 
[00:28:03] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1110 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:28:17] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1106 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:29:13] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1111 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:29:21] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4046 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:30:03] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1102 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:30:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [00:30:39] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4052 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:31:09] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4048 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:31:21] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4039 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:31:21] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on 
cp4041 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:32:07] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2035 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:32:13] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4044 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:33:27] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4040 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:34:25] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1115 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:34:41] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1100 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:35:37] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1109 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:35:37] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4050 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:35:51] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1112 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:36:25] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4049 is 
CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:37:05] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2028 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:37:17] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1107 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:38:39] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp2041 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:39:39] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp1101 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:40:55] PROBLEM - HAProxy HTTPS wikipedia.org ECDSA on cp4045 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:did not receive the required stapled OCSP response https://wikitech.wikimedia.org/wiki/HTTPS [00:49:50] ops-eqiad, DC-Ops: Alert for device ps1-b4-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T397386 (phaultfinder) NEW [00:50:24] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1161078 (owner: TrainBranchBot) [01:06:41] PROBLEM - Disk space on releases1003 is CRITICAL: DISK CRITICAL - /srv/docker/overlay2/adc96fa92c4576fd1d55056bda08bfc99ab0a4cc07015c81f7840ab837be82a7/merged is not accessible: Permission denied https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [01:24:49] ops-eqiad, DC-Ops: 
Alert for device ps1-b4-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T397386#10930318 (phaultfinder) [01:26:41] RECOVERY - Disk space on releases1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=releases1003&var-datasource=eqiad+prometheus/ops [01:39:57] ops-eqiad, SRE, DC-Ops: Alert for device ps1-b4-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T397386#10930322 (phaultfinder) [01:59:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:05:49] (PS1) Dzahn: zuul::executor: add zuul user and nodepool ssh private key [puppet] - https://gerrit.wikimedia.org/r/1161090 (https://phabricator.wikimedia.org/T395938) [02:12:44] (PS1) Dzahn: secrets: add fake SSH private key for zuul [labs/private] - https://gerrit.wikimedia.org/r/1161093 (https://phabricator.wikimedia.org/T395938) [02:15:43] (CR) Dzahn: [V:+2 C:+2] secrets: add fake SSH private key for zuul [labs/private] - https://gerrit.wikimedia.org/r/1161093 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn) [02:15:51] (PS2) Dzahn: secrets: add fake SSH private key for zuul [labs/private] - https://gerrit.wikimedia.org/r/1161093 (https://phabricator.wikimedia.org/T395938) [02:15:55] (CR) Dzahn: [V:+2] secrets: add fake SSH private key for zuul [labs/private] - https://gerrit.wikimedia.org/r/1161093 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn) [02:22:36] (CR) Dzahn: "https://gerrit.wikimedia.org/r/c/labs/private/+/1161093" [puppet] - https://gerrit.wikimedia.org/r/1161090 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn) [02:24:18] (PS2) Dzahn: 
zuul::executor: add zuul user and nodepool ssh private key [puppet] - https://gerrit.wikimedia.org/r/1161090 (https://phabricator.wikimedia.org/T395938) [02:25:23] (CR) Dzahn: [V:+1 C:+2] "https://puppet-compiler.wmflabs.org/output/1161090/6023/zuul1002.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/1161090 (https://phabricator.wikimedia.org/T395938) (owner: Dzahn) [02:34:58] ops-eqiad, SRE, DC-Ops: Alert for device ps1-b4-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T397386#10930339 (phaultfinder) [03:14:58] (PS1) Andrew Bogott: Openstack [service_user] config: use internal endpoint for service users [puppet] - https://gerrit.wikimedia.org/r/1161113 (https://phabricator.wikimedia.org/T330759) [03:15:00] (PS1) Andrew Bogott: Openstack [keystone_authtoken]: remove auth_url setting [puppet] - https://gerrit.wikimedia.org/r/1161114 (https://phabricator.wikimedia.org/T330759) [03:15:01] (PS1) Andrew Bogott: cinder: use 'cinder' service user rather than 'novaadmin' [puppet] - https://gerrit.wikimedia.org/r/1161115 (https://phabricator.wikimedia.org/T330759) [03:15:17] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1161115 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [03:18:29] FIRING: [2x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:21:37] (PS1) Andrew Bogott: Comment back in cinder ldap passwords [labs/private] - https://gerrit.wikimedia.org/r/1161116 [03:22:02] (CR) Andrew Bogott: [V:+2 C:+2] Comment back in cinder ldap passwords [labs/private] - https://gerrit.wikimedia.org/r/1161116 (owner: Andrew Bogott) [03:22:57] (PS2) Andrew Bogott: cinder: use 'cinder' service 
user rather than 'novaadmin'. [puppet] - https://gerrit.wikimedia.org/r/1161115 (https://phabricator.wikimedia.org/T330759) [03:23:04] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1161115 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [03:30:21] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1161113 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [03:30:28] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1161114 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [03:42:01] (PS2) Andrew Bogott: Openstack [service_user] config: use internal endpoint for service users [puppet] - https://gerrit.wikimedia.org/r/1161113 (https://phabricator.wikimedia.org/T330759) [03:42:01] (PS2) Andrew Bogott: Openstack [keystone_authtoken]: remove auth_url setting [puppet] - https://gerrit.wikimedia.org/r/1161114 (https://phabricator.wikimedia.org/T330759) [03:42:01] (PS3) Andrew Bogott: cinder: use 'cinder' service user rather than 'novaadmin' [puppet] - https://gerrit.wikimedia.org/r/1161115 (https://phabricator.wikimedia.org/T330759) [03:42:18] (CR) Andrew Bogott: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1161113 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [03:45:39] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [03:59:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:20:19] (CR) Andrew Bogott: [C:+2] Openstack [service_user] config: use internal endpoint for service 
users [puppet] - https://gerrit.wikimedia.org/r/1161113 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [04:20:31] (CR) Andrew Bogott: [C:+2] Openstack [keystone_authtoken]: remove auth_url setting [puppet] - https://gerrit.wikimedia.org/r/1161114 (https://phabricator.wikimedia.org/T330759) (owner: Andrew Bogott) [04:22:17] RECOVERY - Wikitech and wt-static content in sync on wikitech-static.wikimedia.org is OK: wikitech-static OK - wikitech and wikitech-static in sync (0 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [04:24:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:26:17] PROBLEM - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:27:17] RECOVERY - nova-compute proc minimum on cloudvirt1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:28:35] PROBLEM - nova-compute proc minimum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:29:04] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:29:35] RECOVERY - nova-compute proc minimum on cloudvirt1071 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:30:03] PROBLEM - nova-compute proc minimum on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* 
/usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:30:17] PROBLEM - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:31:03] RECOVERY - nova-compute proc minimum on cloudvirt1073 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:31:17] RECOVERY - nova-compute proc minimum on cloudvirt1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:33:37] PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:33:38] PROBLEM - nova-compute proc minimum on cloudvirt1067 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:33:56] (PS1) Andrew Bogott: Revert "Openstack [keystone_authtoken]: remove auth_url setting" [puppet] - https://gerrit.wikimedia.org/r/1161148 [04:33:58] (PS1) Andrew Bogott: Revert "Openstack [service_user] config: use internal endpoint f..." 
[puppet] - https://gerrit.wikimedia.org/r/1161149 [04:34:03] PROBLEM - nova-compute proc minimum on cloudvirt1044 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:34:04] FIRING: [2x] ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [04:34:37] RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:34:37] RECOVERY - nova-compute proc minimum on cloudvirt1067 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:34:42] (Abandoned) Andrew Bogott: Revert "Openstack [service_user] config: use internal endpoint f..." 
[puppet] - https://gerrit.wikimedia.org/r/1161149 (owner: Andrew Bogott) [04:34:53] (CR) Andrew Bogott: [C:+2] Revert "Openstack [keystone_authtoken]: remove auth_url setting" [puppet] - https://gerrit.wikimedia.org/r/1161148 (owner: Andrew Bogott) [04:35:03] RECOVERY - nova-compute proc minimum on cloudvirt1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:36:23] PROBLEM - nova-compute proc minimum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:37:23] RECOVERY - nova-compute proc minimum on cloudvirt1056 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [04:59:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [05:01:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:01:48] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [05:05:24] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance [05:06:42] FIRING: JobUnavailable: Reduced availability 
for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [05:07:26] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool pc2 T378715', diff saved to https://phabricator.wikimedia.org/P78392 and previous config saved to /var/cache/conftool/dbconfig/20250619-050725-root.json [05:07:31] T378715: Possibility to transition some codfw data persistence hosts to 10G - https://phabricator.wikimedia.org/T378715 [05:08:45] !log marostegui@cumin1003 DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1012.eqiad.wmnet with reason: Maintenance [05:09:02] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance [05:24:24] (PS1) Marostegui: db2186: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1161179 (https://phabricator.wikimedia.org/T397279) [05:26:39] (PS1) KartikMistry: Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek [mediawiki-config] - https://gerrit.wikimedia.org/r/1161182 [05:27:49] (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, June 19 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-" [mediawiki-config] - https://gerrit.wikimedia.org/r/1161182 (owner: KartikMistry) [05:34:26] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance [05:34:34] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78393 and previous config saved to /var/cache/conftool/dbconfig/20250619-053433-marostegui.json [05:34:39] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - 
https://phabricator.wikimedia.org/T396130 [05:38:02] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2186.codfw.wmnet with reason: Maintenance [05:38:27] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2186', diff saved to https://phabricator.wikimedia.org/P78394 and previous config saved to /var/cache/conftool/dbconfig/20250619-053826-root.json [05:40:09] (CR) Marostegui: [C:+2] db2186: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1161179 (https://phabricator.wikimedia.org/T397279) (owner: Marostegui) [05:44:19] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78395 and previous config saved to /var/cache/conftool/dbconfig/20250619-054418-root.json [05:50:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [05:55:23] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78396 and previous config saved to /var/cache/conftool/dbconfig/20250619-055522-marostegui.json [05:55:28] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [05:55:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [05:59:25] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78397 and previous config saved to /var/cache/conftool/dbconfig/20250619-055924-root.json [06:00:05] Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0600) [06:00:05] marostegui, Amir1, and 
federico3: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Primary database switchover . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0600). [06:04:35] <_joe_> criung [06:04:38] <_joe_> *cringe [06:04:47] <_joe_> jouncebot: cringe [06:06:42] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [06:10:30] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78398 and previous config saved to /var/cache/conftool/dbconfig/20250619-061030-marostegui.json [06:14:31] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78399 and previous config saved to /var/cache/conftool/dbconfig/20250619-061430-root.json [06:19:51] FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqord:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [06:21:32] (PS1) Giuseppe Lavagetto: New deployment, including new api endpoints [software/hiddenparma/deploy] - https://gerrit.wikimedia.org/r/1161205 [06:24:51] FIRING: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown 
[06:25:38] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78400 and previous config saved to /var/cache/conftool/dbconfig/20250619-062537-marostegui.json [06:26:05] PROBLEM - Hadoop DataNode on an-worker1154 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_Datanode_process [06:27:05] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [06:29:36] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78401 and previous config saved to /var/cache/conftool/dbconfig/20250619-062936-root.json [06:37:59] !log stevemunene@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1149-1153].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 10 [06:38:05] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 10 - multiple racks - singletons) - https://phabricator.wikimedia.org/T390178#10930469 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0e9f957a-66ba-4353-b48... [06:38:23] !log stevemunene@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1175.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 10 [06:38:35] ops-eqiad, SRE, DC-Ops, Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 10 - multiple racks - singletons) - https://phabricator.wikimedia.org/T390178#10930470 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=9df86b9e-4cff-46c4-970... 
[06:39:16] !log stevemunene@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1154.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9 [06:39:22] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 9 - rack E3) - https://phabricator.wikimedia.org/T390176#10930471 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=df7b706d-9914-413e-aa5e-5dc80159cf57) set b... [06:39:42] !log stevemunene@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1176.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9 [06:39:52] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 9 - rack E3) - https://phabricator.wikimedia.org/T390176#10930472 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bc400f30-590d-486b-89e0-ab54c7fac73e) set b... 
[06:40:45] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78402 and previous config saved to /var/cache/conftool/dbconfig/20250619-064045-marostegui.json [06:40:50] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [06:41:00] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance [06:41:09] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78403 and previous config saved to /var/cache/conftool/dbconfig/20250619-064108-marostegui.json [06:41:24] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 10 - multiple racks - singletons) - https://phabricator.wikimedia.org/T390178#10930478 (10Stevemunene) the hosts are finally done draining and are listed as decommissioned {F62... [06:41:45] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 10 - multiple racks - singletons) - https://phabricator.wikimedia.org/T390178#10930479 (10Stevemunene) [06:42:20] 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops: db1246 crashed yet again - https://phabricator.wikimedia.org/T393296#10930480 (10Marostegui) Thank you - from puppet side this host is ready to be installed and reimaged. [07:00:05] Amir1, Urbanecm, and awight: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0700). nyaa~ [07:00:05] georgekyz and kart_: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[07:00:28] I am ready for deploy [07:00:46] here [07:00:54] ping me when done georgekyz [07:01:47] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78404 and previous config saved to /var/cache/conftool/dbconfig/20250619-070146-marostegui.json [07:01:52] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [07:02:04] starting [07:02:14] (03CR) 10TrainBranchBot: [C:03+2] "Approved by gkyziridis@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1160797 (https://phabricator.wikimedia.org/T395824) (owner: 10Gkyziridis) [07:03:01] (03Merged) 10jenkins-bot: ores-extension: enable extension with revertrisk filter for azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1160797 (https://phabricator.wikimedia.org/T395824) (owner: 10Gkyziridis) [07:04:04] !log gkyziridis@deploy1003 Started scap sync-world: Backport for [[gerrit:1160797|ores-extension: enable extension with revertrisk filter for azwiki (T395824)]] [07:04:08] T395824: [batch #3] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395824 [07:04:30] (03PS1) 10Muehlenhoff: Make ganeti2047/ganeti2048 Ganeti nodes [puppet] - 10https://gerrit.wikimedia.org/r/1161335 (https://phabricator.wikimedia.org/T396590) [07:04:50] !log installing edk2 security updates [07:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:28] !log gkyziridis@deploy1003 gkyziridis: Backport for [[gerrit:1160797|ores-extension: enable extension with revertrisk filter for azwiki (T395824)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[07:07:54] (03CR) 10Muehlenhoff: [C:03+2] Make ganeti2047/ganeti2048 Ganeti nodes [puppet] - 10https://gerrit.wikimedia.org/r/1161335 (https://phabricator.wikimedia.org/T396590) (owner: 10Muehlenhoff) [07:08:59] !log gkyziridis@deploy1003 gkyziridis: Continuing with sync [07:12:05] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [07:15:25] (03PS3) 10Aqu: Airflow analytics-test: Optimization for LocalExecutors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) [07:15:55] !log gkyziridis@deploy1003 Finished scap sync-world: Backport for [[gerrit:1160797|ores-extension: enable extension with revertrisk filter for azwiki (T395824)]] (duration: 11m 50s) [07:16:00] T395824: [batch #3] Enable revertrisk filters in recent changes in multiple wikis - https://phabricator.wikimedia.org/T395824 [07:16:32] I finished with the deployment, feel free to proceed. 
[07:16:38] thnx for being around [07:16:43] FIRING: [5x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:16:54] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78405 and previous config saved to /var/cache/conftool/dbconfig/20250619-071654-marostegui.json [07:17:34] (03CR) 10CI reject: [V:04-1] Airflow analytics-test: Optimization for LocalExecutors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) (owner: 10Aqu) [07:18:09] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet [07:18:29] FIRING: [2x] SystemdUnitFailed: docker-reporter-k8s-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:18:30] thanks georgekyz [07:18:58] (03CR) 10TrainBranchBot: [C:03+2] "Approved by kartik@deploy1003 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1161182 (owner: 10KartikMistry) [07:19:44] (03Merged) 10jenkins-bot: Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1161182 (owner: 10KartikMistry) [07:19:55] (03PS1) 10Ayounsi: sre.network.tls: add timeout to get_server_certificate [cookbooks] - 10https://gerrit.wikimedia.org/r/1161337 [07:20:12] !log kartik@deploy1003 Started scap sync-world: Backport for [[gerrit:1161182|Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek]] [07:20:15] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases 
routed via main (k8s) 1.27s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:21:22] (03CR) 10Brouberol: [C:03+1] Prepare for renaming kafka-stretch200[1-2] to dse-k8s-worker200[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/1160888 (https://phabricator.wikimedia.org/T353789) (owner: 10Btullis) [07:21:43] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:22:13] (03CR) 10Brouberol: Airflow: Use a python value for the xcom_sidecar resource settings (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: 10Btullis) [07:22:28] !log kartik@deploy1003 kartik: Backport for [[gerrit:1161182|Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
[07:24:22] !log kartik@deploy1003 kartik: Continuing with sync [07:25:00] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1164,1217,1250].eqiad.wmnet with reason: Primary switchover m2 T397182 [07:25:05] T397182: Switchover m2 master db1164 -> db1250 - https://phabricator.wikimedia.org/T397182 [07:25:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.27s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:25:19] !log slyngshede@cumin1002 START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300 [07:25:24] T397300: Upgrade Netbox to version 4.0.11 - https://phabricator.wikimedia.org/T397300 [07:25:30] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet [07:25:52] (03CR) 10CI reject: [V:04-1] sre.network.tls: add timeout to get_server_certificate [cookbooks] - 10https://gerrit.wikimedia.org/r/1161337 (owner: 10Ayounsi) [07:26:43] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:26:53] !log slyngshede@cumin1002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300 [07:26:58] FIRING: [18x] 
CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:27:44] (03PS4) 10Aqu: Airflow analytics-test: Optimization for LocalExecutors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) [07:28:03] (03PS1) 10Marostegui: mariadb: Promote db1250 to m2 master [puppet] - 10https://gerrit.wikimedia.org/r/1161338 (https://phabricator.wikimedia.org/T397182) [07:29:14] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet [07:29:43] (03PS2) 10Ayounsi: sre.network.tls: add timeout to get_server_certificate [cookbooks] - 10https://gerrit.wikimedia.org/r/1161337 [07:29:43] (03PS1) 10Ayounsi: tox: remove python 3.9 and 3.10 [cookbooks] - 10https://gerrit.wikimedia.org/r/1161342 [07:30:07] (03PS3) 10KartikMistry: WIP: machinetranslation: Use s3 storage for production [deployment-charts] - 10https://gerrit.wikimedia.org/r/1159696 (https://phabricator.wikimedia.org/T335491) [07:30:22] (03CR) 10Marostegui: [C:03+2] mariadb: Promote db1250 to m2 master [puppet] - 10https://gerrit.wikimedia.org/r/1161338 (https://phabricator.wikimedia.org/T397182) (owner: 10Marostegui) [07:31:11] !log kartik@deploy1003 Finished scap sync-world: Backport for [[gerrit:1161182|Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek]] (duration: 10m 59s) [07:31:43] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:32:01] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after 
maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78407 and previous config saved to /var/cache/conftool/dbconfig/20250619-073201-marostegui.json [07:32:54] (03CR) 10Brouberol: Airflow analytics-test: Optimization for LocalExecutors (034 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) (owner: 10Aqu) [07:33:52] !log Failover m2 from db1164 to db1250 - T397182 [07:33:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:56] T397182: Switchover m2 master db1164 -> db1250 - https://phabricator.wikimedia.org/T397182 [07:35:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.698s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:36:35] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet [07:36:43] FIRING: [4x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:36:48] FIRING: [18x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:37:45] !log just started es read only backup regeneration T387892 [07:37:49] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:37:49] T387892: Decommission backup1001, backup1002, backup2001, backup2002 (and their arrays) - https://phabricator.wikimedia.org/T387892 [07:37:59] (03PS1) 10Marostegui: db1164: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1161379 [07:39:02] (03CR) 10Marostegui: [C:03+2] db1164: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/1161379 (owner: 10Marostegui) [07:39:16] !log jmm@cumin1003 START - Cookbook sre.ganeti.addnode for new host ganeti2047.codfw.wmnet to cluster codfw and group B [07:39:57] (03CR) 10Brouberol: Airflow analytics-test: Optimization for LocalExecutors (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) (owner: 10Aqu) [07:40:15] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.602s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:41:34] !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2047.codfw.wmnet to cluster codfw and group B [07:41:39] !log jmm@cumin1003 START - Cookbook sre.ganeti.addnode for new host ganeti2048.codfw.wmnet to cluster codfw and group B [07:41:43] FIRING: [17x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:41:52] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db[1164,1217].eqiad.wmnet with reason: Maintenance [07:42:18] (03CR) 10Jelto: [C:03+2] gitlab-runner: upgrade default image to bookworm on Trusted Runners [puppet] - 10https://gerrit.wikimedia.org/r/1160120 (https://phabricator.wikimedia.org/T384595) (owner: 10Jelto) [07:44:16] FIRING: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.48s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:45:21] PROBLEM - haproxy failover on dbproxy1029 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:45:25] PROBLEM - haproxy failover on dbproxy1027 is CRITICAL: CRITICAL check_failover servers up 1 down 1: https://wikitech.wikimedia.org/wiki/HAProxy [07:45:39] FIRING: [4x] SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent [07:46:27] (03PS1) 10Marostegui: mariadb: Move db1164 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/1161381 (https://phabricator.wikimedia.org/T397397) [07:46:43] FIRING: [17x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:47:09] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78409 and previous config saved to /var/cache/conftool/dbconfig/20250619-074708-marostegui.json [07:47:13] T396130: Add afl_ip_hex column and 
afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [07:47:24] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance [07:47:31] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78410 and previous config saved to /var/cache/conftool/dbconfig/20250619-074731-marostegui.json [07:47:59] (03PS1) 10Elukey: admin: allow dcops to use perccli and storcli via sudo [puppet] - 10https://gerrit.wikimedia.org/r/1161382 (https://phabricator.wikimedia.org/T395939) [07:47:59] haproxy alerts are expected [07:49:16] RESOLVED: MediaWikiLatencyExceeded: p75 latency high: eqiad mw-parsoid releases routed via main (k8s) 1.793s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [07:50:18] 06SRE, 10SRE-swift-storage, 07SRE-Unowned, 06Data-Persistence, and 2 others: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584#10930719 (10elukey) @MatthewVernon would it be ok to start the upload of the new tiles to Swift, while we are removing... 
[07:50:23] (03CR) 10Vgutierrez: [C:03+1] "looks good & varnish tests are happy, please update https://wikitech.wikimedia.org/wiki/X-Analytics" [puppet] - 10https://gerrit.wikimedia.org/r/1160381 (https://phabricator.wikimedia.org/T390924) (owner: 10Krinkle) [07:50:23] (03CR) 10Marostegui: [C:03+2] mariadb: Move db1164 to m5 [puppet] - 10https://gerrit.wikimedia.org/r/1161381 (https://phabricator.wikimedia.org/T397397) (owner: 10Marostegui) [07:50:30] !log installing glib2.0 security updates [07:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:51:01] (03CR) 10Jelto: [C:03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1154866 (https://phabricator.wikimedia.org/T350794) (owner: 10AOkoth) [07:51:43] FIRING: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:51:43] FIRING: [17x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:55:04] (03PS1) 10Marostegui: db1179: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161383 (https://phabricator.wikimedia.org/T397279) [07:55:32] (03CR) 10MVernon: [C:03+2] thanos: add new backends, remove old ones gone from rings [puppet] - 10https://gerrit.wikimedia.org/r/1160855 (https://phabricator.wikimedia.org/T391352) (owner: 10MVernon) [07:55:49] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P78411 and previous config saved to /var/cache/conftool/dbconfig/20250619-075548-root.json 
[07:56:19] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance [07:56:43] RESOLVED: [3x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:56:48] FIRING: [14x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [07:59:40] (03CR) 10Marostegui: [C:03+2] db1179: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161383 (https://phabricator.wikimedia.org/T397279) (owner: 10Marostegui) [07:59:51] (03CR) 10MVernon: [C:03+2] thanos: add new nodes to ring, drain old ones [puppet] - 10https://gerrit.wikimedia.org/r/1160856 (https://phabricator.wikimedia.org/T392908) (owner: 10MVernon) [08:00:05] hashar and brennen: Deploy window MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0800) [08:00:13] o/ [08:00:53] (03PS1) 10Muehlenhoff: Add Joanna to Bitu account managers [puppet] - 10https://gerrit.wikimedia.org/r/1161389 [08:01:35] (03PS1) 10TrainBranchBot: group2 to 1.45.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1161390 (https://phabricator.wikimedia.org/T392176) [08:01:36] (03CR) 10TrainBranchBot: [C:03+2] group2 to 1.45.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1161390 (https://phabricator.wikimedia.org/T392176) (owner: 10TrainBranchBot) [08:01:43] FIRING: [9x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - 
https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [08:02:26] (03Merged) 10jenkins-bot: group2 to 1.45.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1161390 (https://phabricator.wikimedia.org/T392176) (owner: 10TrainBranchBot) [08:03:03] 06SRE, 10SRE-SLO, 10Observability-Metrics, 13Patch-For-Review: Create a Pyrra template for Istio-based K8s services and apply it to Citoid - https://phabricator.wikimedia.org/T391852#10930800 (10elukey) >>! In T391852#10927063, @elukey wrote: >>> * The success SLO seems not taken into account so far from o... [08:04:04] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 9 - rack E3) - https://phabricator.wikimedia.org/T390176#10930825 (10Stevemunene) the hosts are finally done draining and are listed as decommissioned {F62386905} disabled pup... 
[08:04:28] 10ops-eqiad, 06SRE, 06DC-Ops, 10Data-Platform-SRE (2025.06.13 - 2025.07.04): Upgrade an-worker hard drives from 4TB to 8TB (group 9 - rack E3) - https://phabricator.wikimedia.org/T390176#10930837 (10Stevemunene) [08:05:28] (03PS5) 10Aqu: Airflow analytics-test: Optimization for LocalExecutors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) [08:05:36] (03CR) 10Aqu: Airflow analytics-test: Optimization for LocalExecutors (034 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) (owner: 10Aqu) [08:06:43] RESOLVED: [8x] CategoriesQueryServiceUpdateLagTooHigh: Categories Query service lag is above 2 days - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DCategoriesQueryServiceUpdateLagTooHigh [08:07:21] !log installing python-tornado security updates [08:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:08:03] (03PS1) 10Marostegui: db1164: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161395 (https://phabricator.wikimedia.org/T397397) [08:08:06] (03PS6) 10Aqu: Airflow analytics-test: Optimization for LocalExecutors [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) [08:08:13] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78412 and previous config saved to /var/cache/conftool/dbconfig/20250619-080812-marostegui.json [08:08:18] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [08:08:21] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to 
https://phabricator.wikimedia.org/P78413 and previous config saved to /var/cache/conftool/dbconfig/20250619-080820-root.json [08:10:17] 10SRE-SLO, 06SRE Observability, 10Abstract Wikipedia team (26Q1 (Jul–Sep)), 07Essential-Work: create new SLO dashboard via Pyrra - https://phabricator.wikimedia.org/T394057#10930936 (10DSantamaria) [08:10:22] !log mvernon@cumin1003 START - Cookbook sre.hosts.decommission for hosts thanos-be[1001-1004].eqiad.wmnet [08:10:37] jouncebot: now and next [08:10:37] For the next 1 hour(s) and 49 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0800) [08:10:50] (03PS1) 10Vgutierrez: haproxy: Disable OCSP monitoring for LE unified cert [puppet] - 10https://gerrit.wikimedia.org/r/1161397 (https://phabricator.wikimedia.org/T370821) [08:11:02] (03CR) 10Filippo Giunchedi: [C:03+2] thanos: enable snappy compression for grpc in query [puppet] - 10https://gerrit.wikimedia.org/r/1160749 (https://phabricator.wikimedia.org/T394318) (owner: 10Filippo Giunchedi) [08:11:08] (03PS2) 10Filippo Giunchedi: thanos: enable snappy compression for grpc in query [puppet] - 10https://gerrit.wikimedia.org/r/1160749 (https://phabricator.wikimedia.org/T394318) [08:11:14] (03CR) 10Filippo Giunchedi: [V:03+2 C:03+2] thanos: enable snappy compression for grpc in query [puppet] - 10https://gerrit.wikimedia.org/r/1160749 (https://phabricator.wikimedia.org/T394318) (owner: 10Filippo Giunchedi) [08:11:34] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1161397 (https://phabricator.wikimedia.org/T370821) (owner: 10Vgutierrez) [08:11:41] (03CR) 10Marostegui: [C:03+2] db1164: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161395 (https://phabricator.wikimedia.org/T397397) (owner: 10Marostegui) [08:11:59] !log hashar@deploy1003 rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.6 refs T392176 [08:12:03] T392176: 1.45.0-wmf.6 deployment 
blockers - https://phabricator.wikimedia.org/T392176 [08:13:21] RECOVERY - haproxy failover on dbproxy1029 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:13:25] RECOVERY - haproxy failover on dbproxy1027 is OK: OK check_failover servers up 2 down 0: https://wikitech.wikimedia.org/wiki/HAProxy [08:14:20] logs are quiet [08:14:37] 06SRE, 10SRE-swift-storage, 07SRE-Unowned, 06Data-Persistence, and 2 others: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584#10930949 (10elukey) >>! In T396584#10930868, @MatthewVernon wrote: > @elukey yes, that should be fine to start upload -... [08:17:04] !log Ran fixStuckGlobalRename.php for T397384 T397219 T397218 [08:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:17:13] T397384: Unblock stuck global rename of Mr. PowerUp98 - https://phabricator.wikimedia.org/T397384 [08:17:13] T397219: Unblock stuck global rename of CyberLife070 - https://phabricator.wikimedia.org/T397219 [08:17:13] T397218: Unblock stuck global rename of Renamed user fc26ace47276834fd507d19dab11aed6 - https://phabricator.wikimedia.org/T397218 [08:21:58] (03CR) 10Muehlenhoff: [C:03+2] mediawiki/memcached: Switch to firewall_src_sets [puppet] - 10https://gerrit.wikimedia.org/r/1156669 (owner: 10Muehlenhoff) [08:23:20] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78414 and previous config saved to /var/cache/conftool/dbconfig/20250619-082320-marostegui.json [08:23:26] (03PS1) 10Slyngshede: Lock RQ dependency at 1.16.2 [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/1161409 (https://phabricator.wikimedia.org/T397300) [08:23:27] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78415 and previous config saved to 
/var/cache/conftool/dbconfig/20250619-082326-root.json [08:23:33] !log akosiaris@cumin1003 START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [08:25:08] !log mvernon@cumin1003 START - Cookbook sre.dns.netbox [08:28:37] !log akosiaris@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [08:29:15] !log mvernon@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[1001-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003" [08:29:51] RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr2-eqord:xe-0/1/3 (Transport: cr3-ulsfo:xe-0/1/1 (Arelion, IC-313592 51ms 10Gbps wave) {#11372}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown [08:29:57] jmm@cumin1003 addnode (PID 2229380) is awaiting input [08:31:12] !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2048.codfw.wmnet to cluster codfw and group B [08:31:42] (03CR) 10Filippo Giunchedi: [C:03+2] thanos: force query-frontend query stats [puppet] - 10https://gerrit.wikimedia.org/r/1160748 (https://phabricator.wikimedia.org/T394318) (owner: 10Filippo Giunchedi) [08:31:48] !log installing modsecurity-apache security updates [08:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:19] mvernon@cumin1003 decommission (PID 2232385) is awaiting input [08:33:03] !log mvernon@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[1001-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003" [08:33:03] !log 
mvernon@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [08:33:04] !log mvernon@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-be[1001-1004].eqiad.wmnet [08:33:15] SRE, SRE-swift-storage: Q4 Thanos hardware refresh - https://phabricator.wikimedia.org/T391352#10931066 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by mvernon@cumin1003 for hosts: `thanos-be[1001-1004].eqiad.wmnet` - thanos-be1001.eqiad.wmnet (**PASS**) - Downtimed host on Icinga/Ale... [08:33:29] ops-eqiad, SRE, SRE-swift-storage, DC-Ops, decommission-hardware: decommission thanos-be100[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T397414 (MatthewVernon) NEW [08:34:19] SRE, SRE-swift-storage: Q4 Thanos hardware refresh - https://phabricator.wikimedia.org/T391352#10931085 (MatthewVernon) [08:35:30] (CR) Btullis: [C:+2] Prepare for renaming kafka-stretch200[1-2] to dse-k8s-worker200[1-2] [puppet] - https://gerrit.wikimedia.org/r/1160888 (https://phabricator.wikimedia.org/T353789) (owner: Btullis) [08:37:18] ops-eqiad, SRE, SRE-swift-storage, Data-Persistence, DC-Ops: Q4:rack/setup/install ms-be109[2-5] - https://phabricator.wikimedia.org/T393104#10931104 (MatthewVernon) @Jclark-ctr I've just put in T397414 to decommission (amongst others) thanos-be1003 in `C4` and thanos-be1004 in `D7`; could th... 
[08:38:27] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78417 and previous config saved to /var/cache/conftool/dbconfig/20250619-083827-marostegui.json [08:38:33] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78418 and previous config saved to /var/cache/conftool/dbconfig/20250619-083832-root.json [08:39:35] FIRING: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [08:42:44] (03CR) 10Marostegui: Add switchover cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1129904 (https://phabricator.wikimedia.org/T384810) (owner: 10Federico Ceratto) [08:43:41] (03CR) 10Marostegui: "I'd like to see if we can have a deeper review by someone else" [cookbooks] - 10https://gerrit.wikimedia.org/r/1129904 (https://phabricator.wikimedia.org/T384810) (owner: 10Federico Ceratto) [08:44:57] (03CR) 10Fabfur: [C:03+1] haproxy: Disable OCSP monitoring for LE unified cert [puppet] - 10https://gerrit.wikimedia.org/r/1161397 (https://phabricator.wikimedia.org/T370821) (owner: 10Vgutierrez) [08:46:46] (03CR) 10Vgutierrez: [C:03+2] haproxy: Disable OCSP monitoring for LE unified cert [puppet] - 10https://gerrit.wikimedia.org/r/1161397 (https://phabricator.wikimedia.org/T370821) (owner: 10Vgutierrez) [08:47:03] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2235.codfw.wmnet with reason: Maintenance [08:47:36] (03PS1) 10Marostegui: db2235: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161433 (https://phabricator.wikimedia.org/T397412) [08:48:08] (03CR) 10Marostegui: [C:03+2] db2235: Migrate to MariaDB 10.11 [puppet] - 10https://gerrit.wikimedia.org/r/1161433 (https://phabricator.wikimedia.org/T397412) (owner: 
10Marostegui) [08:48:23] (03PS1) 10Vgutierrez: hiera: Switch lvs4009 to katran [puppet] - 10https://gerrit.wikimedia.org/r/1161434 (https://phabricator.wikimedia.org/T396561) [08:48:56] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1161434 (https://phabricator.wikimedia.org/T396561) (owner: 10Vgutierrez) [08:50:13] PROBLEM - MariaDB Replica IO: m5 on db2160 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl2024@db2235.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 100000 message: Cant connect to server on db2235.codfw.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:50:21] ^ expected [08:50:42] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2160,2235].codfw.wmnet with reason: Maintenance [08:50:43] !log jmm@cumin1003 START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet [08:51:06] 06SRE, 10Ganeti, 06Infrastructure-Foundations: Move ganeti2045-ganeti2050 into production and decom ganeti2019-ganeti2024 - https://phabricator.wikimedia.org/T396590#10931171 (10ops-monitoring-bot) Draining ganeti2021.codfw.wmnet of running VMs [08:51:43] jouncebot: nowandnext [08:51:43] For the next 1 hour(s) and 8 minute(s): MediaWiki train - Utc-7 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T0800) [08:51:43] In 1 hour(s) and 8 minute(s): MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250619T1000) [08:52:00] 06SRE, 06Traffic, 13Patch-For-Review: Research and respond to Let's Encrypt's intent to deprecate OCSP in favour of CRLs - https://phabricator.wikimedia.org/T370821#10931172 (10Vgutierrez) 05In progress→03Resolved a:03Vgutierrez [08:52:32] !log btullis@cumin1003 START - Cookbook sre.hosts.rename from kafka-stretch2001 to 
dse-k8s-worker2001 [08:52:53] !log btullis@cumin1003 START - Cookbook sre.dns.netbox [08:52:55] (03CR) 10Jcrespo: [C:03+2] "Thanks, I did some of those afterwards (apparently, if I rename roles or create new ones they default to puppet5 :-(, which created some m" [puppet] - 10https://gerrit.wikimedia.org/r/1160691 (https://phabricator.wikimedia.org/T387892) (owner: 10Jcrespo) [08:53:35] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78419 and previous config saved to /var/cache/conftool/dbconfig/20250619-085334-marostegui.json [08:53:40] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [08:53:45] !log marostegui@cumin1002 dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78420 and previous config saved to /var/cache/conftool/dbconfig/20250619-085344-root.json [08:53:50] !log marostegui@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance [08:53:58] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depooling db2182 (T396130)', diff saved to https://phabricator.wikimedia.org/P78421 and previous config saved to /var/cache/conftool/dbconfig/20250619-085357-marostegui.json [08:55:06] !log jmm@cumin1003 END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet [08:55:32] (03CR) 10Urbanecm: [C:03+2] changeprop: Decrease reenqueue_delay for Getting Started notif job [deployment-charts] - 10https://gerrit.wikimedia.org/r/1150699 (https://phabricator.wikimedia.org/T394958) (owner: 10Urbanecm) [08:55:58] (03PS7) 10Btullis: Airflow: Use a python value for the xcom_sidecar resource settings [deployment-charts] - 10https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [08:56:05] !log jmm@cumin1003 
START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet [08:56:13] RECOVERY - MariaDB Replica IO: m5 on db2160 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [08:56:40] !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2001 to dse-k8s-worker2001 - btullis@cumin1003" [08:57:07] (03Merged) 10jenkins-bot: changeprop: Decrease reenqueue_delay for Getting Started notif job [deployment-charts] - 10https://gerrit.wikimedia.org/r/1150699 (https://phabricator.wikimedia.org/T394958) (owner: 10Urbanecm) [08:58:13] 06SRE, 10SRE-swift-storage, 07SRE-Unowned, 06Data-Persistence, and 2 others: Create a new bucket for Tegola's tile cache and duplicate its data - https://phabricator.wikimedia.org/T396584#10931199 (10MatthewVernon) @elukey You could delete each container in parallel (in a separate tmux/screen window or wha... 
[08:58:36] !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2001 to dse-k8s-worker2001 - btullis@cumin1003" [08:58:37] !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [08:58:37] !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-worker2001 on all recursors [08:58:40] !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2001 on all recursors [08:58:40] !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2001 [08:58:53] !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2001 [08:58:55] !log urbanecm@deploy1003 helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply [08:58:55] !log fceratto@cumin1002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T397164 [08:59:04] T397164: Switchover s8 master (db2161 -> db2165) - https://phabricator.wikimedia.org/T397164 [08:59:08] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2042 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:08] jmm@cumin1003 drain-node (PID 2238338) is awaiting input [08:59:10] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2030 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:10] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2027 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 
23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:10] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2031 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:12] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1103 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:12] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4037 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:18] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1106 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:24] !log urbanecm@deploy1003 helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply [08:59:26] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4047 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:26] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1110 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:32] !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kafka-stretch2001 to dse-k8s-worker2001 [08:59:34] RECOVERY - HAProxy HTTPS 
wikipedia.org ECDSA on cp1114 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:38] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1105 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:44] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2034 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:44] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4043 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [08:59:59] !log urbanecm@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply [09:00:49] !log btullis@cumin1003 START - Cookbook sre.hosts.rename from kafka-stretch2002 to dse-k8s-worker2002 [09:01:08] !log urbanecm@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply [09:01:10] !log btullis@cumin1003 START - Cookbook sre.dns.netbox [09:01:16] !log urbanecm@deploy1003 helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply [09:01:38] (03CR) 10Brouberol: Airflow analytics-test: Optimization for LocalExecutors (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161047 (https://phabricator.wikimedia.org/T369845) (owner: 10Aqu) [09:02:12] !log urbanecm@deploy1003 helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply [09:06:46] btullis@cumin1003 rename (PID 2238782) is awaiting input [09:08:52] !log jmm@cumin2002 START - Cookbook 
sre.hosts.reboot-single for host krb2002.codfw.wmnet [09:09:01] 06SRE, 10Ganeti, 06Infrastructure-Foundations: Move ganeti2045-ganeti2050 into production and decom ganeti2019-ganeti2024 - https://phabricator.wikimedia.org/T396590#10931246 (10ops-monitoring-bot) Draining ganeti2021.codfw.wmnet of running VMs [09:10:39] !log btullis@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2002 to dse-k8s-worker2002 - btullis@cumin1003" [09:11:49] (03CR) 10Btullis: Airflow: Use a python value for the xcom_sidecar resource settings (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: 10Btullis) [09:12:43] (03PS1) 10Urbanecm: changeprop beta: Decrease reenqueue_delay for Getting Started notif job [deployment-charts] - 10https://gerrit.wikimedia.org/r/1161443 (https://phabricator.wikimedia.org/T394958) [09:13:44] btullis@cumin1003 rename (PID 2238782) is awaiting input [09:13:53] (03PS8) 10Btullis: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - 10https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [09:14:15] !log btullis@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2002 to dse-k8s-worker2002 - btullis@cumin1003" [09:14:15] !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [09:14:15] !log btullis@cumin1003 START - Cookbook sre.dns.wipe-cache dse-k8s-worker2002 on all recursors [09:14:19] !log btullis@cumin1003 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2002 on all recursors [09:14:19] !log btullis@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2002 [09:14:29] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single 
(exit_code=0) for host krb2002.codfw.wmnet [09:14:48] (CR) Michael Große: [C:+1] "Confirming reducing the reenqueue delay for this job to 30 minutes" [deployment-charts] - https://gerrit.wikimedia.org/r/1161443 (https://phabricator.wikimedia.org/T394958) (owner: Urbanecm) [09:15:08] (CR) Urbanecm: [C:+2] changeprop beta: Decrease reenqueue_delay for Getting Started notif job [deployment-charts] - https://gerrit.wikimedia.org/r/1161443 (https://phabricator.wikimedia.org/T394958) (owner: Urbanecm) [09:15:22] (PS1) Marostegui: db2196: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1161444 (https://phabricator.wikimedia.org/T397279) [09:15:33] !log marostegui@cumin1002 dbctl commit (dc=all): 'Depool db2196', diff saved to https://phabricator.wikimedia.org/P78422 and previous config saved to /var/cache/conftool/dbconfig/20250619-091532-root.json [09:15:40] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T396130)', diff saved to https://phabricator.wikimedia.org/P78423 and previous config saved to /var/cache/conftool/dbconfig/20250619-091539-marostegui.json [09:15:44] T396130: Add afl_ip_hex column and afl_var_dump_timestamp index to abuse_filter_log - https://phabricator.wikimedia.org/T396130 [09:15:52] (PS9) Btullis: Airflow: Use a python value for the xcom_sidecar resource settings [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [09:16:01] !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2196.codfw.wmnet with reason: Maintenance [09:16:16] (CR) Marostegui: [C:+2] db2196: Migrate to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1161444 (https://phabricator.wikimedia.org/T397279) (owner: Marostegui) [09:16:20] (PS10) Btullis: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 
(https://phabricator.wikimedia.org/T396197) [09:16:50] !log btullis@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2002 [09:16:53] (Merged) jenkins-bot: changeprop beta: Decrease reenqueue_delay for Getting Started notif job [deployment-charts] - https://gerrit.wikimedia.org/r/1161443 (https://phabricator.wikimedia.org/T394958) (owner: Urbanecm) [09:17:19] (PS1) Elukey: sre.hosts.provision: add another special Supermicro set of EFI settings [cookbooks] - https://gerrit.wikimedia.org/r/1161445 (https://phabricator.wikimedia.org/T397415) [09:17:29] !log btullis@cumin1003 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kafka-stretch2002 to dse-k8s-worker2002 [09:17:57] (PS11) Btullis: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [09:19:12] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:19:15] !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:19:21] (CR) CI reject: [V:-1] Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: Btullis) [09:19:43] (PS12) Btullis: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [09:21:50] (PS2) Elukey: sre.hosts.provision: add another special Supermicro set of EFI settings [cookbooks] - https://gerrit.wikimedia.org/r/1161445 (https://phabricator.wikimedia.org/T397415) [09:22:08] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host 
aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:22:11] !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:23:17] (PS13) Btullis: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) [09:24:35] RESOLVED: ErrorBudgetBurn: citoid-requests codfw - https://slo.wikimedia.org/?search=citoid-requests - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn [09:24:48] (PS1) Effie Mouzeli: Add wikikube-worker-exp to Homer wmf plugin to assign to k8s BGP group [software/homer/deploy] - https://gerrit.wikimedia.org/r/1161447 [09:25:11] (CR) Clément Goubert: "UX issue, otherwise LGTM." [cookbooks] - https://gerrit.wikimedia.org/r/1160817 (https://phabricator.wikimedia.org/T397148) (owner: JMeybohm) [09:25:34] (PS3) Elukey: sre.hosts.provision: add another special Supermicro set of EFI settings [cookbooks] - https://gerrit.wikimedia.org/r/1161445 (https://phabricator.wikimedia.org/T397415) [09:25:44] (CR) Brouberol: [C:+1] Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: Btullis) [09:25:51] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:25:54] !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:26:39] (CR) Cathal Mooney: [C:+1] "LGTM tested here works as expected." 
[puppet] - https://gerrit.wikimedia.org/r/1160764 (https://phabricator.wikimedia.org/T397303) (owner: Vgutierrez) [09:27:05] (CR) Giuseppe Lavagetto: "One main question about EXCLUDED_SERVICES, everything else can be fixed later/when we move the code to spicerack." [cookbooks] - https://gerrit.wikimedia.org/r/1160817 (https://phabricator.wikimedia.org/T397148) (owner: JMeybohm) [09:27:10] (CR) Ladsgroup: "Wanna add collation table too or you want to add it later?" [puppet] - https://gerrit.wikimedia.org/r/1160178 (https://phabricator.wikimedia.org/T299951) (owner: Zabe) [09:28:01] !log marostegui@cumin1002 dbctl commit (dc=all): 'db2196 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78424 and previous config saved to /var/cache/conftool/dbconfig/20250619-092801-root.json [09:28:12] (PS4) Elukey: sre.hosts.provision: add another special Supermicro set of EFI settings [cookbooks] - https://gerrit.wikimedia.org/r/1161445 (https://phabricator.wikimedia.org/T397415) [09:28:16] (PS1) Effie Mouzeli: refactor server hostgroup matching [software/homer/deploy] - https://gerrit.wikimedia.org/r/1161448 [09:28:21] (CR) Btullis: [C:+2] Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: Btullis) [09:28:23] (PS1) Gerrit maintenance bot: mariadb: Promote db2196 to x1 master [puppet] - https://gerrit.wikimedia.org/r/1161449 (https://phabricator.wikimedia.org/T397419) [09:28:41] !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART [09:28:57] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4045 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) 
https://wikitech.wikimedia.org/wiki/HTTPS [09:29:09] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1108 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:09] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2040 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:09] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2028 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:09] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2035 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:09] (03CR) 10Zabe: "https://gerrit.wikimedia.org/r/c/operations/puppet/+/1117909 ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/1160178 (https://phabricator.wikimedia.org/T299951) (owner: 10Zabe) [09:29:11] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1112 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:11] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2033 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:11] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2037 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:13] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4038 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:13] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1111 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:13] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4044 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:13] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4042 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) 
https://wikitech.wikimedia.org/wiki/HTTPS [09:29:15] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4048 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:17] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1104 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:17] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1107 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:23] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4041 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:23] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4046 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:23] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4039 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:27] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1115 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:27] RECOVERY - HAProxy HTTPS 
wikipedia.org ECDSA on cp2038 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:27] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2032 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:27] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4049 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:27] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4040 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:29] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2039 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:29] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4051 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:33] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1102 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:39] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1109 is OK: SSL OK - Certificate *.wikipedia.org 
contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:39] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4050 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:39] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1101 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:39] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2041 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:41] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp4052 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:41] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1113 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:41] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp1100 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:41] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2036 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 
2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:29:48] (PS1) Esanders: Deploy mobile insert menu to remaining top 20 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1161450 (https://phabricator.wikimedia.org/T388591) [09:29:49] RECOVERY - HAProxy HTTPS wikipedia.org ECDSA on cp2029 is OK: SSL OK - Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2025-09-09 23:59:37 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS [09:30:17] (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1161382 (https://phabricator.wikimedia.org/T395939) (owner: Elukey) [09:30:26] !log btullis@cumin1003 START - Cookbook sre.hosts.reimage for host dse-k8s-worker2001.codfw.wmnet with OS bookworm [09:30:36] !log btullis@cumin1003 START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker2001 [09:30:41] (Merged) jenkins-bot: Airflow: Render the xcom_sidecar resource settings correctly [deployment-charts] - https://gerrit.wikimedia.org/r/1160938 (https://phabricator.wikimedia.org/T396197) (owner: Btullis) [09:30:48] !log marostegui@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P78425 and previous config saved to /var/cache/conftool/dbconfig/20250619-093047-marostegui.json [09:31:06] (CR) Ladsgroup: "oh I have memory of goldfish. Thanks!" 
[puppet] - https://gerrit.wikimedia.org/r/1160178 (https://phabricator.wikimedia.org/T299951) (owner: Zabe) [09:31:10] (PS2) Zabe: filtered_tables: Add new categorylinks columns [puppet] - https://gerrit.wikimedia.org/r/1160178 (https://phabricator.wikimedia.org/T299951) [09:31:22] !log btullis@cumin1003 START - Cookbook sre.dns.netbox [09:32:12] (CR) Ladsgroup: [C:+2] filtered_tables: Add new categorylinks columns [puppet] - https://gerrit.wikimedia.org/r/1160178 (https://phabricator.wikimedia.org/T299951) (owner: Zabe) [09:32:29] (CR) Hnowlan: [C:+2] mobileapps: bump replicas significantly [deployment-charts] - https://gerrit.wikimedia.org/r/1160773 (owner: Hnowlan)