[00:05:38] Continuous-Integration-Infrastructure (Zuul upgrade), collaboration-services, Essential-Work: Set up zuul web on zuul1001/zuul2001 - https://phabricator.wikimedia.org/T405119#11274862 (Dzahn)
[00:19:15] Continuous-Integration-Infrastructure (Zuul upgrade), collaboration-services, Essential-Work: Set up zuul web on zuul1001/zuul2001 - https://phabricator.wikimedia.org/T405119#11274868 (Dzahn) more debugging and the remaining issues we are experiencing probably all come down to the zookeeper connectio...
[00:23:12] Continuous-Integration-Infrastructure (Zuul upgrade), collaboration-services, Patch-For-Review: puppetize setup of new zuul VMs - https://phabricator.wikimedia.org/T395938#11274869 (Dzahn) I have done more debugging, such as disabling puppet, adding logging config file, and then pointing to this logg...
[00:24:40] Continuous-Integration-Infrastructure (Zuul upgrade), collaboration-services, Patch-For-Review: puppetize setup of new zuul VMs - https://phabricator.wikimedia.org/T395938#11274870 (Dzahn) > The "Len error" in the Zookeeper logs means that the client (Kazoo) sent a message that is larger than Zookeep...
[00:39:01] Continuous-Integration-Infrastructure (Zuul upgrade), collaboration-services, Patch-For-Review: puppetize setup of new zuul VMs - https://phabricator.wikimedia.org/T395938#11274880 (Dzahn) https://www.alibabacloud.com/blog/zookeeper-practice-how-to-tune-jute-maxbuffer_601674
[02:14:31] Project-Admins: Archive #acl*outreachy-mentors - https://phabricator.wikimedia.org/T407304 (Aklapper) NEW p:Triage→Low
[02:22:12] Project-Admins, Release-Engineering-Team (Doing 😎): Archive #acl*outreachy-mentors - https://phabricator.wikimedia.org/T407304#11274931 (Aklapper) Open→Resolved I also dropped all members.
[02:35:42] Project-Admins, Release-Engineering-Team (Doing 😎): Archive #acl*outreachy-mentors - https://phabricator.wikimedia.org/T407304#11274945 (Aklapper) ...which wouldn't be great for accessing tasks still with that View Policy ACL set, but there should be no such tasks, as this ACL was always meant to be...
[06:49:31] FIRING: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:49:41] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407312 (phaultfinder) NEW
[06:52:38] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407312#11275243 (ABran-WMF) Open→Resolved a:ABran-WMF related to {T365259}
[06:53:00] Release-Engineering-Team, collaboration-services: ProbeDown - gerrit1003:443 - https://phabricator.wikimedia.org/T407312#11275248 (ABran-WMF)
[07:00:28] Project beta-code-update-eqiad build #569993: FAILURE in 4 min 7 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/569993/
[07:01:04] Release-Engineering-Team, collaboration-services: ProbeDown - gerrit1003:443 - https://phabricator.wikimedia.org/T407312#11275250 (hashar) Resolved→Open This has not been resolved, the Gerrit web service is still not available. The Apache scoreboard shows it is full: https://grafana.wikimedia.or...
[07:02:29] PROBLEM - gerrit process on gerrit1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[07:03:32] Project beta-code-update-eqiad build #569994: STILL FAILING in 15 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/569994/
[07:04:29] RECOVERY - gerrit process on gerrit1003 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[07:04:31] FIRING: [4x] ProbeDown: Service gerrit1003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:04:36] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407314 (phaultfinder) NEW
[07:09:31] RESOLVED: [4x] ProbeDown: Service gerrit1003:29418 has failed probes (tcp_gerrit_ssh_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:15:15] Yippee, build fixed!
[07:15:15] Project beta-code-update-eqiad build #569995: FIXED in 2 min 14 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/569995/
[07:17:59] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407314#11275274 (ABran-WMF) →Duplicate dup:T407312
[07:18:01] Release-Engineering-Team, collaboration-services, Patch-For-Review: ProbeDown - gerrit1003:443 - https://phabricator.wikimedia.org/T407312#11275276 (ABran-WMF)
[07:25:31] FIRING: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:25:43] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407316 (phaultfinder) NEW
[07:31:03] Project beta-code-update-eqiad build #569996: FAILURE in 8 min 3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/569996/
[07:35:09] Yippee, build fixed!
[07:35:09] Project beta-code-update-eqiad build #569997: FIXED in 2 min 9 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/569997/
[07:35:31] RESOLVED: [2x] ProbeDown: Service gerrit1003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit1003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:10:21] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T407316#11275390 (ABran-WMF) →Duplicate dup:T407312
[08:10:27] Release-Engineering-Team, collaboration-services, Patch-For-Review: ProbeDown - gerrit1003:443 - https://phabricator.wikimedia.org/T407312#11275392 (ABran-WMF)
[08:10:32] Release-Engineering-Team, collaboration-services: ProbeDown -gerrit1003:443 - https://phabricator.wikimedia.org/T407316#11275393 (ABran-WMF)
[08:10:56] Release-Engineering-Team, collaboration-services, Patch-For-Review: ProbeDown - gerrit1003:443 - https://phabricator.wikimedia.org/T407312#11275397 (ABran-WMF) Open→Resolved
[09:20:50] Phabricator, CAS-SSO, Infrastructure-Foundations, SRE: Add logout.d script for Phabricator - https://phabricator.wikimedia.org/T286904#11275609 (MoritzMuehlenhoff) Simon is adding support to Bitu to have users link their username similar to one can currently link a SUL account (https://phabricato...
[11:25:03] Gerrit: Gerrit drops equals sign from commit message in email subject - https://phabricator.wikimedia.org/T407342 (Lucas_Werkmeister_WMDE) NEW
[11:28:54] Gerrit: Gerrit drops equals sign from commit message in email subject - https://phabricator.wikimedia.org/T407342#11276179 (Lucas_Werkmeister_WMDE) Might be #upstream [bug 40015467](https://issues.gerritcodereview.com/issues/40015467)? Though the Unicode character in the commit message (the typographic quota...
[12:49:42] (update) asmartkitten: Delete `bin/move_project` [repos/phabricator/phabricator] (wmf/stable) - https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/merge_requests/101 (https://phabricator.wikimedia.org/T342275)
[13:34:41] Phabricator (Upstream), Upstream: Incorrect default dropdown value in Project query form when passing an URI parameter - https://phabricator.wikimedia.org/T406127#11276719 (Aklapper)
[13:41:59] (CR) Jforrester: [C:+2] Zuul: [mediawiki/extensions/CampaignEvents] Add CentralAuth dependency [integration/config] - https://gerrit.wikimedia.org/r/1196180 (https://phabricator.wikimedia.org/T404995) (owner: Daimona Eaytoy)
[13:42:22] (CR) Jforrester: [C:+2] Zuul: [mediawiki/extensions/BlueSpiceWikiFarm] Add BlueSpiceDiscovery dep [integration/config] - https://gerrit.wikimedia.org/r/1196120 (owner: Hslater)
[13:43:34] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/CampaignEvents] Add CentralAuth dependency [integration/config] - https://gerrit.wikimedia.org/r/1196180 (https://phabricator.wikimedia.org/T404995) (owner: Daimona Eaytoy)
[13:44:09] !log Zuul: [mediawiki/extensions/CampaignEvents] Add CentralAuth dependency, for T404995
[13:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:44:14] T404995: Change username sorting to be alphabetic in Contributions tab - https://phabricator.wikimedia.org/T404995
[13:44:21] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/BlueSpiceWikiFarm] Add BlueSpiceDiscovery dep [integration/config] - https://gerrit.wikimedia.org/r/1196120 (owner: Hslater)
[13:44:30] !log Zuul: [mediawiki/extensions/BlueSpiceWikiFarm] Add BlueSpiceDiscovery dep
[13:44:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:03:33] (update) aklapper: Related GitLab patches: Ignore partial task ID matches [repos/phabricator/extensions] (wmf/stable) - https://gitlab.wikimedia.org/repos/phabricator/extensions/-/merge_requests/55 (https://phabricator.wikimedia.org/T395728)
[14:17:47] (merge) aklapper: Delete `bin/move_project` [repos/phabricator/phabricator] (wmf/stable) - https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/merge_requests/101 (https://phabricator.wikimedia.org/T342275) (owner: asmartkitten)
[14:39:18] (update) dancy: backport.py: Revise relevant dependency selection algorithm [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1010 (https://phabricator.wikimedia.org/T365146 https://phabricator.wikimedia.org/T371611 https://phabricator.wikimedia.org/T388025)
[14:52:06] (open) dancy: test-scap-backport: Set PATH [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/165
[14:52:07] (update) dancy: test-scap-backport: Set PATH [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/165
[14:53:01] (merge) dancy: test-scap-backport: Set PATH [repos/releng/train-dev] - https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/165
[16:28:41] GitLab (CI & Job Runners), collaboration-services, Toolforge, cloud-services-team (FY2025/26-Q1): tofu-provisioning: Failed to install provider - https://phabricator.wikimedia.org/T405742#11277894 (fnegri) After more testing, it seems to be a good'ol MTU issue: `lang=shell-session root@runner-10...
[16:43:38] Gerrit: Access issue for golson-wmf - https://phabricator.wikimedia.org/T378187#11278035 (GOlson-WMF) @Aklapper - I haven't had to use Gerrit in a minute, but I'll see if I can test if out!
[17:07:47] Continuous-Integration-Config, FY2025-26 WE3.3 Engaging core audiences, Reader Experience Team (Sprint 7 [Q2 Oct 7-Oct 21 '25]), Unplanned-Sprint-Work: Add BetaFeatures and MetricsPlatform as CI dependencies for ReadingLists - https://phabricator.wikimedia.org/T407249#11278150 (Jdlrobson-WMF)
[17:14:36] GitLab (CI & Job Runners), collaboration-services, Toolforge, cloud-services-team (FY2025/26-Q1), Patch-For-Review: tofu-provisioning: Failed to install provider - https://phabricator.wikimedia.org/T405742#11278217 (dcaro) Awesome find! what's the mtu of the vm itself? Is there a discrepancy?
[17:16:26] GitLab (CI & Job Runners), collaboration-services, Toolforge, cloud-services-team (FY2025/26-Q1), Patch-For-Review: tofu-provisioning: Failed to install provider - https://phabricator.wikimedia.org/T405742#11278224 (dcaro) >>! In T405742#11278217, @dcaro wrote: > Awesome find! > what's the mt...
[17:22:47] GitLab (CI & Job Runners), collaboration-services, Toolforge, cloud-services-team (FY2025/26-Q1), Patch-For-Review: tofu-provisioning: Failed to install provider - https://phabricator.wikimedia.org/T405742#11278268 (dcaro) I suspect that it might be related to the vxlan/overlay network sort o...
[17:24:10] GitLab (CI & Job Runners), collaboration-services, Toolforge, cloud-services-team (FY2025/26-Q1), Patch-For-Review: tofu-provisioning: Failed to install provider - https://phabricator.wikimedia.org/T405742#11278287 (fnegri) a:fnegri
[17:26:27] GitLab (CI & Job Runners), Release-Engineering-Team, Essential-Work: failed to configure registry cache exporter: invalid reference format error with new kokkuri - https://phabricator.wikimedia.org/T407294#11278303 (dduvall) Jobs using `needs` are not getting the dotenv artifacts from `kokkuri:setup-...
[17:35:10] (update) dduvall: variables: Ensure all `.kokkuri` derived jobs get variables [repos/releng/kokkuri] - https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/153
[17:35:14] (open) dduvall: variables: Ensure all `.kokkuri` derived jobs get variables [repos/releng/kokkuri] - https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/153
[17:48:11] (open) dancy: monitoring/main.tf: Add metrics-server to kube-system [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/513
[17:48:13] (update) dancy: monitoring/main.tf: Add metrics-server to kube-system [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/513
[17:49:50] (update) dancy: monitoring/main.tf: Add metrics-server to kube-system [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/513
[17:49:54] (update) dancy: monitoring/main.tf: Add metrics-server to kube-system [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/513
[17:50:57] (merge) dancy: monitoring/main.tf: Add metrics-server to kube-system [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/513
[17:53:08] (update) dduvall: variables: Ensure all `.kokkuri` derived jobs get variables [repos/releng/kokkuri] - https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/153
[17:57:18] (open) dancy: Revert "monitoring/main.tf: Add metrics-server to kube-system" [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/514
[17:57:21] (update) dancy: Revert "monitoring/main.tf: Add metrics-server to kube-system" [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/514
[17:58:31] (merge) dancy: Revert "monitoring/main.tf: Add metrics-server to kube-system" [repos/releng/gitlab-cloud-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/514
[18:07:20] (close) dduvall: variables: Ensure all `.kokkuri` derived jobs get variables [repos/releng/kokkuri] - https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/153
[20:11:46] Gerrit, collaboration-services: gerrit2003 is trying to backup incrementally 3.5 million files every hour, clogging backus and filling in available disk space - https://phabricator.wikimedia.org/T406762#11278937 (Dzahn) I merged the change above to re-enable backups. Then noticed that currently puppet is...
[21:08:32] I just got this error on gerrit: Plugin install error: https://gerrit.wikimedia.org/r/plugins/wm-zuul-status/static/wm-zuul-status.js load error from https://gerrit.wikimedia.org/r/plugins/wm-zuul-status/static/wm-zuul-status.js
[21:08:42] and it seems to be dropping in and out for me
[21:31:44] hashar: arnaudb ^