[17:03:37] GitLab, Beta-Cluster-Infrastructure, m3api: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners - https://phabricator.wikimedia.org/T414864#11554620 (dancy) @LucasWerkmeister Thanks for setting up https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2 and adding me as...
[17:58:33] dancy: hi! I would have some time for debugging now ^^
[17:59:48] Great! I'm around
[18:00:22] I added comment https://phabricator.wikimedia.org/T414864#11554620 earlier today
[18:02:30] yeah I saw that but wasn’t sure how to respond 😅
[18:02:34] should we just chat in here btw?
[18:02:40] sure
[18:03:04] you’re not going to get a working pipeline without beta unblocking DO, which is what that task is about ^^
[18:03:24] (except that the very first pipeline in the tmp repo, https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/pipelines/161065, actually passed and I have no idea why o_O)
[18:03:37] Interesting.
[18:03:55] So a re-run of those jobs should work
[18:04:02] but the WMCS builds fail before they can even reach Chromium, which is different from the DO builds which fail when #wpLoginAttempt doesn’t exist (because the page is actually a “you’re blocked” notice)
[18:04:46] maybe. feel free to try a re-run
[18:05:02] my best guess is that Beta only blocks a part of the DigitalOcean IP range and one job happened to run outside of it
[18:05:50] (though it’s weird that this didn’t happen with all my other failed builds while I was trying to debug this last weekend)
[18:05:55] I see
[18:06:52] So if I see a "before all" hook failure, it's "working" (meaning ChromeDrivers, etc started properly and tried to access beta cluster)
[18:06:57] yup
[18:07:21] ok.. Experimenting based on that knowledge..
[18:08:02] meanwhile I don’t know why chrome/chromium fails to launch in the WMCS runner… in https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/commit/93817b49d5 I already added --verbose to the chromedriver args, as the error message says, and yet the log doesn’t seem to have anything useful
[18:08:02] https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/jobs/727330/artifacts/raw/wdio-chromedriver.log
[18:08:03] ok
[18:09:26] btw I have confirmed that chromedriver is running: I caught it in the process list once:
[18:09:26] `/tmp/chromedriver/linux-146.0.7651.0/chromedriver-linux64/chromedriver --port=39585 --allowed-origins=* --allowed-ips=0.0.0.0`
[18:09:38] (That was on a WMCS runner)
[18:10:13] ok
[18:10:54] I’m trying to figure out if the “bind() failed: Cannot assign requested address (99)” and/or “listen on IPv6 failed with error ERR_ADDRESS_INVALID” could be related to the error
[18:11:10] I feel like I ruled it out before because I also saw it in successful builds, but I can’t find that now, so maybe I’m misremembering
[18:11:30] (errno 99 = EADDRNOTAVAIL Cannot assign requested address btw)
[18:11:43] Nod. That would be interesting.
[18:11:55] Seems like it should be allowing the OS to choose an ephemeral port
[18:12:17] bind(2) describes that error as “A nonexistent interface was requested or the requested address was not local.”
[18:13:07] possibly trying to listen on IPv4+IPv6, failing one, and not handling it gracefully?
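
The hypothesis at 18:13:07 (listening on both IPv4 and IPv6, with one bind failing) can be checked by hand on a runner. The following is a minimal Python sketch under stated assumptions: it is not chromedriver's code, the function name `try_bind_loopback` is hypothetical, and it only probes the loopback addresses with an OS-chosen ephemeral port (port 0). On a host without a usable IPv6 loopback, the `::1` entry would be expected to report a failure such as EADDRNOTAVAIL rather than abort the whole check.

```python
import errno
import socket


def try_bind_loopback():
    """Bind an ephemeral port on IPv4 and IPv6 loopback, one family at a time.

    Returns a dict mapping each address to ("ok", port) or
    ("failed", errno_name), so that a failure in one address family
    does not hide the result for the other.
    """
    results = {}
    for family, addr in ((socket.AF_INET, "127.0.0.1"), (socket.AF_INET6, "::1")):
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                # Port 0 asks the OS to pick a free ephemeral port.
                sock.bind((addr, 0))
                results[addr] = ("ok", sock.getsockname()[1])
        except OSError as exc:
            # Translate the numeric errno into its symbolic name where known.
            results[addr] = ("failed", errno.errorcode.get(exc.errno, str(exc.errno)))
    return results
```

Calling `try_bind_loopback()` inside the CI container and printing the result would show per address family whether binding works there; it says nothing about how chromedriver itself reacts to a failed bind.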
[18:21:15] https://stackoverflow.com/a/63310469/1420237 points to --whitelisted-ips but I assume that’s just the predecessor of --allowed-ips which per above is already being set
[18:25:13] interesting, according to https://issues.chromium.org/issues/42322092 it binds to 127.0.0.1 by default, but to 0.0.0.0 with --whitelisted-ips
[18:25:29] and I can confirm the same behavior locally with strace -e bind -f and with --allowed-ips as the option name
[18:25:37] maybe we want it to only bind to 127.0.0.1
[18:25:45] I got a successful run on WMCS runners: https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/jobs/727406
[18:25:48] \o/
[18:25:54] I was about to say, I just got an email about a login from the beta cluster :)
[18:26:27] ok, so no-sandbox and/or disable-dev-shm-usage was needed?
[18:26:32] https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/merge_requests/1/diffs (suggested by AI)
[18:26:56] I'll break down which setting actually makes the difference
[18:28:23] …thanks
[18:28:33] (I’ll admit, that “suggested by AI” has instantly killed my mood again)
[18:29:49] (I wonder if it got the disable-dev-shm-usage from https://stackoverflow.com/a/67154031/1420237 or somewhere else)
[18:42:28] “OAuth request returned non-200 HTTP status code: 401” o_O
[18:42:30] (https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/jobs/727427)
[18:47:44] lucaswerkmeister: I'm going to take a break now but I will keep working on this later. It feels like we're close to the root cause
[18:50:03] alright, thank you!
[18:52:45] ooh, another beta cluster login email
[18:53:00] :o so many jobs https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/pipelines/161131
[18:53:47] haha yeah
[19:00:12] hm. so in the first successful run, --headless=new worked, but in that latest pipeline, IIUC, --headless worked but --headless=new didn’t?
[19:00:28] Looks like the old options without the `--` were being ignored
[19:01:57] ah
[19:02:16] but still, didn’t both of those commits have -- ?
[19:02:39] (https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/commit/cd0bb02164 and https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/commit/2a7d0159ab)
[19:18:43] lucaswerkmeister: The changes in https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/merge_requests/1 seem to help move things forward a bit. The change adds `--` to the existing command line flags, and adds `--no-sandbox`.
[19:19:24] alright, thank you!
[19:19:58] do you want to keep looking or is this the point where I copy the commit over to the real repo with Co-Authored-By:?
[19:20:16] (I would tweak it to add wmcs to the other job(s) as well and probably sort the flags alphabetically)
[20:45:58] lucaswerkmeister: I've made additional changes to https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oauth2/-/merge_requests/1 and the pipeline passes in `wmcs`! I'm happy to turn it back over to you now.
[20:51:27] GitLab, Beta-Cluster-Infrastructure, m3api: Unblock running tests against Beta Cluster from Digital Ocean GitLab CI runners - https://phabricator.wikimedia.org/T414864#11555474 (dancy) Here's a working .gitlab-ci.yml config using `wmcs` runners: https://gitlab.wikimedia.org/repos/m3api/tmp-m3api-oau...
[22:39:31] GitLab (Upstream pit of despair 🕳️), Upstream: GitLab truncates commit messages over 1k of text - https://phabricator.wikimedia.org/T330790#11555825 (TheDJ) Open→Resolved a:TheDJ Both examples show fully now in gitlab for me.
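
The observation at 19:00:28 and 19:18:43 (flags without the leading `--` being ignored) and the suggestion at 19:20:16 (sorting the flags alphabetically) could be sketched as a small normalization step. This is a hypothetical Python illustration only: the helper name `normalize_chrome_args` is made up, the flag names are the ones mentioned in the chat, and the merge request itself edits the project's configuration directly rather than using a helper like this.

```python
def normalize_chrome_args(args):
    """Return the given Chrome flags with a `--` prefix, deduplicated and sorted.

    Flags that already start with `--` are kept as they are; any other
    flag has its leading dashes stripped and `--` prepended.
    """
    normalized = {
        arg if arg.startswith("--") else "--" + arg.lstrip("-")
        for arg in args
    }
    return sorted(normalized)
```

For example, `normalize_chrome_args(["headless", "--no-sandbox", "disable-dev-shm-usage"])` returns `["--disable-dev-shm-usage", "--headless", "--no-sandbox"]`.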