[07:43:57] nokibsarkar: in your build process for toolforge, you can add at the end an `rm -rf /workspace/.node_modules` and it will save ~1.2G of space in the final image (I think)
[07:45:00] is there any `.` in front of node_modules?
[07:45:15] @dcaro
[07:46:23] The Toolforge migration is almost complete. I have mixed reactions about GitLab CI, but I am trying it out for the backend and it is working great
[07:47:28] GitLab CI has a lot of things built in that GitHub Actions does not. But at the same time, it took me almost 6 hours to figure out how secure files work
[07:48:22] but my Toolforge deployment is kinda slow (we have not officially switched over yet).
[07:54:38] it has no `.` yep xd
[07:56:26] Currently I am building it with GitHub Actions and only the minimal built version is pushed. The current implementation already works and gives me the flexibility to upgrade whenever I want. Should I switch to the build service instead?
[07:59:35] But I am facing issues such as having no backup server for my backend. In the droplet, I had two servers: one is the actual production server, the other is a read-only server. Whenever the actual server died (for ANY reason), the read-only server would take control until the actual server arose from the ashes (see the sketch below). That was very critical during deployments and other kinds of downtime
[08:00:19] I'd recommend it yep, it's going to be a bit slower I think though, as it has to assemble the image (instead of just dropping the static files in the tool home)
[08:01:18] it gives you (and the platform) the ability to not depend on NFS, and thus to run the image, with everything it needs, independently on the workers
[08:03:05] nokibsarkar: what's the current system setup on Toolforge?
[08:04:14] If you are talking about switching from GitHub Actions to the Toolforge build service, please look at my current pipeline (https://github.com/nokibsarkar/campwiz-frontend/actions/runs/15106260231). The GitHub Actions build is upstream for both Toolforge and our droplet. Until we completely switch over to Toolforge, I am not sure it is a good idea to switch to the Toolforge build service.
[08:05:46] what do you mean by system setup? (re @wmtelegram_bot: nokibsarkar: what's the current system setup on Toolforge?)
[08:06:23] nokibsarkar: the part you'd have to change to use the build service there (at least for now) is: instead of ssh-ing + git pull + restart, trigger a build (using `--ref build`) + restart
[08:07:20] nokibsarkar: I mean, do you just have one big static webservice, a backend continuous job + a static webservice, etc.? (re what do you mean by system setup?)
[08:07:33] if you were to draw me how it's set up, what would that drawing show?
[08:08:58] So, I have a frontend Next.js server (let's say a nodejs server), one backend server, one continuous gRPC server, and one ToolsDB database (Redis is not used so far, but I'm planning to use it)
[08:09:17] is that backend server public?
[08:10:44] Oh yeah, https://github.com/nokibsarkar/campwiz-backend. All the code and the manual reside in the repo (https://github.com/nokibsarkar/campwiz). (re @wmtelegram_bot: is that backend server public?)
[08:11:13] I mean, is it meant to be used by users directly (bypassing the frontend)?
[08:12:04] oh, actually, if anyone wants, they can use it directly. But as of now, only the campwiz frontend is using it
[08:12:12] okok
[08:13:02] we are planning to build a mobile application, where the backend would be used directly. But the gRPC server cannot be used directly; it must be used by the backend.
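A minimal sketch of the failover behavior described at [07:59:35]: a small Go reverse proxy that health-checks the read-write primary and falls back to the read-only standby while the primary is down. The addresses, ports, and the `/healthz` path are hypothetical; this illustrates the behavior, it is not the actual droplet setup.

```go
// failover.go: sketch of a health-checking failover proxy.
// Assumes two hypothetical backends: a read-write primary on :8080
// and a read-only standby on :8081.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

func main() {
	primary, _ := url.Parse("http://127.0.0.1:8080") // hypothetical rw backend
	standby, _ := url.Parse("http://127.0.0.1:8081") // hypothetical ro backend
	toPrimary := httputil.NewSingleHostReverseProxy(primary)
	toStandby := httputil.NewSingleHostReverseProxy(standby)

	var primaryUp atomic.Bool
	primaryUp.Store(true)

	// Poll the primary's (hypothetical) health endpoint in the background.
	go func() {
		client := &http.Client{Timeout: 2 * time.Second}
		for {
			resp, err := client.Get(primary.String() + "/healthz")
			ok := err == nil && resp.StatusCode == http.StatusOK
			if resp != nil {
				resp.Body.Close()
			}
			primaryUp.Store(ok)
			time.Sleep(5 * time.Second)
		}
	}()

	// Route each request to whichever backend is currently appropriate.
	log.Fatal(http.ListenAndServe(":8000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if primaryUp.Load() {
			toPrimary.ServeHTTP(w, r)
		} else {
			toStandby.ServeHTTP(w, r) // until the primary "arises from the ashes"
		}
	})))
}
```

This is essentially the behavior that the nginx-based `fakelb` setup further down in this log reproduces declaratively.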
[08:13:27] 👍
[08:16:24] is the backend meant to have >1 replica?
[08:16:37] yep, 2 replicas
[08:17:22] also, the backend should have another version (built with a `readonly` tag) to be used as a failover server
[08:17:30] hmm... I'm thinking about how to get the load-balancing behavior you want, yep
[08:18:02] But apparently I cannot define any kind of build arguments
[08:19:38] nokibsarkar: what do you mean?
[08:20:20] like `go build ./...`
[08:21:25] I need two versions of the same server: one is the usual, the other is the read-only version for failover
[08:22:37] and the failover is a very critical feature, because users do not like a blank screen or a loader spinning infinitely.
[08:24:16] nokibsarkar: I'd have to investigate a bit more, but off the top of my head, you can have a branch for it (and pass `--ref `), or build both at the same time by having both packages declared (https://github.com/heroku/buildpacks-go/blob/main/buildpacks/go/src/main.rs#L111C13-L111C14)
[08:25:22] you can also pass environment variables, I'd have to investigate how to use them in the build process though
[08:26:03] of course! you can use `GOFLAGS`
[08:26:43] you can pass it at build time with `toolforge build start --envvar="GOFLAGS=-tags=readonly" ...`
[08:26:58] (untested though)
[08:28:15] Actually, another branch is not a very elegant solution here, because in my read-only mode all the code remains unchanged; only a few things change depending on the `readonly` tag. Also, I would have to run the build service twice, the second time with the build arguments or reference. Then, when I start the webservice, what would happen? The latest one would be used
[08:30:10] Another question: in my Procfile, I added a second entry. How can I run that second entry as a continuous job?
[08:31:08] `toolforge jobs start --command --image ` like this?
[08:31:43] !log dcaro@tools-bastion-13 tools.wikibugs toolforge webservice restart #wikibugs had exited but the process did not die
[08:31:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:32:51] !log dcaro@tools-bastion-13 tools.wikibugs restarting phorge+gerrit too as they depend on wikibugs #wikibugs had exited but the process did not die
[08:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:34:28] nokibsarkar: yep, that should do it
[08:34:56] nokibsarkar: agree that a branch might not be the best, I'd try with the `GOFLAGS`
[08:35:42] Then my second build would override the first build
[08:36:04] you'd build two different images
[08:36:30] using `--image-name `
[08:36:42] Ooo
[08:36:52] the other option is declaring the two binaries and building a single image with both binaries
[08:37:00] (you save having to build twice)
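To make the `GOFLAGS` idea above concrete, here is a sketch of Go's build-tag mechanism (the endpoint names are hypothetical, not campwiz's actual routes): two files declare the same `setupRouter`, one compiled by default and one only when the `readonly` tag is set, so `go build` and `go build -tags=readonly` produce the two variants from the same tree.

```go
// routes_rw.go — compiled by default (no tag). Hypothetical endpoints.
//go:build !readonly

package main

import "net/http"

func setupRouter(mux *http.ServeMux) {
	mux.HandleFunc("/api/read", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("data\n"))
	})
	mux.HandleFunc("/api/write", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("written\n"))
	})
}
```

```go
// routes_ro.go — compiled only with `go build -tags=readonly`
// (e.g. GOFLAGS=-tags=readonly injected at build time).
//go:build readonly

package main

import "net/http"

func setupRouter(mux *http.ServeMux) {
	mux.HandleFunc("/api/read", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("data\n"))
	})
	// Mutating endpoints are swapped for a fixed explanation.
	mux.HandleFunc("/api/write", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "Server is in read-only mode", http.StatusServiceUnavailable)
	})
}
```

Since only the file selected by the tag differs, the rest of the program stays identical across both builds.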
[13:55:32] I'm getting the following puppet error on a traffic project instance: Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Lookup of key 'lookup_options' failed: cloudlib::httpyaml failed Tried to load unspecified class: Symbol (file: /srv/puppet_code/environments/production/modules/role/manifests/wmcs/instance.pp, line: 3, column: 5)
[13:55:32] on node traffic-cache-bullseye.traffic.eqiad1.wikimedia.cloud
[13:55:38] could somebody point me in the right direction?
[13:57:51] vgutierrez: which instance?
[13:58:03] traffic-cache-bullseye.traffic.eqiad1.wikimedia.cloud
[14:13:33] vgutierrez: I think this is the Ruby YAML parser misbehaving
[14:14:09] in particular, in Hiera you have `server_bind: :9341`, where `:9341` is being interpreted as a Ruby symbol (https://www.rubyguides.com/2018/02/ruby-symbols/)
[15:40:31] vgutierrez: sorry for disappearing, had to jump into meetings for a bit. I created T394691 for that issue and should have a patch soon
[15:40:32] T394691: puppet-enc issue with Hiera values starting with a colon due to PyYAML and Ruby YAML parsing differences - https://phabricator.wikimedia.org/T394691
[15:40:44] taavi: thx <3
[15:52:51] vgutierrez: fixed, and the remaining puppet issues look like they're related to that specific instance
[15:53:08] cool, thx
[16:45:31] nokibsarkar: found a way to get your custom load-balancing setup (it's not pretty, but it works), using https://gitlab.wikimedia.org/toolforge-repos/wm-lol/-/tree/fakelb?ref_type=heads
[16:45:59] it (ab)uses the php buildpack to just run nginx instead with the given config
[16:47:07] tested it in lima-kilo successfully (starting it as a webservice, then having two continuous jobs, one called `campwiz-backend` and one `campwiz-backend-ro`)
[16:50:17] Wow
[16:54:01] That's very good. But is it a good thing that we are both hacking our way into the system? I mean, I already got my hands dirty with the frontend build thingy (using the build service without using the build service). If a new maintainer saw this, they would wonder which planet these people come from. I was expecting some elegant (or at least not very abusive) way of doing things. I am kind of worried about maintainability already. 😂 😂
[16:57:39] I'm almost certain that for the frontend you can collapse it into one single build, without the need to create a build branch and such, I'll try that next (it's just doing an rm of the node_modules directory after generating the static files, or telling node to put them in /tmp for example, if possible)
[16:58:24] for the proxy, yep, there's no easy workaround I think, it's not a usual feature
[16:59:53] Using this solution, I have to build 3 images (the first one with `--image-name normal-backend`, the second one with `-tags=readonly --image-name readonly-backend`, and the third one this hackish PHP one). Then I would start the first two images as continuous jobs, and the third one as a webservice, right? (re @wmtelegram_bot: tested it in lima-kilo successfully (starting it as a webservice, then having two continuous jobs one called `campwiz-ba...)
[17:00:30] yep
[17:00:59] is there a way to make the read-only feature a flag of the starting binary instead of having to build the whole image again?
[17:01:08] (ex. `campwiz -ro`)
[17:01:33] that way you can add two entries in the `Procfile` to start ro/rw, and build a single image
[17:02:12] I mean, I can make that a flag, but then it would break the current deployment in the droplet (which I am kinda afraid to change in the midst of a live campaign)
[17:02:46] I see, how does it work now?
[17:03:10] which one? read-only mode?
[17:04:24] Last night I broke the port flag, which caused a lot of chaos. The server was switching between read-only and read-write mode
[17:05:10] I was woken up by an emergency call from the Project Lead about the disruption 🤣
[17:05:31] xd, :hugops:
[17:06:07] yep, the read-only mode, I see that it uses the `//go:build readonly` header
[17:08:28] Yes. So, I have a file containing a function called setupRouter, which in turn calls different functions to mount each module of routes. This feature flag changes the 2nd-level function calls to implement only the read-only endpoints. The other endpoints are replaced by a function giving the error message 'Server is in read-only mode. Maybe maintenance is going on, blah blah blah'
[17:11:11] This read-only mode avoids some surprises and lets the end users still access the tool, while the database does not get written to. It also helps with migrations, downtime, and maintenance
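A minimal sketch of the run-time alternative suggested at [17:00:59] (`campwiz -ro`), assuming the setupRouter structure described at [17:08:28]; the flag name and endpoints are hypothetical. One image can then serve both modes:

```go
// main.go — sketch: read-only mode chosen at start-up instead of at
// build time, so a single image can back both Procfile entries.
package main

import (
	"flag"
	"log"
	"net/http"
)

func main() {
	ro := flag.Bool("ro", false, "serve in read-only mode")
	addr := flag.String("addr", ":8080", "listen address")
	flag.Parse()

	mux := http.NewServeMux()
	setupRouter(mux, *ro)
	log.Fatal(http.ListenAndServe(*addr, mux))
}

// setupRouter mounts the routes; in read-only mode every mutating
// endpoint is replaced by a fixed 503 explanation, mirroring the
// second-level function swap described at [17:08:28].
func setupRouter(mux *http.ServeMux, readOnly bool) {
	mux.HandleFunc("/api/read", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("data\n")) // hypothetical read endpoint
	})

	write := func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("written\n")) // hypothetical write endpoint
	}
	if readOnly {
		write = func(w http.ResponseWriter, r *http.Request) {
			http.Error(w, "Server is in read-only mode; maintenance may be going on", http.StatusServiceUnavailable)
		}
	}
	mux.HandleFunc("/api/write", write)
}
```

The `Procfile` could then carry two entries, one starting the binary plainly and one starting it with `-ro`, the latter run via `toolforge jobs start` as discussed above.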
[17:13:12] then for backwards compatibility, you can keep that second read-only main file, and modify the current 'rw' one to allow the parameter flip; that should work on both deployments
[17:13:14] It also lets them know why their changes were not saved, and gives the feeling that they are dealing with a tool made by a very big, successful software company with millions of users 😁. So they tend to complain less, which makes the developer happy.
[17:14:40] xd
[17:16:35] I also use Sentry for error monitoring, which gives me a heads-up before the user even gets ready to complain (a minimal setup sketch is appended at the end of this log). Though third-party Sentry might have policy implications, which I asked about at `answers@wikimedia.org`. Ironically, no answer has come yet.
[17:20:01] give it time, that email list is pretty generic; I saw you wrote a task, that will probably be faster
[17:22:17] just had a quick look at the code, it seems it should not be hard to add the extra flag; I'd recommend doing that even if you end up not setting it up in Toolforge
[17:22:32] gtg. I'll be around later
[19:16:45] So, I implemented your proxy thing, and now it seems like my young, energetic tool has turned into a much older guy with very slow performance
[19:16:58] @dcaro
[19:19:25] nokibsarkar: can you open a task with the details? I'll take a deep dive tomorrow
[19:23:04] I'm not completely caught up here. I don't understand why you need build-time configuration for the read-only service: can't that be a run-time feature flag/configuration option/environment variable, so that both share the same container image?
[19:49:30] T394730
[19:49:45] now, I did use a flag (re @jeremy_b: I'm not completely caught up here. I don't understand why you need build-time configuration for the read-only service ...)
[19:50:40] by the way, why aren't my Phabricator tasks detected by the bot?
[20:01:46] T394730
[20:02:02] [[phab:T394730]]
[20:02:41] https://en.wikipedia.org/wiki/phab:T394730
[20:03:52] looks like it only responds to IRC, not bridgebot
[20:04:57] nokibsarkar thanks! :)
[21:44:08] !log codesearch soft rebooting instance codesearch9, web UI was down and could not get shell
[21:44:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Codesearch/SAL
[21:53:43] !log anticomposite@tools-bastion-13 tools.stewardbots ./SULWatcher/manage.sh restart # pymysql.err.InterfaceError: (0, '') when restarting from IRC
[21:53:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
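For reference, the Sentry error monitoring mentioned at [17:16:35] usually amounts to an init like the following with the `sentry-go` SDK (a sketch; the DSN is a placeholder, and whether reporting to a third-party Sentry is policy-compatible is exactly the question left open above):

```go
// monitoring.go — minimal sentry-go setup sketch. The DSN below is a
// placeholder, not a real project key.
package main

import (
	"errors"
	"log"
	"time"

	"github.com/getsentry/sentry-go"
)

func main() {
	if err := sentry.Init(sentry.ClientOptions{
		Dsn: "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder
	}); err != nil {
		log.Fatalf("sentry.Init: %s", err)
	}
	defer sentry.Flush(2 * time.Second) // push buffered events before exit

	if err := doRiskyThing(); err != nil {
		sentry.CaptureException(err) // the heads-up before users complain
	}
}

// doRiskyThing stands in for any real work that can fail.
func doRiskyThing() error { return errors.New("example failure") }
```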