[08:10:54] ^ fixed the etherpad url
[09:10:44] greetings
[13:05:30] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1227321
[13:30:18] LGTM
[13:30:41] taavi: re: https://phabricator.wikimedia.org/T377570 I forget the context, we decided not to go with 3x HA?
[13:33:22] godog: I never got a proper answer what benefits we would get from a third node for services with no stateful data storage and where a single node is enough load-wise
[13:34:13] taavi: ok thank you! that makes sense, would you mind updating the task with said context for the archives ?
[13:35:49] I will
[13:36:10] cheers
[15:40:44] godog: pontoon nodes have $::realm == 'production', right?
[15:42:53] taavi: mmhh 'no' off the top of my head because manifests/realm.pp is executed
[15:43:15] I'm checking
[15:44:09] yeah realm is labs
[15:44:16] ok, thanks for checking
[15:44:22] sure np
[17:13:19] andrewbogott: your accidental nerd snipe is maybe close to me presenting you with a merge request that migrates the wikitech container build from a raw Dockerfile to a Blubber config. The full build test that I kicked off locally last night is still running though, so I haven't proved to myself it all works as planned yet.
[17:14:27] Oh, I refactored the dockerfile at the same time. We can have a head:head build footprint contest.
[17:16:30] I think yours is going to win that one. I still have a copy to a runtime container step in mine.
[17:17:27] I could get rid of it relatively easily though
[17:19:07] the main advance of mine is a hopefully more readable httrack config. A .httrackrc file lets the add and remove filters be written in a way that I think is much more understandable.
[17:24:12] it certainly isn't understandable the way it is now
[17:42:48] taavi: my judgement about the latest https://wikitech-static.wikimedia.org/ build is that it has a totally messed up front page but otherwise works and has (at least some) proper locally-stored images. Before I dive in to the front page issue, do you agree?
[17:51:35] * dhinus off
[21:10:12] andrewbogott: so... my thing ran for 15 hours, stored 23,000 things, and managed to miss all of the Portal landing pages. Not ideal.
[21:10:49] oh, it didn't really miss them, the front page is just wrong. Try navigating to /wiki/main_page.html
[21:10:57] (I'm testing a fix for that as we speak)
[21:11:49] I'm looking at the files inside of my container
[21:12:23] really? what's an example of a landing page that's missing?
[21:13:18] you're talking about, like, https://wikitech-static.wikimedia.org/Portal_Toolforge.html ?
[21:13:23] in my image, http://localhost:8000/wiki/Portal_Toolforge.html
[21:13:58] huh
[21:14:04] httrack got a lot of things, but not all the things for sure
[21:14:05] I have https://wikitech-static.wikimedia.org/wiki/Portal_Toolforge.html
[21:14:07] which works
[21:14:30] you're using the same nginx conf as me?
[21:14:52] (obviously if there is no Portal_Toolforge.html in your scrape at all then something interesting is happening)
[21:15:34] I am inside the container looking at saved files. lots of things are just not there
[21:15:46] ok
[21:16:01] which makes me less excited to share my work
[21:16:08] so sounds like something got mangled in your refactor of the httrack rules.
[21:16:23] which may or may not be fun to try and fix
[21:16:25] * bd808 should check a file count in the published image
[21:17:16] published image is
[21:17:20] root@c134a33d9f04:/usr/share/nginx/html# find . | wc
[21:17:20] 52298 52300 5291849
[21:22:57] My final image size is 3.44G. It looks like the latest official build is 8.98G.
[21:23:06] so yeah I f'ed something up
[21:27:47] I see that the official image still misses things like https://wikitech-static.wikimedia.org/static/images/footer/wikimedia-button.svg too.
[21:29:21] yeah, it's not perfect about images although I wonder why not that one... it should be a first-level reference from many pages
[21:29:41] the re-fetch of SAL isn't quite right either. It doesn't have all the rewritten links I would expect. The link to the archives goes to the live site instead of the crawled page inside the container. I was seeing that in local testing yesterday too.
[21:30:54] thcipriani asked an interesting hypothetical yesterday, could we be making a zim file as the archive?
[21:31:21] yeah, I just pushed a fix for the SAL thing
[21:31:45] I'm going to break the production site for a minute, will be back shortly
[21:53:04] dammit the SAL is still messed up somehow
[21:58:46] andrewbogott: from my messing about yesterday, I think that the `-#Ln` link traversal depth being small keeps httrack from rewriting the URLs in the source page. It seems that only links which are fetched are rewritten. This makes the single page fetch for the SAL produce a different file than the longer crawl would.
[21:59:57] OK, you're talking about the 'messed up css' problem and not the "doesn't really update the content" problem?
[22:00:26] yeah, things like js, css, images, outbound links
[22:00:52] ok, that's easy to change...
[22:03:25] ok, it /is/ updating SAL content now, just ugly.
[22:06:24] so for this, I want to set --depth 2 and a high page count, assuming that the depth will restrict the page count from getting to huge
[22:06:27] *too
[22:08:48] yeah, that might be the right magic.
[22:11:47] ok, time for yet another aws build...
[22:12:10] (I can't cleanly switch over between old and new because I need the floating IP for letsencrypt to work at build time)
[22:13:06] using the AWS web UI makes me hate Horizon a whole lot less
[22:23:57] well, that does not seem to have helped at all
[22:28:52] * andrewbogott off to cook dinner
[22:29:08] * andrewbogott in EST today so it's not as early as it seems
[22:30:23] andrewbogott: I pushed my not right, but maybe interesting changes into gitlab. The part that might have some inspiration for you is https://gitlab.wikimedia.org/bd808/wikitech-static-docker/-/blob/work/bd808/blubber/.pipeline/blubber.yaml?ref_type=heads#L39-119
[23:05:36] I will have a look!
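
Editor's note on the file-count check from around 21:16: the `find . | wc` comparison between the published image and a local build could be scripted roughly as below. This is only a sketch, not something anyone in the channel ran. The function name `scrape_footprint` is made up for illustration, and pointing it at /usr/share/nginx/html assumes the scrape lives where the shell prompt above suggests.

```python
import os
import sys
from typing import Tuple


def scrape_footprint(root: str) -> Tuple[int, int]:
    """Return (entry_count, total_bytes) for the tree under root.

    entry_count counts root itself plus every file, directory, and
    symlink below it, so it lines up with the line count of `find .`.
    total_bytes sums the sizes of regular files only.
    """
    entries = 1  # the root directory itself, like the "." line from find
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a dangling link cannot raise in getsize().
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return entries, total_bytes


if __name__ == "__main__":
    # Hypothetical usage: pass one or more scrape directories to compare.
    for root in sys.argv[1:]:
        count, size = scrape_footprint(root)
        print(f"{root}: {count} entries, {size / 1e9:.2f} GB")
```

Running it against both scrape directories would give a quick side by side view of how many pages each crawl actually stored, which is the comparison that exposed the smaller local build.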