[13:04:01] I've a reimage that is spending a lot of time at the Run Puppet in NOOP mode to populate exported resources in PuppetDB stage
[13:04:21] ...which I'm guessing means puppet is unhappy about something on the host; is there a way to check progress / see what the problem is?
[13:05:35] facter -p runs on the host (via install_console) OK
[13:06:14] [this is apus-be1004 which doesn't appear on puppetboard, presumably because of the same problem]
[13:07:38] Emperor: yes, sudo install-console $FQDN
[13:07:56] puppet agent -t --noop
[13:08:55] and the problem is that is not in site.pp
[13:09:27] ugh, stoats
[13:09:42] at least that's easily fixable
[13:10:16] if you merge the fix and run the noop manually while the cookbook is still polling it will continue happily
[13:10:47] if not you can resume it with --no-pxe
[13:13:49] volans: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1146986 look OK?
[13:14:42] sure
[13:17:48] that ran to completion (with some red on the way past), so hopefully it'll progress now
[13:17:54] indeed so, grand.
[13:18:19] great
[13:39:35] huh, puppet now complaining about a bad systemd-timesyncd.service.d puppet override file
[13:41:15] (looks like puppet is installing a temporary override file, systemd doesn't like it, puppet removes it again, which makes inspecting it a bit challenging)
[13:47:44] we ship an override which sets InaccessiblePaths to deal with some HDFS issues, but that has been around for a long time, did you capture the exact error message?
[13:50:24] moritzm: Execution of '/usr/bin/systemd-analyze verify --recursive-errors=no /etc/systemd/system/systemd-timesyncd.service.d/puppet-override.conf20250516-3360-tkal5:systemd-timesyncd.service' returned 1:
[13:50:24]
[13:50:24] systemd-timesyncd.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
[13:50:24] Unit systemd-timesyncd.service has a bad unit file setting.
[13:51:10] I'm currently failing to work out how to get Puppet to actually show me the dratted override file - it currently creates it, systemd barfs on it, then puppet removes it again which is making it hard to see what's in it and work out what I've screwed up
[13:51:38] https://puppetboard.wikimedia.org/report/apus-be1004.eqiad.wmnet/42ff29aa968b1708a58dccc1b67b9b52f4315bef is puppetboard in any case
[13:52:36] running puppet with --debug should help
[13:53:18] I tried puppet agent -tv --evaltrace which produces _copious_ output, but not AFAICT the override file contents
[13:54:41] (I thought that might help as wikitech says it shows resources as they are being evaluated); now trying --debug instead
[13:54:58] the content you can get it from the puppet repo
[13:57:06] volans: I don't think I can?
[13:57:31] or at least, it won't show me whatever garbage is being generated on my failing host (and presumably it's not generally garbage, so there's something wrong on this host)
[13:57:38] ah! that's most certainly related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1138905
[13:57:46] iirc that validation is a very new thing, and sounds like it might not actually work properly with override files
[13:57:59] i.e. that error sounds like it's expecting a full unit file, and not an override
[13:58:11] the systemd-analyze is also run on the overrides, which are not syntax-correct by itself, only in conjunction with the underlying unit
[13:58:25] ^ jhathaway, what do you think?
[13:58:32] reading...
[13:58:43] volans: FWIW --debug still isn't showing me what puppet is putting in the temporary override file it's creating
[13:59:08] it is e.g. /usr/bin/systemd-analyze verify --recursive-errors=no /etc/sy
[13:59:08] stemd/system/ferm.service.d/ferm-service-status-restart.conf20250516-11793-2z687
[13:59:08] r:ferm.service that's failing though
[13:59:16] sorry, C&P damage
[13:59:33] does the command fail for that file?
[13:59:38] but that would fit - on existing hosts these overrides won't be changing, so puppet won't be trying to validate them
[13:59:41] or do we not have the contents of the override yet
[13:59:52] jhathaway: puppet says so, but it's a temporary file so I can't see what's in it (nor work out how to get puppet to show me)
[14:00:01] ugh
[14:00:37] do we have the file in existence on another host?
[14:00:49] I presume so, give me a mo
[14:01:20] yeah, it contains:
[14:01:26] [Service]
[14:01:26] InaccessiblePaths=-/mnt
[14:01:27]
[14:01:38] okay, which host?
[14:02:06] it's extant on moss-be1001 (and failing on apus-be1004); want me to run systemd-analyze on the existing override? {doing so}
[14:02:13] thanks
[14:02:32] you could also get a PCC to get the file content if it's a template, if it's a file is verbatim as it's in the puppet repo
[14:03:51] jhathaway: https://phabricator.wikimedia.org/P76271
[14:04:12] volans: this is a thing created by systemd::service (I think!)
[14:04:58] jhathaway: that paste matches the error message I'm seeing in the bowels of the puppet output
[14:05:17] nod, probably just revert my patch for now, give me one moment
[14:05:44] Hi! I'm looking for an existing repo that packages a python package with an entrypoint script as a deb package. I'm currently using poetry but I can fall back to the venerable setup.py method if required
[14:07:00] jhathaway: or could just fix the validator to not run on overrides? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1146998
[14:07:05] (seems that https://gitlab.wikimedia.org/repos/sre/conftool/ might be a good place to start)
[14:07:24] brouberol: hey! if you want something totally not fancy nor bleeding edge we have a bunch of them standardized with a script to deploy them
[14:07:45] thanks taavi, was just trying to confirm verify doesn't work for overrides
[14:07:48] (and no, conftool doesn't follow that standard ;) ) I can show it to you and then you decide :)
[14:07:57] I can do not fancy and not bleeding edge. I reached out to poetry in development mode as an automatism really, but I don't need it
[14:08:05] so, yes please volans!
[14:08:16] I'm always lost when entering the debian packaging world
[14:08:41] if the wikitech docs need improving, do shout :)
[14:08:42] * volans pinging you for more details
[14:09:24] jhathaway / taavi : I dunno if it helps, but could you add hosts: apus-be1004 to see if your fix unbreaks my sad computer :)
[14:09:36] (and, err, sorry for the Friday hassles)
[14:10:12] [ 2025-05-16T14:09:59 ] CRITICAL: Unexpected error running run_host: Unable to find fact file for: apus-be1004.eqiad.wmnet under directory /var/lib/catalog-differ/puppet
[14:10:21] sounds like that host isn't yet known to PCC
[14:10:44] Emperor: I'll just revert my change for now, and verify if taavi's patch is the only recourse for overrides
[14:10:54] OK, cool, thanks :)
[14:11:23] taavi: Hm, it's still in the "waiting for successful Puppet run" stage of being imaged for the first time.
[14:18:30] jhathaway: puppet runs to completion OK on my host now, thank you :-)
[14:18:43] great, apologies for the breakage
[14:20:31] installing new hosts can be too easy, nice to have a bit of a challenge ;-)
[14:56:09] to close the above loop for python packaging I went through both what a bunch of I/F packages do and wmf-debci on gitlab :)
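
Note on the override failure above (13:57:59 to 13:58:11): a rough sketch of why an override checked on its own looks like a service with no ExecStart=, while the same override layered on the underlying unit is fine. Everything here is an assumption for illustration: BASE_UNIT is a made-up stand-in, not the real systemd-timesyncd unit; has_exec_start is a hypothetical helper; and Python's configparser only approximates unit-file syntax, it is not what systemd-analyze does internally. Only the DROP_IN contents are taken from the log (14:01:26).

```python
import configparser

# Hypothetical stand-in for the underlying unit (NOT the real timesyncd unit).
BASE_UNIT = """\
[Unit]
Description=Example time sync service

[Service]
ExecStart=/usr/bin/example-timesyncd
"""

# The override contents quoted in the log at 14:01:26.
DROP_IN = """\
[Service]
InaccessiblePaths=-/mnt
"""


def has_exec_start(*fragments: str) -> bool:
    """Layer unit fragments in order and report whether [Service] has ExecStart.

    Hypothetical helper: a crude approximation of merging a unit with its
    drop-ins, good enough to show the difference between the two checks.
    """
    merged = configparser.ConfigParser(interpolation=None, strict=False)
    merged.optionxform = str  # keep key case (ExecStart, not execstart)
    for fragment in fragments:
        merged.read_string(fragment)  # later fragments layer over earlier ones
    return merged.has_option("Service", "ExecStart")


if __name__ == "__main__":
    # Override treated as if it were the whole unit: nothing to execute.
    print("drop-in alone:", has_exec_start(DROP_IN))
    # Override in conjunction with the underlying unit.
    print("unit + drop-in:", has_exec_start(BASE_UNIT, DROP_IN))
```

This only mirrors the shape of the problem discussed in the channel; whether the validator should skip overrides or check them together with the base unit is the open question left with jhathaway and taavi above.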