[07:15:24] greetings
[08:14:16] is it ok for me to restart neutron-metadata-agent at any time? cf. T410983
[08:14:16] T410983: VM metadata service slow response - https://phabricator.wikimedia.org/T410983
[08:16:03] yes
[08:16:10] thank you
[09:04:47] morning
[09:36:02] godog: did you have a pontoon env with a cloudgw host that I could borrow for a bit?
[11:01:51] dhinus: for https://phabricator.wikimedia.org/T409557#11404824 what do you think about pooling those as a separate 'fake' x4 section to allow people to update their code early, similar to what we did for x3 when it was introduced?
[11:05:46] yes I think the approach we used for x3 was nice, +1 for doing the same for x4
[11:06:24] ty, will make a patch
[11:09:08] thanks!
[11:09:27] taavi: yes actually, I did some cloudgw tests a few weeks back, I'm adding a new vm to the demo stack
[11:24:24] taavi: I pushed the stack to this branch: sandbox/filippo/pontoon-demo you can follow https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/%2B/refs/heads/production/modules/pontoon/#installation then join the 'demo' stack as per https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/%2B/refs/heads/production/modules/pontoon/#join-an-existing-stack
[11:24:39] checked out from the above branch, that is
[11:25:19] currently puppet is failing on pontoon-demo-cloudgw-01.testlabs.eqiad1.wikimedia.cloud though that's to be expected
[11:25:36] going to lunch, bbl
[12:54:30] published the asciinema demo from last week: https://asciinema.org/a/VKXmLlkwa1olpIPQSOtT6H2rV
[12:56:47] * taavi wonders if a mediawiki extension could embed asciinema files in wikitech
[12:59:40] oh yeah why not, I just looked up the web player api and it seems straightforward https://docs.asciinema.org/manual/player/
[13:08:51] that'd be a non-trivial project so not realistically happening anytime soon, but could still be fun :P
[13:10:06] heheh ./append-to-hackathon-ideas.py
[14:16:22] hmm, `pontoon join-stack` initially fails on ssh key verification: https://phabricator.wikimedia.org/P85619
[14:23:54] doh, of course! I'll file a task to fix it
[14:23:59] thank you for the report
[14:26:35] {{done}} T411023
[14:26:35] T411023: pontoon join-stack should ask for puppetserver host key verification - https://phabricator.wikimedia.org/T411023
[14:37:33] volans: I think your patch has broken puppet across all of toolforge k8s
[14:37:56] taavi: what? checking
[14:38:20] https://alerts.wikimedia.org/?q=%40state%3Dactive&q=team%3Dwmcs&q=project%3Dtools
[14:38:46] ah, it wants the private patch even if it's absented, my bad, fixing right now
[14:39:42] no wait, it should still have access to the fake ones from labs/private, no?
[14:40:34] checking if the puppet sync works
[14:40:43] yes, but there is also a "Failed to update Puppet repository /srv/git/labs/private on instance tools-puppetserver-01 in project tools" alert which seems related
[14:41:50] Could not apply 4a4cb42b6... [local] add k8s/kubeadm encryption key
[14:42:17] what's the way to manually run puppet-git-sync-upstream so I can fix the conflict?
[14:43:59] `cd /srv/git/labs/private/ ; pgit rebase`
[14:44:26] thx
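A minimal sketch of that recovery workflow, assuming `pgit` is a wrapper that runs ordinary git subcommands against the repo as its owning user; the conflict-resolution steps are standard git rebase handling, and the file path is a hypothetical placeholder:

```
# On the project puppetserver, rebase the local labs/private patches
# on top of the freshly fetched upstream.
cd /srv/git/labs/private/
pgit rebase

# If the rebase stops on a conflicting local patch, resolve the listed
# files, stage them, and continue (or drop the patch with --skip).
pgit status                      # show which files conflict
pgit add path/to/resolved-file   # hypothetical placeholder path
pgit rebase --continue           # or: pgit rebase --skip
```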
[14:46:53] fixed the conflict, it's rebasing the rest of the patches. not sure why it says "code deployed and Puppet code cache evicted" at each one, I hope it doesn't mean that if puppet is running right now on a host it doesn't get the local changes
[14:47:00] that would be bad as a workflow
[14:49:02] the whole labs/private handling seems pretty brittle, as we currently have 51 patches on top and it seems super easy to create a rebase conflict
[14:52:00] rebase completed
[14:52:40] puppet now runs fine on tools-k8s-worker-nfs-47
[14:52:56] should I force a puppet run or let them self-heal in the next 30m?
[14:54:21] I think letting the timer deal with them is fine
[14:54:55] why do we keep adding patches on top of labs/private? doesn't seem like a great choice of workflow
[15:15:06] alert cleared
[17:39:11] FYI I filed an incident report for the second toolsdb crash last week: https://wikitech.wikimedia.org/wiki/Incidents/2025-11-11_WMCS_toolsdb_primary_down
[17:39:19] it contains a link to the previous incident from two weeks ago
[17:53:10] * dhinus off
[18:08:47] FYI infra-tracing-nfs is currently running in toolsbeta (not tracing much given the low activity). If all goes well I'll probably start deploying it to actual toolforge tomorrow
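On the 14:52 question above about forcing a run rather than waiting for the timer: a hypothetical sketch using Cloud VPS cumin, assuming the standard `run-puppet-agent` wrapper is present on the workers and that the OpenStack query grammar and host pattern below are right (both are assumptions, not taken from the log):

```
# From a cumin host with access to the tools project, force an immediate
# puppet run on the NFS workers instead of waiting for the ~30m timer.
sudo cumin 'O{project:tools name:tools-k8s-worker-nfs.*}' 'run-puppet-agent'
```

Letting the timer converge, as was done here, avoids hammering the puppetserver with many simultaneous runs.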