[07:34:23] greetings
[08:19:30] dhinus: I'm looking into the toolforge scheduling failure
[08:21:12] godog: thanks!
[11:02:26] people of the cloud!
[11:02:32] how are you doing?
[11:03:12] hopefully a quick one. I rolled out a patch that modifies the DSCP marking we use for "low priority" traffic
[11:03:16] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1279339
[11:03:24] this is used on the cloudcephosd nodes
[11:03:40] for some reason, despite puppet saying it restarts ferm when the file on disk changes, the old rules persist
[11:03:58] I've tested on some non-cloud hosts and they needed a manual "systemctl restart ferm.service" to apply the change
[11:04:43] so I guess my question is whether that is risky/ok to do across the cloudcephosd's? I'm guessing it's ok; they won't have a lot of other things adding iptables rules in the way, say, the openstack nodes might?
[11:08:26] hey topranks, checking
[11:13:36] I had a chat with Moritz on this, and it makes some sense. When puppet does a "ferm refresh" it signals ferm itself to reload the rules (it doesn't trigger a systemctl restart)
[11:14:14] and due to some "ferm fun" with how it parses all the files, it doesn't see changes in mangle/POSTROUTING, so it doesn't do anything
[11:15:19] I can confirm that's the case: ferm-status thinks there are no changes, but there are
[11:15:44] the correct fix is to move cloudceph to nftables :-)
[11:16:09] I ran and verified "systemctl restart ferm" on cloudcephosd1037 and things check out, so +1 on my end to proceed topranks, or I can do it too
[11:16:19] moritzm: heh, agreed
[11:16:27] godog: no that is ok, I'll take a look
[11:16:37] thanks for checking!
[11:16:54] topranks: sure np, thank you for taking care of it
[11:17:08] cloudcephosd1037 looks good, I can confirm it's using the new marking
[11:17:48] re: cloudceph on nftables, that's T361913 just for the record
[11:17:49] T361913: Migrate cloudceph servers to nftables - https://phabricator.wikimedia.org/T361913
[11:38:21] FYI folks, that's been done now and all looking good, thanks
[11:41:37] sweet, thank you topranks!
[12:09:23] I'm going to add a new non-nfs worker to toolforge; I think what caused T425696 is cordoning 106, which pushed cpu harder on the non-nfs workers
[12:09:24] T425696: restarted pod failed to schedule due to resource constraints - https://phabricator.wikimedia.org/T425696
[12:09:47] cc taavi ^ JFYI
[12:11:24] https://w.wiki/Mt5s this is what I mean, all but one of the non-nfs workers are > 80% cpu allocated (requests, not limits)
[12:13:55] godog: non-NFS workloads are still able to run on NFS workers if necessary. so does that mean we don't have a single 3-CPU slot anywhere on the cluster?
[12:15:02] mmhh ok, I wasn't aware non-nfs can spill over to nfs workers, nevermind
[12:15:45] to your question, I am checking
[12:16:08] https://www.youtube.com/watch?v=QY4KKG4TBFo "we are checking"
[12:26:46] heh, occasionally some tools can't schedule, looks like cpu/mem https://phabricator.wikimedia.org/P92441
[12:26:59] and https://w.wiki/Mt9i
[15:30:33] Oops, TIL why I wanted to keep sysop on wikitech
[15:30:46] does anyone here still have it?
[15:32:44] bliviero: here's the stub of that 'exceptions to terms of use' page https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use/exceptions -- the main thing we're missing is the actual name of the tool.
[15:45:38] andrewbogott: Can grant you it again easily enough...
[16:06:19] andrewbogott: thanx for the stub!
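
(For reference: a minimal sketch of the manual check-and-restart discussed above, assuming the DSCP marking rule lives in the mangle table's POSTROUTING chain as described; run on each affected cloudcephosd host.)

    # Show the currently loaded marking rules; per the log, ferm-status
    # may report "no changes" even when the on-disk config differs.
    iptables -t mangle -S POSTROUTING

    # A plain ferm reload misses mangle/POSTROUTING changes, so force a
    # full service restart to load the new rules.
    systemctl restart ferm.service

    # Confirm the new DSCP marking is now in place.
    iptables -t mangle -S POSTROUTING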
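
(For context on the scheduling discussion: one way to inspect per-node CPU allocation and pending pods, assuming kubectl access to the Toolforge cluster; the node/pod/namespace placeholders are hypothetical.)

    # Summarise requested vs allocatable resources on one worker; the
    # "Allocated resources" section shows CPU/memory requests and limits.
    kubectl describe node <worker-node> | grep -A 8 'Allocated resources'

    # Show why a pending pod can't schedule; the Events section will
    # report e.g. "Insufficient cpu" for resource-constrained failures.
    kubectl -n <tool-namespace> describe pod <pod-name>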
[16:39:55] Reedy: at the moment I just want to add a single link, you can grant me sysop or add the link yourself, whatever's easier
[21:00:10] taavi, regarding that rust build, I opened https://github.com/vexxhost/magnum-cluster-api/issues/1017 and will resort to something desperate/hacky if they don't go for it.