[07:45:46] greetings
[09:17:23] morning
[13:36:24] taavi: I'd like to try reimaging cloudgw2002-dev. As far as we know the process is just 1) stop keepalived 2) reimage -- ?
[13:37:10] yep
[13:37:22] great, let's see how this goes
[13:40:23] ok, at least shutting it down didn't interrupt anything. Didn't expect much from the first one.
[13:50:38] puppet runs should be fine on trixie FWIW, at least they were a few months ago when I did some prep work in T401899
[13:50:39] T401899: Reimage cloudgw hosts to Trixie - https://phabricator.wikimedia.org/T401899
[13:57:59] godog: oh, cool! I was going to just see what broke after the reimage but you are way ahead of the game :D
[13:58:14] going to shortly get started on the k8s upgrade in tools
[13:58:53] ack
[13:59:41] andrewbogott: heh, actually not very much was obviously broken; then I ran into the "nftables on cloud vps and puppet" issue that taavi and jhathaway have been looking into
[14:18:53] toolforge test suite finally passed and the rest of prepare_upgrade is now done
[14:19:11] moving on to upgrading the first control node
[14:26:39] \o/
[14:30:54] I had to manually depool the api server on non-upgraded control nodes due to the kubelet backwards-compat issue seen on toolsbeta mentioned earlier; after doing that, the rest of the pods on k8s-control-7 have started back up again
[14:30:59] continuing to control-8
[14:36:14] upgrading the final control node
[14:41:30] all good, starting worker upgrades now
[14:54:37] neat
[14:58:22] non-NFS workers all upgraded, NFS workers about halfway there
[14:58:55] great
[15:12:52] does calico-typha somehow ignore the 'kubectl cordon' taints when scheduling??
[15:18:09] https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1226871
[15:32:27] upgrade is all done I think
[15:34:24] andrewbogott: how is the cloudgw upgrade going?
[15:35:29] 2002-dev is done and puppet is clean, as godog predicted. I have not otherwise verified that it's working yet, on account of multitasking
[15:35:57] taavi: awesome!
[15:37:01] actually, I'm not sure I know how to verify it's working other than by switching off 2003-dev and seeing if everything breaks. taavi, do you have a better suggestion?
[15:37:40] turn keepalived off on the other node, watch the IPs flip to the reimaged node, watch traffic flow through the new node, then test that the flip works in the other direction
[15:38:14] I think it's the 'watch IPs flip to the reimaged node' part that I don't know how to do.
[15:38:17] where do I watch that?
[15:38:30] `ip -br a`
[15:39:01] oh, you mean just see the IP get reassigned to the... ok, I see what you mean now
[15:41:21] that worked like a charm
[15:43:08] I'll reimage 2003-dev
[17:07:32] codfw1dev cloudgw nodes are on Trixie now, all seems well. Going to let that sit for a day or two before I do eqiad1.
[17:11:27] nice!
[17:40:15] * dhinus off
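The failover check described around 15:37-15:41 (stop keepalived on the active node, then watch the virtual IP appear in `ip -br a` output on the standby) can be sketched as a small shell helper. This is a minimal sketch, not the cloudgw procedure itself: the `has_vip` function name, the VIP `10.0.0.100/32`, and the interface names are hypothetical placeholders, and the sample output stands in for a real `ip -br a` run.

```shell
#!/bin/sh
# Sketch of the 'watch IPs flip to the reimaged node' step.
# After `systemctl stop keepalived` on the active node, the VRRP
# virtual IP should show up as an extra address on the standby.
# The VIP and interface below are made-up placeholders.

# has_vip ADDR OUTPUT: succeed if ADDR appears in the given
# `ip -br a`-style OUTPUT.
has_vip() {
  printf '%s\n' "$2" | grep -qw "$1"
}

# Example output as it might look once the VIP has moved over:
sample_output='lo    UNKNOWN  127.0.0.1/8
eno1  UP       10.0.0.12/24 10.0.0.100/32'

if has_vip '10.0.0.100/32' "$sample_output"; then
  echo 'VIP is on this node'
else
  echo 'VIP is not here yet'
fi
```

In practice you would run `watch ip -br a` (or poll in a loop) on the reimaged node while stopping keepalived on its peer, then repeat in the other direction to confirm failback.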