[14:21:17] XioNoX: so following the usual plan, disabling Puppet on C:bird, roll it out on a few hosts, test and take it from there?
[14:21:37] sukhe: yeah exactly
[14:22:37] ok starting.
[14:23:58] 97 hosts with 'C:bird' that's becoming quite a lot!
[14:24:18] yeah, the most recent is the hcaptcha-proxy anycast
[14:24:28] XioNoX: "fun"
[14:30:09] sukhe: let me know which host you're starting with and I can double check that BGP is coming up properly
[14:30:31] XioNoX: dns7001
[14:32:27] (running)
[14:34:09] XioNoX: please check dns7001 thanks
[14:35:14] 10.3.0.5/32,
[14:35:14] 198.35.27.27/32,
[14:35:14] 10.3.0.1/32
[14:35:17] BGP is established and looking cleaner than before:
[14:35:17] before: Peer: 2a02:ec80:700:1:195:200:68:4+36443 AS 64605 Local: fe80::429e:a402:c78c:6c00+179 AS 4265007001
[14:35:17] After : Peer: 2a02:ec80:700:1:195:200:68:4+179 AS 64605 Local: 2a02:ec80:700:1::1+51947 AS 4265007001
[14:35:32] we should check v6
[14:35:51] bird is advertising 2a02:ec80:53::1/128
[14:36:16] yep, receiving * 2a02:ec80:53::1/128 2a02:ec80:700:1:195:200:68:4 64605 I on the switch
[14:36:41] ok awesome. let's check one host in eqiad and codfw as well. doing dns1004
[14:36:42] and I can ping it from the switch
[14:36:47] so we're all good here
[14:38:25] dns1004 should be a NOOP if all goes well
[14:38:49] yep :)
[14:39:03] trying one host in esams too, sorry for being overly paranoid. I mean we can always blame topranks if it doesn't go well but yeah.
[14:39:29] nah, if we can test other types of hosts that would be great too
[14:39:37] in theory it should be a NOOP on VMs
[14:39:45] I'm testing codfw routed ganeti
[14:39:50] yes please thanks
[14:40:47] - interface "eno12399np0";
[14:40:47] - neighbor fe80::8243:3f01:3717:4ec0 external;
[14:40:47] + neighbor 2a02:ec80:300:1::1 external;
[14:40:59] dns3004, so also expected
[14:41:13] sorry it's "blame Arzhel week" pa paul said it.... so you can't blame me :P
[14:41:14] good, let me check
[14:41:39] topranks: :P
[14:41:54] ganeti2033 looks good
[14:42:12] nice!
[14:42:14] (waiting for confirmation on dns3004)
[14:42:50] topranks: already did
[14:42:55] sukhe: `2a02:ec80:53::1/128 2a02:ec80:300:1:185:15:59:2` all good for dns3004
[14:43:33] awesome
[14:43:42] ok, I think we can roll it out everywhere then?
[14:43:46] yeah nice work, glad this didn't turn into a can of worms
[14:44:07] my ocd never liked the mis-matched address fams configured either side anyway :P
[14:44:30] hahah
[14:44:48] so dns is fine and ganeti is fine, anything else we should look at first?
[14:44:50] sukhe: I think so, unless there are other important host types we forgot?
[14:45:00] maybe cephosd or cloudlb or cloudservice or centrallog?
[14:45:01] routed_ganeti works, non-routed-ganeti peers with CRs so unaffected?
[14:45:08] centrallog for sure I guess
[14:45:10] oh yeah forgot about those
[14:45:15] non-routed-ganeti doesn't do BGP
[14:45:37] XioNoX: ?
[14:45:54] you mean not with the ganetis that is?
[14:45:57] I will check cloudservices/cloudlb
[14:46:03] taavi: thanks
[14:46:06] taavi: <3
[14:46:10] sukhe: ?
[14:46:17] XioNoX: I meant the VMs on it like doh or whatever
[14:46:28] which peer with CRs so shouldn't be affected I believe
[14:46:35] yeah
[14:46:40] but I guess we can double check those
[14:46:45] topranks: ah, yeah, or with their local router, like baremetal
[14:47:00] so worth having a look
[14:47:31] no changes for those as expected
[14:47:33] picking doh1001 in eqiad
[14:47:50] ok
[14:48:50] NOOP
[14:49:12] sukhe: should be a NOOP there, maybe DOH in drmrs as well as it's old ganeti or modern networking
[14:49:26] yep, trying that
[14:51:26] XioNoX: can you check doh6002 and/or (185.71.138.138/32 and 2001:67c:930::1/128)
[14:51:32] thx
[14:53:03] er, small problem in drmrs...
[14:53:12] oh?
[14:53:39] the public vlan gateway is configured as `2a02:ec80:600:102:10:136:1:1` and not `2a02:ec80:600:102::1`
[14:54:41] that's the private, but public is `2a02:ec80:600:2:185:15:58:33` and not `2a02:ec80:600:2::1`
[14:55:52] topranks, what do you think of adding it with https://www.irccloud.com/pastebin/5gvDUQ6A/
[14:56:07] and then later on removing the old one
[14:56:31] servers use the link local as next hop so it shouldn't have any impact
[14:57:02] sukhe: I don't think we deviate from our standards anywhere else
[14:57:22] XioNoX: +1 from me
[14:57:42] I'll sleep so much sounder tonight we're removing all the little things I never liked :P
[14:58:20] alright, BGP is up!
[14:58:25] but yeah it should be safe, RAs come from the link-local, BGP already peered with LL should remain stable
[14:58:58] doing the same for private and then the other switch
[14:59:06] XioNoX: one thing perhaps to check, existing servers there doing BGP, what next-hop do they see for v6 routes?
[14:59:16] shouldn't be the GUA address but just in case
[14:59:56] XioNoX: ignore me, we don't use any routes from bgp on the server side
[15:00:09] default is link-local, it should be good
[15:01:30] done for both racks
[15:02:02] XioNoX: ok thanks. can you remind me again why drmrs was the only site different in this regard? I vaguely remember something but I don't remember what exactly
[15:02:37] sukhe: I probably decided to be creative when I set it up and used the v4 to v6 mapping we use for servers :)
[15:02:44] no particular reason I think. we were following how we did the same mapping we do for servers
[15:03:36] ok.
[15:03:43] anything else to check I guess?
[15:05:02] or should we roll out everywhere (-b1 -s5) for C:bird
[15:06:02] sukhe: I think we can roll it everywhere
[15:07:34] ok
[15:07:42] starting
[15:07:47] jhathaway, elukey: we're going to upgrade kafka-main in eqiad to kafka 3.7
[15:07:58] thanks jayme!
[15:08:09] thanks jayme
[15:12:26] \o/
[15:13:45] XioNoX: it will take a while, I am doing -s5 and batch of 1. happy to baby sit this fwiw.
[15:13:54] sukhe: sounds good! thx
[15:13:58] but if your end of day is near, I can simply push it everywhere else in one go
[15:14:37] sukhe: nah it's ok to go slowly, I've manually enabled it where it mattered most
[15:18:38] ok
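
Note on the drmrs gateway addresses discussed at 14:53-15:02: the two forms being compared can be sketched in a few lines of Python. This is an illustration only, not tooling from the log. The function names are hypothetical, the /64 prefix length is an assumption, and the IPv4 addresses (185.15.58.33, 10.136.1.1) are inferred from the IPv6 addresses quoted above rather than stated in the conversation. The "v4 to v6 mapping" writes each decimal IPv4 octet verbatim into one of the last four hextets, while the standard form is simply `::1` in the vlan prefix.

```python
import ipaddress


def standard_gateway(prefix: str) -> ipaddress.IPv6Address:
    """Return the ::1 address of a vlan prefix (the form used at the other sites)."""
    net = ipaddress.IPv6Network(prefix)
    return net.network_address + 1


def v4_mapped_address(prefix: str, v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 64 bits of a /64 prefix.

    Each decimal octet is written as-is into one hextet, so octet 185
    becomes the hextet 0x185 (not 0xb9). Assumes a /64 prefix.
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    suffix = 0
    for octet in str(ipaddress.IPv4Address(v4)).split("."):
        # Reinterpret the decimal digits as hex digits, one hextet per octet.
        suffix = (suffix << 16) | int(octet, 16)
    return net.network_address + suffix


if __name__ == "__main__":
    # Public vlan in drmrs: the old (mapped) gateway and the standard one.
    print(v4_mapped_address("2a02:ec80:600:2::/64", "185.15.58.33"))
    print(standard_gateway("2a02:ec80:600:2::/64"))
```

With the inferred inputs, the mapped form reproduces the old gateway addresses quoted in the log, and `standard_gateway` gives the `::1` addresses that were added alongside them.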