[12:59:40] XioNoX: ready when you are :)
[12:59:58] sukhe: cool!
[13:01:05] sukhe, that's a big flock of birds https://www.irccloud.com/pastebin/BOetortI/
[13:01:33] yep :)
[13:02:11] go for it, I will take care of the DNS hosts checking (we don't want a t-shirt for taking down all ns'es :P) and the doh/durum hosts, centrallog
[13:02:19] yeah
[13:02:20] cool
[13:02:27] the ganetis are all yours
[13:02:31] but I am here to share the load
[13:04:32] sukhe: all disabled
[13:04:39] 100.0% (65/65) success ratio (>= 100.0% threshold) for command: 'disable-puppet 'CR1052109''
[13:04:47] merging
[13:05:00] thanks, will start with dns in magru first (just ns2)
[13:05:18] cool, will test the routed ganeti in codfw
[13:07:35] deployed
[13:07:41] I mean merged
[13:07:48] thanks. starting.
[13:10:38] it didn't even bounce the BGP sessions, so very smooth
[13:11:03] yeah, NOOP
[13:12:18] show route table inet.0 terse receive-protocol bgp 195.200.68.4 also looks good, 10.3.0.1 and 10.3.0.5
[13:12:28] (and ns2 198.35.27.27)
[13:12:38] moving on to eqiad now
[13:12:47] sukhe: fyi the change only impacts ipv6
[13:13:12] XioNoX: I know, but bird + nsXes + paranoia :)
[13:13:20] yep, no pb
[13:13:35] will double check the V6 ones for durum and doh
[13:15:00] trying doh* in parallel.
[13:16:18] forcing a restart of bird on ganeti2034 to make sure it comes back clean
[13:16:47] yup, all good
[13:17:13] nice, doing doh7002 now
[13:17:28] then will enable and run agent on the Traffic subset at least
[13:18:01] local 2a02:ec80:700:2:195:200:68:39 as 64605;
[13:18:01] - neighbor fe80::429e:a402:c88c:8000%ens13 external;
[13:18:01] + interface "ens13";
[13:18:01] + neighbor fe80::429e:a402:c88c:8000 external;
[13:18:10] ^ expected
[13:18:27] yep, looks good
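A minimal sketch of what the resulting Bird protocol block looks like after this change, assuming Bird 2.x syntax; the protocol name, channel, and export filter below are illustrative, not the actual puppet-generated config:

    protocol bgp anycast6 {
        local 2a02:ec80:700:2:195:200:68:39 as 64605;
        # link-local peer: the interface is now a separate option instead of
        # being embedded in the neighbor address as fe80::...%ens13
        neighbor fe80::429e:a402:c88c:8000 external;
        interface "ens13";
        ipv6 {
            import none;                            # advertise only
            export where proto = "static_anycast";  # illustrative filter name
        };
    }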
[13:20:57] yeah, I forgot to change inet.0 to inet6.0 above so was worried for a second :)
[13:21:12] * 2001:67c:930::1/128 2a02:ec80:700:2:195:200:68:39 64605 I
[13:22:14] taavi: do you want to test this change on one of the cloud hosts ?
[13:22:40] [+1, we should. I had planned to take one of the cloud hosts as well but someone from WMCS is a better fit]
[13:24:05] XioNoX: which change?
[13:24:24] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052109
[13:24:28] > Bird: use the "interface" config option for v6 peers
[13:24:45] v4 should be a NOOP but can't hurt to double check.
[13:24:59] that should be a noop for our hosts but I can test to be sure
[13:26:28] testing syslog (also v4 so checking just one host)
[13:28:49] all good
[13:29:03] I am proceeding with merging this for all Traffic hosts in batches with some suitable interval set
[13:29:21] XioNoX: sounds good to proceed? ^ I am OK with moving forward
[13:29:29] sukhe: XioNoX: all seems fine on my end
[13:29:33] thanks taavi
[13:30:29] cool, thx
[13:33:59] centrallog is noop as v4 only
[13:34:05] yep
[13:34:37] I checked that in case you wanted to try (see syslog above. syslog/centrallog)
[13:35:00] ah yeah
[13:35:12] and yes +1 for all traffic hosts
[13:35:20] all hosts actually
[13:35:27] as ganeti is good too
[13:35:45] ok great. happy to enable puppet and run it everywhere.
[13:37:03] just curious, can we use a hiera key as cumin selector? for example `profile::bird::do_ipv6`
[13:41:49] if it's used in a class/define technically yes
[13:41:53] as a parameter I mean
[13:41:57] like in a profile
[13:42:56] https://github.com/wikimedia/operations-puppet/blob/cf3a3af259e7b4d2174082fa838a0e832a3520d5/modules/profile/manifests/bird/anycast.pp#L18 for example
[13:43:22] sudo cumin 'P:bird::anycast%do_ipv6=true'
[13:43:44] exactly :)
[13:43:53] ohh, nice
[13:43:53] as documented in https://wikitech.wikimedia.org/wiki/Cumin :-P
[13:43:54] emoji in cumin aliases?
[13:44:00] :D
[13:44:17] I'd love to, but I'm sure some people would get mad :)
[13:45:15] cephosd and cloudservices are the only ones I see, in addition to doh and durum which we have checked
[13:45:26] but taav.i checked those, so I am assuming they are good.
[13:45:39] cephosd -> cloudceph storage
[13:46:51] I will check one cephosd host, just in case.
[13:47:10] no, our hosts are cloudcephosd* :-)
[13:47:19] fun
[13:47:41] cool, checking the above then.
[13:47:48] yeah, I tried to suggest a less confusing name for the other cluster, but was too late in that process back then
[13:51:31] NOOP there too (only in eqiad/codfw so less surprising)
[14:04:44] XioNoX: rolling it out everywhere now
[14:04:52] rgr!
[14:04:54] DNS hosts done and those were the ones really I was most worried about
[14:34:06] XioNoX: all done. no unexpected changes, NOOPs elsewhere. thanks!
[14:34:13] sukhe: awesome, thanks a lot!
[14:55:08] taavi: how do I do https://phabricator.wikimedia.org/T396985 ?
[14:56:58] XioNoX: 1) send a quota request for that for the records (https://phabricator.wikimedia.org/project/view/2880/), and once that has been done, 2) go to network -> floating ips on horizon, allocate an IP address, and assign it to the diffscan instance
[15:00:39] taavi: https://phabricator.wikimedia.org/T396884#10918991 let me know if you disagree (cc jhathaway)
[15:15:55] taavi: https://phabricator.wikimedia.org/T397059
[15:17:00] brett, swfrench-wmf: I'm planning to decommission and reimage one of the sessionstore hosts (2004) shortly. I don't expect any issues, but want to give heads up in case Murphy.
[15:20:32] XioNoX: you should be set for that now
[15:21:15] urandom: ack, thanks!
[15:21:32] taavi: done
[15:23:13] thank you!
[16:08:04] does anyone know/remember off the top of their heads, how to get to the various installer consoles when connected via the drac?
[16:08:43] console com2
[16:09:14] yes, but if d-i is running and I want to switch to its virt console for logging (4)?
[16:09:38] because partman failed (because it did)
[16:09:47] ctrl a ]
[16:10:52] ctrl a N maybe, if it is screen
[16:10:57] * urandom sighs
[16:11:35] that actually worked, but anything useful would have scrolled off... because of what I guess is ssh login noise from the cookbook
[16:11:43] yeah, it's tricky
[16:12:09] ctrl a N is exactly right
[16:12:11] sukhe: thanks
[16:12:24] so what do you do in this situation.... run the installer again?
[16:12:59] frustrating... I've tested this partman recipe elsewhere and it fails in the least convenient way to debug possible
[16:13:10] urandom: define failed, did it stop at the d-i step or did it go through but with a wrong setup?
[16:13:21] volans: the former
[16:13:23] urandom: so I just try to scrape the logs to see if I can grep something but it is usually hard, yes. there is a lot of noise.
[16:13:28] you should be able to get a shell
[16:13:33] dmesg or /messages output
[16:13:33] and check the logs
[16:13:47] /var/log/installer
[16:14:05] !?
[16:14:11] there is no installer
[16:14:29] you said the former "did it stop at the d-i step" :D
[16:14:41] so I assumed it's still in d-i
[16:14:48] it is!
[16:14:54] it failed at partitioning
[16:15:16] then could be just /var/log/syslog inside sorry
[16:15:17] there is a /var/log/partman and /var/log/syslog
[16:15:25] I might have confused it with the location on the installed system
[16:15:59] you can also log into the d-i environment via SSH from a cumin host: "ssh -4 -i /root/.ssh/new_install -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no HOSTNAME"
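A consolidated sketch of that debugging path, assuming the in-installer log locations mentioned above; HOSTNAME is a placeholder and the grep pattern is illustrative:

    # from a cumin host, SSH into the running d-i environment
    ssh -4 -i /root/.ssh/new_install \
        -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no HOSTNAME
    # inside the installer shell
    grep -i -e error -e fail /var/log/syslog   # d-i's own log, noisy but complete
    cat /var/log/partman                       # partitioner decisions and failures
    cat /proc/mdstat                           # what the RAID arrays were actually assembled from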
[16:21:57] https://www.irccloud.com/pastebin/wK8phnoA/
[16:23:24] too small?
[16:24:01] the root partition I guess?
[16:24:12] https://www.irccloud.com/pastebin/qJWcfDfV/
[16:24:22] 249856 blocks
[16:24:31] the "bios" one
[16:24:39] oh, right
[16:24:59] it's the only 256 I see
[16:25:10] although it says 254...
[16:25:12] not sure then
[16:25:21] I would call the partman expert but he's AFK
[16:25:54] who's that? g.odog?
[16:26:35] lu.ca
[16:26:38] md0 is 248832
[16:26:45] but he will hate me for this :D
[16:27:50] yeah, it tried to make the raid on the little bios partition
[16:30:58] wow, I have no idea where to go with this. I tested this on g.odog's test harness, and then rolled it out to real hardware (cassandra-dev cluster)
[16:32:58] oh hell.
[16:33:09] 🤦‍♂️
[16:33:30] sorry I have to step out now
[16:37:40] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1159530
[17:01:31] moritzm: I used the wrong preseed, which makes that error totally understandable, but I just fixed it, ran the puppet-agent on A:installserver, tried again, and got the same error (the same cause)... is there something else that may be needed after a change like that?
[17:06:13] that should be all that is required AFAIK at least
[17:07:58] thanks
[19:06:26] fyi, today's sessionstore work is done, so any subsequent alerts, etc, are not expected
[19:38:21] ack, thanks!
[19:39:51] last two durum hosts being reimaged so more of those alerts as well