[08:28:57] akosiaris btullis I'll start a flink (dse-k8s-eqiad) load test between now and 30 mins. It'll run for 4 hours today (give or take). Systems that might see a load increase are mw-api-int (discussed yesterday) and ceph.
[08:28:59] I'm monitoring the process, but holler if you spot any issue.
[08:30:38] cool, thanks for the heads up. jelto, effie, ^ since you are oncall
[08:32:15] ack, thanks. and the ceph load will be on the S3 interface, correct?
[08:32:56] thanks for the heads up
[08:49:19] marostegui: while auditing stuff in our BMCs we've noted that db2241 has some weird discrepancies in what the Redfish API returns for NICs and would like to restart the iDRAC there. I know it's x3 codfw master so wanted to check with you if that's something you think is ok to do, or better to do next time you failover the master to another host? It's not urgent, to be clear
[08:49:24] cc XioNoX
[08:49:50] and it's only the idrac, so in theory it won't impact the host itself
[08:49:59] as you know idrac restart shouldn't affect the host in any way, but you know...
[08:50:11] it's "hardware" :D
[08:50:26] Yeah, but I don't trust that too much, if the idrac doesn't come back, the master would be without it
[08:50:31] """computers"""
[08:50:32] So I'd prefer to wait until it is not master
[08:50:54] ack, fair enough. Is there a way to be "notified" of when that will be?
[08:51:13] like is it already planned for any reason? can I leave a note somewhere?
[08:51:14] there is also no rush at all
[08:51:19] volans: No, but if you create a task, I can get it done :)
[08:51:31] nah, it's not worth a master failover just for this
[08:51:42] let's couple with the next reboot or something
[08:51:47] *couple it
[09:01:15] btullis correct
[09:02:18] btullis you'll also see records produced to kafka-test; the systems look healthy. The total payload is about 6GB
[09:05:10] volans: I'll do it relatively soon anyway for the migration
[09:05:47] marostegui: problem found, and it's worse than you think, but easily fixable
[09:06:20] ssh db2241.mgmt.codfw.wmnet... console com2 ... db2242 login:
[09:06:32] db2241 and db2242 have the mgmt IP/DNS inverted
[09:06:33] Amazing
[09:06:56] so rebooting db2242 via mgmt would reboot db2241 right now
[09:07:26] let me fix it
[09:07:48] Hahaha
[09:08:32] hold on doing anything via mgmt on db2242 (reboot, firmware upgrade, provision cookbook)
[09:10:48] Got it
[09:22:41] it might take a while, it seems the hosts are inverted (comparing serial numbers from netbox/accounting and the one reported by the host)
[09:23:38] so I need to figure out all the bits I need to fix
[09:24:43] I'll also see if we can add some verification step in the provision cookbook to prevent this
[09:53:32] !oncall-now
[09:53:32] Oncall now for team SRE, rotation business_hours:
[09:53:32] j.elto, e.ffie
[09:54:44] jelto: effie I'm going to reboot alert1002 as part of the work on task T395240
[09:54:56] ack
[12:00:22] GitLab needs a short maintenance restart in around 30 minutes
[12:45:06] herron, jelto, effie, fyi, I'm deploying this change https://gerrit.wikimedia.org/r/c/operations/alerts/+/1155620 it's going to replace an existing p.aging alert, it's a noop from you but mentioning it just in case
[12:45:31] noted, thanks for the heads up
[15:47:07] Hi folks! Is there anyone here with Ruby experience willing to review or pair review a new script?
[15:55:24] <_joe_> cwhite: no one with enough ruby experience wants to review ruby code :D
[15:56:02] <_joe_> cwhite: having said that, I can try to help - but I hear jayme is a ruby wizard, just sayin'
[15:56:26] if we're in the game of voluntolding other people I have one too :D
[15:57:28] <_joe_> I just volunteered myself if no one else will, but my ruby is kinda rusty at this point
[15:58:21] <_joe_> volans: I was trolling jayme actually, the poor soul has been fixing threading race conditions in some of the most vicious ruby I've ever written
[16:01:12] Thanks _joe_! I'll add you to the patch. I don't like to inflict pain, so jayme if you're willing, ping me. :)
[16:54:45] marostegui: to close the loop here too, all done, db2241 and db2242 BMCs are now properly addressed, all context in T379757
[16:54:45] T379757: Q2:rack/setup/install db224[12] - https://phabricator.wikimedia.org/T379757
[19:28:27] volans: thank you