[07:17:27] greetings
[08:57:31] re: keepalived vs bird: if I understand correctly, keepalived is used with novaproxy and cloudgw nowadays (?). Was converging on bgp (in production, not toolforge) something that was discussed already?
[09:07:23] morning
[12:02:05] I checked the MaintainDBUsersManyErrors alert, it's complaining about tools.graphbot-frontend
[12:02:15] the tool is disabled according to https://toolsadmin.wikimedia.org/tools/id/graphbot-frontend
[12:02:32] so I'm not sure why maintain-dbusers is still trying to create a db account for it
[12:11:01] dhinus: related to our rotation?
[12:11:46] or something the user has done? I saw they replied on github (thank you for finding it) and then also on Phab
[12:12:19] volans: it's a different tool, graphbot-frontend vs graphbot
[12:12:29] apparently this one was created last night and disabled shortly after
[12:12:46] maybe an experiment and they changed their mind? not sure
[12:12:47] yes but given the similarity I'm wondering if it's the same user and somehow connected
[12:12:51] ack
[12:12:56] I opened T414452
[12:12:56] T414452: maintain-dbusers tries to create user for disabled tool - https://phabricator.wikimedia.org/T414452
[12:14:19] k
[12:26:06] didn't a.ndrew fix that exact bug over the break?
[12:32:56] taavi: there was another error in the logs, but slightly different. this one started last night
[12:45:25] this one seems to be a race condition between maintain-dbusers and maintain-kubeusers that we should be handling better (see comments in task)
[12:47:38] wdym by a race condition?
[12:47:50] maintain-dbusers should not be trying to create credentials for disabled users at all
[12:49:45] the error is because maintain-dbusers is trying to read .kube/config for that tool, but the file is not there
[12:50:05] maybe the tool was disabled immediately after creation, and maintain-kubeusers didn't have time to run and create that file?
[12:50:14] maintain-dbusers has no reason to read that file for disabled tools
[12:50:36] agreed, and in general I think it should not crash if the file gets deleted for some reason
[12:51:01] I think maintain-dbusers at the moment treats disabled tools as normal tools
[12:51:28] skipping disabled tools is probably an easy fix, although it could still crash if the file is missing on an active tool
[12:51:40] (which should not happen, probably)
[12:53:15] does anyone have more details on these uses of dnsmasq in cloud? https://phabricator.wikimedia.org/T396864#11515907 It would be easiest if we simply had one unique dnsmasq version; wondering if there's a simple way to test these with 2.93-rc3
[12:54:32] dhinus: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1226231/
[12:55:39] thanks, +1d
[12:55:50] moritzm: neutron uses that on cloudnet hosts to tell instances about their assigned IP addresses, we could test upgrading the codfw1dev nodes and making sure instance creation still works
[12:59:27] taavi: shall I patch maintain_dbusers to query that LDAP attribute, or are you already doing it?
[12:59:50] not sure I understand that question
[13:00:29] I think your patch makes an attribute available (pwdAccountLockedTime), but we are not reading it, are we?
[13:00:50] I thought we could use it to determine whether to skip a tool in maintain-dbusers, if it's disabled
[13:01:03] thanks, I'll build 2.93-rc3 for trixie and will get back
[13:01:44] dhinus: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1221311 is the fix from last week I was referring to which tried to do that, my patch made the fix actually do something
[13:02:08] ahhh sorry, I did not "git pull" so I was seeing an old version of maintain-dbusers :(
[13:10:33] although the way the current check was done means it's now constantly logging 'Found 1 new tool accounts (tools.graphbot-frontend)' so maybe I need to patch the check to happen earlier
[13:17:50] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1226243/
[13:22:40] thanks, I was writing a patch to add a log line "{account_name} skipped because disabled", but I like your solution more!
[13:36:52] * andrewbogott waking up
[13:37:03] dhinus, thanks for patching my patch
[13:37:15] oh, or taavi
[13:37:22] * andrewbogott not all the way awake
[17:36:53] dhinus: if you're cleaning up old 'in progress' tasks back to open, please also reset the assignee at the same time
[17:37:06] taavi: good point, will do!
[17:37:15] ty!
[17:37:28] I'm only clearing the ones that were in the "wmcs-current" board btw, so I can archive the board
[17:38:03] (I can archive the Q1-Q2 board I mean, not the current Q3-Q4)
[17:53:35] the Q1-Q2 board is now archived, I moved a small number of tasks to Q3-Q4, and the others back to the Inbox
[18:17:15] * dhinus off
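For reference, a minimal Python sketch of the two maintain-dbusers behaviors discussed in this log: skip disabled tools before doing any other work, and don't crash when a tool's `.kube/config` is missing. This is not the actual maintain_dbusers code nor the fix in the Gerrit changes linked above; `ToolAccount`, `is_disabled`, `read_kube_config` and `process` are hypothetical names, and it assumes a disabled tool is one whose LDAP entry carries a non-empty `pwdAccountLockedTime` attribute.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ToolAccount:
    """Hypothetical stand-in for a tool account read from LDAP."""
    name: str
    home: str
    attrs: dict = field(default_factory=dict)


def is_disabled(tool: ToolAccount) -> bool:
    # Assumption: a disabled tool has a non-empty pwdAccountLockedTime attribute.
    return bool(tool.attrs.get("pwdAccountLockedTime"))


def read_kube_config(tool: ToolAccount) -> Optional[str]:
    """Return the tool's .kube/config contents, or None if the file is missing."""
    path = os.path.join(tool.home, ".kube", "config")
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # e.g. maintain-kubeusers has not created the file yet
        return None


def process(tools: Iterable[ToolAccount]) -> list[str]:
    """Return the names of the tools that would get database credentials."""
    handled = []
    for tool in tools:
        if is_disabled(tool):
            # Skip before any other work, so a disabled tool is never treated as new.
            continue
        if read_kube_config(tool) is None:
            # Tolerate a missing file on an active tool instead of crashing the run.
            continue
        handled.append(tool.name)
    return handled


if __name__ == "__main__":
    # Usage with hypothetical tools in a throwaway home directory.
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, ".kube"))
        with open(os.path.join(home, ".kube", "config"), "w") as f:
            f.write("placeholder")
        tools = [
            ToolAccount("tools.active", home),
            ToolAccount("tools.no-config", os.path.join(home, "missing")),
            ToolAccount("tools.disabled", home, {"pwdAccountLockedTime": ["placeholder"]}),
        ]
        print(process(tools))  # ['tools.active']
```

Running the disabled check first mirrors the "patch the check to happen earlier" idea from 13:10:33: in this sketch a disabled tool never reaches the code that reads `.kube/config`.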