[02:22:04] I've got 2 tasks that are stuck in queue on toolforge, 9999799 and 9999895
[02:22:10] can someone force kill them?
[02:41:56] * Reedy looks at the docs
[02:47:32] AmandaNP: Done, I think?
[02:48:02] !log tools qdel -f 9999895 9999799
[02:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[02:48:14] thanks
[09:54:30] !log admin cleanup conntrack table in qrouter nents in cloudnet1003 (backup)
[09:54:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:37:08] !log tools.lexeme-forms deployed 9ad3addd6a (Malayalam verbs, and vocative case for nouns)
[13:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[13:41:20] !log tools.lexeme-forms deployed 4619f8cd03 (remove duplicate template)
[13:41:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[14:00:04] !log tools.lexeme-forms deployed 1f2a6f2e17 (replace OrderedDict with dict)
[14:00:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[14:47:07] I did jstop on job 696621 yesterday and it's been registered for deletion but hasn't stopped since. Any workarounds?
[14:51:17] DatGuy: likely caused by the grid issues earlier this week, you'll need a toolforge admin to stop it
[14:54:43] DatGuy: if you need a workaround, looks like the job isn't actually running on the node but the grid think it is, so you should be able to submit it with a different name until an admin can force remove it
[16:19:19] Majavah it isn't /too/ necessary that I would need to start another instance of the task, but I would like to hopefully fix it today. bd808 I noticed you took care of something similar on phabricator, would you mind?
[17:07:32] !log tools.robokobot Force killed job 2141375 per irc request by thib
[17:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.robokobot/SAL
[17:08:15] !log tools.datbot Force killed job 696621 per irc request by DatGuy
[17:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.datbot/SAL
[17:08:48] thanks
[17:09:43] yw. the grid master freaked out on Thursday and lost track of a lot of jobs. Folks are still finding these "scheduler thinks it's running but its not" cases :(
[17:42:51] legoktm: I may have gotten your DKIM record created properly, but I don't have a DKIM selector to test it with. See T278358 for the current state.
[17:42:52] T278358: Delegate lists.wmcloud.org domain to be able to add DNS DKIM records - https://phabricator.wikimedia.org/T278358
[17:43:58] !logs tools.stewardbots deploy https://gerrit.wikimedia.org/r/c/labs/tools/stewardbots/+/670326
[17:44:08] !log tools.stewardbots deploy https://gerrit.wikimedia.org/r/c/labs/tools/stewardbots/+/670326
[17:44:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[18:18:46] * bd808 crawls out of a DNS TXT record + DKIM value rabbit hole
[18:19:55] * Majavah gives bd808 a ladder
[20:32:11] bd808: thanks! The selector is just "wikimedia". I sent a test email and it's still failing DKIM, but it might be because of our gnutls version and the ed25519 key.
[20:32:51] I'll need to debug it a bit more to see whether it's the DNS or our setup
[20:36:45] legoktm: I think we can use an rsa key too. I left notes on T203035
[20:36:46] T203035: Designate DNS TXT records max length is 255 chars (Horizon reports vague "Error: Unable to create the record set.") - https://phabricator.wikimedia.org/T203035
[20:38:09] Ahhh, got it.
I'll give that a shot later today then
[20:39:12] Knew we couldn't be the first people to try setting up dkim :p
[20:40:56] Thanks for going down the rabbit hole :)
[21:06:35] Hi, I'm having problems with my templatehoard tool. I was restarting it several times to test the service.template (specifically whether it would load a new one with `webservice restart`: nope) but now it's returning 503 even though `webservice status` says the server is running. service.log doesn't have anything, and logs/server.err, where the server program logs some information on
[21:06:36] requests, doesn't have anything recent. I'm suspecting 503 is from the Toolforge infrastructure but not sure how to confirm it.
[21:08:59] Like maybe Toolforge is refusing to run my server because I restarted so many times, but not telling me
[21:09:14] and pretending that my server is running even though it isn't
[21:09:21] I might be a little paranoid here
[21:16:27] Erutuon_: we don't have purposefully misleading error messages
[21:17:08] but in this case looking at https://k8s-status.toolforge.org/namespaces/tool-templatehoard/ I can see that your web service is failing to start for some reason
[21:21:04] Specifically the ready: 0/1 part?
[21:22:28] that, and clicking on the pod says its status is "CrashLoopBackOff"
[21:26:12] Thanks... I finally figured out how to access stderr and found that the cause was a missing file