[01:54:54] Krinkle: Thanks for the catch! :)
[01:55:45] Hydriz: Hi! - I hope it was alright for me to disable it.
[01:56:18] Yep, I noticed about the privacy policy part. I have stepped down from that wiki so I will leave it to the current admins to decide
[15:45:59] What is Toolforge's IP address as seen by Wikipedia?
[15:47:55] Cyberpower678: currently some private ip in the 172.16.0.0/21 range
[15:48:54] Cyberpower678: that might change to the generic egress NAT address (185.15.56.1) at some point in the future, but that would be announced on cloud-announce@lists.wm.o in advance
[15:49:09] Majavah: is there any way I can get the current IP? I'm getting complaints that my tool is returning a blocked error message to its users. The message is being directly passed back from the API. So this error is stemming from the API itself, not the tool.
[15:49:50] I need to see if there is some lingering block on one of those IPs that may be causing this.
[15:50:23] Cyberpower678: use api.php, action=query&meta=userinfo
[15:50:45] What will that accomplish?
[15:51:02] that will show you the IP mediawiki thinks you are using
[15:51:10] (if you're logged out)
[15:52:09] But I'm not.
[15:52:15] So that's not going to work.
[15:52:27] just make a request logged out?
[15:53:04] The tool uses OAuth tokens to make the request. It ALWAYS uses a token to make the requests to the API.
[15:53:22] It's designed to require one.
[15:53:41] Pretty much a failsafe to prevent it from editing logged out.
[15:54:31] I don't see any other options if you want the IP address MediaWiki thinks you're using
[15:54:55] If I can SSH into the exec node of the webservice, I can fire off a curl from within it with that API command. That oughta work?
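(Editor's sketch of the userinfo check suggested above, in Python. It only builds the query URL and interprets a response; it makes no network call. The wiki URL and the sample response are assumptions for illustration, not output captured from Toolforge. For a logged-out request, MediaWiki reports the client IP as the user name.)

```python
from urllib.parse import urlencode

# Assumed target wiki; substitute the wiki the tool actually talks to.
API_URL = "https://en.wikipedia.org/w/api.php"


def userinfo_url(api_url=API_URL, with_blockinfo=False):
    """Build the action=query&meta=userinfo URL mentioned in the log."""
    params = {"action": "query", "meta": "userinfo", "format": "json"}
    if with_blockinfo:
        # uiprop=blockinfo adds block details when a block applies.
        params["uiprop"] = "blockinfo"
    return api_url + "?" + urlencode(params)


def summarize_userinfo(response):
    """Extract the name (the IP when logged out) and block status."""
    info = response["query"]["userinfo"]
    return {
        "name": info["name"],
        "anonymous": "anon" in info,   # key is present only when logged out
        "blocked": "blockid" in info,  # key is present only when blocked
        "block_reason": info.get("blockreason"),
    }


# Hypothetical logged-out response, shaped like the API's JSON output.
SAMPLE = {"batchcomplete": "", "query": {"userinfo": {
    "id": 0, "name": "172.16.7.167", "anon": ""}}}

if __name__ == "__main__":
    print(userinfo_url(with_blockinfo=True))
    print(summarize_userinfo(SAMPLE))
```

(Fetching the printed URL without credentials from the host in question, for example with curl, gives the JSON that `summarize_userinfo` expects.)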
[15:55:06] likely yes
[15:55:13] there's also action=query&meta=userinfo&uiprop=blockinfo if you want to see the currently applied blocks
[16:19:47] !log admin-monitoring deleting 2 leaked VMs by hand: 6aefef6f-0723-499d-895f-314f4804c377 | fullstackd-20210424153344 and af8bc9bd-ea0a-4789-b8dd-cf5cf96c31cc | fullstackd-20210424074938 (puppet check step timed out)
[16:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin-monitoring/SAL
[16:33:47] Majavah: what command allows me to look at more details regarding a grid engine job?
[16:38:22] Hey folks! I'm struggling to use scap and was wondering if anyone might be able to help. It seems that in the middle of running scap deploy, I'm getting hung up on a key failure.
[16:38:25] Full output is here: https://phabricator.wikimedia.org/T278723#6995430
[16:38:53] Well from the IABot tool, it's 172.16.7.167
[16:39:40] Cyberpower678: qstat -j
[16:40:18] halfak: try manually ssh'ing to that host using the keyholder socket
[16:40:58] I can ssh to the host. I'm not sure what you mean by "keyholder socket".
[16:41:53] Majavah am I allowed to SSH into the exec nodes? I'm getting a permission denied
[16:42:59] Cyberpower678: you should be able to ssh into the grid ones, not to kubernetes nodes
[16:45:14] Majavah: Actually I'm not even looking at the right info. That info doesn't tell me what queue it landed in, and the qstat table cuts it off. :-(
[16:45:15] halfak: on the deploy host, run `SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh deploy-service@`
[16:45:35] hey
[16:46:31] "Host key verification failed." would suggest the deployment host does not trust the host key of the target host?
[16:48:13] seems to work under my user and yours
[16:49:12] Deploy host is the one we're deploying from?
[16:49:14] Or to?
[16:51:29] from
[16:56:06] [pid 23723] execve("/usr/bin/ssh", ["/usr/bin/ssh", "-oBatchMode=yes", "-oSetupTimeout=10", "-oIdentitiesOnly=yes", "-F/dev/null", "-v", "-oUser=deploy-service", "-oIdentityFile=/etc/keyholder.d/deploy_service.pub", "deployment-ores01.deployment-prep.eqiad.wmflabs", "/usr/bin/scap", "deploy-local", "-v", "--repo", "ores/deploy", "-g", "default", "fetch", "--refresh-config"], [/* 28 vars */]
[16:56:16] can reproduce by running: /usr/bin/ssh -oBatchMode=yes -oSetupTimeout=10 -oIdentitiesOnly=yes -F/dev/null -v -oUser=deploy-service -oIdentityFile=/etc/keyholder.d/deploy_service.pub deployment-ores01.deployment-prep.eqiad.wmflabs id
[16:57:01] if you remove -oBatchMode=yes it prompts:
[16:57:03] The authenticity of host 'deployment-ores01.deployment-prep.eqiad.wmflabs (172.16.4.95)' can't be established.
[16:57:03] ECDSA key fingerprint is SHA256:gNywH2BdkKg2mU45nnQhMo6HX336cntrTME3iKfbczo.
[16:57:04] Are you sure you want to continue connecting (yes/no)?
[16:57:28] which begs the question, why on earth isn't it getting the right host key from puppetdb
[16:58:13] taavi@deployment-deploy01:/etc/ssh$ cat ssh_known_hosts|grep ores
[16:58:13] deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud,deployment-ores01,172.16.4.95,fe80::f816:3eff:fe71:a934 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHJt0taeo9YjPgQe23NfrsqafK1JYmImgporSJmtv7WjUG1wrEW+he1LCeS72e/9xfCXkVcE7SfousoUMmY5GXU=
[16:59:28] that is the correct ecdsa, but should it have the rsa key too?
[17:00:38] rsa and ed25519. hmm
[17:00:45] wait
[17:00:55] the key in ssh_known_hosts is .wikimedia.cloud
[17:00:58] scap is using .wmflabs
[17:01:04] lovely
[17:01:41] well spotted
[17:02:16] `SSH_AUTH_SOCK=/run/keyholder/proxy.sock /usr/bin/ssh -oBatchMode=yes -oSetupTimeout=10 -oIdentitiesOnly=yes -F/dev/null -v -oUser=deploy-service -oIdentityFile=/etc/keyholder.d/deploy_service.pub deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud id` works
[17:02:30] guess we should update the config scap uses to use the new hostname that will match known_hosts
[17:03:12] I guess that would be https://github.com/wikimedia/mediawiki-services-ores-deploy/blob/master/scap/ores-beta#L1?
[17:03:28] yeah
[17:04:06] * Majavah makes a patch
[17:05:15] AHA!
[17:05:27] I wonder why it worked for elukey
[17:06:34] halfak: https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/682258
[17:10:34] might have his own personal known_hosts with it in perhaps
[17:11:26] Got it.
[17:11:38] Should I clear my known hosts on the deployment-ores01?
[17:13:16] It works!
[17:13:20] great!
[17:16:03] https://codesearch.wmcloud.org/search/?q=wmflabs&i=nope&files=scap%2F.*&excludeFiles=&repos= this is clearly going to cause pain when the new buster based deployment-deploy03 is put into active use
[18:05:55] Hey folks. Does anyone know what "Envoy" is and if we can use it in Wikimedia Cloud?
[18:06:24] It appears that connections to MW apis internally are configured to go through localhost:6500 and I have a past commit from someone calling that "Envoy"
[18:06:31] E.g. https://phabricator.wikimedia.org/rORESDEPLOYe860508bb36d64683434e79e646795530a529c97
[18:09:03] halfak: https://www.envoyproxy.io/ ?
[18:09:13] https://wikitech.wikimedia.org/wiki/Envoy seems to describe it.
[18:09:20] Looks like there is a puppet role.
[18:10:03] Maybe I just need to enable that? I wonder if it will even work in labs.
[18:11:54] Oh! Or I could make it not use envoy in wikimedia cloud.
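(Editor's sketch, looking back at the host key failure resolved at 17:02: the mismatch can be checked mechanically by comparing the hostname a command targets against the names listed in a known_hosts entry. The helper name `known_hosts_names` is hypothetical, the key material is elided on purpose, and the parser is deliberately minimal: it assumes plain, unhashed entries and does not handle wildcard patterns or `|1|` hashed names.)

```python
def known_hosts_names(line):
    """Return the host names and addresses listed in one unhashed known_hosts line."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return set()
    fields = stripped.split()
    # Skip an optional marker such as @cert-authority or @revoked.
    if fields[0].startswith("@"):
        fields = fields[1:]
    if not fields:
        return set()
    # The first remaining field is a comma-separated list of names.
    return set(fields[0].split(","))


# Names copied from the entry quoted in the log; the key itself is elided.
ENTRY = (
    "deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud,"
    "deployment-ores01,172.16.4.95,fe80::f816:3eff:fe71:a934 "
    "ecdsa-sha2-nistp256 <key-elided>"
)

OLD_NAME = "deployment-ores01.deployment-prep.eqiad.wmflabs"
NEW_NAME = "deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud"

if __name__ == "__main__":
    names = known_hosts_names(ENTRY)
    for host in (OLD_NAME, NEW_NAME):
        status = "listed" if host in names else "NOT listed"
        print(f"{host}: {status}")
```

(The old `.wmflabs` name is absent from the entry while the `.wikimedia.cloud` name is present, which matches why the same ssh command only succeeded with the new hostname.)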
[18:13:18] Now if I can figure where the wikimedia cloud custom config is written
[18:15:19] This config contains a password which means it should come from a private repo. Any ideas where such private repos live?
[18:17:04] halfak, typically labs/private.git is the public version of the private repo that has snakeoil values
[18:19:39] thanks Krenair. Still trying to figure out what writes the config file. I'm looking at /etc/ores/99-main.yaml on deployment-ores01
[18:19:59] Searching puppet for some of the strings in the file doesn't get me anywhere.
[18:20:14] I suppose those strings are coming from hiera. Do we manage that in horizon these days?
[18:20:30] looking at the content...
[18:20:30] lock_managers
[18:20:35] only appears in modules/ores/manifests/web.pp in puppet
[18:20:59] looks like it originates in there
[18:21:26] Aha! Looks like I'm getting somewhere.
[18:21:31] which calls ores::config with title 'main' and priority 99, which fits
[18:22:02] I want to add something to this config that doesn't require any secrets but I want it to only apply in deployment-prep.
[18:22:22] well
[18:22:30] How would you approach that?
[18:22:33] it is technically possible to live hack it in on the deployment-prep puppetmaster
[18:22:39] certainly on a temporary basis
[18:22:57] but we prefer to avoid all the extra commits that build up there
[18:23:40] +1
[18:24:15] so it might be possible to avoid that by proposing patches to operations/puppet.git that modify ores::web to conditionally set that key if provided with certain values which are ultimately pulled from hiera and only ever set in beta hiera, not prod
[18:24:28] I guess I could add a variable for the config I want to change and set it in production hiera and deployment-prep hiera.
[18:24:33] yeah
[18:24:51] Oh nice. What does the syntax look like for optionally pulling a value from hiera if it exists?
[18:25:07] well we wouldn't pull directly from hiera in that file
[18:25:23] you can already see the existing `if $redis_password {` stuff in that file
[18:25:51] modules/profile/manifests/ores/web.pp includes ores::web which would set the parameters that get given to it
[18:26:03] you can use lookup() in there to get data from hiera
[18:28:22] Oh wait. Dang. This isn't going to work like I hoped. I'd have to set a bunch of hosts -- not just one.
[18:28:59] I might have to actually try to get Envoy working on the instance instead.
[18:29:42] Any experience with Envoy, Krenair?
[18:32:48] Not directly
[18:33:05] Can probably take a look later
[18:33:25] Thanks! Also this wm-bb is weird :)
[18:33:36] I'm going to try to see if I can find the production configuration in the meantime.
[18:34:24] Yeah sorry am in the kitchen on my phone, only have telegram, no IRC client. So we talk via bridgebot :)
[18:35:22] Production config was often scattered in lots of different places, defaults in various manifests, hiera in different files, etc.
[18:40:27] good to see you here, halfak :)
[18:40:55] o/ yuvipanda!
[18:41:01] \o/
[18:41:03] Good to "see" you too, buddy. :)
[18:41:31] :D
[18:41:45] I've been continuing my modeling work and stuff since making the staff->volunteer jump. But now I'm roped into maintaining a production service :(
[18:42:00] :D unsurprising
[18:42:03] I feel like the WMF is very...sticky.
[18:42:05] Right. lol
[18:42:15] halfak: production ores?
[18:42:19] Yup.
[18:42:26] fun fun fun
[18:42:57] Seems like no one is really maintaining it. We get a few changes merged that bring prod ORES out of sync with beta along with a few force-pushes to downstream repos. Now I'm playing detective with Alex's help :)
[18:43:17] And Majavah and elukey <3
[18:43:51] coalition of the willing, as always
[18:43:56] No deployments to ORES for ... 10 months now.
[18:44:28] when did you leave?
[18:44:30] Lots of volunteer requests that I'm handling in my volunteer hours
[18:44:35] About 10 months ago lol
[18:47:01] OK I've got to head out now. Posted the last bits in the phab ticket. https://phabricator.wikimedia.org/T278723
[18:47:19] Thanks again all who helped me today. I feel very lucky to have y'all around :)