[10:57:28] Krenair: let me dig around and have a check [10:57:43] I *think* it can die (actually I thought I had killed it) [12:17:08] !log admin [codfw1dev] enabling puppet in cloudnet200x-dev servers after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/579259 (T247505) [12:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [12:17:11] T247505: CloudVPS: neutron: consider dropping routing_source_ip custom hack from the l3 agent - https://phabricator.wikimedia.org/T247505 [12:39:49] !log admin [codfw1dev] reintroduce address scopes for another round of testing T244851 [12:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [12:39:52] T244851: Neutron: replace NAT customization with address scopes - https://phabricator.wikimedia.org/T244851 [18:12:05] Hi [18:12:43] IRC-Source_34: hello [18:13:06] Gopa Hey [18:15:41] Hello Wikimedia Cloud service support team, [18:15:41] we're currently working on moving the video cut tool ( https://commons.wikimedia.org/wiki/Commons:VideoCutTool ) from toolforge to CloudVPS. We were able to successfully install the tool on the instances there, though we're having trouble exposing the tool to a public URL; are there any instructions for this setup? [18:23:43] create a proxy [18:30:05] Gopa: https://wikitech.wikimedia.org/wiki/Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet [18:30:35] We are facing a problem in creating the proxy server [18:30:48] and the problem is? [18:32:20] We are not able to create the proxy with [18:32:20] DNS name: videocuttool.wmflabs.org [18:32:20] PORT: 80 [18:32:20] instance: VideoCutTool [18:32:21] DNS name: videocuttool.wmflabs.org [18:32:22] PORT: 443 [18:32:22] instance: VideoCutTool [18:32:23] It is returning "Error: Duplicate RecordSet" [18:37:11] Gopa: https://openstack-browser.toolforge.org/project/videocuttool -- you have a proxy created already that points to port 4000 on the videocuttool.videocuttool.eqiad1.wikimedia.cloud instance [18:38:39] You are very likely missing a security group rule allowing traffic into port 4000 on that instance -- https://wikitech.wikimedia.org/wiki/Help:Security_groups [18:39:20] Gopa: https://wikitech.wikimedia.org/wiki/Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet -- should have all the information you need on how to use the proxy system [18:39:40] * Gopa going through all the above resources... [18:40:03] * bd808 steps away for lunch [19:12:01] i've got some questions about how the web proxy works [19:12:22] it seems to be munging the Host header on parsoid-beta.wmflabs.org [19:32:36] hi cscott [19:32:41] what's up? [19:32:58] trying to figure out why http://parsoid-beta.wmflabs.org/ isn't working [19:33:05] alright [19:33:12] let's see [19:33:34] http://172.16.1.115:80 [19:33:49] deployment-parsoid11.deployment-prep.eqiad1.wikimedia.cloud. [19:34:44] root@deployment-parsoid11:/etc/apache2/sites-enabled# grep parsoid-beta * [19:34:45] root@deployment-parsoid11:/etc/apache2/sites-enabled# [19:34:48] (and thus solve the mystery of T247589) [19:34:48] T247589: Parsoid-PHP should be publicly accessible in beta - https://phabricator.wikimedia.org/T247589 [19:34:57] cscott, looks like the instance might not be configured to listen for this host? [19:35:18] well, generally mediawiki instances are particular about the Host parameter [19:35:37] really?
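The security group fix bd808 suggests above is point-and-click in Horizon; a rough CLI equivalent is sketched below. This is a minimal sketch, assuming the project's default security group -- the group name and the wide-open source range are illustrative, not taken from the log:

    # hypothetical sketch: allow traffic in to port 4000 on the instance
    openstack security group rule create default \
        --protocol tcp --dst-port 4000 --remote-ip 0.0.0.0/0
    # then verify from outside the instance that the port answers
    curl -v http://videocuttool.videocuttool.eqiad1.wikimedia.cloud:4000/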
[19:35:44] in wikimedia's setup surely the Host header is the most important thing [19:35:46] but cscott@deployment-parsoid11:~$ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://deployment-parsoid11/wiki/Special:Version [19:35:55] you'd need it to determine what DB name to use and what config to load [19:36:02] ^ but that works locally.... but not from outside parsoid11 [19:36:38] eg [19:36:40] $ curl -H 'Host: en.wikipedia.beta.wmflabs.org' https://parsoid-beta.wmflabs.org/wiki/Special:Version [19:36:47] fails from outside [19:36:55] well yeah, the proxy won't know anything about en.wikipedia.beta.wmflabs.org, that can't possibly work [19:37:06] the working theory over in -releng was that the proxy was munging the Host header [19:37:48] i was thinking the proxy was just a dumb tunnel, but i guess it's doing more [19:37:49] cscott: things in deployment-prep should be behind its varnish, not the shared Cloud VPS front proxy [19:38:01] well [19:38:08] we do have a bunch of things using the shared Cloud VPS proxy too [19:38:19] cscott, it is fairly dumb [19:38:33] the Cloud VPS front proxy is an nginx deployment that does host header matching via a redis lookup hash [19:38:34] some of the pet owners here might disagree :P but good catch - https://wikitech.wikimedia.org/w/index.php?title=Template:Cloud_VPS_nav&curid=444620&diff=1860145&oldid=1854805 [19:38:45] it will route things based on the host header... in your example you override that from the default parsoid-beta.wmflabs.org to en.wikipedia.beta.wmflabs.org - which it cannot possibly handle [19:38:59] cscott@deployment-parsoid11:~$ curl -x deployment-parsoid11:80 http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version [19:39:14] ^ parsoid11 also works via the http proxy protocol, fwiw. probably not useful. [19:39:34] probably not, no [19:39:47] bd808: so mobrovac_ apparently configured varnish to redirect anything /w/rest.php over to parsoid-php at one point in the past [19:40:10] which i'd like *not* to do, because /w/rest.php in beta should ideally work the same as it does in prod, not get hijacked [19:41:13] so one option i guess is to override the MWScript host matching on the parsoid deploy in beta, to treat parsoid-beta.wmflabs.org as a synonym for en.wikipedia.beta.wmflabs.org ? That seems pretty ugly. [19:41:25] one option to achieve what, exactly? [19:41:35] or we can go back to varnish and make /w/parsoidhack redirect to parsoid11, which also seems pretty ugly [19:41:46] that does not sound valid [19:42:17] cscott: how does this stuff work in production? What layer routes to the parsoid-php cluster? [19:42:24] Krenair: ultimately to make https://github.com/wikimedia/restbase/blob/master/config.frontend.test.yaml work [19:42:33] It has to be varnish somewhere right? [19:42:35] since right now I've apparently broken restbase CI [19:42:46] restbase configs don't mean much to me [19:42:56] note that all of the other test servers listed use some version of -beta.wmflabs.org [19:43:06] "restbase configs don't mean much to me" -- timeless statement [19:43:19] but parsoid has this crazy hack that *used to be* via redirecting /w/rest.php from *.wikipedia.beta.wmflabs.org [19:43:47] presumably via something like [19:43:55] https://horizon.wikimedia.org/project/instances/eed81e86-2874-4740-a8c2-fee29ced046d/?marker=06c8f1f1-2070-4d05-b97b-386c6bf5636b [19:44:00] I still don't know what we're trying to achieve.
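As background to the front proxy description above: nginx takes the incoming Host header, looks it up in Redis, and proxies to whatever backend the lookup returns, which is why a request carrying Host: en.wikipedia.beta.wmflabs.org can never match the parsoid-beta.wmflabs.org entry. A minimal sketch of that lookup, with an invented key layout (the real schema may differ):

    # illustrative only: one Redis entry per proxied hostname; the key naming is assumed
    redis-cli HGETALL frontend:parsoid-beta.wmflabs.org
    # suppose this returns a backend of http://172.16.1.115:80 (the IP seen above);
    # nginx proxies the request there, and a Host it has no entry for goes nowhere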
[19:44:04] although the exact configuration got lost [19:44:18] restbase wants to access the parsoid cluster in beta from CI in travis [19:44:49] CI in travis? [19:44:57] I thought CI happened in Jenkins [19:45:00] yes, alas. [19:45:27] they use github instead of gerrit, too. [19:45:36] so restbase is different from everything else? [19:45:38] ok [19:45:41] so what's the problem exactly? [19:45:54] but even from jenkins I don't think it would work, i don't think jenkins can reach deployment-parsoid11 directly either [19:46:00] At least it's not deployed using ansible anymore :) [19:46:33] TIL RB was deployed with Ansible [19:46:42] yes indeed, it would not be able to [19:46:51] bd808: basically they want to be able to reach a parsoid server in beta from "the outside world" so their restbase tests can verify that restbase can interact with a parsoid service correctly [19:47:15] and what does interacting with a parsoid service correctly look like exactly? [19:47:21] The jenkins worker instances are in the integration project. It would be possible to open security group rules to allow integration to route directly to deployment-prep [19:47:26] it's invoked from `npm test` so in principle the test needs to work whether invoked from your own laptop, jenkins, or travis. [19:47:29] I'm several years out of date with this stuff [19:48:02] Do you use a single parsoid hostname? [19:48:02] for everything else restbase supports, a web proxy in cloud seems to work just fine [19:48:08] Do you have some special route under the normal wiki domains? [19:48:13] there's a web proxy for citoid-beta.wmflabs.org, for example [19:48:30] the problem is that parsoid-integrated-with-core (which is what we're running now) cares about the Host header [19:48:59] old-parsoid had the desired wiki target included in the URL, didn't use the Host header for this [19:49:13] ok [19:49:17] it had it in the path [19:49:18] but /w/rest.php dispatches through MWScript.php now. [19:49:19] cscott: *nod* and the issue is that you also need that to come into the parsoid-php instance with a Host header that matches a db config I guess. Which is not going to happen via the shared Cloud VPS proxy as the DNS entries for *.beta point to the varnish cluster in deployment-prep [19:49:29] so what's the problem with it sending requests to the wiki with the right path, having varnish route it to parsoid? [19:50:05] bd808: well, that's the question -- which evil hack is best? i can do an evil hack to teach deployment-parsoid11 to treat 'parsoid-beta.wmflabs.org' as a synonym for en.wikipedia.beta.wmflabs.org or something [19:50:24] please don't set enwiki as the default for anything [19:50:30] or i can create a varnish redirect like /w/parsoidhack that would pass through the 'right' Host from the URL [19:51:04] I take it this is a problem we don't have in prod because we're not trying to run tests against the parsoid API directly from external there? [19:51:25] there used to be a varnish redirect from /w/rest.php apparently, but it (as far as I can tell) was made only in Horizon, not puppet git, and was lost sometime before changes made in Horizon got mirrored into repo commits [19:51:32] Krenair: yes. [19:51:52] and it's not a "parsoid" problem. we're running fine & can test fine. 
it's just a problem that I broke restbase's CI [19:52:06] so we could give -parsoid11 its own floating IP, a security group rule that permits whatever external CI system you have, and have it get on with it [19:52:40] we'd presumably prefer not to hide it behind some mediawiki URI because that would be a different API to real parsoid which is what you're trying to test? [19:53:23] Krenair: well yes and no. i didn't like hijacking any "real" prod URL (like /w/rest.php) because that seems to be a recipe for future confusion [19:53:43] yeah I'd very much like to avoid that if we're not going to do it in prod [19:54:08] so i don't mind /w/parsoidhack or something like that, to make it obvious that this is a weird thing just for beta [19:54:15] /w/restbase-ci-needs-this [19:54:25] I don't like the idea much anyway [19:54:34] alternatively we could open some security group rule to let the integration hosts access it, and dealing with the consequences of any non-wikimedia CI system is someone else's problem [19:54:57] but we could also fit parsoid11 with a floating IP & etc. I'm not 100% sure that restbase's CI passes along the correct Host header, but if it doesn't I'm sure that can/should be fixed on the restbase side, because that's how parsoid works now. [19:54:59] but of all these I'd prefer to just give the instance its own floating IP and let travis talk to that [19:55:41] can we still use a CNAME? [19:56:06] we can give it any name you like that's not already taken [19:56:13] and is under our domain hierarchy [19:56:14] ie make parsoid-beta.wmflabs.org resolve to the IP which deployment-parsoid11 is using? [19:56:15] within reason [19:56:22] yeah we can do that [19:56:36] probably would be an A record rather than a CNAME but yeah [19:56:56] ok, because we're up to 11 parsoid servers now, and it was a pain to switch everything from parsoid10 to parsoid11 and if at all possible I'd like to reduce the number of places 'parsoid11' appears externally. [19:57:04] or internally, for that matter [19:57:10] uh [19:57:13] we're not running 11 [19:57:28] no, i mean this is the 11th server we've set up for parsoid [19:57:29] that's just how many different instances we've gone through since starting to run parsoid in deployment-prep, presumably [19:57:30] AFAICT [19:57:34] yeah. [19:57:41] over what, like 7-8 years? [19:58:08] i'm just saying the number 11 still shows up in a bazillion different places, when ideally i could change one entry for "current active parsoid server" and that would be it. [19:58:16] for connecting in externally I would not ask you to use internal hostnames [19:58:18] prod side has the discovery mechanism [19:58:28] Krenair: i think we're on the same page [19:58:30] cool [19:58:35] do you need TLS for this thing? [19:59:09] if it's not too hard. [19:59:18] erm [19:59:20] everything else is set up with TLS [19:59:29] citoid-beta.wmflabs.org etc [19:59:39] yeah but everything else goes through the trusted proxy [19:59:47] I'm not allowed to take the *.wmflabs.org cert and stick it in deployment-prep [19:59:49] yeah. skip tls then. [20:00:18] this is a weird special thing for weird special external things to talk to [20:00:33] i kind of liked the 'make parsoid-beta.wmflabs.org resolve to a hardcoded wiki' just because it would keep anyone from abusing this as an actual service [20:00:54] nah [20:01:01] if you hardcoded it to enwiki people might still do that [20:01:28] yeah, i need to set it to the en-rtl wiki ;) [20:01:29] hard-code it to afwiki. [20:01:35] Or deploymentwiki.
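For concreteness, the floating-IP plan being converged on above would look roughly like this with the OpenStack CLI (Horizon was actually used; the pool, zone, and record names are placeholders, and the recordset command assumes the designate client plugin is installed):

    # hypothetical sketch of the floating-IP-plus-DNS approach
    openstack floating ip create <pool>
    openstack server add floating ip deployment-parsoid11 <floating-ip>
    openstack recordset create <zone>. <name> --type A --record <floating-ip>
    # plus a security group / firewall rule admitting the external CI traffic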
[20:01:55] it's also not necessarily easy to put that hack in place anyway [20:02:30] yeah [20:02:49] anyway, i think i know how floating IPs work in horizon, let me set that up and see if i can make all the pieces work. [20:03:00] 1 step ahead of you [20:03:10] this is 185.15.56.9 [20:03:19] that solution is straightforward with the least amount of magic to surprise future devs with [20:04:27] $ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-beta.wmflabs.org/wiki/Special:Version [20:04:27] do we know the source IPs that travis will be using? [20:04:42] doesn't work quite yet [20:04:50] yeah that's several steps away [20:04:54] one step at a time [20:05:02] Krenair: `npm test` on a developer's laptop would use any IP [20:05:23] we've got the IP assigned, we need to allow ingress, then we need to (or at least would like to) set up a DNS name for it [20:05:27] ah [20:05:31] so this is for the world [20:05:45] that's probably okay in this case, parsoid won't serve anything sensitive will it? [20:05:54] i'd lock it down based on port and then perhaps in software wrt only allowing certain routes [20:06:09] Krenair: only stuff on beta wikis [20:06:36] that should be fine [20:08:33] cscott, how do you feel about using port 8001? [20:08:51] I've just realised this box has ferm enabled and that's only permitting 8001, not 80 [20:12:45] and what about naming it "parsoid-external-ci-access.beta.wmflabs.org" ? [20:13:00] both of those are fine [20:13:25] cscott, alright try `curl parsoid-external-ci-access.beta.wmflabs.org:8001` [20:13:35] i'm going to remove the parsoid-beta.wmflabs.org proxy, since that Won't Work [20:13:54] thanks [20:14:14] $ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-external-ci-access.beta.wmflabs.org:8001/wiki/Special:Version [20:14:32] gives me "
Cannot GET /wiki/Special:Version
" [20:14:40] which is... at least a different sort of error? [20:14:49] < X-Powered-By: Express [20:14:57] I think this is coming from the real parsoid app? [20:15:11] oh, yeah, there's still a parsoid/js instance on port 8000 or 8001 [20:15:18] i think maybe your destination port is 8000 or 8001? [20:15:24] should be 80 for parsoid-php [20:15:37] so this is where it gets problematic [20:15:42] we'll get rid of the parsoid/js service soon-ish, but it hasn't been done yet [20:16:20] we'll need a puppet change of some sort (could theoretically be hacked into our puppetmaster, ugh) to get the firewall rule there that permits external access to port 80 [20:18:38] firewall rule on deployment-parsoid11? [20:19:16] i thought i'd already added perms to reach port 80 as part of the web proxy config [20:19:51] iptables configured by ferm configured by puppet [20:20:05] on parsoid11 [20:20:07] yep [20:20:08] ? [20:20:11] ok [20:20:16] just making sure i understand [20:20:17] alright [20:20:26] i can deal with puppet for parsoid11, it's all in horizon right now anyway [20:20:30] try `curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-external-ci-access.beta.wmflabs.org/wiki/Special:Version -v` [20:20:50] (cf T247480) [20:20:51] T247480: Sync parsoid11 config from Horizon back to puppet git - https://phabricator.wikimedia.org/T247480 [20:20:53] I added a commit to deployment-puppetmaster04:/var/lib/git/operations/puppet and ran puppet on -parsoid11 [20:21:29] also ew [20:21:38] $ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-external-ci-access.beta.wmflabs.org/wiki/Special:Version [20:21:42] works, yay [20:21:52] I don't like the deployment-prep hieradata living in puppet.git [20:21:58] pretty much out of principle [20:22:08] have been tempted several times to move it all to horizon [20:22:31] fwiw i like it living in git because i can 'git grep' to find all the places parsoid11 is mentioned (eg) [20:22:50] but it seems like there's now a secondary git for horizon? [20:23:18] yeah but you can git grep through the repo that horizon backs everything up to [20:24:12] and not have to wait for a prod root to say it's okay to do what you've already done in labs [20:26:44] anyway [20:27:30] it works [20:27:41] ok, let me see if i can use this to fix up restbase's CI now [20:28:06] Krenair: you want to take a shot at T247545 while you're fixing things? [20:28:06] T247545: Fix scap on deployment-parsoid11 - https://phabricator.wikimedia.org/T247545 [20:28:20] hm [20:28:36] James_F might be able to tell you more about that, the bug description i just copied from his IRC chat to me [20:29:06] Krenair: It needs the scap key or something in the keyholder? Or something. [20:29:14] uhhh [20:29:18] But I didn't see in the hieradata where to set that? [20:29:23] okay scap is different from how I remember [20:29:24] (Going off comments from hashar.) [20:29:26] how do I run this thing again [20:29:30] `scap pull`. [20:29:38] Or `scap sync` from the deployment server. [20:29:40] from the deployment host? [20:29:41] right [20:29:48] sync not deploy... great [20:33:11] alright gave up with that, went straight for ssh with SSH_AUTH_SOCK [20:33:24] confirmed it can log into deployment-mediawiki-09 but not deployment-parsoid11 [20:33:35] auth.log on parsoid11 shows it's not accepting the keys provided [20:34:09] gotta love deployment-salt references sticking around in 2020, that instance probably hasn't existed for the last 4-5 years or something [20:34:20] * James_F grins.
[20:34:45] hey, don't make fun of my known_hosts file [20:35:07] interestingly /etc/ssh/userkeys/mwdeploy is identical between -parsoid11 and -mediawiki-09 [20:35:41] Mar 13 20:33:57 deployment-mediawiki-09 sshd[14391]: Accepted publickey for mwdeploy from 172.16.4.18 port 56132 ssh2: RSA SHA256:nUV3qf86EbG/cslV8H2DkV2upw8CGoIgqYYH2UPK7QE [20:35:48] Odd. [20:36:11] interestingly that fingerprint doesn't come up in -parsoid11 logs [20:36:25] oh no there it is, just later on [20:36:37] Mar 13 20:35:58 deployment-parsoid11 sshd[16243]: pam_access(sshd:account): access denied for user `mwdeploy' from `172.16.4.18' [20:36:39] don't like that [20:36:55] that'll be a /etc/security/access.conf.d/ thing I bet [20:37:24] krenair@deployment-mediawiki-09:~$ ls -lh /etc/security/access.conf.d/60* [20:37:24] -r--r--r-- 1 root root 39 Nov 20 2018 /etc/security/access.conf.d/60-scap-allow-mwdeploy [20:37:32] root@deployment-parsoid11:~# ls -lh /etc/security/access.conf.d/60* [20:37:32] -r--r--r-- 1 root root 33 Mar 7 04:36 /etc/security/access.conf.d/60-scap-allow-deploy-service [20:37:39] Oh, interesting. [20:37:41] yeah that isn't gonna work [20:37:45] That's the scap3 key. [20:37:55] uh yeah, username but yeah [20:37:55] Yeah, how did it screw up that badly? [20:38:14] these files just control what username can log in from where [20:38:20] Or, rather, how did this not break production. [20:38:28] this is a control we only have in labs [20:38:33] Ah. [20:38:37] That'd explain. [20:38:42] whereas production uses the admin module to ensure the right users exist on the right hosts... [20:38:49] * James_F nods. [20:38:59] in labs, we have LDAP, and some PAM config that uses these files to let the right set of users (usually project members) log in [20:39:07] plus, for special local users, some special local config [20:39:14] here, parsoid11 has config for the wrong user that won't permit mwdeploy [20:39:57] this'll be some puppet lurking somewhere... [20:39:58] * Krenair digs [20:41:00] alex@alex-laptop:~/Development/Wikimedia/Operations-Puppet (production)$ git grep security::access::config | grep mwdeploy [20:41:00] modules/profile/manifests/beta/mediawiki.pp: security::access::config { 'scap-allow-mwdeploy': [20:41:20] that's the sole purpose of that file, just to add a Security::Access::Config [20:41:47] and the sole purpose of modules/role/manifests/beta/mediawiki.pp is to include it [20:41:56] so I'll just stick the role::beta::mediawiki role on this instance and run puppet [20:43:40] Notice: /Stage[main]/Profile::Beta::Mediawiki/Security::Access::Config[scap-allow-mwdeploy]/File[/etc/security/access.conf.d/60-scap-allow-mwdeploy]/ensure: defined content as '{md5}914fad6350bb0d689067d57867a7964d' [20:43:58] and now, I can `SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-parsoid11` from deployment-deploy01 [20:44:04] James_F, cscott: so scap should work now probably ^ [20:49:56] Brilliant. [20:50:11] Can we remove the hack dropping parsoid11 from the scap target list? 
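For context on the pam_access mechanism used in the fix above: each file under /etc/security/access.conf.d/ carries rules of the form `permission : users : origins`, and the scap-allow-mwdeploy file works by admitting the mwdeploy user. A sketch, with assumed file contents (not copied from the instance):

    $ cat /etc/security/access.conf.d/60-scap-allow-mwdeploy
    + : mwdeploy : ALL    # assumed contents: permit mwdeploy from any origin
    # parsoid11 previously shipped only 60-scap-allow-deploy-service, which says
    # nothing about mwdeploy, hence the pam_access denial seen in auth.log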
[20:50:55] yes, though off the top of my head I don't remember where the hack lives [20:51:23] oh it's just a puppet cherry-pick [20:51:56] dropped [20:52:22] gotta love how in 2020 I can sign into gerrit and start doing stuff without any 2FA [20:52:59] :-( [20:53:14] at least you don't have to have your temperature taken [20:53:20] that's all the rage in access control these days [20:53:43] for leave a comment it's fine but I could be like approving PRs in scary places [20:53:47] leaving* [20:53:56] er not PRs, changes [20:54:49] ran puppet on deploy01, running scap sync [20:55:15] Success? [20:55:21] err, but from /srv/mediawiki-staging this time, not ~krenair [20:55:26] * James_F laughs. [20:56:05] it's gonna take a moment, it's scap [20:56:12] updating localisation cache with 6 threads [20:59:46] sync-apaches: 100% (ok: 5; fail: 0; left: 0) [21:00:52] Success! [21:01:04] Thank you Krenair. As always, I owe you drinks. :-) [21:01:15] Though with covid19 my next trip to London is looking dicier. [21:01:38] when were you thinking of visiting? [21:02:07] (feel free to PM ofc) [21:05:18] Krenair: May. [21:05:28] We'll have to see. [21:05:41] mm [21:06:07] that's probably not looking very good [21:06:25] Nope. [21:07:20] wikimedia hackathon in Tirana in may was cancelled, would expect London to be worse :/ [21:08:01] Yeah. Though 1:1 is very different from a mass gathering of 250. [21:08:53] tur [21:08:55] true* [21:26:04] James_F, Krenair: i'm getting 504 errors from VE on beta now [21:26:26] is it possible your fixing scap caused something to break? [21:29:59] Oh dear. [21:31:05] maybe the firewall hack? [21:31:09] cscott@deployment-parsoid11:~$ curl -x deployment-parsoid11:80 'http://en.wikipedia.beta.wmflabs.org/wiki/Special:Version' [21:31:12] still works from parsoid11 [21:31:27] but this times out from elsewhere [21:31:30] $ curl -H 'Host: en.wikipedia.beta.wmflabs.org' http://parsoid-external-ci-access.beta.wmflabs.org/wiki/Special:Version [21:32:30] hm [21:32:54] but this still works: [21:33:03] $ curl -X GET "https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/html/Main_Page" -H "accept: text/html; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/HTML/2.1.0"" [21:34:45] hm, let me look into this more, maybe it was the scap that killed it, maybe something in the latest vendor commits isn't quite compatible? [21:34:55] stupid question: where do i find the logs for beta? [21:34:57] looks like my firewall rule is gone [21:35:46] what logs for beta? [21:36:23] if deployment-parsoid11 threw an exception, where would it end up? [21:36:35] dunno [21:36:38] is there a separate logstash instance for beta (i think so?) [21:36:40] logstash maybe? [21:36:45] if it's functioning that is [21:37:14] i wish i had some logs to figure out why my pod is runcontainererror [21:37:19] kubectl logs shows nothing [21:39:24] Mar 13 21:10:08 deployment-parsoid11 puppet-agent[22711]: (/Stage[main]/Ferm/File[/etc/ferm/conf.d/10_allow-external-parsoid-ci-access]) Filebucketed /etc/ferm/conf.d/10_allow-external-parsoid-ci-access to puppet with sum c62414b9e3126d66233cbdfde4dde76d [21:39:24] Mar 13 21:10:08 deployment-parsoid11 puppet-agent[22711]: (/Stage[main]/Ferm/File[/etc/ferm/conf.d/10_allow-external-parsoid-ci-access]/ensure) removed [21:39:31] cscott: there is https://logstash-beta.wmflabs.org/. 
The username and password are found via `ssh deployment-deploy01.deployment-prep.eqiad.wmflabs sudo cat /root/secrets.txt` [21:40:25] oh hang on [21:40:43] https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/fc20d915cfc669265ffea482971d6293008bb3be%5E%21/ [21:41:05] with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/576493/3/hieradata/cloud/eqiad1/deployment-prep/hosts/deployment-parsoid11 [21:41:31] my firewall rule was added conditionally if use_php => true [21:41:38] but this new file will never get loaded, it lacks a .yaml extension [21:42:07] and i'm going to get rid of use_php RSN: https://gerrit.wikimedia.org/r/577043 [21:42:09] root@deployment-puppetmaster04:/var/lib/git/operations/puppet(production u+15)# find hieradata -type f | grep -v yaml [21:42:09] hieradata/cloud/eqiad1/deployment-prep/hosts/deployment-parsoid11 [21:42:09] hieradata/cloud/eqiad1/hosts/.gitignore [21:42:09] root@deployment-puppetmaster04:/var/lib/git/operations/puppet(production u+15)# [21:42:34] Zppix: try `kubectl get po sopeltest.bot-798c546558-zxj9c -o yaml`. It will show you that the problem is "exec: \"/data/project/zppixbot-test/k8s/starter.sh\": permission denied" [21:42:57] why is it permission denied bd808 ? [21:43:24] Zppix: because the exec bit is not set on the file [21:43:38] what do i add to chmod for that? i cant remember [21:44:08] Zppix: chmod u+x [21:44:20] cscott, try now [21:44:25] u == user, x == eXecute [21:45:50] Krenair: VE seems to be working again [21:46:11] Joy. [21:46:34] so this was an unfortunate error in related work ongoing around the same instance around the same time :) [21:46:41] we got unlucky [21:47:01] on puppetmaster04 you mean? [21:47:18] I patched us up on puppetmaster04 pending the fix being merged in gerrit for real [21:47:19] (there shouldn't be anyone else working on parsoid11 afaik) [21:47:29] ok, gotcha. thanks! [21:47:34] the error was introduced via a change in horizon + a change in gerrit [21:47:48] fix is merged [21:48:33] will https://gerrit.wikimedia.org/r/577043 break it again? [21:49:08] uh [21:49:15] well it'll conflict with our cherry-pick on puppetmaster04 [21:49:35] so will probably stop all puppet changes going out to deployment-prep until someone sees it and fixes the merge conflict by hand on puppetmaster04 [21:50:14] i can C-2 it for the moment, it's not worth breaking the world over [21:50:26] nah [21:50:32] I mean A) you can't [21:50:39] but also B) we just fix things when this happens [21:51:06] it's an unfortunate consequence of dealing with cherry-picks in this environment [21:52:28] sometimes a prod change comes through that conflicts with something we did locally [21:53:27] Sometimes ~== every week. [21:54:39] yeah [21:55:01] and 'we just fix things' is really 'sometimes we fix it shortly, sometimes we only notice when someone complains on a mailing list' [21:55:40] bd808: now its saying no file or directory but the file path exists [21:55:54] * Zppix is horrible at setting up k8s [21:57:45] (sometimes the person complaining is capable of fixing it themselves but is lacking confidence or maybe wants to draw attention to the problem) [21:58:01] was that re me krenair? [21:58:09] no [21:58:17] i figured :P [21:58:26] sorry I haven't been paying attention to your thing, have been looking at beta [21:58:27] Zppix: Do you know which file it is complaining about. That script seems to be doing more than I would expect a startup script for a sopel bot to do from inside the container. 
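The Kubernetes debugging loop from the exchange above, collected for reference (the pod name is the one from the log and will differ for any other deployment):

    kubectl logs sopeltest.bot-798c546558-zxj9c            # empty here: the container never started
    kubectl get po sopeltest.bot-798c546558-zxj9c -o yaml  # the container status carries the real error
    chmod u+x /data/project/zppixbot-test/k8s/starter.sh   # the "permission denied" fix: set the exec bit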
The `rm` in there is especially confusing [21:59:42] bd808: it seems to think that ~/zppixbottest/bin/python3 and ~/zppixbottest/bin/pip3 don't exist... im using the same setup (obviously with different file paths in the configs) for the main zppixbot project which is working just fine [21:59:45] `command: [ "/data/project/zppixbot-test/k8s/starter.sh", "bash" ]` makes me think that the rm is failing [21:59:56] im not worried about the RM [22:01:38] i just dont see a reason why it should be erroring [22:01:50] bd808: unless i did something wrong in the .yaml [22:02:50] Zppix: try running `webservice --backend=kubernetes python3.5 shell` and then calling your script directly. Maybe that will help you find the bug. [22:17:44] now all i get is /data/project/zppixbot-test/k8s/starter.sh: /data/project/zppixbot-test/zppixbottest/bin/sopel: /mnt/nfs/labstore-secondary-tools-project/zppixbot-test/zppixbottest/bin/python3: bad interpreter: No such file or directory bd808 [22:20:19] Zppix: how did you initially create that venv? [22:20:44] bd808: python3 -mvenv zppixbottest [22:20:56] if you made it anywhere other than from inside a python3.5 Kubernetes pod then it will be broken [22:21:18] so if you typed `python3 -mvenv zppixbottest` on the bastion that is the cause of the problem [22:21:30] argh [22:21:37] the python3 on the bastion is not the python3 in the pod [22:21:57] so how do i fix this [22:22:30] Recreate the venv from inside a `webservice --backend=kubernetes python3.5 shell` [22:22:42] just rm what you have and then make it again [22:24:07] bd808: i still do the python3 -mvenv cmd right [22:24:48] Zppix: yes, you did the right things. You just started from the wrong place [22:25:05] I wish I knew how to make this less confusing :/ [22:25:18] you are not remotely the only person to have done this [22:25:30] bd808: updating https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python with those little bits of info is a start [22:26:53] Zppix: so the fun thing is that the difference depends on whether you are setting up for Grid Engine or Kubernetes [22:27:10] lol [22:27:59] that page kind of annoys me because it mixes both at the start but then only talks about grid engine stuff in the details [22:28:19] * bd808 has an idea [22:32:41] Zppix: better? -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Virtual_environments [22:37:04] yes [22:37:10] bd808: i got it working [22:39:08] +1 [23:18:42] bd808: you think this will still be used for future Wikimanias? https://scholarships.wikimedia.org [23:19:30] mutante: T243037, but then I was told to hold off [23:19:30] T243037: Shutdown scholarships.wikimedia.org and archive project - https://phabricator.wikimedia.org/T243037 [23:19:37] so *shrug* [23:19:53] oh, i see [23:20:01] * mutante subscribes [23:20:38] bd808: by legal? [23:20:51] mutante: if it lives on it certainly needs a new host with a modern version of php [23:21:03] mutante: by... it's complicated ;) [23:23:26] that host needs to be updated to buster sometime this year [23:23:40] it is not just this app though [23:24:24] shares it with iegreview and racktables, heh [23:24:42] could be that all 3 are not really used [23:24:54] we still have racktables? [23:25:03] yea :p
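For anyone hitting the same "bad interpreter" failure discussed above: a venv only works when run by the same Python that created it, so a Toolforge venv destined for Kubernetes has to be built inside the pod. A sketch of the recovery bd808 describes (paths follow the log; the `rm` is destructive, and installing sopel via pip is an assumption about how the bot is packaged):

    # from the tool account, enter a pod with the python the webservice will use
    webservice --backend=kubernetes python3.5 shell
    # inside the pod:
    rm -rf ~/zppixbottest                  # discard the venv built on the bastion
    python3 -mvenv ~/zppixbottest          # rebuild against the pod's python3.5
    ~/zppixbottest/bin/pip install sopel   # assumed dependency for the bot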