[00:05:30] bd808: excellent, thanks! i'll give this a shot tomorrow and see how it works. i'm still all for default https but this solution at least loooooooks relatively simple...
[07:50:57] I was thinking that https://docs.kubessh.org/en/latest/ would make a stellar addition to PAWS, useful for all the folks who just want ssh
[07:51:23] I am guessing you don't want more stuff on that cluster tho :D
[07:51:34] we could allow ssh in with mediawiki OAuth tokens or something.
[07:51:57] and toolforge
[11:27:34] does anybody know why both meta and meta2.toolforge.org give a "service not available" error (503)?
[11:36:32] Wiki13: I think there are errors in the tool:
[11:36:39] for example https://www.irccloud.com/pastebin/Al1bsdlu/
[11:37:12] I think I used it a few days ago and then it still worked...
[11:37:39] has anything changed recently?
[11:38:17] not recently, last month I see
[11:38:22] 86 2020-06-07 20:13:04 git -C git/wikimedia-contrib reset origin/HEAD --hard
[11:38:22] 87 2020-06-07 20:13:08 webservice restart
[11:39:05] hmmm but it worked recently so that can't be the issue
[11:39:55] what if the service is restarted? I wonder if that would fix this or not
[11:40:05] arturo: oh, toolforge as in toolforge ssh keys?
[11:40:18] https://www.irccloud.com/pastebin/MvBuXLx8/
[11:41:01] is that upon restarting?
[11:41:12] Wiki13: let me try a restart
[11:41:36] yuvipanda: the long-standing idea bryan has is to replace toolforge bastions with pods in the k8s cluster
[11:41:57] arturo: aaah, yes :) But then it has to work with gridengine, and ain't nobody got time for that?
[11:43:00] we may eventually get rid of gridengine if we manage to provide the same functionality in k8s
[11:45:25] Wiki13: do you know if this tool is meant to run on the grid or k8s?
[11:46:48] !log tools.meta running `webservice restart`, backend is k8s
[11:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.meta/SAL
[11:47:03] well, considering it's a webtool that queries databases i think it's k8s
[11:47:15] Wiki13: https://meta.toolforge.org/stalktoy/
[11:47:19] grid is long-running jobs afaik
[11:47:40] !log tools.meta running `webservice restart`, backend is k8s (per Wiki13 request on IRC)
[11:47:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.meta/SAL
[11:47:59] !log tools.meta2 running `webservice restart`, backend is k8s (per Wiki13 request on IRC)
[11:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.meta2/SAL
[11:48:10] https://meta2.toolforge.org/crossactivity/ this works now
[11:48:15] cool
[11:48:51] and stalktoy works too https://meta.toolforge.org/stalktoy/
[11:48:55] thanks arturo
[11:49:10] anytime!
[12:01:00] arturo: yeah I hope so :)
[12:03:42] yuvipanda: brooke has been wanting to take a look at k8s Jobs for a while now, but we don't usually find the time :-)
[12:03:54] * arturo relocating
[12:05:17] :D I've been meaning to write a small thing that would basically operate as a 'jsub' equivalent only, built on top of k8s jobs but without it shining through
[12:05:20] and a similar one for crontab
[12:13:11] would be a good way to try out https://github.com/clux/kube-rs maybe, although who am I kidding it'll probably be written in python
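For context on the 'jsub' idea above: a minimal sketch of what such a wrapper could look like, built on the official `kubernetes` Python client. Everything specific here — the namespace, image name, and script name — is hypothetical, not anything Toolforge ships; it only shows the shape of the idea (submit a one-off command as a Kubernetes Job without k8s "shining through"):

```python
#!/usr/bin/env python3
"""Hypothetical 'jsub'-style wrapper: run a one-off command as a
Kubernetes Job without letting k8s concepts shine through."""
import sys
import uuid

from kubernetes import client, config  # pip install kubernetes


def submit(command, namespace="tool-example", image="example.org/runtime:latest"):
    # Load credentials from ~/.kube/config, the same file kubectl uses.
    config.load_kube_config()
    name = "jsub-" + uuid.uuid4().hex[:8]
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            backoff_limit=0,  # like jsub: run once, do not retry on failure
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name=name, image=image, command=list(command))],
                ),
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace, job)
    return name


if __name__ == "__main__":
    # usage: jsub.py ./my-script.sh arg1 arg2
    print(submit(sys.argv[1:]))
```

The crontab analogue mentioned right after would be the same object wrapped in a CronJob with a `schedule` field.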
[12:16:11] mhm guess that meta and meta2 died again, but now without a clear reason on why it did... ugh
[12:36:10] posted a message on pathoschild's (the maintainer's) user talk page, so that he hopefully can fix the issue
[12:36:38] I don't think restarting the tools again will solve the problem
[15:05:06] yuvipanda: If you are bored, you could implement T249787 :)
[15:05:07] T249787: Create Docker image for Toolforge that is purpose built to run pywikibot scripts - https://phabricator.wikimedia.org/T249787
[15:09:37] bd808: fyi, fixed all comments on the Striker phab project patch
[15:10:14] Majavah: cool. Hopefully it won't take me as long to test things again ;)
[15:13:54] !log paws merge pr #50 to fix T258142
[15:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[15:13:57] T258142: pwb outdated in PAWS - https://phabricator.wikimedia.org/T258142
[15:16:53] !help Help, The Wikimedia Toolforge does not work
[15:16:53] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[15:17:09] @kb joaquinito01
[15:17:44] bd808: do I have that power too? :-)
[15:17:47] boring trolls are boring
[15:18:03] arturo: maybe?
[15:18:03] @trusted
[15:18:03] I trust: bd808!.*@wikimedia/BDavis-WMF (admin), .*@wikimedia/andrew-bogott (admin), .*@wikimedia/mviswanathan-wmf (admin), .*@wikimedia/.* (trusted), .*@mediawiki/.* (trusted), .*@wikipedia/.* (trusted),
[15:18:27] arturo: No but it looks like bd808 can give it to you
[15:18:39] cool
[15:20:20] @trustdel .*@wikimedia/mviswanathan-wmf
[15:20:20] User was deleted from access list
[15:55:32] !log tools set the bastion prefix to have explicitly set hiera value of profile::wmcs::nfsclient::nfs_version: '4'
[15:55:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:06:14] @trustadd arturo!*arturo@* trusted
[16:06:14] Successfully added arturo!*arturo@*
[16:07:04] @trustadd bstorm!*bstorm_@* trusted
[16:07:04] Successfully added bstorm!*bstorm_@*
[16:09:34] !log tools rebooting tools-sgegrid-shadow to remount NFS correctly
[16:09:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:13:05] bd808: nice :) isn't that the base PAWS image?
[16:13:09] or does that have too many things?
[16:13:45] yuvipanda: good question actually. The paws base image is hiding somewhere that I never look :)
[16:14:09] My starter WIP is https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/603652
[16:14:15] bd808: https://github.com/toolforge/paws/tree/master/images/singleuser
[16:14:40] but the real thing for that task is the hypothetical pwb-k8s runner script
[16:14:57] the paws image also auto-deploys https://travis-ci.com/github/toolforge/paws/builds/176528606
[16:15:04] * yuvipanda nods
[16:15:50] that singleuser Dockerfile has all the old bad patterns in it yuvipanda :) -- https://github.com/toolforge/paws/blob/master/images/singleuser/Dockerfile#L26-L27
[16:16:24] :D can be fixed!
[16:16:41] let's see how https://mybinder.org/v2/gh/wikimedia/pywikibot/master turns out
[16:16:52] if it works then you can just use the image that produces!
[16:16:55] I'll wait until b.storm's fork lands to mess with the PAWS containers
[16:17:02] * yuvipanda nods
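As an aside on the bastion NFS change logged at 15:55 above: that kind of setting is just a one-key hiera stanza. Assuming the usual Horizon prefix-hiera mechanism (an assumption; the log does not say how it was applied), the YAML would simply be:

```yaml
profile::wmcs::nfsclient::nfs_version: '4'
```

Quoting the '4' keeps it a YAML string rather than an integer, so Puppet passes it through verbatim as the NFS mount version.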
[16:17:20] today I'm fighting with LUA!
[16:17:41] ah, from mediawiki, nginx or mysql-proxy? :D
[16:17:49] nbserve
[16:18:14] T258304 is being really stubborn
[16:18:15] T258304: PAWS public fails to render notebooks with whitespace in their name and returns 500 response - https://phabricator.wikimedia.org/T258304
[16:27:07] bd808: oh wow I forgot about that
[16:27:56] I hate that I used openresty for that :|
[16:28:06] should've been a straightforward wsgi app
[16:28:18] I was just really putting lua into everything for a while there
[16:28:27] yeah, I keep thinking about rewriting it
[16:28:42] it's a neat hack, but it's being annoying right now
[16:28:43] you should
[16:28:50] it'll probably take you less time tbh
[16:29:45] I finally found a theoretical fix, but I apparently don't have the correct version of either nginx or openresty
[16:30:43] fun
[16:31:16] https://github.com/openresty/lua-nginx-module#ngxreqset_uri -- it is supposed to accept a 3rd arg to tell it that whitespace is ok, but that is failing in the container
[16:31:59] there is also https://github.com/jupyter/nbviewer of course, but that's quite heavyweight
[16:32:05] fun.
[16:57:50] 09:28:18 I was just really putting lua into everything for a while there <-- just missing redis ;)
[17:24:58] legoktm: :D ya but I stopped with redis before I stopped with proxies
[17:25:04] I haven't touched redis in almost 4-5y now?
[17:25:40] I'm shocked
[17:25:56] wikibugs still uses the redis pubsub/queue thingy
[17:26:14] it's held up rather well
[17:26:39] yeah I think it's not bad at all if you aren't doing full-on async stuff
[17:26:46] or using a language with proper threading
[17:37:47] I now put everything in sqlite, and am investigating how it would be to dynamically run 100s of NFS servers inside a kubernetes cluster
[17:37:57] so idk if that's improved or gotten worse from my REDIS IN EVERYTHING days
[18:00:00] bd808: if we need to upgrade openresty in that container, that should be simple. The dockerfile has a param in there for the version, the way I set it up... I think :)
[18:00:42] It's a bit of a wait, though. You like watching compiles, right?
[18:00:52] 😁
[18:01:54] bstorm: I'm messing about wildly at this point. The lua-nginx-module doc that shows what we really need says: "This interface was first introduced in the v0.3.1rc14 release."
[18:02:11] I'm still trying to figure out what that maps to in openresty
[18:02:24] Ahhh
[18:02:39] the one we custom-compile now does not have the 3rd arg. And the one from Buster does not either
[18:03:03] *sigh*
[18:03:29] We can most likely just use the latest openresty version
[18:03:31] If that has it
[18:03:36] I wonder if I can pull in the needed version with luarocks actually... let me poke at that
[18:03:54] maybe... depends. Some things need to be compiled in
[18:04:25] Let me know if you want me to run a rebuild and push a tag for you to play with, too. I have most of the setup cached locally, so it'll be quick here.
[18:04:52] bstorm: I broke down and learned to use Docker enough to have a testing loop on my laptop :)
[18:05:00] lol
[18:05:01] Ok :)
[18:05:13] build container, start container, hit it with curl, watch the logs
[18:05:57] I would have no reticence about just using the very latest openresty if it has the version of the module you need.
[18:06:23] There should be nothing in there that requires the version I picked, which is way later than what came with the original container.
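On the "should've been a straightforward wsgi app" point above, a minimal sketch of the rewrite idea (the file layout and the `NOTEBOOK_ROOT` path are invented for illustration, not nbserve's actual behavior). The nice property is that WSGI hands the app an already percent-decoded `PATH_INFO`, so notebook names with whitespace need no `ngx.req.set_uri` gymnastics at all:

```python
"""Minimal WSGI sketch of an nbserve-style renderer: look up a notebook
by URL path and return it as HTML, with no URL rewriting involved."""
import os

import nbformat
from nbconvert import HTMLExporter

NOTEBOOK_ROOT = "/data/notebooks"  # hypothetical; wherever user files live


def application(environ, start_response):
    # WSGI gives us PATH_INFO already percent-decoded, spaces included,
    # so "My Notebook.ipynb" arrives exactly as the filename on disk.
    rel = environ.get("PATH_INFO", "").lstrip("/")
    path = os.path.normpath(os.path.join(NOTEBOOK_ROOT, rel))
    # Refuse path traversal and anything that is not a notebook.
    if not path.startswith(NOTEBOOK_ROOT + os.sep) or not path.endswith(".ipynb"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    try:
        nb = nbformat.read(path, as_version=4)
    except OSError:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body, _resources = HTMLExporter().from_notebook_node(nb)
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```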
[18:08:08] 7
[18:08:28] 8
[18:08:35] 9
[18:08:38] :PO
[18:09:18] The answer we were looking for was "why is 6 afraid of 7?" ;)
[18:10:36] !log tools.forrestbot git pull f7eb691...2d99a8f (now at "Mail the rest of us when forrestbot breaks")
[18:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.forrestbot/SAL
[19:07:23] hello! in my error.log I'm seeing `establishing connection failed: Resource temporarily unavailable socket: unix:/var/run/lighttpd/php.socket.pageviews-0`
[19:07:27] `If this happened on Linux: You have run out of local ports. Check the manual, section Performance how to handle this.`
[19:08:03] I'm assuming that basically means all the lighttpd threads were used up for this one webservice, and some traffic got rejected?
[19:09:01] I saw this just after UptimeRobot reported the tool was down. It came back up just minutes later
[19:10:28] musikanimal: is this tool running on k8s?
[19:10:34] yes
[19:11:26] if lighttpd ran out of threads, then bumping the # of replicas might help: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Kubernetes_webservices
[19:13:15] which option would that correlate to at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Namespace-wide_quotas , if any? `pods` or `replicationcontrollers`? I'm trying to figure out if I need the quota to be increased
[19:13:46] I believe it just adds another pod
[19:16:00] sweet, well I just bumped it to 4. We'll see if that helps the uptime
[19:16:05] thanks for the assistance!
[19:41:21] musikanimal: I think going from 1 (the default) -> 2 is more reasonable at first, unless you actually need 4 replicas
[19:43:56] sure
[20:29:33] bd808: i wanted to follow up on yesterday's discussion of nginx + encryption between SSL termination on Cloud VPS and a particular instance. i implemented the changes that you pointed me towards (https://github.com/geohci/research-api-endpoint-template/blob/master/model/config/model.nginx#L9) and they do effectively force users to use the HTTPS version of the endpoint (which is quite nice). as far as i can tell though, the traffic
[20:29:33] between the server doing SSL termination and my instance is still unencrypted. the more i looked into it, the more it seemed like it would require a good bit more work to fix with minimal benefit, given that i assume the server handling the SSL termination is colocated with my instances, so it would be extra difficult to snoop on that traffic. seem like a reasonable conclusion?
[20:31:19] isaacj: encrypting between the front proxy and your upstream application is literally not possible today. The front proxy does not know how to deal with a TLS upstream.
[20:31:54] So I applaud you for your data integrity concern, but you are ahead of the infrastructure you are running on
[20:31:54] bd808: ahh... well that makes me feel better about it :) i don't think i knew enough yesterday to describe what i wanted well
[20:32:42] haha thanks
[21:51:36] !log bishopfox Deleting empty project (T238222)
[21:51:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bishopfox/SAL
[21:51:39] T238222: Request creation of BishopFox VPS project - https://phabricator.wikimedia.org/T238222
[21:54:34] !log chicotestproject Deleting empty project
[21:54:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Chicotestproject/SAL
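Two footnotes on the exchanges above. First, on the replica bump: `webservice` on the Kubernetes backend manages an ordinary Deployment, so the generic Kubernetes way to inspect and scale it looks like the commands below. The deployment being named after the tool (here `pageviews`) is an assumption — check with `kubectl get` first — and a later `webservice restart` may well reset the count, so the wikitech page linked above remains the authoritative route:

```console
$ kubectl get deployments          # confirm the deployment's actual name
$ kubectl scale deployment pageviews --replicas=2
```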
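Second, on the force-HTTPS nginx change isaacj linked: the linked file is not reproduced here, but the standard pattern behind a TLS-terminating front proxy is to branch on the `X-Forwarded-Proto` header the proxy adds. A hedged sketch, assuming the Cloud VPS front proxy sets that header (the instance's own listener only ever sees plain HTTP, which is exactly the unencrypted hop discussed above):

```nginx
# Inside the server { } block: redirect anyone who reached the front
# proxy over plain HTTP; the proxy reports the original scheme here.
if ($http_x_forwarded_proto != "https") {
    return 301 https://$host$request_uri;
}
```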