[12:51:31] (PS1) Rosalieper: Added a developer manual containing the deployment on toolforge steps and more info. [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/452197 (https://phabricator.wikimedia.org/T190163)
[14:24:44] (CR) D3r1ck01: "Minor fixes." (8 comments) [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/452197 (https://phabricator.wikimedia.org/T190163) (owner: Rosalieper)
[14:28:41] (CR) ArielGlenn: "This is a very good outline. There are some details I'd like to see added:" [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/452197 (https://phabricator.wikimedia.org/T190163) (owner: Rosalieper)
[17:52:53] (CR) Zhuyifei1999: Added a developer manual containing the deployment on toolforge steps and more info. (2 comments) [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/452197 (https://phabricator.wikimedia.org/T190163) (owner: Rosalieper)
[18:13:00] zhuyifei1999_: you can ask, but for now until some sort of deployment works, that's just a placeholder...
[18:13:22] ok
[18:13:26] have you played at all with uh
[18:14:00] the sort of setup stashbot uses, where there's a script to kubectl create, etc?
[18:14:32] not really (but I should, never got to it)
[18:14:55] I'm looking at the podspec stuff and it's unclear to me what names one could give a container in a pod
[18:15:03] mostly because it's unclear what a 'pod' covers in our setup
[18:15:15] I'm docker-experienced but know very little about kube
[18:15:40] but if this isn't anything you know about, we'll muddle through
[18:16:39] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes deployment.yaml, containers: name bot and also the namespace
[18:16:50] is the pod a per tool thing?
[18:16:59] if so we can just shove 'bot' in there and it's fine...
[18:17:20] I think the name is arbitrary.
it's what's shown in kubectl get pods
[18:17:37] well there's the pod name and the container name
[18:17:59] but maybe you just answered my question
[18:18:53] fwiw, https://github.com/wikimedia/operations-software-tools-webservice/blob/master/toollabs/webservice/backends/kubernetesbackend.py#L308
[18:21:34] apergos: I'm pretty sure the namespace has to be the name of the tool. it's sort of like access control where one tool can't mess with the namespace belonging to another tool
[18:22:57] there's three places in that file where something is a question for me
[18:23:14] yeah?
[18:23:55] metadata.name, spec.template.metadata.labels.name, spec.template.spec.containers.name
[18:23:57] those three
[18:24:23] can that containers.name be arbitrarily chosen by me, and it's unique to the pod namespace?
[18:24:38] and then the first two are podnamespace.containername ?
[18:25:02] or what is it? honestly wading through the endless loop of kube docs is a bit... meh
[18:25:14] metadata.name <= likely an arbitrary name for $ kubectl get deployments
[18:26:14] so not related to the other two then
[18:27:47] you know what I am going to assume that they are related, it won't hurt anything to do that
[18:28:01] and if they are related and I assume the opposite we'll be chasing our tails, and time is short
[18:28:24] just as long as there's not a docker container with the name 'bot' floating around someplace....
[18:29:11] as in, dock ps shows it on all backends
[18:29:12] 8docker
[18:29:15] *docker
[18:30:25] also, thank you for weighing in on commits, that's awesome
[18:30:30] I'm looking at `kubectl get -o json deployments` and `kubectl get -o json pods` and it occurred to me that the name displayed in the pods is different to that of deployments
[18:31:56] i.e.
deployment containers.name = webservice, pod metadata.name = video2commons-1684508684-vn9l3
[18:32:21] that's a horrible pod name
[18:32:34] (sorry, that's off topic)
[18:32:54] well, it's what's generated and shown in kubectl get pods
[18:33:11] right, it's what happens if you don't provide a name to lubectl create I guess
[18:33:18] *kubectl !
[18:33:38] I thought deployments creates pods
[18:34:01] it must also create a container that lives in the pod, no?
[18:34:19] so its pods' names should be specified by the data of the deployment
[18:34:20] the image will already be around, but the container generated from it... not so much?
[18:34:23] yes
[18:34:39] yes I mean 'it must also create a container that lives in the pod'
[18:35:02] but if you don't specify the pod name in the spec, then one must get generated from something or other
[18:35:13] right right
[18:35:52] the deployment spec has containers.name
[18:36:10] is that a mandatory field? (this is entirely off topic, I'm just speculating now)
[18:36:26] and for an interactive webservice pod the name does show 'interactive' in kubectl get pods
[18:37:09] that's interesting
[18:37:30] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#writing-a-deployment-spec <= might need this
[18:38:20] For labels, make sure not to overlap with other controllers. huh
[18:38:42] ok well my original idea of using the toolname followed by some arbitrary string, is probably ok
[18:39:19] .spec.selector must match .spec.template.metadata.labels <--- very useful
[18:39:37] In API version apps/v1, .spec.selector and .metadata.labels do not default to .spec.template.metadata.labels if not set. So they must be set explicitly.
<--- very stupid
[18:39:44] * apergos <-- very opinionated
[18:40:02] since each tool has their own namespace I think overlapping probably will only happen within a tool
[18:40:07] lol
[18:41:28] it looks like the sample deployment.yaml doesn't have spec.selector so meh
[18:42:03] well we have some things to try, if the student has the endurance....
[18:42:11] thanks for helping sort this out
[18:42:50] np
[19:24:26] hey r054l13!
[19:24:43] so I've been looking at the docs and chatting with zhuyifei1999_ here and my thoughts are these
[19:24:48] apergos: hey!
[19:25:00] ok
[19:25:01] first, gridengine is going to be deprecated so I don't want to fall back to that unless we have to
[19:25:17] second the webservice thing seems to lock you into the docker image entrypoint...
[19:25:22] how much do you know about docker?
[19:25:37] not much
[19:26:02] do you know about containers, or dockerfiles?
[19:26:02] grid isn't being deprecated any time soon afaict https://phabricator.wikimedia.org/T199271
[19:26:15] attended a docker event before but never really used it in a project
[19:26:28] apergos: yes
[19:26:48] zhuyifei1999_: it's all over the wikitech docs though
[19:28:01] r054l13: ok well the very short story, since you have no time to read the docs on docker and kubernetes and etc is
[19:29:39] a container is like having a vm but you use the same kernel, you use (I'm thinking of linux containers using lxc here) 'namespaces' to get most of the work done of isolation of processes in a special environment
[19:29:41] https://linuxcontainers.org/
[19:29:59] and docker is a project that makes creation, monitoring, configuring etc of these containers very easy
[19:30:28] you initially create an 'image' which contains everything you expect the container to have in it, all the apps, etc
[19:30:29] ok
[19:30:46] and then you can use that image to create a container which you can start, stop, ssh into, etc
[19:30:55] well, there was a time people were searching for a grid engine
replacement and found k8s, but they figured that it's not technically feasible to build generic containers for every tool. I don't really know the current long term plans for it, but if it's going to use debian stretch I'd assume the support will continue for a long time
[19:30:58] and this container in our case is what would be running your bot
[19:31:36] zhuyifei1999_: well that is quite interesting to know, perhaps I shall poke bd808 about it if that's the case
[19:31:51] ok
[19:31:52] because having trusty docker images as the only default would be the pits
[19:32:30] r054l13: the way images are described is by using a file with a list of commands in a simple syntax called a docker file
[19:32:47] alright
[19:32:47] and then these images can be stored and made available in a registry for use
[19:33:00] ok
[19:33:24] so all the docker images used for python, nodejs, etc for toolforge are in a registry and their docker files are in a gerrit repo where we can look at them if we want
[19:33:47] there was a talk about replacing the grid with another system (not sonofgridengine or k8s) in a random (but fairly recent) phab comment, but can't find it now
[19:33:59] typically docker files specify a command to run at the end
[19:34:17] ok
[19:34:21] or else, if there is something managing the containers then it will start them with some command that they run
[19:34:37] the problem is that we don't have access to those values to override them
[19:34:46] that stuff is not exposed to the user
[19:34:48] so
[19:35:08] the next thing we could do is what i think I mentioned a while back, something bd808 did for his 'stashbot'
[19:35:27] which is a process that should just run all the time and it runs on the kubernetes backend
[19:35:34] but it does NOT use webservice, see
[19:36:04] but it DOES use a virtual env (different than yours but you can see that there's a place to stuff that in)
[19:36:13] do you want to take a look at this with me?
[19:37:45] apergos: found that comment. not sure about the current status of it. https://phabricator.wikimedia.org/T182451#3826017
[19:38:07] slurm, never heard of it, huh!
[19:38:18] thanks for the pointer, zhuyifei1999_ !
[19:38:23] np
[19:38:44] r054l13: ? do you want to look at this now, try something else, work on your docs tonight instead, ...?
[19:40:35] one alternative is, as zhuyifei1999_ suggests, to fall back to using gridengine since it's not going to die tomorrow at any rate
[19:41:27] apergos: Hello
[19:41:32] ah
[19:41:42] uh, last message you got from me before being disconnected?
[19:41:55] r054l13:
[19:42:00] So
[19:42:08] (10:35:08 PM) apergos: the next thing we could do is what i think I mentioned a while back, something bd808 did for his 'stashbot'
[19:42:15] (10:35:27 PM) apergos: which is a process that should just run all the time and it runs on the kubernetes backend
[19:42:22] (10:35:34 PM) apergos: but it does NOT use webservice, see
[19:42:29] (10:36:03 PM) apergos: but it DOES use a virtual env (different than yours but you can see that there's a place to stuff that in)
[19:42:37] do you want to look at this now, try something else, work on your docs tonight instead, ...?
[19:42:45] (10:40:35 PM) apergos: one alternative is, as zhuyifei1999_ suggests, to fall back to using gridengine since it's not going to die tomorrow at any rate
[19:42:49] now you are caught up
[19:43:18] ok am reading
[19:43:45] r054l13, apergos, I've left some feedback on the document. I think if all that is done, then the doc will be ready
[19:43:45] * zhuyifei1999_ should add, if grid is gonna die we'll make migrating plans, because too many tools / bots depend on it
[19:44:01] d3r1ck: thanks
[19:44:12] sure
[19:44:45] apergos: but it does NOT use webservice, see (what should I see here)
[19:45:02] 'you see'
[19:45:05] it's an expression
[19:45:11] oh!
[19:45:24] heh.
the ambiguities of english
[19:46:27] apergos: I would like to work on this now
[19:46:32] ok
[19:46:41] so maybe we should compare the two approaches
[19:46:48] yeah
[19:46:58] there's what would need to happen to use gridengine
[19:47:04] I think zhuyifei1999_ will know best about that
[19:47:12] and there's what you would need to do to use kubernetes
[19:47:18] I think I can give you the outline for that
[19:47:36] * r054l13 taking notes
[19:48:52] grid engine is jstart a command: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid, optionally increasing the virtual memory limit and/or use bigbrother to keep a job alive and/or other hacks invented here
[19:49:27] ok
[19:49:28] you don't get as much isolation as in k8s
[19:50:01] so let's say a virtual env is installed on the bastion (well it's on the nfs mounted home dir I suppose, but whatever)
[19:50:17] what would one need to do, to make sure that's activated and the nodejs command of one's desire run?
[19:50:31] in k8s, the k8s will manage the containers so if your command crashes (but not too frequently) it'll restart automatically
[19:52:09] apergos: activate it
[19:52:39] zhuyifei1999_: ok
[19:52:41] so you'll want a script to do that: to be in the tool's home directory, activate the virt env, start the nodejs command
[19:53:13] yeah
[19:53:50] and I guess that script could be submitted to the gridengine as a job? zhuyifei1999_ does that sound right? will it take bash scripts (/bin/bash string of args here) ?
[19:54:12] yes
[19:54:15] ah nm the docs say
[19:54:28] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#My_shell_script_job_fails_with_%22Exec_format_error%22
[19:54:31] or it could have a shebang and chmod +x
[19:54:35] you have to have the first line of the script...
yep
[19:54:39] so that's the way to do that
[19:54:48] so r054l13 this is one approach
[19:55:26] and perhaps zhuyifei1999_ would be able to show you the way to some degree if you get stuck (it's 11 pm here so getting late for me)
[19:55:40] ok noted
[19:56:07] the other is to use kubectl create to set up your own 'deployment' which means you can tell kubernetes which command to run
[19:56:19] and that command again can be a script which will activate the virtual env
[19:56:21] but:
[19:56:22] cool
[19:56:26] it's 4am here and I should really sleep. I've been having sleep troubles for a very long time and find sleeping super difficult *facepalm*
[19:56:29] the script must do a few other things too, that's the problem
[19:56:36] ah zhuyifei1999_ you are in a worse tz than me
[19:56:38] ok
[19:56:40] please get some rest
[19:56:49] thanks :)
[19:57:21] zhuyifei1999_: good night or morning :)
[19:57:28] :)
[19:57:34] maybe r054l13 this will mean that you look carefully at the docs for whichever method you choose and if you get stuck and there's no one else in the channel that can help, you try to get some sleep yourself!
[19:58:06] so a sample script for kubernetes should be able to start, run, stop the service, at least.
most of this is pretty generic:
[19:58:20] ok
[20:00:08] https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/stashbot/+/master/bin/stashbot.sh this is the one bryan used for stashbot
[20:00:21] ok
[20:00:32] you can see it sets a few convenient variables for use in the script,
[20:00:47] activates the virt env needed,
[20:01:10] defines one convenience function
[20:01:30] yeah
[20:01:58] and then the things that are probably useful to you are start, run, stop, attach, and maybe status
[20:02:30] most of those are pretty generic, for 'run' you will of course put your nodejs command in there instead of stashbot
[20:02:36] but the tricky bit is this
[20:02:47] what is attach for
[20:02:49] for start, it does this:
[20:02:52] kubectl create -f ${TOOL_DIR}/etc/deployment.yaml
[20:03:10] attach lets you get a shell on the pod
[20:03:17] so you can get on there and look around
[20:03:33] ok
[20:03:41] so this kubectl create, it needs a yaml file with settings describing the container etc
[20:03:46] that's where it gets tricky
[20:04:10] here's the one bryan used for stashbot: it's called deployment.yaml (obviously you can call it anything you want though)
[20:04:37] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Example_deployment.yaml
[20:04:59] alright
[20:05:11] you want to put that somewhere in your tool's directory and pass that path to the kubectl create command
[20:05:38] so here, the image name, you have to know what to put in there
[20:06:03] you would need to find the nodejs docker image base name and put that
[20:06:46] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/toollabs-images/+/master
[20:07:23] everywhere it has /data/project/stashbot you put your tool directory of course
[20:08:00] you'd change the command to the full path to the script you make
[20:08:07] ok
[20:08:38] there's a few places you see stashbot.bot and I think you could probably just change those to name-of-your-project.bot
[20:08:48] but it is
a bit more complicated than the gridengine thing
[20:09:09] yeah
[20:09:38] https://wikitech.wikimedia.org/wiki/Tool:Stashbot#Maintenance this is how he starts/stops the bot etc
[20:09:42] so there's that
[20:09:59] ok
[20:10:01] so here are these two approaches... which do you want to try?
[20:10:57] I will look at the grid docs
[20:11:00] ok
[20:11:37] then if I understand it better I will try that first.
[20:12:02] otherwise this
[20:12:44] but will most probably start with the gridengine method
[20:13:09] ok
[20:13:32] I would start by writing a script that puts you in the tool dir, activates the virt env, and starts nodejs
[20:13:37] see if that runs ok just from the bastion
[20:14:00] once you have that, then maybe try submitting that as a grid job (jsub?) and see what happens. ah just to make sure,
[20:14:18] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Running_a_job_only_once
[20:14:27] so you don't wind up with 20 of them running after retries!
[20:14:44] ok
[20:14:49] oh, it's jstart anyways, my bad
[20:15:14] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Submitting_continuous_jobs_(such_as_bots)_with_'jstart'
[20:15:19] this takes care of all that for you
[20:15:25] so script, make executable,
[20:15:34] put the bin/bash stuff at the first line,
[20:15:39] try jstart and see what happens!
[20:16:24] ok
[20:16:34] I think I got it
[20:17:05] ok!
[20:17:17] I won't really be around because it's late, but I will check the backread tomorrow
[20:17:30] and maybe others are in here that can help if you happen to get stuck
[20:17:32] good luck!!
[20:18:00] thank you
[20:18:05] good night
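Note on the log: the wrapper script described in the conversation (change to the tool directory, activate the virtual env, start the nodejs command, plus start/stop/status for the kubernetes route) might look roughly like the sketch below. It is only an illustration, not the stashbot script and not anything the participants wrote. The tool path, the virtual env location, the `node bot.js` command, the deployment name and the `DRY_RUN` switch are all made-up placeholders. Only `kubectl create -f ${TOOL_DIR}/etc/deployment.yaml` appears in the log; `kubectl delete deployment` and `kubectl get pods` are ordinary kubectl commands assumed here for stop and status.

```shell
#!/bin/bash
# Sketch of a bot wrapper script. Every path and name here is a placeholder
# (an assumption for illustration), not something taken from the log.
TOOL_DIR="${TOOL_DIR:-/data/project/my-tool}"   # hypothetical tool home
VENV_DIR="${VENV_DIR:-$TOOL_DIR/venv}"          # hypothetical virtual env
BOT_CMD="${BOT_CMD:-node bot.js}"               # hypothetical nodejs command
DEPLOYMENT="${DEPLOYMENT:-my-tool.bot}"         # hypothetical deployment name
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print steps, do not run them

# Print the step in dry-run mode, otherwise execute it in this shell.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

main() {
    case "$1" in
        run)
            # What the grid job or the container would actually execute.
            run_step cd "$TOOL_DIR" || return 1
            run_step . "$VENV_DIR/bin/activate" || return 1
            # BOT_CMD is left unquoted on purpose so it splits into words.
            run_step $BOT_CMD
            ;;
        start)
            run_step kubectl create -f "$TOOL_DIR/etc/deployment.yaml"
            ;;
        stop)
            run_step kubectl delete deployment "$DEPLOYMENT"
            ;;
        status)
            run_step kubectl get pods
            ;;
        *)
            echo "usage: bot.sh {run|start|stop|status}"
            ;;
    esac
}

main "${1:-help}"
```

With the default `DRY_RUN=1` the script only prints what it would do, which makes it safe to try from the bastion first; setting `DRY_RUN=0` makes it execute the steps. For the grid route, the script would be made executable with `chmod +x` and its `run` mode submitted with `jstart` as the grid docs linked above describe; for the kubernetes route, the deployment file's command would point at the script's `run` mode.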