[05:38:36] wm-bridgebot_: are you functional?
[09:30:13] !log tools detected puppet issue in all VMs: T226480
[09:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:30:16] T226480: toolforge: puppet issue probably related to puppet-enc - https://phabricator.wikimedia.org/T226480
[10:35:18] !log toolsbeta create 2 VMs toolsbeta-arturo-k8s-worker-[1,2] for T215531
[10:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:35:28] T215531: Deploy upgraded Kubernetes to toolsbeta - https://phabricator.wikimedia.org/T215531
[10:35:53] !log toolsbeta create puppet prefix `toolsbeta-arturo-k8s-worker` for T215531
[10:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:10:49] arturo: I installed a package in a python env and it worked fine from the flask application until today, which now reports `ImportError: No module named 'yaml'`. Just to note that the import statement `import yaml` works fine from the python interpreter in the same environment.
[13:11:56] some missing dependency Eugene233?
[13:12:58] arturo: I installed the dependency earlier on to run the app. Surprisingly it does not work at the moment
[13:30:08] Eugene233: are you using a virtualenv? maybe the python interpreter is different between the flask app and your default shell.
[13:31:01] jeh: I always resolve that with `alias python=python3`
[13:33:56] do you happen to have the code online?
[13:35:35] I could take a quick look. typically import issues are related to python paths or interpreters, and sometimes files named like the module you're trying to import
[13:40:34] it also might help to verify where the yaml module is installed. Inside the python interactive shell, after the import you can run `help(yaml)`. At the bottom of the page it should tell you where the module is installed.
[14:23:48] There's currently a breaking issue with the latest python3.4 3.4.2-1+deb8u3 package on Jessie. More info and a workaround at: https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1685477.html
[14:42:09] yikes
[14:42:14] I am trying an experiment of mirroring the conversation in this channel to a Telegram channel (). The Telegram side is read-only.
[14:42:20] jeh: is the new version out? It seems to be suggested in there
[14:42:30] If so, we can update all the VMs that will respond with it
[14:43:16] Ah, yeah, it seems to be :)
[14:49:50] yep, the new python3.4 version is out: 3.4.2-1+deb8u4
[14:51:06] jeh:
[14:51:20] jeh: Now I get *** uWSGI is running in multiple interpreter mode ***
[14:53:40] AFAIK that message is normal
[14:53:55] should be followed by “spawned uWSGI master process (pid: 1)” soon
[14:55:45] Lucas_WMDE: Yes
[14:56:16] I usually sort that out using `alias python=python3` but it no longer works
[14:56:25] !log admin Updated python 3.4 on the labs-puppetmaster server
[14:56:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:00:17] Eugene233: the alias command you're using still relies on where python3 is in your path. Not sure if you're using a virtualenv or the system path `/usr/bin/python3`, but it may be that the shell and the flask app are using something different.
[15:00:52] jeh: I am using a virtualenv
[15:06:03] I'd double check the uwsgi configuration and make sure it's using the correct virtualenv. That is most likely causing your import error.
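For anyone reading along later, here is a minimal sketch of the interpreter and module checks suggested above. The `~/www/python/venv` path is an assumption (the common Toolforge python webservice layout), not the tool's confirmed setup.

```bash
# Activate the suspected virtualenv first; this path is an assumed
# Toolforge-style layout and may differ for the actual tool.
source ~/www/python/venv/bin/activate

# Which interpreters does this shell actually resolve?
which python python3

# Which file does `import yaml` load, and from which interpreter?
python3 -c 'import sys, yaml; print(sys.executable); print(yaml.__file__)'
```

If `yaml.__file__` resolves outside the virtualenv that uwsgi is configured with, that mismatch would explain an import that works in the shell but fails in the flask app.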
[15:08:10] Eugene233: It sounds like you need to rebuild your virtualenv. Is this in Kubernetes/Toolforge?
[15:08:34] bstorm_: Yes it is in Kubernetes
[15:10:24] Then I might try `webservice shell` and maybe rebuild the virtualenv. However, I am interested to hear how things go, because Kubernetes is running on jessie (where a broken python package is floating around). bd808: are the containers still jessie-based or was all that pushed up to stretch?
[15:13:25] bstorm_: there is a mix of jessie and stretch containers. For python, the python3.5 type is running on Stretch.
[15:14:23] But the containers themselves should be stable right now. We don't have any auto-patching system for them yet, so the containers do not have the broken python3.4 package in them
[15:14:39] Yeah
[15:15:01] Or shouldn't anyway
[15:22:32] * Lucas_WMDE is looking forward to python3.7 in buster
[15:25:44] Me too!
[15:47:16] After rebuilding the container, I still get `504 Gateway Time-out`
[15:48:17] Errors from the webservice?
[15:48:24] Or is the webservice looking ok?
[15:49:11] Hrm, I probably should ask if you restarted the webservice after you rebuilt the virtualenv in the container?
[15:50:41] bstorm_: yes i did
[15:51:49] !log deployment-prep restart php7.2-fpm for wikidiff2 upgrade (T223391)
[15:51:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[15:51:52] T223391: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391
[16:00:49] Eugene233: are you still getting the 504 response? I've seen the kubernetes backend take up to 4-5 minutes to be fully synced up with the ingress service (nginx reverse proxy for https://tools.wmflabs.org/) on webservice start/restart
[16:03:02] andrewbogott, arturo: meeting?
[16:03:07] omw
[16:05:40] bd808: yes
[16:05:43] !log admin updated python3.4 to update4 wherever it was installed on Jessie VMs to prevent issues with broken update3.
[16:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[16:54:59] Eugene233: what tool is this again? I missed that earlier.
[16:55:51] bstorm_: isa
[16:55:58] thx
[17:00:23] `kubectl logs ` shows:
[17:00:29] https://www.irccloud.com/pastebin/dgbm6qVq/
[17:01:15] I've seen that whatnot before...
[17:01:26] bd808: what's that look like to you?
[17:02:00] When you are around again :)
[17:02:20] I seem to recall it meaning the wrong container is used or something weird, but I'm not sure
[17:03:08] It's using python "docker-registry.tools.wmflabs.org/toollabs-python35-web:latest"
[17:03:16] bstorm_: I don't know what that is supposed to mean but I have been using that container for some time now
[17:03:27] I'm just checking things
[17:03:57] Looks like it's using the right image. I'm trying to sort out why it's missing an expected library
[17:07:52] I see in the history that the venv was rebuilt in the shell on the bastion. That will potentially conflict with what is done in the container (though the paths should match in most cases).
[17:08:11] I'd only interact with pip in that virtualenv on the webservice shell
[17:08:56] I'm not sure that's what is up now, since that is older in the history than webservice shell, but it could still be related.
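A hedged sketch of the rebuild-inside-the-container approach recommended above, so all pip activity happens in the environment where uwsgi will actually run. The tool name, venv path, and requirements file location are assumptions based on the usual Toolforge layout, not the tool's confirmed setup.

```bash
# Switch to the tool account on the bastion (requires membership of the tool)
become isa

# Open a shell inside the Kubernetes webservice container
webservice shell

# Inside the container: recreate the virtualenv and reinstall dependencies.
# Paths below are an assumed Toolforge-style layout; adjust to the real tool.
rm -rf ~/www/python/venv
python3 -m venv ~/www/python/venv
~/www/python/venv/bin/pip install --upgrade pip
~/www/python/venv/bin/pip install -r ~/www/python/src/requirements.txt
exit

# Back on the bastion: restart so uwsgi picks up the rebuilt virtualenv
webservice restart
```

Doing this only from `webservice shell` avoids the bastion/container mismatch mentioned at 17:08.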
[17:15:13] !log T225823 create asyncwiki vps project
[17:15:14] jeh: Unknown project "T225823"
[17:15:14] T225823: Request creation of asyncwiki VPS project - https://phabricator.wikimedia.org/T225823
[17:18:30] !log create asyncwiki vps project: T225823
[17:18:30] jeh: Unknown project "create"
[17:18:42] jeh: it's `!log <project> msg`
[17:19:34] thanks, was looking at old examples... not sure what project this falls under
[17:20:02] your newly created project
[17:20:15] i.e. `!log asyncwiki msg`
[17:20:23] (in case that's the name of the openstack tenant)
[17:20:24] !log asyncwiki create VPS project T225823
[17:20:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Asyncwiki/SAL
[17:20:27] T225823: Request creation of asyncwiki VPS project - https://phabricator.wikimedia.org/T225823
[17:22:18] there are also some “created project” messages under the Admin project https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:37:20] Can someone help me identify why I still have a 504 gateway error while accessing /isa?
[17:55:09] It's because it is failing here: open("/usr/lib/uwsgi/plugins/python_plugin.so"): No such file or directory [core/utils.c line 3689]
[17:55:28] Which seems like some weird problem between your virtualenv and the container's venv
[17:55:49] I've seen it before specifically, but I don't recall precisely which issue it is.
[17:57:10] Have you *only* installed things in the virtualenv inside the webservice shell, or have you done it on the command line on the bastion? If you did the latter (which I see you have at least some in the history), it may conflict. That said, that error might mean something else. So I'm hoping someone else remembers what can cause that.
[17:57:40] But the app is absolutely not running if the container says that's killing it
[18:00:17] Some folks who might be able to help aren't online right now, but we should make sure they see that output: `open("/usr/lib/uwsgi/plugins/python_plugin.so"): No such file or directory [core/utils.c line 3689]`
[18:03:40] * bstorm_ digs around some more
[18:04:44] I've got a few minutes to help look too bstorm_
[18:05:01] Great, thanks. I could swear I've seen that error
[18:05:31] Seems odd, since I'm fairly sure we use uwsgi packages installed in the container anyway, right?
[18:06:30] yeah, the uwsgi service should all be running from deb managed stuff inside the Dockerfile
[18:08:06] bstorm_: I'm seeing that error now. It's a red herring
[18:08:12] Ok! :)
[18:08:44] it happens because the uwsgi config we generate is the same for both python2 and python3 but the containers only have one or the other runtime
[18:08:49] So the service might actually be running
[18:09:03] Which the uwsgi log suggests at least
[18:09:07] it's harmless for the py2 module to not load as long as the py3 one does
[18:09:14] yeah, I get it now
[18:10:17] https://tools.wmflabs.org/isa/ is trying to load for me... so it looks like the webservice is at least registered with the gateway
[18:10:43] ah. and there is the 504 timeout
[18:11:05] not seeing anything in the $HOME/uwsgi.log yet to help me understand the timeout
[18:13:16] bd808: That's what I am talking about. It's kind of weird
[18:14:26] Eugene233: is there a route in the app that should be basically a static response? Something that would let us decide if the uwsgi service is running at all?
[18:15:18] * bstorm_ takes notes to not trust that error from the container next time
[18:15:57] bstorm_: it confuses people all the time. I think I should fix webservice to not cause it to happen :)
[18:16:38] `/api/post-contribution` I think should return at least something.
[18:17:22] Eugene233: I got the 504 response even for https://tools.wmflabs.org/isa/this_should_404 which I think should have been a pretty fast static response
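A compact way to repeat the static-route test above: ask for a route that a running uwsgi app should reject almost instantly, and print the status and timing. The URL is the one used in the conversation; the `-w` format just makes the proxy timeout easier to spot.

```bash
# A route that should 404 quickly if uwsgi is answering; a 504 after the
# full timeout suggests the request never reaches the backend at all.
curl -sv -o /dev/null \
     -w 'status=%{http_code} total=%{time_total}s\n' \
     https://tools.wmflabs.org/isa/this_should_404
```

A fast 404 would mean the app is answering and the slowness is inside it; the 504 seen here points at the routing layer instead.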
[18:19:22] argh. the stretch python3.5 image is missing procps :/
[18:19:36] so pretty hard to debug inside the running container...
[18:20:43] * bd808 pokes around in /proc like it was 1972
[18:22:44] ok. curl is working inside the pod. I think our problem is outside, in the kubernetes network layer maybe. Requests not getting from the ingress proxy to the container and back
[18:23:35] I did `kubectl exec -it isa-3401979998-qcmid -- /bin/bash` to get a shell inside the running pod and then `curl -v localhost:8000/isa/` and got a response right away
[18:26:44] and `curl -v 192.168.210.2:8000/isa/` works from tools-worker-1027 where the pod is running, but not from the bastion I'm on
[18:27:04] I got the pod's ip address from `kubectl get po -o yaml`
[18:27:18] so still looking like a network problem of some sort
[18:27:42] I need to step away and get food.
[18:28:40] I remember having had a problem with `import yaml` which worked previously.
[18:29:04] bd808: Take your time
[18:30:44] Eugene233: I think the next thing to try is stopping and restarting your webservice until it lands on a host other than tools-worker-1027 on the backend, to see if that just makes things work. tools-worker-1027 might be having some kind of issues (not sure what)
[18:39:40] That's a good thought.
[18:40:26] bd808: I will say that you cannot connect to the pod IP from the bastion anymore.
[18:40:35] Lemme test that from the k8s master
[18:42:09] It times out from a node that has a route into flannel
[18:42:19] Wait, no it doesn't
[18:42:45] tools-worker-1016:~$ curl 192.168.191.24:8000/isa/
[18:42:47] That works
[18:43:03] So the issue is not the k8s network necessarily, but possibly the proxy
[18:43:05] Checking that
[18:43:26] It did take a little while
[18:43:41] So it could be slow because of an overloaded node...
[18:44:06] checking the node in case
[18:44:46] that node isn't having any issue with load at least
[18:45:36] Found it!
[18:45:40] kube2proxy is dead
[18:45:51] It was affected by our little python package problem this morning
[18:47:07] fixing
[18:49:09] https://tools.wmflabs.org/isa/
[18:49:12] Better
[18:49:21] Eugene233: ^
[18:49:25] :)
[18:51:25] I feel relieved at least to have Isa back online
[18:52:10] Thanks bstorm_ bd808, * for the help \o/
[18:53:03] Thank you! You managed to expose a fundamental problem that was going on in Toolforge that hadn't been fixed by my efforts earlier
[18:53:17] I'm glad to have found it
[18:58:23] kube2proxy would certainly mess up launching new Kubernetes webservices :)
[18:59:41] for those reading along, kube2proxy is the bespoke python service we run that watches for new things in the Kubernetes cluster and registers them with the nginx ingress proxy that serves https://tools.wmflabs.org/
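To close out, a condensed sketch of the debugging path that isolated the problem, checking the pod first, then the worker node, then the proxy layer. The pod name, IP, and worker host are taken from this incident purely as placeholders and will differ for any other tool; the ssh step assumes admin access to the worker.

```bash
# 1. Find the pod, the node it runs on, and its cluster IP
kubectl get pods -o wide

# 2. Does the app answer from inside the pod itself?
kubectl exec -it isa-3401979998-qcmid -- curl -v localhost:8000/isa/

# 3. Does it answer from the worker node hosting the pod? (admin access needed)
ssh tools-worker-1027 curl -v 192.168.210.2:8000/isa/

# 4. If both answer but the public URL still times out, suspect the ingress
#    layer; in this incident that was the kube2proxy service feeding the
#    nginx proxy in front of https://tools.wmflabs.org/
```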