Scholia

[06:27:19] !log admin running nova-manage cell_v2 map_cell0 on eqiad1
[06:27:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:28:29] !log admin running nova-manage db sync on eqiad1
[06:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[06:40:57] !log tools.integraality Deploy latest from Git master: 770cc93 (T237182)
[06:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.integraality/SAL
[10:34:16] !log toolsbeta manually scale nginx-ingress deployment to 5 replicas (T239405)
[10:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:34:19] T239405: toolforge: new k8s: evaluate ingress controller reload behaviour - https://phabricator.wikimedia.org/T239405
[10:38:45] !log toolsbeta create wildcard DNS record for `*.toolsbeta.wmflabs.org` for use by the new k8s cluster
[10:38:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:17:35] is there a way to get a working (as in, not link-local) ipv6 address on a cloud instance?
[18:18:13] (I mean an easy/standard way, like some hiera flag or instance setting)
[18:19:58] I can do it manually, just curious!
[19:56:20] Hello, has something happened lately to the python installation? Suddenly my web app can't call pywikibot because of a missing "enum" module
[19:56:53] I have tried to install it in my virtual environment, but even pip fails.
[19:57:10] Please help, i am quite the newbie at this.
[20:01:22] Joutbis, hey, looks like that was new in python 3.4
[20:01:36] is it possible you're running the default system python which would be python 2.7?
[20:02:23] nice timing freenode
[20:04:14] yes
[20:04:33] @krenair yes, I am running python 2.7
[20:04:53] But it worked last week
[20:05:05] as a matter of fact, it runs from the command line
[20:05:31] it has stopped working from the virtual environment
[20:05:47] Joutbis, is it possible you were in a venv with the enum34 pip package?
[20:06:06] It's not in the requirements.txt file.
[20:06:15] and I haven't changed anything for months
[20:06:42] I did have to make a workaround, because the web app is running python 3 but called an external python 2.7 script
[20:06:59] but that was long ago
[20:07:08] february, or so
[20:11:02] Joutbis: https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/538244/
[20:11:20] enum34 is a required dependency now
[20:11:42] the support for Python 2 will drop very soon and I'd suggest migrate to Python 3
[20:13:29] yes, I know
[20:13:48] But this means everything will stop working?
[20:15:56] And how do I add it as a requirement? I have tried the procedure the help pages, and got the following
[20:15:58] ImportError: No module named enum
[20:16:06] oops, sorry, not that
[20:16:17] (venv) tools.jorobot@interactive:~$ cat >> www/python/src/requirements.txt < enum34> EOF (venv) tools.jorobot@interactive:~$ pip install -r www/python/src/requirements.txt Traceback (most recent call last): File "/data/project/jorobot/www/python/venv/bin/pip", line 7, in from pip._internal import main ImportError: No module named
[20:16:18] 'pip'
[20:27:02] no module named 'pip'? that sounds really broken
[20:27:46] 14:13:49 But this means everything will stop working? <= basically pywikibot can break at any time
[20:30:03] perhaps I tried the wrong backend?
[20:31:51] If I do webservice status
[20:32:02] I get "Your webservice of type python is running"
[20:45:15] mind if I `become jorobot` to see what id going on?
[20:45:58] *is
[20:46:52] sure, no problem
[20:48:04] your venv is www/python/venv right?
[20:49:24] this venv has python 3.4
[20:49:36] so in theory, enum should be no problem
[20:49:59] installing it also worked for me
[20:50:09] https://www.irccloud.com/pastebin/eA7JlZCB/
[20:51:18] oh wait, you said it is calling py2 script
[20:51:43] I think it's venv, yes
[20:51:47] can you make that script Python 3?
[20:51:52] I don't remember creating the other, though
[20:52:08] I plan to do it sooner than later, yes
[20:52:35] I had some problems when I first wrote the web app.
[20:52:47] so one way is to create a venv for python2 that has enum34 in it
[20:52:52] ok
[20:53:08] but then the webapp is python3
[20:54:15] can I just install pip in the venv?
[20:54:21] yeah, two venvs, one python2 , one python3. but in the long term, converting the whole thing to python 3 will be more worth your time than having two venvs
[20:54:36] every venv should have pip. which doesn't?
[20:54:57] well, it used to have it.
[20:55:01] Which webservice backend type were you using to open the shell?
[20:55:05] I installed pywikibot, flask, the whole thing
[20:55:25] webservice --backend=kubernetes python3.5 shell
[20:55:47] perhaps I should try python 3.4?
[20:56:16] oh. I just did `webservice shell`
[20:57:38] your service.manifest is `python` not `python3.5`
[20:57:54] I guess if you switch from 3.4 to 3.5 you need to rebuild your venv
[20:58:37] Yeah, the venv includes everything but the actual version of python, so if you update, things break
[21:00:00] meaning "if someone else updates", right?
[21:00:08] Because I did nothing of the sort
[21:00:09] (makes me fear that in the future when I run python 3 bots (I am currently migrating), I have to rebuild my venv every two years... and I have a ton of venvs)
[21:00:25] you did `webservice --backend=kubernetes python3.5 shell` instead of `webservice --backend=kubernetes python shell`
[21:03:41] Oh, that!
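The mismatch diagnosed in the 20:57 to 21:00 exchange (a venv built under one Python minor version, then entered from a container that ships another) can be detected before running pip. A minimal sketch, assuming the venv was created with `python3 -m venv`, which writes a `pyvenv.cfg` file containing a `version = X.Y.Z` line. The helper names and the sample file content are hypothetical and are not part of Toolforge's tooling.

```python
import sys


def venv_version(cfg_text):
    """Return (major, minor) parsed from pyvenv.cfg text, or None if absent."""
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "version":
            parts = value.strip().split(".")
            return int(parts[0]), int(parts[1])
    return None


def venv_matches(cfg_text, running=None):
    """True when the venv was built for the same major.minor as `running`.

    `running` defaults to the interpreter executing this code.
    """
    if running is None:
        running = sys.version_info[:2]
    built_for = venv_version(cfg_text)
    return built_for is not None and built_for == tuple(running)[:2]


# Illustrative pyvenv.cfg content (assumed, not copied from the tool).
SAMPLE_CFG = (
    "home = /usr/bin\n"
    "include-system-site-packages = false\n"
    "version = 3.4.2\n"
)
```

With the sample above, `venv_matches(SAMPLE_CFG, (3, 5))` is false, which corresponds to opening a `python3.5` shell against a venv built for 3.4.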
[21:04:01] @toniher Hola, toni
[21:04:17] I was following https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Flask_OAuth_tool
[21:04:26] I guess it's been updated since last time
[21:06:37] zhuyifei1999_ yes, now pip works!
[21:06:57] and you installed enum34, so it should be good
[21:07:13] Thanks a lot
[21:14:55] python 3.4 doesn't need enum34
[21:15:00] python 2 needs that
[21:15:16] so my installing enum34 is practically a no-op
[21:20:54] Ops, I found my notes
[21:21:07] the venv environment is for the webservice
[21:22:15] the venv-kubernetes was the environment for the python2 script
[21:22:58] and now the webservice is giving me a 504 Gateway timeout. So I guess a rollback is in order?
[21:24:09] I should install the enum34 module in the venv-kubernetes environment.
[21:30:02] o.O
[21:34:27] I don't get it. webservice status says it's running, but I get a 504. I did a webservice restart, but still...
[21:51:48] Is there something wrong with Toolforge? I cannot restart and hangs with "spawned uWSGI worker 4 (pid: 20, cores: 1)". I note https://phabricator.wikimedia.org/T239569
[21:52:40] hi fnielsen
[21:53:05] I'm not aware of anything but that doesn't mean nothing is wrong
[21:53:22] Hi Krenair
[21:53:31] https://github.com/fnielsen/scholia/issues/961
[21:53:34] I think that cloudstore box handles maps and scratch mounts
[21:53:55] I doubt it's relevant here
[21:54:07] I now have two webapps that I cannot restart :/
[21:54:38] okay so
[21:54:52] you have an app that looks like it should be serving traffic
[21:55:05] yet you get 504s when trying to make requests that should be routed to it?
[21:56:08] me too. Although I was fiddling with requirements.txt and restarting the app
[21:56:10] Yes. They are both Flask apps.
[21:56:16] the bastion host was kinda slow too
[21:57:21] people sometimes abuse that bastion host and NFS (in terms of resource consumption), though it doesn't look massively bad at the moment
[21:57:41] Joutbis, your one was crashing though right?
[21:58:05] https://tools.wmflabs.org/scholia/ and https://tools.wmflabs.org/ordia are both nonresponsive.
[21:58:36] fnielsen, okay this looks bad
[21:58:44] Ordia should be reasonable lightweight to start up quickly.
[21:58:47] uh let me paste this
[21:59:37] fnielsen, https://phabricator.wikimedia.org/P9797
[22:01:37] fnielsen, where are you seeing the logs indicating spawning of uWSGI worker processes?
[22:01:59] With tail -f ~/uwsgi.log
[22:02:54] Am I using a wrong version of Python...
[22:03:54] Heh, that's a 5.8G log file sat on NFS :)
[22:04:29] the mtime of that file does line up with the k8s pod age
[22:04:40] wonder why it doesn't get the same error we see in kubectl logs
[22:07:15] What Python version am I suppose to run on?
[22:07:30] $ python --version
[22:07:31] Python 2.7.9
[22:07:31] tools.ordia@interactive:~$ python3 --version
[22:07:31] Python 3.4.2
[22:08:19] I only see python34_plugin.so and python3_plugin.so
[22:08:43] by the looks of things the docker image your pod runs in has python 2.7.9 and 3.4.2 too
[22:09:10] are you trying to run python 2 or python 3?
[22:09:51] I am doing webservice --backend=kubernetes python restart
[22:10:36] what version of python is the application written to run under?
[22:11:31] I can run under both Python2 and Python3
[22:12:20] fun
[22:13:17] might need someone who knows toolforge's own tooling better
[22:13:29] do you configure uwsgi etc. or is that done for you?
[22:13:41] That is done for me
[22:14:09] hm
[22:14:33] I wonder if everyone else trying to run webservice python stuff gets this error in the background but it doesn't matter after all?
[22:16:27] I have another tool where I defined a uwsgi.ini
[22:17:55] as for me, the web part was running. My problems were with subsequent calls
[22:18:19] So what we did is add a dependency to requirements.txt and restart the webservice. No changes to the app
[22:18:29] And then I started getting 504's
[22:18:48] my uwsgi is totally standard. I wouldn't know how to mess with it
[22:19:13] and the log shows the workers doing just fine
[22:19:19] the request gets to the tools proxy and then it does the equivalent of this
[22:19:22] root@tools-proxy-05:~# redis-cli hgetall prefix:scholia
[22:19:22] 1) ".*"
[22:19:22] 2) "http://192.168.42.10:8000"
[22:19:25] which would be fine
[22:19:27] *** uWSGI is running in multiple interpreter mode *** spawned uWSGI master process (pid: 1) spawned uWSGI worker 1 (pid: 8, cores: 1) spawned uWSGI worker 2 (pid: 9, cores: 1) spawned uWSGI worker 3 (pid: 10, cores: 1) spawned uWSGI worker 4 (pid: 11, cores: 1)
[22:19:47] and that IP is the IP of your Service resource in k8s
[22:20:58] but it seems the proxy is not able to open a TCP connection to that IP
[22:21:43] the port matches the uwsgi args
[22:22:14] from inside the pod I can fire requests at localhost:8000 and get responses including Scholia
[22:22:31] ... it does feel like maybe something is up with the ingress into k8s somehow
[22:24:03] that's fnielsen's, right?
[22:24:25] Scholia is fnielsen's (mine)
[22:24:31] my service is jorobot if you want to try
[22:25:25] Now I get "$ become ordia
[22:25:25] -bash: fork: retry: Resource temporarily unavailable
[22:25:25] -bash: fork: retry: Resource temporarily unavailable
[22:25:25] "
[22:25:25] yes I am looking at fnielsen's
[22:28:47] kubectl doesn't work for me now.
[22:29:01] can you be more specific?
[22:29:15] looks ok to me
[22:29:18] kubectl
[22:29:18] runtime: failed to create new OS thread (have 3 already; errno=11)
[22:29:18] fatal error: newosproc
[22:29:18] runtime stack:
[22:29:18] runtime.throw(0x2689df0, 0x9)
[22:29:28] krenair@tools-k8s-master-01:~$ kubectl get -n scholia svc
[22:29:28] NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
[22:29:28] scholia 192.168.42.10 8000/TCP 51m
[22:32:25] fnielsen, we think we've identified a problem with k8s ingress affecting new pods
[22:32:42] specifically to do with the firewall on the proxy
[22:34:06] Ok. So it is nothing I can do. I should just restart them and wait for a firewall fix?
[22:34:51] I'd just leave it be for the moment
[22:42:58] Found the issue. It was an unrelated production change that leaked into the Toolforge kubernetes setup. Working on a fix.
[22:44:19] wow, good to know
[22:44:45] we chose a bad day for debugging :-)
[22:47:02] specifically https://gerrit.wikimedia.org/r/c/operations/puppet/+/554036 applied to toolforge and changed tools-proxy-05:/etc/default/kube-proxy, triggered it to restart the service and it failed to come back up because our k8s version is too old to recognise the flag
[22:47:42] kube-proxy needs to be up to set up iptables rules that allow the proxy to talk to k8s services
[22:48:50] I think the following tools are affected: editgroups, isa, dewikigreetbot, scholia, jorobot, ordia, phamhi-tool (CrashLoopBackOff though)
[22:51:43] being tracked as https://phabricator.wikimedia.org/T239670
[22:51:48] FYI fnielsen ^
[22:52:15] Thanks
[22:59:37] throw anticompositetools on the pile too, getting 504s
[23:00:18] I guess that's my cue to stop working and go get food :)
[23:00:50] AntiComposite, don't think that pod existed when I looked before :)
[23:56:13] Ugh, of course I broke a thing while toolforge is also broken
[23:56:18] Joutbis, AntiComposite: try now
[23:56:26] There we go
[23:57:01] (I managed to test and fix my issues using lynx over ssh)
[23:57:58] bst.orm fought with Puppet and managed to get kube-proxy back up on the proxy
[23:59:53] yes, it works for me
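
The proxy lookup pasted at 22:19 (a Redis hash mapping a path pattern to a backend URL) and the failure diagnosed right after it (the proxy could not open a TCP connection to the Service IP) can be sketched roughly as follows. This is an illustration only: the hash is modelled as a plain dict, the function names are hypothetical, and the real proxy's matching rules are not shown in the log.

```python
import re
import socket
from urllib.parse import urlsplit


def resolve_backend(routes, path):
    """Return the backend URL of the first pattern matching `path`, else None."""
    for pattern, backend in routes.items():
        if re.match(pattern, path):
            return backend
    return None


def can_connect(backend_url, timeout=2.0):
    """Attempt a plain TCP connection to the backend's host and port."""
    parts = urlsplit(backend_url)
    port = parts.port or 80  # assumption: default to 80 when no port is given
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Values as returned by `redis-cli hgetall prefix:scholia` in the log.
routes = {".*": "http://192.168.42.10:8000"}
backend = resolve_backend(routes, "/scholia/")
```

Going by the log, a check like `can_connect(backend)` run from the proxy host would presumably have failed while requests to localhost:8000 inside the pod succeeded, which points at the network path (here, kube-proxy's iptables rules) rather than at the Flask apps.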