[00:09:02] legoktm: +1. I like the library (non-framework) usage of pwb
[00:18:12] zhuyifei1999_: how aggressively do the webgrid nodes cache stuff?
[00:18:25] cache what?
[00:18:48] Because I've got an impossible logic issue. I have a PHP constant that defines the version of IABot as v2.0beta11
[00:19:06] That constant is used when loading the footer at https://tools.wmflabs.org
[00:19:10] That constant is used when loading the footer at https://tools.wmflabs.org/iabot
[00:19:36] Yet the footer is still saying v2.0beta10ehf1
[00:19:50] When I checked the file the constant is being defined correctly.
[00:20:00] and it doesn't fix itself even if you restart the webservice?
[00:20:10] So now I have a bunch of ??? floating above my head
[00:20:29] I just restarted and it immediately loaded outdated footer data.
[00:20:42] It's still loading it.
[00:20:59] * Cyberpower678 restarts the webservice again.
[00:21:00] mind if I `become iabot`?
[00:21:33] zhuyifei1999_: sure
[00:21:54] nothing persists across a restart in either the Kubernetes or grid engine backends. I guess there could be NFS file cache delays
[00:21:55] zhuyifei1999_: there it goes
[00:22:03] It's now up to day
[00:22:13] *date
[00:22:18] * zhuyifei1999_ did nothing
[00:22:23] I would not expect NFS cache to persist for more than 1-2 minutes at most
[00:22:34] bd808: it was stuck for almost 10 minutes
[00:22:45] I never hit NFS cache honestly
[00:22:47] Cyberpower678: is your backend Kubernetes or the grid engine?
[00:23:03] like, I've never hit it such that it becomes an issue
[00:23:08] I know it usually takes up to two minutes for the changes to propagate, so I usually wait before checking.
[00:23:09] it seems like grid
[00:23:27] bd808: no idea. Whatever webservice restart loads
[00:23:49] I also have 5 workers running on the grid. They are restarted as well.
[00:23:53] well that depends 100% on what you typed for `webservice start` :)
[00:23:58] trusty grid even
[00:24:17] bd808: just "webservice start" ;p
[00:24:23] Cyberpower678: see if the stretch grid / k8s helps
[00:24:43] dunno, but it could be possible that it's a kernel issue
[00:25:11] actually, could you try changing the constant again, let me check what the php process loads
[00:25:34] Cyberpower678: ^
[00:25:35] zhuyifei1999_: feel free to fudge with it yourself. :-)
[00:25:41] which file is it?
[00:25:46] Just as long as you change it back.
[00:26:04] It's /mnt/nfs/labstore-secondary-tools-project/iabot/IABot/init.php
[00:26:12] ok, so lighttpd-cgi on grid engine. The only thing I can think of besides NFS cache (which should empty relatively quickly) is the restart not actually having killed the original process yet
[00:26:35] zhuyifei1999_: define( 'VERSION', "2.0beta11" );
[00:26:38] near the bottom
[00:27:02] oh my god your php processes are really busy
[00:27:18] * zhuyifei1999_ just straced and my screen was flooded
[00:28:01] They have a lot of work to do.
[00:28:12] * bd808 sees that zhuyifei1999_ has his hammer out again ;)
[00:28:27] lol
[00:28:31] :-)
[00:32:40] I don't see php opening the php file
[00:33:42] zhuyifei1999_: It should be. It's the first thing it opens to even start. Without it, the interface can't even get a byte of HTML out.
[00:34:37] I know. it's weird
[00:34:42] bd808: zhuyifei1999_: BTW, is there a way for PHP errors to be displayed as they happen in HTML on the phpcgi?
[00:34:54] what do you mean?
[00:35:15] I just checked, the php files aren't mapped into memory either
[00:36:01] The tool gets used pretty often, so when I try to get it to throw an error when it errors out, I have to defer to the logs, only it sometimes gets lost among other error messages whose original cause I cannot determine. It would be nice to have prod throw some so I can see what is failing on prod as it happens.
[00:36:07] Because it doesn't fail on dev.
[00:37:02] So for example if the tool times out, instead of a blank screen I get a PHP Fatal error visibly displayed.
[00:37:28] so you mean it has too many error messages and you can't find the important error? fix all of them :P
[00:38:16] zhuyifei1999_: the files would end up in opcache after the initial read and only be re-read at most every 2 seconds if the timestamp on the source file changes
[00:38:30] oh
[00:39:27] zhuyifei1999_: I try to, but the error messages happen randomly. So I can't replicate it.
[00:39:39] oh I see it stats the php files
[00:40:13] I want to have it printed in HTML, maybe hidden behind. How do I enable it for my tool?
[00:41:05] Cyberpower678: you can make a $HOME/public_html/.user.ini file, and add "display_errors=1" in that file. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#PHP
[00:41:25] bd808: thanks. :D
[00:41:50] https://secure.php.net/manual/en/errorfunc.configuration.php#ini.display-errors
[00:45:34] * Cyberpower678 sees zhuyifei1999_ messing around with the VERSION constant. :p
[00:45:44] did you restart webservice?
[00:46:00] I'm getting no such process
[00:46:06] zhuyifei1999_: so possible delays from php's stat() cache, the kernel's stat() cache, NFS server->client change propagation, opcache refresh threshold. But a full restart of the webservice should nullify all of that except the NFS server->client delay which really should be short
[00:46:43] zhuyifei1999_: yes I did. I just installed a .user.ini file under bd808. For good measure I restarted the webservice
[00:46:53] I grepped through all my straces and got:
[00:46:57] https://www.irccloud.com/pastebin/02F9EYHL/
[00:47:23] after changing the timestamp it did do an open()
[00:49:08] Cyberpower678: could you try moving to k8s / stretch? see if it'll help
[00:49:20] zhuyifei1999_: how?
[00:49:43] https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation
[00:49:52] ^ you should have received many mails about this
[00:53:15] zhuyifei1999_: Yes I did. It tells me to move it, but I don't believe I received instructions on how to. I just assumed webservice restart would move it for me.
[00:53:36] it wouldn't
[00:54:04] https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#Move_a_grid_engine_webservice <= this is to move to k8s
[00:55:06] if you stop on the trusty bastion and start on stretch bastion (without specifying --backend=kubernetes) it should go to stretch grid
[00:58:38] zhuyifei1999_: I use tools-login.wmflabs.org
[00:59:54] that's trusty
[01:01:21] zhuyifei1999_: Am I running on stretch now?
[01:02:19] uh no
[01:02:28] which host did you ssh into?
[01:02:35] But I logged into login-stretch.tools.wmflabs.org
[01:03:54] did you start your webservice there?
[01:03:56] bd808: is tideways installed on Stretch?
[01:04:05] zhuyifei1999_: yes
[01:04:09] like, stop on trusty, start on stretch
[01:04:15] huh, let me try
[01:04:21] Well I killed it on stretch
[01:04:27] Seems to have died.
[01:04:37] Then I started it on stretch
[01:04:54] it should now be on stretch
[01:05:04] huh
[01:05:05] ?
[01:05:08] like, it should show up in qstat on a stretch bastion
[01:05:17] and not on a trusty bastion
[01:05:52] I deleted the jobs again.
[01:06:02] qstat is empty
[01:06:12] The crontab has also been reloaded onto stretch
[01:06:22] So it should restart the workers.
[01:06:23] are you on 'tools-sgebastion-07'
[01:06:32] like, where you are starting the jobs
[01:06:55] I don't see your crontab there
[01:07:01] tools.iabot@tools-bastion-03:~$
[01:07:14] that's trusty
[01:07:15] become iabot took me there.
[01:07:32] ssh login-stretch.wmflabs.org
[01:07:43] and then? you should see tools-sgebastion-07
[01:07:50] not tools-bastion-03
[01:08:10] tools-bastion-* is trusty, tools-sgebastion-* is stretch
[01:08:22] I'm an idiot. I wasn't logged out of trusty so the command failed and I didn't even realize it.
[01:08:37] Lol
[01:08:39] lol
[01:08:55] well... :)
[01:09:39] ssh: Could not resolve hostname login-stretch.wmflabs.org: nodename nor servname provided, or not known
[01:09:47] oh. Wait
[01:10:07] the manifest is confused right now, you might want to qdel and start it again via webservice command
[01:10:17] it's login-stretch.tools.wmflabs.org
[01:11:11] zhuyifei1999_: It should be running on stretch now
[01:11:53] It feels like IABot's Web Interface is loading faster now
[01:12:08] Except that I get 502 Bad Gateways now.
[01:12:20] I just restarted it again
[01:12:22] That didn't happen on Trusty
[01:12:24] oh
[01:12:34] service.manifest was confused
[01:12:44] (done)
[01:12:55] Wow. It's definitely faster
[01:14:28] * zhuyifei1999_ starts my strace
[01:14:35] zhuyifei1999_: my bot workers aren't starting
[01:14:46] huh why not?
[01:15:16] nvm
[01:15:41] * Cyberpower678 is being impatient. :p
[01:16:03] * Cyberpower678 continues to plan his trip to Stockholm
[01:19:25] Cyberpower678: every time I change the version, it needs a while to reload
[01:19:44] I'm assuming this is a good sign (it knows to update immediately)
[01:28:13] zhuyifei1999_: is tideways installed on Stretch?
[01:28:23] what's tideways?
[01:28:33] PHP metrics analysis extensions
[01:28:43] Lets IABot log performance issues.
[01:28:56] how can you check if it's installed?
[01:30:10] I think with php -m
[01:30:55] zhuyifei1999_: no it's not. Any chance of getting it installed on stretch? MW uses it on prod
[01:32:34] could you file a ticket?
[01:32:44] zhuyifei1999_: https://tideways.com/profiler/xhprof-for-php7. I did a while ago.
[01:33:00] The ticket collected dust and went into limbo
[01:33:04] uh, link?
[01:33:51] I don't remember
[01:34:47] zhuyifei1999_: here's a more recent one though
[01:34:50] https://phabricator.wikimedia.org/T202825
[01:37:27] that's for k8s
[01:37:52] might need a separate ticket for grid, or if you mention it explicitly
[01:49:42] zhuyifei1999_: I renamed the ticket. :p
[01:50:35] ok
[01:51:20] well, some might want it on k8s as well
[05:36:14] zhuyifei1999_: there is definitely something wonky going on when transferring files via SFTP
[05:36:32] It seems to damage them. I'm getting Syntax parse errors everywhere.
[05:36:49] It's as if the NFS is locking the file mid-write.
[05:37:13] So I am forced to delete the files and reload them
[05:41:54] Ugh, and it's making everything behave weirdly too. I've never had these kinds of issues before.
[05:44:48] * Cyberpower678 is getting really frustrated here.
[05:45:27] There we go
[06:25:04] Cyberpower678: huh?
[06:25:56] I rarely use sftp. do you have some 'minimum reproducible steps'?
[10:46:29] Hi, can anybody help me? I'm trying to update a page from the webservice, but after a few seconds, I get a timeout. The uwsgi.log file shows "OSError: write error" and the update gets aborted (I suppose). Where can I increase the timeout? Or is there a better way to do it?
[10:50:10] !help
[10:50:10] Joutbis: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[10:59:41] Joutbis: what is "the webservice"?
[11:02:46] a flask app
[11:03:20] on kubernetes, to set it up I followed the procedure in https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Flask_OAuth_tool
[11:15:38] I suppose I should fiddle with timeouts in nginx.conf but I don't know if there's a way to override the default for my app. Something like what they do here, perhaps? https://stackoverflow.com/questions/16141610/nginx-timeouts-when-uwsgi-takes-long-to-process-request
[13:41:32] !help
[13:41:32] Joutbis: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[13:43:19] Is there some guide on how to modify a pywikibot application to run from a webservice and use an access token granted via OAuth? Is it just a matter of overriding authenticate from user-config.py?
[13:44:43] I have an application that runs from the command line as the bot; I have created a flask webservice that would allow running the same application, but it should run as the webservice's user.
[13:48:34] Joutbis: when you say 'webservice's user', do you mean whoever accesses the website?
[13:48:40] Or the tool account under which the webservice runs?
[13:51:01] The latter should be relatively easy (essentially, follow https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth/Wikimedia). The former... should be possible, but pywikibot has a very user-config.py-centric configuration workflow
[13:54:54] looking at the code, you should be able to modify config.authenticate (and config.usernames!), *but* this is a global configuration, so you will have to start a new process for every user
[13:55:06] My question meant "whoever accessed the website", but I wouldn't really mind doing it the other way
[13:56:04] and I would not be surprised if it doesn't work anymore after the first request has been sent out. Also... some requests are cached, and that cache may become a mess with requests from different users.
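Editor's sketch of the "new process for every user" point at 13:54:54: because the configuration is global to the process, each user's job can be run in a fresh interpreter with its own environment. This is only an illustration under assumptions. It does not call pywikibot at all; `run_as`, `CHILD`, and the `BOT_USER` / `BOT_TOKEN` environment variable names are hypothetical stand-ins.

```python
import json
import os
import subprocess
import sys

# Hypothetical child program standing in for the real bot script.
# It only echoes which user it was started for.
CHILD = (
    "import json, os; "
    "print(json.dumps({'user': os.environ['BOT_USER']}))"
)


def run_as(user, token):
    """Run one job in a fresh interpreter so that per-user settings
    never share process-global state with another user's job."""
    # Copy the parent environment and add the (hypothetical) per-user values.
    env = dict(os.environ, BOT_USER=user, BOT_TOKEN=token)
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return json.loads(out.decode("utf-8"))
```

Each call starts and tears down its own process, so nothing set for one user (including any in-memory cache) is visible to the next call.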
[13:56:43] the problem is that right now I get the following error: Logged in on wikipedia:ca via OAuth as 10.68.22.118, but expect as JoRobot
[13:57:48] perhaps it's because of a previous error: OAuth authentication not supported: No module named requests_oauthlib
[13:57:59] that sounds quite likely
[13:58:05] Maybe I need to install pywikibot in the kubernetes virtual environment?
[13:58:16] and requests?
[13:59:14] yes, create a virtualenv and install the dependencies there
[13:59:31] pywikibot itself should also work, but pip install pywikibot will not install the scripts
[13:59:46] so the goal should be to run the python script as the bot?
[14:00:16] ?
[14:00:18] as opposed to run it as the webservice's user?
[14:00:33] Let's take a step back -- what are you trying to do?
[14:00:44] yes, sorry
[14:00:57] You currently have a (self-built?) bot based on pywikibot that normally runs from the command line?
[14:01:09] right. From crontab, usually
[14:01:21] But sometimes, it can't run because of missing data
[14:01:37] So I have created a webservice that allows anyone to update what's needed
[14:01:57] and then, I provide a button that would force the script to run again with the new data.
[14:02:06] aaah!
[14:02:16] But, when I run it from the webservice I get the OAuth error
[14:02:23] In that case, I think the easiest option would be to just call jsub to submit the job as normal
[14:02:44] I thought I needed the user's token to run
[14:02:58] I tried that, too, but it said 'command not found'
[14:03:17] ...hrm.
[14:03:48] yes, you're right, jsub doesn't run from k8s :/
[14:05:25] one option could be to run the webservice on the grid (which would allow access to jsub), another option would be to write to a file on disk, and have a crontab that periodically checks that file
[14:06:13] and there's the option you're trying now, to run the bot in the webservice container
[14:06:50] Perhaps it's just a matter of installing the missing libraries in venv
[14:07:05] yes, that should be possible
[14:07:06] moving the webservice to the grid doesn't sound very standard.
[14:07:27] does the script you run from the crontab also use oauth to login?
[14:07:39] I will try to update venv. If it doesn't work, I will try your workaround from the crontab
[14:07:55] no, the script uses the user-config.py
[14:08:29] I guess it then uses OAuth, in the background
[14:08:43] but then it's running from the grid, not the webservice
[14:09:08] no -- pywikibot either uses a password given by the user (default), a botpassword (an option in the initial configuration) or oauth (if a user really wants to)
[14:09:36] you mean in the login.py?
[14:09:48] yes
[14:10:00] then no, it's using oauth. The secrets are in the user-config.py
[14:10:16] ah, ok
[14:11:05] one more question -- how are you starting the bot from the container?
[14:11:25] do you `import` it, or do you use `subprocess` to start a new process?
[14:14:56] I tried with subprocess
[14:15:21] first, with subprocess.Popen
[14:15:37] then with subprocess.check_output
[14:16:32] I have tried to add requests to the venv, but it was already there because I already had mwoauth. I wonder why it still complains
[14:17:26] I am running a shell script, in fact, not directly python. The shell script updates PYTHONPATH and cd's into the script's home directory
[14:18:04] Joutbis: right -- the script you're running doesn't run in the webservice's virtualenv
[14:18:31] where is it running then, the grid?
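Editor's sketch of the "write to a file on disk, and have a crontab that periodically checks that file" option from 14:05:25. It is a minimal illustration, assuming both sides see the same shared directory and run Python 3; the function names and the flag path are hypothetical and not from the conversation.

```python
import os
import tempfile


def request_run(flag_path):
    """Web app side: create the flag file.

    The file is created under a temporary name and then renamed, so the
    cron side never sees a half-created flag.
    """
    directory = os.path.dirname(flag_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    os.replace(tmp_path, flag_path)


def consume_request(flag_path):
    """Cron side: return True once per request and remove the flag."""
    try:
        os.remove(flag_path)
    except FileNotFoundError:
        # No request is pending.
        return False
    return True
```

The web app's button handler would call `request_run(...)`, and the script started from the crontab would call `consume_request(...)` first and exit early when it returns False.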
[14:19:03] no, it runs in the container, but it uses the normal system python, instead of the one in the virtualenv
[14:19:05] maybe I'll try jsub again. I may have screwed up calling it.
[14:19:36] but if it's in the container jsub isn't there, I see
[14:20:28] so, I think the following is what should work: 1) shell into a container (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Shell); 2) create a new virtualenv in that shell, e.g. venv-kubernetes; 3) venv-kubernetes/bin/pip install [dependencies]; 4) close down the shell
[14:21:00] Then in the script wrapper you've written, call /data/project/[...]/venv-kubernetes/bin/python instead of the regular python
[14:21:26] Very interesting. I'll try that
[14:21:36] Thanks a lot!
[14:56:26] valhallasw`cloud it crashes?
[15:19:21] valhallasw`cloud, one more problem. I have tried implementing your solution, and I can't create a Python 2 virtual environment (the script I want to run is python 2). There is a python 2.7.9, but the command "virtualenv" is "not found"
[15:20:11] Joutbis: I'm a bit confused by that
[15:20:21] I would expect virtualenv -p python2 [...] to work
[15:22:07] What I do: kubectl exec -it jorobot-3292732732-rp4v0 -- /bin/bash
[15:22:30] python --version returns "Python 2.7.9"
[15:23:01] but virtualenv -p python2 venv-kubernetes gives "virtualenv: command not found"
[15:23:29] what kind of webservice are you running? (`webservice status`)
[15:29:14] Your webservice of type python is running
[15:29:33] the flask app is actually python3
[15:30:00] * valhallasw`cloud ponders
[15:30:09] can you try running python3 -m venv ?
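Editor's sketch of the wrapper advice from 14:21:00 (call the virtualenv's python rather than the regular one). It assumes the bot is started with `subprocess`, as mentioned at 14:15; `run_with_interpreter` is a hypothetical helper name, and the interpreter path is passed in by the caller rather than guessed here.

```python
import subprocess


def run_with_interpreter(python_path, script_path, *args):
    """Run script_path under an explicit interpreter instead of whatever
    `python` happens to be first on PATH.

    Returns (exit_code, combined stdout/stderr as text).
    """
    cmd = [python_path, script_path] + list(args)
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep tracebacks together with normal output
    )
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace")
```

In the situation discussed above, `python_path` would be the `bin/python` inside the `venv-kubernetes` directory, so the script sees the dependencies installed there (for example requests_oauthlib) rather than the system site-packages.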
[15:30:58] yes, I have done that, but then the python in venv-kubernetes/bin is python3
[15:31:03] hm, but that doesn't work, as that doesn't allow a different python version
[15:31:22] so when I try to run a python2 script, it fails
[15:32:20] Joutbis: I would try https://virtualenv.pypa.io/en/latest/installation/ under "To use locally from source:"
[15:32:41] https://github.com/pypa/virtualenv/archive/16.1.0.tar.gz seems to be the latest version
[15:57:28] I have to execute the virtualenv.py downloaded from github? I see
[16:00:12] It worked !!!!!!!!!!!!!!!!!!!!111
[16:00:21] Holy cow !!!
[16:02:17] I'd better write down all we have done.
[16:04:28] Thank you very much, valhallasw`cloud
[20:04:17] I have trouble following https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation#What_should_I_do? I can log in to ssh login-trusty.tools.wmflabs.org with no password but when I log in to login-stretch.tools.wmflabs.org I immediately get the following warning: Connection closed by 185.15.56.48 port 22
[20:06:37] physikerwelt: that's weird! let me check.
[20:06:42] seems to be a temporary issue
[20:06:43] works now
[20:06:51] ok!
[20:16:42] when I stop the webservice on trusty it automatically restarts itself after a while
[20:23:03] physikerwelt: odd... could you try to rename service.manifest immediately after calling webservice stop?
[20:25:27] I deleted the file. It was recreated after I started the tool on Stretch
[20:26:16] uh, yes, that sounds like what should happen
[20:26:37] is it a kubernetes webservice? I think that in that case both bastions would tell you it is running
[20:27:37] now it was not restarted in a while
[20:28:03] maybe deleting the file solved the problem... the type of the service has changed
[20:28:35] it was webservice of type php5.6 and is now lighttpd
[20:29:30] right, so it was a kubernetes webservice, and now you're running a grid-based one, which explains why you're only seeing it from the stretch bastion
[20:30:07] webservice --backend=kubernetes php5.6 start
[20:30:13] should bring you back to the initial state
[20:30:32] (note that most likely both bastions will report the webservice is running)
[20:30:59] ok. so I should be done with the migration
[20:39:42] thanks a lot valhallasw`cloud I made it https://tools.wmflabs.org/trusty-tools/u/physikerwelt
[20:43:44] :-)