[06:16:46] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - https://gerrit.wikimedia.org/r/452293 (owner: L10n-bot)
[08:54:36] (CR) Lokal Profil: "> Patch Set 5: Code-Review+1" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/447794 (https://phabricator.wikimedia.org/T200325) (owner: Lokal Profil)
[08:54:50] (CR) Lokal Profil: [C: 2] Ensure unicode encoding of query results [labs/tools/heritage] - https://gerrit.wikimedia.org/r/447794 (https://phabricator.wikimedia.org/T200325) (owner: Lokal Profil)
[08:56:25] (Merged) jenkins-bot: Ensure unicode encoding of query results [labs/tools/heritage] - https://gerrit.wikimedia.org/r/447794 (https://phabricator.wikimedia.org/T200325) (owner: Lokal Profil)
[08:57:28] (CR) jenkins-bot: Ensure unicode encoding of query results [labs/tools/heritage] - https://gerrit.wikimedia.org/r/447794 (https://phabricator.wikimedia.org/T200325) (owner: Lokal Profil)
[09:24:12] !log tools.heritage Deploy latest from Git master: 5ea3c21, 0d6158d (T200325)
[09:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
[09:24:17] T200325: Handle encoding of sort keys - https://phabricator.wikimedia.org/T200325
[11:24:31] jynus: ping RE T201674
[11:24:32] T201674: m5-master: setup for new keystone daemon - https://phabricator.wikimedia.org/T201674
[11:25:01] we are doing this operation today
[11:25:08] will take care of that after lunch
[11:25:17] thanks
[13:36:15] ping jynus
[13:41:38] yes?
[13:42:16] jynus: T201674
[13:42:18] T201674: m5-master: setup for new keystone daemon - https://phabricator.wikimedia.org/T201674
[13:42:26] yes, when I have the time
[13:42:47] we are about to start the operations, in 15 mins
[13:43:09] ?
[13:44:26] jynus: it would be great if you could handle this now
[13:44:56] I am starting now
[13:45:05] thanks!
[13:45:16] (to read=
[13:52:03] arturo: so there seems to be security issues on all your cloud db accounts
[14:04:55] jynus: are these new issues related to today's maintenance?
[14:05:10] oh, sorry, you're discussing elsewhere
[15:53:08] zhuyifei1999_: Hello
[15:53:18] hi
[15:53:31] Please I am trying to use grid engine now for the bot
[15:53:50] ok
[15:54:30] I went through a few documentations and I don't know if this is the start 'jstart [options…] program [args…]'
[15:55:03] actually not clear to me
[15:55:33] so jstart is a command
[15:55:54] yes
[15:56:04] this is what I am using https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Submitting_continuous_jobs_(such_as_bots)_with_'jstart'
[15:56:21] [options…] is the options for jstart itself, like the name of the job (-N name) and the virtual memory limit for the job (-mem amount)
[15:57:13] ok
[15:57:19] program [args...] is your usual command. say if you want to run `python foobar.py` then you write `python foobar.py` for program [args...]
[15:57:58] ok
[15:58:26] so a complete jstart command could be like: $ jstart -N say_hi -mem 120M echo Hi
[15:58:54] so I do that in the bot directory?
[15:59:21] the command always runs with your home directory as the working directory
[15:59:32] I mean in grid, for program [args...]
[15:59:53] ok
[15:59:57] but there's an option to change that, but I don't remember what exactly
[16:00:26] (there's an option to let the program run with the same working directory as jstart)
[16:01:11] am checking that online now
[16:01:47] uh, jstart is our own code. the relevant manual should be in qsub(1)
[16:03:15] ok
[16:19:53] I can't find an option for that
[16:22:32] yeah, the man page is really long
[16:22:42] had to find stuffs
[16:22:45] *hard
[16:23:53] the common options are just -N, -mem, and -quiet, (and -continuous and -once which are default in jstart)
[16:24:14] zhuyifei1999_: I found something finaly
[16:24:40] -cwd
[16:27:07] zhuyifei1999_: please what is the expected output
[16:27:38] I get just a message of your job has been submited
[16:32:29] yeah that’s expected
[16:34:52] the stdout and street should be redirected to jobname.out and err
[16:38:01] (I just made a mess in my gnome build, please wait while I try to fix my desktop :P)
[16:42:20] stdout and 'street'?
[16:42:34] that's a hella autocorrect right there
[16:43:26] yeah
[16:43:35] hate typing on mobile
[16:45:32] for real
[16:47:33] right now I’m screwed for real
[16:48:35] https://usercontent.irccloud-cdn.com/file/9xDoiiau/1534178869.JPG
[16:48:44] ok
[16:58:50] oh shhhhh....
[16:59:17] that's an unhappy camper right there
[17:00:58] !help I'm having connection issues for traffic between the cloud VPS that runs P&E Dashboard and a tools.wmflabs.org tool. All the queries are getting connection timeout errors. It only seems to be affecting P&E Dashboard, as I can connect just fine my dev environment and the Wiki Education Dashboard production server (which doesn't run on Wikimedia Cloud).
[17:00:58] ragesoss: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[17:02:08] ragesoss: since when?
[17:02:21] arturo: about 2 hours ago.
[17:02:41] ragesoss: ok, then it's probably related to the operations we did with keystone
[17:02:55] we are working on some final things, stay tunned
[17:03:10] arturo: great, thanks.
[17:04:43] ragesoss, what tool exactly?
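The jstart anatomy explained earlier in the channel (jstart's own options first, then the program and its arguments exactly as you would type them by hand) can be sketched by assembling the command as a bash array. This is a minimal illustration, not Toolforge tooling: the job name, memory limit and `python foobar.py` are the placeholders used in the conversation, and whether jstart passes the qsub option `-cwd` through is an assumption to check against the grid documentation.

```shell
#!/bin/bash
# Sketch: assemble a jstart invocation in the order described in the channel.

# Options for jstart itself: job name, virtual memory limit, and
# (assumption) -cwd to run in the submitting directory instead of $HOME.
jstart_opts=(-N say_hi -mem 120M -cwd)

# The program and its arguments, exactly as you would run them by hand.
program=(python foobar.py)

# jstart first, then its options, then the program.
cmd=(jstart "${jstart_opts[@]}" "${program[@]}")

# Print the assembled command; on a Toolforge bastion it would be run as "${cmd[@]}".
printf '%s\n' "${cmd[*]}"
```

jstart already implies `-continuous` and `-once`, so they are not repeated here; per the discussion, the job's output should then show up in `say_hi.out` and `say_hi.err`.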
[17:07:46] Krenair: both https://tools.wmflabs.org/wikiedudashboard/ and https://tools.wmflabs.org/eranbot/plagiabot/api.py
[17:08:13] well both of those links work fine for me
[17:08:52] Krenair: yes, me too. and they work fine from the app, except for the instance running at outreachdashboard.wmflabs.org
[17:09:27] so those URLs break from another cloud VPS instance
[17:09:31] interesting
[17:12:06] hmm i just got "PROBLEM - Host gerrit.git.wmflabs.org is DOWN: PING CRITICAL - Packet loss = 100%"
[17:12:51] paladox, well I can ping the IP
[17:12:59] it loads for me
[17:13:00] too
[17:13:06] but i just got that
[17:13:30] I can SSH into the instance behind it too
[17:13:45] paladox, where did you get that alert?
[17:14:02] Krenair in #wikimedia-bots-testing (i have icinga2)
[17:14:55] paladox, so that's a ping from one of your own personal hosts which fails?
[17:15:02] or is icinga running on cloud vps?
[17:15:07] Krenair nope from a cloud vps
[17:15:11] well that's interesting
[17:15:16] that sounds like ragesoss's problem too
[17:16:21] well this is very interesting
[17:16:28] krenair@bastion-01:~$ host gerrit.git.wmflabs.org
[17:16:28] gerrit.git.wmflabs.org has address 208.80.155.149
[17:16:28] still shows as down
[17:16:34] Why does it resolve to that IP from within labs??
[17:16:38] see https://gerrit-icinga.wmflabs.org/dashboard#!/monitoring/service/show?host=gerrit.git.wmflabs.org&service=ping4
[17:16:41] labsaliaser should be forcing the recursor to give you a private IP
[17:17:18] same for ragesoss
[17:18:33] doing "ping gerrit.git.wmflabs.org" from gerrit-mysql results in it hanging
[17:18:59] then when i ctrl-c i get "36 packets transmitted, 0 received, 100% packet loss, time 35809ms"
[17:20:45] apergos: fixed it. don’t worry :)
[17:20:45] Krenair wierd thing is pinging gerrit-test3.git.eqiad.wmflabs work from gerrit-mysql
[17:20:55] that's not weird
[17:20:55] which is where gerrit.git.wmflabs.org is hosted.
[17:20:56] congrats, zhuyifei1999_
[17:21:00] given what we know about the bug, that is to be expected
[17:21:08] oh
[17:21:15] gerrit-test3.git.eqiad.wmflabs will always give you the private IP
[17:21:22] which will work within labs
[17:21:30] oh
[17:22:22] it was working up until 16:34:31
[17:22:24] bst time
[17:40:28] paladox, ragesoss: fixes are going in
[17:40:45] it's https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/452427/
[17:50:34] Krenair oh thanks
[18:09:47] Krenair hmm still having the same problem
[18:09:50] after that is merged.
[18:10:03] paladox, yeah this stuff doesn't get fixed just because someone merged a commit
[18:10:09] it has to be applied
[18:10:13] oh
[18:10:26] and then in some cases, stuff run
[18:12:46] in this case the fix is on labs-recursor1.wikimedia.org but not labs-recursor0.wikimedia.org
[18:13:41] oh
[18:17:12] ragesoss, paladox: alright try no
[18:17:13] now
[18:17:23] * paladox checks
[18:17:31] nope
[18:17:33] still says
[18:17:34] PING gerrit.git.wmflabs.org (208.80.155.149) 56(84) bytes of data.
[18:18:02] what does `dig gerrit.git.wmflabs.org` say?
[18:19:19] Krenair: appers to be fixed.
[18:21:05] paladox?
[18:23:15] alright well I've got to go but good luck
[18:26:09] Krenair see https://phabricator.wikimedia.org/P7453
[18:28:47] works now
[19:09:35] r054l13: can you catch me up a little? what command did you run (the full command please) to start the bot?
[19:09:54] ok
[19:10:17] * d3r1ck lurks
[19:10:44] jstart -N WM-Emoji-Bot npm start
[19:10:55] that is the command to start the bot
[19:10:59] I used
[19:11:16] ok now....
[19:11:21] let me ask a couple of questions
[19:11:31] ok
[19:11:38] the grid ending needs to start your virtual environment right?
[19:12:01] so there was to be a script that would start your virt env and then run the bot
[19:12:03] well I started the virtual env myself
[19:12:13] ok
[19:12:15] yes but you started it on the host you're logged inon
[19:12:23] remember we said the grid engine is servers other than this host
[19:12:32] you can't think all the tools run directly on the one box :-)
[19:12:44] ok
[19:13:07] so remember yesterday we talked a bout a bash script that would be executable and etc?
[19:13:10] you need that
[19:13:20] ok
[19:13:27] otherwise it will run the same old baad npm and we know how that goes
[19:13:30] soo.....
[19:13:34] stop the job
[19:13:39] and write your script!
[19:13:56] so in the script I need to start the env and latter start the bot
[19:14:02] ok
[19:14:56] yes
[19:15:05] and don't forget in the script
[19:15:09] to cd into you project directory
[19:15:14] *your
[19:15:27] yea
[19:15:28] and to use paths of things it might not find
[19:15:36] the error, I saw it, it said uh
[19:15:52] [2018-13-08T16:08] /usr/share/npm/bin/npm-cli.js exited with code 134. Respawning...
[19:15:55] so you see there:
[19:16:00] yes
[19:16:01] /usr/share/npm/bin/npm-cli.js
[19:16:05] /usr/share/npm
[19:16:10] that's not the good npm right?
[19:16:22] that's how you can tell it went wrong...
[19:16:55] yes but there are some errors right bellow in the .err file
[19:17:04] which one is the good one anyways? if you put which npm (with the env activated), what is that?
[19:17:10] whose path it shows are different
[19:17:39] oh I see the new ones at the end
[19:17:41] /mnt/nfs/labstore-secondary-tools-project/wm-commons-emoji-bot/www/js/app_env/lib/node_modules/npm/bin/npm-cli.js
[19:17:44] those ones right?
[19:17:50] yes
[19:17:53] that's better as a path
[19:18:12] yeah. for the first errors I was not in the env
[19:20:02] ok
[19:20:17] what did google tell you about this sort of error?
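The `which npm` check suggested above can be turned into a small guard that fails when the system npm (the `/usr/share/npm` one from the error) is picked up instead of the one inside the activated environment. A minimal sketch: the function name `check_npm` is hypothetical, and treating any npm under `/usr/` as "not the environment's" is an assumption based on the paths quoted in the channel.

```shell
# check_npm: hypothetical guard, not part of Toolforge.
# Succeeds only if the first npm on PATH lives outside /usr, which is
# assumed to mean it comes from an activated node environment.
check_npm() {
  local npm_path
  if ! npm_path="$(command -v npm)"; then
    echo "npm not found on PATH" >&2
    return 1
  fi
  case "$npm_path" in
    /usr/*)
      echo "system npm in use ($npm_path); is the environment activated?" >&2
      return 1
      ;;
    *)
      echo "using $npm_path"
      ;;
  esac
}
```

Calling `check_npm || exit 1` right after activating the environment would stop a job early instead of letting it respawn the wrong npm.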
[19:21:14] I had a look at this https://github.com/nodejs/node/issues/21433
[19:22:05] I'm not sure that's related.... https://phabricator.wikimedia.org/T113826 this may be helpful (hopefully jstart takes the memory option)
[19:26:18] ok I will allocate more space to the bot
[19:26:44] let's see what happens!
[19:28:58] please do let me know :-)
[19:33:16] done
[19:33:27] I get a different error
[19:34:27] SyntaxError: Unexpected token }
[19:34:40] apergos: that's it
[19:34:46] * r054l13 checking online
[19:35:03] ok one issue down
[19:35:08] let's see what the next thing is
[19:35:42] this is probably that it's using the older version of node!
[19:36:21] so you have to deal with the virtenv stuff now
[19:36:46] ok
[19:38:11] I'm not sure what you found, but I found this about the specific syntax error for that code: https://github.com/remy/nodemon/issues/1227
[19:44:22] found a few things but not very helpful
[19:45:09] from what you sent, the bot was ran from the node version on toolforge and not the virtual env
[19:45:26] seems very likely
[19:45:46] so: bash script writing time
[19:45:53] ok
[19:45:59] I have a lil question
[19:46:08] it can be a big one, too, that's ok :-)
[19:46:10] the script is just a set of command s right
[19:46:13] uh huh
[19:46:24] well you have to put the line at the top to indicate it's a bash script
[19:46:24] meanning it is just puting what I do manually in a file?
[19:46:39] yes
[19:46:48] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#My_shell_script_job_fails_with_%22Exec_format_error%22
[19:46:54] and you have th make the script executable
[19:47:05] and then you just give the script name as the command to jstart
[19:47:08] but yes
[19:47:43] it's the commands you type to the bash shell on the login host
[19:47:55] except they go in a file and get executed that way
[19:48:01] what I mean is there may be no difference with the script as it will just execute the comands in the file in a particular order
[19:48:06] yes
[19:48:09] just like I do manually
[19:48:11] there are a few things though:
[19:48:19] when you log in, you are in a certain directory
[19:48:20] ok
[19:48:29] ok
[19:48:38] and the script, when it runs, it won't necessarily be so you have to put an explicit cd into the directory as the first thing
[19:48:59] I don't know if there will be other things you need, we'll see (an environment variable or something)
[19:49:03] we'll find out
[19:49:12] ok
[19:49:31] lemme know when you have a script ready to try! excited to see this go at last
[19:56:51] Krenair: thanks much for that fix!
[20:06:16] r054l13: wait wait wait
[20:06:22] two things:
[20:06:32] first, give the full path to the directory to cd into
[20:06:47] like, go into it yourself and pwd, and copy that in
[20:07:00] and second, the jstart command doesn't go into the script
[20:07:24] the script gets passed to the jstart command! do you know what I mean?
[20:07:58] about the directoyr, the scriptprobably is run from some random working directory unrelated to yours, that's why you have to giv e afull path not a relative one
[20:08:10] otherwie it probably won't even find the directory yuou give it
[20:09:00] ooooh
[20:09:45] apergos: I understand what you mean
[20:09:53] thanks
[20:10:09] but do both of those things make sense?
[20:10:20] if the reasons why are not clear, stop until they are
[20:20:01] I did the script
[20:20:29] apergos: trying to run that
[20:20:44] ok stop
[20:20:52] stop stop stop
[20:21:00] first, the directory thin
[20:21:00] g
[20:21:03] ok
[20:21:10] did you put the full path to the directory yet?
[20:22:00] yes
[20:22:18] ok great
[20:22:24] and you activate the environment
[20:22:37] yes
[20:22:38] and then does the script run your bot? like, you added that command too?
[20:23:07] wanted to try it without that first
[20:23:17] but will add the bot dstart command
[20:23:20] uh... well it will just restart it over and over because
[20:23:29] it will exit with 0 I guess or something
[20:24:27] also:
[20:24:31] ok
[20:24:48] s there a file /data/project/wm-commons-emoji-bot/app_env/bin/activate ?
[20:24:56] because that's what your script seems to want right now
[20:25:06] yes
[20:25:37] I don't find such a file
[20:25:41] are you sure that's the right path?
[20:25:51] oh no
[20:26:31] corrected
[20:26:34] ok!
[20:27:58] and what's the full jstart command you use now?
[20:29:23] jstart -N WM-Emoji-Bot -mem 1G wmeb.sh
[20:32:59] ok
[20:34:13] I see you still have the same error, so I'd like to see if the environment vars we want are set, from the environment activation
[20:34:35] you cna do this by putting a printenv into the script right after the 'activate' line
[20:34:39] this should produce some output
[20:37:58] ok
[20:39:08] I don't see any error in the .err file
[20:39:09] word of warning: it's 11:40 pm here, I'm not good for much longer (sorry)
[20:39:22] ok noted
[20:39:32] oh, no errors now! ok
[20:39:35] well
[20:40:44] I should restart the job right?
[20:41:09] ah you want the printenv before the bot run
[20:41:12] not after :-)
[20:41:33] because in theory the bot run never finishes, so then the printenv would never print otherwise!
[20:41:53] uh well
[20:41:55] before you restart
[20:42:03] do you have a job id for the job or anything?
[20:42:11] can you get its status?
[20:42:35] I use its name
[20:43:49] can you try to get a job id for it?
[20:44:15] ah nm
[20:44:19] I see some output good good
[20:44:55] yes
[20:45:07] yes for I can get the id
[20:46:36] all right, so what does qstat show you anyways?
[20:48:57] seems what I am using is not the job id
[20:49:11] * r054l13 checking the docs
[20:51:28] ok
[20:52:07] the other thing you can do (so we can see if the bot actually exits or not) is to put a line like
[20:52:15] echo "done now"
[20:52:25] at the end of the script and see if that shows up in the output
[20:52:57] ok
[20:53:13] if npm start returns after forking nodejs or whatever, then you might have an issue
[20:53:19] we can work around that too I guess
[20:53:57] leave the printenv, it's very helpful
[20:54:01] where it is.
[20:54:04] ok
[20:56:34] what?
[20:56:34] time to restart the job again....
[20:56:50] let's see what it shows!
[20:56:58] ok can I delete the .out file?
[20:57:05] so the output is clear
[20:57:07] sure
[20:57:51] almost midnight
[20:58:24] it claims to be running
[20:58:34] that is, I see no 'done' line at tthe end
[20:58:40] so... is it actually running?
[20:58:56] It is
[20:59:45] check it is there now
[21:01:36] and it's done
[21:01:44] apergos: I think It is runing
[21:01:46] so at a certain point your script exits without doing anything
[21:01:52] the bot itself is running?
[21:01:58] are you able to tweet to it?
[21:02:13] when I go to the backend host I can't find the thing running at this point
[21:02:19] the bot is not, just tried it manually and it gives me an error
[21:02:23] but i got there after the 'done' is echoed
[21:02:32] oh? what error?
[21:02:34] so guess that's why the script stops
[21:02:37] ah ha
[21:03:40] https://pastebin.com/RBPjsF8e
[21:04:58] /data/project/wm-commons-emoji-bot/.npm/_logs/2018-08-13T21_00_47_181Z-debug.log you're looking at this right?
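Putting together the advice given in the channel, the wrapper handed to jstart would contain a shebang line, a `cd` to an absolute path, activation of the node environment, `printenv` before the bot starts, `npm start`, and an `echo "done now"` marker that only appears if `npm start` returns. The sketch below writes such a file with a heredoc and makes it executable. The directory `/data/project/wm-commons-emoji-bot/www/js` and the `app_env/bin/activate` location are assumptions pieced together from the paths quoted in the log; substitute whatever `pwd` reports in the real project directory.

```shell
#!/bin/bash
# Write the wrapper wmeb.sh into the current directory.
# Quoting 'EOF' stops the current shell from expanding anything inside it.
cat > wmeb.sh <<'EOF'
#!/bin/bash
# Grid jobs start in an unrelated working directory, so cd by absolute path.
# Assumed location, inferred from the app_env path seen in the .err file.
cd /data/project/wm-commons-emoji-bot/www/js || exit 1

# Activate the node environment so its node/npm come first on PATH.
source app_env/bin/activate

# Debugging: dump the environment before the long-running bot starts.
printenv

# Start the bot; while it is healthy this should not return.
npm start

# Marker: if this appears in the .out file, npm start has exited.
echo "done now"
EOF

# The script is given to jstart directly, so it must be executable.
chmod +x wmeb.sh
```

The jstart call stays outside the script: run the snippet from the tool's home directory and submit the result as in the `jstart -N WM-Emoji-Bot -mem 1G wmeb.sh` command quoted above.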
[21:05:26] no
[21:05:36] now yes
[21:06:44] ok
[21:06:58] you probably have to add some debugging again in there, if you don't have it already
[21:07:03] i don't have .npm/logs/
[21:07:04] NODE DEBUG or whatever it is
[21:07:16] and the output wll hopefully show up in the right plce
[21:07:36] you have it if you're inthe right directory
[21:07:46] I've got to try to get some sleep, I'm sorry
[21:07:55] ok
[21:08:02] hopefully someone here can pick up with you, or you can carry on step by step yourself
[21:08:13] just remember how you debugged the bot before
[21:08:21] I think I can update the repo
[21:08:25] ok
[21:08:32] I think what is on master is good
[21:08:37] great
[21:09:01] I'll look here tomorrow to find out how it went
[21:09:22] I guess you want to make sure you reference the grid engine in any of your docs and not webservice
[21:09:30] even if in the end you put that it's not yet working
[21:09:43] I'll look for the deployment manual patchset tomorrow too
[21:10:04] I will be around tomorrow during the day but at like 4 your time I'm gone....
[21:10:23] ok I will update the docs
[21:11:11] see you tomorrow, or see your messages of success and your commits tomorrow!
[21:13:49] ok
[21:14:14] I will have to make some merges before I update the code on toolforge
[21:15:41] ok, good luck!!
[21:15:53] thanks
[22:32:47] ores.wmflabs.org shows as down for me
[22:32:52] in the cloud
[22:32:56] it shows the private ip
[22:33:04] but ping shows 100% loss
[22:33:05] cc andrewbogott ^^
[22:33:10] oh fun!
[22:33:20] it works in my browser
[22:33:32] but running ping inside the cloud show 100% packet loss
[22:33:44] PROBLEM - Host Experimental ORES Website is DOWN: PING CRITICAL - Packet loss = 100%
[22:34:01] what IP does it resolve to?
[22:34:12] PING ores.wmflabs.org (208.80.155.156) 56(84) bytes of data.
[22:34:16] when i ping
[22:34:21] when i run "host" it shows
[22:34:25] 10.68.21.68
[22:34:43] for me it resolves to 10.68.21.68
[22:34:46] and works….
[22:35:08] hmm
[22:35:11] host works
[22:35:14] but ping fails
[22:35:15] 208.80.155.156 is the proxy
[22:35:27] doesn't really matter what it is
[22:35:36] other than that it's a labs public IP which you won't be able to route to from within labs
[22:35:45] krenair@bastion-01:~$ dig ores.wmflabs.org @labs-recursor0.wikimedia.org +short
[22:35:45] 10.68.21.68
[22:35:45] krenair@bastion-01:~$ dig ores.wmflabs.org @labs-recursor1.wikimedia.org +short
[22:35:45] 10.68.21.68
[22:35:47] though strange thing is
[22:35:51] it works on gerrit-test3
[22:35:51] I've tried on two different hosts and it routes properly for me
[22:35:55] but fails on gerrit-mysql
[22:35:57] paladox: have you tried anywhere else?
[22:36:07] yup
[22:36:12] works on gerrit-test3
[22:37:38] Yeah I see that behaviour on gerrit-mysql.git.eqiad.wmflabs
[22:37:52] something must have cached the broken dns response somewhere
[22:38:27] hmm though i did restart it earlier today i think
[22:38:28] 3:43
[22:45:16] oh there we go paladox, andrewbogott
[22:45:24] oh?
[22:45:28] all it took was a `service nscd reload`
[22:45:34] ah
[22:45:35] works
[22:45:37] ping is now getting the right IP
[22:45:39] thanks
[22:46:14] yup it's recovered now, thanks!
[22:46:16] "RECOVERY - Host Experimental ORES Website is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms"
[22:53:33] fwiw, nscd is not installed on random prod debian machines
[23:06:08] !log tools fixed permissions of tools-package-builder-01:/srv/src/tools-webservice
[23:06:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:06:54] mutante, yeah I think it's put in as part of the labs base images
[23:07:00] firstboot.sh interacts with it
[23:07:17] modules/labs_vmbuilder/templates/vmbuilder.cfg.erb:addpkg = [...], nscd, [...]
[23:07:41] i see.. *nod*, thx
[23:07:53] it's also added by ldap::client::nss
[23:08:23] which of course all labs instances have
[23:16:01] !log tools published toollabs-webservice_0.41_all.deb
[23:16:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:29:57] (PS1) Rosalieper: Created a developer manual for project in docs/devManual [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/452584 (https://phabricator.wikimedia.org/T190163)
[23:31:48] !log tools rebuilding docker images for webservice upgrade
[23:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:55:25] how do you leave a project? is it not possible if you are a user but no admin?
[23:58:52] so if you are an admin and you remove yourself as member.. boom it logs you out and says "
[23:58:55] Unauthorized. Please try logging in again.
[23:59:12] and logs you out
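The ORES and Gerrit symptoms in this log came from two name-resolution paths disagreeing. `host` and `dig` query the DNS recursors directly, while `ping` resolves through the C library's NSS path, which on these instances can be answered from nscd's cache; that is why `service nscd reload` fixed it. The sketch below compares the two paths. `getent ahostsv4` goes through the same NSS path as `ping`. The helper names are hypothetical, and treating 10.0.0.0/8 as "instance-private" is an assumption based on the 10.68.x.x addresses seen here.

```shell
# is_private_ipv4 ADDR: succeed if ADDR is a dotted quad in 10.0.0.0/8.
# Assumption: instance-private addresses here look like 10.68.21.68.
is_private_ipv4() {
  local re='^10\.([0-9]{1,3}\.){2}[0-9]{1,3}$'
  [[ $1 =~ $re ]]
}

# compare_resolvers NAME: print what a direct DNS query returns and what
# the libc/NSS path (the one ping uses, possibly via nscd) returns.
compare_resolvers() {
  local name="$1" dns libc
  dns="$(dig +short "$name" | tail -n 1)"
  libc="$(getent ahostsv4 "$name" | awk '{print $1; exit}')"
  echo "dig:    ${dns:-<none>}"
  echo "getent: ${libc:-<none>}"
  if [ "$dns" != "$libc" ]; then
    echo "DNS and libc disagree; a stale cache (e.g. nscd) may be involved" >&2
    return 1
  fi
  if ! is_private_ipv4 "$libc"; then
    echo "public address returned; it may not be routable from inside" >&2
    return 1
  fi
}
```

In the ORES case the mismatch reported in the channel was 10.68.21.68 from `host` against 208.80.155.156 from `ping`, which cleared once nscd's cache was reloaded.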