[09:17:31] Hello everyone! :) I wanted to know something. I had a very long bash script. Around 40 000 lines. I tried running it on the terminal of Git Bash and it said that it was too long, unsurprisingly. Then I tried submitting it as a job at the job grid, mostly to see what would happen and it shows the job is running with no errors. Can someone tell me
[09:17:31] what's really happening? Can the job grid run big scripts like these while the terminal can't?
[09:34:40] !help Please, see above.
[09:34:40] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[09:35:32] Bigem: heavy tasks should run in the grid, indeed
[09:35:59] CAN they be run though? Or is it just a sort of glitch?
[09:36:35] Bigem: you should NOT run any workload on the bastions. Use the grid for that. Use the bastions only to schedule stuff in the grid.
[09:37:13] arturo, I know. I was just testing if the script was correct. I stop the operation as soon as it starts.
[09:37:32] I just want to know if the grid can run heavy jobs like that or not.
[09:37:59] Bigem: the bastion is just a linux box with a shell interface. We have a strong resource limitation in place for users in the bastion
[09:38:17] the grid can run anything. You can run month-long jobs, no problem
[09:38:48] Bigem: that being said, I don't know the details or nature of the failure you are reporting
[09:39:35] specifically, I'm not sure what does this means: "I tried running it on the terminal of Git Bash.."
[09:42:44] arturo, there is no failure per se. I tried running the 40 k lines task at the terminal (a simple list of find and replace regex-es for a pywikibot). It said that it had too many arguments and it couldn't be run. Then I wrote a wrapper script containing those 40 k lines and tried running on the job grid. It says it is still running. I wanted to
[09:42:45] know if that's really true or not, somehow because I didn't believe that it would be able to run. It shows no errors but it shows no "out" output too because of the nature of the regex-es (most likely they won't be able to find anything soon enough) and so I was not sure if the job was actually running or not truthfully.
[09:44:24] Bigem: if you didn't solve the input argument errors, it is very likely it is still failing in the grid
[09:46:13] arturo, I don't believe there really was any error involved apart from the fact that there were just too many find and replace regex-es for it to handle.
[09:47:51] Bigem: perhaps this can help: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Using_%E2%80%98qstat%E2%80%99_to_return_status_information
[09:48:48] arturo, yes, I tried that. It reads that it still is running.
[09:49:38] perhaps you can cancel the job and add some logging messages to your program, so you can follow the execution by reading the logs
[09:53:55] Unfortunately I'm not that versatile in programming yet. :P The idea of my question was simpler: Can the grid handle a heavy job like that? You told me that it can, basically so it's all right.
[09:59:19] ok :-)
[12:46:22] ohmy 40KLOC of bash
[12:46:31] * joakino hides in a corner and shivers
[17:34:29] All of my Toolforge cron jobs are failing.
[17:34:31] error: commlib error: got select error (Connection refused)
[17:34:40] error: unable to send message to qmaster using port 6444 on host "tools-sgegrid-master.tools.eqiad.wmflabs": got send error
[17:34:52] Also cannot qstat on the bastions
[17:37:04] !log tools service gridengine-master restart on tools-sgegrid-master
[17:37:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:39:58] JJMC89[m]: any better now?
[17:40:15] I can qstat now
[17:40:27] + jobs actually run?
[17:41:46] JJMC89[m]: ?
[17:42:06] I started one manually and it seems to be running fine.
[17:42:16] Will check cron in 5
[17:42:24] cool
[18:23:30] andrewbogott: cron looks good too
[22:29:45] !log tools draining tools-k8s-worker-58, tools-k8s-worker-56, tools-k8s-worker-42 for migration to ceph
[22:29:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:02:06] !log tools uncordoned tools-k8s-worker-58, tools-k8s-worker-56, tools-k8s-worker-42 for migration to ceph
[23:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:03:29] !log tools depooled tools-sgeexec-0941 and tools-sgeexec-0939 for move to ceph
[23:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:20:53] !log tools repooled tools-sgeexec-0941 and tools-sgeexec-0939 for move to ceph
[23:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
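Note on the 09:42-09:49 exchange: the two suggestions made there (avoid one huge argument list, and add logging so the job's output shows progress) can be sketched roughly as below. This is only an illustration under assumptions, not what anyone in the channel actually ran. It assumes the original failure was the shell's "Argument list too long" limit, and that the find-and-replace arguments live one per line in a text file. The names `log`, `run_batches`, the pairs file and the batch size are all hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch: feed arguments to a command in small batches
# and log each step, so the job's output file shows it is alive.

# Print a timestamped message to stdout.
log() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# run_batches PAIRS_FILE BATCH_SIZE COMMAND [ARGS...]
# Reads PAIRS_FILE line by line (each line becomes one argument) and
# runs COMMAND once per BATCH_SIZE lines instead of once for all lines.
run_batches() {
    local pairs_file=$1 batch_size=$2
    shift 2
    local -a batch=()
    local line count=0
    while IFS= read -r line || [ -n "$line" ]; do
        batch+=("$line")
        if [ "${#batch[@]}" -ge "$batch_size" ]; then
            count=$((count + 1))
            log "starting batch $count (${#batch[@]} arguments)"
            "$@" "${batch[@]}" || log "batch $count exited with status $?"
            batch=()
        fi
    done < "$pairs_file"
    # Run whatever is left over after the last full batch.
    if [ "${#batch[@]}" -gt 0 ]; then
        count=$((count + 1))
        log "starting batch $count (${#batch[@]} arguments)"
        "$@" "${batch[@]}" || log "batch $count exited with status $?"
    fi
    log "finished $count batches"
}
```

For a dry run, something like `run_batches pairs.txt 200 echo` prints each batch instead of executing anything; `echo` would then be swapped for the real command. Each line is passed through as-is, so regex-es containing spaces stay intact as single arguments.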