[07:53:12] hi. my bot started to hang when run on the grid. if I manually ssh to one of the grid machines and run it with the same command line - it works fine. I woke up today to 740 jobs hanging in qstat and had to kill them. it would be really nice if I could connect to one of the hanging processes with gdb, but I don't have enough access rights. what do I do? [08:01:51] leloiandudu: I could help [08:02:09] what's the bot? [08:04:46] arturo: chie-bot [08:05:26] ack [08:05:28] not sure what happened, I didn't touch the code. have we recently installed anything mono-related? [08:05:37] oh yeah [08:05:48] we upgraded the mono framework yesterday [08:05:58] from 3.x to 5.x [08:06:06] hmmm [08:06:39] but still, if I run it manually - it finishes fine [08:06:45] it's eating cpu now [08:06:46] I sent an email to the mailing list, hoping to reach people [08:07:10] do you have any local mono framework version installed? [08:07:14] so it would be great if I could connect a debugger to it and check the call stack [08:07:59] arturo: I ran it manually on one of the grid machines (e.g. tools-exec-1428), even though there was another instance on that machine eating CPU [08:09:01] ack [08:09:38] sorry for the inconvenience leloiandudu :-/ [08:10:14] see T194665 for context [08:10:15] T194665: Provide an up-to-date mono environment on toolforge - https://phabricator.wikimedia.org/T194665 [08:10:31] * arturo finishing breakfast [08:13:45] arturo: thanks! sorry for interrupting your breakfast [08:14:45] looks like it's not only my bot. this is what I see on tools-exec-1430: https://pastebin.com/8EX5Bick [08:14:57] only the third process belongs to my bot [08:15:07] two others are unrelated and they also eat CPU [08:46:41] is it possible to use redis? [08:57:40] mbh is aware of the mono framework upgrade [08:57:49] the first one I don't know [08:59:57] leloiandudu: which process should I debug? [09:00:19] what does this cmdline mean? `mono --runtime=v4.0.30319`?
[09:01:40] arturo: it selects the runtime version. in mono 3, if you don't specify that, it will select the (older) v2 runtime [09:01:59] arturo: pid 25166 on tools-exec-1430 [09:02:07] but we have now mono v5.x, right? [09:02:14] runtime version != mono version [09:02:19] oh ok [09:03:27] arturo: https://stackoverflow.com/a/39249750/ [09:03:38] try this to get the stack trace [09:04:24] the bot may be stopped, right? [09:05:13] yes [09:06:15] if it doesn't work with that process - feel free to kill it, I have a bunch more [09:08:24] leloiandudu: in theory the stack trace should be now in stdout of the bot [09:08:34] not sure where can I fetch it [09:09:08] leloiandudu: BTW, please could you upgrade the U-A of your bot according to https://meta.wikimedia.org/wiki/User-Agent_policy to reflect that you're using mono? :) [09:12:45] vgutierrez: ok, I will, thanks [09:12:59] awesome, thx :D [09:13:36] I had your bot in the list of bots using old TLS settings.. but I couldn't track it as one of the bots being affected by the mono update @ toolforge :( [09:14:10] arturo: the stdout is ignored I'm afraid [09:15:59] if I run a simple bt gdb command, I don't see symbols names [09:16:01] https://www.irccloud.com/pastebin/qyqlAIgS/ [09:16:12] arturo: let's submit it again with output redirect? [09:16:25] leloiandudu: ok, let's try [09:16:47] arturo: hmm you can open stdout from gdb [09:17:11] arturo: https://stackoverflow.com/questions/1323956/how-to-redirect-output-of-an-already-running-process [09:17:16] it doesn't hang every time, so give me some time plz [09:17:51] of course before closing FD 1 (stdout) check where is pointing to, to be able to restore it afterwards [09:19:44] ack [09:21:05] leloiandudu: see if this helps [09:21:08] https://www.irccloud.com/pastebin/IUS7aQs3/ [09:21:11] arturo: pid 3883 on tools-exec-1413 [09:21:32] arturo: are toolsdb users aware of the upcoming outage? 
[09:21:47] jynus: I'm not aware myself [09:21:55] that is quite bad [09:22:16] there is switch maintenance today, we were told [09:22:35] arturo: it is stuck in a spinlock accessing Wikipedia's API via HTTP(s) [09:22:46] sounds very much like a bug in this version of mono to me [09:23:08] vgutierrez: ^^^ [09:23:45] jynus: I saw arzel talking about it. A complete DC row, right? there should be some WMF-level announcement? [09:24:15] leloiandudu: do you have your bot source code published somewhere? [09:24:31] yes, https://bitbucket.org/leloiandudu/chiebot/ [09:24:50] awesome :D [09:25:16] vgutierrez: https://bitbucket.org/leloiandudu/chiebot/src/9d504b06fc31bcab6166d3b8c4cb9a6182e65c39/Browser.cs?at=default&fileviewer=file-view-default#Browser.cs-64 [09:25:26] this is the line it got stuck at [09:25:46] arturo: https://phabricator.wikimedia.org/T187962 [09:26:26] arturo: it seems chase was in the loop https://phabricator.wikimedia.org/T187962#4198186 [09:26:44] leloiandudu: could you set a sane timeout? the default one is 100 seconds [09:26:52] jynus: I see the comments now. We should be fine [09:26:57] leloiandudu: https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.timeout(v=vs.110).aspx [09:27:23] vgutierrez: I can, but it won't help. it is stuck in a spin lock, eating CPU [09:27:34] this should never happen [09:28:34] it really sounds like a bug in the runtime [09:29:42] funny.. HttpWebRequest has been rewritten for mono 5.12 [09:29:47] leloiandudu: could you change the runtime version and test? You said runtime != mono version [09:29:52] http://www.mono-project.com/docs/about-mono/releases/5.12.0/#class-libraries [09:30:30] vgutierrez: that sucks [09:31:00] arturo: I can test anything, just tell me which version to use haha [09:31:13] leloiandudu: which runtime version are you using right now? [09:31:33] --runtime=v4.0.30319 [09:31:40] let's try the default one?
[09:32:51] leloiandudu: another stack trace (pid 3883 on tools-exec-1413) [09:32:53] https://www.irccloud.com/pastebin/4mvixrht/ [09:33:51] is everything ok on the server? my ssh session suddenly got frozen. I opened a second one and cannot get past the logon screen [09:34:39] vim won't open, just isn't doing anything until I press ctrl+c [09:35:17] I'm inside [09:37:03] leloiandudu: which host? [09:37:09] jynus: do you consider it would be fine to send an email to cloud-announce@l.w.o mentioning a possible outage of toolsdb? [09:37:20] zhuyifei1999_: I guess tools-exec-1413 [09:37:30] arturo: tools-bastion-03, suddenly it's fine again [09:38:18] perhaps somone was running some IO-intensive process on that host [09:39:43] vgutierrez: I removed the "--runtime=v4.0.30319" and it doesn't freeze anymore [09:40:00] leloiandudu: interesting, which runtime is using now? [09:40:31] I'm not sure, let me make a quick test [09:43:22] Environment.Version outputs the same "4.0.30319.42000" regardless of --runtime flag, so I'm not sure [09:43:40] wow, vgutierrez thanks, you are great! [09:44:50] leloiandudu: sorry for all the noise [09:45:37] arturo: vgutierrez: thank you very much guys, I will monitor it for some time and if anything goes wrong again, I will let you know :D [09:45:46] sure thanks! [09:46:07] leloiandudu: thx for your feedback :D [09:46:42] vgutierrez: what do you want me to state in the UA? mono or .NET? [09:47:42] leloiandudu: both :) [09:48:16] ok, but I don't really have dependencies on mono itself, so not sure which version number should I output [09:48:53] current mono version and current .NET runtime version used [09:49:05] and perhaps bot name, if that's not present already [09:49:22] that's already there [09:49:37] he's currently sending the bot name, his email and repo link (bitbucket) [09:49:40] cool, thanks! will do [09:49:44] almost perfect <3 [09:49:52] ^_^ I tried! 
[09:50:41] put some spam link in case some engineer is staring at logs on the other side :-P [09:53:59] lol [09:55:32] leloiandudu: BTW; you didn't recompile your bot, right? [09:56:01] I mean.. you compiled it with the old mono 3.2.8 and now it's running with 5.12.0, right? [09:56:22] I didn't. since February lol [09:56:57] it shouldn't matter because the produced binary should be identical (it only contains IL after all, no real code) [09:57:19] vgutierrez: and it doesn't link any libs statically [09:58:52] maybe from mono 3.2.8 to 5.12.0 some mcs bugs have been fixed O:) [09:59:30] vgutierrez: the new mono will allow me to use new C#7 features though, so I'm excited! gonna try that soon to update the UA [09:59:50] vgutierrez: true [10:01:48] leloiandudu: we are pushing this update cause we needed your bots to be able to speak modern TLS [10:02:04] vgutierrez: actually now Roslyn is used by default instead of mcs: https://www.mono-project.com/docs/about-mono/releases/5.0.0/#c-compiler [10:02:09] awesome! [10:02:24] mono 3.2.8 was stuck in TLS 1.0 and some crappy ciphersuites [10:02:51] I see haha. it was a really old version anyway [10:04:11] arturo: vgutierrez: ok, bad news. it stuck again. pid 24283 on tools-exec-1428 [10:04:30] consumes 97% cpu again [10:05:38] leloiandudu: just to be safe, could you recompile it with the current version and see what happens? [10:05:46] vgutierrez: ok [10:05:57] thx [10:07:58] it runs every 5 minutes, so it means not every run results in locking [10:11:43] ok, I can get more stack traces if required, just ping me [10:13:49] btw, I'm using this to email the output in case the bot fails: https://pastebin.com/exa85ijB is this the right way to do that? [10:28:46] leloiandudu: why do you map stdout and stderr to null? [10:29:43] oh wait, you mapped stderr to new stdout and old stdout to null [10:32:14] just some style issues: use single quotes. 
charset="utf-8" <= those double quotes are removed [10:32:39] otherwise it looks fine to me, although I might write it differently [11:31:45] zhuyifei1999_: oh, thank you! [11:31:59] good catch about the quotes [11:32:02] np [11:32:59] is there a way to submit a grid task with a timeout? so it will get killed after the timeout elapses [11:33:35] leloiandudu: I don't know about grid but timeout is a pretty standard command [11:33:36] arturo: pid 21729 on tools-exec-1430 [11:33:55] ack [11:34:03] do you have stdout under control? [11:34:19] we could strace [11:34:30] strace won't tell the function, right? [11:34:51] leloiandudu: $ time timeout 1s sleep 2 -> real 0m1.001s [11:35:00] strace would tell any write it does to stdout, though truncated [11:35:01] arturo: no, it's redirected to /dev/null [11:35:14] ack [11:35:37] jynus: thanks [11:35:59] leloiandudu: I suppose you could run qdel after sleeping for some time [11:36:06] give me a second, I closed all the stackoverflow tabs :-P I will put the info somewhere in wikitech [11:36:09] { sleep 10m; qdel $jobId; } & [11:36:47] Lucas_WMDE: I thought about it, but my bash-fu is not that strong haha [11:37:00] it's stuck in a futex call [11:37:23] why is it consuming CPU? [11:37:43] are futexes implemented via spinlocks? [11:38:00] good question, /me looks at the threads [11:38:31] thanks [11:38:41] thread with PID 21746 is consuming the CPU [11:38:58] I know, but I don't know why [11:39:12] zhuyifei1999_: previous stack trace: https://www.irccloud.com/pastebin/raw/IUS7aQs3 [11:39:34] leloiandudu: stack trace [11:39:37] https://www.irccloud.com/pastebin/NM85k4so/ [11:39:49] arturo: thank you! 
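On the timeout question raised at 11:32:59: besides the `timeout` command and the `sleep`/`qdel` approach, a small wrapper can enforce the limit itself. This is only a minimal sketch, assuming the job is launched from a Python wrapper; `run_with_timeout` is a hypothetical helper name and not part of any grid tooling.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments); return its exit code, or None on timeout."""
    try:
        # When the timeout expires, subprocess.run kills the child,
        # waits for it, and then raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

Something like `run_with_timeout(["mono", "my.exe"], 240)` would then stop a run that is still spinning shortly before the next 5-minute cron slot; both the command and the 240 seconds are placeholders.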
[11:40:20] it is quite the same as the last time [11:40:41] something deep inside Monitor.Lock spins instead of locking [11:40:49] and it is stuck [11:41:06] sorry, Monitor.Wait [11:41:48] (since once can't attach gdb to an already attached process, I guess I'll leave this to arturo :) ) [11:42:24] zhuyifei1999_: leaving gdb session now [11:45:05] ok so PID 21746 is stuck in user mode, probably spinning [11:45:28] is other threads waiting for this thread, similar to GIL in python? [11:47:02] other threads, except for this one and 21740, are doing futex() [11:47:30] 21740 is doing poll() [11:50:27] zhuyifei1999_: on my side, I don't create any threads and just make a blocking i/o call, but underneath, HttpClient probably spawns threads [11:51:31] my process is not the only one who gets stuck after the yesterday's mono upgrade. look at the 'top' [11:51:38] I'm trying to figure out what is it polling on. http? strace only gives restart_syscall() and gdb `bt full` says poll has no locals :( [11:51:58] oh no [11:53:55] here's the implementation of the function in question (System.Threading.Monitor.Monitor_wait ) https://github.com/mono/mono/blob/ae5d45e55959247807b860acffd6ebd07de7f13b/mono/metadata/monitor.c#L1338 [11:54:05] in the installed mono version (5.12) [11:54:23] this may help? http://www.mono-project.com/docs/debug+profile/debug/ [11:54:51] i.e `mono --debug program.exe` [11:54:59] I could attempt to read the information about the poll() from /proc/kcore but that is ridiculously difficult without debugging symbols [11:55:40] zhuyifei1999_: debugging symbols should've come with mono [11:56:29] bt is full of ?? so it's unfortunately not available. at least not auto-discovered [11:58:19] wait, there should be a separate debian package for debugging symbols [11:58:27] not sure if installed [12:00:10] mono-devel? [12:00:22] it's installed [12:00:38] this very much looks like a bug in mono. maybe we should just report it to mono people? 
but I'm not sure what to do in the meantime [12:00:45] oh wait mono-runtime-dbg [12:00:55] `mono-runtime-dbg` [12:01:05] not installed. let's install it? [12:01:21] installed! [12:01:28] try again [12:01:59] leloiandudu: we could always rollback toolforge to the previous mono version, cc vgutierrez [12:03:19] arturo: is the previous version uninstalled now? [12:03:35] leloiandudu: yes, all packages were upgraded [12:03:41] I see [12:04:14] https://www.irccloud.com/pastebin/4wYZix7N/ [12:04:24] [12:04:51] could we lack some additional package in the framework? [12:05:31] https://www.irccloud.com/pastebin/nRQWJHGQ/ [12:05:39] what about the other hanging processes on that machine? someone is running that mono-sgen thing and it is stuck too [12:05:46] I don't even know what is mono-sgen [12:06:48] * arturo created https://wikitech.wikimedia.org/wiki/Help:Toolforge/Mono [12:06:57] this is 21746, so calloc in user mode. I don't think this information is useful, since calloc returns really fast, so it must be something repeatedly calling calloc [12:07:41] arturo: is compile-time optimization. makes running faster but debugging harder [12:08:45] I thought it was about hiding ugly and meaningless symbols and numbers [12:09:26] I highly suspect it's polling on mono 21729 tools.chie-bot 4u IPv4 18103547 0t0 TCP tools-exec-1430.tools.eqiad.wmflabs:51523->text-lb.eqiad.wikimedia.org:https (CLOSE_WAIT) [12:09:44] how did you get the stack trace? [12:09:54] zhuyifei1999_: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Mono#Debugging_a_Mono_tool [12:10:37] could you get the stack of 21740? [12:10:53] on `tools-exec-1430`? sure [12:11:02] (thanks for writing the docs btw) [12:11:46] arturo: you are awesome. writing the wiki so quickly haha [12:13:21] zhuyifei1999_: not a lot of info this time... 
[12:13:25] "Thread Pool I/O Selector" tid=0x0x2b02347ad700 this=0x0x2b0219a708e0 , thread handle : 0x2b0238000f40, state : waiting [12:13:34] bt is [12:13:36] https://www.irccloud.com/pastebin/UAK4xnDP/ [12:13:52] yeah that bt I posted [12:14:15] yeah [12:16:07] zhuyifei1999_: according to the mono stack that was posted before (https://www.irccloud.com/pastebin/raw/NM85k4so) it waits for the result from another thread. not sure why this causes a spinlock. I don't think it should [12:16:55] so now every thread except 21740 and 21746 is waiting in kernel mode; 21746 is spinlocking in user mode; 21740 is polling for something [12:17:11] the bt of those thread waiting in kernel mode isn't exactly the same [12:17:35] https://www.irccloud.com/pastebin/wHEF4ugz/ [12:17:42] zhuyifei1999_: one of the threads waiting might just be a parked thread pool thread [12:18:07] please check this list to know if we may lack some additional package. lines starting with 'i' are installed packages, starting with 'p' are not installed [12:19:07] https://phabricator.wikimedia.org/P7170 [12:19:25] arturo: rollback to 3.2.8 or something not as new as 5.12? [12:19:34] (apparently paste too big for irccloud) [12:20:00] vgutierrez: problem is that we don't have anything in the middle which is good for TLS [12:20:14] I have no idea about those packages :( [12:20:25] arturo: hmmm mono-project provides versions from 5.0 IIRC [12:22:02] hey, an idea. when I submit the task, I submit it with -mem 900m [12:22:10] maybe I should increase the limit? [12:22:25] when I run it without the grid engine, it never gets stuck [12:22:29] arturo: I'll try to thread apply all mono_stack. might write to your foo3 file [12:22:57] leloiandudu: what do you mean without grid engine? [12:23:01] zhuyifei1999_: no problem, go ahead. You need me to do something? 
[12:23:06] leloiandudu: I guess yeah you could try that, considering it's calloc stuck in calloc [12:23:16] *calloc stuck [12:23:21] vgutierrez: just manually running "mono my.exe " on one of the grid machines [12:23:28] ack [12:23:37] so yeah.. it could be related to memory limits [12:23:38] actually, let's see if calloc ever returned [12:24:45] ok, replaced with 2000m [12:25:05] did I screw up? [12:25:08] https://www.irccloud.com/pastebin/cS6H8n1j/ [12:26:04] https://www.irccloud.com/pastebin/u3dDpOcP/ [12:26:07] zhuyifei1999_: if you did, we have plenty of other processes that are stuck [12:26:32] just tell me if you need another one [12:26:52] let me see if I can resume the process. not confident about this... [12:27:39] https://github.com/mono/mono/issues/7356 --> memory leak in HttpWebRequest only on TLS connections apparently [12:27:42] for other processes I'd have to debug again, might also have to ask for their permission first :( [12:28:00] and apparently it's fixed, but it's still happening with proxied connections [12:28:33] toolforge instances have direct access to wikipedia or they go through a http(s) proxy? [12:28:48] vgutierrez: omg, you are right. it consumed ~880mb [12:28:53] I didn't notice [12:29:13] I almost don't allocate memory [12:29:20] so it must be the leak [12:29:34] vgutierrez: direct access afaik [12:29:41] zhuyifei1999_: thx [12:30:15] in case you guys need more lab rats: pids 3979 and 4299 on tools-exec-1441 [12:30:35] vgutierrez: the lsof entry is mono 21729 tools.chie-bot 4u IPv4 18103547 0t0 TCP tools-exec-1430.tools.eqiad.wmflabs:51523->text-lb.eqiad.wikimedia.org:https (CLOSE_WAIT) [12:30:52] leloiandudu: looking [12:31:54] I don't know how to force rewinding this stack. if I quit this gdb session it will terminate (might have core dump). you okay with that? or what should I do? 
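The "it consumed ~880mb" observation can be checked without a debugger by reading `/proc/<pid>/status` on the exec host. A rough sketch follows; the log does not say whether the grid's `-mem` limit applies to resident or virtual memory, so the helper takes the field name (`VmRSS` or `VmSize`) as a parameter. The function names and the 90% threshold are illustrative assumptions.

```python
import re


def status_field_kib(status_text, field):
    """Return a memory field (in kiB) from the text of /proc/<pid>/status, or None."""
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB$"
    match = re.search(pattern, status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def near_limit(used_kib, limit_kib, fraction=0.9):
    """True if usage is known and at or above the given fraction of the limit."""
    return used_kib is not None and used_kib >= fraction * limit_kib
```

For a live process, the text would come from something like `open("/proc/21729/status").read()`, run as a user allowed to read it.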
[12:32:27] zhuyifei1999_: yup, direct access [12:32:30] yes, you can kill any of them [12:32:44] leloiandudu: would you mind opening a phab task about this issue? So we don't lost track of this and more people can contribute to it [12:32:53] arturo: ok [12:34:08] leloiandudu: please reference T194665 [12:34:08] T194665: Provide an up-to-date mono environment on toolforge - https://phabricator.wikimedia.org/T194665 [12:35:42] so... leloiandudu what happens if you grant the process more than 900mb? [12:36:12] vgutierrez: I just edited the crontab, we will see soon [12:36:18] ack [12:36:57] arturo: can I reference irccloud pastebin links? or will they expire soon? [12:37:17] leloiandudu: not sure. Better if you copy&paste [12:37:35] arturo: ok, thanks [12:39:08] arturo: can I go and install that mono-runtime-dbg on tools-exec-1441? [12:39:17] zhuyifei1999_: yeah [12:39:41] (k, doing) [12:40:08] arturo: T195834 [12:40:08] T195834: mono-based bot hangs after mono version upgrade - https://phabricator.wikimedia.org/T195834 [12:40:16] what is mono-runtime-boehm though? it's a dependency [12:40:34] zhuyifei1999_: bohem should be a GC implementation [12:40:49] I'm not sure if it's still used in modern mono [12:40:57] it's a really old one [12:43:00] https://en.wikipedia.org/wiki/Boehm_garbage_collector [12:43:50] weird. why would installing debugging symbols install a GC [12:44:12] indeed [12:46:04] sorry, I need to go now. please feel free to kill any bot processes or completely stop the bot if it causes too much trouble to the grid. I will check back in about 10 hours [12:46:36] https://pastebin.com/aij1yKuf list of currently stuck processes [12:46:37] thanks leloiandudu :-) and sorry for all the noise for your tool [12:46:49] ok [12:47:00] no new processes since I increased the memory limit to 2k [12:47:36] arturo: zhuyifei1999_: vgutierrez: thank you very much guys! 
I hope you will find a solution to this [16:21:49] ok, thanks to chicocvenancio and zhuyifei1999_ we have now a possible explanation for T195834 [16:21:50] T195834: mono-based bot hangs after mono version upgrade - https://phabricator.wikimedia.org/T195834 [16:22:15] so I won't rollback the packages, and wait for reports of whether increasing the memory works or not [16:23:30] thanks folks! you are awesome! [16:28:58] ;) [16:30:00] hello, does anyone know of a good example of a Node app on Toolforge that has a webservice? I am working on https://tools.wmflabs.org/sql-optimizer [16:30:56] maybe rillke has one or more? [16:31:04] I started the server with `webservice --backend=kubernetes nodejs`, the package.json is in www/js, and the package.json has a key/value for "main" (script entry point) [16:31:16] I can't even find an error log for the webservice [16:31:19] I have a forked logio install [16:32:00] for logs you can use kubectl to get it, or have a wrapper script to redirect the logs [16:34:23] the redirector I wrote https://github.com/toolforge/video2commons/blob/master/www/js/server.js [16:35:21] ah, thanks, I see log output now for the relevant pod. It says the server started, no errors [16:35:56] I think I'm trying to start a server with Hapi https://hapijs.com/ and make Toolforge use that, but am doing it incorrectly [16:36:36] I have it starting at 0.0.0.0:8080 [16:37:32] as with https://github.com/MusikAnimal/sql-optimizer/blob/master/server.js#L10-L15 (except that's the localhost configuration) [16:38:11] so maybe I'm using the wrong host/port? [16:38:29] the port you should listen on is specified in the env by process.env.PORT [16:38:45] it's unclear if I need `webservice` at all, or if I can start the script myself manually within the pod [16:38:49] 0.0.0.0 is fine for a listen address [16:39:17] what do you mean by start it manually within the pod? 
[16:39:53] with `node ./server.dist.js` [16:40:17] the script itself will start the Hapi server [16:40:22] you mean start a `webservice shell`, then `node ./server.dist.js`? [16:40:30] yeah [16:40:45] webservice shell won't open any ports afaik [16:41:13] they are firewalled [16:41:45] and another issue is tools-proxy is unable to discover your service [16:41:58] let me ask this, how is `webservice start` used with a Node.js application? like, how does it know what script to run? [16:42:07] which host and port your service is listening on [16:42:26] I think it's always www/js/server.js, let me check [16:43:14] https://github.com/wikimedia/operations-software-tools-webservice/blob/master/toollabs/webservice/services/jswebservice.py#L20 [16:43:20] it invokes npm start [16:43:52] okay great, so that should be fine because my start script is `node ./server.dist.js` [16:44:16] https://docs.npmjs.com/cli/start [16:44:52] musikanimal: in case this fills in any details https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#node.js_web_services [16:45:11] yes indeed! I will update the documentation after I figure this out [16:46:31] oh, note that there is no path 'mounting' with node. your service will see every request prefixed with sql-optimizer [16:46:57] oh sorry, maybe I haven't read that section. I was looking at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Container_images [16:47:33] though there are workarounds to 'mount' it with your own code [16:52:53] okay, the port thing was definitely part of it, and I think the other issue is the 'mounting' you speak of. I'm on to something! tytyty [16:53:52] np [17:21:28] musikanimal: fyi https://phabricator.wikimedia.org/T195836 [17:22:24] zhuyifei1999_: yes! I think that is why the old SQL Optimizer script started failing [17:22:38] quarry also failed [17:22:47] with this tool I think I've found a workaround, I think...
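The two webservice points made in this exchange (listen on the port given in the `PORT` environment variable, and expect every request path to arrive prefixed with the tool name) are not specific to Node. The sketch below shows the logic in Python purely for illustration; the tool in question is a Node.js app, and the helper names and the fallback port 8000 are assumptions, not Toolforge defaults.

```python
import os


def listen_port(environ=os.environ, default=8000):
    """Port to bind: taken from PORT if set, else a local-development fallback."""
    return int(environ.get("PORT", default))


def strip_tool_prefix(path, tool):
    """Map '/<tool>/explain' to '/explain' so routes can be written unprefixed."""
    prefix = "/" + tool
    if path == prefix:
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    # Not under the tool prefix: leave the path untouched.
    return path
```

With these, a request for `/sql-optimizer/explain` would be routed as `/explain`, which is one way to do the 'mounting' in the service's own code.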
[17:23:23] * zhuyifei1999_ looks forward [17:26:26] or at least I had it reliably working at the hackathon [17:55:39] bd808: Hi, Do you have some time? [17:56:00] Neha16: sure! How are you today? [17:56:26] bd808: I am good. I have a few questions regarding T95097 [17:56:26] T95097: webservice and webservice-runner have no man pages - https://phabricator.wikimedia.org/T95097 [17:57:43] man pages. Never my favorite thing to write [17:58:14] bd808: Well, I am still learning those. [17:58:19] I wonder if we have any at all in that package yet? I'm guessing not [17:59:40] Neha16: this is an example of a man page in the toollabs package -- https://github.com/wikimedia/labs-toollabs/blob/master/misctools/become.1.in [18:00:36] bd808: Which commands need man pages? [18:02:18] webservice for sure, and probably nice to have one for webservice-runner too. [18:02:49] I do not think they need to be highly detailed. [18:03:20] bd808: Great. Should I write those manually? I think a lot of tools like sphinx are used to generate man pages. [18:06:14] Using sphinx to generate man pages from ReStructuredText is probably the "normal" way to do a man page for a python app. [18:06:36] I'm googling in the hope of finding a tutorial [18:11:02] !log re enabling puppet on planet-hotdog to apply https://gerrit.wikimedia.org/r/#/c/435327/ [18:11:05] paladox: Unknown project "re" [18:11:08] mutante ^^ [18:11:13] !log planet re enabling puppet on planet-hotdog to apply https://gerrit.wikimedia.org/r/#/c/435327/ [18:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Planet/SAL [18:12:41] Neha16: starting at http://www.sphinx-doc.org/en/stable/tutorial.html may help you. There are special configuration options for generating man pages from RST source docs too -- http://www.sphinx-doc.org/en/stable/config.html#options-for-manual-page-output [18:13:58] paladox: "re" and thanks :) [18:14:04] your welcome :) [18:14:17] bd808: Cool. I will start with that. 
Thank you so much :) [18:15:09] Neha16: I would not spend a lot of time on this particular task. If you are still stuck after a day or so of reading things online and playing with it you should make some notes on the task and then move on to something more interesting. [18:16:36] Neha16: tasks like T159892, T140415, and T156626 are more impactful for our users [18:16:36] T156626: k8s webservice restart failure with `ValueError: get() more than one object; use filter` - https://phabricator.wikimedia.org/T156626 [18:16:36] T140415: `webservice restart` does not always wait for service to stop before trying to start again - https://phabricator.wikimedia.org/T140415 [18:16:36] T159892: Make tools-webservice use the official kubernetes python client rather than pykube - https://phabricator.wikimedia.org/T159892 [18:17:08] zhuyifei1999_: https://tools.wmflabs.org/sql-optimizer/explain?sql=SELECT+COUNT%28rev_id%29+FROM+revision_userindex+WHERE+rev_user_text+%3D+%27MusikAnimal%27 [18:17:20] but it only runs on s1 [18:18:05] * zhuyifei1999_ looks [18:18:29] bd808: Okay, If this task is not done in a day or so. I will move on to one of the tasks you mentioned. [18:18:56] musikanimal: shall I go ahead and play with more complex queries? [18:18:59] Neha16: :) hopefully you will make progress! [18:19:04] zhuyifei1999_: do it! 
[18:19:36] bd808: Will try my best :) [18:19:45] it uses a `SET max_statement_time = 1` so it should be safe to run anything [18:19:58] the queries are auto-killed [18:20:34] https://tools.wmflabs.org/sql-optimizer/explain?sql=USE+commonswiki_p%3B%0D%0ASELECT+ipb_address%2C+ipb_reason%2C+ipb_by_text%0D%0AFROM+ipblocks%0D%0AWHERE+ipb_reason+REGEXP+%27%5B%5B%3A%3C%3A%5D%5D%28%3F%3AT129845%7CZ567%7CZ591%7CWP0%7Cwikipedia+%28%3F%3Azero%7C0%29%29%5B%5B%3A%3E%3A%5D%5D%27%0D%0AAND+ipb_user+%21%3D+0%0D%0AAND+NOT+REPLACE%28ipb_address%2C+%27+%27%2C+%27_%27%29+IN+%28%0D%0A++SELECT+page_title%0D%0A++FROM+page%0D%0A+ [18:20:34] +INNER+JOIN+categorylinks%0D%0A++ON+cl_from+%3D+page_id%0D%0A++WHERE+cl_type+%3D+%22page%22%0D%0A++AND+page_namespace+%3D+2%0D%0A++AND+cl_to+IN+%28%0D%0A++++SELECT+%27Users_suspected_of_abusing_Wikipedia_Zero%27+AS+catname%0D%0A++++UNION%0D%0A++++SELECT+page_title+AS+catname%0D%0A++++FROM+page%0D%0A++++INNER+JOIN+categorylinks%0D%0A++++ON+cl_from+%3D+page_id%0D%0A++++WHERE+cl_type+%3D+%27subcat%27%0D%0A++++AND+cl_to+%3D+%27Users_susp [18:20:34] ected_of_abusing_Wikipedia_Zero%27%0D%0A++%29%0D%0A%29 [18:20:42] it's https://quarry.wmflabs.org/query/20466 [18:20:47] haha oh dear [18:20:55] the old optimizer failed on this one [18:21:33] I get "not an EXPLAINable command" [18:21:49] probably because there are multiple queries [18:22:28] https://quarry.wmflabs.org/query/26631 not working either [18:22:53] I just input the USE commonswiki_p; and the main SELECT body [18:23:14] https://tools.wmflabs.org/sql-optimizer/explain?sql=SELECT+ipb_address%2C+ipb_reason%2C+ipb_by_text%0D%0AFROM+commonswiki_p.ipblocks%0D%0AWHERE+ipb_reason+REGEXP+%27%5B%5B%3A%3C%3A%5D%5D%28%3F%3AT129845%7CZ567%7CZ591%7CWP0%7Cwikipedia+%28%3F%3Azero%7C0%29%29%5B%5B%3A%3E%3A%5D%5D%27%0D%0AAND+ipb_user+%21%3D+0%0D%0AAND+NOT+REPLACE%28ipb_address%2C+%27+%27%2C+%27_%27%29+IN+%28%0D%0A++SELECT+page_title%0D%0A++FROM+commonswiki_p.page%0D%0A++ [18:23:14] 
INNER+JOIN+commonswiki_p.categorylinks%0D%0A++ON+cl_from+%3D+page_id%0D%0A++WHERE+cl_type+%3D+%22page%22%0D%0A++AND+page_namespace+%3D+2%0D%0A++AND+cl_to+IN+%28%0D%0A++++SELECT+%27Users_suspected_of_abusing_Wikipedia_Zero%27+AS+catname%0D%0A++++UNION%0D%0A++++SELECT+page_title+AS+catname%0D%0A++++FROM+commonswiki_p.page%0D%0A++++INNER+JOIN+commonswiki_p.categorylinks%0D%0A++++ON+cl_from+%3D+page_id%0D%0A++++WHERE+cl_type+%3D+%27subcat%2 [18:23:14] 7%0D%0A++++AND+cl_to+%3D+%27Users_suspected_of_abusing_Wikipedia_Zero%27%0D%0A++%29%0D%0A%29 [18:23:20] that works [18:23:38] I need to turn the query in the URL into a hash or something so it can be easily shared [18:24:13] so the issue here is there can only be one statement (i.e. no semicolons) [18:24:23] Huffman encode + base64? [18:24:35] which I think is fine [18:25:02] well, could you somehow make USE work? [18:25:19] I'm going to add a dropdown for you to select the db [18:25:54] I wouldn't want to declare the database for every single join [18:26:02] sure, that's understandable [18:26:23] I can probably make it work on multiple statements, but it's tricky [18:27:11] I don't think making multiple SELECTs work is going to be worth the time [18:27:25] yes that is my thought [18:27:39] so scratch that, ha [18:27:41] but USE sounds much simpler to me [18:27:51] I'll look into it! [18:27:55] k [18:28:27] by the way, base64 still can produce a huge hash, 960 characters for your query [18:28:33] any other ideas? [18:29:08] Huffman encode [18:29:16] before applying base64 [18:29:39] base64 increases the size, it doesn't decrease it... [18:29:49] ah I see [18:30:07] it changes binary to text [18:30:23] Huffman encode is like 'compressing' to binary [18:30:36] uh [18:30:47] right [18:30:53] I just need to find a suitable library for it [18:31:17] Huffman encode would need an explicit mapping... [18:31:41] this is complicated :( [18:32:06] deflate?
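A minimal sketch of the deflate-then-base64 idea raised here, using Python's standard `zlib` and `base64` modules. The function names `pack_query` and `unpack_query` are hypothetical, and the URL-safe base64 alphabet is an assumption chosen so the token can sit in a query string without further escaping.

```python
import base64
import zlib


def pack_query(sql):
    """Compress an SQL string with deflate (zlib) and encode it as URL-safe base64."""
    compressed = zlib.compress(sql.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def unpack_query(token):
    """Reverse pack_query: decode the base64 token and inflate it back to the SQL text."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")
```

The round trip is lossless, but base64 adds roughly a third on top of the compressed size, so short or low-redundancy queries may not shrink at all.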
[18:33:31] I'm just Googling these things as you mention them, that one looks promising [18:35:37] for https://quarry.wmflabs.org/query/26631 from the SELECT to ; I got 762 => deflated 434 => base64 588 [18:37:22] (deflate using zlib) [18:38:50] not bad [18:40:23] still quite long. I don't think you can send 588 bytes in a single message over IRC [18:41:28] like, idk if it'll worth [18:42:45] what is the intent here? have a small uri for each query? [18:42:46] 512 bytes is the limit on IRC IIRC, and that includes the full command not just the message [18:43:02] chicocvenancio: yes [18:43:44] does it have to have any relation to the query itself? [18:43:50] ideally I won't have to store them server-side in a database or whatever, but that's an option [18:44:26] I just need to be given a compressed string and get the original text from it, in this case an SQL query [18:44:29] formatted [18:44:56] I can't think of any better methods that losslessly compresses SQL without storing it [18:45:22] :( [18:45:31] well, the problem is sql queries can be quite long, event longer than 2000 chars for url... [18:45:31] we could potentially extract tokens from the query and reconstruct it [18:45:41] *even longer [18:46:16] I like how https://query.wikidata.org deals with this [18:46:27] which can get ugly fast [18:46:34] chicocvenancio: how? [18:47:08] you can send the query as a get param or in the post body [18:47:28] and for reference you get a shorturl service [18:48:15] but how does the shorturl work? stored queries somewhere? [18:48:33] it uses http://tinyurl.com [18:48:45] oh [18:49:03] musikanimal: ^ [18:49:07] (http://tinyurl.com is basically storing the queries, as urls...) [18:51:28] the api endpoint there is https://tinyurl.com/api-create.php [18:51:39] Is that kosher to use? Doesn't track or anything [18:51:41] https://tinyurl.com/api-create.php?url= [18:53:37] from a production service pov? not kosher at all [18:53:51] !help I'm the owner of webarchivebot. 
I'm unable to stop/restart my webservice [18:53:51] Amitie_10g: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [18:54:40] The job is stuck in the qstat list [18:54:55] And the Webservice is stopped. [18:55:03] musikanimal: if you leave the api-call server side and warn users of potential privacy issues, I think it is ok [18:55:58] musikanimal: /me don't understand https://github.com/MusikAnimal/sql-optimizer/commit/6c965a963e9009ca7cd52f58ea8a89d95215fc48 [18:57:02] Amitie_10g: job state is running [18:57:16] chicocvenancio: thanks, might look into that [18:57:22] what happens if you restart? [18:57:47] zhuyifei1999_: Node 6 doesn't support util-promisify, and apparently the polyfill didn't work either [18:59:36] I guess you could try something like https://github.com/zhuyifei1999/libslimer/blob/master/util.js#L5 ? [18:59:52] tools.webarchivebot@tools-bastion-03:~/public_html$ webservice stop [18:59:53] Your webservice is not running [19:00:25] it looks kind of ugly for a promise with settimeout and then resolve... [19:00:48] Amitie_10g: jstop the webservice should work [19:02:53] I got No job '' is currently queued or running [19:03:06] Using the job ID [19:03:35] jstop [19:03:59] in your case it is generic-webarchivebot [19:06:02] Stopped [19:06:27] Started, thanks [19:06:32] np [19:28:02] !help: I am trying to install nodeenv but can't use apt-get or any sudo command as I don't have root privileges [19:28:18] install on toolforge* [19:28:31] what is nodeenv? [19:29:20] a virtual environment for node applications [19:29:37] a competitor to nvm? [19:33:44] no [19:34:03] I don't think they serve the same purpose [19:37:09] do you have documentation for nodeenv you can point me to? is it this http://ekalinin.github.io/nodeenv/ ? [19:37:46] yea it is [19:38:02] that says to install with pip... 
[19:38:34] yes but pip is not installed on toolforge :( [19:38:48] so follow the local installation version [19:39:29] I will still need to install virtualenv [19:39:30] making use of sudo command [19:40:48] Virtualenv shouldn't require sudo. I use virtualenv on toolforge [19:40:59] There might be something I'm missing there? [19:41:14] * chicocvenancio is just as lost [19:42:08] maybe r054l13 could explain what you're trying to achieve with nodeenv? [19:45:09] bstorm_: could you pls link me to the installation doc you used [19:46:00] for virtualenv? [19:46:07] That's installed [19:46:31] chicocvenancio: I just need it to be able to place all the dependencies of my application in it pls [19:46:53] for node I usually go with nvm [19:47:04] but it seems nodeenv would do the job as well [19:47:26] what is your application and how are you running it? Grid or kubernetes? [19:47:34] kub [19:47:52] my app is a bot [19:48:13] I installed nvm and used it like you said yesterday [19:48:29] did you have a problem with that? [19:48:51] but then the version of node was too high for the app [19:49:13] it fails with node versions higher than 5.6 [19:49:25] r054l13: you can choose the node version with nvm [19:49:32] ok [20:06:43] r054l13: did you manage to get it working? [20:49:19] chicocvenancio[m: I have very poor connection presently sorry [20:50:08] I am experiencing something, the installation I made yesterday is no longer present on my toolforge account [20:50:41] Or am I hallucinating? or missing something? [20:55:43] r054l13: it should be there [20:56:29] But maybe you have to do `nvm use <version-number>` [23:13:45] tools-bastion-03 feels very slowwww [23:15:06] MaxSem: indeed [23:15:29] some root running some script without nice? [23:15:42] probably a user [23:15:46] non-root [23:16:48] !log tools.info-farmer killing scripts running on tools-login. Please submit to job grid instead. 
[23:16:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.info-farmer/SAL [23:17:09] https://www.irccloud.com/pastebin/DDIEb5eF/ [23:17:37] my bet is the sqlite dump [23:18:54] !help tools-bastion-03 very slow, seems to be iops [23:18:55] chicocvenancio: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team [23:19:22] bd808: didn't help. simple commands like crontab -e hang forever [23:19:35] iotop looks pretty boring [23:19:59] the dump is done [23:20:00] hmm, looks better now that I've written this [23:20:17] It probably was that dump [23:20:55] * bd808 looks to see who to poke about webarchivebot doing that on the bastion [23:21:15] they were here a few hours ago... [23:21:53] *nod* Amitie_10g [23:22:11] gone now. I'll leave a talk page note [23:29:10] !log tools.webarchivebot Left note on maintainer's talk page about using job grid for IO intensive operations rather than interactive scripts on bastions [23:29:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.webarchivebot/SAL [23:29:44] * bd808 considers a !talk command for a future bot