[01:21:34] /wc
[09:19:04] > !log tools SIGTERM PID 12542 24780 875 14569 14722. `tail`s with parent as init, belonging to user maxlath. they should submit to grid.
[09:19:04] zhuyifei1999_ weird, I haven't used any `tail` in the last few weeks
[09:19:31] probably some disowned processes?
[09:19:43] or nohup or some sort
[09:20:16] I could paste the ps output if you want
[09:20:29] (I still have it)
[09:23:03] if it's still running, yes
[09:24:08] I've sigtermed them, so they aren't really running anymore
[09:24:21] it was:
[09:24:26] https://www.irccloud.com/pastebin/hVfLEs83/
[09:26:41] I did run those, but I would have expected them to have been sigtermed when I closed the terminal or lost the connection. I can't remember any reason I would have tried to run those in the background
[13:30:15] maxlath[m]: I think when you close it, they should get something like SIGHUP, which by default terminates a process (and I don't think tail overrides this action)
[13:31:05] the processes have init as parent, so they were either double-forked or the parent process somehow died, allowing them to reparent to init
[13:32:58] I can't think of a reason why tail would have a double-forking parent; for the latter case I can only think of disown and nohup in bash, which, if the parent process is bash, stop SIGHUP from propagating to the child process. bash dies after the terminal disconnects and its SIGHUP handler runs
[13:37:38] oh, to clarify: SIGHUP is the 'hangup signal' issued when a terminal closes. SIGTERM is the signal saying 'you should do your cleanup and terminate'. Another relevant signal is SIGKILL, which is more like 'just exit immediately'
[13:40:36] anyways, it's fine. I'm not blaming you or anything. I just saw they were in D-state (so contributing to NFS lag) and not interactive, so I sigtermed them
[13:55:05] zhuyifei1999_: thanks for your help!
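The mechanism zhuyifei1999_ describes (double-forking, or a dying parent, leaving orphans that reparent to init and survive the terminal's SIGHUP) can be sketched in a few lines of Python. This is an illustrative sketch only, not how the `tail`s on the bastion were actually started; the helper name `spawn_detached` is made up for this example:

```python
import os
import signal

def spawn_detached(target):
    """Double-fork so `target` ends up re-parented to init and immune to
    the shell's SIGHUP -- roughly what `nohup` / `disown` achieve in bash.
    Illustrative sketch; names and structure are this example's own."""
    pid = os.fork()
    if pid == 0:                           # first child
        os.setsid()                        # new session: no controlling terminal
        if os.fork() == 0:                 # grandchild: the survivor
            signal.signal(signal.SIGHUP, signal.SIG_IGN)
            target()
            os._exit(0)
        os._exit(0)                        # first child exits -> grandchild is orphaned
    os.waitpid(pid, 0)                     # reap the first child; no zombie left
```

Once the first child exits, init (or the nearest subreaper) adopts the grandchild, which is exactly the "parent as init" signature visible in the `ps` output.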
:-)
[13:55:13] (np)
[14:20:19] !log admin changing tools.wmflabs.org to point to tools-proxy-03 in eqiad1
[14:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:21:42] !log tools T213418 put a backup of the docker registry in NFS just in case: `aborrero@tools-docker-registry-02:$ sudo cp /srv/registry/registry.tar.gz /data/project/.system_sge/docker-registry-backup/`
[14:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:21:45] T213418: Toolforge: move docker nodes from eqiad to eqiad1 - https://phabricator.wikimedia.org/T213418
[14:29:46] Ok, now I see what the problem with tools.wmflabs.org is
[14:31:25] jem: are you experiencing any issues?
[14:31:34] Yes, arturo
[14:31:53] is the URL simply down?
[14:32:00] First I thought it was just my tool
[14:32:17] The main web page is OK
[14:32:33] But the individual tools are not
[14:32:52] I'm checking now
[14:34:01] jem: what's an example tool that's having trouble?
[14:34:29] https://tools.wmflabs.org/videotutorials/ for example
[14:35:40] Several do work, when they just output one line or a 404
[14:35:52] jem: what kind of a service is that? Is it running on kubernetes?
[14:36:06] and, do you know when it last worked?
[14:36:08] I don't know, I just picked it randomly
[14:36:13] oh :/
[14:36:18] But there are more... I'll paste
[14:37:08] (Anyway, my tool is the first one to look at: https://tools.wmflabs.org/jembot/ )
[14:37:36] jem: ok, same questions for that one — when did it last work, and what kind of a service is it?
[14:37:38] I've done several restarts, stops and starts
[14:37:57] It was working about 5-10 minutes ago
[14:38:26] Now stopped and started once more
[14:39:00] jem: do you see 504 bad gateway?
[14:39:20] In videotutorials yes, after maybe one minute
[14:39:27] Waiting now for jembot...
[14:41:02] jem: dumb question, is that grid engine or kubernetes?
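The by-hand checks above (reloading tool URLs and watching for 502 vs 504) can be scripted. A minimal sketch, assuming nothing beyond the Python standard library; `probe` is a hypothetical helper, not an existing Toolforge tool. The distinction matters when debugging a proxy move: a 502 typically means the proxy could not reach the backend at all, while a 504 means the backend did not answer within the proxy's timeout, matching the "504 after maybe one minute" observation:

```python
import urllib.request
import urllib.error

def probe(url, timeout=65):
    """Fetch a URL and return its HTTP status code, treating gateway
    errors (502/504) as data rather than exceptions. Hypothetical
    helper for checking which tools behind a proxy are affected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        return e.code   # 4xx/5xx responses land here
```

Usage would be e.g. `probe("https://tools.wmflabs.org/jembot/")`, run against a handful of tool URLs to see which backends the proxy can and cannot reach.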
[14:41:09] 504 after one minute also
[14:41:47] arturo: I'm just starting/stopping webservice from the command line
[14:42:02] No grid or kubernetes involved (I think)
[14:42:13] ok, let me check
[14:42:49] jem
[14:42:52] tools.jembot@tools-bastion-03:~$ grep backend service.manifest
[14:42:52] backend: gridengine
[14:43:24] I take note
[14:49:42] jem: `webservice` is a wrapper around grid or k8s
[14:51:23] Ok, chicocvenancio, I knew it was a wrapper but I didn't check
[14:52:45] arturo: shall I look into that jembot webservice?
[14:53:16] jem: try restarting your tool? we might not be ready for it yet but it's worth a try
[14:53:43] zhuyifei1999_: it's probably related to the proxy move, and some security group fixes
[14:53:58] ok
[14:55:38] !log tools disable puppet on tools-docker-registry-01 and tools-docker-registry-02, trying with `role::wmcs::toolforge::docker::registry` in the puppetmaster for -03 and -04. The registry shouldn't be affected by this
[14:55:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:15:30] andrewbogott: not working yet
[15:16:14] Right now I get 503
[18:29:50] !log tools T213711 installed python3-requests=2.11.1-1~bpo8+1 python3-urllib3=1.16-1~bpo8+1 on tools-proxy-03, which stopped the bleeding
[18:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:29:54] T213711: move tools proxy nodes to eqiad1 - https://phabricator.wikimedia.org/T213711
[18:46:41] !log tools Dropped the A record for www.tools.wmflabs.org and replaced it with a CNAME pointing to tools.wmflabs.org.
[18:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:46:56] bstorm_: ^ part of what we discovered in the debugging
[18:47:21] 👍🏻
[18:47:45] I wonder what other magic hides in the designate DNS
[18:48:44] * bstorm_ shudders
[20:37:47] any idea why I'd be getting 502 errors for my tool?
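The `grep backend service.manifest` check above (which revealed jembot runs on `gridengine`) is simple enough to script. A minimal sketch in Python: `service.manifest` is a small YAML file, so for this one key a naive line parse is enough; the function name `webservice_backend` is this example's own, not part of any Toolforge tooling:

```python
def webservice_backend(manifest_text):
    """Return the `backend:` value from a Toolforge service.manifest,
    or None if the manifest does not declare one. A naive line parse
    standing in for a real YAML load -- illustrative sketch only."""
    for line in manifest_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "backend":
            return value.strip()
    return None
```

For example, `webservice_backend("backend: gridengine")` returns `"gridengine"`, mirroring the shell check in the log.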
[20:42:20] !help ^
[20:42:20] Betacommand: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[20:43:18] Betacommand: The proxy was moved. You may need to restart the webservice. Where is it running?
[20:43:26] The old grid, the new grid or kubernetes?
[20:43:45] And which tool?
[20:43:49] bstorm_: no clue, haven't touched it in ages
[20:43:56] betacommand-dev
[20:43:56] Which tool then?
[20:43:58] Ah ok
[20:44:26] Looks like the answer is old grid
[20:45:44] Betacommand: just the web service?
[20:45:50] yeah
[20:46:08] I restarted several other tasks manually
[20:46:12] Looks like it was restarted today at some point
[20:46:22] Checking something on the proxy
[20:46:56] "2019-01-15T15:16:48.073748 No running webservice job found, attempting to start it" from the service.log file
[20:47:13] The proxy can hit it
[20:47:32] what is that... I saw that in some other service.log files
[20:48:18] that's the service watcher daemon for the grid jobs.
[20:48:34] Ok... looking for that. I think that is broken and is breaking tools because of it
[20:51:41] bstorm_: should be webservicemonitor.py running on the cron host for the new grid. Not sure if the old grid's copy runs on the cron host now or still on the services box
[20:52:03] It'd still be on services, methinks. I'll check there
[20:53:00] It's always fun to discover something broken :)
[21:02:43] !log tools restarting webservicemonitor on tools-services-02 -- acting funny
[21:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:05:57] Should I see anything yet?
[21:05:58] Betacommand: how does it look now?
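The watcher behaviour implied by that service.log line ("No running webservice job found, attempting to start it") can be sketched in a few lines. This is a hypothetical sketch in the spirit of webservicemonitor.py, not the real code: one pass compares the tools that registered a webservice in their manifests against the grid jobs actually running, and restarts any that are missing:

```python
from datetime import datetime, timezone

def monitor_pass(registered_tools, running_jobs, restart, log):
    """One pass of a minimal webservice watcher (hypothetical sketch,
    not the real webservicemonitor.py): any tool that registered a
    webservice but has no running grid job is started again, producing
    a service.log line like the one quoted in the conversation above."""
    for tool in registered_tools:
        if tool not in running_jobs:
            log("%s No running webservice job found, attempting to start it"
                % datetime.now(timezone.utc).isoformat())
            restart(tool)
```

The real daemon runs such a pass periodically; if it wedges (as suspected here), tools whose webservice dies are simply never restarted until the daemon itself is restarted.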
[21:06:06] Maybe :)
[21:06:08] 502
[21:06:30] restart your webservice again
[21:06:44] It might actually work this time
[21:06:56] what's the recommended way? haven't done that in a year
[21:07:15] Betacommand: webservice restart
[21:07:40] Yup :)
[21:08:14] thanks, knew we had a special command
[21:09:02] 504
[21:09:30] BTW, my tool is working now, thanks, andrewbogott and arturo
[21:10:19] :)
[21:10:25] Well, that's good
[21:10:38] It appears that the older webgrid tools are not working
[21:11:46] I swear I'm not trying to be a PITA
[21:12:06] It's alright; we're in the midst of some transitions so it's inevitable that something breaks.
[21:12:14] Things break under the best of circumstances!
[21:12:44] * zhuyifei1999_ thinks our web code is growing overly complex
[21:13:05] zhuyifei1999_: it's been this bad since late 2015
[21:13:29] :(
[21:19:47] any idea what's broken?
[21:21:39] Betacommand: looks like it is probably firewall stuff. We moved the proxy server for tools.wmflabs.org to a different network segment today
[21:21:58] we are debugging
[21:22:50] np
[21:28:44] Betacommand: yours works now
[21:29:20] I need to check through the grid and make sure this is fixed across the setup.
[21:51:40] bstorm_: thanks
[21:52:35] !log tools.audetools Fixed bad .lighttpd.conf and moved webservice from Trusty grid to Kubernetes PHP 7.2
[21:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.audetools/SAL
[23:46:41] !log tools.wd-constraints-precheck Stopped webservice as it has been failing for months
[23:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wd-constraints-precheck/SAL
[23:50:43] JJMC89|away: your scp/sftp run is making tools-bastion-02 pretty sad