[00:18:42] bd808: the tool works again. Was it your magic, or did the tool fix itself?
[00:21:08] Iluvatar_: I wondered if it would be fixed. We figured out that internal DNS was messed up such that something like `curl http://tools.wmflabs.org/` was failing from inside of Toolforge. j.eh fixed that with the forced labs-ip-alias-dump run that he !log'ed above.
[00:22:53] Iluvatar_: sorry that we did not catch on to this in your initial reports here. The web socket failure was a bit of a red herring in understanding what broke. I think we would have figured it out faster if it had been a plain HTTP call that was broken.
[00:23:43] Ok, thank you so much!
[04:30:10] !log tools.svgtranslate Updating to version 0.10.13
[04:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.svgtranslate/SAL
[07:40:58] Can anybody check the tool wikihistory? I cannot start the webservice. It says "Your job is already running"
[08:54:51] !help Can anybody check the tool wikihistory? I cannot start the webservice. It says "Your job is already running"; I cannot stop it, it says "Your webservice is not running"
[08:54:51] Wurgl: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[09:07:14] mmm
[09:07:36] Wurgl: try restart?
[09:07:45] is it grid-based or kubernetes-based?
[09:08:46] webservice --backend=kubernetes php7.2 restart <-- restart instead of start seemed to work
[09:08:59] cool
[09:09:22] I was not aware of that parameter …
[10:07:31] !log tools deleting old jessie VMs tools-proxy-03/04 T235627
[10:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:07:36] T235627: Toolforge: upgrade main proxy servers to Debian Buster - https://phabricator.wikimedia.org/T235627
[10:49:45] !log tools deleting VMs tools-test-proxy-01, no longer in use
[10:49:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:52:50] !log admin icinga downtime cloudvirt1001/1002/1024/1018/1012/1009/1015/1008 for 1h T227538
[10:52:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[10:52:53] T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) - https://phabricator.wikimedia.org/T227538
[10:58:15] Is running ffmpeg on sgebastion okay?
[10:58:52] close to 1 hour of CPU time, and NFS is slow :-/
[10:59:49] Wurgl: please use the grid instead
[11:00:01] you are using NFS anyway (the bastions use NFS too)
[11:03:44] Hey, it is not me!
[11:04:34] ?
[11:05:10] ffmpeg is running, but it is not my process
[11:06:03] oh
[11:06:18] so you were reporting an issue :-P sorry, I misunderstood
[11:07:17] just a rant … not a real issue
[11:07:28] phamhi: this is something that can happen from time to time
[11:07:42] people run stuff on the tools bastions rather than on the grid or k8s
[11:08:16] phamhi: if you run `htop` on `tools-sgebastion-07.eqiad.wmflabs` you can see what Wurgl is referring to
[11:08:17] https://www.irccloud.com/pastebin/D8TwS2X4/
[11:08:27] is it allowed?
[11:08:32] exactly
[11:08:49] so, we have strong limits and controls in place on the bastion by using systemd slices
[11:09:13] this is something more or less new. Before that, any user could bring down a bastion by running a bot on it
[11:09:40] ah ok
[11:09:44] so, my proposal is we kill the process and !log a message for `tools.faebot` letting them know
[11:10:08] ok, I'll do it
[11:10:11] I have no problem with one such job, but a few hours ago a different file was rendered
[11:10:21] So it smells like a batch job
[11:10:30] yup
[11:10:47] and if you look at https://tools.wmflabs.org/sal/tools.faebot, this happened yesterday too
[11:11:54] this may indicate A) the user doesn't know how to use the grid properly, B) we don't have a fluid communication channel with the user, or C) both A and B
[11:12:32] !log tools.faebot Admins killed ffmpeg process running on tools-sgebastion-07
[11:12:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.faebot/SAL
[11:17:57] it looks like it was invoked manually
[11:24:42] does anyone have any idea how I can log into grafana-labs.wikimedia.org? :)
[11:26:57] addshore: what do you mean by "log into"? Do you want to create new dashboards?
[11:27:16] I found the docs!
[11:27:16] https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org#Beta_cluster
[11:27:32] ok
[13:38:08] arturo: regarding faebot's issue, it looks like the user couldn't run the job on the grid (T236446)
[13:38:08] T236446: Consistent errors from Video2Commons from YouTube - https://phabricator.wikimedia.org/T236446
[13:40:02] I see, interesting
[16:04:02] the reason it was run on the bastion is that youtube-dl downloads are blocked when they go through the cloud NAT. The bastions have a floating IP, which is an easy but non-ideal way to work around it
[16:31:20] zhuyifei1999_: very, very not ideal, because every time fae's thing runs it makes the bastion unresponsive for everyone
[16:31:54] yeah, I know; I think matanya is contacting people to get a whitelist for us
[19:13:26] Yes, I spoke to Rebecca; she is getting us in contact with someone from YouTube