[00:51:25] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Strange_Memory_Behavior_on_Toolforge_with_Java
[00:51:35] damn, why am i up at 2 am...
[01:27:16] T219351 created for the issue that thedj was so nice to report here earlier
[01:27:17] T219351: Java jobs run the Stretch grid seem to require a very large memory reservation - https://phabricator.wikimedia.org/T219351
[12:15:13] !log T218126 `aborrero@tools-sgegrid-master:~$ sudo qmod -d 'test@tools-sssd-sgeexec-test-2'` (and 1)
[12:15:14] arturo: Unknown project "T218126"
[12:15:14] T218126: LDAP: try how sssd works with our servers - https://phabricator.wikimedia.org/T218126
[12:15:20] !log tools T218126 `aborrero@tools-sgegrid-master:~$ sudo qmod -d 'test@tools-sssd-sgeexec-test-2'` (and 1)
[12:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:25:26] !log tools truncated exim4/paniclog on tools-sgecron-01
[12:25:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:26:15] !log tools truncated exim4/paniclog on tools-sgewebgrid-lighttpd-0921
[12:26:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:08:20] Hello! I `webservice restart`-ed one of my tools and since then it has been returning '504 Gateway timeout'. From my side things look okay though; I can't see anything in the uWSGI logs. Is there any obvious area I should be looking at? (this is tracked at T219377)
[13:08:21] T219377: wikiloves goes in 504 - https://phabricator.wikimedia.org/T219377
[14:43:13] !log shinken deployed shinken-puppetmaster-01
[14:43:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[14:45:31] !log tools cleared several "E" state queues
[14:45:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:01:53] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @milimetric & @chiborg - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:23:46] JeanFred: let me check
[15:25:02] zhuyifei1999_: Cheers :)
[15:26:04] Has ssh tunneling been disabled?
[15:27:01] 2019/03/27 15:23:17 [error] 528#528: *197143150 upstream timed out (110: Connection timed out) while connecting to upstream, client: [IP REDACTED], server: , request: "GET /wikiloves/images?event=monuments&year=2018&country=Italy HTTP/1.1", upstream: "http://192.168.114.122:8000/wikiloves/images?event=monuments&year=2018&country=Italy", host: "tools.wmflabs.org", referrer:
[15:27:01] "https://www.google.it/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=2ahUKEwi98KWE0KLhAhUM_qQKHQgAC8IQjRx6BAgBEAU&url=https%3A%2F%2Ftools.wmflabs.org%2Fwikiloves%2Fimages%3Fevent%3Dmonuments%26year%3D2018%26country%3DItaly&psig=AOvVaw1NgCPC8sFrAvGu1VdtSAt1&ust=1553785466290577"
[15:27:32] Dispenser: I don't think so
[15:28:06] JeanFred: ^ doesn't seem to help much. :/
[15:29:35] zhuyifei1999_: Indeed :-/
[15:29:40] JeanFred: mind if I `become wikiloves` and check kubectl?
[15:30:21] Actually, if I stop the webservice, I still get a loong wait before it gets a 504; I thought I would get straight "URL is not serviced"
[15:30:26] zhuyifei1999_: Sure, go ahead
[15:31:32] https://www.irccloud.com/pastebin/to9qtSKU/
[15:31:35] huh
[15:33:41] https://www.irccloud.com/pastebin/Dmhektkg/
[15:33:48] this is weird
[15:39:47] Not using a python image?
[15:41:30] `docker-registry.tools.wmflabs.org/toollabs-python2-web:latest` That's a python image. python2, though
[15:42:45] the pod was live, but somehow not reachable from the proxy I guess.
[15:42:56] I got distracted before I got further :/
[15:44:11] It says it can't load python3 uwsgi. If it is trying to do that, shouldn't it be using the python image rather than python2?
[15:44:30] nah, that's fine
[15:44:43] Huh
[15:44:52] But the logs above?
[15:45:11] Command history shows they definitely used the python2 image at least :)
[15:45:20] as well as kubectl
[15:45:23] the uwsgi config that webservice generates tries to load both py2 and py3. One or the other fails, but it doesn't hurt anything
[15:45:49] Ok fair :)
[15:46:05] That makes more sense then
[15:47:28] Yeah, I get it all now. The proxy rejection message and all that makes sense.
[15:51:02] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @milimetric & @chiborg - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:51:37] zhuyifei1999_, JeanFred: did restarts help at all or is it still not reachable?
[15:53:23] !log tools.wikiloves Restarted webservice to see if that fixes connectivity with tools.wmflabs.org proxy (T219377)
[15:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[15:53:26] T219377: wikiloves goes in 504 since webservice restart - https://phabricator.wikimedia.org/T219377
[15:54:45] * bstorm_ finds no recent changes to kube2proxy
[15:55:28] restart doesn't seem to have changed things :/
[15:56:03] curl inside the pod is working as expected, so the problem is somewhere between the proxy and the pod
[15:56:10] bd808: kube2proxy is down
[15:56:26] https://www.irccloud.com/pastebin/9rlcpqOT/
[15:56:27] well that would do it :/
[15:57:10] Why is the kubemaster not allowing kube2proxy...
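The nginx error above ("upstream timed out (110: Connection timed out) while connecting to upstream"), together with curl working inside the pod, points at the path between the proxy and the pod. A minimal Python sketch of a TCP reachability probe that could be run from the proxy side, assuming Python 3 is available there; `probe_upstream` is a hypothetical helper, not part of any Toolforge tooling:

```python
import socket


def probe_upstream(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect to host:port, as a reverse proxy would.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listens there
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the deadline, or the kernel gave up (ETIMEDOUT,
        # errno 110), which nginx reports as "(110: Connection timed out)"
        return "timeout"
    except OSError as exc:
        # Anything else: unreachable network, DNS failure, etc.
        return f"error: {exc}"
```

For example, `probe_upstream("192.168.114.122", 8000)` (the upstream address from the nginx error) returning "timeout" would match what nginx saw, while "refused" would mean the address is reachable but nothing is listening on that port. A probe like this only narrows down where the break is; it says nothing about why the proxy is routing to that address.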
[15:58:07] Died here:
[15:58:10] `Mar 27 11:24:48 tools-proxy-03 kube2proxy[10739]: 2019-03-27 11:24:48,521 Exception was: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))`
[15:58:18] I'll try to start it up
[15:58:32] Hope nobody was playing with firewalls today?
[15:58:57] It's up
[15:59:34] bd808: sorry I was off for a few mins
[15:59:53] no worries. the awesome bstorm_ fixed it :)
[16:00:05] apt search uwsgi does show python2 but not 3 afaict
[16:00:07] ok
[16:00:20] the py3 module thing is a red herring
[16:00:49] and something I guess I should fix in webservice because it confuses the hell out of people all the time when debugging
[16:00:57] Thanks bstorm_ :)
[16:11:03] np :)
[17:03:00] Turns out the SSH keys changed and autossh thought there was a man-in-the-middle
[17:05:08] yeah, we recently set up a new bastion, so different fingerprint
[21:49:20] !log tools.admin Updated to a03551b (Remove STS header and http->https redirect) T102367
[21:49:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[21:49:23] T102367: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367
[22:10:05] !log paws moving paws host in `paws-proxy-02` to `tools-paws-worker-1005` T219460
[22:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[22:10:08] T219460: paws-master-01 high load and NFS client issues - https://phabricator.wikimedia.org/T219460
[22:16:28] would it be possible to use something like selenium with python in toolforge?
[22:16:57] probably
[22:17:36] you'd have to use a headless browser, I'm not sure which one will/won't work in toolforge but probably at least one will
[22:17:42] DatGuy: yes? I'm not sure I understand the question
[22:18:57] I mean generally if it would be technically possible, I really don't know much about frameworks like that but will they work fine under the grid service for example?
[22:21:25] In theory, yes. In practice, I have no idea. Like chicocvenancio mentioned it may take some experimentation to find a headless browser that works for your task (whatever it is)
[22:22:09] bd808: do we have Xvfb in grid?
[22:22:13] ggp: your irc client seems to be having some issues
[22:22:18] chicocvenancio: yes
[22:22:31] I know there is at least one tool actively using it
[22:22:35] then you can just use any browser and drive it with selenium
[22:23:47] firefox, chrome, chromium, whatever. If there is a selenium driver for it you can script it with Xvfb
[22:23:56] Almost anything I can think of right now that would want selenium would probably be a better fit for a Jenkins job than a tool, but maybe I'm just not creative enough :)
[22:24:34] I guess scraping complicated sites
[22:25:23] * chicocvenancio has production systems that depend on selenium in another life
[22:26:27] *shudder* Yeah I've done nasty things like that before as well
[23:35:56] !log tools rebooted tools-paws-master-01 for NFS issue T219460
[23:35:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:35:59] T219460: paws-master-01 high load and NFS client issues - https://phabricator.wikimedia.org/T219460
[23:46:13] !log paws moving paws host in `paws-proxy-02` back to `tools-paws-master-01` T219460
[23:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[23:46:18] T219460: paws-master-01 high load and NFS client issues - https://phabricator.wikimedia.org/T219460