[10:52:32] !log cyberbot extending storage quota: `aborrero@cloudcontrol1005:~$ sudo wmcs-openstack quota set --gigabytes 350 cyberbot` (T277333)
[10:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cyberbot/SAL
[10:52:36] T277333: Request increased quota for cyberbot Cloud VPS project - https://phabricator.wikimedia.org/T277333
[15:26:42] mutante: https://wikistats.wmcloud.org/display.php?t=mh is blank
[16:02:46] arturo: I'm around, let me know when `cloudvirt-wdqs1001` is restarted
[16:03:00] ryankemper: ok!
[16:03:08] ryankemper: do you want to stop the VMs beforehand?
[16:03:15] or let them be rebooted with the hypervisor?
[16:04:46] arturo: rebooted by hypervisor should be fine
[16:05:05] ack
[16:06:16] RhinosF1: Something on the Miraheze side must have changed again I suppose
[16:06:37] mutante: let me know if there's anything we can do
[16:07:06] ryankemper: rebooting now
[16:07:48] RhinosF1: could you please make a ticket and assign? that would be great
[16:09:21] ryankemper: so the VM disks should be very busy, because the hypervisor is failing to unmount the disks (or at least taking very long)
[16:09:47] arturo: ack, looking
[16:10:33] oh right I can't ssh now anyway
[16:10:46] arturo: will that just delay how long it takes to complete? or do you think it's stuck
[16:10:55] that's ok
[16:11:11] it may be stuck, yes. I'll wait a few more minutes and then force-shutdown
[16:11:18] ack
[16:11:37] hopefully ext4 journal will kick in and nothing special will happen
[16:12:13] mutante: https://phabricator.wikimedia.org/T277479
[16:12:20] RhinosF1: thanks
[16:12:46] Apologies for the millionth time that's happened
[16:13:27] np, let's see what it is, I'll look for sure just can't right now, that's why having the reminder is great
[16:14:03] I can already see this is also going to be "well, don't delete the table if you can't get new data" as well
[16:15:06] Yeah
[16:15:49] ryankemper: rebooted!
[16:15:54] back online
[16:16:59] arturo: ack, wcqs-beta.wmflabs.org went from 502 to a 500 so I'll see if something needs to be restarted on my end
[16:17:20] ryankemper: first thing I would verify is all VMs being online
[16:18:09] arturo: it's back now
[16:18:16] puppet run just finished
[16:18:28] (on `wcqs-beta-01.eqiad.wmflabs`)
[16:18:36] arturo: thanks for your help! that should be it
[16:20:48] ryankemper: ok, good for the next one?
[16:21:02] cloudvirt-wdqs1002.eqiad.wmnet
[16:21:50] oh, they are empty!
[16:21:59] 1002 and 1003 only run the canary VM
[16:22:06] arturo: yeah, we only use `cloudvirt1001` for whatever reason
[16:22:12] ok!
[17:09:19] Hello everyone! I am Harshinee, an Outreachy intern at Wikimedia. I developed a tool called the "Image-Content-Filtration" tool which I, along with my mentors and Musikanimal, have been trying to deploy on Toolforge. However, for some reason, the API works (no syntactical, dependency errors, shows the correct screen for /predict), but, calling the API returns no response. We checked the log file (uwsgi.log) and i
[17:09:19] requests went through but the results don't get displayed in time. This is the tool: https://toolsadmin.wikimedia.org/tools/id/image-content-filtration
[17:10:49] What do you mean by no response?
[17:10:58] Like the full headers
[17:12:29] So, when I run the following command `curl -X POST -F image=@Test_A002.jpg "https://image-content-filtration.toolforge.org/predict"`, the cursor keeps blinking indefinitely
[17:14:47] harshinee: have you tried other images?
[17:17:42] Hi RhinosF1! Yes, I have. All these images exist in my local system and I'm using my Anaconda to run the curl command
[17:18:26] PS: The API works in my local system "http://127.0.0.1:5000/predict" but not here.
[17:18:47] skip the Anaconda part and just run curl in bash?
[17:19:16] And/or run it straight on the command line? make sure your dependencies etc are all "ok"
[17:20:35] harshinee: how long does it take locally?
[17:23:33] I think it takes a maximum of 3-4 seconds
[17:23:36] for a prediction
[17:23:52] @mutante, okay let me try that
[17:24:45] There's pyCurl which might be better
[17:25:13] Than requests
[17:26:18] harshinee: your code doesn't assume "Test_A002.jpg" exists, right? in other words, you're basically uploading the image when you run `curl -X POST -F image=@Test_A002.jpg "https://image-content-filtration.toolforge.org/predict"` ?
[17:28:08] musikanimal: https://github.com/HarshineeSriram/Image-Content-Filtration/blob/master/app.py#L19
[17:28:15] Oh, no. I "cd" to the directory that contains the actual Test_A002.jpg otherwise it shows a "file doesn't exist" error
[17:28:55] Should I remove that like, @RhinosF1?
[17:29:00] *line
[17:29:13] harshinee: no I'm linking musikanimal to it
[17:29:19] I will think after I've eaten
[17:29:28] Hahaha okay
[17:29:49] I ran this on my bash and it shows: % Total % Received % Xferd Average Speed Time Time Time Current
[17:29:49] Dload Upload Total Spent Left Speed
[17:29:49] 100 33181 0 0 100 33181 0 1452 0:00:22 0:00:22 --:--:-- 0
[17:33:04] if you use -I, do you get any useful headers back?
[17:38:18] also curl has --upload-file which you could use to test if uploading any file to any site with curl works in general
[17:38:40] to exclude stuff like .. needs a proxy or whatnot
[17:47:44] when I use -l, I don't really get anything back.
[17:52:42] I just got back an empty object `{}`
[17:55:42] I tested on a random image just now harshinee from google and it definitely hangs
[17:56:16] yeah, I'm seeing `DAMN ! worker 2 (pid: 44) died, killed by signal 9 :( trying respawn ...` in uwsgi.log
[17:56:28] musikanimal: what's mem usage
[17:56:53] where do I get that
[17:58:13] musikanimal: I'd guess from Grafana-labs
[17:58:24] But if it works locally my bet would be it's a resource issue
[17:59:00] oh
[17:59:24] We tried restarting the kubernetes webservice with 4Gi memory
[18:01:48] andrewbogott: ideas?
[18:03:53] did you use webservice restart or did you stop and then start
[18:04:23] restart, and also I'm not certain about the syntax. Can you confirm `-m 4Gi` is correct?
[18:04:41] Oh yeah you need to do a hard stop/start I think
[18:05:01] yeah restart doesn't actually change the memory allocation
[18:06:42] Oh
[18:16:13] I made slight changes to the code to see if that helps. Also restarted the webservice
[18:16:15] but,
[18:16:16] % Total % Received % Xferd Average Speed Time Time Time Current
[18:16:16] Dload Upload Total Spent Left Speed
[18:16:16] 100 33181 0 0 100 33181 0 595 0:00:55 0:00:55 --:--:-- 0
[18:21:05] So to summarize, it's not clear where the failure is occurring (aka why nothing is returned when invoking the service on toolforge), but the attempt to upload a file as part of invoking the service also isn't working?
[18:21:17] so 0% received?
[18:24:16] if uploading a file seems to be part of the problem, perhaps make the API accept a URL instead? that would be better for most clients anyway
[18:37:42] filed https://phabricator.wikimedia.org/T277495 for the webservice restart thing, since this has tripped up several people recently
[18:42:46] Yeah, I'll try to make that part work
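
A rough sketch of the 18:24:16 suggestion (having the API accept a URL instead of an uploaded file), for illustration only. The names `validate_image_url`, `fetch_image_bytes`, and `MAX_IMAGE_BYTES`, as well as the 10 MiB cap and the 10 second timeout, are hypothetical and not taken from the tool's `app.py`; only the Python standard library is assumed.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical size cap so one oversized image cannot exhaust the worker's memory.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def validate_image_url(url):
    """Return the URL unchanged if it is an absolute http(s) URL, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return url


def fetch_image_bytes(url, timeout=10):
    """Download the image at `url`, refusing bodies larger than MAX_IMAGE_BYTES."""
    with urlopen(validate_image_url(url), timeout=timeout) as response:
        # Read one byte past the cap so an oversized body can be detected.
        data = response.read(MAX_IMAGE_BYTES + 1)
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds MAX_IMAGE_BYTES")
    return data
```

The `/predict` handler could then hand the returned bytes to the model in place of the uploaded file. This only changes how the image arrives; it would not by itself address the `killed by signal 9` worker deaths seen in uwsgi.log.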