[12:22:56] any ideas on why videoconvert began consistently failing at the end of the final upload to Commons, with this error: "Status: An error occured: An error occured: internal_api_error_DBQueryError: [W-KeNApAIDkAADj4Z4UAAACV] Caught exception of type Wikimedia\Rdbms\DBQueryError"? It worked fine earlier today.
[12:38:45] Hi abartov, while someone may reply to your query, could it be a good idea to also file a task here? https://phabricator.wikimedia.org/tag/wikimedia-video/
[12:38:53] And tag the "Tools" project as well
[12:39:08] Something like what Brion did here: https://phabricator.wikimedia.org/T134685
[12:39:55] I'm pretty sure the maintainers will catch it there as well
[12:42:14] Also looking at the error message, it could be an issue with the tool facing some difficulties connecting to the database :(
[12:42:31] Or failing on some db query
[12:44:01] Checking here: https://tools.wmflabs.org/admin/tool/videoconvert, I see Lokal Profile and Prolineserver are the current tool maintainers, it could also be good to add them on the ticket on phab
[13:36:40] Hi cloud-people - I have a question regarding Quarry
[13:37:08] is there a place / a way to get stats on how many queries fail because they are too long or too complicated?
[13:39:46] joal: there is a query killer in the server, but I don't know if stats about its usage are exported somewhere
[13:39:54] probably ask zhuyifei1999_ when he is awake
[13:40:46] Thanks arturo - Will do :)
[13:40:57] some bits of info here: https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/quarry#Queries
[16:29:26] !log pagemigration deleting project
[16:29:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Pagemigration/SAL
[18:26:25] !log shinken stopping puppet and shinken service on shinken-02 during the deployment-prep migration
[18:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[18:30:37] joal: umm, that would require a query to the query database itself
[18:31:07] we don't yet provide a public endpoint for that
[18:31:22] Hi zhuyifei1999_ - A complicated one, or a feasible one?
[18:31:42] https://phabricator.wikimedia.org/T151158
[18:32:22] well, if you write a query for these statistics I can execute it for you :P. or I can write one myself
[18:33:59] zhuyifei1999_: My request is a one shot, so if you can run the query that's great :) About writing it, do you have a doc pointer for me to try to get how this should be done?
[18:34:46] the quarry database layout is https://github.com/wikimedia/analytics-quarry-web/blob/master/tables.sql
[18:36:26] Ok zhuyifei1999_ - I'm assuming I'd actually be more interested in /var/log/quarry/killer.log ?
[18:36:36] * zhuyifei1999_ looks
[18:37:10] zhuyifei1999_: I'm trying to get a feel and some examples of long queries and failing queries
[18:37:21] that log goes back to 2018-09-23 14:29:01,562
[18:37:53] zhuyifei1999_: I'm assuming there is plenty in there - How could I look at it?
[18:38:05] https://www.irccloud.com/pastebin/zxORyZZb/
[18:38:32] well, it doesn't contain the query ids
[18:38:39] Arf - yeah just saw that :(
[18:38:44] hm
[18:40:16] zhuyifei1999_: in the query_run table, there are status, task_id and extra_info fields - Would these contain relevant info?
[18:46:42] joal: yeah
[18:46:50] sorry was away for a few mins
[18:47:05] status should contain a number indicating it's killed
[18:49:46] zhuyifei1999_: I'm gonna try to write some queries and ping you when I have them - Many thanks for your help!
[18:49:55] k np
[18:50:08] zhuyifei1999_: Until what time can I bother you without being disrepectful?
[18:50:37] s/disrepectful/disrespectful
[18:51:35] can you do it within an hour? afterwards I might have to leave and get back around 4 hours later
[20:36:02] Hi zhuyifei1999_ - I'm assuming you're gone
[20:38:06] zhuyifei1999_: Here are 3 queries of interest for me: https://gist.github.com/jobar/39e9428a332d9bb2823ff03a6ab213a8
[20:38:48] zhuyifei1999_: Depending on their results, I might go for a second run for refinement, but it might not even be needed
[20:38:58] zhuyifei1999_: Thanks a million again for your help :)
[21:55:41] gehel: if you're still up… I think some of your recent changes may have broken elasticsearch on deployment-prep. Do you have a moment to look?
[21:55:48] https://www.irccloud.com/pastebin/6QLKjAKb/
[21:55:52] I should say, broke puppet
[21:56:04] andrewbogott: looking
[21:56:09] thanks!
[21:56:26] I didn't chase down the bug, just pointing at you because you touched it last
[21:56:42] yeah, seems plausible it would be me...
[22:06:02] andrewbogott: I get a different error:
[22:06:10] https://www.irccloud.com/pastebin/VCbxwduw/
[22:06:54] yeah, we moved the puppet infra and now some things are breaking in the old region. I don't understand this well enough to know if we can work around it...
[22:07:07] oh, now it's broken in the new region too
[22:07:24] well, anyway - my previously pasted error still stands :) It will return as soon as we figure out what's going on with this cert
[22:08:42] ok, so multiple problems
[22:09:04] that's not going to help understand my problem :)
[22:10:05] it's ok if you want to set that aside for now.
[22:10:17] It'll be a bit before we get puppet working properly again
[22:11:09] I'll dig a bit more, see if I can understand (at the moment, I don't even understand how that could happen)
[22:11:30] but it's getting late, I might finish that tomorrow
[22:12:26] ok, no problem
[22:19:39] andrewbogott: my brain isn't working right, too late.. I'll dig tomorrow
[22:19:54] ok! Thanks for looking
[22:20:03] andrewbogott: i think i have a solution for you, sec
[22:20:46] i believe the problem is there is no longer a default cluster name applied
[22:21:01] that would fit
[22:22:48] I have to run but will be back later
[23:16:18] joal: back
[23:20:17] https://www.irccloud.com/pastebin/wewKOqiV/
[23:21:57] I'm assuming the issue is a dangling comma, so
[23:22:03] https://www.irccloud.com/pastebin/Rk8diTp2/
[23:22:05] thanks, d3r1ck, but the problem is precisely that the tool I'm concerned about (videoconvert on toolforge) does not seem to be managing issues and code on Phabricator. Its maintainer is not responsive either. That's why I was hoping some cloud admins (harej, andrewbogott, bd808?) could comment on the particular error message. e.g. Could it be a disk space issue? A rate limit of some kind?
[23:22:40] (I uploaded about 6GB worth of videos from GLAM-wiki in mp4s under 1.5GB each.)
[23:23:17] Ohhh okay! I see now! Thanks for clarifying :)
[23:23:32] I'm assuming the error is about the extra space, so:
[23:23:45] https://www.irccloud.com/pastebin/Z8dZnvgf/
[23:27:11] abartov: I looked up that error id in logstash.wikimedia.org. The error logs are about a timeout while trying to delete from the uploadstash table. So... transient error?
[23:27:50] bd808: thanks! So I should just keep retrying?
[23:28:08] bd808: what might cause a timeout on deleting from that table?
[23:28:35] abartov: it's a lock wait timeout, so basically the table being busy
[23:28:58] stashbot: how large is the video that's being uploaded?
[23:28:58] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[23:29:06] oops
[23:29:09] abartov: ^
[23:31:10] bd808: I see.
[23:31:13] zhuyifei1999_: https://pasteboard.co/HNXmvAB.png
[23:31:48] the 1.44GB one will likely fail
[23:32:15] https://phabricator.wikimedia.org/T128358
[23:33:08] there was a bug about some file locking mechanism weirdness... trying to find it
[23:34:49] hmm it doesn't seem to be relevant https://phabricator.wikimedia.org/T197125
[23:35:15] aaron mentioned the timeout should be 5 min so you might try that
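
Note: the "keep retrying" idea for a transient lock wait timeout could be sketched roughly as below. This is only an illustration, not how videoconvert actually works: `TransientUploadError`, `retry_upload` and the `upload` callable are hypothetical names, and waiting 300 seconds between attempts is just one reading of the "5 min" figure mentioned in the log.

```python
import time


class TransientUploadError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a lock wait timeout."""


def retry_upload(upload, attempts=3, wait_seconds=300.0, sleep=time.sleep):
    """Call upload() until it succeeds or the attempts run out.

    upload is a hypothetical zero-argument callable that performs the upload
    and raises TransientUploadError when the failure looks temporary.
    wait_seconds defaults to 300 (5 minutes), an assumption based on the log.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except TransientUploadError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(wait_seconds)  # give the busy table time to free up
```

Errors other than `TransientUploadError` are deliberately not caught, so a permanent failure (for example a file over the size limit) would surface immediately instead of being retried.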