[02:47:10] thank you bstorm_ and arturo: the SSDictCursor fixed the problem!
[02:48:28] Great!
[02:51:21] now I'm on to the next crash
[02:51:36] which is: any random http error with the API causing the whole script to barf
[02:51:44] try/catch and retries to the rescue!
[10:39:36] !log toolsbeta add myself to the `toolsbeta.admin` LDAP group (T225303)
[10:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:39:39] T225303: sssd apparently not working fine in toolsbeta - https://phabricator.wikimedia.org/T225303
[11:55:35] !log tools.quickcategories deployed a35b558374 (add logout route)
[11:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[14:01:03] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:50:48] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[20:33:25] !log account-creation-assistance apache2 shutdown to deal with spam (742 line DB change)
[20:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Account-creation-assistance/SAL
[20:35:34] !log account-creation-assistance apache2 restarted
[20:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Account-creation-assistance/SAL
[23:25:06] I notice running SQL queries in toolforge become very very slow. For example, I used to run a sample query each day, and it took 1-2 minutes to run. Now it takes more than hour to run, or it killed!. Is there a problem in toolforge connection to database?
[23:26:18] Or it says "Lost connection". Same thing is in Quarry. No query successfully run these days with me.
[23:26:51] ASammour, time taken to run a query will vary wildly with how complex the query is
[23:27:13] suggest you find a particular query that now performs slower
[23:28:50] ASammour: you may be hitting problems related to schema changes as well. See https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas for a recent set of changes that definitely have performance implications
[23:29:13] Krenair: Thanks for reply. Take this query for example, it takes 3 seconds to run. But now when I run it via toolforge it take like 1 hour. https://quarry.wmflabs.org/query/32899
[23:29:48] This query for example list all broken redirects to be deleted. It's not complex at all.
[23:29:59] redirect and page?
[23:30:15] dont think that will be the actor migration
[23:30:29] hopefully not, no
[23:31:22] the optimiser says this is invalid
[23:31:48] ... huh.
[23:32:23] I think maybe the optimiser is broken by the use statement there, ok
[23:32:59] bd808: Yes I know about the changes. The problem is that the syntax of the SQL is correct. But the performance become very bad.
[23:34:13] https://quarry.wmflabs.org/query/36454 This query is very complex one. In the old days it takes like 200 seconds to successfully completed. Now I tried to run it from 3 hours, and it wont finish it :)
[23:34:23] ASammour: when you say "toolforge" do you really mean Quarry?
[23:35:54] bd808: Both of them. Same performance issues in Quarry, and toolforge. I try to run the same query in toolforge, and in Quarry, but the same problem happen.
[23:35:59] $ mysql --defaults-file=$HOME/replica.my.cnf -h arwiki.analytics.db.svc.eqiad.wmflabs arwiki_p
[23:36:14] *nod*
[23:36:15] This is what I mean about toolforge.
[23:36:28] labsdb1009 does that query in 10 seconds
[23:36:40] which is web rather than analytics
[23:37:30] Same thing when I try to run it via Workbench.
[23:37:52] labsdb1011 in 26 seconds
[23:41:07] 10 seconds is too much. It should finish it at most 4 seconds.
[23:41:38] ASammour: that very much depends on what other work the database is doing at the same time
[23:42:11] I don't think the replicas make any such guarantees
[23:42:13] I'm looking at the monitoring dashboards now to see if there is anything really horrible with the server load
[23:43:53] I'm sure there is a problem with the server load
[23:44:39] ASammour: the question is if it is a problem we can actually fix or not. So far I'm not seeing anything particularly out of the ordinary
[23:44:40] Additional note, Today I tried to run many queries in Quarry, and each time it says "Lost connection"
[23:45:09] can you show an example?
[23:46:29] The message like this https://quarry.wmflabs.org/query/37078
[23:47:49] This query I wrote it before https://quarry.wmflabs.org/query/31517. Usually it takes 1 minute to run. But now it wont able to finish it.
[23:49:00] I tried to remove "group by", and restrict the range of query "where page_id between". But the same problem happen
[23:49:13] labsdb1011, the only server for *.analytics.db.svc.eqiad.wmflabs right now, has 87 active users and a high replag for s1. The s1 replag is usually a sign of the server spending all its time running queries, leaving no room for the replication threads.
[23:53:10] ASammour: unfortunately I don't think there is anything I can do to help you right now. The server is busy, but it looks like it is all legitimate queries from other Wiki Replica users
[23:57:18] I understand your frustration with the performance changes over the last couple of weeks. We think this is being caused by the schema changes, but 1) I don't know how to prove that, and 2) I don't know what actions to take to correct it even if proven.
[23:58:26] "Throw more hardware at the problem" would be the easiest solution technically, but we don't have any database class machines just sitting around to do that right now.
[23:59:04] bd808: Thank you very much for investigating the problem. I understand the server is busy right now. But I notice this performance problem for the last 3 days. Is this healthy?
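
Editor's note on the 02:51 messages ("any random http error with the API causing the whole script to barf", "try/catch and retries to the rescue!"): the log does not show the script itself, so the following is only a minimal sketch of what a retry wrapper could look like, assuming a Python script (the SSDictCursor mention suggests one). The names `call_with_retries` and `fetch`, the attempt count, and the delays are all hypothetical, not taken from the log.

```python
import time
import urllib.error
import urllib.request


def call_with_retries(func, attempts=4, base_delay=1.0,
                      retry_on=(urllib.error.URLError,), sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    The wait doubles after each failed attempt (exponential backoff).
    The last failure is re-raised so the caller still sees the error.
    All defaults here are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def fetch(url):
    """Hypothetical helper: fetch a URL and return the raw response body."""
    # urllib.error.HTTPError is a subclass of URLError, so HTTP status
    # errors and connection failures are both retried by the wrapper.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()
```

A call would then look like `call_with_retries(lambda: fetch(some_url))`, where `some_url` stands in for whichever API endpoint the script uses. Retrying only on network-type exceptions, rather than on every exception, keeps genuine bugs in the script visible instead of hiding them behind repeated attempts.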