[02:47:10] <audiodude>	 thank you bstorm_ and arturo: the SSDictCursor fixed the problem!
[02:48:28] <bstorm_>	 Great!
[02:51:21] <audiodude>	 now I'm on to the next crash
[02:51:36] <audiodude>	 which is: any random http error with the API causing the whole script to barf
[02:51:44] <audiodude>	 try/catch and retries to the rescue!
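Editorial note: the "try/catch and retries" idea audiodude mentions could look roughly like the sketch below. The log does not show his script, so everything here is an assumption: the helper name `call_with_retries`, its parameters, and the choice of exceptions to retry on are all hypothetical.

```python
import time
import urllib.error


def call_with_retries(func, attempts=3, base_delay=1.0,
                      retry_on=(urllib.error.URLError, ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying on transient errors with exponential backoff.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A script would wrap each API request in a zero-argument callable (for example a `lambda`) and pass it to the helper, so that a single failed request no longer aborts the whole run.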
[10:39:36] <arturo>	 !log toolsbeta add myself to the `toolsbeta.admin` LDAP group (T225303)
[10:39:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:39:39] <stashbot>	 T225303: sssd apparently not working fine in toolsbeta - https://phabricator.wikimedia.org/T225303
[11:55:35] <wm-bot>	 !log tools.quickcategories <lucaswerkmeister> deployed a35b558374 (add logout route)
[11:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[14:01:03] <wm-bot>	  Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:50:48] <wm-bot>	  Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[20:33:25] <Izhidez>	 !log account-creation-assistance apache2 shutdown to deal with spam (742 line DB change)
[20:33:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Account-creation-assistance/SAL
[20:35:34] <Izhidez>	 !log account-creation-assistance apache2 restarted
[20:35:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Account-creation-assistance/SAL
[23:25:06] <ASammour>	 I've noticed that running SQL queries in Toolforge has become very, very slow. For example, I used to run a sample query each day, and it took 1-2 minutes to run. Now it takes more than an hour to run, or it gets killed! Is there a problem with Toolforge's connection to the database?
[23:26:18] <ASammour>	 Or it says "Lost connection". The same thing happens in Quarry. None of my queries have run successfully these days.
[23:26:51] <Krenair>	 ASammour, time taken to run a query will vary wildly with how complex the query is
[23:27:13] <Krenair>	 suggest you find a particular query that now performs slower
[23:28:50] <bd808>	 ASammour: you may be hitting problems related to schema changes as well. See https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas for a recent set of changes that definitely have performance implications
[23:29:13] <ASammour>	 Krenair: Thanks for the reply. Take this query for example: it takes 3 seconds to run. But now, when I run it via Toolforge, it takes about 1 hour. https://quarry.wmflabs.org/query/32899
[23:29:48] <ASammour>	 This query, for example, lists all broken redirects to be deleted. It's not complex at all.
[23:29:59] <Krenair>	 redirect and page?
[23:30:15] <Krenair>	 don't think that will be the actor migration
[23:30:29] <bd808>	 hopefully not, no
[23:31:22] <Krenair>	 the optimiser says this is invalid
[23:31:48] <Krenair>	 ... huh.
[23:32:23] <Krenair>	 I think maybe the optimiser is broken by the use statement there, ok
[23:32:59] <ASammour>	 bd808: Yes, I know about the changes. The problem is that the syntax of the SQL is correct, but the performance has become very bad.
[23:34:13] <ASammour>	 https://quarry.wmflabs.org/query/36454 This query is a very complex one. It used to take about 200 seconds to complete successfully. Now I have been trying to run it for 3 hours, and it won't finish :)
[23:34:23] <bd808>	 ASammour: when you say "toolforge" do you really mean Quarry?
[23:35:54] <ASammour>	 bd808: Both of them. The same performance issues occur in Quarry and Toolforge. I tried running the same query in Toolforge and in Quarry, and the same problem happens.
[23:35:59] <ASammour>	 $ mysql --defaults-file=$HOME/replica.my.cnf -h arwiki.analytics.db.svc.eqiad.wmflabs arwiki_p
[23:36:14] <bd808>	 *nod*
[23:36:15] <ASammour>	 This is what I mean about toolforge.
[23:36:28] <Krenair>	 labsdb1009 does that query in 10 seconds
[23:36:40] <Krenair>	 which is web rather than analytics
[23:37:30] <ASammour>	 Same thing when I try to run it via Workbench.
[23:37:52] <Krenair>	 labsdb1011 in 26 seconds
[23:41:07] <ASammour>	 10 seconds is too much. It should finish in at most 4 seconds.
[23:41:38] <bd808>	 ASammour: that very much depends on what other work the database is doing at the same time
[23:42:11] <Krenair>	 I don't think the replicas make any such guarantees
[23:42:13] <bd808>	 I'm looking at the monitoring dashboards now to see if there is anything really horrible with the server load
[23:43:53] <ASammour>	 I'm sure there is a problem with the server load
[23:44:39] <bd808>	 ASammour: the question is if it is a problem we can actually fix or not. So far I'm not seeing anything particularly out of the ordinary
[23:44:40] <ASammour>	 Additional note: today I tried to run many queries in Quarry, and each time it said "Lost connection"
[23:45:09] <Krenair>	 can you show an example?
[23:46:29] <ASammour>	 The message looks like this: https://quarry.wmflabs.org/query/37078
[23:47:49] <ASammour>	 I wrote this query before: https://quarry.wmflabs.org/query/31517. Usually it takes 1 minute to run, but now it isn't able to finish.
[23:49:00] <ASammour>	 I tried removing "group by" and restricting the range of the query with "where page_id between", but the same problem happens
[23:49:13] <bd808>	 labsdb1011, the only server for *.analytics.db.svc.eqiad.wmflabs right now, has 87 active users and a high replag for s1. The s1 replag is usually a sign of the server spending all its time running queries, leaving no room for the replication threads.
[23:53:10] <bd808>	 ASammour: unfortunately I don't think there is anything I can do to help you right now. The server is busy, but it looks like it is all legitimate queries from other Wiki Replica users
[23:57:18] <bd808>	 I understand your frustration with the performance changes over the last couple of weeks. We think this is being caused by the schema changes, but 1) I don't know how to prove that, and 2) I don't know what actions to take to correct it even if proven.
[23:58:26] <bd808>	 "Throw more hardware at the problem" would be the easiest solution technically, but we don't have any database class machines just sitting around to do that right now.
[23:59:04] <ASammour>	 bd808: Thank you very much for investigating the problem. I understand the server is busy right now. But I have noticed this performance problem for the last 3 days. Is this healthy?