[01:39:50] Where there database changes in the last week? https://commons.wikimedia.org/wiki/User:Dispenser/Wrong_Extension Went from 30 minutes to query being killed. [03:50:14] doctaxon: which host are you connecting to? [13:50:17] Dispenser: There has been maintenance going on to compress tables. I don't know a lot about it, but it seems possible that is impacting things. [15:20:53] !log shinken stopping shinken and puppet on shinken-02 because there are about to be a lot of alerts related to the rebuild of cloudservices1003 [15:21:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL [15:55:30] !log admin T221769 rebooting cloudservices1003 after bootstrapping is apparently completed [15:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [15:56:16] T221769: Upgrade cloudservices1003/1004 to stretch/mitaka - https://phabricator.wikimedia.org/T221769 [18:57:44] so, my cgi python script works well when I run it locally, but when accessed through browsers, I get a 500 ISE, seemingly because of the error "ImportError: No module named MySQLdb". what's the easiest way to fix that? [19:09:36] are you running it in a venv locally? [19:10:02] or, rather, not in a venv on tools? [19:25:54] Reedy, yeah, no venv [19:29:31] dungodung: you should be using a virtualenv to get your python packages for a webservice. This is probably the "best" documentation on that -- https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#python_(Python3_+_Kubernetes) [19:29:59] I put best in scare quotes because that doc is not really very readable at this point :/ [19:30:10] bd808, thanks, I'll give it a try [19:36:31] o/ zhuyifei1999_ [19:36:37] Did that fix for TSVs go out? [19:36:43] Looks like none of the fields are quoted anymore now [19:36:44] lol [19:36:50] ^ re. quarry [19:37:01] I think it went out, let me check [19:37:37] yeah it's at cc0c0a71330b12dae88b8bc8bf4ae3fa85037eaa [19:37:40] Otherwise, it seems to have worked. [19:37:50] So we want quotes around string fields in TSVs? [19:37:55] I don't care either way :) [19:38:27] I mean, we only quote when it's strictly necessary right? [19:38:36] like, if a field contains a tab, then we quote it [19:55:47] halfak: quotes are (or should be) only added when there is a risk of CSV injection (https://phabricator.wikimedia.org/T209226). [20:06:42] framawiki, personally, I would rather there be quotes on everything or not at all. [20:06:47] Simple formats FTW [20:07:17] If you only quote sometimes, I can't read these things in with a simple CSV reader -- like in R or python's CSV library. [20:11:20] zhuyifei1999_, quoting a tab is not the common way of escaping them in TSV. "\t" is. [20:12:03] I think we should go back to following MySQL's TSV format :| It seems to be the most common dialect. [20:12:05] I think we are using python's CSV writer [20:12:22] I think there's a MySQL dialect in there. [20:12:51] https://github.com/wikimedia/analytics-quarry-web/blob/master/quarry/web/output.py#L111 [20:13:42] I don't see mysql dialect in https://docs.python.org/3/library/csv.html [20:13:55] mind pointing me to at the right direction? [20:14:17] Hmm. I don't see it either. [20:14:22] I thought I did at one point in the past [20:15:29] I think if python generates it this way, I would expect it to be able to read it [20:17:57] Hmm. Looks like python would work for it. [20:18:14] You can use the QUOTE_MINIMAL argument to get it to behave. [20:18:17] But R will barf [20:18:33] delimiter="\t", doublequote=False, escapechar="\", quoting=QUOTE_NONE [20:18:39] That's the MySQL dialect. [20:18:49] I wish you could just import it from somewhere. [20:20:44] mind filing a patch / ticket? [20:21:31] Either way, this is working better than I expected, so you can file my complaints under "cute feature requests" ;) [20:21:41] will do zhuyifei1999_ [20:21:48] Sorry to get pushy. :) [20:22:17] np [20:24:29] * zhuyifei1999_ is writing code that's driving me nuts right now. that's an exaggeration but still, stuffs like 01001o0011eeeeee, 0100110000eeeeee0ddd000000000000 ... [20:27:07] ^ binary pattern matching, not hex [20:50:33] Hi. [20:50:43] tools-sgebastion-07 is really horrible right now. [20:50:53] It's taking minutes to run simple commands such as `ls`. [20:51:18] Is there a better bastion host I can use? [20:53:47] mzmcbride@tools-sgebastion-07:~$ time ls > /dev/null [20:53:48] real 0m48.063s [20:53:54] mzmcbride@tools-sgebastion-07:~/logs$ time cd .. [20:53:57] real 1m3.038s [20:54:05] hi [20:54:07] This is insane. [20:54:10] Marybelle: You get what you pay for [20:54:12] lemme see if I can ssh in [20:54:25] Someone is presumably overloading NFS [20:54:28] Reedy: I mean, at this point, I'm considering just setting up my own replica. [20:54:56] The filesystem is slow and horrible. The SQL servers also seem overloaded and slow and bad. [20:55:02] load average: 19.32, 10.66, 5.02 -- the bastion is being overloaded [20:55:16] That's from tools-sgebastion-07 [20:55:27] 20:45:06 up 6 days, 8:25, 27 users, load average: 2.61, 1.22, 0.56 [20:55:32] Was from a few minutes ago. [20:55:37] Maybe it's me overloading it. [20:55:57] I see an scp happening [20:56:25] It feels like basic file operations such as scp shouldn't cause the machine to become unusable. [20:57:04] Reedy: How much do you think it would cost per month to host an enwiki_p replica in AWS or Google Cloud? [20:57:19] I imagine it wouldn't be cheap [20:57:24] For that class of machine? expensive [20:57:26] How would you get it there? [20:57:29] As long as the database dumps are working regularly. [20:57:50] Quite a bit of cpu time to import them tohugh [20:57:53] though [20:57:57] Yeah, the *links tables are going to get you. [20:58:04] Marybelle: Like with toolserver back in the day [20:58:08] And revision. But if I just wanted like `redirect` and `page`. [20:58:12] People abuse the resources, and ruin it for everyone [20:58:34] Wikimedia Foundation Inc. has $100M. [20:58:40] I think we could afford a few servers. [20:58:47] It's primary goal isn't to run a shared hosting service :P [20:58:52] So people could abuse even more resources? [20:59:06] Abuse is a pretty loaded term. [20:59:15] Marybelle: Cloud are hiring an ops person. Maybe you can apply and solve all their problems? [20:59:17] If someone is abusing the hosts, ban them. It's trivial to do. [20:59:28] Reedy: It seems like a budget issue, not a personnel problem. [20:59:34] The server resources appear to me to be sufficient for normal usage. [20:59:37] Put more money into hardware. [20:59:47] Krenair: Then why is `ls` taking a minute? [20:59:52] The servers these run on are huge and very good. However, the scrubbing we are doing is in need of some love because the views require subqueries and joins at this point [21:00:00] Because someone is doing something they should not be on a shared bastion? [21:00:04] Marybelle: because someones is running a huge scp on NFS [21:00:18] AKA not normal usage [21:00:20] I don't understand why a basic operation like scp ruins the NFS server. [21:00:22] Usually it's not through malice that people cause problems, but not knowing better [21:00:25] It's done [21:00:25] How is using scp not normal????????? [21:00:28] The scp [21:00:29] scp in principle is fine. [21:00:38] Exactly [21:00:40] So people are supposed to know that they can't run scp? [21:00:42] The performance problem comes in with the amount of data [21:00:45] bstorm_: did you already kill the scp? [21:00:49] If you move a massive file or dump with scp, it can fill a pipe [21:00:55] legoktm: it finished [21:00:56] I don't see it on the bastion [21:00:57] ah [21:01:03] that would explain why I was able to login [21:01:06] Load is dropping like a stone [21:01:11] You could login anyway, it just took ages :P [21:01:37] Aren't there a million tools that can bound CPU and memory per-user and per-process? [21:01:37] Trying to find other things that might be eating NFS [21:02:09] Marybelle: almost everything on the bastion is restricted by cgroup to prevent abuse. NFS is unbounded, unfortunately because of the limitations of the system. [21:02:15] NFS is NFS [21:02:18] NFS sucks [21:02:23] It's very frustrating to try to do anything on Labs. [21:02:27] We are exploring replacement systems, but that takes time [21:02:38] If you need a system that is not as vulnerable to other users doing questionable things with resources, Wikimedia can provide that within reason too. [21:02:47] load is climbing again...trying to find it [21:02:57] There's plenty of Cloud VPS instances outside the tools project [21:03:01] I'm running a large `wc -l` fwiw. [21:03:02] Same user, same thing [21:03:19] Or trying to. NFS is slow. [21:03:36] Marybelle, if your main problem is filesystem slowness and you're working mainly in the tools project, it's probably a case of it being frustrating to work inside projects which use NFS [21:03:50] which are probably a minority [21:04:26] Krenair: I'm mostly using the filesystem because the SQL servers are too slow and overloaded. [21:04:37] So I was like "I can just write to disk and read the file, that shouldn't be so bad." [21:04:39] And yet. [21:04:42] they are? [21:05:13] [21:59:52] The servers these run on are huge and very good. However, the scrubbing we are doing is in need of some love because the views require subqueries and joins at this point [21:05:17] maintenance [21:05:28] Still can't use NFS. There might be something else.... [21:05:59] IIRC even in tools there may be some local storage available for users to do quick operations with? [21:06:08] Krenair: I have my own project/instance thing somewhere. Maybe I'll try that. [21:06:30] The filesystem is the weak point. The SQL servers are not actually slow. The query can be slow because you are using a massive database that currently is suffering from joins and subqueries in the backend. Queries are requiring increasing cleverness...which is typical in a massive DB, but still. [21:06:48] butterfly-m4m2? [21:06:52] Yeah. [21:06:55] Using the NFS to make up for what are some of our largest servers is not a good idea [21:06:58] Or butterfly or something, yeah. [21:07:41] What's the non-tools bastion address? [21:07:53] bastion.wmflabs.org or something? [21:08:11] should work [21:08:18] yes [21:08:24] 21:08:12 up 147 days, 6:39, 5 users, load average: 0.04, 0.03, 0.00 [21:08:27] very quiet over there [21:08:51] Lovely. Can I write large files there? [21:09:11] it's a small instance so you'll have up to 2gb iirc [21:09:11] You cannot write a file over 40GB on any bastion [21:09:16] Ugh, no sql installed. [21:09:21] oh you mean the bastion [21:09:22] Unless you are talking about some other place :) [21:09:24] sorry I was looking at butterfly [21:09:30] /dev/vda3 19G 12G 6.5G 64% / [21:09:32] Ah ok. I dunno that stuff. [21:09:57] bah, 20gb. [21:10:25] Didn't there used to be multiple bastion hosts for tools? [21:10:34] I thought I could escape to like tools-sgebastion-04 to avoid you fools. [21:10:36] There is [21:10:43] But NFS is shared among them? [21:10:44] But if you blast the other one's NFS, it's the same NFS [21:10:49] Bleh. [21:10:52] I hate computers. [21:10:53] NFS is shared across the entire compute system [21:10:58] Same servers [21:11:26] so, this is the process killing things it seems. It's locked up in uninterruptible sleep: [21:11:26] mzmcbri+ 17304 0.0 0.0 25312 684 pts/22 DN+ 20:42 0:01 wc -l enwiki-namespace-0-redirects-2018-05-05.txt enwiki-namespace-0-redirects-2018-05-14.txt enwiki-namespace-0-redirects-2018-05-20.txt enwiki-namespace-0-redirects-2018-06-06.txt enwiki-namespace-0-redirects-2019-05-29.txt enwiki-namespace-0-redirects-2019-06-02.txt enwiki-namespace-0-redirects-misspellings-2018-05-20.txt [21:11:40] That's me. [21:11:45] Please kill that. [21:11:56] Okay. [21:12:01] I can't use `wc -l`? [21:12:09] We'll see :) [21:12:12] Marybelle we just had this discussion. [21:12:15] I want to see if that stops the thrashing [21:12:24] Krenair: What was the conclusion? [21:12:28] It's not the command, it's how it behaves on NFS [21:12:34] Users can't run any commands that might touch the filesystem? [21:12:35] It's the same type of thing as with scp [21:12:43] SCP is fine on a small file [21:12:44] etc. [21:12:44] The commands in principle can be fine [21:12:52] I run things on NFS all day. [21:12:54] What file size? [21:12:56] You can also nice it sometimes [21:12:59] How big are all those files in terms of file size? [21:13:01] How would I even know? [21:13:10] If I run `du`, is that going to cause thrashing? [21:13:14] This is insanity. [21:13:15] When it gets to the point that you cannot use the bastion anymore, [21:13:25] That's when it's time to reconsider the size of the operation you're trying to run [21:13:35] Reconsider how? [21:13:35] Tools is better now [21:13:39] It's a large dataset. [21:13:39] It was that command [21:13:41] Isn't du done based on inodes, ot calculated on teh fly? [21:13:43] Actually it's probably a good idea to consider long before that, but still [21:14:02] Reedy: I think it can use either. [21:14:04] du --inodes will give you inodes [21:14:08] Marybelle: You could try running it against /data/scratch intsead [21:14:09] That. ^ [21:14:11] *instead [21:14:13] it normally does block usage [21:14:22] bstorm_: Is /data/ non-NFS? [21:14:30] It is NFS, but it isn't /home [21:14:37] It also is a different NFS server [21:14:38] krenair@tools-sgebastion-07:~$ ls -lh /data [21:14:39] total 0 [21:14:39] lrwxrwxrwx 1 root root 41 Feb 8 01:08 project -> /mnt/nfs/labstore-secondary-tools-project [21:14:39] lrwxrwxrwx 1 root root 55 May 28 18:08 scratch -> /mnt/nfs/secondary-cloudstore1008.wikimedia.org-scratch [21:14:57] lrwxrwxrwx 1 root root 38 Feb 8 01:32 /home -> /mnt/nfs/labstore-secondary-tools-home [21:14:59] So what is the guidance for users here? [21:15:07] Don't run `wc` or `scp` on large files? [21:15:12] This architecture doesn't scale. [21:15:34] How would anyone know that a basic command would deadlock the whole host? [21:15:36] Welcome to NFS and the tools project? [21:15:43] If you used a local disk for /home, you could cripple a largely multiuser system. I'm not interested in discussing how or why. You can figure that out if you like. [21:15:58] bstorm_: I think I just did. [21:15:59] However, if you use /project/scratch, you are not impacting everyone's home directories [21:16:07] https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pdf [21:16:10] And you may be able to do that without much problem [21:16:10] "Why NFS Sucks" [21:16:14] LOL [21:16:20] NFS is problematic [21:16:23] Reedy: Then use something else? [21:16:30] What, gluster? [21:16:36] Remember how well that worked out for toolserver? [21:16:47] Krenair: I'm not sure welcoming is really appropriate when I've been a Toolserver and Labs user for longer than you. :P [21:16:55] https://phabricator.wikimedia.org/T207590 [21:17:01] Mind you, Marybelle, I cannot guarantee that /data/scratch will respond well, but it is on 10Gb ethernet and is a much newer and less-used server [21:17:22] Okay. Can I move files from /home/ to /data/? [21:17:27] Or is that going to kill the host? [21:17:35] Seems like it's gonna kill the host. [21:17:58] run it with nice? [21:18:04] No. [21:18:13] I'll just re-run this query. It only took 147 minutes. [21:18:20] glhf [21:18:42] I don't understand how anyone would volunteer to work on this platform. [21:18:51] Like from a developer/user perspective. [21:18:52] Because they don't have other options? [21:19:09] I'm considering just thrashing api.php instead. [21:19:16] Since the production servers will keep up. [21:19:25] And I can just build my own local database. [21:19:33] ಠ_ಠ [21:19:48] ONE HUNDRED MILLION DOLLARS PER YEAR. [21:20:19] Wikipedia is not a shared hosting provider [21:20:25] Yes it is. [21:20:34] It's 2019. Pointing to how the Toolserver behaved in 2010 is pretty unacceptable and stupid. [21:20:49] Unfortunately, things like NFS haven't improved much in nearly a decade [21:21:30] Also, just because you (and other, whether rightly or wrongly) see cloud and tools as important, doesn't mean higher ups, the board, or other parts of the foundation see them as a priority [21:21:40] So maybe it doesn't get the money it should get [21:22:19] https://wikimediafoundation.org/ still has German text. [21:22:31] Go have that discussion with Comms [21:22:32] I'll wait [21:22:53] I mean, if that isn't symbolic of the utter dysfunction and mismanagement. [21:23:07] Max, this is a channel about cloud services [21:23:11] It's a feature, not a bug anyway [21:23:38] Krenair: Yeah, they're borderline unusable. [21:23:44] And your suggestion is basically "don't use them, then." [21:23:46] Thx so much. [21:23:49] I happen to agree with your point about the foundation website, but it's really off-topic here [21:24:08] they're not unusable. I deal with them all the time [21:24:21] Not when I'm running something wild like `wc`. [21:24:26] Rage. [21:24:30] on a large file [21:24:44] or, several [21:24:45] Why does the file size matter again? [21:24:55] Can't it just stream the data? [21:25:01] Isn't that whole point? [21:25:03] god knows [21:25:06] the * [21:25:06] I didn't write wc [21:25:10] does wc -l on many files in one command do it in parallel or one at a time? [21:25:22] mzmcbri+ 17304 0.0 0.0 25312 684 pts/22 DN+ 20:42 0:01 wc -l enwiki-namespace-0-redirects-2018-05-05.txt enwiki-namespace-0-redirects-2018-05-14.txt enwiki-namespace-0-redirects-2018-05-20.txt enwiki-namespace-0-redirects-2018-06-06.txt enwiki-namespace-0-redirects-2019-05-29.txt enwiki-namespace-0-redirects-2019-06-02.txt enwiki-namespace-0-redirects-misspellings-2018-05-20.txt [21:25:27] Also, if you stream the data from a server that is running on 10x greater bandwidth, you are likely to have a better time [21:25:37] Cause if it's trying to read all those files simultaneously... That kinda explains why [21:25:39] I'm going to try /data/scratch/. [21:25:47] You can copy files using rsync with the --bwlimit option to avoid filling the pipe [21:26:07] Reedy: Reading multiple files should be fine. [21:26:13] Since that's what computers do all day, every day. [21:26:17] I recommend --bwlimit of like 40000 [21:26:21] Maybe it's loading them all up into somewhere(?????). [21:26:36] But just counting lines in a file shouldn't be doing that. I don't get it. [21:26:41] It's suffering iowait, which is triggering high load. [21:26:47] Hi folks. I would like to remind everyone that we are literally all in this work together. And that ranting at each other won't do much other than make some people mad and others sad [21:26:51] Which is likely related to the network [21:27:12] It sucks when our shared resources are too slow [21:27:33] It sucks when we get blocked on things we thought should be easy [21:28:09] Yes. [21:28:13] But being aggressive to humans in response to machines being uncooperative if not helpful [21:28:43] *is not helpful [21:28:45] I feel like blaming NFS is a very old excuse. [21:29:14] My frustration level would be lower if this were not a problem spanning years. [21:29:33] its a very old tech, and something that takes a lot of money and time to replace [21:29:50] Marybelle: preach! Want to help keep my budget asks from being cut every year? [21:29:57] And, again, in 2019 with a massive budget, it's pretty bewildering that the infrastructure is in a state where... sigh. [21:30:15] I'm somewhat serious about the switch to api.php. [21:30:27] If production infrastructure is getting the money, I'm tempted to just use that. [21:30:28] Marybelle: As I said earlier, and as bd808 just said... Just because you see it as important, doesn't mean people with the purse strings do too [21:31:17] Marybelle: honestly if you have data needs that the Action API or something from RESTBase satisfies, you will probably get better performance there [21:31:17] there's a rant about NFS in 'the Unix Hater's Handbook' written in 1994. And yet people still run it all around the world. Shared filesystems turn out to be a hard problem :( [21:31:44] bd808: Right, my issue is that I want to query across the whole dataset. [21:31:53] Like checking every redirect or every page title or whatever. [21:32:02] And neither is really suited to that afaik. [21:33:09] enwiki-20190520-page.sql.gz 1.6 GB [21:33:26] I'm pretty sure I could just load that into a local MySQL database. [21:33:36] yup [21:33:51] but you will wait ~30 days for the next dump [21:33:56] Right. [21:33:57] so its a trade off [21:34:26] It's also pure insanity to need to run MySQL locally when there are supposed to be up-to-date replicas available for this exact purpose. [21:35:01] Reedy: I know nobody likes discussing politics in here, but I very much remember when Wikimedia Foundation Inc. killed off the Toolserver because the Labs world was going to be so much better. [21:35:14] There was budget then. [21:35:32] Marybelle: fly to PDX and yell at Erik. He's not here anymore [21:35:36] Heh [21:35:47] whatever happened in 2013 is done [21:35:57] I agree. [21:36:01] this is 2019 and these are the constraints we are tyring to live with [21:36:17] Computers have only gotten faster and cheaper. [21:36:20] we have a shared storage system that we outgrew 5 years ago [21:36:30] Outside of this dystopian world. [21:36:58] we have a database schema optimized for OLTP actions that peopel are tyring to use for OLAP queries [21:37:02] I should just set up a Linode instance and send Wikimedia Foundation Inc. the bill. [21:37:40] The key is no one here has the power to increase the funding to cloud to get even better things (TM). So #wrongvenue [21:37:54] Izhidez: What's the correct venue? [21:38:02] Give it a shot. A rapid grant would cover it I'm sure if you can write a convincing business case for the benefit to the movement [21:38:04] Reedy: You say that Wikipedia doesn't do shared hosting or whatever, but for the price of a single person's salary, it could easily pay the hosting costs for all these users. [21:38:11] Somewhere up the WMF chain of command [21:38:16] hell if I know [21:38:18] Izhidez: Thx so much. [21:38:24] Marybelle: Could it? Including the replicas and stuff people use? [21:38:31] Reedy: Shared hosting! [21:38:37] The replicas are separate. [21:38:42] In my mind, anyway. [21:38:47] And yeah, easily. [21:38:54] But a lot of these people using the shared hosting, are using the replicas [21:38:59] You know a Linode instance with lots of gigs is like $5/month? [21:39:07] So you can't just ship them off elsewhere, and expect things to work [21:39:09] Marybelle: really? ~$100K USD per year would get 14,000 shared hosting accounts? [21:39:35] call it $200K if you want (nobody gets paid that here), still going to be far short [21:39:40] Did 14K people log in this year? [21:39:51] That's a lot of people. [21:39:58] We don't have 14K active editors... [21:40:00] Technically, 100K would buy 20,000 1GB "Nanode 1GB" linode vms [21:40:12] plus tax, because americans are stupid with that [21:40:27] but nobody to run them :) [21:40:39] heh [21:40:54] linode can sort it when they're overloaded and under performing [21:41:13] Marybelle: that 14K number is just potentially authorized users. I took it from https://tools.wmflabs.org/openstack-browser/ which is looking at the total of LDAP accounts [21:41:35] I honestly don't have a tracked metric of ssh logins [21:41:43] It can't be more than 1K users. [21:42:11] I'm not sure that's a reasonable metric to track actually as the tool->user mapping is pretty variable [21:42:17] For $5/month, 1K users is $60K? [21:43:22] Reedy: Also, at least at this point in my life, I'd be willing to pay money to not waste my time on this. [21:43:31] So go and do it then? [21:43:33] Problem solved [21:43:36] * Reedy shrugs [21:44:18] Reedy: Are you selling the service? [21:44:27] "the"? [21:44:28] How much and to whom do I remit payment? [21:44:40] Yeah. I'd happily create a contract here. [21:44:51] At least then you couldn't snark with "you get what you pay for." [21:44:57] I'll pay, who do I send the money to? [21:45:11] You can pay it into the general donation pool [21:45:18] Where you get little to no say where the money is used [21:45:22] And there we have the current problem [21:45:38] Marybelle: are there going to be more on topic questions, or just rants about the WMF? [21:45:54] I'm game for rants, but maybe we could do that elsewhere [21:46:06] Maybe. [21:46:35] Reedy: I think what bothers me is the idea that I'm already donating volunteer time, pretty expensive volunteer time. [21:46:43] For someone who knows SQL, SSH, Unix, etc. [21:46:48] That's your choice though, isn't it? :) [21:46:50] And then it's like I also have to pay for the infrastructure? [21:46:51] No one is forcing you to [21:46:54] Sure. [21:47:03] But that's a really shitty attitude. [21:47:23] And I don't understand how you all will attract or retain any volunteers with that attitude. [21:48:41] bd808: I did have a technical question. [21:48:56] enwiki.analytics.db.svc.eqiad.wmflabs seems to be some kind of round-robin/load balanced address. [21:49:07] So when I try to monitor a long-running query, I get different results. [21:49:17] Is there some way to specify the actual host where the query is running? [21:49:37] mzmcbride@tools-sgebastion-07:~$ mysql -henwiki.analytics.db.svc.eqiad.wmflabs -e 'show full processlist\G' enwiki_p | grep "Time:" [21:49:46] Part of the idea is that we can depool one of the two servers on that address [21:49:47] Tho today it seems to be consistently going to the same host. [21:49:50] One is depooled right now [21:49:58] I'm repooling it as we speak [21:50:05] Right. If both are pooled, is there a way to monitor the query? [21:50:08] That's only running on one? [21:50:25] Other than just hitting it like 5 times and hoping of those 5 times hits the server I want? [21:50:50] Marybelle: as far as I know there is not a way to pin to a single host in the wiki replicas pool(s). Is this mostly about trying to get EXPLAIN info? [21:51:15] https://tools.wmflabs.org/sql-optimizer does a pretty good job of that [21:52:56] bd808: I thought `EXPLAIN` had been fixed with the replicas, but I tried again over the weekend and it has not. [21:53:03] bd808: This is `show processlist` tho. [21:53:06] If you are talking about showing the processlist, I'm not sure you can do that effectively outside of repeated runs [21:53:12]