[00:05:31] RoanKattouw, I'll try to review this weekend. [00:05:59] RoanKattouw, this is just my SQL gaps showing, but why does it need both DISTINCT and GROUP BY? [00:06:59] I would think DISTINCT alone should be good enough if each row of the result set has a unique ID that it knows it can distinguish by. [00:12:51] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Article-Reminder-Notifications: Determine language/i18n for Article Reminder notifications - https://phabricator.wikimedia.org/T168385#3440866 (10Mattflaschen-WMF) [00:13:18] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Article-Reminder-Notifications: Determine language/i18n for Article Reminder notifications - https://phabricator.wikimedia.org/T168385#3362850 (10Mattflaschen-WMF) [00:14:45] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Article-Reminder-Notifications: Determine language/i18n for Article Reminder notifications - https://phabricator.wikimedia.org/T168385#3440869 (10Mattflaschen-WMF) [00:14:55] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Article-Reminder-Notifications: Determine language/i18n for Article Reminder notifications - https://phabricator.wikimedia.org/T168385#3362850 (10Mattflaschen-WMF) [00:16:10] 10Collaboration-Team-Triage, 10Notifications, 10Article-Reminder-Notifications, 10MW-1.30-release-notes (WMF-deploy-2017-07-11_(1.30.0-wmf.9)), and 3 others: create new notification type for the article reminder - https://phabricator.wikimedia.org/T165754#3440873 (10Mattflaschen-WMF) [00:18:17] 10Collaboration-Team-Triage, 10Notifications, 10Article-Reminder-Notifications, 10MW-1.30-release-notes (WMF-deploy-2017-07-11_(1.30.0-wmf.9)), and 3 others: create new notification type for the article reminder - https://phabricator.wikimedia.org/T165754#3440877 (10Mattflaschen-WMF) [00:22:04] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Article-Reminder-Notifications: Determine language/i18n for Article Reminder notifications - https://phabricator.wikimedia.org/T168385#3440883 (10Mattflaschen-WMF) >>! In T168385#3439975, @jmatazzoni wrote: > Do we know all the other elements of this... [00:35:44] 10Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017), 10MediaWiki-extensions-PageCuration, 10MW-1.29-release (WMF-deploy-2017-02-14_(1.29.0-wmf.12)), 10Patch-For-Review: Remove the 250 character limit from messages sent through the Page Curation Tool - https://phabricator.wikimedia.org/T157056#3440896 (... [01:06:03] Whoa apparently I was away from my desk for an hour :O [01:06:10] I thought I'd just stepped away briefly [01:06:49] matt_flaschen: Technically just GROUP BY would be sufficient too, I think, but ChangeTag::modifyDisplayQuery() wouldn't know what GROUP BY to add exactly [01:07:38] For strange reasons, MySQL doesn't come up with a proper query plan unless you do exactly what I did (although the DISTINCT itself becomes optional at that point) [01:09:36] So, in order to get distinct rows you either need to literally use DISTINCT, or use GROUP BY rc_id which has the same effect in this case (it deduplicates to make rc_ids distinct, which is the same as making the results distinct because the duplicates only differ in ct_tag which is not in the list of SELECTed fields) [01:10:08] But then you have GROUP BY rc_id ORDER BY rc_timestamp DESC, and MySQL doesn't like it if GROUP BY and ORDER BY differ (GROUP BY has to be a prefix subset of ORDER BY) [01:10:45] (Well, it doesn't *have* to be, but if you don't do that, you get a weird query plan with terrible performance) [01:12:52] Thankfully, we can do ORDER BY rc_timestamp DESC, rc_id DESC which is OK because 1) it uses IDs as tiebreakers which is probably a feature since the sorting was underspecified before and 2) MySQL secretly pads non-unique indexes with the primary key to make them unique, so the (rc_timestamp) index being used here is secretly (rc_timestamp, rc_id), and the query planner knows this so it's able to use the index to do the [01:12:52] sorting [01:14:16] Then we have GROUP BY rc_id ORDER BY rc_timestamp DESC, rc_id DESC, which is still not OK. But we're already grouping by a unique field, so it doesn't matter if we group by more things. GROUP BY rc_timestamp, rc_id means "the combination of timestamp+ID must be unique", but since each ID has only one timestamp, that's equivalent to "the ID must be unique". So GROUP BY rc_timestamp, rc_id is equivalent to GROUP BY rc_id [01:14:31] So now we have GROUP BY rc_timestamp, rc_id ORDER BY rc_timestamp DESC, rc_id DESC for all the crazy reasons above [01:16:04] Once you have that, you don't really need DISTINCT any more. We could remove it, but I chose to keep it because it seemed to make the code more reasonable (modifyDisplayQuery does *something* to make the results unique, and the caller can fix up the query to deal with that; whereas otherwise the caller would have to remember to do deduplication itself) and it didn't affect the query plan (it appears the optimizer realizes [01:16:04] that DISTINCT is already implied by GROUP BY rc_id) [01:16:41] RoanKattouw, I wonder if we even need rc_timestamp, or just rc_id. Don't really know, though, and that's a more drastic change. [01:16:48] Heading out, I'll check scrollback though. [01:16:51] Hmm right [01:16:59] Yeah that would be a drastic change but potentially helpful [01:17:17] Because of the PK padding stuff, probably every query plan would get better [01:17:46] And pagination wouldn't have strange bugs in cases where timestamps are non-unique [01:18:30] I know rev_id and rev_timestamp can be out of sequence with each other but it seems strange for rc_id and rc_timestamp to be out of sequence. Maybe there are edge cases that do cause that though [01:18:36] RoanKattouw, yeah, rc_id only goes back to 2004 as I suspected: c9dcdbc00710863c0258518ac8520d913deea62a [01:18:48] Well, not specifically 2004, but that there originally wasn't an artificial key. [01:18:49] "only" [01:19:15] Interesting.... [01:19:22] RoanKattouw, well, that's when I started editing Wikipedia, and Wikipedia existed before me. QED. [01:19:31] So the PK on that table is likely newer than a lot of the code in SpecialRecentchanges.php [01:19:35] Yep [01:20:02] RoanKattouw, did you check the query plan doesn't change if there are no tags? [01:20:10] Wondering on behalf of the DBAs. [01:21:34] Yes I did [01:21:44] In fact, the query doesn't change at all if there are <2 tags [01:21:57] Cause all of the changes only kick if for >1 tags [01:22:19] Also come to think of it.. [01:22:22] I should add a unit test [01:24:11] ChangeTags.php doesn't have any tests at all, and RC/Watchlist don't appear to have tests for their queris [01:35:16] Holy crap, if you omit the $tag_filter parameter (or set it to false), it defaults to $wgRequest->getVal( 'tagfilter' ) [01:35:18] That's insane [01:35:35] Thankfully nothing in core uses this, everything passes '' or manually passes in a tag filter [01:35:47] I'll grep extensions and if no one uses this misfeature there either, I'll remove it [06:06:45] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Edit-Review-Improvements-Integrated-Filters, 10Edit-Review-Improvements-RC-Page, 10MediaWiki-Recent-changes, and 2 others: Implement 'Live Updates' feature for RC page filters - https://phabricator.wikimedia.org/T167743#3441042 (10Catrope) >>! In... [07:05:25] (03PS2) 10Umherirrender: Break very long lines [extensions/Flow] - 10https://gerrit.wikimedia.org/r/364500 [12:30:00] 10Collaboration-Team-Triage, 10MediaWiki-extensions-PageCuration, 10English-Wikipedia-New-Pages-Patrol: Page Curation toolbar doesn't add to PROD log - https://phabricator.wikimedia.org/T169474#3441246 (10Aklapper) [15:44:53] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10User-notice-collaboration: Incoming ORES deployments (Romanian & Albanian Wikipedia) - https://phabricator.wikimedia.org/T170723#3441409 (10Halfak) [15:45:41] 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10User-notice-collaboration: Incoming ORES deployments (Romanian & Albanian Wikipedia) - https://phabricator.wikimedia.org/T170723#3440470 (10Halfak) I mistakenly thought we were ready to train the English Wiktionary model, but it turns out that our lo... [20:38:12] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/Echo] - 10https://gerrit.wikimedia.org/r/365479 (owner: 10L10n-bot) [20:38:44] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/Flow] - 10https://gerrit.wikimedia.org/r/365481 (owner: 10L10n-bot)