[00:00:30] AaronSchulz: going to review Niklas' from september again? :p
[00:00:57] I'm waiting on tim's rev to get OKed
[00:01:17] heh
[00:01:49] *AaronSchulz gives r110922, r110960 to reedy
[00:01:50] raindrift: so what's this about wikieditor click tracking?
[00:05:18] well, actually, maybe we don't need to do it in wikieditor...
[00:05:31] basically, we need to keep track of the page titles that are being edited using the new workflow.
[00:05:49] we could actually log that when the user clicks the button to go to wikieditor, so we don't have to instrument the wikieditor itself.
[00:06:17] nod
[00:06:25] the important thing is that, when we load up the wikieditor via the ACW interstitial, we log the page title in question. that'll let us find out later the survival rate of those pages
[00:06:28] so we need to log the *pages*
[00:06:31] (which is, presumably, our metric for quality)
[00:06:32] okay
[00:06:42] let me think about this a bit
[00:06:48] not sure what the best approach is
[00:06:49] okay.
[00:06:55] yeah. i'm not either.
[00:07:05] do we want to create a whole new table? eee
[00:07:10] or can we stick them in clicktracking?
[00:07:11] is that weird?
[00:07:19] i think we may want to make a new table for this.
[00:07:34] in which case we need to create a whole new API module, etc
[00:07:47] it'll be really handy, i think, for the page title to live in its own dedicated column. i know that's what i'd want if i was trying to get answers later.
[00:08:07] if clicktracking is flexible enough to store it in a way that's useful, we should use that just because it sounds like the path of least resistance.
[00:09:58] mmm, joining
[00:10:00] I get it
[00:10:58] I think there's precedent for sticking the page title in the additional_info field
[00:11:21] There's a namespace field in the clicktracking table, but no title field, so that's what people have been doing :S
[00:15:15] RoanKattouw: oo
[00:15:17] we could do that
[00:15:19] but then joining is harder
[00:15:45] raindrift: how do you feel about that?
[00:15:50] Depends on which direction you're joining in
[00:16:02] Plus: No new table, no new API module
[00:16:06] *aude waves
[00:16:28] If you're doing queries that say "here's a bunch of clicktracking rows, now find the page table rows for these", you'll be fine
[00:16:48] Minus: it's stored in a different format to it would be in the page table, so you can't use a natural join UNLESS the page is in the main namespace
[00:17:11] werdna: There is a namespace field in the clicktracking table
[00:17:31] Use that and you'll be fine
[00:17:44] ahhhhhhhhhhhh
[00:17:45] \o/
[00:17:55] Doing queries that say "here's a bunch of (namespace,title) pairs, find me the clicktracking rows for these" will not be fine because there's no index
[00:17:59] werdna: i'd say if there's a precedent, let's do it that way. we're not building any sort of persistent reporting interface, so queryability is a pretty low priority.
[00:18:00] nod
[00:18:10] raindrift: With RoanKattouw's caveat
[00:18:21] Also, you should be aware that, in production, the clicktracking table isn't written to at all
[00:18:29] oh?
[00:18:40] Instead, it goes through the UDP log collection system
[00:18:43] right.
[00:18:47] Because it turns out the latter is a tad more scalable :)
[00:18:47] ahh
[00:18:51] nod
[00:18:57] so how do we get the data then?
[00:19:06] I assume it all goes through intact, though?
[00:19:11] so, one way or another, we have to pull the data into some script and intersect it with the article data outside mysql.
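(Editor's sketch, not part of the log.) The "pull the data into some script and intersect it with the article data" step discussed above might look roughly like the following. Everything concrete here is an assumption made for illustration: the tab-separated line layout, the `acw-edit` event name, and the helper names are hypothetical, and the real format of the UDP-collected log is not given in the conversation. The one detail taken from MediaWiki itself is that the page table stores titles with underscores rather than spaces, which is why titles are normalized before comparing.

```python
from typing import Iterable, Set, Tuple

# A page is identified by (namespace number, title in page-table form).
PageKey = Tuple[int, str]


def normalize_title(title: str) -> str:
    """Convert a display-style title to page-table form (spaces -> underscores)."""
    return title.strip().replace(" ", "_")


def parse_logged_pages(lines: Iterable[str]) -> Set[PageKey]:
    """Collect (namespace, title) pairs from click-tracking log lines.

    ASSUMED (hypothetical) line layout: event<TAB>namespace<TAB>additional_info,
    where additional_info carries the page title.
    """
    pages: Set[PageKey] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed or truncated lines
        _event, namespace, title = fields
        try:
            pages.add((int(namespace), normalize_title(title)))
        except ValueError:
            continue  # namespace was not an integer
    return pages


def survival_rate(logged: Set[PageKey], existing: Set[PageKey]) -> float:
    """Fraction of logged pages that are still present in the page data."""
    if not logged:
        return 0.0
    return len(logged & existing) / len(logged)


# Made-up sample data, purely for illustration.
log_lines = [
    "acw-edit\t0\tExample article\n",
    "acw-edit\t0\tDeleted page\n",
    "acw-edit\t2\tSomeUser/Draft\n",
    "garbage line\n",
]
existing_pages = {(0, "Example_article"), (2, "SomeUser/Draft")}

logged = parse_logged_pages(log_lines)
rate = survival_rate(logged, existing_pages)
```

Keying on the (namespace, title) pair rather than the title alone follows the point made in the log that the namespace field is what makes the match against the page table work outside the main namespace.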
[00:19:37] *RoanKattouw remembers when he crashed the site by accidentally logging a few hundred thousand clicktracking events in a few minutes' time
[00:20:41] Back then, it melted the enwiki master by burying it in write load. Nowadays, it would just cause emery to drop a bunch of packets
[00:20:50] so, yeah. werdna, let's use clicktracking for storing the article titles since it scales well, and we can do the work of building reasonable stats with some script that can grind away on that data for a few hours later.
[00:21:10] raindrift: definitely
[00:21:16] that's like a 3 line change
[00:21:18] \o/
[00:21:20] I approve of this
[00:21:53] RoanKattouw: where does the data go?
[00:22:30] Currently, emery:/var/log/aft
[00:22:46] Longer term, we want to put in a multiplexer that can split data from different sources
[00:22:59] We're already mixing AFTv4, AFTv5 and edit toolbar data there
[00:23:00] RoanKattouw, you are the fount of all knowledge
[00:24:20] I'm sometimes amazed that I do things like tell people about our UDP-based analytics logging setup while simultaneously telling someone that IE's stupidity won't break their CSS anymore
[00:25:05] (IE limits the number of