[10:05:53] Reedy: I'm seeing a little bug I can't trace in CR-trunk. The trimmed commit summaries are like this in the revision list: "foobar..." for some reason they have a line break before the ... part (which makes the table 2x as tall) [10:05:53] any ideas ? [10:05:54] Complete commit summary is "foobar\n-lorem\n-ipsum" [10:05:55] Ah, found it http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CodeReview/ui/CodeView.php?&pathrev=91519&r1=91518&r2=91519 [10:05:55]
lorem
... [10:05:55] blocklevel element, duh [10:05:55] heh [14:16:07] I have now read the Annual Plan Q&A. I'm putting off reading the entire Annual Plan till I'm on a train later today. So far I love it. [17:14:52] what number for the scrum? [17:15:51] roan? are you in the scrum? what conference line are we using? [17:16:10] Oh damn, the scrum [17:16:12] I'm still eating breakfast [17:16:17] I'll go grab a meeting room [17:17:10] x2003 [17:17:33] thx [19:55:18] I figured out why the code review stats are off [19:56:21] someone deferred many revisions by directly futzing with the db, but didn't futz it correctly [19:58:27] they didn't inject any entries into the code_prop_changes table [19:59:06] probably wgDeferedPaths? [20:02:30] Bryan: oh....we're just automatically deferring some stuff in the current version? [20:05:58] yes [20:07:37] gross....now there's no way of knowing just from the db what the initial state of a bug is, without reconstructing the history, then looking at the current state, then if the two don't match, assuming that the current state was that way all along [20:11:19] howief, Ryan_Lane: Alright, AFT ramp-up time [20:11:49] ok [20:11:53] anything you need me to do? [20:12:09] Just keep an eye out [20:12:13] will do [20:12:14] jump in front of the bullet? [20:12:22] Stuff like that yeah [20:12:49] GRENADE!!! [20:12:58] *hexmode throws Ryan_Lane on it [20:13:11] nice [20:13:18] :) [20:13:24] oh, you lived! [20:13:29] haha [20:15:00] RoanKattouw: are you in the office? [20:15:06] Yes, at Priyanka's desk [20:15:10] (which is my desk for now) [20:15:20] OK so we wanna go from 2.7% to 10% today [20:15:57] So that means we need to take down ClickTracking from 10% to 2.7% [20:16:28] so that's roughly an incremental 250k or so articles, correct? 
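An aside on the CodeReview trimming bug from the start of the log: the trimmed summary still contained a block-level element, so the appended "..." was pushed onto its own line. A hedged sketch of one way to avoid that, in Python rather than the extension's actual PHP (`truncate_summary` is a hypothetical helper, not CodeReview's real fix):

```python
import re

def truncate_summary(html, limit=80):
    """Strip markup before truncating so the ellipsis stays inline.

    If the summary contains a block-level element (e.g. the '-lorem'
    bullet rendered as a <div> or <li>), appending '...' after it
    forces the ellipsis onto its own line and doubles the row height;
    truncating the plain text avoids that entirely.
    """
    text = re.sub(r'<[^>]+>', ' ', html)      # drop all tags
    text = re.sub(r'\s+', ' ', text).strip()  # collapse whitespace
    return text if len(text) <= limit else text[:limit].rstrip() + '...'
```

The point is that the ellipsis is appended to inline text, never after a block container.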
[20:16:29] We probably want to do that in stages too [20:16:35] Ahm [20:16:57] 242821 /home/catrope/dario1.txt [20:16:59] 877394 /home/catrope/dario2.txt [20:17:05] i would actually keep things simple with the clicktracking and just step change it to 2.7% [20:17:08] So 242k now, 877k at the end [20:17:08] fewer moving parts that way [20:17:20] Yeah that's what I was wanting to do [20:17:23] great [20:17:30] But I don't wanna take AFT up to 10 at once [20:17:35] yeah [20:17:36] Probably should go to 4 or so first [20:17:40] sounds good [20:18:21] Oh, crap [20:18:29] Ignore those numbers, I forgot to exclude redirects [20:18:31] *RoanKattouw regenerates [20:18:43] (the 242k, 877k numbers I mean) [20:19:55] 103,828 now [20:20:28] OK I am going to take down ClickTracking to 6.75% and ramp up AFT to 4% now [20:21:05] cool [20:21:40] See logmsgbot messages in #wikimedia-tech for what's going on [20:22:40] We are going to end up at 374k articles today [20:22:52] sounds right [20:24:05] [[Bloody_Sunday_(1972)]] is our test article for this step [20:24:15] http://en.wikipedia.org/wiki/Bloody_Sunday_%281972%29 [20:24:49] nice choice [20:25:16] I see AFT on it [20:25:24] i don't see it yet [20:25:29] Both in FF5 logged-in on secure and in rekonq as an anon [20:25:38] btw were we able to get the text change on the survey [20:25:47] No that's not in yet [20:25:54] I had a meeting 11-noon [20:26:00] when can we put that in? [20:26:13] Preferably after this ramp-up [20:26:18] k [20:26:43] the reason i'm asking is that i'd like to include some user comments in the blog post that erik and i are doing for tomorrow. [20:27:28] Right [20:27:31] ok i see it on sunday bloody sunday [20:27:36] ? [20:27:43] You mean the one I linked? 
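The staged ramp-up above only works cleanly if the sampling is deterministic per page, so raising the odds never drops a page that was already enrolled. A sketch of an AFT-style page-ID lottery (the modulus scheme and both function names here are assumptions for illustration, not AFT's actual formula):

```python
def in_lottery(page_id, odds_percent):
    """Hypothetical per-page lottery: a page is enrolled iff its ID
    falls under the threshold. Because the test is deterministic,
    raising the odds (2.7% -> 4% -> 7% -> 10%) only ever adds pages,
    which is what makes a staged ramp-up easy to reason about."""
    return (page_id % 1000) < odds_percent * 10

def enrolled_count(total_pages, odds_percent):
    """Rough expected article count at a given percentage; around
    3.8M non-redirect pages at 10% is the right order of magnitude
    for the ~374k figure quoted above."""
    return int(total_pages * odds_percent / 100)
```

The monotonicity property is the key design point: no page ever flips out of the sample as the odds go up.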
[20:27:48] sorry, i was thinking of the U2 song [20:27:51] yes, the one you linked [20:28:00] Another one is Celibacy [20:28:13] ok [20:28:39] Which works for me [20:29:11] me too [20:30:06] Alright, just looking at Ganglia now [20:30:49] We'll do another five minutes of graph staring and then we'll proceed to the next step [20:37:45] howief: Litmus tests for next change: [[Alameda, California]], [[Amerigo Vespucci]], [[Almond]] [20:38:27] ok [20:38:33] Going to 7/3.85 [20:38:58] haven't heard about amerigo vespucci since hs [20:39:45] aww [20:40:02] mr. vespucci is not cooperating [20:40:16] I haven't done it yet [20:40:19] Watch -tech [20:41:03] ok [20:41:20] There, now it'll take up to 5 minutes to take effect [20:41:55] Not showing yet in Rekonq [20:42:05] Nor on secure [20:42:08] So let's wait a few mins [20:43:55] aft on amerigo vespucci [20:44:11] alameda, ca too [20:44:28] almond too [20:45:57] I'm not seeing it yet [20:46:30] Not as an anon in rekonq anyway [20:47:05] i just logged out and am seeing aft on almond [20:47:36] I'm still not seeing it [20:47:41] I'm logged out and using a different browser [20:49:05] hmm i just tried another browser and aft is showing up on amerigo [20:50:11] Interesting, it's still not working for me, don't know why [20:50:51] But meh, I believe you [20:50:56] rekonq is probably just weird [20:51:38] Heh, there's a small spike on the API Squids at exactly 20:38 [20:51:43] Then it falls off after 5 min [20:51:55] May or may not be a coincidence [20:55:44] OK, 7% is looking good to me [20:55:46] Let's go to 10 [20:56:57] Your litmus pages for this one will be [[Antisemitism]], [[Aorta]], [[Arithmetic-geometric mean]] [21:05:20] Alright, bump happened at 21:02 UTC, should see results any minute now [21:07:04] ok [21:09:23] It's appearing on [[Antisemitism]], [[Aorta]], [[Arithmetic-geometric mean]] [21:10:30] It's not appearing for me yet, strangely [21:10:34] Might take another couple minutes [21:11:01] Oh, wait, d'oh 
[21:11:06] I forgot the RL bug workaround thing [21:11:12] *RoanKattouw does workaround [21:12:24] whoever came up with this arithmetic-geometric mean idea must have been really bored [21:13:36] Yay everything works for me now [21:13:58] To be fair I didn't actually know what this was [21:13:59] nice [21:14:06] I knew what AM and GM were, but not AGM [21:14:16] yeah it's a really strange concept [21:14:22] i'm not sure what it would be used for [21:14:26] But intuitively I kind of feel how they'd converge [21:14:32] yeah [21:14:47] Like you said I also don't understand what you'd use it for [21:15:12] A few years back I did do some stuff involving proving inequalities that used AM/GM/HM a lot [21:15:23] according to the talk page, agm inspired the algorithm for the quickest way to calculate pi [21:15:34] that sounds like fun [21:15:45] Oh, right, because of fast convergence [21:15:57] i suppose [21:19:04] so are we're at 10% now? [21:19:10] does that mean we're done for the day? [21:19:21] howief, Ryan_Lane: Alright, ramp-up done, we're at 10% now. All is good, we're done for today, we'll continue tomorrow [21:19:29] nice :) [21:19:31] thanks everyone! [21:19:38] \o/ [21:21:08] yay [21:37:28] RoanKattouw: \o/ thanks!!! [21:43:13] hey RoanKattouw can you let me know when the updated copy is up on the survey? i'd like to send Geoff, our attorney, a screenshot [21:43:41] I'll get you a screenshot from my localhost setup [21:43:53] that works too! [21:44:00] ...once I get it to work [21:44:15] It shows now because I made a stupid mistake [21:46:43] Oh graaaah [21:46:59] Thumbnailing broken on officewiki via secure [21:49:13] howief: https://secure.wikimedia.org/wikipedia/office/w/img_auth.php/d/d6/Aft-legalese.png , requires login on officewiki [21:51:47] argh i'm logged into office wiki but can't access the page [21:52:07] can you email the file to me? 
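Since the AGM came up: a quick sketch of the iteration and of the AGM-based pi algorithm the talk page mentions. This is the standard Gauss-Legendre scheme, written from memory rather than taken from the article:

```python
from math import sqrt

def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean: replace (a, b) with their arithmetic
    and geometric means until the two sequences meet. Convergence is
    quadratic, which is the 'fast convergence' noted above."""
    while abs(a - b) > tol:
        a, b = (a + b) / 2, sqrt(a * b)
    return a

def gauss_legendre_pi(iterations=4):
    """Pi via the AGM-based Gauss-Legendre algorithm; each iteration
    roughly doubles the number of correct digits, so a handful of
    iterations exhausts double precision."""
    a, b, t, p = 1.0, 1.0 / sqrt(2.0), 0.25, 1.0
    for _ in range(iterations):
        a_next = (a + b) / 2
        b = sqrt(a * b)
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2
    return (a + b) ** 2 / (4 * t)
```

The AGM always lands strictly between the geometric and arithmetic means of the starting pair, which is the intuition for why the two sequences converge to each other.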
[21:56:10] OK [22:41:58] 20 min to IRC bug triage [22:42:04] back in 1s [22:58:27] hexmode: how's it going? point to the Etherpad? [22:58:42] sure, 1s [22:58:55] http://etherpad.wikimedia.org/BugTriage-2011-07 [22:59:05] wish I could set the topic [22:59:15] oh [22:59:17] heh [23:00:22] oops, needed to be on -tech [23:01:30] ok. so we're all here and started [23:02:05] first: http://bugzilla.wikimedia.org/20468 User::invalidateCache throws 1205: Lock wait timeout exceeded [23:02:14] and I think the next is like it [23:02:40] Yeah those are just kinda annoying [23:02:47] Reedy had been looking at those but I don't think he got anywhere [23:03:00] but I don't see any recent updates on them, so not too big a deal? [23:03:16] <^demon> They're annoying at most. [23:03:18] r85783 is really not the right way to deal with that [23:03:24] !r 85783 [23:03:24] --elephant-- http://www.mediawiki.org/wiki/Special:Code/MediaWiki/85783 [23:03:32] the right way to deal with it is to reduce the transaction time and the number of locks in the transaction [23:03:49] for those just joining us, we're looking at 8 caching-related bugs to figure out how to prioritize and assign them [23:03:53] There's also some anecdotal 'evidence' that User::invalidateCache() is called too often [23:04:07] <^demon> That's probably true. [23:04:12] k [23:04:35] User::invalidateCache() doesn't call $dbw->commit(), I think we may have tried that once but it broke other things [23:04:53] transactions are tricky because they can't be nested [23:05:00] So maybe we need a transaction analysis? 
[23:05:03] so they don't interact with the call stack in a nice way [23:05:06] yes [23:05:26] TimStarling: there are many places where transactions *are* nested though [23:05:28] look at what is calling User::invalidateCache() when it is locking for a long time [23:05:35] found that in my import to Pg [23:05:42] then add a commit to the calling code [23:05:56] Oh right [23:06:04] I guess it'd be useful to get backtraces on these errors [23:06:05] as long as that code isn't embedded in some other larger transaction [23:06:08] So we can see what else is in the transaction [23:06:58] is there anywhere those are reliably popping up? [23:07:25] the timeout exceeded, that is? [23:07:25] That's the thing, I don't think we even know that [23:07:41] We know User::invalidateCache() and Block::purgeExpiry() (??) throw them all the time [23:07:50] But we don't know what the path to those functions is [23:07:57] Cause we don't have backtraces on these errors, and I think we should [23:08:03] ok, so "all the time" is better than rarely. [23:08:05] :) [23:08:31] just log it [23:08:34] Well it happens relatively frequently [23:08:36] Yeah that'll work [23:08:53] log the backtrace? [23:09:03] yes [23:09:06] k [23:09:27] well, unless there is something else, let's move to #3 [23:09:46] #3: http://bugzilla.wikimedia.org/26338 Wikimedia Javascript and CSS files are getting an extra max-age cache-control param [23:11:29] just a shell issue and move on? [23:11:37] *hexmode answers his own q: [23:11:38] looks like that's one we punted until the glorious future of ResourceLoader [23:11:38] ok [23:11:43] heh [23:11:49] hexmode: I'm not clear on who's going to make the changes to start logging for bugs # 1 & 2 [23:11:51] oh wait, the glorious future rocks! [23:12:02] I'm not sure it's a shell issue [23:12:22] sumanah: RoanKattouw or TimStarling ... TimStarling ?? [23:12:51] could you put the logging in for those messages, TimStarling ? 
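The "log the backtrace" action item has a simple shape. In MediaWiki it would be something like wfDebugLog plus a backtrace at the throw site; the sketch below shows the idea in Python (the decorator name and the use of TimeoutError as a stand-in for the MySQL 1205 error are illustrative, not MediaWiki's code):

```python
import logging
import traceback

def log_backtrace_on_timeout(func):
    """Wrap a DB-touching call so that when a lock-wait timeout
    surfaces, the full call stack is recorded. That reveals which
    caller held the transaction open -- exactly the information the
    triage says is missing from the current error reports."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except TimeoutError:
            logging.error("lock wait timeout in %s\n%s",
                          func.__name__,
                          "".join(traceback.format_stack()))
            raise  # re-raise so existing error handling still runs
    return wrapper
```

Logging and re-raising keeps behavior identical for callers while capturing the path that led into the transaction.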
[23:13:05] or RoanKattouw ^^ [23:13:10] so...one thing to interject on trying to find assignees: [23:13:12] *hexmode wants to flip a coin [23:13:24] we talked about this last week, and the conclusion was [23:13:38] we should get the priority figured out [23:13:49] ExpiresByType can be set in a <Directory> section [23:13:51] ...and we should make a rough estimate for time needed (easy, medium, hard) [23:14:03] but not assign in the triage [23:14:10] we still do send some static files [23:14:16] ah... [23:14:20] ExpiresByType can be set for the directories that they are in [23:14:52] so, sumanah, looks like we did all but assignment :) [23:14:53] robla, hexmode -- ok, understood re assignees; for investigative next-steps, it would still be good to assign *that action*, no? [23:15:29] TimStarling: is that an ops issue? [23:15:32] or shell [23:16:06] we can probably handle it [23:16:37] before you assign lots of things to me, I should probably tell you that today is already looking pretty busy for me, and tomorrow I will be on holiday [23:16:37] TimStarling: I meant the ExpiresByType ... is that what you meant? [23:16:54] sumanah: if someone is in a position to volunteer in this meeting, then great. however, generally, we want to make backlog ordering a team thing rather than something that happens in the triage [23:17:18] got it, robla [23:17:21] robla++ [23:17:28] makes a ton of sense [23:17:48] hexmode: yes, it just needs to be someone who knows how to configure apache, doesn't matter who [23:17:54] k [23:18:08] probably easier for us to do it [23:18:26] already on the next one, ^demon is going wild: [23:18:37] http://bugzilla.wikimedia.org/26360 Disabling sessions in memcached produces open() error [23:18:46] <^demon> Yeah, I broke it way back in r49370 [23:18:53] heh [23:19:07] you break, you buy it ;) [23:19:45] <^demon> I'm pretty sure changing the default for $wgSessionHandler will resolve it. 
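For reference, the per-directory ExpiresByType approach Tim describes might look like this in httpd.conf (the path and lifetimes are made up; the real config would target whichever directories still serve static JS/CSS, per bug 26338):

```apache
# Hypothetical mod_expires fragment: scope the Expires/Cache-Control
# policy to the directories that still serve static files, instead of
# applying a blanket max-age that also hits dynamically served JS/CSS.
<Directory "/usr/local/apache/common/skins">
    ExpiresActive On
    ExpiresByType text/css "access plus 30 days"
    ExpiresByType application/javascript "access plus 30 days"
</Directory>
```

Scoping it to a `<Directory>` section is what avoids the extra max-age header on ResourceLoader-served responses.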
[23:19:52] good, next: http://bugzilla.wikimedia.org/29223 Querying for rvdiffto=prev fails for many revids: "notcached" [23:20:03] <^demon> Especially since my comment in GlobalFunctions said nearly as much. [23:20:56] Oh, yeah that one [23:21:00] Reedy has been messing with that one [23:21:04] I'm pretty sure it's fixed in trunk somehow [23:21:12] It's a bug in DifferenceEngine, or in the way we're calling it [23:21:17] I'm not sure how much progress he made with it [23:21:49] wish reedy were in SF *this* week [23:22:18] k, so I'll follow up with Reedy. [23:22:22] that reminds me [23:22:27] ? [23:22:31] the same problem exists with action=parse [23:22:40] it only fetches from the parser cache, it doesn't store to it [23:22:49] Right [23:22:55] Do we have a bug for that? [23:23:04] probably not [23:23:10] that sounds not so good [23:23:13] but it'll be reducing our parser cache hit ratio significantly [23:23:20] I can create a bug for it [23:23:21] How so? [23:23:28] since we now have huge numbers of action=parse hits due to android and iphone apps [23:23:29] hexmode: is there any way you can aim to wrap up this triage by 7:45 instead of 8pm? it overlaps with my commit access queue review meeting, which includes Tim & Chad [23:23:31] Surely fixing that *increases* our pcache hit ratio [23:23:44] sumanah: sure, think so [23:23:48] thanks [23:23:49] yeah, the fact that it is not fixed is reducing our parser cache hit ratio [23:24:01] OK, then I just misparsed your statement [23:24:13] Those API cache fixes can be given to either Sam or me [23:24:19] k [23:24:22] I can't guarantee I'll have time for them until early next week [23:24:30] Cause I've eaten my 20% time for this week by doing HTTPS [23:24:40] sokay, I'll remember to bug you ;) [23:24:45] even next week [23:24:52] Could you also file the action=parse bug then? 
[23:24:52] next http://bugzilla.wikimedia.org/29384 Load order of cached request in IE6 messes with dependancy resolving (mediawiki.util not available in time) [23:24:57] yes [23:25:06] Krinkle said he'd been debugging that and didn't know what's causing it [23:25:13] yep [23:25:15] Maybe he and Trevor can look at it together next week [23:25:34] excellent. [23:26:08] (I'm saving time for the last one ;) ) [23:26:11] next [23:26:12] http://bugzilla.wikimedia.org/29552 Squid cache of redirect pages don't get purged when page it redirects to gets edited [23:26:42] I think Ariel said about this: [23:26:57] "Let's first confirm that multicast to esams is really fixed, then if it is we can go back to blaming MW" [23:27:01] or something similar [23:27:31] the redirect purge? [23:27:38] or the last one? [23:27:42] Bawolff says he installed 1.17wmf1 locally and says MW is sending HTCP purge packets for the redirects correctly [23:27:46] The redirect purge bug [23:28:33] could be packet loss [23:28:55] what order are the HTCP packets sent in? [23:29:38] k, don't think I saw Ariel's comment. In the meantime, does action=purge resend the squid purge requests? [23:29:39] I don't know, would have to read the code [23:29:46] (to both Tim and Mark) [23:30:14] TimStarling: Why would the order matter? [23:31:54] well, HTCP packets are UDP [23:32:13] they are sent by SquidUpdate::HTCPPurge() without throttling [23:32:31] so if there are enough packets, maybe they will overflow a buffer in the next hop router and be lost [23:32:34] Oh you're saying if they're sent too close together, the second will be dropped [23:32:43] the later packets would be lost and the earlier ones wouldn't [23:32:47] Right [23:33:38] so, how would that change what we see? [23:33:39] But what it comes down to is that there must be packet loss in our network somewhere [23:33:45] We can ask Mark B to track that down, right? 
[23:33:48] it's just an idea [23:33:52] Or Ryan or any of the ops folks [23:33:57] Well if MW is definitely sending the packets ... [23:34:01] It's multicast, right? [23:34:04] I think if it were happening, it would be more obvious in templatelinks than in the redirect table update [23:34:19] Can't we attach a test listener to the multicast group and see what makes it through? [23:34:27] there's likely to be more templatelinks than redirects right? [23:34:31] we can [23:34:41] Sure, and I think there frequently are problems there [23:34:47] They probably get falsely attributed to the job queue [23:34:52] Plus, tl targets are edited too [23:34:55] Redirects are never edited, really [23:35:10] so who does the test listener? [23:35:43] I could but I'm too busy [23:35:50] ops? [23:35:52] To do it in the short term at least [23:36:06] RoanKattouw: ok, not looking for a person [23:36:14] just want to know which group [23:36:29] I would think ops, yes [23:36:38] k [23:36:43] To me it seems most likely the problem is on "their" side [23:36:48] Cause I'm fairly sure MW is behaving correctly [23:37:02] Maybe the lack of throttling that Tim mentioned is unfortunate [23:37:07] That could be related [23:37:32] k, 10 min [23:37:48] http://bugzilla.wikimedia.org/28613 Thumbnails of updated files fail to purge on squids [23:38:19] is that one just the same (need some test listeners) or is there something else we can do? [23:38:45] well, if it is packet loss, it would cause a lot of different squid purge bugs [23:39:05] if it isn't, then maybe they have different causes [23:39:20] We actually don't really know [23:39:29] Bryan did some debugging with Reedy's help and was surprised by what he saw [23:39:55] *hexmode goes back to the bug to check for surprise [23:40:20] Not sure it's in the bug [23:40:21] hmm, reading the bug report now [23:41:01] yeah, he does seem confused there [23:41:33] where confused means "wha??? This doesn't make sense!" 
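The multicast "test listener" floated above is small enough to sketch. The group and port below are placeholders for whatever the wmf HTCP config actually uses ($wgHTCPMulticastAddress and its port), and the header layout follows RFC 2756:

```python
import socket
import struct

# Placeholder values: substitute the group/port from the wmf config.
HTCP_GROUP = "239.128.0.112"
HTCP_PORT = 4827

def open_htcp_listener(group=HTCP_GROUP, port=HTCP_PORT):
    """Join the purge multicast group and return a bound UDP socket --
    the proposed 'test listener'. Reading from it and counting packets
    shows how many purges actually make it through the network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def parse_htcp_header(packet):
    """First four bytes of an HTCP packet per RFC 2756: total length
    plus protocol version. Enough to count and sanity-check purges
    without decoding the full CLR specifier."""
    length, major, minor = struct.unpack("!HBB", packet[:4])
    return length, major, minor
```

Comparing the count of received purges against what MediaWiki claims to have sent would confirm or rule out the packet-loss theory.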
[23:43:04] comment #46 [23:43:26] I posted some http responses to that bug as well, all MISS and still getting the old file [23:43:59] MISS means squid doesn't have it, either, right? [23:44:09] it can't be a squid bug if it's a miss [23:44:22] so squid goes to the server and gets the file [23:44:24] Then it's probably ms5 or something [23:44:33] The thumb doesn't exist on the NFS system [23:44:39] But it's getting served out anyway [23:44:44] One possible cause could be a perms issue [23:44:48] ms5 -- nfs client? [23:44:50] that prevents it from being saved [23:44:57] ms5 serves thumbs [23:45:01] reads them from NFS [23:45:11] Or, if not present, dispatches a scaler to create it and store it on NFS [23:45:20] the 404 handler will stream out the result of thumb.php [23:45:31] ms5 reads thumbs from nfs or creates them from nfs? [23:45:44] but it's hard to see how thumb.php could stream out something without the file being created [23:45:44] It does at least the former [23:45:54] I'm not sure whether writing the file happens on the scaler or on ms5 [23:45:57] the image scaler puts it straight on NFS, then streams it out from there [23:46:02] Ah, right [23:46:17] Well it seems to be serving thumbs for a file even though its thumbs directory is empty [23:46:24] That sounds like a mystery for Ariel to me [23:46:49] k, I'll give her these logs tomorrow [23:47:04] hexmode: ok, I am going to pull two of these folks away now so they can review code in the review meeting [23:47:09] anything else, or is that it so sumanah can go? [23:47:12] np [23:47:22] I think we're done [23:47:30] Edokter: thanks for your input [23:47:39] welcome! [23:47:48] interesting that packet loss is implicated in comment 37 [23:48:24] Oh heh, someone just closed the R1 door [23:48:32] I could hear sumanah all the way across the room [23:48:47] heh [23:49:02] argh, I do not mean to be loud! it is the speakers! 
[23:49:15] No worries :) [23:49:36] I forgot how loud those speakers are, got bitten by it yesterday with Timo and Guillaume [23:49:37] ^demon: you're dialing in? [23:49:43] <^demon> x2003? [23:49:54] Edokter: see pm? [23:50:29] ^demon: yes [23:54:12] just a fleeting thought: if server serves old file and dir is empty, where is old file coming from? different dir maybe? [23:55:18] If the file is missing there's a handler that generates a new one [23:55:25] My theory is that the new one gets served out but not stored [23:57:38] but the new one doesn't get served; i still get the old file [23:58:20] if thumbs dir is empty, where does the old thumb come from? [23:59:49] You're getting a MISS too?
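A guess at the failure mode being debated, as a sketch. The handler name and flow below are assumptions about the thumb.php/ms5 404 path, not their actual code: if the handler streams the freshly rendered bytes but the store to NFS silently fails (e.g. a perms problem), the thumbs directory stays empty while every request re-renders, and a stale source image would keep producing the old picture even on a cache MISS.

```python
import os

def serve_thumb(nfs_path, render):
    """Hypothetical 404-handler flow: serve the thumb from NFS if it
    exists; otherwise render it, attempt to store it, and stream the
    bytes out regardless of whether the store succeeded. A silent
    store failure matches the 'served but directory is empty' symptom
    in bug 28613."""
    if os.path.exists(nfs_path):
        with open(nfs_path, "rb") as f:
            return f.read()
    data = render()
    try:
        with open(nfs_path, "wb") as f:
            f.write(data)
    except OSError:
        pass  # store failed (perms, missing dir); still stream it out
    return data
```

Under this theory the fix would be to make the store failure loud instead of silent, so the perms or path problem surfaces in the logs.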