[00:00:09] old news [00:00:15] Roan said those were harmless [00:00:19] noisy though [00:00:24] Yeah [00:00:32] I'll go through and clear out old paths in the DB [00:00:33] * AaronSchulz is piping through grep for php-1.19 [00:00:37] After we fully migrate to 1.19 [00:00:45] Also, 1.19 shouldn't throw those notices at all [00:00:50] we're getting about 20 per second [00:01:01] (And I'm very surprised those paths even manage to get in there, it's supposed to be impossible) [00:01:51] moar unit tests [00:02:19] real coders use C, do all the code in one go, and don't need tests [00:02:21] Is editing meant to be broken on MW wiki atm? [00:02:27] No? is it? [00:02:30] * AaronSchulz wishes he was that good [00:02:36] Cannot find section [00:02:36] You tried to edit a section that does not exist. It may have been moved or deleted while you were viewing the page. [00:02:46] Link? [00:03:00] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 241 seconds [00:03:19] I just vandalised my user page fine [00:03:24] hmm seems to be a pile of old image description pages [00:03:27] PROBLEM - MySQL Slave Delay on db50 is CRITICAL: CRIT replication delay 270 seconds [00:03:37] Reedy: is that metaphysically possible? [00:04:00] http://www.mediawiki.org/wiki/Category:Images_with_unknown_copyright_status (at least that i've noticed) [00:04:45] so https://www.mediawiki.org/w/index.php?title=File:Flash_Video_Extension_Screenshot_with_explanations.png&action=edit [00:04:48] PROBLEM - LVS Lucene on search-pool2.svc.pmtpa.wmnet is CRITICAL: Connection timed out [00:05:23] well yes, that is one [00:05:24] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.876 seconds [00:05:24] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.883 seconds [00:05:26] What the hell [00:05:30] That's a bug for sure [00:06:05] &section=0 works [00:06:11] Makes you wonder what the default value is, if any [00:08:07] if ( $this->section != '' ) { [00:08:07] // Get section edit text (returns $def_text for invalid sections) [00:08:07] $text = $wgParser->getSection( $this->getOriginalContent(), $this->section, $def_text ); [00:09:18] PROBLEM - Lucene on search6 is CRITICAL: Connection timed out [00:09:18] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:18] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:51] but that error is shown whenever $this->textbox1 === false [00:11:37] var_dump( $page->getRevision()->getText() ); [00:11:39] bool(false) [00:12:00] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.631 seconds [00:12:00] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.625 seconds [00:12:05] same with $rev = Revision::newFromId( 114629 ); [00:12:11] though it gets the user name and such [00:13:12] RECOVERY - Lucene on search6 is OK: TCP OK - 8.992 second response time on port 8123 [00:14:06] DiffHistoryBlob::patch: incorrect base checksum [00:14:30] this is the bug apergos found [00:14:34] about a year ago [00:14:51] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [00:15:54] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:15:54] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:16:07] !log tstarling synchronized php-1.19/includes/HistoryBlob.php 'temp fix for checksum bug' [00:16:09] Logged the
message, Master [00:17:43] TimStarling: is there a bug #? [00:17:55] gn8 folks [00:18:16] no, I'm filing one now [00:18:57] at the time I was unhappy about apergos just hacking around it in the ugliest way possible, I thought he should have fixed it properly [00:19:41] are the line numbers in CodeReview new? [00:19:46] yeah [00:19:49] nice [00:19:57] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.672 seconds [00:19:57] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.699 seconds [00:20:01] something hashar added in relation to adding comments to lines [00:20:04] * robla pokes around for more substantial breakage [00:20:07] * AaronSchulz vaguely recalls this [00:20:28] we're going to miss being able to hack on our code review tool, aren't we? [00:20:28] PHP Fatal error: Cannot access protected property WikiPage::$mTouched in /home/wikipedia/common/php-1.19/includes/Article.php on line 1743 [00:20:30] robla: there's a few other stats things I added.... We can use them for like the next month we still have CR! [00:20:52] magic get [00:20:52] yum [00:21:18] PROBLEM - Lucene on search6 is CRITICAL: Connection timed out [00:22:12] AaronSchulz: probably saner just fixing the actual caller to get Touched [00:22:30] I couldn't find it yesterday [00:22:38] ah [00:22:40] * AaronSchulz will use the logs [00:22:53] Why have we got __get and __set again? :/ [00:23:46] are we ready to do some more? [00:24:25] !log aaron synchronized php-1.19/includes/Article.php [00:24:27] Logged the message, Master [00:24:38] Reedy: that b/c code [00:24:43] *that is [00:24:43] indeed [00:25:12] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:25:12] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:25:39] RECOVERY - LVS Lucene on search-pool2.svc.pmtpa.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [00:25:53] which one of these next? https://www.mediawiki.org/wiki/MediaWiki_1.19/Communications [00:26:15] RECOVERY - Lucene on search6 is OK: TCP OK - 0.002 second response time on port 8123 [00:26:16] let's just get strategywiki and usability out of the way now [00:26:36] and what the heck, throw in simplewiki and simplewiktionary [00:27:11] Need to drop hewikisource from the list [00:28:59] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: strategywiki, usabilitywiki, simplewiki and simplewiktionary to 1.19wmf1 [00:29:02] Logged the message, Master [00:29:03] well, I'm not thrilled about doing it ambassador free, but I think we should still do it [00:29:22] we need an RTL wiki [00:29:31] lemme try to scare up someone on hewiki [00:29:33] i wonder if there's anyone about in their irc channel [00:30:20] Reedy: I tried that...there's just one person [00:30:58] ask, um, asaf? 
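The fatal above ("Cannot access protected property WikiPage::$mTouched") comes out of the magic-getter back-compat layer being discussed: since the Article/WikiPage split, old code that reads fields like $article->mTouched is routed through __get() to the wrapped WikiPage, and a magic getter can only forward properties that are actually visible to it, so a protected WikiPage field still blows up. A minimal sketch of that kind of b/c shim, with an invented class name rather than the actual Article.php code:

    class PageCompatShim {
        /** @var WikiPage */
        protected $mPage; // the wrapped page object

        // Old callers still read $article->mFoo directly; forward the access.
        public function __get( $fname ) {
            if ( property_exists( $this->mPage, $fname ) ) {
                // Fatals if the property is protected on WikiPage, which is
                // exactly the error quoted above.
                return $this->mPage->$fname;
            }
            trigger_error( "Inaccessible property via __get(): $fname", E_USER_NOTICE );
            return null;
        }

        public function __set( $fname, $fvalue ) {
            if ( property_exists( $this->mPage, $fname ) ) {
                $this->mPage->$fname = $fvalue;
            } else {
                trigger_error( "Inaccessible property via __set(): $fname", E_USER_NOTICE );
            }
        }
    }

As noted in the channel, the saner fix is usually on the caller's side: use the public accessor instead of poking the field.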
[00:31:02] he lives upstairs [00:31:05] well he doesn't live there [00:31:10] but you know [00:31:48] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.865 seconds [00:31:48] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.863 seconds [00:35:05] * robla checks on staff channel [00:35:07] !log tstarling synchronized php-1.19/includes/HistoryBlob.php [00:35:10] Logged the message, Master [00:36:34] Reedy: haha, the mTouched bug is in robots.php [00:36:41] * AaronSchulz would not have guessed that [00:36:49] Niice [00:38:47] !log aaron synchronized live-1.5/robots.php 'fixed access to protected Page field' [00:38:49] Logged the message, Master [00:39:03] lol [00:40:59] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:08] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:42:47] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , enwiki (24029) [00:43:06] 24k, nice :) [00:43:12] jobs? [00:43:19] yeah [00:43:33] Are we solving US unemployment now? [00:43:37] Someone is updating templates again [00:44:15] why weren't we mentioned in the state of the union ? [00:44:25] !log starting to delete broken thumbnails from swift and squid. job running in a screen session on ms-fe1 [00:44:27] Logged the message, Master [00:44:32] * AaronSchulz is surprised how little error log spam there is for 1.19 [00:44:44] TimStarling: do you think it's reasonable to start in on squid purging now? [00:44:53] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.856 seconds [00:44:57] not enough spam? the server kittehs will starve! [00:45:02] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.530 seconds [00:45:13] AaronSchulz: anything exciting? [00:45:17] robla: purging what from squid? [00:45:27] maplebed wants to start his work [00:45:33] oh, thumbnails? [00:45:37] Reedy: aside from tuuuumbleweed!...no [00:45:40] yeah [00:45:45] TimStarling: digging swift for broken thumbnails and purging them from both swift and squid. [00:46:00] have you written the script already? [00:46:04] I have. [00:46:12] can I review it? [00:46:16] you want to take a look? [00:46:17] sure. [00:46:30] it's on ms-fe1 in ~root/purgebadimages/delete-stuff.py [00:46:31] public flogging has commenced [00:46:38] it'll be called from ./caller.py [00:46:45] delete-stuff is an excellent name for a script [00:46:50] to get 20 concurrent purgers, each working on one bucket. [00:46:54] werdna: inorite! [00:47:07] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: meta to 1.19wmf1 [00:47:09] Logged the message, Master [00:47:30] hrm [00:47:34] TimStarling: I warn you, it's ugly as shit. [00:47:42] and for that I apologize. [00:47:53] half way [00:48:16] function HTCPPurge( $url ) { [00:48:16] print "baz"; [00:48:56] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:49:05] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:49:09] :) [00:49:11] Helpful output is helpful [00:49:24] it looks slow [00:49:25] hey, it told me the thing was getting successfully called. [00:49:28] I can probably pull it. [00:49:30] TimStarling: it is slow. [00:49:32] how many images have we got to purge? 
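For the robots.php fix logged just above, the caller-side approach suggested earlier ("fixing the actual caller to get Touched") looks like the snippet below. This is a hypothetical before/after, assuming the script only wanted the page-touched timestamp; the actual robots.php diff isn't quoted here:

    // Before: reads a protected WikiPage field through the b/c magic getter,
    // which is what triggered the fatal on 1.19.
    $lastMod = $page->mTouched;

    // After: go through the public accessor instead.
    $lastMod = $page->getTouched();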
[00:49:42] !log aaron synchronized extract2.php 'fixed access to protected Page field' [00:49:44] Logged the message, Master [00:49:49] but given my test on one commons bucket, my estimate is it'll finish in 1.5 days. [00:50:45] !log dist-upgrading prototype.wikimedia.org [00:50:48] Logged the message, Master [00:51:24] I guess that's tolerable [00:51:34] TimStarling: the other thing that'll make it be fast enough is that since the buckets are all broken apart into shards, I'm gonig to run 20 copies simultaneously. [00:51:44] if it's too slow, I'll bump it to 30. or 50. [00:51:46] :P [00:51:57] ah right [00:52:08] (doing them serially would take weeks, it's true.) [00:52:32] RECOVERY - MySQL Replication Heartbeat on db50 is OK: OK replication delay 0 seconds [00:52:32] RECOVERY - MySQL Slave Delay on db50 is OK: OK replication delay 0 seconds [00:53:34] (that estimate came from a test yesterday where going through one commons bucket took about 4 hours and we have 256 commons buckets, 256 enwiki buckets, and then many smaller buckets.) [00:54:14] hm. I think I didn't do math right yesterday. [00:54:29] * maplebed sets it to run 30 copies. [00:54:44] at least there's no risk of overloading squid with a high rate of HTCP purges [00:56:08] it's also only 1.6% of thumbs that are affected, so I'll have to make 100 calls to swift for every 16 HTCP packets. [00:56:21] err.. 116 calls to swift for every 16 HTCP packets. [00:56:27] (100 stats and 16 deletes) [00:56:53] New patchset: Lcarr; "Adding mobile wap to redirect to new mobile site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2620 [00:57:10] binasher: that look right for rewriting the url ? https://gerrit.wikimedia.org/r/#change,2620 [00:57:13] line 122, that should use the actual correct container name not a random shard, right? [00:57:20] * maplebed looks [00:57:32] PHP Fatal error: Allowed memory size of 125829120 bytes exhausted (tried to allocate 7864320 bytes) in /usr/local/apache/common-local/php-1.19/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php on line 17315 [00:57:33] thanks! [00:57:36] piles of these [00:57:47] PROBLEM - MySQL Replication Heartbeat on db50 is CRITICAL: CRIT replication delay 204 seconds [00:57:47] PROBLEM - MySQL Slave Delay on db50 is CRITICAL: CRIT replication delay 206 seconds [00:57:59] AaronSchulz: that's a lot of lines [00:58:12] TimStarling: fixed. reload? [00:58:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2606 [00:58:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2606 [00:58:18] * AaronSchulz opens the file in NetBeans [00:58:39] TimStarling: "Opening the file could cause OutOfMemoryError, which.." [00:58:47] it's over 25,000 [00:58:56] lol [00:58:56] * AaronSchulz chuckles at his IDE's warning and opens anyway [00:59:37] file is 4.64MB [00:59:45] over 40k lines in WikimediaMessages alone [00:59:49] Just a bit ;) [01:00:13] just take it out of CommonSettings.php [01:00:27] LeslieCarr: that wouldn't send a redirect at all, but would internally rewrite the query to lang.mobile which then gets internally rewritten again to lang.wiki.. 
should work for serving the requested content under the original wap domain name [01:00:31] $wmgUseWikimediaLicenseTexts [01:00:47] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.547 seconds [01:01:02] oh [01:01:05] i think it's fine to do that [01:01:19] okay cool :) [01:01:23] but [01:01:39] Won't we have people complaining about the messages being missing? [01:01:41] i'm guessing there's a better way to do that :) [01:02:44] Reedy: which is worse, messages being missing or the site? [01:03:05] Depends who you ask ;) [01:03:08] maplebed: I see variable names pix, byts, pixes, sorted_byts [01:03:17] what to requests for en.wap look like that aren't for / [01:03:17] is this your own special language or some sort of regional dialect? [01:03:26] lol [01:03:32] I was feeling uninventive. [01:03:39] and wanted to write hard-to-read code. [01:03:44] that file went up from 4.0M to 4.6M this release cycle [01:04:12] I do actually usually choose better names (delete-stuff?) [01:04:19] just not this time. [01:04:44] !log reedy synchronized wmf-config/CommonSettings.php 'Disable WikimediaLicenseTexts for the time being' [01:04:47] Logged the message, Master [01:04:50] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:05:09] Reedy: you made the right choice Sam, you made the right choice :) [01:05:11] TimStarling: so... does it look ok to run? [01:05:18] so pix = width? [01:05:22] AaronSchulz: if enwiki ask, it wasn't me [01:05:22] binasher wap.wikipedia.org:80 208.80.152.75 - - [15/Feb/2012:21:29:51 +0000] "GET /images/81px-Wikipedia-logo.gif HTTP/1.0" 302 599 "http://aoliva.com/blog/2011/12/29/banco-de-imagenes-y-sonidos-del-ite/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.11) Gecko/20101013 Ubuntu/9.04 (jaunty) Firefox/3.6.11" or wap.wikipedia.org:80 208.80.152.47 - - [15/Feb/2012:22:20:42 +0000] "GET /wiki/A_Midsummer_Night's_Dream HTTP/1.0" 302 603 "-" [01:05:23] MCS 9.0" [01:05:34] pix is the ###px pulled out of teh URL - I think that's width? [01:05:49] people are using wap for image links on blogs? ugh [01:05:54] byt = file length? [01:05:57] yeah. [01:06:01] robla: shall we put enwikiquote and enwikibooks over? [01:06:02] LeslieCarr: do you see anything that looks like an article request [01:06:24] did we just disable license text for 1.18 and 1.19, or just 1.19? [01:06:33] * robla wants to understand that change some more [01:06:44] everywhere for the moment [01:06:46] binasher: " wap.wikipedia.org:80 208.80.152.88 - - [15/Feb/2012:22:24:01 +0000] "GET /wiki/%C3%89tats_pontificaux HTTP/1.0" 302 600 "-" "Sevenval FIT MCS 9.0" " [01:06:51] and sizes is also file length except indexed by width? [01:07:00] yup. [01:07:07] Eh [01:07:17] robla: it's disabled everywhere except testwiki and commons [01:07:26] i'll just turn it off on testwiki then [01:07:33] LeslieCarr: can you see if anything contains ?go= or &go= ? [01:07:36] binasher: relatively rare though, would adding a .* at the end and have it all redirect to the frontpage of X.mobile.wikipedia.org ? [01:07:52] what is sorted_byts? 
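The $wmgUseWikimediaLicenseTexts knob being discussed is the usual wmf-config pattern: a per-wiki flag declared in InitialiseSettings.php and tested in CommonSettings.php before the big WikimediaLicenseTexts.i18n.php file gets registered. A rough sketch of that pattern from memory; the exact wiring in the real config files isn't shown in the log:

    // InitialiseSettings.php: default plus per-wiki overrides, keyed by dbname.
    'wmgUseWikimediaLicenseTexts' => array(
        'default'  => true,
        'testwiki' => false, // switched off there while chasing the OOMs
    ),

    // CommonSettings.php: only register the ~4.6 MB message file where the flag is on.
    if ( $wmgUseWikimediaLicenseTexts ) {
        $wgExtensionMessagesFiles['WikimediaLicenseTexts'] =
            "$IP/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php";
    }

As Tim points out a bit further down, enabling it on only some wikis splits the shared localisation cache, so a flag like this is only a stopgap.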
[01:08:07] no, you're only rewriting the Host: header of the request [01:08:14] binasher: wap.wikipedia.org:80 10.64.0.130 - - [15/Feb/2012:14:48:05 +0000] "GET /transcode.php?go=eduardo+cobi%C3%A1n+y+roffignac&seg=2&phpsessid=36df2daa30ee6686168b6eb99f687222 HTTP/1.0" 302 791 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; sbcydsl 3.12; YComp 5.0.0.0; yplus 5.1.02b)" [01:08:19] !log reedy synchronized wmf-config/CommonSettings.php 'Scrap that' [01:08:21] Logged the message, Master [01:08:40] !log reedy synchronized wmf-config/InitialiseSettings.php 'Just disable wmgUseWikimediaLicenseTexts for testwiki' [01:08:42] Logged the message, Master [01:09:02] you seem to be checking the file sizes sorted by file size against the file sizes sorted by width [01:09:17] TimStarling: After making a list of all sizes ordered by width, I create a copy of the list and sort it by sizes. If they're the same, all images with increasing width have increasing sizes and we're cool. If they're not the same, then a "larger" image has a smaller size meaning it's truncated. [01:09:38] you think this will be reliable? [01:09:42] it's not a perfect indicator but it's a decent heuristic. [01:09:46] what if there's only one thumbnail? [01:09:55] oh those crazy old transcode.php urls [01:09:56] 1.18wmf1 it is 4.15MB, 1.19wmf it's 4.52MB [01:09:59] or what if the smallest one is truncated? [01:10:00] I throw those out into a seprate file that I'll look at later. [01:10:05] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.260 seconds [01:10:10] if the smallest one is truncated it won't catch it. [01:10:33] LeslieCarr: i think your varnish change is good, just rewrite the comment above it s/Redirect/Rewrite/ [01:10:38] I'm ok only hitting a large majority since the problem can be fixed for an individual image by doing ?action=purge. [01:10:49] (hitting a large majority but not 100%) [01:11:20] what if the truncation happened towards the end of the file? [01:11:29] probably won't catch it. [01:11:43] we've seen a few files truncated at around 64KB [01:12:22] I'd really like to figure out the OOM and 500 thing [01:12:33] fwiw, this is the same algorithm that generated my "affects ~4.5% of all images, 1.6% of all thumbs" estimate. [01:12:40] we may need to scrub the rest of the deployment if we don't figure that out [01:13:11] so you don't have any figures on what percentage of errors this will actually catch? [01:13:12] well, WM is only going to be an issue when we want to do commons [01:13:25] because you've not written any script which compares the thumbnails against the source? [01:13:26] It's literallly not got enough memory to load in the file [01:13:30] New patchset: Lcarr; "Adding mobile wap to redirect to new mobile site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2620 [01:13:34] TimStarling: that's correct. [01:13:50] why not just do a HEAD request to ms5 and check the Content-Length header? [01:14:08] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:14:18] What's the 500 thing? [01:14:28] binasher: ^^ [01:14:39] I'm happy to do that as a second pass; I'd like to get this started to at least start clearing out the ones I know are broken. [01:14:43] Reedy: Erik was getting 500 errors loading recent changes on meta [01:15:00] but i won't be able to do the HEAD to ms5 comparison and get it running before I leave for the day. [01:15:05] Oh, didn't see anyone say that.. [01:15:07] you mean swift is still enabled in squid? 
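The heuristic maplebed describes above is purely ordinal: for each original image, list its thumbnails by pixel width, then check whether sorting the same list by byte size changes the order. If it does, some "larger" thumbnail is smaller on disk than a narrower one and is assumed truncated. The real script is the Python delete-stuff.py on ms-fe1; the fragment below is just a small PHP illustration of the same check, with an invented function name:

    /**
     * $thumbs maps thumbnail width (px) => size (bytes) for one original image.
     * Returns true when byte size is not monotonic in width, i.e. some wider
     * thumbnail is suspiciously small and is presumed truncated.
     */
    function looksTruncated( array $thumbs ) {
        if ( count( $thumbs ) < 2 ) {
            return false; // nothing to compare for a lone thumbnail (Tim's objection)
        }
        ksort( $thumbs );                   // order by width
        $byWidth = array_values( $thumbs ); // byte sizes in width order
        $bySize = $byWidth;
        sort( $bySize );                    // byte sizes in size order
        return $byWidth !== $bySize;
    }

As discussed, this misses single-thumbnail cases, a truncated smallest thumbnail, and truncation near the end of a file; the HEAD-to-ms5 Content-Length comparison Tim suggests would catch those at the cost of one extra request per thumbnail.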
[01:15:13] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2620 [01:15:14] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2620 [01:15:17] no, but broken images are still cached in qsuid. [01:15:28] robla: do you know where/what he was doing? [01:15:39] squid is currently backed by ms5, so broken images whose squid caches expire will come back ok, [01:15:43] looking at Special:RecentChanges :) [01:15:51] I can't repro, for what it's worth [01:15:51] but those expiration times are potentially as large as 10 day.s [01:16:06] Ahh [01:16:10] robla: try loading 500 RC items [01:16:24] loading 500 makes an error 500 [01:16:26] * robla does [01:16:26] how fitting [01:16:35] 100 is ok, 250 isn't [01:17:20] yup [01:17:29] well, I think it should be ok to run this until you write a better script [01:17:50] ok. [01:18:09] if you replaced the shelling out to the swift client with a proper swift client library, and did the HEAD request to ms5, it should be faster overall as well as giving more results [01:18:11] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.373 seconds [01:18:35] I'm a bit nervous about calling out to ms5 for every thumbnail; it'd be pretty easy to overwhelm ms5. [01:18:51] I see memory errors on 1.19 for wmlicensestext still [01:18:54] *oom [01:19:34] it's still in ExtensionMessages-1.19.php [01:19:50] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.946 seconds [01:20:06] ahh [01:20:26] I'm still unclear on the effect of disabling wmlicensestext [01:21:38] ha [01:21:55] you can't disable it on all wikis except commons, it'll split the l10n cache [01:22:12] it'll cause constant refreshing of the cache file [01:22:14] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:22:19] ok, it's running. [01:23:27] anyway I imagine disabling it will break commons horribly [01:23:43] removing all non-english messages would probably be better [01:24:05] Even whilst commons is still on 1.18? [01:24:20] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:24:56] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.831 seconds [01:25:09] it'll only break the 1.18 wikis [01:25:32] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [01:27:56] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:28:49] !log tstarling synchronized php-1.19/extensions/WikimediaMessages/WikimediaLicenseTexts.i18n.php [01:28:50] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:28:51] Logged the message, Master [01:29:27] !log tstarling synchronized wmf-config/CommonSettings.php 're-enabled WikimediaLicenseTexts' [01:29:29] Logged the message, Master [01:29:47] Query: DELETE FROM `msg_resource` [01:29:48] Function: MessageBlobStore::clear [01:29:50] Error: 1213 Deadlock found when trying to get lock; try restarting transaction (10.0.6.47) [01:30:55] meh, transient [01:31:00] never mind, I'm wrong [01:31:08] ;) [01:31:36] !log distupgrading formey [01:31:39] Logged the message, Master [01:32:20] TimStarling: it looks like it's purging about 3 objects per second. 
[01:32:33] I feel like that's a little higher than I'd like [01:32:33] AaronSchulz: fixing that problem is what extension-list is for [01:33:20] !log tstarling synchronized wmf-config/InitialiseSettings.php [01:33:22] http://ganglia.wikimedia.org/latest/graph_all_periods.php?m=swift%20object%20change&z=small&h=Swift%20pmtpa%20prod&c=Swift%20pmtpa&r=hour <-- is measured in deletions per 30s period, so divide by 30 to get deletes per second [01:33:22] Logged the message, Master [01:34:00] we're about to have a short svn downtime [01:34:14] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.483 seconds [01:34:32] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.402 seconds [01:34:33] maplebed: squid can handle hundreds of deletions per second [01:34:33] is that going to severly affect anyone? [01:34:41] but can ms5? [01:34:43] Ryan_Lane: for 1 min or so? [01:34:50] sure. let me know when it's ok [01:34:52] err [01:34:56] yeah, just a reboot [01:34:57] you're worried about a reduced hit rate? [01:35:02] Ryan_Lane: ok [01:35:06] yeah, ms5 should be able to, since it's handling 60qps, this would bump it to 63qps. [01:35:08] that ok? [01:35:08] TimStarling: yeah. [01:35:16] ok. [01:35:22] !log rebooting formey [01:35:23] Ryan_Lane: sure, be quick :) [01:35:25] Logged the message, Master [01:35:40] that's not how it works... [01:36:09] Ryan_Lane: this isn't good timing [01:36:19] TimStarling: I'm assuming the worst case, where every image I delete is immediately requested [01:36:24] well, that's why I was asking.... [01:36:48] probably won't be during our deploy window [01:37:23] PROBLEM - Host formey is DOWN: CRITICAL - Host Unreachable (208.80.152.147) [01:38:11] and it's back [01:38:15] formey is back [01:38:17] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:20] maplebed: just run it and watch the hit rate [01:38:26] RECOVERY - Host formey is UP: PING OK - Packet loss = 0%, RTA = 0.17 ms [01:38:30] it hasn't changed appreciably yet. [01:38:35] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:38:40] robla: I thought this was inbetween deployment windows... [01:38:43] robla: meh, t'was quick [01:39:03] I think a better model would be to consider what percentage of the cache you're deleting and how that will affect hit rate [01:39:14] to be fair, if the dist-upgrade went bad, it could have been longer [01:39:34] Ryan_Lane: hmm, I thought you were just rebooting [01:39:53] AaronSchulz: I had already done the dist-upgrade [01:40:04] it didn't affect any packages that svn relied on [01:40:09] I was thinking we had a longer window scheduled today. well, live and learn [01:40:36] * AaronSchulz sees 3-7 in his calender [01:40:44] the cache has maybe 97% of the images [01:40:45] hm [01:40:49] I only put 3-5 on http://wikitech.wikimedia.org/view/Software_deployments [01:40:49] ah well [01:41:05] if you delete 1% of the cache, it will have say 96% of the images [01:41:06] robla: and there you have it :) [01:41:08] am I missing this calendar? [01:41:22] occupy the cache! [01:41:25] the WMF Engineering calendar [01:41:28] can someone add me to it? [01:41:30] so the miss rate would go from 3% to 4% and you'd have a 33% increase in backend traffic [01:41:42] oh. I'm on it [01:41:55] crap. 
it does indeed say 3-7 [01:41:56] sorry [01:42:33] werdna: I could make a size comment about that… [01:43:00] no prob [01:43:25] 93.97% 15.347369 2 - query-m: INSERT IGNORE INTO `msg_resource` (mr_lang,mr_resource,mr_blob,mr_timestamp) VALUES ('X') [01:43:35] http://meta.wikimedia.org/w/index.php?title=Special:RecentChanges&limit=120&forceprofile=true [01:44:06] maplebed: your estimate is that you're deleting 1.5% of thumbnails [01:44:17] right. [01:44:31] I see how that logic works. [01:44:42] ms5 should be able to cope with that almost instantaneously [01:44:48] I agree. [01:44:54] maplebed: tim's logic always works :) [01:45:02] well, I sent out email with instructions on how to kill it if shit hits the fan. [01:45:05] just in case. [01:46:38] I imagine these 1.5% don't get heavy traffic, or else they would have already been purged [01:47:26] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 5.538 seconds [01:48:04] aaron cleared profiling data [01:48:09] is meta the only one with broken RecentChanges? [01:48:12] seemingly [01:48:22] Is it worth creating a symlink to reduce the spam of all these RL stat calls? [01:48:33] Reedy: I noticed 250 on meta actually works [01:48:42] Reedy: not really, use grep :) [01:49:02] AaronSchulz: grep for what where? [01:49:07] 250 doesn't for me [01:49:13] to filter out spam [01:49:14] oh, you mean filter the log [01:49:15] lol [01:50:02] finding out what is causing recentchanges to 500 would be saner.. [01:50:35] RECOVERY - MySQL Slave Delay on db50 is OK: OK replication delay 0 seconds [01:50:44] RECOVERY - MySQL Replication Heartbeat on db50 is OK: OK replication delay 0 seconds [01:51:03] Reedy: lots of missing profileout calls, sigh [01:51:28] Not suprised [01:51:38] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:10] AaronSchulz: do we get any useful info as to where they are? [01:52:50] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 196 seconds [01:54:11] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.382 seconds [01:55:01] * Reedy tries Platonides' script [01:55:32] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 0 seconds [01:55:59] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 613s [01:56:17] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 628s [01:56:45] Aha [01:56:46] Hits [01:57:59] static function gender( $parser, $username ) { [01:58:13] by manual inspection [01:59:04] !log aaron synchronized php-1.19/includes/parser/CoreParserFunctions.php 'fixed profiling calls' [01:59:06] Logged the message, Master [01:59:21] aaron cleared profiling data [02:00:48] 80.92% 8.379425 6 - LocalisationCache::recache [02:00:56] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:00] Reedy: why would that get called 6 times? [02:01:14] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.744 seconds [02:01:59] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 216 seconds [02:02:08] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 223 seconds [02:03:38] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.884 seconds [02:03:55] AaronSchulz: does it happen every time? 
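The "missing profileout calls" behind the CoreParserFunctions sync above are the classic wfProfileIn()/wfProfileOut() pairing bug: a function opens a profiling section and then returns early on some branch without closing it, which skews the per-function numbers. A generic sketch of the failure mode and the fix (not the actual gender() code; lookupSomething() is a made-up helper):

    function doSomething( $parser, $username ) {
        wfProfileIn( __METHOD__ );
        if ( $username === '' ) {
            return ''; // BUG: early return leaves the profiling section open
        }
        $result = lookupSomething( $username );
        wfProfileOut( __METHOD__ );
        return $result;
    }

    // Fixed: every return path closes the section it opened.
    function doSomethingFixed( $parser, $username ) {
        wfProfileIn( __METHOD__ );
        $result = ( $username === '' ) ? '' : lookupSomething( $username );
        wfProfileOut( __METHOD__ );
        return $result;
    }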
[02:04:27] not always [02:04:43] might be a certain entry triggering it [02:05:04] that falls off on refresh (since I have a "top X" query param) [02:05:08] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:06:54] what's a good way to get the version in StartProfiler.php? [02:06:55] !log reedy synchronized php-1.19/includes/filerepo/backend/FileBackend.php 'Add wfProfileOut( __METHOD__ )' [02:06:57] Logged the message, Master [02:07:10] we should have separate profiling sections in http://noc.wikimedia.org/cgi-bin/report.py for different versions [02:08:53] $IP? [02:10:45] robla: did you notice the new "followed-up revisions"? [02:11:10] * robla looks [02:11:19] !log tstarling synchronized wmf-config/StartProfiler.php 'split out 1.19' [02:11:21] Logged the message, Master [02:11:33] tstarling cleared profiling data [02:12:04] Reedy: that's convenient [02:12:30] yeah [02:12:53] right, that file is a decoy... [02:13:23] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:13:23] it's a trap! [02:13:26] robla: also check stats, fixme/new revisions for paths, in this case just /trunk/phase3 [02:14:17] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:40] !log tstarling synchronized php-1.19/StartProfiler.php [02:14:43] Logged the message, Master [02:14:53] uhh... [02:15:02] yeah ok [02:15:10] !log tstarling synchronized php-1.19/StartProfiler.php [02:15:13] Logged the message, Master [02:15:39] for the record, it was only the 1.19 wikis that I broke [02:15:50] db12 is lagging by over 1000 [02:15:52] heh, yup, i checked [02:15:57] Due to high database server lag, changes newer than 1,033 seconds may not appear in this list. [02:16:00] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [02:16:10] I wonder if that's related to those msg queries [02:16:10] Betacommand: it's doing schema changes [02:16:19] or if it's just binasher [02:16:37] an enwiki slave is getting migrated [02:16:43] ah then, nvm [02:16:56] is that what's causing the upward spiral of database lag, binasher? [02:16:59] ah, db12 where all the recent change queries go [02:17:14] and watchlist, and contribs... [02:17:34] How long will the update take? [02:17:34] BarkingFish: yup [02:17:41] it's enwiki, a long time ;) [02:17:47] ok second attempt [02:17:50] Nascar1996: which update? [02:17:57] Reedy: Ok then, it's just people mentioning the lag going well over 1000 seconds now [02:18:05] MediaWiki 1.19 [02:18:07] !log tstarling synchronized php-1.19/StartProfiler.php [02:18:08] but if it's a biggy, that's no biggy :) [02:18:09] Logged the message, Master [02:18:23] !log LocalisationUpdate completed (1.18) at Thu Feb 16 02:18:23 UTC 2012 [02:18:26] Logged the message, Master [02:18:26] where is that lag visible? [02:18:32] http://meta.wikimedia.org/wiki/Special:Log/translationreview?limit=10&forceprofile=true [02:18:34] watchlist [02:18:34] 95.86% 13.298977 8 - LocalisationCache::recache [02:18:38] !log tstarling synchronized php-1.18/StartProfiler.php [02:18:40] Logged the message, Master [02:18:41] TimStarling: I think it's logging [02:18:50] * AaronSchulz tries to narrow down [02:19:01] tstarling cleared profiling data [02:19:15] binasher: as we realised it has no extra covering indexes.. can we point stuff at another box [02:19:16] ? [02:19:32] TimStarling: any chance 1.18 wikis can still log profiling data to "all" ? 
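For the StartProfiler.php "split out 1.19" change just synced, the $IP suggestion works because each deployment branch lives in its own directory (php-1.18, php-1.19), so the branch can simply be read off the include path and used to pick a separate profiling bucket for report.py. A hedged sketch of the idea; the option names below are illustrative, not the real wmf-config contents:

    // StartProfiler.php (sketch): choose a profiling id per deployment branch.
    if ( strpos( $IP, 'php-1.19' ) !== false ) {
        $wgProfiler['profileID'] = '1.19';
    } else {
        $wgProfiler['profileID'] = 'all'; // keep 1.18 wikis feeding the existing graphs
    }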
[02:19:55] what's wrong with 1.18? [02:19:57] Reedy: yes, but then the other box will start migrating in a few hours [02:20:05] New patchset: Pyoungmeister; "new logrotate for search nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2619 [02:20:22] TimStarling: the "all" db is what gets fed into graphite [02:20:24] Well, if we ask nicely, Tim might move it back when db12 has finished [02:20:43] ok [02:21:10] !log tstarling synchronized php-1.18/StartProfiler.php [02:21:12] Logged the message, Master [02:21:21] what am I moving? [02:21:53] New patchset: Pyoungmeister; "new logrotate for search nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2619 [02:22:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2619 [02:22:18] * AaronSchulz blames "Translation review log" [02:22:19] TimStarling: making sure enwiki isn't using a really lagged slave for watchlists et al [02:22:28] AaronSchulz: for RC? [02:22:33] after db12 has finished migrating [02:22:36] TimStarling: if i move all of the groupLoadsByDB db's now pointing to db12 to another db, can you move it back when the migrations move on to whatever i move it to? [02:22:48] sure [02:23:31] i'll move them to db53 which is last in the s1 array.. the migrations might not get there before i log in tomorrow morning [02:23:44] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.256 seconds [02:24:02] where's the log from the schema change script? [02:24:05] the enwiki migrations are skipping the revision table alter for now though, so they may not take too long [02:25:35] !log asher synchronized wmf-config/db.php 'moving watchlist etc from db12 to db53' [02:25:37] Logged the message, Master [02:26:46] Nascar1996: BarkingFish should be somewhat better now [02:27:15] ok, I'll see [02:27:38] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:44] TimStarling: you can tail /home/asher/db/119-migration/coredbs-1.out [02:28:58] New patchset: Pyoungmeister; "new logrotate for search nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2619 [02:29:10] ok [02:29:27] * AaronSchulz reads formatTranslationreviewLogEntry [02:29:50] thanks Reedy, I'm out for now. [02:30:00] Look forward to seeing the finished article in the morning :) [02:30:01] return wfMessage( 'logentry-translationreview-message' )->params( [02:30:02] '', // User link in the new system [02:30:02] night [02:30:04] '#', // User name for gender in the new system [02:30:05] Message::rawParam( $link ) [02:30:07] )->inLanguage( $language )->text(); [02:30:08] ok, that is scary [02:30:23] cheers binasher [02:30:25] BarkingFish ruined the code-flood [02:30:38] combobreaker [02:32:12] !log asher synchronized wmf-config/StartProfiler.php 'setting 1.18 wiki profiling id to all' [02:32:14] Logged the message, Master [02:32:52] Aaron's going to try something [02:33:21] he thinks that the logging of translation messages is the problem where someone works in a bunch of different languages [02:34:07] well, maybe he's capitulating... 
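The db.php change above is a query-group move: in the WMF load-balancer config, groups such as watchlist, recentchanges and contributions can be pinned to specific slaves per wiki via groupLoadsByDB, and here they were pointed away from db12 (mid-migration) to db53. A rough sketch of what such a stanza looks like; the weights and exact group list are assumed rather than copied from the real file:

    // wmf-config/db.php (sketch): route expensive query groups for enwiki
    // to a specific slave instead of the general pool.
    'groupLoadsByDB' => array(
        'enwiki' => array(
            'watchlist'     => array( 'db53' => 1 ), // previously db12
            'recentchanges' => array( 'db53' => 1 ),
            'contributions' => array( 'db53' => 1 ),
        ),
    ),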
[02:34:30] !log LocalisationUpdate completed (1.19) at Thu Feb 16 02:34:30 UTC 2012 [02:34:32] Logged the message, Master [02:35:00] I'll explain what he thought: he thought that it was doing a recache per language [02:35:19] (or rather, my muddy explanation of what he tried to explain to me) [02:35:43] so the code I posted above isn't the scary code, it's the code below that [02:35:57] after the 'if ( $action === 'group' ) {' [02:37:06] no one is playing with en.wiki at the moment are they? http://pastie.org/private/fnux8rt1koq1docgtc8g [02:38:51] oh the sandbox thing is a gadget by the looks [02:39:13] p858snake|l: there's also db migration work going on. [02:39:18] binasher: ^ [02:39:26] I'm getting a block option when reverting http://i638.photobucket.com/albums/uu105/Busabout/Untitled-1.png First time I've got it and I'm not a Sysop [02:39:47] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.441 seconds [02:41:09] Bidgee: on enwiki? [02:41:24] yer [02:41:54] No code has been changed on enwiki... [02:42:15] TimStarling: also bugstatus gadget got updated and bit reworked (and reenabled) if you havn't gotten to look at it but not as default currently [02:42:17] Strange. [02:42:53] * AaronSchulz blames the Translate extension hook translateMessageDocumentationLanguage() [02:43:32] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [02:44:11] Bidgee: did you click the block button? [02:44:26] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:44:35] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [02:46:56] !log aaron synchronized php-1.19/extensions/Translate/TranslateHooks.php 'live hack to deal with 500s on log/RC views' [02:46:59] Logged the message, Master [02:48:11] \o/ [02:49:41] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.916 seconds [02:50:59] edit toolbar gone on meta? [02:51:15] fine for me [02:51:28] checked javascript console? [02:53:44] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:54:28] Reedy: d'oh, I probably should have done that. I've got it fixed [02:54:34] heh [02:54:50] I had to disable, then reenable [02:54:54] in prefs [02:56:17] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.652 seconds [02:58:11] robla: have all stage 1 wikis been pushed? [02:58:20] no [02:58:20] not yet [02:58:30] we should probably end this now though [02:58:54] I was just about to call it a night and go to bed now [02:59:02] * AaronSchulz is getting twitchy and scatterbrained [02:59:10] yeah, ok...let's reschedule the remaining ones [02:59:23] maybe we'll lump them in with commons [03:00:05] glad we got as far as we did, though! [03:00:11] aaron cleared profiling data [03:00:22] I was worried we'd have to revert on meta [03:01:35] Could almost just do the enwikibooks/enwikiquote tomorrow as I'm going to be around for most of the day [03:01:47] doing non english ones is not so simple [03:02:32] If there's anything that still needs dealing with later, if someone can ping/email me, and I'll see about taking care of it post sleep [03:02:43] ok, thanks Reedy! [03:03:36] TimStarling: everything look good to you? 
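The working theory above for the 500s on Meta's RecentChanges and log views: each translation-review log line was formatted with ->inLanguage( $language ) for the language of the reviewed message, so one page of entries touching many languages makes the LocalisationCache load (and, on a miss, recache) every one of those languages in a single request, which matches the profiler showing LocalisationCache::recache called 6-8 times. A simplified illustration of the pattern, not the actual Translate code; the field names are invented:

    foreach ( $logEntries as $entry ) {
        $lang = $entry['messageLanguage']; // 'de', 'he', 'fr', ... varies per entry
        // Every distinct $lang here can pull in (or rebuild) a separate l10n cache entry.
        $out .= wfMessage( 'logentry-translationreview-message' )
            ->params( '', '#', Message::rawParam( $entry['link'] ) )
            ->inLanguage( $lang )
            ->text();
    }
    // One possible mitigation (whether the live hack did exactly this isn't shown in
    // the log): render the line in the page's own language and merely name the
    // reviewed language, so a single l10n cache entry serves the whole page.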
[03:04:14] yes [03:04:23] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:04:59] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 5.572 seconds [03:09:12] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:10:32] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.927 seconds [03:11:42] TimStarling: now I wonder....was WikimediaLicenseTexts a red herring, or was that something we also needed to do? [03:12:16] I removed all the non-english translations from it [03:12:44] we'll be hearing more about that when we deploy to commons [03:13:33] I don't remember, though, were we trying to solve the problem that Aaron just fixed? or was that another unrelated problem? [03:13:51] that was another problem, I think [03:14:26] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:18:31] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.433 seconds [03:20:08] why does MW1.19 only show "undo" buttons next to edits without edit summaries? [03:20:17] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.360 seconds [03:21:10] file a bug [03:21:24] okay [03:25:12] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:26:08] seems fine for me: http://www.mediawiki.org/w/index.php?title=User_talk:Dantman/Analytics_integration&curid=79056&diff=500043&oldid=494636 vs http://www.mediawiki.org/w/index.php?title=Mobile_Full_Screen_Search_Results&curid=79482&diff=500040&oldid=498045 [03:27:06] I mean on history pages [03:27:35] see http://www.mediawiki.org/w/index.php?title=Mobile_Full_Screen_Search_Results&action=history [03:28:12] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:38:42] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.886 seconds [03:38:42] RECOVERY - MySQL Replication Heartbeat on db12 is OK: OK replication delay 0 seconds [03:38:51] RECOVERY - MySQL Slave Delay on db12 is OK: OK replication delay 0 seconds [03:41:15] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 195 seconds [03:42:09] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 196 seconds [03:42:45] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:43:21] !log tstarling synchronized wmf-config/db.php 'restored db12 in query groups now that the schema changes have finished' [03:43:24] Logged the message, Master [03:45:45] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.149 seconds [03:46:39] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.472 seconds [03:48:39] Is it known that the IRC feed when doing blocks displays the blocker's name twice? [03:49:48] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:49:56] not surprising [03:50:31] file a bug [03:50:51] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:52:21] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [03:56:24] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [03:56:33] PROBLEM - MySQL Slave Running on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:57:45] RECOVERY - MySQL Slave Running on db1047 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [03:58:21] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [04:00:09] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.563 seconds [04:03:18] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.258 seconds [04:04:12] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:07:21] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:14:59] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.341 seconds [04:16:11] !log tstarling synchronized php-1.18/StartProfiler.php [04:16:13] Logged the message, Master [04:16:16] tstarling cleared profiling data [04:17:05] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.385 seconds [04:17:26] tstarling cleared profiling data [04:20:50] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:21:08] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:29:45] !log tstarling synchronized wmf-config/CommonSettings.php [04:29:47] Logged the message, Master [04:29:49] tstarling cleared profiling data [04:30:17] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours [04:46:20] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:48:53] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [04:49:11] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.296 seconds [04:51:17] PROBLEM - Puppet freshness on ganglia1001 is CRITICAL: Puppet has not run in the last 10 hours [04:52:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [04:53:32] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [04:54:35] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:54:41] I can't load any wiki [04:55:56] PROBLEM - Apache HTTP on mw57 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:56:05] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:57:08] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.026 second response time [04:57:17] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.033 second response time [05:00:35] PROBLEM - MySQL Slave Delay on db38 is CRITICAL: CRIT replication delay 196 seconds [05:01:20] PROBLEM - MySQL Replication Heartbeat on db38 is CRITICAL: CRIT replication delay 241 seconds [05:06:44] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.554 seconds [05:11:00] !log on db40: truncating a few shards to free up space for the OS [05:11:02] Logged the message, Master [05:11:06] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:19:02] !log truncated pc000, pc001, pc002, pc003 [05:19:04] Logged the message, Master [05:21:30] !log on hume: running mwscript purgeParserCache.php --wiki=enwiki --age=7776000 [05:21:32] Logged the message, Master [05:23:15] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.704 seconds [05:26:33] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket 
timeout after 10 seconds. [05:27:18] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:29:06] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [05:53:15] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.856 seconds [05:57:18] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:06:57] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.843 seconds [06:09:12] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:09:39] RECOVERY - MySQL Replication Heartbeat on db38 is OK: OK replication delay 0 seconds [06:09:57] RECOVERY - MySQL Slave Delay on db38 is OK: OK replication delay 0 seconds [06:09:57] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.733 seconds [06:10:24] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [06:13:42] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:14:00] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:30] PROBLEM - MySQL Replication Heartbeat on db52 is CRITICAL: CRIT replication delay 213 seconds [06:21:39] PROBLEM - MySQL Slave Delay on db52 is CRITICAL: CRIT replication delay 221 seconds [06:31:29] Namespace filtering in RC isn't working for the Mediawiki namespace [06:31:35] (MW 1.19) [06:31:47] It's called "MediaWiki" [06:31:53] Not Mediawiki [06:32:05] https://meta.wikimedia.org/w/index.php?namespace=8&tagfilter=&translations=filter&limit=250&title=Special%3ARecentChanges [06:32:28] Oh, nevermind [06:32:32] It shows now.. [06:33:04] It doesn't for me [06:33:28] you need to not to filter the translations [06:33:30] p858snake|l: Because you have to do "No action" for the Filter translations. [06:33:33] yeah [06:38:48] RECOVERY - MySQL Slave Delay on db52 is OK: OK replication delay 0 seconds [06:38:48] RECOVERY - MySQL Replication Heartbeat on db52 is OK: OK replication delay 0 seconds [06:39:48] uh [06:42:51] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 245 seconds [06:43:00] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 254 seconds [06:49:00] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:50:12] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [06:50:17] Dear Wiki technicians., I work for sa-ws and there is no user account creation log displayed on the RC page., [06:50:29] can u help me out? [06:54:09] New review: Hashar; "Thanks Leslie!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2606 [07:01:09] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:03:42] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [07:36:18] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [07:36:18] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [07:53:33] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [08:14:37] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:15:49] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [09:02:37] I got an error while uploading on Commons [09:02:38] :( [09:04:33] http://i638.photobucket.com/albums/uu105/Busabout/Error.png [09:21:20] Bidgee: still around? [09:34:09] Yer [09:35:30] Bidgee: have you opened a bug for http://i638.photobucket.com/albums/uu105/Busabout/Error.png ? [09:35:34] if not , I will open one [09:36:25] have you reproduced the issue? [09:37:47] Bidgee: if you still have the error window, can you copy paste it to http://dpaste.org/ so I can copy it? Thanks! [09:40:50] So far so good. [09:41:01] I didn't copy the page, sorry. :( [09:41:05] it is ok :) [09:41:22] did you get an error when retrying? [09:42:59] Bidgee: also do you remember the full filename you tried to upload? [09:46:08] !log hashar synchronized wmf-config/swift.php 'Add a wfDebugLog call for bug 34440: swift list_objects giving InvalidResponseException' [09:46:11] Logged the message, Master [09:48:26] File:Union Club Hotel in 2004.jpg [09:48:46] No error on the retry [09:49:48] !log on db40: truncated pc004, pc005, pc006, pc007 [09:49:51] Logged the message, Master [09:50:25] Bidgee: I have opened a bug report ( https://bugzilla.wikimedia.org/34440 ) [09:50:41] Bidgee: our swift guru will probably have a look at it whenever he wake up (he is in the US) [09:51:18] Bidgee: thanks for reporting! [09:54:55] When you want the message again, you never get it! :P [09:55:17] So far the other uploads have uploaded without any issue [10:07:36] hey [10:07:47] ipv6 [10:07:58] are we implementing it anytime soon> [10:14:38] Steven_Zhang, it depends on what you mean [10:14:53] Steven_Zhang, http://wikitech.wikimedia.org/view/IPv6_deployment [10:15:08] well someone wants to know if we're participating in the july launch [10:15:15] bleh [10:15:17] gtg [10:40:42] PROBLEM - LVS Lucene on search-pool2.svc.pmtpa.wmnet is CRITICAL: Connection timed out [10:44:27] RECOVERY - LVS Lucene on search-pool2.svc.pmtpa.wmnet is OK: TCP OK - 0.003 second response time on port 8123 [10:49:08] New patchset: Catrope; "Define BINDIR in purge-checkuser" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2621 [10:49:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2621 [10:58:18] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2621 [10:58:19] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2621 [11:02:54] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[11:04:06] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [11:21:18] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 120 MB (1% inode=62%): /var/lib/ureadahead/debugfs 120 MB (1% inode=62%): [11:25:30] RECOVERY - Disk space on srv224 is OK: DISK OK [12:18:09] PROBLEM - Host lily is DOWN: PING CRITICAL - Packet loss = 100% [13:04:25] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 189 MB (2% inode=62%): /var/lib/ureadahead/debugfs 189 MB (2% inode=62%): [13:09:58] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 194 MB (2% inode=62%): /var/lib/ureadahead/debugfs 194 MB (2% inode=62%): [13:11:19] RECOVERY - Disk space on srv223 is OK: DISK OK [13:25:47] !log Running cleanupUploadStash.php across all wikis [13:25:50] Logged the message, Master [13:37:15] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2616 [13:37:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2616 [13:37:39] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2619 [13:37:39] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2619 [13:53:36] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [13:57:39] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [13:59:36] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [14:23:17] !log hello world [14:23:19] Logged the message, Master [14:25:33] PROBLEM - Host search1003 is DOWN: PING CRITICAL - Packet loss = 100% [14:26:06] !log rebooting search1003 [14:26:08] Logged the message, Master [14:30:39] PROBLEM - Puppet freshness on search1002 is CRITICAL: Puppet has not run in the last 10 hours [14:30:57] RECOVERY - Host search1003 is UP: PING OK - Packet loss = 0%, RTA = 30.88 ms [14:33:39] PROBLEM - SSH on search1003 is CRITICAL: Connection refused [14:33:57] PROBLEM - RAID on search1003 is CRITICAL: Connection refused by host [14:33:57] PROBLEM - DPKG on search1003 is CRITICAL: Connection refused by host [14:34:24] PROBLEM - Disk space on search1003 is CRITICAL: Connection refused by host [14:37:42] PROBLEM - Lucene on search1003 is CRITICAL: Connection refused [14:43:15] PROBLEM - Disk space on mw40 is CRITICAL: DISK CRITICAL - free space: /tmp 60 MB (3% inode=87%): [14:44:18] RECOVERY - SSH on search1003 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [14:50:04] RECOVERY - Disk space on mw40 is OK: DISK OK [14:52:19] PROBLEM - Puppet freshness on ganglia1001 is CRITICAL: Puppet has not run in the last 10 hours [15:02:48] New patchset: Pyoungmeister; "usinga more up to date db list, by roan's suggestion" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2622 [15:03:23] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2622 [15:03:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2622 [15:03:52] PROBLEM - NTP on search1003 is CRITICAL: NTP CRITICAL: No response from NTP server [15:46:55] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[15:48:16] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [16:03:10] !log clearing cache on srv1 [16:03:12] Logged the message, Master [16:08:50] hello brion you little rascal you [16:18:11] PROBLEM - Host search1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:20:31] !log rebooting search1001 [16:20:33] Logged the message, Master [16:23:35] RECOVERY - Host search1001 is UP: PING OK - Packet loss = 0%, RTA = 31.01 ms [16:26:26] PROBLEM - DPKG on search1001 is CRITICAL: Connection refused by host [16:26:35] PROBLEM - Disk space on search1001 is CRITICAL: Connection refused by host [16:27:02] PROBLEM - RAID on search1001 is CRITICAL: Connection refused by host [16:27:47] PROBLEM - SSH on search1001 is CRITICAL: Connection refused [16:30:38] PROBLEM - Lucene on search1001 is CRITICAL: Connection refused [16:30:42] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: enwikibooks and enwikiquote to 1.19wmf1 [16:30:45] Logged the message, Master [16:33:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:35] RECOVERY - SSH on search1001 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [16:35:44] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.68473669565 (gt 8.0) [16:37:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.755 seconds [16:39:32] uttlepr: which DC is search1001 in? [16:40:30] jeremyb: anything with 1XXX is eqiad [16:41:41] yup [16:44:35] jeremyb: uttlepr is a troll which has been banned several times from here and other channels [16:44:52] oh, good [16:45:29] RECOVERY - RAID on search1001 is OK: OK: no RAID installed [16:45:30] Vito: thanks! [16:45:58] yw! [16:46:14] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.93967701754 [16:46:14] RECOVERY - DPKG on search1001 is OK: All packages OK [16:46:14] RECOVERY - Disk space on search1001 is OK: DISK OK [16:55:32] RECOVERY - Lucene on search1001 is OK: TCP OK - 0.031 second response time on port 8123 [16:55:50] !log reedy synchronized php-1.18/extensions/FeaturedFeeds/ 'r111650' [16:55:52] Logged the message, Master [16:56:08] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Z on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2438* [16:58:51] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: frwikisource to 1.19wmf1 [16:58:54] Logged the message, Master [17:08:53] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 172 MB (2% inode=62%): /var/lib/ureadahead/debugfs 172 MB (2% inode=62%): [17:08:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:12:11] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 268 MB (3% inode=62%): /var/lib/ureadahead/debugfs 268 MB (3% inode=62%): [17:12:47] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 272 MB (3% inode=62%): /var/lib/ureadahead/debugfs 272 MB (3% inode=62%): [17:15:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.952 seconds [17:15:29] RECOVERY - Disk space on srv221 is OK: DISK OK [17:26:26] aharoni: should be live now [17:26:31] first quick impression is ok [17:26:57] We've not really found many issues so far [17:27:47] hashar: most of the errors in fatalmonitor are 1.18 [17:30:08] my own RTL fix works well. 
[17:30:16] compare https://he.wikisource.org/wiki/Test! in Chrome and in Firefox. [17:30:33] Chrome supports dir="auto", Firefox still doesn't (but will soon). [17:30:39] :) [17:31:42] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: betawikiversity to 1.19wmf1 [17:31:43] Logged the message, Master [17:43:05] for info, nothing bad pour betawikiversity by passing to 1.19 [17:43:21] -pour +for [17:44:05] dcrochet: thanks [17:48:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.255 seconds [18:08:19] PROBLEM - Host search1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:13:43] RECOVERY - Host search1002 is UP: PING OK - Packet loss = 0%, RTA = 30.87 ms [18:16:52] PROBLEM - Disk space on search1002 is CRITICAL: Connection refused by host [18:17:01] PROBLEM - RAID on search1002 is CRITICAL: Connection refused by host [18:17:04] Reedy: have you deployed to enwiki yet? [18:17:19] PROBLEM - SSH on search1002 is CRITICAL: Connection refused [18:17:28] PROBLEM - DPKG on search1002 is CRITICAL: Connection refused by host [18:20:46] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused [18:25:41] Function: User::invalidateCache [18:25:43] Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.46) [18:25:44] huh, that was random [18:26:37] RECOVERY - SSH on search1002 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:26:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:32:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.469 seconds [18:39:41] New patchset: Ottomata; "Buncha mini changes + hackiness to parse a few things. This really needs more work" [analytics/reportcard] (master) - https://gerrit.wikimedia.org/r/2623 [18:42:43] New review: Diederik; "Ok." [analytics/reportcard] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2623 [18:44:55] PROBLEM - NTP on search1002 is CRITICAL: NTP CRITICAL: No response from NTP server [19:02:43] PROBLEM - Lighttpd HTTP on dataset2 is CRITICAL: Connection refused [19:02:43] RECOVERY - Auth DNS on ns0.wikimedia.org is OK: DNS OK: 0.020 seconds response time. 
www.wikipedia.org returns 208.80.154.225 [19:05:16] RECOVERY - Lighttpd HTTP on dataset2 is OK: HTTP OK HTTP/1.0 200 OK - 4906 bytes in 0.022 seconds [19:06:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.594 seconds [19:24:15] New patchset: Lcarr; "Moving generic::tcptweaks to "standard" server class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2624 [19:25:14] !log aaron synchronized php-1.19/includes/api/ApiQueryAllUsers.php [19:25:16] Logged the message, Master [19:26:04] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2624 [19:26:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2624 [19:26:15] !log aaron synchronized php-1.19/includes/api/ApiQueryAllUsers.php [19:26:16] Logged the message, Master [19:29:18] !log aaron synchronized php-1.18/includes/api/ApiQueryAllUsers.php [19:29:19] Logged the message, Master [19:29:56] !log doing some debugging for bug 34451 [19:29:58] Logged the message, Master [19:30:21] !log aaron synchronized php-1.18/includes/api/ApiQueryAllUsers.php [19:30:23] Logged the message, Master [19:30:33] hmm, seems like a lack of equality propagation [19:36:27] !log aaron synchronized php-1.18/includes/api/ApiQueryAllUsers.php [19:36:29] Logged the message, Master [19:39:58] AaronSchulz: looking a thte active users thing? [19:45:14] !log aaron synchronized php-1.18/includes/api/ApiQueryAllUsers.php 'pushing comment changes :)' [19:45:17] Logged the message, Master [19:46:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:46:57] Reedy: can you test http://en.wikipedia.org/w/api.php?action=query&list=allusers queries? [19:47:01] they look fine too me [19:47:31] * AaronSchulz tried with/without group/activeusers param and different limits [19:50:32] AaronSchulz yo [19:50:38] you have a minute? [19:51:05] it's better to just ask the basic question :) [19:51:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.942 seconds [19:51:53] Reedy: want to port that to wmf1.19? :p [19:52:10] * AaronSchulz is ready to nom [19:52:19] I'll have a look in a few mins [19:55:42] AaronSchulz: https://bugzilla.wikimedia.org/34427 -- swift? [19:57:11] hexmode: I don't understand what that bug is getting at. [19:57:21] i.e. what's incorrect about what's boing shown? [19:57:45] when saibo looked, there was no file, but now timestamps show a file should have been there [19:58:32] fascinating. [19:58:49] but no, it's definitively not related to swift; swift was not in production service at 19:39 yesterday. [19:58:51] (UTC) [19:59:03] hrm [19:59:04] ok [19:59:18] forgot you guys backed it out [20:10:49] PROBLEM - Disk space on mw15 is CRITICAL: DISK CRITICAL - free space: /tmp 18 MB (1% inode=87%): [20:13:35] robla: I managed to recreate a truncated thumb. :( (granted, once out of about 30 tries, but still.) [20:13:40] maybe we need to look more closely at https://mikewest.org/2008/11/generating-etags-for-static-content-using-nginx [20:13:57] bummer [20:14:31] (this is in comparison to before, where I created one on every attempt) [20:14:53] that's still kinda dicey though [20:15:08] maybe we should go with Tim's original suggestion on this [20:15:33] i.e. 
write the file prior to sending it [20:15:55] actually, I have a different theory - I might have requested it for the second time before it was finished getting generated. [20:16:14] oh, race condition? [20:16:23] yeah; [20:16:40] my test was to connect, drop, connect, get first 10 packtes, [20:17:02] so if the first 10 packets had the "whole" file because it was half way written to NFS, [20:17:42] AaronSchulz: did you catch all that? [20:18:38] Reedy: don't forget 1.19wmf :) [20:18:51] AaronSchulz: I've merged it already [20:18:56] I just had to svn up before I could svn ci [20:18:57] :P [20:19:23] I can't recreate it. [20:19:32] fluke? I'll keep trying. [20:19:35] Reedy: ah, I see [20:19:48] maplebed: was the file already on ms5? [20:19:54] no. [20:20:16] I'm requesting a file with $RAND for the width. [20:20:34] (but I'm requesting it several times so I can try different styles of abort) [20:21:42] is there someone here who can copy a single file for me into dumps.wikimedia.org/android/ [20:21:42] ? [20:23:32] yuvipanda: anyone from ops [20:23:43] Reedy: does that count you? [20:23:47] :D [20:23:57] binasher: did you seem tim's profiling changes? [20:24:10] to the c collector? [20:24:13] yep [20:24:18] yup [20:24:41] yuvipanda: I'm not ops! ;) [20:25:03] Reedy: can you play one on ssh? :D [20:25:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:25:24] anyway, who do you suggest I poke? [20:25:32] yuvipanda: upload it somewhere, dump the url in here [20:25:46] For dumps related things, I usually ask apergos nicely [20:28:00] http://dl.dropbox.com/u/8768784/Wikipedia-v1.1-beta1.apk [20:28:06] * yuvipanda pokes apergos nicely [20:28:19] can you copy that url's apk into dumps.wikimedia.org/android? [20:28:54] AaronSchulz: (and robla) https://wikitech.wikimedia.org/view/User:Bhartshorne/truncated_thumbnail_issue [20:29:04] bah. [20:29:15] AaronSchulz: (and robla) http://wikitech.wikimedia.org/view/User:Bhartshorne/truncated_thumbnail_issue [20:29:16] RECOVERY - Disk space on mw15 is OK: DISK OK [20:29:39] !log reedy synchronized php-1.19/includes/api/ApiQueryAllUsers.php 'r111675' [20:29:42] Logged the message, Master [20:29:52] done [20:30:04] I though I made that directoy a symlink into other [20:30:08] and now it seems not to be [20:30:18] * apergos resolves to have a word with the relevant people... [20:30:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.918 seconds [20:30:41] apergos: thanks :D [20:30:43] the less things that live in /data/xmldumps that aren't actually dumps, the better [20:32:00] well, if we have some place to distribute these off, I'd gladly put them there instead of putting them in dumps.* [20:32:35] in dumps is find, there's just a subdirectory "other" that I'd rather have them in [20:32:53] *fine [20:33:55] PROBLEM - HTTP on singer is CRITICAL: Connection refused [20:34:49] AaronSchulz: swift code is still apparently being hit in wmf-config/swift.php [20:34:59] incalid argument supplied for foreach on line 45 [20:45:20] Secure server down? [20:45:48] what secure server? just type https:// and the usual url [20:45:55] secure.wikimedia.org [20:46:12] see http://status.wikimedia.org/ [20:46:37] don't use secure [20:46:55] I know :-) Still down though... [20:47:00] we want it to be down [20:47:01] dead [20:47:03] gone [20:47:07] stomped into the ground even :-D [20:47:49] Was up 10 min ago. [20:48:37] the day suddenly got better [20:48:48] oh? 
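The race theorised around 20:15-20:17 above (a second request arriving while the thumbnail is still being written out to NFS, so the first packets look like the "whole" file) is the classic partial-read hazard, and the "write the file prior to sending it" suggestion amounts to making that write atomic. The following is only a minimal sketch of the write-to-temp-then-rename pattern, not the scaler's actual code; store_thumbnail_atomically and its data argument are hypothetical:

    import os
    import tempfile

    def store_thumbnail_atomically(dest_path, data):
        # Write data so that concurrent readers never observe a partial file.
        # Sketch only: data stands in for scaler output; a real pipeline would
        # stream rather than hold the whole thumbnail in memory.
        dest_dir = os.path.dirname(dest_path)
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir)  # same filesystem, so rename() is atomic
        try:
            with os.fdopen(fd, 'wb') as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # make sure the bytes have hit the disk (or NFS server)
            os.rename(tmp_path, dest_path)  # readers see either no file or a complete one
        except Exception:
            os.unlink(tmp_path)
            raise

With this shape a concurrent GET either misses the file entirely (and can regenerate or 404) or reads a complete one; it can never stream the first few packets of a half-written thumbnail.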
[20:50:39] without secure I mean [20:50:56] :-) [20:51:31] I wonder if that's singer [20:51:34] I don't remember any more [20:52:26] yeah, that's it [21:03:31] We found a bug on MW 1.19. The special page Deleted Contribution is no more update. See https://bugzilla.wikimedia.org/show_bug.cgi?id=34456 [21:04:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:10:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.953 seconds [21:12:39] New patchset: Lcarr; "Fixing varnish language" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2625 [21:13:08] PROBLEM - Varnish HTTP mobile-frontend on cp1042 is CRITICAL: Connection refused [21:13:26] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2625 [21:13:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2625 [21:15:11] New patchset: Diederik; "Added full support for ip address and ip range filtering Added full support for regular expression matching" [analytics/udp-filters] (refactoring) - https://gerrit.wikimedia.org/r/2626 [21:18:57] !log the most recent apache update (thanks puppet) must have broke things on singer. the url.wm.o config wants /srv/org/wikimedia/url/ but I have no idea what that service ever did or what is supposed to be in there. will someone who knows this undocumented information please check it? thanks. [21:19:00] Logged the message, Master [21:19:03] * apergos grouches [21:21:19] <^demon> url.wm.o? [21:24:41] yeah [21:24:47] on singer [21:25:07] I have no idea what it is, whether it's new, old, should be there, should be gone, if it should be there what the contents should be [21:25:08] *nothing* [21:26:18] <^demon> I've never ever heard of it either. [21:26:30] <^demon> And url.wm.o is kind of bulky to be a shortening service :p [21:32:51] I could take it out of sites-enabled and bring apache back up I guess... dunno. [21:34:47] http://en.planet.wikimedia.org/ is down [21:35:35] yeah [21:35:39] guess so [21:36:24] it's failing for some worse reason [21:36:34] the url.wm.o docroot warning is just a warning [21:36:39] * apergos hates on singer [21:40:19] !log reedy synchronized php-1.19/extensions/Vector/modules/ext.vector.collapsibleNav.js 'r111687' [21:40:21] Logged the message, Master [21:44:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:44:32] [Thu Feb 16 20:30:54 2012] [error] Server should be SSL-aware but has no certificate configured [Hint: SSLCertificateFile] ((null):0) [21:45:02] !log singer certificate issues, looks like [21:45:05] Logged the message, Master [21:48:38] maplebed: weird, testing with eqiad, I don't get those container 404 errors [21:49:00] it just get an empty array as expected [21:49:49] that is odd. [21:49:54] what happens if the container doesn't exist? [21:50:27] well swift.php won't die over that, it always caught the "no container" exception [21:50:44] excellent. [21:50:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.383 seconds [21:51:59] maplebed: https://github.com/rackspace/php-cloudfiles/blob/master/cloudfiles_http.php [21:52:15] where am I looking? [21:52:15] odd, for for '$this->error_str = "Container has no Objects.";' [21:52:23] there is a 204 and a 404 case [21:52:27] I wonder what the difference is [21:52:43] maybe 404 is the container not existing? 
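On the 204-versus-404 question at the end of the exchange above: in the Swift API a listing of a container that exists but is empty normally comes back as 204 No Content, while a listing of a container that does not exist returns 404, which would explain the two branches in cloudfiles_http.php. A quick way to check against a cluster is a raw listing request; the sketch below is illustrative only, with placeholder host, token and account values:

    import httplib

    def list_container(host, token, account, container):
        # 200: container has objects (names come back one per line)
        # 204: container exists but is empty
        # 404: container does not exist
        conn = httplib.HTTPConnection(host)
        conn.request('GET', '/v1/%s/%s' % (account, container),
                     headers={'X-Auth-Token': token})
        resp = conn.getresponse()
        body = resp.read()
        conn.close()
        if resp.status == 404:
            raise LookupError('container %r does not exist' % container)
        return body.splitlines() if resp.status == 200 else []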
[21:53:15] which would be odd if I just checked that it existed beforehand [21:54:07] * AaronSchulz wonders if its eventual consistency [21:54:19] if the container was created a half a second ago [21:54:48] but we don't create them automatically yet [21:54:52] so that seems unlikely [21:55:39] New patchset: Lcarr; "Fixing ")" for wap redirection" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2627 [21:56:18] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2627 [21:56:18] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2627 [21:59:29] RECOVERY - Varnish HTTP mobile-frontend on cp1042 is OK: HTTP OK HTTP/1.1 200 OK - 634 bytes in 0.062 seconds [21:59:45] TimStarling: the tl;dr version: maplebed confirmed that there will be cases where copying from ms5 NFS to swift fails due to lack of md5 [22:00:43] however, he believes that it'll be a much smaller %, so his plan is to log all images and follow up 5min or so later with a cleanup script [22:01:26] why would it be a small percentage? [22:01:48] I would have thought almost all of them would fail [22:01:57] I'm going to let maplebed defend that one. he didn't convince me :) [22:02:41] here's the basic logic, though: [22:02:45] TimStarling: aborting the connection (pre-etag) on generated images caused a broken image 100% of the time. aborting a connection on an existing image causes a truncated image in about 35% of my tests. [22:02:50] s/35/5/ [22:03:16] sounds like an artifact of your test setup [22:03:27] why don't you just send the Content-Length header? [22:03:36] it's right there in the response header from ms5 [22:03:41] ooooh [22:03:45] all you have to do is copy it through, then you have 0% failures [22:03:59] I said this in my original email [22:04:00] AaronSchulz: does Swift take a content-length? [22:04:13] TimStarling: that assumes swift buffers the entire file before sending it on, right? [22:04:16] the python HTTP library should check it [22:04:24] robla: what do you mean? [22:04:33] * AaronSchulz is writing a response to someone on CR [22:04:42] the swift proxy? no, the proxy doesn't have to buffer it [22:04:43] so, when you send an etag to swift, and it doesn't match, the upload fails [22:04:51] the server will buffer, obviously, it has to compute the MD5 [22:05:05] is there a similar optoin for content-length? [22:05:17] TimStarling: I'm not sure I understand then. Would it be easier to show me what you mean with a patch to rewrite.py? [22:05:24] :P [22:06:18] robla: yes, but not for chunked transfer [22:06:27] d'oh [22:06:43] ok [22:06:58] it should work for chunked transfer [22:07:40] it does etag but not content-length? gah.... [22:08:12] patch swift! [22:08:27] (I'm only half joking) [22:11:12] robla: http://docs.openstack.org/cactus/openstack-object-storage/developer/content/chunked-transfer-encoding.html [22:11:25] http://paste.tstarling.com/p/CAvcAF.html [22:11:27] this also conforms to HTTP spec [22:11:54] http://www.ietf.org/rfc/rfc2616.txt [22:12:29] TimStarling: that header will then be ignored inside wmf.client.Put_object_chunked, as AaronSchulz points out (in the docs). 
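The crux of the suggestion above (22:03-22:12) is that with the default identity encoding the proxy can forward the backend's Content-Length on the PUT, so if the stream from ms5 is cut short Swift receives fewer bytes than promised and fails the upload instead of storing a truncated object; with chunked encoding there is no up-front total to check against. Below is a minimal sketch of that shape, assuming a file-like upstream response with a known length; this is not the actual rewrite.py change, and all names are placeholders:

    import httplib

    def put_streaming(host, token, path, src, content_length, chunk_size=64 * 1024):
        # PUT src (a file-like object) into Swift without chunked encoding.
        conn = httplib.HTTPConnection(host)
        conn.putrequest('PUT', path)
        conn.putheader('X-Auth-Token', token)
        conn.putheader('Content-Length', str(content_length))
        conn.endheaders()
        sent = 0
        while sent < content_length:
            block = src.read(min(chunk_size, content_length - sent))
            if not block:
                break  # upstream ended early
            conn.send(block)
            sent += len(block)
        if sent < content_length:
            # Short read: close the connection so Swift discards the partial
            # body rather than completing the object.
            conn.close()
            return None
        resp = conn.getresponse()
        return resp.status  # 201 Created on success

If the md5 were available, an ETag header would give an even stronger check, but the point here is that the length alone, copied through from ms5, is enough to catch a dropped transfer.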
[22:12:41] well, don't use chunked encoding then [22:13:57] http://paste.tstarling.com/p/DOXUXt.html [22:13:59] easy [22:14:15] I spent about 20 min trying to explore that from different angles [22:14:20] (at least) [22:14:32] I wasn't able to twist maplebed's arm [22:15:38] the default transfer encoding (identity) just has all the data on the wire, with the Content-Length header to indicate the end of it [22:15:44] I don't think that diff is going to work; I think it'll take more than that. [22:16:20] but I'll read it again more carefully first. [22:16:31] so all you need to do to support it is to send the data as it comes [22:16:59] :q [22:17:02] moops. [22:17:57] !log catrope synchronized wmf-config/InitialiseSettings.php 'Enable wgResourceLoaderExperimentalAsyncLoading on test2wiki' [22:17:59] Logged the message, Master [22:21:37] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [22:24:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:24:10] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 0.115 seconds response time. www.wikipedia.org returns 208.80.154.225 [22:28:15] maplebed: it might work [22:28:27] I'm slowly convincing myself of the same thing. [22:28:48] I was reading the send() docs to make sure it wasn't redoing headers and such and it's not [22:29:12] so that should work for streaming, which is what we need [22:29:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.662 seconds [22:35:25] New patchset: Bhartshorne; "trying Tim's suggestion of abandoning chunked-encoding for swift puts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2628 [22:35:31] AaronSchulz: I'm going to load Tim's diff into ms-fe1 ^^^ [22:36:20] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2628 [22:36:20] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2628 [22:37:11] !log catrope synchronized wmf-config/CommonSettings.php 'Add live hack to enable $wgResourceLoaderExperimentalAsyncLoading on meta only for me (User:Catrope)' [22:37:13] Logged the message, Master [22:37:29] RoanKattouw: :) [22:37:50] * AaronSchulz should start hacking in features for User:Aaron Schulz [22:38:37] New patchset: Hashar; "puppet local linter!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2629 [22:39:10] RECOVERY - Host db1035 is UP: PING OK - Packet loss = 0%, RTA = 27.08 ms [22:40:36] * AaronSchulz reads http://en.wikipedia.org/wiki/Chunked_transfer_encoding#Rationale [22:40:59] maplebed: I can only assume the reason rewrite used chunked-transfer was the result of a coin flip [22:41:08] *rewrite.py [22:41:47] what do you mean? [22:42:01] PROBLEM - mysqld processes on db1035 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:42:08] that it made no sense [22:42:24] maybe it was due to a misunderstanding of how streams work [22:42:45] thinking that each send() has to be self-contained in some sense at the HTTP level [22:42:54] It's true, I was working under the assumption that using chunked encoding was a conscious choice with a valid reason. 
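As a side note on the framing difference under discussion at 22:12-22:15: with identity encoding the body is delimited purely by the Content-Length header, whereas chunked encoding prefixes every piece with its own size and terminates with a zero-length chunk, so the receiver never learns the total up front. A small self-contained illustration, unrelated to any production code:

    def identity_frame(body):
        # The body is sent as-is; Content-Length tells the receiver where it ends.
        return 'Content-Length: %d\r\n\r\n%s' % (len(body), body)

    def chunked_frame(body, chunk_size=4):
        # Each chunk is '<hex size>\r\n<data>\r\n'; a zero-size chunk ends the body.
        out = ['Transfer-Encoding: chunked\r\n\r\n']
        for i in range(0, len(body), chunk_size):
            piece = body[i:i + chunk_size]
            out.append('%x\r\n%s\r\n' % (len(piece), piece))
        out.append('0\r\n\r\n')
        return ''.join(out)

    print identity_frame('hello swift')
    print chunked_frame('hello swift')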
[22:43:55] * TimStarling just found a string for which adler32($s) == 0 [22:44:09] for https://bugzilla.wikimedia.org/show_bug.cgi?id=34428 [22:44:57] lol [22:45:12] I have no idea what adler32 is but I'm facepalming just reading that [22:45:42] How can you implement a checksum algorithm wrong? I mean surely there's a reference implementation, and some canonical input,output pairs [22:46:02] tests only proof that the given cases work :) [22:46:08] [/captain obvious] [22:46:10] Well sure [22:46:39] But if you port MD5 to a different language, the first thing you do is try it on the empty string and on e.g. the file itself [22:46:43] Or /dev/random maybe [22:47:16] RECOVERY - mysqld processes on db1035 is OK: PROCS OK: 1 process with command name mysqld [22:47:32] AaronSchulz, TimStarling: non-chunked encoding is currently running on ms-fe1 if you want to throw anything at it. (chunked is still running on ms-fe2 for comparison) [22:47:37] I'm just saying that not testing the single most unit-testable kind of code in the history of programming is unforgiveable [22:47:51] RoanKattouw: did tim say $s was ''? [22:48:07] It's probably not [22:48:31] so how would testing '' help? [22:48:59] I think the author figured that the details didn't matter, because the hash would only be used internally [22:50:58] anyway I pondered for a while how to find a string to initialise it with [22:51:00] RoanKattouw: it's certainly easier to test the MW, that's for sure [22:51:10] PROBLEM - MySQL Slave Running on db1035 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [22:51:23] I eventually settled on using a base string for which A (per the wikipedia article definition) is zero [22:51:57] and then perturbing it by subtracting a number from a moving location, and adding it to the last byte to keep A the same [22:52:08] then you're only searching a 16-bit space [22:53:06] You had to find a string for which A'($s) == 0, where A' is a buggy implementation of A? [22:53:56] no, I had to find a string where A($s) == 0, which enables you to efficiently compute A' as A($s . $x) [22:54:09] because the hash state is the whole of the hash [22:54:35] so a string which hashes to zero initialises the state of the algorithm to zero, which is what A' incorrectly does [22:56:53] hrm, you guys realize you're just searching for a 15 year old's spelling of ass? 
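For anyone following along: adler32 is a rolling checksum whose standard initial state is A=1, B=0, and zlib exposes it with an optional running value, which is what makes the trick above work. If a string s can be found whose correct adler32 is 0, then hashing anything after s continues from an all-zero state, i.e. it reproduces an implementation that wrongly seeds the state with zero. A small sketch of the identity; the magic zero-hash string itself is not reproduced here, and broken_adler32 is only a model of the bug:

    import zlib

    def broken_adler32(data):
        # Model of the bug: seed the running state with 0 instead of the
        # standard initial value of 1.
        return zlib.adler32(data, 0)

    s = 'any prefix'   # stand-in; the real trick needs a prefix with zlib.adler32(s) == 0
    x = 'payload'

    # adler32 is resumable: hashing s, then continuing with x, equals hashing s + x.
    assert zlib.adler32(x, zlib.adler32(s)) == zlib.adler32(s + x)

    # Hence, for a prefix with zlib.adler32(s) == 0, zlib.adler32(s + x) equals
    # broken_adler32(x) for every x -- the correct function can simulate the buggy one.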
:) [22:57:13] Aah [22:57:25] So then you can simulate the broken implementation, I see [22:57:30] yep [22:57:57] * robla resumes paying attn to this conversation [22:59:48] !log catrope synchronized wmf-config/CommonSettings.php 'Also give User:Cmcmahon experimental async loading on meta' [22:59:50] Logged the message, Master [23:03:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:04:13] RECOVERY - MySQL Slave Running on db1035 is OK: OK replication [23:06:55] PROBLEM - mysqld processes on db1035 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:08:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.934 seconds [23:09:03] !log catrope synchronized php-1.19/includes/resourceloader/ 'r111699' [23:09:05] Logged the message, Master [23:09:52] !log catrope synchronized php-1.19/resources/mediawiki/mediawiki.js 'r111699, r111700' [23:09:53] Logged the message, Master [23:10:17] !log catrope synchronized php-1.19/resources/startup.js 'touch' [23:10:19] Logged the message, Master [23:12:45] maplebed: did it work? [23:12:56] looks like yes, though I need to run another test. [23:28:49] robla: futzing the content-length header makes swift return a 500 [23:30:00] maplebed: not a 404? [23:30:06] nope. [23:30:11] it just blew a gasket. [23:30:17] hrm [23:30:31] (by futzing with, I mean content_length += 20) [23:31:01] that seems appropriate though; it is finding the file, but what it's getting doesn't match what it thinks it should be. [23:31:30] lemme make sure I understand you correctly: 1. hack in +=20 to content length 2. upload 3. try accessing that URL 4. get 500 [23:31:36] maplebed: is that what you mean? [23:31:40] not quite. [23:31:44] or do you mean it returns 500 from the write? [23:32:14] 1. modify the copy-into-swift portion of the 404 handler to do content_length += 20 [23:32:30] 2. request a file that doesn't exist, triggering the copy-into-swift portion. [23:33:13] 3. get 500. [23:33:32] ok.....that's a 500 from the PUT request then, I'm assuming [23:33:40] that's actually what we want, I think [23:33:57] me too. [23:34:30] but, the important question is what happens when you make a subsequent GET request on the URL [23:34:45] same thing. [23:34:50] 500? [23:34:56] (aka swift didn't put in the file before returning the 500) [23:35:05] it failed the put entirely [23:35:59] I guess what I'm asking is "does it return the exact same result as if you had never tried to write to that location in the first place?" [23:36:01] so the GET is 500? [23:36:25] yes to both. [23:36:49] swift returns 500s on file not found? that's ....unique [23:37:03] no, it's returning 500 on a failed put. [23:37:10] (which is a subrequest of the 404) [23:37:29] I guess I'm confused. [23:37:30] oh, that makes sense then [23:37:43] so...the real test is the 4 steps I described [23:38:06] "upload" doesn't exist as a step in the current swift infrastructure. [23:38:27] PUT does, though, right? [23:38:44] by hand, sure. as part of the actual flow of bits in production, no. [23:38:52] PUT only happens as a byproduct of 404. [23:39:13] the only production-like way to trigger it is by a GET to an object that doesn't exist. 
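To restate the flow being tested at 23:31-23:39: the only write path is the 404 handler, which on a thumbnail miss fetches the image from the backend, PUTs it into Swift, and then serves it, so corrupting the forwarded Content-Length (the += 20 experiment) makes the inner PUT fail and the whole request surface as a 500 with nothing stored. A rough outline of that shape, using hypothetical swift and backend client objects rather than the real rewrite.py internals:

    def handle_get(swift, backend, path):
        obj = swift.get(path)
        if obj is not None:
            return 200, obj                         # already in Swift

        data, content_length = backend.fetch(path)  # e.g. the image scaler / ms5
        if data is None:
            return 404, ''                          # backend doesn't have it either

        # Store-on-miss: forward the backend's Content-Length so Swift can verify
        # it received the whole body.  A failed PUT must not leave a truncated
        # object behind, and nothing is served from it.
        status = swift.put(path, data, content_length)
        if status != 201:
            return 500, ''                          # failed PUT, nothing stored

        return 200, data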
[23:39:15] I'm talking about the Swift client-server interactoin, which I'm assuming happens over HTTP [23:40:02] the stuff outside of the Swift client-server interaction doesn't interest me for purposes of this conversation [23:40:21] what I'm trying to make sure you've checked is that a 500 error doesn't leave junk in Swift [23:41:01] that's correct; after mucking with the content-length header the object I was requesting was not written into swift. [23:41:15] k...great! [23:42:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:43:24] maplebed: so, can you turn swift on again? [23:43:38] you mean put it back in production? [23:43:59] http://ganglia.wikimedia.org/latest/graph_all_periods.php?m=swift%20object%20change&z=small&h=Swift%20pmtpa%20prod&c=Swift%20pmtpa&r=hour shows it's still deleting existing truncated thumbs. [23:44:18] I'd rather wait for that to finish. [23:48:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.844 seconds [23:50:32] sure [23:51:58] maplebed: how close is your fixup script to completing? [23:53:00] robla: the sweeper? I've been working on testing non-chunked encoding instead. [23:53:48] !log catrope synchronized wmf-config/CommonSettings.php 'Tweak live hack so async loading is enabled for all logged-in users on all 1.19 wikis' [23:53:50] Logged the message, Master [23:54:19] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [23:54:55] maplebed: did you stop the scripts that you started yesterday? [23:55:06] no, tehy're still going. [23:55:17] any idea how close they are to finishing? [23:56:08] hi [23:56:13] can someon help [23:56:19] !ask [23:56:19] with? [23:56:49] I'm afraid they won't finish today - one is in the 6s (counting up from 0) and the other is in the ds (counting down from f) [23:57:01] hi can someone help me with something [23:57:07] that's still 7/16 that have to finish. [23:57:17] Guest50878: Can you please describe what you need help with? [23:57:18] Guest50878: Don't ask if you can ask, just ask your question :) [23:57:19] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours [23:57:24] ok thanks [23:57:39] I'm on meta wiki now, and it just had 1.19 deployed [23:57:51] RoanKattouw: he didn't ask about asking - s/he wanted to check our availability and readiness :) [23:58:00] one of the new changes is there is the +/- diff change for contribs pages [23:58:00] http://meta.wikimedia.org/wiki/Special:Contributions/Okeyes_%28WMF%29 [23:58:04] my question is [23:58:17] is there a way to use CSS to remove those +/- things [23:58:36] What do you want to remove exactly [23:58:38] +239 +1 ? [23:58:41] The whole (+834) thing? [23:58:43] 03? [23:58:59] yes [23:59:02] let me clarify [23:59:11] # 23:44, 16 February 2012 (diff | hist) . . (+834)‎ . . User talk:Kudpung ‎ (reply) (top) [23:59:12] .mw-plusminus-pos { display: none } [23:59:16] Yeah [23:59:24] ok let me try that [23:59:29] That will probably also hide (+nnn) markers elsewhere though [23:59:48] oh...yeah. how do I limit it to contribs pages