[00:03:08] chrismcmahon: wanna help me click on all of the wikinews links? http://meta.wikimedia.org/wiki/Special:SiteMatrix [00:04:18] robla: sure. I looked briefly at https://en.wikinews.org in FF, figured I'd poke it with IE7 since I have iE7 [00:04:57] chrismcmahon: thanks for your help on the message file comparison thing, it was very useful [00:05:16] !log synchronized payments cluster to r112275 [00:05:18] Logged the message, Master [00:05:23] you can commit any scripts you wrote to /trunk/phase3/maintenance/language if you like [00:05:43] TimStarling: glad to hear it. [00:08:16] * hexmode goes to watch wikinews channels [00:08:41] just watch the WN main page: if WN would be broken it would be a top breaking news ;) [00:09:01] TimStarling: the scripts are just little one-off Ruby things that read files and do regexes, not sure they'd be of much use in general, but they're easy to tweak for different kinds of comparisons [00:09:02] heh [00:09:16] chrismcmahon: you said the r word!! [00:09:35] Reedy: I like Ruby :) so shoot me. [00:09:41] you know in some parts of the world, you can use ruby without being ridiculed [00:09:55] strange but true [00:10:01] It's a good job we cover most of the world [00:10:25] Reedy TimStarling Ruby has a lot of traction in the testing community, along with Python, and I'm less fond of Python [00:10:27] also there's not many places where you can use PHP without being a joke, so maybe we should keep quiet [00:10:27] if if we don't -- that's a bug! [00:10:47] * chrismcmahon wrassles IE7 some more [00:11:31] chrismcmahon: /trunk/tools is a good place for little projects without much wider applicability [00:13:16] "1 fault (11)" thanks fatal monitor [00:14:41] ahh, startup module expiry time [00:14:55] thanks for reminding me in your email just now, RoanKattouw [00:15:44] 500 errors on thumbs.... [00:16:37] haha [00:16:46] Yeah we forgot that for this deploy didn't we [00:17:15] it's still just WN isn't it? [00:17:20] yup [00:17:25] Aye [00:17:32] Also thumbs are broken on the WN main page [00:17:44] Open the WN main page incognito (or hard refresh it) and you get 500s for all thumbs [00:18:02] The ops dept is standing around Ben's desk looking at it now, sounds like [00:18:16] ...and now they're going for tea [00:18:20] just asked at de.WN: no problems. Also thumbs on main page ;) [00:18:33] yay [00:19:36] !log tstarling synchronized wmf-config/InitialiseSettings.php 'reduce startup module expiry time for all projects except wikipedia' [00:19:37] load this to see the backend error code: http://magnesium.wikimedia.org:8080/wikipedia/commons/thumb/a/a0/Sweden_and_Norway_union_arms.jpg/150px-Sweden_and_Norway_union_arms.jpg [00:19:39] Logged the message, Master [00:19:42] robla: ^^^^ [00:19:55] "convert: missing an image filename `/tmp/transform_1fdcc7d-1.jpg' @ error/convert.c/ConvertImageCommand/2970" [00:19:59] !log tstarling synchronized wmf-config/CommonSettings.php 'reduce startup module expiry time for all projects except wikipedia' [00:20:01] Logged the message, Master [00:21:30] the file appears to be actually gone [00:21:51] can we revert now? [00:21:55] * TimStarling reverts [00:22:20] TimStarling: you're reverting for all of wikinews now? 
[00:22:26] (that'd be fine) [00:22:38] yeah [00:23:05] source images going missing, way too scary for me [00:23:19] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: wikinews back to 1.18 [00:23:21] Logged the message, Master [00:23:36] TimStarling: it's really broken now [00:23:44] wikiversions.cdb version entry does not start with `php-` (got `1.18`). [00:23:45] yay no css [00:24:29] better to have the site down than to have it deleting source images [00:24:38] someone please tell me those source images are not really gone [00:24:42] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: wikinews back to 1.18 [00:24:44] Logged the message, Master [00:25:21] https://upload.wikimedia.org/wikipedia/commons/3/3b/Buddy_Roemer_Reform_Party.JPG [00:26:06] http://en.wikinews.org/wiki/Whales_mingle_with_dolphins_days_before_World_Whale_Day [00:26:08] lol [00:26:33] ok i did an action=purge on the enwikinews main page, now showing thumbs again: https://en.wikinews.org/wiki/Main_Page [00:26:46] mmm [00:27:15] Could cdb stuff show up as errors? "wikiversions.cdb version entry does not start with `php-` (got `1.18`)." [00:27:28] that should've been fixed [00:27:33] hexmode, that should be gone now [00:27:47] where was that sweden thumb supposedly shown? [00:28:40] maplebed? [00:29:11] he stepped out for a sec [00:29:15] TimStarling, all the images in the lead articles on https://en.wikinews.org/wiki/Main_Page [00:29:23] but ONLY when logged out [00:29:28] brion, yeah, but still.... filing a bug to track it [00:29:39] well, that image that I was worried was lost has a template on it "The most recent version of this image has been reported as lost or corrupted due to the September 2008 image loss bug." [00:29:48] he must have just got it out of a log file or something [00:29:51] hexmode, well i think that's just 'somebody typed the wrong thing in config' [00:30:10] brion: it's like a WP blog saying it can't reach server [00:30:37] so the issue on wikinews was just some thumbnail URL issue or something [00:30:47] yeah... very strange though [00:31:14] the template on that image links to a wikitech-l post by me [00:31:18] bad memories... [00:31:22] https://en.wikinews.org/w/index.php?title=Template:Lead_2.0&action=edit [00:32:35] there's some fun template stuff for constructing at least some of the image [00:32:41] so nobody has any cached information about the broken URLs? [00:33:30] if not, we might have to switch back to 1.19 just to reproduce it [00:33:38] what kind of info? [00:33:45] http://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Sweden_and_Norway_union_arms.jpg/150px-Sweden_and_Norway_union_arms.jpg [00:34:04] that's the URL maplebed posted, which made me revert the deployment [00:34:05] i think i saw multiple sizes (another one was 100px i think) [00:34:19] but like I say, it's been broken since 2008 so it couldn't be used on the en.wn main page [00:34:26] right [00:34:31] i have no idea where it *came* from [00:34:43] yeah, a few sizes [00:34:44] but it seems to have inserted itself into all those images somehow [00:34:46] he probably just got it out of an error log on a swift server [00:35:09] ok, you're saying that every image turned itself into sweden? [00:35:12] yeah [00:35:21] freaky [00:35:26] every image in the lead article templates on the en.wikinews.org main page *when logged out* [00:35:31] when logged in, they showed normal [00:35:41] action=purge confirmed it both ways [00:36:02] file cache?
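The `php-` prefix error quoted above ([00:23:44]) comes from the multiversion loader: wikiversions.cdb maps each wiki's database name to the directory of the MediaWiki version it should run, and the loader rejects anything that is not a `php-*` directory name. A minimal sketch of that check, assuming PHP's dba extension with the cdb handler and a hypothetical key layout (the real key format is not shown in this log):

    <?php
    // Sketch only: wikiversions.cdb maps dbname => version directory, e.g. "enwikinews" => "php-1.18".
    $cdb = dba_open( '/usr/local/apache/common/wikiversions.cdb', 'r', 'cdb' );
    $version = dba_fetch( 'ver:enwikinews', $cdb );   // the key name used here is an assumption
    dba_close( $cdb );
    if ( strpos( $version, 'php-' ) !== 0 ) {
        // This is the failure seen above: the entry was written as "1.18" rather than "php-1.18".
        die( "wikiversions.cdb version entry does not start with `php-` (got `$version`).\n" );
    }

As the scrollback suggests, the remedy was simply rebuilding the cdb with the prefixed value, which is why the error stops appearing a few minutes later.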
[00:36:08] other images on the page (such as the icons in the sections below) were untouched [00:36:08] file object cache in MW I mean [00:36:18] are there any parser tags that are unique to wikinews that might be used in the templates? [00:36:20] the specific image is also referenced in this bug related to sha-1: https://bugzilla.wikimedia.org/show_bug.cgi?id=23529 [00:36:25] just brainstorming [00:36:28] hmm, File cache objects should be same for both anon and user though [00:36:50] ah, FlaggedRevs [00:36:54] TimStarling: I might blame FR [00:37:03] aha [00:37:04] jinx ;) [00:37:10] yeah, FlaggedRevs uses sha-1 indexes [00:37:16] anons see the stable and stable gets files via sha1 [00:37:17] \o/ and we have a winnar [00:37:26] if a bug was causing a fetch for the empty string it would do this [00:37:44] great [00:37:44] so just remove the bad image rows from the table maybe [00:37:47] we should have tested stable=0 [00:38:03] so a function returns null and we see sweden's coat of arms everywhere, funny failure mode [00:38:18] or rather don't see, since the image wasn't there ;) [00:39:34] can we configure test2 to repro? [00:39:47] test2 is already in flaggedrevs.dblist [00:41:38] let's deploy 1.19 to some non-FR wikis while Aaron works on isolating it [00:42:29] gn8 folks [00:42:38] what do you think, robla? [00:42:59] this might be a FileRepo bug too, but it would probably only affect FR wikis in such cases AFAIK [00:43:01] TimStarling: good idea [00:43:12] which ones? [00:44:48] the FR wikis seem to be pretty well-distributed across all projects [00:44:56] 10 of 33 wikinews are FR [00:45:58] wikiversity is all clean [00:46:00] 14 wikis [00:46:08] we have a winner! [00:47:18] want me to press the button? or have my button-pressing privileges been revoked after the php-1.18 versus 1.18 thing? [00:47:26] haha [00:47:29] we'll let it slide [00:47:30] i just made the changes [00:47:54] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Put all wikiversity on 1.19wmf1 [00:47:57] Logged the message, Master [00:48:33] Like you said, safer for the site to be down than potentially deleting images [00:50:52] My favorite quote from that scrollback: " so a function returns null and we see sweden's coat of arms everywhere." [00:50:59] Yup [00:51:03] That one goes into BZ quips [00:55:56] how about we do the wikisources now, except for the 5 with FR? [00:56:27] you remember wikisource was one of the main reasons for having this separate deployment window [00:56:38] All our favorite extensions? [00:56:57] yep [00:57:08] Might as well get it over with [00:57:48] * AaronSchulz is about ready to blame r96357 [00:57:51] Most of the wikimedia wikis can be done also, bar de, en and flaggedrev labs [00:58:05] } elseif ( isset( $options['sha1'] ) ) { // get by (sha1,timestamp) [00:58:07] $file = RepoGroup::singleton()->findFileFromKey( $options['sha1'], $options ); [00:58:20] in Parser, which is OK, but that rev above does not interact well [00:58:21] styling on http://en.wikiversity.org/wiki/Wikiversity:Sandbox looks fairly awful in IE7 but OK in Chrome [00:58:40] 'false' counts as isset() :) [00:59:14] chrismcmahon: just the sandbox? [01:00:06] Reedy: afaict yes. the cartoon sand image on that page obscures the text in IE7 [01:00:19] looking further...
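The exchange above ([00:58:05] to [00:58:40]) is the heart of the wikinews thumbnail bug: isset() is true for a value of false or an empty string, so when FlaggedRevs hands the parser a bogus sha1 for the stable version, the "get by (sha1,timestamp)" branch is still taken and the file lookup runs on an empty key, which is how the lead images all resolved to the same wrong file for logged-out readers. A standalone sketch of the trap and a stricter guard (illustrative; the actual change deployed as r112284 is not quoted in this log):

    <?php
    $options = array( 'sha1' => false );   // what a buggy caller can hand the parser

    var_dump( isset( $options['sha1'] ) );                                  // bool(true)  -- the sha1 branch is taken anyway
    var_dump( is_string( $options['sha1'] ) && $options['sha1'] !== '' );   // bool(false) -- a stricter guard

    // Inside MediaWiki the stricter test would gate
    //     RepoGroup::singleton()->findFileFromKey( $options['sha1'], $options );
    // so that a false/empty sha1 falls back to an ordinary lookup by title.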
[01:00:42] I wouldn't worrry about it too much ;) [01:00:48] indeed [01:01:25] TimStarling: sure, let's get some of the wikisources out of the way [01:02:10] I'm editing the file [01:02:54] checking random pages in IE7, looks basically ok. math is ok, tables ok, images mostly OK, close enough [01:03:41] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: switching wikisource to 1.19 except for wikis with FlaggedRevs enabled [01:03:43] Logged the message, Master [01:05:58] how come this works? http://test2.wikipedia.org/wiki/Imagetest [01:06:53] https://en.wikisource.org/w/index.php?title=Page:Children%27s_Development_of_Social_Competence_Across_Family_Types.djvu/4&action=edit [01:06:54] looks good [01:08:26] so proofreadpage works [01:08:41] or doublewiki [01:15:28] Yes, Proofread page [01:17:16] doublewiki doesn't work for me .. e.g. http://fr.wikisource.org/wiki/Criton_(trad._Cousin)?match=en doesn't do the side-by-side view [01:17:54] Yeah, I was just thinking the same thing [01:17:55] !log aaron synchronized php-1.19/extensions/FlaggedRevs 'deployed r112284' [01:17:58] Logged the message, Master [01:18:55] The page at https://fr.wikisource.org/wiki/Criton_(trad._Cousin)?match=en displayed insecure content from http://toolserver.org/~phe/fonts/LinLibertine_Re-4.7.5.ttf. [01:18:57] eww [01:21:00] Wow, wtf [01:21:07] That's not OK [01:21:15] just a bit [01:21:19] Probably a violation of the privacy policy [01:21:27] it works over https... [01:21:44] Krinkle flagged another wiki earlier where they were pulling a font from a totally random 3rd party domain [01:21:55] That definitely violates the privacy policy; toolserver I'm not sure [01:22:03] at least it's not prototype [01:22:11] very true [01:22:34] looks like it must be in site js [01:23:41] RoanKattouw: TimStarling: Afaik toolserver users have the privacy policy applied to them when users access the Toolserver via direct url (e.g. when interacting with a tool) [01:24:08] however I don't think (just thinking) that the policy allows ts-users to access IP information from anonymous wiki project readers [01:24:21] every ts-user can read the apache logs on toolserver [01:25:17] I do not think this should happen without user consent (e.g. gadget activation) [01:25:19] when an anonymous user visits my Toolserver tools, I have to keep the information I gather under that policy. So it's more the toolserver adopting the same policy, than the wmf policy covering the Toolserver as part of the whitelist. [01:25:37] indeed Saibo [01:25:39] var request_url = "http://toolserver.org/~phe/ocr.php?url=http:"+proofreadPageViewURL+"&lang="+wgContentLanguage+"&user="+wgUserName; [01:25:40] got it [01:25:47] it is another server.. other admins [01:26:02] an: frequently goes down [01:26:06] *and [01:26:54] same will/does apply to Wikimedia Labs, regardless of it being a WMF project. Users have high access there including apache logs [01:27:38] is anyone working on DoubleWiki? [01:27:57] is that something like Doublethink from 1984? ;) [01:28:32] shh Saibo [01:29:11] dewiki is even worse [01:29:14] it looks like RoanKattouw and Reedy did commits to it in the last cycle, maybe one of you has a test install? [01:29:18] [blocked] The page at https://de.wikisource.org/wiki/Hauptseite ran insecure content from http://wikisource.org/w/index.php?title=MediaWiki:Base.js&action=raw&ctype=text/javascript. [01:29:37] I don't [01:29:39] Reedy: dewiki*source*? 
[01:29:44] I don't even know what the hell it does [01:29:49] near enough [01:29:55] RoanKattouw: crosswiki diff essentially [01:31:37] side-by-side view of rendered page output from the same page in two languages of the same project, used for wikisource source text comparisons [01:32:09] adds a little <=> arrow next to interlanguage links which should load a page with the ?match=xx parameter that triggers the side-by-side view [01:32:50] * robla hunts around pl.wikisource for doublewiki [01:33:03] robla: ?match=en [01:33:21] http://pl.wikisource.org/wiki/Wiki%C5%BAr%C3%B3d%C5%82a:Strona_g%C5%82%C3%B3wna?match=en [01:33:24] ah, nice [01:36:36] appears to be installed on beta.wmflabs: http://en.wikisource.beta.wmflabs.org/wiki/Special:Version [01:36:46] ...though it doesn't appear to be working [01:37:21] (which...not a big surprise) [01:38:11] dewiki http includes fixed [01:40:38] should we rollback and postpone wikisource? [01:41:16] I'm worried that we're going to kill a bunch of time debugging an extension that none of us understands very well [01:41:32] I guess so [01:41:58] I'm surprised we didn't get word from the fr.wikisource folks [01:42:08] https://fr.wikisource.org/w/index.php?title=MediaWiki%3ACommon.css&diff=3279236&oldid=3256926 [01:42:09] Seriously [01:42:15] it's late there [01:42:28] we deployed frwikisource last week, didn't we? [01:43:00] want me to leave that one on 1.19? [01:43:08] sure [01:43:34] also hewikisource [01:43:37] TimStarling: ^ [01:44:02] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: reverted wikisources back to 1.18 except frwikisource [01:44:05] Logged the message, Master [01:44:17] We can get Hashar/Guillom to check their tech pages [01:44:46] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: and hewikisource [01:44:48] Logged the message, Master [01:44:50] heh [01:45:02] Try the wikimedia wikis next? [01:45:09] minus labs [01:45:15] let's circle back to wikinews [01:45:34] New patchset: Bhartshorne; "passing through text of back end error messages in addition to their HTTP response code" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2753 [01:45:52] did Aaron fix that bug? [01:45:57] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2753 [01:45:57] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2753 [01:46:05] He merged something [01:46:17] 01:17 logmsgbot: aaron synchronized php-1.19/extensions/FlaggedRevs 'deployed r112284' [01:46:18] yeah, he did [01:46:21] !r 112284 [01:46:21] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/112284 [01:46:43] good...I'm not the only one who missed that :) [01:46:51] ok, all wikinewses then? [01:46:56] yup, let's do it [01:48:10] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files: switching all wikinewses to 1.19 [01:48:13] Logged the message, Master [01:50:04] Looks better [01:51:56] yup....anon loads on all of the wikinews wikis looks fine on casual inspection [01:52:06] * chrismcmahon cooks something, ping me if you need something [01:52:11] k...thanks [01:56:17] Anyone mind if I move the wikimedia wikis over? 
[01:56:34] AaronSchulz: http://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/7398px-Little_kitten_.jpg [01:56:49] also TimStarling ^^^ [01:57:09] * Reedy wonders hwy maplebed wants such a large little kitten [01:57:20] I think it looks cute at 500 [01:57:24] Reedy: so it'll fit on the hwy. [01:57:46] anyway MediaWiki is not meant to output URLs like that [01:57:49] (the point of that URL is to demonstrate the error message not being "Unexpected error 500") [01:58:06] if the user requests a large image, it's meant to give a link to the source image and use client scaling [01:58:10] https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/5000px-Little_kitten_.jpg gives error 500 [01:58:32] so you get that error if there is cached HTML from a time when the image was larger [01:58:36] sorry. Tim, this is fixing your bug report of swift translating 500s into 404s and obscuring the back end error message. [01:58:47] or if you just change the width manually [01:59:12] oh, well in that case it looks fine [01:59:22] Reedy: let's do it [01:59:23] yep [01:59:30] (wikimedia wikis, that is) [02:01:21] PHP Warning: Invalid argument supplied for foreach() in /usr/local/apache/common-local/php-1.19/includes/api/ApiParamInfo.php on line 150 [02:03:37] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikimedia wikis to 1.19 [02:03:40] Logged the message, Master [02:03:42] AaronSchulz: examples aren't false, but aren't an array [02:04:10] I wonder if it's a string [02:04:18] 5000 pixel kittens ? they'd be megakittens [02:04:23] rawr [02:04:29] !log deployed updated rewrite.py to swift to pass through error codes and error messages it gets from the back end during 404 handling. [02:04:31] Logged the message, Master [02:11:18] so...let's see what we have left: wikiquote, wikibooks, and wiktionary [02:11:28] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [02:11:53] postponing wikisource [02:12:01] robla: and the specials [02:13:06] !log reedy synchronized php-1.19/includes/api/ApiParamInfo.php 'r112291' [02:13:08] Logged the message, Master [02:13:16] let's do wikiquote, wikibooks, and wiktionary in rapid succession (5 min?) [02:13:40] there's 32 specials [02:14:10] Reedy: is it reasonable to treat them as one deployment group? [02:14:52] Probably.. a few have been done already - mw.org, meta, commons... [02:15:08] I suppose wikimediafoundation.org is different enough that we may want to give its own slot [02:15:17] the rest can probably be lumped together [02:16:24] (actually, maybe that last comment more applies to officewiki...) [02:16:27] i dunno [02:16:42] we have weird extensions on all of them, don't we? [02:17:00] nah, not all of them [02:17:12] http://noc.wikimedia.org/conf/special.dblist [02:17:28] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [02:17:28] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [02:17:57] anyway....we can decide that after we're done with wikiquote, wikibooks, and wiktionary [02:18:10] !log LocalisationUpdate completed (1.18) at Fri Feb 24 02:18:10 UTC 2012 [02:18:12] Logged the message, Master [02:18:36] Reedy: wikiquote? 
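The foreach() warning from ApiParamInfo.php quoted above comes from a module whose example list is a plain string rather than an array ("examples aren't false, but aren't an array"). Normalising the value before iterating is the usual guard; a standalone sketch, illustrative only and not necessarily the exact change synced as r112291:

    <?php
    function normalizeExamples( $examples ) {
        if ( $examples === false || $examples === null ) {
            return array();                    // module declares no examples
        }
        if ( !is_array( $examples ) ) {
            $examples = array( $examples );    // a lone string becomes a one-item list
        }
        return $examples;
    }

    foreach ( normalizeExamples( 'api.php?action=paraminfo&modules=parse' ) as $example ) {
        echo $example, "\n";                   // no "Invalid argument supplied for foreach()"
    }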
[02:21:09] There's 88 [02:22:35] I'll move them over then [02:24:42] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move wikiquotes over to 1.19 [02:24:45] Logged the message, Master [02:29:23] the only stuff in fatalmonitor is 1.18 [02:29:23] all of the medium-large ones load fine [02:29:25] amusing [02:30:00] wikibooks now? [02:30:27] sure [02:30:36] 121 [02:32:05] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move wikibooks over to 1.19 [02:32:07] Logged the message, Master [02:34:38] !log LocalisationUpdate completed (1.19) at Fri Feb 24 02:34:38 UTC 2012 [02:34:40] Logged the message, Master [02:35:30] seems to be fine [02:36:11] load is looking fine on apaches [02:36:41] wiktionary? [02:36:46] PHP Warning: mysql_real_escape_string() expects parameter 1 to be string, array given in /usr/local/apache/common-local/php-1.19/includes/db/DatabaseMysql.php on line 429 [02:37:02] mmm [02:37:32] 171 wiktionaries [02:37:34] too bad PHP didn't have scalar type hinting ;) [02:37:45] need a backtrace for that [02:37:56] AaronSchulz: that's not in 5.3? ;-) [02:38:04] TimStarling: are you hacking it in? [02:38:07] should I enable wmerrors? [02:38:18] tim always wants to enable that [02:38:27] wmerrors works just fine except that it segfaults on OOM [02:39:29] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move wikitionarys over to 1.19 [02:39:31] Logged the message, Master [02:40:01] ganglia is showing that [02:40:39] did we finally make ganglia notice that we're doing something? [02:40:50] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=&s=by+name&c=Application%2520servers%2520pmtpa&tab=m&vn= [02:41:00] look at cpu/network [02:41:55] some bits spikes, but that's expected [02:44:17] the network spike seems to have stopped [02:45:08] apache cpu is still up 10% [02:46:40] tstarling cleared profiling data [02:46:47] it's hard to tell where it comes from [02:47:46] profiling just shows the usual suspects [02:48:02] yep [02:48:14] http://ganglia.wikimedia.org/latest/graph_all_periods.php?hreg[]=^ms-fe[1-9]&mreg[]=swift_.*_404_avg&gtype=line&title=Swift+average+query+response+time+-+404s&z=large&aggregate=1&r=hour [02:48:31] * AaronSchulz wonders what that spike was [02:51:59] robla: are the specials the only things left for today? [02:52:08] the parser cache miss rate seems to have increased [02:52:09] Reedy: yup [02:52:29] https://graphite.wikimedia.org/render?from=-2hours&until=now&width=500&height=380&target=*.pcache_miss_absent.count [02:53:20] but note that we have a parser cache the size of a pea at the moment [02:53:32] mmmm...green peas [02:53:38] so maybe that doesn't mean anything [02:54:01] other parse graphs seem normal so far [02:54:22] Just keep an eye on it? [02:54:36] dropping already [02:56:04] apaches seem to be settling down, too [02:56:30] New patchset: Tim Starling; "Re-enable wmerrors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2756 [02:56:59] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2756 [02:57:00] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2756 [02:57:59] OOM still only appearing on 1.18 [02:58:37] !log re-enabling wmerrors [02:58:40] Logged the message, Master [02:59:37] there's your error again aaron [03:01:25] maybe wmerrors can help [03:02:22] I'm wmerrors and I'm here to say [03:02:27] the help you need is on its way!
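The mysql_real_escape_string() warning above says an array reached the low-level escaping code, but a PHP warning carries no backtrace of its own, so the offending caller still has to be found (the "debugging patch" synced a little later does exactly that). A standalone sketch of the technique, using a stand-in for DatabaseMysql::strencode(); inside MediaWiki one would more likely format the trace with wfDebugBacktrace()/wfBacktrace() than by hand, and this is not the actual patch that was deployed:

    <?php
    function strencodeDebug( $s ) {
        if ( !is_string( $s ) ) {
            $trace = '';
            foreach ( debug_backtrace() as $i => $frame ) {
                $class = isset( $frame['class'] ) ? $frame['class'] . $frame['type'] : '';
                $trace .= "#$i " . $class . $frame['function'] . "\n";
            }
            trigger_error( "strencode() given " . gettype( $s ) . "\n" . $trace, E_USER_WARNING );
            $s = '';                           // degrade to something escapable while debugging
        }
        return addslashes( $s );               // stand-in for mysql_real_escape_string()
    }

    strencodeDebug( array( 'oops' ) );         // the warning now names the calling functions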
[03:03:27] * AaronSchulz thinks less of tim now [03:04:30] actually wmerrors is only giving backtraces for fatals, not warnings [03:04:35] ;) [03:04:41] how about we have it throw an exception instead? [03:05:19] I guess [03:06:29] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move specials over to 1.19 [03:06:31] Logged the message, Master [03:07:06] !log tstarling synchronized php-1.19/includes/db/DatabaseMysql.php 'debugging patch for array parameter warning' [03:07:08] Logged the message, Master [03:07:21] PROBLEM - Apache HTTP on srv234 is CRITICAL: Connection refused [03:07:39] PROBLEM - Apache HTTP on srv247 is CRITICAL: Connection refused [03:07:52] Newbie here. Can someone help me with a question, please? [03:08:01] hmm, no backtraces in exception log? [03:08:06] ok, trigger_error() [03:08:19] odea: ask away [03:08:28] I've done trigger_error() with wfDebugBacktrace() before...hideous, but worked [03:08:29] Ok, thanks [03:08:54] robla: and that should be the lot [03:09:08] A bugzilla was created on my behalf last year but the tech who was dealing with it is not around much, so now my bug has lain dormant for months [03:09:16] so can someone look at it for me please? [03:09:29] https://bugzilla.wikimedia.org/show_bug.cgi?id=30923 [03:09:43] !log tstarling synchronized php-1.19/includes/db/DatabaseMysql.php 'debugging patch with trigger_error' [03:09:46] Logged the message, Master [03:09:48] I just need some stranded user history merged after I changed my wiki ID. [03:09:57] "just" [03:10:03] ;) [03:10:04] PHP Fatal error: Call to a member function text() on a non-object in /usr/local/apache/common-local/php-1.19/includes/SpecialPage.php on line 649 [03:10:19] What do you mean, Reedy? [03:10:52] New review: Hashar; "I have removed the warning from the git-review labsconsole article :-)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2682 [03:11:10] Hmm, is Hashar still up, or up early? [03:11:17] It's not a parcitularily difficult thing to do, but it's not simple either [03:11:51] I see...I know nothing of the underlying technical procedure. [03:12:00] 1 and a half months is not really dormant relatively [03:12:07] we've bugs untouched (and open) for years [03:12:36] Well, the thing really goes back to September. [03:13:17] AaronSchulz: I wonder if there is an associated warning [03:13:29] ...actually before September, my ID change preceded that and some history was stranded [03:13:57] !log reedy synchronized wmf-config/InitialiseSettings.php 'fix kowiki namespace aliases' [03:13:59] Logged the message, Master [03:14:49] Feb 23 21:02:53 10.0.8.23 apache2[6573]: PHP Warning: filemtime() [function.filemtime]: stat failed for /usr/local/apache/common/php-1.17/resources/jquery.ui/themes/vector/images/ui-anim_basic_16x16.gif in /usr/local/apache/common-local/php-1.18/includes/resourceloader/ResourceLoaderFileModule.php on line 380 [03:15:09] !log reedy synchronized wmf-config/CommonSettings.php [03:15:12] Logged the message, Master [03:15:13] 1.17? [03:15:20] file(/home/wikipedia/common/php-1.19/../switchover-jun30.dblist) [function.file]: failed to open stream: No such file or directory in /home/wikipedia/common/wmf-config/CommonSettings.php on line 135 [03:15:26] wtf is that? [03:15:39] an old db list [03:15:55] just kill that [03:15:57] TimStarling: I guess that means we've still got wrong rl db entries [03:16:00] I just did [03:16:09] thanks [03:16:37] Ok, I don't really know what's happening now. 
As I said, I never used this forum before. Is the conversation over? [03:16:39] I guess the RL issues will be cleared up when we re-enable MessageBlobStore::clear() [03:17:05] anyway there doesn't appear to be a warning associated with the SpecialPage.php thing [03:17:16] odea: sorry we're a little distracted right now. we're in the middle of a major site deployment [03:17:32] So should I come back tomorrow or what? [03:18:06] ok that text() error weird, how does it get past call_user_func_array( 'wfMessage', $args )->setContext( $this ) ? [03:18:13] * robla looks at odea's request in bugzilla [03:18:22] I'll speak to the person the bug is assigned to when I've had some sleep and they're also not asleep [03:18:32] I'm not sure if it's as simple as doing a DB update query [03:18:51] though I seem to recall Ariel having written a maintenance script for something related to it [03:18:58] Ok thanks reedy. Be advised that person said at one of his pages that he is not around much, and that is why I wound up here. [03:19:52] AaronSchulz: I thought that maybe the call_user_func_array() in SpecialPage::msg() was failing [03:20:07] but you'd think that would cause a warning and I don't see one [03:20:26] call_user_func_array has some failure modes where it can return null [03:20:27] Ok, I'll take off and let you get on with your major development. I'll check in a day or two to see if my bug haas been progressed and if not I'll come by here again. Thank you. [03:21:06] Declaration of SkinTomas::outputPage() must be compatible with that of Skin::outputPage() in /usr/local/apache/common-local/php-1.19/extensions/skins/Tomas/Tomas.class.php on line 63 [03:21:36] yeah that looks easier to debug at least [03:23:33] RECOVERY - Apache HTTP on srv247 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time [03:23:42] I reproduced the SpecialPage one in eval.php [03:24:55] how did you not get a warning? [03:26:00] I just called $special->getDescription() and it gave a fatal [03:27:32] ahhh [03:27:42] ContributionTracking overrides msg() [03:27:52] function msg() { [03:27:52] return wfMsgExt( func_get_arg( 0 ), array( 'escape', 'language' => $this->lang ) ); [03:27:52] } [03:27:54] * AaronSchulz sighs [03:30:03] right, accidentally [03:30:11] so it's a b/c break bug [03:33:48] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.038 second response time [03:34:51] PROBLEM - Disk space on mw35 is CRITICAL: DISK CRITICAL - free space: /tmp 27 MB (1% inode=88%): [03:36:07] !log reedy synchronized php-1.19/extensions/DoubleWiki/DoubleWiki_body.php 'r112292' [03:36:09] Logged the message, Master [03:36:32] !log cleaned up /tmp on mw35 [03:36:34] Logged the message, Master [03:36:48] RECOVERY - Disk space on mw35 is OK: DISK OK [03:38:24] if I had a financial reason to care greatly about stability, I would probably be prefixing every custom method name in every class derived from a MW core class [03:38:45] robla: we can put 1.19wmf1 on wikisource now... 
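The ContributionTracking override quoted above is the whole bug: in 1.19, SpecialPage::msg() returns a Message object (core chains ->text() onto it in getDescription()), while the extension's older msg() helper returns an escaped string, hence "Call to a member function text() on a non-object". Tim's point about prefixing custom method names is the general defence. A standalone sketch with simplified stand-in classes (not the real MediaWiki code, and not the actual r112294 fix):

    <?php
    class MessageStub {
        private $key;
        public function __construct( $key ) { $this->key = $key; }
        public function text() { return "text for {$this->key}"; }
    }

    class CoreSpecialPage {
        // Core contract in 1.19: msg() returns a Message object
        public function msg( $key ) { return new MessageStub( $key ); }
        public function getDescription() {
            return $this->msg( 'contribtracking' )->text();   // chains ->text() onto it
        }
    }

    class ExtensionPage extends CoreSpecialPage {
        // Pre-1.18 style helper that now shadows the core method and returns a string
        public function msg( $key ) { return "escaped string for $key"; }
    }

    $page = new ExtensionPage();
    var_dump( $page->msg( 'contribtracking' ) instanceof MessageStub );   // bool(false)
    // $page->getDescription();   // would fatal: Call to a member function text() on a non-object

    // Either rename the helper to something extension-specific (e.g. a hypothetical ctMsg())
    // or keep the core contract and return a Message object.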
[03:39:03] TimStarling: especially if we go wild on traits :p [03:39:23] !log tstarling synchronized php-1.19/extensions/ContributionTracking/ContributionTracking_body.php 'r112294' [03:39:25] Logged the message, Master [03:41:27] [24-Feb-2012 03:41:08] Fatal error: Declaration of SkinSchulenburg::outputPage() must be compatible with that of Skin::outputPage() at /usr/local/apache/common-local/php-1.19/extensions/skins/Schulenburg/Schulenburg.class.php on line 72 [03:41:29] starting on this one now [03:42:54] TimStarling: probably want to fix the other skins too [03:43:22] abstract function outputPage( OutputPage $out = null ); [03:43:29] what sort of type hint is that? [03:43:31] Tomas and Donate are also used [03:43:53] Yeah.. Considering that won't work [03:44:01] TimStarling: I was wtfing at that too [03:44:03] I remember seeing something similar not so long ago [03:44:05] in Tomas [03:44:05] that's in core [03:44:13] not in Schulenburg [03:44:22] Skin::outputPage [03:44:25] right [03:44:56] so what do you do when it's null? [03:45:04] or is this just for some errant weird subclass [03:45:41] I'm annotating [03:46:29] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/95959 [03:48:35] Lovely [03:48:50] so it's always null, not just for some weird subclass [03:49:28] so then nothing called Tomas::outputPage [03:49:30] it has $bodyText = $out->getHTML(); [03:49:37] which would explode [03:49:58] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Move wikisource over to 1.19 [03:49:59] yeah I'll update it [03:50:01] Logged the message, Master [03:53:07] sorry...was talking to Erik [03:56:00] !log tstarling synchronized php-1.19/extensions/skins/Schulenburg/Schulenburg.class.php [03:56:02] Logged the message, Master [03:56:18] PROBLEM - Disk space on mw9 is CRITICAL: DISK CRITICAL - free space: /tmp 10 MB (0% inode=89%): [03:56:44] !log tstarling synchronized php-1.19/extensions/skins/Tomas/Tomas.class.php [03:56:47] Logged the message, Master [03:56:51] I wonder what's causing those disks to fill up [03:56:57] I guess I should just run the cron job early [03:57:17] !log tstarling synchronized php-1.19/extensions/skins/Donate/Donate.class.php [03:57:20] Logged the message, Master [03:57:21] * jdelanoy once saw someone who had set up recursive backups... [03:57:27] that was fun [03:57:54] 2G /tmp [03:57:58] 239G /a that's unused [03:59:12] yeah maybe we should start apache with TMP=/a/tmp [03:59:28] Certainly seems sensible [03:59:39] assuming they all have /a [03:59:41] Should the EasyTimeline stuff get deleted? I see stuff from October 2011 [04:00:04] I thought there was a cron job to do it, but apparently that only clears php* files [04:00:18] ah [04:00:41] !log cleaning up /tmp on all apaches [04:00:46] Logged the message, Master [04:01:32] Looking at a few apaches suggests most should have a /a [04:02:18] RECOVERY - Disk space on mw9 is OK: DISK OK [04:02:59] And it's once again 4am [04:03:01] I'm out [04:03:19] 4am, time to knock off for today? [04:03:28] time for dinner, watch some TV? [04:03:45] good night ;) [04:05:36] Exactly! Night [04:07:32] yup, goodnight! [04:07:56] thanks Reedy! thanks TimStarling! [04:09:01] TimStarling: I think I'm going to take off here as well. anything you need before I go? 
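The Skin::outputPage() fatals above are load-time declaration checks: the abstract parent is declared roughly as outputPage( OutputPage $out = null ), so a subclass whose declaration does not accept that optional, type-hinted parameter is rejected before anything is ever called. A standalone sketch with stub classes; the old skins' exact declarations are not quoted in the log, so the broken variant shown here is an assumption:

    <?php
    class OutputPageStub {}

    abstract class SkinStub {
        // Core declares roughly: abstract function outputPage( OutputPage $out = null );
        abstract function outputPage( OutputPageStub $out = null );
    }

    // Incompatible: makes the parameter required where the parent declares it optional
    // (possibly what the old skins did, since the "= null" was only added to core in r95959).
    // Uncommenting this fatals at load time with
    // "Declaration of BrokenSkin::outputPage() must be compatible with that of SkinStub::outputPage()":
    // class BrokenSkin extends SkinStub {
    //     function outputPage( OutputPageStub $out ) { }
    // }

    // Compatible: same type hint, parameter stays optional.
    class FixedSkin extends SkinStub {
        function outputPage( OutputPageStub $out = null ) {
            echo $out ? "rendering from the given OutputPage\n" : "rendering from a fallback context\n";
        }
    }

    $skin = new FixedSkin();
    $skin->outputPage( new OutputPageStub() );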
[04:09:06] I complained to the ops people about the small /tmp partitions, but of course it's a hard thing to change once it's done [04:09:18] (actually, come to think of it, I'll probably get back online once I get home) [04:09:19] nope, bye [04:14:38] working on the Database::strencode() thing now [04:32:01] !log tstarling synchronized php-1.19/includes/api/ApiFeedContributions.php [04:32:04] Logged the message, Master [04:42:24] I isolated the OOM in LandingCheck: [04:42:41] > print Language::getFallbackFor('pt-br') [04:42:41] pt [04:42:41] > print Language::getFallbackFor('pt') [04:42:41] pt-br [04:43:21] nice [04:43:28] that makes it go into an infinite loop [04:44:09] it looks like code I would have written, relying on the circular reference checks in LocalisationCache to avoid an infinite loop [04:44:14] but Nikerabbit took those out [04:44:33] I guess he did it to support it as a feature [04:44:53] * Aaron|home eats rice pudding [05:06:07] Well, at least these bugs get worked out before a proper MediaWiki release. [05:06:14] That's nice. :-) [05:13:44] Joan: yes, well you know the plan was just to leave them broken on the cluster till the next release >.> [05:17:42] !log tstarling synchronized php-1.19/extensions/DonationInterface/globalcollect_gateway/globalcollect.adapter.php 'r112301' [05:17:45] Logged the message, Master [05:18:20] !log tstarling synchronized php-1.19/extensions/LandingCheck/SpecialLandingCheck.php 'r112301' [05:18:22] Logged the message, Master [05:18:50] !log tstarling synchronized php-1.19/extensions/SecurePoll/includes/pages/Page.php 'r112301' [05:18:52] Logged the message, Master [05:32:28] !log synchronized payments cluster to r112287 [05:32:30] Logged the message, Master [06:22:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:24:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.909 seconds [06:59:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.730 seconds [07:39:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:43:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.353 seconds [07:49:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error [08:47:06] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [08:50:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.056 seconds [09:06:35] New patchset: Mark Bergsma; "Set method { biosgrub } to install a BIOS grub partition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2757 [09:07:14] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2757 [09:07:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2757 [09:16:31] no holidays for willow http://munin.toolserver.org/Login/willow/cpu.html [09:24:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:26:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.158 seconds 
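The LandingCheck out-of-memory isolated above comes from the circular fallback (getFallbackFor('pt-br') gives 'pt' and getFallbackFor('pt') gives 'pt-br'): any loop that keeps following fallbacks without remembering what it has already visited never terminates. A "seen" set is the usual guard; a standalone sketch, illustrative only and not the actual LandingCheck or LocalisationCache code:

    <?php
    function getFallbackFor( $code ) {
        // Stand-in for Language::getFallbackFor(); note the pt <-> pt-br cycle.
        $map = array( 'pt-br' => 'pt', 'pt' => 'pt-br', 'de-at' => 'de' );
        return isset( $map[$code] ) ? $map[$code] : false;
    }

    function fallbackChain( $code ) {
        $chain = array();
        $seen = array();
        while ( $code !== false && !isset( $seen[$code] ) ) {
            $seen[$code] = true;   // remembering visited codes is what breaks the cycle
            $chain[] = $code;
            $code = getFallbackFor( $code );
        }
        return $chain;
    }

    echo implode( ' -> ', fallbackChain( 'pt-br' ) ), "\n";   // pt-br -> pt (stops instead of looping)
    echo implode( ' -> ', fallbackChain( 'de-at' ) ), "\n";   // de-at -> de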
[09:49:41] New patchset: ArielGlenn; "show item status (= items in archive.org 'job queue')" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/2758 [09:49:44] New review: gerrit2; "Lint check passed." [operations/dumps] (ariel); V: 1 - https://gerrit.wikimedia.org/r/2758 [10:01:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:02:54] hi, can someone look into this picture: http://commons.wikimedia.org/wiki/File:Billete_$100_Mexico_Centenario_Reverso.png [10:03:06] i only get the mainpage served from commons [10:03:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.351 seconds [10:03:12] but it works with http://commons.wikimedia.org/w/index.php?title=File:Billete_$100_Mexico_Centenario_Reverso.png [10:04:17] nevermind, https://bugzilla.wikimedia.org/show_bug.cgi?id=34684 [10:30:26] New patchset: Mark Bergsma; "Break up swift::storage into subclasses for dependencies" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2759 [10:31:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2759 [10:32:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2759 [10:32:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2759 [10:37:40] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:40:21] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [10:53:01] New patchset: Mark Bergsma; "Modify partman recipe to not make a huge swap partition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2760 [10:53:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2760 [10:53:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2760 [10:53:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2760 [10:53:48] New review: Hashar; "(no comment)" [operations/dumps] (ariel) C: 1; - https://gerrit.wikimedia.org/r/2683 [10:55:56] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2748 [11:50:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:52:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.634 seconds [11:58:11] New patchset: Mark Bergsma; "Puppet was making no attempt to start Swift" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2761 [11:58:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2761 [11:58:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2761 [11:58:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2761 [12:02:23] !log tstarling synchronized php-1.19/includes/AutoLoader.php 'r112316' [12:02:25] Logged the message, Master [12:02:43] !log tstarling synchronized php-1.19/includes/PathRouter.php 'r112316' [12:02:45] Logged the message, Master [12:12:32] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [12:18:32] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [12:18:32] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [12:27:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:29:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.970 seconds [13:03:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:05:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.292 seconds [13:12:30] I'm baaack... [13:12:37] * jeblad puts on hockeymask [13:12:42] Troubles.. [13:13:00] Amsterdam seems to be out of sync somehow [13:13:33] http://commons.wikimedia.org/wiki/File:Blomster_i_skogen_%28Katyn%29.jpg [13:14:16] The previous image is correct if visited while logged in, but the text is wrong if visited as an anonymous user [13:14:58] The text was changed 21. feb., so I guess it has something to do with the upgrade [13:15:33] I have changed the text to force a refresh so I don't know if the error is still there [13:16:16] Otherwise, nice to see MW1.19 is deplyed! =) [13:17:54] jeblad, the text? the image description? what's the correct one? [13:18:31] ah, but now you've edited it [13:19:34] Yes, I've edited it.. Perhaps someone should hav had access to the old version for debugging purposes? [13:20:02] Anyhow, something weird might be happening with the Amsterdam-cluster after the upgrade [13:22:08] It was this change http://commons.wikimedia.org/w/index.php?title=File%3ABlomster_i_skogen_%28Katyn%29.jpg&diff=67348790&oldid=66556983 [13:22:19] It didn't replicate to Amsterdam [13:22:30] Yes, the page might have been useful for debugging, too late now I guess. [13:22:58] I had someone nag'ing on me about it :( [13:25:56] The people asking about the image says you all do a great job! =) [13:30:35] * Nemo_bis surely not, doesn't do anything [13:40:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.995 seconds [14:16:12] New patchset: Mark Bergsma; "Fix the broken cron job mail bombing me" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2762 [14:16:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2762 [14:16:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2762 [14:16:41] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2762 [14:16:42] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2762 [14:19:45] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2685 [14:19:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2685 [14:20:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:26:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.986 seconds [14:31:16] jeblad: are you still seeing examples of out of sync stuff on commons as anon? [14:48:20] New patchset: Pyoungmeister; "don't want generic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2763 [14:48:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2763 [14:50:36] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2763 [14:50:36] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2763 [14:57:31] New patchset: Pyoungmeister; "harumph dynamic typing..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2764 [14:57:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2764 [14:58:24] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2764 [14:58:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2764 [15:02:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.042 seconds [15:37:58] New patchset: RobH; "chris changed keys, updated admins file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2765 [15:38:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2765 [15:38:52] New review: RobH; "simple key swap" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2765 [15:38:52] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2765 [15:40:53] New patchset: Pyoungmeister; "puppetizing some indexer stuffs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2766 [15:41:16] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2766 [15:42:57] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2766 [15:42:57] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2766 [15:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:50:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds [16:04:34] PROBLEM - Lucene on mw1020 is CRITICAL: Connection refused [16:16:42] New patchset: Hashar; "mention changes should be ignored" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:18:07] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/30/ (1/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:18:07] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/25/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:18:37] robla: I had one client (an university employee) that complained over one image that wasn't in sync [16:19:15] That is, it wasn't the client discovering the failure but the source for the image [16:19:40] I don't know if there are more images with similar problems [16:20:28] ok. thanks for the clarification. we'll keep an eye on it [16:21:23] I guessed it was because of the upgrade and that the slave simply was lagging behind somehow [16:23:23] New patchset: Hashar; "jenkins: git preparaton script for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513 [16:24:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:24:55] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/31/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2617 [16:26:24] New patchset: Hashar; "mention changes should be ignored" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:27:33] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/26/ (1/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:27:34] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/32/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [16:30:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.538 seconds [16:40:28] New review: Hashar; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2767 [16:40:28] Change merged: Hashar; [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2767 [16:59:10] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/33/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:03:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:09:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.461 seconds [17:12:36] New patchset: Krinkle; "Adding some new lines during the git training" 
[test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2768 [17:12:37] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2768 [17:16:36] RECOVERY - Lucene on search1003 is OK: TCP OK - 0.026 second response time on port 8123 [17:19:15] New review: Krinkle; "Hey" [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2768 [17:24:06] PROBLEM - Lucene on mw1110 is CRITICAL: Connection refused [17:24:33] PROBLEM - Lucene on mw1010 is CRITICAL: Connection refused [17:26:54] New review: Sumanah; "Publishing inline comment." [test/mediawiki/extensions/examples] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2768 [17:36:52] New review: Sumanah; "This is a great change!" [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2768 [17:36:52] Change merged: Sumanah; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2768 [17:39:45] New patchset: Sumanah; "suggestion to use Git" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2769 [17:39:47] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2769 [17:43:54] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:44:38] New review: Krinkle; "Approving this message." [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2769 [17:45:08] Change merged: Krinkle; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2769 [17:45:40] !log catrope synchronized multiversion/MWVersion.php 'Add file_exists check for /home before trying to access /home' [17:45:42] Logged the message, Master [17:48:26] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/34/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:48:55] New patchset: Andre Engels; "* Worked Andrew's comments into my code * Changed test_access_log_pipeline.py into test_traits.py * Did some changes in determining traits based on the test * Removed the trait 'domain' because it duplicated the trait 'site' * Used yield instead of creati" [analytics/reportcard] (andre/mobile) - https://gerrit.wikimedia.org/r/2770 [17:49:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.156 seconds [17:51:10] !log And of course I can't commit this because the code in /h/w/common/multiversion hasn't been updated this calendar year and there are undeployed commits from January *grumble* [17:51:13] Logged the message, Mr. 
Obvious [17:51:13] AaronSchulz: --^^ [17:52:34] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/35/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:56:04] yeah, I'll update that [17:56:23] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/36/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [17:56:26] RoanKattouw: they were likely forgotten about in the time it took for them to be reviewed [17:56:40] Once you update it, commit my change too [17:56:46] It doesn't fix the issue anyway [17:56:51] include_once( "/home/wikipedia/common/wmf-config/CommonSettings.php" ); in Localsettings.php [17:58:52] !log catrope synchronized php-1.19/LocalSettings.php 'Guard against /home/wikipedia not existing' [17:58:54] Logged the message, Master [18:00:49] ok, no scap for a sec [18:00:53] * AaronSchulz checks around [18:04:53] !log aaron synchronized multiversion 'deployed all changes through HEAD' [18:04:55] Logged the message, Master [18:06:00] RoanKattouw_away: you can commit now [18:09:39] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/37/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [18:10:35] New review: jenkins-bot; "Build Started https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/38/ (2/2)" [test/mediawiki/core2] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2767 [18:15:44] has something been changed regarding loading of javascript? - all my helper scripts have stopped working [18:16:07] 1.19 deployment, probably [18:16:25] what were they using? [18:16:38] no idea, I just use them :-) [18:17:39] sigh [18:18:04] !log aaron synchronized multiversion/MWMultiVersion.php 'deployed r112335' [18:18:06] Logged the message, Master [18:18:09] the panes/tabs in my settings are gone as well [18:19:01] hmm, changed to vector and back to monobook, and the tabs were back [18:19:15] that points to css not loaded completely [18:19:21] maybe a hard refresh would solve it [18:19:32] yes, would seem so [18:20:24] a couple of hours ago, templates weren't placed correctly and I did the same trick [18:20:44] but the javascript thing is a different matter [18:23:04] you could have those problems with a half-loaded javascript [18:24:10] I had the same problem in a different browser [18:24:25] but now some of the scripts have returned in my usual browser [18:24:45] !log aaron synchronized multiversion/MWVersion.php [18:24:47] Logged the message, Master [18:25:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:31:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.126 seconds [18:37:12] New patchset: ArielGlenn; "the mhash workaround for snaps is hopefully no longer needed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2771 [18:37:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2771 [18:37:50] hi ops! sumana sent me here -- may I get an account/password for gerrit/labconsole? [18:38:15] my username is "au" on the old svn machine and "Au" is my mediawiki.org account name. 
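The "Guard against /home/wikipedia not existing" syncs above address apaches where the NFS mount is absent while LocalSettings.php still does include_once( "/home/wikipedia/common/wmf-config/CommonSettings.php" ). A minimal sketch of that kind of guard; the fallback path is an assumption based on the /usr/local/apache/common-local paths seen elsewhere in this log, not a quote of the actual change:

    <?php
    // Prefer the NFS copy when the mount is present, otherwise fall back to the locally synced copy.
    $wmfConfig = '/home/wikipedia/common/wmf-config/CommonSettings.php';
    if ( !file_exists( $wmfConfig ) ) {
        $wmfConfig = '/usr/local/apache/common-local/wmf-config/CommonSettings.php';  // assumed fallback location
    }
    include_once( $wmfConfig );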
[18:38:16] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2771
[18:38:17] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2771
[18:43:46] New patchset: Trevor Parscal; "Added important notes to HelloWorld INSTALL" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2772
[18:43:47] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2772
[18:46:25] au: one more step! wikimedia-labs is the place to go
[18:46:34] thx!
[18:46:53] . o O (In a maze full of wikimedia-* channels, all different)
[18:47:14] New patchset: Robmoen; "added hello world from rob" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773
[18:48:12] au, I'm in lots of channels
[18:48:22] I just see tabs with caption "#w..."
[18:48:41] except a couple which begin with #m (mediawiki) and another ##
[18:48:51] rof...
[18:48:54] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[18:55:20] New review: Trevor Parscal; "Maybe it's better if people aren't asked directly to give me money." [test/mediawiki/extensions/examples] (master) C: -1; - https://gerrit.wikimedia.org/r/2772
[18:57:00] New review: Robmoen; "No money for you." [test/mediawiki/extensions/examples] (master) C: 0; - https://gerrit.wikimedia.org/r/2772
[18:57:56] New review: Sumanah; "Great!" [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2772
[18:57:56] Change merged: Sumanah; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2772
[18:58:38] Change abandoned: Sumanah; "Sorry." [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773
[19:02:43] New patchset: Sumanah; "blowing your mind" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2774
[19:02:44] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2774
[19:03:34] New review: Trevor Parscal; "This is by far the most brilliant commit I have ever seen." [test/mediawiki/extensions/examples] (master) C: 2; - https://gerrit.wikimedia.org/r/2774
[19:03:38] New review: Robmoen; "Cool" [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2774
[19:04:05] New review: au; "Tres cool!" [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2774
[19:04:10] Change merged: Trevor Parscal; [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2774
[19:05:49] New review: au; "Abandon all changes, ya who commits I5e298de1." [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2773
[19:06:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:09:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.611 seconds
[19:15:29] New patchset: au; "* Testing commit" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2775
[19:15:31] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2775
[19:16:04] New review: au; "Oh I would prefer that I didn't submit this." [test/mediawiki/extensions/examples] (master) C: -1; - https://gerrit.wikimedia.org/r/2775
[19:16:17] New review: au; "...but I changed my mind..." [test/mediawiki/extensions/examples] (master) C: 1; - https://gerrit.wikimedia.org/r/2775
[19:25:09] Change abandoned: au; "...and I shall finish this change... with abandon!" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2775
[19:45:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:50:44] guillom, lots of translations for https://www.mediawiki.org/wiki/MediaWiki_1.19/Deployment_announcement :-O
[19:51:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.691 seconds
[19:51:48] Nemo_bis, yeah, I didn't expect so many.
[19:51:54] me neither
[20:15:44] New patchset: Sumanah; "better and shorter string" [test/mediawiki/extensions/examples] (master) - https://gerrit.wikimedia.org/r/2776
[20:15:45] New review: gerrit2; "Lint check passed." [test/mediawiki/extensions/examples] (master); V: 1 - https://gerrit.wikimedia.org/r/2776
[20:25:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:28:10] PROBLEM - Host search1009 is DOWN: PING CRITICAL - Packet loss = 100%
[20:29:40] RECOVERY - Host search1009 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms
[20:30:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.270 seconds
[20:49:10] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 205 seconds
[21:04:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:10:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.442 seconds
[21:16:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 8140: HTTP/1.1 500 Internal Server Error
[22:01:38] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours
[22:05:02] !log aaron synchronized php-1.19/extensions/FlaggedRevs/frontend/modules/ext.flaggedRevs.review.js 'deployed r112361'
[22:05:06] Logged the message, Master
[22:09:08] @replag
[22:09:40] @replag
[22:09:41] Krinkle: [s7] db16: 2536s
[22:09:47] O_O ?
[22:09:59] O_O ?
[22:10:01] @replag
[22:10:02] Krinkle: [s7] db16: 2536s
[22:10:06] hm..
[22:10:11] @info s7
[22:10:11] Krinkle: [s7] db16: 10.0.6.26, db37: 10.0.6.47, db18: 10.0.6.28, db26: 10.0.6.36
[22:10:33] I guess the bot's copy of noc.wm.o is outdated
[22:10:44] or not?
[22:10:55] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777
[22:11:16] @replag centralauth
[22:11:16] Krinkle: [centralauth: s7] db16: 2543s, db37: 0s, db18: 0s, db26: 0s
[22:11:33] nope still up 2 date
[22:12:00] 22:10 binasher: restarted the 1.19 schema migration script - it's going to hit the just rotated s3 (db34), s2 (db30), s7 (db16), and s4 (db31) ex-masters before resuming s5 (db55) and all s6/s1 slaves
[22:12:06] @replag s3
[22:12:07] Krinkle: [s3] db34: s, db39: 0s, db25: 0s, db11: 0s
[22:12:17] s
[22:12:24] @replag s2
[22:12:24] Krinkle: [s2] db30: 0s, db13: 0s, db24: 0s, db54: 0s
[22:13:38] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[22:17:14] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777
[22:19:38] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[22:19:39] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[22:24:43] New patchset: Lcarr; "Adding in plugin-config files to puppet and new nagios manifest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777
[22:25:48] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2777
[22:25:49] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2777
[22:26:06] Hi, the 50x server errors are increasing drastically.
[22:26:22] rillke: can you give us some example pages ?
[22:26:26] 50x server errors? Where? What kinds of requests?
[22:26:43] It's probably not 500s right? I see no PHP errors in the log
[22:26:50] yes. please wait while pasting to pastebin
[22:26:51] So I assume this is about 503 and such
[22:26:58] 502
[22:31:05] http://p.defau.lt/?lVc_skMZPNiSdCT_gXZKiw
[22:31:46] eep secure.wikimedia.org
[22:31:52] The API. But even normal editing throws lots of errors.
[22:32:04] edit/move/delete
[22:32:09] why are you using the API to secure.wikimedia ?
[22:32:22] (we're trying to get all users off of secure.wm)
[22:32:45] Is this relevant for you? Simply shut down secure.wm and I will move.
[22:32:57] No problem
[22:33:05] Dammit we still need to do those redirect
[22:33:08] Ryan_Lane: !
[22:33:14] *redirects
[22:33:19] heh
[22:33:20] But if you offer this service it must work.
[22:33:22] yeah. we do
[22:33:28] rillke: Well it's quite possible that the reason you get these 503s all the time is because you're using secure
[22:33:43] secure.wm.o is still offered but it is deprecated
[22:33:44] rillke: just use https
[22:33:49] Then shut it down.
[22:33:53] The only reason it doesn't redirect to https yet is because Ryan and I have been busy
[22:33:54] https://en.wikipedia.org
[22:34:02] you *can* use it if you want
[22:34:07] but there's no good reason to
[22:34:11] If I hadn't decided to move to another country, this would've been done already
[22:34:15] heh
[22:34:37] I'm simply doing too many things at once
[22:34:40] Me too
[22:35:12] ah so secure lives on singer ?
[22:35:20] yep
[22:35:25] me looking for this TParis, anyone know where he is?
[22:35:30] Doesn't the blog still live there too?
[22:35:37] not anymore
[22:35:41] planet does
[22:35:45] planet does, yeah
[22:35:55] also, puppet's dead right now, which can't be helping singer ...
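However dbbot-wm assembles the "@replag" reports above, the same replication-lag numbers are exposed through MediaWiki's public siteinfo API (meta=siteinfo with siprop=dbrepllag). A rough sketch of reading them, with the wiki URL and output format chosen purely for illustration and not reflecting the bot's actual implementation:

    <?php
    // Sketch: print per-slave replication lag as reported by the API.
    // One way to get the same numbers without a local copy of the noc db.php files.
    $url = 'https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo'
         . '&siprop=dbrepllag&sishowalldb=&format=json';
    $data = json_decode( file_get_contents( $url ), true );
    foreach ( $data['query']['dbrepllag'] as $slave ) {
        printf( "%s: %ss\n", $slave['host'], $slave['lag'] );
    }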
[22:36:00] LeslieCarr: We once had a fun incident where the site had DNS issues, someone tweeted that secure still worked, folks flocked to secure, broke singer, and that took down the blog post announcing the downtime as well
[22:36:13] heh
[22:36:21] Seems like one of his toolserver scripts doesn't work
[22:36:29] i'm going to track down the puppet breakage right now, if someone can figure out why secure is flipping out
[22:38:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.876 seconds
[22:39:01] !log reloaded apache2 on stafford (puppetmaster)
[22:39:03] Logged the message, Mistress of the network gear.
[22:39:17] LeslieCarr: ----^^ HTTP 400 != OK
[22:39:22] I filed an RT ticket about that a few days ago
[22:39:36] lol
[22:39:43] hehe
[22:40:08] maybe it should say "expected result for this check"
[22:40:17] because that's the expected result
[22:40:21] That's just lame
[22:40:48] okay, so i see a lot of timeouts/errors on secure right now
[22:40:50] There couldn't be a check where the expected status is 200 or 301
[22:41:01] I'm not sure if there is
[22:41:03] Like, just like everything else? :)
[22:41:11] because it requires a client certificate
[22:41:21] D'oh, of course
[22:41:29] RoanKattouw: it's more important that i get the new nagios server up and running so spence doesn't die every few days, then we can get better checks :)
[22:41:43] Well I would argue that if you're not testing the actual thing, you're not really monitoring it fully
[22:41:45] I don't think the check can be any better
[22:41:45] Yeah, sure
[22:42:07] unless we want the nagios server to authenticate using the system's puppet certificate
[22:42:14] but that is likely pretty insecure
[22:42:41] !log aaron synchronized php-1.19/includes/specials/SpecialContributions.php 'deployed r112366'
[22:42:44] Logged the message, Master
[22:43:53] PHP Warning: Division by zero in /usr/local/apache/common-local/php-1.19/extensions/ParserFunctions/Expr.php on line 423
[22:44:18] RoanKattouw: the world will explode! Some fool divided by zero!
[22:44:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:45:45] annoying to see that as an error
[22:50:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.943 seconds
[22:53:12] @quit
[22:53:24] hrm, so it looks like there's a few commands that degrade performance but reduce some of the errors we're seeing on secure.wm
[22:53:39] basically force the proxy requests to be HTTP 1.0 instead of 1.1
[22:57:54] Krinkle: ?
[22:57:58] sry
[22:58:05] updating externals
[22:58:15] toolserver changed something, bot didn't work
[22:58:20] should be fine now
[23:00:52] rillke: i'm changing some proxy settings now
[23:00:57] New patchset: Lcarr; "Updating secure.wikimedia.org proxy config a la https://httpd.apache.org/docs/2.2/mod/mod_proxy_http.html" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2778
[23:01:41] rillke: we'll see if that helps the secure.wm.org issues, if you're willing to test in about 10 minutes?
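The "Division by zero" warning from ParserFunctions' Expr.php at 22:43 is the kind of case an expression evaluator usually converts into an expression-level error rather than letting PHP write a warning to the log. A simplified sketch of such a guard (illustrative only; the real Expr.php uses its own error class and messages):

    <?php
    // Sketch: refuse to divide by zero and surface a user-visible error
    // instead of emitting a PHP "Division by zero" warning to the error log.
    function exprDivide( $left, $right ) {
        if ( (float)$right == 0.0 ) {
            throw new Exception( 'Division by zero' );
        }
        return $left / $right;
    }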
[23:02:07] yes, perhaps
[23:02:10] thanks
[23:02:42] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2778
[23:02:42] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2778
[23:03:17] !log pushing new apache conf file to singer for secure.wikimedia.org - may impact performance of secure site
[23:03:19] Logged the message, Mistress of the network gear.
[23:03:45] !log reloading apache2 on singer
[23:03:47] Logged the message, Mistress of the network gear.
[23:04:16] oh that totally killed secure :(
[23:08:12] that will include planet and some other stuff too then, eh?
[23:08:41] PROBLEM - HTTP on singer is CRITICAL: Connection refused
[23:08:52] yeah
[23:09:17] AaronSchulz: funny, the svn repo of dbbot-wm basically reproduces a fraction of the hidden svn repo for the noc files :P
[23:09:21] https://fisheye.toolserver.org/browse/krinkle/trunk/wmfDbBot/external/db.php
[23:09:53] doesn't have to be in subversion though, could be done by the install script
[23:11:14] New patchset: Lcarr; "adding in ssl file info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2779
[23:11:37] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2779
[23:11:42] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2779
[23:11:42] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2779
[23:12:36] !log restarted apache on singer again
[23:12:39] Logged the message, Mistress of the network gear.
[23:13:05] rillke: can you test again and see if you still get a lot of 500's ?
[23:14:41] RECOVERY - HTTP on singer is OK: HTTP OK - HTTP/1.1 302 Found - 0.003 second response time
[23:14:42] * rillke is testing
[23:21:23] New patchset: Lcarr; "Ensuring /etc/nagios-plugins directory exists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2780
[23:23:46] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2780
[23:23:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2780
[23:24:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:26:15] New patchset: Hashar; "jenkins: git preparation script for gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513
[23:27:04] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2513
[23:27:23] thanks RoanKattouw :-)
[23:30:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.497 seconds
[23:31:24] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2513
[23:31:24] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2513
[23:32:35] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2629
[23:32:57] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2587
[23:32:58] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2587
[23:35:27] New review: Bhartshorne; "I don't think we want all swift access logs centralized in this way. If we tune swift logs so that ..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/2673
[23:38:13] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2677
[23:38:13] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2677
[23:39:55] New review: Lcarr; "Looks good but a little cautious, want another review" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2514
[23:41:20] New review: Bhartshorne; "well, nevermind. they're already getting pushed via syslog. +1 on splitting them into a swift-spec..." [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2673
[23:41:32] @replag
[23:41:34] Alchimista: [s7] db16: 2202s
[23:42:04] @replag all
[23:42:05] Alchimista: [s1] db36: 0s, db12: 0s, db32: 0s, db38: 0s, db52: 0s, db53: 0s; [s2] db13: 0s, db30: 0s, db24: 0s, db54: 0s; [s3] db39: 0s, db34: s, db25: 0s, db11: 0s
[23:42:06] Alchimista: [s4] db22: 0s, db31: 0s, db33: 0s, db51: 0s; [s5] db45: 0s, db35: 0s, db44: 0s, db55: 0s; [s6] db47: 0s, db43: 0s, db46: 0s, db50: 0s; [s7] db37: 0s, db16: 2155s, db18: 0s, db26: 0s
[23:43:03] LeslieCarr: Seems to be better now. Thank you!
[23:43:45] awesome :)
[23:43:50] glad to help!
[23:55:24] gn8 folks
[23:55:38] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 0 seconds
[23:55:43] New patchset: Hashar; "rm ansi sequences when validating puppet changes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2781
[23:57:58] Hi,
[23:58:18] http://www.mediawiki.org/wiki/Maxlag_parameter says: "Unusually high or persistent lag should be reported to #wikimedia-tech on irc.freenode.net."
[23:58:38] My bot was just sleeping an hour and I am not very happy with that!
[23:59:15] Updating page [[Szerkesztő:BinBot/munka]] via API Pausing 300 seconds due to database server lag. Updating page [[Columbo]] via API Pausing 300 seconds due to database server lag. Updating page [[Szerkesztő:BinBot/munka]] via API Pausing 300 seconds due to database server lag. Updating page [[Columbo]] via API Pausing 300 seconds due to database server lag. Updating page [[Szerkesztő:BinBot/munka]
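The maxlag mechanism behind those pauses works roughly like this: the client appends maxlag=N to its API requests, and when the most lagged slave is more than N seconds behind, the API refuses the request with a "maxlag" error reporting the current lag, so the client can back off and retry rather than pile writes onto lagged servers. A hedged sketch of such a loop (PHP 5.3-era style; the endpoint, error handling, and back-off policy are illustrative assumptions, not the bot's actual code):

    <?php
    // Sketch: send an API request with maxlag set and retry with a bounded
    // back-off whenever the servers report replication lag.
    function apiRequestWithMaxlag( $apiUrl, array $params, $maxlag = 5 ) {
        $params['maxlag'] = $maxlag;
        $params['format'] = 'json';
        while ( true ) {
            $context = stream_context_create( array( 'http' => array(
                'method'  => 'POST',
                'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
                'content' => http_build_query( $params ),
            ) ) );
            $result = json_decode( file_get_contents( $apiUrl, false, $context ), true );
            if ( isset( $result['error']['code'] ) && $result['error']['code'] === 'maxlag' ) {
                // Assumed back-off: wait briefly and retry, instead of the
                // fixed 300-second sleeps shown in the paste above.
                sleep( 30 );
                continue;
            }
            return $result;
        }
    }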