[00:00:43] (03CR) 10Reedy: [C: 032] Use uca-cy collation on Welsh projects [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106163 (owner: 10Reedy) [00:00:52] (03Merged) 10jenkins-bot: Use uca-cy collation on Welsh projects [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106163 (owner: 10Reedy) [00:01:58] !log reedy synchronized wmf-config/InitialiseSettings.php [00:02:05] Logged the message, Master [00:16:16] (03PS1) 10BryanDavis: Ensure that logstash is running [operations/puppet] - 10https://gerrit.wikimedia.org/r/106164 [00:27:36] (03CR) 10Ori.livneh: [C: 032] "Keep in mind that having the service be automatically refreshed (read: restarted) by Puppet may not be desirable in the long run, if peopl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/106164 (owner: 10BryanDavis) [00:35:17] !log kaldari synchronized php-1.23wmf9/extensions/VectorBeta [00:35:23] Logged the message, Master [00:36:28] (03PS1) 10BryanDavis: Kibana puppet class [operations/puppet] - 10https://gerrit.wikimedia.org/r/106169 [00:36:29] (03PS1) 10BryanDavis: Proxy kibana.wikimedia.org via misc varnish cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/106170 [00:37:47] (03Abandoned) 10BryanDavis: Kibana puppet class [operations/puppet] - 10https://gerrit.wikimedia.org/r/104172 (owner: 10BryanDavis) [00:41:37] !log reedy synchronized php-1.23wmf9/extensions/CategoryTree/CategoryTreeFunctions.php [00:41:44] Logged the message, Master [00:42:02] (03PS2) 10BryanDavis: Add kibana.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/105105 [00:43:17] !log reedy synchronized php-1.23wmf9/resources 'touch' [00:43:24] Logged the message, Master [00:53:46] I'm trying to do a git push to gerrit, but it says my authentication fails. I'm using the same username and password as I do for the gerrit website. Is there something special I need to do when doing git push from the command line? 
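(An editorial aside on the git-push question above: Gerrit's git-over-HTTPS endpoint generally does not accept the same password as the web UI sign-in; git needs the separate HTTP password generated on the Gerrit settings page, or you can push over Gerrit's SSH daemon instead. A minimal sketch of the SSH remote setup, with a hypothetical project path and placeholder username:)

```ini
# .git/config (sketch): push via Gerrit's SSH daemon on port 29418
# instead of HTTPS; "examples/test" and USERNAME are placeholders.
[remote "gerrit"]
    url = ssh://USERNAME@gerrit.wikimedia.org:29418/examples/test
    # Standard Gerrit review refspec: upload HEAD for review on master.
    push = HEAD:refs/for/master
```

With this in place, `git push gerrit` uploads the change for review instead of failing HTTP authentication.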
[00:54:00] sorry, wrong channel [01:01:27] !log reedy updated /a/common to {{Gerrit|Ic65d167b3}}: Use uca-cy collation on Welsh projects [01:01:31] (03PS1) 10Reedy: Remove $wgCategoryTreeDisableCache, leave as default of true [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106172 [01:01:33] Logged the message, Master [01:04:07] (03CR) 10Reedy: [C: 032] Remove $wgCategoryTreeDisableCache, leave as default of true [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106172 (owner: 10Reedy) [01:04:17] (03Merged) 10jenkins-bot: Remove $wgCategoryTreeDisableCache, leave as default of true [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106172 (owner: 10Reedy) [01:04:54] !log reedy synchronized wmf-config/ [01:05:01] Logged the message, Master [01:05:11] (03PS2) 10BryanDavis: [WIP] Add logstash config for udp2log [operations/puppet] - 10https://gerrit.wikimedia.org/r/106154 [01:06:06] !log reedy synchronized php-1.23wmf9/extensions/CategoryTree/CategoryTreeFunctions.php 'rv' [01:06:12] Logged the message, Master [01:06:54] !log reedy updated /a/common to {{Gerrit|I613b194f9}}: Remove $wgCategoryTreeDisableCache, leave as default of true [01:07:00] Logged the message, Master [01:07:03] (03PS1) 10Reedy: Fix indenting in $wmgUseCategoryTree [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106174 [01:07:18] (03CR) 10Reedy: [C: 032] Fix indenting in $wmgUseCategoryTree [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106174 (owner: 10Reedy) [01:07:27] (03Merged) 10jenkins-bot: Fix indenting in $wmgUseCategoryTree [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106174 (owner: 10Reedy) [01:14:33] !log reedy synchronized wmf-config/ [01:14:41] Logged the message, Master [01:17:33] !log reedy updated /a/common to {{Gerrit|Ib7ed56216}}: Fix indenting in $wmgUseCategoryTree [01:17:37] (03PS1) 10Reedy: Remove wmgCategoryTreeDynamicTag [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/106180 [01:17:40] Logged the message, Master [01:18:06] !log kaldari synchronized php-1.23wmf9/extensions/MobileFrontend/ [01:18:12] Logged the message, Master [01:18:38] (03CR) 10Reedy: [C: 032] Remove wmgCategoryTreeDynamicTag [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106180 (owner: 10Reedy) [01:18:59] (03Merged) 10jenkins-bot: Remove wmgCategoryTreeDynamicTag [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106180 (owner: 10Reedy) [01:19:41] Reedy: can i pop into test wiki, mw1017, i want to remove a single try/catch block so we get the original exception in the errors logs (mediawiki throws away information about nested exceptions). I can't get the error to reproduce on our labs or in vagrant (or in beta) [01:20:01] Reedy: in the flow exception [01:20:05] s/exception/extension/ [01:20:35] !log reedy synchronized wmf-config/ [01:20:42] Logged the message, Master [01:25:46] !log bsitu synchronized php-1.23wmf9/extensions/Flow 'Revert "Utilize BufferedCache in TreeRepository"' [01:25:52] Logged the message, Master [02:02:21] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [02:20:11] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:21] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:27:28] !log LocalisationUpdate completed (1.23wmf9) at Wed Jan 8 02:27:28 UTC 2014 [02:50:40] Is wikitech down? [02:51:55] !log LocalisationUpdate completed (1.23wmf8) at Wed Jan 8 02:51:54 UTC 2014 [03:01:01] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.252 second response time [03:01:11] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.548 second response time [03:06:03] (Now appears to be back up.) 
[03:26:40] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Jan 8 03:26:40 UTC 2014 [03:26:46] Logged the message, Master [03:27:24] James_F: again? [03:27:34] huh [03:27:40] 21:25 <+icinga-wm> PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:27:47] virt0 == wikitech [03:27:55] Yeah. It came back. [03:27:59] yeah [03:28:11] just, confirming that a bot saw the same as you :) [05:03:21] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [06:43:38] (03CR) 10Jeremyb: [C: 031] "This fixes some broken redirects in addition to the refactoring/cleanup." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106107 (owner: 10Jeremyb) [08:04:21] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [08:04:33] (03PS9) 10Physikerwelt: Add Mathoid module (TeX -> MathML / SVG conversion web service) [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 [08:05:03] (03CR) 10Physikerwelt: "@GWicke: Any updates here?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/90733 (owner: 10Physikerwelt) [09:08:28] (03PS1) 10Yuvipanda: Deploy Extension:MobileApp to betalabs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106217 [09:08:40] hashar: ^ is all that is needed in this case, I think? [09:08:48] I'm creating a tracking bug now [09:09:56] YuviPanda: and probably need an entry in extension-list-labs [09:10:07] though you don't have any i18n message [09:10:14] hashar: hmm, extension-list-labs is empty [09:10:17] is that expected? 
[09:10:18] ahh [09:10:34] yeah it get emptied whenever extensions are moved to production [09:10:41] in that case, it lands in extension-list [09:10:44] ah [09:10:45] ok [09:10:48] i'll just add this there then [09:10:50] as in [09:10:55] to extension-list-labs [09:11:00] but I think we have some $wg setting for it nowadays [09:11:33] (03PS2) 10Yuvipanda: Deploy Extension:MobileApp to betalabs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106217 [09:11:38] hmm, updated it anyway [09:12:47] hashar: it might still be used, I see a wgExtensionEntryPointListFiles in CommonSettings-labs.php [09:14:44] YuviPanda: yeah that variable :D [09:14:50] :) [09:14:51] can't remember how it works though [09:14:54] heh [09:14:58] i think it is used by mergeMessages.php script [09:15:01] but added the extension to that list anyway [09:15:12] so you might not have to use extension-list-labs [09:15:18] and can amend the wiki doc :-D [09:15:53] hashar: well, $wgExtensionEntryPointListFiles[] = "$wmfConfigDir/extension-list-labs"; [09:15:59] hashar: so that file is probably still being used :D [09:17:23] hashar: and I guess the doc is accurate then [09:18:44] hashar: think you can merge that patch? :D [09:19:39] have you filled form 804x3 and got it signed by your chain of commandment? [09:20:17] I'm looking for a pigeon to attach the three copies of 804 for transport [09:20:23] the gargoyles seem to keep eating them all [09:20:31] :D [09:20:43] :P [09:20:59] hashar: do you want me to get approval from someone before merging this? [09:21:07] hashar: greg-g perhaps? or someone else? [09:21:27] na I don't give a shit about approvals [09:21:36] or we will end up a giant bureaucratic organization [09:21:55] I am not sure why we need another extension to push a .less file but I guess you have a valid use case :-D [09:22:01] WIN! 
:) [09:22:46] hashar: <3 :) [09:24:35] hashar: use case being that they are two very different products, and if I put these files in MobileFrontend then the team that maintains it will never really be using it, and then it will probably just rot [09:24:43] or I'll have to step into using MF properly [09:24:55] hashar: MF also has a slightly complicated RL setup, because of targets and such [09:25:03] which the app does not need at all [09:25:46] hashar: two teams, so separate extensions make sense? [09:26:03] (03PS3) 10Hashar: Deploy Extension:MobileApp to betalabs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106217 (owner: 10Yuvipanda) [09:27:34] (03CR) 10Hashar: [C: 04-1] "The change itself is good but can't be deployed because mediawiki/extensions.git is currently broken. That prevents MobileApp code to be d" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106217 (owner: 10Yuvipanda) [09:27:40] YuviPanda: so yeah that would works [09:27:41] BUT [09:27:58] code is deployed using mediawiki/extensions.git which has been broken for a few days now [09:28:26] so if we merge that change in, MobileApp/MobileApp.php does not exist and will cause beta to dies :/ [09:28:47] gotta wait for mediawiki/extensions.git to be fixed up which is https://bugzilla.wikimedia.org/show_bug.cgi?id=49846 [09:29:07] hashar: ah, right. [09:29:28] hashar: is anyone looking at it? [09:29:41] pinged Chad about it yesterday, but apparently it is not fixed [09:29:50] and I have no clue what needs to be done [09:30:04] pfft, sigh [09:30:05] ok [09:30:10] sorry :( [09:30:20] hashar: nah, that's fine :) [09:30:56] hashar: I'm planning ahead so we don't run into random last minute stuff. Hopefully that gets fixed sometime this week [09:31:07] hashar: can't we run a git submodule init MobileApp by hand? 
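(The extension-list-labs mechanism YuviPanda and hashar work out above amounts to roughly this; a sketch in wmf-config style, where the MobileApp entry is illustrative and $IP is the MediaWiki root:)

```php
<?php
// wmf-config/CommonSettings-labs.php (as quoted above): register the
// labs-only extension list so the message-merging script scans it too.
$wgExtensionEntryPointListFiles[] = "$wmfConfigDir/extension-list-labs";

// extension-list-labs itself is just one entry-point path per line:
//   $IP/extensions/MobileApp/MobileApp.php
// Entries migrate into the main extension-list once an extension is
// deployed to production, which is why the labs file is often empty.
```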
[09:32:37] YuviPanda: clever idea, unfortunately the jenkins job is made to recheck all submodules :D [09:32:44] hah [09:32:47] making sure to erase any live hacks that might have been done on beta :D [09:32:55] good idea, generally :D [09:33:06] I made it so nothing could be done manually (at least in theory) [09:33:11] hmm, yeah [09:33:14] ensuring everything goes via Gerrit / review [09:33:16] since [09:33:23] it will eventually land on production [09:33:42] right [09:33:44] you don't want a live hack on beta to be forgotten when deploying in prod [09:33:49] yeah [09:33:52] and that force people to write feature switches [09:34:01] so the new experimental feature is deployed in prod but disabled [09:34:07] right [09:34:09] then switched on on beta [09:34:11] feature switches are nice [09:34:12] ci at best [09:34:14] yeah [09:34:26] we probably have been using them since day one [09:34:40] I gave a few talks about wikimedia CI / deployment [09:35:04] oh? where? [09:35:09] and basically the summary: how to don't care about deploying bad code because it can be reverted in a second [09:35:27] :D [09:35:31] nice! [09:35:33] s/reverted/disabled/ [09:35:44] talks to local groups or local startups [09:35:46] hashar: hopefully in a few years we can actually do continuous *deployment* :P [09:35:47] nothing fancy [09:35:51] rather than just integration [09:35:52] yeah that is the aim [09:36:02] we talked about it extensively with Greg back in the amsterdam hackaton last year [09:36:10] that is more or less a long term vision [09:36:20] yeah [09:36:27] we already deploy mw two or three times per week [09:36:28] hashar: how long term? [09:36:32] 3 years? 5? [09:36:33] and extensions more or less continuously [09:36:33] 1? [09:36:39] maybe next year [09:36:44] there is a lot to achieve first [09:36:54] daily deployment is probably closer [09:37:01] we need to sort out scap first [09:37:10] and fix up l10n messages taking 30-40 minutes to sync [09:37:10] git-deploy? 
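(The feature-switch pattern hashar describes above, code deployed to production but disabled, then enabled on beta, looks roughly like this in mediawiki-config. This is only a sketch: wmgUseShinyFeature is a hypothetical switch, and in practice the labs overrides live in a separate -labs settings file.)

```php
<?php
// wmf-config/InitialiseSettings.php (sketch): the switch defaults off
// in production and on for beta, so a misbehaving feature can be
// "reverted" in a second by flipping the flag, not rolling back code.
'wmgUseShinyFeature' => array(
	'default' => false, // production: code present but dormant
	'labs'    => true,  // beta cluster: feature exercised early
),

// wmf-config/CommonSettings.php (sketch): load only when switched on.
if ( $wmgUseShinyFeature ) {
	include_once "$IP/extensions/ShinyFeature/ShinyFeature.php";
}
```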
[09:37:13] or whatever that is called now [09:37:15] + moar integration tests [09:37:48] ah git-deploy is now named either: Sartoris (old legacy name), Trebuchet (new fancy name which might not take up) [09:38:04] or currently either: git-deploy || Ryan deployment system [09:38:06] heh I bet we'll rename it at least once more [09:38:07] haha [09:38:07] I think 'Trebuchet deploy' came up [09:38:17] * YuviPanda waves at ori without the -l [09:38:28] * ori waves at YuviPanda without the -zz [09:38:36] :D [09:38:42] the cabal is waiting for ori to replace all of that with a zeromq python script [09:38:56] with bittorrent [09:39:01] i am a one trick pony, it's true [09:39:12] written in Ruby, because Ops love Ruby [09:39:19] but it's a good trick :P [09:39:41] iirc ops concern with ruby is not having to maintain yet another language facing internet users [09:40:36] hashar: also things like ruby people saying 'oh just use RVM' [09:41:13] ori: shouldn't you be sleeping? [09:42:20] i don't sleep, i busy-loop [09:42:48] sounds very power intensive [09:43:04] It isn [09:43:10] isn't 11 UTC yet, YuviPanda [09:43:18] heh [09:43:32] * Aaron|home likes having compiled https://github.com/MSOpenTech/redis and having ceph inside vagrant...all in windows [09:44:36] * Aaron|home needs to compile hiphop in one of the other 2 VMs for completeness [09:44:49] we don't have a hhvm vagrant role yet, do we? 
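(The Apache-in-front-of-HHVM arrangement discussed above, ProxyPass + plain HTTP rather than FastCGI, can be sketched as follows; the port and path are assumptions, not the actual patch:)

```apache
# Sketch: Apache terminates the request and reverse-proxies PHP
# traffic to an HHVM server listening in plain-HTTP mode on :9000.
ProxyPass        /w/ http://127.0.0.1:9000/w/ retry=0
ProxyPassReverse /w/ http://127.0.0.1:9000/w/
```

FastCGI would avoid the extra HTTP hop, which is the "shiny fastcgi support" mentioned above.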
[09:44:58] (03PS1) 10Alexandros Kosiaris: Purge etherpad package [operations/puppet] - 10https://gerrit.wikimedia.org/r/106220 [09:45:00] we do, but it's a stub [09:45:12] hmm, I should probably help out there [09:45:24] would be great [09:45:39] configuring hhvm to run behind nginx takes a minute, but it's cheating [09:45:47] because we're going to be using apache in prod, at least initially [09:45:57] * YuviPanda googles [09:45:57] since we don't want to pile a web server migration on top [09:46:03] http://www.sebastien-han.fr/blog/2013/04/22/play-with-ceph-vagrant-box/ is pretty cool [09:46:13] hhvm has shiny fastcgi support [09:46:28] i couldn't get it to work, but i didn't try very hard, and i know some fixes were committed in the last few days [09:46:33] * Aaron|home has a list of changes need to get that working for swift/mw [09:46:36] *needed [09:46:57] Aaron|home: make it an MWV role [09:47:15] YuviPanda: erik b. has it running behind apache in https://gerrit.wikimedia.org/r/#/c/105834/ [09:47:28] ori: hmm, so it our end result would be nginx -> varnish -> apache -> (fastcgi) hhvm? [09:47:32] but that's using ProxyPass + HTTP, no fastcgi [09:47:54] YuviPanda: for SSL, yes. [09:48:09] hmm, right [09:48:30] https://bugzilla.wikimedia.org/show_bug.cgi?id=59793 [09:48:38] ughh...can somebody saw oauth? [09:48:45] it's not the sort of design you sketch out for yourself; it's the sort of design that makes sense for a distributed organization with different teams working on different parts of the stack [09:49:09] it still needs to be simplified IMO but i think that's uncontroversial; just a matter of when [09:49:11] * Aaron|home handles ori https://gerrit.wikimedia.org/r/#/c/105241/ [09:49:34] * ori looks [09:49:49] how come no one harasses aaron for being up? 
he's in sf too [09:50:04] * YuviPanda saws OAuth for Aaron|home [09:50:07] erm, "hands" not "handles" ;) [09:50:13] * Aaron|home needs to sleep [09:50:23] ori: heh, we should have a bot that does the nagging... [09:50:53] * Aaron|home is also leaning into high left arm in a bad position and it's going numb now [09:51:06] *his [09:51:25] go sleep Aaron|home [09:51:27] geez [09:51:31] * Aaron|home changes position [09:51:31] :-P [09:51:33] ah...better [09:52:08] ori: I wonder if the UID stuff could use shm if available [09:53:03] what would that buy you? [09:53:39] in theory though, the only difference would be it not getting caught up in the fsync daemon, which might make no difference [09:54:06] * Aaron|home was also thinking about getLocalReference() using that but was afraid of spamming it with too much stuff [09:54:20] * Aaron|home wonders if domas was joking with that suggestion or not [09:54:46] it would help for those copy scripts, since that's actually a fair amount of mbs [09:56:47] i have a vague sense that HHVM might change the rules of the game a little but dunno [09:56:59] now sleep! [09:57:07] You can't! [09:57:10] (03CR) 10Alexandros Kosiaris: [C: 032] Purge etherpad package [operations/puppet] - 10https://gerrit.wikimedia.org/r/106220 (owner: 10Alexandros Kosiaris) [09:57:11] It's not 11 UTC yet! [09:57:20] what happens at 11 UTC? [09:57:34] You get to bed at 11 UTC [09:57:57] ori: making fun of you; remember the day Nemo pinged you about why you're still working at 3 AM? 
:-) [09:58:04] ah, heh [09:58:24] yes, it's probably accurate [10:01:36] ori: https://gerrit.wikimedia.org/r/105677 [10:02:14] ori: i thought you were going to sleep :P [10:02:23] Nemo_bis: 01:45 configuring hhvm to run behind nginx takes a minute, but it's cheating [10:02:40] it takes more than a minute and it's not cheating, i was being hyperbolic [10:02:49] but it's still not exactly right [10:02:52] anyway, https://github.com/facebook/hhvm/issues/1437 worth following it seems [10:03:16] well there are measurable improvements [10:03:16] hmm, $wgServer bites again! [10:03:29] heh [10:04:16] * Aaron|home read "hyperbolic" as in "hyperbolic topology" [10:04:35] Nemo_bis: HHVM + nginx is a great setup; the only thing going against it is the fact that it'd be too hard to swap out Apache at the moment [10:04:52] ori: not for everyone though [10:05:16] hmm does wikiapiary have stats on how many use apache [10:05:21] Nemo_bis: i wasn't disparaging Nikerabbit's patch; it looks really cool [10:06:18] YuviPanda: also, have you seen ? [10:06:37] oh, someone set us up on Travis? [10:06:38] one failure, already fixed upstream: https://bugzilla.wikimedia.org/show_bug.cgi?id=55532 [10:06:49] quite nice! [10:06:59] ori: is that building with hhvm? [10:07:04] yeah [10:07:05] * ori nods [10:07:12] nice! [10:07:21] ori: did you test ebernhardson's patch? [10:07:27] https://github.com/wikimedia/mediawiki-core/blob/master/.travis.yml [10:07:30] YuviPanda: nope [10:08:09] ori: and I wasn't defending it, just wondering how the different usecases can help each other benefit from the experiments around hhvm [10:08:45] ori: hmm, okay. [10:08:46] i wonder if we should create a mailing list [10:08:55] there is one already.. 
[10:08:56] and an IRC channel [10:08:56] it'll probably die out eventually after the migration but it might still make sense [10:08:58] :P [10:09:27] rather, shouldn't there be a bugzilla keyword or at least whiteboard something for hhvm stuff [10:09:52] ori: http://lists.wikimedia.org/pipermail/hiphop/ [10:09:56] Nemo_bis: i was just discussing that with andre over at -dev [10:10:05] (or are you talking of something else) [10:10:07] oh [10:10:07] https://bugzilla.wikimedia.org/show_bug.cgi?id=40926#c5 [10:10:42] * Nemo_bis hadn't noticed the blocker [10:27:54] (03PS1) 10Alexandros Kosiaris: Remove hooper, eiximenis [operations/dns] - 10https://gerrit.wikimedia.org/r/106223 [10:30:26] !log screwed gallium by doing a chown -R jenkins-slave:jenkins-slave on /srv/ that includes the Zuul git repositories :( [10:30:32] Logged the message, Master [10:36:02] hashar: what is fatal: bad revision 'remotes/gerrit/origin' ? [10:38:56] (03CR) 10Dzahn: [C: 031] "thx for also killing eiximenis, was never sure where it originally started" [operations/dns] - 10https://gerrit.wikimedia.org/r/106223 (owner: 10Alexandros Kosiaris) [10:41:58] (03CR) 10Dzahn: "oh..hmm.. see https://twitter.com/byteofprash/status/15689761272373248 f.e., we seem to have some incoming links to this in Google.. worth" [operations/dns] - 10https://gerrit.wikimedia.org/r/106223 (owner: 10Alexandros Kosiaris) [10:42:29] akosiaris: grmbl.. we would break $some URL [10:43:20] correction, it was broken a long time, but links to it exist out there [10:44:16] mutante: which one ? [10:44:37] google: eiximenis.wikimedia.org -ganglia [10:45:20] https://twitter.com/byteofprash/status/15689761272373248 but not that many .. 
shrug [10:47:41] mutante: seems like all are etherpad pads [10:47:55] https://etherpad.wikimedia.org/p/MediaWikiNotes [10:48:04] akosiaris: yes [10:48:47] old etherpad was reachable under that so people used it, or it was just hostname before it got servicename as well [10:49:07] yeah [10:49:13] well two solutions I see (yoda talk) [10:49:15] before my time as well [10:49:27] 1) point eiximenis as a CNAME to etherpad [10:49:29] and now old etherpad is gone [10:49:44] 2) not care and see if people complain [10:50:00] re 2) they didnt complain all this time [10:50:05] exactly [10:50:08] it's not like it worked yesterday [10:50:18] re 1) good URLs dont change [10:50:22] hmmmm [10:50:41] I vote for 2 [10:51:07] it is called the shooting method akosiaris [10:51:15] leaning to 2, but it's a little weak because the "dont break URLs"mantra is strong [10:51:23] *shouting [10:51:30] but it's already done.. so ok ..2 [10:51:54] all better than remnants in our code that didnt work anyways :) [10:52:03] mutante: yeah but notice the "good" adjective in the mantra [10:52:11] hehe, fair [10:52:19] I 'd say a url having a host part inside ... not so good :-) [10:52:24] hostnames in URLs suck [10:52:29] yea [10:52:44] ok, kill, thx for cleanup :) [10:53:13] ok. I will merge this after decomming hooper and wiping it [10:53:24] thanks [10:53:27] cool [11:05:21] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [11:54:00] akosiaris: would manifests/misc/racktables.pp be appropriate to be a module"? [12:24:20] matanya: it is so small that I am tempted to say no. Anyway... 
there is an effort to migrate away from racktables so don't waste any time on it [12:24:20] to what akosiaris [12:27:21] ralph has been evaluated but has not received any positive reviews, servermon is a way forward (it is still missing functionality however) [13:01:44] !log jenkins: migrating jobs git url from integration.wikimedia.org to zuul.eqiad.wmnet {{bug|59774}} {{gerrit|106116}} [13:01:50] Logged the message, Master [13:04:43] !log "upgrading" pep8 package 1.4.6-1 --> 1.4.6-1.1 (simply provides python3-pep8) RT #6420 [13:04:50] Logged the message, Master [13:05:36] akosiaris: thank you for the pep8 update [13:09:41] :-) [13:12:56] (03PS1) 10MaxSem: Enable beta mobile diff on testwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106236 [13:36:53] ori: cool [13:54:33] !log restarting Zuul, it lost track of some jobs and keeps changes in its queues for no real reason :/ [13:54:39] Logged the message, Master [14:06:21] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [14:41:47] (03PS1) 10Springle: Expose max_user_connections var in mysql_multi instance .cnf [operations/puppet] - 10https://gerrit.wikimedia.org/r/106254 [14:44:07] (03PS2) 10Hashar: zuul: define push_change_refs has false [operations/puppet] - 10https://gerrit.wikimedia.org/r/105959 [14:48:43] DB errors [14:49:04] Jump to: navigation, search [14:49:04] A database query error has occurred. This may indicate a bug in the software. [14:49:04] Function: IndexPager::buildQueryInfo (LogPager) [14:49:04] Error: 1176 Key 'PRIMARY' doesn't exist in table 'logging' (10.64.16.30) [14:50:07] and in english: there was an error in the DB query: function: IndexPager::buildQueryInfo (LogPager) [14:50:19] error: 1176 Key 'PRIMARY' doesn't exist in table 'logging' (10.64.16.30) [14:50:48] probably my fault. 
s7 is testing a partitioned logging [14:51:43] springle: can't hide revisions [14:52:17] arg more FORCE INDEX [14:52:21] * springle sighs [14:56:43] (03PS1) 10Springle: depool db1041, forced index query errors during partitioning test [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106255 [14:58:15] (03CR) 10Springle: [C: 032] Expose max_user_connections var in mysql_multi instance .cnf [operations/puppet] - 10https://gerrit.wikimedia.org/r/106254 (owner: 10Springle) [15:02:12] (03PS1) 10Hashar: zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 [15:04:10] !log springle synchronized wmf-config/db-eqiad.php 'depool db1041, forced index query errors during partitioning test' [15:04:17] Logged the message, Master [15:07:03] (03CR) 10Dzahn: [C: 032] "root@gallium:~# /usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 4730 --timeout=2" [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [15:09:00] tests are lagging again :( [15:11:34] for some reason Jenkins has tests being stuck: https://integration.wikimedia.org/ci/job/operations-puppet-doc/4052/console [15:12:54] how nice [15:13:29] yeah and the queue on left of https://integration.wikimedia.org/ci/ shows red jobs [15:13:33] i.e. they are stalled somehow [15:14:00] I really need to drop Jenkins [15:21:55] thanks springle [15:22:13] matanya: np. sorry about that [15:22:31] thought I'd found all the forced indexes to work around [15:23:40] i guess now you did :) [15:26:37] (03PS2) 10Hashar: zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 [15:30:16] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Better to use the nrpe::monitor_service definition." 
(033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [15:30:37] ahhhhh [15:33:35] (03CR) 10Dzahn: [C: 031] zuul.pp retab + some puppet-lint fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/105958 (owner: 10Hashar) [15:34:40] (03CR) 10Dzahn: [C: 031] zuul: define push_change_refs as false [operations/puppet] - 10https://gerrit.wikimedia.org/r/105959 (owner: 10Hashar) [15:37:19] (03CR) 10Hashar: "bah I copy pasted the zuul check :( Will refactor in a new patchset" [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [15:38:40] (03CR) 10Dzahn: "eh yeah, didn't think of the new method either, back then it was state of the art :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [15:39:07] mutante: akosiaris: I will replace the patch with akosiaris proposal [15:39:16] :) [15:39:49] (03CR) 10Springle: [C: 032] depool db1041, forced index query errors during partitioning test [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106255 (owner: 10Springle) [15:41:04] (03CR) 10Hashar: zuul: monitor gearman service (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [15:41:20] (03PS3) 10Hashar: zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 [15:42:44] cool :-). 
I will merge as soon as jenkins say ok :-) [15:43:18] we can also do the retabbing of zuul.pp if you want , hashar [15:43:23] lgtm [15:45:25] (03PS1) 10Hashar: zuul: switch to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106267 [15:45:29] ah shit [15:45:31] forgot the retab [15:45:34] gotta rebase :-] [15:48:58] (03PS4) 10Hashar: zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 [15:48:59] (03PS2) 10Hashar: zuul: switch to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106267 [15:49:19] (03CR) 10jenkins-bot: [V: 04-1] zuul: switch to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106267 (owner: 10Hashar) [15:49:58] (03PS5) 10Hashar: zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 [15:50:01] (03PS3) 10Hashar: zuul: define push_change_refs as false [operations/puppet] - 10https://gerrit.wikimedia.org/r/105959 [15:50:02] (03PS2) 10Hashar: zuul.pp retab + some puppet-lint fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/105958 [15:50:56] (03PS3) 10Hashar: zuul: switch to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106267 [15:50:58] sometimes I hate gerrit [15:52:52] mutante: so https://gerrit.wikimedia.org/r/#/c/105958/ is the retab, easy enough [15:53:08] and I have rebased all the other changes on top of it [15:53:16] (03PS4) 10Dzahn: do not use generated .htaccess from Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/105944 [15:53:20] https://gerrit.wikimedia.org/r/#/c/105959/ define Zuul push_change_refs to default to false [15:53:29] which is already defined but that is merely for consistency [15:53:56] then the new nrpe::monitor_service is used to monitor zuul gearman https://gerrit.wikimedia.org/r/#/c/106256/5/manifests/zuul.pp,unified [15:54:31] and finally for akosiaris, https://gerrit.wikimedia.org/r/#/c/106267/ refactor the 
Zuul service check to use nrpe::monitor_service [15:54:32] ouch [15:54:33] (03CR) 10Dzahn: [C: 032] "yes, lgtm, and this first, then the other changes hashar rebased on top of it" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105958 (owner: 10Hashar) [15:55:59] (03CR) 10Dzahn: [C: 032] zuul: define push_change_refs as false [operations/puppet] - 10https://gerrit.wikimedia.org/r/105959 (owner: 10Hashar) [15:56:31] hashar/ akosiaris , and now i leave it to you again ... [15:56:34] k? [15:58:16] info: Caching catalog for gallium.wikimedia.org [15:58:54] notice: Finished catalog run in 47.31 seconds [16:00:14] (03CR) 10Alexandros Kosiaris: [C: 032] zuul: monitor gearman service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106256 (owner: 10Hashar) [16:00:40] (03CR) 10Alexandros Kosiaris: [C: 032] zuul: switch to nrpe::monitor_service [operations/puppet] - 10https://gerrit.wikimedia.org/r/106267 (owner: 10Hashar) [16:00:54] (03CR) 10Dzahn: "PS4: changed to leave the option enabled but ALSO copy it to the main config. per csteipp's comments" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105944 (owner: 10Dzahn) [16:01:42] (03PS5) 10Dzahn: do not use generated .htaccess from Bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/105944 [16:02:14] :-) [16:02:48] hashar: I am sure I have asked this before: why is gallium running a postgres ? To satisfy tests ? 
[16:03:34] yeah the idea was to run tests against a postgres backend [16:03:38] it kept being postponed [16:09:06] akosiaris: hashar did another nice one https://gerrit.wikimedia.org/r/#/c/104978/ [16:09:46] oh I thought you merged that one already [16:11:00] (03PS1) 10Hashar: zuul: document git-daemon URL [operations/puppet] - 10https://gerrit.wikimedia.org/r/106271 [16:11:44] (03CR) 10Dzahn: [C: 032] "apache2ctl configtest - Syntax OK" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105944 (owner: 10Dzahn) [16:13:43] (03PS2) 10Hashar: zuul: update git-daemon URL [operations/puppet] - 10https://gerrit.wikimedia.org/r/106271 [16:13:58] akosiaris: while at it, is a trivial change with Zero impact on prod https://gerrit.wikimedia.org/r/106271 [16:18:45] (03CR) 10Alexandros Kosiaris: [C: 032] zuul: update git-daemon URL [operations/puppet] - 10https://gerrit.wikimedia.org/r/106271 (owner: 10Hashar) [16:22:23] (03PS1) 10Dzahn: comment duplicate use of NameVirtualHost [operations/puppet] - 10https://gerrit.wikimedia.org/r/106272 [16:23:39] akosiaris: mutante: does puppet auto deploy icinga configuration changes? [16:23:48] hashar: should [16:23:49] yes [16:23:53] \O/ [16:23:54] it might take some time though [16:24:00] tis ok [16:24:02] but neon can require some patience [16:24:40] and while around, the Zuul status page at https://integration.wikimedia.org/zuul/ now shows some progress bar [16:24:54] should give a clue about the progress of tests without having to load jenkins [16:25:18] nice! 
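(The nrpe::monitor_service refactor merged above comes down to something like this. A sketch only: the check command mirrors the check_tcp invocation quoted earlier in the log, while the resource title and description are assumptions:)

```puppet
# Sketch: nrpe::monitor_service wraps both the Icinga service
# definition and the NRPE command on the monitored host, replacing a
# hand-rolled monitor_service plus command pair.
nrpe::monitor_service { 'zuul_gearman':
    description  => 'Zuul Gearman service',
    nrpe_command => '/usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 4730 --timeout=2',
}
```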
[16:25:37] (03CR) 10Alexandros Kosiaris: [C: 032] beta: remove old parsoid updater [operations/puppet] - 10https://gerrit.wikimedia.org/r/104978 (owner: 10Hashar) [16:25:51] comes from OpenStack,I merely copy pasted :( [16:26:05] (03CR) 10Dzahn: [C: 032] comment duplicate use of NameVirtualHost [operations/puppet] - 10https://gerrit.wikimedia.org/r/106272 (owner: 10Dzahn) [16:26:45] i merged both ion palladium [16:26:57] was on it for the other change anyways [16:29:31] PROBLEM - HTTP on aluminium is CRITICAL: Connection refused [16:29:43] oops? [16:30:21] (03CR) 10Hashar: "> I see die() calls in multiversion/activeMWVersions.php …" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105693 (owner: 10Hashar) [16:30:49] (03PS2) 10Hashar: multiversion: replace die() with print; exit(1); [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105693 [16:31:59] argh the / partitions on wtp10* hosts are full [16:32:05] guess Parsoid filled them up [16:32:49] root@aluminium:~# echo "hi, other root. are you working on this? monitoring just reported apache being down" | wall [16:33:31] mesg n [16:33:40] i like those utilities [16:33:48] sounds like an ancestor for IRC [16:35:27] RECOVERY - HTTP on aluminium is OK: HTTP OK: HTTP/1.1 302 Found - 557 bytes in 0.001 second response time [16:35:49] so... what was it ? [16:42:58] (03PS1) 10Odder: Disable local uploads on Korean Wikinews [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106273 [16:46:18] puppet ran on icinga. Thank you mutante and akosiari :-]  I am off [16:51:05] (03CR) 10Chad: [C: 032] multiversion: replace die() with print; exit(1); [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105693 (owner: 10Hashar) [16:55:20] (03Merged) 10jenkins-bot: multiversion: replace die() with print; exit(1); [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105693 (owner: 10Hashar) [16:57:31] !log demon synchronized multiversion/ 'No changes, code cleanup. 
Ia4910d22' [16:57:37] Logged the message, Master [17:02:06] !log demon synchronized wmf-config/CommonSettings.php 'No changes, code cleanup. Ia4910d22' [17:02:12] Logged the message, Master [17:06:47] PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 07 Jan 2014 11:01:51 PM UTC [17:11:36] (03PS6) 10Chad: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 [17:12:43] (03CR) 10Chad: [C: 032] Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 (owner: 10Chad) [17:12:51] (03Merged) 10jenkins-bot: Move all other small misc. wikis over to Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/104765 (owner: 10Chad) [17:14:03] !log demon synchronized cirrus.dblist [17:14:09] Logged the message, Master [17:14:32] !log demon synchronized wmf-config/InitialiseSettings.php [17:14:38] Logged the message, Master [17:21:41] (03PS1) 10Manybubbles: Give dewiki and all wikibooks Cirrus BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106280 [17:28:41] ^d: 566 PHP Warning: Pool error searching Elasticsearch: pool-queuefull [Called from CirrusSearch\{closure} in /usr/local/apache/common-local/php-1.23wmf8/extensions/CirrusSearch/includes/Searcher.php a [17:28:43] t line 655] in /usr/local/apache/common-local/php-1.23wmf8/includes/debug/Debug.php on line 301 [17:28:45] not sure what that is [17:28:46] ruh roh [17:28:58] servers are a bit busy now [17:29:01] <^d> Pool queue is full. 
[17:29:03] but don't look especially angry [17:29:09] I mean, I don't know why it is full [17:29:15] * ^d dunno [17:30:18] seems to have moved on [17:30:35] large spike in query time across the whole cluster [17:32:04] !log demon synchronized php-1.23wmf9/extensions/CirrusSearch [17:32:12] Logged the message, Master [17:32:18] that should help once it gets to wmf8 [17:32:33] !log demon synchronized php-1.23wmf8/extensions/CirrusSearch [17:32:40] Logged the message, Master [17:32:41] (03CR) 10Chad: [C: 032] Give dewiki and all wikibooks Cirrus BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106280 (owner: 10Manybubbles) [17:33:08] (03CR) 10Chad: [V: 032] Give dewiki and all wikibooks Cirrus BetaFeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106280 (owner: 10Manybubbles) [17:33:39] !log demon synchronized wmf-config/InitialiseSettings.php [17:33:42] looks like we got some timeouts [17:33:45] Logged the message, Master [17:33:55] probably concurrent searches on the indexes that you are about to build [17:34:41] <^d> They're mostly built :p [17:34:45] nice [17:37:20] the timeouts stopped [17:37:24] which is nice [17:37:38] I've got that going for me [17:37:51] greg-g: thanks [17:37:55] :) [17:38:12] I just like to show that I'm paying attention, really. :) [17:38:20] <^d> All of these are just a few hundred to a few thousand pages, tops. [17:38:29] <^d> sourceswiki is about 108k, which was more than I thought. [17:38:33] <^d> But almost done, either way :) [17:39:03] cool. I imagine sources will take some time to index once it is queued [17:39:08] minutes [17:39:10] whole minutes [17:40:28] <^d> Pass 1 done for all new wikis, queues all drained. 
[17:41:04] !log reissued old star.wikimedia.org certificate [17:41:11] Logged the message, RobH [17:47:03] (03PS1) 10RobH: reissue star.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/106286 [17:50:06] (03CR) 10RobH: "Plan: merge star.wikimedia.org.key, then merge this, then shred the chained cert off cp1043/1044, run puppet, profit." [operations/puppet] - 10https://gerrit.wikimedia.org/r/106286 (owner: 10RobH) [17:55:09] (03CR) 10RobH: [C: 032] reissue star.wikimedia.org certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/106286 (owner: 10RobH) [17:56:42] ok, star.w.o cert and key merged, pushing live on cp1043 presently [17:58:08] mutante: so that failed, bad chain file on cp1043 [17:58:15] i removed it... whyyyyyy [17:58:22] Restarting nginx: nginx: [emerg] SSL_CTX_use_certificate_chain_file("/etc/ssl/certs/star.wikimedia.org.chained.pem") failed (SSL: error:0906D066:PEM routines:PEM_read_bio:bad end line error:140DC009:SSL routines:SSL_CTX_use_certificate_chain_file:PEM lib) [17:58:30] ls [17:59:42] "bad end line" ? [18:00:07] PROBLEM - HTTPS on cp1043 is CRITICAL: Connection refused [18:00:13] yea, checking now [18:00:14] did you see puppet creating it? [18:00:17] i did [18:00:28] but now i pulled them all again, redoing [18:00:36] if it doesnt happen twice, it didnt happen,right? [18:00:42] hmmm [18:00:54] it should do this: [18:01:10] /bin/cat ${certname}.pem ${ca} > ${location}/${certname}.chained.pem [18:01:18] i see it make the chain in the puppet run [18:01:30] so if both $cert and $ca file are fine by itself [18:01:36] then the rest should be fine [18:01:55] does the last line look like on other certs? [18:03:59] not sure what you mean [18:04:14] bleh, i wiped all star.wikimedia.org cert and keys off cp1043 manually, run puppet [18:04:18] it downloads them and creates chain [18:04:21] but something is borked. 
[18:04:32] i mean compare it to the one that worked [18:04:34] the last line [18:04:41] how does it end? newline? [18:04:41] no carrier return [18:04:42] seems fine [18:04:44] no newline [18:05:27] ahh [18:05:28] found it [18:05:34] its not copying down my new private key [18:05:36] -----END CERTIFICATE----------BEGIN CERTIFICATE----- [18:05:38] <-- [18:05:43] that doesnt look right [18:05:47] ? [18:05:50] where do you see this? [18:05:56] https://gerrit.wikimedia.org/r/#/c/106286/1/files/ssl/star.wikimedia.org.pem [18:06:01] root@cp1043:~# cat /etc/ssl/certs/star.wikimedia.org.chained.pem [18:06:19] needs newline between the two certs [18:06:24] so add one to the first one [18:06:29] ohh, you are right [18:06:34] but that is made automatically [18:06:40] it just does "cat" [18:06:46] to combine them [18:07:22] so i manually add the carriage return [18:07:39] yea, do it and start nginx [18:08:03] fail [18:08:05] same error... [18:08:32] oh, i see it maybe. lets see [18:09:18] fixed [18:09:20] mutante: ytm [18:09:38] one more line? [18:09:40] so yea, it recreated the chain, but doesn't do the newline between them [18:09:44] nah, miscut the ---- [18:09:50] had one too many after end cert [18:09:51] =P [18:09:58] ah, ok [18:10:06] well, that was exciting! [18:10:07] RECOVERY - HTTPS on cp1043 is OK: OK - Certificate will expire on 08/24/2015 12:06. [18:10:10] hmm,, recovery? [18:10:11] huzzah! [18:10:12] there we go..good [18:10:20] ok, going to fix cp1044 the same way [18:10:24] mutante: srsly, thank you! [18:10:24] kk [18:10:29] np [18:10:30] i really appreciate it [18:12:50] !log cp1043 and cp1044 have had the reissued star.wikimedia.org certificate put into place [18:12:53] \o/ [18:12:56] Logged the message, RobH [18:13:05] so now all that is left is integration and doc! [18:14:38] RobH: and the http check outputs the expiry date [18:14:58] being the normal HTTPS check,, that was the one [18:15:27] doc/int - talked to hashar yet? 
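The "bad end line" failure debugged above can be reproduced with plain shell. The chain file is built by a bare `cat` of the cert and the CA bundle, so if the first PEM lacks a trailing newline the END and BEGIN markers fuse into one line and nginx rejects the result. File names and contents here are dummies for illustration:

```shell
# Reproduce the fused-marker bug from the log: the puppet rule is just
# `cat cert ca > chained`, so a missing trailing newline on the first
# PEM runs the END/BEGIN markers together.
printf '%s' '-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----' > cert.pem          # note: no trailing newline
printf '%s\n' '-----BEGIN CERTIFICATE-----
BBBB
-----END CERTIFICATE-----' > ca.pem
cat cert.pem ca.pem > chained.pem              # what the chain rule does
grep -c -- 'END CERTIFICATE----------BEGIN' chained.pem   # 1 = markers fused
```

Appending a single newline to the first file (as done by hand at 18:07) makes the concatenation parse again.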
[18:15:32] you saw the ticket comments right [18:18:15] chained permission check is ok, so whew... [18:26:58] PROBLEM - Varnish traffic logger on cp1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:27:37] PROBLEM - Varnish HTTP text-backend on cp1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:27:47] PROBLEM - Varnish HTCP daemon on cp1055 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:35:00] ottomata: hey, you meeting? [18:40:48] anybody aware of mail servers being slow? There are some comments in https://bugzilla.wikimedia.org/show_bug.cgi?id=59731 [18:43:32] Yes, I still haven't gotten the password reminder to a mailing list I requested three weeks ago [18:43:41] LeslieCarr: ^^ re mail servers (hi RT duty person ;) ) [18:43:54] haha [18:43:59] ok, let me look [18:44:06] if there's a 3 week ago thing, thats not the mail servers [18:44:07] twkozlowski: I think you should give up waiting for it and try again :P [18:44:37] Reedy: I would if I remembered what mailing list it was :-P [18:45:08] it is also possible it's just yahoo causing delays [18:45:17] it does sometimes do that to mail servers when it receives large volumes [18:45:18] LeslieCarr: Well, at first I thought it was a couple of hours delay [18:45:37] (03PS1) 10EBernhardson: Add global permissions for Flow [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106306 [18:45:37] then... "Mmm, 12 hours delays happened to me before..." [18:45:43] and then I forgot about it :-P [18:46:51] LeslieCarr: tail -f /var/log/exim4/mainlog or very close to that while you let him try again [18:46:54] sodium [18:47:07] if it's lists. [18:47:31] yeah i'm checking out the exim queues on sodium [18:48:37] !log removing all messages older than 7 days from exim queue on sodium [18:48:43] Logged the message, Mistress of the network gear. 
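The queue cleanup !logged above maps onto exim's standard tooling: `exiqgrep -o N` selects queued messages older than N seconds, `-i` prints bare message IDs, and `exim -Mrm` removes them. Since the real command deletes mail, this sketch only echoes it (and assumes exiqgrep/exim are on the PATH of the mail host):

```shell
# Dry-run sketch of "removing all messages older than 7 days from exim
# queue on sodium". The command is echoed, not executed.
WEEK=$((7 * 24 * 3600))
echo "exiqgrep -o $WEEK -i | xargs exim -Mrm"
```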
[18:48:53] oops [18:49:40] oh crap [18:49:41] sorry manybubbles [18:49:49] was eating lunch [18:50:01] (03CR) 10RobH: [C: 031] "this looks right to me, but I prefer not push it live until someone else can glance (plus i want hashar around for it)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [18:50:31] !log removing all messages older than 24 hours from exim queue on sodium (all bounces) [18:50:38] Logged the message, Mistress of the network gear. [18:51:31] twkozlowski: areyou on yahoo mail ? [18:52:27] RECOVERY - Varnish HTTP text-backend on cp1055 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.000 second response time [18:52:33] LeslieCarr: No. [18:52:37] RECOVERY - Varnish HTCP daemon on cp1055 is OK: PROCS OK: 1 process with UID = 111 (vhtcpd), args vhtcpd [18:52:40] But I think I have an example of a user who is [18:52:44] lemme dig it for you [18:52:47] RECOVERY - Varnish traffic logger on cp1055 is OK: PROCS OK: 2 processes with command name varnishncsa [18:53:04] (03CR) 10Hashar: "As long as we don't alter the DNS entry for integration.wikimedia.org it should be harmless. We could test it out using our /etc/hosts." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [18:53:05] (03PS1) 10BBlack: A much better fallocate() patch [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/106307 [18:53:06] (03PS1) 10BBlack: varnish (3.0.3plus~rc1-wm29) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/106308 [18:53:12] ah it's just that i noticed yahoo mail has the largest backlog [18:53:17] (03PS1) 10BBlack: varnish (3.0.5plus~wmftest-wm3) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/106309 [18:53:37] (03CR) 10BBlack: [C: 032 V: 032] A much better fallocate() patch [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/106307 (owner: 10BBlack) [18:53:48] http://lists.wikimedia.org/pipermail/translators-l/2013-July/002345.html [18:53:50] (03CR) 10BBlack: [C: 032 V: 032] varnish (3.0.3plus~rc1-wm29) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/106308 (owner: 10BBlack) [18:53:54] http://lists.wikimedia.org/pipermail/translators-l/2013-July/002356.html [18:54:00] since it's RT duty, you should make an RT for her [18:54:00] LeslieCarr: ignore the lack of the Enter key [18:54:02] (03CR) 10BBlack: [C: 032 V: 032] varnish (3.0.5plus~wmftest-wm3) unstable; urgency=low [operations/debs/varnish] (3.0.5-plus-wm) - 10https://gerrit.wikimedia.org/r/106309 (owner: 10BBlack) [18:54:14] mutante: ? [18:54:29] twkozlowski: just saying put it on some ticket [18:54:36] How do I do that? [18:54:38] because you said there already is bugzilla [18:54:47] yep, yahoo is delaying our mail [18:55:04] twkozlowski: mail ops-requests@rt if you like [18:55:09] mutante: I think you are mistaking me for someone else [18:55:17] I didn't say there was a Bugzilla bug for it [18:56:06] twkozlowski: sorry, i totally did. 
it was andre [18:56:08] 10:47 < andre__> anybody aware of mail servers being slow? There are some comments in https://bugzilla.wikimedia.org/show_bug.cgi?id=59731 [18:57:06] Well, my only addition to this is that's been occurring for many people with different mail boxes for a long time [18:57:33] wanadoo.fr seems to be sth else than Yahoo [18:57:35] ok [18:58:08] "Wanadoo was the ISP division of Orange S.A." [18:58:26] Orange S.A., formerly France Télécom S.A., is a French multinational telecommunications corporation [18:59:19] indeed, wanadoo.fr redirects to orange.fr :-) [19:01:47] RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Wed Jan 8 19:01:44 UTC 2014 [19:02:57] heh, wanadoo [19:03:01] That goes back a few years [19:04:30] hosts Reedy on geocities [19:11:11] Coren: So I have a labs question, I guess you are the one to ask these days? [19:11:20] I need to replace the star.wikimedia.org cert on virt1000 [19:11:27] RobH: Probably. [19:11:29] I'd prefer to NOT use another wildcard [19:11:53] looks like it just needs https://virt1000.wikimedia.org cert? [19:12:10] and only apache is presently using it? [19:12:24] (I dunno the virt infrastructure as I should, so I just want to make sure nothing else uses it) [19:12:50] Hm. It should be wikitech.wikimedia.org. puppet.wikimedia.org also has a cert but that one should be self-signed. [19:12:58] (Puppet uses itself as CA) [19:13:35] huh [19:13:45] virt1000 is the future new wikitech. [19:13:48] the apache vhost doesntt reference wikitech [19:14:16] its where it lives now? [19:14:20] or only planned? [19:14:21] virt0 [19:14:31] ok, so now on virt0, but will be virt1000 eventually [19:14:35] Right. [19:14:39] As we move to eqiad [19:14:40] so have to fix virt0 cert as well, ok [19:14:52] I'm going to add that to the ticket and then we'll get the cert, just wikitech.wikimedia.org though right? [19:15:22] I think virt0 doesn't use the star cert; it's on virt1000 because of its temporary nature. 
Lemme double check. [19:15:40] hrmm [19:15:42] yea [19:15:43] you are right [19:15:52] virt0 uses a wikitech specific certificate [19:16:14] Coren: So, the wildcard cert on virt1000 has been reissued, and I don't want to put new wildcard certs on hosts if i can help it [19:16:18] there is also labsconsole.wikimedia.org [19:16:27] and labs. redirects to wikitech from cluster [19:16:31] for convenience [19:16:44] well, it rewrites the fqdn as well [19:16:50] that is because we merged labs and wikitech [19:16:53] labsconsole is the legacy name; I dunno if we need to keep it around anymore. [19:16:53] so seems it doesnt need its own cert [19:16:54] ok [19:17:02] but gtk [19:17:18] Coren: So what exactly is virt1000 using the old wildcard for now and can we remove it? [19:17:43] if it still neesd the wildcard, we can possibly put the new one on there [19:17:45] but i rather not [19:17:57] RobH: It's used mostly for testing while gage is doing the new openstack infrastructure. [19:18:02] (also, if its really not doing anything, the only result from inaction is certificate error) [19:18:14] jgage: ^ [19:18:18] RobH: We'll certainly move the real cert once eqiad labs goes live. [19:18:25] (afk for a few) [19:18:53] hrmm [19:19:14] hi. i'm doing new openstack infrastructure? news to me :) [19:19:47] Did I say gage? I mean mhoover [19:19:47] welcome to ... :) [19:19:48] :-) [19:19:57] phew! ok :) [19:21:33] Coren: you still don't really want to use labs.wikimedia.org, because it's already used for interal purposes, right [19:22:04] once added the redirect on cluster redirects only because people kept trying, so much easier than labsconsole or wikitech if we always talk about labs [19:22:28] but Ryan said we don't want it because it should be separate and is already used [19:22:40] Coren: cool, i've cc'd mhoover to ask about it then [19:22:49] I apprecicate the prod in the right direction [19:34:45] Greetings. I just requested shell access at wikitech. 
[19:37:36] (03PS2) 10Hashar: varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 [19:38:38] (03PS1) 10Hashar: refreshDomainRedirects exit(1) on error [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106312 [19:38:39] (03PS1) 10Hashar: docs/integration.mediawiki.org to wikimedia.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106313 [19:39:02] (03PS3) 10Hashar: varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 [19:39:12] (03CR) 10Hashar: "Apache conf is in https://gerrit.wikimedia.org/r/#/c/106313/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [19:39:27] hcohl: welcome, you mean shell on labs? [19:40:32] (03CR) 10Faidon Liambotis: [C: 04-1] "Some commit message comments." [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [19:40:39] hi. I think that's right, wmflabs.org [19:40:58] hcohl: know about #wikimedia-labs already , that might be more effective to ping for this [19:41:52] (03PS2) 10Faidon Liambotis: refreshDomainRedirects exit(1) on error [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106312 (owner: 10Hashar) [19:42:01] (03CR) 10Faidon Liambotis: [C: 032] refreshDomainRedirects: exit(1) on error [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106312 (owner: 10Hashar) [19:42:24] paravoid: if we manage to migrate all of docs / integration to misc varnish that would be awesome :-D [19:42:34] one step toward reclaiming the public IP on gallium \O/ [19:42:47] yeah, that's the idea [19:42:55] it was supposed to happen since Friday [19:43:00] hashar: give it to me so we can serve Bugzilla to XP users :p [19:43:26] paravoid: I am not sure I get your comment on the misc-varnish change at https://gerrit.wikimedia.org/r/#/c/105939/ [19:43:38] paravoid: can't we setup gallium as varnish backend right now ? [19:43:42] then switch using dns? 
[19:43:48] yup, we can [19:44:04] pong RobH :D [19:44:30] mutante: and bugzilla is going to support some HTML5 apparently. [19:44:50] gwicke: http://ceph.com/dev-notes/atomicity-of-restful-radosgw-operations/ [19:45:00] heelllo [19:45:08] hashar: 2012 called... [19:45:09] hehe, I was thinking "that kind of is suboptimal" after reading [19:45:15] hashar: i just meant https on a single IP and SNI.:) let's talk about that later, one by one, go on with the docs/int :) [19:45:16] then I saw http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6295 today ;) [19:45:30] mutante: they updated their template to finally get rid of the DOCTYPE HTML 4.01 :D [19:45:42] AaronSchulz: context? :) [19:45:55] though in theory that way is actually better with btrfs since it avoids the indirection I/O [19:46:16] hashar: :) [19:46:38] *that old way [19:47:02] * gwicke clicks [19:47:40] * AaronSchulz just reading/testing around [19:47:41] paravoid, read http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_133.pdf last night and forwarded the link to Aaron [19:47:42] I have Dumpling running on my big beautiful Debian box (and Emperor on my Win7 box) [19:47:42] mutante: that was a bug from 2005 https://bugzilla.wikimedia.org/show_bug.cgi?id=1463 :D [19:47:46] * AaronSchulz hugs VirtualBox [19:49:18] about idempotence and eventual consistency [19:50:49] btw, we have a FB Open Academy project that is about to start to develop a Cassandra backend to the mass round-trip test server we use for parsoid [19:50:57] (03PS4) 10Hashar: varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 [19:51:11] (03CR) 10Hashar: "docs -> doc in commit message" [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [19:51:43] (03PS5) 10Faidon Liambotis: varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [19:51:55] becoming a co-mentor there could be a great opportunity to get 
your hands dirty with node & cassandra [19:52:07] lol, "offlineable systems" [19:52:11] is that to me or aaron? :) [19:52:12] or both? [19:52:14] * AaronSchulz should Tim if that is a word [19:52:27] *ask Tim [19:52:37] paravoid, I'm fishing for any of you ;) [19:52:49] (03PS2) 10Hashar: doc/integration.mediawiki.org to wikimedia.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106313 [19:52:58] (03CR) 10Hashar: "docs -> doc" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106313 (owner: 10Hashar) [19:53:00] AaronSchulz, Tim as a verb might work too [19:53:01] (03PS3) 10Hashar: doc/integration.mediawiki.org to wikimedia.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106313 [19:53:08] stupid me [19:53:21] hashar: don't feel obliged to do any of these, RobH should be able to handle all of these [19:53:37] tim; verb; to get the authoritative truth [19:54:03] gwicke: I think that if I have more people relying on me right now, I'll just fail them [19:54:13] hands full [19:54:24] k, that's fine [19:54:34] they were before, and we keep losing opsens [19:54:38] * AaronSchulz watches paravoid go into kernel panic mode [19:54:42] yeah, I know [19:54:51] dying breed [19:54:58] you'll start swapping first...and thrashing [19:55:01] * hashar looks for a kernel 0day to help ops [19:58:17] mutante: no ping from #wikimedia-labs [19:58:27] AaronSchulz, http://ceph.com/dev-notes/atomicity-of-restful-radosgw-operations/ looks fairly low-level to me [19:58:34] on the level of a single blob [19:59:22] it mentions wanting to avoid cross-rgw locking, which is good...I was just wining about the implementation [19:59:33] *whining, ugh [19:59:43] i need a house can't work [19:59:48] see you tomorrow [19:59:51] heh [20:00:12] hashar, good night! 
[20:00:38] AaronSchulz, that is more akin to splitting up a large file into chunks and updating the manifest last and atomically [20:01:06] the manifest is useful for rados striping indeed [20:01:25] in addition to when btrfs is not around [20:01:39] since the chunks are spread over PGs [20:01:55] * AaronSchulz debugs a test failure for https://gerrit.wikimedia.org/r/#/c/86642/ [20:08:38] AaronSchulz, it does sound as if ceph does destructive updates to chunks instead of just the manifest though [20:11:58] but http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6295 sounds as if that is at least partly addressed [20:28:48] (03PS1) 10RobH: moving integration and doc behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 [20:29:35] (03CR) 10RobH: "this doesnt merge and go live until after the patchsets for the configuration of integration and doc" [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 (owner: 10RobH) [20:30:01] hashar: Ok, merging your apache request first cuz it can affect the cluster and im paranoid [20:31:59] (03CR) 10RobH: [C: 032] doc/integration.mediawiki.org to wikimedia.org [operations/apache-config] - 10https://gerrit.wikimedia.org/r/106313 (owner: 10Hashar) [20:33:12] (03CR) 10Hashar: [C: 04-1] "The mediawiki.org should be sent to the production cluster which would redirect them to wikimedia.org entries." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/106321 (owner: 10RobH) [20:33:28] RobH: mediawiki.org need to be sent to the production varnish [20:33:43] thus they will hit the production cluster which would redirect them to wikimedia.org [20:34:08] we can look at it after [20:36:19] ahh, cool [20:36:27] apache test worked, im gonna push to all of them now [20:36:39] mw1220 didnt die and returns proper output [20:38:25] hahaha, my new laptop is fast enough onw that apache-graceful-all's ssh agent/keychecking doesnt cause the script to fail [20:38:34] \o/ [20:38:47] (my old air would, and I would have to cheat and use salt to roll apache restarts) [20:40:43] #firstworldproblems [20:41:05] ok, apache reboots done and site isnt down [20:41:07] so yay. [20:41:19] ok, merging the misc-web-lb changes now [20:41:25] then we'll test and do dns. [20:41:32] note the DNS is wrong [20:42:44] yep, i saw, i have to change the mediawiki.org [20:42:52] and i need to append the addtional rt reference to my commit msg [20:43:03] figured i'd do that while you test once i have varnish workin [20:43:04] =] [20:43:32] (03PS6) 10RobH: varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [20:43:37] rebaseeeee [20:45:36] (03PS2) 10RobH: moving integration and doc behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 [20:45:45] (03CR) 10RobH: [C: 032] varnish: adds gallium to misc-varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/105939 (owner: 10Hashar) [20:45:56] ok, pushing live on cp servers [20:48:19] live on one of two [20:48:24] \O/ [20:48:52] so once this is done, i guess we should leave the old stuff on integration for a few hours for dns to propogate [20:49:05] (once we finish it all that is) [20:49:05] I guess [20:49:12] also looking at the DNS entry https://gerrit.wikimedia.org/r/#/c/106321/2/templates/wikimedia.org,unified [20:49:21] doc and integration point to different 
misc-web-lb entries [20:49:28] one having a .eqiad suffix [20:49:35] bahhhhhhh [20:49:39] and thats why we have CR [20:49:43] + trailing whitespaces https://gerrit.wikimedia.org/r/#/c/106321/2/templates/mediawiki.org,unified :D [20:50:13] (03PS3) 10RobH: moving integration and doc behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 [20:50:19] ignore that i missed the whitespaces [20:50:25] poor rob [20:51:21] (03PS4) 10RobH: moving integration and doc behind misc-web-lb [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 [20:51:26] i'll take small mistakes like this over major outages any day [20:51:40] I like outage [20:51:42] i've been making modifications to those apache files for years [20:51:47] and they still make me quite nervous [20:51:47] makes me feel like an important person [20:52:06] but said apache configs were the cause of my major site outage [20:52:10] (years and eyars ago) [20:52:23] ok, ps 4 should be good [20:52:49] hashar: if you would be so good as to test things with your local hosts hack? [20:52:59] yeah doing so [20:53:04] cool [20:53:39] then once i strip off the old cert on virt1000 we'll have fully migrated onto the reissued wildcard for all sysetms [20:53:54] or moved them to more narrow scoped certificates... you know that thing we've been slowly doing for a year... [20:53:56] heh [20:54:09] ahah [20:54:14] infinite redirect [20:54:17] =P [20:54:26] bad apache config? [20:54:44] so I did the host hack to point to cp1044 [20:54:47] querying https://integration.wikimedia.org/ [20:54:57] varnish seems to talk to apache on gallium properly [20:55:00] but over http I guess [20:55:03] yep [20:55:08] so gallium complains and emit a redirect to …. [20:55:10] wait for it [20:55:13] https://integration.wikimedia.org/ [20:55:15] loop ! [20:55:18] oh, cuz its forcing https on its host [20:55:21] ? 
[20:55:27] because the apache conf on gallium does not support X forwarded for [20:55:33] hehe [20:55:36] =P [20:55:40] yeah gallium apache does the redirect [20:55:46] man i cannot even login to gallium [20:55:54] err its it not X forward for [20:56:13] to ssh there, you need to pass via a bastion [20:56:23] iron doesnt count eh? [20:56:29] it needs to, its ops bastion [20:56:38] iron is not in the default ferm rules apparently [20:56:41] so no doesn't work :( [20:57:10] ahh, ok, i hit bast then [20:57:21] we shoudl add iron though while we are modifying things on gallium [20:57:44] ahh X-Forwarded-Proto [20:58:03] phone call, its dell/vendor and second time he called me today, i better take, back in 5 [20:58:11] take your time [21:01:13] ahhh the :80 vhosts have: [21:01:13] RedirectMatch permanent ^/((?!(zuul\/git)).*) https://integration.wikimedia.org/$1 [21:03:58] so you fixing and submitting patchset? (or should i?) [21:04:07] if you make it, add me as reviewer and im happy to review and merge =] [21:05:54] got to clean up the apache conf a bit [21:06:07] hmm [21:06:16] that redirect match above need to be converted to some rewrite rule magic [21:08:00] (03PS1) 10Hashar: contint: remove git over http [operations/puppet] - 10https://gerrit.wikimedia.org/r/106425 [21:08:12] RobH: that one cleanup some old / no more used apache config [21:08:22] which was to publish some repositories over HTTP, instead we are using git:// [21:08:25] forgot to clean that up [21:09:43] hashar: cool, seems fine to me, ok to merge? 
[21:10:00] i admit that some of the zuul redirection stuff you removed i dunno if it needs, but you certainly would ;] [21:10:37] hashar: if ok with you i'll merge now and push puppet update on gallium [21:12:36] * RobH pokes hashar with pointy poking stick [21:12:53] yup [21:12:57] hold [21:13:04] k [21:13:06] crazy rewrite rule incoming [21:13:07] (03PS1) 10Hashar: contint: support X-Forwarded-Proto for main site [operations/puppet] - 10https://gerrit.wikimedia.org/r/106426 [21:13:17] I took that from apache main.conf [21:14:08] the mediawiki entries are landing on varnish production which send that to integration.wikimedia.org so no change needed for mediawiki.org entries [21:14:18] the we have two cases when hitting integration.wikimedia.org : [21:14:36] 1) direct hit to gallium, X-Forwarded-Proto is not set so we redirect to https [21:15:00] 2) hit misc varnish, which terminate ssl, set X-Forwarded-Proto: https and query gallium [21:15:02] that should work [21:15:20] I say lets merge that and test :-] [21:16:09] ok, merge both patches i assume? [21:16:42] (03CR) 10RobH: [C: 032] contint: remove git over http [operations/puppet] - 10https://gerrit.wikimedia.org/r/106425 (owner: 10Hashar) [21:17:08] yes [21:17:23] (03CR) 10RobH: [C: 032] contint: support X-Forwarded-Proto for main site [operations/puppet] - 10https://gerrit.wikimedia.org/r/106426 (owner: 10Hashar) [21:17:39] ok, merging on palladium now and running on gallium [21:19:02] * RobH waits on puppet run [21:20:01] hashar: Ok, run is finished, test! [21:20:02] =] [21:20:24] apache restarted? [21:20:43] restarting it [21:20:43] should have rehuped, lemme force it [21:20:47] or you can =] [21:20:51] * RobH doesnt do it [21:20:54] ahh [21:21:03] http://integration.wikimedia.org [21:21:04] :D [21:21:05] not found [21:21:35] jus redirect to HTTPS, that works ~ [21:21:45] yeah that was the idea [21:21:51] https://gerrit.wikimedia.org/r/106426 [21:21:55] but I screwed wsomething [21:22:16] oh. 
heh [21:22:39] hashar: need to rollback? [21:22:53] or fix (just lemme know what you want me to do =) [21:23:05] fix [21:23:07] * AaronSchulz runs into http://tracker.ceph.com/issues/6462 again [21:23:11] trying to figure out how it works [21:23:12] I knew that was familiar [21:23:17] * AaronSchulz upgrades [21:23:26] hashar: i hope you figure it out cuz im trying to follow along but im not sure of the fix at this point ;] [21:26:15] gah, no saucy packages [21:28:46] (03PS1) 10Hashar: contint: rewrite rule needs 'RewriteEngine On' [operations/puppet] - 10https://gerrit.wikimedia.org/r/106431 [21:28:47] RobH: long story short: we need to enable apache rewrite engine :( [21:28:56] RobH: https://gerrit.wikimedia.org/r/106431 fixed it [21:29:24] ahh, ok [21:29:32] i'll merge it and lets see! [21:29:40] live hacked it already [21:29:44] (03CR) 10RobH: [C: 032] contint: rewrite rule needs 'RewriteEngine On' [operations/puppet] - 10https://gerrit.wikimedia.org/r/106431 (owner: 10Hashar) [21:29:46] then I need another patchset to copy paste the https config to http [21:29:51] cool [21:29:59] heh, i didnt wait for cr... 
[21:30:02] * RobH waits on zuul ;] [21:31:25] hashar: rewrite on is live on palladium, so next puppet run will pull it down, didnt see the point to force the run when you are about to do an additional ps for it =] [21:31:34] and you already live hacked [21:32:19] top thing I hate about apache [21:32:28] (03PS1) 10Hashar: contint: copy/paste website conf for http [operations/puppet] - 10https://gerrit.wikimedia.org/r/106432 [21:32:30] is copy pasting conf between :80 and :443 virtual hosts [21:32:37] which is what https://gerrit.wikimedia.org/r/106432 does [21:33:22] live hacked [21:33:23] works :-D [21:33:35] cool, i'll merge onto palladium and we can puppet run gallium [21:33:40] curl http://integration.wikimedia.org/ [21:33:43] gives me a redirect to https [21:33:47] (03CR) 10RobH: [C: 032] contint: copy/paste website conf for http [operations/puppet] - 10https://gerrit.wikimedia.org/r/106432 (owner: 10Hashar) [21:33:50] curl -v -H 'X-Forwarded-Proto: https' http://integration.wikimedia.org/ [21:33:54] serves content :D [21:34:12] its live on palladium, you have sudo on gallium right? [21:34:16] yeah [21:34:18] running puppet [21:34:22] cool [21:34:42] eventually got granted some privileges to save european ops some time [21:34:44] hashar: thanks for the work on this, more than likely you could have simply made it all my problem [21:34:47] so i appreciate it =] [21:34:51] I kept nagging everyone to get stuff changed on gallium hehe [21:35:07] team work! [21:35:11] \o/ [21:35:15] plus the apache conf on contint is crazy [21:35:38] yes, which is the part i am extra grateful about not having to do ;] [21:36:05] so integration.wikimedia.org is fixed up [21:36:09] gotta look at doc now :( [21:37:01] RobH: wanna play with modules/contint/files/apache/doc.wikimedia.org [21:37:02] ? 
:D [21:37:28] let me look at it [21:37:31] then i answer you ;] [21:37:44] hint: we don't need to change the mediawiki.org virtual hosts there [21:40:18] RobH: I think I got it [21:40:29] oh, i am looking now, but feel free! [21:40:52] * RobH is also now eating his lunch while working ;] [21:41:09] i had to freakin go home cuz i ran outta test strips for diabetes =P [21:41:16] couldnt wait was starving [21:41:27] definitely eat [21:41:35] (03PS1) 10Hashar: contint: support X-Forwarded-Proto for doc website [operations/puppet] - 10https://gerrit.wikimedia.org/r/106433 [21:41:36] that and sleep() :-D [21:41:40] in process of doing, have for past 10 or so [21:41:41] heh [21:41:59] so https://gerrit.wikimedia.org/r/106433 is doing the something on doc that we did for integration [21:42:13] just removing redirection on http [21:42:16] the pity is that all of that will be deleted after DNS got replicated since gallium will no more serve https hehe [21:42:17] so we dont endlessly loop [21:42:23] heh [21:42:29] yea, its odd that we have to stop gap it [21:42:35] well [21:42:40] that is a good exercise [21:42:53] another way would have to clean out all the conf, drop https then push DNS and wait a few hours [21:42:57] (03CR) 10RobH: [C: 032] contint: support X-Forwarded-Proto for doc website [operations/puppet] - 10https://gerrit.wikimedia.org/r/106433 (owner: 10Hashar) [21:43:22] I really like that X-Forwarded-Proto header hack [21:43:26] its live on palladium [21:43:32] who ever invented that is definitely smart [21:43:32] should be able to puppetrun on gallium [21:43:48] i hadn't done this hack before [21:43:58] i've copied the logic of it into my growing pile of notes [21:43:58] it is heavily used in our varnish conf [21:44:06] yea, but i hadnt messed with our varnish =] [21:44:11] ahh [21:44:17] you should, it is really fun [21:44:26] im liking messing with misc-web-lb [21:44:31] the VCL language is quite powerful [21:44:49] easier to mess with varnish on a much 
smaller service group =] [21:44:56] no taking down enwiki for me. [21:45:30] restarting apache [21:45:42] wanna share screen while I test ? [21:46:04] oh no you are eating [21:46:14] watching screen is easy [21:46:16] so yes =] [21:46:26] hanging out [21:47:11] attach to what screen? [21:47:15] ah screen yeah [21:47:40] 4970.pts-2.gallium [21:48:22] maybe screen -x works [21:48:24] on it [21:48:25] yep [21:48:29] screen -x screenname [21:49:42] damned screen wont scroll up =P [21:49:49] ahh [21:49:58] so once it scrolls past, its gooooone for me [21:50:00] oh well [21:50:04] i'm still gettiting most of it [21:50:14] so basically varnish tells apache that the query already went via https [21:50:26] so there is no point in forcing https [21:50:37] neat [21:50:44] testing integration [21:50:47] now im stealing all your commands for a moment [21:50:51] so i can run on own later to learn more [21:50:59] well, i steal when you finish =] [21:51:10] hmm, heya [21:51:29] got taught those tricks by mark / maxsem / domas [21:51:40] and we had a crazy issue a few month ago on beta [21:51:49] which involved squid and varnish caching 301 differnetly [21:52:02] curl -H 'X-Forwarded-Proto: https' http://doc.wikimedia.org/ [21:52:11] that is all you need -H pass a custom header [21:52:27] yea used curl and such to pull but not pass custom headers [21:52:31] good stuff [21:52:46] curl -v is nice too [21:52:51] almost as good as my lunch =] [21:52:53] it shows you the header being sent and the one received [21:52:57] * RobH noms red beans and rice [21:53:11] sounds typically american [21:53:19] very [21:53:24] so I think we get all apache conf figured out [21:53:30] need to test out with the host hack [21:53:47] basically in /etc/hosts : 208.80.154.241 doc.wikimedia.org [21:53:52] then replay the same command [21:53:57] (IP being the misc varnish) [21:54:39] curl -i http://integration.wikimedia.org [21:54:39] gives me a redirect by varnish [21:55:06] but if I query the https URL, 
varnish set the proto header, relay it to apache and i get content [21:56:19] only trouble is that cp1044 and cp1043 cached a different copy :( [21:58:07] just ran enough to get reply from both? [21:58:13] cuz i hit only one so far in my curl [21:58:15] (s) [21:58:31] I have no idea why they are caching different copies [21:59:04] RobH: cache madness http://paste.debian.net/75114/ [21:59:28] * bd808 is deploying a small update to scholarships with the blessing of greg-g  [21:59:31] apparently cache has been cleared [22:00:13] yep, now im getting identical [22:00:35] well, i never got different, i did a bunch in a row and kept getting cp104 [22:00:37] 44 [22:00:46] but now im getting both fairly reliably and they are same [22:01:02] and https://doc.wikimedia.org/ is cached with a redirect to https :( [22:01:13] but if you query https://doc.wikimedia.org/?cachekiller , you get content [22:01:17] indeed [22:01:23] im getting the redirection for it each time [22:01:55] hrmm [22:01:59] i get redirect on curl [22:02:16] gotta purge https://doc.wikimedia.org/ on cp1043 and cp1044 I guess [22:02:21] shouldnt it be giving me the full page? [22:02:29] yup it should [22:02:30] i had it for a single run against 1043 [22:02:41] which it is doing when you add a cache killer like ?cachekill [22:02:47] well, lets see how to purge on varnish then [22:03:01] I usually | mark :-D [22:03:04] squid i knew, varnish, have not done [22:03:09] greg-g: All done. Thanks [22:03:30] hashar: https://wikitech.wikimedia.org/wiki/Varnish#One-off_purges [22:03:37] seems pretty straightforward to me [22:03:43] I love doc [22:04:04] bd808: coolio [22:04:20] RobH: fixed apparently [22:04:26] i didnt do it! [22:04:29] it fixed itself. 
[22:04:33] maybe it expired [22:04:36] yep [22:04:44] i was about to run the command actually, was typing in fqdn [22:04:45] heh [22:04:58] lesson learned, varnish will magically fix its own shit ;] [22:05:15] hashar: so it sound like we are good for dns push [22:05:33] https://gerrit.wikimedia.org/r/#/c/106321/ [22:05:41] grmblbl the http://doc.wikimedia.org gives me content on cp1044 :( [22:05:42] (03PS1) 10Faidon Liambotis: deployment: s/mw_statsd/mw_carbon/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/106437 [22:05:43] (03PS1) 10Faidon Liambotis: deployment: switch carbon host to tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/106438 [22:05:53] wha? [22:06:02] http version [22:06:13] the https had a redirect cached, that one is fixed [22:06:22] now the http version has content cached, which is wrong :-D [22:06:30] http gives me doc moved redirect [22:06:31] yea [22:06:43] (03CR) 10Faidon Liambotis: [C: 032] deployment: s/mw_statsd/mw_carbon/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/106437 (owner: 10Faidon Liambotis) [22:06:45] but its hitting the server for that no? [22:06:46] same for http://integration.wikimedia.org [22:06:49] gallium that is [22:06:52] (03CR) 10Faidon Liambotis: [C: 032] deployment: switch carbon host to tungsten [operations/puppet] - 10https://gerrit.wikimedia.org/r/106438 (owner: 10Faidon Liambotis) [22:06:55] might want to use the purge cache command on all four URLS [22:07:50] so i think its dsh -c -g bits varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret http://integration.wikimedia.org poweredby_mediawiki_88x31 [22:08:05] from the wikitech doc [22:08:10] hmm almost [22:08:11] though the pwoeredby addtion is odd to me. 
[22:08:14] drop poweredby_mediawiki_88x31 [22:08:17] yea [22:08:20] that is the icon of mediawiki [22:08:25] someone typoed that into the command when pasting [22:08:26] heh [22:08:27] purge.url is the command to run I think [22:08:40] well, im going to run directly on the hosts rather than dsh [22:08:43] cuz dsh makes me nervous [22:08:48] since its only two hosts. [22:08:52] dsh -c -g bits varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret purge.url [22:08:53] and you want to replace the dsh "bits" group [22:08:59] (03PS1) 10Jgreen: move fundraising.wm.o from aluminium.* al-fundraising.* [operations/dns] - 10https://gerrit.wikimedia.org/r/106439 [22:09:06] yeah just ssh to both host probably [22:09:22] I can confirm that everything works when using cache killers :-] [22:10:15] so that is good to me beside the bad version being in cache [22:10:26] (03CR) 10Jgreen: [C: 032 V: 031] move fundraising.wm.o from aluminium.* al-fundraising.* [operations/dns] - 10https://gerrit.wikimedia.org/r/106439 (owner: 10Jgreen) [22:11:27] !log moved fundraising.wikimedia.org to 208.80.154.12, flipped DNS [22:11:31] thats not workin for me [22:11:34] Logged the message, Master [22:11:52] so i try varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret purge.url doc.wikimedia.org and also with the fqdn having http [22:11:57] no dice, so im not getting syntax [22:12:21] (03PS1) 10Faidon Liambotis: gdash: fix code deploy colors for white background [operations/puppet] - 10https://gerrit.wikimedia.org/r/106440 [22:12:36] robh: maybe http://doc.wikimedia.org [22:12:43] i tried that too, same result [22:12:44] (03CR) 10Faidon Liambotis: "Any opinions on colors?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/106440 (owner: 10Faidon Liambotis) [22:12:50] :-( [22:12:53] Unknown request [22:13:55] * RobH is googling on it [22:14:45] paravoid: you clear specific cache hits in varnish before? 
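One way to drop a single cached object, assuming the cache's VCL accepts the non-standard PURGE method, is to speak it to the cache host as a raw HTTP request. A minimal sketch of the request bytes (on the real caches this was pasted into a `telnet localhost 80` session):

```python
# Build the raw PURGE request for one URL. Host/path here are the ones
# under discussion; "Connection: close" is added so the session ends
# cleanly after the response.
def purge_request(host, path="/"):
    """Raw HTTP request asking varnish to drop its cached copy of path."""
    return (
        f"PURGE {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n\r\n"
    )

req = purge_request("doc.wikimedia.org")
```

The same request with `Host: integration.wikimedia.org` purges the other site; one request per host, per cache server.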
[22:14:56] i ping him since i see him pushing gerrit changes ;] [22:15:19] yes [22:15:23] use the PURGE HTTP method [22:15:34] on localhost right ? [22:15:41] PURGE / HTTP/1.1\nHost: doc.wikimedia.org\n\n [22:15:42] yes [22:16:10] via a curl? sorry lost me [22:16:17] curl, telnet, doesn't matter [22:16:40] ok, ill do real quick on both hosts [22:16:42] thank you! [22:18:26] meh [22:18:33] now all but https integration redirect [22:18:37] =P [22:18:42] moving backwards. [22:18:53] hehe [22:19:03] before https doc was working [22:19:15] telnet localhost 80 on cp1043 and cp1044 to run the purge [22:19:31] PURGE / HTTP/1.1\nHost: doc.wikimedia.org\n\n & PURGE / HTTP/1.1\nHost: integration.wikimedia.org\n\n [22:19:33] hey MaxSem and bd808: max is going to be moving some code from MobileFrontend ext to it's own, wanna watch bd808 ? [22:19:38] two different commands, not one with & [22:19:44] bd808: this is re that email from robla [22:19:54] Sure. [22:19:55] s/re/pertinent to/ [22:20:03] Hangout or ?? [22:20:11] maybe i didnt do purge right... [22:20:11] MaxSem: ^ [22:20:33] RobH: looks like it worked, I got misses :-] [22:20:35] miss [22:20:46] oh, cool, why am i getting redirect answers? =P [22:20:51] my curl tests [22:20:58]
The document has moved here.
[22:21:09] it should be giving me full output... [22:21:11] curl -i http://integration.wikimedia.org|egrep '(X-Cache|HTTP)' [22:21:32] show me that cp1044 serves content for http://integration.wikimedia.org/ [22:21:47] i get 301 moved [22:21:49] ah no more [22:21:59] =/ [22:22:18] its amusing we are getting different results as we run against the exact same sysetms at the same time [22:22:25] by amusing i mean 'this makes no sense' [22:22:28] curl -i http://doc.wikimedia.org|egrep '(X-Cache|HTTP)' [22:22:35] gives me 301 on both caches now [22:22:47] yep, same [22:22:52] there is some kind of magic involved :( [22:22:53] at least we are wrong together. [22:23:05] cool, I've created a calendar entry [22:23:08] mouahah [22:23:20] you kept me hanging MaxSem ;) [22:23:40] RobH: so yeah sorry, that is an issue in varnish or apache somehow. [22:24:30] RobH: should have thought about it before, but varnish is queried with either http or https. But always query gallium over http [22:24:39] RobH: so varnish will cache the reply from gallium [22:24:50] even when the cache should vary based on the client request [22:25:00] MaxSem: Cool I'll be there. It might be nice to log your shell session on tin when we do that too. It may be easier for Greg and I to review later. [22:25:03] eww. [22:25:11] that... kind of makes sense. 
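The failure mode hashar describes can be shown with a toy model (not Wikimedia code): varnish keys the cache on the URL alone and always talks plain http to gallium, while Apache's answer depends on the X-Forwarded-Proto header it received. Whichever client arrives first primes the cache for everyone:

```python
def backend(headers):
    """gallium's apache: redirect unless the hop already came via https."""
    if headers.get("X-Forwarded-Proto") == "https":
        return "200 content"
    return "301 redirect to https"

cache = {}

def varnish(url, client_is_https):
    # varnish terminates ssl but keys the cache on the URL only,
    # ignoring which protocol the client used.
    if url not in cache:
        fwd = {"X-Forwarded-Proto": "https"} if client_is_https else {}
        cache[url] = backend(fwd)
    return cache[url]

# An http client primes the cache with the redirect...
first = varnish("http://doc.wikimedia.org/", client_is_https=False)
# ...and the https client is then served the same stale redirect.
second = varnish("http://doc.wikimedia.org/", client_is_https=True)
```

This is exactly why `?cachekiller` URLs worked: a fresh cache key forced a fresh backend fetch.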
[22:25:30] we had that on beta, but I can't remember how we got it fixed [22:25:30] MaxSem: what bd808 said [22:26:10] hashar: yea i have zero clue, what happens makes sense now that you explained it, but no clue how to fix [22:26:18] atleast its not broken for real users until dns =] [22:26:35] so if we have to fallback to mailing list to see if someone recalls, its not end of the world [22:26:53] stupid me [22:27:16] yea well if your handling of this is stupid, mine is downright moronic [22:27:18] ;] [22:27:21] bd808, there will be nothing particulary interesting though submodule pull+config change+scap [22:27:48] MaxSem: You'd be amazed what I find to be interesting :) [22:28:40] RobH: LOL [22:28:52] RobH: what is stupid is that I spend literally 3 day s debugging something like that before [22:29:36] its a very specific bit of varnish/apache erroring [22:29:46] at least you had come across it before at all, i had not. [22:30:31] way to learn I guess [22:30:50] so that might need some VCL magic in the misc proxy :/ [22:32:06] who would be the person to handle that? (I am guessing its mark's domain but not sure) [22:32:23] I guess [22:32:28] i dont wanna be stuck where you think im doin it, and i think yer doin it, and it doesnt get done ;] [22:32:34] I can't find reference / bug / change about it :( [22:33:04] or we could just steal that code =] [22:34:45] RobH: ahhhhhhhh [22:34:52] found it? [22:35:01] so I have hit that on beta back in July [22:35:02] bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=51700 [22:35:14] which list all the history, bunch of commands and how to pass headers to curl [22:35:16] that is a long read [22:35:32] you can see all the debugging step I took to figure out the issue [22:35:48] the explanation is comment #11 https://bugzilla.wikimedia.org/show_bug.cgi?id=51700#c11 [22:36:02] I found that on the text varnish [22:36:08] and that would have hit us in production for sure [22:36:16] but at that time prod was still using squid .. 
[22:36:37] so after a few days of headaches / debugging / feeling like an impostor [22:36:42] !log aaron started scap: timing test (beta) [22:36:49] Logged the message, Master [22:36:50] mark get the fix in 5 lines of VCL : https://gerrit.wikimedia.org/r/#/c/75583/3/templates/varnish/text-backend.inc.vcl.erb,unified :D [22:36:50] yea, its a long read, i'm doing a skim first time trhough, heh [22:37:07] that code snippet is in templates/varnish/text-backend.inc.vcl.erb [22:37:07] yep already on it [22:37:17] so that tiny little bit for vlc_fetch is needed [22:37:18] ? [22:37:21] it should probably be added to the misc varnish as well [22:37:28] indeed [22:37:57] what it does is roughly hack in response sent by apache to make vary by protocol whenever it is a redirect [22:38:04] aka, it fix our use case :-] [22:38:20] I am not sure why it is not in all varnish configs [22:40:32] (03CR) 10Hashar: [C: 04-1] "We can't do the switch yet :(" [operations/dns] - 10https://gerrit.wikimedia.org/r/106321 (owner: 10RobH) [22:40:40] RobH: i commented on the dns change [22:40:49] RobH: we probably had enough varnish fun for now [22:40:59] so probably we should just push the DNS switch [22:41:05] well, if we push this change for the config it should fix it though right? [22:41:13] yeah maybe [22:41:17] worst case we fubar the misc-web-lb cp servers only [22:41:19] worth a ping to mark [22:41:20] so its worth tryin gi think [22:41:25] i bet he is asleep [22:41:30] hrmm [22:41:31] I should as well [22:41:44] so maybe we should follow up tomorrow ? [22:41:53] I think it would be a good varnish exercise :-] [22:41:55] let follow up with him tomorrow morning, since this works as is now [22:41:59] sound right? [22:42:02] yeah [22:42:03] then we dont force downtime at all [22:42:05] cool [22:42:11] go to bed man, thanks for all the help! 
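The five-line vcl_fetch fix referenced above (change 75583, in templates/varnish/text-backend.inc.vcl.erb) does roughly the following. This is a paraphrased sketch of the idea, not the exact production VCL:

```vcl
sub vcl_fetch {
    /* When the backend answers with a redirect, make the cached object
       vary on X-Forwarded-Proto, so http and https clients no longer
       share a single cached copy of the 301. */
    if (beresp.status == 301 || beresp.status == 302) {
        if (beresp.http.Vary) {
            set beresp.http.Vary = beresp.http.Vary ", X-Forwarded-Proto";
        } else {
            set beresp.http.Vary = "X-Forwarded-Proto";
        }
    }
}
```

With Vary in place, varnish stores separate objects per header value, which is the behavior the toy example above was missing.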
[22:42:28] that is a bit crazy, sorry we lost so much time with the cache redirect [22:42:42] the worth is that we spent a lot of time for something that is going to be phased out :-] [22:42:42] I'm working from home tomorrow so I'll be online and working an hour sooner than my usual [22:42:47] but, did learn a lot [22:42:52] well, its still an interesting exercise [22:43:00] and i didnt know shit about our varnish setup 72 hours ago [22:43:05] so indeed, still worth the time [22:43:15] ahh great [22:43:18] varnish is nice [22:43:20] plus i still have your bz ticket to review heh [22:43:30] you had a novel of troubleshooting =] [22:44:02] for contint, Apache on gallium is acting as a proxy for some other daemon. Maybe we could get that moved to the mlisc varnish as well :-] [22:44:09] hehe [22:44:20] dont see why not [22:44:29] the misc-web-lb make a lot of sense [22:44:44] in the past we had rolled some misc items like blog into the main squid config [22:44:53] but it was unwieldy [22:45:21] for example: https://integration.wikimedia.org/zuul/status that hit gallium Apache, then apache proxy to Zuul which is running locally [22:45:42] possibly, we could get some VCL that makes /zuul/status to hit gallium directly on Zuul port [22:46:11] varnish - ---- -> gallium: [22:46:19] but that might be too complicated [22:48:12] RobH: follow up tomorrow. midnight there and wife already sleeping :D [22:48:24] yup! 
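The proxying hashar mentions (gallium's Apache fronting a locally running Zuul status service) would, in outline, be a reverse-proxy rule like the one below. This is hypothetical: the real rules live in modules/contint/files/apache/, and the backend port is an assumption (Zuul's status webapp conventionally listens on 8001):

```apache
# Hypothetical sketch only: hand /zuul/status to a Zuul webapp running
# locally on gallium. Port 8001 is a guess, not taken from the log.
ProxyPass        /zuul/status http://127.0.0.1:8001/status
ProxyPassReverse /zuul/status http://127.0.0.1:8001/status
```

Moving this behind misc varnish, as suggested, would just mean pointing the VCL at gallium's Zuul port directly instead of bouncing through Apache.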
[22:51:34] * hashar waves [22:51:36] !log aaron finished scap: timing test (beta) (duration: 17m 15s) [22:51:42] Logged the message, Master [23:08:15] (03CR) 10Ori.livneh: [C: 032] gdash: fix code deploy colors for white background [operations/puppet] - 10https://gerrit.wikimedia.org/r/106440 (owner: 10Faidon Liambotis) [23:09:55] (03PS1) 10Aaron Schulz: Removed features flagged around MW_SCAP_BETA [operations/puppet] - 10https://gerrit.wikimedia.org/r/106451 [23:11:04] ori: ^ [23:11:39] (03CR) 10Ori.livneh: [C: 032] Removed features flagged around MW_SCAP_BETA [operations/puppet] - 10https://gerrit.wikimedia.org/r/106451 (owner: 10Aaron Schulz) [23:38:55] (03PS1) 10Faidon Liambotis: noc: remove ishmael & cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/106455 [23:40:02] (03CR) 10Faidon Liambotis: [C: 032] noc: remove ishmael & cleanup [operations/puppet] - 10https://gerrit.wikimedia.org/r/106455 (owner: 10Faidon Liambotis) [23:40:55] legoktm: I know how to fix the IRC thing [23:41:05] A second username isn't required though [23:41:29] Much like: 2<24rc-pmtpa2> 14[[07Special:Log/delete14]]4 delete10 02 5* 03Ponyo 5* 10deleted "[[02User talk:Penchal-CEO of Pen Groups10]]": [[WP:CSD#G11|G11]]: Unambiguous [[WP:NOTADVERTISING|advertising]] or promotion [23:41:48] But on another log: 2<24rc-pmtpa2> 14[[07Special:Log/pagetriage-curation14]]4 reviewed10 02 5* 03WhatamIdoing 5* 10WhatamIdoing marked [[02Hippocampus biocellatus10]] as reviewed [23:42:01] See what I mean? [23:42:17] * ori has an anime seizure [23:42:30] Same with 2<24rc-pmtpa2> 14[[07Special:Log/articlefeedbackv514]]4 create10 02 5* 03189.176.190.22 5* 10189.176.190.22 submitted feedback post #050dd0e... on Global warming [23:42:44] Hm [23:43:08] Gloria ^ [23:58:32] 'kay - greg-g, MaxSem, who's going first here? [23:58:48] you go first [23:59:04] marktraceur: you're first on the list :) [23:59:12] Fun times