[00:13:49] (PS1) Jgreen: exim.conf template cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/77845
[00:14:37] (CR) Jgreen: [C: 2 V: 1] exim.conf template cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/77845 (owner: Jgreen)
[00:19:59] (PS1) Jgreen: disable ssmtp for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77846
[00:20:36] (CR) Jgreen: [C: 2 V: 1] disable ssmtp for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77846 (owner: Jgreen)
[01:06:10] (PS2) Pyoungmeister: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366
[01:06:52] (CR) jenkins-bot: [V: -1] WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366 (owner: Pyoungmeister)
[01:20:01] (PS3) Pyoungmeister: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366
[02:06:34] !log LocalisationUpdate completed (1.22wmf12) at Tue Aug 6 02:06:34 UTC 2013
[02:06:52] Logged the message, Master
[02:16:18] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 6 02:16:18 UTC 2013
[02:16:29] Logged the message, Master
[02:37:02] (PS1) Tim Landscheidt: Tools: Enable Ganglia monitoring for Exim [operations/puppet] - https://gerrit.wikimedia.org/r/77848
[02:49:02] (CR) Tim Landscheidt: "Tested on toolsbeta."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77848 (owner: 10Tim Landscheidt) [06:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 06:32:47 UTC 2013 [06:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 07:02:33 UTC 2013 [07:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:03:50] PROBLEM - Disk space on analytics1025 is CRITICAL: DISK CRITICAL - free space: / 1046 MB (3% inode=90%): [07:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 07:32:43 UTC 2013 [07:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:36:09] (03PS1) 10ArielGlenn: temporarily turn off rsync of dumps between dump servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/77861 [07:38:00] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:38:50] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [07:40:30] (03CR) 10ArielGlenn: [C: 032] temporarily turn off rsync of dumps between dump servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/77861 (owner: 10ArielGlenn) [07:54:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [08:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 08:02:44 UTC 2013 [08:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [08:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 08:32:38 UTC 2013 [08:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [09:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 09:03:08 UTC 2013 [09:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [09:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 09:32:35 UTC 2013 [09:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 10:03:08 UTC 2013 [10:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:15:36] !log stop apache on williams.wikimedia.org [10:15:41] !log stop exim on williams.wikimedia.org [10:15:48] Logged the message, Master [10:15:59] Logged the message, Master [10:20:59] !log db1046 mysql replication stopped [10:21:09] Logged the message, Master [10:33:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 10:33:13 UTC 2013 [10:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:35:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [10:36:08] !log 
stopped icinga notifications for db1048, db1046, db48, db49 [10:36:19] Logged the message, Master [10:46:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [11:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 11:02:39 UTC 2013 [11:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [11:29:05] (03CR) 10Mark Bergsma: [C: 032 V: 032] varnish (3.0.3plus~rc1-wm14) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/74354 (owner: 10Mark Bergsma) [11:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 11:32:43 UTC 2013 [11:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [11:36:17] (03CR) 10Mark Bergsma: [C: 031] Add an authdns module & associated role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/74119 (owner: 10Faidon) [11:36:28] oh I'm not done [11:36:34] I haven't addressed your changes [11:36:38] er, your first comments [11:36:42] plus a few of my own [11:36:48] I just pushed an intermediate commit at some point [11:37:13] I haven't done the "take an ipaddress array" or the --review option that I have been testing [11:37:23] mark: ^ [11:37:33] i am aware [11:38:25] why not just put the ip addresses in the role classes btw? [11:43:10] PROBLEM - mysqld processes on db1046 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:46:36] yeah I was thinking of doing that too [11:48:08] ipv6 maps there too, or else it'll get too complicated [11:48:42] https://bugzilla.wikimedia.org/show_bug.cgi?id=52531 looks like not all caches remove files on delete, is there some way to force a cleanup? 
[12:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 12:02:37 UTC 2013
[12:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 12:32:48 UTC 2013
[12:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:50:48] (CR) Ori.livneh: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[12:51:19] (PS9) Ori.livneh: Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087
[13:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 13:02:39 UTC 2013
[13:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 13:32:48 UTC 2013
[13:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:40:25] yo paravoid, i'm looking at building a kafka 0.8 beta deb
[13:40:29] what did you say we should call the package?
[13:40:47] the package name would be the same
[13:40:58] the version would be something like 0.8~beta1-1
[13:41:13] (0.8~beta1 being the upstream version, 1 the Debian revision)
[13:41:27] ~ means "less than", i.e. 0.8 > 0.8~foo
[13:43:28] oh huh, ok
[13:43:41] makes sense?
[13:43:46] yeah, right now we are building from trunk
[13:43:55] should I see if I can find the last RC?
[13:45:45] that'd be best, yes
[13:45:57] if you need trunk, then 0.8~20130806 would work too
[13:46:12] okay bbl, very late lunch/very early dinner time
[13:48:06] ok, thanks
[13:50:48] hm, the beta1-candidate1 branch's last commit was june
[13:50:52] oh well, ok will try that first i guess
[13:55:29] (CR) Andrew Bogott: [C: 1] "I will merge this when I have an hour or two to babysit. Ori, can you confirm that you've tested all of the paths on labs? (If not, that" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[14:03:19] (CR) Andrew Bogott: [C: 1] "This looks good to me -- would be good to get thoughts from someone else who actually knows about nrpe :)" [operations/puppet] - https://gerrit.wikimedia.org/r/77720 (owner: Akosiaris)
[14:03:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 14:03:19 UTC 2013
[14:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:04:07] (CR) Andrew Bogott: "I'm happy to merge this but, yeah, maybe some comments about usage would be good first." [operations/puppet] - https://gerrit.wikimedia.org/r/77090 (owner: Hashar)
[14:21:54] yoo akosiaris, you around?
[14:22:07] trying to do some kafka building and having trouble
[14:22:07] ottomata: yeah
[14:22:37] so, there is a '0.8.0-beta1' tag
[14:22:44] i want to try to build from that
[14:22:58] actually, first, let me tell you what I already did
[14:23:12] since we were building from trunk before, (and before I realized I wanted to build from the tag)
[14:24:14] i went ahead and merged in recent changes from upstream trunk
[14:24:14] to both our trunk, and into our debian branch
[14:24:14] so, that's that.
[14:24:14] so, i figured if I want to build the tag for now
[14:24:14] i'd just create a new 'debian-branch' from the tag, and merge in the debianization commits we made
[14:24:15] so, i've done that
[14:24:40] now i'm trying to build using this new debian-branch, using the upstream-tag
[14:24:55] but am getting dpkg-source: error: aborting due to unexpected upstream changes
[14:25:11] i'm looking at the diff, and i dunno where all of these changes are coming from
[14:27:31] Ok let's have a look. Have you pushed the changes to https://gerrit.wikimedia.org/r/operations/debs/kafka or should i pull from the official kafka repo?
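The `~` version ordering paravoid explains above ("~" sorts before anything, even the empty string, so 0.8~beta1 < 0.8 and 0.8~20130806 < 0.8) can be illustrated with a toy comparator. This is a simplified sketch of the rule, not dpkg's full algorithm (no epochs, no Debian-revision splitting):

```python
# Toy illustration of Debian's "~" version ordering: "~" sorts before
# everything, including the end of the string, so 0.8~beta1 < 0.8.
# Simplified sketch only -- NOT a reimplementation of dpkg.
def deb_cmp(a: str, b: str) -> int:
    def char_key(c: str) -> int:
        if c == "~":
            return -1            # tilde sorts before everything, even ""
        if c.isalpha():
            return ord(c)        # letters sort before other symbols
        return ord(c) + 256

    i = j = 0
    while i < len(a) or j < len(b):
        # compare the current run of non-digit characters
        while (i < len(a) and not a[i].isdigit()) or (
            j < len(b) and not b[j].isdigit()
        ):
            ka = char_key(a[i]) if i < len(a) and not a[i].isdigit() else 0
            kb = char_key(b[j]) if j < len(b) and not b[j].isdigit() else 0
            if ka != kb:
                return -1 if ka < kb else 1
            if i < len(a) and not a[i].isdigit():
                i += 1
            if j < len(b) and not b[j].isdigit():
                j += 1
        # compare the current run of digits numerically
        na = nb = 0
        while i < len(a) and a[i].isdigit():
            na, i = na * 10 + int(a[i]), i + 1
        while j < len(b) and b[j].isdigit():
            nb, j = nb * 10 + int(b[j]), j + 1
        if na != nb:
            return -1 if na < nb else 1
    return 0

print(deb_cmp("0.8~beta1", "0.8"))   # -1: the beta sorts earlier
print(deb_cmp("0.8", "0.8~foo"))     #  1: matches "0.8 > 0.8~foo" above
```

The practical upshot, as in the chat: versioning the package `0.8~beta1-1` lets the eventual `0.8-1` release supersede it through a normal upgrade.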
[14:27:43] i haven't pushed the most recent ones about the tag [14:27:46] i only pushed the trunk merges [14:28:08] i'll push the tag [14:28:10] to gerrit [14:28:12] then you can try too [14:28:39] ok, pushed [14:28:40] they also have a branch 0.8.0-beta1-candidate1 [14:28:48] and a tag ? wtf? [14:28:51] yeah, i'm not sure which to use, i figured the tag since it was a tag [14:29:18] ok i got the tag too [14:29:19] the tag has a few more recent commits [14:30:55] oh btw, our log4j patch has conflicts, so I just ignored that for now [14:31:01] commented it out in series [14:31:08] once we get it building on the tag we can fix the patch [14:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 14:32:39 UTC 2013 [14:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [14:34:29] OTRS is still down? [14:43:04] Bsadowski1: yes. we're generally on track with the upgrade [14:43:30] that's so sweet to hear, so sweet [14:43:55] i really hope everyone loves the new version :-P [14:44:34] Jeff_Green: if Philippe is right that they were originally asked 1 M$ for upgrading OTRS (!), are you our million-dollar-man? [14:45:02] ha. no I really don't deserve that kind of credit. they've done all the hard work [14:46:17] my part is more like day-to-day ops stuff and ticking through their 10-page upgrade protocol [14:50:18] 10 pages upgrade protocol ? [14:50:59] i hope they use 36 or 72 size fonts [14:58:45] Where can I find puppet config that sets up the apache vhost for commons? I want to see how it does a couple of rewrite tricks. [14:59:34] I guess I'm actually looking for the vhost config for upload.wikimedia.org [15:01:42] bd808: have you looked here? 
http://noc.wikimedia.org/conf/ [15:03:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 15:03:24 UTC 2013 [15:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:03:55] akosiaris: That looks promising. I'll poke around and see if I can find the magic I need. [15:07:37] akosiaris: any luck with kafka? :) [15:08:37] ottomata: just tried building it and got this src/main/scala/kafka/utils/ZkUtils.scala:332: error: value writeDataReturnStat is not a member of org.I0Itec.zkclient.ZkClient [15:08:51] trying to resolve this and move on... [15:10:37] hm, didn't get that [15:10:40] are you building from trunk or tag? [15:10:46] trunk [15:10:47] (03PS1) 10Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/77906 [15:10:51] ok i haven't tried trunk yet [15:10:52] just to see if it will [15:10:53] will try too [15:10:58] and it doesn't :-( [15:11:07] i think i pinned it down though ... gimme a sec [15:12:00] Oooo, I htink I might have just realized why I was getting modified files [15:12:01] ahh [15:12:03] doh [15:12:52] hmm, maybe not [15:13:04] nope, nm no idea [15:13:05] :/ [15:17:42] hm yeah i'm getting that zk error from trunk too [15:26:09] (03CR) 10Mark Bergsma: [C: 04-2] "Unreviewable, because it's a massive, messy commit, changing many conceptually different things." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77330 (owner: 10ArielGlenn) [15:30:00] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:01] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:48] man they are gonna drive me crazy. They downgraded a library version... [15:31:09] ha, zk? [15:31:27] oh yammer metrics? [15:31:31] nome. That one i fixed (upgraded library version). [15:31:35] yeah that one [15:31:53] they went from 3.0 to 2.2.0 [15:32:59] hi somebody say to ask my question here I wonder how much wikipedia/https can be attaqued by this attack https://media.blackhat.com/us-13/US-13-Prado-SSL-Gone-in-30-seconds-A-BREACH-beyond-CRIME-WP.pdf [15:33:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 15:33:42 UTC 2013 [15:34:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:39:10] ottomata: success against trunk. Now... you want to build against 0.8.0-beta1 right ? [15:39:47] yup [15:39:50] i mean, well [15:39:52] what do you think? [15:40:05] that it makes sense ? [15:40:07] we're considering building a .deb for testing in labs, and possibly if kafka is too slow, using in production [15:40:31] should we build from trunk or the beta tag (last commit june 19) [15:40:37] ? [15:40:47] i 'd say beta tag [15:41:05] it is beta1 though. 
So probably not a lot more stable than trunk
[15:41:26] i am perplexed btw from the beta1-candidate trunk
[15:44:10] yeah, probably just some artifact of something someone did, who knows, the tag seems safer, as it has a few more commits, and we know it won't change
[15:44:52] so beta1 is 3 commits after the branch. Which pauses the question of what that tag is doing there but anyway ... tag it is
[15:45:02] that branch* i mean
[15:45:36] god... i don't even make sense to myself... what i am writing ?
[15:52:39] haha
[15:52:50] yes, good q, but i say let's use the branch
[15:53:00] i was trying to make gbp.conf automatically use the branch
[15:53:08] sorry
[15:53:09] tag*
[15:53:11] :p
[15:53:42] anyway, does what I tried make sense?
[15:54:04] I created 'debian-0.8.0-beta1' from 0.8.0-beta1 tag
[15:54:16] then, merged the two debianization commits we made into that branch
[15:54:20] then, in gbp.conf
[15:54:36] debian-branch=debian-0.8.0-beta1
[15:54:37] upstream-tag=%(version)s
[15:54:37] upstream-tree=tag
[15:54:48] then edited changelog for the new version
[15:55:30] ( for now, i'm saying th version is '0.8.0-beta1-1' which I know is incorrect, but am trying it first so it can match the tag)
[15:55:34] akosiaris: ^
[15:57:05] debian-0.8.0-beta1 is what ? a branch or a tag ?
[15:57:16] a branch
[15:59:01] branch that I made
[15:59:03] from the tag
[15:59:10] it could/should be a tag
[15:59:16] but for now I just want to build it
[16:01:33] !log authdns-update to flip ticket.wikimedia.org to williams
[16:01:41] (PS1) Demon: Poor googlebot, spidering git too fast [operations/puppet] - https://gerrit.wikimedia.org/r/77909
[16:01:45] Logged the message, Master
[16:01:47] <^d> Easy puppet change if someone's got a sec ^
[16:02:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:48] googlebot a bad spider ?
[16:03:08] <^d> It's crawling too quickly and gitblit can't handle.
[16:03:15] and bingbot from what i can see
[16:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 16:03:17 UTC 2013
[16:03:22] lol...
[16:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:03:47] <^d> Yeah, I've tried to be nice and let them spider but it's hurting performance for real users.
[16:04:31] well. This means that git.wikimedia.org will not be google indexed anymore though. Right ?
[16:06:44] <^d> I would like to have them index it :\
[16:06:51] <^d> I wonder if we can throttle them somehow then?
[16:09:19] akosiaris: can you push whatever you did to fix the trunk build?
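The gbp.conf keys quoted above tell git-buildpackage which branch carries the packaging and to export the upstream source from a git tag rather than an upstream branch. A plausible complete file for the setup described might look like the following; the `[DEFAULT]` section header is how gbp.conf is normally organized, but this exact file is an illustration, not the repo's actual config:

```ini
# Hypothetical gbp.conf for the branch-from-tag setup described above.
[DEFAULT]
# packaging lives on the branch cut from the 0.8.0-beta1 tag
debian-branch = debian-0.8.0-beta1
# export the upstream source from the tag named after the upstream version
upstream-tag = %(version)s
upstream-tree = tag
```

With `upstream-tree = tag`, git-buildpackage generates the orig tarball from the tag itself, which is one way to avoid the "dpkg-source: error: aborting due to unexpected upstream changes" failure seen earlier when the exported tree and the packaging branch have drifted apart.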
[16:09:56] (03PS1) 10Jgreen: new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 [16:10:44] <^d> Ryan_Lane: Any thoughts on this? [16:10:58] <^d> Googlebot is indexing git.wm.o a bit hard. Totally blocking would be kinda bad :( [16:11:10] RECOVERY - mysqld processes on db1046 is OK: PROCS OK: 1 process with command name mysqld [16:11:15] (03CR) 10Jgreen: [C: 031 V: 031] new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 (owner: 10Jgreen) [16:11:27] (03CR) 10Jgreen: [C: 032] new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 (owner: 10Jgreen) [16:11:59] ^d: not sure what to do about that [16:12:31] I've also been traveling for like 20 something hours, so, yeah :) [16:12:46] <^d> They also don't seem to be respecting robots.txt, which disallows /zip/ [16:12:49] * ^d sighs [16:13:07] no one respects robots.txt, someday our robot overlords will extract revenge. [16:13:21] ^d: block the useragent for that path [16:13:47] should put it behind varnish [16:13:52] that too [16:14:04] mark: are you in HK? [16:14:05] hmm [16:14:07] no [16:14:11] i'm not coming [16:14:15] :( [16:14:24] we need to talk soon, about https ;) [16:14:32] yep [16:14:46] that's one of the reasons I hoped you were here [16:14:51] sorry [16:15:22] looks like the rel=canonical change helped [16:15:31] https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=SSL+cluster+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [16:15:47] I had a feeling that was causing the surge [16:16:14] <^d> mark: So I could just run varnish + gitblit and remove apache from the mix entirely as well, right? 
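One way to implement Ryan's "block the useragent for that path" suggestion above, keeping git.wikimedia.org indexable while fencing crawlers out of the expensive /zip/ URLs, is an Apache 2.2-style fragment in front of gitblit. The agent names and path come from the discussion; the fragment itself is a hypothetical sketch, not the change that was actually merged:

```apache
# Hypothetical Apache 2.2 fragment: let crawlers index normal pages, but
# deny them the /zip/ snapshot URLs that robots.txt already disallows
# (and which some bots were ignoring).
SetEnvIfNoCase User-Agent "Googlebot|bingbot" zip_crawler
<Location /zip/>
    Order allow,deny
    Allow from all
    Deny from env=zip_crawler
</Location>
```

For throttling rather than blocking, `Crawl-delay` in robots.txt is honored by bingbot but ignored by Googlebot, whose rate has to be lowered in Google's webmaster tools instead.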
[16:16:15] I'm surprised how quickly it helped, though [16:16:23] ^d: no [16:16:25] oh [16:16:26] wait [16:16:27] yes [16:16:32] I forgot it runs java [16:16:53] maybe i'll setup 2 old varnish servers as the misc varnish cluster i've been talking about for ages now [16:16:55] <^d> Yeah, 8080 on the local machine, apache's acting as reverse proxy. [16:17:05] mark: that would be awesome [16:17:06] and indeed, varnish can talk to 8080 directly [16:17:14] we can test ssl on it too [16:17:30] mark: I'm somewhat concerned about putting the terminators on the frontends [16:17:35] i heard [16:17:40] based on the cpu util on the ssl cluster [16:17:47] and that's with really efficient ciphers [16:18:00] how many req/s are we doing over ssl now? [16:18:17] alas, no good metrics on it [16:18:45] cpu seems to be pretty 1:1 with bandwidth, though [16:19:01] i also wonder how efficient current ssl session caching is [16:19:10] behind the sh lvs scheduler [16:19:20] should be relatively efficient [16:19:27] should be :) [16:19:35] and the ssl cache can hold far more ssl sessions than necessary [16:21:00] saturating a 1G line eats about 50% of the cpu on the newer boxes [16:21:08] using RC4 [16:21:24] how many cpu cores? [16:21:31] I'll disable cipher preference and use aes to see if that's more or less efficient [16:21:38] 12, with HT enabled [16:21:45] HT helps dramatically [16:21:51] by nearly 2x [16:21:53] in ganglia graphs or for real? ;) [16:21:59] ah. good point [16:22:02] ganglia's graphs [16:22:02] hahahaa [16:22:14] it's worrysome if it DOESNT help 2x in ganglia [16:22:34] true. 2x the number of cores :) [16:23:23] I'm planning on doing some cipher tests this week [16:23:32] psudeo cores ;] [16:23:41] (03CR) 10Ori.livneh: "Thanks! I tested the module itself but not the stuff in manifests/." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/75087 (owner: 10Ori.livneh) [16:24:08] specifically the ECDHE ciphers [16:24:34] and I'll be enabling the GCM ciphers for tls 1.2 [16:24:41] ok [16:24:52] i'll need to sync your nginx config changes to the on-cache ssl terminator config [16:24:57] i used a different template for it [16:25:05] radically simpler ;) [16:25:09] heh [16:25:21] simpler in which way? [16:25:29] well the old one had a ton of logic [16:25:33] we're going to need all the clusters [16:25:35] to go to the correct upstream cache clusters [16:25:43] the new one just redirects to 127.0.0.1 [16:25:46] ah [16:25:46] right [16:25:49] and it doesn't know anything about $::site etc [16:26:09] cool, as long as the vhosts are all still there, that sounds good [16:26:12] what do you mean by "all the clusters"? [16:26:17] yeah it's not, i wasn't sure about that [16:26:17] we'll need the vhosts for SNI [16:26:21] meh [16:26:22] ok [16:26:28] otherwise they aren't necessary right now [16:26:33] it'll work properly as is [16:26:41] the unified cert makes that work [16:26:48] yeah I was hoping it would [16:26:52] that's going to be our fallback cert for browsers that don't support SNI [16:27:38] all of them can still just connect to 127.0.0.1, though :) [16:27:43] yep [16:27:46] so most of the logic is still gone [16:27:52] but I do need to add the vhosts and certs for them [16:28:00] * Ryan_Lane nods [16:28:03] which is a pity, the new config looked so small & clean ;) [16:28:20] well, depending on how expensive the termination is, we may not be able to do it this way anyway [16:28:32] unless we're going to expand the clusters for it [16:28:34] we can always buy more varnish boxes [16:28:43] advantage is that it increases the cache too [16:28:47] although I kinda hate it [16:28:51] yeah, may be cheaper to do that than have a separate cluster [16:28:54] we're optimizing varnish to handle tons of requests per box [16:28:57] and then ssl kills it all again 
;-) [16:29:01] heh [16:29:14] well, if we get SSL boxes with a shit-ton of nice CPUs, it isn't necessary [16:29:19] well, lots of cpus and bonded ports [16:29:29] bonded ports? [16:29:39] i just got rid of bonded ports, 10G all the way [16:29:44] ah. that works too [16:29:52] much better, in fact :) [16:29:58] varnish will easily fill 10G [16:30:42] what's your take on apache or stud vs nginx? [16:30:58] paravoid was doing some apache testing yesterday [16:31:03] yeah [16:31:09] stud and nginx need dev work if we want to keep them [16:31:14] apache may work without much [16:31:21] (03PS1) 10Akosiaris: Updating/dowgrading libraries [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/77913 [16:31:25] I'd like to get a config going for apache and test it live vs nginx [16:31:32] does apache support spdy? [16:31:35] yes and no [16:31:48] google wrote mod_spdy for apache [16:31:53] but it doesn't support 2.4 yet [16:32:07] * Ryan_Lane has that bug on his track list :) [16:32:09] i guess we really don't have a clear winner ;) [16:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 16:32:46 UTC 2013 [16:32:51] he [16:32:52] err [16:32:53] heh [16:32:54] nope [16:33:04] though apache has a number of features we want [16:33:15] we can do proper PFS with it [16:33:30] (when we're actually willing to use PFS) [16:33:38] it has OCSP stapling, distributed cache [16:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:34:12] nginx has ocsp stapling too now [16:34:25] yep [16:34:38] but it relies on openssl for key rotation for PFS [16:34:43] which only occurs when it starts [16:35:02] and it has no distributed cache [16:35:08] yeah [16:35:09] stud has a distributed cache [16:35:16] and I think someone sent in a patch to fix the PFS issues [16:35:17] i like stud as a concept [16:35:21] but it's definitely missing some stuff [16:35:22] but we'd need proxy support [16:35:32] i'd be happy to 
add proxy support to varnish though [16:35:54] I need to read up on its feature support vs nginx and apache [16:36:06] its dist cache works differently [16:36:08] it'll eat more memory [16:36:27] and I'm not totally sure how key rotation would work with it [16:36:57] since if we're switching to wrr, it would need to sync the rotation across the boxes [16:37:10] i'm not sure if we need to switch to wrr though [16:37:11] paravoid and I talked about a way of doing that with apache [16:37:26] you can't weight sh [16:37:43] it has a weight value, but it doesn't really work [16:37:51] http://web.archiveorange.com/archive/v/euQpSpzZDfGI86ZV6tGz [16:37:58] it's sad that I have to google for it ;) [16:38:34] ah. you added something to make it actually work? [16:38:35] i'm not sure where it is now chad moved us off svn ;) [16:38:46] i wrote an lvs scheduler with weighted consistent hashing [16:38:48] a long time ago [16:38:52] 2008 or so [16:38:53] svn is still up read only [16:39:01] i never pushed it upstream [16:39:05] ahhh [16:39:07] but I could fresh it up and deploy on our lvs boxes [16:39:13] that would make things much better [16:39:27] assuming that we wouldn't need a dist cache either [16:39:30] also, _right now_ we don't really need weighting ;) [16:39:47] well, kind of [16:40:19] the eqiad cluster has 4 kind of crappy systems and 5 nice ones [16:40:20] svn is still up but I don't see the code in there [16:41:17] Ryan_Lane: yeah but assuming ssl termination on the caches [16:41:21] and when we ever start replacing varnish systems and the terminators are on them we'd surely want to [16:41:26] agreed [16:41:37] don't need it _right now_, will definitely need it in the future [16:41:41] which is why I wrote that code [16:41:43] hehe [16:41:46] :D [16:42:03] I needed to finish some linux hacking course at university at the time, to get some study points [16:42:07] i decided to write that [16:42:12] got my points, never pushed it upstream [16:42:19] tested it but never 
really used it either [16:43:39] well, depending on which route we go we may or may not need it :) [16:43:51] yeah [16:43:54] having a shared ssl cache with wrr is nice in ways [16:43:55] just saying, it's an option [16:44:11] yeah [16:44:27] distributed ssl cache makes things more complicated, for sure [16:44:32] the code does the same thing as my varnish director that we use [16:44:58] oh, nice. [16:45:28] anyway, yeah, lots of fun work ahead :D [16:48:16] haha [16:48:19] wb to core ops ;-p [16:48:40] wb? [16:48:43] oh [16:48:44] hah [16:49:04] well, I was doing a little here and there with deployment ;) [16:49:08] yeah [16:49:15] but yeah, otherwise it's been all labs all the time [16:49:31] nice to work on something production again :) [16:50:26] i'll work on ipsec too [16:50:30] did a bit of testing already [16:50:46] oh, sweet [16:53:20] (03PS1) 10Jgreen: tweak apache vhost conf for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77914 [16:54:55] (03CR) 10Jgreen: [C: 032 V: 031] tweak apache vhost conf for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77914 (owner: 10Jgreen) [16:56:55] No joy yet in my search to find apache config for upload.wikimedia.org that shows how prod handles the thumb rewrites. [16:57:02] I've looked in operations/apache-config, operations/mediawiki-config and operations/puppet [16:57:49] I've tried following the instructions at https://www.mediawiki.org/wiki/Manual:Thumb.php directly but I'm doing something wrong. 
[17:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 17:03:11 UTC 2013 [17:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 17:32:41 UTC 2013 [17:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:55:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [18:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 18:02:38 UTC 2013 [18:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:23:40] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [18:26:51] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 18:32:37 UTC 2013 [18:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:49:24] heya paravoid [18:49:31] can you tell how alex created this patch? [18:49:32] https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/logging_to_var_log.patch [18:49:40] i'm trying git diff, quilt, etc. [18:49:53] so far no luck, my patches keep getting rejected [18:50:03] uhm, i don't know how *he* created it :) [18:50:05] by dpkg0source [18:50:11] what do you mean rejected? [18:50:24] error exit status 1 [18:50:24] patching file config/log4j.properties [18:50:25] Hunk #1 FAILED at 12. 
[18:50:32] so, it doesn't apply
[18:50:34] just while building
[18:50:37] right right
[18:50:54] i know, but i recreated the patch directly by editing the new original and then taking the diff
[18:51:04] i think i'm just generating the diff in a way that dpkg-source doesn't like
[18:51:32] when i've done this before I've done it with quilt, but this seems to be a different format? not sure
[18:51:43] it doesn't really care about the format much
[18:51:45] it just uses patch
[18:51:57] as in /usr/bin/patch
[18:51:57] hm
[18:52:18] but quilt is a nice way to test without building
[18:52:24] just try "quilt push -a"
[18:52:27] oh hm
[18:52:27] ok
[18:55:29] ottomata: On a totally unrelated note: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Caveats says "There is currently no easy way to rollback [...]" Could you add what manual steps are needed?
[18:55:47] (Perhaps in the FAQ.)
[18:56:00] hmmm
[18:56:22] there isn't really a certain set of steps to role back
[18:56:34] you could point the puppet agent back at the main labs puppetmaster
[18:56:42] but any changes that your puppetmaster applied would no longer be tracked
[18:56:50] which is what that caution is refering to
[18:57:10] it would be nice to be able to automatimcally switch back to the main labs puppetmaster
[18:57:12] hmmm
[18:57:17] is that all you want to do scfc_de?
[18:59:09] (PS1) Jgreen: otrs PostMaster.pl path changed [operations/puppet] - https://gerrit.wikimedia.org/r/77922
[18:59:46] ottomata: Background: Sometimes I want to test a Puppet patch on a live (Labs) system. So I'm looking for a checklist on what to do to reset the instance to the "pull from central puppetmaster" state so that I don't miss anything. Does unsetting role::puppet::self suffice? I'm not concerned with extra packages.
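The "quilt push -a" suggestion above is the quickest way to see which hunk fails without running a full build. A typical refresh cycle inside a Debian source tree looks roughly like this command sketch (assuming the `3.0 (quilt)` source format, where patches live under debian/patches):

```
$ export QUILT_PATCHES=debian/patches
$ quilt push -a      # stops at the first patch whose hunks no longer apply
$ quilt push -f      # force-apply it, leaving *.rej files to inspect
$ # ...edit the files by hand to resolve the rejected hunks...
$ quilt refresh      # rewrite the patch against the current tree
$ quilt pop -a       # unapply everything; dpkg-source re-applies at build time
```

A refreshed patch regenerated this way is in the plain format `/usr/bin/patch` expects, which sidesteps hand-edited diffs that dpkg-source rejects.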
[19:00:02] hashar and I tried to get a patch working for that once and failed
[19:00:10] it's still around in gerrit somewhere with his name on it
[19:00:20] ottomata:
[19:00:59] (CR) Jgreen: [C: 2 V: 1] otrs PostMaster.pl path changed [operations/puppet] - https://gerrit.wikimedia.org/r/77922 (owner: Jgreen)
[19:01:04] apergos: BTW, /public/datasets/public seems to have taken a summer vacation, last update Jun 23 06:14..
[19:01:26] apergos: Is that a failure on Labs' side?
[19:02:31] apergos, oh cool
[19:02:33] scfc_de, i see
[19:02:40] no, unchecking role::puppet::self won't do it
[19:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 19:02:36 UTC 2013
[19:02:42] hm
[19:02:51] but there should be a way to automate reversing that
[19:03:02] all you *really* have to do is fix puppet.conf
[19:03:12] so that it looks like it does on normal labs instances
[19:03:14] /etc/puppet.conf
[19:03:25] there might be more to it than that (certs?) but I betcha it will do
[19:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[19:03:40] i can probably make it so that unchecking role::puppet::self would do that
[19:03:52] ottomata: I'll try it on an instance and update Help:Self... with my findings.
[19:04:02] ok cool
[19:04:07] and i'll add that to my todo list
[19:04:10] ottomata: A checklist-type thingy would be enough, I believe.
[19:04:13] yeah
[19:04:16] so
[19:04:16] also
[19:04:25] if you want to keep your self hosted puppet
[19:04:33] but still get latest production puppet changes
[19:04:44] you can just sudo git pull in /var/lib/git/operations/puppet
[19:06:01] Yep, but then you would have to cron that, etc. The simpler, the better :-).
[19:06:48] thanks paravoid, I got it, quilt push -a helped me figure that out
[19:07:26] cool
[19:08:02] scfc_de: that's on my side, I'm busy moving stuff around
[19:08:11] when the move looks stable I'll restart that
[19:09:20] apergos: Okay, thanks.
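The "fix puppet.conf" rollback ottomata describes boils down to pointing the agent's server setting back at the central labs puppetmaster. A sketch of the mechanics on a scratch copy; the hostnames and the exact file layout are assumptions here, not the real Labs values (on a real instance you'd edit the puppet.conf the agent actually reads, and possibly clean up certs too, per the "(certs?)" caveat above):

```shell
# Demonstrate the one-line config change on a throwaway copy of puppet.conf.
# Both server names below are hypothetical.
demo=/tmp/puppet-demo; rm -rf "$demo"; mkdir -p "$demo"
cat > "$demo/puppet.conf" <<'EOF'
[agent]
server = self-hosted-instance.pmtpa.wmflabs
EOF
# Point the agent back at the (hypothetical) central labs puppetmaster.
sed -i 's/^server = .*/server = central-puppetmaster.example.wmflabs/' "$demo/puppet.conf"
```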
[19:09:58] apergos: Just want to avoid people starting to pull from dumps.wikimedia.org.
[19:12:36] yep
[19:22:39] (CR) Ottomata: [C: 2 V: 2] Updating/dowgrading libraries [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/77913 (owner: Akosiaris)
[19:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 19:32:42 UTC 2013
[19:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[19:45:28] (PS1) Jgreen: fix shell for otrs user [operations/puppet] - https://gerrit.wikimedia.org/r/77970
[19:47:21] (CR) Jgreen: [C: 2 V: 1] fix shell for otrs user [operations/puppet] - https://gerrit.wikimedia.org/r/77970 (owner: Jgreen)
[19:59:00] PROBLEM - search indices - check lucene status page on search19 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 60051 bytes in 0.112 second response time
[20:00:14] (PS1) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:00:31] (CR) jenkins-bot: [V: -1] Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971 (owner: Ottomata)
[20:01:45] (PS2) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:02:03] (CR) jenkins-bot: [V: -1] Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971 (owner: Ottomata)
[20:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 20:02:46 UTC 2013
[20:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[20:20:10] (PS3) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:24:16] (PS4) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:30:51] yo dudes, how does one get a wikitech
account?
[20:30:53] toby negrin is asking
[20:30:57] it's ldap, right?
[20:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 20:32:45 UTC 2013
[20:33:28] ori-l: do you know? ^^
[20:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[20:35:17] it is
[20:35:43] just ldap
[20:36:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours
[20:36:01] well
[20:36:20] by that I mean it's stored in ldap, you don't have to use ldap to create the account
[20:36:28] I think you can just register on wikitech nowadays
[20:37:00] the fact that it's ldap prevents it from being central login so it needs a separate registration, but it's still self-served
[20:37:47] (PS1) Ottomata: Updating .properties files with recent 0.8 branch zookeeper property name change. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77973
[20:38:14] paravoid: so if tnegrin has an ldap account, he should be able to log in, right?
[20:38:51] (CR) Ottomata: [C: 2 V: 2] Updating .properties files with recent 0.8 branch zookeeper property name change. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77973 (owner: Ottomata)
[20:39:29] (PS5) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:41:51] yes
[20:42:32] are you deploying kafka 0.8?
[20:42:42] in labs
[20:43:03] cool
[20:43:09] anything I can do?
[20:43:18] hmmmmm dunno!
[20:43:31] i kinda messed up the kafka debian branch this morning
[20:43:33] it's not super messed up
[20:43:38] just hard to build with the beta tag the way I did it
[20:43:44] heh
[20:43:48] i had to create a new debian branch and cherry pick the debianization changes
[20:44:02] it'll be fine once we can build from anything later than today
[20:44:10] (i rebased the debian branch :/ )
[20:44:11] fine with me
[20:44:36] if subsequent versions are okay, I don't think we need to do anything now
[20:44:38] but since the tag was older, i couldn't merge the debianization commits because I had rebased them on top of newer trunk
[20:44:41] yeah it'll be fine
[20:44:48] i just have a local branch that I used to build my test beta .deb
[20:44:51] no need to put it in apt
[20:44:51] cool
[20:44:53] i think it's working
[20:44:59] i just got it to work with puppet
[20:45:00] what's the current version?
[20:45:17] beta1?
[20:45:30] Version: 0.8.0~beta1-1
[20:45:39] cool
[20:47:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[20:47:32] i couldn't get the labsdebrepo thing to work
[20:47:34] i had to dpkg -i manually
[21:00:30] (PS6) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[21:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 21:03:05 UTC 2013
[21:03:17] coooool, working great in labs
[21:03:34] paravoid, if you want to play, my next step is to start playing with camus
[21:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[21:03:48] i'm going to sign off for the day now
[21:04:01] you're welcome to use the kafka + hadoop setup in labs to play with whatever you want
[21:04:20] instances: kraken-namenode and kraken-kafka are probably the most relevant ones
[21:04:33] nah, I have my hands more than full, I was basically asking if there's anything to brainstorm or
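The recovery ottomata describes above ("create a new debian branch and cherry pick the debianization changes" after a rebase left the packaging commits unmergeable against the older tag) can be sketched in a throwaway repo. Repo layout, tag name, and commit messages are all invented for the demo:

```shell
# Replay packaging commits onto a branch rooted at the release tag after the
# original debian branch was rebased onto newer trunk.
set -e
demo=/tmp/git-demo; rm -rf "$demo"; mkdir "$demo"; cd "$demo"
git init -q .
git -c user.email=d@e -c user.name=demo commit -q --allow-empty -m "upstream: 0.8.0 beta1"
git tag 0.8.0-beta1                       # the tag we actually want to build from
echo 'new upstream work' > trunk.txt      # trunk moves past the tag
git add trunk.txt
git -c user.email=d@e -c user.name=demo commit -q -m "newer trunk"
echo 'kafka (0.8.0~beta1-1) unstable' > changelog
git add changelog
git -c user.email=d@e -c user.name=demo commit -q -m "debianization"
deb=$(git rev-parse HEAD)
# Fresh debian branch rooted at the tag, with the packaging commit replayed.
git checkout -q -b debian-new 0.8.0-beta1
git -c user.email=d@e -c user.name=demo cherry-pick "$deb" >/dev/null
```

The new branch ends up with the tag's tree plus only the debianization change, which is why the test beta .deb could be built without the newer-trunk commits.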
in need of a review
[21:06:10] ah ok
[21:06:11] cool
[21:06:21] naw think i'm good for now, danke
[21:06:38] cool
[21:06:46] ooo, actually
[21:06:50] here's a quick one
[21:07:09] what's your plan for putting it in prod?
[21:07:27] I'm asking for varnishkafka, it's not strictly needed initially but it might help with perf testing
[21:07:43] ah! first my q:
[21:07:44] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/zookeeper.pp#L23
[21:07:54] can you think of a good way to set a global variable via labsconsole to get that config?
[21:07:59] i'm doing the same for kafka
[21:08:04] its not just a list of hostnames anymore
[21:08:09] which is easy enough to do via the form
[21:08:12] but a hash?
[21:08:19] (PS1) BryanDavis: Add ganglia monitoring for vhtcpd. [operations/puppet] - https://gerrit.wikimedia.org/r/77975
[21:08:34] aw crap :)
[21:08:43] (PS1) Jgreen: enable Apache2::Reload for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77976
[21:08:58] dunno, I don't think it's possible
[21:09:16] yeah, don't think so either
[21:09:24] ok your question:
[21:09:24] dunno
[21:09:28] bd808: first commit?
[21:09:31] since we are now repaving everything (yay)
[21:09:39] paravoid: pretty close
[21:09:47] :)
[21:09:48] its pretty easy to do whatever we want from scratch
[21:09:50] re varnishkafka
[21:09:57] i'd love to get it installed on the mobiles before anywhere else
[21:09:59] bd808: congrats?
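On the hash-via-labsconsole question above: the form only stores flat strings, so one workaround (purely our speculation, not what was actually deployed) is to encode the zookeeper host-to-id hash as a delimited string and split it where it's consumed. The hostnames, ids, and the 2888:3888 quorum-port convention below are illustrative:

```shell
# Flatten a {host => id} hash into "host:id,host:id,..." and expand it back
# into zookeeper-style server lines. All values here are hypothetical.
zookeeper_hosts='zk1.eqiad.wmflabs:1,zk2.eqiad.wmflabs:2,zk3.eqiad.wmflabs:3'
out=/tmp/zk-demo.properties; : > "$out"
IFS=','
for pair in $zookeeper_hosts; do
  host=${pair%%:*}   # text before the first colon
  id=${pair##*:}     # text after the last colon
  printf 'server.%s=%s:2888:3888\n' "$id" "$host" >> "$out"
done
unset IFS
```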
:)
[21:10:03] I did a few trivial commits for vagrant earlier today
[21:10:22] (CR) Jgreen: [C: 2 V: 1] enable Apache2::Reload for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77976 (owner: Jgreen)
[21:10:33] ottomata: yeah, the question is, what will these varnishkafka talk with
[21:10:35] i want to set varnishkafka up in labs too
[21:10:41] have a varnish instance there that produces to kafka
[21:10:44] will be real nice :)
[21:10:52] working on that this week
[21:10:58] ok, i gotta run
[21:11:00] see ya!
[21:11:35] bye!
[21:11:35] oh paravoid, when we repave
[21:11:44] we'll have analytics1021 and 1022 running 0.8
[21:11:49] perfect
[21:12:02] k cool, laaatas
[21:13:06] bd808: huh, I like your github page
[21:13:18] contributed to kibana I see?
[21:13:41] I'd love to hear your thoughts on it, I've been playing with the idea of logstash/kibana for us
[21:14:22] paravoid: I liked kibana ok. Most of my patches were when it was still a php app
[21:14:35] have you seen kibana three?
[21:14:52] no. I'll go check it out
[21:14:53] they switched from php to ruby (that's two)
[21:15:00] I'm a big fan of logstash
[21:15:01] and three is purely frontend
[21:15:13] just javascript, in browser
[21:15:15] and elasticsearch in general
[21:15:21] cool :)
[21:15:32] ( http://three.kibana.org/ )
[21:15:37] you know about the ES work, right?
[21:15:47] for search
[21:16:07] so yeah, kibana three just talks with ES
[21:16:10] aiui
[21:16:41] I've been stalking the ES for prod work a little. Talked to robla and erik some about it in my interviews
[21:16:49] perfect :)
[21:17:12] what have you used logstash for?
[21:17:29] developer vm logs at my last job.
[21:17:46] was trying to get all beta/sandbox logs into it but ran into ops bottlenecks
[21:18:00] basically we didn't have disk in our environment for it
[21:18:06] heh
[21:18:25] they were "working on that" for 6 months
[21:18:51] we had something like the wm vagrant setup for devs
[21:18:58] I put kibana on that
[21:20:06] cool
[21:20:09] the good old days when I felt like I knew what I was doing. :)
[21:20:30] now I'm just another FNG n00b slowing people down with lots of dumb questions
[21:20:41] yeah, the idea was to run^Wtest logstash/kibana3 for system logs
[21:20:57] but maybe MW exception logs and such would be a nice target too
[21:21:12] do they have saved searches yet?
[21:21:30] dunno, I haven't played with it almost at all :)
[21:22:15] so I get you have a lot of experience with ES?
[21:22:41] some.
[21:22:51] heheh
[21:22:56] trick question wasn't it
[21:23:23] I'm no expert. Pretty good with the schema config and writing queries
[21:23:34] tunig was a mystery that I was working on
[21:23:38] *tuning
[21:24:05] our prod cluster was a dog, but mostly because the use case changed completely right after I built it
[21:24:50] saved queries should be pretty easy to put into an all client side kibana. I hacked it into the ruby version
[21:25:01] nod
[21:25:44] My logstash work was related to trying to replace splunk
[21:26:01] oh that's interesting
[21:26:01] it's a slick product, but too much $$$
[21:26:02] how come
[21:26:05] ah
[21:26:29] they license by the log volume
[21:26:46] it's out of the question for us anyway, but I do wonder about pros/cons
[21:27:08] splunk is slick.
nice gui and some neat backend features
[21:27:30] graylog2 was the other thing I was looking into
[21:27:39] I had a look at my previous job and discarded it because of mongodb back then
[21:27:45] but nowadays it seems it has moved to ES as well
[21:27:54] and I've read people are combining it with logstash
[21:28:29] I think it was still mongo the last time I looked at it
[21:28:51] logstash seems like a pretty good answer to the "how to parse logs" problem
[21:29:09] but it needs a frontend that does good things
[21:29:19] kibana has been working on that for sure
[21:29:32] yeah I've been wondering why logstash/kibana never merged
[21:29:43] I mean when it was php it did make some sense
[21:29:50] but with kibana going ruby and now pure frontend
[21:30:15] I think the ruby switch was an attempt to get together
[21:31:33] the first code that Rashid committed was pretty obviously a hack he had whipped up quickly
[21:31:46] his later stuff is much more polished
[21:32:59] damn, if the day had 48h
[21:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 21:32:52 UTC 2013
[21:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[21:34:24] oh, cool. Kibana 3 is in the ElasticSearch github org account. I wonder if Shay hired him?
[21:34:39] yeah I saw that
[21:35:16] it's kinda weird that it's a separate subdomain under kibana.org and if you go to www it says nothing about it
[21:35:23] and says "Now in Ruby", ironically
[21:35:33] * bd808 likes talking about people he's never met like they are old friends
[21:35:53] haha :)
[21:37:57] (CR) BryanDavis: "(3 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[21:38:59] Any idea who I should add to my puppet/ganglia review? I've put ori-l and bblack so far
[21:39:53] yeah bblack is fine :)
[21:45:23] paravoid: I'd love to help with a logstash project if you need coder fingers and/or somebody to talk at.
That sounds like something I can actively contribute to without being 10+ years behind in reading the codebase. :)
[21:49:54] bd808: thanks :) I'm not expecting to have the time to work on it very soon
[21:50:48] such is life. Fun projects get buried under official tasks
[21:52:10] (PS1) Jgreen: make exim otrs choose db's within site [operations/puppet] - https://gerrit.wikimedia.org/r/77977
[21:53:14] (CR) Jgreen: [C: 2 V: 1] make exim otrs choose db's within site [operations/puppet] - https://gerrit.wikimedia.org/r/77977 (owner: Jgreen)
[21:53:21] bd808: your interest noted in https://wikitech.wikimedia.org/wiki/Projects
[21:54:17] cool.
[21:55:31] I've done some neat stuff in the past for summarizing log feeds too. Python scripts with regex to count "common" errors and highlight new problems. Could be sort of related
[21:56:25] (PS1) Jgreen: fix typo [operations/puppet] - https://gerrit.wikimedia.org/r/77978
[21:57:22] (CR) Jgreen: [C: 2 V: 1] fix typo [operations/puppet] - https://gerrit.wikimedia.org/r/77978 (owner: Jgreen)
[21:57:27] really old version at https://code.google.com/p/casadebender/source/browse/python/logscan/logscan.py
[22:04:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 22:04:07 UTC 2013
[22:04:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[22:30:16] (CR) Ori.livneh: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[22:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 22:32:39 UTC 2013
[22:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[22:43:16] (PS2) BryanDavis: Add ganglia monitoring for vhtcpd. [operations/puppet] - https://gerrit.wikimedia.org/r/77975
[22:54:14] (CR) BryanDavis: "Yuck. Is `commit --append` really the right workflow for updating a review?"
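The log-summarizing idea bd808 mentions (count "common" errors so new problems stand out; his logscan.py did this in Python with regexes) can be approximated in a few lines of shell. The log lines below are invented for the demo:

```shell
# Toy log summarizer: normalize variable parts (numbers) so variants of the
# same error bucket together, then count buckets. Recurring errors sort to
# the top; anything with a count of 1 is a candidate "new problem".
log=/tmp/logscan-demo.log
cat > "$log" <<'EOF'
ERROR db timeout on shard 3
ERROR db timeout on shard 7
WARN slow query 1234ms
ERROR db timeout on shard 3
EOF
# Replace digit runs with N, keep only ERROR lines, count distinct patterns.
sed -n 's/[0-9]\+/N/g; /^ERROR/p' "$log" | sort | uniq -c | sort -rn > /tmp/logscan-demo.summary
```

Here all three db-timeout lines collapse into one bucket, "ERROR db timeout on shard N", with a count of 3.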
[operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[22:54:45] append?
[22:54:47] amend you mean?
[23:02:40] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[23:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 23:02:36 UTC 2013
[23:03:30] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.98 ms
[23:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:06:00] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused
[23:06:37] paravoid: yes, --amend
[23:07:00] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.250 second response time
[23:12:50] (CR) MZMcBride: "I'm not totally sure what you're asking, but I think the answer is yes. I've found " [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:15:30] (CR) BryanDavis: "@MZMcBride That's what I followed. I was just reacting viscerally to the loss of change history implied. I'm not a fan of squashed history" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:18:07] (CR) BryanDavis: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 23:32:39 UTC 2013
[23:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:37:46] bd808: There's always GitHub. :-)
[23:41:59] Elsie: is that a valid option? I've been cajoled for liking non-FOSS products on more than one occasion already.
[23:55:00] PROBLEM - DPKG on db1048 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:59:17] http://i.imgur.com/3CS31.jpg *waddles off*
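For the `--amend` review workflow debated above: Gerrit tracks a change by its Change-Id footer, so an updated patch set is the same commit amended rather than a new commit on top (hence bd808's "loss of change history" reaction; Gerrit keeps the old patch sets server-side). A sketch of just the local amend mechanics in a throwaway repo; the file name and messages are invented, and the push-to-Gerrit (git-review) step is omitted:

```shell
# Patch set 1, then patch set 2 as an amend of the same commit.
set -e
demo=/tmp/amend-demo; rm -rf "$demo"; mkdir "$demo"; cd "$demo"
git init -q .
echo v1 > monitor.py
git add monitor.py
git -c user.email=d@e -c user.name=demo commit -q -m "Add ganglia monitoring for vhtcpd."
echo v2 > monitor.py
git add monitor.py
# Same commit (and, on a real Gerrit change, same Change-Id), new content.
git -c user.email=d@e -c user.name=demo commit -q --amend --no-edit
```

After the amend there is still exactly one commit, now containing the revised file; `commit --append` is not a git subcommand.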