[00:13:49] (PS1) Jgreen: exim.conf template cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/77845
[00:14:37] (CR) Jgreen: [C: 2 V: 1] exim.conf template cleanup [operations/puppet] - https://gerrit.wikimedia.org/r/77845 (owner: Jgreen)
[00:19:59] (PS1) Jgreen: disable ssmtp for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77846
[00:20:36] (CR) Jgreen: [C: 2 V: 1] disable ssmtp for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77846 (owner: Jgreen)
[01:06:10] (PS2) Pyoungmeister: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366
[01:06:52] (CR) jenkins-bot: [V: -1] WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366 (owner: Pyoungmeister)
[01:20:01] (PS3) Pyoungmeister: WORK IN PROGRESS: check graphite data from nagios [operations/puppet] - https://gerrit.wikimedia.org/r/77366
[02:06:34] !log LocalisationUpdate completed (1.22wmf12) at Tue Aug 6 02:06:34 UTC 2013
[02:06:52] Logged the message, Master
[02:16:18] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 6 02:16:18 UTC 2013
[02:16:29] Logged the message, Master
[02:37:02] (PS1) Tim Landscheidt: Tools: Enable Ganglia monitoring for Exim [operations/puppet] - https://gerrit.wikimedia.org/r/77848
[02:49:02] (CR) Tim Landscheidt: "Tested on toolsbeta."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77848 (owner: 10Tim Landscheidt) [06:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 06:32:47 UTC 2013 [06:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 07:02:33 UTC 2013 [07:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:03:50] PROBLEM - Disk space on analytics1025 is CRITICAL: DISK CRITICAL - free space: / 1046 MB (3% inode=90%): [07:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 07:32:43 UTC 2013 [07:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [07:36:09] (03PS1) 10ArielGlenn: temporarily turn off rsync of dumps between dump servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/77861 [07:38:00] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:38:50] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [07:40:30] (03CR) 10ArielGlenn: [C: 032] temporarily turn off rsync of dumps between dump servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/77861 (owner: 10ArielGlenn) [07:54:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [08:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 08:02:44 UTC 2013 [08:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [08:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 08:32:38 UTC 2013 [08:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [09:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 09:03:08 UTC 2013 [09:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [09:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 09:32:35 UTC 2013 [09:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 10:03:08 UTC 2013 [10:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:15:36] !log stop apache on williams.wikimedia.org [10:15:41] !log stop exim on williams.wikimedia.org [10:15:48] Logged the message, Master [10:15:59] Logged the message, Master [10:20:59] !log db1046 mysql replication stopped [10:21:09] Logged the message, Master [10:33:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 10:33:13 UTC 2013 [10:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [10:35:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours [10:36:08] !log 
stopped icinga notifications for db1048, db1046, db48, db49 [10:36:19] Logged the message, Master [10:46:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours [11:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 11:02:39 UTC 2013 [11:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [11:29:05] (03CR) 10Mark Bergsma: [C: 032 V: 032] varnish (3.0.3plus~rc1-wm14) precise; urgency=low [operations/debs/varnish] (testing/3.0.3plus-rc1) - 10https://gerrit.wikimedia.org/r/74354 (owner: 10Mark Bergsma) [11:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 11:32:43 UTC 2013 [11:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [11:36:17] (03CR) 10Mark Bergsma: [C: 031] Add an authdns module & associated role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/74119 (owner: 10Faidon) [11:36:28] oh I'm not done [11:36:34] I haven't addressed your changes [11:36:38] er, your first comments [11:36:42] plus a few of my own [11:36:48] I just pushed an intermediate commit at some point [11:37:13] I haven't done the "take an ipaddress array" or the --review option that I have been testing [11:37:23] mark: ^ [11:37:33] i am aware [11:38:25] why not just put the ip addresses in the role classes btw? [11:43:10] PROBLEM - mysqld processes on db1046 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:46:36] yeah I was thinking of doing that too [11:48:08] ipv6 maps there too, or else it'll get too complicated [11:48:42] https://bugzilla.wikimedia.org/show_bug.cgi?id=52531 looks like not all caches remove files on delete, is there some way to force a cleanup? 
[12:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 12:02:37 UTC 2013
[12:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 12:32:48 UTC 2013
[12:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:50:48] (CR) Ori.livneh: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[12:51:19] (PS9) Ori.livneh: Clean up sysctl parameters. [operations/puppet] - https://gerrit.wikimedia.org/r/75087
[13:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 13:02:39 UTC 2013
[13:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 13:32:48 UTC 2013
[13:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:40:25] yo paravoid, i'm looking at building a kafka 0.8 beta deb
[13:40:29] what did you say we should call the package?
[13:40:47] the package name would be the same
[13:40:58] the version would be something like 0.8~beta1-1
[13:41:13] (0.8~beta1 being the upstream version, 1 the Debian revision)
[13:41:27] ~ means "less than", i.e. 0.8 > 0.8~foo
[13:43:28] oh huh, ok
[13:43:41] makes sense?
[13:43:46] yeah, right now we are building from trunk
[13:43:55] should I see if I can find the last RC?
[13:45:45] that'd be best, yes
[13:45:57] if you need trunk, then 0.8~20130806 would work too
[13:46:12] okay bbl, very late lunch/very early dinner time
[13:48:06] ok, thanks
[13:50:48] hm, the beta1-candidate1 branch's last commit was june
[13:50:52] oh well, ok will try that first i guess
[13:55:29] (CR) Andrew Bogott: [C: 1] "I will merge this when I have an hour or two to babysit. Ori, can you confirm that you've tested all of the paths on labs? (If not, that" [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[14:03:19] (CR) Andrew Bogott: [C: 1] "This looks good to me -- would be good to get thoughts from someone else who actually knows about nrpe :)" [operations/puppet] - https://gerrit.wikimedia.org/r/77720 (owner: Akosiaris)
[14:03:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 14:03:19 UTC 2013
[14:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:04:07] (CR) Andrew Bogott: "I'm happy to merge this but, yeah, maybe some comments about usage would be good first." [operations/puppet] - https://gerrit.wikimedia.org/r/77090 (owner: Hashar)
[14:21:54] yoo akosiaris, you around?
[14:22:07] trying to do some kafka building and having trouble
[14:22:07] ottomata: yeah
[14:22:37] so, there is a '0.8.0-beta1' tag
[14:22:44] i want to try to build from that
[14:22:58] actually, first, let me tell you what I already did
[14:23:12] since we were building from trunk before, (and before I realized I wanted to build from the tag)
[14:24:14] i went ahead and merged in recent changes from upstream trunk
[14:24:14] to both our trunk, and into our debian branch
[14:24:14] so, that's that.
[14:24:14] so, i figured if I want to build the tag for now
[14:24:14] i'd just create a new 'debian-branch' from the tag, and merge in the debianization commits we made
[14:24:15] so, i've done that
[14:24:40] now i'm trying to build using this new debian-branch, using the upstream-tag
[14:24:55] but am getting dpkg-source: error: aborting due to unexpected upstream changes
[14:25:11] i'm looking at the diff, and i dunno where all of these changes are coming from
[14:27:31] Ok let's have a look. Have you pushed the changes to https://gerrit.wikimedia.org/r/operations/debs/kafka or should i pull from the official kafka repo?
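The `~` version ordering paravoid explains above ("~" sorts before anything, even the empty string, so 0.8~beta1 < 0.8 and 0.8~20130806 < 0.8) can be illustrated with a toy comparator. This is a simplified sketch of the rule, not dpkg's full algorithm (no epochs, no Debian-revision splitting):

```python
# Toy illustration of Debian's "~" version ordering: "~" sorts before
# everything, including the end of the string, so 0.8~beta1 < 0.8.
# Simplified sketch only -- NOT a reimplementation of dpkg.
def deb_cmp(a: str, b: str) -> int:
    def char_key(c: str) -> int:
        if c == "~":
            return -1            # tilde sorts before everything, even ""
        if c.isalpha():
            return ord(c)        # letters sort before other symbols
        return ord(c) + 256

    i = j = 0
    while i < len(a) or j < len(b):
        # compare the current run of non-digit characters
        while (i < len(a) and not a[i].isdigit()) or (
            j < len(b) and not b[j].isdigit()
        ):
            ka = char_key(a[i]) if i < len(a) and not a[i].isdigit() else 0
            kb = char_key(b[j]) if j < len(b) and not b[j].isdigit() else 0
            if ka != kb:
                return -1 if ka < kb else 1
            if i < len(a) and not a[i].isdigit():
                i += 1
            if j < len(b) and not b[j].isdigit():
                j += 1
        # compare the current run of digits numerically
        na = nb = 0
        while i < len(a) and a[i].isdigit():
            na, i = na * 10 + int(a[i]), i + 1
        while j < len(b) and b[j].isdigit():
            nb, j = nb * 10 + int(b[j]), j + 1
        if na != nb:
            return -1 if na < nb else 1
    return 0

print(deb_cmp("0.8~beta1", "0.8"))   # -1: the beta sorts earlier
print(deb_cmp("0.8", "0.8~foo"))     #  1: matches "0.8 > 0.8~foo" above
```

The practical upshot, as in the chat: versioning the package `0.8~beta1-1` lets the eventual `0.8-1` release supersede it through a normal upgrade.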
[14:27:43] i haven't pushed the most recent ones about the tag [14:27:46] i only pushed the trunk merges [14:28:08] i'll push the tag [14:28:10] to gerrit [14:28:12] then you can try too [14:28:39] ok, pushed [14:28:40] they also have a branch 0.8.0-beta1-candidate1 [14:28:48] and a tag ? wtf? [14:28:51] yeah, i'm not sure which to use, i figured the tag since it was a tag [14:29:18] ok i got the tag too [14:29:19] the tag has a few more recent commits [14:30:55] oh btw, our log4j patch has conflicts, so I just ignored that for now [14:31:01] commented it out in series [14:31:08] once we get it building on the tag we can fix the patch [14:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 14:32:39 UTC 2013 [14:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [14:34:29] OTRS is still down? [14:43:04] Bsadowski1: yes. we're generally on track with the upgrade [14:43:30] that's so sweet to hear, so sweet [14:43:55] i really hope everyone loves the new version :-P [14:44:34] Jeff_Green: if Philippe is right that they were originally asked 1 M$ for upgrading OTRS (!), are you our million-dollar-man? [14:45:02] ha. no I really don't deserve that kind of credit. they've done all the hard work [14:46:17] my part is more like day-to-day ops stuff and ticking through their 10-page upgrade protocol [14:50:18] 10 pages upgrade protocol ? [14:50:59] i hope they use 36 or 72 size fonts [14:58:45] Where can I find puppet config that sets up the apache vhost for commons? I want to see how it does a couple of rewrite tricks. [14:59:34] I guess I'm actually looking for the vhost config for upload.wikimedia.org [15:01:42] bd808: have you looked here? 
http://noc.wikimedia.org/conf/ [15:03:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 15:03:24 UTC 2013 [15:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:03:55] akosiaris: That looks promising. I'll poke around and see if I can find the magic I need. [15:07:37] akosiaris: any luck with kafka? :) [15:08:37] ottomata: just tried building it and got this src/main/scala/kafka/utils/ZkUtils.scala:332: error: value writeDataReturnStat is not a member of org.I0Itec.zkclient.ZkClient [15:08:51] trying to resolve this and move on... [15:10:37] hm, didn't get that [15:10:40] are you building from trunk or tag? [15:10:46] trunk [15:10:47] (03PS1) 10Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - 10https://gerrit.wikimedia.org/r/77906 [15:10:51] ok i haven't tried trunk yet [15:10:52] just to see if it will [15:10:53] will try too [15:10:58] and it doesn't :-( [15:11:07] i think i pinned it down though ... gimme a sec [15:12:00] Oooo, I htink I might have just realized why I was getting modified files [15:12:01] ahh [15:12:03] doh [15:12:52] hmm, maybe not [15:13:04] nope, nm no idea [15:13:05] :/ [15:17:42] hm yeah i'm getting that zk error from trunk too [15:26:09] (03CR) 10Mark Bergsma: [C: 04-2] "Unreviewable, because it's a massive, messy commit, changing many conceptually different things." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/77330 (owner: 10ArielGlenn) [15:30:00] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:00] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:01] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:30:48] man they are gonna drive me crazy. They downgraded a library version... [15:31:09] ha, zk? [15:31:27] oh yammer metrics? [15:31:31] nome. That one i fixed (upgraded library version). [15:31:35] yeah that one [15:31:53] they went from 3.0 to 2.2.0 [15:32:59] hi somebody say to ask my question here I wonder how much wikipedia/https can be attaqued by this attack https://media.blackhat.com/us-13/US-13-Prado-SSL-Gone-in-30-seconds-A-BREACH-beyond-CRIME-WP.pdf [15:33:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 15:33:42 UTC 2013 [15:34:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [15:39:10] ottomata: success against trunk. Now... you want to build against 0.8.0-beta1 right ? [15:39:47] yup [15:39:50] i mean, well [15:39:52] what do you think? [15:40:05] that it makes sense ? [15:40:07] we're considering building a .deb for testing in labs, and possibly if kafka is too slow, using in production [15:40:31] should we build from trunk or the beta tag (last commit june 19) [15:40:37] ? [15:40:47] i 'd say beta tag [15:41:05] it is beta1 though. 
So probably not a lot more stable than trunk
[15:41:26] i am perplexed btw from the beta1-candidate trunk
[15:44:10] yeah, probably just some artifact of something someone did, who knows, the tag seems safer, as it has a few more commits, and we know it won't change
[15:44:52] so beta1 is 3 commits after the branch. Which pauses the question of what that tag is doing there but anyway ... tag it is
[15:45:02] that branch* i mean
[15:45:36] god... i don't even make sense to myself... what i am writing ?
[15:52:39] haha
[15:52:50] yes, good q, but i say let's use the branch
[15:53:00] i was trying to make gbp.conf automatically use the branch
[15:53:08] sorry
[15:53:09] tag*
[15:53:11] :p
[15:53:42] anyway, does what I tried make sense?
[15:54:04] I created 'debian-0.8.0-beta1' from 0.8.0-beta1 tag
[15:54:16] then, merged the two debianization commits we made into that branch
[15:54:20] then, in gbp.conf
[15:54:36] debian-branch=debian-0.8.0-beta1
[15:54:37] upstream-tag=%(version)s
[15:54:37] upstream-tree=tag
[15:54:48] then edited changelog for the new version
[15:55:30] ( for now, i'm saying th version is '0.8.0-beta1-1' which I know is incorrect, but am trying it first so it can match the tag)
[15:55:34] akosiaris: ^
[15:57:05] debian-0.8.0-beta1 is what ? a branch or a tag ?
[15:57:16] a branch
[15:59:01] branch that I made
[15:59:03] from the tag
[15:59:10] it could/should be a tag
[15:59:16] but for now I just want to build it
[16:01:33] !log authdns-update to flip ticket.wikimedia.org to williams
[16:01:41] (PS1) Demon: Poor googlebot, spidering git too fast [operations/puppet] - https://gerrit.wikimedia.org/r/77909
[16:01:45] Logged the message, Master
[16:01:47] <^d> Easy puppet change if someone's got a sec ^
[16:02:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[16:02:48] googlebot a bad spider ?
[16:03:08] <^d> It's crawling too quickly and gitblit can't handle.
[16:03:15] and bingbot from what i can see
[16:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 16:03:17 UTC 2013
[16:03:22] lol...
[16:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:03:47] <^d> Yeah, I've tried to be nice and let them spider but it's hurting performance for real users.
[16:04:31] well. This means that git.wikimedia.org will not be google indexed anymore though. Right ?
[16:06:44] <^d> I would like to have them index it :\
[16:06:51] <^d> I wonder if we can throttle them somehow then?
[16:09:19] akosiaris: can you push whatever you did to fix the trunk build?
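The gbp.conf keys quoted above tell git-buildpackage which branch carries the packaging and to export the upstream source from a git tag rather than an upstream branch. A plausible complete file for the setup described might look like the following; the `[DEFAULT]` section header is how gbp.conf is normally organized, but this exact file is an illustration, not the repo's actual config:

```ini
# Hypothetical gbp.conf for the branch-from-tag setup described above.
[DEFAULT]
# packaging lives on the branch cut from the 0.8.0-beta1 tag
debian-branch = debian-0.8.0-beta1
# export the upstream source from the tag named after the upstream version
upstream-tag = %(version)s
upstream-tree = tag
```

With `upstream-tree = tag`, git-buildpackage generates the orig tarball from the tag itself, which is one way to avoid the "dpkg-source: error: aborting due to unexpected upstream changes" failure seen earlier when the exported tree and the packaging branch have drifted apart.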
[16:09:56] (03PS1) 10Jgreen: new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 [16:10:44] <^d> Ryan_Lane: Any thoughts on this? [16:10:58] <^d> Googlebot is indexing git.wm.o a bit hard. Totally blocking would be kinda bad :( [16:11:10] RECOVERY - mysqld processes on db1046 is OK: PROCS OK: 1 process with command name mysqld [16:11:15] (03CR) 10Jgreen: [C: 031 V: 031] new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 (owner: 10Jgreen) [16:11:27] (03CR) 10Jgreen: [C: 032] new apache settings for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77912 (owner: 10Jgreen) [16:11:59] ^d: not sure what to do about that [16:12:31] I've also been traveling for like 20 something hours, so, yeah :) [16:12:46] <^d> They also don't seem to be respecting robots.txt, which disallows /zip/ [16:12:49] * ^d sighs [16:13:07] no one respects robots.txt, someday our robot overlords will extract revenge. [16:13:21] ^d: block the useragent for that path [16:13:47] should put it behind varnish [16:13:52] that too [16:14:04] mark: are you in HK? [16:14:05] hmm [16:14:07] no [16:14:11] i'm not coming [16:14:15] :( [16:14:24] we need to talk soon, about https ;) [16:14:32] yep [16:14:46] that's one of the reasons I hoped you were here [16:14:51] sorry [16:15:22] looks like the rel=canonical change helped [16:15:31] https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=SSL+cluster+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [16:15:47] I had a feeling that was causing the surge [16:16:14] <^d> mark: So I could just run varnish + gitblit and remove apache from the mix entirely as well, right? 
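One way to implement Ryan's "block the useragent for that path" suggestion above, keeping git.wikimedia.org indexable while fencing crawlers out of the expensive /zip/ URLs, is an Apache 2.2-style fragment in front of gitblit. The agent names and path come from the discussion; the fragment itself is a hypothetical sketch, not the change that was actually merged:

```apache
# Hypothetical Apache 2.2 fragment: let crawlers index normal pages, but
# deny them the /zip/ snapshot URLs that robots.txt already disallows
# (and which some bots were ignoring).
SetEnvIfNoCase User-Agent "Googlebot|bingbot" zip_crawler
<Location /zip/>
    Order allow,deny
    Allow from all
    Deny from env=zip_crawler
</Location>
```

For throttling rather than blocking, `Crawl-delay` in robots.txt is honored by bingbot but ignored by Googlebot, whose rate has to be lowered in Google's webmaster tools instead.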
[16:16:15] I'm surprised how quickly it helped, though [16:16:23] ^d: no [16:16:25] oh [16:16:26] wait [16:16:27] yes [16:16:32] I forgot it runs java [16:16:53] maybe i'll setup 2 old varnish servers as the misc varnish cluster i've been talking about for ages now [16:16:55] <^d> Yeah, 8080 on the local machine, apache's acting as reverse proxy. [16:17:05] mark: that would be awesome [16:17:06] and indeed, varnish can talk to 8080 directly [16:17:14] we can test ssl on it too [16:17:30] mark: I'm somewhat concerned about putting the terminators on the frontends [16:17:35] i heard [16:17:40] based on the cpu util on the ssl cluster [16:17:47] and that's with really efficient ciphers [16:18:00] how many req/s are we doing over ssl now? [16:18:17] alas, no good metrics on it [16:18:45] cpu seems to be pretty 1:1 with bandwidth, though [16:19:01] i also wonder how efficient current ssl session caching is [16:19:10] behind the sh lvs scheduler [16:19:20] should be relatively efficient [16:19:27] should be :) [16:19:35] and the ssl cache can hold far more ssl sessions than necessary [16:21:00] saturating a 1G line eats about 50% of the cpu on the newer boxes [16:21:08] using RC4 [16:21:24] how many cpu cores? [16:21:31] I'll disable cipher preference and use aes to see if that's more or less efficient [16:21:38] 12, with HT enabled [16:21:45] HT helps dramatically [16:21:51] by nearly 2x [16:21:53] in ganglia graphs or for real? ;) [16:21:59] ah. good point [16:22:02] ganglia's graphs [16:22:02] hahahaa [16:22:14] it's worrysome if it DOESNT help 2x in ganglia [16:22:34] true. 2x the number of cores :) [16:23:23] I'm planning on doing some cipher tests this week [16:23:32] psudeo cores ;] [16:23:41] (03CR) 10Ori.livneh: "Thanks! I tested the module itself but not the stuff in manifests/." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/75087 (owner: 10Ori.livneh) [16:24:08] specifically the ECDHE ciphers [16:24:34] and I'll be enabling the GCM ciphers for tls 1.2 [16:24:41] ok [16:24:52] i'll need to sync your nginx config changes to the on-cache ssl terminator config [16:24:57] i used a different template for it [16:25:05] radically simpler ;) [16:25:09] heh [16:25:21] simpler in which way? [16:25:29] well the old one had a ton of logic [16:25:33] we're going to need all the clusters [16:25:35] to go to the correct upstream cache clusters [16:25:43] the new one just redirects to 127.0.0.1 [16:25:46] ah [16:25:46] right [16:25:49] and it doesn't know anything about $::site etc [16:26:09] cool, as long as the vhosts are all still there, that sounds good [16:26:12] what do you mean by "all the clusters"? [16:26:17] yeah it's not, i wasn't sure about that [16:26:17] we'll need the vhosts for SNI [16:26:21] meh [16:26:22] ok [16:26:28] otherwise they aren't necessary right now [16:26:33] it'll work properly as is [16:26:41] the unified cert makes that work [16:26:48] yeah I was hoping it would [16:26:52] that's going to be our fallback cert for browsers that don't support SNI [16:27:38] all of them can still just connect to 127.0.0.1, though :) [16:27:43] yep [16:27:46] so most of the logic is still gone [16:27:52] but I do need to add the vhosts and certs for them [16:28:00] * Ryan_Lane nods [16:28:03] which is a pity, the new config looked so small & clean ;) [16:28:20] well, depending on how expensive the termination is, we may not be able to do it this way anyway [16:28:32] unless we're going to expand the clusters for it [16:28:34] we can always buy more varnish boxes [16:28:43] advantage is that it increases the cache too [16:28:47] although I kinda hate it [16:28:51] yeah, may be cheaper to do that than have a separate cluster [16:28:54] we're optimizing varnish to handle tons of requests per box [16:28:57] and then ssl kills it all again 
;-) [16:29:01] heh [16:29:14] well, if we get SSL boxes with a shit-ton of nice CPUs, it isn't necessary [16:29:19] well, lots of cpus and bonded ports [16:29:29] bonded ports? [16:29:39] i just got rid of bonded ports, 10G all the way [16:29:44] ah. that works too [16:29:52] much better, in fact :) [16:29:58] varnish will easily fill 10G [16:30:42] what's your take on apache or stud vs nginx? [16:30:58] paravoid was doing some apache testing yesterday [16:31:03] yeah [16:31:09] stud and nginx need dev work if we want to keep them [16:31:14] apache may work without much [16:31:21] (03PS1) 10Akosiaris: Updating/dowgrading libraries [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/77913 [16:31:25] I'd like to get a config going for apache and test it live vs nginx [16:31:32] does apache support spdy? [16:31:35] yes and no [16:31:48] google wrote mod_spdy for apache [16:31:53] but it doesn't support 2.4 yet [16:32:07] * Ryan_Lane has that bug on his track list :) [16:32:09] i guess we really don't have a clear winner ;) [16:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 16:32:46 UTC 2013 [16:32:51] he [16:32:52] err [16:32:53] heh [16:32:54] nope [16:33:04] though apache has a number of features we want [16:33:15] we can do proper PFS with it [16:33:30] (when we're actually willing to use PFS) [16:33:38] it has OCSP stapling, distributed cache [16:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:34:12] nginx has ocsp stapling too now [16:34:25] yep [16:34:38] but it relies on openssl for key rotation for PFS [16:34:43] which only occurs when it starts [16:35:02] and it has no distributed cache [16:35:08] yeah [16:35:09] stud has a distributed cache [16:35:16] and I think someone sent in a patch to fix the PFS issues [16:35:17] i like stud as a concept [16:35:21] but it's definitely missing some stuff [16:35:22] but we'd need proxy support [16:35:32] i'd be happy to 
add proxy support to varnish though [16:35:54] I need to read up on its feature support vs nginx and apache [16:36:06] its dist cache works differently [16:36:08] it'll eat more memory [16:36:27] and I'm not totally sure how key rotation would work with it [16:36:57] since if we're switching to wrr, it would need to sync the rotation across the boxes [16:37:10] i'm not sure if we need to switch to wrr though [16:37:11] paravoid and I talked about a way of doing that with apache [16:37:26] you can't weight sh [16:37:43] it has a weight value, but it doesn't really work [16:37:51] http://web.archiveorange.com/archive/v/euQpSpzZDfGI86ZV6tGz [16:37:58] it's sad that I have to google for it ;) [16:38:34] ah. you added something to make it actually work? [16:38:35] i'm not sure where it is now chad moved us off svn ;) [16:38:46] i wrote an lvs scheduler with weighted consistent hashing [16:38:48] a long time ago [16:38:52] 2008 or so [16:38:53] svn is still up read only [16:39:01] i never pushed it upstream [16:39:05] ahhh [16:39:07] but I could fresh it up and deploy on our lvs boxes [16:39:13] that would make things much better [16:39:27] assuming that we wouldn't need a dist cache either [16:39:30] also, _right now_ we don't really need weighting ;) [16:39:47] well, kind of [16:40:19] the eqiad cluster has 4 kind of crappy systems and 5 nice ones [16:40:20] svn is still up but I don't see the code in there [16:41:17] Ryan_Lane: yeah but assuming ssl termination on the caches [16:41:21] and when we ever start replacing varnish systems and the terminators are on them we'd surely want to [16:41:26] agreed [16:41:37] don't need it _right now_, will definitely need it in the future [16:41:41] which is why I wrote that code [16:41:43] hehe [16:41:46] :D [16:42:03] I needed to finish some linux hacking course at university at the time, to get some study points [16:42:07] i decided to write that [16:42:12] got my points, never pushed it upstream [16:42:19] tested it but never 
really used it either [16:43:39] well, depending on which route we go we may or may not need it :) [16:43:51] yeah [16:43:54] having a shared ssl cache with wrr is nice in ways [16:43:55] just saying, it's an option [16:44:11] yeah [16:44:27] distributed ssl cache makes things more complicated, for sure [16:44:32] the code does the same thing as my varnish director that we use [16:44:58] oh, nice. [16:45:28] anyway, yeah, lots of fun work ahead :D [16:48:16] haha [16:48:19] wb to core ops ;-p [16:48:40] wb? [16:48:43] oh [16:48:44] hah [16:49:04] well, I was doing a little here and there with deployment ;) [16:49:08] yeah [16:49:15] but yeah, otherwise it's been all labs all the time [16:49:31] nice to work on something production again :) [16:50:26] i'll work on ipsec too [16:50:30] did a bit of testing already [16:50:46] oh, sweet [16:53:20] (03PS1) 10Jgreen: tweak apache vhost conf for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77914 [16:54:55] (03CR) 10Jgreen: [C: 032 V: 031] tweak apache vhost conf for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/77914 (owner: 10Jgreen) [16:56:55] No joy yet in my search to find apache config for upload.wikimedia.org that shows how prod handles the thumb rewrites. [16:57:02] I've looked in operations/apache-config, operations/mediawiki-config and operations/puppet [16:57:49] I've tried following the instructions at https://www.mediawiki.org/wiki/Manual:Thumb.php directly but I'm doing something wrong. 
[17:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 17:03:11 UTC 2013 [17:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 17:32:41 UTC 2013 [17:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:55:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours [18:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 18:02:38 UTC 2013 [18:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:23:40] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [18:26:51] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 18:32:37 UTC 2013 [18:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:49:24] heya paravoid [18:49:31] can you tell how alex created this patch? [18:49:32] https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/logging_to_var_log.patch [18:49:40] i'm trying git diff, quilt, etc. [18:49:53] so far no luck, my patches keep getting rejected [18:50:03] uhm, i don't know how *he* created it :) [18:50:05] by dpkg0source [18:50:11] what do you mean rejected? [18:50:24] error exit status 1 [18:50:24] patching file config/log4j.properties [18:50:25] Hunk #1 FAILED at 12. 
[18:50:32] so, it doesn't apply
[18:50:34] just while building
[18:50:37] right right
[18:50:54] i know, but i recreated the patch directly by editing the new original and then taking the diff
[18:51:04] i think i'm just generating the diff in a way that dpkg-source doesn't like
[18:51:32] when i've done this before I've done it with quilt, but this seems to be a different format? not sure
[18:51:43] it doesn't really care about the format much
[18:51:45] it just uses patch
[18:51:57] as in /usr/bin/patch
[18:51:57] hm
[18:52:18] but quilt is a nice way to test without building
[18:52:24] just try "quilt push -a"
[18:52:27] oh hm
[18:52:27] ok
[18:55:29] ottomata: On a totally unrelated note: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Caveats says "There is currently no easy way to rollback [...]" Could you add what manual steps are needed?
[18:55:47] (Perhaps in the FAQ.)
[18:56:00] hmmm
[18:56:22] there isn't really a certain set of steps to role back
[18:56:34] you could point the puppet agent back at the main labs puppetmaster
[18:56:42] but any changes that your puppetmaster applied would no longer be tracked
[18:56:50] which is what that caution is refering to
[18:57:10] it would be nice to be able to automatimcally switch back to the main labs puppetmaster
[18:57:12] hmmm
[18:57:17] is that all you want to do scfc_de?
[18:59:09] (PS1) Jgreen: otrs PostMaster.pl path changed [operations/puppet] - https://gerrit.wikimedia.org/r/77922
[18:59:46] ottomata: Background: Sometimes I want to test a Puppet patch on a live (Labs) system. So I'm looking for a checklist on what to do to reset the instance to the "pull from central puppetmaster" state so that I don't miss anything. Does unsetting role::puppet::self suffice? I'm not concerned with extra packages.
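The "quilt push -a" suggestion above is the quickest way to see which hunk fails without running a full build. A typical refresh cycle inside a Debian source tree looks roughly like this command sketch (assuming the `3.0 (quilt)` source format, where patches live under debian/patches):

```
$ export QUILT_PATCHES=debian/patches
$ quilt push -a      # stops at the first patch whose hunks no longer apply
$ quilt push -f      # force-apply it, leaving *.rej files to inspect
$ # ...edit the files by hand to resolve the rejected hunks...
$ quilt refresh      # rewrite the patch against the current tree
$ quilt pop -a       # unapply everything; dpkg-source re-applies at build time
```

A refreshed patch regenerated this way is in the plain format `/usr/bin/patch` expects, which sidesteps hand-edited diffs that dpkg-source rejects.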
[19:00:02] hashar and I tried to get a patch working for that once and failed
[19:00:10] it's still around in gerrit somewhere with his name on it
[19:00:20] ottomata:
[19:00:59] (CR) Jgreen: [C: 2 V: 1] otrs PostMaster.pl path changed [operations/puppet] - https://gerrit.wikimedia.org/r/77922 (owner: Jgreen)
[19:01:04] apergos: BTW, /public/datasets/public seems to have taken a summer vacation, last update Jun 23 06:14..
[19:01:26] apergos: Is that a failure on Labs' side?
[19:02:31] apergos, oh cool
[19:02:33] scfc_de, i see
[19:02:40] no, unchecking role::puppet::self won't do it
[19:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 19:02:36 UTC 2013
[19:02:42] hm
[19:02:51] but there should be a way to automate reversing that
[19:03:02] all you *really* have to do is fix puppet.conf
[19:03:12] so that it looks like it does on normal labs instances
[19:03:14] /etc/puppet.conf
[19:03:25] there might be more to it than that (certs?) but I betcha it will do
[19:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[19:03:40] i can probably make it so that unchecking role::puppet::self would do that
[19:03:52] ottomata: I'll try it on an instance and update Help:Self... with my findings.
[19:04:02] ok cool
[19:04:07] and i'll add that to my todo list
[19:04:10] ottomata: A checklist-type thingy would be enough, I believe.
[19:04:13] yeah
[19:04:16] so
[19:04:16] also
[19:04:25] if you want to keep your self hosted puppet
[19:04:33] but still get latest production puppet changes
[19:04:44] you can just sudo git pull in /var/lib/git/operations/puppet
[19:06:01] Yep, but then you would have to cron that, etc. The simpler, the better :-).
[19:06:48] thanks paravoid, I got it, quilt push -a helped me figure that out
[19:07:26] cool
[19:08:02] scfc_de: that's on my side, I'm busy moving stuff around
[19:08:11] when the move looks stable I'll restart that
[19:09:20] apergos: Okay, thanks.
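The "fix puppet.conf" rollback ottomata describes boils down to pointing the agent's server setting back at the central labs puppetmaster. A sketch of the mechanics on a scratch copy; the hostnames and the exact file layout are assumptions here, not the real Labs values (on a real instance you'd edit the puppet.conf the agent actually reads, and possibly clean up certs too, per the "(certs?)" caveat above):

```shell
# Demonstrate the one-line config change on a throwaway copy of puppet.conf.
# Both server names below are hypothetical.
demo=/tmp/puppet-demo; rm -rf "$demo"; mkdir -p "$demo"
cat > "$demo/puppet.conf" <<'EOF'
[agent]
server = self-hosted-instance.pmtpa.wmflabs
EOF
# Point the agent back at the (hypothetical) central labs puppetmaster.
sed -i 's/^server = .*/server = central-puppetmaster.example.wmflabs/' "$demo/puppet.conf"
```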
[19:09:58] apergos: Just want to avoid people starting to pull from dumps.wikimedia.org.
[19:12:36] yep
[19:22:39] (CR) Ottomata: [C: 2 V: 2] Updating/dowgrading libraries [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/77913 (owner: Akosiaris)
[19:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 19:32:42 UTC 2013
[19:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[19:45:28] (PS1) Jgreen: fix shell for otrs user [operations/puppet] - https://gerrit.wikimedia.org/r/77970
[19:47:21] (CR) Jgreen: [C: 2 V: 1] fix shell for otrs user [operations/puppet] - https://gerrit.wikimedia.org/r/77970 (owner: Jgreen)
[19:59:00] PROBLEM - search indices - check lucene status page on search19 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 60051 bytes in 0.112 second response time
[20:00:14] (PS1) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:00:31] (CR) jenkins-bot: [V: -1] Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971 (owner: Ottomata)
[20:01:45] (PS2) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:02:03] (CR) jenkins-bot: [V: -1] Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971 (owner: Ottomata)
[20:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 20:02:46 UTC 2013
[20:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[20:20:10] (PS3) Ottomata: Adding role/analytics/kafka.pp [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:24:16] (PS4) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:30:51] yo dudes, how does one get a wikitech
account?
[20:30:53] toby negrin is asking
[20:30:57] it's ldap, right?
[20:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 20:32:45 UTC 2013
[20:33:28] ori-l: do you know? ^^
[20:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[20:35:17] it is
[20:35:43] just ldap
[20:36:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours
[20:36:01] well
[20:36:20] by that I mean it's stored in ldap, you don't have to use ldap to create the account
[20:36:28] I think you can just register on wikitech nowadays
[20:37:00] the fact that it's ldap prevents it from being central login so it needs a separate registration, but it's still self-served
[20:37:47] (PS1) Ottomata: Updating .properties files with recent 0.8 branch zookeeper property name change. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77973
[20:38:14] paravoid: so if tnegrin has an ldap account, he should be able to log in, right?
[20:38:51] (CR) Ottomata: [C: 2 V: 2] Updating .properties files with recent 0.8 branch zookeeper property name change. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/77973 (owner: Ottomata)
[20:39:29] (PS5) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[20:41:51] yes
[20:42:32] are you deploying kafka 0.8?
[20:42:42] in labs
[20:43:03] cool
[20:43:09] anything I can do?
[20:43:18] hmmmmm dunno!
[20:43:31] i kinda messed up the kafka debian branch this morning
[20:43:33] it's not super messed up
[20:43:38] just hard to build with the beta tag the way I did it
[20:43:44] heh
[20:43:48] i had to create a new debian branch and cherry pick the debianization changes
[20:44:02] it'll be fine once we can build from anything later than today
[20:44:10] (i rebased the debian branch :/ )
[20:44:11] fine with me
[20:44:36] if subsequent versions are okay, I don't think we need to do anything now
[20:44:38] but since the tag was older, i couldn't merge the debianization commits because I had rebased them on top of newer trunk
[20:44:41] yeah it'll be fine
[20:44:48] i just have a local branch that I used to build my test beta .deb
[20:44:51] no need to put it in apt
[20:44:51] cool
[20:44:53] i think it's working
[20:44:59] i just got it to work with puppet
[20:45:00] what's the current version?
[20:45:17] beta1?
[20:45:30] Version: 0.8.0~beta1-1
[20:45:39] cool
[20:47:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[20:47:32] i couldn't get the labsdebrepo thing to work
[20:47:34] i had to dpkg -i manually
[21:00:30] (PS6) Ottomata: Adding role/analytics/kafka.pp Also adding modules/kafka [operations/puppet] - https://gerrit.wikimedia.org/r/77971
[21:03:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 21:03:05 UTC 2013
[21:03:17] coooool, working great in labs
[21:03:34] paravoid, if you want to play, my next step is to start playing with camus
[21:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[21:03:48] i'm going to sign off for the day now
[21:04:01] you're welcome to use the kafka + hadoop setup in labs to play with whatever you want
[21:04:20] instances: kraken-namenode and kraken-kafka are probably the most relevant ones
[21:04:33] nah, I have my hands more than full, I was basically asking if there's anything to brainstorm or
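The recovery ottomata describes above ("create a new debian branch and cherry pick the debianization changes" after a rebase left the packaging commits unmergeable against the older tag) can be sketched in a throwaway repo. Repo layout, tag name, and commit messages are all invented for the demo:

```shell
# Replay packaging commits onto a branch rooted at the release tag after the
# original debian branch was rebased onto newer trunk.
set -e
demo=/tmp/git-demo; rm -rf "$demo"; mkdir "$demo"; cd "$demo"
git init -q .
git -c user.email=d@e -c user.name=demo commit -q --allow-empty -m "upstream: 0.8.0 beta1"
git tag 0.8.0-beta1                       # the tag we actually want to build from
echo 'new upstream work' > trunk.txt      # trunk moves past the tag
git add trunk.txt
git -c user.email=d@e -c user.name=demo commit -q -m "newer trunk"
echo 'kafka (0.8.0~beta1-1) unstable' > changelog
git add changelog
git -c user.email=d@e -c user.name=demo commit -q -m "debianization"
deb=$(git rev-parse HEAD)
# Fresh debian branch rooted at the tag, with the packaging commit replayed.
git checkout -q -b debian-new 0.8.0-beta1
git -c user.email=d@e -c user.name=demo cherry-pick "$deb" >/dev/null
```

The new branch ends up with the tag's tree plus only the debianization change, which is why the test beta .deb could be built without the newer-trunk commits.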
in need of a review
[21:06:10] ah ok
[21:06:11] cool
[21:06:21] naw think i'm good for now, danke
[21:06:38] cool
[21:06:46] ooo, actually
[21:06:50] here's a quick one
[21:07:09] what's your plan for putting it in prod?
[21:07:27] I'm asking for varnishkafka, it's not strictly needed initially but it might help with perf testing
[21:07:43] ah! first my q:
[21:07:44] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/zookeeper.pp#L23
[21:07:54] can you think of a good way to set a global variable via labsconsole to get that config?
[21:07:59] i'm doing the same for kafka
[21:08:04] its not just a list of hostnames anymore
[21:08:09] which is easy enough to do via the form
[21:08:12] but a hash?
[21:08:19] (PS1) BryanDavis: Add ganglia monitoring for vhtcpd. [operations/puppet] - https://gerrit.wikimedia.org/r/77975
[21:08:34] aw crap :)
[21:08:43] (PS1) Jgreen: enable Apache2::Reload for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77976
[21:08:58] dunno, I don't think it's possible
[21:09:16] yeah, don't think so either
[21:09:24] ok your question:
[21:09:24] dunno
[21:09:28] bd808: first commit?
[21:09:31] since we are now repaving everything (yay)
[21:09:39] paravoid: pretty close
[21:09:47] :)
[21:09:48] its pretty easy to do whatever we want from scratch
[21:09:50] re varnishkafka
[21:09:57] i'd love to get it installed on the mobiles before anywhere else
[21:09:59] bd808: congrats?
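On the hash-via-labsconsole question above: the form only stores flat strings, so one workaround (purely our speculation, not what was actually deployed) is to encode the zookeeper host-to-id hash as a delimited string and split it where it's consumed. The hostnames, ids, and the 2888:3888 quorum-port convention below are illustrative:

```shell
# Flatten a {host => id} hash into "host:id,host:id,..." and expand it back
# into zookeeper-style server lines. All values here are hypothetical.
zookeeper_hosts='zk1.eqiad.wmflabs:1,zk2.eqiad.wmflabs:2,zk3.eqiad.wmflabs:3'
out=/tmp/zk-demo.properties; : > "$out"
IFS=','
for pair in $zookeeper_hosts; do
  host=${pair%%:*}   # text before the first colon
  id=${pair##*:}     # text after the last colon
  printf 'server.%s=%s:2888:3888\n' "$id" "$host" >> "$out"
done
unset IFS
```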
:)
[21:10:03] I did a few trivial commits for vagrant earlier today
[21:10:22] (CR) Jgreen: [C: 2 V: 1] enable Apache2::Reload for otrs [operations/puppet] - https://gerrit.wikimedia.org/r/77976 (owner: Jgreen)
[21:10:33] ottomata: yeah, the question is, what will these varnishkafka talk with
[21:10:35] i want to set varnishkafka up in labs too
[21:10:41] have a varnish instance there that produces to kafka
[21:10:44] will be real nice :)
[21:10:52] working on that this week
[21:10:58] ok, i gotta run
[21:11:00] see ya!
[21:11:35] bye!
[21:11:35] oh paravoid, when we repave
[21:11:44] we'll have analytics1021 and 1022 running 0.8
[21:11:49] perfect
[21:12:02] k cool, laaatas
[21:13:06] bd808: huh, I like your github page
[21:13:18] contributed to kibana I see?
[21:13:41] I'd love to hear your thoughts on it, I've been playing with the idea of logstash/kibana for us
[21:14:22] paravoid: I liked kibana ok. Most of my patches were when it was still a php app
[21:14:35] have you seen kibana three?
[21:14:52] no. I'll go check it out
[21:14:53] they switched from php to ruby (that's two)
[21:15:00] I'm a big fan of logstash
[21:15:01] and three is purely frontend
[21:15:13] just javascript, in browser
[21:15:15] and elasticsearch in general
[21:15:21] cool :)
[21:15:32] ( http://three.kibana.org/ )
[21:15:37] you know about the ES work, right?
[21:15:47] for search
[21:16:07] so yeah, kibana three just talks with ES
[21:16:10] aiui
[21:16:41] I've been stalking the ES for prod work a little. Talked to robla and erik some about it in my interviews
[21:16:49] perfect :)
[21:17:12] what have you used logstash for?
[21:17:29] developer vm logs at my last job.
[21:17:46] was trying to get all beta/sandbox logs into it but ran into ops bottlenecks
[21:18:00] basically we didn't have disk in our environment for it
[21:18:06] heh
[21:18:25] they were "working on that" for 6 months
[21:18:51] we had something like the wm vagrant setup for devs
[21:18:58] I put kibana on that
[21:20:06] cool
[21:20:09] the good old days when I felt like I knew what I was doing. :)
[21:20:30] now I'm just another FNG n00b slowing people down with lots of dumb questions
[21:20:41] yeah, the idea was to run^Wtest logstash/kibana3 for system logs
[21:20:57] but maybe MW exception logs and such would be a nice target too
[21:21:12] do they have saved searches yet?
[21:21:30] dunno, I haven't played with it almost at all :)
[21:22:15] so I get you have a lot of experience with ES?
[21:22:41] some.
[21:22:51] heheh
[21:22:56] trick question wasn't it
[21:23:23] I'm no expert. Pretty good with the schema config and writing queries
[21:23:34] tunig was a mystery that I was working on
[21:23:38] *tuning
[21:24:05] our prod cluster was a dog, but mostly because the use case changed completely right after I built it
[21:24:50] saved queries should be pretty easy to put into an all client side kibana. I hacked it into the ruby version
[21:25:01] nod
[21:25:44] My logstash work was related to trying to replace splunk
[21:26:01] oh that's interesting
[21:26:01] it's a slick product, but too much $$$
[21:26:02] how come
[21:26:05] ah
[21:26:29] they license by the log volume
[21:26:46] it's out of the question for us anyway, but I do wonder about pros/cons
[21:27:08] splunk is slick.
nice gui and some neat backend features
[21:27:30] graylog2 was the other thing I was looking into
[21:27:39] I had a look at my previous job and discarded it because of mongodb back then
[21:27:45] but nowadays it seems it has moved to ES as well
[21:27:54] and I've read people are combining it with logstash
[21:28:29] I think it was still mongo the last time I looked at it
[21:28:51] logstash seems like a pretty good answer to the "how to parse logs" problem
[21:29:09] but it needs a frontend that does good things
[21:29:19] kibana has been working on that for sure
[21:29:32] yeah I've been wondering why logstash/kibana never merged
[21:29:43] I mean when it was php it did make some sense
[21:29:50] but with kibana going ruby and now pure frontend
[21:30:15] I think the ruby switch was an attempt to get together
[21:31:33] the first code that Rashid committed was pretty obviously a hack he had whipped up quickly
[21:31:46] his later stuff is much more polished
[21:32:59] damn, if the day had 48h
[21:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 21:32:52 UTC 2013
[21:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[21:34:24] oh, cool. Kibana 3 is in the ElasticSearch github org account. I wonder if Shay hired him?
[21:34:39] yeah I saw that
[21:35:16] it's kinda weird that it's a separate subdomain under kibana.org and if you go to www it says nothing about it
[21:35:23] and says "Now in Ruby", ironically
[21:35:33] * bd808 likes talking about people he's never met like they are old friends
[21:35:53] haha :)
[21:37:57] (CR) BryanDavis: "(3 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[21:38:59] Any idea who I should add to my puppet/ganglia review? I've put ori-l and bblack so far
[21:39:53] yeah bblack is fine :)
[21:45:23] paravoid: I'd love to help with a logstash project if you need coder fingers and/or somebody to talk at.
That sounds like something I can actively contribute to without being 10+ years behind in reading the codebase. :)
[21:49:54] bd808: thanks :) I'm not expecting to have the time to work on it very soon
[21:50:48] such is life. Fun projects get buried under official tasks
[21:52:10] (PS1) Jgreen: make exim otrs choose db's within site [operations/puppet] - https://gerrit.wikimedia.org/r/77977
[21:53:14] (CR) Jgreen: [C: 2 V: 1] make exim otrs choose db's within site [operations/puppet] - https://gerrit.wikimedia.org/r/77977 (owner: Jgreen)
[21:53:21] bd808: your interest noted in https://wikitech.wikimedia.org/wiki/Projects
[21:54:17] cool.
[21:55:31] I've done some neat stuff in the past for summarizing log feeds too. Python scripts with regex to count "common" errors and highlight new problems. Could be sort of related
[21:56:25] (PS1) Jgreen: fix typo [operations/puppet] - https://gerrit.wikimedia.org/r/77978
[21:57:22] (CR) Jgreen: [C: 2 V: 1] fix typo [operations/puppet] - https://gerrit.wikimedia.org/r/77978 (owner: Jgreen)
[21:57:27] really old version at https://code.google.com/p/casadebender/source/browse/python/logscan/logscan.py
[22:04:10] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 22:04:07 UTC 2013
[22:04:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[22:30:16] (CR) Ori.livneh: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[22:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 22:32:39 UTC 2013
[22:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[22:43:16] (PS2) BryanDavis: Add ganglia monitoring for vhtcpd. [operations/puppet] - https://gerrit.wikimedia.org/r/77975
[22:54:14] (CR) BryanDavis: "Yuck. Is `commit --append` really the right workflow for updating a review?"
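The log-summarizing idea bd808 mentions (count "common" errors so new problems stand out; his logscan.py did this in Python with regexes) can be approximated in a few lines of shell. The log lines below are invented for the demo:

```shell
# Toy log summarizer: normalize variable parts (numbers) so variants of the
# same error bucket together, then count buckets. Recurring errors sort to
# the top; anything with a count of 1 is a candidate "new problem".
log=/tmp/logscan-demo.log
cat > "$log" <<'EOF'
ERROR db timeout on shard 3
ERROR db timeout on shard 7
WARN slow query 1234ms
ERROR db timeout on shard 3
EOF
# Replace digit runs with N, keep only ERROR lines, count distinct patterns.
sed -n 's/[0-9]\+/N/g; /^ERROR/p' "$log" | sort | uniq -c | sort -rn > /tmp/logscan-demo.summary
```

Here all three db-timeout lines collapse into one bucket, "ERROR db timeout on shard N", with a count of 3.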
[operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[22:54:45] append?
[22:54:47] amend you mean?
[23:02:40] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[23:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 23:02:36 UTC 2013
[23:03:30] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.98 ms
[23:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:06:00] PROBLEM - Apache HTTP on mw31 is CRITICAL: Connection refused
[23:06:37] paravoid: yes, --amend
[23:07:00] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.250 second response time
[23:12:50] (CR) MZMcBride: "I'm not totally sure what you're asking, but I think the answer is yes. I've found " [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:15:30] (CR) BryanDavis: "@MZMcBride That's what I followed. I was just reacting viscerally to the loss of change history implied. I'm not a fan of squashed history" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:18:07] (CR) BryanDavis: "(2 comments)" [operations/puppet] - https://gerrit.wikimedia.org/r/77975 (owner: BryanDavis)
[23:32:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Aug 6 23:32:39 UTC 2013
[23:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[23:37:46] bd808: There's always GitHub. :-)
[23:41:59] Elsie: is that a valid option? I've been cajoled for liking non-FOSS products on more than one occasion already.
[23:55:00] PROBLEM - DPKG on db1048 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:59:17] http://i.imgur.com/3CS31.jpg *waddles off*
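For the `--amend` review workflow debated above: Gerrit tracks a change by its Change-Id footer, so an updated patch set is the same commit amended rather than a new commit on top (hence bd808's "loss of change history" reaction; Gerrit keeps the old patch sets server-side). A sketch of just the local amend mechanics in a throwaway repo; the file name and messages are invented, and the push-to-Gerrit (git-review) step is omitted:

```shell
# Patch set 1, then patch set 2 as an amend of the same commit.
set -e
demo=/tmp/amend-demo; rm -rf "$demo"; mkdir "$demo"; cd "$demo"
git init -q .
echo v1 > monitor.py
git add monitor.py
git -c user.email=d@e -c user.name=demo commit -q -m "Add ganglia monitoring for vhtcpd."
echo v2 > monitor.py
git add monitor.py
# Same commit (and, on a real Gerrit change, same Change-Id), new content.
git -c user.email=d@e -c user.name=demo commit -q --amend --no-edit
```

After the amend there is still exactly one commit, now containing the revised file; `commit --append` is not a git subcommand.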