[00:03:35] !log maxsem synchronized php-1.21wmf9/extensions/MobileFrontend 'Touchhhhh'
[00:03:36] Logged the message, Master
[00:20:27] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[00:20:27] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:20:27] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:22:03] preilly: ahoy
[00:33:07] RobH: hey, have you been working on the blog stuffs?
[00:33:49] or was that daniel?
[00:35:48] notpeter: me
[00:36:00] did you make some dumps of the db on db9?
[00:36:20] yep, and anyone older than the most recent can go away, want me to go police them?
[00:36:42] i make one right before I roll an upgrade
[00:36:46] nah, but I'm going to move them to the /a partition
[00:36:53] because the root partition is really full
[00:37:01] ok, i'll be sure to put on there in future
[00:37:05] cool!
[00:37:31] I'll make /a/blog_dumps for them
[00:38:17] just wanted to make sure that I didn't disappear them on you :)
[00:38:40] nah, they are pretty much useless after i verify the upgrade
[00:39:06] ah, gotcha
[00:39:08] New patchset: Krinkle; "admins.pp: Add new key for 'krinkle'. Invalidate old key." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48766
[00:39:23] i just keep the most recent one because i am paranoid
[00:39:29] makes sense
[00:39:34] and like the ability to roll back major updates
[00:39:40] so if we have room, keeping them is never bad.
[00:39:58] yeah, /a has a good amount of space
[00:40:03] cool
[00:40:10] should be enough to last us until we start using new boxes....
[00:41:14] New review: Krinkle; "Patch Set 1:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48766
[00:42:26] !log restarting varnishncsa on cp1043
[00:42:28] Logged the message, notpeter
[00:49:55] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/48766
[00:50:03] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48766
[00:52:18] RECOVERY - Puppet freshness on ms-fe1002 is OK: puppet ran at Wed Feb 13 00:52:04 UTC 2013
[01:13:19] Hm. I should probably catch up on as much Labs infrastructure docs as I can get my hands on. Is what's on labsconsole it?
[01:13:39] Pretty much, yes.
[01:13:42] rfaulkner: nose should good to me
[01:14:07] I was hoping that there was a hidden cache I could tap once I learn the hidden handshake. :-)
[01:14:48] Coren: not really. If you see obvious holes or have suggestions about how to organize… I'm happy to explain and/or rewrite.
[01:15:49] andrewbogott: Well, I don't yet know what parts I don't know, but once I know I'll let you know. :-)
[01:16:01] sounds good :)
[01:20:12] RECOVERY - NTP on mw1182 is OK: NTP OK: Offset -0.004286646843 secs
[01:20:13] RECOVERY - NTP on mw1188 is OK: NTP OK: Offset 0.001575231552 secs
[01:20:21] RECOVERY - NTP on mw1165 is OK: NTP OK: Offset -0.0001726150513 secs
[01:20:57] RECOVERY - NTP on mw1176 is OK: NTP OK: Offset -0.00748705864 secs
[01:26:10] New patchset: Pyoungmeister; "create a test.w.o role class to increase maxclients" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48770
[01:28:02] New review: Pyoungmeister; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/48770
[01:28:12] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48770
[01:46:03] New patchset: Pyoungmeister; "explicitly passing maxclients to applicationserver::config::apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48771
[02:02:03] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[02:03:42] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[02:15:22] preilly: https://gerrit.wikimedia.org/r/4877[4|5|6|7]
[02:15:41] started defining unit tests plus some fixes
[02:28:07] !log LocalisationUpdate completed (1.21wmf9) at Wed Feb 13 02:28:06 UTC 2013
[02:28:11] Logged the message, Master
[02:40:36] PROBLEM - Puppet freshness on mw37 is CRITICAL: Puppet has not run in the last 10 hours
[02:52:39] !log LocalisationUpdate completed (1.21wmf8) at Wed Feb 13 02:52:39 UTC 2013
[02:52:41] Logged the message, Master
[03:17:48] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[03:59:35] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[04:23:45] RECOVERY - MySQL disk space on neon is OK: DISK OK
[04:44:41] New review: Tim Starling; "Patch Set 3: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/46907
[04:44:50] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/46907
[05:04:25] Change abandoned: Tim Starling; "git-deploy didn't happen." [operations/apache-config] (newdeploy) - https://gerrit.wikimedia.org/r/43148
[05:22:49] New patchset: Tim Starling; "Make mwscript sudo to apache if an admin tries to run a script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48785
[05:23:12] New review: Tim Starling; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48785
[05:23:20] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48785
[05:30:10] New patchset: Tim Starling; "Maybe also avoiding running scripts as root would be good?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48786
[05:33:12] New patchset: Tim Starling; "Prevent MediaWiki maintenance scripts from running as privileged users" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44200
[05:33:45] New review: Tim Starling; "Patch Set 3: Verified+2 Code-Review+2" [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/44200
[05:33:46] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44200
[05:43:39] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[05:49:13] New review: Tim Starling; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48786
[05:49:22] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48786
[05:50:44] !log tstarling synchronized multiversion/MWScript.php
[05:50:46] Logged the message, Master
[06:16:44] New patchset: Krinkle; "(bug 39380) Enabling secure login (HTTPS)." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21322
[06:17:16] New review: Krinkle; "Patch Set 11:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21322
[06:43:36] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[07:14:46] RECOVERY - MySQL disk space on neon is OK: DISK OK
[07:52:25] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[08:17:16] hello
[08:17:33] apergos: good morning :-] Will you be around for the next hour or so ?
[08:17:48] morning
[08:17:50] I'm here
[08:18:01] and wishing you were a mobile front end person :-D
[08:18:04] cool! I am going to update Jenkins API tokens and might poke you to get a key updated :)
[08:18:07] oh
[08:18:09] sure thing
[08:18:11] mobile causing trouble again ?
[08:18:17] dumps broken
[08:18:28] Fatal error: Call to a member function getText() on a non-object in /a/usr/local/apache/common-local/php-1.21wmf8/extensions/MobileFrontend/includes/MobileContext.php on line 273
[08:18:35] damn
[08:18:42] someone didn't run test suite and broke them
[08:18:48] surely MaxSem could help with MobileFrontend issues
[08:18:56] that's the hope
[08:19:01] we'll see when he gets on
[08:19:18] who, me?
[08:19:32] looking
[08:19:32] MobileFrontend seems to echo some fatal errors :/
[08:20:43] [13-Feb-2013 15:17:51]
[08:20:48] hmm how is that possible? :-D
[08:20:58] a date in the future!
[08:21:14] wmf8
[08:21:46] that is from a request on the Thailand's wiki
[08:21:51] so most probably local time instead of gmt
[08:22:04] [13-Feb-2013 15:17:51] Fatal error: Call to a member function getDBkey() on a non-object at /usr/local/apache/common-local/php-1.21wmf8/extensions/MobileFrontend/includes/skins/SkinMobile.php on line 264
[08:22:09] MaxSem: another one :)
[08:22:24] hm didn't see those
[08:23:37] the first one breaks abstracts and stubs
[08:23:38] and therefore also page content dumps
[08:23:55] RECOVERY - MySQL disk space on neon is OK: DISK OK
[08:25:26] !log jenkins: changing encryption key and regenerating secrets. See {{bug|44592}}
[08:25:27] Logged the message, Master
[08:26:10] apergos, hashar: https://gerrit.wikimedia.org/r/48797
[08:26:55] * apergos is already enjoying the improved gerrit
[08:28:38] MaxSem: I don't know anything about MF but that change is not going to get things any worse :-]
[08:29:03] MaxSem: CR +2
[08:29:10] um, is that really going to prevent the fatal or will it just happen in the if?
[08:29:13] * apergos is looking
[08:32:50] apergos, I can deploy it
[08:33:59] oh, this is the second error
[08:34:05] sure
[08:34:44] I'll +2 that
[08:34:55] no I won't, someone else did ;-D
[08:35:06] thanks hashar
[08:35:36] now the wmf branch need an update :-]
[08:35:41] branches
[08:35:43] yup
[08:35:53] but max knows how to do that
[08:36:17] I was running on wmf8 for one of these
[08:36:19] so I could easily test that again
[08:36:36] (command already cued up)
[08:39:21] PROBLEM - Puppet freshness on cp3022 is CRITICAL: Puppet has not run in the last 10 hours
[08:40:37] I'm on it
[08:42:06] thanks a lot btw
[08:43:53] apergos, do you need it only on wmf8?
[08:44:11] don't know but I was going to say let's test it there first
[08:44:23] I can check the other failures and see what version they had
[08:45:00] there are failures also in 9
[08:48:53] grrrr, new gerrit merges so slowwwwwly
[08:49:30] MaxSem: might be jenkins / unit tests
[08:49:43] then it's you whom I should be biting
[08:49:49] indeed
[08:49:51] * MaxSem bites hashar
[08:50:02] merge into what ?
[08:50:05] mw/core ?
[08:50:27] this is an extension
[08:52:45] !log maxsem synchronized php-1.21wmf8/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/48797/'
[08:52:45] Logged the message, Master
[08:52:48] here we go
[08:53:35] ok
[08:53:57] if it fixed wmf8, I can deploy it to wmf9 too
[08:54:09] I'll find out shortly
[08:54:17] ope
[08:54:26] PHP Fatal error: Call to a member function getText() on a non-object in /a/usr/local/apache/common-local/php-1.21wmf8/extensions/MobileFrontend/includes/MobileContext.php on line 273
[08:54:34] well it fixed ne but now
[08:54:34] eh
[08:54:36] we have the second
[08:54:53] sorry for the horrible tying
[08:54:57] *typing!
[08:55:49] or maybe it didn't fix the one (since I haven't encountered that error in the dumps)
[08:55:52] anyways....
[08:56:29] I'm an idiot
[08:56:59] I forgive you.
[08:57:06] * apergos whacks Susan
[08:57:15] Now the other cheek.
[08:57:23] that's pretty cheeky of you
[08:57:59] :D
[08:57:59] * apergos could go somewhere raunchy with this but this is a publically logged channel
[08:58:49] https://gerrit.wikimedia.org/r/48804
[09:00:14] all right let's try that
[09:01:04] merged
[09:07:12] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[09:10:02] !log maxsem synchronized php-1.21wmf8/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/48804/'
[09:10:04] Logged the message, Master
[09:11:20] looks better
[09:11:43] I'd say push them both out to 9
[09:17:59] !log maxsem synchronized php-1.21wmf9/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/48804/'
[09:18:00] Logged the message, Master
[09:24:12] it wil be a little while before I have a complete run and can move onto a test under 9, but I presume it will be successful
[09:24:25] thanks again
[09:24:38] :)
[09:35:33] apergos: Jenkins is still processing .. :D
[09:35:44] daaannnggg :-)
[09:36:36] RECOVERY - MySQL disk space on neon is OK: DISK OK
[09:36:41] my test dump run is still going, it's in meta-current
[09:37:30] ah now it's in meta-history
[09:54:06] so... how is jenkins?
[09:58:02] rekeying stuff
[09:58:09] apparently it parse all the .xml files there
[09:58:14] I guess it will take a while
[09:58:23] feel free to get to lunch / out / bed whatever :-]
[09:58:30] it is probably not going to cause any issues
[10:00:11] heh
[10:00:16] not bed, its very early here :-)
[10:00:32] also it is pouring buckets and there is occasional thunder and lightning
[10:00:46] a nice day for hot chocolate which I will fix soon
[10:07:35] yummm
[10:14:57] today I learned another english idiomatic: "pouring buckets"
[10:14:59] I guess that is the same as "it is raining cats and dogs"
[10:15:09] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[10:16:30] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 197 seconds
[10:16:30] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 197 seconds
[10:17:30] Yes.
[10:19:28] it is, though the image is not of buckets falling from the sky (as the other one invokes the image of cats and dogs falling) but of buckets of water being emptied onto the passersby below, at least that is how I envision it
[10:21:54] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[10:21:54] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[10:21:54] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[10:44:24] RECOVERY - MySQL disk space on neon is OK: DISK OK
[11:18:16] apergos: yeah that "pouring buckets" image makes a lot of sense. Much more than the cats and dogs failing upon us :-]
[11:19:11] ah
[11:19:17] Jenkins has completed its rekeying stuff
[11:20:27] and how does it look?
[11:20:44] Zuul is still able to communicate with Jenkins using its API key
[11:20:46] so I guess it is fine
[11:23:37] apergos: I have closed the bug. Rekeying is a success as far as I am concerned.
[11:23:42] apergos: thanks for staying around :-]
[11:23:56] sure!
[11:23:59] it's hailing her enow
[11:24:04] the fun never ends....
[11:34:02] lunch time
[11:34:06] bb soon
[11:57:31] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[12:30:13] RECOVERY - MySQL disk space on neon is OK: DISK OK
[12:41:37] PROBLEM - Puppet freshness on mw37 is CRITICAL: Puppet has not run in the last 10 hours
[13:12:04] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 1 seconds
[13:13:16] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 1 seconds
[13:26:18] New patchset: Mark Bergsma; "Remove statically configured test backend in favor of dynamic director" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48829
[13:27:39] New review: Mark Bergsma; "Patch Set 1: Code-Review+2" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/48829
[13:27:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48829
[13:33:06] !log reedy synchronized php-1.21wmf9/resources/mediawiki
[13:33:07] Logged the message, Master
[13:48:48] https://gerrit.wikimedia.org/r/48771 "currently every box has the default, this will get the apis up to their intended 100, and imagescalers down to their intended 18"
[13:48:49] Oops :D
[14:00:23] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[14:06:23] PROBLEM - Puppet freshness on tin is CRITICAL: Puppet has not run in the last 10 hours
[14:12:23] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[14:42:48] RECOVERY - MySQL disk space on neon is OK: DISK OK
[15:06:22] New review: Dzahn; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47795
[15:06:31] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47795
[15:11:09] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[15:12:06] <-- hmm, that looks temporary.. ssh_exchange_identification: Connection closed by remote host
[15:12:15] but 3 seconds later.. login just fine
[15:12:48] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[15:19:38] Unable to open CDB file for write "/home/wikipedia/common/php-master/cache/l10n/l10n_cache-ab.cdb"
[15:19:38] pff
[15:19:40] (on labs)
[15:20:08] Invalid escape flag: j
[15:20:14] pff (on RT)
[15:21:03] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[15:21:37] hashar: it's already owned by another user?
[15:24:17] yeah the l10n cache files were owned by mwdeploy
[15:24:26] our permissions are SUCH as mess :-]
[15:26:47] mutante: I guess the mail to cron would be fixed now.
[15:27:46] hashar: great:) i did not get a new one yet. before i got it every 5 minutes or something
[15:29:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:30:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.565 seconds
[15:44:46] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[15:51:11] RECOVERY - MySQL disk space on neon is OK: DISK OK
[16:05:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:16:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds
[16:25:32] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds
[16:26:04] heya mark, you around?
[16:26:48] q about the best way to make some aggregated kraken data available for graphing
[16:28:59] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[16:35:36] New patchset: Ottomata; "Including stats system user on analytics nodes." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48838
[16:36:05] New review: Ottomata; "Patch Set 1: Verified+2 Code-Review+2" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/48838
[16:36:14] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48838
[16:40:29] ottomata: yes?
[16:46:06] so, in hdfs
[16:46:14] there is /wmf/public directory
[16:46:45] this directory is meant to be used as a place to store output hadoop jobs that can be used as input to limn to graph
[16:47:16] previously, we were pointing limn at a .csv file that was available over http via the proxy
[16:47:35] since we're not doing that now, i need to figure out a new way to make that data availble.
[16:47:51] i could copy that directory over periodically to stat1001 and host it from stats.wikimedia.org
[16:48:26] also, stefan is working on creating a .deb for Limn. once that is done we'd like to puppetize and host reportcard (and other limn sites) on stat1001
[16:48:50] if Limn is on stat1001, it has access to the /wmf/public directory hdfs over http via webhdfs
[16:49:37] without knowing any more specifics, the latter seems like the cleanest solution
[16:50:15] i think so too, and that should be ok since stat1001 is in eqiad and already on the backend network, so no public proxy is needed, right?
[16:50:35] if it were in pmtpa it would be no different
[16:50:39] right right
[16:50:43] but
[16:50:45] i guess i mean just on the backend network in general
[16:50:49] how is access control handled by webhdfs?
[16:51:23] the files in /wmf/public are world readable, so webhdfs will allow access to them
[16:51:38] so basically webhdfs just looks at unix file permissions?
[16:51:48] right, and webhdfs is configured to run as a particular hdfs user
[16:51:56] that doesn't seem very secure, does it?
[16:52:01] too easy to make mistakes
[16:52:14] yeah probably
[16:52:56] also, since it'll cross the analytics VLANs acl, you'll need to add it to the ticket
[16:53:32] that's true, oh yeah, I wanted to talk about that yesterday in the meeting, but forgot to bring it up with all the other stuff
[16:53:47] i assume webhdfs is some FUSE thing?
[16:54:39] no, rest http api that comes with hadoop
[16:54:58] right, but how is it accessed by limn?
[16:55:09] just a url
[16:55:53] so basically, webhdfs needs a proper setup, with puppetization, good access control, SSL, security review
[16:56:22] right (i'm verifying that this url actually uses webhdfs at the moment, one sec…it might just be a datanode thing…), but yeah
[16:56:32] (been a few months since I looked at this)
[16:56:58] so I think you can't use that yet
[16:57:09] if something is needed now, some temporary rsync is probably better
[16:57:46] New patchset: Alex Monk; "(bug 44587) Multiple changes for trwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48841
[17:03:25] yeah, sounds good, i'm not entirely sure if the whole process at the moment. It looks like this url is not actually using webhdfs (hue web ui is), but its just a datanode service that turns on with the datanode
[17:03:50] is webhdfs currently configured?
[17:04:37] hue uses it, yes
[17:05:26] what is hue and where does it run?
[17:05:42] hue is a web interface for a lot of generic hadoop services
[17:05:49] it is running on analytics1027
[17:06:09] ok
[17:06:18] is webhdfs configured to only allow access from the analytics cluster?
[17:11:43] morning
[17:12:00] re: hue and the proxy, I was told that it's not temporary as I initially thought
[17:12:10] and that it will be the portal for analysts
[17:12:32] whatever that's gonna be, it needs to be locked down now
[17:12:33] hue, yes, proxy and how people would needs figuring out
[17:12:34] yeah
[17:12:35] if that's the case, then it should get a service IP and hostname and get behind the normal SSL cluster
[17:12:43] yeah totally
[17:12:52] there is no proxy running
[17:12:54] not set up an nginx in an1027 as we briefly discusses
[17:13:03] and nothing running on analytics1001 right now
[17:13:04] s/s$/d
[17:13:06] ottomata: there may not be a proxy running
[17:13:12] but right, mark sorry
[17:13:14] was checking about webhdfs
[17:13:18] but does webhdfs honour requests from the rest of the network?
[17:13:27] it really shouldn't at this poit
[17:13:28] point
[17:13:35] just checked, and yes it does
[17:13:39] so we should turn it off
[17:13:49] either that
[17:13:53] or lock it down
[17:13:55] you need to turn it off or lock it down sufficiently rightaway
[17:14:24] i would prefer to lock it down (iptables on analytics nodes for basically the same things that are in that RT ticket)
[17:14:29] but if you would rather me turn it off I will do so
[17:14:42] i don't like just iptables
[17:14:53] i'm guessing webhdfs can be configured with access control as well
[17:15:03] that would be a good start
[17:15:18] i'm looking into it…but I think with kerberos :(
[17:17:32] When security is on, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the delegationquery parameter, the authenticated user is the user encoded in the token. If the delegation parameter is not set, the user is authenticated by Kerberos SPNEGO.
[17:18:47] if it doesn't have anything better, I think it should be turned off at this point
[17:18:58] link...?
[17:19:21] http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-yarn/hadoop-yarn-site/WebHDFS.html
[17:19:39] what's webhdfs? a java app? a cgi?
[17:19:48] hdfs rest API that ships with hadoop
[17:20:09] that are its interfaces
[17:20:13] but what is *it*?
[17:20:56] a fuse filesystem? a jar that runs under a java servlet container? a cgi?
[17:20:57] its part of the namenod
[17:20:59] namenode
[17:21:05] so hadoop java app
[17:21:35] ugh
[17:21:42] its disableable though
[17:21:43] doesn't look like it has any ip based access control
[17:21:45] just a config setting
[17:21:55] yeah, just kerberos :/
[17:21:55] then do that now until we can have a thorough look at it
[17:21:56] run it on loopback and set up a reverse proxy in fornt of it
[17:22:03] hmmmmmm
[17:22:09] that's a good idea
[17:22:13] that does whatever the hell we want
[17:22:29] that's not going to help us with authz though
[17:22:50] well, we can at least keep it restricted to analytics nodes with that, and only people with shell access to those could use it anyway
[17:22:54] is that ok?
[17:23:09] i mean, that's doable with iptables too though, is reverse proxy better?
[17:24:18] yes, reverse proxy is better
[17:24:34] happy to do that, can I ask why?
[17:25:31] because application level security always trumps firewalling?
[17:25:32] really both is best
[17:25:39] but we're gonna have that ACL too
[17:25:50] and any application should be secure when the firewall is inactive
[17:26:41] aye
[17:26:45] ok
[17:26:51] q then
[17:28:23] this would be easy to puppetize in the analytics branch where variables are avaiable, but that is no longer active. I can commit a reverse proxy setup to production puppet and get it reviewed by paravoid, but doing it immediately won't be as pretty as the final product once everything is properly reviewed and puppetized
[17:28:37] more hardcoded crap
[17:28:39] etc.
[17:28:48] stop thinking of some large overhaul far into the future
[17:29:04] i'm thinking of an iterative one that we haven't really started
[17:29:05] start fixing things now
[17:29:17] yes
[17:29:18] and we'll iteratively improve them
[17:29:20] can't it be iterative with band aids :)
[17:29:23] ?
[17:29:28] this is your active setup, it needs to be locked down now
[17:29:35] without puppet if it's not properly done in puppet yet
[17:29:50] hm
[17:29:51] ok so
[17:30:15] i can do it now without puppet.
[17:30:15] i can do it now with puppet but in band aid form
[17:30:15] i can do it slowly and beautifully in puppet in pretty form
[17:30:24] you do it now without puppet
[17:30:31] and then you do it slowly and beautifully in puppet in pretty form later
[17:30:34] ack
[17:30:35] like it.
[17:30:39] haha paravoid does not :p
[17:30:43] I do
[17:30:46] oh ok
[17:30:46] haha
[17:30:49] as long as later is not in 6 months
[17:30:52] nono
[17:30:58] i want to work on this whenever you are available to do so
[17:31:03] later means as soon as you close the immediate hole
[17:31:25] this should have been there from the very start
[17:31:30]
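
Editor's note: the lock-down agreed on above (webhdfs listening on loopback, with a reverse proxy in front that only serves the analytics nodes) comes down to a source-address check made at the application layer rather than only in iptables. Below is a minimal Python sketch of such a check, under loud assumptions: the networks are reserved documentation prefixes standing in for the real analytics subnets, which the log does not give, and `ALLOWED_NETWORKS` and `is_allowed` are illustrative names, not part of Hadoop, Hue, or any proxy.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation ranges used as
# placeholders; the real analytics subnets are not stated in the log.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip parses and falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is refused outright.
        return False
    # Membership is False when address and network IP versions differ.
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "2001:db8::1", "not-an-ip"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

A proxy would run a check like this on every request before forwarding to the loopback listener, which keeps the service closed even when the firewall ACL is inactive. As noted in the discussion, this restricts where requests come from but does nothing for per-user authorization.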