[00:00:41] Looks like I need chown -h [00:01:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:01:17] That seems to work [00:01:59] !log Restarting the find-chown, this time with -h so symlinks are handled correctly (for some reason there are a bunch of broken symlinks with weird characters out there...) [00:02:08] Logged the message, Mr. Obvious [00:02:11] RoanKattouw: Wouldn't it be better to just wipe those broken links? [00:02:17] They point to nowhere anyway [00:02:18] I bet those files are 2004ish [00:02:22] !log find /export/upload/wik*/*/{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,archive,math,temp,timeline} ! -user apache -exec /root/fixownership2 \{\} \; where fixownership2 = chown -h apache $1 [00:02:30] Logged the message, Mr. Obvious [00:02:30] AaronSchulz: 2008 actually [00:02:50] weird [00:02:54] hoo: Hmm, maybe, but I don't feel comfortable deleting stuff from this system without talking to Aaron (and maybe Ariel) first [00:04:13] Sure, just saying... broken symlinks only cause trouble [00:04:56] if you're having spare time, you could join it against the DB to look whether one of those files still "exists" (or at least the wiki things so) [00:05:03] PROBLEM - NTP on analytics1011 is CRITICAL: NTP CRITICAL: No response from NTP server [00:10:57] New patchset: Ryan Lane; "Fix ldap client settings for nfs1/2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19968 [00:11:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.892 seconds [00:11:42] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/19968 [00:11:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19968 [00:19:23] !log running mwscript purgeParserCache.php --wiki=enwiki --age=1209600 [00:19:32] Logged the message, Master [00:21:49] New patchset: Dzahn; "move zirconium from private to public" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19969 [00:22:30] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19969 [00:24:12] !log running "mwscript purgeParserCache.php --wiki=$db --age=1814400" instead [00:24:22] Logged the message, Master [00:27:45] New patchset: Ryan Lane; "Fixing script user info" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19972 [00:28:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/19972 [00:29:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19972 [00:40:58] !log installing package upgrades on singer [00:41:07] Logged the message, Master [00:43:36] Getting a database error [00:43:44] saying there are too many concurrent transactions [00:43:47] on enwiki [00:53:21] Jasper_Deng: thanks [00:53:33] binasher: what happened? 
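(Editorial note on the two purgeParserCache.php runs logged above: a minimal sketch, assuming the --age option is given in seconds, that converts the two logged values into days. The helper name age_in_days is made up for illustration and is not part of MediaWiki.)

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def age_in_days(age_seconds: int) -> float:
    """Convert an --age value (assumed to be in seconds) to days."""
    return age_seconds / SECONDS_PER_DAY


# The two values that appear in the !log entries.
for age in (1209600, 1814400):
    print(f"--age={age} -> {age_in_days(age):g} days")
```

Under that assumption the first run would purge entries older than two weeks, and the second run raises the cutoff to three weeks.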
[01:00:31] New patchset: Dzahn; "let partman take all the space that is left instead of fixed value that was too high for zirconium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19977 [01:01:21] New review: Dzahn; "thanks Ben" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/19977 [01:01:21] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19977 [01:20:03] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/19765 [01:20:22] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/19751 [01:32:37] New patchset: Ryan Lane; "labs puppetmasters should be cas..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19979 [01:33:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/19979 [01:33:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19979 [02:02:57] New patchset: Jeremyb; "followup Iec13c027653f21d0" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19981 [02:03:41] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/19981 [02:05:53] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19981 [02:10:51] !log fixed arecord issues with labsconsole by adding an exception handling live hack for the jobs [02:11:05] Logged the message, Master [02:12:17] !log pushed in large puppet change for ldap, openstack, gerrit and ldap pdns to make it more modular and to add support for eqiad region [02:12:27] Logged the message, Master [03:17:46] New patchset: Dzahn; "add nagios monitoring group "misc_pmtpa" because Nagios is broken without it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19983 [03:18:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/19983 [03:18:59] New review: Dzahn; "just to fix Nagios right now.." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/19983 [03:19:00] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/19983 [03:47:07] PROBLEM - Puppet freshness on lvs1 is CRITICAL: Puppet has not run in the last 10 hours [03:47:07] PROBLEM - Puppet freshness on lvs3 is CRITICAL: Puppet has not run in the last 10 hours [03:47:07] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [03:47:07] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [03:47:07] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [03:57:28] !log nagios back up after adding missing monitor groups misc_pmtpa in appserver role (srv194) [03:57:34] out [03:57:37] Logged the message, Master [04:34:28] RoanKattouw_away: best to look at one of the symlinks directly and see if it's referenced anywhere in the db [04:35:06] I would expect titles like that to have been cleaned up but I've run across a few bad ones before [08:37:07] heya 
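(Editorial note on the chown -h discussion earlier in the log: a small sketch of why a plain chown may trip over broken symlinks while chown -h does not. It uses Python's os.chown with follow_symlinks=False as a stand-in for chown -h; the temporary paths and the function names are hypothetical, and it only re-applies the caller's own uid and gid so it does not need root.)

```python
import os
import tempfile


def chown_no_follow(path: str, uid: int, gid: int) -> None:
    """Change ownership of path itself; a symlink is not followed (like chown -h)."""
    os.chown(path, uid, gid, follow_symlinks=False)


def demo() -> bool:
    """Return True if a chown that follows a broken symlink succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "broken-link")
        # Create a symlink whose target does not exist.
        os.symlink(os.path.join(tmp, "missing-target"), link)
        try:
            # Default behaviour follows the link, so the missing target is an error.
            os.chown(link, os.getuid(), os.getgid())
            followed_ok = True
        except FileNotFoundError:
            followed_ok = False
        # Operating on the link itself works even though the target is missing.
        chown_no_follow(link, os.getuid(), os.getgid())
        return followed_ok


print("chown through broken symlink succeeded:", demo())
```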
[08:40:02] who's the heya to? [08:44:09] everyone :) [08:53:39] apergos: can you please resolve two bugs for me? [08:54:17] it's going to depend on what they are [08:54:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=39402 [08:54:59] https://bugzilla.wikimedia.org/show_bug.cgi?id=39399 [08:57:13] um [08:57:20] where are these two new statuses if they have been added? [08:57:57] what do you mean where? [08:58:21] shouldn't I see them in the drop down list under status at the bottom of the bug? [08:58:39] after you add them, you will [08:59:20] oh, you are asking me to make these changes to bugzilla [08:59:22] I see [08:59:55] I think you should ask someone who actually is involved in bugzilla maintenance or admin to some degree [09:00:05] such as? [09:00:18] I don't follow any of the conversations about it, and I have no idea what anyone wants over there [09:00:56] who does? [09:01:05] I'm looking into that now [09:02:38] well it looks like thehelpfulone is the most active recently [09:02:40] https://bugzilla.wikimedia.org/buglist.cgi?list_id=139066&resolution=FIXED&query_format=advanced&component=Bugzilla&product=Wikimedia [09:03:50] but reedy seems to also be doing things [09:04:05] so I would check with one of them and see if that's appropriate [09:10:39] thanks [09:10:50] Reedy: ping [09:10:52] I'll be lurking to see what they say [09:11:04] great, thank you [09:11:47] sure [10:27:57] Logged the message, Master [10:28:06] Logged the message, Master [10:28:16] Logged the message, Master [10:28:25] Logged the message, Master [10:28:34] Logged the message, Master [10:28:43] Logged the message, Master [10:28:52] Logged the message, Master [10:29:01] Logged the message, Master [10:29:10] Logged the message, Master [10:29:20] Logged the message, Master [10:29:29] Logged the message, Master [10:29:39] Logged the message, Master [10:35:40] fucker [10:50:05] !log Setup semi-sync snapmirror from nas1-a:home_pmtpa to nas1001-a:home_pmtpa [10:50:16] Logged the message, 
Master [11:26:05] Logged the message, Master [11:26:13] Logged the message, Master [11:26:22] Logged the message, Master [11:27:04] grrr [11:27:24] does anyone have op around here? [11:27:28] Logged the message, Master [11:27:37] Logged the message, Master [11:27:40] we should at least ban pp-pdf2 and pp-pdf3 [11:27:45] Logged the message, Master [11:27:50] although I'd say pp-pdf1 too [11:30:47] apergos: do you have op? [11:30:59] or mail contacts for those people [11:31:06] I don't think so [11:31:09] (for op) [11:31:16] and I definitely don't (have email contacts) [11:31:39] tfinc? [11:32:58] New patchset: Matthias Mullie; "Bug 36772 - Article Feedback - Supporting feedback on help pages" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17503 [11:37:15] paravoid: why ban? [11:37:33] they had been specifically asked to log it when they do updates [11:37:38] by Tim IIRC [11:37:43] can't look at channel access list so can't tell but I think that counts as a "no" [11:39:28] just stop caring [11:40:30] https://wiki.openstreetmap.org/w/index.php?title=Talk%3AWiki&action=historysubmit&diff=796940&oldid=796938 [11:40:38] ehm https://wikitech.wikimedia.org/index.php?title=Server_admin_log&curid=3768&diff=50285&oldid=50284 [11:43:41] heh [11:43:51] they could leave em in for the one server [11:43:57] *shrug* [12:05:41] !log Removed OSPF metric on xe-5/2/1.0 on cr2-eqiad, to move eqiad->pmtpa traffic to the lower latency link [12:05:50] Logged the message, Master [12:12:18] paravoid: what's the labs glustermanager cron spam that can't contact ldap? 
[13:26:41] !log Enabled and started SIS deduplication on home_pmtpa on nas1-a [13:26:50] Logged the message, Master [13:37:48] No working slave server: Unknown error (10.0.6.43)) [13:38:11] seems a glitch [13:39:54] but it's quite slow [13:47:56] mark: sorry just came on again [13:48:10] mark: labstore2 can't connect to virt0:389 apparently, trying to understand why [13:55:25] New review: Faidon; "I don't disagree with the change but rather with its stated effect (and hence commit message). $is_l..." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/19786 [13:59:49] *sigh* [14:01:58] oh? [14:09:26] Isn't labs on prod now anyway? So that change should pretty much have 0 effect. [14:09:47] huh? [14:09:55] how do you figure labs is on prod? [14:10:14] we decided to drop the test branch a while ago [14:10:23] ^ [14:10:37] oh, *that* production [14:10:44] The branch not the server. [14:10:53] not e.g. realm or some other things [14:10:53] s/server/puppetmaster/ [14:15:37] !log stopping puppet on brewster to continue partman troubleshooting for analytics dells [14:15:46] Logged the message, Master [14:32:20] New patchset: Jgreen; "remove fetch_udplogs from aluminium/grosley, it's handled by netapp replication now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20023 [14:33:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20023 [14:34:40] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20023 [14:54:02] New patchset: Mark Bergsma; "Add some classes for NFS mounts from the NetApps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20025 [14:54:43] New patchset: Mark Bergsma; "Mount the home_pmtpa volume on bast1001:/srv/home_pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20026 [14:55:23] New review: gerrit2; "Change did not pass lint check. 
You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/20025 [14:55:23] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/20026 [14:55:50] New patchset: Mark Bergsma; "Add some classes for NFS mounts from the NetApps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20025 [14:56:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20025 [14:58:05] New patchset: Mark Bergsma; "Mount NFS home_pmtpa on bast1001:/srv/home_pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20027 [14:59:04] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20026 [14:59:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20027 [14:59:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20025 [14:59:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20027 [15:04:28] New patchset: Mark Bergsma; "Correct eqiad hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20028 [15:05:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20028 [15:07:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20028 [15:10:27] New patchset: Jgreen; "deprecate manual replication of gzipped fundraising udplogs, instead netapp replication" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20030 [15:11:10] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20030 [15:11:12] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20030 [15:20:41] New patchset: Mark Bergsma; "Need to mount the othersite NFS volumes readonly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20032 [15:21:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20032 [15:21:34] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20032 [15:24:48] New patchset: Mark Bergsma; "Fix volume paths" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20033 [15:25:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20033 [15:25:41] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20033 [15:51:12] Hey opsies + RobH [15:51:19] agh, ottomata1! [15:51:20] brb [15:51:40] ok phew [15:51:42] that's better [15:51:43] yeah heya [15:51:49] Can someone get me a wikitech account? [15:52:55] New patchset: Ottomata; "analytics-dell.cfg - Not using swap (for now) Confirming to skip past no swap warning, also confirming to overwrite partition table." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20036 [15:53:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/20036 [15:56:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20036 [15:58:57] !log starting puppet on brewster. (Woo, partman looking better!) [15:59:06] Logged the message, Master [16:01:00] Hmmm. Getting a lot of parser/language related OOMs atm [16:07:23] Logged the message, Master [16:09:06] cmjohnson1: i don't think account creation's been open for a while? 
[16:10:11] i doubt it [16:10:24] also l10n cache is broken there so stuff like http://wikitech.wikimedia.org/view/Special:Log/newusers makes no sense [16:11:06] and regular users also can't make new accounts: You do not have permission to create this user account, for the following reason: The action you have requested is limited to users in the group: Administrators. [16:11:35] so you need one of these ppl: http://wikitech.wikimedia.org/view/Special:ListUsers/sysop [16:12:07] haha, morebots is an administrator? [16:13:35] morebots is a helpful guy! [16:13:51] or LeslieCarr ;) [16:13:55] ? [16:14:01] nagios-wm is the most helpful [16:14:04] LeslieCarr: ottomata needs a wikitechwiki account [16:14:11] oh [16:14:15] let's see if i have admin access [16:14:20] :) [16:14:20] you do! [16:14:29] oo thank you! [16:14:33] oh yay [16:14:57] LeslieCarr: you're in order right before ma rk coincidentally [16:15:02] what's your user ? [16:15:08] haha, did mark do that on purpose ? ;) [16:15:18] i think it's alphabetical [16:18:27] hah, paravoid's a crat but not a sysop? [16:21:05] i'm in! thanks LeslieCarr [16:25:06] * mark always hides behind leslie when there's work to be done ;-) [16:25:18] haha [16:25:19] ) [16:25:21] :) [16:29:26] !log Stopping puppet on brewster again. Sigh. PARTMAAAAN! [16:29:36] Logged the message, Master [16:33:41] maplebed: here? [16:33:50] yeah. [16:33:53] on the phone [16:35:07] okay [16:35:23] ok, back. [16:35:25] so, the window is in 1½ hours [16:35:36] what were you planning to observe? [16:35:40] any particularly interesting graphs? 
[16:35:59] there are a few graphs and logs [16:36:21] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+frontend+proxies [16:36:44] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Swift%2520pmtpa&tab=m&vn=swift+frontend+proxies [16:38:13] then tailing /home/w/logs/syslog/swift on fenari [16:38:25] (I'd probably set up a tail specific to originals) [16:38:43] but I don't actually have much expectation that we'll see interesting things in any of those sources [16:38:48] cmjohnson1: i can do what when back? [16:38:54] make a wikitech account? [16:39:10] so tomorrow all ms servers can die with no issues, right? ;) [16:39:23] another thing I've done in the past is run tcpdump on one of the proxies filtering out just GET requests [16:39:23] no [16:39:25] mark: not yet. [16:39:33] mark: squids still point to ms [16:39:34] but close. [16:39:35] bummer [16:39:38] and this isn't scheduled for today [16:39:57] still [16:40:02] if they'd die, we wouldn't be in a lot of trouble [16:40:03] and I don't think that we should do two things at a time too [16:40:25] +1 paravoid [16:40:29] agreed [16:40:38] yeah, it'd probably be less work to fix squid rather than fixing ms in case of an incident, as I see it [16:40:44] correct me if I'm wrong Ben [16:41:23] paravoid: I think the thing that will actually make me comfortable it's working right is to do tests around original upload and fetching and see that it's coming from MW i [16:42:56] RobH: it was done already (wikitech acct) [16:43:21] mark, can I ask you a question about some partman stuff? 
[16:43:28] yes [16:43:33] notpeter has been helping me but I think we are a little stumped, maybe you'd have an idea [16:43:42] i'm trying to get the new analytics dells installed [16:43:44] i'm so so so close [16:43:48] i'd like [16:43:52] 30GB physical / [16:44:02] and 12GB swap, or no swap at all, i don't really care right now [16:44:16] if I do physical swap, no matter what I say, it fills up the rest of the disk [16:44:25] if I remove the swap partition in the recipe, then / fills up the whole disk [16:45:16] paravoid: I'm going to go talk to aaron and see what other things we can look at. [16:45:22] biab. [16:46:06] this is the current recipe [16:46:07] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=files/autoinstall/partman/analytics-dell.cfg;h=3d61c91350c21f00e0fdab33326a62d72937b417;hb=HEAD [16:47:57] ok, so i see it as just having 30GB physical right now... [16:48:03] if i remember my partman) [16:48:12] yup [16:48:15] except [16:48:17] after install [16:48:20] it fills all of sda [16:48:29] if I make a second partition (swap) [16:48:35] / will be 30GB [16:48:49] and the second partition will fill the rest of the space on sda [16:49:33] did you try d-i partman-auto-lvm/guided_size string 80% [16:49:42] no, but I am not using LVM [16:49:53] right? [16:49:56] why are you not [16:49:59] for / [16:50:00] ? 
[16:50:35] just use partman/lvm.cfg and be happy [16:51:16] ah it has /boot [16:51:19] physical [16:51:21] hmmm [16:51:23] you know what [16:51:24] OK [16:51:25] don't even make your own [16:51:27] that is fine with me [16:51:27] just use that [16:51:37] I have no idea why people keep making custom new recipes when it doesn't seem to matter at all [16:51:44] I think it's a big waste of time [16:51:52] it will matter eventually [16:51:56] once we know how we want these things partitioned [16:52:00] i will need to make one [16:52:04] for the others we wanted mirrored raid on / [16:52:07] this has an SSD on / [16:52:10] so no mirrored raid [16:52:13] sorry [16:52:17] SSD on sda* [16:52:25] but ja, i will try this [16:52:50] Logged the message, Master [16:57:54] ok, I have those two ganglia graphs open [17:01:48] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/20040 [17:02:06] ms-be3 in the pool but not up. hm [17:15:53] cool, lvm.cfg works fine [17:15:54] thanks mark! [17:31:11] paravoid: aaron suggests that the best test will be to verify that thumbnail creation and purging work as expected. [17:32:23] okay [17:32:47] apergos: seriously?! ::sigh:: ms-be1 went down yesterday, and ms-be3 today. grumblegrumble. [17:32:57] anything on console? or just time to powercycle... [17:33:34] I didn't check that, I just saw it in the log with 'error syncing with node' [17:33:55] do we know what makes em fall over? [17:34:20] ms-be1 was looking like the 219d bug [17:34:25] but I checked uptime on the rest. [17:34:51] look at the dates on the boot logs on ms-be1 [17:35:56] yesterday ms-be3 had an uptime of 146 days. [17:36:21] do you want to poke at ms-be3 or should I? [17:37:00] oh, I can power cycle it [17:37:25] or not [17:37:28] uh [17:38:03] please go ahead. I usually also watch the console on boot to see if it does something funny like a fs check [17:38:47] how do I get on mgmt though? 
[17:38:49] cause [17:38:55] ariel@fenari:~$ ssh -l root ms-be3.mgmt.pmtpa.wmnet [17:38:55] ssh: connect to host ms-be3.mgmt.pmtpa.wmnet port 22: Connection refused [17:39:00] what stupid thing am I missing? [17:39:08] that's right! these are dell c2100s. they don't do ssh. only ipmi. [17:39:12] \o/ [17:39:30] http://wikitech.wikimedia.org/view/Dell_PowerEdge_C2100 [17:39:43] ohjoy [17:39:54] :) [17:39:59] hey guys [17:40:24] dschoon wants to use oracle's java for the analytics machines [17:40:27] (hadoop, etc.) [17:40:30] why? [17:40:50] drdee, dschoon? [17:41:10] using oracle java has a lot of issues [17:41:29] Hadoop requires Java 1.6+. It is built and tested on Oracle (nee Sun) Java, which is the only "supported" JVM. [17:41:31] we do use it for certain things [17:41:33] among philosophical concerns, there are also legal problems; we can't e.g. put it into apt.wikimedia.org (as far as I understand) [17:41:34] ok that failed: [17:41:41] right, but there is this: [17:41:43] since that would be redistribution which is forbidden [17:41:44] root@sockpuppet:~# ipmitool -U root -H ms-be3.mgmt sol activate [17:41:47] http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html [17:41:49] Error: This command is only available over the lanplus interface [17:41:56] which is an 'installer package' [17:42:06] that does not keep the java package itself in apt [17:42:14] but an installer that dls and installs from oracle [17:42:16] apergos: use "the easy way" [17:42:29] right [17:42:29] but trusts a remote website to run code on our machines. [17:42:43] help.ubuntu suggests using this [17:42:58] the installer we can put in our own apt [17:43:00] sun java? grrrr [17:43:18] ottomata: we shouldn't blindly trust remote code in production machines imho [17:43:27] why can't you use the openjdk though? 
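(Editorial note on the ipmitool error quoted above: serial-over-LAN needs ipmitool's lanplus interface, which is selected with the -I option. The sketch below only assembles the command line from the log with "-I lanplus" added; it does not run it. The function name sol_activate_command is made up for illustration, and the host and user are simply the ones shown in the log.)

```python
import shlex


def sol_activate_command(host: str, user: str = "root") -> list:
    """Build an ipmitool serial-over-LAN command that uses the lanplus interface."""
    return ["ipmitool", "-I", "lanplus", "-U", user, "-H", host, "sol", "activate"]


# Print the command as it would be typed in a shell.
print(shlex.join(sol_activate_command("ms-be3.mgmt")))
```

Keeping the arguments as a list rather than a single string means the same value can be handed to subprocess.run without shell quoting concerns.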
[17:43:37] there is basically no alternative as openjdk is not compatible with hadoop [17:43:38] reason one: [17:43:39] http://wiki.apache.org/hadoop/HadoopJavaVersions [17:43:45] it is possible to do [17:43:46] i hate when ipmi sucks [17:43:47] but not supported [17:44:27] OpenJDK6 has some open bugs w.r.t handling of generics (https://bugs.launchpad.net/ubuntu/+source/openjdk-6/+bug/611284, https://bugs.launchpad.net/ubuntu/+source/openjdk-6/+bug/716959), so OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and beyond, please use other JDKs. [17:44:33] (might be ok in 7, who knows) [17:44:42] i did the puppet work for this in lucid [17:44:47] to use sun jdk 6 [17:44:49] using alternatives [17:44:56] could do the same for this [17:44:59] keeping openjdk as default [17:45:18] ottomata: that's for *compiling* [17:45:25] and it says "Hadoop does build and run on OpenJDK" [17:45:26] so from the console (i.e. ipmi_mgmt console) I should be able to get a linux login prompt (typically)? [17:45:29] yes, compiling mapreduce code [17:45:50] apergos: if the machine was up and responsive, yes; hitting 'return' would trigger a prompt refresh. [17:45:50] oh okay [17:45:58] we need to be able to compile mapreduce code, people will be using it [17:46:04] ok, just making sure it works the way I think it ought to [17:46:05] the launchpad bugs seem to suggest that it works with openjdk 7 though [17:46:09] yup. [17:46:19] can we test openjdk 7 first and if that fails try to find a way around oracle? [17:48:23] ah I see, it wants -I lanplus [17:48:33] (to not use the "easy" way) [17:48:35] asking dschoon, he's got more info on this than i do [17:48:42] I would love to find a way around snoracle. [17:49:04] though I'm not involved in the hadoop stuff [17:49:29] New patchset: Aaron Schulz; "Make reads come from swift for all wikis." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/20043 [17:50:04] !log powercycled ms-be3 ... 
in theory [17:50:04] qatop [17:50:11]