[00:00:12] New patchset: Lcarr; "Make test puppet repo act like production (pull from git)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2096 [00:01:15] lvs4 is still monitoring them [00:01:28] mark: 7th time's the charm ? [00:17:24] New patchset: Bhartshorne; "put in a default so things don't break on new servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2103 [00:17:44] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2103 [00:17:45] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2103 [00:19:21] LeslieCarr: cool if I merge your changes? [00:19:21] your change is unmerged on sockpuppet maplebed - is it safe to merge ? [00:19:25] haha [00:19:26] jinx [00:19:27] sure [00:19:28] :) [00:19:30] New patchset: Asher; "snapshot db26" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2104 [00:19:53] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2104 [00:19:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2104 [00:19:59] merged [00:30:24] maplebed: seems fine [00:30:38] you can do a second invocation for sda2 and sdb2 with partition_nr => 2 [00:31:02] won't they fail because sda and asb are 'os' disks? [00:31:17] oh yeah [00:31:20] they make a new partition table [00:31:22] yeah that won't work [00:33:19] mark: https://rt.wikimedia.org/Ticket/Display.html?id=2328 are for the two enwiki db servers [00:33:24] one quote per datacenter [00:34:01] two per data center, right [00:34:06] yep [00:34:10] well, nothing stopping me from making the final two partitions by hand. [00:34:19] i'm making a major prod puppet change, so if it breaks, yell at me :) [00:34:37] mark: RobH: http://pastebin.com/uY9ztij3 [00:35:17] nice [00:35:22] though less impressive than with 48 drives ;-) [00:35:28] lol [00:35:40] cheaper than the 48 disk version. [00:35:45] mark: Is there anything else you want done to this host before we order 4 more? [00:35:47] those thors were pricy [00:35:59] or woosters or anybody else? [00:36:01] if rob and you are happy with them [00:36:08] then I'm comfortable buying a few more [00:36:10] I am happy with them for this project [00:36:21] RobH: you're satisfied with the ipmi stuff as a method for management? [00:36:22] i would want to hack at the c series before moving it to anything else mind you [00:36:26] we'll get to know them better in the next few weeks [00:36:40] maplebed: yep, it does what we need it to do just fine [00:36:46] ok, sounds like we're set to order. [00:36:48] and when i finish polishing script it will be easy for everyone [00:36:54] suggest we place the order asap, maplebed [00:36:59] RobH: maybe you should order 5 instead of 4 (4 for me and 1 for you to continue playing with)? [00:36:59] ok, i am going to ask dell to update quote for better pricing since we are ordering a lot of shit [00:37:21] nah, too expensive a system for testbed [00:37:24] i think we are good to go [00:37:38] any testing i do for scripting works against R series just fine [00:37:50] well, if you need to play with them more before moving on to anything else, but you won't order one to play with, how do you propose ever moving them into something else? [00:37:59] and we need to test swift's fault tolerance anyway, right? ;-) [00:38:07] rob plays with production [00:38:12] damn right [00:38:13] riiight... 
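For context on the "final two partitions by hand" mentioned above: the puppet invocation with partition_nr => 2 can't be reused for sda2/sdb2 because sda and sdb are the OS disks and the define rewrites the partition table, so the leftover space gets prepared manually. A minimal sketch of that manual step, assuming the usual swift conventions (XFS, mounts under /srv/swift-storage); the start offset, mkfs options, and mount point are illustrative and not taken from the log:

    for disk in sda sdb; do
        # carve the remaining space into a second partition (the 30GB start offset is hypothetical)
        parted -s "/dev/$disk" mkpart primary xfs 30GB 100%
        mkfs.xfs -f -i size=512 "/dev/${disk}2"           # XFS with 512-byte inodes is the common swift recommendation
        mkdir -p "/srv/swift-storage/${disk}2"
        mount -o noatime,nodiratime "/dev/${disk}2" "/srv/swift-storage/${disk}2"
    done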
[00:38:18] no fun to play in a sandbox. [00:38:38] ok. RobH do you need anything from me (rt or whathaveyou) or are you good to go? [00:38:47] the question of them being worth using is more of price versus ease of use [00:38:58] if the system is fine, then i am set, will ask for updated quote [00:41:26] maplebed: so you want 4 additional hosts [00:41:30] in addition to the one you have now [00:41:32] correct? [00:42:15] asked for 4 [00:47:49] mark: err: /Stage[main]/Puppetmaster::Gitclone/Git::Clone[operations/software]/Exec[git clone operations/software]/returns: change from notrun to 0 failed: git clone https://gerrit.wikimedia.org/r/p/operations/puppet returned 128 instead of one of [0] at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:756 ? [00:47:52] know what's up with that ? [00:49:42] yes leslie [00:49:45] you didn't get the origin right ;-p [00:49:49] New patchset: Lcarr; "fixing software repo for puppetmaster in prod" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2105 [00:50:00] * LeslieCarr throws a can over the cube wall [00:50:34] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2105 [00:50:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2105 [00:53:44] PROBLEM - DPKG on ms-be1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:54:07] RobH: yes, that's correct. total of 5 - ms-be1 + 4 more new ones. [00:55:35] PROBLEM - Disk space on ms-be1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:56:02] maplebed: cool, quote update requested [00:56:17] Jeff_Green: https://rt.wikimedia.org/Ticket/Display.html?id=2323 is for tampa right? [00:57:11] #2323: procure a pair of hosts to do lvs/pybal for the new payments cluster [00:57:21] i assume yes, but i rather be certain ;] [00:57:58] low-perf misc server cluster would be fine [00:58:18] New patchset: Lcarr; "trying moving the git clone to last" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2106 [00:58:27] pretty sure we dont have any spare of those in tampa, so will have to order [00:58:29] RobH: thats eqiad actually [00:58:34] oh, then we have a ton of them [00:58:35] yay! [00:58:41] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2106 [00:58:41] Jeff_Green: need public IP? [00:58:41] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2106 [00:58:46] i assume yes, lvs [00:58:48] er [00:58:52] these go in the extra secure rack [00:58:56] mark: so that means i have to snag from row b for both for now. [00:58:58] so you can't allocate [00:59:03] ahh, have to wait for row C [00:59:04] row C ;) [00:59:04] ? [00:59:11] RobH: oh we fixed the glitch :) [00:59:13] and that rack will be handled differently [00:59:16] wasnt sure if we would do now or wait, i will update the ticket and tie to the row C [00:59:16] we have public ip's in row a now [00:59:19] yeah exactly, secure rack [00:59:28] not relevant to this case, but good to know for the future [00:59:35] Jeff_Green: So that order is in for those racks, I expect a two week leadtime to get them in, perhaps 3. 
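On the git clone failure above: the Git::Clone[operations/software] resource was pointed at the operations/puppet origin, which is why the exec bails out with exit code 128, and change 2105 fixes the origin. Roughly speaking the intended result is the equivalent of the following, where the destination path is a guess based on the /var/lib/git layout visible in the error message rather than something stated in the log:

    # clone the software repo from its own origin instead of the puppet one
    git clone https://gerrit.wikimedia.org/r/p/operations/software /var/lib/git/operations/software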
[00:59:39] i'm going to have to build out a pxeboot situation there [00:59:41] LeslieCarr: indeed =] [00:59:58] you're going to have to build out EVERYTHING in there ;) [00:59:59] robh that's totally fine [01:00:03] mark: yes [01:00:03] it'll be an autonomous island [01:00:10] no air is allowed in [01:00:14] ;) [01:00:14] PROBLEM - RAID on ms-be1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:00:15] it needs a name [01:00:20] greenland [01:00:23] ha [01:00:24] mark: so i can just take two exisiting ones and move them then? [01:00:28] or we wanna order more? [01:00:31] mark: that's funny on multiple levels [01:00:34] RobH: either way [01:00:47] will hold off, try to use existing once new row is in [01:00:52] and if they get used before then, will order more [01:01:00] tying to the row C expansion and such [01:01:00] where green is me, the color of US bills, island, etc. [01:01:04] :) [01:01:12] and it's cccccold there [01:01:39] it'll need it's own logo that looks Disney-ish too [01:02:57] we can actually put other servers in that rack right [01:03:03] otherwise unrelated to fundraising [01:03:05] sure [01:03:14] so we could build our normal infrastructure, as long as there's room for your stuff, plus it's secure [01:03:14] just has to lock shut and the like [01:03:18] indeed [01:03:26] so would be easiest to toss our shit in top of that rack [01:03:28] so we'll put a normal EX4200 in etc [01:03:30] i was thinking it might make sense to move aluminium, and fundraising dbs etc there [01:03:31] yeah [01:03:31] and just set aside the bottom half for payments [01:03:37] indeed [01:03:37] Jeff_Green: indeed [01:03:45] jinx. [01:03:46] is there redundant power? [01:03:50] everywhere [01:03:50] yep [01:03:57] all eqiad is redundant, and tampa will slowly migrate to be [01:04:06] can we do two switches? [01:04:11] ehm [01:04:15] not for our normal production stuff [01:04:30] why do you need that? [01:04:34] it'd be swell not to lose everything if we lose one power circuit or one switch, just because we chose to put it all in the same rack [01:04:38] if a switch dies i would imagine [01:04:41] yeah [01:04:43] then let's not put it all in one rack [01:04:49] =/ [01:04:52] ok, agreed [01:05:10] i would like to avoid cross rack within the same datacenter [01:05:21] what do you mean by cross rack? [01:05:35] dont all the payments stuff have to plug into some jujiper device we are ordering [01:05:43] that will all be in one rack [01:05:57] payments stuff in other racks wont need to plug into that? [01:06:08] unless I misunderstood jeff [01:06:15] that one secure rack will be redundant [01:06:17] so there will be 2 firewalls, one per power circuit [01:06:22] yeah [01:06:36] but stuff NOT in the secure realm, will be on our normal production switches [01:06:42] and of those, we'll only have one switch per rack [01:06:52] and if we end up using switches within the payments cluster there will be two little switches on each power circuit [01:06:54] if we place payments stuff in two racks, the second rack will not need to hit those firewalls with direct plugged connection? [01:06:54] so if THAT needs to be more redundant, let's not put THAT in one rack [01:07:01] yeah--that's what I was referring to [01:07:01] Jeff_Green: that's fine [01:07:04] ok [01:07:14] * RobH is confused [01:07:15] RobH: payments stuff will be in one rack [01:07:33] fundraising stuff will remain spread out across racks? 
[01:07:35] there will be no cross rack connections anywhere [01:07:37] (excepting payments) [01:07:39] RobH: think of it this way [01:07:41] ah, yes [01:07:41] err [01:07:56] payments is the walled garden that is internally redundant on that rack [01:08:13] payments will plug into the firewalls direct, and not need an access switch, correct? [01:08:24] and the rest of fundraising is essentially aluminium, db's, activemq box, maybe some log processing stuff [01:08:35] if they plug into a switch, you are back to a non-redundant spof in that access swtich. [01:09:05] robh if we do switches within the payments environment there need to be a pair but they can be tiny [01:09:17] so two 24 port junipers [01:09:22] or less [01:09:56] (cuz this means we should be adding those to the juniper order) [01:10:00] yeah, i mean we're talking about ports for: 4 payments servers, 2 lvs boxes, 1 shell/bastion, 1 logger box [01:10:12] the SRXes can do that just fine [01:10:14] no switches needed [01:10:23] they are 8 port? [01:10:27] 16 [01:10:27] yeah I'm totally on board for doing direct to the SRXs [01:10:29] cool. [01:10:33] it would be pretty funny to have more RU of network equipment than servers :) [01:10:34] ok, i am on same page [01:10:56] LeslieCarr: that's what we ended up with at CL isn't it? [01:10:58] :-P [01:11:09] haha very true :) [01:11:23] * mark wonders when perl will sneak in [01:11:45] mark: did you see CL donated $100K to perl foundation? [01:11:50] i did [01:11:54] hence my comment :D [01:11:59] and it's already there, like a disease [01:12:07] all my scripting is in perl :-P [01:12:07] yeah I saw that [01:12:17] not much longer ;-P [01:12:36] mark: so we wanna have sdtpa expand row d to redundnat power right? [01:12:42] did you want to email and ask for quote on that? [01:12:54] I can write it in perl and know it'll work, or attempt it in python (to be cool) and know it'll break until I get python chops [01:13:05] d3 will be empty when we decom those servers, which would be a good spot for the gluster servers [01:13:07] well, half of them [01:13:09] that's fine, we can fix it then ;) [01:13:10] split for redundancy [01:13:25] RobH: do the ciscos have redundant power? [01:13:29] yes [01:13:34] if i recall correctly. [01:13:39] lemme login to one to confirm [01:13:41] prolly do [01:15:31] 99.99% they do [01:15:55] indeed, they do [01:16:57] I love it when I make the $1 test donation to make sure I didn't just break payments, and still get the "You are amazing..." email from civicrm-Sue [01:17:22] take that self esteem boost and run with it [01:17:24] seems like for $1 for a US donor a "gee thanks" would do [01:17:25] it cost you a buck ;] [01:17:44] RobH: someday when I donate big it'll be in $1 increments [01:18:16] It looks like the same email, but she's actually being sarcastic. [01:18:37] ah thinking of it that way restores my sense of universal balance [01:19:45] mark speaking of perl-bashing I've had the pleasure(?) of hacking a CGI for OTRS reporting over the past couple days, complete with half-using their API and half avoiding it like the plague [01:20:00] Jeff_Green: have you ever donated not in english? you will get a translated thank you email :-) [01:20:01] API is the wrong word [01:20:27] I haven't--that sounds like a good too. [01:20:30] err good test too [01:20:35] heh [01:20:52] just make sure its a bigger-ish language, i didn't have time to add things like wolof [01:21:03] hahahah [01:21:17] is esperanto supported? [01:21:28] klingon? 
[01:21:33] I might understand the forms enough to function in French or German [01:21:35] we closed that wiki [01:21:50] must support all open wikis! [01:21:52] RobH: no one has translated that one [01:21:56] https://meta.wikimedia.org/wiki/Category:Translation/Fundraising_2011/Thank_You_Mail [01:24:09] alright--i'm out. have a good evening folks [01:24:16] night [01:35:16] New patchset: Pyoungmeister; "copy/paste error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2107 [01:35:49] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2107 [01:35:49] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2107 [01:57:48] New patchset: Asher; "snapshot db32" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2108 [01:58:14] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2108 [01:58:14] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2108 [02:03:52] PROBLEM - Frontend Squid HTTP on cp1002 is CRITICAL: Connection refused [02:06:52] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [02:09:32] PROBLEM - Memcached on ms-fe1 is CRITICAL: Connection refused [02:10:52] PROBLEM - Memcached on ms-fe2 is CRITICAL: Connection refused [02:23:38] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1710s [02:32:58] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 2316s [02:35:59] New patchset: Catrope; "Fix puppet restart for udp2log-aft" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2109 [02:43:08] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [02:44:18] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s [02:52:58] RECOVERY - Apache HTTP on srv197 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.206 second response time [03:53:56] New patchset: Catrope; "Fix puppet restart for udp2log-aft" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2109 [03:54:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2109 [04:17:48] RECOVERY - MySQL disk space on es1004 is OK: DISK OK [04:24:18] RECOVERY - Disk space on es1004 is OK: DISK OK [04:40:18] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No [06:04:16] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [06:20:06] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [08:13:15] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours [10:01:50] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 442696 MB (3% inode=99%): [10:06:47] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 412796 MB (3% inode=99%): [10:28:30] New patchset: Hashar; "integration site now mobile aware" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2110 [10:28:48] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2110 [11:13:14] RECOVERY - MySQL slave status on es1004 is OK: OK: [12:47:03] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [12:47:03] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [15:55:31] PROBLEM - check_minfraud3 on payments1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:00:31] RECOVERY - check_minfraud3 on payments1 is OK: HTTP OK: HTTP/1.1 200 OK - 8644 bytes in 0.223 second response time [16:14:21] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [16:30:21] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours [16:40:31] PROBLEM - check_minfraud3 on payments2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:45:31] RECOVERY - check_minfraud3 on payments2 is OK: HTTP OK: HTTP/1.1 200 OK - 8644 bytes in 0.225 second response time [17:39:55] robh: regarding srv199, i think the best option is to turn off sata port a in bios. [17:40:09] !rt 2293 [17:40:09] https://rt.wikimedia.org/Ticket/Display.html?id=2293 [17:40:12] uhh [17:40:24] wouldnt that then disable using that drive? [17:41:08] its covered under warranty until 2012-02-06 [17:41:50] basically if its under warranty [17:41:53] i dont want to just disable something [17:41:57] i want it fixed ;] [17:42:24] is the hard disk plugged into sata port a, or into a controller card? [17:42:30] cmjohnson: 6? [17:42:33] ^ even [17:42:34] heh [17:42:41] into the controller card....nothing is on sata port a [17:42:49] in the other 1950's it is not used as well. [17:42:59] and looks like it was disabled in bios [17:43:08] ahh, ok [17:43:22] as long as its not the primary, then the error is more than likely it expects to see it, but doesnt [17:43:29] so yea, disable the port in bios and we should be ok [17:43:48] then make sure it boots without that, need it shutdown? [17:44:00] yes please [17:45:15] !log shutting down srv199 for bios tinkering by chris [17:45:18] Logged the message, RobH [17:45:28] cmjohnson: done [17:49:41] PROBLEM - Host srv199 is DOWN: PING CRITICAL - Packet loss = 100% [17:53:32] robh i am done...booting now [17:53:45] had to go get keyboard from crash cart upstairs [17:54:18] cool [17:55:58] robh: can you powercycle it and see if boots correctly now [17:56:29] it should show it on initial boot with crash cart [17:56:36] com2 in use, resetting drac [17:56:44] RECOVERY - Host srv199 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms [17:57:05] cmjohnson: if you have the crash cart connected, you should be able to confirm the message is gone during post [17:57:11] since it requires someone to hit f1 to boot [17:57:36] ok then it should be fine..becuase it went through post and is at os login [17:57:44] cool, then its ok [18:03:34] PROBLEM - Apache HTTP on srv199 is CRITICAL: Connection refused [18:04:14] !log forcing puppet run on srv199 [18:04:15] Logged the message, RobH [18:06:26] yea, nagios will clear up [18:09:00] stuff on noc.wikimedia.org/conf/ is in SVN... is the repo public? [18:13:54] RECOVERY - Apache HTTP on srv199 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [18:22:54] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours [18:27:19] hello ops :) [18:27:27] can someone could potentially merge https://gerrit.wikimedia.org/r/#change,2110 ? 
:) [18:27:42] that is just some CSS / html tweaks for the continuous integration website [18:27:52] no need to deploy it right now, but a merge would be great [18:32:57] maplebed: are we buying eqiad storage nodes as well? [18:33:05] yes but not right now. [18:33:13] wanna try in tampa first? [18:49:22] does anyone know the story with the cronspammy "lsusb | grep..." cronjob on spence? [18:49:57] I don't but I'd bet it has something to do with a USB SMS gateway. [18:50:28] i'm gonna fix it--looks like the barf is just a path issue [18:51:06] Jeff_Green: yes, that's me [18:51:11] I can fix that [18:51:18] i'm in there anyway, should I just do it? [18:51:35] sure [18:51:42] LeslieCarr: So I fixed the udp2log init script issue. At first I thought it was a bug in the init-scripts library, but as it turns out it was really just Ryan's fault [18:51:49] haha [18:51:54] of course it was ;) [18:51:59] ok. perhaps cronspam will become smsspam :-P [18:52:27] i dunno, i was enjoying having the longest stretch ever of not getting paged [18:53:09] Jeff_Green: weeee [18:53:14] the spool should be clean [18:54:20] New review: Lcarr; "good, looks like it will now actually look for the right pidfile :)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2109 [18:54:50] Jeff_Green: but it's good to know that our sms sending doober needs to be restarted so often... [18:55:21] sounds like quality hardware yah [18:55:29] LeslieCarr: For laughs, read patchset 1 on that change (including the commit-msg), that's when I was still convinced /lib/lsb/init-functions was at fault [18:56:17] :) [18:57:56] New patchset: Mark Bergsma; "Added eqiad service IPs for lvs realservers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2112 [18:58:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2112 [18:58:20] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2112 [18:58:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2112 [19:06:46] New patchset: Mark Bergsma; "Copied text-squid role class into role/cache.pp, renamed to role::cache::squid::text" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2113 [19:07:23] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2113 [19:07:24] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2113 [19:07:26] Jeff_Green: ugh. I'll redirect stderr and out for that thing [19:08:23] Jeff_Green: that second | was needed I'llreadd it [19:11:12] New patchset: Mark Bergsma; "Fix variable name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2114 [19:11:29] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2114 [19:11:29] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2114 [19:14:23] New patchset: Mark Bergsma; "Work around Puppet bug" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2115 [19:14:39] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2115 [19:15:01] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2115 [19:15:01] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2115 [19:21:00] New patchset: Asher; "path to support legacy mysql installs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2116 [19:21:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2116 [19:21:19] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2116 [19:21:19] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2116 [19:23:07] New patchset: RobH; "upated added new simple shell script for ipmi mgmt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2084 [19:23:14] notpeter: sorry, was on phone with the heating contractor. cool re. /dev/null [19:23:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2084 [19:24:49] !log disregard any flapping by mw1001, its my script testbed [19:24:50] Logged the message, RobH [19:25:17] ok, someone review my script ;] [19:27:03] New patchset: Asher; "provide socket path for older installs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2117 [19:27:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2117 [19:27:23] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2117 [19:27:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2117 [19:29:11] mark: so you dont think the mgmt script should just be on all bastion hosts? [19:29:22] bast1001, fenari, etc.. [19:30:07] well I'd prefer it if ops started handling sensitive stuff and passwords and shit on a host where not every third party contractor or volunteer logs in ;) [19:30:10] cuz i can toss the file to be included as part of misc::bastionhost [19:30:18] that would be very lazy of you ;) [19:30:29] but those hosts can access ipmi anyhow [19:30:36] with a quick ipmi tunnel through the bastion anyhow [19:30:47] but they don't have the password [19:30:49] (plus fenari already has it installed ;) [19:31:00] the software isn't sensitive, the password is [19:31:06] right, but the script doesnt have the password, though i would assume folks will set it in shell environment [19:31:17] but if they dont, it will simply prompt per command run [19:31:19] yes [19:31:21] but if you install that on fenari [19:31:24] people will use it there [19:31:54] true, but they already have the ability to do all the stuff the script does, by manually running the commands [19:32:09] so the arguement is to get all ops to use an ops bastion? [19:32:16] yes [19:32:30] So should I install bast1002 or something for ops only use? [19:32:37] then create a ops bastion misc group for this stuff? [19:32:45] or something with a nicer name :P [19:32:52] (i would assume i should pull ipmitool OFF fenari) [19:32:55] New patchset: Asher; "upgrading mysql on db37" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2118 [19:33:12] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2118 [19:33:12] i prefer bastion1002... [19:33:18] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2118 [19:33:18] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2118 [19:33:23] but we named bast1001 already ;] [19:33:24] can we call it ops-fortress-of-solitude [19:33:25] ? [19:33:32] bastion1002 sounds like it's exactly the same as bastion1001 [19:33:34] which it's not [19:33:48] so, clearly, my name suggestion is the better one ;) [19:33:50] call it basta [19:34:30] =/ [19:34:44] cp1001 is similar to cp1048 but one is varnish and the other squid [19:34:48] so i dont see the issue [19:34:57] yes but you don't need to log in to those [19:35:12] opsbastion1001 [19:35:12] I don't want to have to think like "do i need number 37 or 42 for this task" [19:35:47] pillowfort.wikimedia.org? [19:35:59] I think bast1001 is a stupid name for a bastion host already, let's not make it worse ;) [19:36:17] bastion1001 would have been better ;] [19:36:22] the_lions_den.wikimedia.org [19:36:31] err dashes i guess [19:38:00] how about helpful cnames instead? [19:38:14] i dont understand how all of you cannot memorize long random strings of characters. [19:38:21] how can any of you be sysadmins? ;p [19:38:50] have to do ipmi, log into host 'wmf1047' [19:38:56] name everything by asset tag! [19:39:10] * RobH is only partially sarcastic [19:41:37] actually RobH I have seen that work when you have a good machine database, often that does automatic cnames [19:42:08] opsbastion is what i am leaning towards. [19:42:12] unless someone gives me a better name. [19:42:25] just whatever the misc server you use is named now? [19:42:32] just use element name then? [19:42:35] why is there the sudden urge to renmae everything? [19:42:54] so it will also get nfs yes? [19:43:04] it can yeah [19:43:05] setup similar to fenari with far less access, no noc, nfs mount [19:43:17] basically same as bast1001 with even less access [19:43:18] really this doesn't need to be a bastion host [19:43:20] it can be any host [19:43:26] we can login via the bastion hosts [19:43:30] then ssh into this management host [19:43:37] kinda annoying to know 'login to bastion, then onto this host to do lights out mgmt' [19:43:39] i dislike the idea [19:43:47] I dislike using a new server for this too [19:43:52] it can be any host which we already have [19:43:57] something like the puppet server [19:44:12] it's not really necessary to spend a few thousand dollar on [19:44:19] making a misc::mgmthost manifest [19:44:29] then we can just include on puppetmasters i guess [19:44:33] yeah [19:51:04] New patchset: RobH; "added in misc::mgmt to include ipmitool and ipmi script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2119 [19:51:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2119 [19:52:05] someone who isnt me wanna check my two ipmi related changes? 
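Not the actual script in change 2084, just a sketch of the pattern being discussed: the wrapper itself carries no password, it takes one from the shell environment when set and prompts per run otherwise, then drives ipmitool at the host's lights-out (.mgmt) interface. The variable name, IPMI user, default action, and the eqiad domain suffix below are placeholders:

    #!/bin/bash
    # usage: ipmi-mgmt <hostname> [ipmitool chassis arguments...]
    host="$1"; shift
    action="${*:-power status}"
    if [ -z "$IPMI_PASSWORD" ]; then
        read -r -s -p "IPMI password: " IPMI_PASSWORD; echo
    fi
    # lanplus against the management interface; $action is left unquoted so "power status" splits into two words
    ipmitool -I lanplus -H "${host}.mgmt.eqiad.wmnet" -U root -P "$IPMI_PASSWORD" chassis $action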
[19:58:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2119 [19:58:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2084 [20:07:36] New patchset: RobH; "upated added new simple shell script for ipmi mgmt updated Change-Id: I33e6afa9b9d34e8bead610f7a2d4cb713065b88b" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2084 [20:07:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2084 [20:09:22] New patchset: RobH; "added in misc::mgmt to include ipmitool and ipmi script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2119 [20:09:54] Change abandoned: RobH; "combined to another patch by mistake, abandoning this one" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2119 [20:17:46] New review: Demon; "(no comment)" [test/mediawiki/core] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1841 [20:17:47] Change merged: Demon; [test/mediawiki/core] (master) - https://gerrit.wikimedia.org/r/1841 [20:21:06] New patchset: Jgreen; "adding mysql::packages to storage3's config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2120 [20:21:24] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/2120 [20:21:44] grrr. [20:23:30] New patchset: Jgreen; "adding mysql::packages to storage3's config, plus a comma so puppet stops chundering" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2120 [20:23:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2120 [20:23:49] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2120 [20:23:50] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2120 [21:04:45] New patchset: Bhartshorne; "moving tampa swift cluster from test to prod configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2121 [21:05:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2121 [21:06:03] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2121 [21:06:04] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2121 [21:12:10] !log upgraded storage3 mysqld from 5.1.47 to mysql-at-facebook-r3753 [21:12:12] Logged the message, Master [21:32:57] New patchset: Lcarr; "moving all of the misc:: and generic:: webserver classes to own class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2122 [21:39:05] hey, would anyone like to check this out ? 
i got fed up with webserver classes having different names and renamed them all to the same file [21:39:24] mark: https://gerrit.wikimedia.org/r/#change,2084,patchset=3 ;] [21:48:06] New review: RobH; "Self review is the best kind of review" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2084 [21:48:07] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2084 [21:49:23] LeslieCarr: behold http://d1hsxkpnft2izn.cloudfront.net/image.aspx/media/images/_web-assets/mens/Ingenuity/longtail_HC_box.png-402x479 [21:49:51] is that real ? [21:49:54] because that's awesome [21:49:55] It is! [21:50:24] the same company sells a line of jeans for men called 'ballroom' :) [21:51:13] http://www.duluthtrading.com/store/mens/duluth-ingenuity/mens-ballroom-jeans/features/AD_ballroom.aspx [21:51:27] Generally products targeted for plumbers. [21:51:56] New patchset: RobH; "tagged sockpuppet into misc::ipmimgmthost role" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2123 [21:52:29] oh man, that ad is great. [21:52:41] New review: RobH; "seems fine, just adding a misc role to sockpuppet" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2123 [21:55:02] oh my god [21:55:11] i approve of this company [21:55:49] * jeremyb glares @ the jeans [21:56:22] Change abandoned: RobH; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2123 [21:57:42] far and away my biggest problem with jeans is that the pockets grow holes (so, e.g. coins fall out and down your pants!) when the rest of the garment is in perfect working order [21:58:07] makes me want to proactively sew an extra layer on the pockets before something breaks [21:58:12] jeremyb: Those same jeans have reinforced pockets so that you can carry pliers and screwdrivers and such. [21:58:35] * andrewbogott bursts with Minnesota pride [21:58:48] SF now or where? [22:00:19] have you met [[garrison Keillor]]? [22:00:23] maplebed: having trouble getting LVS working? [22:00:46] hrm, i don't have that problem. i do have a major problem with women's jeans that i can't fit anything in their pockets [22:00:57] like a lot of them i can't get my wallet in them, and my wallet is small [22:00:58] :( [22:00:59] jeremyb: He's from Saint Paul, those guys are jerks. [22:01:24] eww, hate prarie home companion [22:01:47] mark: haven't been trying. [22:02:05] mark: I've been formatting ms-be1 with its last two partitions and putting together the cluster. [22:02:11] my favorite thing was once i was on my bike, stuck behind a convertible blasting PHC… i started ranting about how much i hate it and then the guy turned it off [22:02:37] hehe [22:02:55] I like it and hate it both at once. The show seems to get more condescending as it ages. [22:03:17] orly [22:03:32] Also, weirdly, Keillor actually likes rock and roll, he only plays all that weird bluegrass and irish folk because he knows his Public Radio demo. [22:04:45] * jeremyb pictures jorm on phc [22:05:24] it's just not funny [22:05:26] or interesting [22:05:31] * andrewbogott realizes he should not slander celtic music in a room full of sysadmins [22:05:32] and it's played so often on the weekends [22:06:41] LeslieCarr obviously doesn't have enough stations to choose from [22:06:52] In the 80's their house band was great (with Butch Thompson) and most of my find memories are from... very very long ago. 
[22:07:32] when i want to rent a car it's usually the weekends… and there's hardly anything good playing [22:07:43] And they used to have Peter Ostrouschko on a lot, and he gets a lifetime pass for being on Blood On The Tracks [22:08:07] ...and now I'll stop dropping names of Minnesotan musicians who no one has heard of! [22:09:40] New patchset: RobH; "added in ipmi mgmt host misc to sockpuppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2124 [22:09:57] New patchset: Asher; "new mysql monitoring, test on two dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2125 [22:10:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2124 [22:10:17] New review: RobH; "added ipmi mgmt host entry for sockpuppet" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2124 [22:10:18] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2124 [22:12:06] err: /Stage[main]/Puppetmaster::Gitclone/File[/var/lib/git/operations/private/.git/hooks/post-merge]/ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /var/lib/git/operations/private/.git/hooks/post-merge.puppettmp_6467 at /var/lib/git/operations/puppet/manifests/puppetmaster.pp:139 [22:12:16] so that shows on sockpuppet during a puppet run [22:12:36] mark: (any ideas what that really means?) [22:12:39] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2122 [22:12:40] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2122 [22:12:49] i didnt change anything in the private repo [22:12:58] so not sure wtf is causing sockpuppet to vomit on puppet run [22:13:23] different error now, disregard... wtf [22:14:04] !log poking at puppet change breaking things on sockpuppet puppet runs [22:14:05] Logged the message, RobH [22:14:14] https://en.wikipedia.org/wiki/Garrison_Keillor#In_popular_culture [22:14:43] New patchset: Asher; "new mysql monitoring, test on two dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2125 [22:15:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2125 [22:15:02] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2125 [22:15:03] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2125 [22:15:38] RobH: that's due to leslie's work yesterday [22:15:46] ok, now i have another error, heh [22:15:51] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class misc::ipmimgmthost for sockpuppet.pmtpa.wmnet at /var/lib/git/operations/puppet/manifests/site.pp:1775 on node sockpuppet.pmtpa.wmnet [22:15:57] yet site.pp imports misc/* [22:16:12] oh [22:16:13] and not a typo on the sockpuppet include line [22:16:15] ipmimgmthostwtfbbq11111 [22:16:26] sorry robh that specific line is a known issue [22:16:35] please rename that to misc::management::ipmi or so [22:16:36] yea it stopped bitching about that after two runs [22:16:42] cool [22:17:49] New patchset: RobH; "removing my change to site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2126 [22:17:59] LeslieCarr: i'm going to merge your last change on sockpuppet if that's ok [22:18:07] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2126 [22:18:08] New review: RobH; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2126 [22:18:15] New review: RobH; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2126 [22:18:16] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2126 [22:18:55] sure [22:19:03] thanks binasher [22:19:09] binasher: you merging my revoking that lind in site.pp as well i think [22:19:32] lemme know when its live so i can update sockpuppet puppet run please [22:20:21] New patchset: Asher; "fix typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2127 [22:20:36] mark: you mean like class misc::management::ipmi ? [22:20:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2127 [22:20:42] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2127 [22:20:42] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2127 [22:21:09] RobH: yeah [22:21:17] in misc/management.pp [22:21:18] RobH: your change should be live [22:21:23] thx [22:23:03] New patchset: RobH; "renaming to more easily read misc::management::ipmi" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2128 [22:23:36] mark: care to glance at https://gerrit.wikimedia.org/r/#change,2128 ? [22:24:18] sec [22:24:37] please rename that file to misc/management.pp [22:24:40] as is the class name [22:25:14] New patchset: Asher; "update class name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2129 [22:26:05] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2129 [22:26:05] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2129 [22:27:04] New patchset: RobH; "renaming to more easily read misc::management::ipmi" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2128 [22:27:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2128 [22:28:35] New patchset: Asher; "typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2130 [22:28:53] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2130 [22:28:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2130 [22:28:58] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2130 [22:28:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2130 [22:29:57] mark: https://gerrit.wikimedia.org/r/#change,2128 like better? [22:30:36] also remove the "host" in the description [22:30:40] it already appends "server" [22:30:52] in the system_role [22:30:55] otherwise, good [22:33:02] New patchset: RobH; "renaming to more easily read misc::management::ipmi" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2128 [22:33:19] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2128 [22:36:11] New patchset: RobH; "renaming to more easily read misc::management::ipmi added in sockpuppet to role of ipmi mgmt Change-Id: I828cf708396493e413580839bc6fc1fde5314d4f" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2128 [22:36:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2128 [22:36:49] Ok, so now thats all set, updated site.pp, need someone other than me to review [22:36:52] https://gerrit.wikimedia.org/r/#change,2128,patchset=3 [22:36:58] ack, bad link [22:37:09] https://gerrit.wikimedia.org/r/#change,2128 [22:37:31] !g 2128 [22:37:31] https://gerrit.wikimedia.org/r/2128 [22:37:36] RobH: ---^^ [22:37:48] damn wmbot is fancy these days [22:39:02] Also, git-review allows you to download it with git review -d 2128 [22:39:06] (another shameless plug) [22:40:09] * RobH is going to give it another few minutes before he self reviews. [22:41:52] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2128 [22:41:59] PROBLEM - RAID on srv193 is CRITICAL: Connection refused by host [22:42:10] New review: RobH; "self review is like self help, doomed to fail" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2128 [22:42:11] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2128 [22:42:29] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: Connection refused by host [22:42:29] PROBLEM - Disk space on cp1043 is CRITICAL: Connection refused by host [22:42:49] PROBLEM - Disk space on es2 is CRITICAL: Connection refused by host [22:42:49] PROBLEM - RAID on es2 is CRITICAL: Connection refused by host [22:43:29] PROBLEM - Disk space on snapshot3 is CRITICAL: Connection refused by host [22:43:39] PROBLEM - DPKG on srv193 is CRITICAL: Connection refused by host [22:44:09] PROBLEM - RAID on cp1043 is CRITICAL: Connection refused by host [22:44:31] New patchset: Bhartshorne; "adding LVS addresses to ms-fe boxen. Removing ms-be as ganglia aggregators." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2131 [22:44:39] PROBLEM - MySQL disk space on es2 is CRITICAL: Connection refused by host [22:44:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2131 [22:45:06] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2131 [22:45:07] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2131 [22:45:49] PROBLEM - Disk space on srv193 is CRITICAL: Connection refused by host [22:46:10] PROBLEM - DPKG on cp1041 is CRITICAL: Connection refused by host [22:46:19] PROBLEM - RAID on bast1001 is CRITICAL: Connection refused by host [22:46:19] PROBLEM - MySQL disk space on db1018 is CRITICAL: Connection refused by host [22:46:39] PROBLEM - DPKG on ms5 is CRITICAL: Connection refused by host [22:46:49] PROBLEM - RAID on ganglia1001 is CRITICAL: Connection refused by host [22:47:13] huh [22:47:18] ipmi stuff on fenari to mw1001 runs [22:47:19] PROBLEM - RAID on virt2 is CRITICAL: Connection refused by host [22:47:23] ipmi stuff on sockpuppet to mw1001 fails [22:47:39] PROBLEM - MySQL disk space on db22 is CRITICAL: Connection refused by host [22:47:53] mark or LeslieCarr is the mgmt network for eqiad not supposed to be reachable from sockpuppet? 
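A quick way to see where fenari and sockpuppet diverge for mw1001.mgmt.eqiad.wmnet, since name resolution works on both hosts but only fenari gets replies; these are generic diagnostics, not commands taken from the log:

    ping -c 2 mw1001.mgmt.eqiad.wmnet                       # works from fenari, fails from sockpuppet
    ip route get "$(dig +short mw1001.mgmt.eqiad.wmnet)"    # which interface/next hop each host picks for the mgmt subnet
    traceroute -n mw1001.mgmt.eqiad.wmnet                   # where the path stops from sockpuppet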
[22:47:59] PROBLEM - RAID on ms5 is CRITICAL: Connection refused by host [22:48:02] cuz i cannot even ping mw1001.mgmt from sockpuppet, but can from fenari [22:48:09] PROBLEM - jenkins_service_running on aluminium is CRITICAL: Connection refused by host [22:48:16] hence my mgmt stuffs on sockpuppet now arent useful =[ [22:48:24] RobH: my guess is that it is FQDN [22:48:34] use fqdn [22:48:35] i am pinging with the fqdn [22:48:38] i am. [22:48:39] PROBLEM - DPKG on db1007 is CRITICAL: Connection refused by host [22:48:39] PROBLEM - DPKG on cp1043 is CRITICAL: Connection refused by host [22:48:49] PROBLEM - Disk space on cp1041 is CRITICAL: Connection refused by host [22:48:49] PROBLEM - RAID on db22 is CRITICAL: Connection refused by host [22:49:09] PROBLEM - RAID on snapshot3 is CRITICAL: Connection refused by host [22:49:11] pinging mw1001.mgmt.eqiad.wmnet from sockpuppet fails [22:49:33] i can host and get IP from it, it knows what it is [22:49:39] just cannot route to it from sockpuppet it seems. [22:49:49] PROBLEM - Disk space on ms5 is CRITICAL: Connection refused by host [22:50:39] PROBLEM - Disk space on es3 is CRITICAL: Connection refused by host [22:50:39] PROBLEM - DPKG on virt2 is CRITICAL: Connection refused by host [22:50:39] PROBLEM - DPKG on db1008 is CRITICAL: Connection refused by host [22:50:49] PROBLEM - RAID on db1018 is CRITICAL: Connection refused by host [22:50:49] PROBLEM - RAID on db1008 is CRITICAL: Connection refused by host [22:50:59] PROBLEM - DPKG on ganglia1001 is CRITICAL: Connection refused by host [22:50:59] PROBLEM - MySQL disk space on db1020 is CRITICAL: Connection refused by host [22:50:59] PROBLEM - RAID on es1003 is CRITICAL: Connection refused by host [22:50:59] PROBLEM - Disk space on es1003 is CRITICAL: Connection refused by host [22:50:59] PROBLEM - DPKG on es2 is CRITICAL: Connection refused by host [22:51:00] PROBLEM - MySQL disk space on es3 is CRITICAL: Connection refused by host [22:51:49] PROBLEM - DPKG on snapshot3 is CRITICAL: Connection refused by host [22:51:49] PROBLEM - DPKG on snapshot1 is CRITICAL: Connection refused by host [22:51:59] RECOVERY - RAID on srv193 is OK: OK: no RAID installed [22:52:19] PROBLEM - DPKG on bast1001 is CRITICAL: Connection refused by host [22:52:29] PROBLEM - mobile traffic loggers on cp1041 is CRITICAL: Connection refused by host [22:52:29] PROBLEM - RAID on cp1041 is CRITICAL: Connection refused by host [22:52:29] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 2 processes with command name varnishncsa [22:52:39] PROBLEM - Disk space on virt2 is CRITICAL: Connection refused by host [22:52:39] PROBLEM - Disk space on virt4 is CRITICAL: Connection refused by host [22:52:39] PROBLEM - Disk space on db1008 is CRITICAL: Connection refused by host [22:52:39] PROBLEM - DPKG on db22 is CRITICAL: Connection refused by host [22:52:39] RECOVERY - Disk space on cp1043 is OK: DISK OK [22:52:49] PROBLEM - Disk space on db22 is CRITICAL: Connection refused by host [22:52:49] PROBLEM - MySQL disk space on db1007 is CRITICAL: Connection refused by host [22:52:49] PROBLEM - Disk space on srv223 is CRITICAL: Connection refused by host [22:52:49] RECOVERY - Disk space on es2 is OK: DISK OK [22:52:49] RECOVERY - RAID on es2 is OK: OK: State is Optimal, checked 2 logical device(s) [22:52:59] PROBLEM - Disk space on db26 is CRITICAL: Connection refused by host [22:52:59] PROBLEM - Disk space on ganglia1001 is CRITICAL: Connection refused by host [22:52:59] PROBLEM - Disk space on es4 is CRITICAL: Connection 
refused by host [22:53:09] PROBLEM - RAID on es4 is CRITICAL: Connection refused by host [22:53:09] PROBLEM - Disk space on db1020 is CRITICAL: Connection refused by host [22:53:29] PROBLEM - Disk space on db1007 is CRITICAL: Connection refused by host [22:53:49] RECOVERY - DPKG on srv193 is OK: All packages OK [22:53:49] RECOVERY - Disk space on snapshot3 is OK: DISK OK [22:53:49] PROBLEM - Disk space on snapshot1 is CRITICAL: Connection refused by host [22:53:49] PROBLEM - Disk space on bast1001 is CRITICAL: Connection refused by host [22:53:59] PROBLEM - DPKG on srv238 is CRITICAL: Connection refused by host [22:53:59] RECOVERY - Disk space on ms-fe2 is OK: DISK OK [22:53:59] PROBLEM - DPKG on srv190 is CRITICAL: Connection refused by host [22:54:09] PROBLEM - RAID on srv223 is CRITICAL: Connection refused by host [22:54:19] PROBLEM - Disk space on cp1042 is CRITICAL: Connection refused by host [22:54:19] PROBLEM - MySQL disk space on db1008 is CRITICAL: Connection refused by host [22:54:19] RECOVERY - RAID on cp1043 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [22:54:29] PROBLEM - DPKG on aluminium is CRITICAL: Connection refused by host [22:54:29] PROBLEM - RAID on aluminium is CRITICAL: Connection refused by host [22:54:29] PROBLEM - Disk space on srv276 is CRITICAL: Connection refused by host [22:54:39] PROBLEM - RAID on db1020 is CRITICAL: Connection refused by host [22:54:39] PROBLEM - Disk space on db1018 is CRITICAL: Connection refused by host [22:54:39] PROBLEM - RAID on srv276 is CRITICAL: Connection refused by host [22:54:49] RECOVERY - MySQL disk space on es2 is OK: DISK OK [22:54:59] PROBLEM - DPKG on srv276 is CRITICAL: Connection refused by host [22:55:09] RECOVERY - Memcached on ms-fe2 is OK: TCP OK - 0.001 second response time on port 11211 [22:55:09] PROBLEM - MySQL disk space on db26 is CRITICAL: Connection refused by host [22:55:49] PROBLEM - DPKG on db1018 is CRITICAL: Connection refused by host [22:55:59] RECOVERY - Disk space on srv193 is OK: DISK OK [22:56:09] PROBLEM - Disk space on srv238 is CRITICAL: Connection refused by host [22:56:19] PROBLEM - mobile traffic loggers on cp1042 is CRITICAL: Connection refused by host [22:56:19] PROBLEM - Disk space on srv190 is CRITICAL: Connection refused by host [22:56:19] RECOVERY - DPKG on cp1041 is OK: All packages OK [22:56:19] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [22:56:19] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [22:56:29] PROBLEM - Disk space on aluminium is CRITICAL: Connection refused by host [22:56:29] RECOVERY - RAID on bast1001 is OK: OK: no RAID installed [22:56:39] PROBLEM - RAID on db1007 is CRITICAL: Connection refused by host [22:56:39] PROBLEM - RAID on db1002 is CRITICAL: Connection refused by host [22:56:39] RECOVERY - MySQL disk space on db1018 is OK: DISK OK [22:56:39] PROBLEM - DPKG on db1020 is CRITICAL: Connection refused by host [22:56:49] PROBLEM - DPKG on db25 is CRITICAL: Connection refused by host [22:56:49] PROBLEM - RAID on db26 is CRITICAL: Connection refused by host [22:56:49] RECOVERY - DPKG on ms5 is OK: All packages OK [22:56:59] RECOVERY - RAID on ganglia1001 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [22:57:19] PROBLEM - RAID on db1034 is CRITICAL: Connection refused by host [22:57:39] PROBLEM - RAID on virt4 is CRITICAL: Connection refused by host [22:57:39] RECOVERY - RAID on virt2 is OK: OK: State is Optimal, checked 2 logical device(s) [22:57:59] PROBLEM - DPKG 
on srv223 is CRITICAL: Connection refused by host [22:57:59] RECOVERY - MySQL disk space on db22 is OK: DISK OK [22:57:59] PROBLEM - Disk space on db13 is CRITICAL: Connection refused by host [22:58:09] PROBLEM - DPKG on srv239 is CRITICAL: Connection refused by host [22:58:09] RECOVERY - RAID on ms5 is OK: OK: Active: 50, Working: 50, Failed: 0, Spare: 0 [22:58:19] RECOVERY - jenkins_service_running on aluminium is OK: PROCS OK: 3 processes with args jenkins [22:58:19] PROBLEM - Disk space on srv272 is CRITICAL: Connection refused by host [22:58:39] PROBLEM - Disk space on db53 is CRITICAL: Connection refused by host [22:58:39] PROBLEM - MySQL disk space on es4 is CRITICAL: Connection refused by host [22:58:49] PROBLEM - DPKG on db1002 is CRITICAL: Connection refused by host [22:58:49] RECOVERY - DPKG on cp1043 is OK: All packages OK [22:58:49] RECOVERY - DPKG on db1007 is OK: All packages OK [22:58:59] PROBLEM - DPKG on db26 is CRITICAL: Connection refused by host [22:58:59] RECOVERY - Disk space on cp1041 is OK: DISK OK [22:58:59] PROBLEM - RAID on snapshot1 is CRITICAL: Connection refused by host [22:58:59] PROBLEM - Disk space on db25 is CRITICAL: Connection refused by host [22:58:59] RECOVERY - RAID on db22 is OK: OK: 1 logical device(s) checked [22:59:09] PROBLEM - DPKG on db52 is CRITICAL: Connection refused by host [22:59:09] PROBLEM - MySQL disk space on db13 is CRITICAL: Connection refused by host [22:59:09] PROBLEM - RAID on db53 is CRITICAL: Connection refused by host [22:59:19] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [22:59:19] PROBLEM - Disk space on db1001 is CRITICAL: Connection refused by host [22:59:19] PROBLEM - DPKG on es3 is CRITICAL: Connection refused by host [22:59:29] PROBLEM - RAID on es3 is CRITICAL: Connection refused by host [22:59:29] PROBLEM - RAID on snapshot2 is CRITICAL: Connection refused by host [22:59:29] PROBLEM - Disk space on srv201 is CRITICAL: Connection refused by host [22:59:49] PROBLEM - RAID on srv238 is CRITICAL: Connection refused by host [22:59:59] RECOVERY - RAID on ms-fe2 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [22:59:59] PROBLEM - RAID on cp1042 is CRITICAL: Connection refused by host [23:00:09] RECOVERY - Disk space on ms5 is OK: DISK OK [23:00:29] PROBLEM - DPKG on mw3 is CRITICAL: Connection refused by host [23:00:39] PROBLEM - Disk space on srv239 is CRITICAL: Connection refused by host [23:00:49] RECOVERY - Disk space on es3 is OK: DISK OK [23:00:59] RECOVERY - DPKG on db1008 is OK: All packages OK [23:00:59] PROBLEM - DPKG on virt4 is CRITICAL: Connection refused by host [23:00:59] RECOVERY - RAID on db1008 is OK: OK: State is Optimal, checked 2 logical device(s) [23:00:59] RECOVERY - RAID on db1018 is OK: OK: State is Optimal, checked 2 logical device(s) [23:01:09] RECOVERY - DPKG on virt2 is OK: All packages OK [23:01:09] PROBLEM - DPKG on es4 is CRITICAL: Connection refused by host [23:01:09] PROBLEM - Disk space on db47 is CRITICAL: Connection refused by host [23:01:09] RECOVERY - DPKG on ganglia1001 is OK: All packages OK [23:01:09] PROBLEM - DPKG on db53 is CRITICAL: Connection refused by host [23:01:10] RECOVERY - MySQL disk space on db1020 is OK: DISK OK [23:01:10] RECOVERY - Disk space on es1003 is OK: DISK OK [23:01:11] RECOVERY - RAID on es1003 is OK: OK: State is Optimal, checked 2 logical device(s) [23:01:19] PROBLEM - Disk space on srv192 is CRITICAL: Connection refused by host [23:01:19] PROBLEM - MySQL disk space on db1034 is CRITICAL: Connection refused by host [23:01:29] RECOVERY - 
DPKG on es2 is OK: All packages OK [23:01:29] PROBLEM - Disk space on db52 is CRITICAL: Connection refused by host [23:01:29] RECOVERY - MySQL disk space on es3 is OK: DISK OK [23:01:29] PROBLEM - DPKG on db1017 is CRITICAL: Connection refused by host [23:01:29] PROBLEM - RAID on searchidx2 is CRITICAL: Connection refused by host [23:01:39] RECOVERY - DPKG on ms-fe2 is OK: All packages OK [23:01:59] PROBLEM - Disk space on mw3 is CRITICAL: Connection refused by host [23:01:59] RECOVERY - DPKG on snapshot3 is OK: All packages OK [23:01:59] RECOVERY - DPKG on snapshot1 is OK: All packages OK [23:02:09] PROBLEM - RAID on mw1115 is CRITICAL: Connection refused by host [23:02:10] PROBLEM - DPKG on snapshot2 is CRITICAL: Connection refused by host [23:02:19] PROBLEM - DPKG on srv192 is CRITICAL: Connection refused by host [23:02:29] PROBLEM - RAID on srv190 is CRITICAL: Connection refused by host [23:02:39] RECOVERY - mobile traffic loggers on cp1041 is OK: PROCS OK: 2 processes with command name varnishncsa [23:02:49] PROBLEM - DPKG on cp1042 is CRITICAL: Connection refused by host [23:02:49] PROBLEM - MySQL disk space on db1002 is CRITICAL: Connection refused by host [23:02:49] RECOVERY - Disk space on virt2 is OK: DISK OK [23:02:49] RECOVERY - Disk space on virt4 is OK: DISK OK [23:02:49] RECOVERY - Disk space on db1008 is OK: DISK OK [23:02:59] RECOVERY - DPKG on bast1001 is OK: All packages OK [23:02:59] PROBLEM - Disk space on db1002 is CRITICAL: Connection refused by host [23:02:59] RECOVERY - Disk space on db22 is OK: DISK OK [23:02:59] RECOVERY - MySQL disk space on db1007 is OK: DISK OK [23:03:09] RECOVERY - Disk space on srv223 is OK: DISK OK [23:03:09] RECOVERY - DPKG on db22 is OK: All packages OK [23:03:09] RECOVERY - RAID on cp1041 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [23:03:09] PROBLEM - MySQL disk space on db1001 is CRITICAL: Connection refused by host [23:03:19] RECOVERY - Disk space on db26 is OK: DISK OK [23:03:19] PROBLEM - MySQL disk space on db25 is CRITICAL: Connection refused by host [23:03:19] RECOVERY - Disk space on ganglia1001 is OK: DISK OK [23:03:19] RECOVERY - RAID on es4 is OK: OK: State is Optimal, checked 2 logical device(s) [23:03:29] RECOVERY - Disk space on es4 is OK: DISK OK [23:03:39] RECOVERY - Disk space on db1020 is OK: DISK OK [23:03:39] RECOVERY - Disk space on db1007 is OK: DISK OK [23:03:49] PROBLEM - Disk space on mw56 is CRITICAL: Connection refused by host [23:03:49] PROBLEM - DPKG on searchidx2 is CRITICAL: Connection refused by host [23:03:49] PROBLEM - Disk space on snapshot2 is CRITICAL: Connection refused by host [23:03:59] RECOVERY - Disk space on snapshot1 is OK: DISK OK [23:03:59] RECOVERY - Disk space on bast1001 is OK: DISK OK [23:03:59] PROBLEM - RAID on srv201 is CRITICAL: Connection refused by host [23:04:09] RECOVERY - DPKG on srv238 is OK: All packages OK [23:04:29] RECOVERY - DPKG on srv190 is OK: All packages OK [23:04:29] PROBLEM - RAID on srv272 is CRITICAL: Connection refused by host [23:04:39] RECOVERY - MySQL disk space on db1008 is OK: DISK OK [23:04:39] PROBLEM - RAID on db1001 is CRITICAL: Connection refused by host [23:04:39] RECOVERY - DPKG on aluminium is OK: All packages OK [23:04:39] RECOVERY - Disk space on srv276 is OK: DISK OK [23:04:39] RECOVERY - RAID on aluminium is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [23:04:49] PROBLEM - Disk space on db1004 is CRITICAL: Connection refused by host [23:04:49] PROBLEM - MySQL disk space on db1017 is CRITICAL: Connection refused by host [23:04:49] 
RECOVERY - Disk space on db1018 is OK: DISK OK [23:04:49] RECOVERY - RAID on db1020 is OK: OK: State is Optimal, checked 2 logical device(s) [23:04:59] PROBLEM - RAID on db25 is CRITICAL: Connection refused by host [23:04:59] RECOVERY - RAID on srv223 is OK: OK: no RAID installed [23:04:59] RECOVERY - RAID on srv276 is OK: OK: no RAID installed [23:04:59] PROBLEM - DPKG on db13 is CRITICAL: Connection refused by host [23:04:59] PROBLEM - DPKG on db1034 is CRITICAL: Connection refused by host [23:05:00] PROBLEM - MySQL disk space on db53 is CRITICAL: Connection refused by host [23:05:09] PROBLEM - Disk space on db1033 is CRITICAL: Connection refused by host [23:05:09] RECOVERY - DPKG on srv276 is OK: All packages OK [23:05:30] RECOVERY - MySQL disk space on db26 is OK: DISK OK [23:05:59] PROBLEM - DPKG on mw1115 is CRITICAL: Connection refused by host [23:05:59] PROBLEM - DPKG on srv201 is CRITICAL: Connection refused by host [23:05:59] PROBLEM - Disk space on searchidx2 is CRITICAL: Connection refused by host [23:05:59] RECOVERY - DPKG on db1018 is OK: All packages OK [23:06:09] PROBLEM - RAID on srv191 is CRITICAL: Connection refused by host [23:06:19] PROBLEM - DPKG on srv272 is CRITICAL: Connection refused by host [23:06:29] RECOVERY - Disk space on srv190 is OK: DISK OK [23:06:29] RECOVERY - mobile traffic loggers on cp1042 is OK: PROCS OK: 2 processes with command name varnishncsa [23:06:29] PROBLEM - MySQL disk space on db52 is CRITICAL: Connection refused by host [23:06:39] PROBLEM - DPKG on db1001 is CRITICAL: Connection refused by host [23:06:39] PROBLEM - DPKG on db1033 is CRITICAL: Connection refused by host [23:06:39] RECOVERY - Disk space on aluminium is OK: DISK OK [23:06:39] PROBLEM - Disk space on db1034 is CRITICAL: Connection refused by host [23:06:49] PROBLEM - MySQL disk space on db1004 is CRITICAL: Connection refused by host [23:06:49] PROBLEM - MySQL disk space on db1033 is CRITICAL: Connection refused by host [23:06:49] RECOVERY - DPKG on db1020 is OK: All packages OK [23:06:49] RECOVERY - RAID on db1007 is OK: OK: State is Optimal, checked 2 logical device(s) [23:06:49] RECOVERY - RAID on db1002 is OK: OK: State is Optimal, checked 2 logical device(s) [23:06:59] RECOVERY - DPKG on db25 is OK: All packages OK [23:06:59] RECOVERY - RAID on db26 is OK: OK: 1 logical device(s) checked [23:07:09] PROBLEM - RAID on db52 is CRITICAL: Connection refused by host [23:07:29] PROBLEM - RAID on mw3 is CRITICAL: Connection refused by host [23:07:29] PROBLEM - Disk space on mw1115 is CRITICAL: Connection refused by host [23:07:29] RECOVERY - RAID on db1034 is OK: OK: State is Optimal, checked 2 logical device(s) [23:07:39] PROBLEM - MySQL disk space on db47 is CRITICAL: Connection refused by host [23:07:49] RECOVERY - RAID on virt4 is OK: OK: State is Optimal, checked 2 logical device(s) [23:08:09] RECOVERY - DPKG on srv223 is OK: All packages OK [23:08:09] RECOVERY - Disk space on db13 is OK: DISK OK [23:08:19] RECOVERY - DPKG on srv239 is OK: All packages OK [23:08:29] RECOVERY - Disk space on srv272 is OK: DISK OK [23:08:49] RECOVERY - Disk space on db53 is OK: DISK OK [23:08:49] RECOVERY - MySQL disk space on es4 is OK: DISK OK [23:08:59] RECOVERY - DPKG on db1002 is OK: All packages OK [23:08:59] PROBLEM - Disk space on db1017 is CRITICAL: Connection refused by host [23:09:09] RECOVERY - DPKG on db26 is OK: All packages OK [23:09:09] PROBLEM - DPKG on db47 is CRITICAL: Connection refused by host [23:09:09] RECOVERY - RAID on snapshot1 is OK: OK: no RAID installed [23:09:09] 
RECOVERY - Disk space on db25 is OK: DISK OK [23:09:19] RECOVERY - DPKG on db52 is OK: All packages OK [23:09:19] RECOVERY - MySQL disk space on db13 is OK: DISK OK [23:09:19] RECOVERY - RAID on db53 is OK: OK: State is Optimal, checked 12 logical device(s) [23:09:29] RECOVERY - Disk space on db1001 is OK: DISK OK [23:09:29] RECOVERY - DPKG on es3 is OK: All packages OK [23:09:29] PROBLEM - RAID on db16 is CRITICAL: Connection refused by host [23:09:40] RECOVERY - Disk space on srv201 is OK: DISK OK [23:09:40] RECOVERY - RAID on snapshot2 is OK: OK: no RAID installed [23:09:40] RECOVERY - RAID on es3 is OK: OK: State is Optimal, checked 2 logical device(s) [23:09:40] RECOVERY - Disk space on srv238 is OK: DISK OK [23:09:49] RECOVERY - RAID on srv238 is OK: OK: no RAID installed [23:09:59] PROBLEM - DPKG on srv191 is CRITICAL: Connection refused by host [23:10:09] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: Connection refused by host [23:10:09] RECOVERY - RAID on cp1042 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [23:10:20] PROBLEM - RAID on mw56 is CRITICAL: Connection refused by host [23:10:22] PROBLEM - RAID on srv192 is CRITICAL: Connection refused by host [23:10:39] PROBLEM - Disk space on srv191 is CRITICAL: Connection refused by host [23:10:49] RECOVERY - DPKG on mw3 is OK: All packages OK [23:10:59] PROBLEM - RAID on db1017 is CRITICAL: Connection refused by host [23:11:00] RECOVERY - Disk space on srv239 is OK: DISK OK [23:11:11] PROBLEM - MySQL disk space on db1019 is CRITICAL: Connection refused by host [23:11:11] PROBLEM - RAID on es1001 is CRITICAL: Connection refused by host [23:11:19] RECOVERY - Disk space on db47 is OK: DISK OK [23:11:19] PROBLEM - DPKG on db11 is CRITICAL: Connection refused by host [23:11:19] RECOVERY - DPKG on es4 is OK: All packages OK [23:11:19] RECOVERY - DPKG on db53 is OK: All packages OK [23:11:19] RECOVERY - DPKG on virt4 is OK: All packages OK [23:11:20] PROBLEM - Disk space on es1001 is CRITICAL: Connection refused by host [23:11:29] PROBLEM - RAID on db1033 is CRITICAL: Connection refused by host [23:11:39] RECOVERY - Disk space on srv192 is OK: DISK OK [23:11:39] RECOVERY - MySQL disk space on db1034 is OK: DISK OK [23:11:39] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [23:11:52] PROBLEM - DPKG on mw56 is CRITICAL: Connection refused by host [23:11:52] RECOVERY - Disk space on db52 is OK: DISK OK [23:11:52] RECOVERY - DPKG on db1017 is OK: All packages OK [23:11:59] PROBLEM - Disk space on db1035 is CRITICAL: Connection refused by host [23:12:19] RECOVERY - DPKG on snapshot2 is OK: All packages OK [23:12:19] PROBLEM - RAID on nfs1 is CRITICAL: Connection refused by host [23:12:19] RECOVERY - Disk space on mw3 is OK: DISK OK [23:12:19] RECOVERY - RAID on mw1115 is OK: OK: no RAID installed [23:12:29] PROBLEM - Disk space on nfs1 is CRITICAL: Connection refused by host [23:12:29] PROBLEM - DPKG on db1003 is CRITICAL: Connection refused by host [23:12:29] RECOVERY - RAID on srv190 is OK: OK: no RAID installed [23:12:39] PROBLEM - DPKG on nfs1 is CRITICAL: Connection refused by host [23:12:39] RECOVERY - DPKG on srv192 is OK: All packages OK [23:12:59] PROBLEM - DPKG on db1004 is CRITICAL: Connection refused by host [23:12:59] RECOVERY - DPKG on cp1042 is OK: All packages OK [23:13:16] PROBLEM - Disk space on db16 is CRITICAL: Connection refused by host [23:13:16] PROBLEM - Disk space on db1003 is CRITICAL: Connection refused by host [23:13:16] PROBLEM - RAID on db1004 is CRITICAL: 
Connection refused by host [23:13:16] RECOVERY - MySQL disk space on db1002 is OK: DISK OK [23:13:16] PROBLEM - DPKG on es1001 is CRITICAL: Connection refused by host [23:13:26] PROBLEM - RAID on db1003 is CRITICAL: Connection refused by host [23:13:36] PROBLEM - RAID on db11 is CRITICAL: Connection refused by host [23:14:06] RECOVERY - DPKG on mw56 is OK: All packages OK [23:14:56] PROBLEM - MySQL disk space on db1035 is CRITICAL: Connection refused by host [23:14:56] RECOVERY - RAID on db1033 is OK: OK: State is Optimal, checked 2 logical device(s) [23:15:06] RECOVERY - MySQL disk space on db47 is OK: DISK OK [23:15:06] RECOVERY - MySQL disk space on db52 is OK: DISK OK [23:15:06] PROBLEM - MySQL disk space on es1001 is CRITICAL: Connection refused by host [23:15:46] RECOVERY - Disk space on mw56 is OK: DISK OK [23:15:46] RECOVERY - MySQL disk space on db1001 is OK: DISK OK [23:15:46] RECOVERY - RAID on nfs1 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [23:16:06] RECOVERY - RAID on srv201 is OK: OK: no RAID installed [23:16:26] RECOVERY - Disk space on db1002 is OK: DISK OK [23:16:36] PROBLEM - RAID on cp1044 is CRITICAL: Connection refused by host [23:16:36] RECOVERY - DPKG on db1004 is OK: All packages OK [23:16:46] PROBLEM - RAID on db1019 is CRITICAL: Connection refused by host [23:16:46] PROBLEM - Disk space on db11 is CRITICAL: Connection refused by host [23:16:46] RECOVERY - Disk space on db1017 is OK: DISK OK [23:16:46] RECOVERY - Disk space on db16 is OK: DISK OK [23:16:46] RECOVERY - DPKG on db1033 is OK: All packages OK [23:16:56] RECOVERY - MySQL disk space on db53 is OK: DISK OK [23:16:56] RECOVERY - RAID on db1004 is OK: OK: State is Optimal, checked 2 logical device(s) [23:17:06] RECOVERY - MySQL disk space on db25 is OK: DISK OK [23:17:09] hey everyone - so it appears that a common thread on these borked machines is nagios-nrpe didn't exit correctly [23:17:26] RECOVERY - DPKG on mw1115 is OK: All packages OK [23:17:46] RECOVERY - Disk space on nfs1 is OK: DISK OK [23:17:56] RECOVERY - Disk space on searchidx2 is OK: DISK OK [23:18:06] RECOVERY - RAID on srv191 is OK: OK: no RAID installed [23:18:06] RECOVERY - DPKG on srv201 is OK: All packages OK [23:18:06] RECOVERY - RAID on srv272 is OK: OK: no RAID installed [23:18:36] RECOVERY - Disk space on cp1042 is OK: DISK OK [23:18:36] RECOVERY - Disk space on snapshot2 is OK: DISK OK [23:18:36] PROBLEM - MySQL disk space on db1003 is CRITICAL: Connection refused by host [23:18:36] RECOVERY - Disk space on db1004 is OK: DISK OK [23:18:36] RECOVERY - RAID on db1001 is OK: OK: State is Optimal, checked 2 logical device(s) [23:18:46] PROBLEM - MySQL disk space on db11 is CRITICAL: Connection refused by host [23:18:46] RECOVERY - DPKG on db13 is OK: All packages OK [23:19:16] RECOVERY - RAID on db1017 is OK: OK: State is Optimal, checked 2 logical device(s) [23:19:26] RECOVERY - DPKG on es1001 is OK: All packages OK [23:19:36] RECOVERY - RAID on mw3 is OK: OK: no RAID installed [23:19:56] RECOVERY - MySQL disk space on db1017 is OK: DISK OK [23:20:06] RECOVERY - DPKG on srv191 is OK: All packages OK [23:20:06] RECOVERY - Disk space on db1033 is OK: DISK OK [23:20:06] RECOVERY - RAID on db25 is OK: OK: 1 logical device(s) checked [23:20:16] RECOVERY - DPKG on srv272 is OK: All packages OK [23:20:26] RECOVERY - RAID on db52 is OK: OK: State is Optimal, checked 2 logical device(s) [23:20:46] RECOVERY - Disk space on db1034 is OK: DISK OK [23:20:46] RECOVERY - MySQL disk space on db1033 is OK: DISK OK [23:20:56] RECOVERY - 
DPKG on db47 is OK: All packages OK [23:20:56] RECOVERY - RAID on es1001 is OK: OK: State is Optimal, checked 2 logical device(s) [23:21:36] RECOVERY - Disk space on mw1115 is OK: DISK OK [23:21:36] RECOVERY - DPKG on db1034 is OK: All packages OK [23:21:46] RECOVERY - DPKG on nfs1 is OK: All packages OK [23:22:06] RECOVERY - Disk space on srv191 is OK: DISK OK [23:22:06] RECOVERY - RAID on srv192 is OK: OK: no RAID installed [23:22:16] RECOVERY - DPKG on searchidx2 is OK: All packages OK [23:22:36] RECOVERY - RAID on db1003 is OK: OK: State is Optimal, checked 2 logical device(s) [23:23:06] RECOVERY - MySQL disk space on db1004 is OK: DISK OK [23:23:06] RECOVERY - RAID on db16 is OK: OK: 1 logical device(s) checked [23:23:06] RECOVERY - RAID on db11 is OK: OK: 1 logical device(s) checked [23:23:16] RECOVERY - RAID on mw56 is OK: OK: no RAID installed [23:23:26] RECOVERY - mobile traffic loggers on cp1044 is OK: PROCS OK: 2 processes with command name varnishncsa [23:24:06] RECOVERY - DPKG on db1001 is OK: All packages OK [23:24:56] RECOVERY - MySQL disk space on db1019 is OK: DISK OK [23:25:06] RECOVERY - Disk space on db1035 is OK: DISK OK [23:25:06] RECOVERY - Disk space on es1001 is OK: DISK OK [23:25:06] RECOVERY - DPKG on db11 is OK: All packages OK [23:25:06] RECOVERY - MySQL disk space on db1035 is OK: DISK OK [23:25:16] PROBLEM - DPKG on es1002 is CRITICAL: Connection refused by host [23:25:26] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [23:26:06] RECOVERY - DPKG on db1003 is OK: All packages OK [23:26:40] !log Deploying modified squid configs of modified squid config generator to text.knams [23:26:42] Logged the message, Master [23:26:56] RECOVERY - Disk space on db11 is OK: DISK OK [23:26:56] RECOVERY - RAID on cp1044 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [23:26:56] RECOVERY - RAID on db1019 is OK: OK: State is Optimal, checked 2 logical device(s) [23:27:06] RECOVERY - Disk space on db1003 is OK: DISK OK [23:27:16] PROBLEM - RAID on es1002 is CRITICAL: Connection refused by host [23:27:26] PROBLEM - Disk space on es1002 is CRITICAL: Connection refused by host [23:28:42] !log Deployed squid configs to all squids [23:28:44] Logged the message, Master [23:28:56] RECOVERY - MySQL disk space on db11 is OK: DISK OK [23:29:06] RECOVERY - MySQL disk space on db1003 is OK: DISK OK [23:29:06] PROBLEM - MySQL disk space on es1002 is CRITICAL: Connection refused by host [23:29:08] the nrpe init script is screwy [23:29:13] very [23:34:20] New patchset: Bhartshorne; "correcting container name for eqiad test swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2132 [23:34:46] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2132 [23:34:46] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2132 [23:34:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2132 [23:35:36] RECOVERY - DPKG on es1002 is OK: All packages OK [23:37:16] PROBLEM - mysqld processes on db32 is CRITICAL: PROCS CRITICAL: 1 process with command name mysqld [23:37:26] RECOVERY - RAID on es1002 is OK: OK: State is Optimal, checked 2 logical device(s) [23:37:36] RECOVERY - Disk space on es1002 is OK: DISK OK [23:39:26] RECOVERY - MySQL disk space on es1002 is OK: DISK OK [23:40:44] New patchset: Asher; "fix mysqld check, since doesn't match mysqld_safe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2133 [23:41:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2133 [23:41:06] PROBLEM - mysqld processes on db36 is CRITICAL: PROCS CRITICAL: 1 process with command name mysqld [23:41:13] uhoh, pushing out nrpe_local.cfg again.. nrpe mayhem [23:41:35] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2133 [23:41:36] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2133 [23:45:31] heh.. /etc/nagios/nrpe.cfg tells nrpe to try to create a pidfile at /var/run/nrpe.pid, which it can't as the nagios user. while the init script tells start-stop-daemon to look at /var/run/nagios/nrpe.pid [23:45:48] ah... [23:45:51] hehehe [23:46:14] we don't push out either file via puppet, it's broken in the package [23:46:16] wow [23:46:36] wow that's impressive [23:47:15] This is why I want labs [23:47:36] RECOVERY - mysqld processes on db32 is OK: PROCS OK: 1 process with command name mysqld [23:47:43] There are quite a few puppet classes that are just functional enough to still appear to work, but just broken enough to not be able to actually spin up a new box [23:49:16] :) [23:51:14] New patchset: Asher; "override the incorrect pid_file def in the packaged config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2134 [23:51:30] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2134 [23:51:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2134 [23:51:36] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2134 [23:51:37] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2134 [23:52:26] PROBLEM - RAID on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:54:16] RECOVERY - Frontend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27535 bytes in 0.200 seconds [23:54:36] PROBLEM - DPKG on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:56:15] PROBLEM - MySQL disk space on db30 is CRITICAL: Connection refused by host [23:58:05] PROBLEM - Disk space on ms-fe2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
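The pid_file mismatch called out at 23:45:31 (nrpe.cfg telling the daemon to write /var/run/nrpe.pid, which the nagios user cannot create, while the init script's start-stop-daemon looks for /var/run/nagios/nrpe.pid) is what the override in change 2134 addresses. As an illustrative sketch only, and not the actual contents of that change, a Puppet resource along the following lines would push the override through /etc/nagios/nrpe_local.cfg, assuming the packaged nrpe.cfg includes that file and that the later pid_file definition takes precedence:

```puppet
# Hypothetical sketch, not the real content of https://gerrit.wikimedia.org/r/2134.
# Point nrpe's pidfile at the path the init script's start-stop-daemon already
# watches, in a directory the nagios user can write to.
class nrpe::pidfile_override {
    file { '/etc/nagios/nrpe_local.cfg':
        owner   => 'root',
        group   => 'root',
        mode    => '0444',
        content => "pid_file=/var/run/nagios/nrpe.pid\n",
        notify  => Service['nagios-nrpe-server'],
    }

    service { 'nagios-nrpe-server':
        ensure => running,
    }
}
```

With the pidfile written where start-stop-daemon expects it, stops and restarts from the init script can find the running daemon again, which fits the "nagios-nrpe didn't exit correctly" pattern noted at 23:17:09 and the wave of "Connection refused by host" check results above.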