[00:04:00] maplebed: i think so, but not 100% sure, i have no reason to think he wouldn't be [00:04:12] that's good enough for me. [00:04:20] I have ms-be2,3,4 currently waiting at the bios screen [00:04:25] doing ms-be5 now. [00:04:31] I'll send him email. [00:56:59] maplebed: commonswiki: bug 34440 InvalidResponseException trying to list_objects. Message `Invalid response (0): Unexpected HTTP response code: 503`; Site: `wikipedia` Lang: `commons` ThumbRel: `5/5a/Igor_Utkin_17.jpg/` [00:57:08] there is a modest number of such things in the last few days [00:59:23] AaronSchulz: where is that line from? a log somewhere? who (mediawiki?) was making the network call, and do you know what it was trying to accomplish? [01:00:02] maplebed: from wmf-config/swift.php in wikipedia/logs [01:00:08] this is for thumbnail purging [01:00:49] the screenshot in that bug says 'Container has no objects' as the error message - I thought you patched that. [01:01:47] yeah I don't know why that's still in the log entries [01:02:14] unless it's hashar's comment "I have added a wfDebugLog() call on production (local revision number is 2973)." [01:02:19] and that's what's showing up... [01:04:02] anyway, I wonder why Swift would respond with 503 [01:04:56] maplebed: are there any internal logs for swift that show failures? [01:05:01] I think there are a bunch of conditions under which swift responds with a 503 [01:05:18] IIRC I could get log lines to show up in syslog, yeah. [01:05:27] or /var/log/messages. [01:05:34] but only when I specifically said to log stuff [01:05:40] I think by default not much is logged. [01:06:30] hm. [01:06:43] it's clearly not the case that the commons buckets are empty though, [01:06:56] so this is likely not the same as the 'Container has no objects' thing unless that was a misleading error message. [01:07:39] AaronSchulz: is it possible to log more stuff when that error is triggered in cloudfiles? [01:07:56] maybe the content of the 503 page returned? 
[01:09:45] not sure how easy that would be [01:09:49] * AaronSchulz was looking [01:10:49] I found a matching 503 line in ms-fe2's log for one of them - it's interesting that the query took 30.0041s to execute. Sounds like a timeout to me. [01:14:41] * AaronSchulz looks at https://graphite.wikimedia.org/dashboard/temporary-4 [01:24:50] maplebed: crazy :) [01:24:56] ? [01:25:27] what's crazy? [01:25:38] the tp50 graph [01:25:47] * AaronSchulz wonders if that link works [01:25:59] shit. I can't log in to graphite. [01:27:44] AaronSchulz: would you screenshot it for me? [01:28:06] binasher: tp999 in the default list is not very useful, I'd prefer a tp90 or something, heh [01:30:05] maplebed: i can get into graphite... tried labs creds? [01:32:30] yup. [01:32:33] oh wait... [01:32:56] wrong username. [01:33:00] ::sigh:: [01:36:02] it uses the same username as labsconsole [01:36:07] username/password, that is [01:37:20] AaronSchulz: just before one of the 503 errors: proxy-server ERROR with Container server 10.0.0.249:6001/sdt1 re: Trying to GET /v1/AUTH_abcd/wikipedia-commons-local-thumb.8c: Timeout (10s) (client_ip: 10.0.2.212) [01:37:24] AaronSchulz: yeah, i can replace tp999 with a tp90 [01:38:53] there were some cases where seeing 99.9% was useful but probably not many.. [01:39:05] it can still be done manually right? [01:39:31] the graph commands are just freetext [01:39:53] AaronSchulz: my suggestion - when that error occurs, retry once. [01:39:54] :P [01:39:56] gah, doesn't work [01:40:11] binasher: hmm, maybe you can keep it but add a 90 [01:40:24] "No Data" fail :/ [01:48:10] you can calculate nth % in graphite but that doesn't provide the real thing on pre-aggregated data. i'll see if i really care about 999, i could also replace tp50 with a tp90. it's already updating 39k graphs/minute so i'm hesitant to add another per metric, which would up it to 52k. maybe i should steal another server! it supports sharding. 
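Note on the last message above: the point about percentiles on pre-aggregated data can be illustrated with a small sketch. This is not how graphite or the metrics pipeline computes anything; the latency samples and the nearest-rank `percentile` helper are made up for illustration. The graph arithmetic at the end is only an inference from the 39k and 52k figures quoted in the message (one more graph per metric adds 13k, which suggests roughly 13k metrics with three graphs each today).

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct (1..100)."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, so no float rounding is involved.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical request latencies in seconds for two one-minute buckets.
# The second bucket is quiet but contains two 30 s timeouts.
minute_1 = [0.1] * 100
minute_2 = [0.2] * 8 + [30.0] * 2

# tp90 over the raw samples of both minutes together.
true_tp90 = percentile(minute_1 + minute_2, 90)

# tp90 computed per minute first, then averaged afterwards
# (what you get when only pre-aggregated values are stored).
per_minute_tp90 = [percentile(minute_1, 90), percentile(minute_2, 90)]
averaged_tp90 = sum(per_minute_tp90) / len(per_minute_tp90)

print("tp90 over raw samples:     ", true_tp90)
print("average of per-minute tp90:", averaged_tp90)

# Graph-count arithmetic implied by the 39k -> 52k figures.
graphs_now = 39_000
graphs_after = 52_000
metric_count = graphs_after - graphs_now          # one extra graph per metric
graphs_per_metric_now = graphs_now // metric_count
print("implied metrics:", metric_count, "graphs per metric now:", graphs_per_metric_now)
```

In this made-up data the quiet minute dominates the averaged figure even though it holds under a tenth of the requests, which is why a percentile derived from stored per-interval values is only an approximation.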
[12:44:32] New review: Demon; "I disagree that we should allow everything but just disallow viewvc. I know that was the original pu..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/2888 [12:46:49] New review: Mark Bergsma; "Please stop using these global, unqualified variables. It's dirty and will break in the next version..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2902 [12:54:32] New review: Mark Bergsma; "Please replace this by the generic rsync classes we already have. No point in redoing this every tim..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2879 [13:05:04] New review: Mark Bergsma; "Please fix indentation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2815 [13:07:40] New review: Mark Bergsma; "Please fix indentation" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2819 [13:10:28] New review: Mark Bergsma; "Please fix modes, normal files should be 0400, 0440 or 0444 (read-only)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2838 [13:12:47] New patchset: Hashar; "Bug 28469 - Make SVN Documentation be indexed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2888 [13:13:57] New review: Demon; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2888 [13:17:13] New review: Mark Bergsma; "Thinking ahead, can you replace that by a recursive definition for the docroot directory instead? So..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2888 [13:20:40] New review: Mark Bergsma; "rt-mailgate doesn't support https..." 
[operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2446 [13:22:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2607 [13:22:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2608 [13:22:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2607 [13:28:14] New review: Mark Bergsma; "Please get your indentation right" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2839 [15:18:22] !log backing up wikitech in hopes of upgrading some of its software [15:18:41] if wikitech starts to be odd, its cuz im maxing out the vm its on. [15:19:45] Logged the message, RobH [15:20:02] !log shutdown frdev offsite vm per email to engineering last week [15:20:06] Logged the message, RobH [15:22:29] bah, it broke [15:22:38] seems i need to lock it to pull it [15:38:23] bleh, wikitech isnt in lvm, but the linode is only 60% space used [15:38:33] we must have gotten an upgrade, now i kinda wanna redo that entire system. [15:39:32] i had to wipe out a ton of old stuff on it to get room to dump the database as it is. [15:41:00] !log Converted Toolserver switch interfaces on csw2-esams to pure routed-only mode [15:41:06] Logged the message, Master [15:51:26] I broke wikitech, fixing it now [15:51:34] stupid update script failed out, rolling it back [15:54:01] restoring db backup of wikitech [16:05:58] !log wikitech outage resolved [16:06:04] Logged the message, RobH [16:06:04] whew. [16:08:51] hi chris [16:09:23] cmjohnson1: I think that ben/maplebed was asking about you. 
[16:09:47] He plans to do some work on the c2100 series servers, having you on site makes him feel a bit better since these are new servers [16:10:00] (but its early yet on west coast) [16:10:08] i have some other work possibly ;) [16:10:14] I'd like to clear out row B in pmtpa [16:10:57] huzzaaaaah [16:11:01] death to old racks! [16:15:17] new pmtpa row C racks are 42U right? [16:16:35] * mark cleans out the old racks in racktables [16:16:37] one by one... [16:18:18] robh: yes...i am working on what he needed [16:18:30] coolness [16:19:53] how come sq81 is listed in rack C2 pmtpa? [16:21:53] no clue [16:21:59] mark: i must've not moved it...there is a hole in d3 where it is located....i will update [16:23:23] thanks [16:23:32] ok I've updated the racktables layout to reflect the row C changes [16:24:07] * mark adds a new eqiad row C as well [16:24:13] RobH: 47RU right? [16:24:48] i think only 42 [16:24:53] since it was 20amp not 30 [16:25:01] cmjohnson1: ^? [16:25:03] eqiad. [16:25:09] ohhh, sorry, yes [16:25:26] good [16:25:37] eqiad looks impressive in racktables rackspace view now [16:25:40] so if we want to have a rull row d [16:25:45] row d even [16:25:45] as opposed to tampa, which looks very shitty with all the 3-rack rows [16:25:51] we may need to expand the cage sooner than later [16:26:01] as the facility is filling up, and when i asked omid about it [16:26:08] he said they have no notes about us reserving further space [16:26:10] we have right of first refusal [16:26:15] thats what i thought [16:26:15] in our contract I think [16:26:22] mail CT about it [16:26:24] but he didnt see it in the system, so we should follow up [16:26:24] he should look it up [16:26:26] will do. [16:27:25] do we still need those decommissioned racks? 
[16:27:28] racktables can delete now [16:27:41] if we delete it then we have no record of it existing [16:27:58] less than ideal in a real inventory mgmt system [16:28:00] ok [16:28:06] I deleted the powermedium power strips [16:28:08] but none of the rest [16:28:12] yea those we dont care [16:28:14] indeed [16:28:15] indeed [16:28:25] damn it we have worked together too long. [16:28:42] we are saying the same things at the same time ;P [16:29:50] robh: yes 42 ...but i think you covered it [16:30:06] yea i thought mark was asking about something else, no worries [16:31:05] yeah [16:31:12] cmjohnson1: will you have time to move stuff from pmtpa B3 to D1 today? [16:31:26] yes...whatever you need [16:31:30] I see the management core switch is still there [16:31:35] yes [16:31:36] and that will involve running a bunch of patches [16:31:40] i'm making a ticket now [16:32:08] cmjohnson1: after you finish what you are working on for mark [16:32:19] the issue of those not saving in bios is more than likely going to fall to you to fix. [16:32:26] i am going to take a quick look at one now [16:32:59] robh: okay [16:33:52] !log poking at bios on ms-be3 [16:33:56] Logged the message, RobH [16:36:47] the esc-0 command worked for f10 for me. [16:36:55] its rebooting, lets see if it really saved. [16:37:14] i hope so...i did it with cart and it did not. [16:38:05] it said save and exit though? [16:38:15] hrmm, yea, did it to me just now [16:41:24] hmm [16:41:30] I think hostway removed our out of band management link [16:42:09] i found why the c2100 hosts do pxe first [16:42:11] * Force PXE First [Enabled] * * [16:42:17] under boot settings configuration [16:43:14] changed it, i bet it works now. [16:45:08] mark: is that a multimode fiber? 
[16:45:10] ok, its installing, but grub error, but it goes to disk [16:45:39] robh: that would make sense since the C series are the "cloud" servers [16:45:45] cmjohnson1: it was copper [16:45:54] the only copper link we got from hostway [16:47:09] do you want me to find out? [16:47:16] yes please [16:47:23] and hopefully move, to [16:47:24] too [16:47:30] I want to get rid of row B [16:47:34] k [16:49:28] !log fixed boot order on ms-be3, fixing ms-be4 [16:49:31] Logged the message, RobH [16:52:03] i bet eqiad looks nice now with the 3 rows [16:52:10] cmjohnson1: kinda, we had some servers donated to us by yahoo a long time ago [16:52:15] and they always pxe booted [16:52:24] to get console redirection and then told to boot on disk [16:52:29] but always to pxe first [16:52:53] mark: matt didn't know off hand...will call me when he gets over here [16:53:08] are you in pmtpa now? [16:53:17] i am on 10 right now...will be moving shortly [16:53:33] I thought you were gonna check out what's there, not ask matt ;) [16:55:23] going to relocate now [16:55:25] !log ms-be4 boot order fixed, fixing ms-be5 & ms-be2 [16:55:28] Logged the message, RobH [17:03:42] RobH: thanks for finding the force to pxe thing. [17:03:57] how is it that you can save the bios but I can't? [17:04:01] maplebed: I believe you're gonna need the management network today? [17:04:02] esc+0 [17:04:11] hrmph. [17:04:13] dell f10 emulation [17:04:24] escape always popped up a 'exit without saving' thing for me. [17:04:27] seems it will take f10 in drac console [17:04:29] do you have to just do it faster? [17:04:30] but not ipmi console [17:04:36] yea, esc then 0 [17:04:38] fast. [17:04:42] but not at same time. [17:05:16] mark: I'm going to be upstairs for the analytics day, so ... actually no. [17:05:21] ok [17:05:23] good [17:05:30] though I'm extremely annoyed that the ms-be hosts are now in a grub error. [17:05:34] we're gonna move the management network gear in pmtpa [17:05:42] what is the grub error? 
[17:06:06] mark/robh - found that contract [17:06:16] we do have the ROFR [17:06:19] - Customer shall have a right of first refusal ("ROFR") during the first twenty-four (24) months of the Initial Service Term for seven (7) additional [17:06:19] cabinets that are specifically located in Cage 61150 in DC6 ("ROFR Space"). [17:06:24] woosters: we should let equinix know [17:06:39] will send to Omid and ErikSilver [17:06:44] ok [17:08:01] RobH: do you have the grub error up? [17:08:34] cmjohnson1: what's up? :) [17:09:23] mark: i see a red utp going to mr1pmtpa from hostway [17:09:33] port 7? [17:09:54] correct [17:09:56] hmm [17:09:58] it's down [17:10:02] mark: I can't get to any of the consoles at the moment. RobH reported that there was another setting in the bios I was missing to get them to boot from disk but now they're at a grub error. [17:10:16] did you already take down the mgmt network? [17:10:23] I didn't [17:10:23] maplebed: console payload error? [17:10:27] maybe chris did? [17:10:28] mark: i have Matt coming by later...he can check his end [17:10:32] cmjohnson1: ok [17:10:38] maplebed: or network error? [17:10:42] RobH: ms-be2 says another connection is active, and the rest just don't respond. [17:10:44] cmjohnson1: can you run management uplinks to prepare for the management kit move? [17:10:49] yea, thats just console error, not network [17:10:53] maplebed: send consoleclose [17:10:54] maplebed: make sure 'serial redirection after bios' is DISABLED [17:10:58] its in the help part of the script ;] [17:11:01] otherwise that can give a grub error [17:11:10] this is a grub disk load error [17:11:15] it goes to grub rescue shell [17:11:20] not a post redirection error [17:11:35] ok [17:11:44] mark: i am not certain which ones are mgmt uplinks [17:11:44] maplebed: the ~~. kills the sockpuppet connectoin rather than the console connection half the time [17:11:54] so it leaves the serial open [17:11:57] RobH: you gotta count the ~s. 
[17:11:59] you have to just close it with consoleclose [17:12:02] for me it's three. [17:12:09] but that's only the problem on ms-be2. [17:12:10] that's what i did wrong, [17:12:15] the others connect but just sit there. [17:12:17] cmjohnson1: so... every rack in row C and row D should get a separate uplink to D! [17:12:20] d1 [17:12:27] it's easy right now [17:12:31] because you only need to do row D today [17:12:31] maplebed: i literally just did them [17:12:35] they take a while to reboot [17:12:40] but they eventually posted [17:12:41] ah. [17:12:48] they are all working though, i watched them work [17:12:51] ok, ms-be5 is now at the grub rescue prompt [17:12:55] (post, reboot, save disk settings) [17:13:01] so yea i think your install on them has issues now [17:13:06] but the boot order issue is fixed [17:13:07] cmjohnson1: as far as I can tell, only D2 right now has an uplink going back to B3 [17:13:10] that needs to be removed [17:13:21] I -think- that racks D1 and D3 have an uplink to the management switch in D2 [17:13:30] those uplinks should be removed as well [17:13:32] maplebed: it may be an issue with the partman config and the bios partition, but at least now you can reliably boot to see it happen [17:13:49] and this is why I don't fuck around with hardware. [17:13:50] then msw1-d1-pmtpa, msw1-d2-pmtpa and msw1-d3-pmtpa should get uplinks to msw1-pmtpa (in rack D1) [17:14:02] d1 and d2 have uplinks [17:14:03] this shit just pisses me off, and not in a way that inspires me to figure out how to fix it. [17:14:07] (beware, there will be TWO management switches in D1, the normal rack management switch and the core management switch we're moving) [17:14:19] cmjohnson1: uplinks going where? [17:15:12] d1 is going to port 21 on msw1 [17:15:29] d2 port 11 [17:15:44] port 21 is supposed to be the SCS [17:15:46] so how did that end up there!? 
[17:15:58] anyway [17:16:01] nevermind what is there now [17:16:03] make new patches [17:16:07] it is scs [17:16:14] looked at the wrong one [17:16:27] i don't have anyting on d1 [17:16:29] sorry [17:16:33] it doesn't matter [17:16:35] make new patches [17:16:45] one from msw-d2-pmtpa to top of rack D1 [17:16:52] one from msw-d3-pmtpa to top of rack D1 [17:16:59] and one from msw-d1-pmtpa to top of its own rack [17:17:07] that's where msw1-pmtpa will be, top of D1 [17:22:49] mark patches are set [17:22:59] good [17:23:11] cmjohnson1: so to move mr1-pmtpa and msw1-pmtpa, we need to move two patches from hostway: [17:23:18] 1) the fiber to the 10th floor, which is on msw1-pmtpa [17:23:23] and 2) the copper patch just mentioned [17:23:27] I assume you're gonna need Matt for that? [17:23:46] yes...good assumption [17:23:57] and he's coming over later? [17:24:00] then we can do it then [17:24:22] matt has already run the fiber...just not sure which goes where [17:24:32] yeah he'll need to plug it in on his end [17:24:47] there are two fibers to the 10th floor [17:24:52] make sure he's doing the right one [17:24:55] one is management, one is production [17:25:04] we're gonna move both at some point, but right now i'm talking about management [17:25:29] okay [17:27:04] matt is onsite [17:27:10] ok [17:27:16] do you know what needs to be done? [17:27:31] move the mgmt fiber from msw1 [17:27:36] and the copper [17:27:37] move mr1-pmtpa, msw1-pmtpa to top of D1 [17:27:46] and the two connections from hostway [17:35:40] mark: do you want me to use u410/41 for msw? 
is okay to disconnect...also...we have copper link from port 0/1 mr1 to rx80 [17:35:48] a bit lower I think [17:35:53] keep some free space for cable mgmt [17:36:05] you can remove the link to the rx8 [17:36:11] that's not working anyway [17:37:22] cmjohnson1: it's probably easier to put the switch on top, the router below it [17:37:29] since there will be cables coming from all racks from the top [17:37:34] and the router only needs one uplink [17:38:06] okay [17:40:57] !log disconnecting management fiber from msw1-pmtpa [17:41:00] Logged the message, Master [17:46:54] !log powering down msw1-pmtpa for relocation to d1-pmtpa [17:46:57] Logged the message, Master [17:53:17] where are we gonna put csw5? [17:56:20] mark: ..... [17:56:24] in the bin? [17:56:26] ;] [17:56:34] hehe [17:56:36] it should go on a row end [17:56:48] i figure to exhaust to the end cap [17:57:04] sound right? [17:57:29] it helps the side to side slightly, and there is so much cold air in pmtpa now i worry more about it getting rid of hot [17:57:56] though the cold intake would be best facing towards the wall where the cooling comes in. [17:58:19] the new row C has same hot cold orientation right? [17:58:24] I don't know [17:58:33] lookin at photos now, it does [17:58:49] so if i recall csw5, when racked facing the hot row in normal network fashion [17:58:58] has the cold intake on its left [17:59:03] and hot output on its right [17:59:26] so putting it in c3 would be with the cold intake facing the row of servers, oriented towards the cold air blower on the side of room [17:59:37] with the hot output into the side of the rack, which has a side panel [17:59:53] if we keep it turned off I'm fine with it too ;) [17:59:54] any hot air escaping through the rack front is blown away from cold intake of the row [18:00:04] in that case let's not use it. 
[18:00:05] it's just for spares for csw1-sdtpa now [18:00:06] ;] [18:00:13] it can be used as a lab switch [18:00:16] well, racking that would be the best place i think [18:00:16] but not sure how much we need that [18:00:20] yes [18:00:38] it can soft off and be powered on via mgmt right? [18:00:57] uh [18:00:58] no [18:00:59] then its best of both worlds, lab whne needed but off and not fubaring airflow otherwise [18:01:02] bnah [18:01:05] damn them! [18:01:14] i vote dont rack it ;] [18:01:24] make a BBQ out of it? [18:01:37] send to office to incorporate into a coffee table fixture [18:01:42] 'this ran wikipedia for years' [18:01:50] now it holds up my drink. [18:02:00] hehe we COULD send it to the office [18:02:13] for use there [18:02:36] send them some mrj21 too ;] [18:12:57] mark: fiber link has been re-established [18:13:03] yes stuff is back up [18:13:04] checking now [18:13:32] so what's the story on the copper link now? [18:13:36] where do you want mgmt connections for d1 and d3? (ports?) [18:13:46] it's in the ticket... [18:13:48] checking [18:14:24] okay...i will check it [18:14:45] 0/1/10 Down None None None None No l 0024.380d.7849 << msw-d1-pmt [18:14:46] 0/1/11 Down None None None None No l 0024.380d.784a << msw-d2-pmt [18:14:46] 0/1/12 Down None None None None No l 0024.380d.784b << msw-d3-pmt [18:14:49] ports 10 to 12 [18:16:14] mark: spoke with Matt about out of band...he states that it was associated with "wikia" and that it was disconnected at our request a year ago. if we want to set it back up please let them know [18:16:34] it was originally wikia that's right [18:16:53] but it was working recently ;) [18:17:00] I don't think we requested it to be disconnected [18:17:02] but I'll ask them [18:25:45] chris... 
I think you've created a loop [18:26:26] make sure that none of the management switches in row D have any old/temporary uplinks left [18:28:22] i'm so happy now that management network is separate from production network :) [18:28:38] mark:on msw-d2 i have a mgmt link going fro 46m port 48 to msw d1 pt [18:28:42] 46 [18:28:56] remove that [18:31:21] mark: still looping? [18:31:29] it's still blocked [18:31:31] let me bounce the port [18:31:55] still looping [18:32:01] can you check all other racks? [18:32:16] there should be one, and only ONE connection on each rack's mgmt switch leaving the rack [18:32:20] and that should be to msw1-pmtpa [18:32:34] if row B still has anything connected, disconnect that [18:40:23] now I can't reach it anymore [18:40:25] what are you doing? [18:41:28] mark: nothing ...verifying i have things in the right place ....the only question i have is [18:41:30] 0/1/20 Down None None None None No l 0024.380d.7853 cr2-pmtpa [18:41:40] i have a link from msw-d1 to cr2...should I replace [18:42:02] yes please [18:43:09] there is still something on D2 causing a loop [18:43:43] got it [18:43:47] how about now [18:44:15] yep now it's fine [18:44:18] what was it? link to d3? [18:44:22] yes [18:44:29] ok [18:44:38] cool [18:45:02] now please make sure racktables is up to date [18:45:04] i have to make patches for scs still but the bulk of everything is done [18:45:13] and that :) [18:45:23] I wonder if we should move the other floor 10 fiber today too [18:45:26] the production one... 
[18:45:31] that'll cause a few seconds of downtime if we're quick [18:45:46] i will need to get matt back in here...but why not [18:45:54] if he's available yeah [18:45:59] but you can do some prepwork for it first [18:45:59] lemme check [18:46:08] no first finish what you have to do on the current stuff [18:48:07] mark: i can do the prep work now...let's do it now while matt is available [18:48:18] ok [18:48:22] so here's the situation [18:48:27] there is that one fiber coming from the 10th floor [18:48:34] currently it goes to csw5-pmtpa port 5/4 [18:49:11] then from csw5-pmtpa there are TWO patches to cr2-pmtpa [18:49:17] the plan is to move csw5-pmtpa out of that path [18:50:11] first, I'm gonna disable one of the two links between csw5-pmtpa and cr2-pmtpa [18:50:35] okay....where on cr2 do you want the new fiber to go [18:50:48] port xe-0/0/0 which is now in use [18:50:59] that's why I'm going to disable it [18:51:03] okay [18:51:40] ok [18:51:42] checking [18:51:53] ok [18:52:03] can you unplug the fiber that is now in xe-0/0/0 on cr2-pmtpa? be careful to pick the right one [18:52:53] pulled...i see where you already have it down [18:52:59] yes [18:52:59] ok [18:53:11] now... matt has already run a new fiber to cr2-pmtpa right? [18:53:13] the one that is going to be used [18:53:19] that one you can now plug in in xe-0/0/0 [18:53:27] assuming it's not connected on the other end [18:53:37] matt wants to know if you did anyting on csw5 [18:53:46] what does that mean? [18:54:22] nevermind...his mistake. [18:54:24] yes I disabled a port [18:54:28] the new fiber is plugged in [18:54:29] the one you just disconnected [18:54:32] ok [18:54:42] now... 
don't do anything yet [18:54:58] if matt in a bit plugs out the old fiber on his patch panel, and plugs in the new one, then csw5-pmtpa is out of that loop [18:55:02] however [18:55:12] then csw5-pmtpa is still connected via that other link [18:55:20] and that will cause problems too [18:55:31] so you two will have to work together, and quick [18:56:02] chris: YOU are going to disconnect the other link, xe-1/0/0 [18:56:09] that is supposed to be the other fiber to csw5 [18:56:11] is that correct? [18:56:45] at the same time, matt will move the 10th floor link to the new fiber [18:56:54] this should be done within a few seconds [18:57:06] is this clear? [18:57:16] yes [18:57:29] ok [18:57:31] if stuff goes to hell [18:57:36] for more than a minute or so [18:57:40] and you cannot talk to me [18:57:43] (I assume you can't) [18:57:48] then you two both rollback [18:57:49] i am going to pull the fiber at the same time he plugs in the new one [18:57:51] do it in reverse [18:57:55] yes [18:58:00] okay.. [18:58:05] if you two reverse, the current situation should be back [18:58:08] ok [18:58:13] going now [18:58:14] do you have my nr? [18:58:16] nooo [18:58:18] no [18:58:26] never do this until I say start [18:58:30] k [18:58:31] my number is +31 654282595 [18:58:58] yours is +1 813 965 1968? [18:59:03] yes [18:59:11] i dont have int'l dialing [18:59:16] what? [18:59:24] * RoanKattouw_away reminds mark and cmjohnson1 that they're posting their cellphone numbers to a publicly logged channel [18:59:25] we need to make sure you get that [18:59:47] call me if it goes bad [18:59:50] ok [18:59:59] can you text? [19:00:04] yes [19:00:08] ok [19:00:12] what is the port you need to unplug? [19:00:34] 1/1 [19:00:39] no [19:00:48] 1/0 [19:00:53] 1/0/0 [19:00:54] sorry [19:01:05] is matt ready? 
[19:01:10] yes [19:01:17] ok, start [19:01:21] ok [19:05:19] "Request: GET http://en.wikipedia.org/wiki/Special:Preferences, from 10.64.0.125 via cp1014.eqiad.wmnet (squid/2.7.STABLE9) to () [19:05:19] Error: ERR_CANNOT_FORWARD, errno (11) Resource temporarily unavailable at Fri, 02 Mar 2012 19:04:55 GMT " [19:08:06] what just happened? [19:08:54] robla: I think they were moving pipes around [19:09:08] saw something about switching fiber [19:09:23] last thing in the log was from cmjohnson1 [19:09:45] robla: I'm just looking at backlog here [19:09:54] god damn foundry [19:10:22] !log Did a hot cut to remove csw5-pmtpa out of the path of cr1-sdtpa -> csw1-sdtpa -> csw5-pmtpa -> cr2-pmtpa [19:10:25] Logged the message, Master [19:11:12] mark: u there? [19:11:16] yes [19:12:50] do you want to change any of the existing links to single mode today? [19:12:56] or another time? [19:12:58] yes we can [19:13:06] let's get it over with [19:13:13] the 1G link is no longer used now [19:13:19] we can't do as13-680 today [19:13:20] it's on csw5-pmtpa which is not connected to our network as of just now [19:13:23] that's fine [19:13:24] we're not using that [19:13:30] tell matt he can hook it up to our new router whenever convenient [19:13:35] ok [19:14:03] the other one, AS30217, we can do now if I deactive it gracefully first [19:14:28] matt will need a few minutes [19:14:31] that's fine [19:14:54] please confirm that it's in port xe-0/0/2 [19:15:38] yes..that is correct [19:15:42] ok [19:15:49] let me know when matt is ready, then I will shut down BGP on that port [19:15:53] k [19:17:29] about 10 mins [19:21:32] please make sure you get the SCS back up today [19:24:02] i will...only 2 connections to msw1 [19:27:39] mark: matt is ready [19:27:50] i'm gonna shut down bgp now, wait until i report back [19:27:54] okay [19:29:34] cmjohnson1: ok go ahead [19:29:45] okay [19:31:15] actually [19:31:21] we should replace the optic on our end too of course [19:31:23] do you have one? 
[19:32:03] yes...i replaced it already ...just waiting on matt who had to replace on his end [19:32:09] ok [19:32:18] we should be good [19:32:19] where did you get it from? [19:32:36] csw5 [19:32:42] ok [19:33:08] I can't ping the other end [19:33:20] 84.40.25.101 [19:33:38] i'll bring up bgp anyway, perhaps it's a filter [19:33:40] i see link on our end....going to check hostway [19:33:51] yeah nevermind [19:33:53] it's working [19:34:23] ok [19:34:25] cmjohnson1: can you make sure that all fibers are labeled [19:34:28] and then give me the nrs? [19:34:35] yes [19:34:41] cool [19:37:55] mark: what is 0/0/3? [19:38:03] looks like it is going back to csw5 [19:38:18] checking [19:38:39] no it should be going to asw-d-pmtpa [19:38:46] matt wants to know if you want to use 1/0/0 for as13680 [19:38:53] no [19:38:59] i'll give him a new port [19:39:03] ok [19:39:08] if he wants to do it now I can prepare it [19:39:44] he wants to get all the cross connects in place. he needs someone else to do config changes [19:39:52] yes that's fine [19:40:01] i'll prepare the port now [19:40:04] then he can finish it whenever [19:40:29] port 1/2/0 [19:40:36] can you make sure to put an LR optic in there? [19:40:40] you can take from csw5-pmtpa indeed [19:41:32] cool [19:45:10] can someone take a look at bastion1.pmtpa.wmflabs? I'm getting random bytes when I attempt to ssh [19:45:22] There's no Ryan in here, so I dunno who to ping. [19:46:00] it works for me [19:46:17] but... I see error messages in the kernel logs [19:48:16] I can't log in. [19:48:18] :/ [19:48:22] so i can't get to any labs box. [19:48:32] bastion1.pmtpa.wmflabs... [19:49:01] so silly question, but you aren't trying to access an internal fqdn from your home without forwarding are you? [19:49:16] of course not. [19:49:21] i'm on the 6th floor :) [19:49:21] ok, just had to ask. 
[19:49:26] thats not going to work for that [19:49:36] the office isnt tied to the internal datacenter network [19:49:42] its bastion1.wmflabs.org right? [19:49:50] er, yes. [19:49:55] (wrong cp) [19:50:06] ok, just making sure that wasnt the issue ;] [19:50:20] Host bastion1.pmtpa.wmflabs [19:50:20] HostName bastion.wmflabs.org [19:50:27] hrmm, bastion for labs lets me in too, ahh, ok. [19:52:20] cmjohnson1: so what is left in rack B3, or row B in general? [19:52:24] csw5-pmtpa I know ;) [19:52:29] you can unrack that now [19:52:38] RobH: Well. I have no idea how to debug this. [19:53:08] I get back 9 miles of junk. [19:53:12] and it trashes my shell. [19:53:34] !log Decommissioned csw5-pmtpa from AS14907 service. rest in pieces ;) [19:53:37] Logged the message, Master [19:55:35] hrmm, its bneing slow for me [19:55:49] insanely slow [19:56:04] it has been insanely slow for a long time. [19:56:13] we assumed that was just how things were :P [19:56:17] well i know we are at the instance limit [19:56:24] we shipped some hardware down to fix this [19:56:39] which is racked and ryan was readying for deployment, but i dont think its quite there yet [20:02:46] mark: besides csw5 nothing [20:03:44] dschoon: so I am looking at it, but its super slow and some of the docs i have for seeing if its instance overload causing the issue seem to be outdated [20:03:49] trying to hunt down ryan [20:03:55] (labs being in beta and all ;) [20:05:17] cmjohnson1: ok please update racktables to reflect that then ;) [20:05:26] and feel free to unrack csw5-pmtpa whenever you have someone to help you [20:05:36] I guess you can put it on the floor of one of the new row C racks for now, or something [20:05:39] then we can cancel row B [20:06:24] sounds good [20:08:03] history in the makin. [20:08:23] thus the last of the rows that existed from before i started are going away. 
[20:14:31] ok I'm gonna have weekend I think [20:17:57] have a good one mark :) [20:18:03] good week-end :) [20:18:09] mark: and thanks for your reviews! [20:18:31] have a good weekend [20:47:56] !log added redirect/301 from http://static.wikimedia.org --> http://dumps.wikimedia.org now that archival static html dumps are located there [20:47:59] Logged the message, Master [21:08:08] gerrit-wm: ping? :°D