[00:06:09] observium > torrus
[00:18:46] maplebed: TimStarling: check your email. msg from Ralf
[00:19:24] "since we noticed in the last few days that some of the images we fetch
[00:19:24] are corrupt" <-- nice of him to report it...
[00:19:29] grumblegrumblegrumble.
[00:20:04] I'm asking him to join the channel now
[00:20:46] who is ralf ?
[00:21:00] he's from PediaPress
[00:21:05] ah
[00:22:48] ping robla
[00:23:01] hi schmir...thanks for joining
[00:23:31] New patchset: Tim Starling; "Limit fanout like in scap, to avoid overloading the NFS server." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2570
[00:24:07] New patchset: Lcarr; "Generating initcwnd.erb with both default gateway and default interface fact" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2571
[00:24:29] New review: Aaron Schulz; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2570
[00:24:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2570
[00:24:29] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2570
[00:24:29] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2570
[00:24:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2571
[00:24:52] schmir: we probably shouldn't reenable collections before we have a good plan of attack here
[00:26:07] maplebed and TimStarling are the main people that have been working on this
[00:26:41] schmir: in ralf's email, he says "since we noticed the last few days ..."
[00:26:47] do you have more detail on exactly when that started?
[00:26:55] schmir == ralf :)
[00:27:16] specifically, did it coincide with http://blog.wikimedia.org/2012/02/09/scaling-media-storage-at-wikimedia-with-swift/
[00:27:26] ah. hooray for multiple names.
[00:27:37] (says the dude with at least three identifiers in mediawiki)
[00:27:45] * robla was just observing that :)
[00:28:47] robla: http://wikitech.wikimedia.org/view/User:Bhartshorne/pdf_thumbnail_issue
[00:28:52] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2571
[00:28:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2571
[00:28:58] https://github.com/pediapress/mwlib.rl/commit/b1ead48e94f32c302a711160d3544141c4679928 is one of the first commits that tries to handle broken images
[00:28:59] fwiw, swift does have a truncated version of the 1200px thumbnail.
[00:29:04] that's from january 20
[00:29:07] so there may be a legit bug here.
[00:30:47] robla: I already changed back the default to 1200px on the render servers.
[00:32:08] so you changed the default without telling any of us or logging it in the server admin log?
[00:32:42] TimStarling: yes. I didn't expect it to cause problems like this.
[00:32:59] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2556
[00:33:00] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2556
[00:33:31] besides I don't even know what the server admin log is.
[00:34:14] well, join #wikimedia-tech, and then type "!log ...the log entry..."
[00:34:31] then a bot will add your message here: https://wikitech.wikimedia.org/view/Server_admin_log
[00:34:59] do you think you could do that every time you change something on the pdf servers, regardless of whether you think it will break something?
[00:35:34] only if I automate it
[00:36:08] schmir: it's going to be pretty important for us to have a protocol for changing production services that works for all of us here
[00:36:56] I'm going to guess that Tomasz or someone from the WMF never outlined that as a hard and fast requirement for you all, but it's definitely a requirement on our end
[00:37:56] automation or no, it would be useful to just be in #wikimedia-tech or this channel so that if there's a flurry of several people spending a few hours banging their head against something you can speak up.
[00:39:56] well, I didn't know about the flurry of people...
[00:40:50] you'd like to use puppet for administration anyway...but I'm still waiting for someone to get us a pdf render machine on labs with some basic puppet config
[00:42:05] * maplebed sees a labs project titled "pediapress" in existence already.
[00:42:38] * maplebed will stop being grumpy now.
[00:44:12] New patchset: Lcarr; "temp commenting out config file until new facter script propogates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2572
[00:44:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2572
[00:45:06] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2572
[00:45:07] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2572
[00:47:20] schmir: you have the ability to make the instance yourself
[00:47:37] Ryan_Lane: please read my last mail on the subject.
[00:47:54] you mean the "that's not my job, it's yours"? email?
[00:47:59] s/?//
[00:48:11] I want to have a basic puppet template as a jump start
[00:48:29] I'd imagine someone from the ops team will work with you on puppetization
[00:48:49] I think RobH is supposed to be working with you on this
[00:50:11] anyway, you don't need to start by puppetizing it. we recommend: 1. installing everything manually, and documenting the steps. 2. Puppetizing it. 3. Make an instance configured from scratch using the puppet configuration and see if it builds properly
[00:50:29] err 2. Puppetizing it based on the documentation
[00:50:58] if it doesn't work based on the documentation, the documentation should be updated. doing it this way ensures the documentation is accurate
[00:51:09] the docs are fine: http://mwlib.readthedocs.org/en/latest/installation.html#installation-instructions-for-ubuntu-10-04-lts
[00:51:12] schmir: in all of my puppet efforts i just look at existing machines and copy over the information :) there's a lot of random machines out there to check out
[00:52:13] LeslieCarr: I don't even know where to start. I have no experience with it.
[00:52:32] again, I don't think you need to do the puppetization
[00:52:45] but, we aren't going to puppetize something with pip
[00:52:54] we discussed that already, though
[00:54:01] Ryan_Lane: there are no debian packages!
[00:54:12] fyi, nfs1 is not very responsive - i am trying to connect to its mgmt now
[00:54:40] nfs1 is sending alerts
[00:54:46] I'd imagine that's for LDAP
[00:54:55] actually it appears that nfs1 is completely unresponsive
[00:55:03] schmir: well, that's going to be a problem
[00:55:29] it's fine for your software. we didn't want to manage that via puppet anyway
[00:55:36] but the dependencies need to be packaged
[00:55:52] why don't you start doing that then?
[00:56:04] I'm not even working on this
[00:56:05] * Ryan_Lane shrugs
[00:56:21] None of us is working on it. so who is?
[00:56:21] but I do need to ensure it's being done in a way that we can manage
[00:56:22] guys, nfs1 looks completely fubar
[00:56:24] rebooting ?
[00:56:28] LeslieCarr: sounds good
[00:57:02] !log rebooted nfs1 as it was unresponsive on console and via IP
[00:57:04] Logged the message, Mistress of the network gear.
[00:57:36] schmir: I don't know. I'm busy with Labs, otherwise I'd do it
[00:57:53] but realistically, it's generally the developer who makes packages
[00:58:14] if RobH is working with you on this project, he may be able to help
[00:58:52] Ryan_Lane: having debian packages is *your* requirement, not ours.
[00:59:51] well, we often don't use software that isn't packaged
[01:00:25] nfs1 was totally segfaulting
[01:00:38] especially when the software isn't terribly easy to work with
[01:00:47] not saying the software is bad. just complex.
[01:01:10] and yes, we'll likely package the software if you are unwilling to do it, but it's going to take a while
[01:02:20] LeslieCarr: is it coming back up ok?
[01:02:27] yeah, came back ok
[01:02:29] cool
[01:02:50] we can certainly do it if we're getting paid for it...but it's a lot of work...for very little gain.
[01:03:26] it's the kind of thing that can be automated into your builds
[01:03:37] so, it's a bit of upfront work, but that's it
[01:03:48] anyway, it looks like it's moving nowhere.
[01:05:18] I'm not sure who's working with me on it, not sure if I am allowed to use pip in order to install into a git repository, not sure what to do now
[01:05:44] <^demon|away> schmir: You don't have to use git-review, it just makes it easier.
[01:06:12] schmir: again, for your own software, manage it how you want
[01:06:12] <^demon|away> http://www.mediawiki.org/wiki/Git/Workflow#Manual_setup
[01:06:18] the dependencies need to be packaged
[01:06:18] it looks like you'd rather turn off the collection extension because it's complex...
[01:06:31] eh? that had nothing to do with me
[01:06:38] Tim turned that off, I believe
[01:06:38] schmir: we turned off the extension because it BROKE WIKIPEDIA
[01:07:11] we will turn off anything that breaks wikipedia
[01:07:17] our job is to keep wikipedia up
[01:07:22] yes, I got that. Ryan said you often do not use software that isn't packaged and is complex
[01:09:31] yes. we often don't
[01:09:45] because it takes too much effort to manage it
[01:12:01] and to me it looks like you're threatening to turn the thing off
[01:12:11] o.O
[01:12:20] I don't understand what you are talking about
[01:12:50] heya folks...let's not try to hash out the longer term stuff. I'd like to have Tomasz and Kul around for that
[01:12:53] looks like.
[01:13:14] for the short term issue, it'd be nice to get Collections back up and running
[01:13:16] I'm not threatening anything. I'm saying that we want to use software that is properly packaged, rather than in an unmanageable way.
[01:13:38] schmir: and I'm only talking about the new service, not the current one
[01:14:05] robla: well, this is a continued conversation that's been taking place for months
[01:14:10] via email
[01:14:33] Ryan_Lane: sure...let's *make sure* it happens when we've got the right people around
[01:15:12] * Ryan_Lane shrugs
[01:16:09] TimStarling: maplebed: you guys feel comfortable that we can turn this back on now?
[01:16:20] * maplebed does
[01:16:59] maplebed and I just discussed this problem: http://wikitech.wikimedia.org/view/User:Bhartshorne/pdf_thumbnail_issue ...and he's got some ideas about how to fix that
[01:17:54] schmir: we think we have an idea about what the root cause of the image corruption problem is, and we're working out a plan for fixing that
[01:18:04] robla: thanks
[01:18:35] schmir: if you have or can create a list of truncated 1200px images, I'd love a copy.
[01:19:32] maplebed: sorry, I think that's not possible without changing the software...
[01:20:03] you don't have them in logs or something perhaps?
[01:20:39] (i.e. wherever the russia china locator example came from)
[01:21:00] if you don't, no big deal, but it would help confirm or refute my hypothesis on why they're truncated.
[01:21:08] (by giving me a larger dataset)
[01:23:45] volker may be able to provide some more. I'll ask him.
[01:28:30] !log resuming 1.19 schema migrations after fenari reboot (on first s4 commons slave, db22)
[01:28:31] Logged the message, Master
[01:42:15] maplebed: btw volker reported the issue in some irc channel...without getting an answer. we changed the default size after that.
[01:42:44] you don't know which channel that was, do you?
[01:43:08] (just curious)
[01:43:12] I don't know. I guess the tech channel
[01:43:40] I posted announcements about swift in the en tech village pump, the commons village pump, the tech mailing list and commons mailing list, and the blog.
[01:44:04] if there was another spot that I missed that might have allowed you to see it, I'd like to add it to the list for the next time.
[01:45:15] maplebed: me see it?
[01:46:00] the post included instructions "if you see anything weird going on with thumbnails, ..." and how to get ahold of me.
[01:46:20] well, I'm not reading any of that stuff...
[01:46:30] basically, I just want to avert future recurrences,
[01:46:41] so am asking for help and advice on how to broadcast about changes that might affect folks,
[01:46:50] so that they can help us avoid situations where wikipedia breaks.
[01:48:17] so you broke it with prior announcement? :)
[01:48:21] schmir: is there any way we can inform you of changes that might affect your service?
[01:48:37] mail
[01:48:50] do you read any mailing lists?
[01:48:54] mwlib-l?
[01:49:13] if we send it there, will you see it?
[01:49:25] is there a mwlib-l mailing list?
[01:49:26] I guess mwlib doesn't have a -l. heh
[01:49:47] there's a mwlib one, for sure
[01:50:01] mwlib@googlegroups.com
[01:50:39] at any rate, I gotta bail for the evening.
[01:51:03] maplebed: seeya
[01:52:39] !log started indexer on searchidx2 with /home/rainman/scripts/search-restart-indexer per docs
[01:52:41] Logged the message, Master
[01:52:47] yes, same here. good night.
[01:53:25] Ryan_Lane: I would prefer mail to my @brainbot.com address...
[01:53:59] well, it's easier to send to a list, that way your coworkers also get the info
[01:54:15] I'll take that into consideration, though
[01:54:24] I can setup an alias on our mail server.
[01:54:36] that would be great
[01:54:49] thanks
[01:56:20] RobH is Rob Halsell?
[01:56:34] yep
[01:57:10] ok. good night.
[02:10:11] Hi - CT asked me to report here… I have two independent reports of 502 errors when saving edits on wikis. I'm not sure if it's important, or normal :)
[03:13:31] New patchset: Tim Starling; "Fixed sync-l10nupdate again." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2573
[03:13:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2573
[03:22:56] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2573
[03:22:57] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2573
[03:43:48] New patchset: Tim Starling; "Added sudoers rule for l10nupdate -> mwdeploy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2574
[03:44:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2574
[03:44:18] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2574
[03:44:39] New review: Tim Starling; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2574
[03:44:58] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2574
[04:26:39] New patchset: Demon; "Revert 682b27, was a stupid change. Just adding something like" [operations/software] (master) - https://gerrit.wikimedia.org/r/2575
[06:57:47] when did they make that default size change (what time and what day) I wonder
[07:29:43] New patchset: Asher; "graphite stats retention" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2576
[07:30:06] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2576
[07:30:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2576
[07:30:07] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2576
[07:36:13] New patchset: Asher; "fix lower-precision longer term storage of stats data" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2577
[07:36:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2577
[07:37:42] New patchset: Asher; "fix lower-precision longer term storage of stats data" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2577
[07:38:05] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2577
[07:38:05] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2577
[07:38:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2577
[13:07:28] do we have ipv6 interfaces in prod?
[13:07:41] I ask b/c of https://bugzilla.wikimedia.org/34362
[13:08:58] Reedy: ^^
[13:11:34] Reedy: Can you make me admin on test2?
[13:12:45] we have them only for a limited list
[13:12:55] lemme think about that
[13:13:03] that's true for upload maybe, don't remember about the ret
[13:13:04] rest
[13:13:47] apergos: can you make me admin on test2?
[13:13:52] nm
[13:13:56] :-D
[14:16:23] New patchset: Catrope; "Fix MIME type for .woff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2578
[14:16:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2578
[15:15:14] New review: Mark Bergsma; "Any reason not to deploy this in base.pp, i.e., on all servers? :)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2556
[15:16:32] New patchset: Mark Bergsma; "Working upstart job varnishncsa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2579
[15:17:03] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2579
[15:17:03] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2579
[15:30:59] New patchset: Mark Bergsma; "Pass the environment as arguments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2580
[15:32:19] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2580
[15:32:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2580
[15:33:46] New patchset: Mark Bergsma; "Syntax error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2581
[15:34:28] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2581
[15:34:29] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2581
[15:36:09] New patchset: Mark Bergsma; "Apparently Puppet doesn't do string concatenation with +" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2582
[15:36:53] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2582
[15:36:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2582
[15:45:34] hexmode: http://wikitech.wikimedia.org/view/IPv6_deployment#Current_IPv6_Deployment_status
[16:02:38] !log spence lost /home, mount was "Stale NFS file handle", causing outage of stats.wikimedia.org, fixed by remounting
[16:02:39] Logged the message, Master
[16:11:52] hi robh
[16:12:39] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/2264
[16:12:48] i have a general ip notation question (not for robh specifically), what does the following notation mean 232.53.21.35/32 (it's the /32 part I am curious about)
[16:13:00] It's an IP range
[16:13:07] a range over what?
[16:13:26] Well if I remember correctly, a /32 range is a range of 1 IP address :D
[16:13:32] :)
[16:13:48] what would /18 mean?
[16:13:54] (for example)
[16:14:13] Well, a /31 is a range of 2 addresses, a /30 is 4, a /29 is 8, etc etc
[16:14:21] the first 32 bits of the address are the network mask. that's all the bits in this case
[16:14:25] So a /18 contains 2^(32-18) = 2^14 addresses
[16:14:34] = 16384
[16:15:09] so for a given ip address, what is then the lower and upper bound
[16:15:15] ?
[16:15:27] To figure that out you have to convert it to binary
[16:15:33] ugggh
[16:15:44] it's not hard. you do each octet separately
[16:15:48] Then the lower bound is that address with the last N bits zeroed out, and the upper bound is that address with the last N bits set to one
[16:15:49] Yeah
[16:16:02] ok
[16:16:09] Esp. in C this is easy
[16:16:14] :D
[16:16:22] if you know your bitwise operators
[16:16:43] i guess this is a good learning case
[16:17:16] here, this is not bad: http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a00800a67f5.shtml
[16:17:25] except no one uses "class a" etc anymore
[16:17:33] it's all subnetting now
[16:18:06] thanks apergos!
[16:18:09] much appreciatee
[16:18:10] d
[16:18:11] sure
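(Editor's note: the bitwise arithmetic described above, as a minimal C sketch. The address 232.53.21.35 and the /18 prefix are the examples quoted in the discussion; everything else is illustrative, not code from the channel.)

```c
/* CIDR range arithmetic: zero the low (32 - prefix) bits for the lower
 * bound, set them for the upper bound. A sketch, not production code. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t o[4] = {232, 53, 21, 35};   /* the example address from the chat */
    int prefix = 18;                    /* the example prefix from the chat  */

    uint32_t ip   = (uint32_t)o[0] << 24 | (uint32_t)o[1] << 16
                  | (uint32_t)o[2] << 8  | (uint32_t)o[3];
    /* /0 would shift by 32 (undefined behavior), so special-case it */
    uint32_t mask = prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);

    uint32_t lo = ip & mask;    /* network address: low bits zeroed out */
    uint32_t hi = ip | ~mask;   /* broadcast address: low bits set to one */

    printf("range: %u.%u.%u.%u - %u.%u.%u.%u (%u addresses)\n",
           (unsigned)(lo >> 24), (unsigned)((lo >> 16) & 255),
           (unsigned)((lo >> 8) & 255), (unsigned)(lo & 255),
           (unsigned)(hi >> 24), (unsigned)((hi >> 16) & 255),
           (unsigned)((hi >> 8) & 255), (unsigned)(hi & 255),
           (unsigned)(hi - lo + 1));   /* for /18 this prints 16384 */
    return 0;
}
```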
[16:19:17] oh apergos, could you maybe take care of this ticket: http://rt.wikimedia.org/Ticket/Display.html?id=2436
[16:20:30] hmm my rt ticket filter must not be narrow enough, it didn't end up in my inbox
[16:36:34] so drdee
[16:36:50] yes
[16:37:01] instead of me doing that, I helped out a little with the nfs mount issue, and mutante is setting up andre :-D
[16:37:14] excellent!
[16:46:10] New patchset: Dzahn; "add account for aengels, add to stat1, fix last UID counter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2583
[16:46:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2583
[16:56:50] New patchset: Mark Bergsma; "Make start-stop-daemon work with multiple instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2584
[16:59:36] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2584
[16:59:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2584
[17:05:43] New patchset: Mark Bergsma; "Sigh." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2585
[17:06:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2585
[17:10:24] New patchset: Mark Bergsma; "Add job name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2586
[17:10:47] New review: Dzahn; "approved now in RT 2436" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2583
[17:10:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2583
[17:10:48] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2585
[17:11:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2586
[17:11:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2586
[17:19:03] I need jenkins on "gallium" to ssh to "formey" so it can execute some gerrit commands there.
[17:19:36] so I could create a jenkins user on formey with an ssh key but I am not sure it is such a good idea to have jenkins able to do anything it wants on formey
[17:20:13] did we ever set up something like that previously? (software sshing between hosts)
[17:30:00] New patchset: Hashar; "swp files are now ignored" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2587
[17:30:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2587
[17:49:15] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2264
[17:49:45] New patchset: Dzahn; "enhance page_all - area code API lookup one-liner :p - option to skip an area" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2264
[18:05:22] Now where the hell am I going to fit 4 new swift servers.
[18:06:55] typical answer: under the floor
[18:07:08] nobody will notice, it is cold and close to power plugs
[18:09:25] RobH: yeah. raid-6 with 12 disks it is
[18:10:37] 12?
[18:10:47] Ryan_Lane: so we used raid6 on dataset2
[18:11:13] yeah, why not 12?
[18:11:20] checking what we did on dataset2
[18:12:03] well god damn
[18:12:10] seems its a huge 15 disk raid6.
[18:12:12] i dont like.
[18:12:17] but meh.
[18:12:20] why not?
[18:12:27] two disks out of twelve is a lot
[18:12:34] true, and its mirrored to another system
[18:12:38] so indeed, i guess thats good
[18:12:42] netapp, for instance, defaults to 15 disk raid-6
[18:12:54] yea that is not concerning
[18:12:57] netapp does it ;P
[18:12:58] * Ryan_Lane nods
[18:13:04] well, by default
[18:13:09] you can make it larger, and many people do
[18:13:10] cuz those netapps have been totally great to work with
[18:13:14] heh
[18:13:17] and /sarcasm off
[18:13:30] maplebed: I am tracking down where I can shove these
[18:13:37] its a one here, another there kind of thing
[18:13:45] all I ask is 3 racks.
[18:13:49] well, I ask for 5.
[18:13:51] but I'll take 3
[18:14:02] hm
[18:14:09] why does this have one unconfigured disk?
[18:15:38] crap. I accidentally just made a raid-
[18:15:39] 0
[18:15:43] hehe
[18:15:47] damn netsplits
[18:16:21] maplebed: trying to get you more than three
[18:16:37] hey RoanKattouw_away
[18:16:55] maplebed: i have you 4 so far, trying to find one more.
[18:17:10] chris is going to hate this, its all mostly full racks and he has to rack in top ;]
[18:17:32] well he's netsplitted away, so he won't know our evil plans, mwhahaha
[18:17:42] ah, there we go
[18:19:05] ah. there's two arrays. I forgot about that
[18:19:28] RobH: so, I should do two raid-6 and LVM them?
[18:19:43] thats what i would suggest yea
[18:19:50] * Ryan_Lane nods
[18:19:59] though you wanna leave some space on first raid6 for os /
[18:20:16] that's a lot of wasted space :(
[18:20:23] that's 4 drives
[18:20:26] oh well
[18:20:40] ?
[18:20:45] Ryan_Lane: put the OS in the raid6
[18:20:49] yeah
[18:20:49] just dont LVM partition it
[18:20:50] I am
[18:21:13] whats the wasted space?
[18:21:29] nothing is. I'm crazy. ignore me :)
[18:21:57] that Ryan_Lane, he done goned crazy
[18:22:03] maplebed: i now hate ms-b3
[18:22:05] be
[18:22:11] i wish we had just called it msbe
[18:22:14] i hate the - ;]
[18:24:00] hm
[18:24:03] it seems we have a bad disk
[18:24:26] 01:00:02: Rebuild: 1862.50 GB
[18:24:38] it keeps cycling between that and missing
[18:24:40] hey, so db46 has a very bad replication lag -- can anyone help/show me how to troubleshoot this ?
[18:25:11] LeslieCarr: is it one of the ones that is having the schema change done?
[18:25:42] binasher's email said we'd see some dbs with replag
[18:25:45] not that i know of ? (quick email search doesn't show that)
[18:26:04] which slice is currently being done?
[18:26:16] does SAL mention anything about which slice is being done right now?
[18:26:26] ah
[18:26:39] RobH: should I enter a ticket about this drive?
[18:26:45] is there an easy way to see which slice a db belongs to ?
[18:27:14] asher made some tool for this
[18:27:51] This is the room for serious operations updates, right?
[18:28:01] If so: the espresso machine is now fixed.
[18:28:03] That is all.
[18:28:07] heh
[18:29:53] Ryan_Lane: yea drop ticket in pmtpa for cmjohnson1 to get it replaced
[18:29:58] ok
[18:30:05] though ya may not be able to set it up today then
[18:30:12] * Ryan_Lane nods
[18:30:14] not sure if you can force it to build the array with a bad disk.
[18:30:18] I'll do the rest of them for a raid
[18:30:19] you can't
[18:30:22] I tried. heh
[18:30:43] I'll make sure the rest are OK
[18:30:52] I can make a two-node gluster cluster
[18:30:57] then add the other two later
[18:33:08] Ryan_Lane: so checking the delay i do see a huge delay ( http://wikitech.wikimedia.org/view/Checking_MySQL_replication ) - just not sure what to do about it :)
[18:33:29] you may not want to do anything
[18:33:38] if it is the one doing a schema migration
[18:33:56] i think i will wait until binasher gets in…
[18:33:56] http://noc.wikimedia.org/~hashar/db.php
[18:34:15] that's s6
[18:35:41] cool :)
[18:36:43] well, seems that's not the slice he was doing
[18:39:02] ryan_lane: any idea which disk you think could be bad?
[18:39:08] in the ticket
[18:39:20] 01:00:02
[18:40:12] got it thks
[18:40:19] I'm not even getting a console on labstore2 :(
[18:40:23] ah. now I am
[18:40:31] just took a while
[18:40:43] ah so if that isn't the slice he's doing, something else is wrong...
[18:41:24] I looked at the process list
[18:41:30] looks like a bunch of processes waiting on the master
[18:43:34] Jeff_Green: ?
[18:45:00] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=2442 is for the swift servers arriving tomorrow
[18:45:55] Ryan_Lane: so we have an offer from the author of OTRS to help us with upgrading to the current stable version
[18:46:04] robh: cool
[18:46:19] sounds great!
[18:46:45] is he going to take the security issues into consideration? or are they already fixed in the newer version?
[18:46:50] he reviewed our existing build including the patches Tim deployed, and he's willing to port those patches to v3
[18:47:00] ah. great.
[18:48:14] I'm not really sure what staging that upgrade will look like, but I'm guessing the 'hard' parts are porting the patches, puppetizing, testing, and the data-modification
[18:48:28] data modification will likely be the hardest part
[18:48:40] and a lot of that data contains private information, which is tricky in labs
[18:48:47] exactly
[18:49:01] I'm not opposed to having private data in labs, as long as it's short lived, and we track who has access to that project
[18:49:03] also it's a large data footprint, I wasn't sure if we're set up for that in labs yet
[18:49:08] how much?
[18:49:11] 100GB?
[18:49:14] more
[18:49:17] hm
[18:49:20] 300GB? checking
[18:49:34] hell which db is it now . . .
[18:49:36] well, I'm building the gluster cluster right now ;)
[18:49:44] !log running sync-apache to fix office redirect
[18:49:45] Logged the message, Master
[18:49:53] so, we'll have like 60+TB
[18:50:08] otrs is currently 227GB on disk
[18:50:22] I would assume we need at least double that for a reasonable conversion process
[18:50:25] ok, we'll need to wait till I can share a gluster volume with the project, then
[18:50:48] current storage is really limited, and I'd worry about running out of space
[18:50:49] roughly when do you anticipate that happening?
[18:50:55] I'm working on it right now :)
[18:51:00] hopefully in a week or so
[18:51:08] oh that's plenty of time
[18:51:41] it won't be set up in the way I'd like it to work by then, but it'll be usable storage
[18:51:42] his proposed schedule doesn't even have him needing a test system until after 3/5
[18:51:51] oh. yeah. we should be totally fine, then
[18:52:11] we just need to inform everyone that only he is allowed to be in the project
[18:52:13] and ops people
[18:52:18] ok
[18:52:27] is he singing some form of non-disclosure?
[18:52:35] signing*
[18:52:42] I'm in the process of asking that myself
[18:52:54] probably good to set up a meeting with legal
[18:52:57] I have no idea how we usually do things around here, but I would certainly want that
[18:53:00] ok
[18:53:04] actually...
[18:53:11] can you attend a meeting on friday?
[18:53:19] we are having a labs and privacy meeting with legal
[18:53:25] we can ask there. heh
[18:53:28] if you can bring me along on a laptop :-P
[18:53:35] yeah. I'll skype you in
[18:53:37] sure, skype power go.
[18:53:43] it's at 12:30 PDT
[18:53:45] arr, that apparently did not work
[18:53:55] maybe we should bring Philippe in on that too?
[18:53:57] broke office.wm
[18:54:13] he can't attend :(
[18:54:29] we'll follow up with him, though
[18:54:40] mutante: :D
[18:54:53] how'd you break it?
[18:55:02] ah. redirect loop
[18:55:03] Ryan_Lane: adding a redirect to redirects.conf
[18:55:13] Ryan_Lane: ok
[18:55:13] we wanted forced https
[18:55:18] * Ryan_Lane nods
[18:55:35] mutante: forced https ++
[18:55:47] arr, but why is it circular
[18:55:59] removing it
[18:56:17] mutante: where's the redirects.conf in question?
[18:56:26] ah. yeah. that's not going to work
[18:56:27] fenari:/home/wikipedia/conf/httpd
[18:56:38] mutante: apache has no clue that it is in https mode
[18:57:05] well, that's not totally true
[18:57:18] you need to read a request header?
[18:57:27] the X-Forwarded-Proto header says which protocol is in use
[18:57:42] so, you need to configure the redirect to only redirect if X-Forwarded-Proto is http
[18:57:55] ah that seems pretty doable
[18:57:58] should be doable via RewriteCond
[18:57:59] !log reverting the (circular) office redirect, syncing..
[18:58:00] Logged the message, Master
[18:58:02] yup
[18:59:06] it's safe to trust the header, since we strip it from any request that doesn't come from the ssl servers
[18:59:09] Ryan_Lane: do you think we would need an NDA on the labs part excluding the data transform itself?
[18:59:14] RewriteCond %{HTTP:X-Forwarded-Proto} !https (?)
[18:59:19] or something pretty close to that
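(Editor's note: a hedged sketch of the rule being described, assuming a mod_rewrite block in redirects.conf — the actual production file isn't shown in this log, only the RewriteCond line quoted above.)

```apache
# Sketch, not the real redirects.conf: redirect to HTTPS only when the
# request did not already come through the SSL terminators, which tag
# their traffic with X-Forwarded-Proto. A bare Redirect would loop,
# since Apache itself only ever sees plain HTTP.
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^/(.*)$ https://office.wikimedia.org/$1 [R=301,L]
```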
[18:59:26] Jeff_Green: probably an NDA just for the data
[18:59:26] good
[18:59:38] Jeff_Green: though we might have to have ID checks for all labs users. :(
[18:59:53] which saddens me a little
[19:00:19] ok
[19:00:21] Ryan_Lane: noooooooo, anons are great!
[19:00:25] and real people are assholes too
[19:00:26] thanks all!
[19:00:31] notpeter: +1
[19:00:45] so i reverted the change, synced, graceful.. but ..hmm
[19:00:51] but, there's some board policy requiring identification for anyone who may have access to user data
[19:00:52] also.... how check? cuz, like, photoshop is a thing....
[19:01:21] Ryan_Lane: I'm not sure I think that's such a bad idea, but then I may be a fascist.
[19:01:37] * apergos eyes Jeff_Green
[19:01:55] * Jeff_Green stares back, sternly, and calmly
[19:02:07] Jeff_Green: fascist, hitler, godwin's law, etc.
[19:02:13] well, it's probably a good idea, yeah
[19:02:21] it makes things a little more painful, though
[19:02:32] notpeter: troll! troll!
[19:02:33] it definitely makes it harder to give people accounts
[19:02:38] yeah
[19:02:38] I just think that we won't be able to implement it in any realistic way, and I think that it will be a barrier to entry
[19:02:47] * apergos doesn't blink
[19:02:47] Jeff_Green: oh, most definitely :)
[19:02:47] exactly
[19:02:49] I have cats.
[19:03:28] perhaps we could have tiers of access
[19:03:34] ok, i have a packaged version of lucene-search-2 running on a box in eqiad. without nfs.
[19:03:44] clear definitions of what you can do without being 'formally' identified
[19:03:45] it's spewing out some warnings. but this is massively good progress
[19:03:59] like clearance levels, we can model ourselves on the CIA :-P
[19:04:03] Jeff_Green: yeah. likely
[19:04:06] Jeff_Green: can we call it the wedding cake access model?
[19:04:08] that's very likely what we'll do
[19:04:58] we can have people who call all your friends and ask if they can account for where you were between 12/2/1973 and 4/14/2003, and whether or not you're a compulsive gambler
[19:06:02] Jeff_Green: so no one younger than 39?
[19:06:08] that's pretty ageist
[19:06:19] "not born yet" is an option
[19:06:25] but we'll need proof.
[19:06:42] Jeff_Green: what if they were doing messed up stuff in a past life?
[19:06:58] or in a virtual one?
[19:07:04] see, the cost of employing all of these psychics to do past-life background checks just isn't going to scale...
[19:07:21] oh, yeah, we should get in touch with linden and blizzard....
[19:10:09] and just hire all of linden labs ?
[19:10:31] can't they build us a labs module in their own world?
[19:10:56] we could save ourselves a lot of work that way
[19:11:13] RoanKattouw: it's virtualization all the way down....
[19:11:28] misping?
[19:11:55] RoanKattouw: I was going to ping you about patching scap to push things out to search boxes
[19:12:08] Oh, right
[19:12:22] I was gonna theoretically do that for you when I was in SF, but I forgot
[19:12:26] And I'm sick today
[19:12:31] yeah, it's still no hurry
[19:12:38] I'm still in a testing phase
[19:12:43] so I can keep pushing stuff out by hand
[19:12:46] just wanted to check in
[19:12:50] OK
[19:13:02] my need for this will be real... hopefully within the next week
[19:14:25] Hmm, how about next Wednesday?
[19:14:33] (as in in 8 days, not tomorrow)
[19:14:40] sure
[19:14:49] arr, so office.wm still not looking good for you?
[19:14:57] tried to purge URLs from squid now
[19:15:04] RoanKattouw: by then I should be done reimaging boxes regularly
[19:15:14] hehehe
[19:15:25] and I made a dsh group called search-transition, I believe
[19:15:27] office.wm also has broken JS/CSS
[19:15:34] Because Varnish also got the redirect loops, and is caching them
[19:15:51] how to clear that ?
[19:15:54] that is for migrating all search boxes to a fully puppetized state
[19:16:31] hmph, there is a purge-varnish script but it's broken
[19:19:55] !log used purgeList.php on office.wm URLs, but it appears to be in varnish cache (broken redirect)
[19:19:57] Logged the message, Master
[19:20:13] well, you can run it via dsh directly on the varnish servers
[19:20:20] using ban.url
[19:20:24] RoanKattouw: all scripts have a 25% chance of being broken it seems
[19:20:38] it's like a slot machine ;)
[19:20:46] Ryan_Lane: The current script uses purge.url, maybe this broke due to a Varnish upgrade or something?
[19:20:50] varnishadm ban.url
[19:20:52] Aha
[19:20:54] * RoanKattouw edits script
[19:20:57] well, purge likely works too
[19:21:28] No it doesn't
[19:21:36] * Ryan_Lane nods
[19:21:39] ban.url seemed to work
[19:21:47] great
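(Editor's note: the manual purge being described, as an assumed sketch — the dsh group name and varnishadm admin port are placeholders, not values from the channel. In Varnish 3.x the CLI command ban.url replaced the older purge.url, which is why the script broke; it takes a regex matched against req.url, i.e. the path, not the full URL.)

```sh
# Run varnishadm on each cache host via dsh and ban the cached root page.
# "varnishes" and localhost:6082 are illustrative placeholders.
dsh -g varnishes -- "varnishadm -T localhost:6082 ban.url '^/$'"
```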
[19:21:51] !log Ran Varnish purge for 'office
[19:21:53] Logged the message, Mr. Obvious
[19:22:09] OK, officewiki JS/CSS is back
[19:22:31] You will probably need to do a hard refresh (Ctrl+F5) because the 301 response will be cached by the browser too
[19:23:10] thanks Roan
[19:23:40] still not working :-(
[19:24:29] people are in too many channels
[19:24:54] New patchset: Catrope; "Fix purge-varnish, wants ban.url now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2590
[19:25:16] apergos: yes it is a little overwhelming
[19:25:35] wtf
[19:25:39] ?
[19:25:48] one of these boxes is missing a ton of disks
[19:25:54] ?!
[19:25:55] missing 8
[19:26:06] Ryan_Lane: failed controller?
[19:26:14] maybe
[19:26:21] where's robh when you need him? :)
[19:26:55] the bot needs an APB function that broadcasts to every known channel, email, and pagers
[19:27:07] it says 12 physical disks, but only lists 4
[19:28:51] what does megacli tell you, anything? I assume it's some sort of perc(s)?
[19:29:23] (I have no idea what hardware you are on)
[19:30:42] no OS installed yet
[19:30:49] ah
[19:30:50] this is at the hardware raid level
[19:30:50] even funner
[19:30:58] so something is definitely wrong
[19:31:06] well, at least two of them are working properly :)
[19:31:10] that's all I need to get started
[19:42:03] ryan_lane: did you notice anything different on the 2 that went well. regarding post, etc. I am wondering if it has anything to do with UEFI
[19:42:21] I dunno
[19:42:24] this one seemed normal
[19:42:46] let me reboot and check it out again
[19:44:47] same problem
[19:45:08] oh wait
[19:45:15] it sees the other disks as "foreign"
[19:45:38] wtf does that mean?
[19:46:19] idk...i am getting that too...i think that is the problem. just need to figure out what it means
[19:47:54] hello, daniel?
[19:48:27] "Disks cannot be migrated back to previous PERC RAID controllers. When a controller detects a physical disk with an existing configuration, it flags the physical disk as foreign, and it generates an alert indicating that a foreign disk was detected."
[19:49:15] fixed
[19:49:29] cmjohnson1: in foreign view you can tell it to clear the foreign config
[19:49:44] or import it
[19:49:48] I chose to clear it
[19:49:50] ok..i see that we need to clear it
[19:49:52] andrew_wmf_: hi
[19:49:53] now all of the disks are available
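(Editor's note: here the clear was done through the PERC controller BIOS; for reference, the same operation from a booted OS with the MegaCli utility asked about earlier would look roughly like this — the adapter number is an assumption.)

```sh
MegaCli -CfgForeign -Scan -a0    # report how many foreign configs adapter 0 sees
MegaCli -CfgForeign -Clear -a0   # discard them so the disks show as unconfigured-good
```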
[19:51:25] rhalsell here?
[19:52:04] i dont think he is back on yet
[19:52:11] andrew_wmf_: i can see that you sent a message, but cant see the content
[19:52:36] test
[19:53:06] now it works
[19:53:40] ryan_lane: on labstore4...are the disks showing up as foreign or missing?
[19:53:53] now they are showing as available
[19:53:57] I cleared the foreign config
[19:54:22] okay..i think the same needs to be done for labstore1
[19:54:59] Ryan_Lane: hey
[19:55:07] so is all of the labs gear going to be on row b ?
[19:55:17] umm
[19:55:19] I dunno
[19:55:38] cmjohnson1: nah. on labstore1 I think the disk is actually broken
[19:55:48] yeah..i am getting missing disk errors
[19:55:52] yeah
[19:55:53] same
[19:56:00] but it cycles between missing and rebuilding
[19:57:33] well so far it is, so i'll make the labs subnet there for now and hope :)
[19:57:37] heh
[19:57:46] yeah. I have no clue how that stuff is planned out
[19:57:49] :)
[19:57:51] ma rk and robh would know
[19:58:10] well the big issue is that if it's cross subnet, we'd have to do some spanning tree, which mark has worked really really hard to avoid
[19:59:35] hmm I have a small issue: https://office.wikimedia.org/ has a 301 to …. self! :)
[19:59:45] is that a known issue on the cluster?
[19:59:59] hashar, yes
[20:00:05] hashar, try to force-reload
[20:00:10] the issue should be fixed now
[20:00:22] I use curl so that is not fixed for me
[20:00:28] Expires: Thu, 15 Mar 2012 18:53:04 GMT
[20:01:01] full trace http://dpaste.org/oc8Tx/
[20:09:14] mutante: is this your first site outage? :)
[20:09:17] of course the purge varnish requires root ...
[20:09:25] I have left a private message for roan
[20:11:30] the varnish purge already went around I thought
[20:13:00] !log built two raid6 arrays per labstore host. raid sets are initializing.
[20:13:02] Logged the message, Master
[20:13:29] Ryan_Lane: kind of, there was a minor one, but that didn't really count
[20:13:54] heh
[20:14:00] mutante: welcome to the team!
[20:14:01] :)
[20:14:02] hashar: yes, confirmed, that is what woosters still reported
[20:14:25] Ryan_Lane: thank you!
[20:14:43] it could have been worse, the change could have affected all wikis
[20:14:47] mutante: looks like the root URL was not properly purged :/
[20:14:53] then we would have had to purge all of our caches
[20:15:00] hashar: i am looking at it.. but the change should have been reverted all this time..and just browser caching
[20:15:07] hashar: yea
[20:15:31] does it need to be purged in squid?
[20:15:46] and didn't the redirect redirect everything that wasn't http?
[20:15:46] thats what i just did
[20:15:51] so it could have all kinds of crap in there
[20:15:58] is amsterdam a mix of varnish / squids ? :(
[20:16:09] I got it from amssq41.esams.wikimedia.org
[20:16:11] we are using squid for upload and text
[20:16:15] and varnish for mobile and bits
[20:16:28] we will eventually move to all varnish
[20:16:44] echo 'https://office.wikimedia.org/' | mwscript purgeList.php --wiki aawiki
[20:16:48] Purging 0 urls
[20:16:54] \o/
[20:19:02] mutante: the script does not accept https URL purging 8-)))))))
[20:21:28] https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111478
[20:22:31] hmm..what can we do about it. the change itself is long reverted
[20:22:58] if that change goes around
[20:23:29] hi RobH
[20:23:40] Heyas
[20:23:44] I have merged it in 1.18wmf1 https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111480
[20:23:52] updating fenari now
[20:23:54] test please? (I'm looking at
[20:24:03] SquidPurgeClient right now)
[20:24:26] would you have some time this week to get multicast logging enabled on Locke?
[20:24:29] we have sooo many purge scripts
[20:25:03] wasnt someone else working on multicast and ran into a problem (just recalling irc chatter)
[20:25:07] ?
[20:25:19] hashar: i see now, lagging a bit, thanks
[20:27:43] you should be able to purge it now
[20:28:13] done. it still said "0 urls" though
[20:28:39] ah, now! :)
[20:29:14] !log purged https://office link, using modified purgeList.php that accepts https urls, thanks hashar
[20:29:16] Logged the message, Master
[20:29:21] Purging 1 urls
[20:29:46] does not solve the issue though :-)))
[20:29:53] :7
[20:31:10] maybe that purge script does not do anything nowadays
[20:33:17] it's nginx sending back the redirect
[20:33:55] so we need a purge nginx script?
[20:34:05] (we need a purge all script :D )
[20:34:09] Server: nginx/0.7.65
[20:34:14] I don't know who that is
[20:35:31] nginx doesn't cache
[20:35:34] it's a transparent proxy
[20:35:46] you don't need to purge https, since we aren't varying on it
[20:35:47] better than before !
[20:35:56] hashar: try again
[20:36:23] as far as everything except for nginx is concerned, everything is http. nothing is https :)
[20:36:32] (we are doing ssl termination)
[20:37:10] mutante: works for me now! thanks
[20:37:28] hashar: :) thanks for fixing the script!
[20:37:40] well according to Ryan the script is useless
[20:37:58] but it worked after using it :p
[20:38:03] really?
[20:38:03] heh
[20:38:05] that would be odd
[20:38:07] well
[20:38:10] oh. wait
[20:38:11] i'm going to do master swaps shortly for s7, s2, and s3, starting with s7. there will likely be nagios heartbeat alerts until i update dns (i.e. s7-master cname), ignore those.. (just those)
[20:38:15] no, that makes sense
[20:38:20] do the squids know about HTTPS URLs ?
[20:38:26] a wget of https://office.wikimedia.org/ produced a redirect to the same url
[20:38:27] only redirects
[20:38:31] and only ones served by apache
[20:38:32] and the back end claimed to be nginx
[20:38:33] so
[20:38:34] which this one was
[20:38:41] aah
[20:38:50] totally forgot about that
[20:38:54] that's on the agenda to fix ;)
[20:38:57] heh
[20:39:13] ok well now when we break it again in a few minutes we know how to fix it :-P
[20:39:16] binasher: LeslieCarr was mentioning we have really bad replag on db46
[20:39:22] at least it is good to know that the cluster is still a bit messy :-))
[20:39:30] we weren't sure if this was related to the schema migrations
[20:39:35] see email about migrations
[20:39:37] that would spoil the fun if anyone could fix any issue by themselves
[20:39:38] hashar: it'll always be :)
[20:39:49] binasher: yeah, didn't know s6 was under migration
[20:40:06] binasher: can you log which ones you are starting, so we can reference the SAL?
[20:40:13] Ryan_Lane: it's automatic
[20:40:14] ryan_lane did you load anything on to labstore1? i am going to move a couple of disks around to see if it is the disk or something else.
[20:40:15] see the email
[20:40:18] oh?
[20:40:31] cmjohnson1: nope. go for it
[20:40:43] cmjohnson1: until the disk issue is fixed I can't do anything on that box
[20:40:50] please read the email with a subject of "Please Read:.."
[20:40:51] so do whatever you need to :)
[20:40:57] binasher: I read it.
[20:41:20] mutante: thanks for the fix :)
[20:41:48] binasher: ah ok. reading it again, I understand better
[20:41:56] it's doing all of them, one at a time
[20:42:11] can I request a script enhancement? :)
[20:42:29] asher@fenari:~/db/switch$ tail -1 /home/asher/db/119-migration/coredbs-1.out
[20:42:29] db46 ruwiki 1.19wmf1-1 page_redirect_namespace_len index ar_sha1 rev_sha1
[20:42:33] yes.. but for 1.20 :)
[20:42:36] * Ryan_Lane nods
[20:42:38] yeah
[20:42:41] that's a good idea
[20:42:50] easy to do since it runs on fenari too
[20:42:50] I had assumed that db46 was doing a schema migration
[20:42:55] when I looked at the process list
[20:43:07] so we just waited till you were here to ask :)
[20:43:26] yep. that would make it easy for us to look at the SAL and know the lag is definitely normal
[20:44:19] this script seems pretty awesome. were all of these done manually before?
[20:46:10] it borrows from stuff tim did for 1.16, not sure about before that though
[20:48:56] * Ryan_Lane nods
[20:49:30] thanks everybody who helped with the office issue, logging out for now
[20:49:36] night
[20:50:13] mutante: night
[20:54:39] ryan_lane: can you go into labstore1 and clear any foreign configs please...thanks
[20:54:50] sure
[20:54:53] I hadn't seen any before
[20:54:56] but lemme check
[20:55:22] New patchset: Asher; "upgrading mysql on db16, db37 is new s7 master" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2591
[20:55:46] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2591
[20:55:46] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2591
[20:55:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2591
[20:56:01] oh wow
[20:56:03] look at that
[20:56:07] there were a few
[20:56:24] cmjohnson1: are you also in the interface?
[20:56:47] I was just going to delete and recreate the virtual disk
[21:16:44] cmjohnson1: looks like it's good now :)
[21:16:54] I spoke too soon. heh
[21:17:20] ryan_lane: all 12 disks are showing up now...should be good to go
[21:17:35] it's showing in the rebuild state, though, eh?
[21:17:54] if there is no os data, you guys should quick init them to kill any old syncs
[21:17:55] i pulled that disk out
[21:18:09] though at least it isn't showing as missing
[21:18:24] I wonder how long the rebuild takes
[21:18:25] New patchset: Asher; "upgrading db34, db39 new s3 master" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2592
[21:18:29] RobH: I'm initializing all of them
[21:18:35] quick init?
[21:18:39] i never do full init.
[21:18:40] full
[21:18:43] meh.
[21:18:43] why not?
[21:18:46] why?
[21:18:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2592
[21:18:46] it stresses the disks
[21:18:55] so does the OS install ;]
[21:19:03] not nearly as much as an initialize
[21:19:07] not quite as much, but meh
[21:19:18] it's a good way to see if you are going to get a failed disk early on
[21:19:32] it has been rebuilding for about 15-20 mins
[21:19:45] I'll take a look later
[21:20:01] as long as I start initializing some time today it'll all be ready for tomorrow
[21:20:15] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2592
[21:20:16] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2592
[21:26:52] RobH: virt's are in b3, yah ?
[21:27:00] and their new interfaces also in b3 :)
[21:27:25] they are in differing racks
[21:28:23] too close to alt tab.....
[21:28:56] LeslieCarr: labstore1 and 2 are in c3-sdtpa, the other two are in d1-pmtpa
[21:32:12] New patchset: Lcarr; "reenabling ifup script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2593
[21:32:43] virt1001-1008
[21:32:48] in eqiad
[21:33:03] oooo. are those racked now?
[21:33:08] robh: what is the story with dataset1
[21:33:14] that's the ciscos, right?
[21:33:38] LeslieCarr: how's the networking set up on those? two interfaces, one for management, and another for guests?
[21:33:51] or three, where two interfaces are bonded?
[21:34:02] I saw something about bonding configuration in tickets
[21:34:07] two interfaces bonded
[21:34:31] i'm unsure of the other networking requirements….
[21:34:36] did we want them in two vlans ?
[21:34:42] two different vlans, that is ?
[21:35:45] yeah
[21:36:11] eth0 should be management of the hosts, and eth1 should be the guest network
[21:36:50] based on the network statistics, it probably isn't necessary to bond the guest, but when migrating instances from host to host, it's definitely possible to saturate that network
[21:36:55] hm
[21:37:00] though it likely uses eth0 for that
[21:37:10] so, yeah. probably don't need bonding
[21:37:53] these will be configured just like the ones in pmtpa
[21:38:01] okay
[21:38:09] http://wikitech.wikimedia.org/view/OpenStack#Network_design
[21:38:24] well then i need to do a lot less work :) yay
[21:38:27] heh
[21:38:39] there's a small chance we'll need to do bonding in the future, but I doubt it
[21:39:00] the guest network is weirdly named
[21:39:12] it should be named something like "labs internal"
[21:39:15] if someone is using that much network traffic, we need to figure out what the hell they are doing :)
[21:39:22] what's it named now?
[21:39:25] guest makes it sound like anyone can hook up their laptop
[21:39:26] guest
[21:39:29] heh
[21:39:33] instance, then?
[21:39:54] instance
[21:39:57] labs instance network
[21:40:03] that sounds good to me :)
[21:40:03] guest is a virtualization term that's really common
[21:40:05] host/guest
[21:40:27] instance is an ec2/nova term
[21:40:48] cool
[21:41:10] i see where guest could come from, just so used to stuff having a "guest" network for when random people drop by
[21:41:19] heh
[21:41:26] yeah. I can see how that would bug a networking person
[21:41:43] I prefer instance over guest as well
[21:41:59] we can use domU if you really want ;)
[21:42:19] dom0/domU
[21:42:27] doesn't virtualization just suck?
[21:42:31] haha
[21:42:51] three sets of terms, all for the same damn thing
[21:42:54] yes :) whole new set of names to invent
[21:46:19] ryan_lane that rebuild is only at 13%....it will take just a few more minutes ;p
[21:46:26] heh
[21:46:39] yeah. gonna be a while I see
[21:46:46] it's frightening that a rebuild takes this long
[21:47:13] extremely
[21:47:15] if a disk goes bad, it'll take a while to replace
[21:47:23] this is life with giant disks, though
[21:47:35] this is one of the reasons raid5 is basically worthless now
[21:48:18] that and silent write errors
[21:49:54] i am not an expert with raid but doesn't raid 5 have faster write speed?
[21:50:20] yes
[21:50:31] raid6 is double parity
[21:51:26] but a 12 disk raid5 with 2TB disks worries me
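(Editor's note, for concreteness on the trade-off being weighed: with twelve 2 TB disks, RAID-5 spends one disk's worth of capacity on parity — 11 × 2 TB = 22 TB usable, surviving one failure — while RAID-6 spends two: 10 × 2 TB = 20 TB usable, but a second disk can fail during the long rebuild such large drives need. RAID-5 writes are faster because each stripe computes one parity block instead of two.)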
[21:52:32] New patchset: Asher; "upgrading db31, new s2+s4 masters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2594
[21:53:17] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2593
[21:53:18] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2593
[21:55:10] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2594
[21:55:11] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2594
[21:55:25] why does the nagios bot no longer alert the channels?
[21:55:35] hrm, sorry re: page
[21:55:42] no worries
[21:55:55] !log restarting ircecho on spence
[21:55:57] Logged the message, Master
[21:56:25] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[21:56:29] that's better
[21:56:36] New patchset: Lcarr; "Fixing default interface to default_gateway_interface" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2595
[21:56:58] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2595
[21:56:59] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2595
[21:57:01] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:57:41] binasher: i merged your change
[21:58:04] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 138 MB (1% inode=57%): /var/lib/ureadahead/debugfs 138 MB (1% inode=57%):
[21:58:22] thanks, i was just wondering what happened to it.. "did i forget to merge it?" hah
[21:59:52] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.309 seconds
[21:59:54] LeslieCarr: did your merge on sockpuppet succeed at syncing to the real puppet host?
[22:00:21] I thought the mobile WAP was now just hitting the normal mobile site
[22:00:29] as far as i know ?
[22:00:38] is ekrem even doing anything anymore? why are we still monitoring it?
[22:00:40] the change i merged with it synced to a host
[22:01:02] Ryan_Lane: is that the custom apple gateway thing?
[22:01:10] I thought apple is using the api now
[22:01:49] i have no idea
[22:01:51] I'll ask tomasz :)
[22:03:09] no they're not
[22:03:16] ekrem is mainly used for apple dictionary search
[22:03:28] lame
[22:03:30] and for the wap redirect
[22:03:35] can someone tell them to stop sucking?
[22:03:37] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 221 MB (3% inode=58%): /var/lib/ureadahead/debugfs 221 MB (3% inode=58%):
[22:03:37] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.942 seconds
[22:03:41] en.wap.* redirects to the mobile site
[22:04:40] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 1 MB (0% inode=58%): /var/lib/ureadahead/debugfs 1 MB (0% inode=58%):
[22:04:40] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 199 MB (2% inode=57%): /var/lib/ureadahead/debugfs 199 MB (2% inode=57%):
[22:06:19] RECOVERY - mysqld processes on db31 is OK: PROCS OK: 1 process with command name mysqld
[22:06:28] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:07:13] RECOVERY - MySQL Replication Heartbeat on db1038 is OK: OK replication delay seconds
[22:07:13] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 75 MB (1% inode=58%): /var/lib/ureadahead/debugfs 75 MB (1% inode=58%):
[22:07:13] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 123 MB (1% inode=57%): /var/lib/ureadahead/debugfs 123 MB (1% inode=57%):
[22:07:32] RECOVERY - MySQL Replication Heartbeat on db51 is OK: OK replication delay seconds
[22:07:32] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay seconds
[22:07:32] RECOVERY - MySQL Replication Heartbeat on db22 is OK: OK replication delay 0 seconds
[22:07:40] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:07:49] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay seconds
[22:10:13] RECOVERY - Disk space on srv223 is OK: DISK OK
[22:10:39] !log restarted the 1.19 schema migration script - it's going to hit the just rotated s3 (db34), s2 (db30), s7 (db16), and s4 (db31) ex-masters before resuming s5 (db55) and all s6/s1 slaves
[22:10:41] Logged the message, Master
[22:11:07] RECOVERY - Disk space on srv219 is OK: DISK OK
[22:11:25] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=57%): /var/lib/ureadahead/debugfs 0 MB (0% inode=57%):
[22:11:25] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 274 MB (3% inode=58%): /var/lib/ureadahead/debugfs 274 MB (3% inode=58%):
[22:14:07] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 100 MB (1% inode=58%): /var/lib/ureadahead/debugfs 100 MB (1% inode=58%):
[22:14:16] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 135 MB (1% inode=58%): /var/lib/ureadahead/debugfs 135 MB (1% inode=58%):
[22:16:22] RECOVERY - Disk space on srv220 is OK: DISK OK
[22:16:31] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 134 MB (1% inode=58%): /var/lib/ureadahead/debugfs 134 MB (1% inode=58%):
[22:16:58] New patchset: Hashar; "adding .gitreview (again)" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2596
[22:17:52] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=57%): /var/lib/ureadahead/debugfs 0 MB (0% inode=57%):
[22:18:23] New review: Hashar; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/2596
[22:18:23] New review: Hashar; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2596
[22:18:23] Change merged: Hashar; [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2596
[22:20:25] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours
[22:23:25] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 266 MB (3% inode=58%): /var/lib/ureadahead/debugfs 266 MB (3% inode=58%):
[22:23:42] RobH: hey the virt100X machines tubes are ready
[22:23:57] Ryan_Lane: ^
[22:24:06] the bonded interfaces for the cisco servers
[22:24:12] LeslieCarr: thanks =]
[22:24:13] sweet
[22:24:20] not bonded
[22:24:25] oh
[22:24:32] i thought they were bonded, i may have requested wrong
[22:24:34] LeslieCarr: ?
[22:24:38] oh
[22:24:40] not bonded
[22:24:41] Ryan_Lane knows best
[22:24:43] ok, cool
[22:24:43] good
[22:24:44] I just talked with her about it
[22:24:44] hehe yep
[22:24:55] put eth0 into "labs" vlan that's accessible outside
[22:25:06] and eth1 is in a vlan that is not ip'ed
[22:25:12] so is stuck on asw-b-eqiad
[22:25:38] fine you folks who can actually talk to one another ;p
[22:25:40] RECOVERY - Disk space on srv224 is OK: DISK OK
[22:25:40] RECOVERY - Disk space on srv221 is OK: DISK OK
[22:25:41] rub it in
[22:25:46] eth0 has public addresses, and 10.4.0 addresses
[22:25:49] RECOVERY - Disk space on srv223 is OK: DISK OK
[22:25:50] err
[22:25:58] 10.0 addresses
[22:25:58] RECOVERY - Disk space on srv222 is OK: DISK OK
[22:26:18] networking for labs is awkward, to say the least ;)
[22:26:54] at least it's spiffy enough that it's not rack dependent.
[22:31:18] it kind of is
[22:31:23] well, it's row dependent
[22:31:51] row dependent, yeah :(
[22:32:07] well RobH actually Ryan_Lane is not in the office today...
[22:32:23] yeah, taking care of "things"
[22:32:47] i'll be back in the office tomorrow
[22:35:36] mutante: still around ?
[22:38:25] PROBLEM - Puppet freshness on gilman is CRITICAL: Puppet has not run in the last 10 hours
[22:38:25] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours
[22:40:33] RobH: hey, can I unallocate ganglia1001 and ganglia1002 to being misc hosts ?
[22:40:45] we arent using them anymore?
[22:40:58] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.946 seconds
[22:41:21] going to put them with other services
[22:41:36] so sort of ? :)
[22:42:08] so the servers should rename back to original names and be wiped clean right?
[22:42:42] yeah
[22:42:50] then i'll use them again but they can be allocated to more stuff
[22:43:02] shit, i didnt list the old name...
[22:43:06] god damn it.
[22:43:39] time for http://en.wikipedia.org/wiki/Periodic_table
[22:44:28] hehe
[22:44:35] PROBLEM - MySQL Slave Delay on db34 is CRITICAL: CRIT replication delay 247 seconds
[22:44:41] just please not the unn… names
[22:44:53] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:45:42] !log deployed squid config to upload to send all thumbnail traffic to ms5 instead of swift
[22:45:44] Logged the message, Master
[22:46:13] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.516 seconds
[22:46:40] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 281 MB (3% inode=62%): /var/lib/ureadahead/debugfs 281 MB (3% inode=62%):
[22:46:44] LeslieCarr: Ok, so we need to drop a few tickets and update a few things. If you want, drop a ticket for the master renaming of them in core-ops
[22:46:48] then i can link the tasks from that.
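(The 22:45:42 !log entry is the rollback for the truncated-thumbnail problem: the upload squids stop forwarding thumbnail requests to the swift proxies and send them back to ms5, the pre-swift thumbnail server. The real change is squid cache_peer/acl configuration; the Python below only sketches the before/after routing decision, and both backend hostnames are placeholders.)

    # Sketch of the routing change; hostnames are hypothetical placeholders.
    SWIFT_BACKEND = "swift-frontend.example"  # swift proxy pool
    MS5_BACKEND = "ms5.example"               # pre-swift thumbnail server

    def pick_backend_before(path: str) -> str:
        # Pre-rollback: thumbnail URLs went to swift, everything else to ms5.
        return SWIFT_BACKEND if "/thumb/" in path else MS5_BACKEND

    def pick_backend_after(path: str) -> str:
        # Post-rollback (the deployed change): swift is out of the request path.
        return MS5_BACKEND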
[22:46:56] ok
[22:46:58] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[22:48:20] ok, ganglia1001=neon and ganglia1002=cobalt
[22:48:30] and next time you need a name change you are gonna get harassed for it ;]
[22:48:56] RobH done
[22:48:57] haha
[22:49:00] yes
[22:49:14] woosters: robla: swift is now out of service.
[22:49:43] ok.
[22:49:47] maplebed: ok, thanks. looking now
[22:49:49] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds
[22:50:15] (of course, squid will still have caches of partial images)
[22:50:17] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:50:34] RECOVERY - Disk space on srv224 is OK: DISK OK
[22:50:51] Is it worth logging an RT ticket to get these apaches fixed/reinstalled?
[22:50:52] RECOVERY - Disk space on srv223 is OK: DISK OK
[22:51:35] heh.. we're actually not much better off than we were before since the squids are caching all the broken images anyways; they could be cached for as long as a week.
[22:51:46] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 31 MB (0% inode=62%): /var/lib/ureadahead/debugfs 31 MB (0% inode=62%):
[22:51:46] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 178 MB (2% inode=62%): /var/lib/ureadahead/debugfs 178 MB (2% inode=62%):
[22:52:22] LeslieCarr: you will be reinstalling these systems right?
[22:52:33] or you will handle the hostname changes that is ;]
[22:52:43] (i vote reinstall)
[22:52:49] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.724 seconds
[22:53:41] sure i can reinstall them
[22:54:22] RobH: did you get the dns ?
[22:54:35] nope, updated ticket saying i didnt wanna if you werent reinstalling
[22:54:37] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 209 MB (2% inode=62%): /var/lib/ureadahead/debugfs 209 MB (2% inode=62%):
[22:54:42] didnt wanna interrupt anything you may have tied to it.
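(The worry at 22:50:15 and 22:51:35 is that pulling swift out of service doesn't help readers who hit a frontend squid that already cached a truncated thumbnail; those objects can live for up to a week. One remedy is to purge the affected URLs explicitly: squid supports an out-of-band PURGE request method when an acl permits it. A minimal sketch, assuming PURGE is enabled and allowed from this client; the host and URL in the example are hypothetical.)

    #!/usr/bin/env python3
    # Sketch of evicting a cached object from squid via the PURGE method.
    import http.client

    def purge(squid_host: str, url: str) -> int:
        conn = http.client.HTTPConnection(squid_host, 80, timeout=10)
        # Squid expects the full URL as the request target for PURGE.
        conn.request("PURGE", url)
        status = conn.getresponse().status  # 200 = purged, 404 = not in cache
        conn.close()
        return status

    # Example call (hypothetical squid host and thumbnail URL):
    # purge("sq41.example",
    #       "http://upload.wikimedia.org/wikipedia/commons/thumb/x/xy/Foo.jpg/1200px-Foo.jpg")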
[22:54:52] naw, nothing active right now
[22:54:56] you can make any changes ya want
[22:55:31] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.770 seconds
[22:55:40] RECOVERY - Disk space on srv219 is OK: DISK OK
[22:55:49] RECOVERY - Disk space on srv220 is OK: DISK OK
[22:55:49] RECOVERY - Disk space on srv221 is OK: DISK OK
[22:56:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:59:34] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:59:34] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 199 MB (2% inode=62%): /var/lib/ureadahead/debugfs 199 MB (2% inode=62%):
[23:00:46] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.270 seconds
[23:00:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.016 seconds
[23:00:55] RECOVERY - Disk space on srv219 is OK: DISK OK
[23:01:06] New patchset: Pyoungmeister; "some loggin for the lsearchz" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2597
[23:02:52] PROBLEM - MySQL Slave Delay on db34 is CRITICAL: CRIT replication delay 217 seconds
[23:04:13] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay 0 seconds
[23:04:49] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:04:58] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 85 MB (1% inode=62%): /var/lib/ureadahead/debugfs 85 MB (1% inode=62%):
[23:07:13] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 4.593 seconds
[23:07:40] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 154 MB (2% inode=62%): /var/lib/ureadahead/debugfs 154 MB (2% inode=62%):
[23:08:06] New review: Pyoungmeister; "manually verifying" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2597
[23:08:07] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2597
[23:08:43] PROBLEM - Host ganglia1001 is DOWN: PING CRITICAL - Packet loss = 100%
[23:10:52] RECOVERY - Disk space on srv221 is OK: DISK OK
[23:12:22] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:13:25] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[23:15:31] RECOVERY - Lucene on search1002 is OK: TCP OK - 0.027 second response time on port 8123
[23:17:19] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 84 MB (1% inode=62%): /var/lib/ureadahead/debugfs 84 MB (1% inode=62%):
[23:17:19] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 29 MB (0% inode=62%): /var/lib/ureadahead/debugfs 29 MB (0% inode=62%):
[23:17:37] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.043 seconds
[23:18:40] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 139 MB (1% inode=62%): /var/lib/ureadahead/debugfs 139 MB (1% inode=62%):
[23:21:06] !log modifying "martian" blocks on cr2-eqiad to allow newly allocated ip ranges
[23:21:08] Logged the message, Mistress of the network gear.
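(Context for the 23:21:06 !log entry: routers drop traffic from "martian" source ranges, addresses that should never legitimately appear on the wire. When a new public range is allocated, it has to be removed from that filter or the router silently discards its traffic. The change itself is router configuration on cr2-eqiad; the sketch below only illustrates the membership test, using the standard RFC 1918/bogon ranges.)

    #!/usr/bin/env python3
    # Sketch of a martian/bogon membership test.
    import ipaddress

    MARTIANS = [ipaddress.ip_network(n) for n in (
        "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
        "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4",
    )]

    def is_martian(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in MARTIANS)

    assert is_martian("10.4.0.12")         # labs-internal, rightly filtered at the edge
    assert not is_martian("208.80.154.1")  # public address space passes the filter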
[23:21:22] RECOVERY - Disk space on srv221 is OK: DISK OK
[23:21:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:21:40] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:24:40] PROBLEM - MySQL Slave Delay on db34 is CRITICAL: CRIT replication delay 201 seconds
[23:24:40] RECOVERY - Host ganglia1001 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms
[23:25:16] RECOVERY - Disk space on srv223 is OK: DISK OK
[23:25:25] RECOVERY - Disk space on srv219 is OK: DISK OK
[23:26:02] RECOVERY - MySQL Slave Delay on db34 is OK: OK replication delay NULL seconds
[23:26:37] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 5.203 seconds
[23:26:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 5.130 seconds
[23:27:40] PROBLEM - MySQL Slave Running on db34 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Unknown column rev_sha1 in field list on query. Default d
[23:29:19] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 38 MB (0% inode=62%): /var/lib/ureadahead/debugfs 38 MB (0% inode=62%):
[23:30:31] RECOVERY - Disk space on srv223 is OK: DISK OK
[23:30:40] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:32:01] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.607 seconds
[23:32:10] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:34:34] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 199 MB (2% inode=62%): /var/lib/ureadahead/debugfs 199 MB (2% inode=62%):
[23:34:43] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.309 seconds
[23:39:40] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 58 MB (0% inode=62%): /var/lib/ureadahead/debugfs 58 MB (0% inode=62%):
[23:39:49] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 155 MB (2% inode=62%): /var/lib/ureadahead/debugfs 155 MB (2% inode=62%):
[23:41:10] RECOVERY - Disk space on srv220 is OK: DISK OK
[23:42:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:43:22] !log allowed ipv6 pim on edge routers in the US
[23:43:25] Logged the message, Mistress of the network gear.
[23:43:52] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.961 seconds
[23:46:16] RECOVERY - Disk space on srv223 is OK: DISK OK
[23:46:25] RECOVERY - Disk space on srv224 is OK: DISK OK
[23:50:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:51:58] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.093 seconds
[23:55:43] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:56:01] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:58:25] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.696 seconds
[23:58:43] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.619 seconds
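(The 23:27:40 alert explains itself once tied to the 22:10:39 !log entry: db34, a just-rotated ex-master, started receiving replicated statements that reference revision.rev_sha1, a column added by the MediaWiki 1.19 schema migration, before that ALTER had been applied to it, so its SQL thread stopped. The usual fix is to apply the schema change on the lagging slave and restart the SQL thread. A sketch using the MySQLdb DB-API driver; host and credentials are placeholders, and in production the change is applied by the migration script via MediaWiki's patch-rev_sha1.sql rather than by hand.)

    #!/usr/bin/env python3
    # Sketch of repairing a slave that is missing the 1.19 rev_sha1 column.
    import MySQLdb

    conn = MySQLdb.connect(host="db34", user="root", passwd="...", db="enwiki")
    cur = conn.cursor()
    # Same shape as MediaWiki 1.19's patch-rev_sha1.sql: NOT NULL with an
    # empty default, so existing rows stay valid until backfilled.
    cur.execute(
        "ALTER TABLE revision ADD rev_sha1 varbinary(32) NOT NULL DEFAULT ''"
    )
    cur.execute("START SLAVE SQL_THREAD")  # resume replication once the column exists
    conn.close()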