[00:07:42] New patchset: Lcarr; "Script to change initcwnd to 10 packets see http://wikitech.wikimedia.org/view/TCP_Tuning#initcwnd_10" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2521
[00:08:30] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2521
[00:08:30] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2521
[00:14:13] LeslieCarr: no they are preparing for celebrating the revolution
[01:09:37] whee!!! db42 ate itself.
[01:09:38] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=&c=MySQL+pmtpa&h=db42.pmtpa.wmnet&tab=m&vn=&metric_group=
[01:09:47] * maplebed goes to try and smack it.
[01:11:03] RobH: you're not around, are you
[01:14:57] maplebed: kinda whassup?
[01:16:38] * robla also looks at the email alert about udp2log
[01:17:18] anyone been doing anything on emery?
[01:18:25] RobH: db42 bit the dust and I'm trying to remember where the docs are for its LOM
[01:18:38] ahhhh, ibm.
[01:18:40] lemme link
[01:18:52] all I need is a powercycle.
[01:18:53] everything links from platform specific docs
[01:19:01] oh, power cycle
[01:19:03] or reset.
[01:19:09] http://wikitech.wikimedia.org/view/IBM_X3650M3
[01:19:25] unfortunately, the two IBMs do not appear to accept IPMI commands
[01:19:26] thanks.
[01:19:29] or my nifty script would work
[01:19:38] maplebed: btw, ipmi_mgmt scirpt is on sockpuppet
[01:19:43] the other half of the question, of course, was 'what make is db42?'
[01:19:44] for the Cseries and such
[01:19:46] which you already answered.
[01:20:01] yea, db42 is an ibm
[01:20:09] no way to konw without logging in and recognizing it, or pulling up racktables
[01:20:16] i just recognize the login prompts =P
[01:20:21] I'll try reset first.
[01:20:49] ok!
[01:20:50] console 1 starts console, and the escape for serial is ctrol+{ then shift (
[01:20:57] well, thanks, db42.
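The db42 recovery above went through the IBM management console by hand because, as noted, those two IBM machines did not accept IPMI commands. On hosts that do, the same reset or power cycle is usually issued with `ipmitool`. A minimal Python sketch, assuming IPMI-over-LAN (`-I lanplus`) and a placeholder management hostname and user; it is not the `ipmi_mgmt` script on sockpuppet, whose interface does not appear in this log.

```python
import shlex
import subprocess

# Subcommands of `ipmitool chassis power` this sketch is willing to send.
VALID_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmi_power_command(mgmt_host: str, user: str, action: str) -> list[str]:
    """Build an ipmitool argv for a chassis power action over IPMI-over-LAN.

    -E makes ipmitool read the password from the environment, so it does
    not end up in `ps` output or shell history.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_host, "-U", user, "-E",
            "chassis", "power", action]


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the command, and execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical management hostname and user, for illustration only.
    run(ipmi_power_command("db42.mgmt.example", "root", "reset"))
```

`reset` mirrors the "I'll try reset first" choice in the log; `cycle` is the harder option. The dry-run default keeps a copy-paste from power-cycling anything by accident.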
[01:21:11] i am about until its fixed, i assumed you rather learn the software ;]
[01:21:31] 'learn these two one off servers we wont be buying again!' ;]
[01:21:34] meh. dario said if it doesn't come back up easily it's cool to wait till monday.
[01:21:45] oh, and the mgmt required a nifty little one by one inch bit of circuit
[01:21:56] that is the hardware logic and licensing chip
[01:22:05] i hate that its called a licensing chip
[01:22:28] the actual servers are nice though, just annoying since we dont have a bunch
[01:22:32] but the internals are slick
[01:22:37] (the case design and the like)
[01:23:59] looks like it's booting.
[01:25:05] RobH: back up and able to ssh in.
[01:25:08] thanks for the link.
[01:25:26] sweet
[01:25:29] glad to help
[01:25:54] Erik Zachte just forwarded me a nagios alert from emery. "CRITICAL: filters absent: /var/log/squid/filters/sqstat3,". It was followed pretty quickly by "OK: all filters present".
[01:26:07] I'm assuming transient issue; no big deal
[01:26:33] ...but he was surprised to get the emails, and didn't know what to do with them
[01:26:55] I'm assuming I can poke around in puppet somewhere and see who gets those
[01:27:23] there was just a conversation a week or two ago about who should get alert emails from emery and locke...
[01:27:54] I vaguely recall that conversation. I know Diederik set a bunch of that stuff up
[01:28:04] drdee: you aren't around by any chance, are you?
[01:28:26] umm..
[01:28:35] what's on disk on emery doesn't match what's in puppet.
[01:28:45] he is on tech channel i think
[01:29:22] it's nimish that's doing it.
[01:29:27] he's logged into this channel, but it's 8:30pm where he is
[01:29:45] is Nimish futzing on emery now?
[01:30:47] maplebed: you at your desk at work?
[01:31:03] only he and I are logged in and the timestamp on the config file is 3 minutes ago.
[01:31:18] yes, I am at my desk.
[01:31:37] actually, maybe I should call him
[01:31:48] well... there are actually two root shells connected.
[01:31:51] so maybe someone else is here.
[01:32:11] binasher: arey ou here?
[01:32:12] can't hurt to call and ask
[01:32:18] he's holding the other one.
[01:32:58] I am going to go play videogames and be a goddamn bum.
[01:33:08] so kinda around if shit breaks.
[01:33:36] mahey
[01:33:44] just tried calling him...no answer
[01:33:59] i'm fucking around on emery
[01:34:30] the monitors.. they work!
[01:34:53] ah, puzzle solved!
[01:34:54] binasher!
[01:35:25] watcha doin'?
[01:36:02] replacing the filter i wrote
[01:36:06] hooray for last and ps!
[01:36:24] maybe
[01:36:34] why are you doing that at 5:30pm on a Friday?
[01:37:30] because i'm not doing anything fun for another 90 minutes
[01:38:21] good answer ;-p
[01:38:34] not really :-P
[01:39:04] binasher: does Diederik know what you're up to?
[01:39:39] binasher: did you catch the db42 fun? it crashed ~5hrs ago so I powercycled it. slaving restarted on its own, so I tihnk maybe it's ok.
[01:39:51] robla: does deidrek need to know whether or not something i wrote on off time which isn't used by anyone else is running at any given moment?
[01:40:10] yes
[01:40:17] if it's on emery, absolutely
[01:40:35] locke and emery are ridiculously fragile systems
[01:40:53] * robla digs up rant from a month ago the last time things went sideways there
[01:43:33] and to think i was just going to replace something that uses an entire core with something way more efficient…
[01:43:37] nevermind
[01:44:39] keep going asher.
[01:44:51] that's really cool if you're making it more efficient. we just need to know rather than getting surprised by it, especially because it might be that it doesn't work as designed
[01:44:53] replacing the that thing is well worth it.
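The emery alert quoted above ("CRITICAL: filters absent: /var/log/squid/filters/sqstat3," followed by "OK: all filters present") comes from a check that compares the configured udp2log filters with what is actually running. A minimal sketch of that kind of check, assuming the expected filter paths and the running command lines are passed in as plain lists; the real check's source, and how it discovers running filters, is not shown in this log. It follows the Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL, and reproduces the message format from the alert, trailing comma included.

```python
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def check_filters(expected: list[str], running_cmdlines: list[str]) -> tuple[int, str]:
    """Return a (Nagios exit code, status line) pair.

    A filter counts as present if its path appears in any running command line.
    """
    absent = [path for path in expected
              if not any(path in cmd for cmd in running_cmdlines)]
    if absent:
        # Each absent path is followed by a comma, matching the alert text.
        return NAGIOS_CRITICAL, "CRITICAL: filters absent: " + "".join(p + "," for p in absent)
    return NAGIOS_OK, "OK: all filters present"


if __name__ == "__main__":
    # Hypothetical inputs; a real check would read the udp2log config and the
    # process table, then call sys.exit(code) so Nagios sees the status.
    expected = ["/var/log/squid/filters/sqstat3"]
    running = ["udp2log", "/var/log/squid/filters/sqstat3"]
    code, message = check_filters(expected, running)
    print(code, message)
```

A filter that is briefly absent while someone swaps it out (as happened here) produces exactly the CRITICAL-then-OK pair seen in the emails.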
[01:45:05] robla - there are a few guys i trust on the system …Asher is one of them
[01:45:10] so we are in good hands
[01:45:15] robla: the one binasher is working on takes 3x the resources of any other script, and probably more than all the others combined.
[01:45:44] woosters: he doesn't necessarily know where all of the landmines are with that particular system, and I don't think anyone should be above peer review
[01:48:08] diederik put those monitors in place because the last time we had a problem, we didn't know who it was that screwed it up, and didn't detect it for a very long time
[01:48:21] peter put those monitors in place and i helped write them..
[01:48:24] anyways
[01:49:16] +1 monitor. +1 detect "false positives" and keep going.
[01:49:20] ::sigh::
[01:52:27] I guess I'm just confused
[01:54:55] g'night.
[04:39:39] New review: Demon; "Do this for core2 instead." [test/mediawiki/core] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/2507
[04:39:46] Change abandoned: Demon; "(no reason)" [test/mediawiki/core] (master) - https://gerrit.wikimedia.org/r/2507
[04:40:00] Change abandoned: Demon; "(no reason)" [test/mediawiki/core] (master) - https://gerrit.wikimedia.org/r/2506
[10:55:01] New patchset: Hashar; "adding gitreview config file" [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2522
[17:08:17] !log db9 shutting down to move racks, offline during this includes: blogs, bugzilla, racktables, rt, survey, etherpad, observium
[17:08:19] Logged the message, RobH
[17:08:53] cmjohnson1: Once db9 turns off, you can move it.
[17:08:58] okay
[17:12:53] heh, everything I use for datacenter work is now offline (rt and racktables)
[17:17:37] robh: powered on
[17:18:07] ok, logging into drac and watching it post
[17:20:26] 282 days without fsck, im letting it run
[17:22:00] wow...puppet is supposed to that...right?
[17:22:23] i mean i tell it to start manually and it fails
[17:23:26] anybody else here already poking at the blog?
[17:23:26] maplebed: you know more aboutmysql than i
[17:23:31] db9 is down
[17:23:36] lame.
[17:23:37] we moved it, now it wont fire up mysql
[17:23:41] * maplebed logs in
[17:23:42] i just pinged asher
[17:23:52] so you moved it to a new rack
[17:23:56] and the OS came back up
[17:23:58] but not mysql?
[17:24:00] it has a ton of gmond failures for every single mysql metric
[17:24:02] yep
[17:24:07] ok.
[17:24:16] I don't think mysql is set to start on boot for most of our DBs
[17:24:20] i am on the sysetm but if you are poking i wont run anything
[17:24:25] i try to start it manually, it fails
[17:24:45] how'd you try to start it?
[17:24:49] /etc/init.d/mysql
[17:24:53] yea
[17:24:57] or /usr/local/mysql/bin/mysqld_safe?
[17:25:09] didnt try safe yet, you litterally hopped on
[17:25:16] * maplebed just rat it
[17:25:19] looking at logs now
[17:25:38] huh, its up now
[17:25:44] so hgave to safe start it, interesting
[17:25:58] fb version, yeah
[17:26:05] error log looks fine
[17:26:09] !log db9 moved, all systems online
[17:26:11] Logged the message, RobH
[17:26:17] yes, I was just looking at it (the log)
[17:26:24] slow, but pulling up all servies to check
[17:26:54] yeah, it's very misleading that we have /etc/init.d/mysqld and that it doesn't work.
[17:27:08] so we are running the FB version on all dbs
[17:27:09] ?
[17:27:17] so i need to safestart everthing ?
[17:27:21] I think so.
[17:27:26] but there's probably an exception
[17:27:31] I'm sure there is
[17:27:33] well good to know.
[17:27:40] "if it's already running don't start by hand" :-P
[17:28:05] !log manual test of each affected service complete, db9 fully online.
[17:28:07] Logged the message, RobH
[17:28:10] maybe 'if it doesn't start on boot, start the fb version' would be a good heuristic...
[17:28:23] RobH: you actually have a list of all those services?!
[17:28:54] well i have the database list and my knowledge of which are actually used
[17:29:32] gotcha
[17:33:50] New review: Demon; "(no comment)" [test/mediawiki/core2] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2522
[17:33:50] Change merged: Demon; [test/mediawiki/core2] (master) - https://gerrit.wikimedia.org/r/2522
[17:34:58] cya
[17:36:45] New review: Demon; "Rather than creating a new commit to revert the old one, you should abandon the original change so i..." [analytics/reportcard] (master) C: 0; - https://gerrit.wikimedia.org/r/2419
[20:31:13] !log restarted lightty on dataset2
[20:31:15] Logged the message, Master
[20:32:53] apergos: watchmouse page?
[20:33:03] i just got it, i assume you just fixeD?
[20:33:04] yeah, that's why I restarted it
[20:33:06] uh huh
[20:33:37] once every couple weeks it needs a kick in the bum
[20:33:45] when i get a watchmouse page it plays the mario game over tone.
[20:33:49] hahaha
[20:34:42] oh well, going back afk since its fixed ttyl
[20:34:49] enjoy
[23:51:15] New patchset: Asher; "frontend cache tier udplog data for graphite" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2523
[23:51:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2523
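Earlier in this log, db9 came back from its rack move without MySQL: the init script failed and only `/usr/local/mysql/bin/mysqld_safe` (the FB build) worked. That led to two rules of thumb: "if it's already running don't start by hand" and "if it doesn't start on boot, start the fb version". A minimal Python sketch of that heuristic, assuming the two paths typed in the log and that `pgrep` is available. The function and parameter names are hypothetical. It passes no options to mysqld_safe because none were shown, and, as the log itself says, there is probably an exception somewhere.

```python
import subprocess
from typing import Callable, Sequence

# Paths as typed in the log; other hosts may differ.
INIT_SCRIPT = ["/etc/init.d/mysql", "start"]
FB_MYSQLD_SAFE = ["/usr/local/mysql/bin/mysqld_safe"]


def run_status(argv: Sequence[str]) -> int:
    """Run a command to completion and return its exit status."""
    return subprocess.run(list(argv)).returncode


def spawn_detached(argv: Sequence[str]) -> None:
    """Start a long-running command in its own session.

    mysqld_safe stays in the foreground, so it must not be waited on.
    """
    subprocess.Popen(list(argv), start_new_session=True)


def start_mysql(run: Callable[[Sequence[str]], int] = run_status,
                spawn: Callable[[Sequence[str]], None] = spawn_detached) -> str:
    """Apply the heuristic from the log and return which path was taken."""
    # "if it's already running don't start by hand": pgrep exits 0 on a match.
    if run(["pgrep", "-x", "mysqld"]) == 0:
        return "already running"
    # Try the packaged init script first, as was done on db9.
    if run(INIT_SCRIPT) == 0:
        return "started via init script"
    # "if it doesn't start on boot, start the fb version"
    spawn(FB_MYSQLD_SAFE)
    return "started via mysqld_safe"
```

The `run` and `spawn` parameters exist so the decision logic can be exercised with fakes instead of touching a real database host. A successful init-script exit status is also not proof that mysqld stayed up, so checking the error log afterwards (as was done here) still applies.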