[00:02:02] Change abandoned: Alex Monk; "Looks like this is being done in change 6578 now." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5547
[00:05:11] New patchset: Bhartshorne; "moving ms1-3 to be part of the test cluster instead of prod." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6580
[00:05:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6580
[00:06:06] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6580
[00:06:09] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6580
[00:06:48] !log moved ms1-3 from the production cluster to the test cluster
[00:06:51] Logged the message, Master
[01:17:41] LeslieCarr: what's with all the smokealerts?
[01:18:05] Lacking nicotine?
[01:19:07] lemme see which host it's trying to contact
[01:19:31] oh it's trying to contact a decomissioned host
[01:20:32] hah
[01:21:12] that would explain why it's not working too well
[01:23:58] ok, fixed that and reloaded
[01:24:07] where's that?
[01:24:12] just so that I know for next time :)
[01:24:30] in streber
[01:24:34] under /etc/smokeping/config
[01:24:38] (could use puppetizing)
[01:24:50] fix it then service smokeping reload
[01:43:13] New review: Catrope; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/6463
[02:20:56] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5443
[02:20:59] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5443
[06:33:34] New review: Hashar; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/6578
[06:59:16] New review: Siebrand; "Roan, can you please give this another look?" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/5783
[07:47:30] !log preemptive rebooting of sq* servers identified as having > 200 days of uptime
[07:47:35] Logged the message, Master
[08:48:44] New patchset: Hashar; "rsyslog did not reload on config change" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6593
[08:49:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6593
[08:49:36] !log dist-upgrade & new kernel & reboot: sq33, sq36
[08:49:39] Logged the message, Master
[09:34:41] !log dist-upgrade/kernel/reboot: sq37, sq41. rebooting upload squid sq41
[09:34:45] Logged the message, Master
[10:02:26] !log tossed knsq1 through 7 from squid_knams dsh nodegroups file, prolly lots more cleanup where that came from
[10:02:29] Logged the message, Master
[10:03:15] apergos: amssq is probably newer
[10:03:52] sure. I meant generally
[10:04:07] those files prolly all have a fair amoount of cruft in them
[10:05:32] once started generating sq and amssq from nagios config .. i might have lost the script.
[10:05:37] looking
[10:24:52] New patchset: Mark Bergsma; "256 MB of (frontend) malloc storage is rather puny, especially when you're trying to squeeze 700 MB objects through it." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6595
[10:25:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6595
[10:25:47] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6595
[10:25:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6595
[10:38:01] !Log restarted lighty on dataset2
[11:08:21] seems morebots can't handle upper case L's
[11:12:43] no it can't
[11:12:48] rats
[11:13:10] !log restarted lighty on dataset2 ... about ... half an hour ago. stupid case sensitivity
[11:13:13] Logged the message, Master
[11:13:20] ^_^
[11:13:23] it answers with upper case, you see that? :-P
[11:23:55] !Log does !Log work?
[11:24:49] <^demon> *yawn* Morning folks.
[11:25:46] hello
[11:29:00] hi
[11:29:16] <^demon> I found a cool trick for my PS1 :) I added $(__git_ps1 " (%s)") to it :)
[11:29:29] <^demon> __git_ps1 gives you your current working branch if you're in a git repo
[11:29:37] <^demon> (and you have git-completion installed)
[11:31:20] <^demon> http://www.mediawiki.org/wiki/File:ChadBash.png :)
[11:32:28] ah:) and i wanted to mention this: on Debian/Ubuntu boxes:
[11:32:35] apt-cache show vim-puppet
[11:33:15] <^demon> Ooh :)
[11:39:15] New patchset: Mark Bergsma; "Stream large objects, instead of fetch, store & deliver." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6598
[11:39:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6598
[11:39:55] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6598
[11:39:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6598
[11:48:11] !log powercycling srv266 one more time, but now creating RT for it, once already showed CPU issue before it was reinstalled recently
[11:48:14] Logged the message, Master
[12:06:45] !log dist-upgrade & kernel & reboot - sq42,sq43 - rebooting upload squids one by one
[12:06:48] Logged the message, Master
[12:23:51] !log (just) new kernels & reboot - sq63 to sq66 (209 days up)
[12:23:54] Logged the message, Master
[12:30:19] !log Sending ALL non-european upload traffic to eqiad
[12:30:22] Logged the message, Master
[12:47:36] !log (just) new kernels & reboot - sq45,sq49 (upload)
[12:47:39] Logged the message, Master
[12:53:58] !lgo starting script to move /usr/local/apache to /a partition on all remaing non-imagescaler apaches
[12:54:11] !log starting script to move /usr/local/apache to /a partition on all remaing non-imagescaler apaches
[12:54:13] Logged the message, notpeter
[12:56:19] !log maximum uptime in the sq* group down to 171 days, so we have like a month now for the rest. stopping upgrades for the moment being.
[12:56:23] Logged the message, Master
[12:57:53] just do the rest too
[12:58:15] ok, no concerns about dist-upgrade on all at once either?
[12:58:28] instead of "just kernel"
[12:58:37] did it partly , just in case
[13:00:42] not at all
[13:01:02] ok
[13:16:36] !log going through sq80 to sq86 (upload), full upgrade & reboot
[13:16:39] Logged the message, Master
[13:30:13] !log removing srv221 from pybal pool for repartitioning
[13:30:16] Logged the message, notpeter
[13:43:29] !log putting srv221 back into pybal pool
[13:43:31] Logged the message, notpeter
[13:50:23] !log removing srv222 from pybal pool for repartitioning
[13:50:26] Logged the message, notpeter
[13:50:45] sounds like you have a long road ahead, peter ;-)
[13:50:58] that's just for imagescalers
[13:51:04] the rest are running automated
[13:51:13] but the imagescalers use /a
[13:51:30] so it actually involves splitting the partition, which I'd rather do by hand for 6 boxes
[13:52:38] especially as our imagescaling is... fragile
[14:02:47] !log putting srv222 back into pybal pool
[14:02:49] Logged the message, notpeter
[14:08:04] !log removing srv223 from pybal pool for repartitioning
[14:08:06] Logged the message, notpeter
[14:12:37] is there any network problem?
[14:12:49] labs are having troubles with connectivity between own instances
[14:14:54] mutante, ssmollett can you check it?
[14:15:00] I am not the only one
[14:15:18] [16:13:15] maxsem@bastion1:~$ ssh mobile-enwp
[14:15:18] [16:13:15] ssh: connect to host mobile-enwp port 22: No route to host
[14:17:23] New patchset: Mark Bergsma; "Add LVS service IPs for authoritative and recursive DNS, OSM and the misc varnish cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6603
[14:17:42] New patchset: Mark Bergsma; "Add the new LVS service IPs to the load balancers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6604
[14:17:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6603
[14:18:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6604
[14:18:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6603
[14:18:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6603
[14:22:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6604
[14:22:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6604
[14:29:21] petan|wk: i don't see a networking issue. bastion and instance in same subnet. most instances still reachable, seems to just affect deployment-prep (and mobile)
[14:29:38] yes now it's back normal
[14:29:43] oh.ok!
[14:29:47] but 10 minutes ago terminal was not responding
[14:30:08] even on bastion
[14:36:46] New patchset: Mark Bergsma; "Add DNS LVS services" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6610
[14:37:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6610
[14:38:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6610
[14:38:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6610
[14:50:48] !log going through sq6x (text), full upgrades
[14:50:51] Logged the message, Master
[14:51:52] New patchset: Mark Bergsma; "Reorder LVS services" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6611
[14:52:11] New patchset: Mark Bergsma; "Add LVS services OSM and Misc Varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6612
[14:52:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6611
[14:52:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6612
[14:53:00] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6611
[14:53:03] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6611
[14:54:02] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6612
[14:54:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6612
[14:56:01] !log putting srv223 back into pybal pool
[14:56:03] Logged the message, notpeter
[15:09:19] !log removing srv224 from pybal pool for repartitioning
[15:09:22] Logged the message, notpeter
[15:09:37] robh: please check rt 2823
[15:09:44] you know
[15:09:49] just shutting down apache does the same ;)
[15:10:46] yep. that's what I'm doing on the automated majority of them
[15:11:09] cmjohnson1: ya just want confirmation you can decom right?
[15:11:12] is the site still alive?
[15:11:15] other than the one puppet change
[15:11:17] seems so
[15:11:22] yes please
[15:11:26] then i'll reboot our LVS servers
[15:11:58] cmjohnson1: ok, done, ok to decom
[15:12:07] thx
[15:12:15] just pull the memory and such, we will go over the search cluster soon for memory upgrades
[15:12:24] looks up to me :)
[15:19:20] thank god for the image table
[15:19:30] (I guess you don't hear that every day)
[15:21:14] notpeter do you want to fix the image scaler cron jobs as soon as those get moved?
[15:21:27] apergos: what do you mean?
[15:21:36] well
[15:21:49] the emails from root's crontab?
[15:21:51] mayeb it's just during the move
[15:21:52] yeah, these
[15:21:53] yeah
[15:21:55] find: `/a/magick-tmp': No such file or directory
[15:21:59] ok nm then
[15:22:00] there is no /a partition for part of it
[15:22:11] ok
[15:22:25] I was gonna say "or do you want me to do it" but the answer is "neither one"
[15:22:45] yep! thanks!
[15:23:22] putting srv224 back into pybal pool
[15:23:27] !log putting srv224 back into pybal pool
[15:23:30] Logged the message, notpeter
[15:30:17] cmjohnson1: random question: do you know anything about rt 1496. it's about srv281
[15:31:28] notpeter: no, i forgot about it. i will run a few test
[15:31:57] cool. I was looking at it for a couple of reasons, just curious
[15:32:23] !log removing srv281 from rending pool until we figure out what's going on with it
[15:32:26] Logged the message, notpeter
[15:40:05] mark: ms7 has a very high pitch whine..could be hard drive
[15:40:09] Reedy: you around?
[15:40:27] cmjohnson1: uh oh
[15:40:40] Reedy: nvm
[15:40:47] cmjohnson1: it better be a hard drive, actually
[15:40:52] at least it has 48 of those
[15:40:55] unlike anything else ;)
[15:40:59] heh
[15:41:48] good point...is there anyway to check to see the status of the drives? the noise is deafening
[15:42:02] did it just start?
[15:42:08] !log going through sq7x servers (text), full upgrades
[15:42:10] Logged the message, Master
[15:42:33] i noticed it yesterday and it has gotten worse today
[15:42:59] it reports all drives as online
[15:44:30] mark: it could be db29 or 30 ...hard to pinpoint but it's one of those 3
[15:44:53] uh oh
[15:45:07] * apergos hopes it is not ms7. really really hopes
[15:45:18] there's ms8 isn't there
[15:45:22] yes there is
[15:45:39] but it would be a pita and we would lose up to 45 minutes of uploads
[15:45:43] yes...but the sound is not coming from above ms7
[15:45:44] replication runs a bit behind
[15:45:57] it's hard to say chris
[15:46:12] okay, more of a FYI
[15:46:13] I don't see anything wrong on ms7, but we're not solaris experts
[15:46:17] yes, thanks very much
[15:46:37] do you want me to look at it? or are you confident?
[15:46:44] please look at it
[15:46:47] I just tried a few commands I know
[15:46:48] ok
[15:46:50] sure
[15:47:11] anyway
[15:47:24] 45 minutes of uploads is better than 10 years of uploads
[15:47:41] there's ms1002 too
[15:48:07] that's true, although that's maybe several hours behind
[15:48:12] yes
[15:48:26] just saying
[15:48:30] yep
[15:48:33] that's a whole lot better than when I started :P
[15:48:44] :-D
[15:48:46] well, hey, if it's db29 or db30, then we're in luck! as those aren't doing anything
[15:48:49] anything must be better than then
[15:49:43] * mark loops leslie
[15:49:57] awww
[15:49:58] eep
[15:50:06] !log rebooting sq67 (bits)
[15:50:09] Logged the message, Master
[15:52:27] whaat
[15:52:41] ms7 and ms8 are racked on top of eachother
[15:53:42] i hate tampa ;)
[15:53:57] i'll second that
[15:54:05] hehe
[16:04:13] with my (admittedly limited) knowledge, all disks look ok and without errors
[16:06:49] if db29 and db30 are not doing anything, shut them down to see if the nose goes away
[16:08:39] notpeter: can you take db 30 down
[16:08:41] the simplicity... ;)
[16:09:19] New patchset: Mark Bergsma; "Bind new LVS service IPs to the correct LVS hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6615
[16:09:38] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/6615
[16:17:37] New patchset: Mark Bergsma; "Bind new LVS service IPs to the correct LVS hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6615
[16:17:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6615
[16:18:34] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6615
[16:18:36] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6615
[16:22:37] New patchset: Mark Bergsma; "Revert "Bind new LVS service IPs to the correct LVS hosts"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6616
[16:22:47] I have just created an article named Aarohan Gurukul but its not displaying in search list
[16:22:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6616
[16:22:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6616
[16:22:58] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6616
[16:23:00] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6616
[16:23:07] any one help me to re edit this article
[16:24:04] cmjohnson1: back from lunch
[16:24:06] one sec
[16:26:23] !log turning off db30 (former s2 db, still on hardy, will ask asher what to do with it) to test noise in DC
[16:26:26] Logged the message, notpeter
[16:27:48] cmjohnson1: I've stared shutting down mysql. this will take a little while ;)
[16:29:05] aaand, halting
[16:29:33] cmjohnson1: noise gone?
[16:29:35] Ryan_Lane: so when is a good time to move virt0's ip and dns and all that ?
[16:31:06] apergos: dumps...
[16:31:42] cmjohnson1/RobH do you know if the cameras have some special port they use for their management ?
[16:31:47] i want to switch their ip's around
[16:33:05] yes, I just got the page
[16:33:09] and restarted them
[16:33:15] I really don't know what is up with that
[16:33:23] run inside gdb
[16:33:36] for production?
[16:33:39] yes
[16:34:02] then if it dies, you probably can get a backtrace
[16:35:57] ok in a minute I'll restart it that way
[16:41:35] i was thinking about the other day's deploy network bomb meltdown, we're deploying by rsync right?
[16:41:58] yes
[16:42:13] and we're deploying content we just pulled out of git?
[16:42:54] git is idiotic about file mod times, and it clashes with rsync's lastmod file test
[16:42:54] yes
[16:43:22] at CL we ended up writing a wrapper to massage file mod time based on git log, to save ourselves just this kind of pain
[16:43:36] you can tell rsync to work purely checksum based
[16:43:47] yes but do we?
[16:44:06] I don't think so
[16:44:07] i'd look but I'm not sure what exactly they're using to deploy
[16:44:14] but I don't see why it would be problematic?
[16:44:19] git just uses the current time on every touch right
[16:44:29] so it's always newer/different than whatever is on the apaches
[16:44:37] right, *that* is the problem
[16:44:47] why?
[16:44:59] if unequal, rsync checks checksums right
[16:45:04] it depends on how exactly they're using git
[16:45:07] RobH: are you in dc today ?
[16:45:17] it's the difference between rsyncing a ton of files you don't need to rsync because they haven't changed
[16:45:19] was yesterday
[16:45:20] !log lighty on dataset2 is running under gdb in screen session as root, if it dies please leave that alone (or look at it if you want to investigate)
[16:45:24] Logged the message, Master
[16:45:31] LeslieCarr: so on the cameras you wanna redo the ips?
[16:45:42] not super important right now, what is important is fpc5 :(
[16:46:35] i dont see this issue
[16:46:47] i imagine its for that shipment that was delivered when i was gone?
[16:46:50] yes
[16:49:16] mark: "if unequal, rsync checks checksums right" -- iirc rsync copies the file if either modtime or checksum don't match, unless you tell it to ignore things
[16:49:57] but even if it's just a matter of checking checksums, that would be hugely painful on fenari given how huge that tree is
[16:50:24] cmjohnson1: are you in the dc today ?
[17:03:49] LeslieCarr: preferably not on a friday
[17:04:36] monday work ?
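Note on the rsync/git mtime exchange above (16:41 to 16:49): a minimal sketch of the kind of wrapper mentioned at 16:43:22, which resets each tracked file's mtime to the time of the last commit that touched it, so that rsync's default size-and-mtime quick check can skip files that have not changed. This is not the wrapper used at CL, nor anything from the actual deploy scripts; both function names are hypothetical, paths are assumed to contain no newlines, and `touch -d @EPOCH` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from any existing deploy tooling.

# Print "<epoch> <path>" for every tracked file, using the committer
# date of the last commit that touched it. Run inside a git checkout.
# Assumes paths contain no newlines.
git_commit_times() {
    git ls-files | while IFS= read -r path; do
        ts=$(git log -1 --format=%ct -- "$path")
        if [ -n "$ts" ]; then
            printf '%s %s\n' "$ts" "$path"
        fi
    done
}

# Read "<epoch> <path>" lines on stdin and set each existing file's
# mtime to that epoch. Needs GNU touch for the "@epoch" date syntax.
apply_mtimes() {
    while read -r ts path; do
        if [ -e "$path" ]; then
            touch -d "@$ts" -- "$path"
        fi
    done
}
```

Usage would be `git_commit_times | apply_mtimes` in the checkout before the rsync run. The alternative raised at 16:43:36, `rsync --checksum`, avoids touching mtimes but reads every file on both sides, which is the cost objected to at 16:49:57.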
[17:04:46] I'm upgrading gluster on monday
[17:04:57] if that goes well, yes
[17:05:46] cool
[17:20:42] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6312
[17:20:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312
[17:21:09] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6451
[17:21:12] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6451
[17:22:22] LeslieCarr: have I told you today how much better you are than banisher?
[17:22:28] s/banisher/binasher
[17:24:02] LeslieCarr: are you around?
[17:24:54] hahaha
[17:25:04] preilly: what's the changelist
[17:25:04] !log shutting down gerrit so that everything can be backed up
[17:25:07] Logged the message, Master
[17:25:14] LeslieCarr: https://gerrit.wikimedia.org/r/6618
[17:26:27] ah so you're the reason for the outage Ryan_Lane :P
[17:27:17] <^demon> Thehelpfulone: We're upgrading gerrit :)
[17:27:22] ah ok
[17:27:33] * Thehelpfulone just thought git was broken again ;)
[17:27:48] broken for me I mean
[17:27:54] what's new in gerrit ^demon?
[17:28:02] !log adding gerrit 2.3 package to the repo
[17:28:05] Logged the message, Master
[17:28:24] <^demon> Thehelpfulone: http://gerrit-documentation.googlecode.com/svn/ReleaseNotes/ReleaseNotes-2.3.html
[17:29:08] cmjohnson1: you about?
[17:32:47] !log installing gerrit package on manganese
[17:32:49] Logged the message, Master
[17:38:30] LeslieCarr: can you do that for me?
[17:38:47] preilly: yeah gimme a minute
[17:38:59] LeslieCarr: okay, coolio
[17:47:47] ssmollett: you're not in #ganglia, are you?
[17:49:38] preilly: doh, i now can get it and gerrit is borked
[17:55:22] !log starting gerrit
[17:55:25] Logged the message, Master
[18:01:51] preilly: hey, is now a good time to do the test ?
[18:01:54] sorry about delays
[18:02:07] dealing with router which may or may not have issues (gotta love that)
[18:02:17] !log gerrit upgrade is done
[18:02:20] Logged the message, Master
[18:04:17] New review: Lcarr; "lint checker borked" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6618
[18:04:20] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6618
[18:04:42] New patchset: Ryan Lane; "Wrap version in quotes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6619
[18:05:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6619
[18:05:11] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6619
[18:05:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6619
[18:08:35] !log reloaded mobile varnish caches and purged them
[18:08:38] Logged the message, Mistress of the network gear.
[18:10:33] New patchset: Demon; "Specify download options for patches rather than relying on defaults." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6620
[18:10:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6620
[18:35:29] New review: Ryan Lane; "Cloning via http and anon_http is insecure. Also, gerrit is running on 127.0.0.1 and is being proxie..." [operations/puppet] (production); V: 0 C: -2; - https://gerrit.wikimedia.org/r/6620
[18:36:54] Ryan_Lane: Re your git prompt thingy:
[18:37:00] PS1='${debian_chroot:+($debian_chroot)}\u@\h:\[\033[01;34m\]\w\[\033[01;31m\]$(dirty_git_prompt)\[\033[01;32m\]$(clean_git_prompt)\[\033[00m\]\$ '
[18:37:02] #PS1='${debian_chroot:+($debian_chroot)}\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\e[m\[\033[01;31m\]$(dirty_git_prompt)\[\033[01;32m\]$(clean_git_prompt)\[\033[00m\]\$ '
[18:37:14] the top one is good?
[18:37:17] The commented-out one is the original, the top one is mine
[18:37:28] Change abandoned: Demon; "This isn't necessary." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6620
[18:37:33] It fixes the bug, and adapts it to my prompt style, so there are differences in other places I guess
[18:37:55] awww
[18:37:59] it's not multi-line anymore :(
[18:38:00] Or... maybe not?
[18:38:23] notpeter: yes..db30 was the problem
[18:38:26] and doesn't have command numbers
[18:38:27] <^demon> I like my PS1 :) http://www.mediawiki.org/wiki/File:ChadBash.png
[18:38:30] you want a multi-line PS1?!
[18:38:30] cmjohnson1: cool!
[18:38:37] much better than ms7 :)
[18:39:02] lesliecarr: i am back in DC
[18:39:22] Mine is very similar to Chad's
[18:39:59] cmjohnson1: cool
[18:40:00] catrope@roanLaptop:~/mediawiki/git/extensions/VisualEditor (master)$
[18:40:01] notpeter: yes much better!
[18:40:08] when are you going to be there until do you think
[18:40:16] until 6
[18:40:18] Ryan_Lane: BTW could you merge https://gerrit.wikimedia.org/r/#/c/5444/1 please?
[18:40:22] est
[18:40:45] ^demon: What's your implementation of __git_ps1 ?
[18:40:46] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5444
[18:40:47] ha ha ha http://xkcd.com/1051/
[18:40:48] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5444
[18:41:02] Oooh wait, it's built in ?
[18:41:06] <^demon> RoanKattouw: Dunno, it's part of git-completion
[18:41:36] RoanKattouw: done
[18:41:37] Cool
[18:41:48] It doesn't color it green or red based on whether you have local mods though
[18:46:03] heh
[18:46:06] ^demon: __git_ps1 is useful, thanks
[18:46:08] I put my multiline back in
[18:46:12] * RoanKattouw has now rewritten Ryan's thingy to use __git_ps1
[18:46:19] wait
[18:46:21] where' that?
[18:46:24] It's still slow with submodules though
[18:46:38] Ryan_Lane: git-completion package has __git_ps1
[18:46:42] ah
[18:46:46] doesn't work on os x then
[18:46:46] Which returns " ($branch)"
[18:46:54] and no color, right?
[18:47:15] I can use it on fenari, though
[18:47:29] No color, correct
[18:48:53] i'm gonna go get some lunch, bbiab
[18:49:05] LeslieCarr: have a good lunch
[18:50:27] Oh God
[18:50:37] strace tells me git status invokes itself for each submodule
[18:50:51] <^demon> Ryan_Lane: `source` https://raw.github.com/git/git/master/contrib/completion/git-completion.bash into your .profile on OSX. Bash completion in git is a must :)
[19:13:03] nopeter: are you going to leave db30 down?
[19:13:28] notpeter ^^
[19:13:54] cmjohnson1: hhhhmmmm. not sure.
[19:14:03] lemme ask asher when he's back online
[19:14:12] whether he thinks it's worth repairing or not
[19:14:20] but for now, yeah, dont' worry about it
[19:14:27] ok..cool
[19:15:03] RoanKattouw: got a sec?
[19:16:15] notpeter: sure
[19:17:07] just wanted to check in about my apache disk moving shennannigans
[19:17:25] OK
[19:17:30] I have been using the apaches present in mediawiki-installations dsh group
[19:17:33] I was looking at what you were trying to do and it looks good to me
[19:17:35] this is a good list to work from, yes?
[19:17:38] Yes
[19:17:42] ok
[19:17:51] Although from tailing your log file and checking the boxes, I couldn't see real success on any box except mw60
[19:17:56] It looked like the rest silently failed
[19:18:03] was this yesterday?
[19:18:16] I closed my laptop at one point and agent forwarding died...
[19:18:22] it looks to be doing stuff now
[19:18:32] Yeah that was yesterday
[19:18:42] ok, cool
[19:19:28] tarting srv198 at Thu May 3 22:38:00 UTC 2012
[19:19:29] srv198 done at Thu May 3 22:38:01 UTC 2012. sleeping for 5 minutes
[19:19:31] starting srv199 at Thu May 3 22:43:01 UTC 2012
[19:19:32] srv199 done at Thu May 3 22:43:01 UTC 2012. sleeping for 5 minutes
[19:19:34] starting srv200 at Thu May 3 22:48:01 UTC 2012
[19:19:35] srv200 done at Thu May 3 22:48:02 UTC 2012. sleeping for 5 minutes
[19:19:37] starting srv201 at Thu May 3 22:53:02 UTC 2012
[19:19:38] etc etc
[19:19:47] on monday (or some point after the script is done running) would you be willing to resync the apaches, just to make sure that everything is up to date?
[19:19:52] I inspected df -h on a few of those and it didn't look right
[19:19:53] For sure
[19:20:05] Also, you should figure out some command to validate that it has done the right thing
[19:20:11] ok, I will take a look
[19:20:13] Like df -h | grep apache or something
[19:20:19] yeah
[19:20:31] that was my thought on the dsh mediawiki-installtion node list
[19:20:38] And then run something like if [ -z `df -h | grep apache` ]; then echo YELL AT PETER; fi over dsh
[19:20:49] heh, yes
[19:21:06] To find any hosts that you missed because of the agent bug, or just in general
[19:21:14] I mean, it must be mostly working, or else the site would be down by now :)
[19:21:19] yeah
[19:21:24] Yeah I read the switch script and it looked good to me
[19:21:29] I've been watching as it goes
[19:21:44] Although I'm not /entirely/ sure how job runners would be affected
[19:21:58] I also did the imagescalers by hand, becuase they actually use /a
[19:22:05] Oh, right, yeah
[19:22:21] hhhhmmmm, I wasn't sure if the jobrunner used anything from the codebase, so I stopped it just to be sure
[19:23:12] Wasn't that code commented out?
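Note on the validation one-liner suggested at 19:20:38: as typed, the unquoted backtick substitution inside `[ -z ... ]` expands to several words whenever grep finds a match, so `[` reports an argument error on exactly the hosts that are fine. A minimal sketch of a quoted variant follows. The function name `check_apache_mount` is hypothetical, and matching on the string `apache` is taken from the chat rather than from the real partition layout.

```shell
#!/bin/sh
# Hypothetical helper: reads `df` output on stdin, prints OK and
# returns 0 if any line contains the pattern (default "apache"),
# otherwise prints MISSING and returns 1.
check_apache_mount() {
    pattern=${1:-apache}
    if grep -q -- "$pattern"; then
        echo "OK"
    else
        echo "MISSING"
        return 1
    fi
}
```

On a single host this would be run as `df -h | check_apache_mount || echo "YELL AT PETER"`. The inline equivalent, easier to pass to a remote shell, is `df -h | grep -q apache || echo "YELL AT PETER"`.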
[19:23:23] Oh now it's not
[19:23:50] Hmmm
[19:23:56] Are you running this on live job runners just yet?
[19:24:10] Cause ideally you'd "graceful" (-ish) the job runners as opposed to stopping them
[19:25:53] start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/mw-jobs.pid --retry 60
[19:25:58] notpeter: when you're done the app srvr convo, let me know about db30
[19:26:12] That'll kill the parent process but allow the children to continue and finish their jobs
[19:26:23] RoanKattouw: ah, ok. I can do that
[19:26:32] Please do
[19:26:43] You don't seem to have reached the bulk of the job runners yet (cause they're in mw* mostly)
[19:26:59] RoanKattouw: what should I use to stop it? the above?
[19:27:19] notpeter: please can you tell me if https://rt.wikimedia.org/Ticket/Display.html?id=2891 has been processed? Erik just bumped the priority of the bugzilla bug for it (https://bugzilla.wikimedia.org/show_bug.cgi?id=36477)
[19:27:21] Yeah the start-stop-daemon command above
[19:27:26] And then the normal command to start
[19:28:02] RoanKattouw: ok
[19:28:06] Oh, hm, you already did a bunch of them
[19:28:21] well, I need to restart anyway, as my computer just crashed :/
[19:29:08] Hmm, so don't bother
[19:29:14] You have already done almost all job runners
[19:29:19] kk
[19:29:29] So we'll have lost a few update jobs probably. Whatever.
[19:29:43] New patchset: Asher; "adding db24 to decom list" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6625
[19:30:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6625
[19:30:14] thehelpfulone ..paravoid is going to work on it
[19:30:28] ok thanks woosters
[19:30:39] Thehelpfulone: yeah, sorry, I didn't see that until just now
[19:30:40] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6625
[19:30:40] in fact, I just did
[19:30:42] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6625
[19:30:46] yeah, looked closed ;)
[19:30:47] added already :-)
[19:30:48] I was confused
[19:30:54] great!
[19:31:06] easy enough
[19:31:12] any of you able to create the wiki too?
[19:31:15] …once you read the docs :-)
[19:32:20] binasher: notpeter is all yours now as far as I'm concerned
[19:32:37] !log powering off db24
[19:32:40] Logged the message, Master
[19:33:20] Thehelpfulone: I was told that it's better for the devs to do that since it's convoluted and they know better.
[19:33:28] RoanKattouw: ok, just to make sure we're on the same page: run a sync on monday just to be sure. and I'll df -h | grep apache over mediawiki-installations just to be sure, yes?
[19:33:31] I see some docs about it, not sure if it's uptodate
[19:33:41] notpeter: Yeah
[19:33:58] binasher: chris said that there was some awful highpitched whine coming from ms7 or db29 or db30. I turned off db30 to see if it went away. it did
[19:34:16] oh, great
[19:34:17] we could either fix it, decom it, catch it on fire and toss it into the bay
[19:34:26] much much better than ms7!
[19:34:35] ms7 deserves a much worse fate
[19:34:42] Tossing it into the Bay is not enough
[19:34:48] we were going to use db30 for that storage cluster test
[19:35:02] paravoid: ok no problem, maybe RoanKattouw can do it :)
[19:35:04] so if chris could get us 6 months of life out of it, that'd be good
[19:35:19] binasher: ok, I'll put in a ticket
[19:35:30] Thehelpfulone: I can do what now?
[19:35:36] probably just a dying fan [19:35:52] RoanKattouw: create wikimania2013 wiki? https://bugzilla.wikimedia.org/show_bug.cgi?id=36477 [19:36:31] Ooh [19:36:35] Do we have DNS for it yet? [19:36:44] yes [19:36:47] yes, paravoid just did it [19:36:56] OK [19:39:58] notpeter: i wonder if we can just use db30 in all of its failing glory [19:40:03] and then decom it for good [19:40:47] binasher: I mean, I'm ok with that. but chris found it difficult to be near it it was so annoying [19:41:08] so, I mostly want him to make it ok for him to be around :) [19:45:21] that sounds like the better choice. ok, we need to scrounge up hardware [19:51:10] fyi, i will be performing a master swap on s2 in a few minutes [19:51:33] binasher: don't fuck up [19:53:19] plwiki is on there. and when george w bush tells the nation "don't forget poland", i listen. [19:53:55] binasher: any idea why the commit for s2-master says "Author: faidon"? :-) [19:54:41] New patchset: Asher; "changing s2 master from db13 to db52" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6627 [19:54:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6627 [19:56:01] paravoid: looks like you may have saved your personal svn credentials into root@sockpuppet [19:56:27] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6627 [19:56:29] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6627 [19:57:08] binasher: eh!? [19:57:15] what credentials? 
[19:57:18] no I didn't [19:57:42] I just svn commit --username faidon, as to preserve attribution/blame [19:57:49] just did* [19:58:35] ah [19:58:54] i just did an svn commit [19:59:30] your username was saved in .subversion/auth/foo [19:59:36] argh [20:00:02] at least this repo is moving to git [20:01:22] ok, here goes [20:05:18] !log performing mysql replication steps for s2 master switch to db52 [20:05:20] Logged the message, Master [20:10:08] s2 switch looks ok, there are just jobrunners with the old config [20:10:31] unrelated, it looks like db40 (parsercache) is having problems [20:10:49] cmjohnson1: hey [20:10:59] heya [20:11:13] so we need to move one of the links between tampa and eqiad from cr1-sdtpa to csw1-sdtpa [20:11:29] ok [20:12:08] so that we can not have both links going to cr1 [20:12:33] redundancy? [20:12:38] yep [20:13:30] so on cr1-sdtpa we'd like to move the link on xe-0/0/1 to csw1-14:1 [20:13:46] can you double check and see if there's an optic on 14/1 on csw1 (or an optic available for it?) [20:14:16] y [20:16:21] lesliecarr: yes there is an optic available [20:16:42] yay [20:17:01] cool, so let's try moving this (fyi if all of a sudden all the internets die, please unplug) [20:17:11] and then say goodbye to leslie cuz i will kill myself [20:17:39] i want to be clear on the fiber I am moving from cr1 [20:18:20] mw7: ssh: Could not resolve hostname mw7: Name or service not known [20:18:22] wtf? [20:18:36] I mean connection timeout sure [20:18:38] sure, xe-0/0/1 should be the 2nd from the left on the chassis itself [20:18:39] But no DNS?
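(Editor's sketch for the `svn commit --username` exchange at 19:57-19:59. Subversion's `--no-auth-cache` option keeps the client from storing authentication information under `~/.subversion/auth`, so one way to attribute a commit from a shared root checkout without leaving a username behind might look like the following. The function name `svn_commit_as` is made up for illustration; it prints the command unless `RUN=1` is set.)

```shell
#!/bin/sh
# Sketch only: commit as a named author without caching their credentials.
# svn_commit_as is a hypothetical helper, not something used in the log.
svn_commit_as() {
    author="$1"
    shift
    if [ "${RUN:-0}" = "1" ]; then
        # --no-auth-cache: do not write username/password into ~/.subversion/auth
        svn commit --username "$author" --no-auth-cache "$@"
    else
        # Dry run: show the command that would be executed.
        printf 'svn commit --username %s --no-auth-cache' "$author"
        for arg in "$@"; do
            printf ' %s' "$arg"
        done
        printf '\n'
    fi
}
```

(For example, `svn_commit_as faidon -m msg` prints the full command line for review before anything is committed.)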
[20:18:53] RoanKattouw: just mw7 [20:18:54] that's weird [20:19:09] just set disable, so hopefully the led should be off :) [20:19:17] okay [20:19:44] notpeter: Well there are other errors of course but those are the usual connection timed out messages [20:19:45] let me know when to move it [20:21:03] lesliecarr: ^ [20:21:11] move now please [20:21:14] k [20:21:57] k [20:22:05] it looks like db40 perf started to tank at 19:30 [20:22:48] any problems? [20:23:21] well nothing's exploding yet… :) [20:23:29] always a good sign [20:24:48] but not currently up …. [20:26:56] arg db40 20504 20:17:02 InnoDB: Warning: cannot find a free slot for an undo log. [20:33:48] notpeter: db30...you can fire it back up until it dies...I was more concerned that it was ms7 and could've been a bigger issue [20:34:35] cmjohnson1: ok! cool [20:34:55] i will update and close ticket [20:39:09] cmjohnson1: sigh, can you revert the plugging ? [20:39:39] for some reason i can't get ip connectivity going [20:41:31] lesliecarr: okay [20:42:56] reverted [20:45:10] New patchset: Ryan Lane; "Follow up to I024289f6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6629 [20:45:20] thanks [20:45:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6629 [20:45:29] sigh [20:45:32] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6629 [20:45:34] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6629 [20:58:49] Reedy: how's the packing going? ;) [21:11:50] cmjohnson1: still around ? [21:11:58] yep [21:13:53] can we do a second try on the link move ? [21:14:07] sure...ga and move it? [21:14:18] yes please [21:14:20] k [21:15:20] okey [21:17:04] Can someone please add http://p.defau.lt/?Q2SSY_m8GsVNOG4TxO1j9w to /home/wikipedia/conf/httpd/remnant.conf ? 
(Damned special wikis [21:17:35] yes, someone please do that for Reedy :D [21:17:55] cmjohnson1: oh yay [21:17:58] this is working better [21:18:04] when i fix the config it's better [21:18:29] usually helps :P [21:18:44] Pffft, configs [21:28:37] hi maplebed [21:29:34] hey - sorry, in a meeting. [21:29:38] I'll be busy all afternoon. [21:29:44] heh np [21:30:42] * Thehelpfulone points LeslieCarr to Reedy's earlier comment [22:17:04] Can someone please add http://p.defau.lt/?Q2SSY_m8GsVNOG4TxO1j9w to /home/wikipedia/conf/httpd/remnant.conf ? (Damned special wikis [21:30:53] People are busy [21:30:55] New patchset: Faidon; "Add scriptencoding to root's vimrc" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6632 [21:30:59] Hence asking non specifically :p [21:31:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6632 [21:34:12] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6632 [21:34:14] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6632 [21:38:08] Reedy: heh, one can try, the bug has been in the system for 36 hours and the HK team believes that's slow - they should see the backlog :P [21:38:30] Are they not active community members then? [21:39:09] they are plenty active i think [21:39:23] or at least i've met some of them on multiple continents each [21:39:37] i've no idea if they have experience doing shell reqs though [21:40:12] yeah they're all active, but probably not so much in the shell requests area [21:46:02] lesliecarr: rollback to cr1 [21:46:11] ? [21:46:23] yeah rollback to cr1 [21:46:24] :( [21:46:27] ok [21:47:07] done [21:49:17] cool [21:50:09] LeslieCarr: did you see yourself on the new recruitment video yet? 
[21:50:17] (although I probably shouldn't distract you from work) [21:50:23] thank you cmjohnson1 [21:50:38] yw [21:51:30] Thehelpfulone: i did, i need to hire a makeup artist to follow me around for sudden videos :) [21:51:40] hehe [21:52:02] you seemed to be bubbly enough about it though (and from what I hear it wasn't quite sudden you were told :P ) [21:59:54] hehehe, yeah i was :) [22:03:23] Thehelpfulone: you should provide links ;) [22:04:48] jeremyb: oh I did, I just didn't want to keep copying reedy's comments :P [22:04:49] Reedy> Can someone please add http://p.defau.lt/?Q2SSY_m8GsVNOG4TxO1j9w to /home/wikipedia/conf/httpd/remnant.conf ? (Damned special wikis [22:05:27] Thehelpfulone: sorry, i'm ambiguous. i was referring to the video [22:05:53] oh [22:06:17] http://commons.wikimedia.org/wiki/Image:Wikimedia_Foundation_extended_HR_video.ogv [22:06:39] and http://commons.wikimedia.org/wiki/File:Wikimedia_Foundation_90_second_HR_Video_with_Disclaimer_720p.theora.ogv for the shorter version [22:18:46] New patchset: Asher; "initial work to start collecting slow query digests and histories from all core dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6637 [22:19:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6637 [22:30:54] preilly: we never reverted that mobile change [22:30:59] preilly: should we revert ? [22:31:04] LeslieCarr: yes, please [22:31:18] LeslieCarr: oops [22:31:31] one day we will not have to revert! [22:31:33] one day... 
[22:31:37] LeslieCarr: ha ha [22:32:15] New review: Lcarr; "lint checker borked" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6641 [22:32:18] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6641 [22:32:21] Ryan_Lane: lint checker is still broken on gerrit :( [22:32:31] it's been working for me all day [22:32:33] Ryan_Lane: also not notifying when something is pushed [22:32:41] also been working for me all day [22:32:44] didn't do anything for my last two [22:34:50] !log clearing varnish cache and reloading varnish on mobile [22:34:53] Logged the message, Mistress of the network gear. [22:35:24] preilly: done … en.m still appears up [22:35:45] and now i know red tailed black cockatoos are sexually dimorphic [22:36:14] ... [22:37:25] it's a phenotypic difference between males and females of the same species :p [22:37:32] (ok, i got that from clicking through as well) [22:39:45] (that's TFA i guessed and was right) [23:05:15] LeslieCarr: thanks! [23:23:40] New patchset: Asher; "initial work to start collecting slow query digests and histories from all core dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6637 [23:24:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6637 [23:24:25] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6637 [23:24:28] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6637 [23:35:26] Ryan_Lane: http://pastebin.com/awD7xSVW [23:35:34] sweet [23:35:35] thanks