[00:08:11] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:09:11] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[01:20:37] (PS1) Jeremyb: brwikimedia: fix import sources [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103730
[01:20:42] (PS1) Jeremyb: fix import sources for all chapter wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731
[01:20:43] (PS1) Jeremyb: import sources: move chapter wikis to own section [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103732
[01:23:08] (CR) Nemo bis: [C: 1] "We usually use the one-letter prefixes IIRC (see below), but there are both (see above), so who cares" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103730 (owner: Jeremyb)
[01:25:59] (CR) Nemo bis: fix import sources for all chapter wikis (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[01:26:22] (CR) Nemo bis: fix import sources for all chapter wikis (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[01:26:59] (CR) Jeremyb: "Nemo says:" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103730 (owner: Jeremyb)
[01:27:03] (CR) Nemo bis: "no opinion" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103732 (owner: Jeremyb)
[01:30:26] (CR) Jeremyb: fix import sources for all chapter wikis (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[01:31:43] (CR) Jeremyb: fix import sources for all chapter wikis (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[01:39:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[02:10:16] !log LocalisationUpdate completed (1.23wmf7) at Thu Dec 26 02:10:16 UTC 2013
[02:10:35] Logged the message, Master
[02:17:31] PROBLEM - Puppet freshness on db9 is CRITICAL: Last successful Puppet run was Wed 25 Dec 2013 05:14:45 PM UTC
[02:18:37] !log LocalisationUpdate completed (1.23wmf8) at Thu Dec 26 02:18:37 UTC 2013
[02:18:54] Logged the message, Master
[02:35:43] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 26 02:35:43 UTC 2013
[02:35:57] Logged the message, Master
[04:40:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[05:18:31] PROBLEM - Puppet freshness on db9 is CRITICAL: Last successful Puppet run was Wed 25 Dec 2013 05:14:45 PM UTC
[07:41:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[08:17:32] (CR) TTO: [C: -1] import sources: move chapter wikis to own section (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103732 (owner: Jeremyb)
[08:19:31] PROBLEM - Puppet freshness on db9 is CRITICAL: Last successful Puppet run was Wed 25 Dec 2013 05:14:45 PM UTC
[08:50:32] (CR) Chad: [C: -1] "I wonder if we can kill sillyshell entirely while we're here, rather than copying it over to the module." (7 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/100760 (owner: Matanya)
[10:32:58] (CR) Nemo bis: [C: -1] Disable interwiki magic for wikimedia (chapter) sites [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103649 (owner: TTO)
[10:39:09] (PS2) Nemo bis: fix import sources for all chapter wikis [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[10:41:12] (CR) Nemo bis: fix import sources for all chapter wikis (2 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103731 (owner: Jeremyb)
[10:42:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[11:20:31] PROBLEM - Puppet freshness on db9 is CRITICAL: Last successful Puppet run was Wed 25 Dec 2013 05:14:45 PM UTC
[11:43:02] (CR) Nemo bis: "Thanks for actually explaining on the bug now. :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103649 (owner: TTO)
[11:49:03] (CR) Odder: [C: 1] "It is! Well done, thanks Mateusz!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/99768 (owner: M4tx)
[13:43:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[14:21:31] PROBLEM - Puppet freshness on db9 is CRITICAL: Last successful Puppet run was Wed 25 Dec 2013 05:14:45 PM UTC
[14:29:01] RECOVERY - Puppet freshness on db9 is OK: puppet ran at Thu Dec 26 14:28:55 UTC 2013
[14:29:37] !log killed stuck puppet agent on db9 and forced manual puppet run.
[14:29:55] Logged the message, Master
[14:30:34] hi akosiaris, no vacation?
[14:31:24] matanya: hey. Just lower productivity
[15:38:32] PROBLEM - Host virt1007 is DOWN: PING CRITICAL - Packet loss = 100%
[15:43:41] RECOVERY - Host virt1007 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[16:05:14] cmjohnson1: working today?
[16:05:34] yes...but not going to DC till 2
[16:05:53] what's up
[16:06:23] Two of those virt boxes are giving me trouble, and I don't really know how to diagnose.
[16:06:43] what kind of trouble?
[16:06:51] virt1007… partman fails because it can't find
[16:07:09] well, it maybe can't find /any/ drives, I can't tell. It has "primary unknown/dev/sdh1" many different volumes
[16:07:30] I can copy/paste a bunch of the partman output if that's useful...
[16:08:06] okay
[16:08:26] the Cisco servers are very temperamental.
[16:08:58] here's a snip of partman output: https://dpaste.de/5XeY
[16:09:01] there's a lot more like that :)
[16:09:16] what boot cfg are you using?
[16:10:07] virt-raid10-cisco.cfg
[16:10:29] which worked for seven of the nine.
[16:10:48] fyi: these were the same issues I ran into with an1007
[16:11:08] This is a different box, right?
[16:11:37] Seems like if it can't see any drives then probably a controller card is fried or just needs re-seating or something?
[16:11:40] * andrewbogott takes a wild guess
[16:14:41] I believe it has something to do with the controller. I do have a spare on-site so I can swap out.
[16:15:01] and see if that works. I need to view the settings first
[16:15:20] cisco support is really sketchy for these since they were donations
[16:15:34] * andrewbogott nods
[16:15:56] I am going to DC at 2...to swap out main board for an1012...I will look at it then
[16:16:54] andrewbogott: which two?
[16:17:13] virt1007 and ?
[16:17:56] 1006 is doing a different thing
[16:18:05] of course :)
[16:18:14] When it boots, it gets to the point of requesting an IP
[16:18:28] but then it just hangs there. Brewster offers and offers but virt1006 hangs
[16:18:48] I'm hopeful that that's some kind of networking thing rather than a hardware failure, but… no idea really.
[16:19:12] LeslieCarr observed the same behavior on Tuesday and didn't have any idea either.
[16:19:47] i will poke at it
[16:20:32] thanks
[16:22:02] andrewbogott: did you try to run the client in debug mode?
[16:22:29] matanya: I'm not sure I know what that means :)
[16:25:25] andrewbogott: are you using dhclient?
[16:26:14] Oh, um… this is pxe boot on a new server. So, no proper OS at all… I wouldn't know how to do things interactively.
[16:28:41] andrewbogott: want me to throw some ideas or you are good? i have 2 or 3
[16:29:24] I think if cmjohnson1 is willing to look I'm happy to leave it to him. Most likely this box will land on the scrap heap :(
[16:30:07] andrewbogott: sure
[16:30:26] matanya andrewbogott: feel free to poke around
[16:30:36] I will not get to it till later today.
[16:30:51] let me know if you find anything
[16:44:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[17:05:22] (PS1) Ottomata: Adding jmxtrans classes for datanode and nodemanager [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/103755
[17:06:34] (PS2) Ottomata: Adding jmxtrans classes for datanode and nodemanager [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/103755
[17:06:49] (CR) Ottomata: [C: 2 V: 2] Adding jmxtrans classes for datanode and nodemanager [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/103755 (owner: Ottomata)
[17:19:10] (PS1) Aaron Schulz: Made CDB scripts less verbose [operations/puppet] - https://gerrit.wikimedia.org/r/103759
[17:30:40] (CR) Chad: [C: 1] Made CDB scripts less verbose [operations/puppet] - https://gerrit.wikimedia.org/r/103759 (owner: Aaron Schulz)
[18:00:16] (CR) Dekel E: [C: 1] Enable $wgImportSources for Hebrew Wikivoyage [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103118 (owner: Odder)
[18:27:03] (CR) Ori.livneh: [C: 2] Made CDB scripts less verbose [operations/puppet] - https://gerrit.wikimedia.org/r/103759 (owner: Aaron Schulz)
[18:45:04] (CR) Jdlrobson: [C: 1] "Merge at will! :)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103611 (owner: Kaldari)
[18:52:43] (PS1) BryanDavis: Revert "Configure Varnish not to cache scholarship app reqs" [operations/puppet] - https://gerrit.wikimedia.org/r/103768
[18:53:06] (CR) jenkins-bot: [V: -1] Revert "Configure Varnish not to cache scholarship app reqs" [operations/puppet] - https://gerrit.wikimedia.org/r/103768 (owner: BryanDavis)
[18:55:17] (PS2) BryanDavis: Revert "Configure Varnish not to cache scholarship app reqs" [operations/puppet] - https://gerrit.wikimedia.org/r/103768
[18:56:06] (CR) BryanDavis: "Patch set 2 was a manual rebase." [operations/puppet] - https://gerrit.wikimedia.org/r/103768 (owner: BryanDavis)
[19:04:37] (PS4) Andrew Bogott: Added 'adminadd' tool to auto-generate new user entries. [operations/puppet] - https://gerrit.wikimedia.org/r/98700
[19:06:29] (PS1) Ottomata: Adding jmxtrans README and convenience worker and master classes [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/103769
[19:06:51] (PS5) Andrew Bogott: Added 'adminadd' tool to auto-generate new user entries. [operations/puppet] - https://gerrit.wikimedia.org/r/98700
[19:06:56] (CR) Ottomata: [C: 2 V: 2] Adding jmxtrans README and convenience worker and master classes [operations/puppet/cdh4] - https://gerrit.wikimedia.org/r/103769 (owner: Ottomata)
[19:09:33] (PS1) Aaron Schulz: Prevent two people from running scap at the same time [operations/puppet] - https://gerrit.wikimedia.org/r/103770
[19:10:08] (PS6) Andrew Bogott: Added 'adminadd' tool to auto-generate new user entries. [operations/puppet] - https://gerrit.wikimedia.org/r/98700
[19:10:34] ori: those 3 lines work for me in a script in ~ on tin
[19:14:57] (PS1) Andrew Bogott: Add Katie Filbert (Aude) to mortals. [operations/puppet] - https://gerrit.wikimedia.org/r/103771
[19:14:59] (PS1) Ottomata: Changing default run_interval to 15 and log_level to info. [operations/puppet/jmxtrans] - https://gerrit.wikimedia.org/r/103772
[19:15:12] (CR) Ottomata: [C: 2 V: 2] Changing default run_interval to 15 and log_level to info. [operations/puppet/jmxtrans] - https://gerrit.wikimedia.org/r/103772 (owner: Ottomata)
[19:16:54] (PS1) Ottomata: Now using jmxtrans instead of built in ganglia support for Hadoop [operations/puppet] - https://gerrit.wikimedia.org/r/103773
[19:17:30] (CR) Ottomata: [C: 2 V: 2] Now using jmxtrans instead of built in ganglia support for Hadoop [operations/puppet] - https://gerrit.wikimedia.org/r/103773 (owner: Ottomata)
[19:30:31] andrewbogott: going to try and swap nic card out on virt1006 with the spare cisco I received from tampa. it's not the cable, connected to right port, the config looks right and vlan is correct
[19:30:53] cmjohnson1: ok! Let me know when it's swapped and I'll try another pxe boot
[19:33:50] (PS2) Aude: Add Katie Filbert (Aude) to mortals. [operations/puppet] - https://gerrit.wikimedia.org/r/103771 (owner: Andrew Bogott)
[19:33:52] (CR) Ori.livneh: [C: 1] Add Katie Filbert (Aude) to mortals. [operations/puppet] - https://gerrit.wikimedia.org/r/103771 (owner: Andrew Bogott)
[19:36:36] (CR) Andrew Bogott: [C: 2] Add Katie Filbert (Aude) to mortals. [operations/puppet] - https://gerrit.wikimedia.org/r/103771 (owner: Andrew Bogott)
[19:37:01] \o/
[19:37:35] ori, can you brief aude about what's involved in actually having the 'mortal' right?
[19:37:58] Specifically, which things she should or shouldn't do without checking in with other folks first? It's a pretty big hammer.
[19:38:16] ok :)
[19:38:23] 1) always blame Reedy
[19:38:24] <^demon|away> Nobody gave me that talk!
[19:38:29] ori: or if you think she should be in a different group…
[19:38:34] <^demon|away> It was more like "here you go, enjoy"
[19:38:50] andrewbogott: no, it's appropriate
[19:39:04] * aude will just look at stuff like flourine at least to start
[19:39:31] maybe ori and reedy can explain more when i come to sf in january
[19:39:49] aude: I think the main point (which probably goes without saying) is -- you can now deploy new code, but please don't actually do that without scheduling a window and coordinating with people and such.
[19:40:01] obviously
[19:40:20] yeah, obvious, but I still feel like I should say it out loud :)
[19:40:28] aude: the joking-aside bits are: don't use the prod key for anything other than prod; don't leave your computer unattended / unlocked; if you lose your computer, get in touch with ops asap and report it so that the key can be revoked
[19:40:28] yep
[19:40:40] * aude nods
[19:41:57] don't download random stuff from the internet onto production boxes, etc. do you have a good ~/.ssh/config ?
[19:42:12] yep
[19:42:49] * ori invites others to chime in, if i'm missing something.
[19:42:55] aude: have you tried logging in yet?
[19:42:58] not yet
[19:43:27] I was thinking, we should write all this down… then I remembered that we sort of did. A quick read: https://wikitech.wikimedia.org/wiki/Server_access_responsibilities
[19:45:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[19:49:01] did i miss something important ori ?
[19:49:17] ori: andrewbogott it works
[19:49:35] cool!
[19:50:28] probably takes time for puppet to run everywhere
[19:50:36] matanya: hm?
[19:50:47] aude, if you're waiting on a particular box I can give it a push
[19:51:06] fluorine
[19:51:25] no hurry
[19:51:27] * aude can wait
[19:51:49] i'm really glad this worked out
[19:52:08] Oh, I've been spelling fluorine wrong
[19:52:09] thanks ori
[19:52:11] it makes me feel good about the wmf
[19:52:28] can bug ree-dy less now
[19:53:37] aude: there was a lot of work by quim, mutante, and robla to work out the requirements -- you should thank them, not me :) and andrewbogott for doing the deed
[19:53:52] :)
[19:54:10] aude, does fluorine like you now?
[19:54:16] i saw : don't download random stuff from the internet onto production boxes, etc. do you have a good ~/.ssh/config ?
[19:54:29] that looked like something important
[19:54:41] matanya: http://ur1.ca/edq22 (in topic)
[19:55:26] broken: https://gdash.wikimedia.org/dashboards/jobq/
[19:55:28] just ssh fluorine ?
[19:55:43] aude, depends on how you're doing it.
[19:55:54] Are you using proxycommand, or forwarding via bast1001, or...?
[19:56:02] bast1001
[19:56:28] hm… yep, you have a home account there.
[19:56:51] on fluorine I mean
[19:57:05] andrewbogott: have you had the pleasure of trying to get new nrpe checks to show up in icinga before?
[19:57:08] and
[19:57:10] if I restart icinga
[19:57:21] will everyone get text messages on their holiday vacas?
[19:57:22] :p
[19:57:23] aude, I'd advise you to set things up with proxy command like this, though: https://wikitech.wikimedia.org/wiki/Server_access_responsibilities#SSH
[19:57:37] I was dragging my feet about that but just got things put together last week and it is /way/ better than key forwarding.
[19:58:01] yeah, that's what i have
[19:58:29] Ah, so… then the ssh command kind of depends on what you have in your config.
[19:58:56] -A is for if i have to make changes for gerrit (unlikely)
[19:59:12] ottomata: I haven't done much with icinga. I don't think it'll page people, but… you should probably ask someone who knows before trying :)
[19:59:22] aude: yep, sounds like you're doing things right then
[19:59:59] yeahhhh
[20:00:05] maybe i'll wait on figuring that out :p
[20:00:34] AaronSchulz: cool, i didn't know about using exec o create file handles
[20:00:52] *to
[20:01:06] aude, so, not working?
[20:04:34] not yet
[20:06:01] I see a fair number of access requests on fluorine that are maybe you… they're all root@ though
[20:06:08] you should be able to log in as yourself, aude@
[20:06:27] (CR) Ori.livneh: [C: -1] "The error message that is printed when scap cannot acquire the lock should mention the lock file name, so you don't have to fish around th" [operations/puppet] - https://gerrit.wikimedia.org/r/103770 (owner: Aaron Schulz)
[20:07:09] greg-g, ori, andrewbogott I have to push config changes to Parsoid (config broke last week when we removed all "default" entries from our config file, but that broke VE on a few wikis which I got an email about today) .. so, given that this is code freeze week, who do I co-ordinate this with?
[20:07:21] subbu: greg-g
[20:08:02] subbu: will it require purging varnish?
[20:08:06] ori, do you know if he is working / online today.
[20:08:08] ori, no
[20:08:11] i do ssh -i (mykey) aude@bast1001.wikimedia.org
[20:08:21] -i not needed probably
[20:08:39] Oh, ok, lemme look at the logs on bast1001
[20:08:43] (I was on fluorine before)
[20:09:07] ori, but, we might require someone to restart cluster in case the restart command doesn't work .. ryan was fine-tuning and testing that and not sure if it will work.
[20:09:09] ok
[20:09:25] whoah, puppet has a bad cert on bast1001. What the heck?
[20:09:43] This would explain your troubles, but may take me a few minutes to sort out.
[20:09:43] -i is indeed not needed but i think is better/faster
[20:09:55] ok
[20:10:12] for labs, i just do ssh tools-dev from bastion
[20:10:16] or whatnot
[20:10:41] subbu: greg-g should be around today, yeah. if VE is actually broken on some wikis i can't imagine he'd say no. let's wait for him to answer, but in the meantime i think you can prep.
[20:11:01] ori, will do, thanks.
[20:12:25] ori: I don't see the use of --force
[20:13:38] (PS2) Aaron Schulz: Prevent two people from running scap at the same time [operations/puppet] - https://gerrit.wikimedia.org/r/103770
[20:13:48] seems like it's asking for trouble
[20:14:33] Aaron|home: OK, printing the lock filename is probably sufficient, in the unlikely scenario that someone needs to urgently scap and bypass some lingering lock
[20:14:37] that would make sense with a "this is my deploy window lock", but not with "don't clobber data" lock
[20:15:11] and ctrl-c on scap kills the lock anyway
[20:15:14] Aaron|home: OK, I am persuaded.
[20:15:20] it would be pretty rare to mess that up
[20:15:20] ok
[20:15:53] waiting for jenkins
[20:20:07] * aude is spelling challenged
[20:20:29] aude, try bast1001 now?
[20:20:39] now i just get "Permission denied (publickey)."
[20:20:45] i can try logout/in again
[20:21:15] no luck
[20:21:19] (CR) Ori.livneh: [C: 2 V: 2] Prevent two people from running scap at the same time [operations/puppet] - https://gerrit.wikimedia.org/r/103770 (owner: Aaron Schulz)
[20:21:44] Hm...
[20:22:08] Failed publickey for aude from
[20:22:40] no idea what i have to do different or what
[20:22:57] aude, sometime recently a user solved this problem by restarting their terminal program… something about ssh and the shell caching a bad state.
[20:23:04] I would think that -i would address that in any case though...
[20:23:07] Ryan_Lane: curious, do you use -t with ssh-add ?
[20:23:33] is that the option that asks for permission every time?
[20:23:41] no, that's -d or something
[20:23:43] -t is ttl
[20:24:11] nope
[20:24:16] -c is confirm
[20:24:39] Ryan_Lane: it's kind of cool...but does not get along with scap :)
[20:25:40] why the need for ttl?
[20:25:58] stop using scap. problem solved! :D
[20:26:20] Ryan_Lane: in case one forgets ssh-add -D before leaving
[20:26:56] ah. good reason
[20:27:10] but if it expires mid-scap it's kind of funny
[20:28:01] if you want to do a simple incremental update to scap, just make a separate account with a separate key
[20:28:10] and have tin have that private key and be done with it
[20:28:33] <- ssh key forwarding hater
[20:29:52] I also hate ssh forwarding
[20:30:23] but I'm a salt proponent
[20:35:42] * Aaron|home wonders what ori thinks of that
[20:41:08] subbu: heya, yeah, please fix :)
[20:41:22] ok, thanks :-)
[20:41:24] sorry, wifi at coffee shop wasn't working, and they were busy as heck, so just had to walk back home
[20:41:57] subbu / greg-g: i think andrewbogott is available to help with the restarting
[20:42:10] subbu: so you can just coordinate with him, since you got the green light from greg-g
[20:42:39] ori, ok, I'm going to deploy now then .. ok, thanks.
[20:43:05] I see that Ryan_Lane is also around in case service-restart doesn't do the trick.
[20:43:17] ish. I'm a volunteer now
[20:43:43] right. ok.
[20:43:45] "don't wanna do it" carries more weight now
[20:43:48] ;)
[20:44:10] andrewbogott, you around to help with parsoid cluster restart, if necessary?
[20:44:24] I'm around but only if you spoonfeed me the actual commands to run
[20:45:09] hmm .. ok, let me look up my notes.
[20:45:33] greg-g: well, if I'm around and not billing for other work I'd probably do it :)
[20:45:44] I'm supposed to be billing for labs migration work right now
[20:46:35] andrewbogott, ok have them. ok, will sync config changes in just a bit.
[20:48:09] * Aaron|home tries scap now
[20:48:25] Aaron|home: probably better to wait until parsoid deploy
[20:48:29] until after, I mean
[20:48:44] * Aaron|home cancels
[20:48:48] meh, ok
[20:49:26] * Aaron|home didn't see anything on https://wikitech.wikimedia.org/wiki/Deployments
[20:49:57] !log updated localsettings.js to unbreak VE on wikis with "-" in their name (ex: nds-nl)
[20:50:06] Aaron|home: it's an emergency deployment; coordination happened in this channel over the last 30 mins or so
[20:50:14] Logged the message, Master
[20:50:56] oh right, pushbot...
[20:51:13] Ryan_Lane, FYI about service-restart. https://gist.github.com/subbuss/c8b6e7bc00ce36caf156
[20:51:24] andrewbogott, so, you'll have to restart cluster.
[20:51:29] not surprising
[20:51:33] that was fast!
[20:51:35] OK...
[20:51:43] related to the init script
[20:51:49] you guys really need to switch to the upstart
[20:52:03] this is what I have in my notes: using salt: salt -b 10% -G 'deployment_target:parsoid' parsoid.restart_parsoid parsoid
[20:52:06] Ryan_Lane, yes, we haven't pushed the updated code yet.
[20:52:09] yeps
[20:52:20] we have migrated to upstart and deployed in rt testing.
[20:52:24] * greg-g took too long to find the quote, but here it is anyways:
[20:52:25] 13:28 < ori> AaronSchulz: do the scripts need some sort of concurrency control?
[20:52:28] yep
[20:52:29] ;)
[20:52:32] !log restarting parsoid cluster via salt
[20:52:47] Logged the message, Master
[20:52:58] greg-g: which scripts?
[20:53:08] Ryan_Lane: what Aaron|home is working on
[20:53:21] ah, ok
[20:53:44] subbu: how's that?
[20:54:03] looks like that did it.
[20:54:14] thanks. now to verify the fix does what it is supposed to.
[20:54:19] andrewbogott, thanks
[20:54:30] well, look at that: https://gerrit.wikimedia.org/r/#/c/103770/ "Prevent two people from running scap at the same time
[20:54:33] "
[20:55:45] Aaron|home, greg-g .. deployment done.
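
The locking idea behind "Prevent two people from running scap at the same time", including the review request that the error message name the lock file, can be sketched roughly as below. This is a Python illustration only, not the code from that change; judging from the chat, the real scripts hold the lock on a file handle opened with exec in shell. The names acquire_lock, release_lock and LockHeldError, and any lock path passed in, are hypothetical.

```python
import fcntl
import os


class LockHeldError(RuntimeError):
    """Raised when another run already holds the lock (hypothetical name)."""


def acquire_lock(path):
    """Take an exclusive, non-blocking lock on `path` and return the open fd.

    The error message includes the lock file name so an operator does not
    have to go looking for it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_EX: exclusive lock. LOCK_NB: fail at once instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise LockHeldError(f"another run holds the lock file: {path}")
    return fd


def release_lock(fd):
    """Drop the lock explicitly; closing the descriptor also releases it."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A flock-style lock goes away when the holding descriptor is closed, so a run that dies or is interrupted drops it automatically. That is consistent with the remark that ctrl-c on scap kills the lock, and it is one reason a --force override may be unnecessary for a "don't clobber data" lock.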
[21:02:03] (PS1) Aaron Schulz: Use LOCK_EX for making the CDB MD5 file for sanity [operations/puppet] - https://gerrit.wikimedia.org/r/103846
[21:03:05] sweet
[21:03:17] !log aaron started scap
[21:03:35] Logged the message, Master
[21:07:37] !log Recreating GeoData index
[21:07:55] Logged the message, Master
[21:08:08] subbu: you're done with me, right?
[21:08:30] andrewbogott, yes.
[21:08:47] btw, are you in mpls or out somewhere?
[21:09:51] mpls this week :)
[21:10:41] and escaping somewhere after that? it's been too fricking cold here .. anyway, we should find halfak and catch up again sometime.
[21:10:47] I was in FL last week
[21:10:58] going to singapore in mid-January.
[21:11:07] But yeah, I'll be around for a few weeks so we can meet up.
[21:11:09] oh my, ok :)
[21:14:20] (PS1) Ottomata: Removing disabled views [operations/puppet] - https://gerrit.wikimedia.org/r/103847
[21:14:34] (CR) Ottomata: [C: 2 V: 2] Removing disabled views [operations/puppet] - https://gerrit.wikimedia.org/r/103847 (owner: Ottomata)
[21:15:58] (CR) Ori.livneh: [C: 2] Use LOCK_EX for making the CDB MD5 file for sanity [operations/puppet] - https://gerrit.wikimedia.org/r/103846 (owner: Aaron Schulz)
[21:16:09] (PS1) Ottomata: Adding Hadoop Ganglia view [operations/puppet] - https://gerrit.wikimedia.org/r/103848
[21:16:20] (PS2) Ottomata: Adding Hadoop Ganglia view [operations/puppet] - https://gerrit.wikimedia.org/r/103848
[21:16:25] (CR) Ottomata: [C: 2 V: 2] Adding Hadoop Ganglia view [operations/puppet] - https://gerrit.wikimedia.org/r/103848 (owner: Ottomata)
[21:18:51] !log aaron finished scap
[21:19:07] Logged the message, Master
[21:20:09] (PS1) Ottomata: Including hadoop view [operations/puppet] - https://gerrit.wikimedia.org/r/103850
[21:20:27] (CR) Ottomata: [C: 2 V: 2] Including hadoop view [operations/puppet] - https://gerrit.wikimedia.org/r/103850 (owner: Ottomata)
[21:23:57] (CR) Chad: [C: 1] Revert "Configure Varnish not to cache scholarship app reqs" [operations/puppet] - https://gerrit.wikimedia.org/r/103768 (owner: BryanDavis)
[21:25:41] ottomata: so an1012 is still broke...the 3rd mainboard came today and was damaged in shipping or packing so next week sometime now
[21:25:43] (PS1) Ottomata: Fixing hadoop stacked graphs. [operations/puppet] - https://gerrit.wikimedia.org/r/103851
[21:25:52] (CR) Ottomata: [C: 2 V: 2] Fixing hadoop stacked graphs. [operations/puppet] - https://gerrit.wikimedia.org/r/103851 (owner: Ottomata)
[21:26:19] haha
[21:26:22] awesome, thanks cmjohnson1
[21:28:11] it was unbelievable....a corner was completely smashed ...the box was fine so it was packed that way
[21:29:14] ha
[22:15:30] (PS1) Odder: Enable local TimedText on Portuguese Wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103861
[22:20:04] (CR) Siebrand: "Should be merged once all Wikimedia sites run 1.23wmf9." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/96472 (owner: Siebrand)
[22:46:31] PROBLEM - Puppet freshness on virt1007 is CRITICAL: Last successful Puppet run was Tue 24 Dec 2013 07:29:50 PM UTC
[23:41:40] (PS1) Aaron Schulz: Sync out the wikiversions.cdb in scap-1 only [operations/puppet] - https://gerrit.wikimedia.org/r/103870
[23:43:44] (PS5) BryanDavis: [WIP] Logstash puppet class [operations/puppet] - https://gerrit.wikimedia.org/r/100395
[23:59:33] !log aaron started scap
[23:59:50] Logged the message, Master