[01:54:37] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[02:23:51] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1724s
[02:34:30] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:17:50] blog.wikimedia.org is only intermittently accessible because of the heavy traffic -- is there anything that should be done about it or just leave it?
[04:18:09] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:18:48] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:24:05] casey, we're trying to move it
[04:24:25] Yay. :-)
[04:29:31] ping Ryan_Lane
[04:29:41] howdy
[04:29:44] evenin.
[04:29:53] Erik asked me to come join the party.
[04:29:58] what can I do to help?
[04:29:58] we have a major blog event occuring, and the blog server is dying
[04:30:04] we need to move it to another server
[04:30:09] hooper has a dead disk
[04:30:16] can we put a cache in front of it?
[04:30:18] can you find me a server to move it to?
[04:30:23] sure.
[04:30:26] tampa?
[04:30:27] I was about to do that, before I saw the dead disk
[04:30:29] yeah
[04:30:31] Ryan_Lane: heya
[04:30:34] how about one of the OWA hosts?
[04:30:37] \o/
[04:30:38] so this is easy
[04:30:39] I have three for swift.
[04:30:44] I don't need all of them.
[04:30:44] one of the OWA will likely work
[04:30:46] we have a high performance server
[04:30:49] in tampa we can allocate to this
[04:30:53] RobH: that's even better
[04:30:58] can you get me that really quick?
[04:30:58] nice.
[04:31:04] so blogs are down now?
[04:31:05] we're going to have to steal the IP from hooper
[04:31:11] it's up, but won't be for long
[04:31:14] cuz we should be able to push blogs back up with mirror
[04:31:26] it eventually runs out of memory and dies
[04:31:39] Ryan_Lane: we can't change dns to point to the new machine?
[04:31:40] well, blogs is a CNAME for hooper
[04:31:47] it'll take an hour for it to move
[04:31:56] I guess we can keep rebooting hooper till it moves
[04:31:56] so spin up with new ip in public vlan
[04:31:58] that's ok, isn't it?
[04:31:59] and then do that
[04:32:01] it's not actually down now.
[04:32:10] ok, let me snag a server and get the install running
[04:32:16] maplebed: it's been dying pretty frequently
[04:32:40] I'm going to drop the TTL to 5m now
[04:32:48] so that maybe by the time the new machine is ready it'll be quick to shift.
[04:32:50] they also removed the links that tell people to comment there
[04:32:58] cool. thanks
[04:33:13] RobH: I added some stuff to WP to help
[04:33:20] W3 total cache and APC
[04:33:28] is it puppetized?
[04:33:35] only the webserver config
[04:33:39] but we can just rsync the rest
[04:34:57] Ryan_Lane: blog.wikimedia.org's TTL is now 5m. It will be 5m everywhere by 9:35pm.
[04:35:04] sounds good
[04:35:16] did you wipe it from the cache?
[04:35:26] RECOVERY - Puppet freshness on spence is OK: puppet ran at Tue Jan 17 04:34:59 UTC 2012
[04:35:30] no, but a dig against all three of our nameservers shows 5m.
[04:35:34] ah. cool
[04:35:37] have we picked a new server now?
[04:35:42] TimStarling: rob is on it
[04:37:23] I notice there is a server called harmon which is in pmtpa and unused
[04:37:36] Ryan_Lane: can you push dns changes i just made please, they are checked in
[04:37:41] sure
[04:38:30] done
[04:39:50] Ryan_Lane: to confirm, we're just waiting for rob-h's word for now, right?
[04:39:55] yep
[04:40:04] RobH: is that system already installed and all?
[04:40:15] no, working on it now
[04:40:18] ok
[04:40:19] its bare metal
[04:40:24] * Ryan_Lane nods
[04:41:30] Ryan_Lane: are you good with any networking stuff we'll need to do or should we ping leslie?
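(Editor's note on the TTL exchange above: the "TTL is now 5m. It will be 5m everywhere by 9:35pm" claim is plain old-TTL arithmetic — resolvers that cached the record just before the change hold it for the *old* TTL, so the drop is only fully visible after that interval. A minimal sketch; the one-hour old TTL is an assumption inferred from "it'll take an hour for it to move", and the function name is my own.)

```python
from datetime import datetime, timedelta

def fully_propagated_at(change_time: datetime, old_ttl_seconds: int) -> datetime:
    """A cached answer fetched just before the change can live for
    old_ttl_seconds more, so every resolver is guaranteed to see the
    new record only after change_time + old_ttl_seconds."""
    return change_time + timedelta(seconds=old_ttl_seconds)

# TTL dropped at ~20:35 local time with a 1h (3600s) old TTL:
change = datetime(2012, 1, 16, 20, 35)
print(fully_propagated_at(change, 3600))  # -> 2012-01-16 21:35:00, i.e. "by 9:35pm"
```

Once the 5m TTL has propagated, the later A-record switch to the new server converges in at most five minutes — which is exactly why the TTL was dropped first.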
[04:41:44] if it's junos, I should be able to do it
[04:41:57] otherwise, I'd prefer leslie do it
[04:42:23] wouldn't rob need the network stuff done before the system can be built?
[04:42:37] depends on what parts change.
[04:42:39] one of you login to asw-b4-sdtpa and set the vlan for me?
[04:42:48] * maplebed looks at Ryan_Lane for that.
[04:42:49] :)
[04:43:00] relabel port WMF3641 to marmontel
[04:43:05] its a ex4200
[04:43:12] if it was foundry, i could do it ;]
[04:43:16] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:43:23] hm. why can't I ssh into it?
[04:43:34] I've got leslie's number if we want her.
[04:43:35] the dns for those is odd
[04:43:42] it's .mgmt, right?
[04:44:05] maplebed: I'd say ping her
[04:44:05] asw-b4-sdtpa.net.mgmt.pmtpa.wmnet
[04:44:13] doing so now.
[04:44:21] you mean asw-b4-sdtpa.mgmt.pmtpa.wmnet ?
[04:44:28] it's timing out for me
[04:44:37] i mean i pulled that out of dns just now for 10.1.1.13
[04:44:45] asw-b4-sdtpa.net.mgmt.pmtpa.wmnet
[04:44:52] oh, reverse DNS
[04:44:58] yea, its not matching up
[04:45:04] but ip should be fine
[04:45:22] timing out
[04:45:23] but if not, also has serial
[04:45:25] Ryan_Lane: ^
[04:45:35] how do I access it?
[04:45:35] scs-a1-sdtpa.mgmt.pmtpa.wmnet
[04:45:43] pmshell to list ports
[04:45:47] then # of port
[04:45:54] disconnect from serial is ~~.
[04:46:07] leslie's on her way in - a few minutes.
[04:46:09] let me know when the vlan is tagged, i have dhcp setup for it now
[04:46:34] Ryan_Lane: if hooper can stay online, we can migrate this a hell of a lot easier than fresh install =]
[04:46:52] RobH: what system is this?
[04:46:56] and which port is it on?
[04:47:09] yeah. hooper isn't totally dead
[04:47:13] i do not know the port, but it should be labeled for WMF3641
[04:47:20] which need to be renamed to marmontel
[04:47:33] one port up from ms-fe2
[04:47:52] there's a few boxes in ganglia with very low utilisation if setting up a new box is going to take too long
[04:48:13] the os install is minutes.
[04:48:22] just getting to that is troublesome ;P
[04:48:36] TimStarling: they would need to be apache hosts with public ip space running misc tasks
[04:48:42] hey
[04:48:45] yvon and gurvin both claim to be IPv6/SSL proxies
[04:48:52] got called, we working on this channel ?
[04:48:56] found it
[04:49:02] they don't seem to be doing much
[04:49:03] TimStarling: neither are
[04:49:07] LeslieCarr: Ryan_Lane is working on tagging a vlan for a server deploy
[04:49:47] okay, which port/valn
[04:50:05] ge-1/0/2
[04:50:12] asw-b4-sdtpa, server wmf2642, relabeling to marmontel and setting to public vlan
[04:50:12] let me exit this, so you can get in
[04:50:26] ...
[04:50:37] Ryan_Lane: bah, you didnt do it, why did you take those classes ;p
[04:50:38] RobH: how do I get out of pmsell again?
[04:50:42] ~~.
[04:50:46] LeslieCarr will be much faster than me
[04:50:55] I was in the interface to do it, but I have to find crap
[04:51:02] ok. I'm out
[04:51:05] ge-1/0/2 on asw-b4-sdtpa, yah ?
[04:51:10] yep
[04:51:21] needs to go into public services, or publicservices2?
[04:51:29] RobH: ? did you assign an IP?
[04:51:35] yep
[04:51:42] as long as you pushed dns?
[04:51:44] publcserices or publicservices2?
[04:51:46] I did
[04:52:00] 208.80.152.150
[04:52:05] ah publicservices, then
[04:52:21] i think so yea
[04:52:32] hrm, ge-1/0/2 is marked as WMF3641
[04:52:40] !log another dns update for servermgmt
[04:52:43] Logged the message, RobH
[04:52:55] LeslieCarr: confirm, but relabel now to marmontel as it has a name.
[04:53:12] RobH: you said 3641 once and 3642 once
[04:53:22] err, 2642*
[04:53:38] huh?
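(Editor's note: the switch change discussed above — relabeling ge-1/0/2 on asw-b4-sdtpa to marmontel and moving it into the public services vlan — would look roughly like this on a Juniper EX4200. This is a sketch, not the actual commit: the vlan name `publicservices` is taken from the "ah publicservices, then" exchange and may not match the configured name exactly.)

```
set interfaces ge-1/0/2 description marmontel
set interfaces ge-1/0/2 unit 0 family ethernet-switching port-mode access
set interfaces ge-1/0/2 unit 0 family ethernet-switching vlan members publicservices
commit
```

The `commit` step matches LeslieCarr's "done, committing now" below; on JunOS nothing takes effect until the candidate config is committed.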
[04:53:48] you gave two server names
[04:53:53] done, committing now
[04:53:56] "server wmf2642" vs. WMF3641
[04:54:00] is it WMF3641?
[04:54:01] https://racktables.wikimedia.org/index.php?page=object&object_id=1414
[04:54:19] yep. 3641
[04:54:23] ok. dhcp
[04:54:27] the server with asset tag wmf3641 is marmontel.
[04:54:29] i did that
[04:54:33] all i need is the port.
[04:54:37] ah. cool
[04:54:39] it's committed, not pingable
[04:54:43] its not online.
[04:54:44] it isn't up yet
[04:54:47] so shouldnt ping
[04:54:51] okay that would be a good reason why :)
[04:55:38] heh
[04:55:58] just rsyncing and adding the puppet config should be good enough
[04:56:06] let me do puppet really quick
[04:56:17] cool, thanks
[04:56:28] Ryan_Lane or RobH did you already do dhcp?
[04:56:42] i'll still be online until this is done, just say my name and i will come running :)
[04:56:43] yes, its done
[04:56:50] cool.
[04:56:58] Ryan_Lane: so you are taking care of marmontel puppet manifests?
[04:57:02] yes
[04:57:09] good times
[04:57:11] or want me to get anything else or would i just be in the way ?
[04:57:13] LeslieCarr: thanks a ton
[04:57:16] I made some other puppet changes, so I should really be the one to do it :)
[04:57:19] err
[04:57:19] maplebed: np
[04:57:21] apache changes
[04:57:41] Ryan_Lane: so we should be able to rsync over all the stuff, as you said, and be ok with puppet runs
[04:57:45] but we will see ;]
[04:58:20] os install in progress
[04:58:37] New patchset: Ryan Lane; "Adding blog to marmontel and allowing .htaccess in blogs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1927
[04:58:47] this will be a more robust server than hooper was, plus not sharing with etherpad.
[04:59:01] claiming dns - I'm dropping all the other names (besides blog.) that point to hooper to 5m TTLs.
[04:59:10] sounds good
[04:59:12] you mean etherpad is down/slow too? oh noes
[04:59:17] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1927
[04:59:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1927
[04:59:37] heh
[04:59:46] I'm not worried about EP right now
[05:00:04] and racktables =P
[05:00:06] we should deploy EP-lite soon
[05:00:06] does etherpad store its stuff in MySQL or locally?
[05:00:09] mysql
[05:01:54] partitions formattin
[05:05:12] installing software part of os install.
[05:08:25] rebooting into os,
[05:10:16] crap
[05:10:24] I ran the puppet command for ssl wrong
[05:11:03] ?
[05:11:11] so i am ready to do its first puppet run, not yet?
[05:11:24] that should deploy the directories and the like for it.
[05:11:42] Ryan_Lane: ?
[05:12:02] (cert is awaiting signing on sockpuppet)
[05:12:06] ok. puppet is running
[05:12:17] ?
[05:12:23] you ran it, or you mean on sockpuppet?
[05:12:25] I ran it
[05:12:29] =P
[05:12:56] well, you doing rsync or shall i?
[05:13:11] i assume you since yer running puppet
[05:13:27] I got it
[05:13:49] ok, so i will hang around a bit if needed can ping
[05:13:52] ok
[05:13:55] but if you are doing that, and maplebed will move dns
[05:14:00] then nothing for third person to do now ;]
[05:14:15] I want to see tests work before moving dns.
[05:14:17] :)
[05:14:29] agreed
[05:15:24] well, my key worked on marmontel, so puppet's working.
[05:16:03] rsyncing
[05:16:30] !log installing php-apc on marmontel
[05:16:31] Logged the message, Master
[05:16:44] maplebed: you mean like hacking your localhost to push to it for blog?
[05:16:51] cuz it should work otherwise, same backend.
[05:16:58] heh. blog isn't puppetized properly
[05:16:59] I was gonna use telnet, but yeah, something like that.
[05:17:07] Ryan_Lane: nope, only does base setup
[05:17:16] and apache config
[05:17:19] Ryan_Lane: what a suprise.
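(Editor's note: RobH's earlier "i have dhcp setup for it now" step is a static host entry so the box PXE-boots the installer with the right address. A sketch in ISC dhcpd syntax — the MAC address and boot filename are placeholders, since neither was given in channel; only the name marmontel and the 208.80.152.150 address come from the log.)

```
host marmontel {
    # placeholder MAC -- the real hardware address was never logged
    hardware ethernet 00:00:00:00:00:00;
    fixed-address 208.80.152.150;
    # boot-loader filename is an assumption for illustration
    filename "pxelinux.0";
}
```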
[05:17:20] :P
[05:17:24] doesnt do the actual web frontend install
[05:17:35] heh
[05:17:45] you may also have to add permission for that host in the db
[05:17:51] since it should be set to the specific hosts
[05:17:59] * maplebed logs into the db to look.
[05:18:05] RobH: do you know offhand what the db host is?
[05:18:08] db9?
[05:18:26] it used to be, but i think it may have moved
[05:18:30] asher was moving services off it
[05:18:34] ok. time to test
[05:18:45] Ryan_Lane: do you know what host is the backend db?
[05:18:52] Error establishing a database connection
[05:18:53] db9
[05:18:58] seems we'll need a grant
[05:19:20] do you know the username?
[05:19:21] yea, its db9, confirmed on hooper
[05:19:44] nm, robh answered.
[05:20:01] alright, I'll set up the grant.
[05:20:16] ok
[05:21:02] hm. I wonder if it is missing php packages
[05:21:28] all the packages for it should be installed via puppet
[05:21:33] it did pass that part of testing
[05:21:39] hm. at least one missing
[05:21:41] not sure which one
[05:21:59] ....
[05:22:10] thats annoying, cuz puppet setup used to work for this
[05:22:18] ah. tidy
[05:22:27] Ryan_Lane: grants granted.
[05:22:27] due to new plugin?
[05:22:30] I installed some stuff
[05:22:31] yes
[05:22:36] so you broke it ;p
[05:22:50] still says error establishing connection to database
[05:23:02] oh, forgot to flush privsv.
[05:23:04] try again?
[05:23:13] still
[05:23:22] I can connect via telnet
[05:24:24] hm. wikidiff error
[05:25:57] crap. what's the fix for that again?
[05:26:07] !log installing the mysql client on marmontel to test connectivity to the DB
[05:26:08] Logged the message, Master
[05:26:14] heh
[05:26:16] damn
[05:26:21] =/
[05:28:32] fixed wikidiff2 issue
[05:29:06] why does hooper have mysql installed?
[05:29:09] server, tha is
[05:29:27] shouldnt, does etherpad do it?
[05:29:31] dunno
[05:30:05] hm
[05:30:09] I can connect via the client
[05:30:50] oh
[05:30:51] no I can't
[05:31:10] Ryan_Lane: mysql is fixed.
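(Editor's note: the grant work that resolved "Error establishing a database connection" above boils down to re-granting the blog's MySQL user for the new web host and flushing the privilege cache — the step that was initially forgotten. A sketch; the database name, user name, and host suffix are placeholders, only the hosts db9 and marmontel come from the log.)

```sql
-- on db9: allow the blog's user to connect from the new web host
GRANT ALL PRIVILEGES ON blogdb.* TO 'bloguser'@'marmontel.example'
    IDENTIFIED BY '...';

-- without this, the server keeps serving the old in-memory grant
-- tables and the new host is still refused
FLUSH PRIVILEGES;
```

As the log shows a few lines later, a second pitfall is copying an old grant statement that still names the old host (hooper) instead of the new one.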
[05:31:16] (access, that is)
[05:31:18] would you verify?
[05:31:19] indeed it is
[05:31:24] great.
[05:31:45] (I had left hooper in the second grant statement)
[05:32:05] it works
[05:32:10] http://marmontel.wikimedia.org/2012/01/16/wikipedias-community-calls-for-anti-sopa-blackout-january-18/
[05:32:21] loogs good to me.
[05:32:22] coolness
[05:32:24] * maplebed tests with blog
[05:32:28] seems to work for me
[05:32:33] marmontel that is
[05:32:34] let's switch DNS
[05:32:44] this has way more memory. should handle the traffic much better
[05:32:57] worked with blog.wikimedia.org for me.
[05:33:01] dual cpu 6core and a shitton more ram
[05:33:15] yeah, this should handle things much better
[05:33:25] all agree I should switch DNS now?
[05:33:26] ok, who's doing DNS? :)
[05:33:27] yeah
[05:33:30] ben is
[05:33:51] I'm only moving blog first (not racktables or communityblog or ...)
[05:33:57] only move blog
[05:34:02] the rest don't move
[05:34:03] communityblog?
[05:34:04] we'll do them tomorrow
[05:34:10] anything with blog
[05:34:17] or else the redirection for blog feeds wont work
[05:34:33] leave racktables and etherpad alone of course
[05:34:40] those will remain on hooper, hooper just needs repair.
[05:35:29] done with dns
[05:35:34] maplebed: So please also move all the whatever_blogs
[05:35:41] I didn't move the racktables or etherpad software
[05:35:44] if blog works, shouldn't we move *blog?
[05:35:44] or it breaks
[05:35:45] we can handle that later
[05:35:52] ....
[05:35:58] we need to move all blog names now
[05:35:59] right, what robh said.
[05:36:01] or shit will break
[05:36:17] those are just simple redirects for the blog feeds for departments on the blog server
[05:36:30] gotcha.
[05:36:34] prepping that change now.
[05:36:36] RobH: I only needed to move the blog directory, right?
[05:36:38] testblog can actually be dumped out
[05:36:46] Ryan_Lane: yea, wp needs nothin else
[05:36:50] cool
[05:36:55] that's all I rsync'd
[05:37:06] and the apache stuff, but thats puppet
[05:37:15] kind of puppet anyway
[05:37:27] I'll fix that soon
[05:37:36] I can't believe no one installed php-apc :)
[05:37:41] for shame! heh
[05:38:00] I still didn't use memcache. I configured w3tc to use apc for caching
[05:38:03] so i see no reason to move the other shit off hooper
[05:38:07] ok, pushing the change for all the other blogs now.
[05:38:12] its more than enough machine for whats left, and its just a bad disk right?
[05:38:16] yeah
[05:38:28] the blog was causing the server to swap death
[05:38:35] its under warranty so will be all good then
[05:38:40] yeah
[05:38:44] Ryan_Lane: did you want to drop a ticket for hdd replacement in pmtpa?
[05:38:52] sure
[05:39:10] do note in ticket that its not hot swap, and downtime needs to be scheduled when the replacement disk arrives
[05:39:31] done with dns for *blog
[05:39:50] just fyi for ops
[05:39:50] I haven't increased the TTLs back to 1H - I'd like to leave that for tomorrow.
[05:39:57] the r410s we get tend to NOT be hot swap
[05:40:00] yeah. thats a good idea
[05:40:03] as we want the cabled controller
[05:40:05] in case something goes wrong
[05:40:13] cool
[05:40:16] ok, im goin to bed.
[05:40:37] heh, my dns is already updated
[05:40:41] RobH: night!
[05:40:46] and i use google, so they are updated
[05:41:29] http://www.whatsmydns.net/#A/blog.wikimedia.org
[05:41:36] yay for maplebed's earlier ttl change
[05:41:43] ganglia is showing marmontel picking up traffic
[05:42:31] I'm not sure I should be glad it took us the hour it took to drop the TTL to get the server ready, but I suppose it's still a win... :P
[05:43:44] so, if someone would drop the mysql rights of hooper, or add a rt ticket for the cleanup atleast, that would rock
[05:43:57] cuz it needs to get cleaned off hooper completely
[05:43:57] heh. well, it's perfect timing
[05:44:01] great job guys
[05:44:10] indeed, night all =]
[05:44:34] g'night!
[05:44:43] LeslieCarr: night
[05:44:44] call me if anything else falls over
[05:44:46] will do
[05:46:18] here's good evidence the DNS change went as expected:
[05:46:19] http://ganglia.wikimedia.org/2.2.0/graph.php?r=hour&z=xlarge&title=&vl=&x=&n=&hreg[]=%28marmontel|hooper%29&mreg[]=bytes_%28in|out%29&gtype=stack&aggregate=1&embed=1
[05:46:30] It's cool to see the 5m color shift.
[05:47:31] !log marmontel has now replaced hooper as blog.wikimedia.org
[05:47:32] Logged the message, Master
[05:47:35] heh. indeed
[05:48:33] * maplebed creates an RT ticket to drop blog privs for hooper from db9
[05:49:49] Ryan_Lane: I'm going to sign off as well. We're all done for the night, right?
[05:49:55] yep
[05:49:57] thanks for the help!
[05:50:04] np. glad it went smoothly.
[05:50:22] me too
[05:59:39] thank u all!
[08:24:05] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Tue Jan 17 08:23:40 UTC 2012
[09:12:58] New review: Dzahn; "about cron jobs running every minute. see:" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1926
[09:15:24] hey mutante
[09:15:59] they're about to deploy the congresslookup ext to testwiki, this entails a new set of tables
[09:16:06] shouldn't impact much and I'm around but just a heads up
[09:25:02] New patchset: Hashar; "testswarm: explicitly set cron schedule" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1928
[09:25:18] New patchset: Hashar; "testswarm: job to wipe clients idling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1926
[09:25:41] New review: Hashar; "Change https://gerrit.wikimedia.org/r/1928 explicitly define the agenda :)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1926
[09:44:05] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 451161 MB (3% inode=99%):
[09:45:55] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 445685 MB (3% inode=99%):
[10:06:49] New review: Dzahn; "yep, says "Done" after a few seconds when opening that URL" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1926
[10:06:50] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1926
[10:07:45] New review: Dzahn; "yea, as the commit message says" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1928
[10:07:46] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1928
[10:32:40] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Z on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2463*
[10:34:30] RECOVERY - MySQL slave status on es1004 is OK: OK:
[12:15:40] PROBLEM - Puppet freshness on db1045 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:15] robh: morning
[15:24:26] heyas
[15:24:57] cmjohnson1: you may notice new ticket, hooper has a dead hdd, had to move the blog (well, ryan did the rest of us just backed him)
[15:24:59] I am going to take db7 down. Just want to confirm that it is decomm and good to go
[15:25:14] i think so but lemme grep though the files to confirm
[15:25:31] yes...i did and db13 has bad drive (maybe) could be be raid related
[15:27:04] !log db7 shutting down for decom, not listed in db for any clusters, load .01
[15:27:06] Logged the message, RobH
[15:27:21] cmjohnson1: once it turns off, it can be pulled, you will need to set it aside to wipe the drives out of the rack
[15:27:27] or swap with db8 once its wiped
[15:27:48] got it ;]
[15:28:08] seems db13 also has a ticket for a bad fan
[15:28:13] i am confirming both its issues still exist
[15:28:28] of course, if it does, its old sun server
[15:28:41] cmjohnson1: of those hdds you sent me, they are the sun kind, i guess i should send them back eh? =]
[15:29:11] i meant just the drives not the trays so something is coming back, tomorrow when i am in the dc i will email you the capacity
[15:29:20] if you dont have any of those left, i will just send them all abck drive and carrier
[15:29:33] okay...i will check first
[15:34:57] robh: can you get me HDD size for hooper.
[15:35:19] yep, in middle of pulling drive info on db13 for ya
[15:35:22] will do that immediately after
[15:39:13] ok, checking hooper
[15:41:05] hrmm
[15:41:15] not sure why ryan said hooper disk is dead, it looks like its fine so far, still checkin
[15:43:51] cmjohnson1: hooper is fine, i stole ticket, updated, and assigned to ryan
[15:44:08] okay...thx
[15:44:41] since you are at it please check db43 !rt2170
[15:45:11] lol
[15:45:17] gotta love the informative rt ticket...
[15:46:17] unbelievable detail
[15:47:49] i think you are going to have to crash cart its mgmt is bein odd
[15:47:53] worked, but now doesnt
[15:48:06] i think its going to need a full power pull/reseat
[15:48:10] still checkin
[15:48:30] hi guys. did you see that curl error when connection to mgmt before?
[15:48:41] on db43?
[15:48:50] it happens on the 3 servers that recently got tickets to be reinstalled
[15:49:00] i put it in one of them
[15:49:19] mutante: not db43 though right?
[15:49:28] hold on,looking up details:)
[15:49:30] dell drac curl error is fixed by firmware update of the drac
[15:50:02] db43 is already a firmware revision up from the fix, so i hope it wasnt it ;]
[15:51:16] on mw1099 mw1081 and mw1108
[15:51:20] robhL db43 mgmt is working fine
[15:51:30] !rt 2252
[15:51:30] https://rt.wikimedia.org/Ticket/Display.html?id=2252
[15:51:34] it connects for me but wont read logs
[15:51:49] and now it works, odd
[15:51:52] checking it out
[15:52:01] nope, not related to db servers, just mw
[15:52:08] werid..i got logs
[15:52:18] yea its fine now for mgmt
[15:52:27] wont need power pulled, now troubleshooting why it crashed
[15:52:40] oh yea, its console is locked up
[15:52:41] i gave it a dirty look...it knows who is boss ;]
[15:52:45] os is borked
[15:53:08] i have to talk to asher about tickets
[15:53:15] this is assigned to the wrong person in the entirely wrong queue
[15:53:27] no one should be dropping tickets into a datacenter queue with 'fix this' without detailing whats wrong
[15:53:39] since the pmtpa queue is for onsite work, this isnt onsite yet.
[15:54:14] fyi, i moved one into to pmtpa queue today, "mw64 is dead" (1890)
[15:54:21] np...k..did u get drive for db13
[15:54:33] !log db43 rebooting
[15:54:35] Logged the message, RobH
[15:54:41] cmjohnson1: yea, updated the ticket,
[15:54:46] okay
[15:54:48] so if you have on site spare to swap to it, great
[15:54:54] if not, perhaps the ones you sent me will work
[15:54:56] and i send back
[15:55:09] i kept some for here so I should be ok
[15:55:11] just update ticket with what it says the disk is on the sticker on the front of drive (dont need to pull it)
[15:55:39] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[15:55:54] mutante: so what exactly should chris do?
[15:55:57] you moved to his queue and gave him no instructions ;]
[15:56:21] usually he pings me, or i see them, and I add the what to do to fix it
[15:56:33] but it means you guys are bottlenecking repairs waiting on chris to chat with me
[15:56:53] well, he has to track down someone with root and such that is
[15:56:56] not always me
[15:57:22] no one is doing things wrong or anything, just should change how we handle that queue
[15:57:32] (no one is doing it after being told differently ;)
[15:58:00] (plus mutante didnt make the initial ticket ;)
[15:58:06] yeah,ok,in this case i just moved it
[15:58:13] but let me add a comment
[15:58:24] right, but we dont know whats wrong
[15:58:29] so its not an on site issue yet ;]
[15:58:46] sweet, no drac errors for memory
[15:58:53] though that looks like a memory error to me
[15:59:03] luckily, its a cluster server, so its no big deal to pull it offline and run memtest
[15:59:16] RECOVERY - Host db43 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[15:59:22] mutante: since I went and pestered you about it, care to update ticket? (If we agree a memtest is the thing to do)
[15:59:31] we would wanna depool it and take it offline for chris to work on
[15:59:41] well, no need to depool, since pybal should do it
[15:59:49] \o/
[16:00:19] but I would then update the ticket with 'System will be powered down, run dell CD with memtest options on system)
[16:00:30] without mixing my punctuation quite so horribly =P
[16:01:43] heh, ok, my comment was "please run a memory test tool, like memtest86+ from a live CD iso from http://www.memtest.org/ or something to confirm it's a memory error,
[16:01:46] if that shows errors ask Dell to replace memory"
[16:02:31] yea, but i would advise using the dell cd (you wouldnt know they exist ;)
[16:02:41] but dell has utilities cd that has all hardware tests
[16:02:49] including the DEST test they require for warranty a lot
[16:03:00] but, now you know they exist =]
[16:03:09] I don't think you would need that much detail...a simple memtest would suffice. If it's a dell, i know to run dell utililty
[16:03:13] they ship with every system, plus when we do warranty stuff they send us updated iso links
[16:03:30] cmjohnson1: yea now i am just educating them on what we do for this stuff ;]
[16:03:30] robh: can u check to see if you have any SUN 73GB 15k rpm there
[16:03:44] I only have the 146GB 1ok
[16:04:41] i will check on site tomorrow
[16:04:46] if not, then we can use the larger
[16:04:54] but prefer we save them, indeed
[16:05:03] with the raid, you can always put in a larger, faster disk
[16:05:11] it is not faster ...slower
[16:05:15] ahh, 10k to 15k?
[16:05:24] smaller being faster rpm
[16:05:29] well, that sucks.
[16:06:38] so is DRAC firmware upgrade an on-site issue?
[16:07:24] not if you can access http mgmt
[16:07:30] you have foxyproxy or something setup?
[16:07:41] drac firmware can be done remotely
[16:07:47] RECOVERY - Auth DNS on ns2.wikimedia.org is OK: DNS OK: 5.599 seconds response time. www.wikipedia.org returns 208.80.152.201
[16:07:52] bios firmware has to be local or done with all kinds of crazy shit i dunno about
[16:08:07] RobH: ok, then i wont move those tickets to pmtpa
[16:08:20] unless we fail at them, i have the firmware locally on my laptop
[16:08:26] so if you wanna, just assign those tickets for that to me
[16:08:32] and i will shoot the update to them right away
[16:08:51] i got the firmware from dell direct, not off the website
[16:08:59] i need to find the online copy to link for tothers =]
[16:09:00] thanks, will do
[16:10:26] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[16:15:53] mutante: actually, i uploaded it to nfs if you wanted to try it out
[16:16:07] but it will be slower for you, since its transcontinental then onto the slow mgmt network
[16:16:18] so i am fine to run all but 1 if you just wanna do the single one for experience
[16:16:36] since the drac update is via http interface, it uploads your local copy of the file
[16:18:41] mutante: did you push out the new rsvg?
[16:19:11] (I think I asked you to last week, but don't remember and got sucked into beta.wmflabs.org)
[16:20:07] hexmode: i keep sending thehelpfulone your way ;]
[16:20:22] we keep eventually fixing the issues, but i am sure you have gotten a few pings out of it =]
[16:20:23] RobH: yep, leave one for me
[16:20:32] hexmode: ehm. no.i didnt
[16:20:35] RobH: excellent! he has pinged me
[16:21:00] glad I can help with thehelpfulone ;)
[16:21:16] hexmode: was it really me you talked to about it? i might have forgotten too, but eh, dont really remember either
[16:22:31] mutante: I talked to someone about the new rsvg patches... I've since verified it according to tim's notes. Could you deploy? Or is there someone else I should ask?
[16:24:52] hexmode: ooh, sorry, i know now, the one where you tested. I see a new update by Tim now though
[16:26:06] mutante: I'm not getting email from RT!
[16:26:14] * hexmode hates on RT for this alone [16:26:41] hexmode: it's missing your comment , did you also try to add a comment via mail? [16:27:08] hexmode: the new "please do the following in three separate commits:"-part. is that something you could do? [16:27:22] mutante: I'll do those three seperate commits [16:27:27] and comment [16:27:28] hexmode: adding you to CC [16:27:38] and no, I didn't comment via email [16:28:46] hexmode: i made you a requestor.a ticket can have multiple requestors [16:28:54] that should fix mail issues [16:33:05] New patchset: Jgreen; "adjusted notification recipient for offhost_backups script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1930 [16:33:39] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1930 [16:33:40] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1930 [16:34:09] RECOVERY - Squid on brewster is OK: TCP OK - 0.000 second response time on port 8080 [16:40:10] maplebed: the c series shipped today [16:40:19] tracking isnt updated yet cuz it just happened [16:40:27] but its two day, so we should have it on thursday [16:40:48] cmjohnson1: I am going to assign a bunch of tickets for this system, when it comes in on Thursday it will be your top priority to get it racked and ready for access [16:45:37] !log upgrading drac firmware on mw1108 [16:45:38] Logged the message, Master [16:46:27] mutante: I think that is it... Could you check it out and compile it? [16:48:16] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:56:41] hexmode: i don't know. where from? [16:57:34] or did you mean a new attachment? 
(dont see updates) [16:58:03] mutante: sorry, I committed it [16:58:10] so in svn [16:59:01] bleh, maplebed & cmjohnson1 it seems that ms9 won't arrive until next monday [16:59:12] cmjohnson1: i dropped tickets to both you and leslie for the racking and network setup [16:59:14] what's ms9? [16:59:21] new dell c series [16:59:37] I mean what will it be doing, sorry :-D [16:59:40] basically the same thing as the R510 but with different series of server, more linux and open source friendly [16:59:46] yay for that [16:59:47] swift storage [16:59:51] good! [17:00:04] as soon as the sopa deployment crisis is over I gotta look at ms5 storage again [17:00:55] this server will also be poked at by me for a solid hour before ben gets it [17:01:01] since i wanna run the c series through some tests [17:01:14] it seems like it will be a better fit, on paper, from the R series we use now for everything [17:01:20] I'm definitely c-series curios [17:01:24] curious [17:01:34] more standard chipsets [17:01:39] cheaper systems by a slight margin [17:01:46] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.174 seconds [17:01:49] due to the stripping out of the r series management console stuff we dont use [17:01:52] and other things [17:02:03] mostly all the windows deployment crap isnt supported in c series [17:02:07] which we dont care about anyhow. [17:02:16] no we sure don't [17:02:40] mark pointed out that the storage controller and such that the C series is coming with is what rackspace uses [17:02:52] which is cool, since we want to do similar things for storage [17:03:22] sweet [17:11:23] robh: pmtpa-d1 is ready for relocating locke tomorrow. [17:11:40] awesome [17:24:51] PROBLEM - Squid on brewster is CRITICAL: Connection refused [17:35:17] !log also upgraded drac firmware on mw1081 & mw1099 (fixes mgmt console problem) [17:35:19] Logged the message, Master [17:45:57] hey we're getting watchmouse alerts on wikimedia blog still ? 
i'm seeing a lot of [Tue Jan 17 17:45:40 2012] [apc-warning] Unable to allocate memory for pool. in /srv/org/wikimedia/blog/wp-settings.php on line 70. [17:46:11] yeah [17:46:16] we're all looking at it, [17:46:19] should we try having multiple hosts, either lvs them or dns round robin for at least some stuff ? [17:46:20] !log aware of blog slowdowns, work is being done [17:46:21] Logged the message, RobH [17:46:23] it needs pagination, guillom's working on it [17:46:23] okay, i'm behind the times [17:46:23] sorry [17:46:26] no worries [17:46:32] LeslieCarr: its due to comment stuff and our theme not supporting it [17:46:46] but core does, so guillom_ is working on fixing the theme [17:46:47] it's about having 4800 comments download with each page view (and having a flood of new comments coming in) [17:46:52] ah [17:47:00] yea, this has more comments than all other blog postings combined. [17:47:06] sorta screws the cache :-D [17:51:55] my alarm came on npr this morning and what's the first thing i hear? wikipedia blackout :) [17:52:04] :-) [17:59:04] robh: question regarding srv199....the SATA port A is showing not available but during post i see that the HDD type and size is being recognized. [17:59:27] I opened the case back up and reseated the daughter card and checked cable ends. ....any thoughts? [17:59:49] gimme moment, middle of something else [18:00:19] take your time [18:00:22] cmjohnson1: showing not available where? [18:01:28] in post and in configuration [18:01:45] ok so it has an error about it? 
[18:02:02] i do not know what you mean 'showing unavailable' really, sorry [18:02:03] not really an error...just says SATA port A not available [18:02:06] ok [18:02:12] well, if its turned on in bios [18:02:21] and sees disk, it sounds really funky [18:02:37] sounds like bad controller, which on those i think means bad mainboard, i dont recall [18:04:48] i am going to call Dell [18:04:56] just wanted to see if you had any ideas [18:05:25] !log theme updated on blog along with setting limit back to 20 comments per page [18:05:27] Logged the message, RobH [18:05:28] guillom: ^ [18:05:37] oh thats much faster [18:05:37] ok, checking now [18:05:41] !log blog is instantly faster [18:05:43] Logged the message, RobH [18:05:51] :D [18:05:55] still slower than it should be [18:06:10] but getting better it seems to me [18:06:16] hmm 10 per page now [18:06:31] I'm logged in, should try it logged out [18:06:49] i see that, odd [18:07:17] apergos: last comment first is why [18:07:24] so the last 'page' holds up to 20 [18:07:25] but may have less [18:07:58] hmm last comment first means after every 20 ? comments the cache is invalidated? [18:08:04] for the other pages [18:08:11] after every comment. [18:08:16] it recaches the entire page i imagine [18:08:25] meh [18:08:33] guillom: whatcha think? [18:08:39] should we display first comment page by default [18:08:41] or last? 
[18:09:06] each page has the number of comments [18:09:17] so it's not going to help, that number changes with every comment approved [18:09:29] ah, I don't know; I think it's ok to list them chronologically if it's better for caching [18:09:30] anyways it's better, people will have a much smaller thing to regenerate and load [18:09:33] i dont think the cache invalidating is an issue [18:09:40] the performance is now just fine [18:09:43] imho [18:09:45] yup [18:09:56] so caching is no big deal, or the invalidation thereof [18:10:04] it was loading every comment for every direct article link [18:10:07] that was killing shit [18:10:23] i bet the blog could have continued to sit on hooper with the proper code thats now in place [18:10:27] ;] [18:11:19] the load just dropped to near nothing, heh [18:11:38] 1.17 [18:11:54] was over 10 at points before [18:13:13] well now you can relax :-S [18:13:15] :-D [18:23:16] RobH: what's ms9? [18:23:28] the new c series host, due in on monday [18:24:52] oh. I thought you were talking about two separate hosts (c series and ms9, one due thurs and the other mon). [18:24:56] this makes more sense. [18:25:06] i got shipment notification earlier [18:25:12] but i thought it would be here thursday [18:25:13] i was wrong [18:25:15] though to match the ms-fe$ hosts, should it be ms-be#? [18:25:19] its monday, tracking info is updated now [18:25:35] oh, figured ms was backend by default since we have the 45XX series as them [18:26:00] but we can rename them as they deploy into swift service i suppose (when they reinstall)? [18:26:27] all i know is mark hates renaming so I'd rather get it right first. I'm fine to rename them anytime. [18:29:23] so which ms servers are acting as swift storage hosts now? [18:30:43] ms-store1, ms-proxy1 (just brainstorming) [18:31:09] where will the other rings/servers live? 
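Stepping back to the comment-pagination fix logged above: a back-of-the-envelope sketch of why capping pages at 20 comments helped so much. The 4800-comment count comes from the chat; the average rendered size per comment is an assumption for illustration only.

```python
# Rough cost model for rendering one post's comment section, before and
# after pagination. COMMENTS is from the chat log; AVG_COMMENT_BYTES is
# a hypothetical figure, not measured on the blog.
COMMENTS = 4800
PER_PAGE = 20                 # the limit the theme fix restored
AVG_COMMENT_BYTES = 2 * 1024  # assumed ~2 KB of rendered HTML per comment

before = COMMENTS * AVG_COMMENT_BYTES   # every comment rendered on every view
after = PER_PAGE * AVG_COMMENT_BYTES    # only one page's worth rendered

print(f"before: {before / 1024:.0f} KB per page view")
print(f"after:  {after / 1024:.0f} KB per page view")
print(f"reduction: {before // after}x less work per regeneration")
```

Whatever the true per-comment size, the ratio is fixed at 4800/20 = 240x less rendering (and cache-regeneration) work per view, which matches the load on marmontel dropping from over 10 to 1.17.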
[18:31:13] well, in the past it was ms# [18:31:17] and we have those other places [18:31:20] right [18:31:23] now we also have ms-fe# [18:31:31] i kinda hate the - now that i have it [18:31:44] container, account servers will be where? [18:32:13] ? [18:33:49] RobH: swift has at least 4 different kinds of servers. I know some will be separate pools (proxy vs. object storage). idk where the other 2 types i can think of offhand (container, account) will live [18:34:04] my understanding is we will have frontend and backend [18:34:11] storage bricks being backend [18:34:20] RobH: (they could all be on every box but it was decided that proxy and storage would be different boxen) [18:34:22] but maplebed is setting it up [18:34:42] so renaming servers that are deployed is hell [18:34:44] and we dont do it [18:34:53] so i assumed we would keep ms# for the storage hosts [18:34:58] and ms-fe for the frontends [18:35:22] we can rename ms# swift storage hosts, but means reinstall and such (or someone digging in the files on the local host for renaming, which is painful) [18:35:26] so we dont do the latter [18:35:32] jeremyb: proxy gets its own host. object, account, and the other one will all live on the same backend storage nodes. [18:35:42] so we'll only use 2 different servers. [18:35:54] maplebed: on all backends or only some? [18:35:59] all. [18:36:01] k [18:36:08] (we're only starting with 4 backend nodes, so ...) [18:36:12] right [18:36:33] i expect more nodes soon (especially for a clone in another DC [18:36:35] ) [18:36:36] maplebed: for the existing ones that are online, we have some right? [18:36:46] for those, would you handle the renaming or reinstall on them? [18:36:55] cuz it means repuppet, and all that [18:37:10] jeremyb: I was intending to use the cluster syncing stuff for the other DCs rather than extending one cluster into multiple locations. 
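The layout maplebed describes above (proxy servers on dedicated frontends; object, container, and account servers colocated on every backend storage node) can be sketched as a simple mapping. The host names and the number of frontends here are illustrative assumptions, not the real allocation; only "ms-be1" and the ms-fe# naming appear in the log.

```python
# Hypothetical swift service layout matching the split described in the
# chat: proxies on their own hosts, the three storage ring services on
# each of the four planned backend nodes. Host names are illustrative.
frontends = ["ms-fe1", "ms-fe2"]
backends = ["ms-be1", "ms-be2", "ms-be3", "ms-be4"]

layout = {host: ["proxy"] for host in frontends}
layout.update({host: ["object", "container", "account"] for host in backends})

for host, services in sorted(layout.items()):
    print(f"{host}: {', '.join(services)}")
```

Only two server roles exist ("2 different servers", as maplebed puts it), which is what makes the ms-fe#/ms-be# naming scheme sufficient.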
[18:37:17] i dont wanna name half of them one thing, and half the other, thats even more confusing (the storage hosts) [18:37:32] so if we are going to rename the existing storage hosts and do the reinstall on them and such, thats fine [18:37:34] we can rename them [18:37:35] RobH: we're not using any existing hardware in the to-be-built production cluster. [18:37:39] maplebed: yeah, well i was just checking the current status on relevant features [18:37:41] oh, ok [18:37:44] then thats fine [18:37:47] we dont have to call this ms9 [18:38:04] the ms-fe boxes recently arrived (so I guess they're existing, but theyre new) [18:38:07] i thought we had existing ms hardware allocated [18:38:16] jeremyb: yeah, that. I don't think syncing is in the version I'm currently working with. [18:38:20] right, and they arent really in use if i recall [18:38:21] RobH: only for the testing cluster. [18:38:25] so we can change the name on those now as well [18:38:36] maplebed: ok, and test cluster will migrate back to other use after this is done right? [18:38:46] jeremyb: but it'll be a few months before I set up the second cluster and need it, so the newest version will likely have it by then and we'll upgrade. [18:38:50] RobH: correct. [18:38:52] cool [18:39:02] I'm going to build a new test cluster in labs (cuz we do need somewhere to test) [18:39:06] maplebed: so what do you want to call them? [18:39:12] ms-be# [18:39:26] happy enough with the ms-format? [18:39:32] maplebed: https://blueprints.launchpad.net/swift/+spec/cactus-multi-region [18:39:42] bookmarked. thank [18:39:44] thanks. [18:40:13] ok, I gotta bail to cross the bay. see you in a few. [18:40:27] bye [18:40:32] * jeremyb also bails [18:40:58] maplebed: tickets updated with ms-be1 [18:41:06] \o/ [18:44:58] Jeff_Green: you there? [18:45:03] yep [18:45:20] the fundraising nrpe checks are no longer in use, correct? 
[18:46:09] the junk that's in puppet/files/nagios/nrpe_local.fundraising.cfg [18:46:11] they're not used on silicon or payments*, but I believe they're still in use on grosley/aluminium/erzurumi [18:46:25] looking [18:46:43] also, holy shit spence is so slow.... [18:46:49] oh yes [18:47:20] I'm not seeing any of them in aluminium.cfg [18:47:28] this is stuff that would run in theory on aluminium [18:47:45] I believe that it was for grosley and aluminium [18:47:53] yeah it'd be same for both [18:48:19] fwiw I did not attempt to change any of this [18:48:34] at least not that I recall [18:48:45] I think we talked about this at one point, and I asked if you could do away with it [18:49:05] it was a number of subdirectory checking script [18:49:14] for jenkins [18:49:20] ya maybe so? I approached the problem the other way around--made a cron script to keep the subdir count sane [18:49:36] gotcha [18:49:36] we [18:49:51] I'm fine with you ripping this out if you want [18:50:08] well, it's looking to me like these checks are no longer in use, so I was going to delete nrpe_local.fundraising.cfg out of git [18:50:14] k [18:50:25] but wanted to at least check in first [18:50:42] cool. I shall do so without fear [19:09:51] notpeter: cool thx [19:10:59] unrelated question: I'm (finally) about to start imaging the new eqiad payments boxes and I'm stuck on whether to rename them first. the existing boxes are payments[1-4] and the allocated new boxes are selenium, bromine, etc. what to do what to do? [19:11:47] are we moving away from function-based hostnames? [19:12:02] Jeff_Green: that is the million dollar question ;) [19:12:36] damn. [19:12:40] isnt this like they have hostnames (bromine, etc..) AND service names? 
(function based) [19:13:14] the old ones do not have 'other' names [19:13:39] but yeah they're accessed from the public via payments.wikimedia.org and LVS [19:16:00] Jeff_Green: I think that usually if it's a cluster, it gets named after the service in some way [19:16:01] keep the hostnames and then also add DNS aliases payments1001 - payments1004? (it seems the scheme is to start counting from 1000 in eqiad, right) [19:16:34] whereas if it's a one-off service that hosts something that's accessible via a c-name, we use the misc names [19:16:54] i see [19:17:02] I'd actually say rename to payments1001-1004 [19:17:03] so yeah then rename does make sense [19:17:07] but I'd also ask rob [19:17:11] ok [19:17:12] they'll also need to be relabeled [19:17:23] both in racktables and irl [19:17:34] yup [19:46:57] RobH - we need to postpone tomorrow's work on Locke till a later date [19:47:31] woosters: ok, any reason? [19:48:07] because they want to make sure they capture the banner logs [19:48:10] for tomorrow [19:49:04] so anytime after tomorrow is fine. I will inform faulkner, nimish and erikZ [19:55:22] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:57:47] New patchset: Pyoungmeister; "cleanup. removing a nrpe.cfg that's no longer used." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1931 [19:58:04] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1931 [20:02:19] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1931 [20:02:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1931 [20:04:29] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [20:07:22] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.835 seconds [20:25:05] New patchset: Bhartshorne; "added new SOPA filter to emery" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1932 [20:25:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1932 [20:25:58] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1932 [20:25:59] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1932 [20:29:25] en.planet updates fixed [20:44:31] hey [20:55:15] hi [21:04:32] mutante: when are you flying? [21:05:19] mark: i'm not :/ [21:05:33] oh [21:05:39] wasn't it you who texted? [21:05:58] no, the other Daniel, WMF DE [21:06:03] but i talked to him and nosy [21:06:22] can you go to evoswitch as well on friday? [21:06:50] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 167 MB (2% inode=60%): /var/lib/ureadahead/debugfs 167 MB (2% inode=60%): [21:07:28] mark: yea, i was going to offer that in case you'd like me to, but they think they dont need anybody, and just access [21:07:47] since when is that up to them? 
[21:08:57] they either need an escort from the dc (fine for just swapping a disk or something similar short), or one of us present [21:13:54] yep, will talk to nosy again tomorrow [21:26:31] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [21:28:22] RECOVERY - Disk space on srv223 is OK: DISK OK [21:57:43] New patchset: Jgreen; "adding file_mover@emery to logmover account class (used on storage3)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1933 [21:57:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1933 [21:58:16] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1933 [21:58:16] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1933 [22:09:16] ok, didnt get lunch, delivery place fubared my order, afk a bit [22:16:30] RECOVERY - Puppet freshness on spence is OK: puppet ran at Tue Jan 17 22:16:16 UTC 2012 [22:20:00] New patchset: Ryan Lane; "Adding support to modify memcached's bind ip, and adding memcached to marmontel" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1934 [22:20:07] maplebed: ^^ review? [22:20:13] sure. [22:20:14] thanks [22:20:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1934 [22:21:15] New patchset: Asher; "prep for throwing varnish in front of single server blog" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1935 [22:21:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1935 [22:21:51] I really wish we could switch our default memcached port to 11211 (its default) rather than 11000... 
::sigh:: [22:22:21] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1935 [22:22:22] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1935 [22:22:52] Ryan_Lane: will 2G be enough memory for the cache? [22:23:03] I think so, yeah [22:23:11] It looks like marmontel has ~6 available. [22:23:20] varnish is going to eat some too [22:23:28] and we need to leave a decent amount around for apache [22:23:31] oh, you're running varnish on the same host? [22:23:35] yeah [22:23:39] huh. [22:23:49] I don't think we have time to run this through the cluster [22:23:50] I would have figured we'd separate them. [22:24:01] not the cluster, just a dedicated host. [22:24:08] but meh - I bet it'll be fine. [22:24:20] varnish will offload most of the cpu. [22:24:23] yeah, the amount of stuff that actually needs to be cached is pretty tiny over all [22:24:29] it should all fit in memory [22:24:40] PROBLEM - Puppet freshness on db1045 is CRITICAL: Puppet has not run in the last 10 hours [22:24:44] the only issue is comments purging the cache [22:25:08] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/1934 [22:25:14] Ryan_Lane: +1 commit. [22:25:23] cool. thanks [22:25:41] Ryan_Lane: are you setting a timeout on the cache? I think if it's 1 or 2 minutes, it'll take 90% of the load and comments will show up quickly enough. [22:25:43] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1934 [22:25:43] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1934 [22:25:51] it purges [22:26:08] I may be able to set expires and not purge, though [22:26:24] * maplebed looks at comment throughput [22:27:16] a sample of 2 minutes gives an average of 8 comments per minute. [22:27:28] setting a cache timeout of 1m might be nicer. 
[22:28:08] even if it is purging the cache 8 times per minute, that's probably still catching thousands of requests, so we win either way. [22:30:27] access.log shows 900 hits/min, of which 8 are comments. So we get ~100 cache hits for every purge, in theory. [22:32:39] New patchset: Asher; "reorg probes to prevent error on unused bits probe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1936 [22:32:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1936 [22:33:00] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1936 [22:33:01] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1936 [22:41:22] New patchset: Asher; "blog: swap varnish and apache between ports 80, 81" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1937 [22:42:05] Ryan_Lane: just to double check.. the apache vhost change in ^^^ is all that's needed to move apache to 81 and not listen at all on 80? 
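The cache-hit arithmetic maplebed walks through above (900 hits/min from the access.log sample, ~8 comments/min, each approved comment purging the cached page) can be sketched as:

```python
# Estimate how many cache hits varnish serves between purges, using the
# figures quoted from the access.log sample in the chat.
hits_per_min = 900
purges_per_min = 8   # each approved comment purges the cached page

hits_per_purge = hits_per_min / purges_per_min
print(f"~{hits_per_purge:.0f} cache hits served for every purge")
```

The exact ratio is 112.5, on the order of the "~100 cache hits for every purge" estimate in the log; either way, the cache absorbs the overwhelming majority of requests even with purge-on-comment enabled.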
[22:43:48] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1937 [22:43:49] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1937 [22:47:09] !log ram only varnish instance now running on marmontel in front of apache/wordpress [22:47:11] Logged the message, Master [22:48:49] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/1918 [22:49:31] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1918 [22:49:31] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1918 [22:55:43] * Jamesofur thanks binasher for the help with the blog improvements :) [23:18:58] Ryan_Lane: on marmontel -- curl -I 'http://127.0.0.1/2012/01/16/wikipedias-community-calls-for-anti-sopa-blackout-january-18/comment-page-265/#comments' === 404 [23:28:35] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1908 [23:28:35] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1922 [23:28:36] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1908 [23:48:17] I'm out, see folks tomorrow