[00:00:01] df -h
[00:01:12] it should be at /user/local/search/ls2/bin
[00:01:30] should I sudo ant ...?
[00:01:33] /usr/local?
[00:01:38] yes
[00:01:47] if it's in /usr, it requires root
[00:02:09] ok
[00:05:05] where should LocalSettings.php be located?
[00:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[00:13:09] ok it's indexing
[00:13:31] I hope it won't run out of space
[00:17:38] hi OrenBochman
[00:19:05] hi
[00:19:09] wassup
[00:20:09] sumanah: do you have petan and notpeter's email?
[00:20:29] OrenBochman: yes, I think so, just a moment
[00:20:40] OrenBochman: you may find http://toolserver.org/~krinkle/wikimedia-svn-users.php useful as a directory
[00:21:01] OrenBochman: they're both listed in there
[00:21:30] sumanah: we need to coordinate some work tomorrow
[00:22:26] looks like search-test is indexing
[00:24:49] OrenBochman: in my opinion you should email wikitech-l to update the dev community on what you're up to and what your needs are
[00:24:56] hexmode: did you already talk with OrenBochman about what you need?
[00:25:42] sumanah: OrenBochman and I talked. He is working with petan and (I think) notpeter
[00:25:44] we talked - we will need assistance from notpeter
[00:26:08] OrenBochman: let me see if he is online
[00:26:23] i am
[00:26:55] :) Well, I pinged him in -operations :)
[00:27:02] thanks
[00:27:23] he isn't around. I pinged him an hour or so ago
[00:27:31] hrm
[00:28:00] Ryan_Lane: guess I'll have to catch him tmw AM then?
[00:28:06] I guess so
[00:28:15] OrenBochman: when will you be around?
[00:28:49] OrenBochman: is there anything I can ask notpeter to do if you're not around when he is available?
[00:29:27] I'll write him an email
[00:29:34] can I cc you as well
[00:30:22] OrenBochman: pm'd my email ;)
[00:32:30] ok
[00:33:04] what can I call this project - search deployment in ....
[00:34:00] are you based in Europe?
[00:34:03] That sounds fine to me
[00:34:13] I'm on the East Coast USA
[00:34:29] so UTC+5 ?
[00:34:36] in Budapest
[00:34:38] * hexmode has to go look
[00:34:45] it's really late here
[00:35:30] yeah, time for bed. We can pick this up tomorrow and hope notpeter shows up
[00:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[00:36:45] how can I keep a command running even after I disconnect
[00:37:44] OrenBochman: Use screen
[00:38:21] screen
[00:40:17] clear
[00:42:53] hexmode: is there going to be a deployment of the dumping
[00:44:16] OrenBochman: petan and johnduhart have been managing imports. I think they're only taking a small % of the pages ... except for simple
[00:45:20] yep
[00:45:58] I mean the machine that runs the wikipedia dumps - from the database - is that going to be in the deployment
[00:46:21] deployment-sql?
[00:46:23] yes
[01:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[01:25:55] i saw a mention from petan...
[01:26:05] back in ~10 mins
[01:33:47] got a problem
[01:34:10] when running service apache2 restart I get
[01:34:12] * Restarting web server apache2
[01:34:13] (13)Permission denied: make_sock: could not bind to address 0.0.0.0:80
[01:34:15] no listening sockets available, shutting down
[01:34:16] Unable to open logs
[01:34:18] ...fail!
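For the "keep a command running after I disconnect" question above, jeremyb's screen suggestion looks roughly like this as a shell sketch; the session name is illustrative, but the commands are standard GNU screen:

    # start a named session and launch the long-running indexer inside it
    screen -S indexing
    # ... start the job, then detach with Ctrl-a d; logging out won't kill it ...
    # later, from a fresh ssh login, list sessions and reattach
    screen -ls
    screen -r indexing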
[01:34:30] jeremyb: could you have a look at why apache won't restart
[01:34:53] btw search is working on search-test
[01:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[01:37:18] ok, here now
[01:37:27] * jeremyb was on a bus :)
[01:37:33] ;-)
[01:37:59] OrenBochman: where was that apache issue? search-test?
[01:38:04] I've been on a bus, a tram, a trolleybus, a train and in my car
[01:38:54] I modified the LocalSettings to work with search
[01:39:11] 01:34:13 < OrenBochman> (13)Permission denied: make_sock: could not bind to address 0.0.0.0:80
[01:39:17] where is that? search-test?
[01:39:37] yes
[01:40:03] were you running it with sudo?
[01:40:07] yes
[01:40:17] anyway, worked for me...
[01:40:18] $ sudo service apache2 restart * Restarting web server apache2 ... waiting ...done.
[01:40:43] works now
[01:41:21] but I still get results from the database, not from the lsearchd
[01:41:53] but I've tested lsearchd with curl and it works
[01:41:55] let me see
[01:41:58] arrrrr
[01:42:20] TLAPD?
[01:42:29] ?
[01:43:05] I made a copy of LocalSettings before I messed with it
[01:43:43] it's LocalSettings.php.bak
[01:44:11] OrenBochman: https://en.wikipedia.org/wiki/TLAPD ;)
[01:44:18] OrenBochman: anyway, looking
[01:46:06] hilarious
[01:48:53] Ryan_Lane: Does the nova database persist over a reboot? (That seems like a silly question, but mine does not. I suspect it's because devstack is wiping it clean...)
[01:49:12] devstack is wiping yours clean
[01:49:19] it normally persists
[01:49:20] ok, i won't worry about that then :)
[01:49:29] and so does the amqp
[01:49:52] * Ryan_Lane is reading about the amqp right now :D
[01:50:53] I think I knew that queue messages were supposed to persist over reboots but it still strikes me as unlikely.
[01:50:55] I need to move virt1 into the cluster, and make a virt0 that will replace it
[01:51:02] heh
[01:51:16] amqp supports persistent queues and non-persistent queues
[01:51:38] I think most if not all of the openstack queues are persistent
[01:53:30] hmm. rabbitmq supports ldap
[01:53:42] that could be interesting for use inside of instances
[01:53:45] for auth?
[01:53:49] yeah
[01:54:20] is firewall not enough?
[01:54:54] how would you avoid users messing up other users' queues?
[01:55:31] i'm not seeing where it would be used
[01:55:41] this would be for a user to make a queue in one project, and pick up messages in another
[01:55:51] this is for e.g. jobqueue AAS?
[01:55:55] for instance, I'd love to have a unified irc bot
[01:56:12] where an IRC bot runs in one project, but can pick up messages from everywhere
[01:56:24] and I can stick a message in from anywhere
[01:56:58] it sucks to need to run a different bot for every single ircecho we need
[01:57:15] sure
[01:57:25] if we had a queue, all the ircecho daemons could stick messages into a queue, and one bot could spit out the messages
[01:57:30] btw, only marginally related but have you seen pubsubhubbub?
[01:59:20] * jeremyb sees tcp6 0 0 :::8123 :::* LISTEN 15915/java
[01:59:45] that's the search
[01:59:48] lots of ports it's listening on
[01:59:56] but i guess that's the nature of java...
[02:00:03] (profiler attaching port, etc.)
[02:00:18] it's listening on 8123
[02:01:41] I've got to sleep
[02:01:47] night
[02:01:50] jeremyb: yeah, I've seen pubsubhubbub
[02:01:52] layla tov (good night)
[02:02:10] 10x
[02:02:59] btw it could be a puppet problem
[02:03:11] perhaps we need to open the port on localhost
[02:03:46] i don't follow
[02:03:51] it's all on one box, right?
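A quick way to re-check what the last few messages establish - that lsearchd owns port 8123 locally and is reachable over loopback without any firewall in the path. The netstat flags are standard; the curl path mimics the "tested lsearchd with curl" check mentioned above, and the database name in it is illustrative:

    # confirm the daemon is the thing bound to 8123
    sudo netstat -tlnp | grep 8123
    # hit it over loopback, bypassing apache/MediaWiki entirely
    curl -sv 'http://localhost:8123/search/simplewiki/main_page' | head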
[02:03:58] yep
[02:04:52] we should consult with Ryan_Lane
[02:05:04] i think i've got it for now
[02:05:12] eh?
[02:05:18] what's the issue?
[02:05:31] can we have a firewall issue
[02:05:45] between instances in the same project?
[02:05:50] with a port on localhost
[02:05:55] or between instances in different projects?
[02:05:55] no
[02:05:58] same machine
[02:06:00] ok
[02:06:02] there are no firewalls in labs
[02:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[02:06:13] excluding the security groups
[02:06:24] and that's between projects
[02:06:28] ok
[02:06:29] well, and the outside world
[02:06:36] but it's not even hitting the switch
[02:06:39] for loopback
[02:06:49] but in-project all incoming traffic is allowed
[02:07:09] or even the host. it's all within the guest
[02:08:08] ok, just checking - once upon a time I helped write an application firewall that blocked everything and its mother
[02:08:39] heh
[02:08:56] ok I'll look at it tomorrow if you don't figure it out
[02:10:28] did it do genealogical research?
[02:10:33] was it Mormon?
[02:17:24] here we go...
[02:17:25] Creating titlekey table...ok.
[02:17:25] Populating titlekey table...
[02:19:45] and it's hitting the search daemon
[02:21:46] !log search search-test: added to LocalSettings.php: error_reporting( E_ALL ); ini_set( 'display_errors', 1 ); $wgShowExceptionDetails = true; $wgShowSQLErrors = true;
[02:21:47] Logged the message, Master
[02:22:16] !log search search-test: SQL error for missing titlekey DB. ran update.php and it showed up
[02:22:17] Logged the message, Master
[02:22:52] !log search search-test: MWSearch now seems to be working (or at least it's definitely being used)
[02:22:53] Logged the message, Master
[02:25:44] Ryan_Lane: i'm confused. we allow anon edits on labsconsole?
[02:25:49] (see RC)
[02:26:13] also, Labslogbot should have a bot flag
[02:26:32] hashar created it
[02:26:43] created what?
[02:26:50] for some reason mediawiki fucked up and thought the user's username was an IP
[02:27:02] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000000db
[02:27:07] that was the page created by the IP
[02:27:19] yeah...
[02:27:27] notice the instancecreator value
[02:27:49] i do now...
[02:31:29] this is either an OSM bug, or a MW bug
[02:31:47] at first I thought it was an LdapAuthentication bug, but the session issue I was having was a MW bug, and that's been fixed
[02:32:05] I thought that would also fix this bug, but apparently not
[02:32:16] maybe I'm somehow messing up wgUser
[02:33:07] the really funny part is that the instancecreator_username variable is set using wgUser
[02:33:21] so I don't know how MW isn't getting the right value
[02:33:59] damnit, i keep thinking that's openstreetmap
[02:34:03] heh
[02:34:47] aude: ^^^
[02:34:54] we can share the pain :)
[02:35:30] Ryan_Lane: consider HOT officially on your project naming blacklist
[02:35:41] what's HOT?
[02:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[02:36:10] http://hot.openstreetmap.org/
[02:36:15] heh
[02:39:54] ughhh wtf
[02:40:01] ?
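The titlekey fix from the !log entries above, replayed as shell steps. The debug settings are exactly the ones logged; the printf form just avoids quoting trouble, and the script is run from whatever MediaWiki checkout search-test uses:

    # surface PHP and SQL errors while debugging
    printf '%s\n' \
      "error_reporting( E_ALL );" \
      "ini_set( 'display_errors', 1 );" \
      '$wgShowExceptionDetails = true;' \
      '$wgShowSQLErrors = true;' >> LocalSettings.php
    # run the schema updater; this is what created the missing titlekey table
    php maintenance/update.php --quick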
[02:40:06] johnduhart@deployment-web:/usr/local/apache/common$ php bin/sql.php
[02:40:06] PHP Warning: require_once(/usr/local/apache/common/wmf-config/DectectSite.php): failed to open stream: No such file or directory in /usr/local/apache/common/bin/sql.php on line 24
[02:40:06] PHP Fatal error: require_once(): Failed opening required '/usr/local/apache/common/wmf-config/DectectSite.php' (include_path='.:/usr/share/php:/usr/share/pear') in /usr/local/apache/common/bin/sql.php on line 24
[02:40:12] the file is there
[02:40:22] The website wouldn't work without it
[02:40:27] why are you running that directly?
[02:40:42] ?
[02:40:43] sql.php? that's a fairly normal thing to run
[02:40:46] is it?
[02:41:04] Ryan_Lane: It takes sql files and applies them
[02:41:06] I thought there was a maintenance script for that
[02:41:11] yes
[02:41:15] that *is* the maint script
[02:41:24] That's the script with a modification to make it work
[02:41:30] ah. ok
[02:41:40] deployment-test?
[02:41:44] Yes
[02:41:45] No
[02:41:48] deployment-web
[02:42:00] oh
[02:42:10] well...
[02:42:11] stat /usr/local/apache/common/wmf-config/DectectSite.php
[02:42:11] stat: cannot stat `/usr/local/apache/common/wmf-config/DectectSite.php': No such file or directory
[02:42:12] I'm supposed to run maintenance scripts from -test
[02:42:15] huh
[02:42:31] jeremyb: Fine here
[02:42:42] on -web?
[02:42:46] yes
[02:42:55] that seems impossible...
[02:43:19] johnduhart: oh, the problem is an extra c
[02:43:20] Also fine on -test
[02:43:29] AHHHH
[02:43:43] * johnduhart throws computer out window
[02:43:54] please don't
[02:44:04] * aude *waves*
[02:44:12] hey aude
[02:44:29] aude: please register a complaint re the naming of OSM, kthx :)
[02:45:01] :)
[02:50:50] petan: OrenBochman: i still didn't read scrollback yet but yes, feel free to cc me on Search mails
[02:50:53] night everyone. i'll be on less for the next couple days. be sure to say my name if there's something i should see :)
[02:53:46] !log bastion test
[02:53:46] Logged the message, Master
[02:54:10] test failed
[02:56:46] !log bastion test
[02:56:47] Logged the message, Master
[02:57:25] damn it
[02:58:45] !log bastion test
[02:58:46] Logged the message, Master
[02:59:06] yay
[03:00:30] no more broken SAL
[03:01:06] yay
[03:01:09] interwiki works
[03:04:11] !log deployment-prep interwiki now works
[03:04:11] Logged the message, Master
[03:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[03:10:08] Think I've found our slowness issue
[03:10:28] SELECT * FROM `wikiset` WHERE ws_id = '1' LIMIT 1
[03:10:35] At least 60 queries of that
[03:13:07] !r 38929
[03:13:07] r38929 is http://www.mediawiki.org/wiki/Special:Code/MediaWiki/38929
[03:14:08] wow. rabbitmq is fucking stupid
[03:14:13] oh?
[03:14:26] you can *never* change the rabbitmq servername
[03:14:33] lols
[03:14:48] that is the most full-of-fail thing ever
[03:15:18] "The Erlang Mnesia database is host specific (because it is a distributed DB). The simplest way to get you fixed is to clear out the database dir. That's assuming that you can stand losing any persisted exchanges etc. The persisted messages are also stored in that folder, but not in Mnesia.
[03:15:18] If you need to keep any persisted exchanges, queues, message etc. then that is a bit harder."
[03:15:25] I only have like 3 rabbitmq servers, it's a bit of a pain to maintain.
[03:15:43] haven't they ever heard of fucking host IDs?
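The workaround in the advice quoted above, spelled out. It resets the host-specific Mnesia database so a renamed node can start again, at the cost of every persisted queue, exchange and message; the data directory is the Debian/Ubuntu default:

    sudo service rabbitmq-server stop
    sudo rm -rf /var/lib/rabbitmq/mnesia/*   # wipes persisted state, per the quote
    sudo service rabbitmq-server start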
[03:15:55] this really makes me want to stab
[03:16:15] lol
[03:16:25] well, when I move this, I'll be making a virt-rabbit1.wikimedia.org alias
[03:16:40] and pointing at that from now on
[03:16:43] virtual rabbits!
[03:16:58] or, better, virtqueue1
[03:18:43] * johnduhart slaps Reedy
[03:18:48] https://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CentralAuth/WikiSet.php?r1=65348&r2=71290
[03:26:22] !log deployment-prep svn up'd live
[03:26:23] Logged the message, Master
[03:30:54] Updating databases
[03:31:34] !log deployment-prep Updated databases
[03:31:34] Logged the message, Master
[03:31:56] I love how there's an entry in all.dblist for arwiki but there's no database for it
[03:32:43] !log deployment-prep Last update solves an issue where CentralAuth would make 70+ queries per page
[03:32:44] Logged the message, Master
[03:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[03:36:43] Ryan_Lane: what's the "GPS storage and retrieval" project about?
[03:36:52] dunno
[03:37:05] some mediawiki project a dev is working on
[03:37:49] I believe it's adding some semantic support for that attribute in MW
[03:38:03] so that paged with geocoordinates can be found in a programmatic way
[03:38:06] *pages
[03:38:18] hmm... i know about wikidata but
[03:38:25] this is a different project
[03:38:34] i know they are doing coordinates stuff with the mobile app
[03:38:45] I think that's the main motivation for it
[03:38:49] but ought to be working together?
[03:38:50] I think yuvi is working on it
[03:38:58] well, this is a short-term project
[03:39:06] wikidata may never see the light of day for all we know
[03:39:11] its goals are way more ambitious
[03:39:35] well, geocoordinates is one place to start on wikidata
[03:39:41] *important*
[03:40:20] they are looking at implementing SMW for wikidata, last I heard
[03:40:43] geocoordinates support for mobile is planned to be that single thing
[03:40:49] maybe wikidata can transition it later
[03:40:52] nah, not sure SMW is the answer for wikidata and
[03:41:02] other options are being considered
[03:41:03] just repeating what I heard :)
[03:41:16] s/heard/read in an email/
[03:41:28] i'll talk to yuvi maybe
[03:41:52] I'm pretty sure they plan on having this done within months
[03:42:04] wikidata isn't even going to be set up for months
[03:42:48] I doubt they'll even have something demoable by the time mobile would have this live
[03:43:10] yeah, but
[03:43:32] wikidata and SMW are the eternal cookie lickers for anything semantic going in
[03:44:10] some coordination might be nice, and having interwiki links + geocoordinates more accessible via api and such are reasonable
[03:44:41] objectives, in the shorter term
[03:44:43] I have a feeling it'll likely just derail the mobile stuff. meh. not my project though :)
[03:45:33] or duplicate efforts but i hope not
[03:46:18] i'll hop on wikimedia-mobile and learn more
[03:56:12] cool
[03:56:56] looks like maxsem is working on it
[04:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[04:17:26] Ryan_Lane: Still around?
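One way to make Ryan_Lane's virtqueue1 alias idea from earlier in this exchange stick: pin the Erlang node name to the alias so it never follows the machine's hostname again. RABBITMQ_NODENAME is a real knob (set as NODENAME in rabbitmq-env.conf); the alias value is the hypothetical one from the conversation, and this has to be done before the node first writes its Mnesia data:

    echo 'NODENAME=rabbit@virtqueue1' | sudo tee -a /etc/rabbitmq/rabbitmq-env.conf
    sudo service rabbitmq-server restart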
[04:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[04:37:27] !log deployment-prep unmounted deployment-nfs-memc:/mnt/export on /mnt from deployment-web
[04:37:28] Logged the message, Master
[05:00:35] !log deployment-prep Created nfs export /mnt/upload on deployment-nfs-memc
[05:00:36] Logged the message, Master
[05:00:48] !log deployment-prep Mounted that export onto -web
[05:00:48] Logged the message, Master
[05:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[05:11:32] !log deployment-prep Made a quick stab at upload config
[05:11:33] Logged the message, Master
[05:11:49] !log deployment-prep Adding apache config for upload.deployment.wmflabs.org
[05:11:50] Logged the message, Master
[05:30:39] !log deployment-prep Installed imagemagick on web
[05:30:40] Logged the message, Master
[05:32:12] !log deployment-prep thumbnails now working
[05:32:13] Logged the message, Master
[05:35:39] yep
[05:35:42] ish
[05:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[05:36:21] what's up?
[05:47:37] Nevermind, got it
[05:47:50] Uploads! http://commons.wikimedia.deployment.wmflabs.org/wiki/File:Empire_State_Building_Top.jpg
[05:48:00] * johnduhart is done for tonight
[05:48:09] Try not to break anything ;)
[05:54:48] heh
[06:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[06:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[07:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[07:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[08:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[08:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[09:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[09:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[10:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[10:15:37] can we ACK nagios in labs?
[10:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[11:01:45] oh for fuck's sake
[11:01:53] petan: Permissions are broken again
[11:03:23] petan: No, I'm reverting your last 2 changes, that's not how you do it
[11:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[11:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[12:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[12:19:51] !log deployment-prep Fixed wmf-config permissions on -nfs-memc
[12:19:52] Logged the message, Master
[12:20:45] d-wx-wx-wx 16 petrb depops 4096 2012-01-09 19:59 live
[12:20:50] what the fuck
[12:21:30] depops?
[12:21:37] Wait, -wx ?
[12:21:50] chmod 333?
[12:26:22] I know
[12:26:34] wtf
[12:26:52] RoanKattouw: And depops is just the group we use for deployment-* stuff
[12:27:28] !log deployment-prep Running updatedata (very slowly)
[12:27:29] Logged the message, Master
[12:28:22] 1.90 load on -sql, yay
[12:32:18] Haha http://ganglia.wikimedia.org/ganglia/?c=Virtualization%20cluster%20pmtpa&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[12:32:27] There's yer problem!
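Roughly the steps behind johnduhart's upload !log entries earlier in this stretch - export /mnt/upload from deployment-nfs-memc and mount it on -web. The hostnames come from the log; the export options are illustrative:

    # on deployment-nfs-memc
    echo '/mnt/upload deployment-web(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra
    # on deployment-web
    sudo mount -t nfs deployment-nfs-memc:/mnt/upload /mnt/upload
    sudo apt-get install imagemagick   # for thumbnailing, per the log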
[12:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[12:38:36] yay central notice works
[13:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[13:11:31] RoanKattouw: hehe http://en.wikipedia.deployment.wmflabs.org/wiki/Main_Page
[13:11:36] (Central notice)
[13:14:23] lolo
[13:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[13:52:59] OrenBochman: hey, sorry I was afk for the night. ping me when you get this
[14:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[14:08:14] PROBLEM Disk Space is now: CRITICAL on puppet-lucid puppet-lucid output: DISK CRITICAL - free space: / 16 MB (1% inode=35%):
[14:13:14] PROBLEM Disk Space is now: WARNING on puppet-lucid puppet-lucid output: DISK WARNING - free space: / 48 MB (3% inode=35%):
[14:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[14:43:21] Mmm pizza
[14:43:26] OrenBochman: around?
[14:43:30] * johnduhart-phone noms
[14:43:39] johnduhart-phone: stop teasing us
[14:44:01] Hehe
[14:45:24] Have you seen the central notice? :p
[14:45:49] link?
[14:45:59] I just got here
[14:46:20] One sec
[14:47:03] none when I'm logged in...
[14:47:16] http://en.wikipedia.deployment.wmflabs.org/wiki/Main_Page
[14:47:26] er, not logged in
[14:48:14] hrm...
[14:48:30] deployment.wmflabs.org isn't working
[14:48:39] * hexmode goes to redirect to labs
[14:48:44] Hold up
[14:48:48] k
[14:49:00] Oren: around?
[14:49:01] petan did that before but did it the wrong way
[14:49:19] It should be an apache config, not an if in the site detector
[14:49:52] I can do it properly in 30 mins
[14:50:27] while you are at it, make the vector skin background pink or something different than the main wikipedia :b
[14:51:09] I can do that, hashar. change body in CSS?
[14:51:48] Not body, content div
[14:52:27] either the body background or the content div
[14:54:12] Any ideas when rlaner will be in the office? Virt2 needs to be looked at, load is oddly high
[14:55:25] johnduhart-phone: probably another 3 hours or so
[14:55:31] K
[14:56:26] johnduhart-phone: is it ok if I redirect deployment to labs while I'm in there?
[14:57:06] hexmode: sure
[14:58:11] johnduhart-phone: huh, I click the Read Now button on the notice and it doesn't do anything
[14:58:36] * johnduhart-phone shrugs
[14:58:54] Isn't meant to link anywhere
[14:59:25] Just a humourous test :)
[15:00:15] ohhh. ok.
[15:00:17] :)
[15:01:23] done
[15:02:54] johnduhart-phone: we should link it to a page explaining to people that we want them to test the heck out of this :)
[15:03:09] and on that page link to the test plans
[15:04:38] link here: http://labs.wikimedia.deployment.wmflabs.org/wiki/Please_test_mercilessly
[15:05:14] IT S BROKEN BEYOND REPAIR!!!!!!!!!!!
[15:05:34] sumanah: OrenBochman was talking to me last night about doing some more rigorous testing... maybe we should get him to try for the QA position?
[15:05:55] hexmode: go ahead & tell him about it
[15:06:03] I will :)
[15:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[15:09:21] * hexmode goes to hunt down the css files
[15:16:51] johnduhart: so, now I'm confused... how do I update the CSS?
[15:17:01] uh
[15:17:09] hold on
[15:17:15] just got situated
[15:17:37] it is under /usr/local/apache/common/live?
but I can't get to that
[15:17:54] oh god
[15:17:55] no
[15:17:59] Don't touch core files
[15:18:04] ever
[15:18:16] k, just tell me what to do
[15:18:30] hexmode: http://labs.wikimedia.deployment.wmflabs.org/wiki/MediaWiki:Common.css
[15:18:38] doh
[15:18:40] :P
[15:22:13] hexmode: Notice now has a link :)
[15:22:27] :)
[15:23:26] background color on content & body is obscured, need to find a better way
[15:26:18] hexmode: Look at what is done on mw.org for Manual pages
[15:26:30] k
[15:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[15:36:49] Ryan_Lane: Can you take a look at virt2, last I checked it had a load of 30
[15:37:25] it's like 5-10 right now
[15:37:45] mmm, seems to have died down http://ganglia.wikimedia.org/ganglia/graph.php?r=day&z=xlarge&c=Virtualization+cluster+pmtpa&h=virt2.pmtpa.wmnet&v=12.20&m=load_fifteen&jr=&js=&vl=+&ti=Fifteen+Minute+Load+Average
[15:38:52] waitio was pretty high
[15:38:57] yup
[15:38:58] what were you guys doing on it?
[15:39:00] importing?
[15:39:18] I'm not sure if petan was doing anything at the time
[15:41:49] I'm working on adding another node to it. hopefully will have it in this week
[15:42:00] yay
[15:42:07] which should also hopefully help with the IO as well
[15:42:08] i'd like a (one) puppet manifest that is just in production to also be in test. just create a new commit and push to test or better do it a different way?
[15:42:25] mutante: you can cherry-pick the commit across
[15:42:47] git cherry-pick 9a3bc67ef........
[15:42:53] git push-for-review-production
[15:42:55] thanks
[15:43:07] review-test ?
[15:43:11] Oh
[15:43:17] sorry, misread your comment
[15:43:23] ok
[15:43:34] Right, you want to cherry-pick from prod to test, usually it's the other way around
[15:43:49] So yeah git checkout test && git cherry-pick [commithash] && git push-for-review-test
[16:00:00] what
[16:00:08] why can't I get into live
[16:04:39] !log deployment-prep Site broken, currently recreating live folder
[16:04:40] Logged the message, Master
[16:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[16:07:34] petan/hexmode: I don't know what you're doing, but for some reason it feels like I'm spending so much time fixing file permissions. What exactly are you doing?
[16:08:03] johnduhart: nothing right now... where are you seeing permission problems?
[16:08:09] umask maybe?
[16:08:50] No matter what, I can't get access into the live folder
[16:09:16] and it was set to -wx-wx-wx this morning for some godforsaken reason
[16:09:27] I didn't touch live for the same reason... maybe that is nfs?
[16:09:35] nfs problem?
[16:10:05] * johnduhart shrugs
[16:10:08] Maybe
[16:10:40] NFS isn't *that* bad in my experience
[16:10:48] It can be slow as shit but I've never had it corrupt my persm
[16:10:50] *perms
[16:10:55] haven't seen petan so I'm blaming him ;)
[16:11:58] RoanKattouw: btw, are there squid configs anywhere?
[16:12:08] Puppet maybe?
[16:12:10] might be in puppet, haven't looked
[16:12:13] * johnduhart looks
[16:12:14] I don't know if the Squids are puppetized
[16:12:28] I know two things about Squid
[16:12:34] Well, 3
[16:12:57] #1 They cache things!
[16:13:00] 1) I know how Cache-Control headers work, 2) how X-Vary-Options headers work, 3) how CARP works on a high level
[16:13:03] xD
[16:13:08] hehe
[16:13:40] <^demon> 4) And they mess with your headers even when you don't want them to ;-)
[16:14:36] they can also cache too much :)
[16:15:09] This looks promising: https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=tree;f=files/squid;h=5e30c24fbf682b0655d9f34cb5e9485f7b7d4550;hb=3e8c21850230d24c0289486fc4dae0989fe3422a
[16:17:23] it wouldn't be puppet or nfs
[16:17:35] oh?
[16:17:46] either someone had a bad umask, or fucked up their numbers when setting something
[16:18:06] also, is that squid stuff you are looking at what you are thinking about using for mediawiki?
[16:18:14] if so, it's not the right squid conf
[16:18:24] huh
[16:18:25] that squid conf is for the apt-proxy
[16:18:31] :(
[16:18:43] Ryan_Lane: Where would I look for that config?
[16:18:44] I have the idea it wasn't puppetized
[16:18:45] Are you telling me that /only/ the apt-proxy Squid conf is in puppet?
[16:18:50] squid for production is generated by php and deployed to all systems
[16:18:50] but it could have been since
[16:18:57] RoanKattouw: correct
[16:19:02] And the text Squid and upload Squid configs are not? Boo
[16:19:10] I agree
[16:19:11] Generated by PHP .... ?
[16:19:11] it sucks
[16:19:13] yes
[16:19:18] Whoa, I learned something new today
[16:19:20] in /home/w/conf/squid
[16:19:24] That doesn't happen to me that often anymore
[16:19:46] themoreyouknow.jpg
[16:20:03] :D
[16:20:18] Ryan_Lane: Is the squid configuration generation script in svn?
[16:20:26] RoanKattouw: if you yank the private info, you can pull the info across to labs
[16:20:41] note, there's a password, and we likely don't want to release our IP blocks
[16:20:46] oh
[16:22:08] I wouldn't mind if this was puppetized, but it would need to be a mix of puppet and generating
[16:22:16] using includes for the stuff that needs to be generated
[16:22:32] the generated stuff, at minimum, needs to be the blocks
[16:23:25] mark: ^^ any opinions?
[16:25:20] johnduhart: wtf... I'm getting the MW installation screen?
[16:25:33] * hexmode waits
[16:26:00] hexmode: oops
[16:26:01] one minute
[16:29:04] hexmode: ok
[16:29:27] !log deployment-prep live and extensions re-checked out into a new folder
[16:29:28] Logged the message, Master
[16:31:45] Ryan_Lane: opinions on what?
[16:31:55] mark: how to handle squid in labs
[16:32:09] I'd like to puppetize at least part of the squid conf
[16:32:22] and use includes for the parts that need to be generated
[16:32:46] I don't want to puppetize squid at all
[16:32:52] we're moving away from squid
[16:33:01] ok. so just move the generation across to labs?
[16:34:57] Hmm, OK
[16:35:23] RoanKattouw: if you are going to do so, then make sure the password doesn't make it across
[16:35:27] I see the point, but the flip side is that we can't get a proper test cluster up in labs until we set up Varnish for text&upload
[16:35:37] varnish is different
[16:35:40] that's puppetized
[16:35:47] Yeah
[16:36:01] hmm, I guess Squid could be minimally puppetized, I guess that's what you're suggesting
[16:36:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[16:36:11] well, mark is saying not to do that
[16:36:17] Put it in puppet in a hacky way and leave it be until we move text&upload to varnish?
[16:36:19] instead, just move the generation scripts across
[16:36:24] Right
[16:36:49] so, no puppet at all for squid
[16:47:23] oh I've thought of having puppet *serve* the config files
[16:47:26] like, deploy them to hosts
[16:47:32] but I don't want to spend time on generating the configs with puppet
[16:48:05] !log deployment-prep installing wikimedia-task-appserver on -web
[16:48:06] Logged the message, Master
[16:48:39] right now configs don't get deployed to squids which are down
[16:48:44] that's bad, and puppet could fix that easily
[16:52:54] Ryan_Lane: are you still in classes?
[16:53:17] mark: yeah
[16:53:22] mark: and that's the part I meant
[16:53:30] puppet serves the main configs
[16:53:40] That's basically what I meant when I said minimal puppetization
[16:53:44] then have the generate script push out includes
[16:53:55] Do just enough so the config is in puppet and can be deployed to labs VMs, but nothing more
[16:54:08] since we do want to have the speed of generating and pushing the configs
[16:54:12] that's fine with me
[16:54:39] * Ryan_Lane nods
[16:54:44] PROBLEM dpkg-check is now: CRITICAL on deployment-web deployment-web output: DPKG CRITICAL dpkg reports broken packages
[16:54:54] johnduhart: after telling me not to, are you messing with core files?
[16:55:00] I'm done with class tomorrow
[16:55:10] * hexmode tsk tsks
[16:55:33] hexmode: I needed to get access to a folder to get something :P
[16:55:46] Plus I'll need to write to the folder to svn up
[16:56:52] johnduhart: I just saw a temporary error on the page, that is what I was teasing about. We do want to run svn up, though, so go for it
[16:57:20] I just did one when I recreated the folder
[16:58:34] Ryan_Lane: waitio on virt2 again, I'm apt-get installing a bunch of things
[16:58:36] just fyi
[16:58:46] * Ryan_Lane nods
[16:59:31] one of the three systems isn't doing disk IO at all
[16:59:43] since it isn't part of the gluster pool
[16:59:48] ok
[16:59:56] I was thinking about lvs for labs
[16:59:58] what a lazy server
[17:00:00] getting that done finally
[17:00:04] need even sets of nodes in pools
[17:00:06] mark: cool
[17:00:19] so, when I add a new node in, the IO should be more evenly distributed
[17:00:24] Great
[17:00:41] mark: rabbitmq is a ridiculous application, btw
[17:00:48] maybe really, it's erlang's fault
[17:00:53] heh
[17:00:55] but it makes me quite mad
[17:01:03] once you name a node, you can't ever rename it
[17:01:22] Does the world end if you do?
[17:01:30] the server won't start
[17:02:50] * hexmode copies en.w.deploy's common.css to the other wikis on deploy
[17:03:13] hexmode: Why?
[17:03:38] petan is supposed to import each wiki's MediaWiki namespace into each wiki
[17:04:39] johnduhart: k, well, maybe I'll wait, but I'd like to have div#content {background-color} be different for deploy
[17:05:21] hexmode: We could just have a central notice for all wikis
[17:06:04] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[17:08:50] johnduhart: fair enough... now, where is petan?
[17:08:57] * johnduhart shrugs
[17:09:00] * hexmode is getting antsy
[17:09:04] :)
[17:13:59] Just don't do any major configuration changes :P
[17:14:44] To physics!
[17:14:48] * johnduhart floats away
[17:15:20] Ryan_Lane: so right now, all access to instance IPs goes via the host firewalls right
[17:15:39] we either need to drill a hole in the host firewall or do some dirty trickery where LVS traffic gets sent from the routers directly
[17:15:53] drill a hole in host firewall?
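Stepping back to the squid plan settled above (puppet serves the static config, the existing PHP generator pushes the generated includes), the push side might look something like this. The generator entry point and the host names are hypothetical; `squid -k reconfigure` is squid's real reload command:

    # regenerate the include files (the generator lives in /home/w/conf/squid per the log)
    php /home/w/conf/squid/generate.php        # hypothetical entry point
    for host in squid1 squid2; do              # hypothetical labs squids
        rsync -a generated/ "$host":/etc/squid/generated/
        ssh "$host" sudo squid -k reconfigure
    done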
[17:15:56] well
[17:16:00] you mean the NAT?
[17:16:03] lvs traffic needs to pass through
[17:16:10] it's not NATed
[17:16:15] but it's probably being filtered anyway
[17:16:30] assuming we're using public lvs service ips
[17:16:39] ah. right
[17:16:45] virt2 and friends will need to allow that traffic to be bridged to the instance
[17:16:51] OR
[17:16:57] we set up the routers to deliver it to the instances directly
[17:17:06] the former is likely difficult
[17:17:08] but we don't want that for normal traffic, so it'll need a special setup
[17:17:23] the latter is likely difficult too, but let's see how dirty it will be ;)
[17:17:27] heh
[17:17:42] nova-compute and nova-network manage the devices, and we'd need to modify code to handle that
[17:17:54] which means maintaining the code changes too
[17:18:12] i'll look at network trickery then ;)
[17:18:50] i.e. handle traffic announced by bgp (pybal in labs) differently
[17:19:14] ok. cool.
[17:19:22] hmm
[17:19:24] I'll look into the code if that's looking too dirty
[17:19:26] that doesn't fix return traffic though
[17:20:08] can't we simply allow a specific ip range through unconditionally?
[17:20:25] I don't see why not
[17:21:19] i'll still need to have the routers drop the lvs traffic directly to the instances I think
[17:22:06] I think so
[17:22:17] otherwise the linux boxes need to route it
[17:22:19] which complicates it more
[17:22:20] otherwise the network node will take it
[17:22:24] yeah
[17:22:24] heh
[17:22:54] I wonder if we can do that without taking two instance IP addresses on the routers
[17:23:08] probably not
[17:24:42] See what you're making us do!?
[17:24:53] :D
[17:28:25] Ryan_Lane: i'll need two internal instance ip addresses in that range, one for each router
[17:29:14] ok
[17:29:35] I'm in a lab, and am fucking things up. let me get back to you in a little bit
[17:29:47] don't worry, not in a hurry
[17:34:48] what do you need to do in the lab?
[17:34:51] (only answer if you have time ;)
[17:36:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[17:43:09] mark: setting static routes
[17:43:51] of course, they won't directly share any materials with us, so the network diagram isn't easy to follow
[17:44:04] I have to ask the instructor to put that up, and switch between it and some other document
[17:44:07] it's fucking stupid
[17:44:57] silly
[17:45:16] god forbid I share their copyrighted material with the world
[17:45:34] set routing-options static route next-hop
[17:45:43] yeah
[17:45:52] doing it for neighbor devices
[17:46:08] let's hope they won't cover RIP next ;)
[17:46:23] they are doing ospf next
[17:46:30] good
[17:46:33] I have a feeling we aren't doing rip
[17:46:40] that's good
[17:46:53] what's the point? does anyone really even use it anymore?
[17:47:00] hopefully not
[17:47:06] we use OSPF
[17:47:15] yeah. seems more efficient
[17:47:21] a lot ;)
[17:47:36] this course is useful, so far
[17:47:52] been like 10 years since I've taken a networking class
[17:48:56] we're getting rid of Foundry, so we could move to IS-IS now
[17:48:57] and it was a concepts class
[17:49:02] although there's hardly any reason to
[17:49:06] heh
[17:49:07] OSPF and IS-IS are conceptually the same
[17:49:18] I'm not familiar with IS-IS
[17:49:36] basically the same, slightly different implementation
[17:50:00] why change, if OSPF is working for us?
[17:50:30] there are a few minor reasons, like being able to do ipv4, ipv6 and mpls in one protocol etc
[17:50:39] now we run OSPFv2 (for ipv4) and OSPFv3 (for ipv6)
[17:51:27] ah. ok
[17:51:29] makes sense
[18:06:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[18:14:57] here
[18:15:02] hexmode: needed me?
[18:15:29] petan: just trying to see where we are
[18:15:46] right, we were fighting a bit with john I guess :D
[18:15:53] because we didn't know what he's doing
[18:16:04] so we were probably reverting ourselves a bit
[18:16:22] johnduhart: ping?
[18:16:24] right now I am importing back the simple db
[18:16:31] k
[18:16:52] also some import got broken since someone broke the confs for a while I guess, for some reason the site wasn't available
[18:17:23] now it should be ok, I am importing MW to de, cs, hi and commons
[18:17:40] petan: enwiki is done, right?
[18:17:46] sort of
[18:18:20] what about the sample of pages... and what is your current timeframe?
[18:18:33] enwiki has tons of sample pages
[18:19:04] I'm gonna try to work with OrenBochman later and get search going ... he said it should take a couple of hours
[18:19:38] petan: by "sample of" I mean "small % of" in case that wasn't clear
[18:20:48] http://en.wikipedia.deployment.wmflabs.org/wiki/Special:Random
[18:20:58] http://en.wikipedia.deployment.wmflabs.org/w/index.php?title=Special:Statistics
[18:21:02] 700+ pages
[18:21:06] is it enough?
[18:21:11] people can import more if they need to
[18:22:20] it's incredible that en wiki sysops need an extension preventing them from deleting the main page
[18:22:34] looks as if they were dumb heh
[18:23:04] petan: ok, forgot about statistics ty
[18:24:05] petan: johnduhart: people are already starting to look at this so keeping it up would be good :)
[18:24:15] I am trying to keep it up all the time
[18:24:22] dunno why it keeps going offline
[18:24:41] johnduhart: could you make a config for a test wiki which would override current settings and test changes in that?
[18:24:48] petan: oh, there was a problem that johnduhart and I ran into....
[18:24:55] so that we don't get a complete crash on a mistake
[18:25:08] petan: did you change perms on /live to -wx ?
[18:25:13] no
[18:25:21] hrm... strange
[18:25:38] I will enable root squash there
[18:25:44] so it's not so hard to fix it
[18:25:46] That is what I blamed on you while you weren't here ;)
[18:25:52] heh
[18:26:04] wait, I changed it to 775
[18:26:09] that's what I did
[18:26:13] group depops
[18:26:24] if you mean that?
[18:26:33] we saw d-wx-wx--x
[18:26:36] yes
[18:26:39] it was strange
[18:26:50] that isn't 775, though
[18:26:58] drwxrwxr-x
[18:27:02] that's what I did
[18:27:10] right
[18:27:13] if you saw d-wx I have no idea :D
[18:27:22] puppet?
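The permissions puzzle in the exchange above comes down to octal notation: d-wx-wx-wx is mode 333, not the 775 petan intended. What he meant, in both octal and symbolic form:

    sudo chgrp depops /usr/local/apache/common/live
    sudo chmod 775 /usr/local/apache/common/live               # drwxrwxr-x
    sudo chmod u=rwx,g=rwx,o=rx /usr/local/apache/common/live  # same bits, symbolic
    ls -ld /usr/local/apache/common/live                       # verify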
[18:27:24] I use only chmod +w
[18:27:32] rarely octal
[18:27:34] * hexmode looks for someone to blame
[18:27:39] heh
[18:27:41] I need to go now
[18:27:43] well, no biggie
[18:27:48] be back in an hour
[18:27:54] petan: k, thanks for your help
[18:27:55] cya
[18:28:03] johnduhart: please try to lock the db of simple wiki
[18:28:08] no one must touch it now
[18:36:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[18:51:56] o
[18:52:35] o
[18:53:03] btw maybe you should note that the background color change is not due to the software, it's just to differentiate it from prod :D
[18:53:16] or people will ask why 1.19 has a different color
[19:05:36] !log deployment-prep tweaked memcached and flushed cache
[19:05:37] Logged the message, Master
[19:05:50] http://ar.wikipedia.deployment.wmflabs.org/wiki/%D8%A7%D9%84%D8%B5%D9%81%D8%AD%D8%A9_%D8%A7%D9%84%D8%B1%D8%A6%D9%8A%D8%B3%D9%8A%D8%A9
[19:05:56] hexmode: rtl wiki is up
[19:06:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[19:06:08] hexmode: can you check the sites we have and tell me if something is wrong
[19:06:21] \o/
[19:06:23] brb
[19:06:29] * hexmode goes off to check
[19:13:10] I'm sorta here now
[19:13:21] I'm providing support for a school group and their wordpress atm
[19:13:34] petan hexmode what are the current problems
[19:13:41] * johnduhart signs into -web
[19:14:05] johnduhart: I think petan is currently importing sites
[19:14:08] enwiki is up
[19:14:18] arwiki needs importing but is up now
[19:14:47] What does petan mean by "simplewiki must be locked"?
[19:15:02] I'm hoping oren (or OrenBochman) comes on soon so we can get search
[19:15:38] johnduhart: I don't know... maybe he wants the wiki closed while he imports?
[19:16:16] I think I can do that
[19:16:24] Meh
[19:16:53] hexmode:
[19:17:08] OrenBochman: ! search?
[19:17:45] hexmode: should we make 4 machines ?
[19:17:57] OrenBochman: one for search, you mean?
[19:18:05] yes
[19:18:08] I think we can get it from Ryan_Lane
[19:18:27] maybe ...
[19:18:54] what I am asking is this - as far as I know production uses 4 machines
[19:19:48] do we want to replicate all 4 in this deployment ?
[19:20:06] OrenBochman: k, so can you set something up on what we currently have or just use one dedicated?
[19:20:35] It would be better to use a dedicated one
[19:20:37] OrenBochman: I was also wanting to talk to you about your ideas for a gadget testing system later
[19:20:44] sure
[19:20:48] k
[19:21:19] I'll find out what we need to do to add one machine to our test cluster
[19:22:19] OrenBochman: in the meantime, (since ryan is out right now, it looks like) what else can we do to help you?
[19:22:34] why would you need me for anything?
[19:22:52] we need notpeter
[19:23:17] Ryan_Lane: I've got one question - how large can this instance be ?
[19:23:30] how large does it need to be?
[19:24:15] I don't know - it would need about 75% of the storage of what it should index
[19:24:28] and how much do you think it will index?
[19:24:41] about 140GB is in the databases, right?
[19:24:59] hexmode: can you confirm
[19:25:24] OrenBochman: can you create the instance or do you need me to add you to the deployment group
[19:26:18] I can create an instance in search
[19:26:27] OrenBochman: petan or johnduhart would be better to confirm that, but I think you're right
[19:26:34] sorry, not right now
[19:26:40] don't know about in your project
[19:26:43] what is it called
[19:26:48] OrenBochman: Let me see if I can add you to this project
[19:26:58] hexmode: you do that via "Manage projects"
[19:26:58] deployment-prep
[19:28:12] 01/10/2012 - 19:28:11 - Creating a home directory for oren at /export/home/deployment-prep/oren
[19:28:39] * jeremyb looks up
[19:29:13] OrenBochman: you should be in there now
[19:29:13] 01/10/2012 - 19:29:13 - Updating keys for oren
[19:29:29] ok
[19:29:34] henrik: did you add him to sysadmin and netadmin?
[19:29:36] notpeter: ping
[19:29:43] probably doesn't need to be added to netadmin
[19:29:49] Ryan_Lane: ?
[19:29:56] but needs sysadmin to create instances
[19:30:01] Ryan_Lane: k, well, I'll remove him if needed later
[19:30:09] from netadmin
[19:30:20] henrik: I think Ryan_Lane meant me
[19:30:27] whoops
[19:30:30] henrik: sorry :)
[19:30:47] ah, no worries :)
[19:30:50] OrenBochman: notpeter might be afk... can i help with something?
[19:31:06] afk ?
[19:31:12] !afk
[19:31:24] he sent me an email, he should be around till 22:00
[19:31:24] afk = away from keyboard == not here
[19:31:36] oh, afk
[19:31:46] !afk is Away From Keyboard
[19:31:46] Key was added!
[19:32:06] he should be the one to set up the instance
[19:32:15] unless it can be sone later
[19:32:25] unless it can be done later
[19:32:36] OrenBochman: Let me try to find him
[19:32:48] OrenBochman: did you see i got search working (or "at least using the search service") last night on search-test?
[19:33:08] nope
[19:33:14] how did you do it?
[19:33:54] brb
[19:36:02] it looks excellent
[19:36:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[19:48:03] OrenBochman: having trouble finding notpeter ... but I left him messages and such... maybe he'll appear shortly.
[19:48:29] I'm doing some installations locally
[19:48:39] but we can talk about testing too
[19:48:42] k
[19:49:21] also there is another issue
[19:49:45] so what are your general ideas about testing? I'm not sure what we should do for gadgets, but maybe if we left deployment up and svn-up'd it nightly that might be a start
[19:49:58] oren: another issue?
[19:50:10] just a sec
[19:50:44] we will need to install OAIRepository and the docs are out of date
[19:51:01] any chance you can ask someone to update them ;-)
[19:51:04] oren: I think brion knows about that
[19:51:09] 1s
[19:51:10] I agree
[19:51:43] oren: https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource:Search/SAL&diff=1269&oldid=1266
[19:52:16] oren: when brion is around we can ask him about OAI
[19:52:29] till then, I'm working on http://labs.wikimedia.deployment.wmflabs.org/w/index.php?title=Please_test_mercilessly&action=edit
[19:52:46] err...
http://labs.wikimedia.deployment.wmflabs.org/wiki/Please_test_mercilessly
[19:53:11] what I think should be done with gadgets is this -
[19:53:54] we ask one of the developers to put his gadget into the CI
[19:54:24] now the gadget should have unit tests
[19:54:46] the CI should be set to get the gadget from subversion
[19:54:58] run PHPUnit tests
[19:55:27] makes tons of sense
[19:55:55] there is also a set of browser tests which can be run from there
[19:56:04] oren: we've also been trying to get Selenium up ... which I *think* would be better than just phpunit
[19:56:18] Does anyone actually still use Selenium?
[19:56:23] that or testswam can do browser testing
[19:56:24] yes except for better
[19:56:37] *testswarm
[19:56:50] I meant that
[19:56:56] RoanKattouw: in mediawiki, i don't think so
[19:57:20] the CI should support every unit testing framework people like to use
[19:57:48] oren: so we should work with Krinkle to get the js testing to have support for gadget authors
[19:58:03] yep
[19:58:21] now if you want an even better suggestion
[19:58:24] * Krinkle is in a meeting but writes down that we should meet up here in 20 minutes
[19:58:34] why would gadgets not use testswarm?
[19:58:56] oh, ok you mentioned that
[19:58:57] Krinkle: now I can hardly wait :)
[19:59:45] Krinkle: re which project?
[19:59:49] I would say make a helloWorldGadget sample + php test + testswarm which uses resource loader etc etc
[20:00:09] oren: yes, yes
[20:00:41] i.e. a best practice thing - I remember in wikimania they showed some security stuff as well
[20:01:53] I don't remember who gave the lecture - it wasn't krinkle
[20:02:07] but he seemed to have some code
[20:02:23] was it Markus Glaser?
[20:02:33] hrm
[20:02:35] no
[20:03:20] oren: It was me, we did it with 3 ppl.
[20:03:28] Markus, ryan and me
[20:03:34] brb in 10 min
[20:03:35] ah. right
[20:03:53] was that the talk I didn't know I was in?
[20:03:58] I remember you talked about resource loader
[20:04:11] Ryan_Lane: Yes
[20:04:13] Ryan_Lane: I bet it was :)
[20:04:16] :D
[20:04:36] Ryan_Lane: Neither did I, I saw my name on Markus' opening slide :D
[20:04:37] so I bet you people give that same talk from time to time
[20:04:42] hahaha
[20:04:53] oren: I'm glad it sounded practices, it wasn't.
[20:04:56] I think that was the first and only
[20:04:59] practiced*
[20:05:23] it was spontaneous
[20:05:38] Yes, we love the subject.
[20:06:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[20:07:16] so do you have a hello world type of project around - which would have some php and some javascript that needed unit testing?
[20:07:31] re puppet2 but not only ---> can we ACK nagios in labs?
[20:07:46] hey
[20:07:47] what is nagios ?
[20:07:48] i saw mutante ACKing in prod and now i want ACKs here :)
[20:07:53] hey
[20:07:54] was out walking and absorbing sunlight
[20:08:03] we could if nagios was doing authentication of some variety
[20:08:16] you know, ldap auth there would make sense. :D
[20:08:19] Ryan_Lane: i think nagios is happy with LDAP?
[20:08:38] Ryan_Lane: what's prod auth for nagios? LDAP?
[20:08:53] I don't think that's been configured yet
[20:09:08] notpeter: I'd like to bring you up to speed
[20:09:28] yes!
[20:09:31] oren: did you see my link to the log?
[20:09:42] nope
[20:09:50] what link
[20:09:56] notpeter!
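The gadget-CI workflow sketched in the discussion above (gadget in Subversion, unit tests run by the CI), reduced to shell steps. The HelloWorldGadget name is the hypothetical sample from the conversation; the svn layout and the tests/phpunit runner are MediaWiki's real ones from that era:

    # fetch the (hypothetical) sample gadget the discussion proposes
    svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/HelloWorldGadget extensions/HelloWorldGadget
    # run its PHPUnit tests from inside a MediaWiki checkout
    cd tests/phpunit
    php phpunit.php --group HelloWorldGadget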
[20:09:58] < jeremyb> oren: https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource:Search/SAL&diff=1269&oldid=1266
[20:09:59] :)
[20:10:08] ok got it
[20:10:10] vitamin d is important
[20:10:44] I checked it - let me know what you did there?
[20:11:14] oren: everything last thing is in the log. but i gave the wrong link slightly. see https://labsconsole.wikimedia.org/w/index.php?title=Nova_Resource:Search/SAL&diff=cur&oldid=1264
[20:11:30] err, every last thing*
[20:11:48] hexmode: yes, I was like "who is this person calling me!??!?!" then I saw your message and was like "ooooo, glad I'm almost home..."
[20:12:22] notpeter: did you submit your DNA samples and 10 card?
[20:12:37] notpeter: we would like you to help us set up an instance that replicates the search machines for the deployment-beta
[20:13:06] jeremyb: ??
[20:13:27] Oren: ok, I haven't set up an instance yet, but there has to be a first time for everything :)
[20:13:33] notpeter: "who is this calling me"... nvm :)
[20:13:49] notpeter: deployment-prep is a project to be the next-gen prototype.wm.o
[20:14:21] I was led to believe you are the expert
[20:14:40] he's handling search in production
[20:14:51] he hasn't used labs yet, though, eh?
[20:14:56] notpeter: right?
[20:15:00] anyhow making an instance is a breeze
[20:15:04] notpeter: next time I'll call you from my phone so you'll be even more confused: "Who is this guy in Lancaster, PA calling me?"
[20:15:05] did I make an account for you?
[20:15:13] I even did it myself (after 10 false starts)
[20:15:59] Ryan_Lane: correct
[20:16:06] petan: interwiki links on labs? I tried "mw:" and got nothing
[20:16:11] Ryan_Lane: and yes
[20:16:32] ok. cool.
[20:16:58] anyhow petan is an expert on labs
[20:17:23] perhaps he can make the instance but I'll quiz notpeter about the setup we need to create ?
[20:17:25] sounds reasonable
[20:17:27] Q1. do the labs machines have a mediawiki installation on them
[20:17:44] oren: I can do it, too (or so Ryan_Lane tells me) ... what do you want on there?
[20:17:48] which labs machines?
[20:18:02] notpeter will tell us
[20:18:29] there are no labs instances created for this yet
[20:18:33] sorry, I'm unclear at the moment.
[20:18:37] notpeter has been doing it in production
[20:19:00] in prod the search boxes do not have a mediawiki installation on them
[20:19:11] but they are reliant on several of the config files
[20:19:13] I know we will need java 1.6 and ant 1.8
[20:19:36] yes. I have a deb for java, btw
[20:19:40] I know a bit about that
[20:19:56] deb ?
[20:20:21] a .deb file so that it can be added to whatever apt repo is used and installed by puppet. or we can install it by hand
[20:20:23] all the same to me
[20:20:48] I'm all for automation
[20:20:50] oren: debian package
[20:21:08] ubuntu uses debian packages, as it's based on debian
[20:21:13] since I'll need to do it many times in the future
[20:23:14] I'm asking if the production search machines run mediawiki - because it seems to be needed for the installation
[20:23:51] Oren: nope.
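On notpeter's java deb from a few messages up: the by-hand route looks like this (the apt-repo-plus-puppet route is the long-term plan). The package filename is illustrative:

    sudo dpkg -i sun-java6-jdk_6-26_amd64.deb   # filename illustrative
    sudo apt-get install -f                     # pull in any missing dependencies
    java -version && ant -version               # search needs Java 1.6 and Ant 1.8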
[20:24:04] interesting
[20:24:52] the search I installed yesterday needed access to the internationalization files
[20:25:05] and to a LocalSettings
[20:25:11] yes
[20:25:13] that is required
[20:25:20] in prod that is handled by an NFS mount
[20:25:28] that's a shitty system that I'm trying to do away with
[20:25:35] ;-)
[20:25:40] I agree
[20:25:50] I have a much better idea
[20:26:02] here
[20:26:05] hi
[20:26:08] so I'm trying to get all of the files pushed out via another method, our scap script. roan said he would work on that
[20:26:33] petan: hey
[20:26:56] johnduhart: I am importing to the sql table
[20:26:57] it would be better if any server which wanted search service would ask the search machine
[20:27:03] if someone touches simple wiki it'll blow up
[20:27:30] the text table is hard locked, you can't write to or select from it
[20:27:47] anyhow that is another business
[20:28:22] so does the search machine have all the wikis mounted via nfs or just one?
[20:28:31] hexmode: interwikis should work
[20:29:01] or just the required files?
[20:29:10] petan: https://www.mediawiki.org/wiki/Unit_testing
[20:29:13] oops
[20:29:24] petan: http://labs.wikimedia.deployment.wmflabs.org/wiki/Please_test_mercilessly
[20:29:35] interwiki links aren't working there
[20:29:56] hm...
[20:30:00] ah interwiki, sure
[20:30:02] sorry
[20:30:07] * petan fixes
[20:30:14] oren: the search indexer queries the various db servers to index all of the content. those indexes are then rsynced to the various search boxes, different clusters for different wikis
[20:30:40] http://noc.wikimedia.org/conf/lsearch-global-2.1.conf
[20:30:46] we can see the various search groups in that
[20:31:01] requests to each search group are load balanced by lvs
[20:31:11] ok
[20:31:27] for testing purposes, it would be fine to just have the search indexer be the only search box and have it serve up the content
[20:31:29] ok
[20:32:22] johnduhart: squid?
[20:32:35] let's install it
[20:32:42] hexmode: did you announce yet?
[20:32:45] the opening
[20:32:46] of the site
[20:33:04] petan: not yet, was hoping to get the search thing done first
[20:33:07] ok
[20:33:09] is there just one indexer
[20:33:09] cool
[20:33:13] we can install squid then
[20:33:24] :) go for it!
[20:33:39] Oren: yep
[20:33:53] things are starting to make sense
[20:33:54] which is currently a spof and something that I'm working on... but that's another story
[20:33:57] interwikis are imported but there is probably an issue with the configuration
[20:34:07] for some reason links in small capitals are not allowed
[20:34:18] oren: excellent! I'm glad :)
[20:34:35] notpeter: are you just a sysop or also a developer ?
[20:35:00] just ops.
I haz no php skillz
[20:35:04] Oren: being a sysop doesn't really mean much :D being a dev does heh, at least for me
[20:35:40] petan: we could not survive without Ops
[20:35:42] + there is a difference between a sysop (admin of a wiki) and ops (people with shell)
[20:35:48] I mean sysop as admin of a wiki
[20:35:49] ah
[20:35:50] sysops know lots of stuff
[20:35:50] not ops
[20:35:51] :D
[20:35:54] yes I know
[20:36:05] PROBLEM host: puppet2 is DOWN address: puppet2 PING CRITICAL - Packet loss = 100%
[20:36:06] couldn't do without sysops, too ;)
[20:36:11] anyhow let's get back on point
[20:36:14] sure
[20:37:18] notpeter: there are several wikis in the deployment cluster
[20:37:31] oren I am going to replicate all we have on prod
[20:37:38] with a small db only
[20:37:53] MW space for all + ~2000 content pages
[20:37:56] latest rev
[20:38:25] but I need to create a script for that
[20:38:38] until then I'll replicate only some of the biggest wikis
[20:38:51] petan - I'd like to talk to you about that
[20:38:55] ok
[20:39:14] I calculated that each wiki eats about 50mb of space in sql
[20:39:17] that's not so much
[20:39:28] simple wiki eats 38gb, that's full
[20:39:30] db
[20:39:44] full history ?
[20:39:46] yes
[20:39:50] ok
[20:40:18] that will be the main wiki for testing; other wikis would rather be for local communities to test if stuff works before deployment
[20:40:25] petan it would be best if you could include all pages from a few categories
[20:40:27] I mean simple wiki would be best for devs
[20:40:46] that's np
[20:40:53] but I need to know which pages
[20:40:58] cats
[20:41:40] all the featured pages, did you know
[20:41:49] pages that is
[20:43:15] ok
[20:43:19] I will try
[20:43:51] Ryan_Lane: hi
[20:43:59] is squid in puppet good?
[20:44:22] also former featured articles
[20:44:36] petan can you do that ?
[20:44:51] Oren anyone can register and import whatever they want... so I think it's no problem to get anything in there
[20:45:31] johnduhart: why can't I log in :o
[20:45:49] oh nvm
[20:45:49] :D
[20:46:28] Featured article candidates etc
[20:46:32] ok
[20:46:38] petan: no
[20:46:42] Ryan_Lane: ah
[20:46:43] ok
[20:46:47] petan: squid isn't puppetized at all
[20:46:51] right
[20:46:55] we are talking about puppetizing it *some*
[20:46:59] cool
[20:47:52] are all the wikis being imported to one sql server or each to its own?
[20:48:35] for now to one
[20:48:35] also is the database on the same instance as the MW or separate?
[20:48:43] later Ryan will set up a new one
[20:48:57] it's on a different instance
[20:49:11] there are 5 instances now
[20:49:28] so at this point we will need to set up the search instance to have access to the dbase instance
[20:49:30] 1 web 1 sql 1 nfs 12 maintenance
[20:49:35] * 2 ma.
[20:49:49] that's no problem
[20:49:53] do you understand what needs to be nfs shared ?
[20:50:08] uh?
[20:50:10] what
[20:51:06] notpeter: could you explain how NFS is used in production to allow the indexer to access the various machines ???
[20:51:12] oren: at this exact point, a lot of common config files, and, unfortunately, all of a user, rainman's, home directory....
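On the featured-articles import petan and oren agree on above: one way to pull just those pages across is to list a category's members via the API, export current revisions, and import the dump on the deployment wiki. api.php, Special:Export and importDump.php are MediaWiki's real interfaces; the jq filter and the 500-title cutoff are assumptions of this sketch:

    # list members of the category (first 500 only, for illustration)
    curl -s 'https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Featured_articles&cmlimit=500&format=json' \
      | jq -r '.query.categorymembers[].title' > titles.txt
    # export current revisions only, then import on the deployment wiki
    curl -s --data-urlencode "pages=$(cat titles.txt)" --data 'curonly=1' \
      'https://en.wikipedia.org/wiki/Special:Export' > featured.xml
    php maintenance/importDump.php featured.xml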
[20:51:16] I think it's just used for config [20:51:28] config and scripts, right [20:51:32] yeah [20:51:41] there's a search-related rsync setup [20:51:42] there's no reason it can't be puppetized, or deployed like squid/mediawiki [20:51:45] that moves the indexes [20:51:46] I would rather use gluster for stuff [20:51:52] petan: me too [20:51:54] anyway our nfs is used for php files only [20:52:06] but for this we don't really need a shared filesystem, realistically [20:52:07] so the maintenance instance runs rebuildall etc [20:52:12] while apache only does web [20:52:19] notpeter: the rsync can wait for later [20:52:32] Oren I don't want to install anything more on -web [20:52:32] Ryan_Lane: yeah, puppet can handle most of it. there's still the 80 meg jar.... but that can probably be done with rsync as well [20:52:49] if there was a search it would be on its own instance [20:53:39] Ryan_Lane: wikimedia squid or squid? [20:53:45] petan the search will need two instances [20:53:56] one will be an indexer [20:54:06] the second will replicate it [20:54:21] this should work until you import large wikis [20:54:35] hello there :-) Is anyone available to help me debug an ssh connection issue? [20:54:43] hashar: sure [20:54:45] I can connect to the bastion host but not to the VM [20:54:53] hm... [20:54:57] I am using ssh -A bastion.wmflabs.org [20:55:09] then "ssh testpuppet" just denied me [20:55:20] agent is up? [20:55:23] on your pc [20:55:35] notpeter: do you know how the lsearch daemon contacts the remote databases? [20:55:36] it's ubuntu? [20:55:39] yup since I can connect to bastion [20:56:06] it is a Mac OS [20:56:08] let me try ubuntu [20:56:38] oren: I do not really know. I have tried to read through the code some, but have made relatively little progress through it. [20:56:53] oren: I would love to know more, though, if you find out more in your research [20:57:12] is there a local.conf as well ? [20:58:03] oren: yes. let me find a suitable method by which to link you to it... [20:58:04] also what is in file:///home/wikipedia/common/pmtpa.dblist [20:58:06] 01/10/2012 - 20:58:06 - Updating keys for hashar [20:58:09] 01/10/2012 - 20:58:09 - Updating keys for hashar [20:59:27] oren: https://gerrit.wikimedia.org/r/#patch,unified,1590,1,templates/lucene/lsearch.conf.erb can you see that file? [20:59:41] petan: same issue with an ubuntu machine :b [20:59:43] RECOVERY dpkg-check is now: OK on deployment-web deployment-web output: All packages OK [21:00:01] oren: that's a list of all wikis that are hosted in tampa [21:00:12] oren: currently served up by nfs [21:04:49] oops [21:04:53] removed myself from the project :-) [21:05:27] !Ryan [21:05:27] man of the all answers ever [21:05:28] what we would need for the indexer to work for the long term is some script that makes a file:///home/wikipedia/common/pmtpa.dblist with the list of all the wikis that should be indexed in the same format [21:05:29] hashar: ^ [21:06:02] yeah I am asking ryan right now [21:06:28] Is anything still *using* pmtpa.dblist? [21:06:32] I guess search is [21:06:35] It should be using all.dblist [21:06:42] That way we have only one file listing all wikis instead of two [21:07:39] Oren: sure, that's reasonable. [21:11:11] i'm writing up the stuff [21:11:39] why u no respond, labs? [21:12:11] petan: are all the wikis configured with MWSearch [21:12:25] I don't know... [21:12:28] we copied it from prod [21:13:48] petan: :D [21:13:54] huh?
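On the dblist script Oren asked about: something along these lines could regenerate the list from whatever databases actually exist. The credentials file, the name pattern, and the output path are assumptions, not the real production script.

  # sketch: rebuild a dblist (one db name per line) from the live server
  mysql --defaults-file=/etc/mysql/indexer.cnf -N -e 'SHOW DATABASES' \
      | grep -E '(wiki|wiktionary|wikinews|wikibooks|wikiquote|wikisource|wikiversity)$' \
      > /home/wikipedia/common/all.dblist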
[21:14:11] notpeter: definitely don't add the jar to git :) [21:14:24] petan: you will need it and you will also need OAIRepository, and OpenSearchXml [21:14:27] the jar can likely be packaged [21:14:30] and installed via apt [21:14:32] labs still no respond!!!! [21:14:36] Oren that we have [21:14:40] hexmode: eh? [21:14:41] * hexmode runs around freaking out [21:14:42] I'm not sure about the last one [21:14:47] something broken? [21:14:50] hexmode: I'm installing squid [21:14:52] I told you [21:14:53] :o [21:14:57] oh [21:15:01] nm then [21:15:05] did you install it on a separate instance? [21:15:15] hm... not for now [21:15:19] (you should, if you didn't) [21:15:21] I think 2gb of ram is enough [21:15:29] I didn't want to eat more of your space [21:15:29] install it on a separate box [21:15:32] ok [21:15:34] then both can use port 80 [21:15:40] ok [21:16:06] Ryan_Lane: don't worry. I would never do that [21:17:24] PROBLEM HTTP is now: CRITICAL on deployment-web deployment-web output: Connection refused [21:17:32] notpeter: you mean like I did for gerrit :D [21:17:48] !log deployment-prep created squid box [21:17:49] Logged the message, Master [21:17:57] notpeter: how much memory/storage do the search machines use? [21:19:19] New patchset: Pyoungmeister; "most of the way towards removing dependency on nfs. now just dependent on scap...." [operations/puppet] (testlabs/searchoverhaul) - https://gerrit.wikimedia.org/r/1825 [21:20:05] New review: Pyoungmeister; "(no comment)" [operations/puppet] (testlabs/searchoverhaul); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1825 [21:20:06] Change merged: Pyoungmeister; [operations/puppet] (testlabs/searchoverhaul) - https://gerrit.wikimedia.org/r/1825 [21:22:23] RECOVERY HTTP is now: OK on deployment-web deployment-web output: HTTP OK: HTTP/1.1 302 Found - 565 bytes in 0.004 second response time [21:22:24] oren: searchidx currently uses 600gig, 48 gigs of ram, search boxes have 16 or 32 gigs of ram, and look to be using 50-100 gigs of storage [21:25:02] Ryan_Lane: I can't connect to it [21:25:17] did it finish building? [21:25:21] yes [21:25:30] ip 10.4.0.17 [21:25:33] PROBLEM host: deployment-squid is DOWN address: deployment-squid PING CRITICAL - Packet loss = 100% [21:25:43] this happens a lot [21:25:51] I always have to recreate it, like 5 times [21:25:56] oh [21:25:58] it's a DNS issue [21:26:02] probably yes [21:26:06] if you deleted then recreated it [21:26:10] not now [21:26:11] the dns entry is cached [21:26:12] before [21:26:16] this one wasn't recreated [21:26:17] oh [21:26:29] then there shouldn't be an issue [21:26:40] I can't even ssh to the ip [21:26:48] nor ping [21:26:54] hmm [21:26:55] weird [21:27:02] https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=deployment-prep&instanceid=i-000000dc [21:27:10] gimme a min [21:27:10] o.o [21:27:12] ok [21:27:13] I'm in a class [21:27:16] ah [21:27:19] sorry [21:28:41] I don't see how it could build itself, then fail [21:30:46] yes I think it's a net issue [21:30:54] or firewall [21:31:38] "deployment-squid" [21:31:44] what. [21:31:53] johnduhart: hm?
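When a recreated instance seems unreachable like deployment-squid above, a few quick checks from bastion can separate stale DNS from a genuinely dead box; the FQDN shown is a guess at the naming scheme, not a confirmed name.

  # sketch: what does DNS currently say, and does the console IP answer?
  dig +short deployment-squid.pmtpa.wmflabs    # hypothetical FQDN
  ping -c 3 10.4.0.17                          # IP reported by labsconsole
  # if the name was reused for a new instance, drop the stale ssh host key
  ssh-keygen -R deployment-squid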
[21:31:55] what's wrong [21:32:09] 01/10/2012 - 21:32:08 - Creating a home directory for hoo at /export/home/bastion/hoo [21:32:14] [12:13:59] Just don't do any major configuration changes :P [21:32:24] Thank you for honoring my wishes [21:32:27] hexmode: I finished simple wiki [21:32:31] I really appreciate that [21:32:33] johnduhart: that's empty instances [21:32:36] * instance [21:32:37] Good [21:32:42] if you want I can nuke it [21:32:50] I don't really think I touched your config [21:33:05] petan: w00! [21:33:09] 01/10/2012 - 21:33:09 - Updating keys for hoo [21:33:36] I just walked in the door, one moment [21:33:42] btw johnduhart that message is from today? [21:33:46] I just got on irc now [21:33:52] you should have sent me an email [21:33:53] RECOVERY host: deployment-squid is UP address: deployment-squid PING OK - Packet loss = 0%, RTA = 1.46 ms [21:34:03] I don't read all stuff [21:34:03] petan: working now [21:34:04] petan: Yes, that was today [21:34:10] johnduhart: I wasn't here [21:34:15] cool [21:34:23] johnduhart: so, I am about to set up a squid [21:34:25] telling you now [21:34:27] for some reason the nova-compute service on virt4 is acting strange [21:34:29] or you want to do that? [21:34:37] hm... [21:34:37] hexmode: You were here? [21:34:53] johnduhart: right here [21:34:55] 'sup? [21:35:16] I would personally like everything to come to a halt so we can review what's happened since I left, what needs to happen, and outstanding issues [21:35:30] Etherpad please [21:35:36] !log deployment-prep maintenance on simple [21:35:37] Logged the message, Master [21:35:51] http://etherpad.wikimedia.org/DeploymentPrep [21:36:51] petan hexmode http://etherpad.wikimedia.org/DeploymentPrep [21:37:05] oren: http://etherpad.wikimedia.org/DeploymentPrep [21:37:23] PROBLEM Total Processes is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:37:31] !ep DeploymentPrep | johnduhart [21:37:31] johnduhart:http://etherpad.wikimedia.org/DeploymentPrep [21:38:13] PROBLEM dpkg-check is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:38:53] PROBLEM Current Load is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:39:33] PROBLEM Current Users is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:39:43] petan: What's the need for squid right now? [21:40:00] that's how production works? [21:40:02] :) [21:40:04] notpeter: I think it would be useful to have a look at the scripts running production... [21:40:10] johnduhart: we want to have the site stable [21:40:15] that means no outages [21:40:24] so if we don't set it up now, we should not do that later [21:40:30] Ryan_Lane: Well yes but there's still some on-site issues that need to be ironed out [21:40:32] PROBLEM Disk Space is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:40:35] we == me and hex [21:40:45] ah [21:40:46] petan: Fine, that would be the first thing I'll do after critical issues [21:40:49] I see [21:40:52] PROBLEM Free ram is now: CRITICAL on deployment-squid deployment-squid output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:40:59] johnduhart: why not handle both together? [21:41:05] what critical issues are you talking about [21:41:05] labs-nagios-wm: shhhh [21:41:30] petan: I don't know?
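An aside on the repeated "CHECK_NRPE: Error - Could not complete SSL handshake" spam: in a stock nrpe setup this is usually the monitoring host missing from allowed_hosts on the freshly built instance. A sketch of that fix, with the nagios server's IP as a placeholder:

  # sketch: let the nagios host talk to the local NRPE daemon
  sudo sed -i 's/^allowed_hosts=.*/allowed_hosts=127.0.0.1,10.4.0.120/' \
      /etc/nagios/nrpe.cfg         # 10.4.0.120 stands in for the nagios server
  sudo service nagios-nrpe-server restart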
Did anything catch on fire while I was gone? [21:41:36] yes [21:41:44] <^demon|away> What you talkin bout labs-nagios-wm? [21:41:45] actually some of your changes did :D [21:41:53] oh? [21:41:54] you made a syntax error [21:41:58] oops [21:42:06] that blew up all wikis :D [21:42:10] :D [21:42:12] oren: https://gerrit.wikimedia.org/r/#patch,sidebyside,1825,1,files/lucene/lucene.jobs.sh that is a script that I made based on the scripts that run as crons in prod [21:42:29] anyway there was a problem in sul [21:42:33] no one could create an account [21:42:41] since yesterday [21:42:44] That's an issue [21:42:46] okay [21:42:48] that was why there are no new users [21:42:53] I fixed it (hope) [21:43:00] Oren: there are a couple of others for doing things like building the .jar on the search indexer and pushing it out to the various search boxes [21:43:06] there was an arwiki db but empty [21:43:12] configured in all.dbs [21:43:13] petan: Yup I rmoved that from dblist [21:43:17] removed* [21:43:19] ah really? [21:43:22] I still saw it there [21:43:24] I thought I did.. [21:43:28] anyway it's now working [21:43:32] so no need to do that [21:43:41] k [21:43:49] petan: How did you fix SUL? [21:43:50] simplewiki is consistent [21:43:52] PROBLEM Current Load is now: CRITICAL on cantlogin cantlogin output: Connection refused by host [21:43:57] johnduhart: sul didn't work because of ar [21:44:04] ok [21:44:05] it crashes on an invalid db [21:44:19] Oren: there are also some start/stop scripts that I want to turn into an init script [21:44:32] PROBLEM Current Users is now: CRITICAL on cantlogin cantlogin output: Connection refused by host [21:44:43] !log deployment-prep deleting big db's expect db lags ^^ [21:44:44] Logged the message, Master [21:44:46] petan: squid setup hasn't started yet right? [21:44:56] right, it didn't, but I started installing it [21:45:01] it's unconfigured [21:45:06] Okay [21:45:20] rebooting instance [21:45:20] now [21:45:21] Do you have their script for generating configuration? [21:45:27] maybe [21:45:32] PROBLEM Disk Space is now: CRITICAL on cantlogin cantlogin output: Connection refused by host [21:45:40] there is a package wm-squid [21:45:47] No that's not it [21:45:50] so, let's write up what we still need to finish on the etherpad so I can keep track and not freak out :) [21:45:54] I'll handle that [21:45:54] right so feel free to install it [21:45:56] it's just squid, as far as I remember [21:46:02] lemme check [21:46:02] PROBLEM Free ram is now: CRITICAL on cantlogin cantlogin output: Connection refused by host [21:46:29] squid, squid-common, squid-frontend [21:46:36] wikimedia-task-squid? [21:46:39] Ryan_Lane: ^ [21:46:44] if not, why is it there [21:46:45] Ryan_Lane: Didn't we have a discussion today about there being a script to generate the squid configuration [21:46:46] that too [21:46:48] ok [21:46:51] that's what I did [21:46:56] johnduhart: yes. [21:47:05] we'll need to move that from production for this to work [21:47:13] and it needs to be sanitized [21:47:22] PROBLEM Total Processes is now: CRITICAL on cantlogin cantlogin output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:47:45] Ryan_Lane: Do you know someone from ops that can help me with that? [21:48:12] PROBLEM dpkg-check is now: CRITICAL on cantlogin cantlogin output: CHECK_NRPE: Error - Could not complete SSL handshake. [21:48:13] most people probably can [21:48:16] cantlogin? [21:48:23] who named an instance that?
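Given that SUL broke on a db that was listed but empty, a sanity check like this over the dblist would catch the same class of problem early; the dblist path and the table probed are assumptions.

  # sketch: flag dblist entries with no usable database behind them
  while read -r db; do
      mysql -N -e "USE \`$db\`; SELECT 1 FROM page LIMIT 1" >/dev/null 2>&1 \
          || echo "broken or empty: $db"
  done < /home/wikipedia/common/all.dblist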
:D [21:48:43] oren: o.O [21:48:59] that is me [21:49:13] oh [21:49:15] I have created a fresh instance to verify my ssh connection issue :b [21:49:22] oren: ignore me :) [21:49:30] I may need to run fsck on sql [21:49:32] hashar: You should be able to login to testswarm [21:49:34] Ryan_Lane, johnduhart: I'll just sanitize the /h/w/conf/squid directory and throw it into gerrit, you guys can take it from there then [21:49:34] :o [21:49:45] RoanKattouw: Thanks [21:49:45] that would be good to connect with squid, johnduhart [21:49:45] RoanKattouw: ok [21:49:57] let's do it while we configure squid [21:50:01] and the site doesn't work [21:50:02] RoanKattouw: remember to remove the password and any blocks we have [21:50:04] Ryan_Lane: yeah I should :-)) But somehow the forwarded key is rejected. [21:50:04] Yes [21:50:08] we don't want to advertise the block list :) [21:50:21] will try to reproduce the issue locally with my own cluster [21:50:22] or is it normal that when you drop a database the disk is still full? [21:50:25] Anything that looks like a password or an IP address will be removed [21:50:28] hashar: I checked your authorized keys file [21:50:34] I'll leave it to John to convert it to proper templates with secret vars etc [21:50:34] Ryan_Lane: where what ? [21:50:45] Krinkle: eh? [21:51:10] Ryan_Lane: I think the issue is that the laptop I am using uses a different user name ( amusso ) which is different from the one on the labs cluster [21:51:17] oh [21:51:18] yeah [21:51:28] wait [21:51:28] ssh hashar@foobar.com ? [21:51:35] but then you couldn't get into bastion [21:51:36] OH MY GOD [21:51:48] Ryan_Lane: Hm.. I thought you mentioned me… ? nevermind [21:52:02] Krinkle: it's possible. might have been accidental [21:52:20] RoanKattouw: no no :) I have a ~/.ssh/config file to override the username to hashar on *.wmflabs.org hosts. And ssh to bastion works :-b [21:52:25] Oh, OK [21:52:37] * hashar contacts foobar.com postmaster to claim the above email address [21:53:32] PROBLEM host: deployment-squid is DOWN address: deployment-squid CRITICAL - Host Unreachable (deployment-squid) [21:54:49] what happened to deployment-squid? [21:55:48] ah. rebooted [21:56:18] notpeter: perhaps you can join us in the etherpad [21:56:28] http://etherpad.wikimedia.org/DeploymentPrep [21:56:29] link? [21:56:37] thanks :) [21:56:42] petan: You're writing in the wikinews section [21:57:00] heh thanks [21:57:28] Ryan_Lane: thank you for the authorized key check :-) [21:57:33] yw [21:57:34] will poke mutante tomorrow [21:57:36] is it working now? [21:57:40] nope [21:57:44] hm [21:57:56] you are getting permission denied? [21:58:04] try sshing into bastion, from bastion [21:58:10] it is too late for me to investigate further anyway. Will give it a poke with mutante tomorrow I think [21:58:19] that'll tell you if your agent is forwarded properly or not [21:58:27] you ran an agent, and added your key to it, right? [21:58:42] RECOVERY host: deployment-squid is UP address: deployment-squid PING OK - Packet loss = 0%, RTA = 0.81 ms [21:59:03] I am on Mac OS X and have got an agent running [21:59:16] petan: if you have an issue with an instance not responding, then try rebooting it before rebuilding it [21:59:18] so I usually just "ssh-add" and then type my key passphrase [21:59:21] ah [21:59:22] MaxSem: can you help with importing abusefilter rules? [21:59:24] right [21:59:34] ssh localhost on bastion does work :) [21:59:38] ok [21:59:39] really?
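Putting hashar's pieces together, a working laptop-side setup would look roughly like this; the key path is an assumption, the rest mirrors what was described above (username override, key loaded into the agent, agent visible from bastion).

  # sketch: force the labs username and agent forwarding for *.wmflabs.org
  printf '%s\n' \
      'Host *.wmflabs.org' \
      '    User hashar' \
      '    ForwardAgent yes' >> ~/.ssh/config
  ssh-add ~/.ssh/id_rsa            # assumed key path; loads the key into the agent
  ssh bastion.wmflabs.org
  # then, on bastion: an empty "ssh-add -l" means the agent was not forwarded
  ssh-add -l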
[21:59:41] weird [21:59:42] Ryan_Lane: I think I tried it once [21:59:52] but it didn't work... [22:00:04] OH [22:00:09] hexmode, tomorrow, it's 2am here:) [22:00:21] hashar: / on testpuppet is full [22:00:30] yeah hence the cantlogin VM :-) [22:00:30] MaxSem: k... may be done by then, but thanks [22:00:51] hashar: what would fill it? [22:01:25] I have a 1 minute cronjob fetching full /trunk/phase3 dumps :b [22:01:32] so that fills a disk quickly :b [22:01:48] you should put it in /mnt [22:01:50] I'll look at the abusefilter stuff [22:01:52] where's it write to? [22:02:20] surely /var/lib/testswarm/ [22:02:26] probably /var/lib/testswarm/checkouts [22:02:59] petan: Can you update the list below the todo? [22:03:14] thanks [22:03:36] hashar: I'm moving the entire directory to /mnt [22:03:49] and linking /var/lib/testswarm to /mnt/testswarm [22:03:49] the cronjob will fill it out again [22:04:07] now it'll write to mnt rather than var [22:04:09] and that job + the scripts are both managed by puppet :b [22:04:26] * Ryan_Lane nods [22:04:41] johnduhart: what capacity do you mean? [22:05:47] petan why not join the etherpad session [22:05:49] http://etherpad.wikimedia.org/DeploymentPrep [22:05:59] Oren I am Petrb [22:06:00] oren: He is in it ;) [22:06:16] heheh [22:06:21] Ryan_Lane: the "cantlogin" vm has the same issue anyway and puppet has probably run on it by now :b [22:06:28] heh [22:06:30] Ryan_Lane: so there must be something else screwed [22:06:55] should I do different abusefilter rules for each wiki [22:07:25] hashar: what init script restarts testswarm? [22:08:05] meh. screw it. rebooting testswarm [22:08:10] Ok... so I guess, I need to be part of a project to do anything? [22:08:26] yep [22:08:55] Oren: I'm signing off for the night. need to make dinner. feel free to email me about anything you want to know about and I'll respond later tonight/tomorrow [22:09:00] !log testswarm rebooting testpuppet [22:09:01] Logged the message, Master [22:09:23] !log testswarm moved /var/lib/testswarm to /mnt/testswarm, and made a link pointing to it [22:09:24] Logged the message, Master [22:10:06] RoanKattouw: Still working on that? [22:10:36] Yes [22:10:37] Just started [22:10:39] Krinkle: are you looking for some work to help us out with on deployment? [22:10:41] scp took 10 mins, it's a shitload of files [22:10:48] And I'm having to do quite a bit of censoring [22:10:55] heheh [22:11:05] hexmode: I'm not bored, but sure keep me posted, and tell me where I can help out. [22:11:25] !ep DeploymentPrep | Krinkle [22:11:25] Krinkle:http://etherpad.wikimedia.org/DeploymentPrep [22:11:57] Krinkle: know anything about the problem with images on commons on that page? [22:12:11] I think that is the only issue that is left right now [22:12:22] (that someone isn't handling) [22:12:32] RECOVERY Current Users is now: OK on deployment-squid deployment-squid output: USERS OK - 0 users currently logged in [22:12:36] to whoever manages wm-bot, it needs to put a space after "name:", otherwise common irc clients make it one long thing [22:13:10] !wm-bot | petan [22:13:10] petan: http://meta.wikimedia.org/wiki/WM-Bot [22:13:13] it does [22:13:16] Krinkle: ^ [22:13:23] !wm-bot gsghsd | petan [22:13:23] petan: http://meta.wikimedia.org/wiki/WM-Bot [22:13:31] !test is $1 [22:13:31] Key was added!
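The relocation Ryan logs above is just a move plus a compatibility symlink, so everything that writes to the old path lands on the roomier /mnt disk:

  # sketch: relocate the data and leave a symlink at the old path
  sudo mv /var/lib/testswarm /mnt/testswarm
  sudo ln -s /mnt/testswarm /var/lib/testswarm
  df -h /mnt                       # the cronjob's output now fills /mnt, not /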
[22:13:32] PROBLEM host: testpuppet is DOWN address: testpuppet CRITICAL - Host Unreachable (testpuppet) [22:13:32] RECOVERY Disk Space is now: OK on deployment-squid deployment-squid output: DISK OK [22:13:37] !test gdfsg | petan [22:13:37] petan: gdfsg [22:13:47] Krinkle: what did you mean [22:13:52] RECOVERY Free ram is now: OK on deployment-squid deployment-squid output: OK: 92% free memory [22:15:22] RECOVERY Total Processes is now: OK on deployment-squid deployment-squid output: PROCS OK: 78 processes [22:15:26] !ep Foo | petan [22:15:26] petan:http://etherpad.wikimedia.org/Foo [22:15:37] !eop [22:15:38] !ep [22:15:38] http://etherpad.wikimedia.org/$1 [22:15:45] !ep del [22:15:46] Unable to find the specified key in db [22:15:52] RECOVERY host: testpuppet is UP address: testpuppet PING OK - Packet loss = 0%, RTA = 0.66 ms [22:15:54] !etherpad [22:15:54] http://etherpad.wikimedia.org/$1 [22:16:01] aha [22:16:05] !t alias test [22:16:05] Successfully created [22:16:09] !t | petan [22:16:09] petan: [22:16:12] RECOVERY dpkg-check is now: OK on deployment-squid deployment-squid output: All packages OK [22:16:13] !test [22:16:13] $1 [22:16:18] !t xx | petan [22:16:18] petan:xx [22:16:21] ok [22:16:24] will fix [22:16:24] !test xx | Krinkle [22:16:24] Krinkle: xx [22:16:27] :) [22:16:52] RECOVERY Current Load is now: OK on deployment-squid deployment-squid output: OK - load average: 0.00, 0.03, 0.00 [22:18:14] petan: Start on squid? Sure [22:18:18] yes [22:18:29] I want to do fsck during that [22:18:39] you will have to turn off the webserver for a while [22:18:45] sure [22:19:14] Going to get something to eat while I wait on roan [22:19:18] ok [22:19:31] when are you gonna do that, because it's late here [22:19:35] Sorry, lots of stuff to censor [22:19:36] or it's gonna be late [22:19:42] stuff to censor? :o [22:19:51] From the Squid config [22:19:52] in configs [22:19:53] ah [22:19:54] ok [22:19:55] Passwords, blocked IPs [22:20:15] there are blocked IPs on squid? [22:20:17] yay [22:20:26] I thought people were only blocked from editing till now [22:20:27] :D [22:20:47] PROBLEM dpkg-check is now: CRITICAL on testpuppet testpuppet output: Connection refused by host [22:21:22] PROBLEM Current Users is now: CRITICAL on testpuppet testpuppet output: Connection refused by host [22:22:32] PROBLEM Free ram is now: CRITICAL on testpuppet testpuppet output: Connection refused by host [22:22:42] PROBLEM Total Processes is now: CRITICAL on testpuppet testpuppet output: Connection refused by host [22:22:52] PROBLEM Current Load is now: CRITICAL on testpuppet testpuppet output: Connection refused by host [22:24:36] heading to bed, have a good night [22:24:40] night [22:27:55] petan: There are people who do much more evil things than vandalise. That's how you get an IP block [22:28:27] RoanKattouw: No rush [22:30:00] hexmode: sorry, I know nothing about commons' cluster stuff. [22:30:16] !log deployment-prep created nfs:/mnt/export/backup use it for all files which aren't versioned [22:30:17] Logged the message, Master [22:30:24] Krinkle: k np [22:46:22] RECOVERY Current Users is now: OK on testpuppet testpuppet output: USERS OK - 1 users currently logged in [22:46:49] johnduhart: did you forgot to install texvc? [22:46:54] forget* [22:47:06] petan: It was installed with wikimedia-task-appserver [22:47:07] I think that's why math doesn't work [22:47:10] ah ok [22:47:11] Oh wait [22:47:15] no that's in common/bin [22:47:18] and is it configured right?
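For the fsck petan wants, the filesystem has to be quiet, which is why sql gets disabled further down; a sketch of the usual sequence, with the device and mountpoint as placeholders.

  # sketch: stop the writers, unmount, check, remount
  sudo service mysql stop
  sudo umount /mnt                 # placeholder mountpoint for the sql data
  sudo fsck -f /dev/vdb            # placeholder block device
  sudo mount /mnt
  sudo service mysql start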
[22:47:21] You compile that [22:47:24] It should be [22:47:29] is it compiled? [22:47:32] RECOVERY Free ram is now: OK on testpuppet testpuppet output: OK: 66% free memory [22:47:42] RECOVERY Total Processes is now: OK on testpuppet testpuppet output: PROCS OK: 108 processes [22:47:52] RECOVERY Current Load is now: OK on testpuppet testpuppet output: OK - load average: 0.37, 0.09, 0.03 [22:47:54] ok [22:48:01] oren: I'm not clear on where we are with search. Is deployment set up to use it? Is indexing still happening? [22:48:28] johnduhart: we must put it in its own config file [22:48:33] this will be overridden by prod [22:48:34] Huh? [22:48:41] you commented it out in the prod config [22:48:48] CommonSettings [22:48:50] No that's not prod config [22:48:57] Ryan_Lane: i think there was discussion about having a labs mail list? was there a verdict? [22:49:01] I know but it's in our copy [22:49:04] * jeremyb heads to scrollback [22:49:06] which is supposed to be the same [22:49:11] It's a mishmash because the current configuration setup is sucky [22:49:17] hm... [22:49:23] petan: No, InitialiseSettings.php is the file you never touch [22:49:32] ok you should create an override for commons too [22:49:55] commons could be changed on prod too and then what [22:50:09] We'd merge in the changes [22:50:21] isn't it easier to create an override for that? [22:50:25] CommonSettings.php is rarely changed, and new stuff gets added at the bottom [22:50:35] yeah, we should have a labs mailing list [22:50:36] so that we could cron update prod [22:50:36] petan: No, it's really hard to override stuff in CommonSettings [22:50:42] RECOVERY dpkg-check is now: OK on testpuppet testpuppet output: All packages OK [22:50:44] Ryan_Lane: yes [22:50:50] +1 [22:50:54] lemme see if I can set that up [22:50:57] ok [22:51:03] I use mail at work but not irc [22:51:44] johnduhart: but that blocks us from automating sync with prod [22:51:51] what is so hard about it? [22:52:01] perhaps we could change LocalSettings to do that [22:52:08] No [22:52:15] why [22:52:31] There's just stuff in CommonSettings that causes problems if we were to just sync and override [22:52:49] right, still there is a reason to override [22:52:53] we wouldn't need to merge [22:52:59] which sucks [22:53:02] [17:50:25] CommonSettings.php is rarely changed, and new stuff gets added at the bottom [22:53:24] our changes are not at the bottom [22:53:28] InitialiseSettings.php is where most configuration happens, and we can sync that [22:53:34] CommonSettings.php:// $wgTexvc = "/usr/local/apache/uncommon/$wmfVersionNumber/bin/texvc"; // override default [22:54:01] petan: No, listen. [22:54:06] hm, right... but in that case try to automate merging somehow, if we override more stuff it's going to be annoying [22:54:45] In production, new changes to CS.php are made at the bottom of the file. [22:54:53] We make changes in the middle of the file for existing things [22:55:10] CS.php is rarely changed outside of adding new things. [22:55:23] why not just have an include of "CommonSettingsDep.php" at the bottom of CommonSettings.php?
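What petan proposes at the end there would amount to appending a hook like this to the synced copy of CommonSettings.php; the override file name is his hypothetical, and $IP is MediaWiki's usual install-path variable.

  # sketch: append a labs-only include to the synced config copy
  printf '%s\n' \
      '# THIS IS A MODIFICATION TO PROD CONFIG' \
      'if ( file_exists( "$IP/CommonSettingsDep.php" ) ) {' \
      '    require_once "$IP/CommonSettingsDep.php";  // labs overrides live here' \
      '}' >> CommonSettings.php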
it's already different [22:55:26] johnduhart: I'm done censoring, now waiting on Ryan to verify, and he's in a class [22:55:41] petan: [17:52:31] There's just stuff in CommonSettings that causes problems if we were to just sync and override [22:56:00] johnduhart: If he approves my censored version before 23:59 PST tomorrow (Wednesday), I will put it in gerrit myself, otherwise he will have to [22:56:01] definitely $wgTexvc isn't that stuff [22:56:40] yup [22:57:25] oh right there is a problem with extensions being loaded, hm... [22:57:58] however our modifications should be visible then [22:58:05] RoanKattouw: understood, thanks. [22:58:10] like # THIS IS A MODIFICATION TO PROD CONFIG [22:58:19] so that we don't overlook it while merging [22:58:28] otherwise it's gonna suck [22:58:39] petan: Would you like to look in the git history and do that? [22:58:54] I am still thinking of other ways [22:59:48] btw is the squid config out now? [23:00:10] ah [23:00:15] !Ryan ... [23:00:15] man of the all answers ever [23:00:17] :o [23:00:51] ? [23:01:03] Ryan_Lane: Roan needs you to check the squid conf [23:01:08] yeah [23:01:12] ok [23:01:15] I talked to him about it [23:01:18] ok [23:01:33] btw do you hear on !Ryan? :P [23:01:48] is it like !admin [23:01:50] :) [23:01:58] well, I ping on Ryan [23:02:01] ah [23:02:02] ok [23:02:06] :) [23:03:10] johnduhart: are you using the site now? [23:03:28] Not doing anything atm [23:03:30] I am thinking of fsck now, because I don't want to wait for a long time... [23:03:35] no one uses it atm [23:03:38] I would do it now [23:03:39] apart from you and me [23:03:45] Turn off apache [23:03:52] no need [23:03:53] for it [23:04:49] !log deployment-prep disabled sql for fs checks [23:04:50] Logged the message, Master [23:06:58] !log deployment-prep checks done [23:06:59] Logged the message, Master [23:09:58] btw Ryan_Lane why do we use ext3 [23:10:02] isn't ext4 better for io [23:10:10] not really [23:10:15] I use ext3 too but not on VMs [23:10:28] I use it because of compatibility issues [23:10:34] that's all [23:11:16] wait. do we use ext3? [23:11:23] I'm pretty sure / is ext4 [23:11:30] I don't think so [23:11:45] it's ext3 [23:11:48] ah. yeah. ext3 [23:11:52] ext3 installs faster [23:11:57] yes [23:12:01] but it's slower :o [23:12:04] or people say that [23:12:12] I don't know [23:12:20] I like ext3 more, because it runs everywhere [23:12:44] but on a system with bad io it would make sense to use ext4... although I am no expert on this [23:12:58] it's not the filesystem's fault [23:13:03] it's virtualization's fault [23:13:04] I know [23:13:11] also, the IO is going over the network too [23:13:12] but maybe it would improve it a bit [23:13:19] since the instances are running on gluster [23:13:24] hm... [23:13:35] so, it's ext3 -> gluster -> ext3 -> raid10 [23:13:40] if gluster was directly mounted to the VMs maybe it would be better? [23:13:45] :o [23:13:48] it would be, yeah [23:13:56] that's why I want to get the volume cluster in [23:14:00] cool [23:14:16] actually performance is affected by sql now [23:14:37] once we get a dedicated sql it would be much better I hope [23:14:42] heh, and really, it's ext3 -> qcow2 -> gluster -> ext3 [23:14:49] btw are you going to set up limits on query time?
[23:14:52] -> raid10 [23:14:56] yes [23:14:59] damn [23:15:07] what about large imports etc [23:15:14] today I ran a query which took 3 hours [23:15:30] I needed to copy data from a table to another table [23:15:32] large imports are a bunch of queries [23:15:37] ah [23:15:37] not really [23:16:12] if I did it using export import it would be 5 times longer [23:16:21] reading about squid on the apache box vs. a dedicated squid box... can labs instances have multiple NICs or multiple internal IPs? (e.g. ethernet aliases) [23:16:25] calculating time of bzip [23:16:30] to compress input [23:16:55] of course one solution is apache listens to localhost:80 and squid listens to eth0:80 [23:16:56] jeremyb: technically they could have [23:17:03] depends on what Ryan allows [23:17:14] petan: and OSM... [23:17:47] and idk enough about the IP allocation stuff in nova [23:18:16] (or the separate project/svc for that... can't remember what it's called) [23:18:45] is labs completely puppetized? could we rebuild it from scratch without backups? (other than git repos) [23:19:09] err, and apt repos ;-) [23:19:40] jeremyb: not really [23:19:50] jeremyb: and kind of, yes. heh [23:20:02] Ryan_Lane: ambiguous response... which are you responding to? :) [23:20:15] for multiple internal IPs [23:20:24] we are going to do something with LVS for this [23:20:37] and yes, multiple NICs are supported [23:20:41] kind of [23:20:46] but we aren't using them [23:20:53] their implementation currently sucks [23:21:20] what? no. [23:21:24] apache lives on one box [23:21:27] and squid lives on another [23:21:34] why install both on the same instance? [23:21:47] that makes things harder [23:22:17] and yes, labs infrastructure is puppetized [23:22:25] deployment-prep isn't, currently [23:23:22] sure, i know about deployment-prep ;) ;) [23:24:07] Ryan_Lane: i was just thinking the rationale was so they can both run on :80 and my response was you don't need separate boxen for that [23:24:23] yeah, but we *want* separate ones [23:24:33] it makes more sense [23:24:34] sure. but then that's a different rationale [23:24:46] it's how it is in production [23:25:01] if you wanted them on the same, I'd say bind apache to lo, and squid to eth0 [23:25:06] well... are we doing fe and be squids? ;-) ;-) [23:25:41] fe? [23:25:51] frontend/backend sorry [23:26:27] ah [23:26:30] we could even do squids in eqiad pulling from squids in pmtpa! [23:26:31] yeah. fe and be [23:26:56] and purging... [23:26:58] that's the idea ;) [23:27:12] and upload squids [23:29:46] email list made: https://lists.wikimedia.org/mailman/listinfo/labs-l [23:34:35] Ryan_Lane: empty archive [23:34:46] it has no messages yet [23:34:48] the list is lonely [23:34:52] heh [23:35:01] I *just* created it ;) [23:35:47] 2 Non-digested Members of Labs-l:; 0 Digested Members of Labs-l: [23:38:58] http://labs.wikimedia.deployment.wmflabs.org/global/viewfile.php?file= 404 error, http://deployment.wmflabs.org/global/viewfile.php?file= redirects [23:39:17] Ryan_Lane: ^ [23:39:53] * Ryan_Lane is confused [23:40:04] what does that mean? [23:43:44] Krinkle: will fix it [23:43:59] ok [23:44:39] fixed :D [23:44:42] somehow [23:45:14] viewfile will be available soon [23:45:21] we moved configs so it didn't work either [23:46:40] k [23:48:25] petan: new account worked for saibo, but "hexmode-test" account seems busted for me [23:48:35] I'm updating the db now [23:48:39] maybe it works now? [23:48:47] I can use my "markA..." account, though [23:49:31] ah... [23:49:35] what error do you get? [23:49:38] cookies?
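For the record, jeremyb's single-box variant above would look something like this; the instance IP, site name, and config paths are placeholders, and as Ryan says, labs went with separate instances anyway.

  # sketch: bind apache to loopback only (Ubuntu: /etc/apache2/ports.conf)
  sudo sed -i 's/^Listen 80$/Listen 127.0.0.1:80/' /etc/apache2/ports.conf
  # squid takes the public interface and forwards misses to apache
  printf '%s\n' \
      'http_port 10.4.0.17:80 accel defaultsite=deployment.wmflabs.org' \
      'cache_peer 127.0.0.1 parent 80 0 no-query originserver' \
      | sudo tee -a /etc/squid/squid.conf
  sudo service apache2 restart && sudo service squid restart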
[23:51:26] Krinkle: http://labs.wikimedia.deployment.wmflabs.org/w/global/viewfile.php?file= [23:51:31] try it [23:57:13] hexmode: johnduhart I need to go to sleep [23:57:25] petan: night! [23:57:30] anything you needed from me? btw johnduhart check live/global, it needs to be moved to its proper place [23:57:38] I just noticed live is actually an svn path [23:58:06] petan: I think I'm good, tyvm for your help! [23:58:08] Ryan_Lane: Still in class? [23:58:10] hexmode: login works for me [23:58:14] yep [23:58:16] dunno what's wrong... [23:58:19] andrewbogott: almost over [23:58:23] 'k [23:58:28] andrewbogott: what's up? [23:58:34] ldap questions. [23:58:46] we're doing class of service, and I don't really care much about it [23:58:57] Well, I'm happy to wait 10 minutes, too. [23:59:14] it's probably more like 30 mins to an hour [23:59:32] OK. Well, I'll ask my questions and you can answer at your leisure. [23:59:38] I'm confused by entries like this one: [23:59:39] ('dc=pmtpa,ou=hosts,dc=wikimedia,dc=org', {'objectClass': ['domainrelatedobject', 'dnsdomain', 'domain', 'dcobject', 'top'], 'sOARecord': ['hostmaster@wikimedia.org 20111228151003 1800 3600 86400 7200'], 'associatedDomain': ['pmtpa.wmflabs'], 'l': ['pmtpa'], 'dc': ['pmtpa']})
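For entries like the one andrewbogott pasted, the same data can be pulled with a plain ldapsearch; the server URI is a placeholder and bind options are omitted.

  # sketch: read the pmtpa hosts container from the directory
  ldapsearch -x -H ldap://ldap.example.org \
      -b 'dc=pmtpa,ou=hosts,dc=wikimedia,dc=org' \
      '(objectClass=dnsdomain)' associatedDomain sOARecord l dc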