[00:07:52] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 164 processes [00:17:01] opps :P [00:58:24] hi; I'm now getting a "Permission denied (publickey)" when I try to ssh into my instance [00:58:34] I was able to ssh last week [00:58:57] connecting into bastion (with ssh -A) still works [01:00:26] does that have to do anything with /home ? [01:02:52] PROBLEM Total processes is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS CRITICAL: 201 processes [01:08:43] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 169 processes [01:09:05] bots-4 is fine btw :) [01:10:01] gribeco, i dont think so [01:10:17] ok [01:10:21] isnt it -f you have to use, not -A? [01:10:40] wait, how are you sshing to bastion? :) [01:10:45] it still says -A in https://labsconsole.wikimedia.org/wiki/Help:Access#Using_agent_forwarding [01:11:02] that's from my local system [01:11:39] and then straight ssh from bastion into bots-salebot [01:11:39] from your local system you should use -f [01:11:39] i think [01:11:39] hmmm, ok, worth a try [01:11:39] and then once in bastion use -A to go to your instance [01:11:39] gribeco: you can get into bastion? [01:11:39] but not into bots-4? [01:11:46] Ryan_Lane: yes, into bastion, but not into bots-salebot [01:11:53] let me try bots-4... [01:11:57] on bastion, type this: ssh-add -l [01:12:07] works into bots-4 [01:12:31] I see why [01:12:41] ah! [01:12:46] well, kind of [01:12:51] /public/keys isn't mounting [01:12:54] eh. [01:13:12] now it is [01:13:12] okay, that's consistent with what ssh -v is saying [01:13:17] autofs needed to be restarted [01:13:43] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 94 processes [01:13:54] is it working for you now? [01:14:13] I can log in, but the connection gets closed immediately [01:14:17] hm [01:14:18] one sec [01:15:39] weird [01:15:44] there's something up with this instanc [01:17:08] hm. it's shared to it properly [01:19:44] gribeco: I think this instance needs to be rebooted [01:19:58] it seems it was never rebooted after the switch to the new homedirs [01:20:12] and I can't get the mounts to be sane [01:20:17] sure -- is that something I can do myself without logging in? [01:20:25] yes, but I can also do it right no [01:20:26] *now [01:20:29] ok [01:20:34] ok, it's rebooting [01:22:41] yup, I can log in now; thanks! [01:23:11] hm [01:23:16] there's still something wrong [01:23:21] xD [01:23:33] it's still mounting the old homedir [01:23:40] even after a reboot [01:23:42] which makes no sense [01:23:55] yes, I only have r/o access to /home/gribeco [01:25:11] o.O [01:25:23] I can't write to anything in /root or /tmp [01:29:22] I'm so confused [01:30:18] it says no space left on device [01:30:25] but it also says / has 77% left [01:30:27] err [01:30:30] in use [01:30:32] yeah [01:31:04] df is still reporting errors, too [01:31:25] yeah, that's because I disabled autofs [01:31:37] ok [01:31:44] it seems that / was indeed full [01:33:56] now to see what's eating so much space [01:34:17] heh [01:34:22] it's salebot's cache [01:34:35] 5.2GB of space [01:34:52] you should really write that to /mnt, rather than /var/local/salebot/cache [01:34:59] ok [01:35:12] want me to clear it, or move it? 
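For reference, a quick way to confirm this kind of "df says there is space but writes fail" situation is to compare block usage with inode usage and count the entries in the suspect cache directory. A minimal sketch, assuming standard GNU coreutils and the cache path mentioned above:

    # block usage vs. inode usage on the root filesystem
    df -h /
    df -i /
    # how big the cache is and roughly how many files it holds
    sudo du -sh /var/local/salebot/cache
    sudo find /var/local/salebot/cache -type f | wc -l

If IUse% in the df -i output is near 100% while block usage is not, the filesystem has run out of inodes rather than bytes, which would match the "too many files in one directory" theory that comes up below.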
[01:35:32] move, please [01:35:34] I can move the salebot directory to /mnt, if you'd like [01:35:34] ok [01:35:44] I'll add a link, too [01:35:53] thanks [01:35:57] yw [01:36:55] it's weird that df reports incorrectly [01:37:12] I wonder if you are going over the filesystem limit for number of files in a directory [01:37:13] I agree =) haven't seen this before [01:37:21] oh, that's plausible [01:37:34] what does it cache? [01:37:38] I thought I was flushing that cache; I'm having doubts now [01:37:43] heh [01:37:52] at 5.2GB I'd imagine you aren't :) [01:38:07] if it's usually a small amount of cache, you may want to consider using memcache or redis [01:38:07] ^^ [01:38:24] which will automatically purge the last used item [01:38:29] ok [01:38:55] looks like you are good to log in now [01:39:08] it's still moving the data [01:39:20] but the mounts are working [01:39:41] yup, df is a lot longer now [01:39:52] labs-nfs1:/export/home/bots/gribeco [01:39:55] that still shows up [01:39:56] which is wrong [01:40:12] your homedir is writable [01:40:28] yes, works for me too [01:41:01] there we go [01:41:06] I lazy unmounted that [01:42:16] Ryan_Lane, should i keep my data/code in /home/addshore or in /data/project/something ? [01:42:30] addshore: /data/project/something [01:42:59] i guess i should fix that shortly :P [01:43:04] whats the benefit of it being there? [01:45:55] in case you leave and someone else wants to take over your bot [01:46:02] or if you want others to help you run it [01:46:07] or to collaborate in some way [01:46:16] i see [= [01:46:57] it's possible to make that happen in your homedir, too, but messing up your homedir permissions could end up with your account getting owned [01:47:11] usually other people are unwilling to go into a homedir, too [01:47:25] also, we quota the homedirs more heavily than the project data store [01:49:29] gribeco: heh. I very much think this is over the limit for files in a directory [01:49:36] or well, over what's a good idea [01:49:42] the copy is going incredibly slow [01:49:45] im guessing there is no special way to make a project folder, i just dive in and do it? [01:49:46] err [01:49:47] move [01:50:00] addshore: hm. 
it may require sudo [01:50:11] okay =] [01:50:17] Ryan_Lane: ok, it's fine to nuke all the files [01:50:51] it's not a big deal, it can copy as long as you'd like to wait [01:51:07] if the cache is important [01:55:36] Ryan_Lane: you can abort the move for now, I'm deleting older files [01:55:46] I'll resume when I'm back to a smaller amount [01:56:10] and add the symlink once that's done [01:56:50] ok [02:22:52] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 197 processes [02:30:53] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [02:35:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [02:54:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 22% free memory [03:12:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 12% free memory [03:57:52] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 150 processes [04:00:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed [04:15:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [04:42:42] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 152 processes [04:50:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.42, 5.30, 5.11 [05:10:33] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.81, 4.77, 4.93 [05:17:42] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes [05:20:53] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [05:38:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 7.11, 6.38, 5.50 [05:40:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [06:30:42] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 155 processes [06:32:52] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 151 processes [06:35:42] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 149 processes [06:37:52] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 148 processes [09:05:54] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [09:10:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 10% free memory [11:22:42] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 152 processes [12:07:42] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 142 processes [13:05:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [13:10:58] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [13:22:22] PROBLEM Total processes is now: WARNING on 
vumi-metrics.pmtpa.wmflabs 10.4.1.13 output: PROCS WARNING: 151 processes [13:23:43] PROBLEM Free ram is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Critical: 5% free memory [13:27:23] RECOVERY Total processes is now: OK on vumi-metrics.pmtpa.wmflabs 10.4.1.13 output: PROCS OK: 148 processes [13:41:23] !g I78263b8b428a37549db5b59400867ccdc27bb0e7 [13:41:23] https://gerrit.wikimedia.org/r/#q,I78263b8b428a37549db5b59400867ccdc27bb0e7,n,z [13:58:23] Change abandoned: Hashar; "(no reason)" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/39691 [14:32:47] !log Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 {{gerrit|42951}} [14:32:47] Made is not a valid project. [14:32:54] !beta Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 {{gerrit|42951}} [14:32:54] 42951}}: !log deployment-prep Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 {{gerrit [14:33:03] o.0 [14:33:04] STUPID FOGUAC BOT [14:33:17] !log deployment-prep Made enwiktionary to use 1.21wmf6 and enwikibooks to use 1.21wmf7 {{gerrit|42951}} [14:33:19] Logged the message, Master [14:55:34] !beta reloaded udp2log-mw on -dbdump [14:55:34] !log deployment-prep reloaded udp2log-mw on -dbdump [14:55:37] Logged the message, Master [14:58:36] !beta -video05 : trimmed /var/log/glusterfs/data-project.log file [14:58:36] !log deployment-prep -video05 : trimmed /var/log/glusterfs/data-project.log file [14:59:10] !log deployment-prep -video05: running apt-get upgrade [15:00:59] !log deployment-prep -video05 updating puppet local repository and running puppet [15:04:25] stupid wm-bot [15:04:29] seriously pissed up by it [15:04:41] !beta deployment-prep -video05: running apt-get upgrade; updating puppet local repository and running puppet [15:04:41] !log deployment-prep deployment-prep -video05: running apt-get upgrade; updating puppet local repository and running puppet [15:05:03] !beta copy php-master/LocalSettings.php to php-1.21wmf6 and 1.21wmf7 [15:05:03] !log deployment-prep copy php-master/LocalSettings.php to php-1.21wmf6 and 1.21wmf7 [15:32:15] !beta -video05 : restarted puppet and puppetmaster, killed stalled puppet processes. Rerunning puppet manually [15:32:15] !log deployment-prep -video05 : restarted puppet and puppetmaster, killed stalled puppet processes. Rerunning puppet manually [15:32:18] Logged the message, Master [15:46:11] !log deployment-prep Running mw-update-l10n on deployment-bastion in screen 16609.pts-0.i-00000390 [15:46:14] Logged the message, Master [15:50:22] iw bots can run in labs? [15:56:02] !log deployment-prep Refreshing the TrustedXFF cache: cd /home/wikipedia/common/php-master/extensions/TrustedXFF && mwscript extensions/TrustedXFF/generate.php --wiki=aawiki ../../cache/trusted-xff.cdb [15:56:03] Logged the message, Master [15:59:24] !beta cp php-master/cache/trusted-xff.cdb php-1.21wmf6/cache/ [15:59:24] !log deployment-prep cp php-master/cache/trusted-xff.cdb php-1.21wmf6/cache/ [15:59:28] Logged the message, Master [15:59:29] !beta cp php-master/cache/trusted-xff.cdb php-1.21wmf7/cache/ [15:59:29] !log deployment-prep cp php-master/cache/trusted-xff.cdb php-1.21wmf7/cache/ [15:59:32] Logged the message, Master [15:59:45] hashar: please don't use !beta. then we don't know who !log'd it [15:59:50] !beta del [15:59:50] Successfully removed beta [15:59:59] seriously [16:00:11] too many bugs :-] [16:00:12] i is totally serious ;-) [16:00:56] <^demon> too many bots. 
[16:01:30] 09 15:59:28 < hashar> !beta cp php-master/cache/trusted-xff.cdb php-1.21wmf7/ca\che/ [16:01:44] hashar: also, your last path component there is messed up [16:01:54] some encoding thing? but it should just be ascii [16:02:37] !log deployment-prep enwiktionary beta now running 1.21wmf6 http://en.wiktionary.beta.wmflabs.org/wiki/Special:Version [16:02:38] Logged the message, Master [16:03:06] jeremyb: bug in the bot I guess :-D [16:03:12] no... [16:03:15] jeremyb: looks fine in my scrollback [16:03:23] !beta cp php-master/cache/trusted-xff.cdb php-1.21wmf7/cache/ [16:03:26] copy paste [16:03:40] hashar: it's broken on my screen too [16:03:51] not just on wiki [16:04:12] might be an issue from my terminal thus [16:04:44] !log deployment-prep the recent (today at least, probably most of them) !logs from wm-bot are really from hashar. in case you were looking for the source. [16:04:46] Logged the message, Master [16:05:08] got the job done anyway :-] [16:05:10] now I am out again! [16:05:14] * hashar waves [16:07:59] Alchimista, you could try bots-nr2 [16:07:59] I think that's unused - https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots [16:08:33] Thehelpfulone: ok, i must finish something here, 2 min.. [16:10:07] sure [16:10:31] jeremyb, did we disable the bot that tells everyone who added whom to which project? [16:10:36] !log bots added Alchimista [16:10:38] Logged the message, Master [16:15:17] hello lovely people [16:15:25] I'm Dan Andreescu, analytics [16:16:05] I just rebooted a labs instance (reportcard2) to fix the read-only home directory [16:16:12] and now I can't ssh into it [16:16:31] *** /dev/vda1 will be checked for errors at next reboot *** [16:16:31] *** /dev/vdb will be checked for errors at next reboot *** [16:16:35] those are two of the errors [16:16:49] and these are two others: [16:16:50] Creating directory '/home/milimetric'. [16:16:56] Unable to create and initialize directory '/home/milimetric'. [16:17:23] milimetric: will be checked for errors is not an error... [16:17:25] that's reportcard2.pmtpa.wmflabs [16:17:26] Thehelpfulone: erm, i never knew about that bot [16:17:40] well, it won't let me ssh into it [16:17:43] sure [16:18:05] let me know if you'd like any other info [16:18:10] drdee can't hit it either [16:18:39] thanks for looking into it :) [16:18:53] it happened at a very bad time - I just took down the reportcard for the metrics meeting [16:19:00] * jeremyb can't look into it... [16:19:06] maybe andrewbogott_afk or paravoid can? [16:19:14] ok, cool [16:19:16] (reportcard2's borked) [16:19:22] milimetric, metrics is tomorrow right? [16:19:26] yep [16:19:31] milimetric: hah, bad time ;) [16:19:36] indeed [16:19:41] i have to deploy the new limn :) [16:20:05] milimetric: well it's all puppetized right, so you can just build a new one? ;-P [16:20:27] server guy I am not :) [16:21:16] so maybe I can get otto to help but do you think that'd be easier than fixing this? [16:22:56] let's first try to fix it [16:23:17] paravoid, are you around? we have a borked labs instance that we need for tomorrows metrics meeting [16:25:33] do you need to do some work on it too drdee? I wouldn't worry too much if not, ~30 mins until the usual SF folk turn up [16:26:17] Thehelpfulone: that early?? ;) [16:43:25] well if we're lucky :P [16:49:54] hey [16:49:55] what's up [16:50:11] reportcard2? [16:50:13] looking [16:50:14] yup [16:50:25] milimetric: ^ [16:50:33] hi! 
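When an instance comes back from a reboot in this state (publickey refused, "Unable to create and initialize directory '/home/...'"), the earlier bots-salebot diagnosis in this log is a reasonable first pass: check whether the autofs-managed paths (/public/keys and /home) actually mounted, and bounce autofs if they did not. A rough sketch, assuming root or console access and the sysvinit service name used on these Ubuntu releases:

    # are the automounted paths really there?
    mount | grep -E '/home|/public/keys'
    ls -ld /home /public/keys
    # restarting autofs is what fixed the bots-salebot key lookup earlier
    sudo service autofs restart
    df -h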
[16:50:35] yes [16:50:40] can't ssh into it after reboot [16:50:58] Creating directory '/home/milimetric'. [16:50:58] Unable to create and initialize directory '/home/milimetric'. [17:00:51] sigh I have no idea how the new glusterfs setup works [17:01:33] and this non very unixy way of doing things (no entry in fstab) doesn't help me understand it :) [17:03:05] paravoid: i think it's an automount? [17:05:15] maybe from /etc/autofs_ldap_auth.conf [17:05:23] * jeremyb isn't really sure of course [17:06:08] sorry paravoid, is there anything I would know that would help? [17:06:16] * milimetric doesn't know almost anything [17:06:17] :) [17:07:03] milimetric: i think the problem is not specific to your instance [17:07:11] gotcha [17:07:49] so the reason I restarted this machine was because /home/milimetric was read-only [17:08:20] I think I'll wait for Andrew or Ryan, they should be waking up about now [17:09:22] paravoid: for future reference, would mike be good to poke about this? [17:09:36] I think so [17:09:56] k [17:10:57] ooh, just read about Cambridge. Congratulations Thehelpfulone!! Best of luck [17:11:28] thanks milimetric :) [17:11:47] yep, congrats :) [17:11:52] Thehelpfulone: where shall i do this reading? [17:12:29] or milimetric ? [17:12:55] * jeremyb also has cambridge, MA on the mind atm. but presumably you don't mean there [17:13:55] Thehelpfulone: i can't connect to bastion, did you add me there too? [17:14:57] it was on wmfall jeremyb https://lists.wikimedia.org/mailman/private/wmfall/ [17:15:10] but even though I'm on that list, I can't authenticate to find the message [17:15:15] Alchimista: you already have the shell right? [17:15:22] no :s [17:15:30] well you need that first then [17:15:34] milimetric: i'm not on that list [17:15:35] i mena, i only have git access [17:15:39] *mean [17:15:45] Alchimista, oops sorry - added you [17:15:49] :D [17:15:55] Thehelpfulone: i wants news! ;) [17:16:03] lol [17:17:27] i still can't connect :s If you are having access problems, please see:bla bla bla \n Connection closed by 208.80.153.207 [17:17:43] do i wait some time, or is it an ssh key problem? [17:18:03] i'm doing ssh -A alchimista@bastion.wmflabs.org by the way [17:18:43] I don't think you need the bastion.wmflabs.org bit? or is that how you're doing the instance? [17:18:59] Alchimista, did you use the correct type of SSH key? [17:19:20] i'm following the manual: https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [17:20:13] Thehelpfulone: well the ssh key worked for git, so it should work for labs, right? [17:20:15] it needs to be an OpenSSH format [17:20:32] yeah it should do I think (presuming you've added it at https://labsconsole.wikimedia.org/wiki/Special:NovaKey ) [17:21:07] yap [17:24:45] do you use windows Alchimista? [17:25:12] nop, ubuntu [17:25:41] hmm "clear any cache settings or deny host settings that may also prevent you from ssh'ing into bastion. [17:25:41] [edit]" -- - jeremyb are you a labs admin? [17:27:33] Thehelpfulone: i isn't [17:27:47] Thehelpfulone: mazel tov! [17:30:37] spaciba! [17:49:42] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 151 processes [17:50:15] Thehelpfulone: i've done better, removed all my keys, and restarted ssh-server, uploaded a new key and nothing [17:50:21] i still can't connect to bastion [17:57:18] Alchimista, are you going to be around for a bit? 
Ryan Lane is pretty good at troubleshooting given he's the lead on Labs :) [17:57:34] sure, i'll poke him later then :d [17:58:05] and thanks by the help :D [17:59:44] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 149 processes [18:05:16] milimetric: reportcard2's still broke? [18:05:19] * jeremyb spies a Ryan_Lane [18:05:36] jeremyb I will try again but I don't think anyone did anything [18:05:46] yes, still broken [18:05:51] 09 16:50:56 < milimetric> Creating directory '/home/milimetric'. [18:05:51] 09 16:50:57 < milimetric> Unable to create and initialize directory '/home/milimetric'. [18:05:54] 09 17:07:48 < milimetric> so the reason I restarted this machine was because /home/milimetric was read-only [18:06:11] Ryan_Lane ^ we're not able to connect to labs instances via ssh [18:06:13] Ryan_Lane: ^ paravoid looked and deferred to you [18:06:17] thanks jeremyb :) [18:06:52] Ryan_Lane: oh, you're here now. I've lost track of the gluster /home work [18:07:03] how do we mount gluster nowadays? [18:07:12] how is glusterfs spawned? [18:07:29] Alchimista: did you get sorted? [18:08:02] nops, still can't connect [18:08:27] Alchimista: what key are you using? i see 4 in ldap and one of those is formatted differently from the other 3 [18:09:54] jeremyb: the last one, i'll clean all the others. i've got one single key in ./ssh [18:11:50] now there is only on on labs too [18:13:16] Alchimista: so what's the key in ssh then? ssh-keygen -l -f .ssh/foo.pub [18:14:38] i've followed the labs tutorial: https://www.mediawiki.org/wiki/Git/Tutorial#Set_Up_SSH_Keys_in_Gerrit so it's ./ssh/id_rsa.pub [18:15:31] jeremyb: ^ [18:15:56] Alchimista: so run `ssh-keygen -l -f .ssh/id_rsa.pub` [18:16:03] and tell me what it says [18:16:08] ./ssh is the wrong place [18:16:32] 2048 b8:2e:d5:0e:01:d6:e9:f2:74:a6:8e:ca:37:2a:8f:6f alchimistawp@gmail.com (RSA) [18:18:55] Ryan_Lane: got you the patch to have 'beta' application servers to mount /dev/vdb on /srv : https://gerrit.wikimedia.org/r/#/c/42743/ :D [18:19:15] and we also have wmf6 / wmf7 wikis in beta now :-] [18:25:14] jeremyb: 2048 b8:2e:d5:0e:01:d6:e9:f2:74:a6:8e:ca:37:2a:8f:6f alchimistawp@gmail.com (RSA) where should be the key? [18:25:31] hold on, now i'm logged on bastion [18:27:02] and now i'm in \o/ [18:34:57] drdee: uuuuugggghhhhh [18:35:08] reportcard2 is running oneiric? [18:35:36] Alchimista: keys go in .ssh not ./ssh [18:36:55] hashar: cool [18:37:15] milimetric ^^ [18:37:26] i see yea [18:37:31] is reportcard2 puppetized? [18:37:34] no [18:37:39] lets see how it goes [18:37:52] I'm going to see if I can add the proper version of glusterfs on it [18:37:59] ok [18:38:07] if it doesn't work in oneiric, there isn't much I can do [18:38:13] you guys really need to puppetize this [18:38:16] gotcha, we'd have to update it? [18:38:20] and to move it to a precise instance [18:38:26] no. upgrades don't really work [18:38:26] yes, we should [18:38:31] oh ok [18:38:49] in fact, that usually permanently break the instance [18:38:58] so there'd be no way to regain ssh access to it if oneiric doesn't like glusterfs, correct? [18:38:58] most of the time people get locked out half-way through [18:39:15] I can likely add your keys to root's key [18:39:17] but... [18:39:29] that would be just to get stuff off of it [18:39:38] puppet would overwrite it [18:40:59] labs-morebots, feeling ok? [18:40:59] I am a logbot running on i-0000015e. [18:40:59] Messages are logged to labsconsole.wikimedia.org/wiki/Server_Admin_Log. 
[18:40:59] To log a message, type !log . [18:41:10] ok, hm, well if I could get back in I would just fix up the deployment of the reportcard so it's ok for tomorrow [18:41:17] then we can puppetize it after tomorrow [18:42:35] milimetric: does it use anything out of /home [18:42:36] ? [18:42:55] are you going to blow /home out? [18:43:02] well, /home is gluster [18:43:08] no glusterfs, no /home [18:43:23] to deploy the new version of limn, I'll need a writeable /home or to figure out some work-around [18:43:35] are you guys seriously running stuff out of /home? [18:43:38] nono [18:43:49] it just needs to write to /home/user/.npm when it does npm install [18:43:58] and I think I can configure it to use some other directory [18:44:11] but not having a writeable /home makes that hard because all the defaults are stored there [18:44:28] this is why ops people hate stuff like npm [18:44:41] among other reasons, I'm sure [18:45:01] I promise we'll have a respectable .deb as soon as possible after this [18:45:07] * Ryan_Lane nods [18:45:09] I had never deployed and had no idea what headaches it caused [18:45:30] why's the production version of reportcard running in labs, anyway? [18:45:51] it's really meant to puppetize/package/etc, so that we can move it to production [18:46:17] I'm going to see if I can manually install the glusterfs deb [18:47:52] ugh [18:47:55] didn't notice that [18:47:57] sorry [18:48:18] no worries. the first thing I check when a gluster mount won't mount is the glusterfs version [18:48:27] then I try to upgrade it [18:48:34] then I see it's running oneiric :D [18:48:50] so, how does it work now anyway? [18:48:53] do we still have autofs? [18:48:58] yes [18:49:01] for /home [18:49:06] it's a direct autofs mount [18:49:08] yeah, for /home [18:49:12] aha [18:49:15] all autofs mounts are direct now [18:49:20] so they all show up [18:51:41] I am so dumb [18:53:02] Ryan_Lane: tiny merge for ya https://gerrit.wikimedia.org/r/42996 . My patchset compared $::realm with 'wmflabs' instead of 'labs' [18:54:53] ah. I should have noticed that [18:55:06] or unit tests should have catches it :-D [18:55:10] catched [18:55:35] dinner time [18:55:42] will check it on my labs instance later on [18:56:08] milimetric: how hard would this be to install on another instance? [18:56:41] i'm not sure, for me it would probably be somewhat tricky because I don't have mad skillz :) [18:56:41] hm [18:56:47] but the same setup is up on kripke [18:56:49] it seems like the ppa for gluster has the needed version [18:56:59] running test-reportcard.wmflabs.org and dev-reportcard.wmflabs.org [18:57:31] from what I know of it, it's npm, apache, and supervisorctl [18:57:43] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 157 processes [19:00:22] milimetric: so.... [19:00:24] it's working again [19:00:25] but. [19:00:34] uh oh [19:00:38] :) [19:00:39] I had to use a deprecated ppa to install glusterfs [19:00:53] next time I upgrade gluster you guys are fucked [19:00:57] IF we reboot right? [19:00:58] so, move away from oneiric whenever you can [19:01:02] nah [19:01:05] rebooting is fine [19:01:17] if you upgrade gluster, we won't be able to ssh then? 
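On the npm point above ("I think I can configure it to use some other directory"): npm's cache location is configurable, so the install does not have to write to ~/.npm on the shared home directory. A minimal sketch, assuming /mnt is the instance-local disk mentioned earlier; the exact path is only an example:

    # one-off, for a single install
    npm install --cache /mnt/npm-cache
    # or persistently, for the user doing the deploy
    npm config set cache /mnt/npm-cache
    npm config get cache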
[19:01:22] yes [19:01:25] ok, got it [19:01:26] jeremyb: sorry, e mean .ssh [19:01:32] thank you so much, you're the man [19:01:39] yw [19:01:40] and we'll puppetize and get rid of this instance asap [19:01:47] I need to remove oneiric as an option [19:02:16] and natty [19:06:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 11.51, 8.58, 6.26 [19:19:59] back [19:20:13] I should have tested my change before sending it :D [19:20:42] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 151 processes [19:25:42] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 150 processes [19:38:43] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 8.42, 6.75, 5.55 [19:43:36] Disk /dev/vdb doesn't contain a valid partition table [19:43:37] blblbaba [19:43:43] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 157 processes [19:43:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 9.71, 8.26, 6.31 [19:46:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 9.34, 8.23, 6.28 [19:48:02] Ryan_Lane, did you restart labs-morebots or did puppet restart it? [19:48:25] (It's not clear to me from reading the manifest that puppet would restart it if it crashes, trying to understand that.) [19:54:08] Ryan_Lane: andrewbogott : can one of you merge https://gerrit.wikimedia.org/r/43013 ? Fix my lame mount {} calls in labs :/ [19:54:16] tested that one in an instance this time [19:54:35] andrewbogott: I think we can get bot to be restarted now by issuing @restart in the channel. [20:02:12] PROBLEM dpkg-check is now: CRITICAL on vumi.pmtpa.wmflabs 10.4.0.140 output: CHECK_NRPE: Socket timeout after 10 seconds. [20:03:42] PROBLEM Free ram is now: WARNING on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Warning: 11% free memory [20:07:02] RECOVERY dpkg-check is now: OK on vumi.pmtpa.wmflabs 10.4.0.140 output: All packages OK [20:07:45] andrewbogott: it didn't crash [20:07:50] !log testing test [20:07:51] Logged the message, Master [20:07:53] hm [20:07:54] weird [20:07:58] someone must have restarted it [20:08:06] part of the problem, though, is that it doesn't crash [20:08:18] a netsplit occurs, it doesn't realize it, and can't handle it [20:09:19] It looks like it was down at the point that petr was commenting. [20:09:49] but the process likely wasn't dead [20:09:51] it just wasn't in the channel [20:10:06] Hm… [20:10:41] netsplit means that it's in the channel on some subset of freenode but not on the rest? [20:11:33] yep [20:11:58] hooray, only m1 type instances are allowed now [20:12:00] and only lucid and precise [20:14:49] \O/ [20:15:14] will you have a look at the bug that prevent us from adding new puppet classes ? https://bugzilla.wikimedia.org/show_bug.cgi?id=43613 [20:15:19] that blocks git deploy deployment on beta :-] [20:15:42] sue [20:15:43] *sure [20:15:45] the Apaches boxes are applied an old puppet class which does not exist anymore. [20:15:51] * Ryan_Lane nods [20:15:53] so the /srv mount is never loaded on them :/ [20:16:02] though we could hack that by doing an INSERT in the database hehe [20:16:18] well, the interface works for me [20:16:21] what do you need changed? 
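On the /srv mount the old puppet class never set up (and the "Disk /dev/vdb doesn't contain a valid partition table" line above): the gerrit changes being discussed do this with puppet mount {} resources, but a rough manual equivalent on a fresh instance, assuming /dev/vdb is empty and is to be used whole without a partition table, would be:

    # destructive if /dev/vdb already holds data - fresh instances only
    sudo mkfs.ext4 /dev/vdb
    sudo mkdir -p /srv
    sudo mount /dev/vdb /srv
    df -h /srv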
[20:16:37] some code so it works for me ? :-]]]]]]]]]]]]]]]]]]] [20:16:54] well, I was going to do the immediate fix, then do it in OSM [20:17:11] I need the role::applicationserver::appserver [20:17:41] and you can get rid of role::applicationserver [20:18:16] done [20:20:11] !log deployment-prep Migrated Apache box to use role::applicationserver::appserver instead of the old (and no more existent) role::applicationserver [20:20:12] Logged the message, Master [20:20:21] Ryan_Lane: thanks ! [20:20:26] yw [20:20:39] salt -g apaches puppetd-tv [20:20:57] isn't salt kind of a replacement for dsh ? [20:21:48] I AM DOOOOOOMEED [20:21:54] can't we replace puppet with shell scripts ? [20:21:58] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Package[wikimedia-task-appserver] is already defined in file /etc/puppet/manifests/mediawiki.pp at line 6; cannot redefine at /etc/puppet/modules/mediawiki_new/manifests/init.pp:14 on node i-0000031a.pmtpa.wmflabs [20:21:59] hehe [20:25:32] Ryan_Lane: could also use the role::applicationserver::imagescaler class sorry [20:25:56] oh no [20:26:03] role::applicationserver::imagescaler::labs [20:26:10] or both [20:26:13] done [20:26:14] this way we are fine :-] [20:29:46] so my instances were still using the pretty old imagescaler::labs which use mediawiki class [20:29:57] that conflicts with the role::applicationserver which have mediawiki_new :/ [20:30:22] ok [20:30:23] fixed it [20:30:30] now you can edit puppet groups [20:31:07] sexymonkeycrazylady [20:31:32] Ryan_Lane: what was wrong? [20:32:04] I was using an old way of limiting access [20:32:18] okkk [20:33:02] conflict fixed with https://gerrit.wikimedia.org/r/43016 [20:33:07] imagescaler::labs should die :-] [20:33:17] well, you can kill it now, then :) [20:33:51] but I need a role to include an additional class https://gerrit.wikimedia.org/r/#/c/43016/2/manifests/role/applicationserver.pp,unified [20:34:00] damn a tab instead of whitespace [20:34:25] ah [20:34:26] * Ryan_Lane nods [20:34:42] PS3 would do it. [20:34:50] I think I will end up applying for a position in ops [20:34:57] and have interaction with platform :-] [20:35:09] instead of having a position in platform and interacting with ops [20:35:57] https://gerrit.wikimedia.org/r/#/c/43016 PS3 should do it. [20:36:07] hashar: joke aside, we have this new SRE thing (that is still a bit vague IMO though) [20:36:11] merged [20:36:22] paravoid: yeah SRE is nice [20:36:33] paravoid: but honestly, I do not have that level of expertise as far as I understood the SRE position. [20:36:38] basically the idea was ops people embedded into other teams [20:36:47] I don't even understand what most of you are doing :/ [20:38:24] paravoid: kind of ottomata for the analytics team? 
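On the earlier "isn't salt kind of a replacement for dsh?" question: for this use case, fanning one command out to a group of hosts, broadly yes. A loose sketch of the two styles, assuming a host group named 'apaches' has already been defined for each tool (the group name and targeting flags are illustrative, not the exact production setup):

    # dsh: run against the hosts listed in the 'apaches' group, showing which host said what
    dsh -g apaches -M -c -- sudo puppetd -tv
    # salt: the same command pushed from the master to a predefined 'apaches' nodegroup
    sudo salt -N apaches cmd.run 'puppetd -tv'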
[20:38:40] yeah, and Jeff for fundraising [20:38:48] that doesn't match to what *I* think SRE is [20:38:53] and the job position is a bit vague in that sense [20:39:01] that it is a start at least [20:39:10] to me SRE sounds like asher [20:39:27] (oh my god the sql query have made my graph to raise, here how to optimize the SQL properly) [20:39:43] i should read the position again [20:40:30] !log deployment-prep removing the phased out imagescaler::labs from apaches in favor of role::applicationserver::imagescaler [20:40:31] Logged the message, Master [20:42:48] hashar: I think different people understand different things [20:43:15] ah http://hire.jobvite.com/Jobvite/Job.aspx?j=o7k2Wfw9&c=qSa9VfwQ [20:44:03] Ryan_Lane: to answer your (rhetorical?) question: I have in fact had to clean up MediaWiki installations with no protections before [20:44:22] it's fucking terrible [20:44:42] It's not that bad with Special:Nuke enabled. [20:44:58] how did you clean up the ungodly number of users? [20:45:12] the last few times this happened to me, Special:Nuke didn't help [20:45:35] It doesn't help with the blocking much, but it does with the spam removal. [20:45:37] because the bot created about 10,000 users and each user edited 1-5 times [20:45:54] Anyway, that bug is turning in to a pissing contest with Nemo's latest reply. Sigh. [20:46:04] Bug commit [20:47:00] so, enlighten me [20:47:05] what *is* the real purpose of this? [20:47:16] https://meta.wikimedia.org/wiki/Research:Account_creation_UX/CAPTCHA [20:48:04] Matt's latest comment is a shorter version [20:48:08] so, our captchas don't do well against bots, so let's turn it off? [20:48:16] isn't the correct answer, let's fix it? [20:48:53] the only protection captchas offer is increasing the cost of the spam [20:48:54] Maybe. Anyway, no one is advocating we turn anything off permanently. It's a two hour test. [20:49:08] abou $1.3 per thousand specifically [20:49:37] StevenW: a test with what end goal? [20:49:54] To see how much of a barrier they really are for humans. [20:50:02] "This short experiment aims to explore how effective our current CAPTCHAs really are, and what impact they have on users." [20:50:18] seems clear to me [20:50:28] that answer is "to get data" [20:50:34] CAPTCHAs are not effective [20:50:37] although I am sold by the argument that two hours are not enough to get real results as spammers won't adapt [20:50:58] That is the consensus so far paravoid. [20:51:06] getting data is a step to an end-goal [20:51:07] you're not dealing just with bots here, you're also dealing with humans that control the bots [20:51:09] not an end-goal [20:51:29] are we going to replace captchas with something more effective? are we going to turn them off? [20:51:59] Both are options, as are simply adding features alongside the current captchas, like a refresh button. [20:52:04] Impact to ux is awful - imo we should look at betters ways like picking up browsing patters that normal people use. [20:52:13] getting more info before considering options is a fine end goal imo [20:52:39] there's a limited set of end-goals possible [20:52:52] disabling captchas being the only sane one [20:53:24] you can't know that [20:53:31] what alternatives are there? 
[20:53:31] maybe our captchas are too difficult to read [20:53:41] then we should improve them [20:53:45] using A/B tests [20:53:46] maybe we need an audio alternative along the captcha (like some other captchas do) [20:53:55] again, we can do that with A/B tests [20:53:55] maybe we need a refresh button as StevenW said [20:54:01] changing readability has already been a help for sure [20:54:02] also A/B tests [20:54:10] we worked with Aaron on that just recently [20:54:11] there's a theme to what I'm saying [20:54:29] use A/B tests [20:55:08] Split testing is hard to do properly [20:55:17] we could do it in E3 [20:55:27] I would have suggested we a/b it using the method we're already using for the rest of the interface modifications, but without a config change the server will still say fuck off [20:55:45] so options like hiding the captcha for some portion of users doesn't work [20:56:25] you can send users to different versions of a form [20:56:37] or just output different versions [20:56:50] with % chances of it loading [20:56:54] those pages don't hit cache [20:56:55] Ori and Spage should be here if we're going to talk implementation changes. :) [20:57:01] Is there a way of evaluating after the fact whether a given registration was 'valid' or 'spammer'? [20:57:14] boo I can't use two role::applicationserver::* classes :/ I got a conflict with role::applicationserver::common being a dupe :/ [20:57:19] andrewbogott: they aren't really testing effectiveness for spammers, though [20:57:35] andrewbogott: they are testing whether it was easier for a legitimate user [20:58:06] right, but how do we know whether we made it easier for legitimate users or just for bots? [20:58:10] <^demon> How many failed captchas do we even do in a day? It's very easy to say "we're only disabling for 2h, so spammers are unlikely to notice and start using tools." [20:58:15] andrewbogott: we look at block rates post-registration for all the account creation tests. If needs be we can also filter the block log comments for specific reasons. [20:58:15] A simple increase in registrations during the test period doesn't tell us which it is. [20:58:35] Oh! Ok, so 'block rates' do provide a metric for legit vs. bogus regs. [20:58:36] <^demon> But I'm curious if we already get spammed with 800k attempts a day or some crap that will immediately start flooding (and only don't currently *due to* the captcha) [20:58:50] I'd argue the more important value is how many legit users do we block because they can't fucking understand it [20:58:52] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 3.31, 3.58, 4.72 [20:58:58] ^demon: my argument is that this is an ineffective test [20:59:13] <^demon> I don't disagree--too many unknowns. [20:59:37] ^demon: Chris told us earlier this week he has access to the failure rate. [20:59:39] and that it just opens us up to spammers [20:59:59] ^demon: many bots check to see if a site has protections before attempting [21:00:06] <^demon> Indeed. [21:00:26] So… null hypothesis is: "Disabling capchas has no effect on the ratio of legitimate to spammer registrations" [21:00:36] Isn't rejecting or confirming that a useful question? [21:00:47] Or… do I not understand what we're testing? [21:00:59] only if bots notice we've turned it off for a few hours [21:01:16] but even then we learn something. [21:01:27] that removing captchas make it easier to create accounts? 
:) [21:01:32] If disabling increases the legit:spam ratio then we have demonstrated that capchas are filtering out legit regs. [21:01:42] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 3.70, 4.23, 4.95 [21:01:59] Even if we later determine we need capchas we've still learned what they cost us in terms of inhibiting real users. [21:02:08] Ryan_Lane: we learn _how much_ easier it makes it. [21:02:29] that's only useful knowledge if we actually think we're ever going to turn off captchas [21:02:48] Not necessarily. [21:02:50] No, because if the ratio /doesn't/ change then we know to stop caring about capchas immediately. [21:03:00] It is also useful for justifying further work to improve them. [21:03:00] If it does then there's a problem of some sort to be solved. [21:03:07] Ratio isn't useful [21:03:16] You'd need a flatline on both sides.... [21:03:24] any promotion/events etc would change it [21:03:36] um... [21:03:55] Damianz, if you're asserting that the signal is too noisy to learn anything, then that means that /no/ test is useful. [21:04:21] andrewbogott: A/B tests are [21:04:25] If you can show a user is legit afterwards and use /that/ as the metric then it's useful [21:04:48] as it has both conditions running at the same time [21:05:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [21:05:37] I have to go, but what I want to know is: what are we comparing? [21:06:06] If not the ratio of legit to fake users then what? [21:06:12] the info they want to gather is: how much does disabling the captchas increase the number of legit user creations [21:06:14] look, all the accounts created here are spam: http://wiki.wikimedia.org.es/w/index.php?title=Especial:CambiosRecientes&useskin=monobook&uselang=en&limit=500 [21:06:14] Just the total # of registrations with and without capchas? [21:06:34] and this wiki has a captcha, just the weak math one [21:06:52] Ryan_Lane, they would need to have it at least for 24h [21:07:00] given the round-the-clock differences [21:07:00] Ryan_Lane: OK, as long as someone is filtering for legit, I think that's roughly the same as what I'm arguing for... [21:07:07] preferably for a week... [21:07:08] Anyway… lunch [21:07:56] I suspect the enwiki admins will start blocking new accounts with abusefilter to stop the bots [21:08:16] i'm reading some threads about /home, it must not be used? [21:08:34] not in labs [21:08:37] is there any other place to put the bot code? [21:08:42] yes [21:08:45] /data/project [21:08:48] /data [21:08:48] project data [21:08:55] not data [21:09:11] heh [21:09:23] lool [21:09:34] you want to use: /data/project [21:09:36] typing while eating dorritos is ard [21:09:46] someone should add the info on https://labsconsole.wikimedia.org/wiki/Help:Move_your_bot_to_Labs or somewere else [21:09:59] Damianz: can people add their own directories there, or does someone with sudo need to do so? [21:10:14] Alchimista: be bold :) [21:10:15] sudo iirc [21:10:27] * Ryan_Lane nods [21:10:42] Alchimista: Damianz or petan can likely help you get set up [21:10:47] I'd actually like to change . to be group owned by project for stuff like web/bots so mkdir works but rm doesn't... or provide a setuid script to do it [21:10:47] Damianz: who else are the bots admins? [21:10:56] no idea [21:11:16] we really need to move the nr instances to another project so data is seperate... 
or make it so the mount is not implicit [21:11:25] yes [21:11:43] those should likely go in bots-production [21:11:50] we really need to get a formal project going [21:12:01] so i must use mkdir on /data/project or should it already be there? [21:12:02] can you warp time so I get 72hours in a day? [21:12:12] that's a mount point [21:12:12] Damianz: that's my problem too :) [21:12:14] it should be there [21:12:31] Alchimista: someone will need to make a subdir for you [21:12:42] Alchimista: what would you like it to be called? [21:13:02] actually [21:13:09] most the stuff people have sudo access on [21:13:10] Ryan_Lane: does it has to be to all my bots, or can be one per bot? [21:13:15] ah. true [21:13:27] Alchimista: one per bot is likely best [21:13:47] ok, so it can be named "aleph" [21:13:51] need to merge puppet stuff and re-install mysql etc in a normal way... though ideally we'd setup them in pairs replicated and be sexy [21:14:17] Alchimista: actually, if you log into bots-4, you can do this yourself [21:14:28] ok, so let me try [21:14:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 5.90, 5.75, 5.35 [21:14:35] sudo mkdir /data/project/aleph [21:14:43] you'll need to chown it as well [21:16:14] * Damianz pats hashar on the nose [21:17:46] Damianz: ;-:] [21:17:56] Ryan_Lane, is there some uid range which can be used locally by the instances knowing it won't be assigned by ldap? [21:18:25] yes [21:18:25] I'd like to have each tool have its own user [21:18:34] make it a system account [21:18:38] and it will use a local range [21:19:53] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 4.46, 5.25, 5.20 [21:20:31] Ryan_Lane: is there any standard for chown, or is it a simple chown -R? [21:20:43] Ryan_Lane: I had a duplicate definition in beta applicationserver role :/ Fixed it with https://gerrit.wikimedia.org/r/#/c/43023/ hopefully. [21:20:47] if you could look at it [21:21:00] (yeah I know you got ton of ping, that is because you are a rock star I guess) [21:21:25] he is :) [21:21:44] Damianz: does your labs/nagios-builder repo pass the pep8 tests ? [21:21:55] Damianz: I can make Jenkins to vote -1 whenever pep8 fail (change is https://gerrit.wikimedia.org/r/#/c/39694/ ) [21:22:20] Platonides: you could also create users via labsconsole [21:22:39] so, either create system accounts, or make service accounts via users in labsconsole [21:22:52] labsconsole accounts would exist globally [21:22:58] Yeah it passes pep8 as far as flake8 is concerned anyway [21:22:59] Ryan_Lane, the plan was to make a suid tool to create the tool folders and accounts [21:23:08] I'm ignoring some types.... though so we'll see [21:23:14] is it possible for it to create an account through labsconsole? [21:23:20] ah [21:23:21] or would the tool need to solve a captcha for that? :P [21:23:28] heh [21:23:41] in your case it seems like system accounts would be better [21:23:53] there's 2 sides to this argument [21:24:06] system accounts == yay puppet is easy, ldap == yay everything works across instances [21:24:22] Doing ldap currently sucks donkey ass though.... creating accounts is painful and managing them is bleh [21:24:51] in /etc/login.defs currently shows normal UIDs in 1000-60000 and system uids as 100-999 [21:25:00] would it be safe to pick numbers above 60000 then? 
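Putting the /data/project and service-account threads above together, the setup for a bot named 'aleph' looks roughly like this; the recursive chown answers the "is it a simple chown -R?" question with an assumption, since it never got an explicit reply:

    # from an instance in the project, e.g. bots-4
    sudo mkdir /data/project/aleph
    sudo chown -R alchimista /data/project/aleph
    # a local (non-LDAP) system account for the tool, so its uid comes from the
    # SYS_UID range in /etc/login.defs rather than colliding with LDAP uids
    sudo adduser --system --group --home /data/project/aleph aleph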
[21:25:42] If we plan to have less than 50000 users [21:25:52] 59000* [21:26:21] hm [21:26:33] system accounts are listed as 100-999? that's not good. [21:26:34] heh [21:26:43] I think our accounts start at 500 [21:26:45] #SYS_UID_MIN 100 [21:26:45] #SYS_UID_MAX 999 [21:26:47] which was the old ubuntu default [21:26:54] we should probably fix that on the system [21:27:14] our uids are holdovers from svn, as well [21:27:17] Ryan_Lane: Do you think it could, in theory be possible to use central auth and integration ldap (so ladp account is created if non exists or such) -- or when oauth is supported, support that so people /can/ just have 1 login... I think engagement would be better across the board really [21:27:57] keystone... [21:28:00] that's the real problem [21:28:17] <^demon> *puke* [21:28:19] you'd still need to log into labsconsole [21:28:25] <^demon> Sorry, involuntary response when I see the words "central auth" [21:28:43] ^demon: Well central auth should diaf... but it kinda works [21:28:59] <^demon> Yeah, and a buzzsaw "works" for removing a leg. [21:29:03] though sso doesn't really work [21:29:03] <^demon> Doesn't make it pretty :) [21:29:33] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 4.13, 4.38, 4.88 [21:29:40] it works in the sense that a curl with the plain user details to commons api from every wiki would do the same job :D [21:29:52] oauth would make it pretty if someone would decide on a spec and fucking agree [21:29:53] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 3.57, 4.02, 4.66 [21:31:15] used uids in ldap seem to be in 1005-2782, plus 500-582 [21:31:37] oh? [21:31:58] brion:x:500:550:Brion VIBBER:/home/brion:/bin/bash [21:32:03] yeah [21:32:04] andrewbogott: I installed nginx on mwang-dev..pmtpa.wmflabs. 
puppet code: http://justpaste.it/1r2k [21:34:42] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 151 processes [21:37:47] Ryan_Lane: could you please get a look at the https://gerrit.wikimedia.org/r/#/c/43023/ that creates role::applicationserver::appserver::beta :-D [21:38:10] removes some labs hacks :-D [21:38:42] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 2.97, 3.46, 4.47 [21:39:34] merged [21:40:07] thanks / yw [21:40:50] !log deployment-prep migrating apaches box to the new role::applicationserver::appserver::beta (replaces both appserver and imagescaler) [21:40:52] Logged the message, Master [21:42:05] I suspect memcached died again [21:42:38] yeah memcached died yet again on virt0 :-( [21:43:11] Ryan_Lane: memcached died on virt0 :-( [21:43:24] :( [21:43:45] restarted [21:43:53] PROBLEM Current Load is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:44:33] PROBLEM Current Users is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:45:13] PROBLEM Disk Space is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:45:53] PROBLEM Free ram is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:47:04] ah puppet running [21:47:06] \O/ [21:47:23] PROBLEM Total processes is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:47:39] !log deployment-prep running puppet on apache boxes to get the new role::applicationserver::appserver::beta class [21:47:41] Logged the message, Master [21:49:33] PROBLEM dpkg-check is now: CRITICAL on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: Connection refused by host [21:50:13] RECOVERY Disk Space is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: DISK OK [21:50:42] one bug fixed [21:50:53] RECOVERY Free ram is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: OK: 898% free memory [21:51:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 3.54, 4.75, 5.01 [21:51:49] Ryan_Lane: beta now has /srv/ on all mediawiki related instances. Thanks for your help! [21:51:54] yw [21:52:23] RECOVERY Total processes is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: PROCS OK: 84 processes [21:52:39] that for all the lovely small files gluster has titfits over [21:53:53] RECOVERY Current Load is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: OK - load average: 0.07, 0.66, 0.63 [21:54:33] RECOVERY Current Users is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: USERS OK - 0 users currently logged in [21:54:34] RECOVERY dpkg-check is now: OK on deployment-integration.pmtpa.wmflabs 10.4.1.61 output: All packages OK [21:54:54] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory [21:56:47] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 2.97, 3.87, 4.57 [21:57:13] and it is bed time for me ! 
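For the recurring memcached deaths on virt0 mentioned above, the quick check-and-restart looks something like this (a sketch assuming the stock Ubuntu memcached package on its default port):

    sudo service memcached status
    sudo service memcached restart
    # confirm it answers; uptime resetting to a small number confirms the restart
    printf 'stats\r\nquit\r\n' | nc localhost 11211 | egrep 'uptime|curr_connections'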
[21:57:17] see you folks tomorrow [22:18:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory [22:27:44] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes [22:31:51] mike_wang: Does it work as a proxy? Or do you need a public IP to test that? [22:39:55] andrewbogott: I think I need a public IP to test it. Now I don't know how to test it. [22:41:19] mike_wang: Ok, lemme up the IP quota for that project so you can try it. [22:43:10] mike_wang: Actually, looks like there are a bunch quotaed so you should be able to assign an IP and DNS name right away. [22:43:18] https://labsconsole.wikimedia.org/wiki/Special:NovaAddress [22:46:12] mike_wang: Also you should go ahead and prepare your patch and submit it to gerrit. Let me know if you don't know how to do that. [22:50:42] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 156 processes [22:59:13] * andrewbogott gently nags Ryan_Lane to look at ldap on nova-precise2 [22:59:44] andrewbogott: at http://208.80.153.147, I see "Welcome to nginx!". Also I see my access log from instance-proxy:/var/log/nginx/access.log. I think it works. [23:00:00] andrewbogott: ah, right [23:00:46] mike_wang: How would I access something like http://puppetdoc2.instance-proxy.wmflabs.org/ ? [23:05:02] mike_wang: What I want (ultimately) is a puppet class that we can drop onto a production machine and use it as a proxy to provide web access to instances that lack a public IP. So, that should be your test case :) [23:45:02] Ryan_Lane: If I want to add an image to labsconsole, is mediawiki set up to manage that or should I put the image on commons or should I just copy it onto the server by hand, or… ? [23:45:39] Ryan_Lane: nm, I think I just answered [23:51:56] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 154 processes
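One way to test the hostname-based proxying asked about above ("How would I access something like http://puppetdoc2.instance-proxy.wmflabs.org/?") before a wildcard DNS entry exists is to aim the request at the proxy's public IP and supply the hostname yourself. This assumes the nginx config dispatches on the Host header, which is what per-instance hostnames imply; puppetdoc2 is just the example name from the conversation:

    # hit the proxy's public IP while presenting the eventual DNS name
    curl -v -H 'Host: puppetdoc2.instance-proxy.wmflabs.org' http://208.80.153.147/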