[04:48:30] Hello, I'm experiencing a strange(?) issue at Labs ... if I run a script directly, it seems to operate just fine, but if I submit it as a job it can't find a dumpfile from /public/datasets/public/ that it uses
[04:54:29] * Hazard-SJ got disconnected
[10:15:17] petan: hi, is it allowed on labs to hide a folder from all users?
[10:15:46] yes
[10:16:03] it's recommended for your confidential data, passwords etc
[10:16:37] I think we should even set up some way to encrypt files / folders... but I am not the person who decides on this
[10:37:58] !log tools tools-login: rm -f /var/log/exim4/paniclog (OOM)
[10:38:00] Logged the message, Master
[10:39:07] !log tools tools-mail: rm -f /var/log/exim4/paniclog ("User 0 set for local_delivery transport is on the never_users list" => probably artifacts from Coren's LDAP changes)
[10:39:08] Logged the message, Master
[11:34:28] !log tools tools-exec-08: Rebooted due to not responding on ssh and SGE
[11:34:29] Logged the message, Master
[11:42:04] !log tools tools-mail: Rerouted queued mail (@tools-login.pmtpa.wmflabs => @tools.wmflabs.org)
[11:42:06] Logged the message, Master
[12:31:30] I've just had an influx of mail routing error messages to my inbox.
[12:34:22] TheLetterE: I'm flushing some stuck queues from yesterday.
[12:34:32] Ah, okay scfc_de :)
[13:53:09] andrewbogott_afk: Can you remind me where in git you stuffed manage-nfs-volumes?
[14:11:29] Coren: I think puppet/modules/ldap/files/scripts/manage-nfs-volumes-daemon
[14:14:35] andrewbogott: Yeah, found it since. Thanks. :-)
[14:14:58] And, BTW, I'm puppetizing now, but it's working most excellently.
[14:15:09] great!
[14:15:42] andrewbogott: We need to figure out what to do with the /keys though; I can probably rsync them from the current store, but we'll have to figure out the "right" way soon.
[14:16:10] so… they're generated out of ldap with a cron, right?
[14:16:17] We can do the same in eqiad/nfs, can't we?
[14:16:49] maybe in exactly the same way, I don't think it's filesystem-specific.
[14:21:42] I'll write a patch. Where is /keys mounted on the nfs server?
[14:25:06] it's glusterfs, yuck, isn't it
[14:36:54] andrewbogott: On the old or new one? Because it's not mounted at all on the old one (still uses the gluster export)
[14:37:09] andrewbogott: On the new one, it's /srv/keys
[14:37:12] new one
[14:37:25] ok, srv/keys...
[14:39:36] Coren, are you using the same class on both servers? openstack::project-nfs-storage-service?
[14:39:48] andrewbogott: Yes.
[14:40:09] andrewbogott: I've set $site on the node entries though so you can discriminate on that.
[14:40:39] Coren: better for me to switch the code off based on that, or better to create /srv/keys on the old server as well?
[14:41:29] andrewbogott: Hm, on the old server /srv/* matches labs projects, so that'll make things odd if there is (ever) a 'keys' project. So probably better turning it off.
[14:41:47] Or switch to /public/keys on the old one.
[14:41:57] ok
[14:41:58] But that'd be a noop anyways since that's not actually exported.
[14:42:03] yep
[14:46:45] Coren: what is the hostname of the new nfs server? I want to log in and see what my script does :)
[14:46:55] andrewbogott: labstore1001
[14:47:11] Want to give https://gerrit.wikimedia.org/r/#/c/113955/ a glance first?
[14:58:49] Coren, the cron defined in my patch isn't getting applied. Any ideas?
[15:00:12] andrewbogott: Oh lol. Because you put it in the wrong class? :-)
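For context on the cron being discussed here: the /keys files are generated out of LDAP on a schedule, and the new fileserver only needs that job in one site. Below is a minimal sketch of what such a resource might look like, assuming a hypothetical export script and schedule; the actual patch is the Gerrit change linked above, and only the $::site guard and the /srv/keys path come from the conversation.

```puppet
# Hypothetical sketch: regenerate the per-user SSH key files from LDAP on
# the new NFS server only. The script name and schedule are placeholders;
# only the $::site guard and the /srv/keys path come from the discussion.
if $::site == 'eqiad' {
    file { '/srv/keys':
        ensure => directory,
        owner  => 'root',
        group  => 'root',
        mode   => '0755',
    }

    cron { 'export-ldap-ssh-keys':
        ensure  => present,
        user    => 'root',
        minute  => '*/10',
        command => '/usr/local/sbin/export-ldap-ssh-keys /srv/keys',
        require => File['/srv/keys'],
    }
}
```

The guard reflects Coren's suggestion to simply turn the job off on the old server rather than create /srv/keys there.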
[15:00:35] ah, so I did
[15:00:59] Sorry I didn't notice on review, that's actually in the fold. :-)
[15:07:21] andrewbogott: Your cron job is there this time.
[15:07:30] yep!
[15:07:34] let's see if it does anything
[15:11:28] Coren: seems to be working
[15:12:25] So it does! Yeay. I'm about to axe autofs from eqiad lab client and replace it with NFS mounts; once that goes through we should be golden. In particular, it means that the eqiad bastion should even allow logins right.
[15:12:43] andrewbogott: Opinion time:
[15:13:29] andrewbogott: Removing autofs from ldap::role::client::labs is easy, but the fstab entries would seem out of place there. Is there another class/role that is inserted in the instance node definitions I could use instead?
[15:13:52] I certainly don't want to add a whole pile of conditional shit into base
[15:15:23] There isn't anything that's currently included. Putting a conditional into base is simplest...
[15:15:35] Lemme see where those classes are inserted, though, maybe we can just add another one conditional on site.
[15:15:38] * Coren ponders.
[15:16:23] I would like us to insert unconditionally a class like 'role::labsinstance' on every instance node if we could. That makes it the "right" place for all that stuff.
[15:16:53] that should be easy enough, I just need to figure out where that happens...
[15:17:19] Right now, ldap::role::client::labs is added; I'll create the role::labsinstance class and include ldap::role::client::labs in it. It'll be a noop until we change that.
[15:17:34] ok, I see where it is, very easy to add another class, either just for eqiad or in both.
[15:18:12] It's in a hand-edited config. I'll change it for eqiad right now.
[15:18:55] make that role::labs::instance
[15:19:48] consider, though, that if instances are migrated...
[15:20:10] we'll have to twiddle their ldap settings when the vms are copied over.
[15:20:23] Not ideal...
[15:20:36] andrewbogott: We can even do it preemptively actually if the 'new' class is already present.
[15:20:51] Check https://gerrit.wikimedia.org/r/113967
[15:20:52] true
[15:21:30] Hang on, one file got left out of the commit.
[15:22:58] With that changeset, if you include role::labs::instance in the node def rather than ldap::role::client::labs it becomes a noop.
[15:23:26] But now we have a natural "this is a labs instance" role.
[15:26:21] That's going to make things much cleaner going forward whenever we change how bits of labs work for instances.
[15:39:43] andrewbogott: Did you do something to bastion-eqiad?
[15:39:51] I don't think so
[15:41:10] Coren, want me to build a fresh one?
[15:41:42] I needed to check something on it first, but also I'd like to know why it apparently died.
[15:44:15] andrewbogott: ... I'm not seeing it in Manage Instances anymore.
[15:44:48] You may need to log out and in again, I restarted memcached a few hours ago
[15:45:08] Ah.
[15:45:29] but, yeah, instance seems dead. That's worrisome
[15:47:16] It's not responding to reboots either.
[15:47:28] Dun-dun-duuun!
[15:48:25] Ah, no, it was just slow.
[15:48:59] Back to active. It /looks/ alive, but I'm not connecting to it.
[15:49:03] * Coren ponders.
[15:49:11] Actually, it looks like the issue is network-level.
[15:50:22] I'm getting no SYN/ACK
[15:50:55] * Coren goes to check on labsnet
[15:51:11] I can ping bastion2-eqiad.wmflabs.org, though, which I just created...
[15:51:49] Huh.
[15:52:30] Yet, the VM rebooted and shows a login prompt on console so it can't be all dead.
[15:52:56] andrewbogott: Different security group though.
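Going back to the role::labs::instance change discussed above: the wrapper role is essentially a shell around the existing LDAP client class, which is what makes switching the node definitions over a noop. The actual implementation is Gerrit change 113967; the sketch below only illustrates the shape described in the log.

```puppet
# Illustrative sketch of the wrapper role described above; the real code
# is in Gerrit change 113967.
class role::labs::instance {
    # For now this only wraps the existing class, so swapping it into the
    # node definitions is a noop.
    include ldap::role::client::labs

    # Site-specific behaviour (e.g. static NFS mounts instead of autofs in
    # eqiad) can later hang off this role instead of piling conditionals
    # into base.
}
```

The payoff, as noted in the conversation, is a natural "this is a labs instance" hook for future changes to how instances work.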
[15:53:11] oh...
[15:53:17] I bet this is my fault.
[15:53:25] hell, the web security group doesn't even seem to exist anymore. :-)
[15:53:39] I copied over the security groups from tampa, maybe that caused the previous 'default' group to cease to exist
[15:53:48] Heh. No matter, I'll just readd it. :-)
[15:55:24] May be that that instance is a goner, if security groups use ids...
[15:55:30] Yeah, looks like.
[15:55:41] It's not taking the new group.
[15:55:57] I'll just move the public IP
[15:56:02] ok, well… unless this happens again, let's not worry about it. Go ahead and use bastion2-eqiad.wmflabs.org
[15:56:08] it has an ip already, I think
[15:56:28] It does. I hadn't noticed you adding it.
[15:56:56] presuming you can log in, which I can't at the moment
[15:56:59] andrewbogott: My root key doesn't work on it though.
[15:57:08] So not done with puppet.
[15:58:54] I'm going to rebuild bastion as well, from a different image… the image used for bastion2 is a new one, mostly untested
[16:01:11] Are new instances using the new role now?
[16:02:12] yes
[16:02:16] should be at least
[16:03:33] yep, they are.
[16:03:37] on eqiad
[16:06:10] Coren, root@bastion-eqiad.wmflabs.org is working. Might be that the new image is not working (although it is, in tampa…)
[16:07:40] So it is, but it's odd that it still got autofs applied.
[16:08:04] * Coren tries to find out why.
[16:26:00] Coren: I'm going to punch out in 15 or 20… any other points of interest before I go?
[16:26:33] andrewbogott: I seem to be okay. Give me a minute to see why it doesn't look like the "right" class was included first?
[16:26:41] yep
[16:26:48] andrewbogott: I'm about to push a change that'll confirm whether it is.
[16:27:24] Want to see if there's anything stupid in https://gerrit.wikimedia.org/r/#/c/113976/ while jenkins gets to it?
[16:27:47] ldap entry is: https://dpaste.de/atyP
[16:29:06] Clearly correct. So the presence of autofs is something else.
[16:29:20] It's probably set in the initial image.
[16:29:37] In fact, I'm sure it is, since we have user keys in new instances before puppet runs...
[16:30:23] Ah, that's an issue for the future then.
[16:30:25] Coren, you want to mount everything in every project? A lot of those things look specific to tools.
[16:31:16] Also… currently shared homes and shared projects are switchable per project. You're planning to just have shared volumes for every project in eqiad?
[16:31:38] (That's fine, I think… making it switchable was mostly to ease Gluster's load. But we'll need to update the web interface accordingly.)
[16:32:08] No, they're all for everyone. /public/dumps and /public/keys already exist; /data/project and /home also. The only two new things are /public/backups and /data/scratch which are meant to be for all projects (the former is the destination of DB backups, the latter is just happy fun storage)
[16:32:22] 'k
[16:32:54] But yeah, I forgot that shared storage was optional, but I can't think of no reason why it still needs to be. The only "cost" is a directory on the fileserver if it's not in use.
[16:33:09] ... too many negatives but you get my point.
[16:33:28] ok
[16:33:38] There is no price to be paid for more "volumes" in our case.
[16:33:53] So, tomorrow I'll figure out how to build non-broken images, and see if I can make one that comes with a stock nfs mount rather than a stock autofs/gluster mount
[16:34:34] andrewbogott: If the mount options are okay ('hard,bg' being the important ones), a mount for /public/keys is perfectly safe to add early.
[16:34:51] ok
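As an illustration of that last point, an early Puppet-managed mount for /public/keys could look roughly like the following. Only the 'hard' and 'bg' options and the labstore1001 hostname come from the conversation; the export path and the remaining options are assumptions.

```puppet
# Hypothetical early mount for /public/keys. 'hard' and 'bg' are the options
# called out above; the export path on labstore1001, plus 'ro' and 'intr',
# are assumed for illustration.
file { '/public/keys':
    ensure => directory,
}

mount { '/public/keys':
    ensure  => mounted,
    fstype  => 'nfs',
    device  => 'labstore1001:/srv/keys',
    options => 'ro,hard,bg,intr',
    atboot  => true,
    require => File['/public/keys'],
}
```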
[16:37:50] btw the dumps on eqiad (/public/datasets) currently don't work (directory is empty). But as we are hopefully moving soon I guess it doesn't matter :-)
[16:38:11] ahm tampa
[16:44:32] ok, I'm out -- g'night all
[17:52:49] !log cvn Setting up cvn-apache3 (apt-get install php5-common libapache2-mod-php5 php5-cli)
[17:52:51] Logged the message, Master
[17:52:54] Coren: Thanks
[17:53:14] ... my pleasure. What for? :-)
[17:53:15] That did the trick for cvn-apache2 and cvn-app1, the old instances are operational again and so are the bots
[17:53:32] andrewbogott_afk: Thank you as well
[17:53:34] sorry :) Both of you are wonderful, but andrewbogott_afk seemed to be the one having done it this time.
[20:27:14] !log integration Upgraded npm to v1.4.3 on slave02 and slave03 to fix ssl certificate errors
[20:27:16] Logged the message, Master
[20:27:45] !log integration Installed grunt-cli on slave02 and slave03 to fix broken jenkins jobs for mwext-VisualEditor, oojs-ui, oojs-core
[20:27:46] Logged the message, Master
[20:42:01] !log Deleted cvn-apache3, re-creating as cvn-apache4 with 'web' security group
[20:42:01] Deleted is not a valid project.
[20:42:07] !log cvn Deleted cvn-apache3, re-creating as cvn-apache4 with 'web' security group
[20:42:08] Logged the message, Master
[20:45:18] !log cvn Setting up cvn-app2 to become an app server. Installing same packages as on cvn-app1 (php5-cli mono-complete mysql-server mysql-client python-twisted-words python-mysqldb subversion)
[20:45:19] Logged the message, Master
[23:12:54] any labs root around?
[23:22:54] petan: Around?
[23:26:24] scfc_de: Around?
[23:41:15] hoo: Yes.
[23:42:00] scfc_de: From tools-exec-08 (which you rebooted today) it's no longer possible to access the database replicas
[23:42:33] local-delinker@tools-exec-08:~$ sql enwiki
[23:42:33] ERROR 2003 (HY000): Can't connect to MySQL server on 'enwiki.labsdb' (110)
[23:42:54] hoo: Ah, forgot to restart iptables. Moment, please.
[23:43:54] Already wondered about the output of iptables --list
[23:44:27] (I don't have root so it failed to load the kernel module)
[23:45:00] hoo: Works now. Caution: "iptables -L" is always empty; you need to specify "-t nat" for the NAT table.
[23:45:26] scfc_de: I don't have permission to see anything anyway
[23:45:53] but it's still crying for kernel modules (and certain other fatal errors at times... so it's always worth checking)
[23:46:21] You're right; I need to sudo for querying as well.
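The tools-exec-08 exchange at the end is a reminder that the replica-access rules live in the NAT table ("iptables -t nat -L", not plain "iptables -L") and are lost on reboot unless something restores them. Purely as a hypothetical sketch (the rules file, its source path, and the DNAT check are all assumptions, not the actual tools manifests), that restore step could be puppetized like this:

```puppet
# Hypothetical sketch: keep the *.labsdb NAT redirection rules on disk and
# restore them when the NAT table has lost them (e.g. after a reboot, as on
# tools-exec-08 above). Paths and the DNAT check are assumptions.
file { '/etc/iptables.labsdb-nat':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
    source => 'puppet:///modules/toollabs/iptables.labsdb-nat',
}

exec { 'reload-labsdb-nat':
    provider => shell,
    command  => '/sbin/iptables-restore < /etc/iptables.labsdb-nat',
    # "iptables -L" only shows the filter table; the redirection rules live
    # in the NAT table, hence "-t nat" in the check below.
    unless   => '/sbin/iptables -t nat -L -n | /bin/grep -q DNAT',
    require  => File['/etc/iptables.labsdb-nat'],
}
```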