[03:47:45] Warning: There are 1 users waiting for shell, displaying last 1: Shaohong (waiting 0 minutes)
[12:16:29] Warning: There are 1 users waiting for shell, displaying last 1: Shaohong (waiting 508 minutes)
[12:21:37] That's not annoying
[18:33:13] Warning: There are 1 users waiting for shell, displaying last 1: Shaohong (waiting 885 minutes)
[19:39:52] [bz] (NEW - created by: Chris McMahon, priority: Unprioritized - normal) [Bug 47479] beta cluster is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=47479
[20:31:50] chrismcmahon: will do a quick check at the beta cluster :D
[20:33:55] !log deployment-prep Apache down on both apaches instances
[20:33:58] Logged the message, Master
[20:45:16] [bz] (ASSIGNED - created by: Chris McMahon, priority: Unprioritized - normal) [Bug 47479] GlusterFS deployment-prep-project has too many errors -> beta cluster is down - https://bugzilla.wikimedia.org/show_bug.cgi?id=47479
[20:45:17] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Unprioritized - major) [Bug 47425] glusterFS could not allocate memory - https://bugzilla.wikimedia.org/show_bug.cgi?id=47425
[20:47:38] hashar, what instance are you seeing these problems on?
[20:52:30] andrewbogott: any instance on the deployment-prep cluster :-]
[20:52:42] andrewbogott: the Gluster volume must be corrupted somehow
[20:52:52] OK -- it looks to me like gluster is behaving properly and those files were broken some time ago. Do you think otherwise?
[20:53:04] bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=47479
[20:53:13] and an example log: https://bugzilla.wikimedia.org/attachment.cgi?id=12155
[20:53:20] also got a memory allocation issue last week
[20:53:31] I will just get rid of Gluster entirely :-]
[20:53:38] Do I understand correctly that GlusterFS sucks?
[20:53:51] I have no idea whether it sucks or if we suck at using it
[20:53:56] hmmmmmm, mail
[20:53:58] the end result is that it is not reliable :-]
[20:54:09] hashar: maybe you can use Coren's new stuff?
[20:54:16] andrewbogott: I don't really know when the files got broken though
[20:54:33] hashar: Would it be acceptable to just rm them?
[20:54:53] Who names the Coren?
[20:54:55] :-)
[20:55:08] hashar: what is Labs going to use instead of Gluster?
[20:55:27] andrewbogott: yeah you can delete any .log file :)
[20:55:34] vvv: I'm just done setting up a kickass NFS server with massive redundancy.
[20:55:48] andrewbogott: for the deployment-prep-project volume under /logs/
[20:55:55] andrewbogott: if that fixes the issue that would be nice :-D
[20:56:04] Coren: jeremyb_ talked about some project to replace gluster
[20:56:12] hashar, can /you/ delete them? Or does that error out?
[20:56:27] vvv: I don't know; But I will make it so that beta does not use any shared disk anymore :-D
[20:56:37] andrewbogott: ?????????? ? ? ? ? ? xff.log
[20:56:40] andrewbogott: will try :D
[20:56:41] Gluster has issues that really make it unsuitable in our use case (lack of multitenancy and poor scaling with the number of volumes)
[20:57:36] andrewbogott: can't remove them : rm: cannot remove `/data/project/logs/web.log': Input/output error
[20:57:47] hashar: OK… I'll give it a try shortly
[20:58:02] andrewbogott: maybe a split brain
[20:58:05] Wait, will that storage be mounted on the machines over NFS or will the disk images be accessed through NFS?
[20:58:15] got errors like /logs/cli.log: gfid different on subvolume
[20:58:31] hashar: We have 'quorum' turned on so we don't get splitbrains anymore. That's why I think those files have been like that for ages.
[20:58:56] ahh possibly
[20:58:59] vvv: Storage over NFS, although I'm not entirely certain what the distinction you stated means.
[20:59:10] andrewbogott: feel free to delete them on the volume then :-D
[21:00:32] hashar: I just replied for you on labs-l.
[21:01:05] hashar: tl;dr: do you want to try the NFS server?
[21:01:34] andrewbogott: an example log http://bug-attachment.wikimedia.org/attachment.cgi?id=12156 :D
[21:02:06] Coren: if you are looking to get one more project on it, sure :-]
[21:02:39] Coren: but I still have to migrate out of the shared disk just like we do in production
[21:02:54] hashar, want to restart and see if things are any better now?
[21:03:05] hashar: Well, the point is that you still /get/ shared storage in /data/project, but from a different server
[21:03:26] andrewbogott: rebooting deployment-apache32, we will see :-)
[21:03:31] hashar: It's disruptive because it needs an outage for the copy and switch, but if it's a blocker for you already then that's not an issue.
[21:04:34] andrewbogott: solved :-]
[21:04:42] andrewbogott: so lucky to have you around on a Sunday :-]
[21:04:51] hashar: OK. If that happens again then it will be interesting...
[21:05:11] andrewbogott: would you mind closing https://bugzilla.wikimedia.org/show_bug.cgi?id=47479 ? :-D
[21:05:29] and you might want to say on labs-l that you fixed gluster
[21:06:03] I didn't, really, I just erased those particular files by hand.
[21:06:06] Coren: it is not really a blocker. GlusterFS works fine … when it works. It is just that it happens to break once a week or so and I can't fix it myself :-]
[21:06:08] Not a general solution :(
[21:06:17] andrewbogott: at least that fixed that specific bug hehe
[21:06:30] hashar: Currently, the gluster failure case should just be that things switch to read-only. Rather than data corruption.
[21:06:37] That's the idea, at least.
[21:07:20] hashar, is the 'cannot allocate memory' thing still happening?
[21:07:48] andrewbogott: no idea :/
[21:08:02] I think it was server side
[21:08:10] since the client instance had a lot of free memory
[21:08:26] andrewbogott: Canonically, how is the manage-volumes daemon started? upstart?
[21:09:46] yep
[21:10:27] Coren, there's an upstart script pending in gerrit: https://gerrit.wikimedia.org/r/#/c/60083/
[21:10:46] …which presumes the presence of a user which doesn't actually exist yet.
[21:11:52] Hm. Puppet already deploys glustermanager actually; but that seems... unseemly. :-)
[21:12:31] Although creating 'nfsmanager' or somesuch for just that purpose seems just as silly.
[21:13:14] yeah… I changed the name in that upstart script because calling it 'glustermanager' seemed weird.
[21:13:27] Yeah. I'll add that to puppet then.
[21:13:28] Creating a different properly-named user should be pretty painless
[21:13:35] andrewbogott: also got an issue on apache/common-local/php-master/extensions/.git/modules/APC/refs/remotes/origin/master :D
[21:14:02] can be deleted safely
[21:14:17] thanks for showing up on a Sunday :-]
[21:14:24] I am off, really need to get to bed now *wave*
[21:14:56] * andrewbogott waves
[21:16:19] will follow up on labs tomorrow :-] Thanks again
[21:17:03] !log deployment-prep beta is up again. Apache2 could not start because the error log file was not accessible ( {{bug|47479}} )
[21:17:06] Logged the message, Master
[21:26:04] Gah!
[21:42:14] * Damianz pats Coren
[22:04:35] hi
[22:05:18] I had a working instance - but now it is down ...
[22:05:32] at least apache is down
[22:06:03] could this be due to some puppet based modification ?
[22:06:57] it has role::puppet::self and role::lamp::labs
[22:23:22] hmmm what's the best way to document an instance
[23:15:46] I need to run sudo -u www-data /usr/bin/php install.php
[23:16:43] I am prompted for my password but then I get an error that my user is not permitted to run the script as www-data
[23:16:54] any ideas how I can do this
[23:18:09] Coren: https://fr.wikipedia.org/wiki/Discussion_utilisateur:Akeron#Blocage_en_erreur.3F - was there a reason for him to block something outside 2000::/3?
[23:40:45] Coren: packets from those addresses should never reach WMF, I think, those are bogons
[23:46:48] Jasper_Deng: Reading
[23:47:00] it seems the admin understands
[23:47:22] but, I think there's something w/ BGP that ISPs use to block such packets
[23:49:36] !logs
[23:49:36] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs
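The 2000::/3 question raised near the end of the log can be checked mechanically. What follows is a minimal sketch, not something the participants ran: it uses Python's standard `ipaddress` module to test whether an address falls inside 2000::/3, the range that global unicast IPv6 addresses are currently allocated from. The function name `is_in_global_unicast_block` and the sample addresses are assumptions chosen purely for illustration.

```python
import ipaddress

# 2000::/3 is the range that global unicast IPv6 addresses are
# currently allocated from.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def is_in_global_unicast_block(address: str) -> bool:
    """Return True if the given address is IPv6 and lies inside 2000::/3.

    Hypothetical helper for illustration; raises ValueError if the
    string is not a valid IP address.
    """
    addr = ipaddress.ip_address(address)
    # An IPv4 address can never belong to an IPv6 network, so check the
    # version first and only then test membership.
    return addr.version == 6 and addr in GLOBAL_UNICAST


if __name__ == "__main__":
    # Illustrative sample addresses, not taken from the log.
    for sample in ("2001:db8::1", "4000::1", "fe80::1"):
        print(sample, is_in_global_unicast_block(sample))
```

Membership in 2000::/3 is only a coarse test: parts of that range are reserved or unallocated as well, so real bogon filtering would rely on a maintained prefix list rather than a single prefix check.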