[00:04:16] something is obviously fucked up with that script [00:05:41] so i herd u liek being pinged [00:06:51] heh [01:01:33] New patchset: Hashar; "/home/wikipedia/logs on mediawiki-logger service" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7304 [01:01:48] New patchset: Hashar; "stop apache when having nginx thumb proxy" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7253 [01:02:04] New patchset: Hashar; "ability to change thumbnail server name" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7255 [01:02:18] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7304 [01:02:18] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7253 [01:02:37] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7255 [02:14:39] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [02:22:39] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 11 MB (0% inode=80%): [02:38:00] great [02:38:12] labs /home/ is full :-( [02:45:29] PROBLEM Puppet freshness is now: CRITICAL on deployment-syslog i-00000269 output: Puppet has not run in last 20 hours [02:52:39] PROBLEM Total Processes is now: CRITICAL on incubator-bot2 i-00000252 output: PROCS CRITICAL: 935 processes [02:53:20] 05/15/2012 - 02:53:20 - Updating keys for laner at /export/home/deployment-prep/laner [02:57:19] 05/15/2012 - 02:57:19 - Updating keys for laner at /export/home/deployment-prep/laner [03:03:20] 05/15/2012 - 03:03:19 - Updating keys for laner at /export/home/deployment-prep/laner [03:03:45] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 42% free memory [03:35:19] 05/15/2012 - 03:35:19 - Updating keys for laner at /export/home/deployment-prep/laner [03:42:46] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: 
Warning: 14% free memory [03:47:46] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory [03:52:36] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory [04:01:36] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory [04:07:30] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory [04:07:50] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory [04:07:50] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [04:09:51] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 4.94, 9.76, 5.08 [04:12:41] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 95% free memory [04:12:41] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [04:14:51] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.55, 3.98, 3.87 [04:16:41] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [04:17:31] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [04:21:49] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory [04:22:51] PROBLEM Total Processes is now: WARNING on incubator-bot1 i-00000251 output: PROCS WARNING: 555 processes [04:32:51] PROBLEM Total Processes is now: CRITICAL on incubator-bot1 i-00000251 output: PROCS CRITICAL: 830 processes [05:27:28] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [06:03:40] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 12% free memory [06:08:41] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory [06:14:43] PROBLEM Disk Space is now: WARNING on labs-nfs1 i-0000005d output: DISK 
WARNING - free space: /export 612 MB (3% inode=80%): /home/SAVE 612 MB (3% inode=80%): [06:59:39] PROBLEM Disk Space is now: CRITICAL on labs-nfs1 i-0000005d output: DISK CRITICAL - free space: /export 512 MB (2% inode=80%): /home/SAVE 512 MB (2% inode=80%): [06:59:59] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 2.50, 4.12, 2.55 [07:04:39] PROBLEM Disk Space is now: WARNING on labs-nfs1 i-0000005d output: DISK WARNING - free space: /export 793 MB (4% inode=80%): /home/SAVE 793 MB (4% inode=80%): [07:04:59] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.66, 1.87, 1.99 [07:36:07] hi [07:38:42] hashar: mornin [07:38:49] how's the jet lag? [07:39:02] bad :( [07:39:31] I have slept from sunday 22:00 till monday 16:00 [07:39:42] then woke up this morning at 1:20 :-] [07:39:53] yikes [07:39:59] how is your? [07:40:08] okay I guess [07:40:24] yesterday was a difficult day but today is getting much better [07:40:40] it will hopefully keep improving [07:44:04] got a job for you paravoid :-D /home/wikipedia is supposed to be like the production one. Aka host logs, mediawiki clone and so on [07:44:27] Ryan created it on labs-nfs1:/export/home which is only 18G and almost full right now [07:46:12] so probably want to use another NFS share [07:51:41] !log deployment-prep cleaning out deployment-nfs-memc:/mnt/export/upload-back from thumb, lock dirs and related [07:51:55] pff [07:51:56] I'm still trying to get into labs-nfs1… :) [07:51:58] Logged the message, Master [07:52:06] labs-morebots: you are a lagger [07:52:34] paravoid: maybe reapply your auth keys on that server? 
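The cleanup logged at 07:51 (clearing thumb and lock directories under deployment-nfs-memc:/mnt/export/upload-back) could be sketched as below. The exact command is not in the log, so the `find` invocation and directory layout are assumptions, and a scratch directory stands in for the real NFS export so the sketch is safe to run:

```shell
# Hedged sketch of the thumb/lock cleanup logged above; the real target was
# deployment-nfs-memc:/mnt/export/upload-back. A scratch directory stands in
# for the NFS export, and the layout below is an illustrative assumption.
base=$(mktemp -d)
mkdir -p "$base/wikipedia/thumb" "$base/wikipedia/lock" "$base/wikipedia/originals"

# Delete every directory named "thumb" or "lock", deepest entries first.
find "$base" -depth -type d \( -name thumb -o -name lock \) -exec rm -rf {} +

ls "$base/wikipedia"
```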
[07:56:46] !log deployment-prep Deleted all of deployment-nfs-memc:/mnt/export/upload-back , it contained only thumbs [07:56:47] Logged the message, Master [08:01:02] 05/15/2012 - 08:01:02 - Creating a home directory for faidon at /export/home/testlabs/faidon [08:02:02] 05/15/2012 - 08:02:02 - Updating keys for faidon at /export/home/testlabs/faidon [08:02:44] yay [08:03:01] lacked a home ? [08:03:18] lacked a membership to the project [08:03:22] I knew that [08:03:35] I was just trying to find out in which project labs-nfs1 is [08:03:42] turns out, "testlabs" [08:18:44] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [08:26:55] gah [08:27:05] so, increasing that storage is very non-trivial [08:30:19] RECOVERY Disk Space is now: OK on labs-nfs1 i-0000005d output: DISK OK [08:31:01] paravoid: one possibility would be to use projectstorage.pmtpa.wmnet:/deployment-prep-project [08:31:09] and mount that as /home/wikipedia [08:31:41] as far as I know, Ryan just created /home/wikipedia wherever /home is (labs-nfs1?) [08:32:42] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 1532 MB (8% inode=80%): [08:53:17] !log deployment-prep petrb: updating to head [08:53:19] Logged the message, Master [08:55:11] RECOVERY Free ram is now: OK on bots-2 i-0000009c output: OK: 20% free memory [09:02:47] hashar: I presume the incresed free space was done by you? [09:03:17] nop [09:03:22] petan probably [09:03:44] looks like he cleaned some bots on bots-2 [09:03:47] didnt touch it either , though wondering about exactly my /home being listed up there [09:04:41] but most likely just cause first in alphabetical order on that instance [09:07:58] mornin Ryan [09:16:48] Ryan_Lane: are you idling or are you up for some work-related question/discussion? 
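The alternative floated above, mounting the larger project share over /home/wikipedia instead of labs-nfs1's nearly full 18G export, might look like this as an /etc/fstab entry. The export path projectstorage.pmtpa.wmnet:/deployment-prep-project comes from the log; the mount options are illustrative assumptions, not the settings actually used:

```
# /etc/fstab sketch only; mount options are assumptions, not the real config.
# Mounts the project storage export at /home/wikipedia.
projectstorage.pmtpa.wmnet:/deployment-prep-project  /home/wikipedia  nfs  rw,hard  0  0
```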
[09:17:26] he is most probably idling / sleeping since it is 2am [09:34:04] arhhoh [09:38:52] !log deployment-prep hashar: Applying apache::service to dbdump [09:38:54] Logged the message, Master [09:44:23] ahh I love long timeouts [09:44:29] I have lost my internet connection for sometime [09:44:38] yet everything managed to reconnect ;-D [09:48:28] PROBLEM HTTP is now: CRITICAL on deployment-dbdump i-000000d2 output: CRITICAL - Socket timeout after 10 seconds [10:13:55] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 0.47, 6.57, 6.30 [10:19:02] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.49, 2.69, 4.67 [10:42:45] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [10:51:59] PROBLEM Disk Space is now: CRITICAL on nagios 127.0.0.1 output: DISK CRITICAL - free space: /home/dzahn 1240 MB (7% inode=80%): [11:05:22] !ping [11:05:22] pong [11:47:45] petan|wk: hi [11:48:14] petan|wk: the OnlineStatusBar extension would need to be setup on a specific and dedicated wiki :) [12:04:43] !instance is https://labsconsole.wikimedia.org/wiki/Nova_Resource:$1 [12:04:43] Key exist! [12:04:59] !instance I-000000ba [12:05:00] https://labsconsole.wikimedia.org/wiki/Help:Instances [12:06:42] !instance del [12:06:42] Successfully removed instance [12:08:04] !instance is need help? -> https://labsconsole.wikimedia.org/wiki/Help:Instances want to manage? -> https://labsconsole.wikimedia.org/wiki/Special:NovaInstance want resources? use !resource [12:08:04] Key was added! [12:08:18] !resource is https://labsconsole.wikimedia.org/wiki/Nova_Resource:$1 [12:08:20] Key exist! [12:08:27] !resource [12:08:27] https://labsconsole.wikimedia.org/wiki/Nova_Resource:$1 [12:08:48] !resource I-000000ba [12:08:48] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000000ba [12:12:43] !monitor is http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?host=$1 [12:12:44] Key exist! 
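The !instance and !resource triggers above are simple key-value entries in which `$1` in the stored value is replaced by the argument (`!resource I-000000ba` expands the stored URL template). A toy illustration of that substitution, inferred from the log output and not the bot's actual code:

```shell
# Toy sketch of the bot's "$1" substitution as observed in the log;
# this is not the real wm-bot implementation.
template='https://labsconsole.wikimedia.org/wiki/Nova_Resource:$1'
arg='I-000000ba'
echo "${template/\$1/$arg}"
```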
[12:12:47] heh:) [12:12:50] !monitor [12:12:50] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?host=$1 [12:22:55] !log deployment-prep replaced most occurrences of /mnt/upload to /mnt/upload6 [12:22:57] Logged the message, Master [12:22:59] petan|wk: thanks [12:23:00] :) [12:27:04] RECOVERY Disk Space is now: OK on deployment-feed i-00000118 output: DISK OK [12:32:00] @search gerrit [12:32:00] Results (found 5): gerrit, change, ryanland, gitweb, whitespace, [12:32:08] @search search [12:32:08] Results (found 1): logs, [12:32:19] hrmmm, i added something about gerrit-search somewhere [12:35:07] PROBLEM Disk Space is now: WARNING on deployment-feed i-00000118 output: DISK WARNING - free space: / 73 MB (5% inode=40%): [12:46:28] PROBLEM Puppet freshness is now: CRITICAL on deployment-syslog i-00000269 output: Puppet has not run in last 20 hours [13:35:38] PROBLEM Disk Space is now: WARNING on labs-nfs1 i-0000005d output: DISK WARNING - free space: /export 1011 MB (5% inode=80%): /home/SAVE 1011 MB (5% inode=80%): [14:13:35] !accountreq [14:13:35] case you want to have an account on labs please read here: https://labsconsole.wikimedia.org/wiki/Help:Access#Access_FAQ [15:23:45] PROBLEM Current Load is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:24:25] PROBLEM Current Users is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:25:05] PROBLEM Disk Space is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:25:45] PROBLEM Free ram is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:26:55] PROBLEM Total Processes is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
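The repeated "CHECK_NRPE: Error - Could not complete SSL handshake" alerts on the freshly created gerrit-bots instance are the usual symptom of NRPE not yet running on the new host, or not yet listing the monitoring server in its allowed_hosts. A generic nrpe.cfg fragment illustrating the mechanism; this is an assumption about NRPE in general, not this cluster's actual configuration:

```
# /etc/nagios/nrpe.cfg (illustrative; not the actual labs configuration).
# "Could not complete SSL handshake" from check_nrpe commonly means NRPE is
# not installed/running yet on a new instance, or the monitoring host is not
# in allowed_hosts. <nagios-server-ip> is a placeholder.
allowed_hosts=127.0.0.1,<nagios-server-ip>
```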
[15:27:35] PROBLEM dpkg-check is now: CRITICAL on gerrit-bots i-00000272 output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:28:15] PROBLEM Puppet freshness is now: CRITICAL on nova-ldap1 i-000000df output: Puppet has not run in last 20 hours [16:01:38] PROBLEM HTTP is now: CRITICAL on deployment-web4 i-00000214 output: CRITICAL - Socket timeout after 10 seconds [16:06:28] PROBLEM HTTP is now: WARNING on deployment-web4 i-00000214 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.008 second response time [17:05:59] PROBLEM Disk Space is now: CRITICAL on labs-nfs1 i-0000005d output: DISK CRITICAL - free space: /export 513 MB (2% inode=80%): /home/SAVE 513 MB (2% inode=80%): [17:45:42] 05/15/2012 - 17:45:42 - Creating a home directory for gwicke at /export/home/visualeditor/gwicke [17:46:44] 05/15/2012 - 17:46:44 - Updating keys for gwicke at /export/home/visualeditor/gwicke [18:31:47] PROBLEM Current Load is now: WARNING on mobile-testing i-00000271 output: WARNING - load average: 16.93, 11.95, 6.98 [18:41:37] RECOVERY Current Load is now: OK on mobile-testing i-00000271 output: OK - load average: 4.47, 2.38, 0.96 [19:01:47] 05/15/2012 - 19:01:47 - Creating a home directory for ori at /export/home/visualeditor/ori [19:02:41] 05/15/2012 - 19:02:41 - Updating keys for ori at /export/home/visualeditor/ori [19:03:15] 05/15/2012 - 19:03:15 - Creating a home directory for ori at /export/home/bastion/ori [19:04:14] 05/15/2012 - 19:04:13 - Updating keys for ori at /export/home/bastion/ori [19:06:23] back [19:06:25] welcome notpeter. need assistance? [19:06:30] actually, yes [19:06:31] !instances | notpeter [19:06:31] notpeter: need help? -> https://labsconsole.wikimedia.org/wiki/Help:Instances want to manage? -> https://labsconsole.wikimedia.org/wiki/Special:NovaInstance want resources? 
use !resource [19:06:41] !security | notpeter [19:06:41] notpeter: https://labsconsole.wikimedia.org/wiki/Help:Security_Groups [19:06:43] sweet [19:07:05] heh, i changed that trigger earlier today, there you go [19:07:06] docs could be better, but they at least exist somewhat :) [19:07:26] hi hashar, if you have a couple of minutes could you update http://www.mediawiki.org/wiki/QA_and_testing/Labs_plan with what you know about state of the labs cluster? [19:07:41] cool cool, I can start from there and ask questions along the way! thanks! [19:07:50] (or make a better page) [19:07:51] notpeter: also get yourself an IRC cloak :) http://meta.wikimedia.org/wiki/IRC/Cloaks [19:08:07] chrismcmahon: doing update rightnow [19:08:15] thanks hashar! [19:09:06] notpeter: you already have a project "build" or something? [19:09:34] chrismcmahon: I wish we had a project management software :-D [19:11:41] hashar: I'll get around to it someday ;) [19:11:50] mutante: well, there's a lucid build host [19:12:06] so... I'll just add it to whatever that project is? I guess? [19:12:14] labs is sort of a mystery to me, tbh [19:12:20] so this should be good practice [19:12:33] hashar: we could start a project in Mingle I suppose. hardly seems worth it though, at least for now. [19:12:37] notpeter: yes, sounds right, add another instance to the existing "building" project [19:12:45] notpeter: which project is that host a member of? [19:13:09] I'm more like this guy: http://blogs.msdn.com/cfs-filesystemfile.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-01-32-02-metablogapi/8054.image_5F00_thumb_5F00_35C6E986.png [19:13:15] 05/15/2012 - 19:13:15 - Updating keys for ori at /export/home/bastion/ori [19:13:25] mutante: that's a really good question :) [19:13:26] chrismcmahon: are you in Berlin ? [19:13:31] chrismcmahon: are you going to Berlin ? 
[19:13:42] 05/15/2012 - 19:13:42 - Updating keys for ori at /export/home/visualeditor/ori [19:13:47] chrismcmahon: if so, you could arrange a meeting with Siebrand so he demonstrates Mingle [19:13:48] notpeter: do you know its name or "instance id"? [19:14:16] mutante: labs-build1 [19:14:29] hashar: I'm not going to Berlin, but I already know Mingle pretty well. it's good for agile development projects [19:15:08] chrismcmahon: makes things easier :-D i think mingle is managed by Office IT [19:15:29] !resource I-0000006b | notpeter [19:15:29] notpeter : https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-0000006b [19:15:55] notpeter: so that tells you it is in project "testlabs", first step: you want to become member of that project [19:16:13] chrismcmahon: status update http://www.mediawiki.org/wiki/Beta_cluster/status#2012-05-15 [19:17:09] notpeter: so i changed my project filter in labsconsole to see testlabs, and i see you are already a member of it [19:18:10] notpeter: so you can go straight here: https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=create&project=testlabs [19:18:11] mutante: cool! [19:18:33] and select the existing (wee!) precise image:) [19:18:48] what's the diff between the m and s instances? [19:19:20] notpeter: s instances have additional local storage, in general you shouldn't use them [19:19:27] they have a small amount of memory [19:19:44] notpeter: m is yellow s is red ? http://im.glogster.com/media/4/16/24/18/16241893.jpg [19:19:46] kk [19:20:02] hashar: lulz [19:23:41] so, acls are done by project, yes? [19:23:48] acls? [19:23:57] er, access [19:24:01] !access [19:24:01] https://labsconsole.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [19:24:36] I guess that isn't terribly clear [19:25:00] yes, done per-project [19:25:00] hashar: any information about UploadWizard on beta commons? I'm still getting "Internal error: Server failed to store temporary file." 
[19:25:00] cool [19:26:19] notpeter: also note "manage sudo policies" link [19:26:41] but might not need any for a build host, shrug [19:26:53] just saying for general labs info [19:28:14] yeah, this be a simple instance [19:30:52] chrismcmahon: I have no idea [19:31:08] chrismcmahon: I know there are still some issue with the file paths which I need to fix [19:31:27] chrismcmahon: I spent last night cleaning out the MediaWiki configuration mess :-]] [19:32:27] thanks hashar, just checking [19:33:10] chrismcmahon: keep asking :-} [19:33:40] Also, I figured out this morning that the directory we are logging to is too small :-] [19:33:47] PROBLEM Current Load is now: CRITICAL on build-precise1 i-00000273 output: Connection refused by host [19:34:18] partition is 18GB, shared among all WMF labs instances and has a few hundreds MBytes left :-] [19:34:27] PROBLEM Current Users is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:35:17] PROBLEM Disk Space is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:35:37] PROBLEM Free ram is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:36:57] PROBLEM Total Processes is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:37:07] 18GB and nothing to read :) hashar [19:37:41] PROBLEM dpkg-check is now: CRITICAL on build-precise1 i-00000273 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:38:19] fucking bots [19:38:31] no one seems to fix their bots when I tell them they are eating all of the space [19:38:40] maybe I'll link their log files to /dev/null [19:38:43] dont you use per user disk quota ? 
[19:38:46] no [19:38:57] * hashar opens a bug :-] [19:39:00] we're eventually moving to gluster [19:39:07] so, i'm not changing the nfs stuff at all [19:40:02] is the migration to gluster scheduled soon ? [19:40:09] whenever gluster is stable [19:40:14] which may be never [19:40:22] doh [19:41:02] anyway, I have dropped you a mail about mounting beta /home/wikipedia somewhere else [19:41:17] would be great to give some hints to Faidon, I will follow up with him [19:43:46] PROBLEM Current Load is now: CRITICAL on gluster-devstack i-00000274 output: Connection refused by host [19:44:26] PROBLEM Current Users is now: CRITICAL on gluster-devstack i-00000274 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:44:56] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [19:45:18] PROBLEM Disk Space is now: CRITICAL on gluster-devstack i-00000274 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:45:45] PROBLEM Free ram is now: CRITICAL on gluster-devstack i-00000274 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:46:19] petan: one of your processes is holding open a giant file [19:46:54] I take that back [19:46:58] PROBLEM Total Processes is now: CRITICAL on gluster-devstack i-00000274 output: CHECK_NRPE: Error - Could not complete SSL handshake. [19:47:14] /export/home/bots/petrb/production/bot2/errors is 6GB [19:47:38] PROBLEM dpkg-check is now: CRITICAL on gluster-devstack i-00000274 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[19:47:38] please write logs to gluster storage, rather than home directories [19:47:59] hashar: I just freed up 6GB [19:48:49] !log deployment-prep hashar: restarted udp2log on deployment-feed, lot of zombie python processes there [19:48:54] Ryan_Lane: danke:-] [19:48:56] Logged the message, Master [19:51:08] RECOVERY Disk Space is now: OK on labs-nfs1 i-0000005d output: DISK OK [19:51:43] <^demon> I like https://gerrit.wikimedia.org/r/Documentation/config-gerrit.html#_a_id_changemerge_a_section_changemerge :) [19:51:51] <^demon> It hides "Submit" if a dry merge fails [19:52:24] * Damianz puts 5TB of logs into gluster for Ryan :D [19:52:35] putting it into gluster is fine [19:52:38] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [19:52:40] that's per-project [19:52:48] error: /etc/logrotate.d/mw-udp2log:17 error verifying olddir path /home/wikipedia/logs/archive: No such file or directory [19:52:49] putting into the currently shared home directories is not [19:52:51] Yeah for logrotate! [19:53:29] <^demon> hashar: Do you think we should enable changeMerge.test? [19:53:41] ^demon: I have no idea what you are talking about :-D [19:53:45] hashar: based on size= rather than weekly [19:53:59] <^demon> hashar: See the link I posted ^ and what I said. [19:54:07] mutante: I am ranting about log rotate not creating olddir :-((( [19:54:07] <^demon> It hides "Submit" buttons when it won't merge cleanly. [19:54:08] mutante: on a daily job !! [19:54:33] oh, there is that delay option [19:55:09] delaycompress? [19:57:43] hashar: that was all related to NFS? ignore me then [19:58:06] mutante: it is just logrotate being dumb :-] [19:58:12] ok:p [19:58:26] ^demon: it is unlikely I will have time to read about that tonight, poke me about it tomorrow morning? [19:58:39] <^demon> ok :) [19:59:08] thanks for my little brain! [20:05:31] New patchset: Hashar; "logrotate require archive dir to be created!" 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/7746 [20:05:45] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7746 [20:10:50] mutante: wanna review my log rotate patch ? 7746 link above ^^^ [20:13:21] so olivneh ori.livneh exists just fine in Labs -- https://labsconsole.wikimedia.org/wiki/User:Ori.livneh [20:13:28] olivneh: what error do you get when trying to ssh into bastion? [20:13:35] Permission denied (publickey). [20:13:40] !log deployment-prep manually created /home/wikipedia/logs/archive from deployment-feed (pending https://gerrit.wikimedia.org/r/7746 ) [20:13:42] Logged the message, Master [20:13:53] i'm using the correct key (the one that i've added using the web interface) [20:14:20] olivneh: hm. And you've checked .... ok. (BTW, unfortunately Gerrit & Labsconsole don't know about each other's ssh keys for any given user - you checked both?) [20:15:28] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [20:15:39] haven't tried gerrit. let me try. [20:16:20] olivneh: ok. and in the meantime.... [20:16:57] Ryan_Lane: paravoid: petan|wk petan: when I go to https://labsconsole.wikimedia.org/wiki/Special:NovaProject to try to add someone to the bastion project, I see "Please select projects to display using the project filter. Project filter Projects" but no menu to select from. [20:17:06] Perhaps I am missing the relevant permissions? [20:17:28] sumanah: log out then back in [20:17:37] it's the "no credentials for your account" bug [20:18:17] Ryan_Lane: I was unaware of this bug... I presume there's a BZ issue filed for it, etc., etc. [20:18:29] New patchset: Hashar; "(bug 36872) logrotate require archive dir to be created!" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7746 [20:18:40] * sumanah logs out, logs in [20:18:43] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7746 [20:18:45] ok! Got it, thanks Ryan_Lane [20:18:49] yw [20:18:57] New review: Hashar; "Patchset 2 adds a reference to bug number in commit message." [operations/puppet] (test); V: 0 C: 0; - https://gerrit.wikimedia.org/r/7746 [20:19:01] sumanah: waitttttttt. i got it to work. [20:19:18] sumanah: with username "ori". maybe anything after the period gets truncated. [20:19:38] olivneh: Oh really? Hmmmm. Ryan_Lane ^ -- his username is Ori.livneh . [20:20:23] . should work [20:20:27] _ doesn't [20:20:32] . doesn't work in shell account names [20:20:55] sumanah: it's a long-standing bug [20:21:10] sumanah: I've tracked it down, but I'm not totally sure how to fix it [20:21:14] it's in the LDAP extension [20:24:09] olivneh: ok, so you are all set? Not sure what this means for all your LDAP-related logins [20:24:20] olivneh: for example, have you yet been able to do git pulls, etc? [20:24:43] well, what happened was this [20:25:12] i hadn't signed up to gerrit previously. when i did, it flagged the tail of my username as problematic, which left me with "ori" [20:25:27] that's how i realized the same might have happened on labs [20:25:39] so i haven't pulled yet, but i bet it'll work with user name "ori" [20:26:23] Could be! [20:26:40] I've just edited https://www.mediawiki.org/wiki/Developer_access/Instructions_to_post_your_request_below to say that Git usernames should just be letters, numbers, and spaces [20:26:44] for later ease. 
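The logrotate failure reported earlier ("error verifying olddir path /home/wikipedia/logs/archive: No such file or directory") occurs because logrotate refuses to rotate into an olddir that does not already exist; that is what change 7746 addresses on the puppet side by creating the directory first. A minimal sketch of the kind of stanza involved, with paths taken from the log and rotation settings as assumptions:

```
# /etc/logrotate.d/mw-udp2log, sketch only. Rotation frequency and count are
# assumptions; the paths are the ones shown in the log. logrotate does not
# create olddir itself, so /home/wikipedia/logs/archive must exist beforehand
# (e.g. via a puppet file resource, as gerrit change 7746 arranges).
/home/wikipedia/logs/*.log {
    daily
    rotate 7
    compress
    delaycompress
    olddir /home/wikipedia/logs/archive
}
```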
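The account-name mismatch discussed above (wiki user "Ori.livneh" ending up with shell account "ori") can be pictured as lowercasing plus truncation at the first dot. This is only an illustration of the observed behavior; the real logic lives in the LDAP extension and may differ:

```shell
# Illustration only: reproduces the observed mapping "Ori.livneh" -> "ori".
# The actual username handling is in the LDAP extension and may differ.
wiki_user='Ori.livneh'
shell_user=$(printf '%s' "$wiki_user" | tr '[:upper:]' '[:lower:]' | cut -d. -f1)
echo "$shell_user"
```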
[20:36:18] !log deployment-prep hashar: Insatlled multiversion using svn checkout https://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/mwmultiversion/multiversion [20:36:20] Logged the message, Master [20:45:32] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [20:58:57] !log deployment-prep hashar: opened several bugs, prepared for MWMultiversion [20:58:58] Logged the message, Master [20:59:14] !log deployment-prep hashar: Cloning 1.20wmf2 and 1.20wmf3 in independant repos just like in production [20:59:15] Logged the message, Master [21:13:22] !log deployment-prep hashar: Managed to get wikiversions.cdb to be rebuild using /home/wikipedia/common/multiversion/refreshWikiversionsCDB [21:13:24] Logged the message, Master [21:16:15] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [21:23:22] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c output: Warning: 19% free memory [21:46:34] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [22:00:04] RECOVERY Current Users is now: OK on gluster-devstack i-00000274 output: USERS OK - 1 users currently logged in [22:00:04] RECOVERY Disk Space is now: OK on gluster-devstack i-00000274 output: DISK OK [22:00:49] RECOVERY Free ram is now: OK on gluster-devstack i-00000274 output: OK: 94% free memory [22:01:54] RECOVERY Total Processes is now: OK on gluster-devstack i-00000274 output: PROCS OK: 108 processes [22:03:05] RECOVERY dpkg-check is now: OK on gluster-devstack i-00000274 output: All packages OK [22:04:24] RECOVERY Current Load is now: OK on gluster-devstack i-00000274 output: OK - load average: 0.18, 0.06, 0.02 [22:16:47] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [22:36:11] New patchset: Bhartshorne; "adding in configuration option to only write thumbs to some containers" 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/7758 [22:36:26] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7758 [22:36:49] New review: Bhartshorne; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/7758 [22:36:51] Change merged: Bhartshorne; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7758 [22:46:38] PROBLEM dpkg-check is now: CRITICAL on fundraising-db i-0000015c output: DPKG CRITICAL dpkg reports broken packages [22:46:48] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [22:47:08] PROBLEM Puppet freshness is now: CRITICAL on deployment-syslog i-00000269 output: Puppet has not run in last 20 hours [22:50:46] the new version of ganglia (ganglia-monitor, gmetad) should be rolling out everywhere on the next puppet run. [22:51:14] PROBLEM dpkg-check is now: CRITICAL on ee-prototype i-0000013d output: DPKG CRITICAL dpkg reports broken packages [23:01:13] PROBLEM dpkg-check is now: CRITICAL on gluster-devstack i-00000274 output: DPKG CRITICAL dpkg reports broken packages [23:02:56] hey Ryan_Lane was there something that happened recently(ish) that changed group names from to project-? [23:03:41] and given that that happened, how can I recreate the group ? [23:08:41] answered offline but for anybody watching, just use groupadd. [23:08:44] :D [23:13:27] yes [23:16:10] RECOVERY dpkg-check is now: OK on gluster-devstack i-00000274 output: All packages OK [23:17:32] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [23:23:11] PROBLEM Disk Space is now: CRITICAL on upload-wizard i-0000021c output: CHECK_NRPE: Socket timeout after 10 seconds. [23:27:51] RECOVERY Disk Space is now: OK on upload-wizard i-0000021c output: DISK OK [23:30:30] New patchset: Sara; "Enable ganglia gmond override_hostname option in labs." 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/7762 [23:30:46] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/7762 [23:31:17] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/7762 [23:31:19] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/7762 [23:44:23] PROBLEM dpkg-check is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:45:43] PROBLEM Current Load is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:46:23] PROBLEM Current Users is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:47:12] PROBLEM Disk Space is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:47:33] PROBLEM Free ram is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:47:58] PROBLEM host: gluster-devstack-brick1 is DOWN address: i-00000275 CRITICAL - Host Unreachable (i-00000275) [23:48:53] PROBLEM Total Processes is now: CRITICAL on demo-deployment1 i-00000276 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:51:30] \o/ I just used labs to successfully test a feature (swift change) before rolling it out into production. [23:51:35] yay labs!
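The group rename discussed near the end (per-project groups gaining a "project-" prefix, with the fix being to recreate the group via groupadd) amounts to the naming shown below. The project name is an illustrative assumption:

```shell
# Sketch of the naming change discussed above: local groups became
# "project-<name>". Recreating one would be, as root:
#   groupadd "project-$project"
# The project name below is an illustrative assumption.
project='deployment-prep'
echo "project-${project}"
```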