[01:07:47] New patchset: Sara; "Test data_sources in ganglia gmetad.conf template." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10217
[01:08:04] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 6% free memory
[01:08:05] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10217
[01:10:02] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10217
[01:10:07] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10217
[01:16:30] New patchset: Sara; "Continue testing ganglia gmetad.conf data_sources." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10218
[01:16:46] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10218
[01:17:01] Drop dead
[01:17:50] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10218
[01:17:52] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10218
[01:22:11] New patchset: Sara; "Finished testing ganglia gmetad.conf templates in labs." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10219
[01:22:27] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10219
[01:23:30] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10219
[01:23:32] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10219
[01:29:34] PROBLEM host: ganglia-test5 is DOWN address: i-000002a7 CRITICAL - Host Unreachable (i-000002a7)
[03:02:20] 06/05/2012 - 03:02:20 - Updating keys for laner at /export/home/deployment-prep/laner
[03:05:37] RECOVERY Free ram is now: OK on blamemaps-s1 i-000002c3 output: OK: 88% free memory
[03:06:57] RECOVERY Total Processes is now: OK on blamemaps-s1 i-000002c3 output: PROCS OK: 81 processes
[03:07:38] RECOVERY dpkg-check is now: OK on blamemaps-s1 i-000002c3 output: All packages OK
[03:08:47] RECOVERY Current Load is now: OK on blamemaps-s1 i-000002c3 output: OK - load average: 0.18, 0.25, 0.11
[03:09:27] RECOVERY Current Users is now: OK on blamemaps-s1 i-000002c3 output: USERS OK - 0 users currently logged in
[03:10:07] RECOVERY Disk Space is now: OK on blamemaps-s1 i-000002c3 output: DISK OK
[03:33:57] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[03:34:57] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 12% free memory
[03:53:57] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 3% free memory
[03:54:57] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[03:58:57] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[03:59:57] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory
[04:04:57] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory
[04:05:57] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 17% free memory
[04:19:57] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:25:57] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory
[04:29:57] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
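The patchsets above exercise a template-driven gmetad.conf in which the ganglia data_source lines come from Puppet rather than being hard-coded per file. The log does not show the actual manifest, so the following is only a minimal, hypothetical sketch of that general pattern (class name, template path, and the example cluster/hosts are all invented):

```puppet
# Hypothetical sketch only -- not the actual operations/puppet ganglia module.
# Idea: gmetad.conf is rendered from a template, so each "data_source" line
# (a cluster name plus one or more host:port pairs) comes from a single hash.
class ganglia::gmetad (
    $data_sources = {
        'example cluster' => 'aggregator1.example:8649 aggregator2.example:8649',
    }
) {
    file { '/etc/ganglia/gmetad.conf':
        # The (assumed) ERB template would loop over $data_sources and emit:
        #   data_source "<cluster name>" <host:port> [<host:port> ...]
        content => template('ganglia/gmetad.conf.erb'),
        notify  => Service['gmetad'],
    }

    service { 'gmetad':
        ensure => running,
    }
}
```

The `notify` on the file is what makes gmetad reload whenever the rendered data_source list changes, which is the behaviour being tested in the changes above.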
[04:30:57] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[05:03:57] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.12, 5.49, 5.17
[05:13:57] RECOVERY Current Load is now: OK on bots-3 i-000000e5 output: OK - load average: 4.03, 4.57, 4.85
[05:29:55] New patchset: Andrew Bogott; "One more valiant attempt to get short URLs working." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10222
[05:30:11] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10222
[05:32:33] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10222
[05:32:36] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10222
[05:36:57] PROBLEM Current Load is now: WARNING on bots-3 i-000000e5 output: WARNING - load average: 5.01, 5.19, 5.07
[05:44:24] PROBLEM Current Load is now: CRITICAL on mwreview-test8 i-000002c4 output: Connection refused by host
[05:45:04] PROBLEM Current Users is now: CRITICAL on mwreview-test8 i-000002c4 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:45:44] PROBLEM Disk Space is now: CRITICAL on mwreview-test8 i-000002c4 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:46:14] PROBLEM Free ram is now: CRITICAL on mwreview-test8 i-000002c4 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[05:49:24] RECOVERY Current Load is now: OK on mwreview-test8 i-000002c4 output: OK - load average: 1.22, 0.99, 0.61
[05:50:04] RECOVERY Current Users is now: OK on mwreview-test8 i-000002c4 output: USERS OK - 2 users currently logged in
[05:50:45] RECOVERY Disk Space is now: OK on mwreview-test8 i-000002c4 output: DISK OK
[05:51:15] RECOVERY Free ram is now: OK on mwreview-test8 i-000002c4 output: OK: 86% free memory
[06:11:06] New patchset: Andrew Bogott; "Restart Apache after we create our site." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10223
[06:11:22] New patchset: Andrew Bogott; "Try to set up mysql in order so it works the first time." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10224
[06:11:37] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10223
[06:11:37] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10224
[06:13:01] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10223
[06:13:03] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10223
[06:13:36] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10224
[06:13:39] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10224
[06:14:24] PROBLEM Current Load is now: CRITICAL on mwreview-test9 i-000002c5 output: Connection refused by host
[06:15:04] PROBLEM Current Users is now: CRITICAL on mwreview-test9 i-000002c5 output: Connection refused by host
[06:19:24] RECOVERY Current Load is now: OK on mwreview-test9 i-000002c5 output: OK - load average: 0.58, 1.21, 0.90
[06:19:34] PROBLEM host: mwreview-test8 is DOWN address: i-000002c4 CRITICAL - Host Unreachable (i-000002c4)
[06:20:04] RECOVERY Current Users is now: OK on mwreview-test9 i-000002c5 output: USERS OK - 2 users currently logged in
[06:46:58] PROBLEM Current Load is now: WARNING on aggregator-test1 i-000002bf output: WARNING - load average: 0.37, 5.47, 5.22
[06:51:55] RECOVERY Current Load is now: OK on aggregator-test1 i-000002bf output: OK - load average: 0.56, 2.26, 3.88
[06:51:55] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 7.15, 6.88, 5.48
[07:06:55] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 0.51, 1.80, 3.77
[08:00:55] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK
[08:08:55] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 73 MB (5% inode=52%):
[08:28:05] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 53% free memory
[08:50:19] Thehelpfulone: hey
[08:50:25] how's the spam on labs?
[08:50:28] is it still bad
[09:13:16] !log meh
[09:13:51] !log is bot is currently down
[09:13:51] Key was added
[09:13:54] !log blah
[09:13:54] bot is currently down
[09:13:58] :)
[09:14:56] !log del
[09:14:57] Successfully removed log
[09:31:20] Ryan_Lane: is there any process check in nagios
[09:31:33] so that we could check if process is OK
[09:31:50] it would be cool to monitor all bots
[10:21:44] hi Ryan_Lane
[10:21:50] hello
[10:41:05] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 12% free memory
[10:42:21] Ryan_Lane: http://www.mediawiki.org/wiki/User:Koolhead17
[10:42:24] what next
[10:46:05] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 64% free memory
[10:53:49] koolhead17: did you add your info to: http://www.mediawiki.org/wiki/Developer_access ?
[10:54:05] there's a link for adding a request
[11:05:19] Ryan_Lane: done
[11:06:12] oh
[11:06:17] it seems you already have an account
[11:06:23] !initial-login | koolhead17
[11:06:23] koolhead17: https://labsconsole.wikimedia.org/wiki/Access#Initial_log_in
[11:07:56] Ryan_Lane: i dont have the token though
[11:08:07] which token?
[11:08:11] make mediawiki send you a password
[11:08:26] via labsconsole
[11:08:30] ok
[11:11:49] Ryan_Lane: am logged in
[11:12:19] great. you'll need access to a project to actually run things in such a way that you can document
[11:12:28] let me give you access to the testing project, with sysadmin and netadmin
[11:13:06] what kind of phone do you have?
you can likely also get two-factor auth going, so you can write up docs about that
[11:14:53] ok. in a sec you'll be good to go in testing
[11:15:07] here's some initial docs:
[11:15:10] !instances
[11:15:11] need help? -> https://labsconsole.wikimedia.org/wiki/Help:Instances want to manage? -> https://labsconsole.wikimedia.org/wiki/Special:NovaInstance want resources? use !resource
[11:15:12] !security
[11:15:13] https://labsconsole.wikimedia.org/wiki/Help:Security_Groups
[11:15:23] !help
[11:15:23] !documentation for labs !wm-bot for bot
[11:15:29] !documentation
[11:15:30] https://labsconsole.wikimedia.org/wiki/Help:Contents
[11:19:28] petan|wk: back now, yeah it can be bad but I think it's better now that we've got some global sysops who can globally lock and block :)
[11:19:38] back in a little bit
[11:20:08] Thehelpfulone: I think we could have a channel for labs, like #cvn-labs
[11:20:31] petan|wk: I agree, did we get an RC feed yet?
[11:20:40] I am working on that still
[11:20:49] problem is that production feed is restricted
[11:21:01] I don't know how to make more channels there, Ryan_Lane told me he needs to sanitize bot code
[11:21:06] Ryan_Lane: catch u in while.
[11:21:20] !ping
[11:21:20] pong
[11:24:06] petan|wk: also, can you do all your actions on deployment.wikimedia instead of meta.wikimedia please - I want to keep everything in one place
[11:24:20] so I do
[11:24:26] I thought meta is place for that
[11:24:30] or it used to be in past
[11:24:51] I don't even know if everything is accessible on deployment what is on meta
[11:25:17] yeah it should be?
[11:25:34] ok, anyway you should insert it to some documentation then
[11:25:45] like write on meta Main Page to change stuff on deployment
[11:25:46] etc
[11:28:08] done
[11:30:06] hmm petan|wk http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:DeletedContributions/208.117.11.166
[11:30:14] so I try to globally block that IP
[11:30:19] Your block was unsuccessful, for the following reason:
[11:30:19] The IP address (208.117.11.166) you entered is invalid. Please note that you cannot enter a user name!
[11:31:49] Thehelpfulone: create a bug for that
[11:32:09] what is labs on? 1.20wmf5?
[11:32:16] Special:Version
[11:32:42] I don't really know
[11:32:46] it changes a lot
[11:32:54] 1.20alpha
[11:35:19] Thehelpfulone: I can update it
[11:36:03] yeah, can you put it to the latest version?
[11:36:07] !log deployment-prep petrb: updating all files to latest version using old script
[11:36:07] whatever it is, wmf4 or 5
[11:39:25] master
[14:28:46] New patchset: Andrew Bogott; "Revert "Restart Apache after we create our site."" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10261
[14:29:02] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10261
[14:29:39] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10261
[14:29:41] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10261
[14:49:56] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.85, 6.42, 5.48
[15:08:27] New patchset: Andrew Bogott; "Explicitly restart Apache once sites-enabled is set up." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10268
[15:08:43] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10268
[15:09:56] RECOVERY Current Load is now: OK on bots-sql2 i-000000af output: OK - load average: 3.57, 4.16, 4.95
[15:22:57] PROBLEM Current Load is now: WARNING on bots-sql2 i-000000af output: WARNING - load average: 6.70, 5.97, 5.31
[15:25:50] Totally just about to piss off wikipedia
[15:28:27] ?
[15:43:16] ALERT: I am restarting sql2 server in order to load new configuration, some bots will crash
[15:44:37] sadtimes
[15:44:59] I get to find out if my don't die on sql crapping out code works though
[15:46:26] heh
[15:56:50] Damianz: ok
[15:56:55] now it's stable I guess
[15:57:30] Well the bots still up
[15:57:31] New patchset: Andrew Bogott; "Explicitly restart Apache once sites-enabled is set up." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10268
[15:57:47] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10268
[15:58:22] Ryan_Lane: Any suggestion about how to do what 10268 does but only when sites-enabled changes rather than on every puppet run?
[15:58:40] Seems like the apache_site class should just always do that.
[16:00:39] andrew_wmf: use the notify directive on the file
[16:01:02] ^ wrong andrew
[16:01:15] heh
[16:01:22] Ryan_Lane: I've now attempted that several times, clearly not getting it.
[16:01:39] notify => Service['apache2'],
[16:01:47] Adding notify => Service['apache2'] to the apache_site class doesn't work because it can't find the service apache2.
[16:02:01] hm
[16:02:11] If I add that to just my instance of apache_site it causes a dependency cycle
[16:03:10] (which, eliminating that cycle is probably possible, but better to have it in the apache_site class I think.)
[16:05:12] hm
[16:05:27] it defines the file for the site in there and doesn't automatically put a watch on it?
[16:05:43] right.
[16:21:11] andrewbogott: You could try subscribe.
[16:22:18] My experience with subscribe was that it restarted the server every time puppet ran, whether or not the subscribed class actually did something.
[16:22:25] Maybe I was misunderstanding what was happening...
[16:24:15] subscribe is like notify, but in the other direction. service { 'foo': subscribe => File['bar'] } should restart foo when bar changes.
[16:25:48] *frustrated* Apache2 seems to depend on sites-enabled just enough to cause circular dependencies, but not enough to actually restart when siges-enabled changes.
[16:25:52] * andrewbogott goes to lunch
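For context on the exchange above about change 10268: the shape Ryan is describing is to have the apache_site definition itself declare the vhost file (or sites-enabled symlink) with a notify on the Apache service, so Apache is reloaded only when that file actually changes rather than on every puppet run. The sketch below is hypothetical and illustrative only, not the real apache_site code in operations/puppet; the template name and class layout are invented. Declaring (or include-ing) the class that contains Service['apache2'] inside the define is what makes the notify reference resolvable, which addresses the "can't find the service apache2" error mentioned above; subscribe on the service would be the equivalent relationship declared from the other side.

```puppet
# Hypothetical sketch, not the actual operations/puppet apache_site definition.
# The key point: the resource that manages the vhost/symlink notifies the
# service, so Apache restarts only when that file changes.
class apache {
    package { 'apache2':
        ensure => present,
    }

    service { 'apache2':
        ensure  => running,
        require => Package['apache2'],
    }
}

define apache_site {
    # Pull in the class that owns Service['apache2'] so the reference below
    # can be resolved from inside the define.
    include apache

    file { "/etc/apache2/sites-available/${name}":
        content => template('apache/site.erb'),   # template name is made up
        require => Package['apache2'],
    }

    # Enabling the site is just a symlink into sites-enabled; notifying the
    # service here triggers a restart only when the link is created or changed.
    file { "/etc/apache2/sites-enabled/${name}":
        ensure => link,
        target => "/etc/apache2/sites-available/${name}",
        notify => Service['apache2'],
    }
}
```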
[16:27:59] New patchset: Sara; "Remove gangia gmetad.conf files that are obsoleted by the template." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10282
[16:28:16] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/10282
[16:30:24] Ryan_Lane, I'm getting "Transport endpoint is not connected: "/data/db/", terminating", I think you've told me before what it means but I didn't write it down
[16:30:37] ummm
[16:30:39] for mysql?
[16:30:47] Ryan_Lane, MongoDB
[16:30:56] I have no clue about mongodb
[16:31:06] Well, ls /data/db gives the same error
[16:31:13] oh
[16:31:19] /data has a set of automounts
[16:31:22] you want /data/project
[16:31:24] New review: Sara; "(no comment)" [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/10282
[16:31:26] Change merged: Sara; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10282
[16:31:41] /data/project is an automount that exists
[16:32:13] It looks like /data has only one directory (db/, the broken one)
[16:32:36] it's an automount location
[16:32:42] just do ls /data/project
[16:32:44] it'll appear
[16:33:02] Oh, huh, that's nice
[16:33:07] Really not :P
[16:33:20] Damianz, as opposed to getting a fatal error on ls
[16:33:56] Ryan_Lane, so for the mongodb that expects /data/db, should I just remove the non-working /data/db?
[16:34:13] you can't
[16:34:16] it doesn't exist
[16:34:36] It should dissappear of its own occord eventually
[16:34:39] change mongodb to point to an existing location
[16:47:43] Ryan_Lane, thanks much, working now
[16:48:02] yw
[16:52:40] So Ryan_Lane, one last question, since I was using /data/db before, that data is gone now?
[16:55:04] well, it was writing to nowhere, so yeah ;)
[16:55:14] I'm surprised it ever worked
[16:55:29] it's mongo, you should expect your data to randomly disappear anyway
[16:56:12] Ryan_Lane, this is true
[16:56:27] Hopefully anyone using it was also aware of that risk
[16:56:38] Also hopefully, my database backup is relatively recent
[16:56:53] slash functional
[16:57:08] Nooope
[16:57:53] All right, thanks again
[16:58:03] Mongo really not try to flush to disk often?!
[17:01:15] mongo does as much as possible to try to destroy your data
[17:01:27] because mongo is "fast"
[17:03:11] Destroying data is just faster
[17:03:34] A friend told me about this presentation someone gave once about a fast NoSQL backend
[17:03:38] It was called /dev/null
[17:05:13] that said, redis will happily destroy your data too
[17:05:23] it does snapshots occasionally
[17:05:46] Redis can do nice replication but if you let your disks fill up then it freaks
[17:06:03] or if you run out of memory
[17:06:18] Most things freak if you fill up your memory :D
[17:06:47] yeah
[17:11:01] or the kernel will reap the app for you instead
[17:30:03] how does /data work?
[17:33:45] PROBLEM Current Load is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:34:25] PROBLEM Current Users is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:35:05] PROBLEM Disk Space is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:35:45] PROBLEM Free ram is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:36:25] PROBLEM HTTP is now: CRITICAL on grail i-000002c6 output: CRITICAL - Socket timeout after 10 seconds
[17:36:56] Platonides: it's an automount map location
[17:37:15] Platonides: and it's using * maps, which aren't browsable
[17:37:35] PROBLEM Total Processes is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:37:50] should data be placed there?
[17:37:54] if so, how?
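On the automount question above: /data is served by autofs wildcard ("*") maps, which is why a bare ls /data shows nothing until a specific key such as /data/project is actually looked up. Below is a minimal, hypothetical Puppet sketch of that pattern, loosely in the spirit of the blog post Ryan links next in the log; the map file names, timeout, and NFS export are invented for illustration, and whether the real Labs setup manages these files this way is an assumption.

```puppet
# Hypothetical sketch only -- not the actual Labs autofs/puppet configuration.
class autofs::data {
    package { 'autofs':
        ensure => present,
    }

    # Master map: lookups under /data are resolved via the map file below.
    # (A real setup would more likely append an entry than own the whole file.)
    file { '/etc/auto.master':
        content => "/data /etc/auto.data --timeout=300\n",
        require => Package['autofs'],
        notify  => Service['autofs'],
    }

    # Wildcard map: any key ("project", "home", ...) is mounted on first
    # access from a matching export, so entries only "appear" when something
    # looks them up -- and a plain `ls /data` stays empty.
    file { '/etc/auto.data':
        content => "* -rw,hard,intr nfs-server.example:/exports/&\n",
        require => Package['autofs'],
        notify  => Service['autofs'],
    }

    service { 'autofs':
        ensure  => running,
        require => Package['autofs'],
    }
}
```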
[17:37:57] similar to this: http://ryandlane.com/blog/2011/11/01/sharing-home-directories-to-instances-within-a-project-using-puppet-ldap-autofs-and-nova/
[17:38:04] /data/project is an automount entry
[17:38:15] PROBLEM dpkg-check is now: CRITICAL on grail i-000002c6 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:38:51] wtf is chsh asking me a password if i'm executing it as root?
[17:39:05] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK
[17:39:29] no clue
[17:41:16] seems it was pam_shells.so fault
[17:41:49] Platonides: can you push a fix into puppet?
[17:42:02] uh?
[17:42:35] what do you mean?
[17:43:20] did you fix the pam config so that it would work?
[17:43:44] I temporarily commented the line :)
[17:43:58] what I wanted was precisely to change a shell from /bin/false
[17:44:09] which is what pam_shells was reventingme to do
[17:44:41] *preventing
[17:46:23] heh
[17:46:37] why did you need to change the shell?
[17:46:46] su - -s /bin/bash
[17:47:05] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 71 MB (5% inode=52%):
[17:47:07] I didn't try that specific incantation
[17:47:24] that allows you to su to a user, using a different shell
[17:47:37] you can do so with or without the extra -
[17:47:38] I did try with sudo -u user /bin/bash
[19:36:31] something is going terribly wrong here: http://simple.wikipedia.beta.wmflabs.org/w/index.php?title=Special:Undelete&target=Pozycjonowanie-stron
[19:50:25] TBloemink, let me sysopify myself
[19:51:18] mh... no page history...
[19:51:21] yeah
[19:51:39] or is it just no undelete-revisionrow message?
[19:51:53] Let me check the production wiki
[19:53:09] the problem is the message
[20:12:38] TBloemink, solved
[20:12:42] !log deployment-prep Message undelete-revisionrow was missing (causing revisions of Special:Undelete not to be displayed). Rebuilt localisation cache with: php multiversion/MWScript.php rebuildLocalisationCache.php --wiki arwiki
[20:12:43] thanks
[21:00:50] Hm.. Subject "Labs replication" on [toolserver-l] looks confusing. I assume it is not what I think it is, since that wouldn't make sense :P
[21:01:32] lol
[21:01:55] I don't keep up with the toolserver, only have one thing that relies on it and the downtime is worse than labs
[21:01:57] Danny_B|backup: Can you explain what you are looking for here?
[21:02:07] http://bit.ly/toolserverLatest
[21:02:19] http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/004994.html
[21:02:49] labs is or will be quite a bit bigger than labs, surely you don't mean to replicate all of that within toolserver?
[21:02:51] Krinkle, given that I replied that it doesn't make sense... we probably understood the same thing :)
[21:02:55] than toolserver*
[21:03:32] I thought they wanted to replicate the tables at beta.wmflabs.org
[21:03:35] It wouldn't be better for labs to replicate the toolserver db
[21:03:50] yeah, those haven't been stable either lately
[21:03:51] But it would be awesome when Ryan finally sorts out replication from the production mysql servers to labs
[21:04:02] although mostly not ts-admins' fault I think
[21:04:13] Ryan_Lane, earlier I asked you about /data/db, is there any way that the data that went there might be saved somewhere?
[21:04:22] no
[21:04:30] Damianz: Read-only Replication of production cluster dbs is on the agenda, but not a priority yet until Tool-Labs is handled.
[21:04:31] beta.wmflabs.org was just having data imported I thought - dunno, a true replicate of production would be awesome but it would also be huge.
[21:04:34] because you wrote no notingness
[21:04:37] There's other things being worked on right now.
[21:04:38] *nod* fair enough
[21:04:39] *to
[21:04:53] beta really needs more squids, lvs putting in and acting more like production... oh varnish too
[21:05:09] Krinkle: Mhm, sadly mysql replication and oauth are my 2 blocking issues right now :(
[21:05:30] beta needs tons of things
[21:05:32] beta is a project within labs, that confusion needs to stop soon (Damianz: I'm not accusing you of not knowing the difference, this is a general note)
[21:05:38] Ryan_Lane: He wrote something then? #doublenegative
[21:05:53] sure, in the way one writes to /dev/null
[21:06:00] :D
[21:06:09] I can't believe whatever application was trying to write there didn't fail immediately
[21:06:16] ah. right. mongodb :D
[21:06:30] I can't believe it started... surley opening the fh's failed
[21:06:52] Krinkle: Yeah - I'm not actually involved with beta that much, more bots.... but labs/beta/everything needs a lot of things doing :D
[21:07:15] yeah, the "beta" projects is to test the software, and preferably with realistic data. And a read-only replication of production wikis (like Toolserver has) is planned for there future to write tools and stuff to use that data. But I don't think the beta project can will or should use those.
[21:07:39] they are separate things entirely
[21:07:56] Using 'live' data for beta would be nice from certain views but from others you'd break replication etc because of schema changes so dumps is the next best thing.
[21:08:05] Damianz: I couldn't see any way it got a valid file handle. right
[21:08:49] I with oracle would fix the stupid error mysql throws if it can't open its files though... totally other subject, mariadb ftw!
[21:09:07] Kinda like some bits percona has though :( Kinda another oracle in development there though.
[21:09:11] well, if you'd like to have some tool which is on toolserver to work with labs data...
[21:09:18] Damianz: They are going to be huge and they will be read-only (other wise they are useless for usage in tools and would probably violate privacy policies and/or wiki policies with regards to the content since it would allow labs testers to do anything with that data).
[21:09:48] Danny_B|backup: Tools I run on labs use the toolservers... they have an api.
[21:09:55] Reasonably large dump imports should be more than enough to test the software in the beta project before deployment
[21:09:56] Yay for webspace.
[21:10:24] Danny_B|backup: What "labs data"?
[21:11:05] Yeah - I totally don't understand the wikipedia push so apart from fixing obuvasly broken stuff I don't touch deployment-prep.
[21:11:25] be more specific, because (a bit like Toolserver) wmf labs hosts a ton of different projects (e.g. you're not asking to replicate my database of my wiki bot hosted on labs to be replicated to the toolserver... right?)
[21:11:34] Danny_B|backup: ^
[21:11:46] :D
[21:11:53] well, i wrote that email when i visited sulinfo tool and i thought that it would be good to have labs account info there as well
[21:12:08] OAUTH
[21:12:12] Yell about OAUTH lots
[21:12:14] ^
[21:12:29] OAuth probably don't do anything here, because this is about raw database queries.
[21:12:46] Well I was assuming 'labs account info' = ldap
[21:12:51] at least that is what sulutil does
[21:13:04] Danny_B|backup: Special:CentralAuth works fine
[21:13:33] But I want oauth for a whole other reason that involves user auth for external tools as it currently sucks ass and has a negative impact on contributions/integration.
[21:13:55] meh
[21:14:02] OAuth would be *much* nicer
[21:14:11] the sulutil thing is a dirty, dirty hack
[21:14:40] also, I've been re-thinking using mariadb, because it would encourage people to stick their ldap password into files
[21:15:10] everything on toolserver is a dirty hack, get used to it. It is the first step to a great extension, a third party service or discovery of a bad idea. Some tools hang in between those for years though.
[21:15:27] I was thinking we where going with more a db as a service with 1 user 1db click button get details job... or something
[21:15:39] Lots of the MW core are dirty hacks :P
[21:16:30] <^demon> Ryan_Lane: Mind poking gerrit change 6005 and 10352?
[21:16:38] Reminds me I should update work's wiki... *goes to find the maintenance scripts to run*
[21:17:10] ^demon: you realize it's 11:20 pm here, right? :)
[21:17:27] <^demon> The world revolves around UTC ;-)
[21:17:31] heh
[21:17:42] well, gerrit is taking its sweet time opening for me
[21:17:56] <^demon> Hrm, started lagging for me a moment ago too :\
[21:18:46] 11:20? You still in europe?
[21:19:00] It's only 10:20 in the uk but we have stupid summer time things
[21:19:11] yeah, still in germany
[21:19:20] Nice, I like germany.
[21:19:31] me too
[21:19:45] <3 currywurst's
[21:20:01] <3 doner kabap
[21:20:18] Proper doner is <3 but we get horrid stuff around here.
[21:20:23] heh
[21:20:44] <^demon> There is nowhere in the US to get doner, and that continues to make me sad :(
[21:21:09] Start a career in fast food?
[21:21:40] Wait... why does mw use git branches over tags for versions?!
[21:21:56] <^demon> We use branches and tags...?
[21:21:57] because you can change a branch
[21:22:36] <^demon> Individual releases are tagged, but they also get a branch for 1.nn.x, 1.nn.x+1, and so forth
[21:22:38] Danny_B|backup: Platonides http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/004995.html
[21:23:52] <^demon> Ooooh, turns out we've got a new Turkish place in Richmond....called Döner Kebab.
[21:24:12] *mind blown*
[21:24:25] Ryan_Lane: Isn't the point of versions that once you tag them they don't change....
[21:24:34] :D
[21:25:10] Git is a bit of a pain for really long running branches still :(
[21:25:17] Damianz: I'm not sure what you mean. Mw uses tags for released versions. branches for in-progress work
[21:25:44] Every major release also has a branch at all times so that things can be backported to be release in the next minor release
[21:25:58] e.g merge to REL1_18 to be released in 1.18.4
[21:26:21] Oh, so you cherry pick changes from master back into the branch then tag a new minor from it?
[21:26:28] yep
[21:26:42] and maybe some other changes between the cherry-pick and the tagging
[21:27:16] (e.g. update wgVersion and/or update release-notes)
[21:29:01] Totally not a software person, anything I work on just uses master as stable, branches for new features and tags for milestone releases lol
[21:30:32] no special:centralauth
[21:31:17] Danny_B|backup: Sure there is. http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:CentralAuth/Krinkle
[21:31:37] ah, but i was not speaking about beta
[21:31:43] .. ?
[21:31:56] there is a bunch of other wikis for other projects
[21:32:12] You do know that beta is on a totally different cluster and can not and must not have access to the production cluster.
[21:32:23] sure
[21:32:27] There can't be information on your beta account on the real Special:CentralAuth or vica versa
[21:32:46] Hm.. so what would you like to see?
[21:33:35] (aside from me reaching a rabbit out of my magic hat ;-) )
[21:34:43] for instance labsconsole is a wiki. so the sulinfo could (although it is not within sul, i know - but that's foundation wiki as well and it's there) have info from that wiki too
[21:34:50] * Damianz thinks Krinkle just volunteered her/his self to do magic tricks at the next large public event (maybe youtube video during the next blackout)
[21:35:07] not only sulinfo, there is bunch of other tools doing various stats through wikis
[21:35:21] Damianz: You'll be surprised.
[21:35:51] :o)
[21:36:05] Krinkle: should we tell him? ;-)
[21:36:12] or...
[21:36:22] we could use OAuth, because that's what it's designed for
[21:36:23] I think he might be starting to suspect something at this point :D
[21:37:16] Ryan_Lane: Hm.. interesting. That would solve the the CentralAuth link. Not the general "query anything from beta on toolserver" though.
[21:37:29] why not?
[21:37:30] there are things like editcounters, admin activity monitors, block monitors etc.
[21:37:36] Ryan_Lane: Can I voluntell you to impliment oauth? :D
[21:37:37] Ryan_Lane: You tell me.
[21:37:48] wait. which direction?
[21:37:53] You are a core dev an all *hides*
[21:37:54] beta to toolserver, or toolserver to beta?
[21:38:11] our new security guy is likely doing oauth soon
[21:38:12] Ryan_Lane: A toolserver tool able to do SELECT queries on beta's db
[21:38:33] why not do it via the api?
[21:38:37] and no, that wouldn't help
[21:38:44] Ryan_Lane: If that was possible these wouldn't be toolserver tools.
[21:38:54] why would it need to? beta is just copies of crap from production
[21:39:10] Ryan_Lane: http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/004995.html
[21:39:49] Ryan_Lane: An example use case mentioned earlier was for sulutil (toolserver tool) to show wmf accounts and beta accounts together.
[21:40:01] Although that should not be implemented by using db access, OAuth can be relevant there to make the link.
[21:40:10] we expressly don't want that
[21:40:12] I don't really see how beta accounts matter as they are mostly dev people
[21:40:13] I know
[21:40:16] But maybe that's just me
[21:40:32] https://bits.wikimedia.org/static-trunk/skins/vector/images/search-ltr.png doesn't exist
[21:40:34] I think that would cause a shit storm, actually
[21:40:45] it's linked from labs wikis
[21:40:57] it's a search magnifier
[21:41:01] in search box
[21:41:01] Danny_B|backup: there is a bug about that in bugzlla, that is a problem in beta's config
[21:41:10] fine
[21:42:01] there is branch named "trunk", beta is a bit half-way a migration from "svn-trunk with an old copy of wmf-config" to "git-master with the real mediawiki-config/wmf-config"
[21:42:01] Which labs wiki?
[21:42:05] Damianz: all
[21:42:06] https://bugzilla.wikimedia.org/show_bug.cgi?id=37245
[21:42:17] well, all "beta" wikis
[21:42:18] oh
[21:42:25] See when you say labs I think labsconsole
[21:42:40] Deployment-prep needs varnish sorting IIRC before bits will work properly
[21:42:44] * Damianz doesn't keep up
[21:42:48] yes, hence my earlier point regarding people using "labs" to refer to the "beta project"
[21:42:55] which is confusing
[21:43:38] I was going to point out labsconsole shouldn't be pulling from beta but could be fine from commons as it's really a foundation wiki but *shrug* that's a redundant point now.
[21:43:56] Still going to be interested to see what happens with wikitech/labsconsole in the end though
[21:45:12] I think it makes sense to have a labsconsole (be it as a mediawiki instance or not). But all this documentation stuff needs to go away from labsconsole no matter what.
[21:45:32] and the semantic stuff used to aggregates host information into nice tables seems useful to have on wikitech as well.
[21:45:56] Krinkle, I think you have an account on almost every wiki?
[21:46:08] Platonides: All that I know of, yes.
[21:46:26] e.g. {{Server}} on http://wikitech.wikimedia.org/view/fenari and http://wikitech.wikimedia.org/view/Server_roles
[21:46:37] Mhm
[21:46:41] could you remove [[nv:Íiyisíí Naaltsoos]] from a couple of pages?
[21:46:57] I'm not sure about docs tbh.
[21:47:16] Platonides: Why would I do that?
[21:47:17] On one side having project/instance docs inline is awesome but it also makes them hard to find unless you know what you're looking for.
[21:47:31] Platonides: PM me or ask in another channel. I'd be happy to help.
[21:48:02] Should be better when we actually use puppet and stop manually configuring stuff then logs is applicable as right now lots of things in a 'cluster' are not standard.
[21:49:21] Damianz: well it depends on what it is. documentation on how something specific to a labs project as it is within labs (e.g. server admin logs and instnaces/projects makes perfect sense). But something about mediawiki-config should be on wikitech (or its successor). And something about Git/Gerrit should be on mediawiki.org
[21:49:39] Deployment procedures and wmf projects make more sense on wikitech imho
[21:50:22] Yeah
[21:50:48] But labs should also follow production closly for some things (like beta) so the documentation crosses over.
[21:52:15] The only thing on labs that follows production (or rather will, in the future once it is fully reproduced) is the beta project. And all of that is the operations/mediawiki-config repository with is managed in Gerrit and documentation on it is on wikitech. And the rest is puppetization, which is also all in Gerrit and wikitech.
[21:52:48] Left are general maintenance stuff on the beta project here on labs, which makes sense to have on the labsconsole wiki I guess
[21:57:57] wikitech and labsconsole are going to merge
[21:57:59] * Ryan_Lane shrugs
[21:58:47] things like project documentation will also likely move into it
[22:00:57] I can't wait to remove /status, /Roadmap and other wmf stuff from mediawiki.org; I don't mind it that much but it can make writing in certain contexts very confusing as well as reading.
[22:01:21] especially when working on non-wmf wikis.
[22:11:19] New review: Andrew Bogott; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/10268
[22:11:22] Change merged: Andrew Bogott; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/10268
[22:15:29] Ryan_Lane, apparently the mongodb instance was writing to /var/lib/mongodb all along. I don't know why it complained about /data/db, but the data is safe. Thanks again for your help!
[22:15:40] ah
[22:15:40] heh
[22:15:41] yw
[22:15:53] lol
[22:16:09] marktraceur: On the bright side you know your backups are screwed.
[22:16:25] Damianz, exactly! That's the spirit!
[22:16:34] This definitely inspired some workable backup scripts, yes
[22:23:46] PROBLEM Current Load is now: CRITICAL on mwreview-test10 i-000002c7 output: Connection refused by host
[22:24:26] PROBLEM Current Users is now: CRITICAL on mwreview-test10 i-000002c7 output: Connection refused by host
[22:25:06] PROBLEM Disk Space is now: CRITICAL on mwreview-test10 i-000002c7 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:25:46] PROBLEM Free ram is now: CRITICAL on mwreview-test10 i-000002c7 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:26:56] PROBLEM Total Processes is now: CRITICAL on mwreview-test10 i-000002c7 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:28:46] RECOVERY Current Load is now: OK on mwreview-test10 i-000002c7 output: OK - load average: 0.41, 1.10, 0.84
[22:29:26] RECOVERY Current Users is now: OK on mwreview-test10 i-000002c7 output: USERS OK - 1 users currently logged in
[22:30:15] RECOVERY Disk Space is now: OK on mwreview-test10 i-000002c7 output: DISK OK
[22:30:45] RECOVERY Free ram is now: OK on mwreview-test10 i-000002c7 output: OK: 92% free memory
[22:32:05] RECOVERY Total Processes is now: OK on mwreview-test10 i-000002c7 output: PROCS OK: 85 processes
[22:35:35] PROBLEM dpkg-check is now: CRITICAL on mwreview-test10 i-000002c7 output: DPKG CRITICAL dpkg reports broken packages
[22:45:35] RECOVERY dpkg-check is now: OK on mwreview-test10 i-000002c7 output: All packages OK
[22:59:52] <^demon> Just drove to the new doner place in Richmond on a whim :p Pretty damn good. And they're open til 3am Fri/Sat ;-)
[23:02:02] 3am is good
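Closing the loop on the MongoDB thread earlier in the log: the fix Ryan suggested ("change mongodb to point to an existing location") amounts to setting mongod's dbpath to a directory that really exists, for example somewhere under the /data/project automount, rather than the broken /data/db. A hypothetical Puppet sketch of that change follows; the paths, service name, and user/group are assumptions based on a stock Ubuntu mongodb package, not the project's actual manifest.

```puppet
# Hypothetical sketch only -- illustrates pointing mongod at a real directory.
class mongodb::datapath {
    # Directory on the project automount that mongod will use for data files.
    file { '/data/project/mongodb':
        ensure => directory,
        owner  => 'mongodb',
        group  => 'mongodb',
    }

    # Old-style mongodb.conf: dbpath/logpath key=value lines.
    file { '/etc/mongodb.conf':
        content => "dbpath=/data/project/mongodb\nlogpath=/var/log/mongodb/mongodb.log\n",
        notify  => Service['mongodb'],
    }

    service { 'mongodb':
        ensure  => running,
        require => File['/data/project/mongodb'],
    }
}
```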