[00:00:41] RECOVERY - Host cp3002 is UP: PING OK - Packet loss = 0%, RTA = 109.09 ms
[00:16:08] New patchset: Asher; "provide pt-heartbeat with socket location" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1994
[00:16:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1994
[00:17:53] !log asher synchronized wmf-config/db.php 'adding db52 to enwiki, load 100'
[00:17:54] Logged the message, Master
[00:22:27] !log asher synchronized wmf-config/db.php 'raising db52 load to 200'
[00:22:28] Logged the message, Master
[00:22:59] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1994
[00:22:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1994
[00:26:59] New patchset: Bhartshorne; "updating AUTH string for a newly created account for the eqiad swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1995
[00:27:03] !log asher synchronized wmf-config/db.php 'raising db52 load to 400'
[00:27:04] Logged the message, Master
[00:27:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1995
[00:27:23] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1995
[00:28:46] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1995
[00:28:47] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1995
[00:42:10] New patchset: Asher; "provision db /a volumes with correct default mount options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1996
[00:42:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1996
[00:42:28] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1996
[00:42:28] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1996
[00:49:26] werdan7: around?
[01:50:44] New patchset: Bhartshorne; "loosening the regex to allow Swift to function correctly; we were catching too little" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1997
[01:50:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1997
[01:53:14] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1997
[01:53:15] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1997
[02:05:53] !log LocalisationUpdate completed (1.18) at Sat Jan 21 02:05:53 UTC 2012
[02:05:55] Logged the message, Master
[02:25:23] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1924s
[02:29:13] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 2154s
[02:29:53] PROBLEM - Frontend Squid HTTP on knsq9 is CRITICAL: Connection refused
[02:39:14] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:45:23] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:18:20] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:22:41] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:27:48] hi
[04:28:13] I have a question... not sure if this is the correct place
[04:28:59] how can I change the name of the right here? http://pt.wikipedia.org/w/index.php?title=Especial:Registo&type=rights&user=Teles
[04:30:24] * Teles hopes somebody is awake at that moment
[04:38:00] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:38:25] Teles: the user group with the rights would need to be changed on the server side
[04:40:40] p858snake|l, hi. I need to change "eliminadora"
[04:41:04] should I open a request on bugzilla?
[04:41:23] You will need to file a bug in bugzilla requesting that and also link to any consensus you have for the change
[04:42:08] thank you. I will work on that consensus :)
[04:45:54] It is kind of weird cause it was "eliminador" (correct name) in the beginning
[06:17:02] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[09:48:31] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 430663 MB (3% inode=99%):
[09:55:58] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 399390 MB (3% inode=99%):
[10:44:18] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:07:26] Not sure if this is the right place to ask, but I'll just give it a shot. Can anybody tell me if the running instances of mediawiki using lucene also make use of a clustering component such as apache mahout?
[11:08:00] that we are running in production?
[11:08:08] apergos, yes.
[11:08:45] or at least if clustered data is used to speed up search
[11:09:13] we're not using anything like that to the best of my knowledge
[11:09:43] this page http://wikitech.wikimedia.org/view/Search while not entirely accurate about which machines do what, is more or less correct about the architecture
[11:12:42] thanks apergos. do you know if it is possible to have a look at the lucene configuration files?
[11:13:44] especially /home/wikipedia/conf/lucene/lsearch-global-2.1.conf and /etc/lsearch.conf (hope they'll shed more light)
[11:14:38] http://noc.wikimedia.org/conf/lsearch-global-2.1.conf
[11:14:45] there's that one
[11:15:12] http://noc.wikimedia.org/conf/highlight.php?file=lucene.php here's this
[11:16:48] apergos, thanks so much. That's a good resource. I am bookmarking http://noc.wikimedia.org/
[11:16:56] yeah, it's a keeper
[11:18:20] we don't have the last file in fact so you can't have it either (looks like it's no longer used)
[11:22:34] Oh, I just read that the files are dynamically generated and up-to-date -- are these just local copies that are used, s.t. they're not up-to-date with what is used on the servers?
[11:23:24] "the last file" = "/etc/lsearch.conf"
[11:23:33] but in fact I lie, I found it after all
[11:23:47] yes the noc files are current
[11:24:36] there's nothing of interest in the lsearch.conf file though
[11:25:01] k.
[11:25:19] a bit of stuff for logging and ganglia, some OAI stuff, maxqueue params
[11:25:27] I guess I should have a look at the lucene-search extension?
[11:25:31] yup
[11:47:12] hi, does an IPv6 road map exist for the Wikimedia Services?
[14:06:36] wow apergos, you've created lots of interesting pages on wikitech :)
[14:08:50] sorry for the proliferation
[14:09:10] but I finally came to the realization that since I own that project I can make the documentation suck a lot less
[14:09:20] so hopefully it does now...
[14:09:29] apergos: I am currently transferring some dumps from dumps.wikimedia.org over to the Internet Archive
[14:09:34] out of boredom
[14:09:35] great
[14:09:37] :-)
[14:09:48] be bored more often? ;-)
[14:09:50] but the sad thing is that the oldest dumps get deleted automatically
[14:10:24] like I managed to upload the 2011 January ones, but only for some wikis
[14:10:31] and now they are gone
[14:10:34] they do get deleted automatically
[14:10:43] they were never intended to be kept forever
[14:10:58] sad
[14:11:36] apergos, the docs on mw.o about how to dump one's own wiki are quite lacking too
[14:11:42] all of that information is still in the more recent dumps, except for things that were delted
[14:11:50] and we don't want people to keep downloading the delted stuff
[14:11:53] *deleted
[14:12:08] Nemo_bis: I bet they are... I think I'm going to have to get to that sometime later
[14:12:43] just like someone should look at mwdumper (not me)
[14:13:02] and I should really promote the use of that perl script for converting dumps to sql for upload
[14:13:07] lots to do...
[14:13:21] well, just explaining how to produce xml dumps would be nice
[14:13:26] there's only https://www.mediawiki.org/wiki/Manual:DumpBackup.php which is years old
[14:13:30] :-D
[14:13:57] the docs in my branch for the python scripts aren't awful
[14:14:17] how hard is it for a not-so-huge wiki to run the normal script?
[14:14:26] ah, without all the extra stuff?
[14:14:40] what's the extra stuff? :)
[14:14:47] I mean just an XML of full history
[14:14:52] (for instance)
[14:15:00] without any of the other tables etc, right?
[14:15:04] yep
[14:15:17] just running that script and you're done?
[14:15:17] it's pretty easy
[14:15:23] maybe with these dependencies: https://wikitech.wikimedia.org/view/Dumps/Software_dependencies
[14:15:39] well if you didn't want to run the python script and set up configuration and all that
[14:15:41] you could
[14:15:49] just with the MW installation running
[14:16:04] run the two maintenance scripts, one for the stubs and one for the history
[14:17:00] http://wikitech.wikimedia.org/view/Dumps/Rerunning_a_job in the "other stuff" section it gives some samples for the history
[14:17:14] but that part hasn't been cleaned up yet, and it will likely get moved to its own page
[14:17:36] with how to run various steps by hand, how to use dry-run to see what the python script would do, etc.
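A minimal sketch of the two-pass dump just described, assuming a stock MediaWiki install at a hypothetical path: dumpBackup.php writes a stub (metadata-only) history file, and dumpTextPass.php then fills in the revision text from it. The install path and output file names below are placeholders, not the production setup.

```python
# Sketch (not the WMF production setup): produce a full-history XML dump for a
# small self-hosted wiki by shelling out to the two maintenance scripts named
# above. Paths and output names are assumptions.
import subprocess

MEDIAWIKI = "/srv/mediawiki"  # hypothetical MediaWiki install directory

def run_to_file(cmd, outfile):
    """Run a maintenance script and capture its XML output."""
    with open(outfile, "wb") as out:
        subprocess.run(cmd, cwd=MEDIAWIKI, stdout=out, check=True)

# Pass 1: stubs for the full history (page/revision metadata, no text blobs).
run_to_file(["php", "maintenance/dumpBackup.php", "--full", "--stub"],
            "stub-meta-history.xml")

# Pass 2: pull the text for every revision listed in the stub file.
run_to_file(["php", "maintenance/dumpTextPass.php",
             "--stub=file:stub-meta-history.xml"],
            "pages-meta-history.xml")
```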
[14:17:50] so that will likely have the info you want when I get it done.
[14:19:41] oh, great :)
[14:20:04] can you wait a few days? (I'm trying to take the weekend off ;-) )
[14:20:17] eh apergos, just asking: For the split dumps of enwiki, why is it named with the page numbers?
[14:20:28] it looks quite messy to me
[14:20:30] that's the "checkpoint file" system
[14:20:45] apergos, sure! I'm already ashamed of asking you on saturday...
[14:20:50] well first it saves all those people from asking which pages are in which piece, folks seemed to want that
[14:20:56] looks difficult for me to automatically download it from a script
[14:21:01] but also *I* want it, because...
[14:21:13] it lets me know which files contain which in case we need to rerun them
[14:21:17] well, one.
[14:21:22] oh, I see
[14:21:39] seems reasonable
[14:21:42] automatically downloading should be easy enough
[14:21:49] Hydriz, can't you just download the page and feed wget with it?
[14:22:11] yeah, but when uploading to the IA, you would need to specify it in full using curl
[14:22:14] grab the names out of the index.html if you want to do it file by file
[14:22:36] orrrr
[14:22:41] get em out of the md5sums file
[14:22:47] heh
[14:22:47] that's much easier to parse
[14:22:53] always has a predictable name
[14:23:00] and then you can construct your urls easily
[14:23:24] I'll try to work with it then
[14:23:41] ok
[14:23:48] I think that solution should work ok for you
[14:23:56] and another question: the 2 connections per ip, is it also implemented for the Toolserver too?
[14:24:17] probably for them too
[14:24:19] I once tried downloading 3 files at a time from the same host, seems okay to me
[14:24:24] hmmm
[14:24:27] well please don't
[14:24:31] yeah
[14:24:34] see that's the one host we write to
[14:24:37] I reduced it now
[14:24:40] as in the dumps are written to it
[14:24:54] so if we start to lose on network bandwidth there we are screwed
[14:25:09] * apergos makes a note to think about toolserver access
[14:25:16] especially during weekends
[14:25:20] more screwed?
[14:25:50] well
[14:26:03] if nfs gets slow because bandwidth in general to that host is tight
[14:26:11] then *writing* the dumps there gets slow
[14:26:21] want to avoid that at all costs
[14:27:02] I saw there was discussion about some sort of centralized holding area for dump copies over there
[14:27:04] this brings us back to having mirrors
[14:27:08] I would encourage that
[14:27:34] well the best would be to have a mirror right there in esams
[14:27:35] but the Brazilian mirror is very outdated
[14:27:41] just for toolserver use
[14:27:41] yeah
[14:28:00] yeah, we are exchanging emails, he's having some trouble with the list of files to be synced
[14:28:21] I can't see any problem on my end... in the meantime I sent out a pile of emails to check on other possibilities
[14:28:28] the folders always look updated
[14:28:36] but peering inside says otherwise
[14:29:04] I'd really like to get some academic institutions in on this
[14:29:32] sigh, if only we can just invest more in dumps
[14:29:51] ?
[14:30:05] like, make more servers so that our download speed is faster
[14:30:22] we shouldn't need to have more than a couple of our own servers
[14:30:33] we should have multiple organizations hosting them
[14:30:59] if I think of the dozens of free TB in my university storage...
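A sketch of the md5sums-based approach suggested above: fetch the checksum file for a given wiki and dump date (it has a predictable name) and build the per-file download URLs from it, instead of scraping index.html. The "<wiki>-<date>-md5sums.txt" name pattern and the example wiki/date are assumptions for illustration.

```python
# Sketch: build dump-file URLs from the per-run md5sums file rather than scraping
# index.html. The checksum-file naming and the example wiki/date are assumptions.
import urllib.request

BASE = "http://dumps.wikimedia.org"

def dump_file_urls(wiki, date):
    md5_url = f"{BASE}/{wiki}/{date}/{wiki}-{date}-md5sums.txt"
    with urllib.request.urlopen(md5_url) as resp:
        for line in resp.read().decode("utf-8").splitlines():
            # each line looks like: "<md5sum>  <filename>"
            checksum, filename = line.split(None, 1)
            yield f"{BASE}/{wiki}/{date}/{filename.strip()}"

if __name__ == "__main__":
    for url in dump_file_urls("arwiki", "20111201"):  # hypothetical dump date
        print(url)
```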
[14:31:03] yeah, sigh
[14:31:24] if there is one lesson to be learned after however many years of watching the internet evolve it is that decentralization is always sturdier
[14:31:35] than a centralized model
[14:31:53] so Nemo_bis, if they have 6 or so free tb, ... who do you know there?
[14:31:54] :-)
[14:32:33] well, the CTO (or whatever you'd call her) doesn't seem to hate me
[14:32:42] sweet
[14:32:42] lol
[14:32:47] can't hurt to ask
[14:33:01] I mean "hey, we would have bragging rights that we're hosting backups for Wikipedia"
[14:33:12] it doesn't actually look bad on one's list of accomplishments, right?
[14:33:31] I've already asked her
[14:33:35] and?
[14:33:37] Nemo_bis: Where is your university?
[14:33:45] When we were in the meeting deciding to buy those new servers
[14:33:47] Milan
[14:34:08] She said that she doesn't know, that space will be unused but who knows, blabla
[14:34:16] Wikipedia-ing where that is
[14:34:22] s/will be/is currently mostly/
[14:34:24] oh Italy
[14:34:43] I guess we should target universities with open source/open content initiatives
[14:34:50] they'd be more likely to sign on
[14:34:56] ours is one we could say
[14:35:12] and we're the national peering point with Google and many others for the university network
[14:35:28] hmm
[14:35:32] you could...
[14:35:39] ask her if it's ok to give me her email
[14:35:48] and I could write a sort of pitch
[14:36:14] well it would be in English, but what can we do (unless she wants it in Greek, those are my two choices)
[14:36:43] she can speak English :)
[14:36:48] I am sure
[14:37:00] I would just feel bad that I can't write in the language in use
[14:37:03] it sort of sucks
[14:37:40] the fact is, this is potentially not her task
[14:37:47] oh
[14:37:54] who would be a good contact, do you think?
[14:37:54] we have also http://mirror.garr.it/ but technically it is not managed by the university
[14:38:06] private?
[14:38:28] no, public, but weird management
[14:38:33] hmm
[14:38:50] http://mirror.garr.it/iwantamirror.html
[14:38:54] I contacted the guy who takes care of it but got nowhere, although he was very kind
[14:39:01] what did he have to say?
[14:39:14] I guess I have to call him or ask a prof. who loves me and might know him
[14:39:18] :-)
[14:39:30] again I am happy to write something all officially sounding from the WMF etc
[14:39:39] if it might help :-)
[14:41:08] good
[14:41:17] I should call him and discover what he wants
[14:41:41] then if they don't have space enough my plan is to ask the uni's CTO to lend them some space, so she doesn't have to worry
[14:41:48] does your univ have a nice acronym or something? I will add it to my list of "checking into" so it doesn't fall off my radar
[14:41:50] no idea whether this can actually work though
[14:41:55] unimi
[14:41:59] ok
[14:43:00] and added.
[14:43:35] heh last time I estimated the space usage it had to be an estimate, cause we didn't have 5 recent wp full dumps
[14:44:06] this time I just went and du'ed all the files :-D
[14:45:44] Eh apergos: Just remembered about the POTY files. Is it possible for you to also compile 2008 and 2009?
[14:45:45] the problem last time we talked about it was how to fetch relevant files
[14:45:56] I understand that now you can just let people use rsync?
[14:46:05] as with that university
[14:46:26] but then, would one be able to get only the most recent dumps (if not able to store multiple copies)?
[14:47:39] I have an rsync-produced file which lists all the files of the last 5 good dumps for each project
[14:47:43] it's generated once a day
[14:47:54] that's already 6T
[14:48:06] if they get set up with that we can talk about older stuff
[14:48:21] no less than 6T?
[14:48:33] no less, for the last 5 dumps of all projects. that is correct.
[14:49:18] and you're not interested in less demanding mirroring?
[14:52:18] well keeping only the last 1 or 2 is ..
[14:52:28] I mean it's ok but we really need to have a few of them out there
[14:52:57] wait til I start shopping around for folks who can mirror image tarballs :-D
[14:55:36] I really want people who can do more than 6T so I can give em some old stuff too, but I figure "last 5" is the top priority
[14:56:51] The IA always has 925TB at its disposal...
[14:56:52] Be interesting to see how end users use them
[14:57:03] Downloading and storing 5TB isn't easy
[14:57:07] I'm trying to get someone at IA on the hook for this
[14:57:25] oh they won't
[14:57:32] they'll download the files from their project of choice
[14:57:36] sometimes that's en
[14:57:44] (wp), sometimes it's something else
[14:57:46] heh
[14:58:07] some of them want history, some just want current pages, some want several dumps over time
[14:58:52] if something happened to our disks, having offsite with the last 5 would give me a comfortable assurance that we could start up the dumps again with prefetch and everything working, even if some of them had unnoticed data corruption
[14:59:34] (this has happened before, hence I would expect it to happen again: both "unnoticed data corruption for several runs" and "woops, there are no dumps. at all. they are all gone. what do we prefetch from??"
[14:59:35] )
[15:01:03] it used to be that anyone doing research on en pedia had to use these dumps from 2007... then tomasz got some dumps up in 2010 and I guess folks used those for awhile
[15:01:18] now they can get em every month, it will be interesting to see what they do with them
[15:01:45] * apergos is reminded to bite the bullet and subscribe to the friggin research list, ugh
[15:03:04] no. not going to.
[15:03:12] eh, since some of the Wikimedia Ops are here, I just want to ask a question
[15:03:30] why doesn't Wikimedia implement rsync and ftp access to the dumps?
[15:03:33] ok but if I have to think about it I probably won't answer :-P
[15:03:41] rsync is enabled for mirror sites, not generally
[15:03:44] ftp is slow
[15:04:00] like, downloading via http is hell
[15:04:17] ?
[15:04:20] I am tired of waiting, haiz
[15:04:40] rsync is quite fast
[15:04:43] (I've always used wget -c and never had to worry about it. in the days when I had a good net connection and downloaded things)
[15:05:15] yeah
[15:05:29] but when things get big you just hope that it can just finish faster
[15:05:38] well it's slow because we cap bandwidth
[15:05:44] as we were saying earlier
[15:06:01] yeah, which is quite a thorn
[15:06:04] and that's because we had issues with people on backbones downloading as fast as they could
[15:06:11] and saturating *us*. which is not ok.
[15:06:20] this, once again, is why we need mirrors
[15:06:37] so if we were to draw a chart, mirrors are the center of everything?
[15:06:54] well, we are: we create em, then push them out to others who then..
[15:07:11] make available for: rsync, ftp (why would people want ftp, I dunno), and whatever else
[15:07:12] sigh, just hoping for that day when people ask us to get permission to mirror content, like Linux distros
[15:07:17] this would go for:
[15:07:36] the dumps. the pagecount stats. tarballs of groups of the images. etc
[15:07:39] we are like asking people to mirror our content, when the desired is the other way round
[15:08:02] I have kinda moved all the non-dump files to IA
[15:08:09] how many more people use wikipedia than linux distros, I wonder
[15:08:11] hmmmm
[15:08:11] except for incr
[15:08:22] that's ok, the add/changes are very experimental
[15:08:35] yes, and it is very weird
[15:08:38] I might be able to get back to those for a bit near the end of the month
[15:08:45] to make them a bit more robust anyways
[15:08:59] it does not allow me to download the dumps using --mirror --no-parent
[15:09:12] gives some crap error, sigh
[15:09:29] you can email me that and I can see if that's be designor by stupidity
[15:09:32] ariel at wm.or
[15:09:42] *by design or
[15:09:46] scuse the typos
[15:09:58] it's just downloading the directories as index.html files
[15:10:07] and does not create the different directories
[15:10:21] so it would overall prevent download of the dump itself
[15:10:57] so if you send an email "I run this. I get this error. I get this output (here's a sample)". I can look at it
[15:11:06] or bugzilla and assign to me, either way
[15:11:25] hmm
[15:11:35] that was a long time ago, so let me try it again...
[15:11:41] ok
[15:11:47] well I won't look at it this weekend anyways :-D
[15:12:14] omg old dumps are also deleted for this
[15:12:16] sighs
[15:12:39] just hope that someone else notices this too
[15:12:52] I am not keen to write emails these few days haha
[15:13:03] ok
[15:13:06] this is a small error
[15:13:23] so let's not make a fuss out of it
[15:13:36] unless you are crazy about perfection :P
[15:13:41] whatever works for you
[15:13:53] I'm crazy about stability but I also prioritize
[15:14:24] I am just focusing on getting as many dumps to the IA as possible
[15:14:32] before the server purges the old dumps
[15:15:47] ok, sounds like you have got your plan worked out
[15:16:12] kinda
[15:16:30] uploading the different versions in batches of 10 wikis each
[15:16:41] now just waiting for arwiki to finish
[15:16:50] and BTW I just started recently
[15:17:44] my serious worry is wearing out the dump server
[15:17:58] my work is already significant in ganglia
[15:17:58] by doing a full download of everything?
[15:18:12] as long as you download two at once and no more,
[15:18:18] I download all files of every dump version
[15:18:20] yeah
[15:18:23] I keep to that limit
[15:18:52] if you see an increase in bandwidth of dataset2.wikimedia.org, yeah that's probably me
[15:19:39] * Hydriz just wishes to turn this uploading to turbo power
[15:20:51] what ip are you coming from?
[15:21:03] (I'm looking at bandwidth now)
[15:21:11] TS I guess
[15:21:18] which one?
[15:21:45] yeah, TS
[15:21:47] the Toolserver
[15:21:52] yes but which host
[15:21:53] both nightshade and willow
[15:22:17] now is just azwikisource
[15:22:23] *azwiktionary
[15:22:42] arwiki is uploading the last 2011 dump
[15:22:50] what's the fqdn?
[15:22:56] (sorry but I'm having a brain fart)
[15:23:06] should be host.toolserver.org
[15:23:31] I must have mistyped it, cause that's what I did
[15:23:46] isn't really using space, only BW
[15:24:03] I don't see anything from there but I do see something (a *lot* of use) from
[15:24:10] clients.your-server.de ...
[15:24:21] so you're uploading to IA via toolserver?
[15:24:28] yeah
[15:24:32] http://www.archive.org/details/wikimediadownloads
[15:24:55] Can they not grab them directly?
[15:25:04] hmm?
[15:25:26] seems a bit daft to pull them from pmtpa to esams to then push back to wherever the IA is
[15:25:28] IA doesn't have anything set up for that
[15:25:33] shame
[15:25:40] yeah
[15:25:56] well that's why they are on our list of "really shouldn't you guys be doing this? it's right up your alley."
[15:25:58] just seems really inefficient having an essentially un-needed middle
[15:26:10] What I do is: Download file from the bottom of page, upload them to IA using curl S3, delete file, continue with next file
[15:26:12] it's really inefficient having human uploaders etc
[15:26:22] mmm
[15:26:28] not to mention that then Hydriz probably can't upload at more than 1 MB/s
[15:26:35] heh
[15:27:03] I dunno about the upload speed
[15:27:08] curl doesn't report
[15:27:12] so there are no connections from the toolserver ip range
[15:27:19] which makes me think you're coming through something else
[15:27:34] that I am not sure
[15:27:40] I am just running from willow currently
[15:27:42] Hydriz, it does if you redirect the output to a file
[15:28:15] I don't output it anywhere, or else the 1.1TB of free space on user-store would be maxed out in no time :P
[15:28:48] the log
[15:29:03] if only I had access to IA's supposedly called "teamarchive" server
[15:29:14] then I can just wget everything and poof it's on IA
[15:29:19] heh, that's for a few lucky persons
[15:29:20] curl displays this data to the terminal by default, so if you invoke curl to do an operation and it is about to write data to the
[15:29:21] terminal, it disables the progress meter as otherwise it would mess up the output mixing progress meter and response data.
[15:29:21] If you want a progress meter for HTTP POST or PUT requests, you need to redirect the response output to a file, using shell redirect
[15:29:21] (>), -o [file] or similar.
[15:29:31] ehm, sorry for the flood
[15:29:35] ah, I saw one connection
[15:29:42] lol
[15:30:14] imagining you looking live at things and oh there is one connection
[15:30:24] basically yup
[15:30:30] well I had iftop going
[15:30:37] but I didn't see you in there
[15:30:44] anyways for now it's ok
[15:30:52] if it turns out to be an issue we'll deal with it
[15:30:58] hmm?
[15:31:00] what issue?
[15:31:29] I see this one site which I listed earlier
[15:31:34] using a lot of bandwidth
[15:31:44] if that turns out to be a problem we'll deal with it
[15:31:46] for now it's ok
[15:31:53] ganglia graphs are not spiking AFAICS
[15:32:12] I was hoping to run like 2 terminals on willow and another 2 on nightshade
[15:32:27] but seeing the disaster that I may make, I need to think of a better way...
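A rough sketch of the download, upload-to-IA, delete loop described above, assuming the archive.org S3-like upload interface (PUT to s3.us.archive.org with a "LOW access:secret" authorization header) and the third-party requests library; the item identifier, credentials, and URL list are placeholders.

```python
# Sketch of the per-file loop: fetch one dump file, PUT it into an Internet
# Archive item over the S3-like API, then delete the local copy to save
# Toolserver scratch space. Endpoint, header format, credentials, item name
# and URLs are assumptions here.
import os
import urllib.request

import requests  # assumed to be installed

IA_ACCESS, IA_SECRET = "ACCESS_KEY", "SECRET_KEY"   # placeholder credentials
ITEM = "wikimediadownloads-example"                  # hypothetical IA item

def mirror_one(url):
    name = url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, name)            # download from the dumps host
    with open(name, "rb") as f:
        resp = requests.put(
            f"http://s3.us.archive.org/{ITEM}/{name}",
            data=f,
            headers={"authorization": f"LOW {IA_ACCESS}:{IA_SECRET}"},
        )
        resp.raise_for_status()
    os.remove(name)                                   # free the scratch space

urls = [
    # e.g. the output of the md5sums sketch earlier; placeholder entry:
    "http://dumps.wikimedia.org/arwiki/20111201/arwiki-20111201-md5sums.txt",
]
for u in urls:
    mirror_one(u)
```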
[15:32:35] but the IA machine Hydriz mentioned could soon start downloading all Commons images apparently, so if you don't produce tarballs soon you might see a spike in network usage there
[15:33:00] Nemo_bis: We can probably stop underscor in time haha
[15:33:05] if they start downloading all commons images by crawling, we will have a squid issue, not a network spike issue
[15:33:17] unless they plan to request thousands of images at once
[15:33:20] yeah, 13.3TB to download...
[15:33:21] (in parallel)
[15:33:26] heh
[15:33:41] apergos: Be warned that the Internet Archive staff are a little crazy
[15:33:45] I know
[15:33:49] especially when it comes to real archiving
[15:33:52] I know (or used to know) a couple of them
[15:34:07] but even they know that if you break the machine with the data you're trying to archive, it's no good :-D
[15:34:50] they look like they are abusing their own servers with the surge of data
[15:35:21] just looking at the catalog, someone new would think that only staff can contribute to the Archive
[15:35:48] :-D
[15:36:37] * Hydriz feels very tempted to go turbo
[15:36:53] * apergos gets itchy fingers
[15:38:09] haiz, we need to be gentle with other people's stuff
[15:38:36] just like borrowing a pen, you don't jerk it
[15:39:08] Hydriz, you can't do much damage, both downloading from WMF and upload to IA are capped :)
[15:39:22] yeah
[15:39:34] you can't upload at 136 MB/s as Jason does :-p
[15:39:54] but if I could mobilise >2 servers to massively upload/download the dumps, might kill muhahaha
[15:40:40] Yeah, his upload of Jamendo albums is shocking
[15:40:55] totally dominated the catalog
[15:42:10] I would really appreciate it if a "snapshot" of dataset2 is created and hosted on another server of the WMF
[15:42:20] then I can really go turbo
[15:42:59] lets just heck about the 1MB/s upload limit
[15:44:46] and besides, I am just uploading only 2011 dumps, so a "snapshot" is just good enough for me
[15:45:25] Hydriz:
[15:45:35] ?
[15:45:46] unfortunately I need to use such a "snapshot" host to set up our rsyncs to our one existing mirror first :-P
[15:46:15] that's sad
[15:46:19] no!
[15:46:31] if it's done ASAP then the very old dumps can be kept
[15:46:35] otherwise the mirror runs way behind
[15:46:50] like the March 2011 ones
[15:47:13] yeah, it's more important to have a mirror
[15:47:40] * Hydriz thinks about the 679 servers that Wikimedia has :P
[15:47:49] they are busy serving data :-P
[15:47:53] to readers like you and me
[15:47:57] yeah true
[15:48:02] I see the docs on wikitech
[15:48:08] very interesting, the cluster setup
[15:53:49] looks like there are only 3 servers that host dumps, and 1 is down for a year?
[15:54:08] there is *one* server that hosts dumps
[15:54:20] the other one has been down *more* than a year
[15:54:32] almost 2 years now
[15:54:38] yeah
[15:54:48] it is even out of warranty, if they ever get it fixed and close the ticket, that's the end of service
[15:54:51] if we can get that up, then it would actually be of use
[15:54:53] eh, "service"...
[15:54:57] does anyone have MRTG or similar graphs (or hourly reading/editing statistics) for the blackout, please?
[15:54:58] don't
[15:55:10] it will never come back up. we have wasted soooo much time on that pece of crap
[15:55:12] *piece
[15:55:27] (huh, someone reads my docs, woo hoo!)
[15:55:28] so don't tell me you guys are using it for coffee? :P
[15:55:35] no, we're waiting on the vendor
[15:55:48] it'll just make me mad to talk about it though
[15:56:02] If I had an account on wikitech, I would have tried to refine those docs
[15:56:04] :P
[15:56:14] so. dataset1001, the new host, just racked and mgmt set up last week
[15:56:30] the one on dataset1?
[15:56:38] we leave it there for posterity
[15:56:49] as a reminder of how awful that was
[15:57:00] nah, just hoping to fix minor mistakes
[15:57:00] soooo. dataset1001. the new host...
[15:57:14] what is the host used for the mirroring?
[15:57:25] there's one host in production.
[15:57:30] guess what it does :-P
[15:57:45] * Hydriz hints that he is finding out how dumps go around
[15:57:46] seriously: dumps are created there, served from there, and rsynced from there. (dataset2)
[15:57:52] soooooooo
[15:58:09] dataset1001, the new host, which will be installed next week
[15:58:20] and for me? :P
[15:58:39] absolute first priority, (sorry, it isn't you) is to get a full rsync of dataset2 over to the new host
[15:58:45] then get cron jobs going
[15:59:20] in the meantime I need to talk with our guy with the one mirror (and there's another person that volunteered space, gotta see if that happens next week)
[15:59:38] and see about how I want to structure this since we'll have the two hosts
[15:59:47] next week in the sense, the week of 22?
[16:00:02] next week as in I'm sitting on my hands not to install it over the weekend
[16:00:07] but I really need a break
[16:00:12] heh
[16:00:14] last week was intense with sopa and all
[16:01:05] sigh
[16:01:26] * Hydriz feels bad being pushed to a lower priority :P
[16:02:26] * apergos offers Hydriz some virtual chocolate
[16:03:04] let's just put it in the fridge, for me to eat when I really can get some uploading jobs running
[16:03:45] how big is dataset1001 going to be?
[16:03:51] (in terms of disk space)
[16:04:00] I see something about image dumps
[16:04:25] we're not going to try to fully host anything like that
[16:04:40] isn't there a weekly router traffic graph somewhere??
[16:04:57] we could create one or two tarballs at a time and shovel them out to a few people to distribute them (and upload them to IA etc)
[16:04:59] jps: of what?
[16:05:17] enwiki traffic or similar
[16:05:22] or just total traffic
[16:05:32] or reads/edits showing the blackout?
[16:05:35] hmm
[16:06:21] I dunno if this helps: http://dumps.wikimedia.org/other/pagecounts-raw/2012/2012-01/
[16:06:29] seems like something to do with pageviews
[16:06:39] but usually things get chunked up in stats.wikimedia.org
[16:07:58] http://stats.grok.se/en/201201/Main%20page is very interesting
[16:09:00] What I don't understand, is why doesn't http://stats.grok.se/en/201201/Justin_Bieber show the blackout?
[16:09:04] "Bandwidth: Wikimedia provides about 20 MB/s via dataset2.wikimedia.org (stats) for XML dumps, as of January 2011."
[16:09:21] 20MB/s isn't really accurate for me (esp. on the Toolserver)
[16:09:23] oh, wait, I understand that now
[16:10:32] alright, now left with pages-meta-history.xml.7z for arwiki's last 2011 dump
[16:10:44] though the speed is (very) slow
[16:10:51] jps, pages were still loaded
[16:11:16] yes. I guess I really need a daily graph of edits. Does that exist?
[16:12:47] A router/squid graph of total GB in vs. out would work. Aren't there MRTG or similar graphs?
[16:13:19] since editing is relatively large uploads while browsing is just GETs
[16:13:59] ah HA! http://ganglia.wikimedia.org/2.2.0/
[16:14:46] ...
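A small sketch of reading the raw per-page view counts linked above to look for the blackout dip: the hourly file-name pattern and the "project page_title count bytes" line format are assumptions about the pagecounts-raw layout, and the exact file name for a given hour may differ slightly.

```python
# Sketch: pull one hourly pagecounts-raw file from the directory linked above and
# report the view count for a single page. File naming and line format
# ("project page_title count bytes") are assumptions; the hour chosen falls
# during the SOPA blackout.
import gzip
import urllib.request

HOUR_URL = ("http://dumps.wikimedia.org/other/pagecounts-raw/2012/2012-01/"
            "pagecounts-20120118-120000.gz")  # exact name may differ slightly

def views_for(page, project="en"):
    local_path, _ = urllib.request.urlretrieve(HOUR_URL)
    with gzip.open(local_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = line.rstrip("\n").split(" ")
            if len(fields) >= 3 and fields[0] == project and fields[1] == page:
                return int(fields[2])
    return 0

print(views_for("Main_Page"))
```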
[16:15:09] now left for the second last dump of azwiktionary
[16:15:25] then I would let dataset2 take a breather :P
[16:16:08] How to get http://ganglia.wikimedia.org/2.2.0/graph_all_periods.php?me=Wikimedia&m=load_one&r=hour&s=by%20name&hc=4&mc=2&g=network_report&z=large for the past week?
[16:16:29] duh nevermind (scroll down)
[16:16:34] herp derp
[16:17:23] sadly spikes kill that, but it has CSV
[16:18:11] got it. Thanks all!
[16:19:09] ok, got to sleep now, thanks all too :)
[16:20:51] look away for a few minutes and everybody leaves
[16:26:50] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:09] Change abandoned: Catrope; "We don't need this at all if we use the Gerrit hooks plugin for Jenkins, because it has per-repo act..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1794
[18:33:07] !log reedy synchronized wmf-config/InitialiseSettings.php 'Setting timezone for th projects'
[18:33:08] Logged the message, Master
[18:34:19] hi
[18:35:43] !log reedy synchronized wmf-config/InitialiseSettings.php 'Cleanup wgEnableDnsBlacklist, enable for th projects'
[18:35:45] Logged the message, Master
[18:43:38] Juandev wants an etherpad pad deleted... is someone around?
[18:47:11] no
[18:47:23] Apparently last time it was asked, the asker said it was easily done via an api call
[18:47:32] so in theory, they can maybe do it themselves
[18:49:19] Reedy: < Juandev> I talked to guys at #etherpad and they told me this could be by developer via API
[18:49:29] idk what "by developer" means
[18:49:43] hello
[18:49:46] indeed
[18:49:55] can you kill me an etherpad page?
[18:55:58] !log reedy synchronized wmf-config/InitialiseSettings.php 'Setting wgNamespaceRobotPolicies for th projects'
[18:55:59] Logged the message, Master
[19:14:40] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33215 - Enabling transwiki import on sa.wiktionary+'
[19:14:42] Logged the message, Master
[19:19:22] !log reedy synchronized wmf-config/flaggedrevs.php 'More for bug 29742'
[19:19:23] Logged the message, Master
[20:20:30] zzz =_=
[20:26:58] New patchset: pugmajere; "Clone git-setup from the puppet repository and update it for the software repository." [operations/software] (master) - https://gerrit.wikimedia.org/r/1998
[20:27:00] New patchset: pugmajere; "Simplify the aliases for the simplified branch "tradition" in the software repo." [operations/software] (master) - https://gerrit.wikimedia.org/r/1999
[20:27:01] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/1999
[20:28:16] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1999
[20:28:50] Reedy: ping
[20:29:09] ohai
[20:29:10] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1998
[20:29:10] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/1999
[20:29:11] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/1998
[20:29:35] Reedy: does open search support "Did you mean" functionality?
[20:29:44] through the api
[20:31:20] preilly, action=opensearch&search=foobar&suggest I think
[20:32:05] Reedy: http://en.wikipedia.org/w/api.php?action=opensearch&search=indai&suggest
[20:32:12] New patchset: Lcarr; "1st edition of fw creation tool" [operations/software] (master) - https://gerrit.wikimedia.org/r/2000
[20:32:12] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2000
[20:32:17] Reedy: doesn't give me a suggestion of india
[20:32:25] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2000
[20:32:25] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/2000
[20:33:26] https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=indai does have india
[20:34:03] typing indai in the search box doesn't give india as a suggestion though
[20:34:05] preilly
[20:34:44] jeremyb: that is what I'm trying to fix
[20:35:17] jeremyb: adding "did you mean" to the suggestions
[20:36:13] preilly: cool, we really need more fun in the wikipedia ;)
[20:39:47] Reedy: so, is it not implemented
[20:40:27] preilly, https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=indai
[20:40:33]
[20:40:44] can't do title searching
[20:40:54] Reedy: yeah, I saw that
[20:41:03] Reedy: but, is that not in open search at all?
[20:41:34] seemingly not
[20:41:45] do note there is an OpenSearchXml extension too
[20:41:48] but that doesn't add it either
[20:42:05] Reedy: do you think it's okay to try to add it?
[20:42:54] Can't see why not
[20:44:01] Reedy: okay, cool
[21:37:25] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 39 MB (0% inode=60%): /var/lib/ureadahead/debugfs 39 MB (0% inode=60%):
[21:40:04] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=60%): /var/lib/ureadahead/debugfs 0 MB (0% inode=60%):
[21:44:10] bleh
[21:45:39] !log reedy synchronized php-1.18/includes/api/ApiParse.php 'r109695'
[21:45:41] Logged the message, Master
[21:47:34] RECOVERY - Disk space on srv219 is OK: DISK OK
[21:50:04] RECOVERY - Disk space on srv220 is OK: DISK OK
[22:19:05] can anybody tell me where the content that was on the English Wikipedia SOPA blackout page lives?
[22:20:27] peteforsyth, CentralNotice
[22:20:31] AFAIK
[22:20:53] thanks. Is there a way to still view it?
[22:20:55] but if you mean the ZIP codes etc. that was an extension
[22:21:22] the main blackout page is what I'm interested in -- if there's a way to wikilink to it.
[22:21:34] get the banner (template) name and attach ?banner=ID to any URL
[22:22:02] peteforsyth: http://meta.wikimedia.org/w/index.php?title=Special:NoticeTemplate/view&template=blackout
[22:24:02] p858snake|l: that's great, but I think it only has meta banners.
[22:24:22] Nemo_bis: are you able to help me find that info? the template name and the banner id?
[22:24:30] peteforsyth, http://en.wikipedia.org/?banner=blackout
[22:24:48] marvelous, thanks!
[22:25:04] which is what p858snake|l linked btw :-)
[22:25:10] I've been too slow
[22:25:59] * Nemo_bis uses brute force to reboot Firefox
[22:26:31] ah, so the one p858snake|l linked is the source for what you linked?
[22:26:59] peteforsyth, yes
[22:27:17] banners don't work completely on that special page, you have to use the URL trick to actually test them
[22:27:35] got it, thanks!
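A sketch of the API behaviour discussed in the opensearch exchange above: the opensearch endpoint only returns prefix matches, while a fulltext list=search query can expose a spelling suggestion in its searchinfo block. Treat the exact response shape as an assumption and check the live api.php documentation on the target wiki.

```python
# Sketch of the two API calls discussed above: opensearch gives prefix matches
# only, while list=search with srinfo=suggestion can return a "did you mean"
# correction in query.searchinfo.suggestion. Response shapes are assumptions.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def api_get(**params):
    params["format"] = "json"
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Prefix suggestions only: no spelling correction for "indai".
print(api_get(action="opensearch", search="indai")[1])

# Fulltext search carries the correction in searchinfo.
data = api_get(action="query", list="search", srsearch="indai",
               srwhat="text", srinfo="suggestion")
print(data["query"].get("searchinfo", {}).get("suggestion"))
```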
[22:33:28] My pywikipedia login isn't working for pywikipedia specifically
[22:33:32] even though it worked yesterday
[22:33:42] for zhwiki specifically*
[23:01:45] gn8 folks