[00:00:41] RECOVERY - Host cp3002 is UP: PING OK - Packet loss = 0%, RTA = 109.09 ms
[00:16:08] New patchset: Asher; "provide pt-heartbeat with socket location" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1994
[00:16:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1994
[00:17:53] !log asher synchronized wmf-config/db.php 'adding db52 to enwiki, load 100'
[00:17:54] Logged the message, Master
[00:22:27] !log asher synchronized wmf-config/db.php 'raising db52 load to 200'
[00:22:28] Logged the message, Master
[00:22:59] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1994
[00:22:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1994
[00:26:59] New patchset: Bhartshorne; "updating AUTH string for a newly created account for the eqiad swift cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1995
[00:27:03] !log asher synchronized wmf-config/db.php 'raising db52 load to 400'
[00:27:04] Logged the message, Master
[00:27:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1995
[00:27:23] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1995
[00:28:46] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1995
[00:28:47] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1995
[00:42:10] New patchset: Asher; "provision db /a volumes with correct default mount options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1996
[00:42:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1996
[00:42:28] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1996
[00:42:28] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1996
[00:49:26] werdan7: around?
[01:50:44] New patchset: Bhartshorne; "loosening the regex to allow Swift to function correctly; we were catching too little" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1997
[01:50:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1997
[01:53:14] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1997
[01:53:15] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1997
[02:05:53] !log LocalisationUpdate completed (1.18) at Sat Jan 21 02:05:53 UTC 2012
[02:05:55] Logged the message, Master
[02:25:23] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1924s
[02:29:13] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 2154s
[02:29:53] PROBLEM - Frontend Squid HTTP on knsq9 is CRITICAL: Connection refused
[02:39:14] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:45:23] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:18:20] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:22:41] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:27:48] hi
[04:28:13] I have a question... not sure if this is the correct place
[04:28:59] how can I change the name of the right here? http://pt.wikipedia.org/w/index.php?title=Especial:Registo&type=rights&user=Teles
[04:30:24] * Teles hopes somebody is awake at that moment
[04:38:00] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:38:25] Teles: the user group with the rights would need to be changed on the server side
[04:40:40] p858snake|l, hi. I need to change "eliminadora"
[04:41:04] should I open a request on bugzilla?
[04:41:23] You will need to file a bug in bugzilla requesting that and also link to any consensus you have for the change
[04:42:08] thank you. I will work on that consensus :)
[04:45:54] It is kind of weird cause it was "eliminador" (correct name) in the beginning
[06:17:02] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[09:48:31] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 430663 MB (3% inode=99%):
[09:55:58] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 399390 MB (3% inode=99%):
[10:44:18] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:07:26] Not sure if this is the right place to ask, but I'll just give it a shot. Can anybody tell me if the running instances of mediawiki using lucene also make use of a clustering component such as apache mahout?
[11:08:00] that we are running in production?
[11:08:08] apergos, yes.
[11:08:45] or at least if clustered data is used to speed up search
[11:09:13] we're not using anything like that to the best of my knowledge
[11:09:43] this page http://wikitech.wikimedia.org/view/Search while not entirely accurate about which machines do what, is more or less correct about the architecture
[11:12:42] thanks apergos. do you know if it is possible to have a look at the lucene configuration files?
[11:13:44] especially /home/wikipedia/conf/lucene/lsearch-global-2.1.conf and /etc/lsearch.conf (hope they'll shed more light)
[11:14:38] http://noc.wikimedia.org/conf/lsearch-global-2.1.conf
[11:14:45] there's that one
[11:15:12] http://noc.wikimedia.org/conf/highlight.php?file=lucene.php here's this
[11:16:48] apergos, thanks so much. That's a good resource. I am bookmarking http://noc.wikimedia.org/
[11:16:56] yeah, it's a keeper
[11:18:20] we don't have the last file in fact so you can't have it either (looks like it's no longer used)
[11:22:34] Oh, I just read that the files are dynamically generated and up-to-date -- are these just local copies that are used, s.t. they're not up-to-date with what is used on the servers?
[11:23:24] "the last file" = "/etc/lsearch.conf"
[11:23:33] but in fact I lie, I found it after all
[11:23:47] yes the noc files are current
[11:24:36] there's nothing of interest in the lsearch.conf file though
[11:25:01] k.
[11:25:19] a bit of stuff for logging and ganglia, some OAI stuff, maxqueue params
[11:25:27] I guess I should have a look at the lucene-search extension?
[11:25:31] yup
[11:47:12] hi, does an IPv6 road map exist for the Wikimedia Services?
[14:06:36] wow apergos, you've created lots of interesting pages on wikitech :)
[14:08:50] sorry for the proliferation
[14:09:10] but I finally came to the realization that since I own that project I can make the documentation suck a lot less
[14:09:20] so hopefully it does now...
[14:09:29] apergos: I am currently transferring some dumps from dumps.wikimedia.org over to the Internet Archive
[14:09:34] out of boredom
[14:09:35] great
[14:09:37] :-)
[14:09:48] be bored more often? ;-)
[14:09:50] but the sad thing is that the oldest dumps get deleted automatically
[14:10:24] like I managed to upload the 2011 January ones, but only for some wikis
[14:10:31] and now they are gone
[14:10:34] they do get deleted automatically
[14:10:43] they were never intended to be kept forever
[14:10:58] sad
[14:11:36] apergos, the docs on mw.o about how to dump one's own wiki are quite lacking too
[14:11:42] all of that information is still in the more recent dumps, except for things that were delted
[14:11:50] and we don't want people to keep downloading the delted stuff
[14:11:53] *deleted
[14:12:08] Nemo_bis: I bet they are... I think I'm going to have to get to that sometime later
[14:12:43] just like someone should look at mwdumper (not me)
[14:13:02] and I should really promote the use of that perl script for converting dumps to sql for upload
[14:13:07] lots to do...
[14:13:21] well, just explaining how to produce xml dumps would be nice
[14:13:26] there's only https://www.mediawiki.org/wiki/Manual:DumpBackup.php which is years old
[14:13:30] :-D
[14:13:57] the docs in my branch for the python scripts aren't awful
[14:14:17] how hard is it for a not-so-huge wiki to run the normal script?
[14:14:26] ah, without all the extra stuff?
[14:14:40] what's the extra stuff? :)
[14:14:47] I mean just an XML of full history
[14:14:52] (for instance)
[14:15:00] without any of the other tables etc, right?
[14:15:04] yep
[14:15:17] just running that script and you're done?
[14:15:17] it's pretty easy
[14:15:23] maybe with these dependencies: https://wikitech.wikimedia.org/view/Dumps/Software_dependencies
[14:15:39] well if you didn't want to run the python script and set up configuration and all that
[14:15:41] you could
[14:15:49] just with the MW installation running
[14:16:04] run the two maintenance scripts, one for the stubs and one for the history
[14:17:00] http://wikitech.wikimedia.org/view/Dumps/Rerunning_a_job in the "other stuff" section it gives some samples for the history
[14:17:14] but that part hasn't been cleaned up yet, and it will likely get moved to its own page
[14:17:36] with how to run various steps by hand, how to use dry-run to see what the python script would do, etc.
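A minimal sketch of the two-pass dump just described, assuming a stock MediaWiki install at a hypothetical path: dumpBackup.php writes a stub (metadata-only) history file, and dumpTextPass.php then fills in the revision text from it. The install path and output file names below are placeholders, not the production setup.

```python
# Sketch (not the WMF production setup): produce a full-history XML dump for a
# small self-hosted wiki by shelling out to the two maintenance scripts named
# above. Paths and output names are assumptions.
import subprocess

MEDIAWIKI = "/srv/mediawiki"  # hypothetical MediaWiki install directory

def run_to_file(cmd, outfile):
    """Run a maintenance script and capture its XML output."""
    with open(outfile, "wb") as out:
        subprocess.run(cmd, cwd=MEDIAWIKI, stdout=out, check=True)

# Pass 1: stubs for the full history (page/revision metadata, no text blobs).
run_to_file(["php", "maintenance/dumpBackup.php", "--full", "--stub"],
            "stub-meta-history.xml")

# Pass 2: pull the text for every revision listed in the stub file.
run_to_file(["php", "maintenance/dumpTextPass.php",
             "--stub=file:stub-meta-history.xml"],
            "pages-meta-history.xml")
```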
[14:17:50] so that will likely have the info you want when I get it done.
[14:19:41] oh, great :)
[14:20:04] can you wait a few days? (I'm trying to take the weekend off ;-) )
[14:20:17] eh apergos, just asking: For the split dumps of enwiki, why is it named with the page numbers?
[14:20:28] it looks quite messy to me
[14:20:30] that's the "checkpoint file" system
[14:20:45] apergos, sure! I'm already ashamed of asking you on saturday...
[14:20:50] well first it saves all those people from asking which pages are in which piece, folks seemed to want that
[14:20:56] looks difficult for me to automatically download it from a script
[14:21:01] but also *I* want it, because...
[14:21:13] it lets me know which files contain which in case we need to rerun them
[14:21:17] well, one.
[14:21:22] oh, I see
[14:21:39] seems reasonable
[14:21:42] automatically downloading should be easy enough
[14:21:49] Hydriz, can't you just download the page and feed wget with it?
[14:22:11] yeah, but when uploading to the IA, you would need to specify it in full using curl
[14:22:14] grab the names out of the index.html if you want to do it file by file
[14:22:36] orrrr
[14:22:41] get em out of the md5sums file
[14:22:47] heh
[14:22:47] that's much easier to parse
[14:22:53] always has a predictable name
[14:23:00] and then you can construct your urls easily
[14:23:24] I'll try to work with it then
[14:23:41] ok
[14:23:48] I think that solution should work ok for you
[14:23:56] and another question: the 2 connections per ip, is it also implemented for the Toolserver too?
[14:24:17] probably for them too
[14:24:19] I once tried downloading 3 files at a time from the same host, seems okay to me
[14:24:24] hmmm
[14:24:27] well please don't
[14:24:31] yeah
[14:24:34] see that's the one host we write to
[14:24:37] I reduced it now
[14:24:40] as in the dumps are written to it
[14:24:54] so if we start to lose on network bandwidth there we are screwed
[14:25:09] * apergos makes a note to think about toolserver access
[14:25:16] especially during weekends
[14:25:20] more screwed?
[14:25:50] well
[14:26:03] if nfs gets slow because bandwidth in general to that host is tight
[14:26:11] then *writing* the dumps there gets slow
[14:26:21] want to avoid that at all costs
[14:27:02] I saw there was discussion about some sort of centralized holding area for dump copies over there
[14:27:04] this brings us back to having mirrors
[14:27:08] I would encourage that
[14:27:34] well the best would be to have a mirror right there in esams
[14:27:35] but the Brazilian mirror is very outdated
[14:27:41] just for toolserver use
[14:27:41] yeah
[14:28:00] yeah, we are exchanging emails, he's having some trouble with the list of files to be synced
[14:28:21] I can't see any problem on my end... in the meantime I sent out a pile of emails to check on other possibilities
[14:28:28] the folders always look updated
[14:28:36] but peering inside says otherwise
[14:29:04] I'd really like to get some academic institutions in on this
[14:29:32] sigh, if only we can just invest more in dumps
[14:29:51] ?
[14:30:05] like, make more servers so that our download speed is faster
[14:30:22] we shouldn't need to have more than a couple of our own servers
[14:30:33] we should have multiple organizations hosting them
[14:30:59] if I think of the dozens of free TB in my university storage...
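A sketch of the md5sums-based approach suggested above: fetch the checksum file for a given wiki and dump date (it has a predictable name) and build the per-file download URLs from it, instead of scraping index.html. The "<wiki>-<date>-md5sums.txt" name pattern and the example wiki/date are assumptions for illustration.

```python
# Sketch: build dump-file URLs from the per-run md5sums file rather than scraping
# index.html. The checksum-file naming and the example wiki/date are assumptions.
import urllib.request

BASE = "http://dumps.wikimedia.org"

def dump_file_urls(wiki, date):
    md5_url = f"{BASE}/{wiki}/{date}/{wiki}-{date}-md5sums.txt"
    with urllib.request.urlopen(md5_url) as resp:
        for line in resp.read().decode("utf-8").splitlines():
            # each line looks like: "<md5sum>  <filename>"
            checksum, filename = line.split(None, 1)
            yield f"{BASE}/{wiki}/{date}/{filename.strip()}"

if __name__ == "__main__":
    for url in dump_file_urls("arwiki", "20111201"):  # hypothetical dump date
        print(url)
```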
[14:31:03] yeah, sigh
[14:31:24] if there is one lesson to be learned after however many years of watching the internet evolve it is that decentralization is always sturdier
[14:31:35] than a centralized model
[14:31:53] so Nemo_bis, if they have 6 or so free tb, ... who do you know there?
[14:31:54] :-)
[14:32:33] well, the CTO (or whatever you'd call her) doesn't seem to hate me
[14:32:42] sweet
[14:32:42] lol
[14:32:47] can't hurt to ask
[14:33:01] I mean "hey, we would have bragging rights that we're hosting backups for Wikipedia"
[14:33:12] it doesn't actually look bad on one's list of accomplishments, right?
[14:33:31] I've already asked her
[14:33:35] and?
[14:33:37] Nemo_bis: Where is your university?
[14:33:45] When we were in the meeting deciding to buy those new servers
[14:33:47] Milan
[14:34:08] She said that she doesn't know, that space will be unused but who knows, blabla
[14:34:16] Wikipedia-ing where that is
[14:34:22] s/will be/is currently mostly/
[14:34:24] oh Italy
[14:34:43] I guess we should target universities with open source/open content initiatives
[14:34:50] they'd be more likely to sign on
[14:34:56] ours is one we could say
[14:35:12] and we're the national peering point with Google and many others for the university network
[14:35:28] hmm
[14:35:32] you could...
[14:35:39] ask her if it's ok to give me her email
[14:35:48] and I could write a sort of pitch
[14:36:14] well it would be in English, but what can we do (unless she wants it in Greek, those are my two choices)
[14:36:43] she can speak English :)
[14:36:48] I am sure
[14:37:00] I would just feel bad that I can't write in the language in use
[14:37:03] it sort of sucks
[14:37:40] the fact is, this is potentially not her task
[14:37:47] oh
[14:37:54] who would be a good contact, do you think?
[14:37:54] we have also http://mirror.garr.it/ but technically it is not managed by the university
[14:38:06] private?
[14:38:28] no, public, but weird management
[14:38:33] hmm
[14:38:50] http://mirror.garr.it/iwantamirror.html
[14:38:54] I contacted the guy who takes care of it but got nowhere, although he was very kind
[14:39:01] what did he have to say?
[14:39:14] I guess I have to call him or ask a prof. who loves me and might know him
[14:39:18] :-)
[14:39:30] again I am happy to write something all officially sounding from the WMF etc
[14:39:39] if it might help :-)
[14:41:08] good
[14:41:17] I should call him and discover what he wants
[14:41:41] then if they don't have space enough my plan is to ask the uni's CTO to lend them some space, so she doesn't have to worry
[14:41:48] does your univ have a nice acronym or something? I will add it to my list of "checking into" so it doesn't fall off my radar
[14:41:50] no idea whether this can actually work though
[14:41:55] unimi
[14:41:59] ok
[14:43:00] and added.
[14:43:35] heh last time I estimated the space usage it had to be an estimate, cause we didn't have 5 recent wp full dumps
[14:44:06] this time I just went and du'ed all the files :-D
[14:45:44] Eh apergos: Just remembered about the POTY files. Is it possible for you to also compile 2008 and 2009?
[14:45:45] the problem last time we talked about it was how to fetch relevant files
[14:45:56] I understand that now you can just let people use rsync?
[14:46:05] as with that university
[14:46:26] but then, would one be able to get only the most recent dumps (if not able to store multiple copies)?
[14:47:39] I have an rsync-produced file which lists all the files of the last 5 good dumps for each project
[14:47:43] it's generated once a day
[14:47:54] that's already 6T
[14:48:06] if they get set up with that we can talk about older stuff
[14:48:21] no less than 6T?
[14:48:33] no less, for the last 5 dumps of all projects. that is correct.
[14:49:18] and you're not interested in less demanding mirroring?
[14:52:18] well keeping only the last 1 or 2 is ..
[14:52:28] I mean it's ok but we really need to have a few of them out there
[14:52:57] wait til I start shopping around for folks who can mirror image tarballs :-D
[14:55:36] I really want people who can do more than 6T so I can give em some old stuff too, but I figure "last 5" is the top priority
[14:56:51] The IA always has 925TB at its disposal...
[14:56:52] Be interesting to see how end users use them
[14:57:03] Downloading and storing 5TB isn't easy
[14:57:07] I'm trying to get someone at IA on the hook for this
[14:57:25] oh they won't
[14:57:32] they'll download the files from their project of choice
[14:57:36] sometimes that's en
[14:57:44] (wp), sometimes it's something else
[14:57:46] heh
[14:58:07] some of them want history, some just want current pages, some want several dumps over time
[14:58:52] if something happened to our disks, having offsite with the last 5 would give me a comfortable assurance that we could start up the dumps again with prefetch and everything working, even if some of them had unnoticed data corruption
[14:59:34] (this has happened before, hence I would expect it to happen again: both "unnoticed data corruption for several runs" and "woops, there are no dumps. at all. they are all gone. what do we prefetch from??"
[14:59:35] )
[15:01:03] it used to be that anyone doing research on en pedia had to use these dumps from 2007... then tomasz got some dumps up in 2010 and I guess folks used those for awhile
[15:01:18] now they can get em every month, it will be interesting to see what they do with them
[15:01:45] * apergos is reminded to bite the bullet and subscribe to the friggin research list, ugh
[15:03:04] no. not going to.
[15:03:12] eh, since some of the Wikimedia Ops are here, I just want to ask a question
[15:03:30] why doesn't Wikimedia implement rsync and ftp access to the dumps?
[15:03:33] ok but if I have to think about it I probably won't answer :-P
[15:03:41] rsync is enabled for mirror sites, not generally
[15:03:44] ftp is slow
[15:04:00] like, downloading via http is hell
[15:04:17] ?
[15:04:20] I am tired of waiting, haiz
[15:04:40] rsync is quite fast
[15:04:43] (I've always used wget -c and never had to worry about it. in the days when I had a good net connection and downloaded things)
[15:05:15] yeah
[15:05:29] but when things get big you just hope that it can just finish faster
[15:05:38] well it's slow because we cap bandwidth
[15:05:44] as we were saying earlier
[15:06:01] yeah, which is quite a thorn
[15:06:04] and that's because we had issues with people on backbones downloading as fast as they could
[15:06:11] and saturating *us*. which is not ok.
[15:06:20] this, once again, is why we need mirrors
[15:06:37] so if we were to draw a chart, mirrors are the center of everything?
[15:06:54] well, we are: we create em, then push them out to others who then..
[15:07:11] make available for: rsync, ftp (why would people want ftp, I dunno), and whatever else
[15:07:12] sigh, just hoping for that day when people ask us to get permission to mirror content, like Linux distros
[15:07:17] this would go for:
[15:07:36] the dumps. the pagecount stats. tarballs of groups of the images. etc
[15:07:39] we are like asking people to mirror our content, when the desired is the other way round
[15:08:02] I have kinda moved all the non-dump files to IA
[15:08:09] how many more people use wikipedia than linux distros, I wonder
[15:08:11] hmmmm
[15:08:11] except for incr
[15:08:22] that's ok, the add/changes are very experimental
[15:08:35] yes, and it is very weird
[15:08:38] I might be able to get back to those for a bit near the end of the month
[15:08:45] to make them a bit more robust anyways
[15:08:59] it does not allow me to download the dumps using --mirror --no-parent
[15:09:12] gives some crap error, sigh
[15:09:29] you can email me that and I can see if that's be designor by stupidity
[15:09:32] ariel at wm.or
[15:09:42] *by design or
[15:09:46] scuse the typos
[15:09:58] it's just downloading the directories as index.html files
[15:10:07] and does not create the different directories
[15:10:21] so it would overall prevent download of the dump itself
[15:10:57] so if you send an email "I run this. I get this error. I get this output (here's a sample)". I can look at it
[15:11:06] or bugzilla and assign to me, either way
[15:11:25] hmm
[15:11:35] that was a long time ago, so let me try it again...
[15:11:41] ok
[15:11:47] well I won't look at it this weekend anyways :-D
[15:12:14] omg old dumps are also deleted for this
[15:12:16] sighs
[15:12:39] just hope that someone else notices this too
[15:12:52] I am not keen to write emails these few days haha
[15:13:03] ok
[15:13:06] this is a small error
[15:13:23] so let's not make a fuss out of it
[15:13:36] unless you are crazy about perfection :P
[15:13:41] whatever works for you
[15:13:53] I'm crazy about stability but I also prioritize
[15:14:24] I am just focusing on getting as many dumps to the IA as possible
[15:14:32] before the server purges the old dumps
[15:15:47] ok, sounds like you have got your plan worked out
[15:16:12] kinda
[15:16:30] uploading the different versions in batches of 10 wikis each
[15:16:41] now just waiting for arwiki to finish
[15:16:50] and BTW I just started recently
[15:17:44] my serious worry is wearing out the dump server
[15:17:58] my work is already significant in ganglia
[15:17:58] by doing a full download of everything?
[15:18:12] as long as you download two at once and no more,
[15:18:18] I download all files of every dump version
[15:18:20] yeah
[15:18:23] I keep to that limit
[15:18:52] if you see an increase in bandwidth of dataset2.wikimedia.org, yeah that's probably me
[15:19:39] * Hydriz just wishes to turn this uploading to turbo power
[15:20:51] what ip are you coming from?
[15:21:03] (I'm looking at bandwidth now)
[15:21:11] TS I guess
[15:21:18] which one?
[15:21:45] yeah, TS
[15:21:47] the Toolserver
[15:21:52] yes but which host
[15:21:53] both nightshade and willow
[15:22:17] now is just azwikisource
[15:22:23] *azwiktionary
[15:22:42] arwiki is uploading the last 2011 dump
[15:22:50] what's the fqdn?
[15:22:56] (sorry but I'm having a brain fart)
[15:23:06] should be host.toolserver.org
[15:23:31] I must have mistyped it, cause that's what I did
[15:23:46] isn't really using space, only BW
[15:24:03] I don't see anything from there but I do see something (a *lot* of use) from
[15:24:10] clients.your-server.de ...
[15:24:21] so you're uploading to IA via toolserver?
[15:24:28] yeah
[15:24:32] http://www.archive.org/details/wikimediadownloads
[15:24:55] Can they not grab them directly?
[15:25:04] hmm?
[15:25:26] seems a bit daft to pull them from pmtpa to esams to then push back to wherever the IA is
[15:25:28] IA doesn't have anything set up for that
[15:25:33] shame
[15:25:40] yeah
[15:25:56] well that's why they are on our list of "really shouldn't you guys be doing this? it's right up your alley."
[15:25:58] just seems really inefficient having an essentially un-needed middle
[15:26:10] What I do is: Download file from the bottom of page, upload them to IA using curl S3, delete file, continue with next file
[15:26:12] it's really inefficient having human uploaders etc
[15:26:22] mmm
[15:26:28] not to mention that then Hydriz probably can't upload at more than 1 MB/s
[15:26:35] heh
[15:27:03] I dunno about the upload speed
[15:27:08] curl doesn't report
[15:27:12] so there are no connections from the toolserver ip range
[15:27:19] which makes me think you're coming through something else
[15:27:34] that I am not sure
[15:27:40] I am just running from willow currently
[15:27:42] Hydriz, it does if you redirect the output to a file
[15:28:15] I don't output it anywhere, or else the 1.1TB of free space on user-store would be maxed out in no time :P
[15:28:48] the log
[15:29:03] if only I had access to IA's supposedly called "teamarchive" server
[15:29:14] then I can just wget everything and poof it's on IA
[15:29:19] heh, that's for a few lucky persons
[15:29:20] curl displays this data to the terminal by default, so if you invoke curl to do an operation and it is about to write data to the
[15:29:21] terminal, it disables the progress meter as otherwise it would mess up the output mixing progress meter and response data.
[15:29:21] If you want a progress meter for HTTP POST or PUT requests, you need to redirect the response output to a file, using shell redirect
[15:29:21] (>), -o [file] or similar.
[15:29:31] ehm, sorry for the flood
[15:29:35] ah, I saw one connection
[15:29:42] lol
[15:30:14] imagining you looking live at things and oh there is one connection
[15:30:24] basically yup
[15:30:30] well I had iftop going
[15:30:37] but I didn't see you in there
[15:30:44] anyways for now it's ok
[15:30:52] if it turns out to be an issue we'll deal with it
[15:30:58] hmm?
[15:31:00] what issue?
[15:31:29] I see this one site which I listed earlier
[15:31:34] using a lot of bandwidth
[15:31:44] if that turns out to be a problem we'll deal with it
[15:31:46] for now it's ok
[15:31:53] ganglia graphs are not spiking AFAICS
[15:32:12] I was hoping to run like 2 terminals on willow and another 2 on nightshade
[15:32:27] but seeing the disaster that I may make, I need to think of a better way...
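A rough sketch of the download, upload-to-IA, delete loop described above, assuming the archive.org S3-like upload interface (PUT to s3.us.archive.org with a "LOW access:secret" authorization header) and the third-party requests library; the item identifier, credentials, and URL list are placeholders.

```python
# Sketch of the per-file loop: fetch one dump file, PUT it into an Internet
# Archive item over the S3-like API, then delete the local copy to save
# Toolserver scratch space. Endpoint, header format, credentials, item name
# and URLs are assumptions here.
import os
import urllib.request

import requests  # assumed to be installed

IA_ACCESS, IA_SECRET = "ACCESS_KEY", "SECRET_KEY"   # placeholder credentials
ITEM = "wikimediadownloads-example"                  # hypothetical IA item

def mirror_one(url):
    name = url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, name)            # download from the dumps host
    with open(name, "rb") as f:
        resp = requests.put(
            f"http://s3.us.archive.org/{ITEM}/{name}",
            data=f,
            headers={"authorization": f"LOW {IA_ACCESS}:{IA_SECRET}"},
        )
        resp.raise_for_status()
    os.remove(name)                                   # free the scratch space

urls = [
    # e.g. the output of the md5sums sketch earlier; placeholder entry:
    "http://dumps.wikimedia.org/arwiki/20111201/arwiki-20111201-md5sums.txt",
]
for u in urls:
    mirror_one(u)
```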
[15:32:35] but the IA machine Hydriz mentioned could soon start downloading all Commons images apparently, so if you don't produce tarballs soon you might see a spike in network usage there
[15:33:00] Nemo_bis: We can probably stop underscor in time haha
[15:33:05] if they start downloading all commons images by crawling, we will have a squid issue, not a network spike issue
[15:33:17] unless they plan to request thousands of images at once
[15:33:20] yeah, 13.3TB to download...
[15:33:21] (in parallel)
[15:33:26] heh
[15:33:41] apergos: Be warned that the Internet Archive staff are a little crazy
[15:33:45] I know
[15:33:49] especially when it comes to real archiving
[15:33:52] I know (or used to know) a couple of them
[15:34:07] but even they know that if you break the machine with the data you're trying to archive, it's no good :-D
[15:34:50] they look like they are abusing their own servers with the surge of data
[15:35:21] just looking at the catalog, someone new would think that only staff can contribute to the Archive
[15:35:48] :-D
[15:36:37] * Hydriz feels very tempted to go turbo
[15:36:53] * apergos gets itchy fingers
[15:38:09] haiz, we need to be gentle with other people's stuff
[15:38:36] just like borrowing a pen, you don't jerk it
[15:39:08] Hydriz, you can't do much damage, both downloading from WMF and upload to IA are capped :)
[15:39:22] yeah
[15:39:34] you can't upload at 136 MB/s as Jason does :-p
[15:39:54] but if I could mobilise >2 servers to massively upload/download the dumps, might kill muhahaha
[15:40:40] Yeah, his upload of Jamendo albums is shocking
[15:40:55] totally dominated the catalog
[15:42:10] I would really appreciate it if a "snapshot" of dataset2 is created and hosted on another server of the WMF
[15:42:20] then I can really go turbo
[15:42:59] lets just heck about the 1MB/s upload limit
[15:44:46] and besides, I am just uploading only 2011 dumps, so a "snapshot" is just good enough for me
[15:45:25] Hydriz:
[15:45:35] ?
[15:45:46] unfortunately I need to use such a "snapshot" host to set up our rsyncs to our one existing mirror first :-P
[15:46:15] that's sad
[15:46:19] no!
[15:46:31] if it's done ASAP then the very old dumps can be kept
[15:46:35] otherwise the mirror runs way behind
[15:46:50] like the March 2011 ones
[15:47:13] yeah, it's more important to have a mirror
[15:47:40] * Hydriz thinks about the 679 servers that Wikimedia has :P
[15:47:49] they are busy serving data :-P
[15:47:53] to readers like you and me
[15:47:57] yeah true
[15:48:02] I see the docs on wikitech
[15:48:08] very interesting, the cluster setup
[15:53:49] looks like there are only 3 servers that host dumps, and 1 is down for a year?
[15:54:08] there is *one* server that hosts dumps
[15:54:20] the other one has been down *more* than a year
[15:54:32] almost 2 years now
[15:54:38] yeah
[15:54:48] it is even out of warranty, if they ever get it fixed and close the ticket, that's the end of service
[15:54:51] if we can get that up, then it would actually be of use
[15:54:53] eh, "service"...
[15:54:57] does anyone have MRTG or similar graphs (or hourly reading/editing statistics) for the blackout, please?
[15:54:58] don't
[15:55:10] it will never come back up. we have wasted soooo much time on that pece of crap
[15:55:12] *piece
[15:55:27] (huh, someone reads my docs, woo hoo!)
[15:55:28] so don't tell me you guys are using it for coffee? :P
[15:55:35] no, we're waiting on the vendor
[15:55:48] it'll just make me mad to talk about it though
[15:56:02] If I had an account on wikitech, I would have tried to refine those docs
[15:56:04] :P
[15:56:14] so. dataset1001, the new host, just racked and mgmt set up last week
[15:56:30] the one on dataset1?
[15:56:38] we leave it there for posterity
[15:56:49] as a reminder of how awful that was
[15:57:00] nah, just hoping to fix minor mistakes
[15:57:00] soooo. dataset1001. the new host...
[15:57:14] what is the host used for the mirroring?
[15:57:25] there's one host in production.
[15:57:30] guess what it does :-P
[15:57:45] * Hydriz hints that he is finding out how dumps go around
[15:57:46] seriously: dumps are created there, served from there, and rsynced from there. (dataset2)
[15:57:52] soooooooo
[15:58:09] dataset1001, the new host, which will be installed next week
[15:58:20] and for me? :P
[15:58:39] absolute first priority, (sorry, it isn't you) is to get a full rsync of dataset2 over to the new host
[15:58:45] then get cron jobs going
[15:59:20] in the meantime I need to talk with our guy with the one mirror (and there's another person that volunteered space, gotta see if that happens next week)
[15:59:38] and see about how I want to structure this since we'll have the two hosts
[15:59:47] next week in the sense, the week of 22?
[16:00:02] next week as in I'm sitting on my hands not to install it over the weekend
[16:00:07] but I really need a break
[16:00:12] heh
[16:00:14] last week was intense with sopa and all
[16:01:05] sigh
[16:01:26] * Hydriz feels bad being pushed to a lower priority :P
[16:02:26] * apergos offers Hydriz some virtual chocolate
[16:03:04] let's just put it in the fridge, for me to eat when I really can get some uploading jobs running
[16:03:45] how big is dataset1001 going to be?
[16:03:51] (in terms of disk space)
[16:04:00] I see something about image dumps
[16:04:25] we're not going to try to fully host anything like that
[16:04:40] isn't there a weekly router traffic graph somewhere??
[16:04:57] we could create one or two tarballs at a time and shovel them out to a few people to distribute them (and upload them to IA etc)
[16:04:59] jps: of what?
[16:05:17] enwiki traffic or similar
[16:05:22] or just total traffic
[16:05:32] or reads/edits showing the blackout?
[16:05:35] hmm
[16:06:21] I dunno if this helps: http://dumps.wikimedia.org/other/pagecounts-raw/2012/2012-01/
[16:06:29] seems like something to do with pageviews
[16:06:39] but usually things get chunked up in stats.wikimedia.org
[16:07:58] http://stats.grok.se/en/201201/Main%20page is very interesting
[16:09:00] What I don't understand, is why doesn't http://stats.grok.se/en/201201/Justin_Bieber show the blackout?
[16:09:04] "Bandwidth: Wikimedia provides about 20 MB/s via dataset2.wikimedia.org (stats) for XML dumps, as of January 2011."
[16:09:21] 20MB/s isn't really accurate for me (esp. on the Toolserver)
[16:09:23] oh, wait, I understand that now
[16:10:32] alright, now left with pages-meta-history.xml.7z for arwiki's last 2011 dump
[16:10:44] though the speed is (very) slow
[16:10:51] jps, pages were still loaded
[16:11:16] yes. I guess I really need a daily graph of edits. Does that exist?
[16:12:47] A router/squid graph of total GB in vs. out would work. Aren't there MRTG or similar graphs?
[16:13:19] since editing is relatively large uploads while browsing is just GETs
[16:13:59] ah HA! http://ganglia.wikimedia.org/2.2.0/
[16:14:46] ...
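A small sketch of reading the raw per-page view counts linked above to look for the blackout dip: the hourly file-name pattern and the "project page_title count bytes" line format are assumptions about the pagecounts-raw layout, and the exact file name for a given hour may differ slightly.

```python
# Sketch: pull one hourly pagecounts-raw file from the directory linked above and
# report the view count for a single page. File naming and line format
# ("project page_title count bytes") are assumptions; the hour chosen falls
# during the SOPA blackout.
import gzip
import urllib.request

HOUR_URL = ("http://dumps.wikimedia.org/other/pagecounts-raw/2012/2012-01/"
            "pagecounts-20120118-120000.gz")  # exact name may differ slightly

def views_for(page, project="en"):
    local_path, _ = urllib.request.urlretrieve(HOUR_URL)
    with gzip.open(local_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = line.rstrip("\n").split(" ")
            if len(fields) >= 3 and fields[0] == project and fields[1] == page:
                return int(fields[2])
    return 0

print(views_for("Main_Page"))
```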
[16:15:09] now left for the second last dump of azwiktionary
[16:15:25] then I would let dataset2 take a breather :P
[16:16:08] How to get http://ganglia.wikimedia.org/2.2.0/graph_all_periods.php?me=Wikimedia&m=load_one&r=hour&s=by%20name&hc=4&mc=2&g=network_report&z=large for the past week?
[16:16:29] duh nevermind (scroll down)
[16:16:34] herp derp
[16:17:23] sadly spikes kill that, but it has CSV
[16:18:11] got it. Thanks all!
[16:19:09] ok, got to sleep now, thanks all too :)
[16:20:51] look away for a few minutes and everybody leaves
[16:26:50] PROBLEM - Puppet freshness on knsq9 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:09] Change abandoned: Catrope; "We don't need this at all if we use the Gerrit hooks plugin for Jenkins, because it has per-repo act..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1794
[18:33:07] !log reedy synchronized wmf-config/InitialiseSettings.php 'Setting timezone for th projects'
[18:33:08] Logged the message, Master
[18:34:19] hi
[18:35:43] !log reedy synchronized wmf-config/InitialiseSettings.php 'Cleanup wgEnableDnsBlacklist, enable for th projects'
[18:35:45] Logged the message, Master
[18:43:38] Juandev wants an etherpad pad deleted... is someone around?
[18:47:11] no
[18:47:23] Apparently last time it was asked, the asker said it was easily done via an api call
[18:47:32] so in theory, they can maybe do it themselves
[18:49:19] Reedy: < Juandev> I talked to guys at #etherpad and they told me this could be by developer via API
[18:49:29] idk what "by developer" means
[18:49:43] hello
[18:49:46] indeed
[18:49:55] can you kill me an etherpad page?
[18:55:58] !log reedy synchronized wmf-config/InitialiseSettings.php 'Setting wgNamespaceRobotPolicies for th projects'
[18:55:59] Logged the message, Master
[19:14:40] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33215 - Enabling transwiki import on sa.wiktionary+'
[19:14:42] Logged the message, Master
[19:19:22] !log reedy synchronized wmf-config/flaggedrevs.php 'More for bug 29742'
[19:19:23] Logged the message, Master
[20:20:30] zzz =_=
[20:26:58] New patchset: pugmajere; "Clone git-setup from the puppet repository and update it for the software repository." [operations/software] (master) - https://gerrit.wikimedia.org/r/1998
[20:27:00] New patchset: pugmajere; "Simplify the aliases for the simplified branch "tradition" in the software repo." [operations/software] (master) - https://gerrit.wikimedia.org/r/1999
[20:27:01] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/1999
[20:28:16] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1999
[20:28:50] Reedy: ping
[20:29:09] ohai
[20:29:10] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1998
[20:29:10] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/1999
[20:29:11] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/1998
[20:29:35] Reedy: does open search support "Did you mean" functionality?
[20:29:44] through the api
[20:31:20] preilly, action=opensearch&search=foobar&suggest I think
[20:32:05] Reedy: http://en.wikipedia.org/w/api.php?action=opensearch&search=indai&suggest
[20:32:12] New patchset: Lcarr; "1st edition of fw creation tool" [operations/software] (master) - https://gerrit.wikimedia.org/r/2000
[20:32:12] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2000
[20:32:17] Reedy: doesn't give me a suggestion of india
[20:32:25] New review: Lcarr; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2000
[20:32:25] Change merged: Lcarr; [operations/software] (master) - https://gerrit.wikimedia.org/r/2000
[20:33:26] https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=indai does have india
[20:34:03] typing indai in the search box doesn't give india as a suggestion though
[20:34:05] preilly
[20:34:44] jeremyb: that is what I'm trying to fix
[20:35:17] jeremyb: adding "did you mean" to the suggestions
[20:36:13] preilly: cool, we really need more fun in the wikipedia ;)
[20:39:47] Reedy: so, is it not implemented
[20:40:27] preilly, https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=indai
[20:40:33]
[20:40:44] can't do title searching
[20:40:54] Reedy: yeah, I saw that
[20:41:03] Reedy: but, is that not in open search at all?
[20:41:34] seemingly not
[20:41:45] do note there is an OpenSearchXml extension too
[20:41:48] but that doesn't add it either
[20:42:05] Reedy: do you think it's okay to try to add it?
[20:42:54] Can't see why not
[20:44:01] Reedy: okay, cool
[21:37:25] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 39 MB (0% inode=60%): /var/lib/ureadahead/debugfs 39 MB (0% inode=60%):
[21:40:04] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=60%): /var/lib/ureadahead/debugfs 0 MB (0% inode=60%):
[21:44:10] bleh
[21:45:39] !log reedy synchronized php-1.18/includes/api/ApiParse.php 'r109695'
[21:45:41] Logged the message, Master
[21:47:34] RECOVERY - Disk space on srv219 is OK: DISK OK
[21:50:04] RECOVERY - Disk space on srv220 is OK: DISK OK
[22:19:05] can anybody tell me where the content that was on the English Wikipedia SOPA blackout page lives?
[22:20:27] peteforsyth, CentralNotice
[22:20:31] AFAIK
[22:20:53] thanks. Is there a way to still view it?
[22:20:55] but if you mean the ZIP codes etc. that was an extension
[22:21:22] the main blackout page is what I'm interested in -- if there's a way to wikilink to it.
[22:21:34] get the banner (template) name and attach ?banner=ID to any URL
[22:22:02] peteforsyth: http://meta.wikimedia.org/w/index.php?title=Special:NoticeTemplate/view&template=blackout
[22:24:02] p858snake|l: that's great, but I think it only has meta banners.
[22:24:22] Nemo_bis: are you able to help me find that info? the template name and the banner id?
[22:24:30] peteforsyth, http://en.wikipedia.org/?banner=blackout
[22:24:48] marvelous, thanks!
[22:25:04] which is what p858snake|l linked btw :-)
[22:25:10] I've been too slow
[22:25:59] * Nemo_bis uses brute force to reboot Firefox
[22:26:31] ah, so the one p858snake|l linked is the source for what you linked?
[22:26:59] peteforsyth, yes
[22:27:17] banners don't work completely on that special page, you have to use the URL trick to actually test them
[22:27:35] got it, thanks!
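A sketch of the API behaviour discussed in the opensearch exchange above: the opensearch endpoint only returns prefix matches, while a fulltext list=search query can expose a spelling suggestion in its searchinfo block. Treat the exact response shape as an assumption and check the live api.php documentation on the target wiki.

```python
# Sketch of the two API calls discussed above: opensearch gives prefix matches
# only, while list=search with srinfo=suggestion can return a "did you mean"
# correction in query.searchinfo.suggestion. Response shapes are assumptions.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def api_get(**params):
    params["format"] = "json"
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Prefix suggestions only: no spelling correction for "indai".
print(api_get(action="opensearch", search="indai")[1])

# Fulltext search carries the correction in searchinfo.
data = api_get(action="query", list="search", srsearch="indai",
               srwhat="text", srinfo="suggestion")
print(data["query"].get("searchinfo", {}).get("suggestion"))
```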
[22:33:28] My pywikipedia login isn't working for pywikipedia specifically
[22:33:32] even though it worked yesterday
[22:33:42] for zhwiki specifically*
[23:01:45] gn8 folks