[00:00:01] it is stored already
[00:00:11] well, unless you want to aggregate counts themselves
[00:00:26] https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_article_count
[00:00:31] That's the list.
[00:01:40] http://p.defau.lt/?OVirSKTx0i3tIxrawmJniQ and http://p.defau.lt/?sF4yI_9INn3_xnhEVO__Ig are the scripts I currently use.
[00:01:51] But storing every page and page creator is expensive.
[00:02:29] see, "which is the page's first revision" doesn't have anything to do with "who created articles"
[00:02:36] you have to aggregate per-user creations
[00:02:40] not per-page first revisions :)
[00:03:10] The page's first revision should give the information I want.
[00:03:20] well, you still have to count then
[00:03:48] Yeah, I don't hate COUNT(*) like some people do. ;-)
[00:03:49] in this case "first revisions done by each user" is a better dataset than "authors of each page's first revision"
[00:04:03] well, you can do whatever you want at wikimedia
[00:04:08] it can afford keeping all data in RAM nowadays
[00:04:08] :)
[00:04:16] Redis!
[00:04:20] or that.
[00:04:27] I hear it is now going in that direction
[00:04:35] It's been deployed, yeah.
[00:05:10] Job queue and user sessions are going into Redis, I guess.
[00:05:56] MySQL doesn't scale?
[00:05:57] :)
[00:06:11] https://www.mediawiki.org/wiki/Redis
[00:06:32] "... but it has often been hard to manage the performance implications of the high insert rate."
[00:06:58] MySQL isn't webscale!!!
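The distinction being argued above — "authors of each page's first revision" versus "creations aggregated per user" — can be sketched like this. The tuple layout is made up for illustration; it is not MediaWiki's actual revision schema:

```python
from collections import defaultdict

def creations_per_user(revisions):
    """Count page creations per user.

    `revisions` is an iterable of (page_id, user, timestamp) tuples.
    A page's creator is the author of its earliest revision, so we
    first find each page's first revision, then aggregate by user.
    """
    first_rev = {}  # page_id -> (timestamp, user) of earliest revision seen
    for page_id, user, ts in revisions:
        if page_id not in first_rev or ts < first_rev[page_id][0]:
            first_rev[page_id] = (ts, user)

    counts = defaultdict(int)
    for _ts, user in first_rev.values():
        counts[user] += 1
    return dict(counts)
```

Either way you still end up counting, as the chat notes; the difference is whether the expensive step is materializing every page's first revision or aggregating directly per user.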
[00:07:34] :(
[00:08:19] meh, I got a hammer, I use a hammer
[00:08:28] heh
[00:08:43] our job queue systems handle hundreds of thousands of jobs a second on mysql
[00:08:48] \o/
[00:09:14] I bet SSDs and dedicated servers with no slaves would go a long way
[00:09:27] not necessarily SSDs or 'no slaves'
[00:09:33] we run multiple instances though
[00:09:36] of mysqld per node
[00:09:43] every query in autocommit mode, rather than also having random MW queries mixed in
[00:10:03] if there is just one long-running transaction you get kind of screwed
[00:10:19] yup
[00:10:26] (or on a separate connection)
[00:10:27] domas: do you have different tables mapping to different disks?
[00:10:29] Slavery gets shit done.
[00:10:42] AaronSchulz: nope
[00:10:45] why
[00:11:32] hmm, I was just curious how having multiple mysqlds helps... do you have boxes with lots of cores?
[00:11:47] mostly for replication
[00:12:03] hah, so that's the secret sauce for slaves
[00:12:10] it had to be something ;)
[00:12:15] not that many cores, but single-threaded replication is a bit painful once the replication thread starts using 100% of a core
[00:12:18] 5.6 may be helpful in that regard
[00:13:07] ah, the great 5.6 :)
[00:13:17] I hear wikimedia will use maria15
[00:13:32] I hear MySQL will be great in version 6.
[00:13:33] much more opensource
[00:13:39] mysql6 has falcon!
[00:13:48] true database for the web
[00:13:52] R.I.P.
[00:14:14] it's funny, I was reading that webscale joke page yesterday morning
[00:14:24] you know, the wikimedia database would be much more interesting if it were actually growing
[00:14:40] http://mongodb-is-web-scale.com/
[00:14:40] I was very very surprised last time I checked enwiki's size
[00:14:48] Wikidata is growing. :-)
[00:15:06] true
[00:15:12] wikidatawiki will probably beat out enwiki at some point.
[00:15:13] domas: well, asher and I keep making people put new stuff on extension1 for now
[00:15:26] why
[00:15:28] still, not aware of any crazy growth lately in any case
[00:15:28] what is extension1
[00:15:46] the biggest growth for the database was all these quality-initiative features
[00:15:48] that blew up in size
[00:15:49] it was the "crap cluster", though I guess it's a bit more general purpose now
[00:15:57] Article feedback!
[00:16:02] yeah, article feedback
[00:16:09] was 20% of the database at some point in time
[00:16:18] All signal; no noise.
[00:16:40] |...|
[00:16:46] my favorite AFT data
[00:16:50] domas: you know, I don't *want* db1056 and friends to fill up :)
[00:17:12] lol
[00:17:42] otherwise we'd need a webscale SAN setup
[00:18:05] well, it can triple in size now
[00:18:09] that will take around 9 years
[00:18:11] or 15
[00:18:19] the new boxes in eqiad have an extra TB I think
[00:18:32] the one I'm looking at is a 3TB one
[00:20:04] Susan wants to solve all these graph database problems, mediawiki should be rewritten to use a graph database
[00:20:21] maybe categories should, pagelinks, heh
[00:20:32] revision is a separate object
[00:20:38] with various edges to everything around
[00:20:43] e.g. pagelink added by revision
[00:20:46] or pagelink removed by revision
[00:20:49] revision fits that use too, though you'd be moving heaven and earth to refactor it
[00:21:03] everything fits graph databases nowadays
[00:21:07] you'd need facebook-level manpower to get that coded
[00:21:09] you know what is a graph database?
[00:21:19] https://en.wikipedia.org/wiki/Graph_database
[00:21:22] neo4j!
[00:21:26] what used to be a row is now an object, what was an index is now a graph edge.
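The "row becomes an object, index becomes an edge" idea from the chat can be sketched as a tiny in-memory property graph. The node IDs and edge labels here are illustrative only — they don't correspond to any real graph database's API or to MediaWiki's schema:

```python
from collections import defaultdict

class Graph:
    """Minimal property graph: nodes carry properties, labeled edges replace indexes."""
    def __init__(self):
        self.nodes = {}                 # node_id -> dict of properties
        self.edges = defaultdict(list)  # node_id -> [(label, dst_node_id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def neighbors(self, src, label):
        """Follow all edges with a given label from a node."""
        return [dst for lbl, dst in self.edges[src] if lbl == label]

# Model a revision as its own object, with edges to everything around it,
# e.g. a pagelink added by that revision:
g = Graph()
g.add_node("rev:1001", author="alice")
g.add_node("page:Foo")
g.add_node("page:Bar")
g.add_edge("rev:1001", "edited", "page:Foo")
g.add_edge("rev:1001", "added_pagelink", "page:Bar")
```

What would be an index lookup in SQL ("which pagelinks did revision 1001 add?") becomes an edge traversal: `g.neighbors("rev:1001", "added_pagelink")`.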
[00:21:28] tadaaaaaa
[00:21:30] or whatever it's called
[00:22:05] funny, with all these systems out there
[00:22:09] the largest graphs sit directly in innodb
[00:23:42] AaronSchulz: I don't think there's that much work to rewrite stuff
[00:23:49] a useful API helps a lot
[00:23:53] * AaronSchulz laughs a bit
[00:24:21] at FB, first everything was written using an API, then a completely different server was pushed behind to serve that API
[00:26:06] James_F: you look quite over there
[00:26:20] AaronSchulz: Quite what? :-)
[00:27:03] domas: are your queues just push/pop or do they manage dependencies?
[00:27:18] they spawn dependencies
[00:27:27] James_F: stunning?
[00:27:40] domas: ?
[00:27:40] p858snake|l: Aww. You can stay. :-)
[00:27:56] James_F: are you implying others must leave?
[00:27:58] well, jobs themselves are fairly simple
[00:28:01] though I don't know much
[00:28:01] :)
[00:28:11] we have different queues
[00:28:16] AaronSchulz: Yes. Yes, I am. Mwuhahaha.
[00:28:43] * AaronSchulz walks home to go straight to an IRC meeting ;)
[05:20:57] Hi team.
[05:21:54] yes, I deleted these three images and the files have disappeared http://en.wikipedia.org/wiki/Category:Orphaned_non-free_use_Wikipedia_files_as_of_26_April_2013
[05:22:03] the pages themselves remain?
[05:24:15] 404 Not Found
[05:24:16] The resource could not be found.
[05:24:18] File not found: /v1/AUTH_43651b15-ed7a-40b6-b745-47666abf8dfe/wikipedia-en-local-public.ec/e/ec/White_Light_Fever.jpg
[05:24:36] !tech
[05:24:58] closedmouth: you broke it!
[05:25:23] yep
[05:25:28] do i win something?
[05:25:40] loneliness and heartache
[05:26:09] but i win that every day :(
[05:26:20] https://en.wikipedia.org/w/index.php?title=Special:Log&page=File%3ATiegs+for+Two+-+Family+Guy+promo.png
[05:30:39] i guess i should open a bug
[05:30:51] * closedmouth pokes Susan
[05:31:42] open a can of
[06:02:20] * Nemo_bis waves at juancarlos
[06:02:44] * juancarlos Nemo_bis.
[06:02:49] On https://bugzilla.wikimedia.org/show_bug.cgi?id=48257 : I believe Finns were also having problems with upload.wikimedia.org timing out
[06:03:28] and yesterday I had a file where it didn't load one thumb for that reason... but can it be related to bits?
[06:04:24] no
[06:06:09] indeed :)
[06:34:13] Hello all! On Android, wikivoyage.org redirects to www.m.wikivoyage.org, which does not exist
[08:42:40] https://bugzilla.wikimedia.org/show_bug.cgi?id=48321
[08:44:40] i probably did that wrong
[08:55:07] are there still CSS issues, or can the status of the channel be updated?
[11:12:05] aww, mid-air collisions
[12:32:18] preview seems to not work. meta wiki, tagging for translation
[12:32:54] ah no
[12:33:00] that's my mistake
[12:51:05] heyas
[12:51:44] Is there a framework for mediawiki that parses content and lets me access an article, e.g. one section at a time?
[15:28:05] does anyone happen to know why a geodata query would work on en.wikipedia but not on de or fr?
[15:28:13] for example http://en.wikipedia.org/w/api.php?action=query&prop=coordinates|info&generator=geosearch&ggsradius=5000&ggscoord=48.856578%7C2.351828&format=json
[15:28:47] compared with
[15:28:53] http://de.wikipedia.org/w/api.php?action=query&prop=coordinates|info&generator=geosearch&ggsradius=5000&ggscoord=48.856578%7C2.351828&format=json
[15:42:27] http://www.wikivoyage.org/
[15:42:36] Heh, I'm surprised someone hasn't complained about the Wikimedia project logo.
[15:43:14] And it seems to always have a horizontal scrollbar.
[15:43:26] And the projects at the top disappear.
[15:53:23] object(__PHP_Incomplete_Class)#2 (2) {
[15:53:27] I officially hate you, PHP
[15:54:02] :-)
[16:18:01] PHP
[16:18:04] I really hate you
[19:10:50] [[Tech]]; MiszaBot; Robot: Archiving 1 thread (older than 30d) to [[Tech/Archives/2013]].; https://meta.wikimedia.org/w/index.php?diff=5476781&oldid=5474853&rcid=4170883
[19:48:38] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Is_the_Job_Queue_out_of_whack_again.3F AaronSchulz, is this something we ought to be taking a closer look at?
[19:59:37] AaronSchulz: ^ I ask because you are Mr. JobQueue
[20:05:47] reedy@fenari:/home/wikipedia/common$ mwscript showJobs.php enwiki
[20:05:48] 293174
[20:05:53] Not great, but not bad
[20:09:15] Anyone know when the first deployment to Wikimedia sites from Git was?
[20:09:54] Not off the top of my head, sorry
[20:10:04] the history from wikitech.wikimedia.org's Deployment page would help
[20:10:27] Was just trying to update a metawiki page that said everything was from SVN
[20:10:39] :(
[20:10:48] 1.20wmf1 https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=shortlog;h=refs/heads/wmf/1.20wmf1
[20:11:02] 2012-04-10 shouldn't be too far off
[20:16:37] updated it a bit now: https://meta.wikimedia.org/wiki/MediaWiki#Versions
[20:26:57] sumanah: you can now also ask Hume https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large
[20:27:10] I hope you're not Hegelian, but it should be good enough
[20:27:39] * sumanah enjoys Hegel joke
[20:40:06] AaronSchulz: Sorry, when he said completely disappeared, I thought he meant it.
[20:42:10] Krenair: that whole page makes little sense
[23:11:16] kaldari: thanks much for putting that info on the tin wikipage
[23:15:29] greg-g: BTW, when I tried using the instructions at https://wikitech.wikimedia.org/wiki/Server_access_responsibilities#SSH git proxing to tin, I got 'ssh: illegal option -- W'
[23:15:52] git = for
[23:17:19] wow, all the letters for 'for' are 1 key away from the letters for 'git', so my fingers just went to the more familiar keys :)
[23:17:52] no worries, quite regularly I think one word and type another, it's quite odd
[23:18:03] well, "regularly" maybe means once or twice a month :)
[23:18:13] more like once a day for me :)
[23:19:03] hah, nice
[23:19:12] but, re your real question... dunno!
[23:19:50] greg-g: well, now Rob is saying to use agent forwarding after all!
[23:22:28] Doesn't agent forwarding allow the target host to use your private keys while you're connected? Or am I misunderstanding it?
[23:22:44] basically, yes
[23:25:22] if I understand the issue correctly, the worst-case scenario is someone roots the machine you're ssh'd into and they get to use your keys to authenticate as you to other machines in the network
[23:25:38] they don't see your key, but they can use sign/decrypt/etc. operations
[23:26:22] huh, good thing the wp machine wasn't rooted, as far as we know, otherwise everyone who logged into it over that period would need new keys
[23:26:46] the wp machine...?
[23:27:09] if they're already root on tin, I think they win
[23:27:20] Also, why would they need new keys if the keys haven't been compromised?
[23:27:27] Krenair: the wordpress machine
[23:27:48] I thought 'wp' was Wikipedia :)
[23:27:58] they dont see your key, but they can use sign/decrypt/etc operations
[23:28:08] what ebernhardson said about someone having root on a machine that you ssh ... wait, yeah, it'd need to be a bastion/something forwarded *through*, I think
[23:28:48] with ssh agent forwarding the actual request gets forwarded to your local machine, signed/decrypted/etc., then sent back. So they don't see the key, they just get to use it
[23:29:05] If my understanding is correct, agent forwarding allows the bastion to instruct your machine to sign stuff, but it never sees your keys
[23:29:50] If that was the only way the keys were 'compromised', then changing keys wouldn't secure anything, right?
[23:30:36] I guess... (I don't know the details of how things work here, I was just extrapolating :-) )
[23:45:26] Yes, that sounds right
[23:45:48] The keys can't be compromised permanently, but they can be compromised temporarily
[23:46:10] An attacker (or root) could get your agent to encrypt/decrypt things with that key, and log into another machine as you
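The two approaches discussed above — proxying through the bastion with `ssh -W` versus agent forwarding — can be sketched as `~/.ssh/config` entries. The host names are placeholders, not the real Wikimedia hosts; note that `ssh -W` was added in OpenSSH 5.4, which would explain the 'illegal option -- W' error on an older client:

```
# Option 1: tunnel stdin/stdout through the bastion to the target.
# The bastion never talks to your agent at all.
# Requires OpenSSH 5.4+ on the client (older versions lack -W).
Host tin
    ProxyCommand ssh bastion.example.org -W %h:%p

# Option 2: agent forwarding. While you're connected, the bastion can
# ask your local agent to sign authentication challenges (so root on
# the bastion can impersonate you), but it never sees the private key.
Host bastion.example.org
    ForwardAgent yes
```

This matches the conclusion of the chat: with forwarding, a compromised bastion can use the key only while your session is live, so the compromise is temporary rather than permanent.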