[05:29:45] Hi. I need some help in understanding the data dumps provided by Wikimedia...
[05:34:34] Okay, I'm trying to understand the data dumps. I'm working on the 20180520 dumps.
[05:34:58] They contain many sections, each with different data.
[05:35:42] And I would like to know what each section's data represents. Although it's described briefly, I don't get it clearly.
[05:36:49] Like the section "All pages, current versions only." Is each and every article's current version present in this data?
[05:38:40] I just downloaded "enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2"; the first page in it is "AccessibleComputing", but it does not have the complete article's text in it?
[06:14:20] [[Tech]]; 14.139.9.9; /* Queries regarding Wikimedia's Data Dumps. */ new section; https://meta.wikimedia.org/w/index.php?diff=18104707&oldid=18102456&rcid=11964974
[08:17:53] badman: "AccessibleComputing" is not a proper title, surely it's a redirect
[08:18:02] What are you trying to do?
[08:18:45] page_id 10, remarkable https://en.wikipedia.org/w/index.php?title=AccessibleComputing&action=info
[08:24:35] Actually, I have just started working on a research project, and for it I need English Wikipedia data.
[08:28:13] BTW, why are some pages redirects and others not? And the ones that are not redirects have the complete article's text?
[08:29:39] What data does the "All pages, current versions only." section even have?
[10:18:13] And how do I download the complete history of a Wikipedia article?
[10:18:40] https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export says it doesn't work in some cases, where the number of revisions is very high
[10:19:12] Is there any alternative to download the complete history of, let's say, "India"?
[11:59:26] [[Tech]]; Ruslik0; /* Queries regarding Wikimedia's Data Dumps. */; https://meta.wikimedia.org/w/index.php?diff=18105401&oldid=18104707&rcid=11966060
[22:59:22] Hi.
[23:42:06] Is there any performance difference between (for example) doing one API query with 50 pageids and using continue, vs. using a smaller number of pageids per query so that no continue is needed?
[23:46:51] Probably not... What're you looking up?
[23:47:45] pageviews in this case. mostly just curious about it in general, though.
[23:48:28] If it's an indexed DB lookup... Both should basically be the same
[23:49:24] (I didn't realize that getting pageviews via the MediaWiki API means you can get data for multiple articles per query, unlike the wikimedia.org REST API... yay!)
[23:49:34] cool, thanks!
[23:57:50] ragesoss: The performance difference would just be HTTP overhead, since you'll be making more requests.
[23:58:09] Ursula: well, he said a smaller number. Not 1
[23:58:32] So, if he does count( $ids ) approx = MAX_IDs...
[23:58:36] Querying as many as you can at once is probably faster since you're only round-tripping once. But the data will take a bit longer to generate.
[23:58:59] I do lots of stuff one at a time.
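
The redirect question above (why "AccessibleComputing" carries no article text) can be checked directly against the dump. Below is a minimal sketch, assuming the filename mentioned in the log and the standard MediaWiki XML export layout: redirect pages carry a <redirect> element and only a short "#REDIRECT [[...]]" stub as wikitext, while non-redirect pages carry the full current wikitext. This is an illustrative sketch, not an official parser.

    import bz2
    import xml.etree.ElementTree as ET

    # Stream through a pages-meta-current dump and report, for each <page>,
    # whether it is a redirect or a page with full article text.
    # Path is the file named in the log above; adjust as needed.
    DUMP = "enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2"

    def local(tag):
        # Strip the "{http://www.mediawiki.org/xml/export-...}" namespace prefix.
        return tag.rsplit("}", 1)[-1]

    with bz2.open(DUMP, "rb") as f:
        title, is_redirect, text_len = None, False, 0
        for event, elem in ET.iterparse(f, events=("end",)):
            tag = local(elem.tag)
            if tag == "title":
                title = elem.text
            elif tag == "redirect":
                is_redirect = True
            elif tag == "text":
                text_len = len(elem.text or "")
            elif tag == "page":
                kind = "redirect" if is_redirect else "article"
                print(f"{title}\t{kind}\t{text_len} chars of wikitext")
                title, is_redirect, text_len = None, False, 0
                elem.clear()  # free memory; these dumps are large

Streaming with iterparse (rather than loading the whole document) matters here because even a single dump part is far too large to hold in memory as a parsed tree.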
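For the question about getting the complete history of a page like "India" when Special:Export struggles with very high revision counts, one alternative is to page through the revisions module of the MediaWiki API with its continuation mechanism. A minimal sketch, assuming only standard API parameters; the chosen rvprop fields are illustrative, and fetching revision content as well would lower the per-request limit:

    import requests

    # Page through the full revision history of "India" via the API.
    API = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "format": "json",
        "titles": "India",
        "prop": "revisions",
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": "max",   # up to 500 revision metadata entries per request
        "rvdir": "newer",   # oldest first
    }

    session = requests.Session()
    revisions = []
    while True:
        data = session.get(API, params=params).json()
        for page in data["query"]["pages"].values():
            revisions.extend(page.get("revisions", []))
        if "continue" not in data:
            break
        params.update(data["continue"])  # carry rvcontinue/continue forward

    print(f"Fetched {len(revisions)} revisions")

The other option for bulk history is the pages-meta-history dump files, which contain every revision of every page but are much larger than the current-versions-only files.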
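To illustrate the batching trade-off discussed at the end of the log: sending as many pageids as allowed per request (50 for normal users) means most of the cost is a single HTTP round trip, at the price of a slightly slower response. A sketch against prop=pageviews (available where the PageViewInfo extension is installed, as on Wikipedia); the page ids here are placeholders:

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    page_ids = [10, 12, 25, 39, 290]  # hypothetical ids

    def chunks(seq, size=50):
        # Split ids into batches of at most `size` per API request.
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    session = requests.Session()
    views = {}
    for batch in chunks(page_ids):
        params = {
            "action": "query",
            "format": "json",
            "prop": "pageviews",
            "pageids": "|".join(map(str, batch)),
        }
        data = session.get(API, params=params).json()
        for pid, page in data["query"]["pages"].items():
            views[pid] = page.get("pageviews", {})

    print(views)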