[04:51:24] [[Tech]]; Sarri.greek; /* Edittools at el.wiktionary */ Where can we ask for help for small wiktionaries?; https://meta.wikimedia.org/w/index.php?diff=19938244&oldid=19936608&rcid=15154495
[18:47:28] Can anyone confirm what the lowest supported iOS version is?
[18:48:17] https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/.browserslistrc says 6.0, but there's a complaint on enwiki with iOS 10.3.3
[19:02:44] AntiComposite: Do they mean the mobile web, or are they complaining about the iOS app?
[19:03:04] web, couldn't install the app: https://w.wiki/Lcv
[19:04:28] "buttons seem too be missing" is a useless bug report :P
[19:04:46] https://www.mediawiki.org/wiki/Compatibility#Mobile
[19:04:49] That says that file is grade A
[19:06:22] iOS 9 is required for HTTPS now, so that might need updating (for WMF wikis at least)
[19:07:43] It looks kinda wrong anyway
[19:07:44] https://analytics.wikimedia.org/dashboards/browsers/#mobile-site-by-os/os-family-and-major-hierarchical-view
[19:07:54] Only 12 and 13 show as >5%
[19:08:04] * Reedy files a bug
[19:11:53] https://phabricator.wikimedia.org/T248907
[20:50:00] Hello.
[20:55:15] Hello SilverMelon15, how can we help you?
[20:56:21] Hi. I have a question about the XML schema in wikipedia dump files. Is this channel a good place to ask?
[20:58:11] !ask | SilverMelon15
[20:58:25] boo. that doesn't work here apparently
[20:58:43] anyway, SilverMelon15 ask your question and if someone can help they will
[20:59:25] https://meta.wikimedia.org/wiki/Data_dumps/Dump_format -- may or may not help
[21:00:02] I notice there is a <parentid> field in the schema. I would expect that to be the id of the parent record. And, I would expect multiple records to have a single parent. But, parentid values seem to be almost all unique.
[21:01:00] that corresponds to https://www.mediawiki.org/wiki/Manual:Revision_table#rev_parent_id in the live tables
[21:02:18] SilverMelon15, which dump file are you using?
[21:02:35] revisions are a linear chain in MediaWiki, so I would actually expect the <parentid>/rev_parent_id to be nearly unique across records
[21:02:45] Ah! That explains it.
[21:03:52] Here's what I want to do. I have the bz2 XML dump. I want to find all articles in a particular category. Is there a notion of parent-child hierarchy in articles?
[21:04:34] what's the filename of the dump file
[21:04:44] Let me look it up.
[21:06:12] SilverMelon15: broadly, no there is not a hierarchy in the articles at a storage structure level. Categories are kind of an add-on hack done in the wikitext space.
[21:06:59] SilverMelon15: in the sql dumps, there is some tracking though -- https://www.mediawiki.org/wiki/Manual:Categorylinks_table
[21:07:40] but even this is probably not what you really want as there is no formal taxonomy of the category graph itself
[21:07:45] https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2
[21:09:47] I think there is a draft proposal somewhere to move category data into a "slot" in the multi-content revision system that might make this easier to extract from dumps, but that's not a current feature
[21:10:18] MediaWiki: lots of cool data hidden in unstructured text!
[21:11:06] Is there a heuristic for getting all articles in a given category? The sledgehammer seems to be to grab all the article links from the top-level category page and keep following those until I get all of the leaf nodes.
[21:12:00] yeah, iterating over the tree is about it
[21:12:48] I don't think the english wikipedia has a root category, don't remember though.
[21:12:49] Ok. Appreciate the reality check. And, that's not so much fun given how large that bz2 file is if I don't want to uncompress it.
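On the point about not wanting to uncompress the bz2 file: the dump can be read as a stream instead of being unpacked to disk first. The following is a minimal sketch, assuming Python's standard bz2 and xml.etree.ElementTree modules and the usual <mediawiki>/<page>/<title> layout of the XML dumps. The function name iter_page_titles is hypothetical and only illustrative; a full pass over the enwiki dump will still take a long time.

```python
import bz2
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(dump):
    """Yield page titles from a bz2-compressed MediaWiki XML dump.

    `dump` may be a filename or a binary file object. The file is
    decompressed and parsed incrementally, never written out in full.
    (Hypothetical helper name; not part of any Wikimedia tool.)
    """
    with bz2.open(dump, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # the enclosing <mediawiki> element
        for event, elem in context:
            if event == "end" and _local(elem.tag) == "page":
                title = next(
                    (c.text for c in elem if _local(c.tag) == "title"), None
                )
                if title is not None:
                    yield title
                # Drop finished pages so memory use stays roughly constant.
                root.clear()
```

Usage would look like `for title in iter_page_titles("enwiki-latest-pages-articles-multistream.xml.bz2"): ...`. This only lists titles; category membership would still have to be dug out of the wikitext or taken from the separate categorylinks SQL dump.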
[21:12:54] https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bcategorymembers
[21:14:08] I don't think that one does recursion through the subcategories though
[21:14:15] nope
[21:14:27] * bd808 is looking to see if the internal API of CategoryTree does
[21:15:14] Using the API might be a better idea if you don't want to uncompress the dumps though
[21:15:53] (the categorylinks table is in a separate SQL (not XML) dump, if that wasn't clear)
[21:15:54] using the API is almost always a better idea than using the dumps. almost always
[21:16:06] https://en.wikipedia.org/wiki/Special:CategoryTree?target=Category%3ADogs&mode=categories&namespaces=&title=Special%3ACategoryTree
[21:16:30] I think the internal API at https://en.wikipedia.org/w/api.php?action=help&modules=categorytree may actually do the full traversal
[21:16:34] unless your use case includes scanning all 6 million articles in less than a week
[21:17:51] Didn't know about the API. This is great! Even though the category list feature of the API doesn't descend recursively I assume that I can take the result of the search for the top level category and do the recursion myself on the subcategory pages.
[21:18:27] SilverMelon15: https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph/Documentation might be useful too
[21:18:50] These are great suggestions! Thank you both!
[21:21:13] oh.. but catgraph is apparently shut down :/
[21:22:41] * bd808 deletes the stale doc page
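The "do the recursion myself" plan from the discussion above could be sketched roughly as follows. It assumes the list=categorymembers query module linked earlier, with its cmtitle, cmtype and cmlimit parameters and the standard "continue" paging, and it treats namespace 14 as the Category namespace. The function names, the User-Agent string and the max_depth cap are hypothetical choices for illustration. Because the category graph is not a formal taxonomy, the sketch tracks visited categories and limits depth rather than assuming a clean tree.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"
CATEGORY_NS = 14  # namespace number of "Category:" pages


def fetch_members(category, api_url=API_URL):
    """Yield (namespace, title) for each direct member of one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        url = api_url + "?" + urllib.parse.urlencode(params)
        # Hypothetical User-Agent; replace with something identifying you.
        request = urllib.request.Request(
            url, headers={"User-Agent": "category-walk-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for member in data["query"]["categorymembers"]:
            yield member["ns"], member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # ask for the next batch


def pages_in_category(root, fetch=fetch_members, max_depth=3):
    """Breadth-first walk collecting page titles under `root`.

    Subcategories are followed up to `max_depth` levels below the root,
    and each category is visited at most once, so cycles terminate.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for ns, title in fetch(category):
            if ns == CATEGORY_NS:
                if depth < max_depth and title not in seen:
                    seen.add(title)
                    queue.append((title, depth + 1))
            else:
                pages.add(title)
    return pages
```

For example, `pages_in_category("Category:Dogs", max_depth=2)` would collect pages in that category and two levels of subcategories. The `fetch` argument is injectable so the walk can be tried against a small in-memory graph before hitting the live API.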