[01:13:34] are gadgets always opt-in? is there such a thing as a default gadget that is enabled?
[01:15:30] ningu, yes there is such a thing as a default gadget
[01:15:58] ningu, see the things listed as default on https://www.mediawiki.org/wiki/MediaWiki:Gadgets-definition
[01:16:23] cool
[01:16:39] I am thinking of a way to have front-end functionality to transliterate balinese script into latin script
[01:17:07] basically it would go into page content and take relevantly marked-up chunks of balinese script and add transliteration after it
[01:17:24] we might do this through languageconverter instead, but I'm wondering if a gadget is also possible
[01:17:50] sounds possible
[01:18:22] hmm... is there a way to use custom tags in wikitext even if they aren't registered with the parser?
[01:18:37] I don't mean to have them do anything special, just exist so they can be detected in the output
[01:18:48] like some text here to indicate that there's something special there
[01:21:58] maybe or some such
[01:22:04] I doubt it
[01:22:07] I just need a way to wiki-encode the bits I want to transliterate
[01:22:18] ok, well I'll ponder this
[01:22:18] that sounds like a recipe for (potentially security) problems
[01:23:11] (I recognise the irony of saying this after discussing gadgets.)
[01:23:24] * Krenair -> zzz
[01:23:27] thanks
[02:15:25] reading instructions here... https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector#Adding_fonts
[02:15:41] I'm a little confused about what it wants me to do with the git commit. is that the gerrit repo? it doesn't say.
[02:15:58] I guess it must be
[02:16:20] ok I see, there's a whole review thing
[03:47:50] cscott: if you see this -- I'm pondering our discussion on LanguageConverter and how to mark/pick out chunks of text on a wiki page to transliterate. maybe it's not as challenging as I was thinking, since the unit we want to transliterate is the text content for a particular page on wikisource
[03:50:26] I'm not totally sure, though, how you'd then make sure to show that page's transliterated content right below the original content on page view.
[11:27:21] wiki is down?
[11:28:14] Request from X.X.X.X via cp3058.esams.wmnet, ATS/8.0.5 Error: 502, Cannot find server. at 2020-02-21 11:26:17 GMT
[11:28:53] wfm
[11:30:30] Is Commons down?
[11:30:41] There are some network issues Ciell
[11:30:46] Being worked on now apparently
[11:30:47] ty
[11:31:02] Then I'll take lunch now ;)
[11:31:12] Bon appétit
[11:31:21] thank you!
[11:31:31] But it looks like things are starting to come back up
[11:33:40] yes
[12:05:08] I've been thinking about graphs/data visualization support in wikis recently
[12:05:34] I think the problem with Extension:Graph is it's both too high level and too low level at the same time
[12:07:06] It's too low level for non-specialists to effectively make visualizations, and the vega json syntax is far from intuitive
[12:07:22] undeploy!
[12:07:33] At the same time I think it's too high level to make effective abstractions in templates
[12:07:36] Reedy: lol
[12:08:04] Maybe a lua version of https://observablehq.com/@vega/vega-lite-api would be cool
[12:08:40] The high-level-ness is also pretty limiting. Ideally there would be a place for, say, visualizations of physics problems on wiki, and vega seems really unsuitable to that
[12:10:28] I think i like the idea of some sort of super sandboxed scripted svgs
[12:11:55] actually scratch that. What wikipedia is clearly missing is http://xkcdgraphs.com/
[12:15:09] yesssss
[12:16:39] Extension:xckd
[12:16:45] messed that up, didn't I
[12:16:49] Extension:xkcd
[12:17:24] hackathon project
[12:18:31] That.. could be fun
[12:19:15] lol, that would be a fun showcase "Have you ever wished charts in wikipedia looked more informal..."
[12:33:36] right, because that would be so unusual for a hackathon showcase
[12:33:40] * Lucas_WMDE coughs in THICC
[15:20:38] Crap we still need to finish that.
[15:24:19] So looks like we have a lead on wtf is up with all the randoms asking for support for random crap on Project:Support_desk https://www.mediawiki.org/wiki/Topic:Vh7s7dfllc8041gs
[20:09:38] cscott: you available? got a LanguageConverter question or two. better ideas than yesterday
[20:11:06] ningu: sure, shoot
[20:12:25] ok, so the hopefully easier bit is that I just discovered that php's intl extension already has ICU bindings for transliterators: https://www.php.net/manual/en/class.transliterator.php
[20:12:40] so I'm wondering, if I add a new language to LC, if I can just use that
[20:13:05] I did a test on one of our servers and it works fine
[20:13:34] I don't know if wmf's installs have that extension but seems super likely
[20:14:34] so basically the class would just wrap the appropriate calls to Transliterator
[20:17:14] http://userguide.icu-project.org/transforms/general is very interesting, yeah.
[20:17:28] pretty sure intl is in our composer.json requirements, let me check
[20:17:53] yeah, so it's just a question of (1) if it's available in general, (2) if you see any issues with using it
[20:18:15] our stuff already uses it so it will greatly simplify the port to LanguageConverter
[20:18:24] we "suggest" ext-intl
[20:18:47] ok. can you check if multilingual wikisource has it?
[20:19:33] that's the place we're targeting for palmleaf.org
[20:19:38] checking
[20:23:52] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/mediawiki/php.pp indicates that we install it unconditionally in production
[20:24:04] cool
[20:24:20] i think an interface between icu's transliterators and languageconverter would be very interesting/useful
[20:24:54] we'd probably get a bunch of new transliterations "for free"
[20:24:56] yes. depending on the particular libicudata, you'll get a bunch of things
[20:25:06] cyrillic/latin, arabic/latin etc
[20:25:21] lots more, actually, but even an old version will have a good amount, I think
[20:25:30] the transliterators are in CLDR and have been built up over years
[20:26:16] there is something called InterIndic which is an intermediate form that indic scripts can target. so you do X -> InterIndic, then InterIndic -> Y
[20:26:40] http://userguide.icu-project.org/transforms/general#TOC-Designing-Transliterators has a nice metasyntax too
[20:27:01] yes. it takes a little getting used to for certain stuff, but it's well designed
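
A quick sketch of the intl binding being discussed, using one of the stock ICU transform IDs (the exact romanization output depends on the libicudata version installed):

    <?php
    // Built-in ICU transform from CLDR: Cyrillic to Latin.
    // Transliterator::create() returns null for unknown IDs.
    $tr = Transliterator::create( 'Cyrillic-Latin' );
    echo $tr->transliterate( 'Српска ћирилица' ), "\n"; // roughly "Srpska ćirilica"
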
[20:27:15] my colleague gave a tutorial on it at the last unicode conference
[20:28:10] I like the attention to what they call "round-trip integrity"
[20:28:23] that makes editing in a non-native transliteration much easier
[20:28:37] for some reason most people don't know this is in ICU/CLDR
[20:29:00] i didn't know it was there, and i've written icu bindings for javascript
[20:29:24] I wrote this: https://github.com/longnow/node-icu-transliterator
[20:29:30] because no one else had written it
[20:29:57] and last night I fooled around and got icu transliterators working in wasm
[20:30:03] heh, i wrote https://github.com/cscott/node-icu-bidi for the same reason
[20:30:10] but I think the LanguageConverter solution is better
[20:30:56] icu in wasm is surprisingly easy: https://github.com/TartanLlama/icu-emscripten
[20:31:10] emscripten has come a long way
[20:31:13] was just missing a couple headers
[20:31:19] and then I used embind, and boom
[20:31:23] strings worked fine
[20:31:36] std::string to javascript and back
[20:32:19] anyway, for the balinese case, the transliterator won't be in icudata already
[20:32:38] there's also the question of offering alternative balinese-to-latin transliterators, since there are different use cases
[20:33:09] what I would do is use php's Transliterator::createFromRules() to pass in the rules
[20:34:31] if that works out well, I could totally see integrating that into core and re-writing some of mediawiki's bespoke transliterators the same way
[20:34:39] so we 'standardize' on the icu rules specification
[20:35:01] probably wouldn't really work for zhwiki and a few other oddballs, but would be nice to standardize the others
[20:35:11] it would be good, yeah. I dunno what mediawiki's precise needs are. icu rules are not a formal standard but probably as close to a de facto standard as there is
[20:35:43] right, it won't cover things that go beyond transliteration. although my colleague said technically you could probably do a simplified/traditional chinese ruleset, it would just be huge :)
[20:35:53] (mediawiki also has elaborate and baroque handling for roman numerals, which are exempted from most transliteration, but it looks like icu has a 'filter' mechanism that could be used for even that case)
[20:36:02] hmm really? ok
[20:36:14] I haven't considered all these edge cases
[20:36:19] but hopefully they have
[20:36:33] ningu: yeah, that's exactly what i'm wrestling with now (simplified/traditional chinese) -- very large ruleset and modified frequently
[20:37:12] how does the interface work for the user? is the whole wiki stored in one variant or the other? or maybe each page can declare?
[20:37:47] and is the conversion saved separately or auto-generated each time (and therefore not modifiable)?
[20:38:38] (context: i've been working on a specification using foma-style FST rules; cyrillic looks like https://github.com/wikimedia/mediawiki-libs-LangConv/blob/master/fst/sr.foma but the icu syntax is probably simpler in many cases)
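
A small sketch of the Transliterator::createFromRules() approach mentioned above, with a toy ASCII rule set standing in for real Balinese-to-Latin rules:

    <?php
    // Toy ICU rule set: each "source > target ;" rule rewrites matching text.
    // A real Balinese-to-Latin rule set would map Balinese code points here.
    $rules = 'ph > f ; x > ks ;';
    $tr = Transliterator::createFromRules( $rules, Transliterator::FORWARD );
    if ( $tr === null ) {
        // createFromRules() returns null on a syntax error in the rules
        die( intl_get_error_message() . "\n" );
    }
    echo $tr->transliterate( 'xylophone' ), "\n"; // "ksylofone"
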
[20:39:04] ningu: policy is decided on a per-wiki basis (unfortunately?)
[20:39:19] mostly depending on whether folks speaking that particular language are usually fluent in more than one script
[20:39:46] in the serbian wikis, eg, most folks are bi-scriptal (making up a term) and so they've just standardized on one
[20:40:24] all articles are authored in cyrillic (i think, don't quote me) and folks can switch between the variants for display
[20:40:31] but authoring in latin is discouraged
[20:40:49] ok, so the latin is not stored but just generated as needed, and if there is an issue in the conversion it's fixed in the code, rather than in saved text for that article
[20:40:51] but to take another example, on zhwiki most folks are only really fluent in one writing system
[20:41:05] so it works like british and american english does on enwiki -- the original author gets to choose
[20:41:07] but for serbian the conversion is really fully predictable
[20:41:25] yeah.
[20:41:44] for chinese it's at best more complicated, at worst hard to get perfectly right
[20:41:47] there are mechanisms to manually fix up the conversion in wikitext if it gets it wrong
[20:42:10] you mean like giving hints to the converter in the original?
[20:42:10] those are rarely used on srwiki (mostly for roman numerals or things which look like roman numerals!)
[20:42:16] and very frequently used on zhwiki
[20:43:05] so like a little code that says: this "A" I just wrote, which normally converts to "B", here converts to "C" instead
[20:43:30] yeah: https://www.mediawiki.org/wiki/Writing_systems/Syntax
[20:43:31] for roman numerals I guess it would mean "ignore this part, don't convert at all"
[20:43:43] you can do it either for a single case, or as a rule which then applies to the rest of the page
[20:43:54] ignore this part is easy, that's -{ ignore this stuff }-
[20:44:20] ahh right, I saw that syntax doc but didn't read it yet
[20:44:32] https://en.wikipedia.org/wiki/User:Cscott/LanguageConversion has some notes on the broader context
[20:44:48] that syntax doesn't look like a parser function or template ... it's something else?
[20:46:27] yes
[20:46:55] rather recently (the past few years) fully integrated into the parser; in the past it was sort of glommed on at the end, which led to... interesting edge cases
[20:48:48] https://phabricator.wikimedia.org/T54661
[20:49:11] btw is parsoid still supposed to eventually take over?
[20:49:22] yes indeed
[20:49:26] I heard there's a new parser that's been under development for like 10 years
[20:49:44] heh. not quite that long, but close.
[20:50:09] it's tough cause there is so much stuff out there to support and not break. but yeah, hopefully it will help a lot of code get cleaned up
[20:50:15] where is that blog post
[20:50:59] google is doing me wrong
[20:51:02] i think 2012 was the start of the parsoid project. and yes, not breaking stuff is what makes it very tough.
[20:51:21] apergos: it's on phame. wmf doesn't have a reasonable blog presence at the moment.
[20:51:42] yeah but even phame should be visible to the all-seeing eye
[20:51:54] https://news.ycombinator.com/item?id=22315283 for the article and commentary (including by kmaher)
[20:52:30] cscott: oh, this is your job too?
[20:52:34] https://phabricator.wikimedia.org/phame/post/view/189/parsoid_in_php_or_there_and_back_again/
[20:52:35] I just know you as the LanguageConverter guy :)
[20:52:47] it's a great read (I was not involved at all so I can say that)
[20:53:04] yeah it looks very useful as an overview
[20:53:44] https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/ was a big part, too
[20:54:06] 3ish years of the parsoid decade were spent on that ;)
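
For reference, a rough sketch of invoking the Serbian variant conversion described above from server-side code, assuming the converter accessors in recent MediaWiki core (the exact accessor names have moved around between releases):

    <?php
    use MediaWiki\MediaWikiServices;

    // Sketch: render stored Serbian Cyrillic in the Latin ('sr-el') variant.
    // 'sr-ec' (Cyrillic) and 'sr-el' (Latin) are the srwiki variant codes.
    $services = MediaWikiServices::getInstance();
    $sr = $services->getLanguageFactory()->getLanguage( 'sr' );
    $converter = $services->getLanguageConverterFactory()->getLanguageConverter( $sr );
    echo $converter->convertTo( 'Википедија', 'sr-el' ), "\n"; // "Vikipedija"
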
[20:56:34] cscott: for the use case where I want to take designated chunks of balinese text and show transliterated latin below it, I'm thinking maybe the right implementation is a template that will expand to the passed-in text plus the transliteration below
[20:56:47] maybe there is a more proper way to do it
[20:57:03] but anyway, there would have to be a parser function or something to call LanguageConverter
[20:57:29] I dunno if templates are meant to take large chunks of text like that
[20:59:41] i think i'd suggested transclusion before, so the source would be {{:SubPageWithContents}}
[20:59:43] or something like that
[20:59:50] ahh
[20:59:51] ok
[20:59:56] templates aren't terrible if your pages aren't huge
[21:00:12] there's a template arg size limit, let me check what that is set to by default
[21:00:26] so LanguageConverter would add a parser tag for ?
[21:03:21] i was thinking initially your code would be a small/thin extension
[21:03:53] yeah, at this point I'm hoping to avoid extensions entirely
[21:04:21] add to LanguageConverter and UniversalLanguageSelector, use a couple tools for workflow
[21:04:24] it may not quite be enough
[21:05:40] as an incremental approach, "putting the transliteration below the text" seems to me to be a sort of orthogonal issue
[21:05:47] a gadget is another possibility, I guess
[21:06:21] yeah, so you could use a gadget, or a thin extension, or even an option to proofread page (in the case where you convinced tpt this was a generally-useful feature)
[21:07:10] ok, but would the thin extension use LanguageConverter under the hood or is that not possible?
[21:07:44] (i can't find the template argument size limits precisely; there's a "node limit" of 1,000,000 nodes, but that doesn't correlate well to content size)
[21:08:07] if it's just a few lines to add a parser tag and call LanguageConverter, then I can write that and it probably won't be hard to have it reviewed
[21:08:39] (sorry, multitasking. short answer is, i bet for reasonably-sized text, you could get away with shoving it in a template argument, although eventually https://phabricator.wikimedia.org/T114432 might come in handy)
[21:08:59] yeah, i'm hoping the extension uses languageconverter under the hood, yeah.
[21:09:02] ah yeah, heredoc would be much nicer
[21:09:14] but yeah, either way, need some way to hook into languageconverter
[21:09:53] ok, well as long as it won't be a big bottleneck to have the extension reviewed and hopefully deployed to multilingual wikisource, it's not a problem
[21:10:35] zhwiki has a gadget which does some dirty-ish tricks, where they invoke the mediawiki API client-side to re-render the page in such a mode that all the language converter rules emit machine-readable output. they then parse that output and use it to display a custom editor widget for the rules on the page
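
A thin extension along those lines could be little more than a tag hook. A minimal sketch, with a hypothetical <translit> tag name and PHP's intl Transliterator standing in for whatever LanguageConverter ends up exposing:

    <?php
    // Hypothetical extension hooks: register a <translit> parser tag that
    // emits the original text with a transliteration below it.
    class TranslitHooks {
        public static function onParserFirstCallInit( Parser $parser ) {
            $parser->setHook( 'translit', [ self::class, 'render' ] );
        }

        public static function render( $text, array $args, Parser $parser, PPFrame $frame ) {
            // 'Any-Latin' is a stock ICU transform used here as a placeholder;
            // a real deployment would use purpose-built Balinese-to-Latin rules.
            $tr = Transliterator::create( $args['rules'] ?? 'Any-Latin' );
            $latin = $tr ? $tr->transliterate( $text ) : '';
            return htmlspecialchars( $text ) . '<br />' .
                '<i class="translit-latin">' . htmlspecialchars( $latin ) . '</i>';
        }
    }
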
[21:10:44] I don't want to hard-code some particular display mechanism into languageconverter or anything else, cause it seems hard to predict exactly how people might want chunks of text to be arranged
[21:11:25] in theory you could do a trick like that where your gadget asks mediawiki to re-render the page (or the transcluded page) with a certain languageconverter variant, then inserts that into the page at the right spot
[21:12:22] again, just trying to decouple the display portion from the rest of the task
[21:13:04] what the name of the zhwiki gadget?
[21:13:07] what's*
[21:14:37] ...looking it up
[21:15:07] (but it's what the "D" flag in https://www.mediawiki.org/wiki/Writing_systems/Syntax#Common_flags is used for)
[21:17:14] ah, found it: https://www.mediawiki.org/wiki/Requests_for_comment/Scoped_language_converter#NoteTA
[21:17:52] thanks!
[21:25:26] Can someone give https://phabricator.wikimedia.org/T245873 a look to ensure it's tagged right?
[21:25:38] It breaks a lot on wikis from core
[21:28:32] Only if people put the wrong thing in
[21:29:09] As per the comments on the miraheze task, preventing it in ManageWiki makes more sense
[21:30:17] Reedy: we're looking at that. Core shouldn't allow it. Void mentioned it's likely $wgExtraNamespaces etc. that allows it
[21:31:09] It seems odd that in many years no one else has done the same thing
[21:31:15] And therefore had a need to report it
[21:31:59] I wonder why no one has found it
[21:35:31] A patch in theory to Setup.php would probably be trivial
[21:35:45] Because there's no validation currently. Just joining of two arrays
[21:35:54] Possibly compounded by using +=
[21:38:51] Seems worth doing for safety. The only other namespace that isn't editable as such is Media: but I can't think of much impact otherwise
[21:46:25] Reedy: do you want to comment on the task or shall I just quote your last few lines
[21:52:28] The bug report isn't actually the most clear
[21:52:44] What is ending up in $wgExtraNamespaces?
[21:54:51] Reedy: Special
[21:55:55] you'd define("NS_SPECIAL", 3xxx) then $wgExtraNamespaces[NS_SPECIAL] = "Special";
[22:01:05] Well...
[22:01:25] IF you use define("NS_SPECIAL", XXX); all bets are off
[22:01:31] There's not much we can do about that
[22:01:48] Reedy: it can be anything between 3000-3999
[22:01:57] It doesn't matter
[22:02:04] If you redefine the value of NS_SPECIAL...
[22:02:22] Though, it should already be defined by the time it hits LocalSettings anyway
[22:02:35] That'd be the sensible way to do it.
[22:02:54] LocalSettings is loaded after Defines.php
[22:03:10] And you don't have to define anything to use $wgExtraNamespaces
[22:03:23] So I'm still struggling to see how whatever bug actually happened
[22:03:46] I believe define("NS_RANDNAME", 3xxx) then $wgExtraNamespaces[NS_RANDNAME] = 'Special'; might work
[22:04:01] "might work" is not a bug report
[22:04:10] Reedy: It should per the docs
[22:04:27] Yes
[22:04:28] paladox: ^ how are we doing it for sure?
[22:04:35] But you're not saying "this is what happened"
[22:04:41] You're guessing and saying "this might've happened"
[22:05:10] Is the problem that the text "Special" is used for another namespace name?
[22:05:23] Or because the -1 NS number is overwritten
[22:05:44] Reedy: the issue I want to prevent is the name "Special" being used for a namespace
[22:05:59] And what about every possible translation of Special? :)
[22:06:43] Reedy: Not sure how they'd impact
[22:06:51] Well, namespaces can have translations
[22:07:02] But if they make all special pages unusable then we shouldn't allow it
[22:07:02] So from what i see Reedy, a user managed to create a namespace "Special" with the ns id of 3000. RhinosF1: what do you mean by "how are we doing it for sure"?
[22:07:16] If you did the same thing for Spezial on a german-language wiki
[22:07:20] paladox: how does managewiki define the namespace
[22:07:29] Sure, Special: would work, but Spezial: would be similarly broken
[22:07:52] See also the other 362 language files we have
[22:08:36] That's a lot
[22:08:52] And any time we added more lang support it would be affected
[22:09:43] paladox: do we overwrite NS_SPECIAL's variable or just the name in $wgExtraNamespaces?
[22:09:50] wgExtraNamespaces
[22:09:57] Good
[22:10:55] you can see this at ManageWikiHooks.php#L116
[22:12:15] paladox: thx
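
A sketch of the kind of Setup.php guard being proposed; hypothetical, since core currently just merges the arrays with no validation (and a complete check would also have to consult the localized namespace names discussed above):

    <?php
    // Hypothetical validation for $wgExtraNamespaces: reject entries that
    // reuse a reserved namespace id (custom ids should be >= 100) or a
    // canonical built-in name such as "Special".
    $canonical = [ 'Media', 'Special', 'Talk', 'User', 'Project', 'File',
        'MediaWiki', 'Template', 'Help', 'Category' ]; // abbreviated list
    foreach ( $wgExtraNamespaces as $id => $name ) {
        if ( $id < 100 ) {
            throw new UnexpectedValueException(
                "\$wgExtraNamespaces must not reuse reserved namespace id $id" );
        }
        if ( in_array( $name, $canonical, true ) ) {
            throw new UnexpectedValueException(
                "\$wgExtraNamespaces must not reuse canonical name \"$name\"" );
        }
    }
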
[23:15:02] is there an easy way to see what $wgContLang (content language) is set to on a particular mediawiki install?
[23:15:10] I mean a wmf site
[23:16:50] well actually, I see in view source, "wgPageContentLanguage":"en"
[23:16:57] so that kinda answers my question
[23:19:51] ok, and universallanguageselector allows
[23:25:02] who should I talk to about universallanguageselector questions?
[23:50:02] ok, I think I answered my own question by grepping through the repo
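
For the record, the content language is also exposed through the siteinfo API, not just the page source; a quick sketch (pointed at multilingual wikisource, the target wiki above):

    <?php
    // Read a wiki's content language ($wgLanguageCode) from meta=siteinfo.
    $url = 'https://wikisource.org/w/api.php?action=query&meta=siteinfo'
        . '&siprop=general&format=json';
    // WMF endpoints want a User-Agent, which file_get_contents omits by default.
    $ctx = stream_context_create( [ 'http' => [ 'user_agent' => 'lang-check-sketch/0.1' ] ] );
    $info = json_decode( file_get_contents( $url, false, $ctx ), true );
    echo $info['query']['general']['lang'], "\n"; // e.g. "en"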