[02:48:37] does anyone know how exactly UniversalLanguageSelector decides what language a page is in?
[02:49:03] I guess this explains https://www.mediawiki.org/wiki/Manual:Language
[02:53:46] ningu: I don't think that page is about universal language selector. it's only about core mediawiki
[02:54:20] ULS only adjusts the interface language. I don't think it has anything to do with page content language
[02:54:55] I believe it uses cookies for previously chosen languages. It has an option to use browser's accept-language header, but I think that's disabled on wikimedia. I'm not sure if it has other methods
[02:55:24] &uselang= and &setlang= ofc
[02:55:33] bawolff: no, it's more than that. for example if you do bar and a webfont is needed for foo, ULS will use it there
[02:55:43] this is nothing to do with interface language, what I'm worried about
[02:56:14] what are you trying to do, and what's working/not working about it?
[02:56:22] ULS will use it there* if the user has webfonts enabled in their settings
[02:56:40] AntiComposite: so, if you go here: https://wikisource.org/wiki/Index:Bali-lontar-adi-purana-300ppi.pdf
[02:56:56] if you go to the edit page, you will see that language code is set to "ban-bali" which is the Balinese language in Balinese script
[02:57:08] ULS uses the webfont Vimala for that by default (since I submitted it to ULS)
[02:57:11] ningu: yes. It's confusing because ULS does a bunch of different stuff
[02:57:22] but, if you go to page 1 here: https://wikisource.org/wiki/Page:Bali-lontar-adi-purana-300ppi.pdf/1
[02:57:45] nothing seems to have told mediawiki that the text content of that page should be ban-bali, even though it's in the metadata, so ULS doesn't do anything and uses whatever is on the system
[02:58:22] I believe that webfonts are disabled for wikis in some languages (like english) because it slows things down to try and find all the non-english text on the page
[02:58:24] this seems like a missing feature of some sort in ProofreadPage. but what really confuses me is, in the ProofreadPage code I can see, Language code isn't even one of the standard fields
[02:58:35] bawolff: yeah but that's not the issue here, I'm aware of that setting
[02:59:21] language/language code is not here: https://github.com/wikimedia/mediawiki-extensions-ProofreadPage/blob/master/i18n/en.json#L59
[02:59:31] I'm not sure but wikisource must have modified the i18n files or something?
[02:59:44] or it may have some other mechanism for extending it
[02:59:54] ningu: You mean the proofread page metadata has it set to ban-bali language code?
[03:00:15] MediaWiki's internal page metadata thinks it's english https://wikisource.org/w/index.php?title=Page:Bali-lontar-adi-purana-300ppi.pdf/1&action=info
[03:00:21] which is likely what ULS is looking at
[03:00:36] bawolff: right, and my point is that that's broken
[03:00:37] https://www.mediawiki.org/wiki/Universal_Language_Selector/WebFonts#How_webfonts_are_applied
[03:00:48] look, a flowchart
[03:00:55] haha thanks, that helps!
[03:00:57] ningu: I agree, sounds like a bug in proofread page
[03:01:17] bawolff: yeah, or at least a missing feature :)
[03:01:19] https://github.com/wikimedia/mediawiki-extensions-ProofreadPage/blob/master/includes/Page/PageViewAction.php
[03:01:26] I _think_ that's the relevant section of code where it should check
[03:01:40] but, since language codes are apparently not even standard in proofreadpage, it's kind of hard to know how to fix it
[03:02:06] oh, actually maybe not in the view action but in wherever the page language metadata is retrieved for the page content model
[03:02:44] That would be ContentHandler::getPageLanguage
[03:03:32] yeah thanks
[03:03:38] and there is a hook to override it
[03:03:56] PageContentLanguage
[03:04:19] well that seems like it would make things easier
[03:04:19] ok, well maybe I'll write up a phab ticket on this and see what people think
[03:04:31] AntiComposite: yeah, it's not like there is no other way to solve it, but it seems kind of silly as-is
[03:04:46] I don't want to write a bot to auto-set all of the predictable languages on every Page: page as they are created etc
[03:05:00] (plus that means you couldn't customize individual pages if you wanted to)
[03:05:03] If proofreadpage knows what the language should be, then I think it should definitely tell the rest of mediawiki
[03:05:08] yeah basically
[03:05:08] fixing things in ProofreadPage is always ideal
[03:05:20] the mystery to me is where that functionality/code lives in ProofreadPage
[03:05:28] since it doesn't seem to be in the main repo
[03:05:48] which functionality do you mean? the proofread page metadata?
[03:05:51] for example here: https://wikisource.org/w/index.php?title=Index:Bali-lontar-adi-purana-300ppi.pdf&action=edit
[03:06:00] the string "Language code" is not found in the ProofreadPage repo
[03:06:21] but the other fields are
[03:06:26] (well, some of them!)
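The resolution order mentioned here (the wiki's content language as the default, ContentHandler::getPageLanguage, and the PageContentLanguage hook as the override point) can be modeled in a few lines. This is a toy Python sketch, not MediaWiki's PHP: every Python name is hypothetical, the Index-to-language table stands in for whatever ProofreadPage would actually read, and placing the database check first (a page_lang stored via Special:PageLanguage, when $wgPageLanguageUseDB is on, returned before the hook runs) is my reading of core's Title::getPageLanguage(), not something confirmed in this discussion.

```python
"""Toy model of page-language resolution (hypothetical names throughout).

ASSUMPTION: mirrors my reading of MediaWiki core's Title::getPageLanguage():
a page_lang stored via Special:PageLanguage (with $wgPageLanguageUseDB on) is
returned first; otherwise ContentHandler::getPageLanguage() starts from the
wiki content language and PageContentLanguage hook handlers may override it.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Page:
    title: str
    namespace: str
    db_page_lang: Optional[str] = None  # stand-in for the page_lang column


# A hook handler takes the page and the language so far, returns the new one.
Hook = Callable[[Page, str], str]


def resolve_page_language(page: Page, content_language: str,
                          use_db: bool, hooks: List[Hook]) -> str:
    # 1. A stored per-page language wins when DB storage is enabled.
    if use_db and page.db_page_lang:
        return page.db_page_lang
    # 2. Otherwise start from the wiki's content language...
    lang = content_language
    # 3. ...and let each registered handler override it in turn.
    for hook in hooks:
        lang = hook(page, lang)
    return lang


# Hypothetical lookup table: Index: page -> language code from its form.
INDEX_LANGUAGES: Dict[str, str] = {
    "Index:Bali-lontar-adi-purana-300ppi.pdf": "ban-bali",
}


def proofread_language_hook(page: Page, lang: str) -> str:
    """Hypothetical handler: Page:X/N inherits the language of Index:X."""
    if page.namespace != "Page":
        return lang
    base = page.title.split(":", 1)[1].rsplit("/", 1)[0]
    return INDEX_LANGUAGES.get("Index:" + base, lang)
```

Under those assumptions, Page:Bali-lontar-adi-purana-300ppi.pdf/1 resolves to ban-bali on an English-default wiki, while a page with a stored jv-java value keeps it. If the database check really does come first, a hook alone could not correct a wrong value already stored through Special:PageLanguage.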
[03:06:37] there seems to be some way to extend the metadata
[03:07:41] https://github.com/wikimedia/mediawiki-extensions-ProofreadPage/blob/master/includes/Index/CustomIndexFieldsParser.php
[03:07:50] bingo
[03:08:11] AntiComposite: ok, so maybe I don't understand this: $data = wfMessage( 'proofreadpage_index_data_config' )->inContentLanguage();
[03:08:28] wfMessage doesn't necessarily use the i18n/*.json files supplied with proofreadpage?
[03:08:31] https://wikisource.org/wiki/MediaWiki:Proofreadpage_index_data_config
[03:08:37] oooh
[03:08:40] ok got it
[03:08:52] sneaky how all this stuff ends up stored in the wiki :P
[03:09:09] Let's put a wiki in this wiki!
[03:11:29] I suppose if people feel like it's inappropriate to have proofread page "know" about the custom language field, you could have a PageContentLanguage hook for just multilingual wikisource that sets the language just for it, or something
[03:11:37] is that type (string, page, langcode) defined somewhere?
[03:12:04] not in code
[03:12:28] bawolff: yeah, maybe. this will ultimately be (more) solved when the Balinese stuff is sufficiently active to make ban.wikisource.org, but for now we need to do something on multilingual wikisource
[03:12:40] AntiComposite: is it just like a clue for humans then or what?
[03:13:20] bawolff: I'm not sure either what's appropriate here, in terms of technical solution, but yeah I guess they could add a custom hook if they think that's more reasonable
[03:13:39] I think CustomIndexFieldsParser does stuff with the type
[03:13:43] on the other hand, it's not at all crazy to think ProofreadPage should have native support for language metadata
[03:14:24] bawolff: well, it stores it, but it doesn't seem to use it
[03:14:55] the non-default index data is just extra data
[03:15:02] like "langcode" doesn't seem to be defined or standardized anywhere -- you could have it magically find the first field with that type or something
[03:15:06] or maybe EditIndexPage.php in the case of numbers
[03:15:13] but that seems kind of ugly
[03:15:13] to change the input box type
[03:15:31] AntiComposite: yeah, that seems right pretty much
[03:15:57] the reason this really matters, in this case, is that most users don't have an adequate balinese font installed, so the webfont is not just nice to have
[03:16:29] noto balinese is broken enough that it's best to avoid
[03:17:00] I think the custom index fields evidently not being translatable is also a problem
[03:17:10] hmmm
[03:17:43] how does it handle the translatable ones? there doesn't seem to be a language-independent key to things like Author
[03:17:55] it doesn't
[03:18:02] so it just uses the wiki content language?
[03:18:19] not even
[03:18:22] or if it uses the user's interface language, I guess you just see it in what language they had :P
[03:18:31] yeah but it has to use _something_ to generate the form initially
[03:18:32] it takes whatever text is used as the key in the config
[03:18:37] that's it
[03:18:50] but the key is different in en.json vs es.json etc
[03:18:56] so which setting does it use to pick it?
[03:19:00] https://wikisource.org/w/index.php?title=Index:Bali-lontar-adi-purana-300ppi.pdf&action=edit&uselang=fr
[03:19:10] yeah I see, heh
[03:19:46] ok, it seems to use the wiki content language, not uselang: https://wikisource.org/w/index.php?title=Index:Bali-lontar-adi-purana-300-blah.pdf&action=edit&uselang=fr
[03:19:50] that's for a new page
[03:19:58] there's no language involved
[03:20:01] it's literally just strings
[03:20:12] AntiComposite: yeah but Author, for example, is from en.json
[03:20:18] it has to know to look in en.json
[03:20:21] for the custom ones yeah
[03:20:24] I mean the built-in ones
[03:20:53] https://wikisource.org/w/index.php?title=Index:Bali-lontar-adi-purana-300-blah.pdf&action=edit&uselang=qqx
[03:20:54] alright
[03:20:59] so this means if you changed your wiki's content language from en to fr, you'd have old ones with Author and new ones with Auteur
[03:21:08] anything you see with an interface message key is translatable
[03:21:17] everything that's just text is taken right from the config
[03:22:03] compare these two
[03:22:05] https://en.wikisource.org/w/index.php?title=Index:Foo.pdf&action=edit
[03:22:10] https://fr.wikisource.org/w/index.php?title=Index:Foo.pdf&action=edit
[03:22:21] https://fr.wikisource.org/wiki/MediaWiki:Proofreadpage_index_data_config
[03:23:27] ok, so what's the purpose of this? https://github.com/wikimedia/mediawiki-extensions-ProofreadPage/blob/master/i18n/fr.json#L79
[03:24:08] maybe that's just a default or something
[03:24:34] yes
[03:24:45] that's what got me confused but I see now
[03:24:53] ok, so there really is no built-in data model at all, just defaults
[03:25:03] yup
[03:25:17] hrm... well I dunno, heh
[03:25:31] it seems like you'd have to make ProofreadPage a fair bit smarter (and more rigid which might not be good)
[03:26:33] not really
[03:27:19] not really what? there's an easier way?
[03:34:52] all you'd need to do is say that if a wiki wants to override the page language, they need to define a parameter in a certain way
[03:35:28] looks like data:"language" would work to mark a language code
[03:35:57] I see
[03:37:23] (now, had I been designing that form configuration, I'd have done it differently. but that's neither here nor there)
[03:37:33] yeah :)
[16:47:49] thanks Reedy :-)
[18:53:31] FYI I created a phab task for the ProofreadPage stuff we discussed last night https://phabricator.wikimedia.org/T259645
[20:16:39] It's going to be tricky as long as the page language is stored in the database...
[20:18:46] the hook *should* override whatever's stored in the database with Special:PageLanguage
[20:18:55] but I've never tried, so I don't know.
[20:20:26] it's not very nice to have incorrect information stored in the database
[20:27:54] Nemo_bis: so the issue is that multilingual wikisource's default page language is english and it seems like a silly idea to require people who edit sources to set the page language separately for every single page of a source document
[20:28:08] logically, if a source document is in one language, that should be recorded once
[20:28:27] obviously some source documents could be in more than one language etc but it's a common case to have one document, one language
[20:28:53] also, it's not necessarily true for the separate wikisource sites like en.wikisource.org etc that every single source is in the site language
[20:29:53] sure
[20:30:44] I don't think pagelang is necessary, I think it should use the PageContentLanguage hook
[20:30:44] A few Wikisources are already using Translate and/or Special:PageLanguage.
[20:31:29] or are you saying to hook into the page save action and set pagelang there?
[20:31:47] I guess that would work, I dunno why it's better to store it in the db vs not, if it can be inherited from the index then why store it multiple times?
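The data:"language" suggestion could look roughly like this: find the field in the index configuration whose data property is "language" and read the code entered for it, so the lookup never depends on the field's human-readable key ("Language code" on one wiki, something else on another). A minimal sketch, assuming a simplified JSON config shaped like MediaWiki:Proofreadpage_index_data_config; the excerpt, the function names and the fallback to a default are hypothetical, not ProofreadPage's actual code.

```python
import json
from typing import Dict, Optional

# Hypothetical excerpt shaped like MediaWiki:Proofreadpage_index_data_config;
# each wiki's real config has its own keys and more properties per field.
CONFIG_JSON = """
{
  "Title":         {"type": "string", "data": "title"},
  "Author":        {"type": "string", "data": "author"},
  "Language code": {"type": "langcode", "data": "language"}
}
"""


def language_field_key(config: Dict[str, dict]) -> Optional[str]:
    """Key of the first field marked data: "language", or None if there is none."""
    for key, props in config.items():
        if props.get("data") == "language":
            return key
    return None


def index_language(config: Dict[str, dict], index_values: Dict[str, str],
                   default: str) -> str:
    """Language code entered on an Index: page, or `default` if absent/blank."""
    key = language_field_key(config)
    value = index_values.get(key, "").strip() if key else ""
    return value or default


config = json.loads(CONFIG_JSON)
print(index_language(config, {"Language code": "ban-bali"}, "en"))  # ban-bali
```

Keying on the data marker rather than on the label is the design point: a config whose language field is named differently would still be found, and a wiki that defines no such field simply keeps its default.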
[20:31:58] if the pagelang doesn't match what the index says, then sure
[20:32:43] doesn't make much of a difference if it's going to be stored anyway
[20:33:06] also if you store it in the db, then every time you update the language on the index page it would have to update the languages for all the pages in the db
[20:33:15] which may not happen often but will happen
[20:33:36] unless you've got 10000-page-long texts that's not that big a deal
[20:33:51] I don't mean it's a big deal for resource usage, but it's more code to write, no?
[20:34:17] it just seems like it's trickier to get right
[20:34:19] marginally
[20:34:30] there are upsides and downsides to both
[20:34:32] ok
[20:35:05] if database storage for page language is enabled, it would make sense to update that value.
[20:35:40] yeah, I can see that
[20:36:15] so another question becomes whether wgPageLanguageUseDB should be required to be on for this feature of ProofreadPage to work -- I guess I can see the argument for that
[20:36:30] rather than implementing some alternative workaround when it's off
[20:36:42] I suppose if it's off, then ProofreadPage would just ignore the whole thing
[20:36:53] the whole question of page languages
[20:37:25] I _think_ it's on for multilingual wikisource (would make sense)
[20:39:33] AntiComposite: it's tricky though. suppose you have a book that's 100 pages long, in Balinese script, but 5 pages are in Javanese script. so you specify ban-bali in the index. then I guess you can still specify those 5 pages manually as jv-java when they're created. but now there's no way to know from the db perspective which pagelang values were auto-set from proofreadpage and which were manual
[20:40:22] I guess all this means is that if someone later comes to change the index language, it will not update all pagelangs in corresponding Page: pages but just the ones matching the old index value :)
[20:47:49] As long as we keep the current method in core, I think indeed the more logical way to do it is that the page language is set in the database when the page is created.
[20:49:05] Although it sounds kind of wasteful and one might want to revisit the entire matter in core at some point. (Not sure why we still even restrict the pagelang permission so much... that was supposed to be only a temporary thing.)
[20:50:51] probably because nobody asked
[20:51:35] Nemo_bis: yeah, it seems reasonable now that I think about it more. and it could be changed later I suppose
[20:52:07] and whatever you do, you definitely want all api methods returning the right value for page language etc as they should, not just a hack for ULS
[20:52:11] AntiComposite: actually people ask translation admins all the time
[20:52:39] nobody asked for the group permissions to be changed, I mean
[20:53:08] "We" should probably tell wikis whether it's considered safe to expand
[20:53:11] I think in most wiki uses you maybe don't have a single wiki page where you can set the language and know it should transfer to a range of sub-pages?
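The update rule from [20:40:22] (when an Index page's language changes, only the Page: pages still carrying the old index value follow it) fits in one short function. A sketch, assuming page languages are stored per page as with $wgPageLanguageUseDB; the dict, the titles and the function name are hypothetical. It also inherits the limitation raised at [20:39:33]: a page someone set by hand to the same value as the index cannot be told apart from an auto-set one.

```python
from typing import Dict, List


def propagate_index_language(page_langs: Dict[str, str], old_lang: str,
                             new_lang: str) -> List[str]:
    """Follow an Index: page's language change on its Page: pages.

    Pages whose stored language still equals the old index value are assumed
    to have been auto-set and are updated; anything else (e.g. a page set by
    hand to jv-java) is treated as a deliberate override and left alone.
    Mutates page_langs in place and returns the titles that changed.
    """
    changed: List[str] = []
    for title, lang in page_langs.items():
        if lang == old_lang:
            page_langs[title] = new_lang  # replacing a value is safe mid-iteration
            changed.append(title)
    return changed


langs = {"Page:X.pdf/1": "ban-bali", "Page:X.pdf/2": "jv-java", "Page:X.pdf/3": "ban-bali"}
print(propagate_index_language(langs, "ban-bali", "ban"))
# ['Page:X.pdf/1', 'Page:X.pdf/3']
```

The hand-set jv-java page survives the change, which is the behavior described in the log; the cost is one update per matching page, the "more code to write" trade-off raised at [20:33:51].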
[20:53:24] so maybe the wikisource workflow is not a general problem
[20:54:03] Yeah but still, on our several multilingual wikis and on pretty much any Translate wiki the page language is essential
[20:54:04] either way the Page: pages aren't formally subpages of the index page
[20:54:17] yeah, I was talking about the ProofreadPage fix still
[20:54:24] for security implications of the permission I have no idea :)
[20:54:42] I don't think you could do much damage with it
[20:55:01] you already let people edit other bits of pages like the title, right?
[20:55:06] but that's via a magic word in wikitext
[20:55:12] the title not actually
[20:55:27] er... sorry I mean the title that will display
[20:55:59] {{DISPLAYTITLE:}} can't change the letters themselves
[20:56:08] ah right
[20:56:14] just casing and such
[20:56:14] On Wikimedia wikis we have $wgAllowDisplayTitle = false
[20:56:34] or whatever it was
[20:56:46] btw why is pagelang set via a special page and not via a magic word? or vice versa, why is displaytitle a magic template and not a special :)
[20:57:25] I guess a lot of mediawiki stuff has accumulated over the years
[20:57:34] displaytitle is ancient stuff
[20:57:50] probably it was born as a local trick and then grandfathered as magic word
[20:58:15] it seems kind of ugly in general to include stuff like that in the wikitext
[20:58:23] but if you do it elsewhere it can be harder to use, I suppose
[20:58:24] welcome to MediaWiki
[20:58:35] but like even [[Category:]] is like this
[20:58:37] IIRC the only reason we restricted the pagelanguage permission was to make it easier to get approval from WMF DBA
[20:58:59] Supposedly the page table might be inflated or something
[20:59:24] It's probably easier now to revisit that particular decision, just need to ask
[21:00:03] is the issue that for whatever user that creates something in Page: (for example), ProofreadPage would be acting in their name in setting the page language and so that user would need that permission?
[21:00:51] Or maybe we just didn't want to do the schema change on all wikis
[21:01:18] isn't that just a matter of usepagedb being on at all though?
[21:01:35] "That" what?
[21:01:44] Making the permission explicit is just a matter of clarity
[21:02:03] Nemo_bis: schema changes
[21:02:11] if usepagedb is off then the schema won't change, I assume
[21:02:18] Indeed
[21:02:27] er, wgPageLanguageUseDB
[21:02:41] but that schema change per se has nothing to do with the permission
[21:04:58] Ultimately we should decide whether the pagelanguage in the DB should have the last word in wikis with mixed language content. Maybe not, it's not set in stone.
[21:05:46] the alternative being something like the hook override discussed above?
[21:06:09] Yes, which is basically what Translate did before Special:PageLanguage existed
[21:06:13] I see
[21:06:41] I think for my purposes it doesn't matter too much in the end
[21:06:52] as long as it's possible to fix ProofreadPage
[21:07:03] Should be!
[21:07:14] cool
[21:07:20] Also, schema changes on Wikisource ought not be too hard. We probably need to do it either way.
[21:08:01] you'd have to get in line
[21:09:01] the sooner the better :)
[21:09:53] the ProofreadPage fix (in any form) is probably needed for this grant work, but not literally today
[21:10:09] sooner the better for me too though
[22:44:58] btw are there any guidelines on one user running another user's user scripts? for the balinese workflow I'm thinking of setting up a user script under my name which will load an alternative to ProofreadPage tailored to the Balinese work, which users will be used to from Palmleaf.org already
[22:45:29] I was thinking of making it a gadget but it's probably fine to just tell them to add a line to their own common.js loading my script
[22:45:50] but I don't know if that's considered an ok way to do business -- just tell people to paste stuff into their common.js
[22:49:44] It's perfectly acceptable to load anyone else's user JS
[22:49:54] whether individual scripts, or their common.js etc
[22:50:03] MW basically lets you do what you want
[22:50:14] I know it's technically possible
[22:50:24] I just mean it isn't discouraged or whatever, as part of instructions/documentation
[22:50:44] No
[22:50:49] Have a look on a big wiki at people's JS pages
[22:50:57] They'll load all sorts
[22:51:01] ok
[22:51:03] And probably can't tell you what most of it does
[22:51:24] ^heh
[22:51:27] nobody checks what they are running :p
[22:51:55] my stuff is in a public github at least :P