[21:53:16] Hi guys. Can I ask about a tech problem that likely isn't specific to wikimedia software, but is an HTML problem that came up when editing a wikipedia article?
[21:55:15] I have a Hungarian image thumbnail caption (text set in narrow columns) that has the string "„»Kócos«, »Pufi«, »Kéksapka«," which my browser refuses to break into lines at the spaces, presumably because the algorithm thinks the angle quotation marks are used as in French, but it should break there.
[21:55:39] What's the proper way to edit this text to force the browser to allow a line break at those spaces?
[21:55:51] This is really a unicode question.
[21:56:09] * Platonides is sure there's a unicode character that can be put there
[21:56:57] Platonides: possibly zero width space (U+200B), but is that the right one, and should I put it before or after the spaces?
[21:57:36] yeah, probably: this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
[21:58:00] I'd put it after the space
[21:58:23] so the space is left in the previous line
[21:58:34] it's a funny use of quotes, btw
[21:58:35] I'll try. What I'm afraid of is that the browser will break there, but then there is a space left
[21:59:02] I'd have expected them to be used like «Kócos», «Pufi», «Kéksapka»
[21:59:20] a trailing space on the line should not matter imho
[22:01:07] well, this does seem to allow a line break, but I'm not sure if the space before will be required to take up space within the column width
[22:01:17] it's certainly better than nothing
[22:03:31] It's a pity the browser doesn't just break the text correctly without the extra markings. It could, because the HTML does tell the browser that the article content is in Hungarian, but I presume this is a rare corner case that doesn't come up often.
[22:03:51] you can open a bug to the browser
[22:04:47] not to the browser, no. it's presumably a bug in whatever library it's using to handle text layout, such as Pango or ICU or whatever, or possibly a bug in the unicode standard
[22:05:45] I'd open the bug to the browser
[22:05:56] and let them figure out if that is a problem down the stack
[22:06:00] what browser is it?
[22:06:09] and, are other browsers implementing it correctly?
[22:06:12] I'll try to search and read up a bit on this later, but now I'll swap the stack and continue the edits I wanted to do
[22:06:41] I haven't tried any other browser, and it's Firefox 45.9.0 debian version
[22:07:07] https://en.wikipedia.org/wiki/Quotation_mark#Hungarian is a good link to provide them
[22:08:46] Platonides: no, not really. That page doesn't account for the fact that some texts (possibly older ones) use high-6 quotation mark instead of high-9 for closing quotations
[22:09:34] it seemed useful in order to understand the » « use
[22:09:56] chrome doesn't seem to move it either
[22:10:25] And I still don't think it's the browser's fault. They're probably just both using pango, and they should. Text layout is complicated, and each program shouldn't reimplement it.
[22:10:45] I don't think firefox uses Pango, btw
[22:10:50] I mean, it could perhaps be some other library than pango, but still.
[22:10:58] It must use something.
[22:11:02] It doesn't? Wtf.
[22:11:12] Not even recent firefox?
[22:12:46] it seems that it is optional to build firefox with Pango support
[22:13:20] then you have the issue of what is used by Debian firefox…
[22:13:40] Optional in the sense that you can make a custom build without pango if you want to run it on some really low memory device, or optional in that it's some new experimental option that is currently too buggy to enable in mainline firefox?
[22:14:04] Anyway, I'm not even sure it's a bug in pango. Unicode has a specification for where to break lines: http://www.unicode.org/reports/tr14/
[22:14:42] depending on character properties
[22:14:49] let me try to understand how this works
[22:16:07] good luck
[22:16:20] thanks :-)
[22:16:27] if it's really a bug in unicode, then I probably won't pursue it
[22:16:41] but I might still want to look up the correct workaround there
[22:17:20] you _could_ open them a bug
[22:17:39] but that's harder than doing so to the browser :)
[22:18:27] Or I could open a bug to ICU and let them figure it out. I've filed a bug for ICU once and they reacted quickly.
[22:20:18] of course for that I'd have to actually test whether ICU detects a line break opportunity there
[22:32:34] Ok, so I think putting a zero-width space after the space is the correct workaround.
[22:41:52] Also it seems that this is not a bug in the unicode standard: the standard allows for both behaviors, so it's a quality-of-implementation issue for whatever library is used.
[22:43:39] The default rules they recommend give no break there, because they don't let you break before the quotation mark "»"
[22:44:00] but they also specifically recommend that quotation marks can be treated differently if the language is known
[22:52:35] So I can either file a bug with mozilla, which I don't think is very useful, or check what Pango and/or ICU do and file a bug if they're handling this wrong, or I can just ignore the whole thing.
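
The workaround settled on in the log (a zero width space, U+200B, placed after the space that precedes "»") can be sketched in Python roughly as follows. The function name `allow_break_before_quote` is hypothetical and not from the discussion, and the code only edits the string; whether a particular browser then breaks the line at that point is not something it can verify.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character suggested in the log


def allow_break_before_quote(text: str, quote: str = "»") -> str:
    """Insert U+200B between a space and a following `quote` character.

    The space stays on the previous line, as suggested in the discussion;
    the zero width space sits directly in front of the quotation mark.
    """
    # Zero-width match: preceded by a space, followed by the quote.
    pattern = r"(?<= )(?=" + re.escape(quote) + ")"
    return re.sub(pattern, ZWSP, text)


caption = "„»Kócos«, »Pufi«, »Kéksapka«,"
fixed = allow_break_before_quote(caption)

# Show the inserted characters explicitly, since U+200B is invisible.
print(fixed.encode("unicode_escape").decode("ascii"))
```

The first "»" is left alone because it follows "„" rather than a space. In HTML or wikitext source the same character can also be written as the numeric character reference `&#x200B;`, which is easier to spot when editing than the raw invisible character.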