[11:41:48] Technical Advice IRC meeting starting at 3 pm UTC/5 pm CEST in channel #wikimedia-tech, hosts: @addshore & @C_Fisch (WMDE) - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[13:00:42] https://species.wikimedia.org/w/index.php?title=%D0%93%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B0&action=edit&lintid=4186
[13:00:59] Can someone please tell me why LintError doesn't like this and what the fix is?
[13:09:41] Also - https://species.wikimedia.org/wiki/Maria_Helena_M._Galileo
[13:09:52] LintError says it has a missing tag... NO missing tag found
[13:11:44] ShakespeareFan00: wild guess: because you have 3 {| but only two |} ?
[13:11:56] Can you implement a fix?
[13:12:00] no.
[13:12:18] where to see the LintError for https://species.wikimedia.org/wiki/Maria_Helena_M._Galileo ?
[13:12:28] impossible for anyone to take a look without info on how to reproduce.
[13:13:27] andre__: Right, so you are unable to help?
[13:13:29] Typical
[13:14:15] ShakespeareFan00: What is "typical"?
[13:14:51] The inability of people in the Wikimedia/MediaWiki community to actually provide technical advice unless it's in a tight format...
[13:14:56] It's nothing personal
[13:15:03] ShakespeareFan00: I took the time to take a look at your link and I gave you a pointer. I cannot "implement a fix" because I am not a developer.
[13:15:14] Oh, sorry
[13:15:20] I mean, I can also ask you to build a nuclear power plant for me and then complain that you don't have the skills.
[13:15:44] andre__: Sorry... It's frustrating trying to pin down technical errors
[13:43:03] https://en.wikisource.org/w/index.php?title=Page:Rothschild_Extinct_Birds.djvu/16&action=edit
[13:43:10] It's saying it has fostered content
[13:43:15] It doesn't
[13:43:57] The content which is nominally fostered is a template inclusion, which is needed because the parser is too limited to handle continuations of a table gracefully
[13:44:28] A long-term fix which doesn't involve workarounds like {{nop}} would be desirable
[13:58:19] ShakespeareFan00: well.. it does.. sort of.
[13:58:32] Yes
[13:58:45] And how do you suggest the error is removed?
[13:58:57] it has three separately parsed pieces of wikicode, which then get sanitized and then get concatenated..
[13:59:33] Yes
[13:59:53] i think... or does sanitization happen after the concatenation.. /me can't remember..
[13:59:56] Which leads to the use of {{nop}}, because no-one's yet had the guts to take an axe to the parser
[14:00:13] and PROPERLY implement headers and footers as separated from the main text
[14:00:39] so that 'clever' tricks like {{nop}} that rely on very precise behaviour aren't needed
[14:01:02] I have been moaning about this for nearly a decade
[14:01:22] and as yet no-one's got round to implementing a solution
[14:01:42] because messing with the parser is so freaking hard.
[14:02:13] Maybe it's time to say the parser needs a complete FROM SCRATCH re-write
[14:02:51] Such that the Wikisource LST/transclusion use case is considered and designed into the core from the start
[14:02:52] it has. It's called Parsoid. and then people complained, so they reimplemented almost every bug of the original into Parsoid :)
[14:03:58] Sometimes I think the Wikimedia community needs to be given a kick in the [REDACTED] to actually fix things, like getting rid of all the clever bug-dependent implementations they've used
[14:05:08] There's already a massive change to the parser being planned...
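To make andre__'s guess concrete, here is a minimal, made-up wikitext sketch (not the actual content of the linked pages) of the kind of imbalance that surfaces as a "missing tag" LintError: a table opened with {| and never closed with |}.

{| class="wikitable"
|-
| A || B
|-
| C || D
<!-- no matching |} here: the table is never closed, so the Linter
     reports the page as having unbalanced/missing table markup -->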
[14:05:14] Hence the whole thing about LintErrors
[14:05:15] ShakespeareFan00: yeah they do, they are a bit spoilt. Monopolies stifle innovation.
[14:05:37] It's a shame that there seems to be a lack of momentum to actually resolve the issues that will result
[14:05:43] This is not professional
[14:06:04] even though the Wikimedia community is mostly volunteers
[14:06:43] Of course, when I've tried to file Phabricator tickets about this, a small clique of users refuses to listen unless it's in a very precise technical format only they seem to understand
[14:07:24] which is perhaps understandable when you are dealing with a specific fault.
[14:08:01] It's less appropriate when you are a contributor trying to say that more than one fundamental design flaw needs to be looked at again.
[14:08:28] I think the general opinion is 'don't split tables'.
[14:08:33] thedj
[14:08:47] On other sites that's an option
[14:08:53] It's not at Wikisource
[14:09:16] So as such the parser is "broken" for the Wikisource use case
[14:09:23] which would need split tables
[14:09:26] no, on other sites it isn't entirely needed, since they don't use three separate content slots.
[14:09:36] Quite
[14:09:38] but Wikisource does.
[14:10:09] Thus, someone needs to figure out how to PROPERLY implement split content...
[14:10:26] Ideally the Page header and footer should be separate fields in how the content is stored
[14:10:42] they are (for Wikisource)
[14:10:51] thedj: They are NOT
[14:10:52] but that also means they need to be balanced
[14:11:01] what is opened needs to be closed.
[14:11:10] within that same slot.
[14:11:25] Currently Wikisource stores the header/footer within a single page text field
[14:11:52] In the DB model the header/footer/page content should be in different fields... and only composited together when rendered
[14:12:03] I.e. the page should be composited BEFORE the parser sees it
[14:12:24] which is more complex than at present
[14:12:32] hmm...
[14:12:35] lemme check something
[14:13:31] On a similar note, LST and things like it should compose the raw text into one document first and then PARSE
[14:14:17] Currently a parse-as-rendering approach is used, which leads to some of the issues, such as split tables needing {{nop}}, lists that have to drop back to raw HTML, and so on
[14:16:45] If on the other hand there was a way of effectively deferring the processing of certain things until the page was completely built ....
[14:17:18] Then the utility of LST becomes much greater
[14:18:01] new WikitextContent(
[14:18:04] $this->header->getNativeData() . "\n\n" . $this->body->getNativeData() .
[14:18:04] $this->footer->getNativeData()
[14:18:07] );
[14:18:23] hmm, maybe it's those line breaks.....
[14:18:53] can you check if the header and footer behavior is the same ?
[14:19:08] thedj: I'm not sure what you mean
[14:19:25] I don't have access to the source code... I'm only a disappointed user
[14:20:10] If you are considering line breaks, I will note that precisely how line feeds are handled with LST and ProofreadPage is another area where confusion arises.
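To illustrate why the snippet quoted above matters, here is a hedged sketch (invented page text, and assuming the stored body text happens not to end with a newline) of what the parser receives when header, "\n\n", body and footer are joined that way and the table end marker is stored in the footer:

{| class="wikitable"

|-
| 1875 || Example Act|}

The blank line after the table start comes from the "\n\n" appended to the header; nothing at all is inserted between body and footer, so the |} does not begin a line and is not recognised as a table end. That is the situation the {{nop}} workaround discussed here papers over.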
[14:20:42] the \n might force a line break between the header and the body. There are no such line breaks between the body and the footer, which means that the same table, when divided over body/footer instead of header/body, might show a different result.
[14:20:44] I was thinking a while ago that, for things like a table start on a page, there would need to be some way of indicating this externally from the text
[14:21:03] thedj: Hmmm
[14:21:33] Well, I will note that in some use cases at Wikisource, I've had to put a {{nop}} in the footer to get it to recognise a |} table end marker
[14:22:28] I had thought one solution was to explicitly record in the DB, for a Page:, whether there were continued constructs that would need to be 'balanced' inside ProofreadPage...
[14:22:47] Things like nested divs as well...
[14:23:11] hmm, do you have a ticket reference for this ?
[14:23:42] thedj: Not sure... If there was one it was closed long ago due to lack of interest, and people saying I hadn't communicated the problem well enough
[14:24:02] I won't stop you opening a ticket on this
[14:24:25] given that someone already had one open about tracking "Proofread page status" in the database rather than in page text
[14:25:11] A page property to tell the parser/LST that a page body needs different handling because of 'continued' markup like tables/lists etc.. might be useful
[14:25:33] The other consideration was perhaps to have pre-processor directives like various programming languages have...
[14:26:25] to explicitly tell the parser... the content following is a table row, so override what you would have done and do Y specifically
[14:27:33] thedj: Aside: Currently in ProofreadPage, page numbers are rendered as numbers, unless specified otherwise on the index page...
[14:28:01] wait, why is the nop needed here ? If I remove it, i see no difference..
[14:28:13] thedj: The nop is needed on the transclusion
[14:28:24] If it's not present, you get broken rows
[14:28:37] ah, so tidy cleans it up....
[14:28:37] and missing page numbers...
[14:28:40] Yep
[14:28:48] f'ing tidy
[14:29:33] Which is why perhaps there need to be directives to tell tidy "Yes I know what I'm doing, [REDACTED] my content alone!"
[14:30:00] nah, we are getting rid of tidy.. that's what most of those LintErrors are about...
[14:30:07] Quite
[14:30:17] right, so i think then that it is indeed those \n\n's being added...
[14:30:48] Well, I would be more general and say it's an issue of when implied whitespace/line feeds should or should not be added..
[14:30:50] mess with the parser result. tidy fixes it, unless you transclude, because then it has way more things to fix...
[14:30:51] Technical Advice IRC meeting starting in 30 minutes in channel #wikimedia-tech, hosts: @addshore & @C_Fisch (WMDE) - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:31:16] There are some tickets about paragraph breaks in footnotes being a pain as well...
[14:31:38] thedj: Can I leave this issue with you?
[14:32:03] I think the \p is there because, traditionally, in 'old-style' Wikisource, before there was a convention, people added multiple linebreaks between the header and the footer....
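A hedged wikitext sketch (invented content, following the workaround ShakespeareFan00 describes above) of the footer field of a Page: whose table has to be closed there:

<!-- footer field stored naively: after concatenation the |} may end up
     glued to the last line of the body and so is not treated as a table end -->
|}

<!-- footer field with the workaround: {{nop}} ensures the |} starts on its
     own line once the fields are joined, so the table end marker is recognised -->
{{nop}}
|}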
[14:32:05] I am going to take an extended wiki-break, possibly due to getting frustrated trying to solve LintErrors this morning
[14:32:07] \n
[14:32:15] thedj: True
[14:32:24] between the header and the body
[14:32:41] thedj: I'll also mention templates like this - https://en.wikisource.org/wiki/Template:Statute_table/header
[14:33:03] which is used extensively for something like this... - https://en.wikisource.org/wiki/Chronological_Table_and_Index_of_the_Statutes
[14:33:05] and which took a LOT of stress to get right
[14:33:26] i think this ticket might be related, right: https://phabricator.wikimedia.org/T138604 ?
[14:33:46] That wasn't a ticket I was aware of...
[14:33:58] So I'm not sure.... but possibly...
[14:34:11] Anyway I need to take a break from this before I get annoyed again
[14:34:19] do that :)
[14:34:36] thedj: Was it you that was working on moving ProofreadPage status into the DB so it can be queried via Quarry?
[14:34:57] Oh and if you are able...
[14:35:16] Proposing a technical meeting SPECIFICALLY to handle Wikisource issues is strongly hinted at..
[14:35:18] * ShakespeareFan00 out
[14:38:48] ShakespeareFan00: no, wasn't me
[14:45:45] Just thought of something else...
[14:45:52] MediaWiki templates can be subst'd
[14:47:12] However... Many times on Wikisource I've found what I actually needed wasn't so much a dumb subst as a {{macro:template name}} that effectively did what a subst does, but cleaned up so that only the relevant output was placed on the page..
[14:47:32] Currently templates using parser functions can't be cleanly subst'd
[14:47:50] because subst just does a raw copy and replacement...
[14:48:50] The Chronological Table I linked earlier would be a lot cleaner with macro:Statute table vs. actual template calls..
[14:49:39] Generally on Wikisource, the underlying contents of a table won't change... so in terms of render performance, doing what is essentially a call to a 'static' function might be overkill
[14:50:24] As this may be applicable to other wikis, I'd like to know where to propose a {{macro:
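A hedged wikitext sketch of the distinction being proposed, using a hypothetical Template:Statute row that contains a parser function (the template name and contents are invented for illustration):

<!-- Template:Statute row (hypothetical) -->
| {{{year}}} || {{#if:{{{note|}}}|{{{note}}}|no note}}

<!-- {{subst:Statute row|year=1875|note=repealed}} does a raw copy with the
     parameters filled in, so the unevaluated #if remains in the saved text: -->
| 1875 || {{#if:repealed|repealed|no note}}

<!-- the proposed {{macro:Statute row|year=1875|note=repealed}} would instead
     save only the fully evaluated output: -->
| 1875 || repealed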