[17:49:01] gwicke: how's it going?
[17:49:29] TrevorParscal: hey, I was just writing a reply to your post on wikitext ;)
[17:49:34] cool
[17:49:38] i won't interrupt you
[17:49:53] we can talk more after I read it - it's good to have this stuff on list
[17:50:12] TrevorParscal: repairing mis-nested or overlapping annotations in the parser could be an issue for round-tripping
[17:50:29] you may misunderstand what I mean then
[17:50:48] I think your approach of pushing it to the serializer makes sense
[17:50:52] it's not a matter of repairing the original source, it's just a matter of ensuring valid HTML output
[17:50:56] yes
[17:51:15] but handling nonsensical parse trees is no fun..
[17:51:22] we should be able to have serializers that enforce their own rules and take their own actions to make things sane for the given format
[17:51:56] and the nesting thing is actually not too difficult, I wrote a simple stack-based approach that works very efficiently for the visual editor
[17:52:05] I don't have a good strategy for the grammar yet
[17:52:12] it's working from the linear data model though
[17:52:15] apart from adding ad-hoc productions
[17:52:34] as in: http://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures
[17:53:00] gwicke: I'm happy to help think through this stuff with you
[17:53:35] but I don't have very much experience with developing grammars
[17:53:36] grammars don't like overlapping structures
[17:53:40] so I feel you either end up using the parser just as a tokenizer
[17:53:41] sure
[17:53:46] and then write your parser manually
[17:54:00] I am trying to follow the HTML 5 spec re parsing
[17:54:08] and mis-nesting repair etc
[17:54:20] gwicke: I think the output being based on ranges of text with "something" applied to them could really free the parser to be iterative
[17:54:23] but the spec is mostly in terms of manually-maintained stacks of tokens..
[17:54:30] as in, look for italics, then look for bolds, then??? etc.
[17:54:41] rather than forcing it to push into a tree in one pass
[17:54:47] then nesting doesn't matter at all
[17:55:00] and the precedence can be defined and documented
[17:55:11] I believe there's already a precedence like this in Wikitext
[17:55:29] so, following that would likely produce more similar parse results as well
[17:55:31] all the tick stuff is currently handled by ad-hoc rules
[17:55:49] but again - I think what you are doing is its own complex space, and I have limited experience there
[17:56:03] I know how the current parser does it
[17:56:05] it's scary
[17:56:13] it just ignores nesting
[17:56:25] so you push the trouble to tidy..
[17:57:00] the parser I wrote breaks things into blocks with an ad-hoc "block" parser, then parses the blocks using an ad-hoc "span" parser
[17:57:13] but I never really finished it
[17:57:17] and I think what you are doing is better
[17:57:25] yes - I know that we push all the complexity to tidy
[17:57:37] I think that's actually OK when you consider WikiDom
[17:58:00] because unlike in HTML, WikiDom supports overlapping annotation regions
[17:58:08] I also have the feeling that parsing span/phrasing stuff later makes sense
[17:58:13] and it's - as we said - up to the serializer to decide how to solve for that
[17:59:13] anyways, I will let you respond to the email, like I said, it's usually good for things to be on list - but I'm open to talking about this stuff with you any time
[17:59:40] :) yay lists
[17:59:42] I'm going to define WikiDom in a more formal and documented way
[17:59:44] today
[17:59:50] TrevorParscal: thanks! will get back to you.
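The stack-based repair mentioned above (overlapping annotation ranges in the linear model, with the serializer responsible for emitting valid HTML) can be sketched roughly like this. The data shapes (`{ tag, start, end }` annotations over a plain string) are illustrative guesses, not the real VisualEditor linear-model format:

```javascript
// Serialize a string plus possibly-overlapping annotation ranges into
// well-formed HTML. Overlaps are resolved by closing and reopening tags,
// so the output nests correctly even when the input ranges cross.

function openTags(anns) {
  return anns.map(function (a) { return '<' + a.tag + '>'; }).join('');
}

function closeTags(anns) {
  // close in reverse order of opening to keep nesting valid
  return anns.slice().reverse().map(function (a) {
    return '</' + a.tag + '>';
  }).join('');
}

function serialize(text, annotations) {
  var html = '', stack = [], i;
  for (i = 0; i < text.length; i++) {
    // annotations covering this character, in document order
    var active = annotations.filter(function (a) {
      return a.start <= i && i < a.end;
    });
    // find how much of the open-tag stack is still valid; anything
    // after the first mismatch must be closed (the repair step)
    var keep = 0;
    while (keep < stack.length && keep < active.length &&
           stack[keep] === active[keep]) {
      keep++;
    }
    html += closeTags(stack.slice(keep));
    stack = stack.slice(0, keep);
    // open newly active annotations
    html += openTags(active.slice(keep));
    stack = stack.concat(active.slice(keep));
    html += text[i];
  }
  html += closeTags(stack);
  return html;
}
```

With italics over characters 0-3 and bold over 2-4 of `'abcd'`, this yields `<i>ab<b>c</b></i><b>d</b>`: nesting in the input doesn't matter at all, and the serializer alone guarantees validity.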
[17:59:56] and also look into different techniques for validating it
[18:00:14] and need to change definitionterm to term in the html serializer ;)
[18:00:45] *TrevorParscal looks at http://code.google.com/p/jsonvalidator/
[18:00:46] TrevorParscal: pre is needed as well, just added some support for it
[18:01:12] ah, yes, we will need to add that
[18:02:19] oh- and there's one more dom question: list items can contain other block/flow content, but are currently structured to only contain annotated text
[18:02:49] I'm not sure about that
[18:02:59] simpler for now
[18:03:06] what can you put in a list other than a line of annotated text?
[18:03:07] but will need to be changed at some stage
[18:03:18] tables for example
[18:03:27] any flow content according to the html 5 spec
[18:03:31] I don't think you can put a table in a list item
[18:03:43] according to the wikitext parser
[18:03:51] a table starts on a new line
[18:03:56] using html syntax
[18:04:03] ah, ok, that's interesting
[18:04:18] even tried it in the wiki this time, with tidy enabled
[18:04:20] ;)
[18:04:22] so, we can make it possible to have list items contain any number of other nodes
[18:04:39] it's highly configurable
[18:04:48] it's like 3 lines of code or something :)
[18:05:04] yep, just wanted to bring it up before I forget
[18:05:14] I will talk to the other VE guys about it
[18:05:15] should be simple
[18:05:37] so, it will mean that listItem will contain a paragraph in most cases, but could contain other things too
[18:05:51] any flow content really
[18:06:12] what do you mean by flow content in this context?
[18:06:26] similar to blocks in html 4
[18:06:30] sure
[18:06:32] they changed the term to flow in 5
[18:06:46] phrasing ~ inline
[18:06:48] ok, so that's what I mean by listItems being able to have child nodes, rather than just content
[18:07:09] yep
[18:07:23] right now - headings, paragraphs and listItems are "leaf" nodes, and everything else is a branch node
[18:07:34] well, I guess horizontalRules and comments will be leaves too - you get the idea
[18:07:44] so, yes, we are on the same page here
[18:08:13] WikiDom is a little more strict than HTML about what can be in what, but you can reproduce things just fine
[18:08:40] I am trying to match the content model in the grammar as far as possible
[18:08:51] with general inline productions etc
[18:09:02] for instance a tableCell can't contain content directly, but it can contain a paragraph
[18:09:09] right on
[18:09:42] that is stricter than html 5
[18:10:03] but is just the side effect of not accepting blank text nodes ;)
[18:10:39] TrevorParscal: do you feel that round-tripping overlapping annotations should reproduce the overlapping syntax?
[18:11:07] that's going to be up to 2 different factors
[18:11:43] 1. are we going to be normalizing wikitext on the round trip (see: http://www.mediawiki.org/wiki/Visual_editor/Software_design#Dirty-diffs )
[18:12:03] 2. how we write the Wikitext serializer
[18:12:43] i'm personally a fan of the idea of normalization at its face, but also see that it can have some pretty nasty edge cases
[18:13:12] reconciliation is a very interesting way to go, but it's very complex and I'm not sure it's always going to produce better results either
[18:13:42] the results might look a bit inconsistent, but the diffs would be better (if it can be made to work)
[18:13:54] agreed
[18:14:18] so, it's sort of this situation where we have to choose one of these 3 approaches and hopefully we choose well
[18:14:42] there may also be a 4th approach, or a way to use some of one and some of another
[18:14:54] but these are the only ones I can think of atm
[18:15:19] for me all fix-ups in the parser seem to be in tension with round-tripping wikitext
[18:15:42] which is why normalization allows us to forge ahead so much faster
[18:15:54] but the cost is a one-time normalization of every page on the wiki
[18:16:27] which could be done by a bot, and a change message would say "Normalizing" or something - but it's not ideal, ideal is no visible transition
[18:16:42] that will be very hard to achieve though
[18:16:51] and anytime you revert past the normalization point, the system would have to re-normalize
[18:16:57] agreed
[18:17:10] so, it's just a matter of, what do we want to spend our time on
[18:17:55] long-term normalization sure is nicer
[18:17:58] I mean, if we end up spending 80% of our time avoiding a visible transition - is that really worth it?
[18:18:16] but if we don't get normalization right, it's likely to cause other problems
[18:18:45] so, we have to measure the complexity of these different approaches in the short and long term, and choose wisely
[18:18:52] not easy I'm afraid :)
[18:19:13] for now I have enough to chew on with the parser, even without major fix-ups..
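The content model discussed above (branch nodes with children, leaf nodes with annotated content, a listItem wrapping a paragraph) might look something like the following. The property and type names here are guesses based on the conversation, not the actual WikiDom schema:

```javascript
// A hypothetical WikiDom-style fragment: document > heading + list,
// where the listItem contains a paragraph node rather than raw content.
// Overlapping annotation ranges are fine in the leaf's linear content.
var doc = {
  type: 'document',
  children: [
    { type: 'heading', level: 1,
      content: { text: 'Title', annotations: [] } },
    { type: 'list', children: [
      { type: 'listItem', children: [
        { type: 'paragraph',
          content: { text: 'item text', annotations: [
            // overlapping italic/bold ranges, legal in this model
            { type: 'textStyle/italic', range: { start: 0, end: 6 } },
            { type: 'textStyle/bold', range: { start: 5, end: 9 } }
          ] } }
      ] }
    ] }
  ]
};

// Branch vs. leaf can be derived from the presence of `children`.
function isLeaf(node) {
  return !node.children;
}

function countLeaves(node) {
  return isLeaf(node) ? 1 :
    node.children.reduce(function (n, c) { return n + countLeaves(c); }, 0);
}
```

Here `countLeaves(doc)` is 2 (the heading and the paragraph); document, list, and listItem are all branch nodes.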
[18:19:24] but all objective decision making aside, I believe the smart money is on normalization
[18:19:25] but somehow this still looms above it all
[18:20:00] i would also like to get some real feedback from the community about how they really feel about the idea of pre-save normalization
[18:20:35] maybe a simple form of reconciliation can ease the pain a bit
[18:20:52] because the perceived cost of this visible transition may be grossly under or over estimated
[18:20:55] with a 'touched' bit on major blocks or something
[18:21:24] and normalization occurs on touched bits (here's the hybrid model already!)
[18:21:35] ;)
[18:22:46] ok, I'll continue to tweak the parser for mostly-correct syntax for now
[18:23:06] without spending too much time on fix-ups yet
[18:28:21] RoanKattouw: are you around?
[18:28:55] yes
[18:29:25] RoanKattouw: looks like there is some undeployed stuff in the wmf branch other than mine
[18:30:11] What paths?
[18:30:23] For recent commits in the branch see https://www.mediawiki.org/wiki/Special:Code/MediaWiki/?path=/branches/wmf/1.18wmf1
[18:30:49] U languages/messages/MessagesEn.php
[18:30:58] U extensions/WikimediaMessages/WikimediaMessages.i18n.php
[18:31:45] hmm those are siebrand's probably
[18:31:49] Yes
[18:33:20] RoanKattouw: where would you put the threshold for using scap instead of doing individual sync-dir/file?
[18:33:40] ~6, I guess
[18:33:47] Logging is broken (again!) atm though
[18:34:07] right
[18:34:12] I just fixed it
[18:34:16] Until puppet goes and reverts it again
[18:34:21] I'll start deploying it now
[18:34:27] OK
[18:41:21] RoanKattouw: fatalmonitor a bit broken?
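The 'touched'-bit hybrid floated above could be sketched as follows: each top-level block keeps the original wikitext it was parsed from plus a flag set when the editor modifies it, and on save only touched blocks go through the normalizing serializer while untouched blocks emit their original source verbatim (so no dirty diffs appear in unedited text). The names (`origSource`, `touched`, `serializeBlock`) are illustrative, not real VE or parser APIs:

```javascript
// Selective re-serialization: reuse original wikitext for untouched
// blocks, normalize only the blocks the user actually edited.
function serializeDocument(blocks, serializeBlock) {
  return blocks.map(function (block) {
    return block.touched ? serializeBlock(block.model) : block.origSource;
  }).join('\n');
}
```

For example, with a heading left alone and a second block edited, only the second block's output is normalized while the heading round-trips byte-for-byte.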
[18:41:33] *RoanKattouw looks
[18:41:41] Well of course
[18:41:46] Puppet put back the old broken version of that one too
[18:41:54] *RoanKattouw ambushes Ryan_Lane
[18:42:05] *Nikerabbit hands over the spears
[18:43:43] Ryan_Lane: I've got all sorts of fixes for various things ranging from occasionally annoying to very annoying (such as sync logging breakage) sitting in gerrit for almost a week (and one for 2-3 weeks), can we deploy some of those please?
[18:44:15] TrevorParscal: are you in the office today?
[18:44:27] yes
[18:44:28] RoanKattouw: sure, let me know which ones are most important right now
[18:44:35] hiding behind my monitor quite well I guess
[18:44:57] https://gerrit.wikimedia.org/r/#change,667 was breaking sync logging
[18:45:00] Ah
[18:45:02] Fixes it, I mean
[18:46:33] The rest is less urgent, mostly "I thought I was done with this 2 weeks ago" stuff. I guess if you review one other rev it should be https://gerrit.wikimedia.org/r/#change,657 , because that's a dependency for the WebFonts deployment later this month
[18:50:47] ok. pushed those two
[18:50:56] I'll try to review the rest today
[18:51:36] Thanks
[18:51:54] Nikerabbit: Ryan pushed the WebFonts change, yay :)
[18:52:06] it'll take a while to go out to all the apaches ;)
[18:52:14] or wherever it's going
[18:53:09] Yeah, Apache
[18:53:22] are you cherry-picking specific changes or how are you doing it?
[18:53:23] It should be working by tomorrow, though, I think
[18:53:32] cherry-picking
[18:53:44] ok, just checking
[19:18:38] who controls SVN? Is it brion or Tim?
[19:18:45] not me
[19:19:06] RoanKattouw: what change?
[19:19:07] can't seem to make branches any more
[19:19:13] RoanKattouw: ah the caching
[19:19:15] brion: ur brion vibber?
:D
[19:19:23] yep :)
[19:19:32] hah
[19:19:47] Nikerabbit: Aye
[19:19:53] so ur the guy who looked in -ops for people with shell access :)
[19:20:00] yeah they hide
[19:20:10] and won't answer me in our private channel, so went looking for em ;)
[19:20:20] <^demon|away> neilk_: What's up?
[19:23:06] TrevorParscal: are you still around?
[19:23:13] yes
[19:23:39] re the pre stuff - it might make sense to model that as two different kinds of pre
[19:24:12] hmm- or maybe not
[19:24:24] all the indented pre allows inline
[19:24:33] so that should be covered by annotations
[19:25:06] just forget about it ;)
[19:26:34] wikitext serialization might still be an argument in favor though
[19:26:34] no worries
[19:27:23] otherwise there is quite a lot of normalization on pre areas, with everything ending up either as tags or indented
[19:29:10] but of course anything with both html and wiki syntax has the same problem
[19:31:01] in general, should there be some attempt to preserve html vs. wiki syntax?
[19:31:56] yeah, i think there should be a way to mark something as having a wikitext or html origin
[19:32:16] could be in the data of an annotation or an attribute of a node or something
[19:32:34] "source/origin": "html"
[19:32:37] something like that
[19:33:09] ok
[19:36:28] gwicke: http://www.mediawiki.org/wiki/Visual_editor/WikiDom
[19:36:35] some initial documentation of WikiDom
[19:37:19] nice!
[19:38:08] would that be a good place to document issues I see or additions I make?
[19:39:12] or do you prefer a mail?
[19:41:39] http://www.mediawiki.org/wiki/Talk:Visual_editor/WikiDom
[19:43:11] yeah, we can use the talk page there I guess
[19:43:16] why isn't this LQT?
[19:43:18] crap
[19:43:41] TrevorParscal: {{#useliquidthreads:1}}
[19:43:58] on the talk page?
[19:43:59] ps, could there be something of a sneak preview of vis.ed sometime on the blog or something?
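The origin-marking idea discussed above (a node or annotation records whether it came from HTML or wikitext syntax, so the serializer can round-trip the author's original choice) could look roughly like this for pre blocks. The attribute name and node shape are hypothetical:

```javascript
// Serialize a pre node back to wikitext, preserving whether the author
// originally wrote <pre>...</pre> or space-indented lines.
function serializePre(node) {
  if (node.attributes && node.attributes.origin === 'html') {
    return '<pre>' + node.text + '</pre>';
  }
  // wikitext origin: a leading space on each line marks preformatted text
  return node.text.split('\n').map(function (line) {
    return ' ' + line;
  }).join('\n');
}
```

Without such a marker, every round trip would normalize pre areas to one of the two syntaxes, producing exactly the dirty diffs discussed earlier.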
[19:44:31] might be me having the parserplayground editor enabled
[19:44:35] i notice people seem to be totally unaware of the kind of work being done.
[19:44:42] gwicke: http://www.mediawiki.org/wiki/Talk:Visual_editor/WikiDom
[19:45:10] thedj: we are releasing a sandbox where people can play with it next month
[19:45:17] ah cool !
[19:45:22] TrevorParscal: ahh ;)
[19:45:25] how advanced it will be isn't clear, but we are working hard on making it as good as possible
[19:47:09] TrevorParscal: I seem to remember something about underscore.js usage instead of jquery - is this important right now?
[19:47:38] we were talking about if we wanted to start using that
[19:47:42] I like the lib a lot
[19:47:49] some of the functionality is already in jquery
[19:47:56] TrevorParscal: Yes, on the talk page
[19:48:13] underscore has some nice functional bits
[19:48:16] but also, some of the stuff we are doing is very performance critical, and using the functional stuff isn't good for that
[19:48:22] http://blog.trevorparscal.com/
[19:49:03] yeah, I removed the $.each uses I added after seeing the jsperf numbers
[19:49:08] yeah
[19:49:26] too bad function call overhead is so high in JS
[19:50:12] aggressive inlining and jit might be a difficult combination to get right
[19:50:20] yeah
[19:54:58] gwicke: I suggest using jshint
[19:55:05] what IDE/editor do you use?
[19:55:12] gvim
[19:55:56] http://www.vim.org/scripts/script.php?script_id=3576 ?
[19:56:17] lots to choose from there -> https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=gvim+jshint
[19:56:25] just arrived there as well
[19:57:03] :)
[19:57:53] I notice you mostly have missing semicolons here and there
[19:57:56] nothing major
[19:58:50] also, you should place operators that join multiple lines together at the end of the line being broken, rather than at the beginning of the continuation line
[19:59:11] otherwise magic semicolons can haunt your kitchen cabinets until you die
[19:59:32] ok
[20:01:21] just can't get used to those tabs ;)
[20:19:01] *gwicke still tries to install jshint on Debian, which fails with npm < 1.0 (and Debian has 0.2.19..)
[20:43:40] fatalmonitor shows lots of timeouts for extension distributor
[20:44:08] Yeah someone mentioned ED was broken
[21:37:55] jorm: http://www.pcworld.com/article/243279/modern_warfare_3_thieves_crash_into_van_carrying_6000_copies.html
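The "magic semicolon" hazard behind the trailing-operator advice at 19:58 is JavaScript's automatic semicolon insertion: a line that already parses as a complete statement can get a semicolon inserted at the line break, while a line ending in an operator cannot. The classic trap:

```javascript
// ASI inserts a semicolon right after `return`, so this function
// returns undefined; the `1 + 2;` below becomes dead code.
function broken() {
  return
    1 + 2;
}

// With the operator at the end of the broken line, the statement is
// incomplete at the line break, so no semicolon is inserted.
function working() {
  return 1 +
    2;
}
```

Here `broken()` yields `undefined` while `working()` yields `3`, which is exactly the class of bug jshint flags and the line-breaking convention avoids.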