[11:14:21] HI
[11:14:38] anyone around dealing with de.wikipedia.org environments?
[11:15:07] I have an issue: for a couple of days now, the extract via api.php has also been putting geocoordinates upfront (for some articles) - it was not like this before...
[11:15:24] Can't really say when this happened, as it was also only forwarded to me
[11:15:29] following url as an example
[11:16:16] https://de.wikipedia.org/w/api.php?callback=angular.callbacks._6&action=query&pageids=6870431&prop=info|extracts|coordinates|pageimages&coprop=type|name|dim&inprop=url&exchars=500&exsectionformat=plain&explaintext&continue=&format=json&pithumbsize=400
[12:21:31] michipa, what do you mean "puts geocoordinates upfront"?
[12:21:51] @Krenair
[12:21:56] if you look at the example url
[12:21:58] michipa, you mean the TextExtracts extension keeps the coordinates?
[12:22:10] right - it prefixes them
[12:22:23] like this for the url
[12:22:25] "extract":"49.11583333333310.3225481\nLehenbuch ist e
[12:22:33] this was not the case before and it only happens for some articles
[12:22:39] and only on de.wikipedia.org
[12:23:16] I can't tell when it started, but it was not like this a few weeks ago... As stated, it was also only forwarded to me
[12:23:39] shorter URL: https://de.wikipedia.org/w/api.php?action=query&titles=Lehenbuch%20(Schopfloch)&prop=extracts
[12:23:58] michipa, only forwarded to you?
[12:24:09] @Krenair:
[12:24:54] I have an application (www.geopedia.de) which does live queries on Wikipedia servers using the api.php interface
[12:25:25] my users there made me aware that the extracts now contain geocoordinates and raised a bug
[12:26:10] that's why "forwarded"... so they named it a bug in my app. :)
[12:28:26] could be https://de.wikipedia.org/w/index.php?title=Vorlage:Infobox_Ortsteil_einer_Gemeinde_in_Deutschland&diff=152852007&oldid=142632460 I suppose
[12:29:35] I am familiar with the Vorlagen (templates) stuff, but I know it was different before...
I'm also not a pro when it comes to Wikipedia processes, so I also don't really know where to report and such
[12:29:40] that's why I try my luck here
[12:30:14] sorry I AM NOT familiar ;)
[12:32:40] maybe MaxSem could help
[12:35:07] mhmm... so how do we bring that information to him?
[12:35:37] I was thinking of writing a parser to remove it again, but I would like to try to fix it at the source first..
[12:37:19] IRC, but he's not online right now
[12:37:54] so keep hanging around and make him aware :-P
[12:38:33] brion: Trying to do a proper grammar in PHP is... interesting.
[12:38:48] heh
[12:39:42] * Coren is currently working on a poc bit of code to parse css enough to be useful yet not be a monster.
[12:40:09] "fun"
[12:40:31] does the minifier do a semi-useful parse already or is that all regexes?
[12:41:39] brion: right now it's mostly regexes to tokenize, but I think I might be able to make this into an fsm. It's not yet clear whether the performance will need that though, because a complicated state machine is fast but opaque and hell to maintain.
[12:42:58] Thankfully, the css syntax is mostly context-free.
[12:44:45] brion: That said, I do have a question for you. Preventing use of url() is fairly trivial, but your notes imply that there are other things we may want to check for - do you have a list somewhere?
[12:50:36] Coren: Sanitizer::checkCss
[12:51:03] that is used for inline style attributes. i'm not sure if we need to add anything for validating whole stylesheets.
[12:51:37] MatmaRex: Hmm. I was under the impression that brion was unsatisfied with that, but I might just be mistaken.
[12:52:13] it is probably insufficient for stylesheets, yes
[12:52:35] first thing that comes to mind is @font-face declarations, which should probably be disallowed in general
[12:54:40] * Coren nods.
[13:09:16] Coren: yeah i have vague impressions there are other scary things lurking about
[13:09:22] behaviors? variables? expressions?
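A rough illustration of the stylesheet checks being discussed here (url(), @font-face, behaviors, expressions), written as a minimal Python sketch. This is not Sanitizer::checkCss and not the parser Coren is working on; the names `RISKY_PATTERNS` and `find_risky_constructs` are made up for this example, and a plain regex scan like this is much weaker than checks run on a real parse tree (CSS escapes such as `u\72l(` would slip past it).

```python
import re

# Hypothetical list of constructs to flag, taken from the ones named in the
# chat. This is an illustrative assumption, not an agreed or complete list.
RISKY_PATTERNS = {
    "url()": re.compile(r"url\s*\(", re.IGNORECASE),
    "@font-face": re.compile(r"@font-face\b", re.IGNORECASE),
    "expression()": re.compile(r"expression\s*\(", re.IGNORECASE),
    "behavior": re.compile(r"\bbehavior\s*:", re.IGNORECASE),
}


def strip_comments(css: str) -> str:
    """Remove /* ... */ comments so commented-out text is not flagged."""
    return re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)


def find_risky_constructs(css: str) -> list[str]:
    """Return the names of flagged constructs found in a stylesheet string."""
    cleaned = strip_comments(css)
    return [name for name, pattern in RISKY_PATTERNS.items()
            if pattern.search(cleaned)]


if __name__ == "__main__":
    sample = "@font-face { font-family: x; src: url(f.woff); }"
    print(find_risky_constructs(sample))
```

The scan errs on the side of flagging too much (for example `url (` with a space), which seems the safer default for a sketch like this.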
[13:10:02] At this point, my objective is to parse the css into a tree; I expect any checks we then want to make on that tree are going to be relatively easy.
[13:12:20] The css formal grammar is pretty funky and full of hysterical raisins but the resulting parse is fairly straightforward.
[13:14:33] *nod*
[13:14:37] sounds good :)
[14:47:56] quintessential11: Hello! OK, so, I assume you want to know how to use the API to fetch information, then format that information somehow. Right?
[14:48:53] Hi. I guess that is closer to what I want than what I was able to explain.
[14:48:54] :)
[14:49:16] Cool, cool
[14:49:28] quintessential11: The jumping-off point for using the Wikipedia API is here: https://en.wikipedia.org/w/api.php
[14:49:49] It has (as of recently) nice documentation (thanks again, anomie) and links to further reading
[14:50:45] Krenair: you about? do you or anyone here know if we can get rid of the pink background of the edit window for "extended confirmed protected" pages?
[14:50:47] quintessential11: You can also read the API documentation at https://mediawiki.org/wiki/Manual:API but most of that is linked from the generated documentation, anyway
[14:51:13] I am busy with other things and am not taking new interruptions right now
[14:51:16] quintessential11: Can you talk a little more about how you want to use the data, and which data you want to fetch?
[14:51:31] np!
[14:52:05] Thanks Mark. Let me just have a closer look at this and get back to you.
[14:52:23] quintessential11: Cool, I'll be here :)
[14:52:43] And so will many other qualified folk, so if I don't answer, don't be discouraged
[16:40:42] * Coren grumbles at the overly permissive grammar.
[16:41:09] brion: You know, I've half a mind to not actually parse css but only a sane, strict subset that we know we want to support.
[16:41:51] The formal syntax is chock-full of 'we don't know when that construct might be used someday so don't touch it just in case' cases that I'm leery of supporting.
[16:42:52] brion: I'm getting the feeling that if we want to restrict it sanely we're better off defining our own supported subset and handling exactly /that/ instead.
[16:43:07] Coren: you could call it "less CSS" ;-)
[16:45:25] Heh
[16:47:10] legoktm: Heh. Given that in CSS, strictly speaking, "$%^#[@@x7]" is a perfectly valid selector that may or may not be meaningful in some future version of a UA... "sane CSS" might be a good name for a subset. :-)
[16:48:47] :P
[16:49:27] valid CSS: +^~{-é:$[!(*)]}
[17:15:28] Steinsplitter: the AbuseFilter/WP0 thing is deployed now
[17:15:48] legoktm: thx
[18:46:37] Coren: Howdy, this has been brought to my attention. Can you help point me in the right direction? https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Wma.wmflabs.org_down
[18:50:42] CKoerner_WMF: Coren does not work for WMF anymore and is not responsible for WMF Labs
[18:50:59] The wiki lied to me! :)
[18:51:07] CKoerner_WMF: for labs related things, see #wikimedia-labs, ping especially Yuvi or Chase
[18:51:14] Will do, thanks.
[18:51:16] {{old}} :)
[18:51:21] CKoerner_WMF: which wiki?
[18:51:44] CKoerner_WMF: 18:48 < yurik> akosiaris and I will be switching maps services to node4.3 now, and will use trebuchet to update maps services
[18:51:59] there seems to be some maps maintenance in progress
[18:51:59] A search for wma.wmflabs.org led me to Phab which led me to Marc. https://wikitech.wikimedia.org/wiki/User:Coren
[18:52:12] *Marc's user page.
[18:52:37] we have an "ex-employee" template somewhere....
[18:53:03] wma is unrelated to any maps work
[18:53:11] wma is all in wmflabs
[18:53:12] I was about to say that
[18:54:33] * CKoerner_WMF is off to labs!
[18:56:17] labs labs labs ... :)
[19:00:43] greg-g, CKoerner_WMF, I've created a minimal template (wikitech didn't seem to have one) and tagged that page.
[19:37:06] so many typos :( https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085255.html
[22:26:14] I'm pretty sure PHP is the single worst language to write a parser in
[22:29:21] * Platonides invites Coren to rewrite it in brainfucker
[22:30:27] ... all right. I'm pretty sure PHP is the single worst non-toy language to write a parser in. :-)
[22:31:19] what about assembler?
[22:35:06] ++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
[22:35:22] Hmmm. Pretty sure I could write a better parser in assembler. At the very least, a proper state machine would be easy. :-)
[22:35:48] well, it would be much easier to make it faster :)
[22:39:19] It's going to be horrid code regardless.
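
For the TextExtracts issue at the top of this log, a client-side workaround of the kind michipa mentions at 12:35:37 (removing the coordinates again) might look like the Python sketch below. It assumes the unwanted prefix always sits on the first line of the extract and consists only of digits, dots, signs and separators, as in the single example quoted at 12:22:25; that assumption is not confirmed anywhere in the log, and the function name `strip_coordinate_prefix` is made up for illustration.

```python
import re

# Assumption: a coordinate prefix is a first line made only of digits, dots,
# signs and separators, and it contains at least one decimal point.
_COORD_LINE = re.compile(r"[-+\d.,;\s]+")


def strip_coordinate_prefix(extract: str) -> str:
    """Drop a leading coordinates-only line from an extract, if present."""
    first, newline, rest = extract.partition("\n")
    looks_like_coords = (
        bool(newline)
        and "." in first
        and any(ch.isdigit() for ch in first)
        and _COORD_LINE.fullmatch(first) is not None
    )
    return rest if looks_like_coords else extract


if __name__ == "__main__":
    # Example value taken from the log (12:22:25).
    print(strip_coordinate_prefix("49.11583333333310.3225481\nLehenbuch ist e"))
```

Requiring a decimal point keeps a first line such as a bare year from being dropped, but this remains a heuristic; fixing the template or the extension at the source, as michipa prefers, would make it unnecessary.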