[06:11:49] https://wikitech.wikimedia.org/wiki/Nova_Resource_Talk:Tools/Help#Hosted_jQuery_etc. [06:12:18] what could be the URL to load a copy jquery.min.js at on bits.wikimedia.org? [06:17:36] https://en.wikipedia.org/w/load.php?modules=jquery is wrapped in a mw.loader.load [06:17:44] er, .implement [06:18:51] https://bits.wikimedia.org/en.wikipedia.org/load.php?modules=jquery [06:18:58] Nemo_bis: ^ [06:23:33] hello [06:23:33] Nemo_bis: what does "free access" mean? [06:23:46] cortexA9: again? [06:23:52] Nemo_bis: have you issues ? [06:23:55] what ? [06:24:35] cortexA9: are you still having problems with connections to esams? [06:24:49] no.. [06:24:57] ok :) [06:25:03] :) [06:25:55] what is the issue ? [06:25:59] cortexA9: anyway, fyi, first 2 letters of datacenter is vender. last 3 letters is airport [06:26:06] no issue afaik [06:27:16] jeremyb: free as in not restricted by something ensuring it comes from MediaWIki [06:27:32] bits URIs are usually very weird [06:27:53] cortexA9: besides wtf, why am i awake at this hour?! [06:28:08] that's a better question :p [06:28:28] eheh jeremyb idk [06:28:35] :) [06:29:23] but the problem i thought was bits.wikimedia.org [06:29:31] loading time.. [06:29:32] earlier? [06:29:45] no, it was all of the domains across the board [06:30:21] jeremyb: suggested that to pathos, thanks [06:30:22] Nemo_bis have the same issue.. [06:32:34] Nemo_bis: i haven't looked at the bits URLs... idk how stable they are or not. but i guess we could just put a caching+anonymizing proxy in front of google apis? [06:32:37] idk [06:32:44] have to think about it :) [06:33:06] jeremyb: that URL looks rather stable, I've seen it around for a while :) [06:33:20] no idea what the content is though ;) [06:35:59] jeremyb: bits.wikimedia.org is an alias for bits-lb.esams.wikimedia.org. [06:36:15] jeremyb: bits-lb.esams.wikimedia.org has address 91.198.174.233 [06:36:29] i have that [06:37:25] cortexA9: ok, but your traceroute may have changed. or at least our BGP setup has. i think [06:37:35] cortexA9: what's your point? [06:37:49] jeremyb: the point is slows wikipedia.. [06:38:04] bits serves all the CSS and JS... [06:38:14] you can't exactly browse the site with no css [06:38:22] 11 06:24:35 < jeremyb> cortexA9: are you still having problems with connections to esams? [06:38:25] 11 06:24:49 < cortexA9> no.. [06:38:25] yes, just unstyled [06:38:28] well, it would just look weird. [06:38:28] you said no [06:38:33] are you saying yes now? [06:39:28] no problems here... ping 8 times faster than yesterday and 0 packet loss [06:40:16] but sometimes.. [06:40:21] not always [06:40:31] continue? [06:40:35] try full sentences [06:40:46] yesterday i mean.. [06:40:58] who cares about yesterday [06:41:01] focus on today [06:41:01] ok [06:41:07] today is good [06:41:14] full stop [06:41:19] :) [06:52:27] well howdy, I was directed here by the Powers that Be. [06:52:53] if there's anything I can assist with wrt DDoS mitigation, please let me know :) [06:55:07] hello nakon [06:55:52] howdy cortexA9, that's a name i haven't seen i a few years [06:55:55] :) [06:56:30] hehe nakon [06:57:16] 2 years makes me a long-time contributor [06:57:17] :) [06:57:45] well, probaly more than 2yr :) [06:58:26] wow [06:59:34] i wouldn't "contributor" after that... sry :c [07:01:12] nakon: why not ? [07:02:22] sometimes i join in this channel for report issues.. 
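A minimal sketch of the point made at [06:17:36]–[06:18:51] above, assuming the third-party `requests` library: fetching jQuery through the bits load.php URL and checking whether the payload comes back wrapped in mw.loader.implement. The wrapper text and the bits hostname are exactly as quoted in the log and may differ on other MediaWiki versions or after later infrastructure changes.

```python
# Sketch (assumes `requests`): fetch the jQuery module from the bits load.php URL
# quoted above and report whether it arrives wrapped in mw.loader.implement().
import requests

BITS_JQUERY = "https://bits.wikimedia.org/en.wikipedia.org/load.php?modules=jquery"

def fetch_jquery_module(url: str = BITS_JQUERY) -> str:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    js = fetch_jquery_module()
    print(len(js), "bytes; wrapped in mw.loader.implement:", "mw.loader.implement" in js)
```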
[07:02:46] :) [07:05:59] nakon: i like wikipedia i am an old visitor :) [07:06:45] no not indeed, I do apologize for elders :) [07:07:41] no [07:07:48] i mean i am young :D [07:07:53] yound old visitor :D [07:07:58] *young [07:09:27] i want to contribute in a better wikipedia. [07:09:37] :) [07:09:49] nakon [07:10:26] cortex hai [07:10:44] we all do :) [07:16:42] this is my favorite channel [07:16:45] :) [07:19:54] techs araound? [07:21:03] hello Steinsplitter [07:22:06] have you ever thought of doing a mirror backup of wikipedia? [07:22:43] like… dumps.wikimedia.org? [07:23:10] legoktm: the mediadabase is broken :/ [07:23:20] i cant do anything about it :x [07:23:41] i know. is evil :D [07:23:42] legoktm: like example: mirror.wikipedia.org [07:23:50] :) [07:24:08] cortexA9: what's the point exactly? what's wrong with what dumps gives? [07:25:12] i mean a very mirror of wikipedia [07:25:20] online [07:25:35] not a dump.. [07:27:29] for security purpose. [07:29:51] why not make a torrent of the all wikipedia too [07:30:19] people can host wikipedia [07:30:30] if they want :) [07:31:39] cortexA9: https://meta.wikimedia.org/wiki/Data_dumps [07:36:55] legoktm: with pictures ? [07:36:59] :) [07:37:12] that's a different dump, its on archive.org i think. Nemo_bis would know. [07:40:54] http://web.archive.org/web/20010727112808/http://www.wikipedia.org/ [07:40:54] wow [07:40:58] 2001 :) [07:45:04] hello apergos [07:45:08] hello [07:46:25] dump of what [07:47:09] cortexA9: if you mean off-site backups for volcanoes exploding in Virginia and the like, https://wikitech.wikimedia.org/wiki/Bacula [07:48:57] Nemo_bis: i mean all wikipedia in another host. mirror backup. [07:49:36] yes, see above [07:49:44] cortexA9: the full dumps are replicated to several organisations [07:49:58] the dumps are not a reliable source for us for a mirror [07:49:59] and we also have the dbs replicated accross DCs as well [07:50:11] (not for long) [07:50:12] we already have db snapshots, replicate the db [07:50:27] of course multiple hosts have the mediawiki installation [07:51:22] we can ask archive.org to host [07:51:35] they already have dump copies [07:51:37] no ? [07:51:45] we can't give them db copies; these contain private data [07:52:07] oh [07:53:11] I'm also archiving Commons files on archive.org as we speak https://archive.org/search.php?query=subject%3A%22Wikimedia+Commons%22 [07:53:27] so far, 10,617,718,431 KB [07:54:16] they could probably replicate from prelabsdb1 in realtime if they really wanted to, but I doubt there would be that much benefit compared to storing the dumps [07:56:50] well, WMF claims that maintaining replication of that sort is very expensive to them [07:58:01] wouldn't it basically just be bandwidth now, since that feed is maintained for labs? [07:58:59] I doubt it, it was mentioned as a reason to kill TS [07:59:09] (as in, TS replication) [08:01:01] sigh, Either this webhost hasn't responded to a ticket I filed on the 8th, or the web interface doesn't show responses and I have to wait till I go into the office... [08:01:11] and i'm in the wrong channel... [08:10:40] apergos: http://www.httrack.com/ [08:10:47] :) [08:11:11] that's not how we would or could go about it [08:12:01] text for old versions of pages, for example [08:12:41] doesn't sit in a file in a directory which can be recursively copied... 
revision data lives in a table in one database, the text lives in a database on another cluster [08:21:00] apergos: oh i understood [08:22:45] p858snake|l: turns out they might have changed their mind http://lists.wikimedia.org/pipermail/toolserver-l/2013-September/006289.html [08:23:12] the nda is the tough part I guess [08:25:04] what is NDA ? [08:25:14] non-disclosure agreement [08:25:32] apergos: well, that's easy, only roots have to sign it [08:25:52] anyone with access to that data would have to sign it, I think [08:26:09] if we are talking about replication of certain user data for example [08:26:19] "that data" being the private data [08:27:03] so if I were able to query those tables in the db (not required to be a root) for the purposes of my specific tool [08:34:33] among the other things that could be discussed on that thread is who would need to sign an nda, what the nda would have to say, what identifying information the user would have to provide (full, as checkusers do, I suppose), and for which sorts of data access [08:34:57] if there were a wiki page with a clear policy and set of requirements, that could be very useful [08:35:11] that's for tools, not backups [08:35:37] you can imagine archive.org taking a full copy on some hosts with only few employees of their having access [08:36:21] on a replica for tools only roots would have (potential) access to private data, as on toolserver, the normal users only public stuff [08:37:03] but yes, it would be very useful to have it documented somewhere, though I don't think it's the main roadblock for such an arrangemenet [08:40:03] yes, I thought we were talking about replication to another site that would host tools (according to the email) [08:46:34] apergos: wikipedia have different servers in europe ? [08:46:54] we have three data centers [08:47:28] european readers get content via amsterdam [08:47:29] apergos: if i am in europe how it can decide the best server for me ? [08:48:09] it's automatic ? [08:49:32] yes, it's automatic [08:50:04] dns resolution of the hostname will give you the right ip [08:50:09] and the rest just happens [08:51:27] https://developers.google.com/speed/pagespeed/ [08:51:31] this can help wikipedia ? [08:51:35] in terms of speed [08:52:44] people have worked a lot on the speed issue [08:53:22] from minimized js to serving only the pieces needed for initial page loading to caching output from the parser to serving all static resources from a separate cluster (and with a cache in front) [08:53:35] this is constant and ongoing work [08:54:39] if there is a particular area you are interested in, you could follow the discussion via bugzilla our the appropriate mailing list (or of course irc in wikimedia-dev, when such discussions are happening) [08:54:47] and you can contribute as well [08:56:43] apergos: https://developers.google.com/speed/pagespeed/module [08:56:48] for apache [08:57:30] ok so what I would highly suggest you do [08:57:42] (since I am not involved in that part of the development at all) [08:57:52] is to have a look at the archives of the wikitech-l mailing list [08:58:11] ok [08:58:13] http://lists.wikimedia.org/pipermail/wikitech-l/ [08:58:38] specifically around the issues of speed, page rendering and delivery [08:59:01] if you have trouble finding them you might subscribe and post to the list asking for pointers to such discussions [08:59:17] and from there you can see what has been done or what is still missing [08:59:42] how to open a new discussion ? 
[09:02:00] well you wuld want to subscribe to the list [09:02:32] just bear in mind that you're coming into the middle of a topic that has had some work done on it, so you'll want to find out what work has been done first [09:03:09] yes [09:07:29] apergos: subscribed [09:08:34] ok [09:20:11] apergos: gmane.science.linguistics.wikipedia.technical is this ? [09:20:39] p858snake|l: turns out they might have changed their mind http://lists.wikimedia.org/pipermail/toolserver-l/2013-September/006289.html uh, dunno, I don't read the archives from gmane [09:21:42] f you look at one of the messages in there it ought to have a link to the actual list though [09:25:54] apergos: http://lists.wikimedia.org/pipermail/toolserver-l/2013-September/006285.html [09:26:24] that was the comment from louis about transporting db contents off labs to a 3rd party to allow processing by non opensource tools [09:26:27] ok [09:26:43] presumably they have to have signed an nda also [09:27:02] apergos: under that theory, every user on labs has to as well [09:27:17] everything on labs is already sanitised anyway [09:27:23] oh, off of labs [09:27:36] yes, the labs data is public data I think [09:34:53] User has two different usernames on two different wikis. He wants to merge them. How is this done? [09:36:35] wmf wikis, or another project? [09:38:53] ok apergos just posted it. [09:40:34] wmf wikis [09:41:16] cortexA9: the speed mod email? [09:41:27] afaik thats already been discussed, check the archives [09:41:46] oh sorry [09:42:07] what about it ? [09:42:36] If i remember correctly, TimStarling researched it and it wouldn't be much use due to our caching layer in front [09:43:31] ok i understood sorry for double post. [09:44:05] https://www.mediawiki.org/wiki/User:MaxSem/mod_pagespeed [09:44:14] thats maxsem's research on it [09:45:13] good [09:45:19] nice to know [09:46:48] http://lists.wikimedia.org/pipermail/wikitech-l/2013-July/070310.html (start of one thread on it) and a even older thread http://lists.wikimedia.org/pipermail/wikitech-l/2010-November/050140.html [09:48:16] thanks for digging those up, p858snake|l [09:54:05] apergos: what about Prolexic ? in case of ddos. [09:55:39] well if we want to talk about ddos it's good for you to have an understanding of our basic architecture [09:55:51] p858snake|l, wmf wikis [09:56:06] so you can see where the specific weaknesses are, where we're more vulnerable and where we're not [09:56:51] most everything about our setup is available either on wikitech.wikimedia.org (look through the docs) [09:57:39] there are occasional talks folks give too about our setup, you can find those .. hm.. commons maybe? not too sure where those are gathered at this point [09:58:49] apergos: load balancing i think just enough right ? [09:58:52] we will in general prefer open source solutions to proprietary ones, and we will prefer local work as opposed to outsourcing when it goes to something in our core focus (such as keeping the site up) [09:59:13] we have load balancing of course [09:59:25] so what is the problem.. 
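As an aside on the geo-DNS explanation above ([08:50:04]: "dns resolution of the hostname will give you the right ip" and "the rest just happens"), a minimal sketch of looking at that resolution from the client side. The answer depends on where your resolver sits, and the hostnames and address quoted in the log (bits-lb.esams.wikimedia.org, 91.198.174.233) are a 2013 snapshot, not guaranteed to be current.

```python
# Sketch: inspect which front-end hostname/addresses GeoDNS hands back for a
# service name. Output varies by resolver location; the example hostnames are
# the ones discussed in the log and may no longer exist.
import socket

def resolve(hostname: str) -> None:
    canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    print(hostname)
    print("  canonical name:", canonical)            # e.g. bits-lb.esams.wikimedia.org from Europe
    print("  addresses:     ", ", ".join(addresses))

if __name__ == "__main__":
    for name in ("bits.wikimedia.org", "en.wikipedia.org"):
        resolve(name)
```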
[09:59:30] :) [09:59:51] https://wikitech.wikimedia.org/wiki/LVS [09:59:57] there you go, load balancing [10:00:13] https://wikitech.wikimedia.org/wiki/Pybal and pooling/depooling hosts automatically [10:00:49] there are many types of ddos, not just 'give me this static html page' [10:01:09] anyways, again this is something where you want to get up to speed on the current setup and on past discussions [10:01:15] then you would be in position to jump in [10:01:46] apergos: i like the diagram [10:05:07] apergos: nameservers are protected ? from attacks. [10:05:29] we have some things we do [11:00:36] apergos: hello [11:14:44] hello [11:14:47] sorry, landlord [11:15:09] i don't see parent here [11:15:38] I'll ping [11:15:47] ok [11:18:07] pinged [11:18:15] I will be back in 5 mins [11:25:06] hmm [11:25:09] no parent [11:25:19] but rent paid so that's something [11:25:32] so how are things going? [11:25:58] ohh I see a pile of commits in the last little bit [11:26:24] quite well, it turns out deleting texts from a group was simpler than i expected [11:26:38] yay! [11:26:50] what were you thinkign you would have to do? [11:27:24] i wasn't sure, but until now, some problems always appeared when i was doing things like that [11:27:54] Eep. I need to get this week's tech report done [11:28:23] oh, i'm making some assumptions about the text of articles, and i wanted to verify if they are right [11:28:37] let's hear them [11:28:46] 1. the text of a page won't contain the zero byte [11:29:18] 2. the text of the page won't be UTF-8 encoded U+FFFF Unicode Noncharacter [11:29:52] if those are not true, i will have to figure out some other way to represent text groups [11:30:15] because i am using the zero byte as a delimiter between pages [11:30:43] and page text that is just U+FFFF means that the text was deleted from the group [11:31:11] (i'm updating the specification page to reflect the recent changes now) [11:31:14] wikipedia is the six website of the world :) [11:34:25] it's not possible to have a real time wiki? [11:34:40] a sort of a wikipedia 2.0 :P [11:37:53] Do you mean ONLY U+FFFF? [11:38:02] Or CONTAIN U+FFFF [11:38:28] Because the first is almost certainly safe, the second not so much [11:38:36] ? [11:38:39] what's that [11:38:45] Svick's question [11:39:00] U+FFFF being a Unicode null character [11:39:27] you remember google wave fail ? [11:39:28] i mean only U+FFFF [11:39:42] I'm going to try to make a zero-byte page [11:40:32] p858snake|l: that reply by Luis is later, not earlier :) [11:40:42] Given vandalism, it's not entirely a sasfe assumtion, but it's vanishingly unlikely [11:40:53] ? [11:40:58] it also doesn't say nothing on actual transfer of data [11:41:17] it was in September, the message I quoted in late August [11:42:10] https://en.wikipedia.org/w/index.php?title=User:Adam_Cuerden/Sandbox&action=history <- Okay, yes, a page can be zero bytes [11:42:19] Just a matter of someone blanking the page [11:42:33] AdamCuerden: it looks like U+FFFF is replaced with U+FFFF REPLACEMENT CHARACTER upon saving, so that should be okay [11:42:59] I'm pretty sure I make the assumption that null (\0) is not allowed in text [11:43:06] atfer having looked at the code [11:43:10] I am looking at it again though [11:43:16] Oh, do you mean the null character? [11:43:18] and i know a page can be empty, but that's not what i meant; i meant that it can't contain the zero byte '\0' [11:43:26] yeah [11:43:39] Oh, that I can't help with [11:44:13] Let's see.. 
[11:44:26] I suppose if any page'll have them, it'll be [[Null character]] [11:46:57] there is a wiki 2.0 ? [11:47:24] https://en.wikipedia.org/w/index.php?title=User:Adam_Cuerden/Sandbox&action=history <- I've attempted to add both null characters to this page. Are they there? [11:47:34] what is the most evolved wiki right now. [11:48:57] AdamCuerden: i see only U+25BA there [11:49:02] because you are xml encodings back you are likely going to be fine [11:49:35] in the future we can see a real time mediawiki ? [11:49:35] Then probably safe [11:50:04] I tried Alt+0, Alt+0000, and Alt+69904 [11:50:20] apergos: but i am not saving the texts XML encoded [11:50:27] I suppose a more precise test could be tried if you have them ready to paste into a page [11:50:52] of course you're not, what was I thinking [11:51:30] how can i suggest feature requests for mediawiki [11:51:55] http://www.fileformat.info/info/unicode/char/ffff/browsertest.htm has a pasteable U+FFFF, i tried that and it got saved as U+FFFD [11:53:30] Right [11:53:36] Then I think you're safe [11:54:25] yeah, it looks that way [11:54:45] you could try adding � and see what it does with that [11:55:01] I suppose it might be possible to bot-inject such codes. [11:55:41] Or, in theory, a parsoid bug. [11:55:56] parsoid bugs aren't theory :-D [11:56:15] Yes, but they seem to prefer chess pieces. [11:56:37] heh [11:59:01] apergos: if i write that to a wiki page normally, then it look like &#0000; in the XML, so i after decoding, i will get literally � back [11:59:21] if i edit the XML manually, then that won't tell me if MediaWiki can produce such XML [11:59:27] leaving your browser to do that final decoding [12:00:56] and it looks like MediaWiki itself does something with such code: if i write �* to a wiki page, the HTML contains &#0000;* [12:02:00] it's got some sanitize routines [12:02:00] but I'm not sure how they interact with the new contenthandler stuff [12:04:41] hi [12:04:49] http://www.mediawiki.org/wiki/Future/Real-time_collaboration [12:04:52] apergos [12:04:56] :) [12:06:32] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? [12:07:48] ? [12:11:08] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? [12:12:17] static function decodeChar( $codepoint ) { [12:12:18] got it [12:12:26] you should be good to go with null and ffff [12:12:29] marktraceur: hello [12:12:40] validateCodepoint [12:12:53] see Sanitizer.php [12:12:59] it was staring me right in the face [12:13:02] svick: [12:13:04] apergos: ok, i tried to add those to a page using the API, and the results were also good [12:13:21] sorry to take so bloomin long [12:13:32] thanks for looking into it for me [12:14:04] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? 
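A toy illustration of the text-group layout svick describes above ([11:28:46]–[11:30:43]) and of the two assumptions just verified (saved wikitext cannot contain a NUL byte, and a bare U+FFFF is stored as U+FFFD): page texts joined with a zero byte, and a text that is exactly U+FFFF standing for "deleted from the group". The function names are hypothetical; this is not the actual incremental-dump code, just a sketch of why the two assumptions matter.

```python
# Toy sketch (not the real incremental-dump code) of the group layout described
# above: texts separated by a zero byte, a lone U+FFFF marking a deleted text.
from typing import List, Optional

DELIMITER = b"\x00"        # assumption checked above: saved page text never contains NUL
DELETED_MARKER = "\uffff"  # assumption checked above: MediaWiki turns a bare U+FFFF into U+FFFD

def encode_group(texts: List[Optional[str]]) -> bytes:
    """None means the text was deleted from the group."""
    return DELIMITER.join(
        DELETED_MARKER.encode("utf-8") if t is None else t.encode("utf-8") for t in texts
    )

def decode_group(blob: bytes) -> List[Optional[str]]:
    return [None if c.decode("utf-8") == DELETED_MARKER else c.decode("utf-8")
            for c in blob.split(DELIMITER)]

if __name__ == "__main__":
    group = encode_group(["== Heading ==\nSome wikitext.", None, ""])  # "" = blanked page, still present
    assert decode_group(group) == ["== Heading ==\nSome wikitext.", None, ""]
```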
[12:14:05] sure [12:14:24] cortexA9: ehtereditor I guess, I've played with it and like the concept but the big deal will be attribution [12:14:34] *ethereditor [12:15:09] so, now that i think all highest-priority things are done, i'm going to work on lower-priority things; starting with using LZMA with groups for diff dumps [12:15:25] ok [12:15:37] apergos: what u mean for attribution [12:15:45] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? [12:16:27] ahmed__: http://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content [12:18:22] cortexA9: after 5 people have edited a revision on the etherpad popup and someone clicks 'save', who does the edit get attributed to? if you don't do it that way but every edit to the pad is considered a new version, how do you handle that (and is it reasonable)? [12:19:16] apergos: we need to find a new way. [12:20:34] apergos: i i don't needt the content of the page just need the forms and css and code [12:21:06] svick: how long do you think it would be before I could try testing with it in production? basically I will need to do the following, I think: convert an initial full xml dump to the new format, run an incremental of that same wiki, apply it to create a new full, convert the new full to xml, keeping track of space/time requirements [12:21:45] apergos: maybe an integration of the feature of etherpad in mediawiki [12:24:07] and I'd like to do that with one of the larger wikis to get a sense of things, and then with en (hmm, how will incrememtals applied to chunks work?) [12:25:08] apergos: i guess after i implement the better compression for diff dumps (which i think should be done by tomorrow) [12:25:24] nice! [12:25:37] ??? [12:25:40] do you get what I mean aboout the en dumps? right now they are done in 27 pieces [12:25:42] apergos?? [12:25:54] at the same time, the 27th is where the new pages end up [12:26:05] but old changed pages wind up anywhere in the first 26 [12:26:05] the only changes that i think could affect your results that i plan after that is better compression for metadata, but that shouldn't affect it that much [12:26:10] sure [12:26:31] ahmed__: I didn't understand what it is you want [12:27:29] d I just give the page range for each chunk? is that going to work out? [12:27:31] apergos: are u worried about the dumps ? [12:27:35] svick: [12:27:40] if you treated each piece basically as a separate wiki, then the current code will work [12:28:08] apergos : u see the link i need the source code for this page and put it in my wiki page to make page have the same style , where i can put page source code?? [12:28:28] apergos: if you need something more, then i would have to write that [12:29:02] ok so ahmed__ and cortexA9, I'm actually in a meeting right now so I'm going to ask you both if you want to talk to me, to wait a while (you can ask other folks of course) [12:29:20] svick: what I need to be able to do is update the pages from A to B say [12:29:32] so getting all changes for those pages as an incremental [12:29:45] apergos: cai i wait you [12:29:47] ?? [12:30:04] being able to convert the original xml for those pages to the new format [12:30:19] apply the incremental to that [12:30:38] there shouldn't be anything blocking me from doing that right? 
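A small sketch of the piece layout just described: the en.wikipedia history dump is split into pieces by start/end page id, old changed pages land in whichever piece their id falls into, and everything created since goes to the extra, final piece, with each piece then treated as a separate wiki. The ranges below are made-up placeholders, not the real enwiki chunk boundaries.

```python
# Sketch of the "27 pieces" layout discussed above: the first pieces cover fixed
# page-id ranges, and any page id past the last range (i.e. a page created after
# the full dump) belongs to the extra, final piece. Ranges here are placeholders.
from typing import List, Tuple

def piece_for_page(page_id: int, ranges: List[Tuple[int, int]]) -> int:
    """0-based piece index; ids beyond the last range go to the final, catch-all piece."""
    for index, (start, end) in enumerate(ranges):
        if start <= page_id <= end:
            return index
    return len(ranges)

if __name__ == "__main__":
    demo_ranges = [(1, 1000), (1001, 5000), (5001, 20000)]  # placeholder boundaries
    assert piece_for_page(1234, demo_ranges) == 1           # existing page, changed in place
    assert piece_for_page(999999, demo_ranges) == 3         # created later: lands in the final piece
```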
[12:30:51] yeah, if you want to convert a single XML piece to a single incremental piece, that should work [12:31:27] and if you then want to update the incremental piece, i think that should be just a matter of specifying the right parameters to dumpBackup [12:31:33] if a page is moved then we will see it gone in the one piece (so that's just a delete) and we'll see a new page (with old revs but the other chunk won't have those revs so ..) for a later chunk [12:31:55] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? any some one help? [12:32:03] the pieces are by title? [12:32:06] well I know how to get it to give me stubs for a given page range [12:32:13] they are by start page id to end page id [12:32:25] but if you move a page, its id doesn't change [12:32:38] ordered by page id and within a page the revs are typically by rev id but it's not 100%, as we discussed the other day [12:32:56] ok well the redirect will be new [12:33:04] whichever, I always get those screwed up [12:33:22] a delete and an undelete then [12:33:35] apergos: what u are doing ? [12:33:39] we'd see a delete in the one chunk and an apparent new page in the other [12:33:57] where by other I mean the last chunk which gets all new pages [12:34:20] ok, yeah, that will look like a delete in one piece and a completely new page in another piece (and all revisions for the undeleted page will be loaded from the database) [12:34:25] right [12:34:35] that's completely acceptable [12:34:45] heck it's what we do now [12:35:09] ok I'm going to assume there are no hidden gotchas for that and we'll see [12:35:11] right [12:37:21] i have page source of this page https://en.wikipedia.org/w/index.php?title=Portal:Arts how i can add the source to my wiki to use it and change on it???? any some one help? [12:38:05] if it turns out that everything works fine and the speed is more than okay, then i would like to combine all those pieces into one [12:38:16] btw have you tested the 'convert to xml' on something largish, like the tr wp ne format file? [12:38:37] what, all 27 pieces? it turns into a few hundred gb to download [12:38:44] this is another reason to split them up [12:38:56] why split ? [12:39:10] and don't make a one torrent [12:39:26] ok [12:39:58] and yeah, i tried incremental to XML on trwiki, and it worked fine [12:40:03] ah cool [12:40:34] how did memory usage for that look? [12:41:30] i'm not sure exactly, but i think it wasn't much [12:41:36] apergos: how many space all the files ? [12:42:53] ok, can't wait to do some real world playing [12:42:54] oh dump in progress.. [12:43:21] http://dumps.wikimedia.org/backup-index.html [12:43:57] apergos: i tried it again, and it looks like it peaks at ~200 MB [12:44:21] pretty nice [12:44:36] I'll try it on some others that are a bit beefier :-) [12:45:12] hey i used Export pages to the page i want it and it download xml file how i use it?? [12:45:45] well, the memory usage shouldn't depend on the size of the wiki, at least not by much [12:46:29] we'll find out! [12:46:55] any help from any one? [12:47:52] oh, i just realized that one of the fixes for memory usage i did won't work for reading, i'll have to fix that [12:48:01] ah ha [12:48:34] :) [12:48:41] apergos: wikipedians can seed torrents too each other :) [12:49:17] why use the bandwidth of the foundation.. [12:50:56] on sept 27 folks are supposed to start submitting their code... 
I'd like you to plan for at least some broad comments in the code before that happens (some of the main classes at least ought to get a couple lines) [12:51:23] enough so that other folks looking at it won't be left completely clueless [12:51:53] maybe we can start a tracker on wikimedia [12:52:40] i thought that i'll spend the next week writing documentation (if it takes that long), so that should be okay [12:52:47] cool! [12:53:19] I think that's all of my concerns/questions for right now [12:54:02] ok, i don't have anything else either, see you tomorrow [12:54:13] ok, see you then [12:54:44] ahmed__: special:import (make sure you exported the current revision only, plus all the templates) [12:55:08] i do that [12:55:24] bot how i import?? [12:55:34] cortexA9: volunteers torrent files already, which makes more sense than us doing it; we produce the dumps because a volunteer cannot, but a volunteer can download a dump and create a torrent, freeing us to work on things only we can do [12:55:43] Special:Import ahmed__ [12:55:55] good apergos :) [12:55:58] i found it [12:56:05] http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps [12:56:11] you don't need all that but anyways [12:57:01] but i change on oit after import (chane color, title, width,.....) [12:57:58] might depend on your local wiki's MediaWiki:x.css/js files plus your skin [12:58:04] you'll have to look at all that [12:58:56] greate , you are helpful man [12:58:58] realy thanks [12:59:11] http://dumps.wikimedia.org/enwiki/20130904/ [12:59:17] when this end ? [12:59:46] try looking at the previus run; you can make an estimate from that [13:01:06] apergos: can i try and ask you again about what happened with me? [13:01:17] go ahead [13:02:04] it give mt an error ( ! ) Fatal error: Maximum execution time of 30 seconds exceeded in C:\wamp\www\mediawiki\mediawiki-1.21.1\includes\parser\Preprocessor_DOM.php on line 977 [13:02:33] what is it you are trying to do, when it gives this error? [13:03:46] just import the xml [13:03:54] ah [13:04:12] right, it renders them all in order to update links tables >_< [13:04:43] mmmmm [13:05:10] so what i do?? [13:06:38] in your php.ini you need to change max_execution_time [13:06:45] bump it up to a few minutes I guess [13:07:09] alternatively you could try to import the file via the maintenance script [13:07:18] ok i i will do it now [13:07:50] maintenance/importDump.php and that's discussed on the mediawiki page I linked above as well [13:08:21] yea [13:08:46] i wanna make a new torrent of the 20130904 [13:08:49] i will try and tell you [13:09:19] tryin to download all the 27 parts [13:10:19] you might want to wait til the 7z files are done, cortexA9 [13:10:39] smaller that way, unless you only want to provide the pages_articles current versions [13:10:50] that's available as a single file so you could just seed that [13:11:57] ok i wait [13:12:45] there already is a torrent for pages-articles: http://burnbit.com/torrent/255157/enwiki_20130904_pages_articles_xml_bz2 [13:13:50] i mean for [13:14:02] pages-history-meta [13:14:05] apergos: it done but in the end this error appear (faild: No handler for model 'Scribunto'' registered in $wgContentHandlers) in the last [13:14:34] sorry pages-meta-history [13:14:34] ah, well you will need that for some content (the scribunto extension plus lua [13:14:35] ) [13:14:58] i wanna make one torrent [13:15:13] when finished [13:16:16] i will find this extension [13:16:32] but how i show the page?? 
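Given the "No handler for model 'Scribunto' registered in $wgContentHandlers" failure above, a minimal sketch of scanning an export file for the content models it uses before running Special:Import or maintenance/importDump.php, so the needed extensions (Scribunto plus Lua for that model, etc.) can be installed first. It assumes the export schema includes a <model> element per revision, as MediaWiki 1.21-era dumps do; older exports may not carry that information.

```python
# Sketch: list the content models used in an XML export so you know which
# extensions must be registered in $wgContentHandlers before importing.
# Assumes the export includes <model> elements (export schema 0.8+, MediaWiki 1.21+).
import sys
import xml.etree.ElementTree as ET
from collections import Counter

def content_models(dump_path: str) -> Counter:
    models: Counter = Counter()
    for _event, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "model":  # strip the export XML namespace
            models[(elem.text or "").strip()] += 1
        elem.clear()                                # drop element text as we go to limit memory use
    return models

if __name__ == "__main__":
    for model, count in content_models(sys.argv[1]).most_common():
        print(f"{count:8d}  {model}")
```

Run as, e.g., `python list_models.py export.xml` (script and file names here are just placeholders).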
[13:16:58] when the page is imported you'll be able to view it just like any other page, by the page title [13:17:17] with luck it will even show up in your recent changes feed [13:18:47] oh svick [13:19:34] maybe i can seed the burnbit one [13:21:14] i think that's a good idea, because that torrent is combined with HTTP, so it won't die as long as the file as available from dumps.wikimedia.org [13:22:19] no i can't fount it in recent or page title [13:22:55] if the import didn't succeed then you won't find it [13:23:11] you had an error message aboout a missing contenthandler [13:23:21] go get scribunto set up and try again [13:23:32] oh right [13:26:13] apergos: what about this error (Warning: XMLReader::open(): Unable to open source data in C:\wamp\www\mediawiki\mediawiki-1.21.1\includes\Import.php on line 51 Call Stack ) [13:26:33] looks like it's not seeing your fie [13:26:35] file [13:28:28] because the extension i think [13:31:21] what is infbox??? [13:36:18] apergos: if i come after 2 hours i can find you? [13:36:36] no, but there are plenty of other people [13:36:41] just ask whoever is here [13:41:02] ok thanks but another eq# apergos [13:43:13] apergos: after finishing i want this page to appear in every page in my wiki how, because i will use it like page template ? [13:44:08] you'll have to make it into a template to add to every page or [13:44:40] find an extension that lets you include given wikitext or something. sorry I don't really have a good answer for you about this [13:44:47] someone else might have a better idea [13:45:24] apergos: thanks you are great man [13:45:31] good luck! [13:46:27] thank you in my first project [13:48:54] apergos: thanks [15:19:42] I just googled something and saw: "Jump to mw.loader.load - [edit | edit source]." [15:19:51] (i searched for mw.loader.load) [15:20:01] but i dont think the edit labels should be showing up.... [15:23:38] legoktm: they are supposed to be visible for css-less clients [15:23:54] and googlebot happens to be one [15:24:07] i think it's they who should fix their algorithms in this case [15:24:28] ok [15:25:05] and tbh i don't see how we could avoid this apart from detecting googlebot [16:40:50] manybubbles or ^d: how does one search for an exact sentence with CirrusSearch? [16:41:27] Nemo_bis: you can force a phrase search by quoting the phrase [16:41:42] Nemo_bis: see if that does what you want [16:42:02] manybubbles: no it doesn't [16:42:15] e.g. https://www.mediawiki.org/w/index.php?title=Special:Search&limit=20&offset=20&redirs=0&profile=all&search=%22most+used%22 [16:44:42] Nemo_bis: Ah! so what you are seeing is that quotes don't turn off stemming. Also, I've probably set the phrase slop too high. [16:45:50] Nemo_bis: I'll file a bug about both. Turning off stemming might be somewhat annoying to implement but I'll get it. [16:46:15] from "most used" to "most of us" and "most commonly used" is definitely very aggressive "stemming" :) [16:47:02] maybe it works better in other languages! thanks for taking care of filing it [16:47:24] used -> use [16:48:00] the reason "commonly" appears is because of the phrase slop. which I've set way way too high. [16:48:21] used -> us (that is crazy!) 
[16:48:46] :) [16:56:56] Nemo_bis: 54020 and 54022, both of which I've added you as a cc [16:59:54] hi [17:00:41] thanks [17:04:19] "[04da3fd9] 2013-09-11 17:03:49: Fatal exception of type MWException" trying to go to https://en.wikipedia.org/wiki/Ellipse#Circumference [17:05:38] Still getting similar error messages upon refreshing several times [17:06:38] Reedy: ^ [17:07:15] <^d> On it. I think it's Aaron. [17:09:57] Hi, I think this is wrong - [fde601cf] 2013-09-11 17:08:16: Fatal exception of type MWException [17:10:15] <^d> Already known, fix is sync'ing out. [17:10:46] Yeah, it's fixed now :) [17:10:51] Thanks [17:11:21] Yup, fixed for me as well. Thanks! [20:18:39] search [20:18:41] elastic [20:18:45] manybubbles: [20:25:15] <^d> greg-g: Yo sup? [20:25:46] ^d: hehe, nothing, just pinging manybubbles [20:26:13] <^d> :) [21:16:46] [02:12:05 PM] i'm trying to update a map image on commons and it's tetlling me "Could not create directory "mwstore://local-multiwrite/local-public/f/fd"." [21:16:49] from -en [21:17:40] what up [21:18:15] AaronSchulz: ^ [21:19:05] has my tale been pasted here or should i retell it [21:20:20] i pasted it [21:20:44] mmm [21:21:12] i had a speed issue.. [21:21:25] another one [21:21:34] many seconds to open [21:23:45] legoktm: there are several such broken files, add to bugzilla [21:24:04] Nemo_bis: do you know which ticket? [21:24:24] nope [21:24:30] there are several iirc [21:25:19] ok [21:25:21] ill just [21:25:22] file a new one [21:25:25] ziggy_sawdust: directlic? [21:25:36] ziggy_sawdust: whats the filename of the image you're trying to overwrite? [21:31:33] legoktm: https://bugzilla.wikimedia.org/show_bug.cgi?id=53553 [21:32:43] mkay, i'll comment on that. [21:33:29] n8i [21:40:14] [17:25] ziggy_sawdust: whats the filename of the image you're trying to overwrite? [21:40:16] sorry i was afk [21:40:22] https://en.wikipedia.org/wiki/File:World-cannabis-laws.png [21:40:30] steinsplitter, legoktm [22:00:04] hey can someone explain me whats a gap limit? [22:00:09] got: WARNING: API warning (allpages): gaplimit may not be over 500 (set to 5000) for users [22:01:46] https://www.mediawiki.org/wiki/API:Query#Generators