[00:15:30] Hi wikidevs, I am currently working on a tool that should assist in downloading and resizing Wikipedia images for distribution in development projects (i.e. people who don't have internet access!!). Think of it as Schools Wikipedia, but with everything included. I am putting all scripts in public domain to replace the ancient and unmaintained wikix bash script image downloader. Now, I have a couple of questions and I hope someone c [00:15:40] Scripts are here btw: https://github.com/benjaoming/python-mwdump-tools [00:16:05] 1) I would like to not get blocked -- who do I ask for permission? :) [00:17:30] 2) It is possible to download thumbnail images from URLs like https://upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Mahabodhitemple.jpg/768px-Mahabodhitemple.jpg -- but how do I ensure that my script does not ask Wikipedia servers to create thumbnails on-the-fly? If they do not exist, I can happily create them locally from the original, and server CPU/storage would be saved remotely. [00:19:56] I don't want to put a shitty script that annoys servers in public domain, so please do advise me!! I will also advise people to share Wikipedia media files through offline storage rather than downloading from source -- for instance, I will put my email contact for anyone who would like to obtain a copy of Wikipedia media by parcel mailing an external storage device :) [00:21:11] Hm. Don't we have dumps of stuff like this? [00:23:30] Krenair: It's all images from the English Wikipedia, in 2010 it was 1 TB -- so I doubt it? [00:27:13] benjaoming: do you want the largest available size anyways? [00:27:33] you should be using rsync once that's working again [00:27:41] idk what the status is or if there's a ticket [00:28:26] benjaoming: http://ftpmirror.your.org/pub/wikimedia/images/ [00:28:27] mutante: no, not really, I'd be interested in max 1024x1024 sizes -- so to save bandwidth and local processing, I wanted to go with the thumbnails [00:29:21] i'm not sure if you can tell if it exists, they are created if they don't [00:29:28] hmm [00:29:42] mutante: i don't think that's up to date. same rsync issue i just referenced [00:30:11] mutante: wow, thanks, unfortunately I can only find 2012 dates in there [00:30:32] benjaoming: there's more recent. but not from july afaict [00:30:47] i got that from http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media [00:30:55] but it was the only one listed as an external media dump [00:31:21] benjaoming: you could look at the Age header in the returned image and throttle the number of requests based on past responses with Age: 0 or something like that. responses with higher Age: values shouldn't count toward the throttle [00:31:43] mutante: right. the breakage is on the WMF end. so all mirrors should be out of date [00:32:19] mutante: how did I not see that?? I've been on these pages hundreds of times :) [00:33:52] benjaoming: i'm not sure i understand the motive of this project. you want all images? or only particular subjects? [00:34:20] benjaoming: heh:) so what jeremyb says, and i guess as long as you use one of the common sizes: 128px | 256px | 512px | 1024px | 2048px [00:34:23] benjaoming: you should check out some of the existing projects to do offline stuff. e.g. openzim and http://www.nongnu.org/wp-mirror/ [00:35:00] benjaoming: do you follow that throttling strategy? [00:38:32] jeremyb: I will consider joining the wp-mirror project. For our purpose, we would like all of the English Wikipedia with all images to be used in rural districts. Media files are placed on external hard drives, while MySQL db is copied to local disk... we tinker with the Mediawiki extensions, caching etc. to get everything running optimally. In 2010, we made it work quite well with only ~10s for making a search.
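A minimal sketch of the Age-header throttling idea from [00:31:21], assuming Python with the requests library; the limit, back-off and User-Agent values below are placeholders, not anything agreed in the channel. The idea: a response with Age: 0 (or no Age header) was probably not served from cache and may have forced a fresh scale, so only those responses count toward the throttle.

    import time
    import requests

    FRESH_LIMIT = 5        # placeholder: how many cache-miss responses before backing off
    BACKOFF_SECONDS = 30   # placeholder: how long to pause once that limit is hit
    USER_AGENT = "offline-wp-mirror/0.1 (placeholder contact address)"

    _fresh_responses = 0

    def fetch_thumb(url):
        """Fetch a thumbnail URL, slowing down when responses look freshly rendered."""
        global _fresh_responses
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=60)
        resp.raise_for_status()
        if int(resp.headers.get("Age", "0")) == 0:   # cache miss: the server likely did work
            _fresh_responses += 1
            if _fresh_responses >= FRESH_LIMIT:
                time.sleep(BACKOFF_SECONDS)
                _fresh_responses = 0
        return resp.content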
[00:39:18] all images or all images actually used in WP articles is a huge difference, btw [00:41:19] Back in 2010, I used wikix, which crawls for [[File:...]] -- another option is to just take the images that are in the Mediawiki image table. [00:41:22] benjaoming: where's your code? where are your schools? [00:42:07] you should probably at least be fetching whichever size is used by the articles themselves [00:42:28] and the original unless it's too big. any other sizes you can make locally [00:42:42] mutante: problem with thumbnails is that they are either landscape or portrait, so it's impossible to predict AFAIK whether to look for 1024px or 768px thumbnail... thereby I could end up generating new thumbnails for every portrait-oriented image [00:43:06] i don't entirely follow [00:43:16] are you catering to a specific screen size? [00:43:43] does each user have their own computer? is there any sort of network connection besides sneakernet? [00:44:22] jeremyb: everything is on a local server, there is supposedly almost no connectivity, it's Malawi [00:44:36] * jeremyb looks up malawi [00:45:39] i was going to say looks very small... and yes it's tiny! but much bigger than iceland [00:45:52] (population) [00:46:25] jeremyb: It's Africa, near the equator, so everything is quite a lot bigger than it looks.. actual length is like the UK :) [00:46:31] benjaoming: maybe for some cases via PDFs as well? http://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=book_creator [00:46:40] mutante: ewww :( [00:47:04] mutante: last time, we threw in the PDF creator extension.. it worked fine :) ...but I doubt that anyone used it [00:47:09] benjaoming: ok, so local server but what about workstations? is there one per student? [00:47:41] jeremyb: it's set up as single centers with LANs, 30-50 computers in each centre [00:48:25] ok [00:49:38] So, I will do this instead: 1) Use the your.org mirror instead of parsing articles and downloading from upload.wikimedia.org and 2) Resize all images locally instead of bugging Wikimedia servers -- that should work, right? :) [00:49:53] for us it does:) [00:50:09] Cool, what are you doing with your copies? [00:50:21] well, as jeremyb mentioned, it won't be up to 2013 [00:51:26] I can do a second pass and fetch the remainder, since I already wrote the necessary scripts to download in the creepy-crawly way :) [00:51:45] jeremyb: can't find a bug with rsync in it.. [00:52:37] i said i didn't know if it was logged :) [00:53:20] yea, so the answer is maybe no [00:55:20] mutante: https://bugzilla.wikimedia.org/show_bug.cgi?id=51001 [00:56:04] aha, perfect [00:56:14] benjaoming: see above [00:56:48] it's probably best if you comment there [01:04:40] mutante: I will start an rsync individually on each of the hash-style directories, like "images/wikipedia/aa" and once each of them completes I will start a task that resizes images. This way, I can deal with my storage capacity which is limited to max. 1.5 TB.. and reports say that in December 2012, the size was 2.2 TB :) [01:04:59] I'll report back on bugzilla if rsync gives me a new snapshot [01:05:49] what do you mean if? [01:06:50] weren't you saying that you were unsure if the your.org mirror was running again? [01:07:41] no. i'm certain it's not [01:07:44] http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/commons/0/08/Boeing_737-4Y0%2C_Nordic_European_Airlines_AN0696681.jpg [01:07:52] that should not be a 404 [01:11:42] http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/commons/b/bc/Enrag%C3%A9e_Point_Lighthouse.jpg [01:11:49] http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/commons/4/44/Winslow_Homer_-_Gloucester_Schooner.jpg [01:12:23] those should all work. all have been up for 4+ weeks (plenty of time to sync). only the oldest of them actually works [01:12:32] jeremyb: okay cool, thanks -- then I'll aim for a second pass to retrieve non-existent files by parsing the dumps [01:12:32] benjaoming: no doubt, it's not running [01:12:52] k [02:06:40] jeremyb and mutante: script for rsyncing and resizing at the same time with concurrent processes: https://gist.github.com/benjaoming/ca6e65e5e38c56116b59 [02:07:23] benjaoming: github's kinda broken. (took ~3-4 reqs before it loaded) [02:13:45] jeremyb: yeah, I got that too... but it's up again now..
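A rough sketch of the per-directory "rsync, then resize locally" approach described at [01:04:40] and in the gist above -- this is an illustration, not the actual gist. It assumes Python with Pillow, a 1024x1024 bounding box as discussed, and leaves the exact rsync source path for the mirror as a placeholder.

    import subprocess
    from pathlib import Path
    from PIL import Image  # Pillow

    MAX_SIZE = (1024, 1024)                # bounding box discussed above
    RESIZABLE = {".jpg", ".jpeg", ".png"}  # skip formats that are risky to rewrite in place

    def sync_and_resize(remote, local_dir):
        """rsync one hashed directory, then shrink its images in place.

        `remote` is whatever rsync source the mirror exposes for that directory --
        a placeholder here, since the exact module path depends on the mirror.
        """
        local = Path(local_dir)
        local.mkdir(parents=True, exist_ok=True)
        subprocess.run(["rsync", "-rt", "--partial", remote, str(local)], check=True)
        for path in local.rglob("*"):
            if path.suffix.lower() not in RESIZABLE:
                continue
            with Image.open(path) as img:
                if img.width <= MAX_SIZE[0] and img.height <= MAX_SIZE[1]:
                    continue              # already small enough, keep the original
                img.thumbnail(MAX_SIZE)   # preserves aspect ratio, like mogrify -resize
                img.save(path)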
[02:19:08] https://encyclopediadramatica.se/Ironholds#Wikipedia_vandalism [02:19:13] !ops [02:19:15] justice must be served [02:19:24] o_O [02:19:38] heh [02:19:38] one sec [02:19:44] fun [02:19:52] that's what happens when you don't have idoru in your channel [02:19:54] jeremyb: should I set +r? [02:20:01] is the spammer here? [02:20:03] b/c I g2g really soon [02:20:06] Was just now luke1_ [02:20:08] * Jyothis waves at jeremyb [02:20:09] Jasper_Deng: idk, what did we do in the other channels? [02:20:13] hey Jyothis! [02:20:15] -en has +r right now [02:20:24] could it be that there is some browser attack at that URL? [02:20:33] seems someone beat me to it :) [02:20:44] Ryan_Lane: could you please watch this channel while I leave? [02:20:51] yep [02:20:54] thx [02:20:59] i can stick around too but i don't have ops [02:21:22] Jasper_Deng_away: bon voyage! [02:21:25] https://encyclopediadramatica.se/Ironholds#Wikipedia_vandalism [02:21:31] justice [02:21:32] !ops [02:21:35] vandalism [02:21:35] by a sysop [02:21:46] * c gives jeremyb a star [02:21:59] closedmouth: Could see if staff would join it here. might help. [02:22:04] ban the ident field [02:22:08] it's the same [02:22:14] identd* [02:22:18] bburhans: #wikimedia-operations now [02:22:26] fucker's back in -ops. [02:23:13] bburhans, ldunn: a few z-/k-lines would be good [02:23:21] * Jasper_Deng_away has g2g for real now [02:27:12] benjaoming: that's really verbose... [02:29:00] benjaoming: also, where did you get those prefixes from?
[02:30:00] jeremyb: prefixes were generated from a directory listing of the current your.org mirror [02:30:11] benjaoming: i don't think so [02:31:01] benjaoming: echo {{0..9},{a..f}}/{{0..9},{a..f}} [02:31:49] jeremyb: http://ftpmirror.your.org/pub/wikimedia/images/wikipedia/ <- I took everything that's two letters as with the mediawiki hashed paths [02:32:27] ohhhhhhhhhhh [02:32:32] you're doing it wrong [02:32:37] those aren't hashes [02:32:41] those are language codes [02:32:56] you probably only want commons images [02:33:17] most images on english wikipedia (which i think was your target?) are not actually from the english wikipedia [02:33:18] oh christ, thanks [02:33:25] i think. maybe someone has stats [02:34:42] yes, I think most are commons, should probably get "en" as well [02:35:11] anyway, very little benefit from getting those other langs [02:35:16] for you [02:35:33] yes en+commons would probably cover most [02:35:44] well, not most [02:35:49] necessarily everything [02:35:56] because no other images can be used on wikipedia [02:35:56] ! [02:35:56] ok, you just typed an exclamation mark with no meaning in the channel, good job. If you want to see a list of all keys, check !botbrain [02:36:07] wm-bot: quiet [02:39:36] poor jeremyb [02:39:45] orly? [02:42:32] benjaoming: updating the gist? [02:44:58] Jasper_Deng: wb! [02:45:04] benjaoming: hah, cookie monster [02:45:17] benjaoming: i hope you're not in your home TZ at this hour [02:50:23] jeremyb: yes, it's updated :) [02:52:03] > This page is taking way too long to load. [02:52:52] i'm most def up at 4:50 AM -- need to get this script running before I run out of time :) it'll be working for the next two weeks probably... just about as much time as I have left before flying out and rolling out the 300+ computer setup.. Wikipedia is a fundamental feature, can't let it down [02:53:26] have you talked to the OLPC peoples at all? [02:53:40] i assume you're competing with them to some extent [02:55:25] * Jasper_Deng poked jeremyb in -ops [03:03:32] jeremyb: OLPC people, nope, not really... we work with the digital divide as well, but we believe it's created by the western technology race, so while maintaining a critique from an environmental perspective, we refurbish used hardware.. the final result is that we're able to sponsor hardware that's more like the rest of the world's :) But kudos to those people, whatever they are doing.. wouldn't mind if we could work closer on th [03:04:33] * benjaoming is suspending his brain [03:05:08] jeremyb: oh btw one more thing, I gist'ed the python script that generates the bash script, same gist url [03:17:14] jeremyb: one more btw... thanks for all your help!!! [03:27:15] benjaoming: you got truncted [03:27:18] truncated* [03:27:22] yeah, saw the script [03:27:25] making some changes [11:35:17] happened to notice farsi wikipedia is getting hammered https://wikipulse.herokuapp.com/ [11:35:45] http://wikistream.wmflabs.org/#wiki=fa.wikipedia makes it look like AddBot has gone a bit crazy :) [11:35:58] unless there is a bug in wikipulse/wikistream ; quite possible [11:36:58] addshore: ^ [11:37:15] *looks* [11:37:20] seems to have stopped now [11:38:04] edsu: it's on fi now [11:39:27] seems to be going at a much more moderate pace on fi [11:39:44] not that what it was doing on fa was wrong ; who am i to say :) just noticed it [11:49:47] crazy bots! [11:53:28] I, for one, welcome our new bot overlords!
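To make the layout discussed above concrete: the two-letter directories directly under images/wikipedia/ are wiki language codes, while inside commons (or any single wiki) MediaWiki buckets files by the first hex characters of the MD5 of the filename, which is also what the thumbnail URL from [00:17:30] is built on. A sketch in Python, assuming the standard hashed-upload scheme -- worth verifying against a known file such as the Mahabodhitemple.jpg example:

    import hashlib
    from urllib.parse import quote

    COMMONS = "https://upload.wikimedia.org/wikipedia/commons"

    def hashed_prefix(filename):
        """Return the d/dd bucket MediaWiki uses for an uploaded file, plus the normalized name."""
        name = filename.replace(" ", "_")
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return "%s/%s" % (digest[0], digest[:2]), name

    def original_url(filename):
        prefix, name = hashed_prefix(filename)
        return "%s/%s/%s" % (COMMONS, prefix, quote(name))

    def thumb_url(filename, width=1024):
        """Thumbnail URLs look like .../thumb/<d>/<dd>/<name>/<width>px-<name>."""
        prefix, name = hashed_prefix(filename)
        return "%s/thumb/%s/%s/%dpx-%s" % (COMMONS, prefix, quote(name), width, quote(name))

    # thumb_url("Mahabodhitemple.jpg", 768) should reproduce the URL quoted at [00:17:30]
    # if the hashing assumption holds.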
[12:01:14] apergos: hello [12:02:03] hello [12:02:35] so the timeline looks fine (in that it includes generally what needs to be completed) [12:02:55] i read that as timelord at first... [12:02:58] but we will find it's ambitious as we get near the end. probably unavoidable [12:03:09] p858snake|l: take yer tardis and get outta here [12:04:12] I wrote a list of notes (primarily for me) about 'pending things', it's kind of a hodge podge but have a look at it later [12:04:24] https://www.mediawiki.org/wiki/User:ArielGlenn/Dumps_new_format_%28deltas,_changesets%29 [12:04:45] this is more of a checklist for me, but it will let you see the stuff I have in my head about the project [12:04:56] ah, except the first two items which are for you :-D [12:05:59] yeah, i think it's doable to finish this for the mid-September deadline, but we'll see [12:06:27] i sent that mail a few minutes ago [12:07:38] and i'll have a look at the rest of the list [12:09:52] ah sweet I see the mail [12:10:38] by "XML spec file", you mean the XSD? [12:10:52] yep [12:11:31] i think nothing has to change about that [12:14:08] what do you mean by "what do we do about folks that need two streams, i.e. stubs plus pages?"; why would anyone need stubs if they have pages? [12:14:58] stubs have byte count and text id, which pages don't unless you want to take strlen on all the revisions [12:16:12] like i said before, i doubt anyone uses textid (except for your second pass) [12:16:50] and the byte count could be easily added to the pages dump [12:17:04] if people thought that would be useful [12:17:22] don't know til we find out what's used [12:18:34] right, we'll see if someone responds to the mail saying they use it (of course, if nobody does, that doesn't necessarily mean nobody uses it) [12:18:59] well we'll find out when this code goes into trial use, for sure [12:19:19] right [12:20:52] I see parent54zz doing things [12:21:14] a but headed to class so oh well [12:21:37] he didn't respond to my Jabber message, so I assumed he won't come [12:21:54] I would prefer not to lose any info so it might be that we will include both those in 'pages' in the end, we'll see [12:22:17] yeah I just happened to notice him commenting in gerrit so... [12:22:48] ok [12:23:15] so, tomorrow i think i'll start working on reading directly from MediaWiki (probably calling dumpBackup, or whatever that script is called; figuring out what to do with un-/deletions) [12:23:32] fun times.
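On the byte-count point above ("unless you want to take strlen on all the revisions"): if the pages dump carries the revision text, the length that the stubs report can be recomputed locally -- it should be the UTF-8 byte length of the wikitext rather than the character count. A trivial sketch:

    def revision_byte_length(wikitext):
        """Length as the stubs report it: bytes of the UTF-8 encoding, not characters."""
        return len(wikitext.encode("utf-8"))

    # e.g. revision_byte_length("ᚹᛁᚲᛁ") == 12, while len("ᚹᛁᚲᛁ") == 4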
[12:24:08] so dumpBackup writes output to (typically) a file and gives progress reports to stdout [12:24:30] which we use to assemble html files for endusers who download the dumps [12:24:52] so they can see last output messages from their wiki currently dumping [12:26:15] the output goes into a status file for that wiki [12:26:37] there's a monitoring script that runs separately and scans all the dirs once every few minutes (or maybe it's once a minute) [12:26:41] and writes html for them all [12:27:16] not that you have to do that right away but the end result should be able to hook into that framework [12:27:24] hmm, i didn't think about progress reporting yet, i'll have to think about that [12:27:35] ok [12:28:09] when you get around to it you can see what the existing reporting (for stubs and content dumps) looks like [12:28:26] part of it is about the eta (which is bogus for runs done in pieces iirc) [12:28:43] part of it is about the percent of prefetched articles (you won't have prefetch I guess) [12:29:17] and then there is revs per sec processed over a short interval and overall [12:29:46] those are the articles that were retrieved from a previous dump and not from the DB, right? [12:29:51] revs/sec? pages/sec? if only I had a memory) [12:29:56] yes, that's right [12:30:42] pages/sec and revs/sec :-D [12:30:43] it looks like it's both: 1943191 pages (380.5|1260119.4/sec all|curr), 9181525 revs (1798.1|988.9/sec all|curr) [12:30:56] yes, I went and looked at a current entry [12:31:14] curr = for the time interval since the last report, this is a fixed (small) time interval [12:31:43] no. I lie, we report every n pages I think. but it's still usually a small time interval [12:32:33] * apergos looks irritatedly at the abstracts step [12:32:53] probably have to parallelize that in the end. not that it costs us anything, I was just going to not bother [12:32:58] (that's off topic, sorry) [12:33:58] hmm, it's probably safe to assume that nobody parses the status messages, right? so it's not necessary to keep exactly the same format [12:34:54] if you change dumpruninfo, then we might have to kill you [12:35:00] but the status messages should be fair game [12:35:08] there's going ot be n preftch so they will have to change anyways [12:35:13] *to be no [12:35:16] prefetch [12:37:44] what's dumpruninfo? [12:39:26] we record each step as it runs or is waiting to run [12:39:40] when it completes we record whether it ran successfully, when it completed, etc [12:39:50] you won't have to do this, it's a job for the python wrapper scripts [12:39:56] which will simply call your program [12:40:10] ah, ok [12:40:34] likewise your program doesn't need to write status files, it just needs to pass through the progress reports it gets from dumpBackup.php to the wrapper scripts to take care of it [12:41:07] but at the same time it needs to get the output from dumpBackup.php which normally goes to a file (arg right on the command line) [12:41:24] so that will need some monkeying around with [12:42:48] that sounds like it could be done easily using a named pipe; or maybe modifying dumpBackup, i'll look into that [12:43:51] dumpTextPass for the text dumps [12:44:36] but that assumes two pass which you won't be doing [12:44:52] progress reports are sent to stderr it says [12:44:56] so that's good [12:45:50] ok
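A rough sketch of the pass-through arrangement discussed above, assuming Python: dumpBackup.php writes the dump file itself, while the progress reports (which the end of the exchange says go to stderr) are relayed unchanged so the existing wrapper/monitoring scripts can keep parsing them. The dumpBackup.php flags shown are assumptions from memory -- check the script's --help.

    import subprocess
    import sys

    def run_dump(wiki, out_path):
        """Run dumpBackup.php, relaying its stderr progress reports line by line."""
        cmd = [
            "php", "maintenance/dumpBackup.php",
            "--wiki=" + wiki,                # flag names assumed, not verified here
            "--current",
            "--output=gzip:" + out_path,
        ]
        proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
        for line in proc.stderr:             # e.g. "1943191 pages (380.5|.../sec all|curr), ..."
            sys.stderr.write(line)           # pass through untouched for the monitor to pick up
            sys.stderr.flush()
        return proc.wait()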
[12:47:22] i don't have anything else for today, what about you? [12:50:20] nope, just a caution that the details will start cropping up in a couple weeks as you get closer to a running prototype [12:51:49] yeah, it looks that way; see you tomorrow [12:52:05] see ya [14:22:40] hmm, bits and uploads are being a bit slow for me from .nl [16:09:43] jeremyb: concurrent rsync downloader GIST repo is updated, much improved now because mogrify is only run once per directory (and not both for each "en" and "commons" pass). [22:48:13] gn8 folks [22:49:46] hey folks, visual editor is loading for me on english wikipedia when i use https:// is that a known issue? [22:49:49] isn't* [22:51:13] Ahm, no? [22:51:18] Logged in? Logged out? URL? [22:53:20] logged in. https://en.wikipedia.org/wiki/User:Ocaasi/sandbox?veaction=edit [22:53:39] WFM [22:53:52] WFM as well. [22:53:54] What browser? [22:53:57] chrome, latest [22:54:02] are you using any fancy "privacy-enhancing" addons? [22:54:08] absolutely [22:54:10] or overly eager ad blockers? [22:54:12] https everywhere [22:54:14] adblock plus [22:54:20] well, don't [22:54:23] :) [22:54:23] or at least try without them [22:54:24] :) [22:54:31] Try disabling AdBlock to see if that's the culprit [22:54:38] HTTPSEverywhere should be fine [22:54:43] VE WFM with the Firefox version of it [22:54:56] and WFM with Opera version of it ;) [22:55:12] yeah, i have adblock plus disabled for wikipedia. still no luck [22:55:36] same issue in latest firefox [22:55:48] and waterfox [22:56:32] hm... just worked in firefox. it must be something with my chrome setup [22:58:24] Ocaasi: you asked me a question earlier... idr what it was... [22:59:08] Technical_13: i'm having layout issues here: https://en.wikipedia.org/wiki/Wikipedia:TWA/1/End the stats should be in the bottom right not mid-left [23:06:40] Doesn't load correctly on mobile and apparently I may have just edited inadvertently with the old script we started. You may want to check and revert Ocaasi.. :p [23:07:09] no sweat