[08:30:18] Hello
[09:18:08] legoktm: download.kiwix.org uses MirrorBrain to automatically generate the torrent files and distribute load over the mirrors
[09:18:49] Even without running an actual bittorrent client or tracker, you can just create torrents which use webseeds and DHT
[09:19:32] Nemo_bis: that's what I did, just add webseeds and a few public tracker addresses
[09:20:17] https://phabricator.wikimedia.org/diffusion/2037/browse/master/run.py;ec1c8f926c750c0adf92a3dbc944d85fc6b68af6$42
[09:20:36] Yeah, now I saw
[09:20:54] I thought openbittorrent.com had died
[09:21:08] Nowadays many seem to recommend http://coppersurfer.tk/
[09:22:34] Can one just add all mirrors as webseeds and let some 404 (if they didn't sync yet or already deleted the dump as too old)?
[09:23:19] yeah, that should work
[09:25:38] http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/aawiki/ doesn't have the 2017 dump yet
[09:25:56] Yes, they seem to be a bit slower than your.org
[09:26:19] But definitely worth including, I managed to download at 100 MiB/s from them yesterday ^^
[09:29:58] ok, done https://phabricator.wikimedia.org/R2037:a20b54a371229895a274b28a864726569a1cecf3
[09:31:50] Now I'm curious how long it takes to hash all those files given the slow read :)
[09:34:02] aawiki has tiny dumps so the whole script was pretty fast
[09:34:02] real 0m1.791s
[09:34:02] user 0m0.129s
[09:34:03] sys 0m0.424s
[09:37:47] system CPU is 4 times as much as user CPU
[09:38:18] That will probably be a tiny bit better with bigger files, but rarely decent.
[09:40:44] alright, I'll run the other wikis tomorrow, and add the --piece-length=20 setting :) night!
[10:06:53] Nemo_bis AND Yvette: we provide bz2 because it's block oriented and we can recover from issues in the middle
[10:07:16] for small dumps we don't care, but for large ones the run takes a good chunk of time, so being able to pick up in the middle is a hard requirement
[10:08:08] thanks Nemo for the edits!
[10:17:57] apergos: isn't xz too? You can even concatenate
[10:18:20] But xz is a bit too obscure for most people and LZMA is mostly useful for full history dumps anyway
[10:20:08] yes, xz has some sort of built-in indexing of blocks, I haven't looked into it in depth yet
[10:20:20] that's for the dumps 2.0 rewrite, to see what sort of compression we want to end up with
[10:20:58] and probably split the files into many small pieces to be concatenated on demand for download, something like that
[10:40:46] apergos: with the current setup, how hard would it be to add MirrorBrain?
[10:41:09] It would be nice to send HTTP requests to the most local mirror. Some of them are amazingly fast
[10:41:27] with the current setup, all my non dumps 2.0 work is limited to one day a week, or I'll never get that done
[10:41:31] bearing that in mind...
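(Editor's sketch of the torrent-creation step discussed above around 09:19-09:40: hash the dump once, list a few public trackers, and add every mirror as a webseed, since clients simply skip webseeds that 404. This is not the actual run.py; it assumes the third-party torf library, and the tracker URLs, mirror URLs, and dump name are illustrative only.)

    # Sketch only -- assumes "pip install torf"; URLs and filename are examples.
    from torf import Torrent

    DUMP = "aawiki-20170201-pages-meta-history.xml.bz2"

    torrent = Torrent(
        path=DUMP,
        trackers=[
            "udp://tracker.openbittorrent.com:80/announce",
            "udp://tracker.coppersurfer.tk:6969/announce",
        ],
        # Every mirror can be listed; a webseed that 404s is simply ignored.
        webseeds=[
            "https://dumps.wikimedia.org/aawiki/20170201/" + DUMP,
            "http://dumps.wikimedia.your.org/aawiki/20170201/" + DUMP,
            "http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/aawiki/20170201/" + DUMP,
        ],
        piece_size=2**20,  # 1 MiB pieces, i.e. the "--piece-length=20" mentioned above
    )
    torrent.generate()               # hashes the file: the slow, I/O-bound part
    torrent.write(DUMP + ".torrent")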
[10:41:41] Good point :)
[10:41:55] I don't know how long it took for Kelson
[10:41:56] I'd rather have someone else take this on, and let me be reviewer / provide feedback / etc.
[10:42:08] and of course do the actual deployment
[10:43:15] so: investigate how MirrorBrain works, whether it makes sense when there are only a few mirrors, what config files it needs, how it decides which mirrors are faster/active/have current content, etc.
[10:43:53] if it looks like it meets our needs, open a phab task describing the results of that investigation, proposing that it be adopted
[10:44:05] figure out what the puppet config would look like
[10:44:55] (probably needs my help a bit for that part, at least for integrating with the existing setup)
[10:45:12] and get some preliminary patches in for us to look at
[10:50:54] Nemo_bis: if you haven't been able to tell from the above, I'm hoping you know someone(s) who are willing to take that on and run with it
[10:55:17] apergos: yeah, I got it ;)
[10:55:30] :-)
[10:55:58] Maybe I'll take a look at MirrorBrain and see if I can send a puppet patch
[11:04:48] Nemo_bis: cool! If you open a phab task, please add me as a subscriber
[11:05:03] you can put it in the uh "datasets general other" project I guess
[14:06:52] apergos: I still think we should divide the files by namespace.
[14:07:29] huh?
[14:07:37] I missed the context there...
[14:08:10] For like meta-pages-history or whatever, there's no way to just scan articles.
[14:08:16] You gotta scan everything and then filter.
[14:08:46] It would reduce the file sizes and scan times to separate things a bit more.
[14:10:16] I guess https://phabricator.wikimedia.org/T99483
[14:10:43] It just always feels so silly to me that we make people download/scan/load so many revisions that they probably don't care about.
[14:11:06] And/or https://phabricator.wikimedia.org/T20919
[14:12:45] https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Pages_with_the_most_revisions
[14:12:54] So many non-article revisions to wade through.
[14:13:43] Or even with current page history only: https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Page_count_by_namespace
[14:48:38] People should just talk less in ns>0!
[21:08:11] can content translation suggestions be turned off completely?
[21:08:25] I don't want to use them, ever, but they still occasionally show up and annoy me
[21:13:34] revi: suggestions where?
[21:14:33] Maybe you mean the popup upon editing a red link? There are others too.
[21:17:21] Nemo_bis: yeah, that popup
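(Editor's illustration of the "scan everything and then filter" point from the 14:08 exchange: because the history dumps are not split by namespace, a reader who only wants articles still has to decompress and parse every talk/meta page. A minimal streaming filter, using only the standard library and an illustrative filename, might look like this.)

    # Sketch only: stream a pages-meta-history dump and count how much of it is
    # main-namespace (<ns>0</ns>) versus everything else that still had to be
    # decompressed and parsed along the way. Filename is an example.
    import bz2
    from xml.etree import ElementTree as ET

    DUMP = "enwiki-latest-pages-meta-history.xml.bz2"

    def localname(tag):
        # Strip the "{http://www.mediawiki.org/xml/export-...}" prefix.
        return tag.rsplit("}", 1)[-1]

    articles = other = 0
    with bz2.open(DUMP, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if localname(elem.tag) != "page":
                continue
            ns = next((c.text for c in elem if localname(c.tag) == "ns"), None)
            if ns == "0":
                articles += 1   # what most re-users actually want
            else:
                other += 1      # talk/meta/etc. pages waded through anyway
            elem.clear()        # release children so huge dumps don't exhaust memory

    print(articles, "article pages,", other, "pages in other namespaces")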