[06:33:50] Yvette: re time estimates, more like 21 CPU hours https://phabricator.wikimedia.org/P4750
[06:42:56] Huh, interesting.
[06:43:00] Not too bad.
[06:44:04] Though grep is fast. I wonder how much time is spent parsing.
[06:44:31] Like you'd just get the matching line with grep, but you actually need to get the page title, revision ID, etc.
[06:45:41] `/usr/bin/time -v` is interesting.
[06:46:56] I always use the shell builtin. Hmmm.
[06:47:59] Nemo_bis: Thanks for the benchmark. That's helpful to know!
[06:48:22] I'll still defend my two-week estimate, but for other reasons. ^_^
[06:51:20] Yvette: I use this method only for simple searches, especially when I'm not sure what to search for exactly.
[06:51:39] For exact data it's better to match the diffs with existing tools, cf. https://en.wikipedia.org/?diff=760144726&oldid=760128962
[06:52:55] It's nice that Labs has the files already.
[06:53:13] Nice post.
[06:54:37] Nemo_bis: Right, but even for simple searches, I'd be worried about false positives with word boundaries.
[06:55:15] Case-sensitive searches or word boundary checks add time, of course.
[06:55:20] Err, case-insensitive.
[06:55:40] grep -i is impossible
[06:56:07] Oh?
[06:56:10] Why?
[06:56:50] Horribly slow
[06:57:00] :-)
[06:57:20] ack-grep -i is slightly better iirc
[06:57:52] Horribly slow doesn't sound impossible, just horribly slow.
[06:59:51] Even if it's 100 or 1000 times slower?
[07:00:06] Is it?
[07:00:07] It took just 1% CPU for 7z e to max out grep -i
[07:00:43] 2100 hours is 12.5 weeks.
[07:00:50] So like three months, I guess.
[09:39:18] Sigh, I keep receiving gerrit emails that have been stuck for 3 days or so in mx1001
[09:42:19] Yvette: about two wall clock hours to read the whole thing on a decent machine https://phabricator.wikimedia.org/P4751
[09:44:48] This is going to take a while https://ganglia.wikimedia.org/latest/graph.php?r=week&z=large&c=Miscellaneous+eqiad&h=mx1001.wikimedia.org&jr=&js=&v=276520&m=exim+queued+messages&vl=messages
[09:51:12] mutante: is this an expected temporary consequence of "21:53 mutante: mx1001 - upgrading exim4 packages, exim4-daemon-heavy, forcing puppet run" on 2017-01-05 (5 days earlier), or something to investigate?
[13:34:45] wikiworkshop request: please lift ip account creation ban for ip 89.107.155.11
[13:36:47] you mean right now?
[13:37:38] i don't think that is likely to happen. it's a sunday, most people who could deploy a change are probably enjoying the weekend off
[13:37:48] if it's not right now, see https://meta.wikimedia.org/wiki/Mass_account_creation#Requesting_temporary_lift_of_IP_cap
[13:37:52] piotrus: ^
[17:42:22] Nemo_bis: Okay. So different hosts there.
[17:42:30] At first I thought you were saying grep added 19 hours.
[18:25:07] Yvette: the second test took more user CPU time but a fraction of the wall clock time: older processor, but good IO
[18:34:38] Hmm.
[18:34:56] You should publish these stats somewhere.
[18:34:59] They're interesting.
[19:16:25] Yvette: it's not exactly a secret that LZMA has fast decompression
[19:17:23] Maybe a reminder that the dumps can really be *much* faster to get and read could be added to a page like https://en.wikipedia.org/wiki/Wikipedia:Database_download#Why_not_just_retrieve_data_from_wikipedia.org_at_runtime.3F
[19:19:13] I'm not sure how much to advertise that dumps.wikimedia.org is orders of magnitude slower than your.org. People may get offended.
[19:19:48] (And I'm not sure if it's true in Asia or Africa, for instance.)
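The exact commands behind P4750 and P4751 are not quoted in the log, but the kind of pipeline being timed is presumably along these lines: stream-decompress a .7z history dump with `7z e -so` and scan it with grep, wrapping the whole pipeline in GNU `/usr/bin/time -v` for a verbose resource report. A minimal sketch follows; the dump filename and search pattern are placeholders, not the actual benchmark.

```sh
# Hypothetical example only: the dump filename and pattern are placeholders,
# not the commands from the pastes above.
DUMP=enwiki-20170101-pages-meta-history1.xml-p10p2289.7z

# Stream the decompressed XML to stdout and count matching lines; GNU time -v
# reports wall clock time, user/system CPU time and peak memory for the pipeline.
/usr/bin/time -v sh -c "7z e -so '$DUMP' | grep -c 'example phrase'"

# Case-insensitive matching is the variant described above as far slower,
# since grep -i (rather than 7z) becomes the bottleneck.
/usr/bin/time -v sh -c "7z e -so '$DUMP' | grep -ci 'example phrase'"
```

The reason to reach for `/usr/bin/time -v` rather than the shell builtin `time` is the extra detail in the report, such as maximum resident set size and I/O counts for the whole pipeline.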
[23:49:19] ugh https://phabricator.wikimedia.org/T132308 :/
[23:51:51] Still better than a certain librarian standard which replaces unknown digits with "u" :p
[23:53:16] we're like CNN. Fake news!