[18:25:32] Hello, I am trying to size a server to load wikidata latest-all.nt.bz2. It seems like the dump is bigger than 200G after decompression. [18:25:42] What is the full size of the dump after decompression? [18:35:50] I dunno what the nt format is actually like... [18:36:03] But the XML dumps aren't small [18:36:04] >Wikidata is also large. In May 2019, 550GB bz2 compressed, 190GB 7z compressed. [18:36:24] It looks like based on your analysis, it will at least double too [18:36:34] And depending on what you're "loading" it into... [18:40:17] sorry I got disconnected [18:40:53] yes well, I will load it in something, I need to know at least how big is the decompressed dump [18:41:18] I tried with the latest-lexemes.nt, the size in the database is similar to the size of the decompressed dump [18:41:50] Size being larger will mostly depend on the indexes I guess [18:42:15] that's why I need to know the size of the decompressed dump [18:43:35] Unless someone in here has actually extracted it, there might not be an easy answer [18:43:44] What was the compressed to uncompressed ratio of the lexemes.nt? [18:43:56] good idea I will try that [18:44:00] I do not remember [18:51:11] it is around 3TB [18:51:28] based on the ratio of latest-lexemes.nt.bz2 [22:09:50] <__indo> Hi! I want to catalog facts about different filestores, primarily information about filename handling (maximum size, encoding type, etc). I was going to start by proposing a filestore property, but I'm not sure if that is appropriate. [22:10:48] <__indo> Does Wikidata want me storing this kind of information and is creating a new "filestore" property a good idea? [22:22:26] __indo: im just a lurker, so there may be better answers, but till then skim this https://www.wikidata.org/wiki/Wikidata:Project_chat [22:22:41] your question seems to fit there [22:24:20] <__indo> Thanks.