
What compression algorithms would help? It's already using lzma for the text (in the form of .xz).
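(For concreteness, a minimal sketch of what that looks like with Python's lzma module; the input filename is just a placeholder:)

    import lzma

    # Placeholder input: any plain-text dump of the articles.
    with open("articles.txt", "rb") as f:
        raw = f.read()

    # .xz is just the LZMA2 algorithm in the xz container format.
    packed = lzma.compress(raw, format=lzma.FORMAT_XZ, preset=9)

    print(f"{len(raw)} -> {len(packed)} bytes, "
          f"ratio {len(raw) / len(packed):.2f}:1")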



The Hutter Prize is a competition for compressing Wikipedia:

http://prize.hutter1.net/

So the best algorithm to use from there is starlit, with a compression ratio of about 8.67:1, compared to lzma in 2 MB chunks, which only achieves about 4:1.
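(Rough sketch of where the ~4:1 figure comes from, assuming each 2 MB chunk is compressed independently; the input filename is a placeholder:)

    import lzma

    CHUNK = 2 * 1024 * 1024   # 2 MB, each chunk compressed on its own

    total_in = total_out = 0
    with open("enwik9", "rb") as f:   # placeholder filename
        while block := f.read(CHUNK):
            total_in += len(block)
            # Independent chunks can't share context across boundaries,
            # which is why the ratio is worse than whole-file lzma.
            total_out += len(lzma.compress(block, preset=9))

    print(f"ratio {total_in / total_out:.2f}:1")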


Oh, and if you are happy to wait days or weeks for your compressed data, Fabrice Bellard's nncp manages even higher ratios (but isn't eligible for the prize because it's too slow).


Submissions for the Hutter Prize also include the size of the compressor in the "total size". So I assume that makes it hard to beat if you use huge neural networks on the compression side, even if decompression is fast enough.


nncp uses neural networks, but 'trains itself' as it goes, so there is no big binary blob involved in the compressor.
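(Not how nncp actually models the data, just a drastically simplified sketch of the "trains itself as it goes" pattern: both ends start from the same blank model and apply the same update after every symbol, so only the update rule ships with the compressor, never a pre-trained blob:)

    import math

    class AdaptiveModel:
        """Order-0 adaptive byte model, starts completely untrained."""
        def __init__(self):
            self.counts = [1] * 256
            self.total = 256

        def prob(self, byte):
            return self.counts[byte] / self.total

        def update(self, byte):
            # The "training" step: learn from the symbol just coded.
            self.counts[byte] += 1
            self.total += 1

    def ideal_coded_bits(data):
        """Bits an entropy coder driven by this model would emit."""
        model = AdaptiveModel()
        bits = 0.0
        for b in data:
            bits += -math.log2(model.prob(b))  # cost of coding b now
            model.update(b)                    # then adapt
        return bits

    data = open("articles.txt", "rb").read()   # placeholder input
    print(ideal_coded_bits(data) / 8, "bytes (order-0 adaptive bound)")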

The only reason it isn't eligible is compute constraints (and I don't think the Hutter Prize allows a GPU, which nncp needs for any reasonable performance).


Ah, OK, fair enough.



