
Oh wow, I thought this was gonna be a REALLY large file, but it's only 95 GB. Not bad, some worthless videogames are larger haha


Circa 2003 I carried around a pared-down copy on a Pocket PC. Dropping a few chosen categories (who needs Sports?) allowed it to barely fit on a 1 GB SD card.


People going back in time need sports. An almanac of some kind.


While handy, it would be a bit too conspicuous. At least one could claim that an almanac is a novelty print.


I was curious how they achieve this. It looks like the underlying file format uses LZMA, or optionally Zstd, compression. Both achieve pretty high compression ratios against plain text and markup.

> Its file compression uses LZMA2, as implemented by the xz-utils library, and, more recently, Zstandard. The openZIM project is sponsored by Wikimedia CH, and supported by the Wikimedia Foundation.

https://en.wikipedia.org/wiki/ZIM_(file_format)
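For a rough feel of what LZMA2 does to markup, here is a minimal sketch using Python's standard `lzma` module, which writes the same xz container. The sample text is made up and far more repetitive than real articles, so the ratio it prints is much higher than anything a real dump would get. Zstd is left out only because it needs a third-party package on most Python versions.

```python
import lzma

# Hypothetical sample: repeated wiki-style markup standing in for article text.
# Real articles are far less repetitive, so expect much lower ratios in practice.
sample = (
    "== Heading ==\n'''Example''' is a [[placeholder]] article with "
    "{{template|key=value}} markup.\n" * 2000
).encode("utf-8")

# FORMAT_XZ is the xz container, which uses the LZMA2 filter by default.
compressed = lzma.compress(sample, format=lzma.FORMAT_XZ, preset=9)
restored = lzma.decompress(compressed)

ratio = len(sample) / len(compressed)
print(f"{len(sample)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x)")
```

Swapping in a real article's wikitext for `sample` would give a more honest number.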


The more important thing is that they aggressively downsize the images and omit the history and talk pages. Even if they were using LZW it would probably only triple the filesize.


BTW: what's the difference between the 95.2 GB file and the 45 GB one? There is no info on the download page.


95.2 GB is the "maxi" file. 49.48 GB is the "nopic" file. 13.39 GB is the "mini".

From https://www.kiwix.org/en/documentation/

File size is always an issue when downloading such big content, so we always produce each Wikipedia file in three flavours:

Mini: only the introduction of each article, plus the infobox. Saves about 95% of space vs. the full version.
nopic: full articles, but no images. About 75% smaller than the full version.
Maxi: the default full version.
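
As a quick sanity check, the savings can be worked out from the three sizes quoted above. This is just arithmetic on those numbers (the dictionary and function names are made up for the example). The results come out lower than the documentation's rough "95%" and "75%", which may simply mean those figures describe a different dump.

```python
# Sizes in GB as quoted in this thread; not re-checked against the download page.
sizes_gb = {"maxi": 95.2, "nopic": 49.48, "mini": 13.39}

def percent_smaller(flavour, baseline="maxi"):
    """Return how much smaller `flavour` is than `baseline`, in percent."""
    return 100 * (1 - sizes_gb[flavour] / sizes_gb[baseline])

for name in ("nopic", "mini"):
    print(f"{name}: {percent_smaller(name):.0f}% smaller than maxi")
# nopic: 48% smaller than maxi
# mini: 86% smaller than maxi
```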


I remember the era of stupidly large games.


you mean today!



