
How does this scale with the need to update data over time, issue corrections, etc.? Having to download everything again doesn't seem very elegant. I think this would benefit a lot from some form of incremental update support, that is, downloading only what has changed since last time. One possible implementation would be a BitTorrent-distributed, git-like mirror, so that everyone could maintain a locally synced copy and create a snapshot on removable media on the fly.
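
For the "download only what changed" part, one naive way to sketch it (not anything the ZIM/Kiwix tooling actually does; the file names and chunk size here are made up) is a chunk manifest: hash fixed-size chunks of the dump, publish the list of hashes, and let clients fetch only the chunks whose hashes changed since their last sync:

    # Hypothetical chunk-level incremental sync sketch.
    import hashlib
    import json
    from pathlib import Path

    CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks, arbitrary choice


    def build_manifest(dump_path: Path) -> list[str]:
        """Return the SHA-256 digest of every fixed-size chunk of the dump."""
        hashes = []
        with dump_path.open("rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                hashes.append(hashlib.sha256(chunk).hexdigest())
        return hashes


    def changed_chunks(old_manifest: list[str], new_manifest: list[str]) -> list[int]:
        """Indices of chunks that are new or differ from the old manifest."""
        return [
            i
            for i, digest in enumerate(new_manifest)
            if i >= len(old_manifest) or old_manifest[i] != digest
        ]


    if __name__ == "__main__":
        old = json.loads(Path("manifest.old.json").read_text())
        new = json.loads(Path("manifest.new.json").read_text())
        todo = changed_chunks(old, new)
        print(f"{len(todo)} of {len(new)} chunks need to be re-downloaded")

The catch, which the reply below gets at, is that a highly compressed archive tends to change almost everywhere when it is regenerated, so a naive chunk manifest would mark most chunks dirty unless the dump were rebuilt in a chunk-stable way.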



Given that the ZIM format is highly compressed, I'd assume that any "diff" approach would be computationally quite intensive [1] on both sides, unless you require clients to apply every patch in sequence, in which case the server could simply host static patch files.
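
As a client-side sketch of that "apply every patch in sequence" idea: assuming the server publishes one static binary patch per release, and assuming the third-party bsdiff4 Python package's file_patch(src, dst, patch) helper works as I remember, a client that is several releases behind could catch up roughly like this (file names are made up):

    from pathlib import Path

    import bsdiff4  # third-party: pip install bsdiff4


    def apply_patch_chain(dump: Path, patches: list[Path]) -> None:
        """Apply binary patches in release order, one full rewrite per step."""
        current = dump
        for i, patch in enumerate(sorted(patches), start=1):
            updated = dump.with_name(dump.name + f".step{i}")
            # Assumed signature: file_patch(src, dst, patch) writes src+patch to dst.
            bsdiff4.file_patch(str(current), str(updated), str(patch))
            current = updated
        current.replace(dump)  # swap the fully patched dump into place


    if __name__ == "__main__":
        apply_patch_chain(
            Path("wikipedia_en_all.zim"),
            list(Path("patches").glob("*.bsdiff")),
        )

The server side is the real cost, though: producing a binary diff between two multi-gigabyte compressed archives is heavy on CPU and memory, and the patches only stay small if unchanged content remains byte-identical across rebuilds.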

Bandwidth is getting cheaper and cheaper, and arguably if you can afford to get that initial 100 GB Wikipedia dump, you can afford to download it more than once (and vice versa: if you can periodically download multi-gigabyte differential updates, you can afford the occasional full re-download).

One application where I could see it making sense is a related project [2] that streams Wikipedia over satellite: an initial download there probably takes several days of uninterrupted reception.

[1] Google once implemented a custom binary diff format (Courgette) optimized for Chrome updates, but I'm not sure if it still exists.

[2] https://en.wikipedia.org/wiki/Othernet



