
How does this scale with the need to update data over time, issue corrections, etc.? Having to download everything again doesn't seem very elegant. I think this would benefit a lot from some form of incremental update support, that is, downloading only what has changed since last time. One possible implementation would be a BitTorrent-distributed, git-like mirror, so that everyone could maintain a locally synced copy and create a snapshot on removable media on the fly.
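
For the "download only what changed" part, one naive way to sketch it (not anything the ZIM/Kiwix tooling actually does; the file names and chunk size here are made up) is a chunk manifest: hash fixed-size chunks of the dump, publish the list of hashes, and let clients fetch only the chunks whose hashes changed since their last sync:

    # Hypothetical chunk-level incremental sync sketch.
    import hashlib
    import json
    from pathlib import Path

    CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks, arbitrary choice


    def build_manifest(dump_path: Path) -> list[str]:
        """Return the SHA-256 digest of every fixed-size chunk of the dump."""
        hashes = []
        with dump_path.open("rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                hashes.append(hashlib.sha256(chunk).hexdigest())
        return hashes


    def changed_chunks(old_manifest: list[str], new_manifest: list[str]) -> list[int]:
        """Indices of chunks that are new or differ from the old manifest."""
        return [
            i
            for i, digest in enumerate(new_manifest)
            if i >= len(old_manifest) or old_manifest[i] != digest
        ]


    if __name__ == "__main__":
        old = json.loads(Path("manifest.old.json").read_text())
        new = json.loads(Path("manifest.new.json").read_text())
        todo = changed_chunks(old, new)
        print(f"{len(todo)} of {len(new)} chunks need to be re-downloaded")

The catch, which the reply below gets at, is that a highly compressed archive tends to change almost everywhere when it is regenerated, so a naive chunk manifest would mark most chunks dirty unless the dump were rebuilt in a chunk-stable way.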



Given that the ZIM format is highly compressed, I'd assume that any "diff" approach would be computationally quite intensive [1] on both sides, unless you require clients to apply every patch in sequence, in which case the server could simply host static patch files.
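
As a client-side sketch of that "apply every patch in sequence" idea: assuming the server publishes one static binary patch per release, and assuming the third-party bsdiff4 Python package's file_patch(src, dst, patch) helper works as I remember, a client that is several releases behind could catch up roughly like this (file names are made up):

    from pathlib import Path

    import bsdiff4  # third-party: pip install bsdiff4


    def apply_patch_chain(dump: Path, patches: list[Path]) -> None:
        """Apply binary patches in release order, one full rewrite per step."""
        current = dump
        for i, patch in enumerate(sorted(patches), start=1):
            updated = dump.with_name(dump.name + f".step{i}")
            # Assumed signature: file_patch(src, dst, patch) writes src+patch to dst.
            bsdiff4.file_patch(str(current), str(updated), str(patch))
            current = updated
        current.replace(dump)  # swap the fully patched dump into place


    if __name__ == "__main__":
        apply_patch_chain(
            Path("wikipedia_en_all.zim"),
            list(Path("patches").glob("*.bsdiff")),
        )

The server side is the real cost, though: producing a binary diff between two multi-gigabyte compressed archives is heavy on CPU and memory, and the patches only stay small if unchanged content remains byte-identical across rebuilds.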

Bandwidth is getting cheaper and cheaper, and arguably if you can afford to get that initial 100 GB Wikipedia dump, you can afford to download it more than once (and vice versa: if you can periodically download multi-gigabyte differential updates, you can afford the occasional full re-download).

One application where I could see it making sense is a related project [2] that streams Wikipedia over satellite: an initial download there probably takes several days of uninterrupted reception.

[1] Google once implemented a custom binary diff format (Courgette) optimized for Chrome updates, but I'm not sure if it still exists.

[2] https://en.wikipedia.org/wiki/Othernet



