This is a bit similar to count-min sketch, which can be used to calculate freque...

nullc · 2025-05-03T09:21:02 1746264062

IBLT operates over sets though, your downloading example is not over a set (the data has a position). You can make it work over data with positions by adding a position number to every encoding unit, but it's pure overhead. Other codes are more efficient.

IBLT is also particularly bandwidth inefficient when the amount of correction data is not very large. Ordinary codes can recover a missing block (known error location) with just the amount of data missing or with twice the amount if the location(s) are unknown.

There are codes with quasi-linear performance, though for your example application the number of errors are few so it shouldn't really matter if the code is cubic in the number of errors to decode (which is about the worst any should be).

thomasmg · 2025-05-03T14:34:46 1746282886

Yes, I know it is not designed to be a error-correction code, and other codes (turbo code, fountain code), are a lot more efficient. But I wanted to mention it because it's related to the topic.

> You can make it work over data with positions by adding a position number

Yes, that's what I did in my experiments. I personally found it really simple to implement; I'm not sure how easy it is to implement a "real" error correction code.

nullc · 2025-05-03T20:07:05 1746302825

Fountain code is decoded by essentially the same mechanism, it's called peeling. It's quite fun to implement. Though a plain fountain code has similar bandwidth inefficiencies... also the implementations that exist only correct erasures (though you can turn errors into erasures using a checksum/mac).

You can see the fountain code as a very sparse linear system that is usually solvable through a fairly simple greedy algorithm. You an augment the sparse linear system with a dense one like a RS code, which is solved by gauss-seidel which is also fun if less braindead simple.

This github user has a whole host of different competently implemented very high performance erasure codes:

https://github.com/catid/wirehair https://github.com/catid/leopard https://github.com/catid/fecal https://github.com/catid/siamese

As far as direct error codes. There is a competent BCH code implementation in the Linux kernel, but it only works over small fields so it doesn't support many independent blocks/packets.

Then there is https://github.com/bitcoin-core/minisketch which I am a coauthor of, which is a set corrector like IBLT which achieves perfect communications efficiency (at a trade off of cubic decode performance instead of linear but with good constant factors). I mention it mostly because it contains fast implementations of the relevant and tricky number theory algorithms needed for a fast RS error correcting code over big fields (any size up to 2^64 at least)... but I'm not actually aware of a performant large field RS error correcting implementation. (IIRC there is a RS error corrector in gnuradio but again small field).

(Mostly the small field stuff doesn't work for bigger fields because they use a chien search to find the roots of a polynomial, which is linear in the field size. Minisketch finds roots by factoring in log(|field|) time. Root finding is needed in algebraic error codes because it's how you find where the errors are. IIRC the linux kernel BCH implementation uses factoring like we do but is constructed to use a log table to turn multiplications into xors.)