There’s a whole field’s worth of really cool stuff about error correction that I wish I knew a fraction of enough to give reading recommendations about, but my comment wasn’t that deep. With hashes you obviously care about distribution, because that’s almost the entire point of a non-cryptographic hash; with error correction you only care that x ≠ y implies f(x) ≠ f(y) with high probability. Distribution is only directly related to that in the obvious sense of making good use of the output space (though it’s probably indirectly related in some interesting, subtler ways).
E.g. f(x) = concat(xxhash32(x), 0xf00) is just as good at error detection as xxhash32 but is a terrible hash, and, as mentioned, CRC-32 is infinitely better at detecting certain types of errors than any universal hash family.
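To make that concrete, here’s a rough sketch of the concat example, assuming Python and the third-party `xxhash` package (the constant and the bucket count are just for illustration):

```python
# f(x) = concat(xxhash32(x), 0xf00): the constant tail adds nothing, so it
# detects differences exactly as well as xxhash32 alone, but it's a terrible
# hash for bucketing because the low 12 bits of every output are identical.
import xxhash  # third-party package, assumed available

def f(data: bytes) -> int:
    # 44-bit value: the 32 hash bits followed by the constant 0xf00.
    return (xxhash.xxh32(data).intdigest() << 12) | 0xF00

# Error detection: f(x) == f(y) exactly when xxhash32(x) == xxhash32(y).
a, b = b"hello world", b"hellp world"
assert (f(a) == f(b)) == (xxhash.xxh32(a).intdigest() == xxhash.xxh32(b).intdigest())

# Distribution: bucketing by f(x) mod 4096 sends every key to bucket 0xf00.
buckets = {f(str(i).encode()) % 4096 for i in range(1000)}
print(buckets)  # {3840}
```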
This seems to make sense, but I need to read more about error correction to fully understand it. I was also considering the possibility that the data itself could contain patterns on which error detection performs poorly because of bias, and I haven't seen how to fold that kind of estimate into the probability calculations.
Is there any proof of this? I'm interested in reading more about it.
> detect all burst errors up to 32 bits in size
What if the corrupted bits are not consecutive?
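To make sure I'm reading the quoted guarantee right, here's a quick brute-force check of the consecutive-bit case, with Python's `zlib.crc32` standing in for CRC-32 (an empirical spot check on one short message, not a proof, and it doesn't cover scattered flips at all):

```python
# Flip every possible burst of 1..32 consecutive bits in a short message and
# confirm the CRC-32 always changes. "Consecutive" here means the serial bit
# order CRC-32 works in: bytes in order, least-significant bit first.
import zlib

def flip_burst(data: bytes, start: int, width: int) -> bytes:
    out = bytearray(data)
    for p in range(start, start + width):
        out[p // 8] ^= 1 << (p % 8)  # LSB-first within each byte
    return bytes(out)

msg = b"an example payload for the burst check"
good = zlib.crc32(msg)

for width in range(1, 33):
    for start in range(len(msg) * 8 - width + 1):
        assert zlib.crc32(flip_burst(msg, start, width)) != good

print("every burst of <= 32 bits changed the CRC")
```

That only exercises consecutive bits, though, which is exactly why I'm asking about the scattered case.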