Hacker News new | past | comments | ask | show | jobs | submit login

"Not only is a cryptographic hash unnecessary, under certain circumstances it will actually do a worse job."

A cryptographic hash is unnecessary, but as you point out in the next to last paragraph, it is statistically improbable that it will do a worse job. Because collisions are statistically improbable.




Specifically, I wrote a program to search for single-bit-flip collisions in sha1 truncated to 16 bits. The program didn't need to search for long before finding two messages with the same 16-truncated sha1 with a single bit flip at bit 1 of byte 171 of a 256-byte message. 376 1 171 be44b935e7ecfc81d1fe2cddcd7c1d7e04338fd83fa994cd6a877732ca5d8db83346bd9ccbfc4c8770682bd307c782421a512a80a106be87825d5c13f3156e23ffaacdfc1651f88f775507d1175542def2ccf084271ebd4ead175c8a448be0d50b26f59d970301ebc5a7f672d3ea870d9a1e02f8f5fd01c38297b8aa264a3f07fec32f9a91aa359784d2d9ce0e4649465c705f50feed23dcbefc0a726cfadb5e47ee577ed45203f90d6e2e650d42ddb10cba49d06bd4cdad4e6eaf5cfcb062de2539fc847ce0c104f2e667369080eaaab5934ae5f7f1ba733c3d1bfbda87bfa72ef12475b9ff0edc4deb99e6a5cf387c7f6b9c71ea62b4db4bb67c92d36460dd be44b935e7ecfc81d1fe2cddcd7c1d7e04338fd83fa994cd6a877732ca5d8db83346bd9ccbfc4c8770682bd307c782421a512a80a106be87825d5c13f3156e23ffaacdfc1651f88f775507d1175542def2ccf084271ebd4ead175c8a448be0d50b26f59d970301ebc5a7f672d3ea870d9a1e02f8f5fd01c38297b8aa264a3f07fec32f9a91aa359784d2d9ce0e4649465c705f50feed23dcbefc0a726cfadb5e47ee577ed45203f90d6e2e640d42ddb10cba49d06bd4cdad4e6eaf5cfcb062de2539fc847ce0c104f2e667369080eaaab5934ae5f7f1ba733c3d1bfbda87bfa72ef12475b9ff0edc4deb99e6a5cf387c7f6b9c71ea62b4db4bb67c92d36460dd

https://gist.github.com/jepler/96d1e779dc95b8941b208887e10a8...

On the other hand, any CRC will detect all such errors; a well-chosen one such as CRC32C will detect all messages with up to 5 bits flipped at this message size.

This is quite appropriate for the error model in data transmission, of uncorrelated bit errors. https://users.ece.cmu.edu/~koopman/networks/dsn02/dsn02_koop... is a pretty good paper, though there are probably better ones for readers without an existing background in how CRC works.


You are not testing a crypto hash. "Crypto hash" means it is cryptographically strong, not truncated to 16 bits. For example ZFS with checksum=sha256 will use the full 256-bit hash for detecting data corruption.


Yup, you're right. if you use a full size cryptographic hash then the number of undetected errors can be treated as 0 regardless of hamming distance. On the other hand, it has 8x the storage overhead of a 32-bit CRC.


Less so if you assume the cryptographic hash will be truncated to 32bits so that its size matches crc32c. Furthermore, some of those collisions will be on message pairs with small hamming distance, probably including messages with a single bit flipped which CRC will always detect.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: