SHA1 has a collision, so now it uses SHA256. How long until a SHA256 collision? Shouldn't the new protocol just add support for many modern hash functions, and client updates can disable support for hashes that become insecure later? Or does this introduce its own headaches? That's what SSH does, right?
Don't people always say exactly that? What are the chances of a collision if we simultaneously use multiple hashes, does it become significantly less likely?
This is an understandable reaction, but the security margin on new crypto is way higher than on old crypto. Roughly speaking, we went from "I guess if a state-level actor dedicated all their resources to this for a few decades, they could probably brute-force it" to "Even if you broke 9 out of 10 rounds in this algorithm, you'd still need to harness the energy of every star in the universe for 10 billion years to brute-force it."
Most algorithms today have been "attacked" in the sense that there are tricks we can do that allow us to recover the key faster than a simple brute-force attack. But "faster" usually means doing something like 2^100 operations instead of 2^128 -- still far beyond the realm of practicality.
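To put those exponents in perspective, here is a quick back-of-the-envelope calculation. The rate of 10^18 operations per second is an assumption picked purely for illustration (roughly an exascale machine), not a measured attacker capability.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_run(log2_ops, ops_per_second=1e18):
    """Years needed to perform 2**log2_ops operations at the assumed rate."""
    return 2 ** log2_ops / ops_per_second / SECONDS_PER_YEAR

# Compare a few work factors at the same (assumed) rate.
for bits in (64, 80, 100, 128):
    print(f"2^{bits} operations: {years_to_run(bits):.3g} years")
```

Under that assumption, 2^100 operations already take on the order of 40,000 years, and 2^128 takes 2^28 times longer still.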
It's telling that cryptographers are now seriously discussing reducing the security of various algorithms: https://eprint.iacr.org/2019/1492
No, "they" don't. SHA1 collisions had been "in the wind" for a while; they had been in sight ever since MD5 started showing signs of clear weakness in the early '00s. Wikipedia has a Rivest quote about it from 2005. There is nothing like that for SHA2, although attacks are improving.
> What are the chances of a collision if we simultaneously use multiple hashes
Define "simultaneous". Shipping twice the hashes for each piece seems a big waste of space. If you mean re-hashing hashes, it's just a waste of CPU power, since an attacker only has to break one or the other to get in a position to poison data.
You are massively prematurely optimizing. The vast majority of torrents are larger than a few hundred megabytes, so nobody cares about the overhead of a few KB of hashes. You are already hashing the data when you download or upload it, and adding another hash while the data is fresh in the CPU cache is basically free.
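A minimal sketch of what "adding in another hash" in the same pass could look like, using Python's standard hashlib. The function name and the 64 KiB chunk size are made up for illustration; this is not how any particular BitTorrent client does it.

```python
import hashlib
import io

def dual_digest(stream, chunk_size=64 * 1024):
    """Read the stream once and feed every chunk to both hash functions."""
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # The chunk was just read, so the second update touches warm data.
        sha1.update(chunk)
        sha256.update(chunk)
    return sha1.hexdigest(), sha256.hexdigest()

sha1_hex, sha256_hex = dual_digest(io.BytesIO(b"abc" * 100_000))
print(sha1_hex)
print(sha256_hex)
```

The data is read from disk or the network once; only the hashing work doubles, not the I/O.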
Very unlikely that this is going to happen any time soon.
Most modern symmetric cryptographic primitives with sizes >=256 bits are considered safe even against quantum computers. SHA256 turned out to be even stronger than expected. SHA-3 adoption is delayed in many protocols because there is not much need for it and hardware implementations of SHA256 are commonplace.
They use multi-hash [0] in magnet links, presumably for exactly this reason.
... but for consistency (like their narrowing of valid bencode), they’ve presumably chosen one main hash for now, so that every client and server doesn’t have to handle all of these cases as people provide a million variants of the same torrent.
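For anyone unfamiliar with multihash: the idea is a self-describing digest, i.e. a function code, then the digest length, then the digest itself. Here is a rough sketch, assuming the codes 0x11 (sha1) and 0x12 (sha2-256) from the multihash table. The helper names are made up, and it assumes single-byte varints, which only holds for codes and lengths below 128.

```python
import hashlib

# Function codes as listed in the multihash table (only two shown here).
CODES = {"sha1": 0x11, "sha2-256": 0x12}
NAMES = {code: name for name, code in CODES.items()}
HASHERS = {"sha1": hashlib.sha1, "sha2-256": hashlib.sha256}

def multihash_encode(name, data):
    """Return <code><digest length><digest> for the named hash function."""
    digest = HASHERS[name](data).digest()
    return bytes([CODES[name], len(digest)]) + digest

def multihash_decode(blob):
    """Split a multihash back into (function name, raw digest)."""
    if len(blob) < 2:
        raise ValueError("multihash too short")
    code, length, digest = blob[0], blob[1], blob[2:]
    if code not in NAMES:
        raise ValueError(f"unknown hash function code {code:#x}")
    if len(digest) != length:
        raise ValueError("digest length does not match header")
    return NAMES[code], digest

encoded = multihash_encode("sha2-256", b"hello")
print(encoded.hex())
print(multihash_decode(encoded)[0])
```

A reader of the value can tell which hash produced it without any out-of-band agreement, which is what makes later migrations possible.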
Nope, the problem is that software never upgrades its SSL stack to support the newer ciphers. Especially Microsoft, which is easily 10 years behind on the current SSL version.
Without the ability to support multiple versions, it would be impossible to upgrade anything at all. That would be a whole load of other problems.
I would guess that the reasoning is similar to why Git is moving to SHA256 (from SHA1) rather than to BLAKE3 — SHA256 was around 5 years ago and the major design change has been in the works for a while (BEP 52 dates to 2017). BLAKE3 (2019) would be a fine choice today.
I see. Thank you! By the way, I have not thought much about it, so in case you know: would it not be possible to implement this in a way that allows swapping the hash function? So, for example, when we run into issues with SHA-256, we change the hash function to something else.
Sure, but this is not any more expensive than any other versioning scheme you might be thinking of. Consider also that they got 19+ years out of v1, and that there is no reason to believe SHA2 will be broken faster than SHA1.
Probably, but would it be possible to make it so that one could easily swap the hash function? Like I am curious about the details here. I think it would be. Clients probably will have to implement a couple of commonly used hash functions, and so forth. I am not sure how it would work in practice or if it is worth it at all. I am interested in all the details though.
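One way the "swappable hash" idea could look in practice, as a sketch only: each digest carries the name of its algorithm, and the client keeps an allow-list it can shrink in a later update. The `algo:hex` tag format, the function names, and the policy set below are all hypothetical, not anything from the BitTorrent spec.

```python
import hashlib
import hmac

# Hypothetical client policy: algorithms this client is still willing to accept.
ALLOWED = {"sha256", "sha1"}

def tag(algo, data):
    """Produce a self-describing digest such as 'sha256:ab12...'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"

def verify(tagged, data, allowed=ALLOWED):
    """Check data against a tagged digest, refusing disabled algorithms."""
    algo, _, expected = tagged.partition(":")
    if algo not in allowed:
        raise ValueError(f"hash algorithm {algo!r} is not accepted")
    actual = hashlib.new(algo, data).hexdigest()
    return hmac.compare_digest(actual, expected)

piece = b"some piece of a torrent"
print(verify(tag("sha256", piece), piece))

# A later client release could drop SHA-1 just by shrinking the allow-list.
try:
    verify(tag("sha1", piece), piece, allowed={"sha256"})
except ValueError as err:
    print(err)
```

The cost is the one raised elsewhere in the thread: every client has to carry every algorithm that is still in circulation, and the same content ends up published under several different identifiers.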
In addition to loeg's point about timing, I'll add that BLAKE3 has its own internal tree structure. It would be unfortunate to have two tree structures layered on top of one another, both because it's "ugly" and because it won't do as good a job of parallelizing things. However, unifying the tree structures would be a big commitment. Every detail of the layout would need to be exactly as it is in BLAKE3. There wouldn't be any space for custom metadata on interior tree nodes, for example. I'm not familiar with the protocol details of BitTorrent myself, but I wouldn't be surprised if that unified approach turned out to be too limiting. (But for a file/tree project that does use the exact BLAKE3 structure, see github.com/oconnor663/bao.)
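For readers who have not seen a hash tree before, here is a generic sketch of one built on SHA-256. It is not BLAKE3's internal layout, and I am not claiming it matches the BitTorrent v2 layout either; the 16 KiB block size and the zero-hash padding to a power of two are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed leaf size

def merkle_root(data, block_size=BLOCK_SIZE):
    """Hash each block, then hash pairs of hashes upward until one root remains."""
    leaves = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ] or [hashlib.sha256(b"").digest()]
    # Pad with all-zero hashes until the leaf count is a power of two.
    while len(leaves) & (len(leaves) - 1):
        leaves.append(bytes(32))
    level = leaves
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

print(merkle_root(b"x" * (3 * BLOCK_SIZE)).hex())
```

Layering something like this on top of a hash that already has its own tree inside is the "two tree structures" situation described above.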
That depends on the platform, the size of the input, and whether multithreading is used.
On Ice Lake, where BLAKE3 benefits from AVX-512 and SHA-256 benefits from the SHA extensions, BLAKE3 seems to do better on both long and short messages. But maybe surprisingly, SHA-256 does better in a medium-length regime, where SHA-256's poorer startup time* has been mostly amortized out, but BLAKE3's chunk parallelism hasn't yet kicked in. See for example the 1536-byte results here: https://bench.cr.yp.to/results-hash.html#amd64-icelake . Using multithreading would exaggerate BLAKE3's advantage for very long messages (usually about 1 MiB and above), but it wouldn't improve the results for any of the message lengths measured there.
* I don't actually know where SHA-256's startup overhead comes from. Maybe someone who knows more could jump in here?
On ARM chips, the performance benefits of NEON are less dramatic than AVX-512, and the performance advantage of SHA-256 hardware acceleration is comparatively larger. I think it's rare for BLAKE3 to beat accelerated SHA-256 on ARM without at least some multithreading, but I've only personally benchmarked a few Raspberry Pis, and I want to be careful not to overgeneralize.
Collisions in the context of bittorrent are not a big deal. We could (and will, there are millions of torrents around that will never be updated) keep using SHA1 and the world is not going to end.
They were a big deal on other networks that used, for example, MD5: because of collisions, malicious clients could just send you garbage parts. You wouldn't notice until the complete download finished, only to find that the file was corrupted.
Yes, because warez is the only valid use of bittorrent.
Most Linux distros offer an installation iso via torrent, large files with many blocks. If you can change just a small part of those files, you’ve got compromised machines before the install even begins.
Most of the practical hash attacks we've seen allow one to create two chunks of data that hash to the same thing, not to collide with an arbitrary other block. This greatly limits the attack scenarios we need to worry about.
(That is, we've got practical collision attacks emerging for SHA1, not pre-image attacks).
You would need to create a colliding pair (because the single existing one is so well known), itself not a simple thing, and create the two executables specifically with additional code that detects which member of the pair it is and behaves differently in each. You can't replace existing files with this attack, which means you'd have to create your own Linux distro with this extra "feature" and can't attack existing ones.
Remember that for cryptographers, "practical" or "broken" doesn't really mean "everyone can do it", and AFAIK there has only been one publicly released collision pair for SHA1, which also took an enormous amount of time and money to find.
Even MD5, for which you can generate colliding blocks in seconds on an average PC, is still quite resistant to preimage attacks.
In other words, even after spending the resources to find a colliding block, you'd also need to create both files with the same hash, and can't simply collide existing torrents' files and replace them with malicious ones.
You're assuming a malicious peer, but what about a malicious seeder? One could take an existing unsigned executable, add in NOPs and no one would notice (same thing for certain noises in audio/video files).
Since there's some control over the original hash value, executing step (4) is not exactly a preimage attack, it should be a bit easier.
If the seeder is malicious, there's no need to attack SHA1 at all. The seeder can't control which peers get which versions of any identical-SHA blocks — your target peer may share the bad block — so it seems easier to just upload the malicious content to everyone.
They aren’t? So someone can craft a payload that contains a rootkit or similar and has the same name and hash as the thing you wanted, and this is okay? (Disclaimer: I don’t know all that much about how hashes are used internal to the protocol)
The weakness is a collision, not a pre-image attack. That means that a pair of payloads can be crafted together, one innocent and one malicious, that have the same hash. But it is not feasible to take a given file and make a new malicious payload with a matching hash.
So if you get your hash from whoever made the original binary, you can know that any binary with a matching hash is fine. But it could be a problem if someone creates a new pair of binaries and uploads the hash to TPB. You might see lots of good reviews from people who got the innocent version, but then the peers you connect to send you a malicious version with a matching hash.
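The difference between the two attacks is easy to see on a toy hash. The sketch below truncates SHA-256 to 16 bits, which is deliberately weak so that both searches finish quickly; the function names and message formats are made up, and nothing here attacks real SHA1 or SHA256.

```python
import hashlib
from itertools import count

def tiny_hash(data, bits=16):
    """Toy hash: the top `bits` bits of SHA-256. Deliberately weak."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits=16):
    """Attacker picks BOTH messages: stop at the first repeated hash value."""
    seen = {}
    for n in count():
        msg = b"msg-%d" % n
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, n + 1
        seen[h] = msg

def find_second_preimage(target, bits=16):
    """Attacker must match the hash of a message someone else chose."""
    wanted = tiny_hash(target, bits)
    for n in count():
        msg = b"forged-%d" % n
        if msg != target and tiny_hash(msg, bits) == wanted:
            return msg, n + 1

a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"official-release.iso")
print("collision found after", collision_tries, "tries")
print("second preimage found after", preimage_tries, "tries")
```

For a b-bit hash, the collision search needs on the order of 2^(b/2) tries, while matching a fixed target needs on the order of 2^b, so the second number is typically far larger. The known SHA1 break is of the first kind.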
The vast majority of data shared over bittorrent is not executable.
So your scenario becomes an issue if you're downloading executable data that you deem :trusted: without any additional verification besides the hash itself. If that's the case, you have bigger worries than the hash.