SHA1 has a collision, so now it uses SHA256. How long until a SHA256 collision? ...

Taek · on Sept 7, 2020

I don't think anyone is expecting a sha256 collision in the next 20 years. The crypto doesn't seem to leave much wiggle room for exploitation.

noja · on Sept 7, 2020

Don't people always say exactly that? What are the chances of a collision if we simultaneously use multiple hashes? Does it become significantly less likely?

nemo1618 · on Sept 7, 2020

> don't people always say exactly that?

This is an understandable reaction, but the security margin on new crypto is way higher than old crypto. Roughly speaking, we went from "I guess if a state-level actor dedicated all their resources to this for a few decades, they could probably brute-force it" to "Even if you broke 9 out of 10 rounds in this algorithm, you'd still need to harness the energy of every star in the universe for 10 billion years to brute-force it."

Most algorithms today have been "attacked" in the sense that there are tricks we can do that allow us to recover the key faster than a simple brute-force attack. But "faster" usually means doing something like 2^100 operations instead of 2^128 -- still far beyond the realm of practicality.
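
To put those exponents in perspective, here is a back-of-the-envelope sketch. The rate of 10^18 operations per second is purely an assumption for illustration, not a figure from this thread, and `years_to_run` is a made-up helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical attacker throughput (an assumption): 10**18 operations per second.
OPS_PER_SECOND = 10**18


def years_to_run(log2_ops: int, ops_per_second: int = OPS_PER_SECOND) -> float:
    """Years needed to perform 2**log2_ops operations at the given rate."""
    return (2**log2_ops) / ops_per_second / SECONDS_PER_YEAR


for bits in (64, 80, 100, 128):
    print(f"2^{bits} operations: {years_to_run(bits):.3e} years")
```

At that assumed rate, 2^100 operations already take on the order of 40,000 years, and 2^128 takes 2^28 times longer than that.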

It's telling that cryptographers are now seriously discussing reducing the security of various algorithms: https://eprint.iacr.org/2019/1492

inopinatus · on Sept 8, 2020

This isn't wholly unreasonable. At some point we will stop worrying about state actors and start worrying about the Xeelee.

sudosysgen · on Sept 8, 2020

If they have access to such power, we're toast anyway.

toyg · on Sept 7, 2020

> Don't people always say exactly that?

No, "they" don't. SHA1 collisions had been "in the wind" for a while; they had been in sight ever since MD5 started showing signs of clear weakness in the early '00s. Wikipedia has a Rivest quote about it from 2005. There is nothing like that for SHA2, although attacks are improving.

> What are the chances of a collision if we simultaneously use multiple hashes

Define "simultaneous". Shipping twice the hashes for each piece seems a big waste of space. If you mean re-hashing hashes, it's just a waste of cpu power, since an attacker only has to break one or the other to get in a position to poison data.

throwaway2048 · on Sept 8, 2020

You are massively prematurely optimizing. The vast majority of torrents are larger than a few hundred megabytes, so nobody cares about the overhead of a few KB of hashes. You are already hashing data when you download or upload it, and adding another hash while the data is fresh in the CPU cache is basically free.
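
A minimal sketch of what "shipping two hashes per piece" could look like, using Python's `hashlib`. The function names and the 700 MiB / 256 KiB figures are illustrative assumptions; this is not the BitTorrent wire format.

```python
import hashlib


def digest_both(piece: bytes) -> tuple[bytes, bytes]:
    """Return the (SHA-1, SHA-256) digests of one piece."""
    return hashlib.sha1(piece).digest(), hashlib.sha256(piece).digest()


def verify_piece(piece: bytes, expected_sha1: bytes, expected_sha256: bytes) -> bool:
    """Accept the piece only if BOTH digests match."""
    sha1, sha256 = digest_both(piece)
    return sha1 == expected_sha1 and sha256 == expected_sha256


def hash_overhead_bytes(total_size: int, piece_length: int) -> int:
    """Metadata size when every piece carries a 20-byte and a 32-byte digest."""
    pieces = -(-total_size // piece_length)  # ceiling division
    return pieces * (20 + 32)


# Hypothetical torrent: 700 MiB of data split into 256 KiB pieces.
print(hash_overhead_bytes(700 * 2**20, 256 * 2**10), "bytes of hashes")
```

An attacker who wants to substitute a piece would need a single input that collides under both functions at once, while the extra metadata stays tiny compared to the payload.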

bamboozled · on Sept 7, 2020

In twenty years. So they have 10-15 years to safely do other things.

nabla9 · on Sept 7, 2020

> How long until a SHA256 collision

Very unlikely that this is going to happen any time soon.

Most modern symmetric cryptographic primitives with sizes >=256 bits are considered safe even against quantum computers. SHA256 turned out to be even stronger than expected. SHA-3 adoption is delayed in many protocols because there is not much need for it and hardware implementations of SHA256 are commonplace.

tekacs · on Sept 7, 2020

They use multi-hash [0] in magnet links, presumably for exactly this reason.

... but for consistency (like their narrowing of valid bencode), they’ve presumably chosen one main hash for now, so that every client and server doesn’t have to handle all of these cases as people provide a million variants of the same torrent.

[0]: https://github.com/multiformats/multihash
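
For readers who have not seen the format: a multihash is a self-describing digest, laid out as a varint function code, a varint digest length, then the digest bytes. The sketch below is a hand-rolled illustration rather than the reference library; the helper names are made up, and only the two codes shown (0x11 for sha1, 0x12 for sha2-256) are included.

```python
import hashlib

# Function codes from the multicodec table: sha1 = 0x11, sha2-256 = 0x12.
CODES = {"sha1": 0x11, "sha2-256": 0x12}
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256"}


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def multihash(name: str, data: bytes) -> bytes:
    """Hash data and prefix the digest with <code><length>."""
    digest = hashlib.new(HASHLIB_NAMES[name], data).digest()
    return encode_varint(CODES[name]) + encode_varint(len(digest)) + digest


def parse_multihash(mh: bytes) -> tuple[str, bytes]:
    """Split a multihash into (function name, digest), validating the length."""
    code, pos = decode_varint(mh)
    length, pos = decode_varint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError("digest length does not match the declared length")
    names = {value: key for key, value in CODES.items()}
    if code not in names:
        raise ValueError(f"unknown hash function code: {code:#x}")
    return names[code], digest


print(multihash("sha2-256", b"hello").hex())
```

Because the function code travels with the digest, a reader can tell which algorithm produced it without any out-of-band agreement.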

stingraycharles · on Sept 7, 2020

Isn’t the fact that OpenSSL et al allow so many arbitrary ciphers the reason for a whole load of problems?

loeg · on Sept 7, 2020

Yep: https://en.wikipedia.org/wiki/Downgrade_attack

> Downgrade attacks have been a consistent problem with the SSL/TLS family of protocols; examples of such attacks include the POODLE attack.

user5994461 · on Sept 7, 2020

Nope, the problem is that software never upgrades its SSL stack to support the newer ciphers. Especially Microsoft, which is easily 10 years behind on the current SSL version.

Without the ability to support multiple versions, it would be impossible to upgrade anything at all. That would be a whole load of other problems.

johnisgood · on Sept 7, 2020

Why not BLAKE3? I am really curious. I mean, since it exists, why not that over SHA256? Because it is relatively new?

loeg · on Sept 7, 2020

I would guess that the reasoning is similar to why Git is moving to SHA256 (from SHA1) rather than to BLAKE3 — SHA256 was around 5 years ago and the major design change has been in the works for a while (BEP 52 dates to 2017). BLAKE3 (2019) would be a fine choice today.

johnisgood · on Sept 7, 2020

I see. Thank you! By the way, I have not thought much about it, so in case you may know: would not it be possible to implement this in a way that allows swapping the hash function? So for example when we run into issues with SHA-256, change the hash function to something else.

loeg · on Sept 8, 2020

We already can: the "swap" will just be a v3 along the same lines.

johnisgood · on Sept 8, 2020

Yeah, but would not they have to create v4, v5 and so forth every N years, for different hash functions?

loeg · on Sept 8, 2020

Sure, but this is not any more expensive than any other versioning scheme you might be thinking of. Consider also that they got 19+ years out of v1, and that there is no reason to believe SHA2 will be broken faster than SHA1.

johnisgood · on Sept 8, 2020

Probably, but would it be possible to make it so that one could easily swap the hash function? Like I am curious about the details here. I think it would be. Clients probably will have to implement a couple of commonly used hash functions, and so forth. I am not sure how it would work in practice or if it is worth it at all. I am interested in all the details though.
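
One way such swapping could look in practice, as a rough sketch: tag every digest with the algorithm that produced it and let each client keep a policy table of what it still accepts. The `ACCEPTED` table, the `algo:hexdigest` tag format, and the function names are all hypothetical; nothing here comes from the BitTorrent spec.

```python
import hashlib

# Hypothetical client policy: algorithm label -> still accepted for verification?
ACCEPTED = {"sha256": True, "sha1": False}


def make_tag(algo: str, data: bytes) -> str:
    """Produce a self-describing digest such as 'sha256:ab12...'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def verify(tag: str, data: bytes) -> bool:
    """Check data against a tagged digest, refusing algorithms we no longer trust."""
    algo, _, hexdigest = tag.partition(":")
    if not ACCEPTED.get(algo, False):
        raise ValueError(f"hash algorithm not accepted: {algo!r}")
    return hashlib.new(algo, data).hexdigest() == hexdigest


tag = make_tag("sha256", b"some piece of data")
print(tag, verify(tag, b"some piece of data"))
```

Moving to a new function would then be a matter of adding an entry to the table, but every client still has to ship an implementation of it, and refusing the retired algorithms is what keeps the scheme from being downgraded.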

oconnor663 · on Sept 8, 2020

In addition to loeg's point about timing, I'll add that BLAKE3 has its own internal tree structure. It would be unfortunate to have two tree structures layered on top of one another, both because it's "ugly" and because it won't do as good a job of parallelizing things. However, unifying the tree structures would be a big commitment. Every detail of the layout would need to be exactly as it is in BLAKE3. There wouldn't be any space for custom metadata on interior tree nodes, for example. I'm not familiar with the protocol details of BitTorrent myself, but I wouldn't be surprised if that unified approach turned out to be too limiting. (But for a file/tree project that does use the exact BLAKE3 structure, see github.com/oconnor663/bao.)
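
For anyone unfamiliar with hash trees, here is a generic binary Merkle tree over SHA-256 as a sketch. It is neither BLAKE3's internal layout nor necessarily the exact tree BitTorrent uses; the 16 KiB block size and the all-zero padding are assumptions picked for the illustration.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # illustrative leaf size (an assumption)


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block independently; these calls could run in parallel."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Combine leaf hashes pairwise until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    # Pad with all-zero hashes up to the next power of two (a simplifying choice).
    size = 1
    while size < len(level):
        size *= 2
    level += [bytes(32)] * (size - len(level))
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


data = bytes(50_000)
print(merkle_root(leaf_hashes(data)).hex())
```

Wrapping a tree like this around a hash that already has its own internal tree is the "two layered tree structures" situation described above.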

somedude11 · on Sept 8, 2020

SHA2 is hardware accelerated on many new CPUs, Blake family not so much.

johnisgood · on Sept 8, 2020

I know, but according to the graphs, it is much faster than SHA, despite hardware acceleration, so I am not sure.

oconnor663 · on Sept 8, 2020

That depends on the platform, the size of the input, and whether multithreading is used.

On Ice Lake, where BLAKE3 benefits from AVX-512 and SHA-256 benefits from the SHA extensions, BLAKE3 seems to do better on both long and short messages. But maybe surprisingly, SHA-256 does better in a medium-length regime, where SHA-256's poorer startup time* has been mostly amortized out, but BLAKE3's chunk parallelism hasn't yet kicked in. See for example the 1536-byte results here: https://bench.cr.yp.to/results-hash.html#amd64-icelake . Using multithreading would exaggerate BLAKE3's advantage for very long messages (usually about 1 MiB and above), but it wouldn't improve the results for any of the message lengths measured there.
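
If you want to see the message-length effect on your own machine, a rough timing harness like the one below is a starting point. Two caveats: BLAKE3 is not in Python's standard library, so `hashlib.blake2s` is used here only as a stand-in from the same family, and Python call overhead dominates at short lengths, so these numbers are not comparable to the linked benchmarks. The helper name and the sizes are arbitrary choices.

```python
import hashlib
import time


def throughput_mib_s(hash_ctor, size: int, min_seconds: float = 0.05) -> float:
    """Hash a zero-filled buffer of `size` bytes repeatedly and return MiB/s."""
    data = bytes(size)
    rounds = 0
    start = time.perf_counter()
    while True:
        hash_ctor(data).digest()
        rounds += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            break
    return rounds * size / elapsed / 2**20


# blake2s is a stand-in, NOT BLAKE3.
CANDIDATES = {"sha256": hashlib.sha256, "blake2s": hashlib.blake2s}

for size in (64, 1536, 2**20):
    for name, ctor in CANDIDATES.items():
        print(f"{name:8s} {size:8d} bytes: {throughput_mib_s(ctor, size):10.1f} MiB/s")
```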

* I don't actually know where SHA-256's startup overhead comes from. Maybe someone who knows more could jump in here?

On ARM chips, the performance benefits of NEON are less dramatic than AVX-512, and the performance advantage of SHA-256 hardware acceleration is comparatively larger. I think it's rare for BLAKE3 to beat accelerated SHA-256 on ARM without at least some multithreading, but I've only personally benchmarked a few Raspberry Pis, and I want to be careful not to overgeneralize.

heartbeats · on Sept 7, 2020

What if the hashes differ?

ssl232 · on Sept 7, 2020

I expect there would have to be other changes to allow this feature, such as the server advertising all the available/supported hashes for each chunk.

armitron · on Sept 7, 2020

Collisions in the context of bittorrent are not a big deal. We could (and will, there are millions of torrents around that will never be updated) keep using SHA1 and the world is not going to end.

aeyes · on Sept 8, 2020

They were a big deal on other networks that used, for example, MD5. Because of collisions, malicious clients would just send you garbage parts, and you wouldn't notice until the download had finished and the file turned out to be corrupted.
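
For contrast, this is roughly how per-piece verification catches a bad part as soon as it arrives, as long as the hash itself holds up. A minimal sketch with a toy piece length and made-up function names, not any client's actual code.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-256 digest of every fixed-size piece, as a publisher would compute them."""
    return [
        hashlib.sha256(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def bad_piece_indices(received: bytes, expected: list[bytes], piece_length: int) -> list[int]:
    """Indices of received pieces whose digest does not match the published one."""
    got = piece_hashes(received, piece_length)
    return [i for i, (g, e) in enumerate(zip(got, expected)) if g != e]


original = b"abcdefghijkl"
published = piece_hashes(original, piece_length=4)

# A peer sends garbage in the second piece only.
received = b"abcdXfghijkl"
print(bad_piece_indices(received, published, piece_length=4))
```

Only the mismatching piece needs to be discarded and requested again from another peer. A collision attack is precisely what defeats this check, because the garbage piece then hashes to the published value.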

frenchman99 · on Sept 7, 2020

It isn't? Is it not possible to be tricked into downloading a malicious binary that you then execute on your computer?

calvinmorrison · on Sept 7, 2020

Unlike the Photoshop 2020 WareZ Cracked Unlocked 2020 Xvid Torrentz WZ FUN.torrent, that I just downloaded

vermilingua · on Sept 7, 2020

Yes, because warez is the only valid use of bittorrent.

Most Linux distros offer an installation iso via torrent, large files with many blocks. If you can change just a small part of those files, you’ve got compromised machines before the install even begins.

mlyle · on Sept 7, 2020

Most of the practical hash attacks we've seen allow one to create two chunks of data that hash to the same thing, not to collide with an arbitrary other block. This greatly limits the attack scenarios we need to worry about.

(That is, we've got practical collision attacks emerging for SHA1, not pre-image attacks).
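
The gap between the two attack types can be felt with a toy experiment: truncate SHA-256 to 16 bits so that brute force finishes instantly, then compare how many attempts each attack needs. This is only an illustration of the generic birthday bound versus exhaustive search, not of the actual SHA1 attack, and every name in it is made up.

```python
import hashlib
from itertools import count

TRUNC_BYTES = 2  # 16-bit toy hash, deliberately weak


def toy_hash(data: bytes) -> bytes:
    """SHA-256 truncated to TRUNC_BYTES bytes."""
    return hashlib.sha256(data).digest()[:TRUNC_BYTES]


def find_collision() -> tuple[bytes, bytes, int]:
    """Collision attack: ANY two distinct messages with equal hashes will do."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg


def find_second_preimage(target: bytes) -> tuple[bytes, int]:
    """Second-preimage attack: must match the hash of one FIXED target message."""
    goal = toy_hash(target)
    for i in count():
        msg = b"forged-" + str(i).encode()
        if msg != target and toy_hash(msg) == goal:
            return msg, i + 1


a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"existing-file-block")
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```

For an n-bit hash the generic costs are about 2^(n/2) tries for a collision and about 2^n for a preimage, so with 16 bits you would expect a few hundred tries versus tens of thousands. The attacker in the collision case gets to choose both messages, which is why an existing torrent's blocks cannot simply be swapped out.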

userbinator · on Sept 7, 2020

You would need to create a colliding pair (because the single existing one is so well known), itself not a simple thing, and create the two executables specifically, with additional code that discriminates between the two members of the pair to do two different things. You can't replace existing files with this attack, which means you'd have to create your own Linux distro with this extra "feature" and can't attack existing ones.

loeg · on Sept 7, 2020

I don't think so.

1. The user trusts the source of the .torrent file.

2. A malicious peer mounts a preimage attack on some block of an executable file, producing replacement contents that contain a malicious executable payload.

3. The executable wasn't signed, or the targeted block must include executable headers.

4. Some peers get the malicious exe, some don't.

The (2) step is still hard — preimage attacks on SHA1 are still expensive.

And it is probably much easier to bypass SHA1/SHA256 entirely by just uploading a malicious torrent directly and hoping (1) still applies.

userbinator · on Sept 7, 2020

Remember that for cryptographers, "practical" or "broken" doesn't really mean "everyone can do it", and AFAIK there has only been one publicly released collision pair for SHA1, which also took an enormous amount of time and money to find.

Even MD5, for which you can generate colliding blocks in seconds on an average PC, is still quite resistant to preimage attacks.

In other words, even after spending the resources to find a colliding block, you'd also need to create both files with the same hash, and can't simply collide existing torrents' files and replace them with malicious ones.

yyyk · on Sept 8, 2020

You're assuming a malicious peer, but what about a malicious seeder? One could take an existing unsigned executable, add in NOPs and no one would notice (same thing for certain noises in audio/video files).

Since there's some control over the original hash value, executing step (4) is not exactly a preimage attack, it should be a bit easier.

loeg · on Sept 8, 2020

If the seeder is malicious, there's no need to attack SHA1 at all. The seeder can't control which peers get which versions of any identical-SHA blocks — your target peer may share the bad block — so it seems easier to just upload the malicious content to everyone.

flatline · on Sept 7, 2020

They aren’t? So someone can craft a payload that contains a rootkit or similar and has the same name and hash as the thing you wanted, and this is okay? (Disclaimer: I don’t know all that much about how hashes are used internal to the protocol)

sp332 · on Sept 7, 2020

The weakness is a collision, not a pre-image attack. That means that a pair of payloads can be crafted together, one innocent and one malicious, that have the same hash. But it is not feasible to take a given file and make a new malicious payload with a matching hash.

So if you get your hash from whoever made the original binary, you can know that any binary with a matching hash is fine. But it could be a problem if someone creates a new pair of binaries and uploads the hash to TPB. You might see lots of good reviews from people who got the innocent version, but then the peers you connect to send you a malicious version with a matching hash.

armitron · on Sept 7, 2020

The vast majority of data shared over bittorrent is not executable.

So your scenario becomes an issue if you're downloading executable data that you deem :trusted: without any additional verification besides the hash itself. If that's the case, you have bigger worries than the hash.

dmurray · on Sept 7, 2020

Or if there's a vulnerability in a popular media player.