Great question. In the recommendations, we're suggesting using two salts. One fo...

jules · on Jan 23, 2013

It does so far, and I see how this does no harm (as you say), but I also don't see how it improves security? Can you give an attack that would work against the unsalted system which wouldn't work against the salted system?

jfarmer · on Jan 23, 2013

Without a salt one can pre-compute all the KDF (input, output) pairs and store them in a dictionary (essentially). This single dictionary now acts as a kind of master key.

By including a random per-user salt, even assuming the attacker has access to the salt, the output of the KDF now varies on both a per-user and per-password basis.

Two users with the same password have different keys under this design, so pre-computing something that works for all users becomes infeasible.

If the KDF is weak in other ways, e.g., it is so fast that one doesn't need to pre-compute anything, that's a separate issue.

jules · on Jan 23, 2013

I see, so the conditions for this attack are:

1. You have a whole bunch of encrypted files from different users. 2. The KDF is slow compared to checking if decryption with a given output of the KDF is successful.

With a salt this will take time nm(T + Q) where n is the number of files, m is the number of different passwords and T is the time it takes to execute the KDF and Q is the time it takes to try to decrypt with a derived key. Without a salt you can amortize the KDF executions for a total time of mT + nmQ.

In particular it doesn't help you decrypt one specific file, and the total time is still on the order of n*m, so unless Q is very cheap compared to T this won't help much (obviously still a good idea to do the salting though).

jfarmer · on Jan 24, 2013

I think you're confusing encryption and KDFs. Encryption is meant to be reversible, KDFs are not.

In this context, a KDF is used for verifying that a message is valid. I have a derived key, you give me a key, and I verify that key by running it through the KDF. If KDF(your_key) == derived_key then we have "proven" that your_key is valid.

Another way of thinking about it is that the image of a KDF should map low-entropy input to high-entropy output. Depending on what we're using the KDF for, we might also want it to have other properties, e.g., intentionally slow, intentionally memory intensive, etc.

This is different than you encrypting something, sending me the encrypted message, and me decrypting it using a shared secret.

http://en.wikipedia.org/wiki/Key_derivation_function http://en.wikipedia.org/wiki/Rainbow_table

jules · on Jan 24, 2013

I don't think I said anything that disagrees with this? In this case the output of the KDF is used as a key to decrypt a further key, with is in turn used to decrypt your file. So an attacker can't just do a simple check to see that he guessed the right password, he actually needs to decrypt the file and see if garbage comes out or an actual decrypted file.

rarrrrrr · on Jan 23, 2013

The key derivation function's purpose is to take a long time (CPU time) to compute an output key given a password input. The combination of dangers here is that the KDF is too fast (doesn't take enough CPU) AND the output is the same per user.

What we're trying to protect against is someone in possession of the data (such as Mega themselves, or anyone working for them, or hacking them, or seizing their assets, or just someone over the internet making an offline attempt against an account after discovering the auth hash) from making a brute force attempt to learn the plaintext content of the accounts.

So to help me crack accounts, I could make a database of outputs from the KDF, with every password combination pre-computed. "Start with "aaaaaa" as a password, and go all the way through "ZZZZZZ" out to some number of characters. The database would have 32 output bytes per entry. The first 32 bytes would be the output of "aaaaaa" through the kdf, then "aaaaab", and so on.

If my math is right, going out to 6 characters using letters numbers and punctuation, the database (probably just a file really) would be about 5 TB in length, and based on my single slow CPU's AES speed, would take about 99 CPU days to compute. Of course if you have 100 CPUs (or EC2, or a botnet) you could have it done in time for dinner tomorrow. Or just wait because someone will likely precompute one you can bittorrent in the next few days.

So, if I wanted to make a brute force attack against Mega's database of users, I would use this pre-computed database, reading it sequentially from disk, taking every 32 byte section and trying it as a key to decrypt the next layer of keys for a user. My CPU could likely check answers faster than the data would stream in from the disk.

So I would be able to brute force any combination out to 6 password characters in a few minutes using only the spare hardware sitting around my office and unoptimized code. Imagine what NSA could do.

So, finally getting back to answering your question. If the salt were used, then I couldn't pre-compute the KDF output. I would have to do it separately for each user. So then my attempts would (again on my weak CPU) proceed at only about 520 attempts per second. That's still about 500 times to fast. So, the design needs both a slower KDF, and salt. :)

Trivia: Want to time your own CPU and how many AES cycles it can do?

  $ openssl speed -evp aes-128-ecb 
  Doing aes-128-ecb for 3s on 16 size blocks: 102358287 aes-128-ecb's in 3.00s 
  >>> 102358287 / 3 / 2**16 
  520

Hopefully that makes sense!

jules · on Jan 23, 2013

Well it kinda does make sense but I don't understand why you would go through this trouble with sending salts around. If the goal is what you describe couldn't you just derive a salt from the username at the client side? (I was assuming that they were already doing this, but apparently they are not?)

jfarmer · on Jan 24, 2013

The higher the entropy of the salt the better.

Let's say everyone did as you propose and use usernames for salts. I could pre-compute dictionaries for usernames like root, admin, etc. and likely gain full access to a wide range of systems.

Our aim is to build a system where brute-forcing is always the cheapest option. Once that's the case, we're in a position to control "how expensive" it is for someone to invert even one derived key.

jules · on Jan 24, 2013

Presumably each username on Mega is unique, so that precomputation won't give you anything.

jfarmer · on Jan 25, 2013

Unique inside Mega, yes, but not globally.

Imagine if UNIX systems did as you propose. The rainbow table for the salt "root" would let me do plenty of damage.

jiggy2011 · on Jan 24, 2013

Perhaps the point is to reduce the risk that a rainbow table already created for some other use could be used in this case?

jules · on Jan 24, 2013

That would at most reduce the number of KDF calls an attacker would need to perform by a factor of 2, if (1) the other service is using exactly the same KDF (2) all the usernames are identical in the two services. If not all the usernames are identical, then the factor of 2 is even lower. If an attacker can perform n calls to a KDF then he can also perform 2n calls to it, so if you're relying on that factor of 2 for security you have a problem no matter what. In any case this is easily defeated by adding a random string: KDF(username + "|blablafoo|" + password).

jiggy2011 · on Jan 24, 2013

Yes, this would be true in the case of adding a random string as a second salt but then this would have to be sent to the client also.

I am not sufficiently familiar with KDFs but I know that rainbow tables exist to crack straight up md5 hashed passwords for common usernames and passwords where the username has been used as the salt.

Maybe harder to do with a KDF because it's prohibitive to compute that many keys in the first place?

fx5 · on Jan 28, 2013

Good explanation! Hey, I have exactly the same cpu btw. :)

ww520 · on Jan 23, 2013

In a unsalted system, let's say your password is "hello" and my password is "hello." Your hash("hello") will be same as my hash("hello"). When I see your hash is same as mine, I know your password is "hello."

In a salted system, your hash("hello" + salt1) != my hash("hello" + salt2), and I have no way of guessing your password since the hashes are different.