The main difference seems to be that the NSRL does not include PGP signatures (o...

zxv · on Nov 12, 2016

The NSRL dataset has signatures that are typically used to verify both integrity and veracity.

http://www.nsrl.nist.gov/RDS/rds_2.54/split-hash.txt

Alleging the NSRL is untrustworthy is inconsistent with the track record of the NSRL and NIST scientists.

Please be aware that thousands of forensic experts have relied on the NSRL over the last decade or more as a basis for testimony in court. Those experts verify hashes for everything they do, in every case, and as a result there has been a significant amount of independent peer review of the contents.

While Codehash.db provides a hash for a package, the NSRL provides hashes for individual installed files.

This in no way diminishes the value of the Codehash.db design. They target different use cases.

andrewdavidwong · on Nov 12, 2016

> The NSRL dataset has signatures that are typically used to verify both integrity and veracity.
>
> http://www.nsrl.nist.gov/RDS/rds_2.54/split-hash.txt

Can you explain this signature scheme? I'm not familiar with it. The link you provided just appears to show hashes and sizes for a file that has been split into four pieces.

> Alleging the NSRL is untrustworthy is inconsistent with the track record of the NSRL and NIST scientists.

I'd just like to point out that neither I nor anyone else here has alleged that.

> Please be aware that thousands of forensic experts have relied on the NSRL over the last decade or more as a basis for testimony in court. Those experts verify hashes for everything they do, in every case, and as a result there has been a significant amount of independent peer review of the contents.

I'm genuinely glad to hear that! That's good to know.

> While Codehash.db provides a hash for a package, the NSRL provides hashes for individual installed files.

I don't think that's necessarily true. Codehash.db is open to hashes for anything (source code, ISO, package, binary installer).

> This in no way diminishes the value of the Codehash.db design. They target different use cases.

Likewise, my remarks aren't meant to be in any way derogatory toward the NSRL. As far as I'm concerned, it's OK if they do, in the final analysis, target the same use case. If that's the case, the best solution should be adopted, whichever one that turns out to be. :)

jonstewart · on Nov 12, 2016

I am not sure I follow you on signatures. Doug White at NIST (https://twitter.com/dwhitenist) could change some hashes trivially and then sign them, and you'd never know the difference unless you thought you had the same file with a different hash. Even then you'd probably chalk that up to having a new version of the file that wasn't in the NIST. Are you thinking of some other scheme?

At the end of the day, I think it comes down to trusting Doug, which a lot of people do.

andrewdavidwong · on Nov 13, 2016

That's precisely the point. Doug could trivially change some of the hashes before signing them. If he were to do that, he wouldn't be trustworthy, and you, as a security-conscious individual, would want additional witnesses to corroborate the hashes before you're willing to accept that the software you downloaded is authentic. This is what codehash.db is designed to provide. (If you would be willing to chalk up the hash difference to a version difference, then this is probably aiming at a higher level of security than what you seek.)

In reality, Doug would never change hash values like that because he's trustworthy. At least, he wouldn't willingly or knowingly do it. But if Doug's signature is the only thing that guarantees the authenticity of a list of millions of hashes, that paints an awfully large target on his back. How do you know that Doug hasn't been coerced into changing some hash values before signing them? How do you know that Doug's signing key hasn't been compromised? We can't know these things for certain, but we'd have much greater assurance if we could check the signatures of multiple independent parties in addition to Doug's, and that's exactly what codehash.db aims to allow. It's a way of distributing trust across a larger group of people instead of centralizing it in a single point of failure.
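To make the multi-witness idea concrete, here is a minimal sketch in Python. It is not codehash.db's actual format or API. It assumes each witness's signature has already been verified out of band, so an attestation is reduced to a witness name mapped to the hash that witness vouches for. The witness names, the `threshold` parameter, and the function names are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def count_agreeing_witnesses(local_hash: str, attestations: dict) -> int:
    """Count the witnesses whose attested hash matches the local hash.

    attestations maps a witness name to the hex digest that witness
    vouches for. Signature checking is assumed to have happened already.
    """
    wanted = local_hash.lower()
    return sum(1 for digest in attestations.values() if digest.lower() == wanted)


def is_corroborated(local_hash: str, attestations: dict, threshold: int) -> bool:
    """Accept the hash only if at least `threshold` witnesses agree with it."""
    return count_agreeing_witnesses(local_hash, attestations) >= threshold


# Hypothetical data: two witnesses agree, one reports a different hash.
good = sha256_hex(b"example release contents")
tampered = sha256_hex(b"tampered contents")
attestations = {"doug": good, "alice": good, "bob": tampered}

print(is_corroborated(good, attestations, threshold=2))
print(is_corroborated(tampered, attestations, threshold=2))
```

With a threshold above one, a single coerced signer or a single compromised key is not enough to get a bad hash accepted, which is the single-point-of-failure concern described above.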

By the way, does Doug actually sign the hashes? I haven't been able to find any signatures, so please point me to them if there are any.

jonstewart · on Nov 13, 2016

How do you determine identity with hash values? Alice could say that svchost.exe's hash is deadbeefdeadbeef and Bob could say it's baadcodebaadcode, but, of course, they both could be right because there are umpteen versions of svchost.exe. So, how do you solve the identity problem in order to detect evil?

andrewdavidwong · on Nov 14, 2016

It depends on the entity being hashed, but in the case of software, it's usually a version number. In the case of source code, maybe a git commit hash.
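As a rough illustration of how an identifier resolves the svchost.exe example, the sketch below keys each claim on a (name, version) pair. Two different hashes only count as a conflict when they are reported for the same pair. The claim layout, the version strings, the third witness, and the function name are assumptions made for the example, not anything codehash.db defines.

```python
from collections import defaultdict


def find_conflicts(claims):
    """Return the (name, version) keys that have more than one distinct hash.

    claims is an iterable of (witness, name, version, digest) tuples.
    Hashes reported for different versions never conflict with each other.
    """
    seen = defaultdict(set)
    for _witness, name, version, digest in claims:
        seen[(name, version)].add(digest.lower())
    return {key: hashes for key, hashes in seen.items() if len(hashes) > 1}


# Alice and Bob report different hashes, but for different (hypothetical)
# versions, so both can be right and nothing is flagged.
claims = [
    ("alice", "svchost.exe", "1.0", "deadbeefdeadbeef"),
    ("bob", "svchost.exe", "2.0", "baadcodebaadcode"),
]
no_conflicts = find_conflicts(claims)

# A third, hypothetical witness disagrees with Alice about the same version.
# That is the case worth investigating.
claims_with_dispute = claims + [
    ("carol", "svchost.exe", "1.0", "0123456789abcdef"),
]
conflicts = find_conflicts(claims_with_dispute)
```

For source code, the version field could be replaced by a git commit hash, as suggested above.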