
No, but think of these blurred images as a "hash": in an ideal situation, only one value encodes to a given hash value, right? So if you are given a hash X, you can technically work out that it was derived from value Y. You're not getting back information that was lost; in a way, it was merely encoded into the blurred image, and it should be possible to produce a real image which, when blurred, matches what you have.

Don't get me wrong, I think we're still far, far off from a situation where we can do this reliably, but I can see how you could get the actual face out of a blurred image.
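
To make the "invert by search" idea concrete, here's a toy sketch (my own illustration in Python/numpy, not anything from the article; the candidate set and the box-blur kernel are made-up assumptions): blur every candidate and keep the one whose blur matches the observation.

    import numpy as np

    def blur(signal, width=5):
        # Simple box blur: the lossy "hash" in this analogy.
        return np.convolve(signal, np.ones(width) / width, mode="same")

    rng = np.random.default_rng(1)
    candidates = rng.integers(0, 2, size=(1000, 64)).astype(float)  # stand-ins for "faces"
    observed = blur(candidates[423])  # all we get to see is the blurred version

    # Invert by search: blur every candidate and keep the closest match.
    errors = [np.mean((blur(c) - observed) ** 2) for c in candidates]
    best = int(np.argmin(errors))
    print(f"best match: candidate {best} (ground truth: 423), error {errors[best]:.2e}")

Note this only works because the candidate pool is small and distinct; as the replies below point out, that assumption is exactly what breaks in practice.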



> you only have one value that encodes to a certain hash value, right?

Errr, wrong. A perfect hash, yes. But they're never perfect. You have a collision domain, and you hope you don't have enough inputs to trigger the birthday paradox.

Look at the pictures in the article. It's an outline of the shoe. That's your hash. ANY shoe with that general outline resolves to that same hash.

If your input is objects found in the Oxford English Dictionary, you'll have few collisions. An elephant doesn't hash to that outline. But if your input is the Kohl's catalog, you'll have an unacceptable collision rate.

Hashes are attempts at creating a _truncated_ "unique" representation of an input. They throw away bits they hope aren't necessary to distinguish between possible inputs. A perfect hash for all possible 32-bit values is 32 bits; you can't even have a collision-free 31-bit hash.
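
You can watch this happen in a few lines of Python (an illustrative sketch; the "shoe-model" inputs are made up): truncate a real hash to one byte, feed it more inputs than it has outputs, and the pigeonhole principle guarantees collisions.

    import hashlib

    def tiny_hash(s):
        # Keep only the first byte of SHA-256: a deliberately truncated 8-bit "hash".
        return hashlib.sha256(s.encode()).digest()[0]

    items = [f"shoe-model-{i}" for i in range(300)]  # more inputs than the 256 possible outputs
    buckets = {}
    for item in items:
        buckets.setdefault(tiny_hash(item), []).append(item)
    collisions = {h: v for h, v in buckets.items() if len(v) > 1}
    print(f"{len(items)} inputs -> {len(buckets)} distinct hashes, "
          f"{len(collisions)} hash values shared by multiple inputs")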

So back to the blurry security camera footage of a license plate or a face. Sure, that "hash" can reliably tell you that it wasn't a sasquatch that committed the robbery, but it literally doesn't contain the data necessary to _ever_ prove it was the suspect in question, even if the techs _can_ prove that the suspect hashes to the image in the footage.


FYI (not because it’s particularly relevant to the sort of hashing that is being talked about, but because it’s a useful piece of info that might interest people, and corrects what I think is a misunderstanding in the parent comment): perfect hash functions are a thing, and are useful: https://en.wikipedia.org/wiki/Perfect_hash_function. So long as you’re dealing with a known, finite set of values, you can craft a useful perfect hash function. As an example of how this can be useful, there’s a set of crates in Rust that make it easy to generate efficient string lookup tables using the magic of perfect hash functions: https://github.com/sfackler/rust-phf#phf_macros. (A regular hash map for such a thing would be substantially less efficient.)

Crafting a perfect hash function with keys being the set of words from the OED is perfectly reasonable. It’ll take a short while to produce it, but it’ll work just fine. (rust-phf says that it “can generate a 100,000 entry map in roughly .4 seconds when compiling with optimizations”, and the OED word count is in the hundreds of thousands.)
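
For the curious, here's a brute-force sketch of the core idea in Python (not rust-phf itself, which uses much smarter constructions; the key set and hash choice here are just for illustration): for a known, finite key set, search for a seed that makes the seeded hash injective over that set.

    import hashlib

    def find_perfect_hash(keys):
        # Brute force: try seeds until hash(seed, key) % len(keys) is injective.
        size = len(keys)
        for seed in range(1_000_000):
            salt = seed.to_bytes(16, "little")
            slots = {int.from_bytes(hashlib.blake2b(k.encode(), salt=salt,
                                                    digest_size=8).digest(), "little") % size
                     for k in keys}
            if len(slots) == size:  # every key landed in its own slot
                return seed
        raise RuntimeError("no seed found")

    keys = ["ask", "show", "jobs", "past", "comments"]
    print("seed", find_perfect_hash(keys), "gives a collision-free table for", keys)

The whole trick depends on knowing the key set up front, which is why this doesn't contradict the parent's point about open-ended inputs.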


Yeah, I debated bringing it up, but since we were in the context of not knowing set members ahead of time, I decided not to.

Thanks for the rust-phf link. I'm bookmarking for my next project!


>So back to the blurry security camera footage of a license plate or a face. Sure, that "hash" can reliably tell you that it wasn't a sasquatch that committed the robbery, but it literally doesn't contain the data necessary to _ever_ prove it was the suspect in question, even if the techs _can_ prove that the suspect hashes to the image in the footage.

For a face, sure; but for printed text/license plates there are effective deblurring algorithms that can, in some cases, rebuild a readable image.

One (IMHO good) piece of software is this (it was freeware, it is now commercial; this is the last freeware version):

https://github.com/Y-Vladimir/SmartDeblur/downloads

You can try it (just for the fun of it) on these two images:

https://articles.forensicfocus.com/2014/10/08/can-you-get-th...

https://forensicfocus.files.wordpress.com/2014/09/out-of-foc...

https://forensicfocus.files.wordpress.com/2014/09/moving-car...

For the first, choose "Out of Focus Blur" and play with the values; you should get a decent image at roughly Radius 8, Smooth 40%, Correction Strength 0%, Edge Feather 10%.

For the second, choose "Motion Blur" and play with the values; you should get a decent image at roughly Length 14, Angle 34, Smooth 50%.
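
If you want a feel for what such tools are doing under the hood, here's a minimal Wiener-deconvolution sketch in Python/numpy. It's a standard technique from the same family; I'm not claiming it's SmartDeblur's exact algorithm, and the test pattern, kernel shape, and constants are illustrative assumptions.

    import numpy as np

    def disk_psf(radius, size):
        # Out-of-focus blur kernel: a uniform disk of the given radius.
        y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
        psf = ((x ** 2 + y ** 2) <= radius ** 2).astype(float)
        return psf / psf.sum()

    def wiener_deconvolve(blurred, psf, K=0.01):
        # Frequency-domain Wiener filter: F_hat = conj(H) / (|H|^2 + K) * G.
        H = np.fft.fft2(np.fft.ifftshift(psf))
        G = np.fft.fft2(blurred)
        return np.real(np.fft.ifft2(np.conj(H) / (np.abs(H) ** 2 + K) * G))

    # Synthetic demo: blur a 256x256 test pattern, then recover it.
    rng = np.random.default_rng(0)
    sharp = np.kron(rng.integers(0, 2, (16, 16)), np.ones((16, 16))).astype(float)
    psf = disk_psf(radius=8, size=256)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(np.fft.ifftshift(psf))))
    restored = wiener_deconvolve(blurred, psf)
    print("RMS error, blurred vs sharp: ", np.sqrt(np.mean((blurred - sharp) ** 2)))
    print("RMS error, restored vs sharp:", np.sqrt(np.mean((restored - sharp) ** 2)))

The "play with the values" step in the GUI corresponds to guessing the kernel parameters (radius, or length/angle for motion blur) until the output snaps into focus.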


Fortunately there is a practical limit: the universe. You cannot encode all of its states in a hash, since that would require more states than the hash has, as you already mentioned (pigeonhole). But representing macroscopic data like text (or basically anything bigger than the atomic scale) with practical uniqueness can be done with 128+ bits. Double that and you are likely safe from collisions, assuming the method you use is uniform and not biased toward some inputs.
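
Back-of-the-envelope (my own arithmetic, using the usual birthday approximation p ~ n^2 / 2^(d+1) for n items under a d-bit uniform hash): even 10^18 hashed items give roughly one-in-a-thousand collision odds at 128 bits, and around 10^-42 at 256 bits, while 32 bits is a guaranteed pile-up.

    # Birthday bound: collision odds for n items under a d-bit uniform hash.
    def birthday_collision_prob(n, d):
        return min(1.0, n * n / 2 ** (d + 1))

    n = 10 ** 18  # a billion billion hashed items
    for d in (32, 128, 256):
        print(f"{d:3d}-bit hash: p ~ {birthday_collision_prob(n, d):.3e}")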

If you want easy collision examples, look at people using CRC32 as a hash/digest. It is notoriously prone to collisions (since it's only 32 bits).
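
Here's how quickly that goes wrong in practice (a quick demo; the inputs are arbitrary strings): by the birthday bound you expect a CRC32 collision after only around 2^16 inputs.

    import zlib

    seen = {}
    i = 0
    while True:
        s = f"input-{i}".encode()
        c = zlib.crc32(s)
        if c in seen:
            print(f"collision after {i + 1} inputs:")
            print(f"  crc32({seen[c]!r}) == crc32({s!r}) == {c:#010x}")
            break
        seen[c] = s
        i += 1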


That won't work. A lot of people have tried to create systems that they claim can always compress movies or files or anything else. Yet none of those systems ever comes to market. They get backers to give them cash, then they disappear. The reason they don't come to market is that they don't exist. Look up the pigeonhole principle; it's the very first principle of data compression.

You can't compress a file by repeatedly storing a series of hashes, then hashes of those hashes, down into smaller and smaller representations. The reason is that you cannot create a lossless representation smaller than the original's entropy. If you somehow could, you would get down to ever smaller files, until you had one byte left. But you could never decompress such a file, because there is no single correct interpretation of the decompression. In other words, your decompression is not the original file.
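
The counting argument is small enough to write out (a sketch of the pigeonhole principle, nothing more): there are strictly fewer bit strings shorter than n bits than there are bit strings of exactly n bits, so no lossless compressor can shrink every input.

    n = 8
    inputs = 2 ** n                          # bit strings of exactly n bits
    shorter = sum(2 ** k for k in range(n))  # bit strings of 0..n-1 bits
    print(f"{inputs} possible {n}-bit inputs, only {shorter} strictly shorter outputs")
    assert shorter < inputs  # an "always shrinks" compressor must map two inputs to one output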


Without getting too technical because I hate typing on a phone, you're technically right in the sense of a theoretical hash.

But in real life there are collisions.

And real-life image or sound compression, blurring, artifacts, and low resolutions fundamentally destroy information in practice. It is no longer the comparatively difficult but theoretically possible task of reversing a perfect hash; it's more like mapping a name to the characters/bucket RXXHXXXX, where X could be anything.

There are lots of plausible values we can substitute for X, but without an outside source of information, we can't know what the real values in the original name were.



