>Both of these arguments are absolutely, unambiguously, correct.
Oh, please. As if we couldn't just compare the hashes of the pictures people store against a regularly updated database of known CSAM hashes.
When this was proposed, people would respond, "But they could just mirror the pictures or cut a pixel off!"
Who cares? You got that picture from somewhere on the dark web, and eventually someone will stumble upon it and add it to the database. Unless you individually edit every picture as you store it, you can never be sure your hashes won't retroactively start matching against the DB.
People who wank off to CSAM behave like any other porn user: they don't store one picture, they store dozens. Just adding that step makes them likely to trip up, or to switch to another service altogether.
"What if there's a collision?" I don't know, go one step further: hash a specific part of the file and see if it still matches.
This whole thing felt like an overblown fearmongering campaign from "freedom ain't free" individualists. I've never seen anything wrong with content hosts running a simple hash check like this.
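The scheme being described boils down to a set-membership check on file digests. A minimal sketch (the hash set and file names here are made up; the single "known" entry is just the SHA-256 of an empty file, used as a stand-in):

```python
import hashlib

# Hypothetical known-hash set. The entry below is the SHA-256 of an
# empty byte string, used purely as a placeholder for illustration.
KNOWN_HASHES = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def flag_uploads(files: dict[str, bytes]) -> list[str]:
    """Return names of files whose digest appears in the known-hash set."""
    return [name for name, data in files.items()
            if sha256_of(data) in KNOWN_HASHES]
```

The service never inspects image content at this stage; it only compares digests, which is why a single changed pixel produces a completely different digest and evades the check.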
The hashes cannot have collisions anymore, because modern forensics hashes with both MD5 and SHA-512, and both hashes must match together for use in any legal case. The odds of both colliding at once are small enough to flat out say it's not going to happen.
But even if there was an MD5 collision back when MD5 was the only hash in use, it still wouldn't matter, because upon viewing the image that matched, if it's not CSAM, it doesn't matter. Having said that, the chance of dozens of images matching hashes associated with known CSAM is so unlikely as to be unthinkable. Where there is smoke, there is fire.
And further, a hash alone is meaningless, since in court there must be a presentation of evidence. If the image that set off the CSAM alarm via a hash collision is, say, an automobile, there is no case to be had. So all this talk about hash issues is absolutely moot.
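The dual-digest rule described above is easy to sketch: a file only counts as a match when both algorithms agree, so a collision in one algorithm alone proves nothing (function names here are my own, for illustration):

```python
import hashlib

def forensic_hashes(data: bytes) -> tuple[str, str]:
    """Return the (MD5, SHA-512) digest pair for a file's bytes."""
    return hashlib.md5(data).hexdigest(), hashlib.sha512(data).hexdigest()

def is_match(data: bytes, known_md5: str, known_sha512: str) -> bool:
    # Both digests must agree; an MD5 collision alone is not a match.
    md5_d, sha512_d = forensic_hashes(data)
    return md5_d == known_md5 and sha512_d == known_sha512
```

Finding an input that collides with a target under MD5 and SHA-512 simultaneously has no known feasible attack, which is the basis of the claim above.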
Source: I have worked as an expert witness and presented in cases involving CSAM (back when we called it Child Pornography, because the CSAM moniker hadn't come about yet), so the requirements are well known to me.
Having said all that, I am an EFF member, and I prefer cryptography to work, and spying on users to be illegal.
Apple's system used a perceptual hash, not cryptographic hashes. The hash databases were not auditable and were known to contain false positives. The threshold for human review of reported matches was not auditable and could have been changed at any time. I hope your expert testimony was more careful.
You just said yourself that hash collisions don't matter, because "upon viewing the image that matched, if it's not csam, it doesn't matter."
So when you say "a hash alone is meaningless, since in court there must be a presentation of evidence", you'd just present the image to court.
The hash is just the trigger to call the authorities; they handle the rest.
And with a userbase the size of Apple's and people as pissy as Reddit, you want to completely exclude the possibility of collisions, or you'll get a repeat of the scenario we got in 2021.
I don't think it's an MD5 or SHA-512 hash, since changing just one pixel would be enough to evade the scanner. My understanding is that it's heuristic similarity detection, which has a much wider footprint for collisions.
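The difference is easy to demonstrate with a toy "average hash" (a much simpler scheme than Apple's NeuralHash, which is a neural-network embedding; this is only meant to show why perceptual hashes survive small edits while cryptographic hashes do not):

```python
import hashlib

def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set where pixel >= mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def sha512_of(pixels: list[list[int]]) -> bytes:
    """Cryptographic hash of the raw pixel bytes, for comparison."""
    return hashlib.sha512(bytes(p for row in pixels for p in row)).digest()

# A tiny 2x2 "image" and a copy with one pixel nudged by one level.
img = [[10, 200], [220, 30]]
tweaked = [[11, 200], [220, 30]]
```

Here `average_hash(img) == average_hash(tweaked)` even though the pixel data differs, while `sha512_of(img) != sha512_of(tweaked)`. That collision-tolerance is the whole point of a perceptual hash, and also why its false-positive footprint is so much wider.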