Tainting the CSAM client-side scanning database (xot.nl)
251 points by raybb on Oct 13, 2023 | 266 comments


The article considers "an entity that is allowed to propose new entries to the CSAM database".

You don't even need this! You could target a whole "social cluster" of people without having any special privileges within this system.

As an example, let's say you want to attack environmental protesters.

For image A, you create a meme about climate change.

For image B, you procure something that looks, to humans, like CSAM (as described in the article).

Craft B′ such that f(B′) = f(A) (also as described).
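To make the crafting step concrete, here is a minimal sketch against a toy 64-bit average hash. This is not PhotoDNA or NeuralHash, but the published collision attacks on those follow the same hill-climbing idea of nudging pixels until the fingerprints match (Pillow and NumPy assumed; filenames hypothetical):

    # Toy sketch only: random-search collision against a simple 64-bit average
    # hash ("aHash"). Real perceptual hashes are harder targets, but the
    # principle is the same.
    import numpy as np
    from PIL import Image

    def ahash(img):
        # 64-bit average hash: 8x8 grayscale thumbnail, thresholded at its mean
        small = np.asarray(img.convert("L").resize((8, 8)), dtype=np.float64)
        return (small > small.mean()).flatten()

    def hamming(h1, h2):
        return int(np.count_nonzero(h1 != h2))

    def craft_collision(b, target_hash, steps=50000, seed=0):
        # nudge small patches of B until its hash equals the target hash
        rng = np.random.default_rng(seed)
        pixels = np.asarray(b.convert("RGB"), dtype=np.int16)
        best = hamming(ahash(Image.fromarray(pixels.astype(np.uint8))), target_hash)
        for _ in range(steps):
            if best == 0:
                break
            y = int(rng.integers(0, pixels.shape[0]))
            x = int(rng.integers(0, pixels.shape[1]))
            delta = rng.integers(-12, 13, size=3, dtype=np.int16)
            cand = pixels.copy()
            cand[max(0, y - 4):y + 4, max(0, x - 4):x + 4] += delta
            cand = np.clip(cand, 0, 255)
            d = hamming(ahash(Image.fromarray(cand.astype(np.uint8))), target_hash)
            if d <= best:   # keep changes that move the hash toward the target
                pixels, best = cand, d
        return Image.fromarray(pixels.astype(np.uint8))

    # hypothetical usage:
    #   a = Image.open("meme_A.png"); b = Image.open("image_B.png")
    #   b_prime = craft_collision(b, ahash(a))   # aim for ahash(b_prime) == ahash(a)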

Now, all you have to do is anonymously publish B′ to a platform that is actively moderated/monitored. Unfortunate users will see it and report it as CSAM, and if the platform fulfills its obligations, that report will bubble up to the relevant authorities, who will review it and add its fingerprint to the database.

Now you can start sending out image A, the meme, to your target demographic. You won't be able to post it on "mainstream" platforms with server-side fingerprint scanning, but there are plenty of other avenues for it to spread. If it's a good meme, it will propagate organically through group-chats and DMs, and eventually find its way onto devices with client-side scanning.

Apple's proposed device scanning system had a threshold before your device would be flagged, so repeat all these steps a whole bunch of times, until the average meme-savvy environmental protester's device gets flagged for further scrutiny.

Unwitting victims who do try to post the meme to a mainstream platform with fingerprint matching may risk getting their accounts flagged and taken down, and they might have no way of knowing what triggered it. This would lead to, of course, automated censorship of environmental protest groups.


> Apple's proposed device scanning system had a threshold before your device would be flagged

Which always struck me as odd, as it would be extremely easy to spin this as “Apple detected CSAM on this device but their policy is only to alert authorities once a set quantity of CSAM is found…”


It means they recognize there can be false positives. And that you can be the innocent recipient of it.


I absolutely understand that's what they're thinking.

I also know it will only take one case of someone being arrested for a related crime, and material being found on their phone that's under the threshold for reporting, and the associated headlines that could be written.


Even better: Distribute image A for a while before publishing B' and it can spread via all the server side scanning channels as well. Then when f(B')=f(A) is added to the database, people are suddenly caught with apparently legal images.


That's some next level shit. I dig it.


> and if the platform fulfills its obligations, that report will bubble up to the relevant authorities, who will review it and add its fingerprint to the database.

I don't think NCMEC adds random images they find to the A1 list without knowing their origin.

> until the average meme-savvy environmental protester's device gets flagged for further scrutiny.

Who is this an attack on? A moderation contractor maybe, but not the protester. "Further scrutiny" doesn't mean "you go to jail", it means someone looks at the pictures.


NCMEC’s database is smaller and infrequently updated, but Facebook’s database is much larger and more actively updated. This is also one of the reasons that Facebook/Meta produces many more reports than other providers who use the NCMEC database. Governments view the limited reach of the NCMEC database as a problem, and are encouraging providers to use more complete databases and to develop tools to detect novel CSAM (this is included in the proposed EU regulations).

“Further scrutiny” here means your images are reported to a provider, which (at that point) means your provider has a list of reports that defeat the guarantees of E2E encryption. The threat model looks something like this: https://www.justice.gov/opa/pr/former-twitter-employee-found...


Look at some of the cases in the news--the tech giants are more interested in ensuring that bad guys don't use their systems than in justice.

Remember that case not too long ago with a telehealth appointment: they sent a picture of something on their two(?)-year-old's penis to the doc, asking if it was an issue. The police cleared him, but he's forever guilty in Google's eyes.


Google was not required to implement that the way they did, which is basically a server-side scanning policy of having contractors look at your nudes if you put them on your cloud drive. NCMEC may approve of it, sure, but another aspect of their not being a government agency is that you don't have to listen to them.


Problem is, as they are not a government agency and don't really see you as a customer (more like a product), you have almost no avenue to get your account reinstated...

And since some people (and even companies) rely on Google services and their accounts there, it can really shoot you in the knee


"Not being a government agency" was about NCMEC, not Google.


Regarding the first part, if that really is the relevant policy, fair enough. I don't know the specific policies used to include or exclude a given image. Where can I find the details?

I consider someone looking at pictures from my local device without my consent to be an attack on my privacy, regardless of the content, or whether they send me to jail afterwards.

The reason Apple's threshold exists in the first place is because individual false positives happen. Some of the images leaked from the victim's device may be entirely unrelated. In the specific example of an environmental protester, there might even be images documenting "crimes" (entirely unrelated to CSAM), due to increased criminalization of protest techniques.

A system that may be manipulated (anonymously, from great distance) to trigger spot-checks on the devices of anyone I don't like is a broken system.


> Regarding the first part, if that really is the relevant policy, fair enough. I don't know the specific policies used to include or exclude a given image. Where can I find the details?

I'm not sure either or I would've linked it. I have implemented such a reporting system though (compliant with US law, which is privacy-preserving relatively speaking, not any upcoming EU laws, which like all other EU law seems like a huge pain to live under).

> I consider someone looking at pictures from my local device without my consent to be an attack on my privacy, regardless of the content, or whether they send me to jail afterwards.

To be clear, this is only if you're using a cloud storage service like iCloud Photos, Google Drive etc. It's meant to be a strict improvement over the usual setup, which is that your data isn't hidden from the cloud provider at all and they can just look at whatever. It would certainly be hard to have any scanning if you're not opting into a cloud service.


It means your account gets a higher risk score, which may mean it gets given a "timeout", gets downranked in "the algorithm", gets outright shadowbanned, or may even get completely shut off. All in a completely automated way.


Nobody is "downranking users in the algorithm" because they think the user /is a pedophile/. That is mixing up social media controversies.

Your account could certainly be locked until someone looks at it though, yes.


Yes, and we all know how easy it is to get a real person from the big companies to take a look at something... Or even reach one


The proposed legislation and rules indicate that the “EU Centre,” Interpol, and local law enforcement and governments would all have access to alerts (reports, unfiltered) and the databases and systems used to generate them.


Image B' *is* CSAM. A competent authority will add it when it comes to their attention.


Anyone who thinks the injection of malicious hashes is an unrealistic scenario should take a look at the games now being played with YouTube's content protection system, which is leveraged by scammers and blackmailers.


what is happening in Youtube world?


Not sure if it's what the poster above is talking about, but there's definitely a hash-collision-type attack that's common on YouTube with regards to classical music. The attacker in question uploads very standard renditions of thousands of pieces of classical music, and claims copyright on them. Content ID then flags any video using one of these pieces as potentially violating the rights of the rightsholder.

The attacker then claims the video, and claims revenue on the video rather than taking it down. There's an appeal process which seems to work about 0% of the time, and after that the only recourse the uploader has is to submit a copyright counter notification, which becomes the start of a potential lawsuit.

The attackers in question will then typically relent if challenged with a copyright counter claim, but most videos don't get counter claimed once this happens.

This seems so common that it has led to two interesting patterns of behavior that I've seen:

1) Any video that uses popular public domain music will be claimed not once, but dozens or hundreds of times, because there are so many channels operating this way now, and

2) If you post a video and immediately get hit with a huge wave of claims, rather than going through Youtube's process, most uploaders just delete the video, remove whatever audio caused the problem, and reupload.


I have some, though maybe outdated, experience with that: In 2008, I was 15 and had the idea of creating youtube videos using MIDI files of classical piano songs that had entered the public domain. They're free to use and in theory, videos stay up forever and these classical songs are timeless, so that means after uploading I'd earn passive income forever!

Or so I thought... Every single video was copyright claimed. Youtube falsely thought it was a recording (which do have copyright). The second a claim was created I stopped earning money. I would then dispute the claim, explaining that it's not even a sound recording and they had a month to react to my answer.

In almost all instances the claims were retracted but it would only take a couple of days until another claim would be filed on that video. Answering all those claims took up more time than producing the videos so I gave up after a while.

But in my experience it wasn't really malicious actors filing these claims but youtube's filter simply not managing to distinguish between a recording of a classical piano piece and a well synthesized version of it. I actually remember most claims being filed by Sony.

It's interesting to hear that it's still this way because I don't really believe that nowadays, youtube still can't distinguish between different recordings of classical music. I guess they have no incentive to improve in that regard.

But I think the fact that nobody but the large labels are able to earn money with classical music on these large platforms is actually an excellent argument against upload filters and in my experience it's an argument that non-tech people can much better relate to than hash-collisions.


I wish that willful false copyright claims carried the same $250,000 penalty that copyright infringement does.


That would be ridiculous.

But there are penalties for false claims. There is a fine for claiming copyright you don't own, and if you go further and ask for takedowns, you are also liable for damages.

The problem is that these are rarely enforced. Even a $100 fine for a false claim on YouTube would be enough to weed out bots and click farms. And for the most serious cases, have the infringer pay damages and a bigger fine. No need to change the law for that, it just has to be enforced.


Consider the penalties given for a crime, and compare those to the penalties given when police or prosecutorial or judicial misconduct falsely, incompetently, or corruptly imprisons someone for that crime.

The latter are overwhelmingly more rare, but overwhelmingly larger. Society in general thinks false penalties for crimes are worse than the crime itself.

"It is better that 10 guilty men go free, than that one innocent man should suffer." I think that if the penalty for copyright infringement is $250,000, the penalty for a false claim should be $2,500,000.

You're right that a $100 low-effort, frequently-enforced counterclaim process would weed out ContentID bot farms (just like a $0.01/email transaction cost would weed out spam). But remember that the whole ecosystem is already ridiculous; if the pro-copyright MPAA/RIAA are pushing for $250k/infringement you have to be equally ridiculous to balance counterclaims.


> That would be ridiculous.

Why is it ridiculous, in a world where the penalty for sharing a single music file is $250,000?


That's an argument to reduce the penalty for sharing a single music file, not for making the penalty for a false report also ridiculous.

Then again, I personally do think that knowingly filing a false report of law-breaking should be treated as a very serious crime. I think the harm of filing a false report of copyright infringement is greater than the act of infringing copyright itself.


I mean it should AT LEAST be handled like a copyright infringement itself. As you propose, as owner of the copyright...

For stuff that is actually copyrighted that probably could only be triggered by the actual rights holder.


The penalty for a *knowingly* false allegation should be the same as the penalty for the actual act.

(And, yes, I would apply that to the criminal justice world.)


Why? I've long thought that, as a general principle, willfully false accusations of a crime should carry the same penalty as the crime itself.


It's not even a copyright claim. It's just a process in YouTube's internal system.


Wow what a mess. At what point do we move on from copyright laws? Or more broadly intellectual property, in general. Even the words "intellectual property" sound ridiculous together when you think about it.


>Even the words "intellectual property" sound ridiculous together when you think about it.

For this reason, many would suggest not using it. It's a vague way of combining the separate issues of copyright, patents, and trademarks. It also illegitimately tries to equate those things to property, which changes how many feel about it.

https://www.gnu.org/philosophy/words-to-avoid.html#Intellect...


Thank you for that information. In the future I'll direct my criticism more precisely. To clarify, it is mainly copyright laws I have a bone to pick with. Trademarks make sense. Patent laws sort of make sense in some circumstances, less in others.


As we get closer to Star Trek replicators, patent law begins to look similarly inapplicable.


If you think deeper about it, the general concept of property is similarly ridiculous. An arbitrary piece of land being ‘owned’ is a similarly arbitrary human social concept, enforced by law and a registry


I agree that it's an arbitrary social concept, but it's not ridiculous at all, deeper thought or not. I think it's completely reasonable to expect that my house is my house, and others aren't allowed to enter it or live on my property without my consent. If you think we all should live commune-style and have no property or physical privacy, that's certainly fair for you to believe, but the vast majority of the people in the world would staunchly disagree with you... hence the concept of property ownership.

Certainly there are issues with how we've implemented property ownership, but I don't think the concept itself is inherently flawed or ridiculous.


If you're interested in reading about this sort of thing, some fun keywords are "enclosures", "Lockean Proviso", and "monopoly on violence".


Music has no registry. It is much worse than land property


Maybe true, but one step at a time IMO.


That’s not what a hash collision is.

Uploading popular (public domain) music and claiming you own it is just fraud


I don't have any direct knowledge of this, but I assume the hash collision comes in because Youtube uses some kind of fuzzy hash, so a generic music performance will flag videos using the same piece of music, but not necessarily the same performance.


I agree that it's not hash collision, but the fingerprinting strategy makes precise language on this topic a bit difficult/tedious.

The CSAM detection isn't proper hash collision either, in so far as I understand it. There's some fuzzy matching formula that generates the fingerprint, it's not simply a byte for byte hash taken of the image, and therein lies the comparison.

The fraud in question is reliant on content ID attempting to fuzzy match audio content, in this specific instance.


It's a hash, but it's not a cryptographic hash.


This one happened just a month ago so it might be what the parent comment is referring to. Basically, somebody made a fake company to make bogus copyright claims against someone to hurt their channel. Youtube refuses to deliver counterclaims unless the YTer puts their government name on it (he originally tried to deliver it via an attorney).

Additionally, the other party is actively trying to compromise the YTer's other accounts and identity to damage him, so any new data point given to this person represents risk.

I don't really care for this YTer (they made a video essay about someone who doesn't really want to be in the media circus anymore, could just MYOB) but the methods documented in this video can be used to doxx any user that uploads a video. All you gotta do is convince youtube that you own some piece of common non-royalty media that a youtuber uses to gain leverage on them. A lot of this kind of media is old with unfindable owners, so even if YT does want to spend the time to validate, there's nowhere to go with it.

https://www.youtube.com/watch?v=hixwIOd_C44


What I don’t get is this: why does youtube still have such a monopoly after all this time? It’s had a shitty reputation since I can remember, why don’t creators just band together and take their viewership somewhere less hostile?


Many people with YouTube channels are on TikTok, Twitch, Patreon, Nebula also. But leaving YouTube would reduce their incomes.


People claiming (via manual reports or via content ID) content that is not theirs to either obtain other people's ad revenue or cause the takedown of content they disagree with


I think it's pretty clear this is not about "CSAM", we have to stop using the term. It's just censorship, plain and simple. Client side means you'll pay from your own pocket for this wrongthink detector to work. It can even be automated, so as soon as the detector gets triggered by anything, you'll get locked out of your bank accounts, until further notice I guess.

If this thing gets a serious discussion in a parliament, get rid of the parliament.


But parliament wants to seed your camera with mugshots of the FBI's top-ten most wanted list so the instant a false positive appears (directly on the camera, potentially even prior to writing the image to disk, potentially even prior to pressing the snapshot button)... they beacon an alert (or exfiltrate it piggybacking via Bluetooth/AirTag/Covid exposure tracking mechanisms), and Bob's your uncle.


I’m starting to think that a government large enough to get all the things it could want might be a bad idea.


I'd generalize this to any hierarchy. That includes corporations, institutions, religions, governments, cults, basically any time there are more than two people in the same room.


Sounds like there's going to be a market for vintage cell phones built before that happens. Although eventually they'll stop being useful as gWhatever rolls out and telecom providers stop supporting 4/5g.


What makes the debate confusing is that some people probably believe it is about CSAM. Like the Swedish EU commissioner Ylva Johansson, for example. She probably believes in what she's doing, and other forces are simply taking advantage of her crusade.


That's very generous of you.


Any computational method that relies on a function that converts m bits (an image) to n bits (a fingerprint) where m > n will always be vulnerable to such an attack. The smaller n is compared to m, the easier it is to counterfeit something with that signature. It is not new knowledge. The only way to be certain, unfortunately, is for a human to look at it.
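For intuition on how the difficulty scales with n, here is a tiny birthday-search sketch against an n-bit truncation of SHA-256 (standard library only; the 24-bit default is just for demonstration):

    # Minimal sketch: the fewer output bits n, the cheaper it is to find two
    # different inputs with the same fingerprint (a birthday search needs
    # roughly 2**(n/2) attempts).
    import hashlib
    from itertools import count

    def truncated_hash(data, n_bits=24):
        full = int.from_bytes(hashlib.sha256(data).digest(), "big")
        return full >> (256 - n_bits)       # keep only the top n bits

    def find_collision(n_bits=24):
        seen = {}
        for i in count():
            msg = f"input-{i}".encode()
            h = truncated_hash(msg, n_bits)
            if h in seen:
                return seen[h], msg, h      # two different inputs, same fingerprint
            seen[h] = msg

    # find_collision(24) finishes after a few thousand attempts on average; a
    # 64-bit perceptual fingerprint that must also tolerate "similar" images is
    # weaker still.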

When the allegation is as serious as CSAM, I would rather be very very certain before accusing someone, rather than taking a shotgun approach and just randomly searching phones. But what do I know.


> Any computational method that relies on a function that converts m bits (an image) to n bits (a fingerprint) where m > n will always be vulnerable to such an attack.

The "vulnerable" depends on your definition: SHA-2 and SHA-3 are both still quite safe against preimage attacks, and even second preimage attacks require significant work to pull off for SHA-2, and I am unaware of any meaningful second preimage attack on SHA-3.

Of course, SHA isn't built for finding similar images, but for finding exact matches this should be safe enough.


These systems don't rely on cryptographic hashes but in fact the reverse: content sensitive hashes.

They essentially take an image and scale it to a small thumbnail. The values of all those reduced pixels are the hash of the original image. When a new image is scanned it's just doing a similarity check against the database of those hashes of known "bad" images. A hit triggers checking against an image hash performed with a separate algorithm. Hits against multiple hashes triggers a "bad image alarm" and ruins a person's life.
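A minimal sketch of that kind of content-sensitive hash, assuming a plain grayscale-thumbnail scheme rather than the real PhotoDNA algorithm:

    # Rough sketch of the scheme described above (not PhotoDNA itself): the
    # "hash" is just a tiny grayscale thumbnail and matching is a similarity
    # check, not equality, so small localized noise barely moves the result.
    import numpy as np
    from PIL import Image

    def thumbnail_hash(path, size=16):
        img = Image.open(path).convert("L").resize((size, size))
        return np.asarray(img, dtype=np.float64) / 255.0     # 256 values in [0, 1]

    def matches(h1, h2, threshold=0.05):
        return float(np.abs(h1 - h2).mean()) < threshold     # mean absolute difference

    # A scanner would compare thumbnail_hash("incoming.jpg") against every entry
    # in a database of known-bad hashes and escalate on a match (filename and
    # threshold are hypothetical).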

Changing a few input bits in an image doesn't usually change the perceptual hash because they're meant to be resistant to small amounts of localized noise. It is possible to add noise that will change a perceptual hash. It's also possible to manipulate an image so transforms (scaling etc) get wildly different results.

This leads to two exploits. The first is an attacker manipulates a "good" image such that when hashed it matches perceptual hashes of a "bad" image. The attacker then sends a bunch of these to a target, triggering the "bad image alarm" and essentially SWATting the target. The second is to manipulate bad images in a recoverable way to get them past bad-image scanners.


But then bypassing the filter would just require to change a few bits here and there.


No, the whole purpose of these “hashes” is that they’re robust to that. The attack model they’re designed for is image manipulation to avoid the hash match, not manipulating images to trigger the hash.

There are numerous papers on doing bit manipulation to cause misclassification.


The context of the comment you’re replying to is a cryptographic hash.


It says:

> This shows that the database can be tainted with non-CSAM material by an entity that can submit entries to it.

Actually, it can easily be tainted by anybody. Take your massaged hash-colliding image, which remember is still visually child porn, and post it on some pedos-R-us forum. The people who maintain the database actively troll those forums. They'll see the image and add the hash to the database for you.


You'd be publishing child porn, which I think is not the wisest thing to be doing.


Tainting the database is probably already chargeable as something. The point is that the person doing this doesn't expect to get caught. And frankly they're probably right in that.


> Tainting the database is probably already chargeable as something

This is definitely true. Good old “intentionally accesses a computer without authorization or exceeds authorized access” in the CFAA. It’s so vague as to be able to make almost _anything_ the government doesn’t like involving computers illegal.


A spy agency or secret police doing this to use the client-side scanning system as a surveillance/censorship system for political media would not care about that.


Even if such systems were only used for their stated purpose of finding CSAM, because no one can audit the database of hashes there's no guarantee that non-CSAM images aren't in there.

Just being accused of having CSAM is a life ruining event. Even if someone is eventually cleared of charges their life is forever altered. The "We Got Him!!" headlines are front page news, retractions are filed in a basement filing cabinet with a sign that says "Beware of Leopard".


Which is why names and pictures of the accused should remain secret until the verdict, like it already happens in many other countries. But that's a separate topic.


Nah; you'll probably go about it by publishing very realistic AI generated copies so your actions are still legal.


> Nah; you'll probably go about it by publishing very realistic AI generated copies so your actions are still legal.

In many jurisdictions worldwide, producing/distributing/possessing "very realistic AI generated" child pornography is a crime.

According to Wikipedia, [0] it is criminal in these jurisdictions: Australia, Canada, Ecuador, Estonia, France, Ireland, Mexico, New Zealand, Norway, Poland, Russia, South Africa, South Korea, Switzerland, United Kingdom.

Furthermore, Wikipedia says it is in somewhat of a legal grey area in Argentina, Austria, Italy, Spain, Sweden, and the United States.

Regarding the US in particular: the Supreme Court ruled in the 2002 case of Ashcroft v Free Speech Coalition [1] that the child pornography exception to the First Amendment does not include "virtual child pornography", so long as it does not involve images of real children (i.e. using AI to take a non-pornographic image of a real child and turning it into a pornographic image of that child). However, while this bars prosecuting AI-generated child pornography under child pornography laws, it does not bar prosecuting it under obscenity laws. In the US, obscenity laws are much narrower than child pornography laws, so it is more difficult to get convictions, but people have gone to prison for violating them (e.g. Ira Isaacs [2] in 2012/2013, Paul F Little aka Max Hardcore [3] in 2008/2009). It can be difficult to convince a jury to convict, but realistic AI-generated child pornography may be one of those cases in which many juries would. Furthermore, Ashcroft v Free Speech Coalition is not set in stone–given technological developments since then, and the changed composition of the Supreme Court, it is possible that some prosecutor might seek to overturn it, and you can't say for certain they would fail. Given all this, I think Wikipedia is right to say it is a "legal grey area" in the US.

[0] https://en.wikipedia.org/wiki/Legal_status_of_fictional_porn...

[1] https://en.wikipedia.org/wiki/Ashcroft_v._Free_Speech_Coalit...

[2] https://en.wikipedia.org/wiki/Ira_Isaacs#Further_charges,_re...

[3] https://en.wikipedia.org/wiki/Max_Hardcore#2005_arrest_and_p...


I mean, I know a lot of the West is big on thought crimes. Hell, people in the UK get arrested for being too mean in tweets nowadays. I don't think that's going to happen here though.


This is probably a state-level actor doing it. They're not at risk.


[flagged]


Angle brackets are traditionally used for actual quotes.

> Step one, break cryptographically secure hashing system

These "perceptual hashes" are not (and cannot be) cryptographically secure, and practical collision attacks have been tested and published. Did you actually read the main article at all?

I don't think I'm even going to bother with the rest.


PhotoDNA doesn't use CS hashes, but others I've looked at do use them. But you're right, you probably don't wanna respond to my parody of your suggestion to trade in CSAM...


You're saying that there are cryptographically-secure PhotoDNA-like hash algorithms? That's remarkable, if true, though I'm not sure if the schemes you're referring to are resistant to preimage recovery or hash collision (different things!)


I don't want to oversell it, having really only seen what I might consider marketing material... I'm pretty sure they're only using a CS hash for the database, where the photo hash is similar to PhotoDNA, and they're only adding some indirection.

My assumption was something like sha_something(photodna(image))


> I'm starting to get the feeling this is a bad idea

It sure sounds like a bad idea for you or me to do, but would it be a bad idea for $HOSTILE_NATION's intelligence agency? What about domestic intelligence agencies?


It's not a "cryptographically secure hashing system". PhotoDNA belongs to a separate class of algorithms called perceptual hashes. They're not particularly collision-resistant.


I am against the idea of scanning for the reason that the author pointed out: It's trivial to repurpose the technology to use it in dystopian ways.

I however have precisely zero concerns about impersonating hashes:

1. It's trivial to deal with tainting the database: both secondary hashing and more invasive hashes deal with that problem.

2. It's trivial to deal with impersonated hashes: all positives can be scanned on device in a second round with a different hash method (see the sketch after this list). Those which are still positive can be checked off device in an automated, yet privacy preserving, way, such as by sending a low resolution crop with faces blurred to a system that does a likeness check against the matched CSA image: only the real image will bear likeness.

3. These systems have undisclosed minimums. A person sharing CSAM will be generating positive results at a significant rate. Receiving a few images with faked hashes won't set off alarms, and if a victim is suddenly receiving hundreds of images, that would already be obvious.

4. These articles over-sensationalise what occurs when a positive match is found. They vaguely gesture at serious consequences from a single match. In reality, even without secondary automated checks, the false positive will be noticed by a human and then discarded. If a person is receiving a lot of false positives, that may lead to the individual being informed that they are being targeted.
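Regarding point 2, here is a rough sketch of what such a two-stage check could look like, using two unrelated toy perceptual hashes (average hash and difference hash); this is my own illustration, not a description of any deployed system:

    # My own illustration, not a deployed system: escalate only when two
    # unrelated toy perceptual hashes both match a database entry, so an
    # attacker would have to collide both at once.
    import numpy as np
    from PIL import Image

    def average_hash(img):
        small = np.asarray(img.convert("L").resize((8, 8)), dtype=np.float64)
        return (small > small.mean()).flatten()

    def difference_hash(img):
        small = np.asarray(img.convert("L").resize((9, 8)), dtype=np.float64)
        return (small[:, 1:] > small[:, :-1]).flatten()      # compare adjacent columns

    def flagged(img, db_ahash, db_dhash, max_bits=6):
        a, d = average_hash(img), difference_hash(img)
        hit_a = any(np.count_nonzero(a != h) <= max_bits for h in db_ahash)
        hit_d = any(np.count_nonzero(d != h) <= max_bits for h in db_dhash)
        return hit_a and hit_d          # both independent hashes must agree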

Overall the idea of scanning chat content seems ineffective, for much of the same reasons why it's ineffective to ban end-to-end encrypted chat: those with criminal intent will just use something else or roll their own.


If you can make one algorithm collide, you can make two collide.


It needs to be not just a second algorithm but a parametric hash function. So after an image B has been found which matches illegal image A, i.e. f(B) = f(A), the server picks a random number R and sends the client R and f(A,R), and the client checks whether f(B,R) = f(A,R) as well.

Do we have such a hash function?
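One candidate construction (my own sketch, not an established scheme) is a locality-sensitive hash whose projections are seeded by R, so a collision precomputed against one parameter doesn't automatically carry over to a fresh R:

    # A sketch of one possible answer (my construction, not an established
    # scheme): a locality-sensitive hash whose random projections are seeded by
    # R. A B' crafted to collide with A under one R won't automatically collide
    # under a fresh R, because different projections are compared. Whether this
    # can be made robust enough in practice is exactly the open question.
    import numpy as np
    from PIL import Image

    def parametric_hash(img, r, n_bits=64):
        small = np.asarray(img.convert("L").resize((16, 16)), dtype=np.float64).flatten()
        small = (small - small.mean()) / (small.std() + 1e-9)   # normalize the thumbnail
        rng = np.random.default_rng(r)                          # R seeds the projections
        projections = rng.standard_normal((n_bits, small.size))
        return projections @ small > 0                          # n_bits sign bits

    # Server: pick r, publish (r, parametric_hash(A, r)).
    # Client: flag B only if parametric_hash(B, r) is within a small Hamming
    # distance of the published value.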


If you can make 46 collide you can mint your own bitcoins.

The increase in difficulty is multiplicative not additive.


You can't because you don't have access to one of the algorithms.


Online, people sometimes write things the way they want them to be, not the way they factually are. In small ways it comes across as wishful thinking; in other ways it's just plain old lying.

I didn't feel that the comment trivialising multiple hash collisions was worth a direct reply, because the author is clearly not writing with bona fide intentions, or any real knowledge about an image scanning system. It's the fingers in ears, head in the sand kind of denialism that adds nothing to the conversation.

My earlier comment already addressed what happens after the hashes match: a visual comparison. This isn't an original concept, it's a standard approach and part of earlier CSAM scanning proposals.

One needs to act rather barefaced to pretend that a matched hash alone will have consequences, when we already have established that false positives are the known downside to using hash-based image matching.


We don't actually have access to the first algorithm either, but it's been reverse engineered and a binary published, according to the article.


I should have said "algorithm deployment" or something. The main point is that on the server side, you can change the hash (or its seed) anytime you want, if the false positive rate is getting too high.

It may be possible to do the same thing to the client actually.


> It's trivial to deal with tainting the database: both secondary hashing and more invasive hashes deal with that problem.

Sorry, what does "more invasive hashes" mean?

> It's trivial to deal with impersonated hashes, all positives can be scanned on device in a second round with a different hash method.

The double hashing would definitely help. I doubt anybody's gotten a close collision for more than one hash at a time.

It's not inconceivable that they could do it, though. As far as I know, there are only two hash methods that get used for this, and I think both of them are based on sliding a window over a reduced version of the image and doing a DCT, so they're pretty similar. I wouldn't be surprised if somebody could fool both of them enough to at least force you to tighten up your similarity thresholds. It would be safer to come up with something that worked on completely different principles. No idea how hard that would be.
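For reference, the DCT-style scheme being described looks roughly like this, in the spirit of the classic pHash approach (the production PhotoDNA/NeuralHash details are not public, so treat this as illustrative only):

    # Rough sketch of a DCT-based perceptual hash ("pHash"-style), to show why
    # the two schemes end up structurally similar.
    import numpy as np
    from PIL import Image
    from scipy.fft import dctn

    def dct_hash(img):
        small = np.asarray(img.convert("L").resize((32, 32)), dtype=np.float64)
        coeffs = dctn(small, norm="ortho")      # 2D DCT of the reduced image
        low = coeffs[:8, :8].flatten()          # keep the low-frequency terms
        return low > np.median(low)             # 64-bit fingerprint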

I don't think that the second-stage backup, where you send the image off somewhere, is "privacy preserving" by any standard I'd be comfortable with. Well, OK, actually even completely foolproof client side scanning isn't privacy preserving enough for me, but I mean that's substantially worse.

I also don't think it'd be easy to find anybody to run that server. And it'd be a significant structural change from what's deployed now. It'd be much more disruptive than adding a second hash (which is isomorphic to just switching to a single composite hash). It took a long time to get the present system deployed, so that big a change doesn't seem "trivial".

> These systems have undisclosed minimums. A person sharing CSAM will be generating positive results at a significant rate. Receiving a few images with faked hashes won't set off alarms, and if a victim is suddenly receiving hundreds of images, that would already be obvious.

Minimum what? If you just say "undisclosed minimums", my mind immediately assumes you mean threshold similarity scores. But you seem to mean hit counts to trigger various actions.

Anyway, both of those are set by the service that's using the database. Those services vary in their strictness and clue level, so I'm not sure you can say anything very general about them.

> These articles over sensationalise what occurs when a positive match is found.

In the threat model of the original article, the people doing the review for false positives are most likely the same people who poisoned the database to begin with. They've corrupted it to give them hits on images they want to suppress, even though the system operators don't want to support them in suppressing those images. So that whole "reviewed by a human and discarded" step doesn't happen.

In the broader world, I suspect that the thresholds for "delete the file" tend to be very low... too low for there to be human review. The thresholds for "disable the user's account" probably aren't exactly stratospheric, either. You can have damaging consequences well short of midnight police raids.

Also, "framing" people isn't necessarily the only thing you might use it for. For example, you could try to use it to drive the cost of running the system up to unsupportable levels.


I've never played with fingerprinting per se, but long ago I did write a program to detect near-duplicate images. It was based on comparing the squared differences of extremely downscaled images. Quite good at its intended purpose; it exposed the fact that a few images were photoshops, and revealed that images that were inherently low-contrast could easily be confused by that approach--beaches looked an awful lot like beaches.
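A sketch of that near-duplicate check; the thumbnail size and threshold here are my assumptions, not the original program's values:

    # Downscale both images to tiny grayscale thumbnails and compare their
    # normalized mean squared difference.
    import numpy as np
    from PIL import Image

    def tiny(path, size=16):
        img = Image.open(path).convert("L").resize((size, size))
        return np.asarray(img, dtype=np.float64) / 255.0

    def near_duplicate(path_a, path_b, threshold=0.01):
        diff = tiny(path_a) - tiny(path_b)
        return float((diff ** 2).mean()) < threshold   # small MSE -> likely near-duplicates

    # As noted, inherently low-contrast images collapse toward similar
    # thumbnails and can false-match each other under this approach.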


The issue described here, to my understanding, is that you find or create csam and then manipulate it so that its fingerprint collides with another image that you want to be flagged as csam. You then submit the manipulated version of the found or generated image to the authority.

First, at what point does the authority go "uh... Where did you get this from?" Practically speaking, the people doing this would have to be authorized law enforcement, no? Like if an ordinary citizen showed up with csam the excuse "oh I found this JPEG in a trunk in my late grandpa's attic" isn't gonna fly.

My assumption is that the central authority that compiles the database isn't just taking fingerprints, they want the actual source images. And I'm assuming they actually look at them to ensure they are, in fact, csam. Otherwise anyone could put anything into the database.

Now let me be very clear: I hate the idea of my device scanning for csam on principle. But what's the attack here? That someone looks at my device and determines that I do not, in fact, have csam? And that false positive might even be investigated and reveal that the bad actor sought out or generated CSAM for the purposes of perpetrating this! Which is even worse!

If the police want to look at your device, they can do that already through any number of other sketchy means that don't involve what's likely weeks of effort. The US searches phones at the border when they're feeling like it. I don't want to defend this stuff, but the potential for abuse in this specific case seems vanishingly small.


The general public submitting CSAM directly would indeed be highly unlikely, but the scenario we need to consider involves those in positions of authority who can manipulate systems behind the scenes.

Imagine that an unflattering or satirical image of Viktor Orban is circulating in France, and let's say it becomes viral, inciting discussions that the Hungarian government finds detrimental to its international image. The authorities might want to suppress this image, not just within Hungary but also in the whole of the European Union.

The Hungarian government - who presumably has access to the EU CSAM database (or can coerce those who do), might attempt to add a fingerprint of a manipulated CSAM image that collides with the fingerprint of the satirical image. The principle here is not for public individuals to submit CSAM directly but for government actors to manipulate systems clandestinely.


>The Hungarian government - who presumably has access to the EU CSAM database (or can coerce those who do), might attempt to add a fingerprint of a manipulated CSAM image that collides with the fingerprint of the satirical image.

Then what? What does that achieve? There would be a huge spike in images identified as CSAM, which would obviously throw up red flags. It seems like this would mostly just be a headache for law enforcement across the EU. It isn't like France is going to arrest a huge number of people without any investigation or thought of how this one CSAM image spread so far so quickly. And if we are talking about the result in Hungary, that government doesn't need this tool to abuse its power. Why go through all that effort? They could just do the equivalent of rubber-hose cryptanalysis.


One idea would be for the government of Hungary to create a list of its citizens that share this inciting, dangerous or whatever-you-wanna-call-it material. If they keep on finding the same persons distributing it multiple times, they may pay them a visit, get them fired from their government job, block their bank accounts, put them on the no-fly list or whatever.

And that’s just one idea, probably there’s others.


But why do this flagging out in the open where other member countries will see this obvious pattern of behavior and abuse of this system? Once again, yes this tool can be abused, but there are much simpler and harder to detect approaches that a corrupt or authoritarian government could implement. That is where the rubber-hose analogy comes into play.


Won't that easily be found out when the hash matches the image of Viktor Orban rather than an image of CSAM? I'm not sure the legal system is as stupid as you think it is. Then Hungary would just have their hashes reviewed.

Sure a conspiracy of all the relevant authorities across the whole EU would work, ... but that seems a stretch to enable what, political elites to _cooperatively_ censor images the public hold. There are easier ways, surely.


The point is a real CSAM image can be manipulated such that when hashed it matches the hash of the Orban image. So your phone calls the police saying you have CSAM when you get the Orban image. Your life is ruined and you're financially impacted trying to defend yourself. Even if you're cleared of charges you're fucked. This now has a chilling effect on sharing of the Orban image.

Orban's people won't face any pushback because they submitted an actual CSAM image. It's on you to prove somehow they intentionally poisoned the CSAM image such that its hash matches the Orban image.


I get people's opinions of law enforcement are low, but do you really think that no one would question why a huge number of people have a single CSAM image on their device especially when the hash for that image was just added to the database? Do you think that no one would question that maybe there is something wrong with that hash? Do you think that no one would look at the flagged image on any of those devices?

I understand people's concerns with this tech, but this seems like a rather silly hypothetical to me.


It's really about preventing images from circulating.

Yes, the database maintainers would notice, but it could take them a while to get around to it. And they're not going to be eager to remove a collision, because that would effectively "legalize" the child porn member of the image pair. There are lots of images that could be useful to suppress temporarily.

And if you actually succeed in suppressing the false-positive image, it may not get passed around very fast, and it won't get vastly more hits than real target images, so it will take longer for anybody to notice or care.

There's also a possible end game where the child porn traders start perturbing their images to collide with really common images like flags, corporate logos, iconic movie stills, and whatever else. So now you either have to ban the US flag, or let this or that actual child porn image go.

I don't actually think that the false hits would ruin very many lives in most places. But it's worth noticing that the original article was talking about authoritarian regimes repurposing the system without the consent of the database maintainers. In the Orbán example, it's possible that the system might flag you for child porn, but you might actually get arrested for sedition. And that continues to happen to people until the database maintainers pull the hash.


>Yes, the database maintainers would notice, but it could take them a while to get around to it. And they're not going to be eager to remove a collision, because that would effectively "legalize" the child porn member of the image pair. There are lots of images that could be useful to suppress temporarily.

They may not know it immediately, but the actual CSAM image wouldn't actually be shared in any real numbers which removes much of the concern with "legalizing" the image. You can argue this makes this system ineffectual, but that also means this isn't repeatable since weakening the impact of this system would quickly result in Hungary losing the power to add hashes to the database.

>And if you actually succeed in suppressing the false-positive image, it may not get passed around very fast, and it won't get vastly more hits than real target images, so it will take longer for anybody to notice or care.

Who is sharing the real target image? Is Hungary now an active creator and distributor of actual CSAM in addition to manipulating the database?

>There's also a possible end game where the child porn traders start perturbing their images to collide with really common images like flags, corporate logos, iconic movie stills, and whatever else. So now you either have to ban the US flag, or let this or that actual child porn image go.

Once again, this behavior would get Hungary booted pretty quickly. Also it is important to remember these are hashes. Not all images of the US flag would trigger the system only that specific image that has a hash collision.

>I don't actually think that the false hits would ruin very many lives in most places. But it's worth noticing that the original article was talking about authoritarian regimes repurposing the system without the consent of the database maintainers. In the Orbán example, it's possible that the system might flag you for child porn, but you might actually get arrested for sedition. And that continues to happen to people until the database maintainers pull the hash.

But this isn't a closed system within that authoritarian regime. It leaves bread crumbs of this behavior out for everyone to see. If Hungary is going to arrest people on made up charges, they can do that anyway.

I just think this is a complicated and ineffectual bullet that can only be fired once, because it has a fingerprint of the person who fired it. That adds up to make this type of abuse less of a worry.


>> There's also a possible end game where the child porn traders [...]

> Once again, this behavior would get Hungary booted pretty quickly.

I'm sorry; I should have made myself clearer. In that paragraph, I've taken an aside and moved from Hungary (or any other government) as the adversary to child-porn-sharers-in-general as the adversary, and also changed the adversary goal.

No matter what you do, you can't "boot" the child porn sharers, because they're the ones who actually define what images you're legitimately trying to block.

> Also it is important to remember these are hashes. Not all images of the US flag would trigger the system only that specific image that has a hash collision.

They're approximate perceptual hashes, designed to come up with close values on close pictures. The US flag has an officially defined appearance. You'll get the same hash for any two close-cropped, straight-on images of the flag, the kind you might embed in your Web site. They'll be at least as close as two reprocessed versions of the same child porn image.

I was oversimplifying, though. You're not going to be able to tweak just any child porn image to make it hash like just any flag, unless you're willing to distort it into unrecognizability. And flags and logos might be bad candidates in general, because they're going to give DCT output that's wildly different than what you'll get from most photos. But if you had a relatively large library of child porn and a relatively large library of heavily-used effectively unbannable images of whatever kind, you should be able to find a lot of the child porn images that you can tweak to hash like one or another of the heavily used ones.

... and if you're generating fake child porn from scratch using ML, you can probably hack your ML model to bake the hash of one or another unbannable image into everything it creates. You could probably make those matches pretty damned close.

So, once they got the whole thing down, they should be able to force the system to greatly tighten its match thresholds and/or deal with a really high false positive rate. In fact, they could probably make things bad enough that they could end up with a pretty large collection of child porn that the operators would be forced to completely exclude from detection.

Now back on the original authoritarian threat model:

> I just think this is a complicated and ineffectual bullet that can only be fired once, because it has a fingerprint of the person who fired it.

For the "authoritarian" purpose, you may be right... although I wouldn't be surprised if it stayed under the radar for longer than you think. If the image you want to suppress only circulates among your own people, and if you're the authority who receives and verifies the reports on your own people, then all you have to do is to keep the overall volume down enough that it doesn't make anybody suspicious enough to demand that you show them the reports you're getting.

If you're a relatively small country, the number of people who share some local meme you care about may be quite a bit smaller than the number of people who share some new real child porn image.


I guess it comes down to this black-box list of hashes that no one would want to audit, for pretty obvious reasons. The authority is only as good as how the person paid to automate the hashing process feels that day.

It’s not inconceivable that someone at the authority could put in an image known to be stored by someone else into the hash list just to cause an extremely scary and confusing month for that person.

To be fair, that’s also the same problem of say, someone at your ISP dislikes you. There’s always a chance your supplier has a “bad apple”. Maybe it’s not an issue but I get the idea.


> I guess it comes down to this black box list of hashes that no one would want to audit for pretty obvious reasons. The authority is only as good as the person paid to automate the hashing process feels that day.

Isn't that exactly how it is run today, however? There are actual people at the National Center for Missing & Exploited Children who maintain the database and do in fact get incidentally exposed to the imagery. It's the one organization in the entire nation allowed to legally possess the actual images.


Another version of the attack is to manipulate an innocent image in a way that's flagged by detection systems, and then spread that image somehow. If any of the responses are automatic, then it's a lot of false positives in the system, a lot of contact from police, disabling of the host, things like this.


There's also not really any need to trick the system or manipulate other files to generate collisions. Spreading one of the original law-breaking files to unwilling targets would work as a trigger just the same.


But the recipient would know they had gotten CSAM and wouldn't pass it along.


> You then submit the manipulated version of the found or generated image to the authority.

If this is easy (for some definition of easy), is it not easy to then alter the CSAM images every time they trade them to another pervert, such that hashes never match? How much image alteration would that take? Would it be human-perceptible? If perceptible, would it be ignorable?

I think the exploit here suggests that the whole system may end up obsolete sooner rather than later.


Abusers who have actual CSAM could intentionally publish the fingerprints to sabotage the scanning scheme. If the illegal fingerprints become known, it will be possible to generate false positives and overwhelm verification/enforcement with bogus matches.


It's not an "any of us" problem for a bad system to be overwhelmed. Let it be overwhelmed.


Damn, a bit far-fetched but theoretically possible. Yikes.


My understanding was that you generate a CSAM image with a colliding fingerprint and put it online and trick somebody with an iPhone or whatever into clicking on it. Now they're on a watchlist and will get harassed by the police.


So you have to plant the manipulated csam somewhere that it'll be found by law enforcement and added to the database and just hope you did a good enough job to not be tracked?


Look if I want to host some white noise I can't be held responsible for what happens if people xor it together with some other file hosted elsewhere.
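For anyone puzzled by the joke, it refers to an XOR (one-time-pad style) split, where each hosted share on its own is indistinguishable from random noise:

    # XOR split: neither share carries recognizable content by itself; only
    # XORing the two shares together reconstructs the original bytes.
    import os

    def split(data: bytes):
        noise = os.urandom(len(data))
        share = bytes(a ^ b for a, b in zip(data, noise))
        return noise, share

    def combine(noise: bytes, share: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(noise, share))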


Erm, of course you can, you are then just distributing means for acquiring CSAM, or taking part in conspiracy to distribute it.


Under this logic any image at all on the entire internet is now "conspiracy to distribute" CSAM, since you can make an image diff from anything


>just hope you did a good enough job to not be tracked

Right. Your target uses e.g. an iPhone. You use Kali on clean hardware.


Would this org refuse to accept CSAM from an anonymous source just because they can’t point to its origin?


I think they'd send some police to find out


>Like if an ordinary citizen showed up with csam the excuse "oh I found this JPEG in a trunk in my late grandpa's attic" isn't gonna fly

Just post it to 4chan, it’s actively monitored by intelligence services


This post shows you either don't know what CSAM is or what an intelligence service is.


Also by unintelligent services


How do these databases differentiate between AI generated CSAM and CSAM of real victims? (Since many jurisdictions only criminalize real CP)

I know that 99% of people cannot tell an AI image from a real photo since that "Last giant irish greyhound 1902" photo has been going around on social media for weeks, and it is, to me, unbelievably obvious AI.


I assume the answer to that will be that there is no need to differentiate between them. And honestly, I agree with that argument. Possession of CSAM should be illegal regardless of whether it's "real" or not.

But the proposed scanning system is the wrong solution, regardless of any "real or AI" ambiguity, because it's possible to generate false positives with nonsense images that aren't even close to the expected CSAM, real or otherwise.


> I assume the answer to that will be that there is no need to differentiate between them. And honestly, I agree with that

I disagree. The point is to reduce actual child abuse. The images are in a way only tangential. If an image is made with an AI with no actual child being abused, then it shouldn't be a crime.

In a way, it's better, because it will distract the crowd of people into this sort of stuff from activities that harm real people.


> because it will distract the crowd of people into this sort of stuff from activities that harm real people.

This is actually the main point in dispute, and almost everyone arguing one side or the other on this topic seems to assume one side or the other on this point and argue from there, rather than seeking to support their position on the fundamental disputed fact question.

Which results in the most of the debate being people talking past each other based on conflicting assumptions of fact.


Exactly. It may distract a crowd of people into something less harmful. Or, it may perpetuate a behaviour sort of like the commonness of cigarettes leads to more people craving nicotine.


I think it's worth asking: Does synthesized CSAM have an "advertising" effect for real CSAM and CSA?


This feels a bit like "cold reading", I think you're absolutely right, but for all I know you could have been intending to post that comment on half the other threads on the front page.


It's certainly a common-enough phenomenon; what makes it specific to the topic is the specific factual disagreement relevant to this issue that people just assume a side on.


>> I assume the answer to that will be that there is no need to differentiate between them. And honestly, I agree with that

>I disagree. The point is to reduce actual child abuse.

There are limited resources; practically, the only way to do this is to make it illegal to have anything that looks real (or looks derived from a real situation, in an 'I will know it when I see it' way). Otherwise, you're just making an almost impassable defence of 'it is fake' or 'I thought it was fake'. Then you can't practically reduce actual child abuse.


You're assuming the truth of your conclusion without testing it. It's equally possible that encountering AI-CSAM is going to incentivize collectors to pay a premium for 'the real stuff', just as many CSAM collectors end up getting caught when they try to make the leap into engaging in abusive activities for real. Your mental model of how CSAM enthusiasts think isn't anchored in reality.


I don't think that's how it works. It's not that CSAM drives them to actual abuse, but that for some CSAM isn't enough to sufficiently satisfy their desires that they go on to real abuse.

Thus I see no reason they would differentiate. With normal adult porn do we care that makeup and such might be involved?


I'm far less sure about this. We don't understand the neurodynamics of sexual desire that well, and lots of research in sex criminals show a pattern of escalation, similar to some kinds of drug addiction.

I don't understand your comparison with adult porn; that might be nonconsensual but typically isn't, whereas CP is nonconsensual by definition because minors aren't legally capable of agreeing. Obviously there are grey areas like two 17-yos sexting each other, but most courts take that context into account.


That's how I see it, also. There is a very clear pattern that prevalence of pornography reduces rape. Why in the world should we expect a different result when we narrow the context?

Yucky as it is I believe the answer here is to have image-generating AIs sign their work. Something properly signed is known not to involve any actual children and would thus be legal.

(My philosophy in general is that for something to be illegal the state should be able to show a non-consenting victim or the undue risk of a victim (ie, DUI). I do not believe disgusting things in private warrant a law.)


> In a way, it's better, because it will distract the crowd of people into this sort of stuff from activities that harm real people.

I see that assertion a lot. If that's how that works, why does the very large amount of CSAM already in existence not have the same effect? Why would synthesized CSAM distract pedophiles from their activities when real CSAM from their fellow pedophiles doesn't?


What are pedophiles' activities? Did you mean abusers?

I don't think they meant it would be 100% effective. And real child pornography may deter future abuse. Research is inconclusive.


You may think that intuitively, but actual studies indicate the opposite. Usage of CSAM leads to increased risk of contacting and abusing children.

This needs to be balanced against rights to privacy and expression, which I personally think take precedence, but pretending that it can serve as harm reduction is just not correct.


I'm personally highly skeptical of the "offering them an outlet" argument. I'd be less suspicious of the idea if its proponents also suggested limiting it to controlled settings, e.g. during meetings with a professional psychiatrist.

But I'm sorry, I just don't believe anyone holed up in their room with a bunch of fake CSAM is "just using it as an outlet" or "protecting real kids from harm." I mean, it almost sounds like a threat: "If you don't let me look at these pictures of fake kids, I'll hurt real kids." If that's the case then they should be seeing a psychiatrist, at minimum.


Pornography reduces rape.

Violent movies that appeal to teens reduce vandalism and the like--they're in the theater rather than out causing trouble. (And it's not displaced, rates don't spike later, they just return to normal.)


Citation needed.


There are so few actual studies of this, and AI images being only a year or so old, that I would not put any weight behind them at this point.

My point isn't that AI CSAM should be legal or not, but whether these tools can differentiate what the lawmakers have decided is a crime or not.


Of course it's related. That doesn't mean it "leads to"--I think that's a case of the cart before the horse.

Those who have no sexual interest in children are neither going to have CSAM nor engage in abuse. The fact that they had CSAM already shows it's a highly non-random sample. The control would be pedophiles with no access to CSAM--but how do you find that control group????


Studies such as?


> Possession of CSAM should be illegal regardless of whether it's "real" or not.

From a purely ethical standpoint: why? What is the purpose of punishing someone who has harmed no one?

No victim means no crime.


From a purely ethical standpoint, sure, I agree. But we live in reality, and there are plenty of activities that seem ethically victimless, but are practically necessary to criminalize, in order to uphold societal frameworks and expectations of morality.

In this case, by giving every CSAM criminal a potential excuse that they "thought it was AI generated," the real victims are further victimized by being deprived of justice or forced to prove their realness.


Broadening the definitions of crime to make it easier to punish the ethically guilty on scant evidence, while incidentally sweeping up the ethically innocent, is a hack around a legal tradition designed exactly on the principle that it is better that the guilty go unpunished than that the innocent are punished. It works by making the genuinely innocent administratively guilty, and we ought to reject that kind of justification every time it rears its head.

(There are times when it is important to have commonality while the choice of the common practice isn't important, which justifies regulations of obviously ethically unimportant things like "which side of the road is it correct to drive on relative to the direction of travel"; but where the purpose of a crime is purely to lower the evidentiary bar to punish people presumed guilty of a narrower crime, that's just an attempt to hack around the presumption of innocence and the burden of proof of guilt.)


Broadening the definition of a crime isn't exactly unheard of.

To choose a less emotional subject, mattress tags.

The ethical reason for mattress tags is because historically people would sell mattresses stuffed full of all sorts of unsavory garbage. What we actually criminalized, or at least were trying to prevent, was some sort of fraud or public endangerment.

But we also along the way made it illegal for sellers to remove the tags from mattresses.

Removing the tag isn't inherently harmful; if you don't deceive the purchaser on the contents of the mattress, it's not even fraud.

But we broadened the definition of the crime to make it easier to enforce.


You are going into forced labelling disclosure, which can have a lot of benefits beyond fraud prevention because it increases informed consent and all sorts of other net goods. It's the logic and ethics of nutrition labels, and IMO it's probably one of the government mandates where the good most clearly outweighs the bad.


I agree with you. I think our disagreement here is over the level of innocence of someone possessing AI generated CSAM.

If you believe, as I do, that such a person is guilty of a crime, then we're not risking the false guiltiness of an innocent person. At best, we're risking their level of sentencing. And I'm open to the idea of reduced sentences for AI CSAM, but it shouldn't be a factor in determination of guilt (i.e. it should be a matter between the judge and the defendant, rather than something the prosecution needs to prove).

With regards to CSAM criminalization in general, there is a real risk of punishing innocent people that may have been framed by planted evidence. But this is a risk regardless of whether the evidence is "real" CSAM or not, so legalizing possession of AI-generated CSAM doesn't reduce the risk of an innocent person being framed. It might make it "easier" for a bad actor to frame someone, since they can now do it with AI content instead of real content. But if they're already planting evidence, do they really care whether they're committing a crime while preparing the evidence? And besides, if we keep the AI content illegal, then it's equally legally risky to frame someone with it as it is to frame them with real content.

The problem of prosecuting "innocent" people, whether you believe they're innocent because they were framed or because they're only guilty of possessing AI-generated CSAM, should be addressed at the time of enforcement. Stop using entrapment and fishing expeditions as an enforcement mechanism. Only open investigations when they start with a real and identifiable victim, rather than a potential perpetrator.


> I think our disagreement here is over the level of innocence of someone possessing AI generated CSAM.

> If you believe, as I do, that such a person is guilty of a crime,

You just explicitly said upthread that ethically they are not, but argued that it is useful for them to be treated as criminals because it denies an excuse to those who are ethically guilty because they are possessors of genuine CSAM.

You seem to be moving your fundamental ethical premises around in response to it being pointed out that the argument you previously made conflicts with a different widely proclaimed ethical premise.


My ethical premise is that there is no direct victim of AI generated CSAM, but that it's worth criminalizing because otherwise it further victimizes victims of existing law. In other words, there is a societal victim of it. To me it's the same ethical premise but interpreted within two different frameworks: one that's purely idealistic, and one that's based in practical reality.


AI cannot generate CSAM, because AI cannot abuse children. AI makes fictional images, which definitionally cannot be images of child sexual abuse.

There is literally no victim of any kind, even conceptually, in the case of computer generated imagery. It should be protected artistic expression.


What if a police officer generates some AI CSAM and then sells it to someone who thinks it's real? There's still "no victim," but the buyer thinks that there was. Are they guilty of a crime?

Your logic would seem to imply that there's no crime with possession of real CSAM either, and that the only crime lies with the original abuser who took the pictures.


> Are they guilty of a crime?

Unless there is a very specific "attempt to acquire CSAM" law then no they're not fucking guilty of any crime. If you live in a state where marijuana is illegal and you smoke some oregano because you thought it was marijuana you're not guilty of actually possessing marijuana.

A criminal law is composed of a number of individual statutes. When a state is trying to prosecute someone for a crime they need to prove three elements for each statute: the criminal act (actus reus), intent (mens rea), and the concurrence of both of those.

If a cop sells you oregano and you think it's marijuana you might have the intent to buy marijuana but there's no actual criminal act because oregano isn't illegal. If you make a law that only requires intent then congratulations, you've created thought crimes.

If you want to make entirely fake CSAM possession illegal, that's essentially the same as an intent-only law and creates thought crimes. It's a slippery slope.


> If a cop sells you oregano and you think it's marijuana [...]

It wasn't a cop, but I recall a case some years back when someone sold something as cocaine when it wasn't. Among other things, he went down for fraud.


> If a cop sells you oregano and you think it's marijuana you might have the intent to buy marijuana but there's no actual criminal act because oregano isn't illegal. If you make a law that only requires intent then congratulations, you've created thought crimes.

You're a lawyer, I take it? I'm not a lawyer, and I admit your analysis of this scenario confuses me. Is there no legal difference between merely having intent to commit a crime at some point in the future, and actually attempting to commit a crime?


I'm definitely not a lawyer.

> Is there no legal difference between merely having intent to commit a crime at some point in the future, and actually attempting to commit a crime?

That was my point. To be charged with and prosecuted for a crime you need to both intend to commit it and then actually commit (or attempt) it. Attempted murder is a crime: I both intend to kill someone and try to do so, even if I fail. It's not punished as severely as actual murder but it's still a crime. But attempted murder is actually a specific crime in the criminal code. There are elements of it that need to be proven in court.

Unless a jurisdiction has a crime of "attempted possession of marijuana", intending to buy marijuana but ending up with oregano isn't a crime someone can be charged with. If we start writing laws outlawing attempted possession, it's a slippery slope that gets into outlawing thoughts. It also opens the door to stupid pre-crime ideas like "someone would only use cryptography to get hold of illegal content, therefore anyone using cryptography is instantly guilty of attempting to get illegal material."

You can be sure this is what will happen because it's the very arguments the anti-cryptography groups use.


Attempted possession of an illegal item/substance is absolutely a crime.

Source: decade in the criminal justice system.


Definitely not a lawyer. You couldn't charge anyone with criminal conspiracy from the parent commenter's perspective.


While in one way you could look at it and say possession shouldn't be illegal, the problem is that to possess it, someone must have created it. If there's a market in it, some people will engage in it to satisfy that market. Thus, possession of real CSAM has an indirect victim.

I had previously proposed that if the abuser has been caught, the victim should get the rights to the images and, once they are an adult, be allowed to legally sell them (thus a list of legally permitted images), but the AI image revolution has changed that. Have AIs sign their images; anything with a proper signature would be legal.
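For illustration, the "have AIs sign their images" idea could be as simple as the generation service attaching a detached signature that platforms verify (a minimal sketch, assuming an Ed25519 key held by the model operator; key distribution and surviving re-encoding are the hard parts and aren't addressed here):

    # Minimal sketch: the generation service signs the exact output bytes;
    # anyone with the operator's public key can verify provenance.
    # Assumes an Ed25519 keypair held by the model operator (illustrative only).
    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    def sign_output(private_key: ed25519.Ed25519PrivateKey, image_bytes: bytes) -> bytes:
        return private_key.sign(image_bytes)

    def verify_output(public_key: ed25519.Ed25519PublicKey,
                      image_bytes: bytes, signature: bytes) -> bool:
        try:
            public_key.verify(signature, image_bytes)
            return True
        except InvalidSignature:
            return False

    # Usage
    operator_key = ed25519.Ed25519PrivateKey.generate()
    image = b"...bytes emitted by the generator..."
    sig = sign_output(operator_key, image)
    assert verify_output(operator_key.public_key(), image, sig)

Note that any recompression or crop breaks a byte-level signature, which is one reason real provenance efforts embed signed metadata in the file instead.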


There are plenty of "victimless" crimes that society deems unsavory and punishes, e.g. smoking pot alone in your home is a crime in a lot of jurisdictions, even though clearly no one is harmed.

In a lot of jurisdictions the decision has been made, whether it is right or wrong, to criminalize AI CSAM. The people have spoken and the lawmakers have made the laws. If you or I think that is wrong then the options are to lobby for a change.


> There are plenty of "victimless" crimes that society deems unsavory and punishes.

That is not an argument against the concept that that should not be the case.


Plenty of actions are crimes without real victims. Not having insurance while driving is an example. Possession of explosives is another one.


In fact, you might even argue that possessing real CSAM has no victim. After all, the person possessing the image isn't the one who committed the abuse and took a picture of it, right? But we've collectively decided that it's worth punishing that crime, because every viewer is an enabler of the abuser. The same logic should extend to AI-generated content.

To put it another way, consider a thought experiment where a police officer generates CSAM with AI and then sells it to someone who thinks it's a real picture of a real victim. We should arrest the buyer, right? They thought they were committing a crime.


You have literally proposed 'thought crimes'.


There's plenty of precedent where police officers pretend to be an underage person and some shmuck replies to them and agrees to meet at a hotel. Then they get arrested, and much of the time they also get prosecuted and convicted. You could argue it's entrapment but the fact is that most of society supports that sort of preemptive law enforcement.

If you looked at it through a purely ethical framework then you could never convict the person because there was never any "real victim." But is that the right way to look at it? It's certainly not the way most people look at it.


Seems like you want to bring all the success of the war on drugs to the war on generated images.


No, I don't support any automated scanning system or really any sort of "going out of our way" to find new criminals.

The reason I think it's a bad idea to differentiate between real or AI is similar to the arguments against "means testing" for distributing benefits. You don't want to put real victims in a situation where they're deprived of justice because they can't prove that their victimization was "real." Imagine a real CSAM criminal claiming a defense that they "thought it was AI generated." Do you want to give them that out?

If protecting those victims comes at a cost of punishing criminals possessing AI-generated CSAM with sentences equally as harsh as those for "real" CSAM, then it's a worthwhile cost to pay. They are still criminals, and they are definitely not innocent (unless they're being framed, but that's a risk with both real and AI images).


It's already a strict liability offense in many places, meaning that you don't even have to know that you possessed the image at all. You could apply the same strict liability standard and say that it doesn't matter that you didn't think it was real as long as it actually was real.

"I thought she was 18" doesn't work for physical sex either.


This is the only reasonable comment in the entire thread. Make CP possession a matter of strict liability and all problems are solved.


>> Imagine a real CSAM criminal claiming a defense that they "thought it was AI generated."

Saying "I thought this heroin was fake" is not a defense when caught with a bag of heroin, I don't see how this would be any different. It's not a magic out for anyone.


That isn’t the argument at all.

It’s that you have a bag of fake heroin, you are then arrested for it because someone thinks that by you having fake heroin you are encouraging real heroin users to do more real heroin.


Well yes, which is why my argument (sorry if it wasn't clear) is that having fake heroin shouldn't be illegal.


Sure it is. The bag of "heroin" on the movie set turns out to be real, think the actors are going to be convicted of possession?


I'm not sure what point you are making here. An actor who was given a fake bag of heroin as a prop which then turns out to be real is no more guilty than a courier moving a package that happens to contain drugs or guns or fake money or anything else - neither would be found guilty of possession.

These are situational circumstances, and no prosecutor in the world would choose to prosecute those - but there is 0% chance you could get away with saying "oh I thought it was fake" if caught with CP on your phone.


I was simply providing an example of where "I thought it was fake" would be a reasonable defense.


I'm on the fence about whether AI CSAM should be illegal or not. There is no scientific consensus either way on whether it increases or decreases a person's thoughts about actual physical abuse.

The issue is that the people (and the legislators) in each jurisdiction have made a choice that AI CSAM is not illegal, and therefore this runs the risk of falsely accusing someone of a crime.

[if you disagree that AI CSAM should be legal in your jurisdiction the solution isn't to arrest everyone, but to petition your lawmakers to change the law]


> I assume the answer to that will be that there is no need to differentiate between them. And honestly, I agree with that argument.

Why do you believe that?


See my comment to a sibling reply. Basically I don't want to make victims prove their victimization was real, and I don't want to give criminals with real victims an opportunity to argue they "thought it was AI generated."


And I absolutely want both parties to have to prove crime/innocence and have an opportunity to argue. The current situation, where anything involving CSAM is so toxic that lives are ruined without trial, is not healthy and isn't good for anybody.


Point of order: victims are not "parties" in criminal cases, not in remotely modern legal systems. The parties are the accused and the state.

For the same reason, victims don't get to pardon crimes committed against them.

... which is the way it should be, because criminal punishment should not be seen as a form of revenge, but as a deterrent.


Right. Most crimes are essentially ones that "the people" found unsavory.

For instance, there is no "victim" if I got caught enjoying cannabis in my own home in a jurisdiction where such a thing is illegal, but "the people" have made a decision that they don't like it and I should be punished for committing an anti-social act.

That is one of the fundamental aspects of democracy at work.


I am of the opinion that regardless of whether they are both illegal, the penalties for “AI” should be significantly less.


What's your reasoning?


The only thing that can save us is a huge number of false positives and a backlash against the governments in charge, when they'll have to concede for the 10,000th time that it was not CSAM but parents sending their baby pictures to the grandparents.


Looking at the cookie banner stupidity, I don't think a huge number of false positives would generate backlash.


[...] One could conclude that the abuse centre could thus easily spot the malicious entry in its database after a few of these reports all concerned with the same image, and delete the corresponding fingerprint from the database. If that were the case, this avenue of attack would not be a problem in practice. This assumes, however, that the abuse centre keeps track of such false positives over time to detect such maliciously uploaded fingerprints. This may or may not be the case.

Why would it not be the case, unless the abuse center wanted to waste time and resources and have unreliable records? This essay is technically sophisticated and socially naive. Tech people keep spinning up the most unlikely and hard-to-explain reasons for why leveraging technology to interfere with the spread of CSAM is bad, instead of working toward any kind of privacy-respecting solution to target CSAM itself.
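Even the crudest bookkeeping would do it; something like a per-fingerprint counter of reviewer-dismissed matches (a rough sketch, with an arbitrary threshold and made-up entries):

    # Rough sketch: retire fingerprints that keep producing
    # reviewer-confirmed false positives. Threshold and entries are made up.
    from collections import Counter

    DISMISS_THRESHOLD = 5                          # illustrative value
    active_fingerprints = {"fp_a", "fp_b"}         # hypothetical database entries
    dismissed_counts = Counter()

    def record_review(fingerprint: str, reviewer_says_benign: bool) -> None:
        """Called after a human reviews a reported match against this fingerprint."""
        if not reviewer_says_benign:
            return
        dismissed_counts[fingerprint] += 1
        if dismissed_counts[fingerprint] >= DISMISS_THRESHOLD:
            # The same harmless image keeps matching this entry: likely a tainted hash.
            active_fingerprints.discard(fingerprint)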

Constantly emphasizing the scope for government intrusion and privacy violation (legit but at the same time overblown to the point of paranoia) and constantly minimizing or dismissing the real harms of CSAM is a great way to alienate the normal non-technical people you need to have on side.


okay, ill bite:

yes, the blogger has tunnel vision, that's the only valid point in your post, he also missed the fact that anyone can poison the well, not just privileged entities.

however your rhetoric is cancerous, although i'm aware that many people who use it are just sheep and have no idea what they're insinuating.

CSAM is a propaganda word:

1. right wing uses it to push their agenda of punishing people for having sex

2. left wing uses it to push their agenda of punishing males or kulaks or whatever

3. police use it to make it look like they're solving a crime so they can get paid for nothing: they can arrest someone for looking at a nude picture of a 17 year old (which the suspect is often unaware of), then in the media and police statements: "he was arrested for the possession of child sexual abuse material". and this is the majority of convictions: people (and even teens) being into teens.

people are not naturally sexually attracted to children (13 and under) in more than say 0.0001%.

so now that we actually looked at the subject beyond the taboo veil, it seems the discussion is just to stop 14-17 year olds from having sex, which is insane and does not justify one single thing that police do about it. CP being illegal is just completely out of the question. we should be debating re legalizing CP, or changing it to 13 and under at the very least (i don't see any harm in the existence of porn of any age since it does not encourage abuse since most people will not become sexually attracted to children no matter how much you expose them to it).

so again, your rhetoric is pure cancer. i don't need a "privacy respecting solution to CSAM". what the hell do you think that would be? do we also need a freedom respecting solution to the murder problem? would you have my legs cut off and say i should just order door dash?


Your claims of things that do happen are likely true. It is a propaganda word. People do use it as a bludgeon to push their agenda through. Police do use it for bullshit arrests. Teenagers' lives are ruined for basically no reason.

Your claims of things that don't happen are, unfortunately, false. Law enforcement agents whose jobs involve ever looking at CSAM basically cannot stand the mental toll of the job for any significant amount of time. Unthinkably horrible things happen to children to create these images. The idea that we would legalize it is not on the table, and suggesting such a thing is a great way to immediately lose support from everyone in the world.

We should not reject efforts to prevent the production and spread of CSAM unless we can show that the former group can abuse it. Unfortunately, so far, basically all suggested prevention mechanisms are vulnerable to abuse and corruption. It's an extremely difficult problem.

> i don't need a "privacy respecting solution to CSAM". what the hell do you think that would be? do we also need a freedom respecting solution to the murder problem? would you have my legs cut off and say i should just order door dash?

I don't think that analogy is hitting the way you're intending. Yes, we'd all love a "freedom-respecting solution to the murder problem". In fact convicted murderers still have some rights and freedoms, and balancing those against preventing murder and rehabilitating murderers is another difficult problem. Cutting off legs hasn't been seen as a reasonable punishment for a crime since the Bronze Age, so I'm not sure what that's supposed to mean.


> Your claims of things that don't happen are, unfortunately, false. Law enforcement agents whose jobs involve ever looking at CSAM basically cannot stand the mental toll of the job for any significant amount of time. Unthinkably horrible things happen to children to create these images.

literally just propaganda and i have no doubt that this is a massive exaggeration.

>The idea that we would legalize it is not on the table, and suggesting such a thing is a great way to immediately lose support from everyone in the world.

cool, and i was going to open my first post with "CP being contraband is out of the question and never should have became a thing".

> I don't think that analogy is hitting the way you're intending. Yes, we'd all love a "freedom-respecting solution to the murder problem". In fact convicted murderers still have some rights and freedoms, and balancing those against preventing murder and rehabilitating murderers is another difficult problem. Cutting off legs hasn't been seen as a reasonable punishment for a crime since the Bronze Age, so I'm not sure what that's supposed to mean.

i don't know what i didn't make clear. i said that i don't want a law to cut off everyone's legs at birth with a casus beli like "terrible people are going around murdering people with their bare hands now that we all live in a cage and if you don't want to get rid of legs you are part of the problem". the analogy is in fact absolutely perfect even better than i intended, since you know whoever argued this is hugely exaggerating.

and then we have the other guy i'm replying to who can't even write a coherent response (for instance calculating 30K instead of 300, and then ignoring the fact that most of those people will not actually do anything) because he's too angry from his inner sheep being offended because someone dared question is terrible law which is basically just racket where they can use fabricated and overblown cases instead of having to perpetrate the crimes themselves (although the FBI is also known for serving CP themselves, which contradicts a huge amount of the "CP should be illegal" arguments, such as "seeing it makes more pedos")


I'm not the one who seems angry or incoherent here. I did carelessly omit the % on your suggested #, but I think your idea that there are only 300 child porn consumers in the US is laughably detached from reality.


you haven't addressed a single argument i made in the first post, they were all very simple too, they should be immediately rebukeable if they were wrong

what % of those "child porn consumers" actually just looked at a picture of a naked 16 year old girl? answer that and stay fashionable.


> people are not naturally sexually attracted to children (13 and under) in more than say 0.0001%.

In the US alone, that would be ~30,000 people. Get back to me when you've read some legal cases or know anyone who's been abused to produce such material. Perhaps you'll learn some critical thinking and better manners along the way.


> Get back to me when you've read some legal cases or know anyone who's been abused to produce such material

Hello, I'd be the anyone in this scenario. I don't believe that the pros outweigh the cons, violating the privacy of every single person to protect us from the absolute minority (abusers) of a minority (those attracted to children) is not worth the exchange.

CSAM detection would not have protected me from abuse. At best it would have prevented continued sexual abuse.

And yet had my abuser been caught early I would have been returned to a different, brutally abusive man. One whose abuse no one particularly seems to care about, because there's less to get righteous about when there's no sex involved in the abuse.


OK, but if you read my posts above I am not arguing for mass privacy violations. I'm saying the tech community isn't going to get any traction for its advocacy unless it makes some effort to propose privacy-respecting solutions to the CSAM proliferation problem.

I'm sorry that happened to you, but I presume you would also prefer that imagery including you did not continue to circulate?


How about putting more of the onus on services who might be used to traffic it?

> I'm sorry that happened to you, but I presume you would also prefer that imagery including you did not continue to circulate?

I've reported CSAM posts on tumblr and twitter and seen them survive for weeks before they finally disappear. It seems like that's a much riper target than propping up this kind of worst case solution.


what? we already know the legislation will not pass. the "tech community" doesn't have to do anything. why are you trying to make it sound like there is a dilemma here. cringe


Worst of all, the EU Commission is now starting to act like fake news promoters by posting misleading ads (German article): https://netzpolitik.org/2023/politisches-mikrotargeting-eu-k...

It's outrageous.


I'm completely stumped. I just don't understand how politicians can still believe that client-side scanning would somehow stop CSAM.

Couldn't those who possess this material also run those exact same CLIENT scanners (if not directly, then via isolated canary systems and emissions monitoring) to detect when files have been added to the scanners?

What about tools that keep shuffling files in ways that are not too perceivable but would defeat those scanners? Just look at copyright infringing material on Youtube (it's still there, just better hidden).

Nothing short of a government whitelist database containing all files that are legal to possess, or 100% analog human surveillance, will solve CSAM. Not AI and certainly not this.

I'm forced to agree with others here: This isn't about CSAM at all. It's a thinly disguised frontal attack on democracy. Imagine if the government had this technology when people were organizing to repeal discriminatory laws in the US. Imagine if authoritarian governments in the east (or anywhere else) had this technology right now.

Take a long hard look at every person or organization who supports this nonsense and pay close attention to them. They are the ones who want to harm us and our children.


This is precisely the kind of trolling 4chan loves to do. It ticks all the boxes: swatting, CP, hacktivism possibly if you squint really hard.

And since these hashes are presumably resistant to cropping and scaling, they might be able to target individual symbols, people or image macros, like the rainbow pride flag (you know /pol/ wouldn't be able to resist), politicians etc.


If there were N algorithms, is it feasible to create B' image such that its fingerprint matches A in every algorithm?


If it isn't, then it becomes technically (and likely) feasible for every client that runs those algorithms to reconstruct A.

This is a really stupid idea. Politicians who are demanding this are way out of their depths.


Ok, but why not go for routine home searches then?

Is it because on the phone, it is "invisible" and cheap?

Does it mean that what stops the government from treating everyone as a potential criminal is cost and inconvenience?

People were raising alarms about this years ago and were branded as conspiracy theorists. I guess the EU works well in their propaganda department, painting themselves as do-gooders, when in fact they are just wannabe Stalinists.


>Ok, but why not go for routine home searches then? Is it because on the phone, it is "invisible" and cheap?

Underrated comparison. That is apt, and the whole "for the children" part is a smokescreen.


Government almost never passes useful/wise legislation with regard to technology, because it thinks “tech” is a problem that the government can “fix”. There are systemic reasons for the legislation almost always being bad/unhelpful/foolish and those reasons are not going to change.


I’m curious how few people have followed this topic of client side scanning through.

The logical conclusion is that someone is going to get a knock on their door and a search warrant because of an unknown signature match on a file somewhere at some point?

And people seriously think this will stop at being “for the children”?


The database would obviously come from a central organisation like the FBI or NCA and be built from material gathered in investigations. They wouldn’t allow larry from reddit to share his child porn with them and break the database.

Anyone with sufficient power to want to target you wouldn’t use a ridiculous exploit like this. They’ll just go to google and say “change the update server for this user to this new address” and then breach your device with the next software update.


This attack also works for people who are not in a position to submit entries to the database, as you could very well plant the generated matching images on a server set up with a compromised credit card and report it yourself.

You could select pictures from targets' publicly available photos or, more insidiously, look to compromised accounts' cloud storage and generate fake offensive images that match people's actual baby pictures, whether for harm or blackmail.


This is basically what I suggested we do a few years ago if Apple added client-side content scanners.

Seems all you need to do here is obtain a fingerprint of a CSAM image; then you can reverse engineer a non-CSAM image to match that fingerprint. Distribute this image widely enough and you effectively render these algorithms useless.

Which is a shame, because they could be used for good, but it seems kinda obvious this will expand out to scanning for terrorist materials. And again, while I might be in favour of this in theory, I am frequently shocked by what fits this definition. In the EU, if you promote the wrong ideas or say something "hateful" you can be arrested and sentenced to 10+ years in prison.

I think what concerns me is that currently you really only put yourself at risk if you make your wrong think public, e.g tweet about your dislike of Islam on Twitter. These client-side scanners would now enable the authorities to not only go after those who say "racist" things online, but also those who view / save "racist" memes. If I were to guess at a motive, this would be it.

Legal note: I obviously don't agree with hate of any kind, but I do believe in the right to express views which may be considered hateful.

Edit: Fine with the downvotes – but curious what people are disagreeing with? Would you mind leaving a quick comment if you choose to downvote?


1. You mixed up generating child porn images that collide with targeted non-child-porn images, in order to taint the database and thereby suppress the target images, with generating non-child-porn images that collide with child porn images, in order to overwhelm the system with false positives.

2. You brought in matters that have nothing to do with hashing or child porn, but tend to generate flame wars.

3. You put sneer quotes around "racist", which makes it sound very much like you don't think any such thing exists.


Thank you.

1. Oh okay. I wasn't very clear there. By "this is basically what I suggested we do a few years ago" I meant that we should look to exploit the limitations of these fingerprinting algorithms to render them impractical. But yeah, my suggestion was that we should do the reverse. Rather than taint the database, taint the content.

2. That wasn't my intention. I was suggesting that there are likely other reasons governments are interested in this technology than "think of the children". Hence why I think we should seek ways to render these algorithms ineffective were they deployed. But yes, please feel free to downvote if you feel my comment is unproductive. On reflection I agree it might be. I'm not really adding much here.

3. You know what's annoying? I did that because I didn't want to make an assertion either way, to be as neutral as possible. I'm using quotes because others deem it racist, and my personal position on that is irrelevant. If you want my position on whether racism exists though, obviously I believe it does – I'd urge you to just assume good faith in the future.


> Seems all you need to do here is obtain a fingerprint of a CSAM image; then you can reverse engineer a non-CSAM image to match that fingerprint. Distribute this image widely enough and you effectively render these algorithms useless.

You can't do this without committing an illegal act (downloading CSAM) and it also wouldn't work (because there's a second server side hash), so no I don't think you should do this.


No, all he would need is the hash, which his device would have access to (client-side scanning).


You can design the system so the client doesn't have a list of hashes either, eg with a Bloom filter or other probabilistic system you can change.
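Something along these lines, say (a toy sketch; the size, probe count and seed are illustrative, and note that a Bloom filter hides the raw hash list from the client but still lets the client test candidate fingerprints, and only ever adds false positives):

    # Toy Bloom filter over fingerprint strings: the client can test membership
    # but can't read the hash list back out. Parameters are illustrative; a real
    # deployment would rotate the seed with updates.
    import hashlib

    M = 1 << 20                                   # bits in the filter
    K = 7                                         # probes per fingerprint
    SEED = b"rotate-with-each-update"

    def _probes(fingerprint: str):
        for i in range(K):
            h = hashlib.sha256(SEED + bytes([i]) + fingerprint.encode()).digest()
            yield int.from_bytes(h[:8], "big") % M

    def build_filter(fingerprints):
        bits = bytearray(M // 8)
        for fp in fingerprints:
            for p in _probes(fp):
                bits[p // 8] |= 1 << (p % 8)
        return bits

    def maybe_flagged(bits, fingerprint: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(bits[p // 8] & (1 << (p % 8)) for p in _probes(fingerprint))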


I consider anyone promoting client side scanning of my media promoting a form of non-consensual violence against me.


Huh? I'm not promoting it?

I do think there may be a time and a place for them though... On school and work devices, for example.

That's just my weakly held opinion though. It isn't an opinion I would seek to "promote", just one I wouldn't object to.


I’m worried the same kind of attack will make this[1] CSAM removal project useless. If there is no authentication of the submitter and no punishment for fake submissions, then it becomes garbage-in, garbage-out.

[1] https://takeitdown.ncmec.org/


Didn’t the UK just blink when chat apps said hell no we will leave instead? Does the EU expect to get better results? Apple has very publicly come out and said that their proposal like this was a bad idea and I’d expect Signal to also not comply, the only remaining question is whether WhatsApp would threaten to leave again.


Apple proposed the exact same thing themselves though.


What are the legal consequences of having an image with the matching fingerprint on your device? Is there some kind of threshold to mitigate false positives? What organization is responsible for processing the matches? Is it enough for a search warrant? (for various countries/jurisdictions)


Edit: comment deleted


Because client-side scanning is not going to work, and no one wants a government-issued black box binary sending their conversations and photos to some unnamed police person at random, non-compliance is the only way.

People will just start to use chat programs in the EU that do not comply. This would be Signal, Telegram, others. The EU can fine and fight with Meta/WhatsApp, Apple, others, but that's about it. The EU bureaucrats do not have the power to magically insert spyware into our devices as long as we can install our own software.

… which of course means we know what’s going to come next.


Signal advises its users to install the app from the Play Store. While you can still get the .apk from Signal’s website, the developers warn against that. The EU can definitely exert pressure against what is hosted on the Play Store.

Telegram famously lacks end-to-end encryption (unless you intentionally use its private-chat feature, which few people do) and shouldn’t be mentioned in the same context as Signal.


Telegram is a social media site pretending to be a messaging app.

The secret chats were the biggest feature when it started, but now it's an after-thought.


Molly is a just as good (if not better) fork of Signal that is distributed on FDroid. It's the same network and same chats/contacts, just a different front-end.

And Signal avoids F-Droid because they don't want someone else signing their packages, but they could always provide an F-Droid repository, like many others do, and sign everything themselves.

If push comes to shove they'll be fine, and pressure to black-box Signal in the EU is unlikely to hold up if they can just move users to another app store.


Signal being restricted to F-Droid would be the end of the app as any kind of mass phenomenon. Sure, like my fellow nerds here on HN, I use F-Droid. But none of the ordinary friends and relatives I managed to convince to install Signal, since it was free from the Play Store with just a few taps, would continue using the app if it were relegated to that repository they have never heard of.

Signal’s developers have spoken on a number of occasions about how their aim is to ensure encryption for the masses, not a techie elite.


Yeah, I communicate with way too many people on iPhones who will simply not switch to Android. If it disappears from the Apple App Store we will have to use a different messenger.


I mean if the choice is between "installing black box MITM to signal", "pulling it from the EU google play store and requiring users to download the apk manually", and "pulling it from the EU google play store and allowing users to use fdroid" then the choice is obvious.

Mass adoption can continue elsewhere if the EU chooses to go down a route that would make play store support in the EU non-viable.


Their reasoning for having it Play Store only was that Google did not require app developers to hand over their signing keys – they could sign the APK themselves. This is no longer the case, as Google changed its policies.

NOTE: F-Droid does not require app developers to share their keys; instead, F-Droid builds the application itself and signs it with its own keys -- something Signal is not a fan of.

tl;dr their rationale no longer applies.


I have also seen the preference for installing from Play Store explained as being that users are safer if they use default setups and do what everyone else is doing. For users who are not savvy, going the sideloading route can expose them to risk.


For me this line of thinking doesn't make a lot of sense. The Google Play option will always exist as it is a prerequisite for widespread adoption on Android. Also arguably, you're safer on F-Droid due to the level of vetting on that platform; so providing Signal on it is a good endorsement for general user safety and privacy.

Either way, this isn't about only offering the application on Google Play or only on F-Droid; but providing the option for both. Non-tech savvy users will always pick the more familiar and easy option on average, Google Play.


How is providing a F-Droid repo the same as being restricted to only being available via F-Droid?


It was in reply to:

> If push comes to shove they'll be fine and pressure to black box signal in the EU is unlikely to hold up if they can just move users to another app store.


Yes but the assumption is that "push comes to shove" and they have to choose between pulling it from the EU play store and installing black box MITM software.

Having relatively easy alternatives in place would reduce the leverage the EU has to actually practically enforce this and hopefully pressure them into at minimum non-enforcement and preferably walking back the obviously unenforceable legislation.

But if it looks like they could practically force out Signal and co or force them to adopt this MITM if they want to continue existing, then they might be more likely to pursue it.


If you enable private chat, you don't get notifications for new messages, which makes the feature kind of self-defeating, it seems to me at least. (Something could also have gone wrong?)


Notifications generally go through Google Firebase, so by not running private messages through Firebase they avoid potential leaks. At least that's my guess.


A common way around this is to simply send a notification through Firebase/other push notification mechanism that tells the app there is a new message.

The app can then retrieve the message and decrypt it. Most of them will then update the notification shown to the user with the full message content.
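Roughly like this (a sketch with hypothetical stand-ins for the app's own networking and crypto layers; not any real messenger's API):

    # Sketch of a content-free push: the payload carries only a wake-up hint,
    # and the app fetches and decrypts the real message locally. The helpers
    # below are hypothetical stand-ins, not any real messenger's API.
    from dataclasses import dataclass

    @dataclass
    class Plaintext:
        sender: str
        body: str

    def fetch_pending_envelopes() -> list:
        return [b"ciphertext..."]          # stand-in: pull ciphertexts from the messaging server

    def decrypt_envelope(envelope: bytes) -> Plaintext:
        return Plaintext("alice", "hi")    # stand-in: keys never leave the device

    def show_notification(sender: str, preview: str) -> None:
        print(f"{sender}: {preview}")      # stand-in: update the OS notification

    def on_push_received(push_payload: dict) -> None:
        # The push service (FCM/APNs) only ever sees something like
        # {"type": "new_messages"} -- no sender, no ciphertext, no text.
        if push_payload.get("type") != "new_messages":
            return
        for envelope in fetch_pending_envelopes():
            msg = decrypt_envelope(envelope)
            show_notification(msg.sender, msg.body)

    on_push_received({"type": "new_messages"})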


Maybe the concern is that Firebase still learns that user A and user B receive notifications back and forth at around the same time, interspersed with periods of no notifications for both

If A and B talk regularly, there might not be many other C who by coincidence exchange notifications at the same time

So you might be vulnerable to a sort of traffic analysis
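As a toy illustration of that kind of timing correlation (timestamps are made up; real traffic analysis is far more sophisticated):

    # Toy timing-correlation check: count how often two users' push
    # notifications land within a small window of each other.
    def correlated_events(times_a, times_b, window_s: float = 2.0) -> int:
        b = sorted(times_b)
        hits = 0
        for t in times_a:
            # any event from B within +/- window of an event from A?
            if any(abs(t - u) <= window_s for u in b):
                hits += 1
        return hits

    alice = [100.0, 101.5, 300.2, 300.9, 552.0]   # made-up notification times (seconds)
    bob   = [100.4, 301.0, 551.5]
    carol = [40.0, 900.0]

    print(correlated_events(alice, bob))    # high overlap -> plausible conversation partners
    print(correlated_events(alice, carol))  # low overlap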


That sounds pretty bad, a cloud service needs to see all notifications?


Server side push messaging uses cloud infrastructure on both Android and iOS devices.

Some applications choose to make these notifications more 'secure' by not sending any content through them, just the fact that you have a new notification, but that makes the UX a little less friendly.

https://en.wikipedia.org/wiki/Apple_Push_Notification_servic...

https://en.wikipedia.org/wiki/Firebase_Cloud_Messaging


No, it can be encrypted and then you have a hook that decrypts the notification on-device. On iOS you need to get approval from Apple to do this though


A good question. I am not sure if notifications on iOS, Android, Google Chrome or Firefox are end-to-end encrypted.


You can send whatever values you desire as long as your service worker handles it.


With Telegram on Android you should still get notifications, even for encrypted messages. They just won't show the message's content.


I don't have problems with notifications for secret chats?


Their point isn't about Telegram's end-to-end encryption or lack thereof, it's that they don't comply.

Nothing gets taken down from Telegram, and they ignore court orders; those just go to null.


And they store data on their servers in plain text so that the Russian state and Arab investors can access it at will.


Also, do not miss the recent good article about the US software development companies who are behind the lobbying for client-side scanning, and how this madness is being funded and by whom.

https://balkaninsight.com/2023/09/25/who-benefits-inside-the...


Thanks - interesting. Phrases like this are eternally surprising:

> The same month, UK officials privately admitted to tech companies that there is no existing technology able to scan end-to-end encrypted messages without undermining users’ privacy

It's as though this is a secret that UK officials knew. Everyone technically minded knows this; it's a battle between those who regulate (the difficulty of whose job generally stops at "write down the rule and fine people if they don't do it") and those who know about current fundamental limitations.


Remember this is the same UK government that at one stage back in 2020 was proposing that there could be borderless customs enforcement between Ireland and Northern Ireland (part of the UK) "using drones". Because everyone knows drones do that.

I don't think they consider themselves limited by things technology is actually capable of when talking about what they are going to get technology to do.


Cloudflare also comes to mind, as they proudly do whatever is in their power to check off that CSAM box as long as it makes them money somehow.


And by "next" you mean "in parallel". Spinning E2E encryption as a tool used only by bad actors is very common, and pushing browsers to implement government-managed blacklists seems well in progress. Banning VPNs from social networks (and more) was even suggested in France but quickly removed due to backlash.


Governments will mandate by law that the OS do client side scanning by peeping inside the app directories. Apple, Google, Microsoft will comply and realistically, which phones and computers are we going to use if we don't buy from them? Desktop Linux might become illegal.


Yes, the EU is already working to make this illegal

https://mullvad.net/en/blog/2023/2/1/eu-chat-control-law-wil...

The best bet is an Android fork based on the Android open source distribution (AOSP), without Google.


Right.

Per program makes people use different programs.

Per OS, ok, maybe people will use a different OS (finally! Year of the Linux desktop!)

… ok, we’ll just roll it into the three companies making GPUs. Driver binary side scanning. Maybe hardware frame buffer scanning?

All of this is framed around children, but you are an absolute clown if you think that is truly the end goal.

It’s always “for the children”.


> The EU bureaucrats do not have power to magically insert spyware in our devices as long as we can install our own software.

EU bureaucrats have the power to force ISPs to block connections to the messengers that would not comply, Chinese-style. They already block websites they don't like - e.g. in Cyprus the local provider CYTA showed me "access blocked due to EU commission regulation #whatever" for most of the Russian media sites.


The EU is trying to make it mandatory to allow sideloading. This is bad, but also the opposite of what you are saying.


At the moment, it is perfectly possible for e2ee to be compliant.


[flagged]


Can you point to any society anywhere, where people learned to behave?


I am generally against the death penalty, but I am particularly against the death penalty for anything other than murder because of the actions it incentivizes.


Another case of setec astronomy.


IFL these threads always give vibes like, "oh, thank god that anti CSAM measure wouldn't actually work".


I think that some people are terrified that their (possibly AI generated) "loli pictures" would be caught by a scanner.


What I'm concerned about is a system that flags me for a crime based on a database I can't audit, using mechanisms with an entirely too high false positive rate.

Because the database can't be audited by anyone but a select group we have to trust that it only contains actual bad images. I do not trust that such databases don't also contain images that are embarrassing to powerful/connected people. I also do not trust such databases don't contain false positives.

The sort of people that are super zealous about a topic aren't simultaneously super rational and objective about that topic. There's a non-zero probability that those databases contain lewd yet entirely legal images that the submitters just didn't like.

Because of the false positive rate, a photo of my dog might trigger an alarm and then my phone sends an automated message to the police. I'm then told by proponents there will be some manual review. I then have to hope that the local DA doesn't have an election coming up and want to push a "tough on crime" message by charging me with a crime despite the review.
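To put rough numbers on how per-image odds compound at scale (assumed rates, not measurements of any real system):

    # Back-of-the-envelope: a tiny per-image false match rate still compounds
    # across a photo library, and across millions of devices. All numbers are
    # assumptions for illustration, not measurements of any deployed system.
    from math import comb

    def p_at_least_k(n: int, p: float, k: int) -> float:
        """P(at least k false matches among n independently scanned photos)."""
        return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

    library = 20_000     # photos on one device (assumed)
    fpr = 1e-6           # per-photo false match rate (assumed)

    print(p_at_least_k(library, fpr, 1))    # ~2% of such devices see at least one false match
    print(p_at_least_k(library, fpr, 30))   # with a high threshold, effectively zero

Which is exactly why the reporting threshold matters so much, and why a single match should never be treated as evidence on its own: a small per-device probability still means a lot of flagged people across millions of devices.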

In short these scanning systems require far too much unearned trust. They also present a slippery slope thanks to the incendiary nature of the topic. Today it's CSAM but what undesirable content will the systems be used for tomorrow? Such systems require trust in the stewards of today and tomorrow. Do you want people of the opposite ideology to you in charge of such systems? Do you trust they'll never be abused? Do you trust well meaning people never make mistakes?

I do not trust in any of those things. I'm not worried about myself doing actual bad things, I'm worried that demonstrable false positive rates will ruin my life with the mere accusation of doing something bad.


Nothing in this post (either the system implementation details or the supposed consequences of detection) is how it actually works in reality, so I don't think you should spent much time being upset that something you just made up has a security hole in it.

e.g. there isn't a high false positive rate, the attack in this article doesn't work because the attacker doesn't have access to all the hashing algorithms used, and it doesn't text the police.


Um, the whole thing definitely is unaudited.

> the attack in this article doesn't work because the attacker doesn't have access to all the hashing algorithms used

As far as I know, there are only two hashing algorithms used: ContentID and "the Facebook one", whose name I don't remember offhand at the moment. ContentID has leaked, been reverse engineered, been published, and been broken. A method to generate collisions in it has been published. The Facebook one has never been secret, and essentially the same method can generate collisions in it. And most users just use ContentID. [On edit: Oh, yeah, Apple has their "Neuralhash" thing.]

Are there others? If there are, I predict they're going to leak just like ContentID did, if they haven't already. You can't keep something like that secret. To actually use such an algorithm, you end up having to distribute code for it to too many places. [On the same edit: that applies to Neuralhash].
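For a sense of how simple this family of algorithms can be (and why, once the code is in hand, analysis and collisions follow), here's the classic "average hash"; this is not PhotoDNA or NeuralHash, just the textbook baseline:

    # Classic 8x8 "average hash": shrink, grayscale, threshold on the mean.
    # Near-duplicate images land on nearly the same 64-bit value, which is
    # exactly what makes these hashes both useful and easy to attack once known.
    from PIL import Image

    def average_hash(path: str, hash_size: int = 8) -> int:
        img = Image.open(path).convert("L").resize((hash_size, hash_size))
        pixels = list(img.getdata())
        mean = sum(pixels) / len(pixels)
        bits = 0
        for px in pixels:
            bits = (bits << 1) | (1 if px >= mean else 0)
        return bits

    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    # Usage: two re-encoded copies of the same photo should be a small
    # Hamming distance apart; unrelated photos should not.
    # print(hamming(average_hash("a.jpg"), average_hash("a_recompressed.jpg")))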

I assume you're right that the false positive rate is very low at the moment. Given the way they're done, I don't see how those hashes would match closely by accident. But the whole point of this discussion is that people have now figured out how to raise the false positive rate at will. It's a matter of when, not if, somebody finds a reason to drive the false positive rate way up. Even if that reason is pure lulz.

None of this "texts the police", but it does alert service providers who may delete files, lock accounts, or flag people for further surveillance and heightened suspicion. Much of that is entirely automated. And a lot of the other things you'd use as input, if you were writing a program to decide how suspicious you were, are even more prone to manipulation and false positives.

I believe the service providers also send many of the hits to various "national clearinghouses", which are supposed to validate them. Those clearinghouses usually aren't the police, but they're close.

But the clearinghouses and the police aren't the main problem the false positives will cause. The main problem is the number of people and topics that disappear from the Internet because of risk scores that end up "too high".


> As far as I know, there are only two hashing algorithms used: ContentID and "the Facebook one", whose name I don't remember offhand at the moment.

Yes, those aren't suited for client-side scanning. If the server side can do any content scanning then you're not secure against them, so the protection isn't what kind of hashing they use, it's just that someone actually looks at the results.

> You can't keep something like that secret.

I didn't say it was secret, I said you don't have access to it. Well… that's kind of the same thing I guess, but anyway the important point is they can change it/reseed it.

> None of this "texts the police", but it does alert service providers who may delete files, lock accounts, or flag people for further surveillance and heightened suspicion.

Google has done this one looking for "novel CSAM" aka anyone's nudes, which is bad, so I recommend not doing that.

> Those clearinghouses usually aren't the police, but they're close.

No, it's extremely important that they're not the police (or other government organization); in the US NCMEC exists because, as they're a private organization, you get Fourth Amendment protections if you do get reported to them. But these systems don't automatically report to them either. Someone at the service looks at it first.



