AI/machine learning has been used in Advanced Threat Protection for ages and LLM...

simonw · 2025-07-09T00:03:34 1752019414

Using non-deterministic statistical systems to help find security vulnerabilities is fine.

Using non-deterministic statistical systems as the only defense against security vulnerabilities is disastrous.

ants_everywhere · 2025-07-09T00:17:28 1752020248

I don't understand why people get hung up on non-determinism or statistics. But most security people understand that there is no one single defense against vulnerabilities.

Disastrous seems like a strong word in my opinion. All of medicine runs on non-deterministic statistical tests and it would be hard to argue they haven't improved human health over the last few centuries. All human intelligence, including military intelligence, is non-deterministic and statistical.

It's hard for me to imagine a field of security that relies entirely on complete determinism. I guess the people who try to write blockchains in Haskell.

It just seems like the wrong place to put the concern. As far as I can see, having independent statistical scores with confidence measures is an unmitigated good and not something disastrous.

simonw · 2025-07-09T00:38:44 1752021524

SQL injection and XSS both have fixes that are 100% guaranteed to work against every possible attack.

If you make a mistake in applying those fixes, you will have a security hole. When you spot that hole you can close it up and now you are back to 100% protection.

You can't get that from defenses that use AI models trained on examples.

tptacek · 2025-07-09T00:40:42 1752021642

Notably, SQLI and XSS have fixes that also allow the full possible domain of input-output mappings SQL and the DOM imply. That may not be true of LLM agent configurations!

To me, that's a liberating thought: we tend to operate under the assumptions of SQL and the DOM, that there's a "right" solution that will allow those full mappings. When we can't see one for LLMs, we sometimes leap to the conclusion that LLMs are unworkable. But allowing the full map is a constraint we can relax!

Johngibb · 2025-07-09T04:23:51 1752035031

I am actually asking this question in good faith: are we certain that there's no way to write a useful AI agent that's perfectly defended against injection just like SQL injection is a solved problem?

Is there potentially a way to implement out-of-band signaling in the LLM world, just as we have in telephones (i.e. to prevent phreaking) and SQL (i.e. to prevent SQL injection)? Is there any active research in this area?

We've built ways to demarcate memory as executable or not to effectively transform something in-band (RAM storing instructions and data) to out of band. Could we not do the same with LLMs?

We've got a start by separating the system prompt and the user prompt. Is there another step further we could go that would treat the "unsafe" data differently than the safe data, in a very similar way that we do with SQL queries?

If this isn't an active area of research, I'd bet there's a lot of money to be made waiting to see who gets into it first and starts making successful demos…

simonw · 2025-07-09T12:25:29 1752063929

This is still an unsolved problem. I've been tracking it very closely for almost three years - https://simonwillison.net/tags/prompt-injection/ - and the moment a solution shows up I will shout about it from the rooftops.

pegasus · 2025-07-09T07:49:40 1752047380

It is a very active area of research, AI alignment. The research so far [1] suggests inherent hard limits to what can be achieved. TeMPOraL's comment [2] above points out the reason this is so: the generalizable nature of LLMs is in direct tension with certain security requirements.

[1] check out Robert Miles' excellent AI safety channel on youtube: https://www.youtube.com/@RobertMilesAI

[2] https://news.ycombinator.com/item?id=44504527