
Sure you can. You just design the system to assume the LLM output isn't predictable, come up with invariants you can reason with, and drop all the outputs that don't fit the invariants. You accept up front the idea that a significant chunk of benign outputs will be lossily filtered in order to maintain those invariants. This just isn't that complicated; people are super hung up on the idea that an LLM agent is a loop around a single "LLM session", which is not how real agents work.
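
A minimal sketch of that pattern, assuming a hypothetical call_llm helper standing in for whatever model call the agent actually makes; the invariant here is "output must be JSON naming one of a closed set of allowed actions", and anything that doesn't fit gets dropped:

    import json

    ALLOWED_ACTIONS = {"summarize", "label", "ignore"}  # invariant: closed set of actions

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for the real model call.
        raise NotImplementedError

    def safe_step(prompt: str) -> dict | None:
        raw = call_llm(prompt)
        try:
            out = json.loads(raw)
        except ValueError:
            return None  # not valid JSON: drop it
        if not isinstance(out, dict) or out.get("action") not in ALLOWED_ACTIONS:
            return None  # violates the invariant: drop it, even if it was benign
        return out

Some benign outputs get thrown away along with the malicious ones; that loss is the price of keeping an invariant you can actually reason about.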


Fair.

> You just design the system to assume the LLM output isn't predictable, come up with invariants you can reason with, and drop all the outputs that don't fit the invariants.

Yes, this is what you do, but it also happens to defeat the whole reason people want to involve LLMs in a system in the first place.

People don't seem to get that the security problems are the flip side of the very features they want. That's why I'm in favor of anthropomorphising LLMs in this context - once you view the LLM not as a program, but as something akin to a naive, inexperienced human, the failure modes become immediately apparent.

You can't fix prompt injection like you'd fix SQL injection, for more or less the same reason you can't stop someone from making a bad but allowed choice once they've delegated that choice to an assistant, especially one of questionable intelligence or loyalties.
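
A sketch of the contrast, using sqlite3 for the SQL side; the prompt string is just illustrative:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    user_input = "Robert'); DROP TABLE users;--"

    # SQL injection has a structural fix: parameters keep data out of the code channel.
    conn.execute("INSERT INTO users (name) VALUES (?)", (user_input,))

    # A prompt has no such separation. Instructions and data travel in the same string,
    # and the model is free to treat any part of it as instructions.
    prompt = "Summarize this user comment: " + user_input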


> People don't seem to get that the security problems are the flip side of the very features they want.

Everyone who's worked in big tech dev got this the first time their security org told them "No."

Some features are just bad security and should never be implemented.


That's my point, though. Yes, some features are just bad security, but they nevertheless have to be implemented, because having them is the entire point.

Security is a means, not an end - something security teams sometimes forget.

The only perfectly secure computing system is an inert rock (preferably one drifting in space, infinitely far from people). Anything more useful than that requires making compromises on security.


Some features are literally too radioactive to ever implement.

As an example, one that in hindsight MS handled really well: UAC (aka Windows sudo).

It's convenient for any program running on a system to be able to do anything without a user prompt.

In practice, that's a huge vector for abuse, and it turns out that crafting a system of prompting around only the most sensitive actions can be effective.

It takes time, but eventually the program ecosystem updates to avoid touching those things in that way (because prompts annoy users), prompt instances decrease, and security is improved because they're rare.

Proper feature design is balancing security with functionality, but if push comes to shove security should always win.

Insecure, functional systems are worthless, unless the consequences of exploitation are immaterial.



