It's attempting to split control and data through a system which is susceptible ...

spion · on May 15, 2023

The system as described is not susceptible to prompt injection:

- The tool-using-LLM never sees data, only variables that are placeholders for the data.

- A post tool-using-LLM templating layer translates variables into content before passing them to a concrete tool.

- After variables are translated, only a non-priviledged (non-tool-using) LLM has access to the actual content.

- The output of a non-priviledged LLM is again another variable represented e.g. by the tokens $OUTPUT. The tool LLM never sees into that content. It can give it to another tool, but it cannot see inside it.

You can inject prompt into the non-priviledged LLM but it doesn't get to do anything.

sdenton4 · on May 15, 2023

You're simply incorrect here: the point is that the quarantined LLM has no ability to execute code and all inputs and outputs are treated as untrusted strings. Thanks to the history of the Internet, handling untrusted strings is a thing we understand how to do.

The privileged LLM doesn't see the untrusted text, and is prompted by the user - which is fine until the user does something dumb with the untrusted text. (Thus, the social engineering section.)

Nothing about this is security by obscurity... It may be flawed ( feel free to provide an example that would cause a failure), but it's not just hiding a problem under a layer of rot13...

jamilton · on May 15, 2023

Prompt injection with this method could, at worst, make the plaintext incorrect. The summary could be replaced with spam, for example. Prompt injection with the naive method (just have 1 LLM doing everything) could, at worst, directly infect the user's computer.