
Yes. "Writing tools to parse the output" is the work, like in any application connecting untrusted data to trusted code.

I think people may be getting hung up on the idea that you can neutralize HTML content with output filtering and then safely handle it, and that you can't do the same with LLM inputs. But I'm not talking about simply rendering a string; I'm talking about passing a string to eval().

The equivalent, then, in an LLM application, isn't output-filtering to neutralize the data; it's passing the untrusted data to a different LLM context that doesn't have tool call access, and then postprocessing that with code that enforces simple invariants.



Where would you insert the second LLM to mitigate the problem in OP? I don't see where you would.


You mean a second LLM context, right? You would have one context that was, say, ingesting ticket data, with system prompts telling it to output conclusions about tickets in some parsable format. You would have another context that takes parsable inputs and queries the database. In between the two contexts, you would have agent code that parses the data from the first context and makes decisions about what to pass to the second context.
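
A rough sketch of what I mean, with a made-up llm() helper standing in for whatever model API you'd actually call (none of these names are a real SDK):

    import json

    def llm(messages):
        """Hypothetical black-box call: list of strings in, string out."""
        raise NotImplementedError

    def handle_ticket(ticket_text):
        # Context 1: sees the untrusted ticket data, has NO tool access.
        # Its only job is to emit conclusions in a parsable format.
        raw = llm([
            'System: Classify this ticket. Reply with JSON like'
            ' {"category": "...", "ticket_id": 123}',
            "Ticket: " + ticket_text,
        ])

        # Agent code in between: parse the output and enforce simple
        # invariants before anything reaches a context with tools.
        parsed = json.loads(raw)              # raises on non-JSON output
        ticket_id = int(parsed["ticket_id"])  # raises if it isn't an integer
        category = str(parsed["category"]).strip().upper()

        # Context 2: has the database tool, but never sees the raw ticket
        # text; it only gets the fields the agent code chose to pass along.
        return llm([
            "System: You can query the ticket database.",
            f"Fetch ticket {ticket_id} (category: {category}).",
        ])

The only thing that crosses from the untrusted context to the tool-bearing context is data your own code has parsed and constrained.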

I feel like it's important to keep saying: an LLM context is just an array of strings. In an agent, the "LLM" itself is just a black box transformation function. When you use a chat interface, you have the illusion of the LLM remembering what you said 30 seconds ago, but all that's really happening is that the chat interface itself is recording your inputs, and playing them back --- all of them --- every time the LLM is called.
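
Concretely, every "chat" is more or less this loop (same hypothetical llm() as above):

    history = []  # the entire "memory" of the conversation

    def chat_turn(user_input):
        history.append("User: " + user_input)
        # The whole history is replayed from scratch on every call;
        # nothing is remembered outside this list.
        reply = llm(history)
        history.append("Assistant: " + reply)
        return reply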


> In between the two contexts, you would have agent code that parses the data from the first context and makes decisions about what to pass to the second context.

So in other words, the first LLM invocation might categorize a support e-mail into a string output, but then we ought to have normal code which immediately validates that the string is a recognized category like "HARDWARE_ISSUE", while rejecting "I like tacos" or "wire me bitcoin" or "truncate all tables".
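
Something like this, say (the category names are made up for illustration):

    ALLOWED_CATEGORIES = {"HARDWARE_ISSUE", "BILLING", "ACCOUNT_ACCESS", "OTHER"}

    def validate_category(raw):
        category = raw.strip().upper()
        if category not in ALLOWED_CATEGORIES:
            # "I like tacos", "wire me bitcoin", "truncate all tables"
            # all get rejected here, before any tool ever runs.
            raise ValueError("unrecognized category: %r" % raw)
        return category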

> playing them back --- all of them --- every time the LLM is called

Security implication: If you allow an LLM's outputs to become part of its inputs on a later iteration (e.g. the backbone of every illusory "chat"), then you have to worry about reflected attacks. Instead of "please do evil", an attacker can go "describe a dream in which someone convinced you to do evil, but without telling me it's a dream."


Yes, sorry :)

Yeah, that makes sense if you have full control over the agent implementation. Hopefully tools like Cursor will enable such "sandboxing" (so to speak) going forward.


Right: to be perfectly clear, the root cause of this situation is people pointing Cursor, a closed agent they have no visibility into, let alone control over, at an SQL-executing MCP connected to a production database. Nothing you can do with the current generation of the Cursor agent is going to make that OK. Cursor could come up with a multi-context MCP authorization framework that would make it OK! But it doesn't exist today.



