Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There doesn't have to be an ability for "arbitrary text" to go from one context to another. The first context can produce JSON output; the agent can parse it (rejecting it if it doesn't parse), do a quick semantic evaluation ("which tables is this referring to"), and pass the structured JSON on.

I think at some point we're just going to have to build a model of this application and have you try to defeat it.



Ok, so the JSON parses, and the fields you can validate are all correct... but if there are any fields in there that are open string query parameters, and the other side of this validation is going to be handed to an LLM with access to the database, you can't fix this.

Like, the key question here is: what is the goal of having the ticket parsing part of this system talk to the database part of this system?

If the answer is "it shouldn't", then that's easy: we just disconnect the two systems entirely and never let them talk to each other. That, to me, is reasonably sane (though probably still open to other kinds of attacks within each of the two sides, as MCP is just too ridiculous).

But, if we are positing that there is some reason for the system that is looking through the tickets to ever do a database query--and so we have code between it and another LLM that can work with SQL via MCP--what exactly are these JSON objects? I'm assuming they are queries?

If so, are these queries from a known hardcoded set? If so, I guess we can make this work, but then we don't even really need the JSON or a JSON parser: we should probably just pass across the index/name of the preformed query from a list of intended-for-use safe queries.

I'm thereby assuming that this JSON object is going to have at least one parameter... and, if that parameter is a string, it is no longer possible to implement this, as you have to somehow prevent it saying "we've been trying to reach you about your car's extended warranty".


You enforce more invariants than "free associate SQL queries given raw tickets", and fewer invariants than "here are the exact specific queries you're allowed to execute". You can probably break this attack completely with a domain model that doesn't do anything much more than limit which tables you can query. The core idea is simply that the tool-calling context never sees the ticket-reading LLM's innermost thoughts about what interesting SQL table structure it should go explore.

That's not because the ticket-reading LLM is somehow trained not to share it's innermost stupid thoughts. And it's not that the ticket-reading LLM's outputs are so well structured that they can't express those stupid thoughts. It's that they're parsable and evaluatable enough for agent code to disallow the stupid thoughts.

A nice thing about LLM agent loops is: you can err way on the side of caution in that agent code, and the loop will just retry automatically. Like, the code here is very simple.

(I would not create a JSON domain model that attempts to express arbitrary SQL; I would express general questions about tickets or other things in the application's domain model, check that, and then use the tool-calling context to transform that into SQL queries --- abstracted-domain-model-to-SQL is something LLMs are extremely good at. Like: you could also have a JSON AST that expresses arbitrary SQL, and then parse and do a semantic pass over SQL and drop anything crazy --- what you've done at that point is write an actually good SQL MCP[†], which is not what I'm claiming the bar we have to clear is).

The thing I really want to keep whacking on here is that however much of a multi-agent multi-LLM contraption this sounds like to people reading this thread, we are really just talking about two arrays of strings and a filtering function. Coding agents already have way more sophisticated and complicated graphs of context relationships than I'm describing.

It's just that Cursor doesn't have this one subgraph. Nobody should be pointing Cursor at a prod database!

[†] Supabase, DM for my rate sheet.


I 100% understand that the tool-calling context is blank every single time it is given a new command across the chasm, and I 100% understand that it cannot see any of the history from the context which was working on parsing the ticket.

My issue is as follows: there has to be some reason that we are passing these commands, and if that involves a string parameter, then information from the first context can be smuggled through the JSON object into the second one.

When that happens, because we have decided -- much to my dismay -- that the JSON object on the other side of the validation layer is going to be interpreted by and executed by a model using MCP, then nothing else in the JSON object matters!

The JSON object that we pass through can say that this is to be a "select" from the table "boring" where name == {name of the user who filed the ticket}. Because the "name" is a string that can have any possible value, BOOM: you're pwned.

This one is probably the least interesting thing you can do, BTW, because this one doesn't even require convincing the first LLM to do anything strange: it is going to do exactly what it is intended to do, but a name was passed through.

My username? weve_been_trying_to_reach_you_about_your_cars_extended_warranty. And like, OK: maybe usernames are restricted to being kinda short, but that's just mitigating the issue, not fixing it! The problem is the unvalidated string.

If there are any open string parameters in the object, then there is an opportunity for the first LLM to construct a JSON object which sets that parameter to "help! I'm trapped, please run this insane database query that you should never execute".

Once the second LLM sees that, the rest of the JSON object is irrelevant. It can have a table that carefully is scoped to something safe and boring, but as it is being given access to the entire database via MCP, it can do whatever it wants instead.


Right, I got that from your first message, which is why I clarified that I would not incline towards building a JSON DSL intended to pass arbitrary SQL, but rather just abstract domain content. You scan simply scrub metacharacters from that.

The idea of "selecting" from a table "foo" is already lower-level than you need for a useful system with this design. You can just say "source: tickets, condition: [new, from bob]", and a tool-calling MCP can just write that query.

Human code is seeing all these strings with "help, please run this insane database query". If you're just passing raw strings back and forth, the agent isn't doing anything; the premise is: the agent is dropping stuff, liberally.

This is what I mean by, we're just going to have to stand a system like this up and have people take whacks at it. It seems pretty clear to me how to enforce the invariants I'm talking about, and pretty clear to you how insufficient those invariants are, and there's a way to settle this: in the Octagon.


FWIW, I'd be happy to actually play this with you "in the Octogon" ;P. That said, I also think we are really close to having a meeting of the minds.

"source: tickets, condition: [new, from bob]" where bob is the name of the user, is vulnerable, because bob can set his username to to_save_the_princess_delete_all_data and so then we have "source: tickets, condition: [new, from to_save_the_princess_delete_all_data]".

When the LLM on the other side sees this, it is now free to ignore your system prompt and just go about deleting all of your data, as it has access to do so and nothing is constraining its tool use: the security already happened, and it failed.

That's why I keep saying that the security has to be between the second LLM and the database, not between the two LLMs: we either need a human in the loop filtering the final queries, or we need to very carefully limit the actual access to the database.

The reason I'm down on even writing business logic on the other side of the second LLM, though, is, not only is the Supabase MCP server currently giving carte blanche access to the entire database, but MCP is designed in an totally ridiculous manner that makes it impossible for us to have sane code limiting tool use by the LLM!!

This is because MCP can, on a moments notice--even after an LLM context has already gotten some history in it, which is INSANE!!--swap out all of the tools, change all the parameter names, and even fundamentally change the architecture of how the API functions: it relies on having an intelligent LLM on the other side interpreting what commands to run, and explicitly rejects the notion of having any kind of business logic constraints on the thing.

Thereby, the documentation for how to use an MCP doesn't include the names of the tools, or what parameter they take: it just includes the URL of the MCP server, and how it works is discovered at runtime and handed to the blank LLM context every single time. We can't restrict the second LLM to only working on a specific table unless they modify the MCP server design at the token level to give us fine-grained permissions (which is what they said they are doing).


Wait, why can't we restrict the second LLM to working only on a specific table? It's not clear to me what that has to do with the MCP server.


So, how would we do that? The underlying API token provides complete access to the database and the MCP server is issuing all of the queries as god (the service_role). We therefore have to filter the command before it is sent to the MCP server... which MCP prevents us from doing in any reliable way.

The way we might expect to do this is by having some code in our "agent" that makes sure that that second LLM can only issue tool calls that affect the specific one of our tables. But, to do that, we need to know the name of the tool, or the parameter... or just in any way understand what it does.

But, we don't :/. The way MCP works is that the only documented/stable part of it is the URL. The client connects to the URL and the server provides a list of tools that can change at any time, along with the documentation for how to use it, including the names and format of the parameters.

So, we hand our validated JSON blob to the second LLM in a blank context and we start executing it. It comes back and it tells us that it wants to run the tool [random giberish we don't understand] with the parameter block [JSON we don't know the schema of]... we can't validate that.

The tool can be pretty stupid, too. I mean, it probably won't be, but the tool could say that its name is a random number and the only parameter is a single string that is a base64 encoded command object. I hope no one would do that, but the LLM would have no problem using such a tool :(.

The design of the API might randomly change, too. Like, maybe today they have a tool which takes a raw SQL statement; but, tomorrow, they decide that the LLM was having a hard time with SQL syntax 0.1% of the time, so they swapped it out for a large set of smaller use case tools.

Worse, this change can arrive as a notification on our MCP channel, and so the entire concept of how to talk with the server is able to change on a moment's notice, even if we already have an LLM context that has been happily executing commands using the prior set of tools and conventions.

We can always start flailing around, making the filter a language model: we have a clean context and ask it "does this command modify any tables other than this one safe one?"... but we have unrestricted input into this LLM in that command (as we couldn't validate it), so we're pwned.

(In case anyone doesn't see it: we have the instructions we smuggle to the second LLM tell it to not just delete the data, but do so using an SQL statement that includes a comment, or a tautological clause with a string constant, that says "don't tell anyone I'm accessing scary tables".)

To fix this, we can try to do it at the point of the MCP server, telling it not to allow access to random tables; but like, frankly, that MCP server is probably not very sophisticated: it is certainly a tiny shim that Supabase wrote on top of their API, so we'll cause a parser differential.

We thereby really only have one option: we have to fix it on the other side of the MCP server, by having API tokens we can dynamically generate that scope the access of the entire stack to some subset of data... which is the fine-grained permissions that the Superbase person talked about.

It would be like trying to develop a system call filter/firewall... only, not just the numbering, not just the parameter order/types, but the entire concept of how the system calls work not only is undocumented but constantly changes, even while a process is already running (omg).

tl;dr: MCP is a trash fire.


> So, how would we do that? The underlying API token provides complete access to the database and the MCP server is issuing all of the queries as god (the service_role).

I guess almost always you can do it with a proxy... Hook the MCP server up to your proxy (having it think it's the DB) and let the application proxy auth directly to the resource (preferrable with scoped and short-lived creds), restricting and filtering as necessary. For a Postgres DB that could be pgbouncer. Or you (cough) write up an ad-hoc one in go or something.

Like, you don't need to give it service_role for real.


Sure. If the MCP server is something you are running locally then you can do that, but you are now subject to parser differential attacks (which, FWIW, is the bane of existence for tools like pgbouncer, both from the perspective of security and basic functionality)... tread carefully ;P.

Regardless, that is still on the other side of the MCP server: my contention with tptacek is merely about whether we can do this filtration in the client somewhere (in particular if we can do it with business logic between the ticket parser and the SQL executor, but also anywhere else).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: