> "security through obscurity" has been a well understand fallacy in open source circles for decades
The times, as they say, are a-changin’.
Open software is not inherently more secure than closed software, and never has been.
Its relative security value was always derived from circumstantial factors, one of the most important being the incentive, ability, and willingness of others in the community to spend their time and attention finding and fixing bugs and potential exploits.
Now, that’s been the case for so long that we all implicitly take it for granted, and conclude that open software is generally more secure than closed, and that security through obscurity falls short in comparison.
But this may very well fundamentally change when the cost of navigating the search space of potential exploits, for both the attacker and the defender, is dramatically reduced along the axes of time and attention, and increased along the axis of monetary investment.
It then becomes a game of which side is more willing to pool monetary resources into OSS security analysis – the attackers or the defenders – and I wouldn’t feel comfortable betting on the defenders in that case.
There is no evidence of this. Evals are quite different from "self-evals". The only robust way of determining if LLM instructions are "good" is to run them through the intended model lots of times and see if you consistently get the result you want. Asking the model if the instructions are good shows a very deep misunderstanding of how LLMs work.
When you give prompt P to model M, when your goal is for the model to actually execute those instructions, the model will be in state S.
When you give the same prompt to the same model, when your goal is for the model to introspect on those instructions, the model is still in state S. It's the exact same input, and therefore the exact same model state as the starting point.
Introspection-mode state only diverges from execution-mode state at the point at which you subsequently give it an introspection command.
At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input, and there is overwhelming evidence that frontier models do this very well, and have for some time.
Asking the model, while it's in state S, to introspect and surface any points of confusion or ambiguities it's experiencing about what it's being asked to do, is an extremely valuable part of the prompt engineering toolkit.
I didn't, and don't, assert that "asking the model if the instructions are good" is a replacement for evals – that's a strawman argument you seem to be constructing on your own and misattributing to me.
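Concretely, the pattern I'm describing is just a two-phase loop: same prompt prefix, different trailing command. Here's a minimal sketch — `complete` is a stand-in for whatever LLM client you actually use (OpenAI, Anthropic, etc.), not a real library call, and the sentinel string is my own invention:

```python
# Sketch of the "introspect before executing" pattern. `complete` is a
# placeholder for any function that sends a prompt to an LLM and returns
# its text response -- it is NOT a real library API.

INTROSPECTION_SUFFIX = (
    "\n\nBefore doing anything, list any ambiguities, contradictions, "
    "or missing information in the instructions above. "
    "If there are none, reply with exactly: NO AMBIGUITIES."
)

def build_introspection_prompt(task_prompt: str) -> str:
    """Turn an execution prompt into an introspection prompt.

    Both prompts share the same prefix, so the model starts from the same
    state S -- divergence happens only at the appended command.
    """
    return task_prompt + INTROSPECTION_SUFFIX

def refine(task_prompt: str, complete) -> str:
    """One review pass: surface ambiguities, then decide whether to edit."""
    feedback = complete(build_introspection_prompt(task_prompt))
    if feedback.strip() == "NO AMBIGUITIES":
        return task_prompt  # nothing surfaced; send for execution as-is
    # Otherwise, fold the feedback back in (by hand, or in a loop).
    return task_prompt + "\n\n# TODO: address feedback:\n# " + feedback
```

None of this replaces running the prompt through the model repeatedly and checking outputs — it's a cheap first-pass filter before you do that.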
Nicely put. I haven't seen anyone say that the introspection abilities of LLMs are up to much, but claiming that it's completely impossible to get a glimpse behind the curtain is untrue.
At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input
This point is load-bearing for your position, and it is completely wrong.
Prompt P at state S leads to a new state SP'. The "common jumping off point" you describe is effectively useless, because we instantly diverge from it by using different prompts.
And even if it weren't useless for that reason, LLMs don't "query" their "state" in the way that humans reflect on their state of mind.
The idea that hallucinations are somehow less likely because you're asking meta-questions about LLM output is completely without basis.
Is that based on your "deep understanding" of how LLMs work or have you actually tried it? If you watch the execution trace of a Skill in action, you can see that it's doing exactly this inspection when the skill runs – how could it possibly work any other way?
Skills are just textual instructions, and LLMs are perfectly capable of spotting inconsistencies, gaps and contradictions in them. Is that sufficient to create a good skill? No, of course not – you need to actually test them. To use an analogy, asking an LLM to critique a skill is like running lint on C code first to pick up egregious problems; running test cases is still vital.
I view this post as primarily pattern-matching and storytelling. But I think there’s a buried truth there, and that they were nibbling at the edges of it when they started talking about the overlapping stages.
There are some very interesting information network theories that present information growth as a continually evolving and expanding graph, something like a virus inherent to the universe’s structure, as a natural counterpoint to entropy. And in that view, atomic bonds and cells and towns and railroads and network connections and model weights are all the same sort of thing, the same phenomenon, manifesting in different substrates at different levels of the shared graph.
To me, that’s a much better and deeper explanation that connects the dots, and offers more predictive power about what’s next.
Highly recommend the book Why Information Grows to anyone whose interest is piqued by this.
When a human is coding against a traditional API, it might be a bit annoying if the API has four or five similar-sounding endpoints that each have a dozen parameters, but it's ultimately not a showstopper. You just spend a little extra time in the API docs, do some Googling to see what people are using for similar use cases, decide which one to use (or try a couple and see which actually gets you what you want), commit it, and your script lives happily ever after.
When an AI is trying to make that decision at runtime, having a set of confusing tools can easily derail it. The MCP protocol doesn't have a step that allows it to say "wait, this MCP server is badly designed, let me do some Googling to figure out which tool people are using for similar use cases". So it'll just pick whichever one seems most likely to be correct, and if it's wrong, then it's just wasted time and tokens and it needs to try the next option. Scaled up to thousands or millions of times a day, it's pretty significant.
There are a lot of MCP servers out there that are just lazy mappings from OpenAPI/Swagger specs, and it often (not always, to be fair) results in a clunky, confusing mess of tools.
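To make the failure mode concrete, here's an illustrative sketch of the same capability surfaced two ways — every tool name and parameter below is invented for the example, not taken from any real server:

```python
# Illustrative only: how one capability can surface as MCP tool listings.
# All names here are hypothetical, not from any real MCP server.

# A lazy OpenAPI-to-MCP mapping: four near-identical tools, each with a
# pile of optional knobs the model must reason about at runtime.
confusing_tools = [
    {"name": "getUser",        "params": ["id", "uid", "expand", "fields", "view"]},
    {"name": "getUserById",    "params": ["userId", "include", "projection"]},
    {"name": "fetchUserV2",    "params": ["user_id", "embed", "format"]},
    {"name": "users_retrieve", "params": ["pk", "depth", "flatten"]},
]

# An agent-focused design: one tool, one obvious choice.
clear_tools = [
    {"name": "get_user", "params": ["user_id"]},
]

def ways_to_guess_wrong(tools):
    """Crude proxy for runtime confusion: how many wrong first picks exist."""
    return len(tools) - 1
```

A human reads the docs once and amortizes the confusion; the agent pays `ways_to_guess_wrong` in wasted tool calls, potentially on every single run.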
You could get pretty far with a set of agent-focused routes mounted under e.g. an /agents path in your API.
There'd be a little extra friction compared to MCP – the agent would presumably have to find and download and read the OpenAPI/Swagger spec, and the auth story might be a little clunkier – but you could definitely do it, and I'm sure many people do.
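As a sketch of what that curated surface might look like — expressed here as an OpenAPI-shaped paths fragment in Python, with every path and field hypothetical:

```python
# Hypothetical sketch of "agent-focused routes" mounted under /agents:
# a small, curated surface separate from the human-facing API.
# All paths and fields are invented for illustration.
agent_paths = {
    "/agents/openapi.json": {
        "get": {"summary": "Machine-readable spec for everything under /agents"},
    },
    "/agents/search": {
        "get": {
            "summary": "One search endpoint instead of five similar ones",
            "parameters": [{"name": "q", "in": "query", "required": True}],
        },
    },
    "/agents/items/{item_id}": {
        "get": {"summary": "Fetch a single item by ID; no optional knobs"},
    },
}

# The curation rule: every route the agent can see should have exactly
# one obvious use, so it never has to guess between near-duplicates.
assert all(path.startswith("/agents") for path in agent_paths)
```

The design point is the curation, not the mount path: the agent only ever sees routes with a single obvious purpose, which is most of what a well-built MCP server buys you anyway.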
Beyond that, there are a few concrete things MCP provides that I'm a fan of:
- first-class integration with LLM vendors/portals (Claude, ChatGPT, etc), where actual customers are frequently spending their time and attention
- UX support via the MCP Apps protocol extension (this hasn't really entered the zeitgeist yet, but I'm quite bullish on it)
- code mode (if using FastMCP)
- lots of flexibility on tool listings – it's trivial to completely show/hide tools based on access controls, versus having an AI repeatedly stumble into an API endpoint that its credentials aren't valid for
I could keep going, but the point is that while it's possible to use another tool for the job and get _something_ up and running, MCP (and FastMCP, as a great implementation) is purpose built for it, with a lot of little considerations to help out.
It's still middle-click in my muscle memory from the Windows XP days!
God, I used to be _really_ into Minesweeper.
One of the earliest games I made back in college was a 3D Minesweeper cube. I remember being really proud of one little detail – the detection and automatic resolution of ambiguous clues that would require guessing, which always annoyed the heck out of me in every other version of Minesweeper.
Yeah, you could certainly tell it to skip the Gemini image gen step and replace with generated SVG art, or an existing image, or just stick a longer description there, whatever. It's flexible.
I thought about the deterministic generation aspect, but there's no "random seed" equivalent for frontier LLMs to my knowledge that would guarantee deterministic output (even temperature=0 is still nondeterministic).
Wow. The guy who’s been thanklessly maintaining the project for 10+ years, with very little help, went way out of his way to produce a zero-reuse, ground-up reimplementation so that it could be MIT licensed... and the very-online copyleft crowd is crucifying him for it and telling him to kick rocks.
Unbelievable. This is why we can’t have nice things.
Mark Pilgrim isn't even the original author; he just ported the C version to Python and hasn't contributed to it in the last 10 years.
If you take 5 minutes to look at the code you'll see that v7 works in a completely different way: it mostly uses machine learning models instead of heuristics. Even if you compare the UTF8 or UTF16 detection code you'll see that they have absolutely nothing in common.
It's just API-compatible, and the API is basically 3 functions.
If he had published this under a different name nobody would have challenged it.
Those claims were way, way after the fact. Like, 50+ years later. Zero documentation or contemporary evidence of ANY kind. The claim isn’t taken particularly seriously by historians.
Interesting that you’re getting downvoted. This passage also stuck out like a sore thumb to me – it’s like seeing some antivax stuff thrown into an otherwise serious discussion.