Sure, but Augment’s main value add is their context engine, and imo they do it really well. If all they had to do was launch an MCP for their context engine product to compete, I think the comparison is still worth exploring.
It's fascinating to see the evolution of HN sentiment towards LLMs in real time. Just a few months ago, projects like these were a dime a dozen and every AI-related post had a skeptical comment at the top. Now I'm almost surprised to see a project like this hit the front page.
I don't have any particular opinion about this project itself, I'm sure there are legitimate use cases for wanting to trick LLMs or obfuscate content etc. But if these sorts of projects are a litmus test for AI skepticism, I'm seeing a clear trend: AI skeptics are losing ground on HN.
I actually made this back in August but never posted it until now.
I agree with your point; many of the comments say that simple regex filtering can solve it, but they seem to ignore that it would break many languages that rely on these characters for things like accent marks.
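To make the point concrete, here's a minimal sketch (assuming the goal is stripping invisible/zero-width characters): a naive "remove everything non-ASCII" regex also destroys accented letters, while filtering on the Unicode format category (`Cf`) removes only the invisible code points.

```python
import re
import unicodedata

text = "café mañana\u200b\u200d"  # accented text plus zero-width chars

# Naive approach: strip everything outside ASCII — also destroys accents
naive = re.sub(r"[^\x00-\x7F]", "", text)

# Targeted approach: drop only format-category (Cf) code points,
# which covers zero-width space, zero-width joiner, etc.
targeted = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

print(naive)     # accents are gone: "caf maana"
print(targeted)  # accents survive: "café mañana"
```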
> 47 tools = 141k tokens consumed before you write a single word
This is the real problem in my opinion.
There are a ton of great-sounding MCP servers, but in practice they have too many individual tools and way too much documentation for each tool. It inflates processing time and burns tokens.
I find MCP is the opposite of the Unix design philosophy. You want fewer tools with more options surfaced via schema, shorter documentation, and you want to rely on convention as much as possible.
You don’t want separate create-file, write-file, and update-file tools; you want one write-file tool that can do all of those things. Instead of ls and find, you want your list-files tool to support regex and fuzzy matching and return a metadata list.
This is based on building these things for most of this year, so it’s anecdotal and ymmv.
As an example, rust-mcp-filesystem has 24 tools, many with completely overlapping functionality: `head_file`, `tail_file`, `read_file_lines`, `read_text_file` plus multi-file variants; or there's `list_directory`, `list_directory_with_sizes`, `calculate_directory_size`, `search_files`, and `directory_tree`. I think that whole server could be 4–6 MCP tools and it would accelerate things.
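As a sketch of what that consolidation might look like, here's a hypothetical single `write_file` tool definition using MCP's `name`/`description`/`inputSchema` shape, where a `mode` enum replaces separate create/overwrite/append tools (the specific field values are illustrative, not from any real server):

```python
import json

# Hypothetical consolidated MCP tool: one write_file with a mode flag
# instead of separate create_file / write_file / append_file tools.
write_file_tool = {
    "name": "write_file",
    "description": "Create, overwrite, or append to a file.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"},
            "mode": {
                "type": "string",
                "enum": ["create", "overwrite", "append"],
                "default": "overwrite",
            },
        },
        "required": ["path", "content"],
    },
}

print(json.dumps(write_file_tool, indent=2))
```

One tool definition like this costs a fraction of the schema and documentation tokens of three near-duplicate tools, and the enum surfaces the behavior options in the schema itself, which is the convention-over-tools point above.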
No mention of coding benchmarks. I guess they've given up on competing with Claude and GPT-5 there. (and from my initial testing of grok 4.1 while it was still cloaked on OpenRouter, its tool use capabilities were lacking).
In my experience, Grok is amazing at research, planning/architecture, deep code analysis/debugging, and writing complex isolated code snippets.
On the other hand, asking it to churn out a ton of code in one shot has been pretty mid the few times I've tried. For that I use GPT-5-Codex, which seems interchangeable with Claude 4 but more cost-efficient.
Codex is good when you have a clear spec and an isolated feature.
Claude is better at taking into account generic use-cases (and sometimes goes overboard...)
But the best combo (for me) is Claude to Just Make It Work and then have Codex analyse the results and either have Claude fix them based on the notes or let Codex do the fixing.
Ah okay, that makes sense. I do a lot of planning with Gemini and Grok before the coding model ever gets involved, so that might be why I've never noticed a clear difference in output quality between GPT-5, GPT-5-Codex, and Claude 4.
TBH I really should do a lot more pre-planning for tasks - especially on new projects. But it's just so much more rewarding to shove Claude at a quick idea, watch some shows and come back to see what it figured out =)
Since coding is such a common use case, and since Claude and GPT-5-Codex are fairly high bars to beat, I'm guessing we'll see an updated coding model soon.
Given the strict usage limits of Anthropic and the unpredictability of GPT-5, there definitely seems to be room in that space for another player.
I'm with you, which is why I started the post by stating that most group chats don't need LLM assistants. But I do wonder if your friends have ever posted an AI generated image in the gc, or text you suspect was generated by LLMs? I would be very surprised if not.