I'm actually relieved they're doing it now because it's going to be a forcing function for the local LLM ecosystem. Same thing with their "distillation attack" smear piece -- the more of a spotlight gets put on true alternatives and competition to the 900 lb gorillas, the better for all users of LLMs.
I really hope so. I moved to Codex, only to get my account flagged and my requests downgraded to 5.2 because of some "safety" thing. Now OpenAI demands I hand my ID over to Persona, the incredibly dodgy US surveillance company Discord just parted ways with, to get back what I paid for.
This timeline sucks, I don't want to live in a future where Anthropic and OpenAI are the arbiters of what we can and cannot do.
It definitely does suck. I had the same feelings about a year ago and the unpleasantness has definitely increased. But glass half full, we didn't have Kimi K2.5, GLM5, Qwen3.5, MiniMax 2.5, Step Flash 3.5, etc available then, and the Cambrian explosion is only continuing (DeepSeek V4 should be out pretty soon too).
The real moment of relief for me was about 12 months ago, the first time I used DeepSeek R1 for a large task I would otherwise have needed Claude/OpenAI for, and it just did it -- not just decently, but with less slop than Claude/OpenAI. Ever since then, I've kept eyeing local models and parallel testing them for workloads I'd otherwise use commercial frontier models for. It's never a perfect 1:1 replacement, but I've gotten close enough that I no longer feel that paranoia about my AI workloads not being something I can own and control. True, I do sacrifice some capability, but in exchange I get something that lives on my metal, never leaks data or IP, doesn't change behavior or get worse under my feet, doesn't rate limit me, and can be fine tuned and customized. It's all led me to believe that the market competition is very much functioning and the cat is out of the bag, to the benefit of all of us as users.
Very clever. Our team is small enough right now for this not to be a problem, but I've run into this issue previously and this feels like a far more practical design for avoiding lock-in.
What do you think about RLMs? At first blush they look like subagents with some sprinkles on top, but people who have become more adept with them seem to show they can achieve sublinear context scaling very effectively.
By "agentic recursive LLMs," I mean all the approaches that involve agents recursively calling LLMs, including RLMs. My post in fact links to an RLM paper.
It's a very concerning future. I would love to live in a world where we could simply stop them from doing that, but for the moment, the best hedge appears to be the Chinese open weight models: they can't be put back in the box, and they serve the valuable market function of commodifying these models' encoded knowledge (which was itself derived from knowledge the frontier labs didn't create).
It goes the other way around as well. DeepSeek has made quite a few innovations that the US labs were lacking (DSA being the most notable). It's also not clear to me whether distilled outputs are just an additional ingredient in the recipe or the whole "frozen dinner," so to speak. I have no evidence either way, but my guess is the former.
Just came across SST the other day and it looks very interesting. It appears to be built on Pulumi, which raises the question for me: why does it exist? Structurally it doesn't seem all that different in capabilities. Perhaps it's a more opinionated subset with better ergonomics. Is that correct, or is the reason different for you?
What I don't understand is why Gemini is not #1, other than that Google has no economic reason to have the same fire under its ass to be #1 as Anthropic and OpenAI. Or maybe they are correctly assessing that getting to good enough and out-building infrastructure is more valuable; they do have their TPUs as bets on their future and their search monopoly today to print nearly endless free cash flow. Perhaps Gemini is advancing at exactly the right rate for them.
I guess there is one thing that Gemini is objectively better at than either, which is long context, and it does seem to be by an order of magnitude. What boggles my mind is why Gemini is still not as good as the open weight frontier models yet. If they just got to parity with those models, combined with their existing long context and strong token pricing, they'd be able to take over the coding market. Are they just biding their time to make their move? Hard to discern.
Something that's been on my mind recently - what if gen AI coding tools are ultimately attention casinos in the same way social media is? You burn through tons of tokens and you pay per token; it feels productive and engaging, but ultimately the more you try and fail, the more money the vendor makes. Their de facto (though never stated) economic goal may be to keep you in the "goldilocks zone": making enough progress not to give up, but not so much progress that you 1-shot to the end state without issues.
I'm not saying that they can actually do that per se; switching costs are so low that if you are doing worse than an existing competitor, you'd lose that volume. Nor am I saying they are deliberately bilking folks -- I think it would be hard to do that without folks cottoning on.
But I did see an interesting thread on Twitter that had me pondering [1]. Basically, Claude Code experimented with RAG approaches before settling on the simple iterative grep they now use. The RAG approach was, in their words, brittle and hard to get right, and just brute forcing it with grep was easier to use effectively. But Cursor took the other approach and made semantic searching work for them, which made me wonder about the intrinsic token economics for both firms. Cursor is incentivized to minimize token usage to increase spread from their fixed seat pricing. But for Claude, iterative grep bloating token usage doesn't hurt them -- in fact it increases gross tokens purchased -- so there is no incentive to find a better approach.
I am sure there are many instances of this out there, but it does make me inclined to wonder if it will be economic incentives rather than technical limitations that eventually put an upper limit on closed weight LLM vendors like OpenAI and Claude. Too early to tell for now, IMO.
Well, the first time I got really excited about an LLM was when it told me “yes, if you give me your game ideas and we iterate together, i can handle 100% of the coding.” Lies, pure lies.
I agree with your point, and it is on that point that I disagree with GP. These open weight models, ultimately constructed from thousands of years of accumulated human knowledge, are now freely available to all of humanity. To me that is the real marvel and a true gift.