> Let me be clear: this is pure speculation. The evidence is public, but there are no leaks or insider rumors that confirm I’m right. In fact, I am building the theory with this post, not just sharing it. I don’t have privileged information—if I did, I’d be under an NDA anyway. The hypothesis feels compelling because it makes sense. And honestly, what more do I need to give the rumor mill a spin?
Maybe that raises eyebrows, but I for one appreciate this disclosure — too many opinions are presented as facts nowadays. In turn, it makes me read the article more seriously.
On the other hand, I have the feeling that we have plateaued quite quickly.
If you dig deep, what current AI companies are doing right now is not really better than what chess and Go engines were doing 30 years ago; the main difference is the amount of storage available and the speed at which AI tools are able to use it.
And I have the feeling that we are already maxing out on the amount and quality of data available, and that we can't really grow and improve much further, because high-quality human content is already drowned out by the sheer amount of uninteresting content currently being produced by AI tools. Also, of what is left of human content, a huge majority is built to serve a marketing or political agenda, both using the same techniques of lies, deception and fraud. What good can we get from tools built on lies and fantasies?
So, if I read this right, the major AI companies are not releasing their highest-performing models because it's too expensive to roll them out en masse; instead they are using them (somehow, IANA ML expert) to make their existing models cheaper to run.
If I'm reading this right, they're peaking on general performance (as a function of GPUs on hamster wheels), and instead are focusing on reducing costs for their most-demanded operating modes.
A testable prediction would then be: "All major AI companies will roll out hyper-specialized models that exceed general models at the same price point," which would simultaneously boost profits. The game is to find the specializations that make the most money?
Well, I think the most interesting point made is that it may not make sense for AI companies to release models _at all_ as they approach AGI. Exposing models allows their capabilities to be easily copied, whereas keeping them internal lets you capture the profit from their output while protecting your IP.
For example, say OpenAI trained a hyper-intelligent model specialising in designing new small molecule drugs. Why make that publicly available at all? Why not just partner with a pharmaceutical manufacturer and make money from selling the drugs?
We’ve become accustomed to AI-as-a-service being the default, via chat or API, but this might just be a blip. After all, OpenAI have made it clear that they see their mission only as making the “benefits” of AGI available to all, which doesn’t necessarily correspond to making AGI itself available to everyone.
Somewhere in this thread (if you can find it), an OpenAI employee hinted at the idea that they have reached a "data singularity": using a model to generate synthetic data to train the next version.
>What stops them suffering the same entropy issues? Is there a reinforcement step with human feedback?
It seems the situation changed with inference scaling/reasoning models (o1 and beyond). I found the tweet I was looking for, but it wasn't from an OpenAI employee as I seemed to recall:
>Now that various cryptic messages have come out from renowned OpenAI scientists, "Gwern" assumes the following:
>- we have reached the threshold of “recursively self-improving”, “where o4 or o5 will be able to automate AI R&D and finish off the rest.”
>- the purpose of o1 is primarily to generate synthetic data for models like o3, which is why he is surprised that o1-pro was released at all
>- he thinks that Anthropic Opus 3.5 is not being released for the same reason: the compute is needed to generate synthetic data
The rest of the tweet is interesting too; it starts from the same facts as this article, but makes the case for "reverse distillation" instead (i.e. using synthetic data to train the next model). In particular:
>If you look at the relevant scaling curves - may I yet again recommend reading Jones 2021?* - the reason for this becomes obvious. Inference-time search is a stimulant drug that juices your score immediately, but asymptotes hard. Quickly, you have to use a smarter model to improve the search itself, instead of doing more.
Yeah that doesn't address the point. The output of an LLM is a compression. It has errors. Recursive training would seem to create iterations that become more noisy. There's no new information, just a lossy distillation of the previous iteration.
I'm not into ML so I can't answer your question specifically, but it seems your circular Google Translate or JPEG comparisons are missing the essential element: self-evaluation.
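To make that concrete, here is a toy sketch of what a self-evaluation step could look like (pure illustration; the function names and the verifier are made up, this is not anyone's actual pipeline): generate several candidates per prompt, score them with an external check, and keep only the ones that pass. The external check is the source of new signal that the copy-of-a-copy analogies leave out.

```python
import random

def generate_candidates(prompt: str, n: int = 8) -> list[str]:
    """Placeholder: sample n candidate answers from the current model."""
    return [f"{prompt} :: candidate {i}" for i in range(n)]

def score(candidate: str) -> float:
    """Placeholder: an external verifier (unit tests, a proof checker,
    a reward model, majority vote) that is cheaper to run than to fool."""
    return random.random()

def build_synthetic_set(prompts: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Keep only candidates that pass the verifier; the filter is what keeps
    quality from degrading copy-of-a-copy style across generations."""
    kept = []
    for prompt in prompts:
        scored = [(score(c), c) for c in generate_candidates(prompt)]
        best_score, best = max(scored)
        if best_score >= threshold:
            kept.append((prompt, best))
    return kept

if __name__ == "__main__":
    data = build_synthetic_set([f"problem {k}" for k in range(100)])
    print(f"kept {len(data)} of 100 prompts as verified samples for the next training run")
```

Whether that actually escapes the entropy argument depends entirely on how good the verifier is, which is exactly where the disagreement in this thread sits.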
Sorry, but this is not only dumb speculation, it's harmful speculation. It spreads the Kool-Aid that all the AI companies are serving - maybe it actually is a marketing piece - that AI development is accelerating, that (LLM-based) AGI is "almost" here, and that in some secret labs and startups the Real Deal is already cooking.
No one could keep such a breakthrough secret. What is actually being rumored is that GPT-5 is not that much of a leap over current models AND it's expensive to run (especially compared to the amount of money they are already burning with current models). Which is, well, rather believable considering the state of the industry.
It's speculation, but well argued and frankly somewhat convincing, taking into account the overlap with Anthropic.
Remember, they are all raising money hand over fist at high velocity, so being able to throw in a story about improving margins while they march towards AGI is not a bad play.