> As long as every AI provider is operating at a loss
None of them are doing that.
They need funding because the next model has always been far more expensive to train than the previous model's profits can cover. And many do offer a lot of free usage, which is of course operated at a loss. But I don't think any are operating inference at a loss; I think their margins are actually rather large.
The parent comment never said operating inference at a loss (though it wouldn't surprise me); they just said "operating at a loss", which they most definitely are [0].
However, knowing a few people on teams at inference-only providers, I can promise you some of them absolutely are operating inference at a loss.
> Parent comment never said operating inference at a loss
Context. Whether inference is profitable at current prices is what informs how risky it is to build a product that depends on buying inference, which is what the post was about.
So you're assuming there's a world where these companies exist solely by providing inference?
The first obvious limitation of this would be that all models would be frozen in time. These companies are operating at an insane loss, and a major part of that spending is required just to keep existing. It's not realistic to imagine an "inference only" future for these large AI companies.
And again, there are many inference-only startups right now, and I know plenty of them are burning cash providing inference. I've done a lot of work fairly close to the inference layer, and getting model serving running to the standards of regular business use is fairly tricky and not as cheap as you seem to think.
The models may be somewhat frozen in time, but with the right tools available they don't need all information innately encoded into them. If they're able to query for reliable information to pull in, they can talk about things that are well outside their original training data.
For a few months of news this works, but over the span of years even the statistical nature of language drifts a bit. Have you shipped natural language models to production? Even simple classifiers need to be updated periodically because of drift. There is no world where you lead the industry serving LLMs and don't train them as well.
Okay, this tells me you really don't understand model serving or any of the details of infrastructure. The hardware is incredibly ephemeral. Your home GPU might last a few years (and I'm starting to doubt that you've even trained a model at home), but these GPUs have incredibly short lifespans under load for production use.
Even if you're not working on the back end of these models, you should be well aware that one of the biggest concerns about all this investment is how limited the lifetime of GPUs is. It's not just about being "outdated" by superior technology, GPUs are relatively fragile hardware and don't last too long under constant load.
As far as models go, I have a hard time imagining a world in 2030 where the model replies "sorry, my cutoff date was 2026" and people have no problem with this.
Also, you still didn't address my point that startups doing inference only model serving are burning cash. Production inference is not the same as running inference locally where you can wait a few minutes for the result. I'm starting to wonder if you've ever even deployed a model of any size to production.
I didn't address the comment about how some startups are operating at a loss because it seems like an irrelevant nitpick of my wording that "none of them" are operating inference at a loss. I don't think the comment I was replying to was referring to relying on whatever startups you're talking about. I think they were referring to Google, Anthropic, and OpenAI - and so was I.
That seems like a theme with these replies: nitpicking a minor thing, or ignoring the context, or both - or, I guess more generously, I could blame myself for not being more precise with my wording. But sure, you have to buy new GPUs after making a bunch of money burning down the ones you have.
I think your point about knowledge cutoff is interesting, and I don't know what the ongoing cost to keeping a model up to date with world knowledge is. Most of the agents I think about personally don't actually want world knowledge and have to be prompted or fine tuned such that they won't use it. So I think that requirement kind of slipped my mind.
This thread isn't about who wins, it's about the implication that it's too risky to build anything that depends on inference because AI companies are operating at a loss.
So AI companies are profitable when you ignore some of the things they have to spend money on to operate?
Snark aside, inference is still being done at a loss. Anthropic, the most profitable AI vendor, is operating at a roughly -140% margin. xAI is the worst at somewhere around -3,600% margin.
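(For a sense of scale, assuming those are margins on revenue: a -140% margin means roughly $2.40 of cost for every $1.00 of revenue, and -3,600% means about $37 of cost per $1.00 of revenue.)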
If they are not operating inference at a loss and current models remain useful (why would they regress?), they could just stop developing the next model.
At minimum they have to incorporate new data every month or the models will fail to know how many Shrek movies there are and become increasingly wrong in a world that isn't static.
Yes, but at the same time it's unlikely for existing models to disappear. You won't get the next model, but there is no choice but to keep inference running to pay off creditors.
The interesting companies to look at here are the ones that sell inference against open weight models that were trained by other companies - Fireworks, Cloudflare, DeepInfra, Together AI etc.
They need to cover their serving costs but are not spending money on training models. Are they profitable? Probably not yet, because they're investing a lot of cash in competing with each other to R&D more efficient ways of serving etc, but they're a lot closer to profitability than the labs that are spending millions of dollars on training runs.
> But I don't think any are operating inference at a loss, I think their margins are actually rather large.
Citation needed. I haven't seen any of them claim to have even positive gross margins to shareholders/investors, which surely they would do if they did.
> “if you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There’s some cost to inference with the model, but let’s just assume, in this cartoonish cartoon example, that even if you add those two up, you’re kind of in a good state. So, if every model was a company, the model, in this example, [was] profitable,” he added.
“What’s going on is that while you’re reaping the benefits from one company, you’re founding another company that’s much more expensive and requires much more upfront R&D investment. The way this is going to shake out is that it’s going to keep going up until the numbers get very large, and the models can’t get larger, and then there will be a large, very profitable business. Or at some point, the models will stop getting better, and there will perhaps be some overhang — we spent some money, and we didn’t get anything for it — and then the business returns to whatever scale it’s at,” he said.
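To make that framing concrete, here's a tiny sketch of the arithmetic. Only the $100 million training cost and $200 million revenue for the 2023 model come from the quote; the 10x growth in training cost per generation and the one-year revenue lag are illustrative assumptions, not Amodei's figures.

    # Hypothetical numbers in $M, loosely following the "cartoon example" above:
    # each model costs 10x more to train than the last (assumption) and earns
    # back 2x its training cost the following year (assumption).
    training_cost = [100, 1_000, 10_000]               # models trained in year 0, 1, 2
    revenue_next_year = [2 * c for c in training_cost]

    # "Each model as a company": every model eventually earns back its training cost and more.
    per_model_profit = [r - c for c, r in zip(training_cost, revenue_next_year)]

    # The consolidated books: in year N you collect last year's model's revenue
    # while paying for this year's (10x bigger) training run.
    yearly_cash_flow = [-training_cost[0]] + [
        revenue_next_year[i - 1] - training_cost[i] for i in range(1, len(training_cost))
    ]

    print(per_model_profit)   # [100, 1000, 10000]  -> every model "profitable"
    print(yearly_cash_flow)   # [-100, -800, -8000] -> the company's losses keep growing

Under those assumptions the per-model view and the company-wide cash flow diverge exactly the way the quote describes.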
This take from Amodei is hilarious but explains so much.
Comparing the hourly cost of renting an H100 GPU against the resulting cost per token, the OpenAI offering for the latest model appears to be about 5 times cheaper than renting the hardware yourself.
OpenAI's balance sheet also shows an $11 billion loss.
I can't see any profit on anything they create.
The product is good but it relies on investors fueling the AI bubble.
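For what it's worth, a back-of-the-envelope version of that comparison usually looks something like the sketch below. Every number in it (rental rate, GPU count, throughput, API price) is a made-up placeholder rather than a figure from this thread, and real throughput varies enormously with batching, model size, and context length, which is where most of the uncertainty lives.

    # Rough sketch: cost per million output tokens on rented GPUs vs. an API price.
    # All numbers are illustrative placeholders.
    gpu_hourly_rate = 2.50        # $/hour to rent one H100 (placeholder)
    gpus_per_replica = 8          # GPUs needed to host one serving replica (placeholder)
    tokens_per_second = 2_000     # aggregate throughput of that replica (placeholder)

    tokens_per_hour = tokens_per_second * 3600
    hardware_cost_per_mtok = (gpu_hourly_rate * gpus_per_replica) / (tokens_per_hour / 1e6)

    api_price_per_mtok = 10.00    # $ per million output tokens (placeholder)

    print(f"rented hardware: ${hardware_cost_per_mtok:.2f} per 1M tokens")
    print(f"API price:       ${api_price_per_mtok:.2f} per 1M tokens")
    # Which side comes out cheaper depends almost entirely on the throughput
    # assumption, which is why this kind of estimate is easy to get backwards.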
All the labs are going hard on training and new GPUs. If we ever level off, they probably will be immensely profitable. Inference is cheap, training is expensive.
To do this analysis on an hourly retail cost and an open weight model and infer anything about the situation at OpenAI or Anthropic is quite a reach.
For one (basic) thing, they buy and own their hardware, and have to size their resources for peak demand. For another, DeepSeek R1 does not come close to matching Claude's performance in many real tasks.
> When comparing the cost of an H100 GPU per hour and calculating cost of tokens, it seems the OpenAI offering for the latest model is 5 times cheaper than renting the hardware.
How did you come to that conclusion? That would be a very notable result if it did turn out OpenAI were selling tokens for 5x the cost it took to serve them.