GPT-4o mini is $0.15/1M input tokens, $0.60/1M output tokens. In comparison, Claude Haiku is $0.25/1M input tokens, $1.25/1M output tokens.
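
Quick back-of-the-envelope in Python on those prices (the 2,000-token-in / 500-token-out request shape is just an assumed workload for illustration):

    # Listed prices, USD per 1M tokens.
    PRICES = {
        "gpt-4o-mini":  {"input": 0.15, "output": 0.60},
        "claude-haiku": {"input": 0.25, "output": 1.25},
    }

    def cost_per_request(model, input_tokens, output_tokens):
        p = PRICES[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

    # Hypothetical request: 2,000 input tokens, 500 output tokens.
    for model in PRICES:
        c = cost_per_request(model, 2_000, 500)
        print(f"{model}: ${c:.6f}/request, ${c * 1_000_000:,.0f} per million requests")
    # gpt-4o-mini: $0.000600/request, $600 per million requests
    # claude-haiku: $0.001125/request, $1,125 per million requests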

There's no way this price-race-to-the-bottom is sustainable.



At scale this is still a lot of money, and the models are considerably reduced in cost, so the margin probably works out even better. OpenAI are successful, that's a fact, which means they know what they're doing business-wise. (Not bootlicking, just trying to be logical.)

Think about it this way: Imagine if every email you sent or every online forum post you commented on provided incentive for the provider.


I’m not sure what you mean and I don’t see how profitability follows from that?

Venture-backed companies can lose money for years. Sometimes it pays off in the end, but making predictions about profitability seems hard inside a bubble.

Also, some industries like manufacturing solar panels have high market growth but they’re unprofitable for most manufacturers.

So I think it remains to be seen if OpenAI knows what they’re doing. It doesn’t seem like the sort of thing armchair arguments are good at predicting.


You're right, these are definitely armchair opinions. What I meant was that at scale, OpenAI are able to make their model unfathomably cheap because they have the resources to do so.

If they're running at a loss, it's a great way to take shots at the competition, especially with the added advantage of model capability.

Get more customers onboard, play around with margins as required.


Take a loss on every sale and make up for it with volume!


Take a loss on every sale to drive less-well-funded competitors out of the market, and then reap monopoly rents.


> Take a loss on every sale and make up for it with volume!

If you take a loss on every sale, it is impossible to make up for it with volume. The result will be a loss magnified by the volume.


It's a joke. Sadly, the origin is unknown, but it's a joke that's well over 10 years old.


I believe it originates in the original dot-com bubble.


I'm pretty sure I heard it in an econ class, which would have been around y2k. From the way it was presented I had the sense that it was already well known.


Guess you missed the sarcasm.


Sarcasm is generally expected to be suffixed with /s. In this case, significant historical context is required to detect it.


They're building a beautiful garden with rich soil and generous watering. In fact it is so wonderful that you'd love to grow your product there. A product with deep roots and symbiotic neighbors.

Just be careful when they start building the walls. And they will build those walls.


I think it's heavily quantized, so it doesn't cost them (too much) to run. But I think it's still priced at cost...


Judging from the perplexity scores, the model doesn't seem to be quantized; it seems to simply be a scaled-down version of the original GPT-4o or something similar.


Yeah, to put these prices in perspective: when tokens get this cheap, $1M buys you more than a trillion output tokens.

To earn appreciable revenue at this price, an LLM company needs to be regularly generating multiple internets' worth of text.

On the one hand, generating multiple internets of text seems outlandish.

But on the other hand, we're now approaching the point where you can start building LLMs into software without fretting about cost. Now that you can buy ~30 pages for a penny (instead of a dollar), you can really start to throw it into websites, games, search bars, natural-language interfaces, etc. without every user costing you much.
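
The arithmetic behind those two figures, at GPT-4o mini's $0.60/1M output price (~550 tokens per page is my assumption):

    OUTPUT_PRICE_PER_M = 0.60   # USD per 1M output tokens
    TOKENS_PER_PAGE = 550       # rough assumption, ~400 words per page

    # $1M of spend, converted to output tokens.
    tokens_for_1m_dollars = 1_000_000 / OUTPUT_PRICE_PER_M * 1_000_000
    print(f"$1M buys ~{tokens_for_1m_dollars / 1e12:.2f} trillion output tokens")
    # -> $1M buys ~1.67 trillion output tokens

    # One cent of spend, converted to pages.
    tokens_per_penny = 0.01 / OUTPUT_PRICE_PER_M * 1_000_000
    print(f"One cent buys ~{tokens_per_penny:,.0f} tokens (~{tokens_per_penny / TOKENS_PER_PAGE:.0f} pages)")
    # -> One cent buys ~16,667 tokens (~30 pages)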

But small models are not the endgame for these AI companies, as truly general intelligence is a market worth trillions.

What this ~98% cost drop over 2 years hints at is that when AGI does arrive, it might not be horribly expensive.


I don't expect organizations to need to generate 1T output tokens, but 1T input tokens is common. Consider developers at a large company running queries with their entire codebase as context, or lawyers plugging in the entire tax code to ask questions about it. With each of them running dozens of queries per day against multi-million-token contexts, it's going to add up quickly.
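
A rough illustration of how the input side adds up at GPT-4o mini's $0.15/1M input price (headcount, query rate, and context size below are made-up assumptions, not figures from the thread):

    INPUT_PRICE_PER_M = 0.15    # USD per 1M input tokens

    engineers = 500             # hypothetical large-company team
    queries_per_day = 30        # "dozens of queries per day"
    context_tokens = 2_000_000  # whole-codebase / whole-tax-code context
    workdays_per_month = 22

    monthly_tokens = engineers * queries_per_day * context_tokens * workdays_per_month
    monthly_cost = monthly_tokens / 1_000_000 * INPUT_PRICE_PER_M
    print(f"{monthly_tokens / 1e12:.1f}T input tokens/month -> ${monthly_cost:,.0f}/month")
    # -> 0.7T input tokens/month -> $99,000/month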


Wouldn't a lawyer wanting to run queries against the entire tax code have a model that was fine-tuned on all of that data though? I mean, vs. doing RAG by sending the entire tax code on each request.


Unclear, but fine-tuning has many problems not faced by RAG (a toy sketch of the RAG flow follows the list):

- More prone to hallucinations

- Worse at citing sources for people to double check outputs

- Can't be updated without retraining

- Can't impose knowledge access controls for different users
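
Here is that sketch (the keyword retriever is a stand-in for a real embedding index, and no actual model API is called):

    # Minimal RAG loop: retrieve the relevant sections, stuff only those into the prompt.
    def retrieve(query, sections, k=3):
        """Toy keyword-overlap retriever; real systems use embeddings plus a vector index."""
        terms = set(query.lower().split())
        ranked = sorted(sections.items(),
                        key=lambda kv: -len(terms & set(kv[1].lower().split())))
        return ranked[:k]

    def build_prompt(query, sections):
        hits = retrieve(query, sections)
        context = "\n\n".join(f"[{name}]\n{text}" for name, text in hits)
        # Keeping section names in the context is what lets the model cite sources.
        return f"Answer using only these excerpts, citing section names:\n\n{context}\n\nQ: {query}"

    tax_code = {
        "§61":  "Gross income means all income from whatever source derived...",
        "§162": "There shall be allowed as a deduction all ordinary and necessary business expenses...",
    }
    prompt = build_prompt("Are business expenses deductible?", tax_code)
    # The prompt then goes to whichever model you call. Swapping the sections dict
    # updates the knowledge with no retraining, and it can be filtered per user for access control.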


I think the place to generate larger total revenue/margins would be the highest-end models. Budget models almost "come with" the effort put toward making those high-end models, so it's alright that they're a race to the bottom (so long as someone actually realizes a return on the higher-end models, which is a problem in itself at the moment).


> There's no way this price-race-to-the-bottom is sustainable.

Why not?


Well, each new generation of model costs something like 10x the previous one to train, and its value (and thus its ability to generate a return) diminishes extremely rapidly. The only source of improved economics is the rapidly evaporating Moore's Law (and any opex savings are swamped by the crazy-high capex if you're using chips from Nvidia).


> rapidly evaporating Moore's Law

On the algorithm side (no, I don't mean Mamba etc.; you can still use decoder-only transformers with some special attention layers) and the engineering side, there's still at least a 10x improvement possible compared to what TensorRT-LLM is able to achieve now.

My concern is, this is only possible because of scale, so local LLMs are going to be dead in the water.


What if they can make money? Then the problem is on Claude/Gemini...


These models are still really expensive to run



