GPT-4o mini is $0.15/1M input tokens, $0.60/1M output tokens. In comparison, Claude Haiku is $0.25/1M input tokens, $1.25/1M output tokens.
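
Quick back-of-the-envelope in Python on those prices (the 2,000-token-in / 500-token-out request shape is just an assumed workload for illustration):

    # Listed prices, USD per 1M tokens.
    PRICES = {
        "gpt-4o-mini":  {"input": 0.15, "output": 0.60},
        "claude-haiku": {"input": 0.25, "output": 1.25},
    }

    def cost_per_request(model, input_tokens, output_tokens):
        p = PRICES[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

    # Hypothetical request: 2,000 input tokens, 500 output tokens.
    for model in PRICES:
        c = cost_per_request(model, 2_000, 500)
        print(f"{model}: ${c:.6f}/request, ${c * 1_000_000:,.0f} per million requests")
    # gpt-4o-mini: $0.000600/request, $600 per million requests
    # claude-haiku: $0.001125/request, $1,125 per million requests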

There's no way this price-race-to-the-bottom is sustainable.



At scale this is still a lot of money, and the models are considerably reduced in cost, so the margin probably works out even better. OpenAI are successful, that's a fact, which means they know what they're doing business-wise. (Not bootlicking, just trying to be logical.)

Think about it this way: Imagine if every email you sent or every online forum post you commented on provided incentive for the provider.


I’m not sure what you mean and I don’t see how profitability follows from that?

Venture-backed companies can lose money for years. Sometimes it pays off in the end, but making predictions about profitability seems hard inside a bubble.

Also, some industries like manufacturing solar panels have high market growth but they’re unprofitable for most manufacturers.

So I think it remains to be seen if OpenAI knows what they’re doing. It doesn’t seem like the sort of thing armchair arguments are good at predicting.


You're right, these are definitely armchair opinions. What I meant was that at scale, OpenAI are able to make their model unfathomably cheap because they have the resources to do so.

If they're running at a loss, it's a great way to take shots at the competition, especially with the added advantage of model capability.

Get more customers onboard, play around with margins as required.


Take a loss on every sale and make up for it with volume!


Take a loss on every sale to drive less-well-funded competitors out of the market, and then reap monopoly rents.


> Take a loss on every sale and make up for it with volume!

If you take a loss on every sale, it is impossible to make up for it with volume. The result will be a loss magnified by the volume.


It's a joke. Sadly, the origin is unknown, but it's a joke that's well over 10 years old.


I believe it originates in the original dot-com bubble.


I'm pretty sure I heard it in an econ class, which would have been around y2k. From the way it was presented I had the sense that it was already well known.


Guess you missed the sarcasm.


Sarcasm is generally expected to be suffixed with /s. In this case, significant historical context is required to detect it.


They're building a beautiful garden with rich soil and generous watering. In fact it is so wonderful that you'd love to grow your product there. A product with deep roots and symbiotic neighbors.

Just be careful when they start building the walls. And they will build those walls.


I think it's heavily quantized, so it doesn't cost them (too much) to run. But I think it's still priced at cost...


Judging from the perplexity scores, the model doesn't seem to be quantized; it seems to simply be a scaled-down version of the original GPT-4o or something similar.


Yeah, to put these prices in perspective: when tokens get this cheap, $1M buys you more than a trillion output tokens.

To earn appreciable revenue at this price, an LLM company needs to be regularly generating multiple internets' worth of text.

On the one hand, generating multiple internets of text seems outlandish.

But on the other hand, we're now approaching the point where you can start building LLMs into software without fretting about cost. Now that you can buy ~30 pages for a penny (instead of a dollar), you can really start to throw it into websites, games, search bars, natural-language interfaces, etc. without every user costing you much.
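
The arithmetic behind those two figures, at GPT-4o mini's $0.60/1M output price (~550 tokens per page is my assumption):

    OUTPUT_PRICE_PER_M = 0.60   # USD per 1M output tokens
    TOKENS_PER_PAGE = 550       # rough assumption, ~400 words per page

    # $1M of spend, converted to output tokens.
    tokens_for_1m_dollars = 1_000_000 / OUTPUT_PRICE_PER_M * 1_000_000
    print(f"$1M buys ~{tokens_for_1m_dollars / 1e12:.2f} trillion output tokens")
    # -> $1M buys ~1.67 trillion output tokens

    # One cent of spend, converted to pages.
    tokens_per_penny = 0.01 / OUTPUT_PRICE_PER_M * 1_000_000
    print(f"One cent buys ~{tokens_per_penny:,.0f} tokens (~{tokens_per_penny / TOKENS_PER_PAGE:.0f} pages)")
    # -> One cent buys ~16,667 tokens (~30 pages)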

But small models are not the endgame for these AI companies, as truly general intelligence is a market worth trillions.

What this ~98% cost drop over 2 years hints at is that when AGI does arrive, it might not be horribly expensive.


I don't expect organizations to need to generate 1T output tokens, but 1T input tokens is common. Consider developers at a large company running queries with their entire codebase as context, or lawyers plugging in the entire tax code to ask questions about it. With each of them running dozens of queries per day against multi-million-token contexts, it's going to add up quickly.
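
A rough illustration of how the input side adds up at GPT-4o mini's $0.15/1M input price (headcount, query rate, and context size below are made-up assumptions, not figures from the thread):

    INPUT_PRICE_PER_M = 0.15    # USD per 1M input tokens

    engineers = 500             # hypothetical large-company team
    queries_per_day = 30        # "dozens of queries per day"
    context_tokens = 2_000_000  # whole-codebase / whole-tax-code context
    workdays_per_month = 22

    monthly_tokens = engineers * queries_per_day * context_tokens * workdays_per_month
    monthly_cost = monthly_tokens / 1_000_000 * INPUT_PRICE_PER_M
    print(f"{monthly_tokens / 1e12:.1f}T input tokens/month -> ${monthly_cost:,.0f}/month")
    # -> 0.7T input tokens/month -> $99,000/month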


Wouldn't a lawyer wanting to run queries against the entire tax code have a model that was fine-tuned on all of that data though? I mean, vs. doing RAG by sending the entire tax code on each request.


Unclear, but fine-tuning has many problems not faced by RAG (a toy sketch of the RAG flow follows the list):

- More prone to hallucinations

- Worse at citing sources for people to double check outputs

- Can't be updated without retraining

- Can't impose knowledge access controls for different users
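
Here is that sketch (the keyword retriever is a stand-in for a real embedding index, and no actual model API is called):

    # Minimal RAG loop: retrieve the relevant sections, stuff only those into the prompt.
    def retrieve(query, sections, k=3):
        """Toy keyword-overlap retriever; real systems use embeddings plus a vector index."""
        terms = set(query.lower().split())
        ranked = sorted(sections.items(),
                        key=lambda kv: -len(terms & set(kv[1].lower().split())))
        return ranked[:k]

    def build_prompt(query, sections):
        hits = retrieve(query, sections)
        context = "\n\n".join(f"[{name}]\n{text}" for name, text in hits)
        # Keeping section names in the context is what lets the model cite sources.
        return f"Answer using only these excerpts, citing section names:\n\n{context}\n\nQ: {query}"

    tax_code = {
        "§61":  "Gross income means all income from whatever source derived...",
        "§162": "There shall be allowed as a deduction all ordinary and necessary business expenses...",
    }
    prompt = build_prompt("Are business expenses deductible?", tax_code)
    # The prompt then goes to whichever model you call. Swapping the sections dict
    # updates the knowledge with no retraining, and it can be filtered per user for access control.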


I think the place to generate larger total revenue/margins would be the highest-end models. Budget models almost "come with" the effort put toward making those high-end models, so it's alright that they're a race to the bottom (so long as someone actually realizes a return on the higher-end models, which is a problem in itself at the moment).


> There's no way this price-race-to-the-bottom is sustainable.

Why not?


Well, each new generation of model costs something like 10x the previous one to train, and its value (and thus its ability to generate a return) diminishes extremely rapidly. The only source of improved economics is the rapidly evaporating Moore's Law (and any opex savings are swamped by the crazy-high capex if you're using chips from Nvidia).


> rapidly evaporating Moore's Law

On the algorithm side (no, I don't mean Mamba etc.; you can still use decoder-only transformers with some special attention layers) and the engineering side, there's still at least a 10x improvement possible compared to what TensorRT-LLM is able to achieve now.

My concern is, this is only possible because of scale, so local LLMs are going to be dead in the water.


What if they can make money? Then the problem is on Claude/Gemini...


These models are still really expensive to run



