Hacker News — boriskourt's comments

I am so glad that this has been made! And happy that it features so many familiar faces from the last two decades of Clojure. There is a lot more to a programming language than just the code committed.

> The cost to serve tokens is absolutely profitable today and that’s been true for at least a year.

> For the data center build outs, demand for tokens is still exceeding supply.

Can you provide any numbers for this?


I can get Kimi K2.5 inference on openrouter for about $0.5/MTok input + $2.5/MTok output, from six providers that have no moat besides efficiently selling GPU time. We can assume they are doing so at a profit (they have no incentive to do this at a loss), giving us those numbers as the cost to serve a 1T-a32b model at scale.

Now we don't know the true size of any of the proprietary models, but my educated guess is that Sonnet is in about the same parameter range, just with better training and much better fine tuning and RLHF. Yet API pricing for Sonnet is $3/MTok input + $15/MTok output, exactly six times as expensive. Even Haiku is twice as expensive as Kimi K2.5.

I find it difficult to believe in a world where those API prices aren't profitable. For subscription pricing it's harder to tell. We hear about those who get insane value out of their subscription, but there has to be a large mass who never reach their limits. With company-wide rollouts there might even be a lot of subscription users who consume virtually no tokens at all.
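As a quick sanity check of the pricing comparison above (all figures are the ones quoted in this thread, not independently verified), the "exactly six times" claim pencils out:

```python
# Quoted OpenRouter pricing for Kimi K2.5, USD per million tokens
kimi_in, kimi_out = 0.50, 2.50
# Quoted Anthropic API pricing for Sonnet, USD per million tokens
sonnet_in, sonnet_out = 3.00, 15.00

print(sonnet_in / kimi_in)    # input price ratio  -> 6.0
print(sonnet_out / kimi_out)  # output price ratio -> 6.0
```

Of course the ratio only tells us about prices, not costs; the comparison assumes Sonnet is in the same parameter range as Kimi K2.5, which is an educated guess.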


> We can assume they are doing so at a profit

This is false. We may assume it's the most efficient way of generating revenue given their GPUs, but their overall profitability is just a guess. They would still have an incentive to run hardware at maximum, even when it's uncertain whether they will eventually recoup costs.

> a world where those API prices aren't profitable

A lab with employees and models in training has other costs than the operating expenses of a GPU farm.


Why would a company sell inference on OpenRouter if they're not profitable? Except for Groq/Cerebras and a few other hardware companies looking to showcase their new chips.

If they're losing money and have no VC backing, they'd just turn off the lights.


The actual inference is operated at a 95%+ margin.

This is like saying that innovative medical drugs could be sold at a profit if only there were no patent protection and the innovating companies would still invest in R&D. Yes, on a token level pure inference might be profitable, but the frontier AI labs will surely have to recoup their R&D investments at some point.

Companies doing foundational models need to cover the cost of training, which is much more expensive than training something like Kimi.

Yes. I would not consider Kimi a particularly good model relative to its size, and making a SotA model is a lot more expensive. But training costs are explicitly excluded when talking about the cost to serve tokens.

>Companies doing foundational models need to cover the cost of training [...]

But that's moving the goalposts? The original claim was on inference itself, not the whole company.

> The cost to serve tokens is absolutely profitable today and that’s been true for at least a year.


But that's the same as thinking "This bar is selling a cocktail for $15. I could make it at home for 30 cents. They're making $14.70 of profit per cocktail, the owner must be a millionaire now!"

Everything is profitable if you ignore the costs.


> they have no incentive to do this at a loss

Are you sure? Surely there is a lot of interesting data in those LLM interactions.


Many of them are promising not to store any of this. Of course we have to trust them; for all we know they are all funded by various spy agencies.

The problem I have with this analysis is it's missing the multi-dimensional aspect of "is this profitable".

It's fair to say that if all these operators are competing for tokens, the OpenRouter token operators (not sure of the exact phrase, but the people running the models) are accounting for some level of margin.

However, how many of these are running their own data centers and GPUs?

If they are running their own infrastructure, then it's not a simple question of whether each specific token set is profitable, since it needs to account for the cost of running the data center. It could be that they believe it is profitable in the long term by utilizing the long tail of asset depreciation, but that isn't guaranteed.

IF they aren't running their own infrastructure, then it's much easier to claim that it's profitable and has a margin (outside of running their servers to manage the rented infrastructure).

HOWEVER, a lot of data centers have some pretty crazy low prices for GPUs and may be vying for user base and revenue over profitability. In these cases, if data center growth starts slowing due to a slower buildout, then it's very likely GPU prices go up and inference stops being profitable for the OpenRouter operators.

So long term it's not clear how profitable even these open models are.

OpenAI and Anthropic definitely fall into the latter category too. Their infrastructure requirements are much higher than the open models, and they are being given huge discounts so Microsoft/Amazon/Google can all claim revenue (since they have profitability coming from other parts). It's not clear if OpenAI and Anthropic models would be profitable at inference if they were paying rates that cloud hosts would make a profit from.

There are just way too many dimensions to this scenario to flatly state that OpenRouter proves inference is profitable at scale.


Check the token prices for open weight LLMs at various independent inference providers.

That gives you a very good estimate of "how much can you serve the tokens of a model of the size N for while making a profit".

Now, keep in mind: Kimi K2.5 is 1T MoE. Today's frontier LLMs are in the 1T to 5T range, also MoE. Make an estimate. Compare that estimate with the actual frontier lab prices.


I don't think it's as easy as looking at open weight API prices. We don't know whether the operators are making a profit on all the hardware they bought. Maybe the prices we pay just cover electricity. And it's not even certain that running costs are covered by API prices: the operators may be siphoning content and subsidizing operations by selling it.

In the current volatile environment, the API prices are more of a baseline where we can assume it can't be much cheaper to operate these models.


That doesn't make sense in this environment because everyone is compute constrained with huge backlogs they can't fulfill. If these inference providers aren't making any money, they'd simply sell their GPUs to those who are starved for compute.


Most/all private labs have stated that inference is profitable. This was happening before the large push to scrap plans and largely charge folks the underlying API rates. Second, take a look at the pricing of open models. It's certainly not a direct 1:1 comparison, but we can use it as a baseline. Now of course folks might not be telling the truth, but this is one of those situations where I see too many markers on the true side.

For supply look at outages and growth rates at companies like openrouter. The demand is growing every week.


Anthropic has said inference is profitable. That’s a biased source, but the math pencils.

This is why switching to local open weight models saves a lot of money. (Even though it’s not apples to apples.)


Anthropic also recently tweaked their usage limits to discourage use during peak hours. Why would they do that if inference was profitable?

Don’t confuse inference (API usage) with the consumer plan products. When people say inference is profitable they are referring to the cost to serve a token via the API. The consumer products are absolutely a question mark on profitability, and as we see with most of the business and enterprise plans, they are going away in favor of pure on-demand use (API cost) full time.

Profitability doesn't imply infinite ability to scale. Of course they will want to prioritize their most profitable customers when they hit capacity issues.

They do it because their demand is higher than the compute they have available. Their GPUs must be melting during peak hours, so they're encouraging people to move their workload to off-peak hours if possible.

This is the opposite of an AI bubble burst.


Those are subscription plans. They tweaked the limits/periods included in the subscription. Having higher limits for subscription plans didn't give them any more revenue.

Their infra team is very understaffed and they are reacting to the public backlash of "no 9s?"

Can you give a few penciled numbers?

You can rent an H100 GPU for $4/hour. [1]

300k tokens for that hour.

OpenAI charges $6 for that many tokens.

Those are pessimistic assumptions.

[1] https://lambda.ai/instances
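Penciling out those assumptions ($4/hour H100 rental, ~300k tokens generated per saturated hour, ~$6 of revenue for those tokens at frontier API prices; all figures from the thread, not measured):

```python
rental_per_hour = 4.00      # H100 rental, USD/hour (Lambda list price, as cited)
tokens_per_hour = 300_000   # assumed throughput at full saturation
revenue_per_hour = 6.00     # assumed billing for those tokens at frontier API prices

cost_per_mtok = rental_per_hour / tokens_per_hour * 1_000_000
margin = revenue_per_hour - rental_per_hour
breakeven_utilization = rental_per_hour / revenue_per_hour

print(f"serving cost: ${cost_per_mtok:.2f}/MTok")              # $13.33/MTok
print(f"margin at 100% utilization: ${margin:.2f}/hour")       # $2.00/hour
print(f"break-even utilization: {breakeven_utilization:.0%}")  # 67%
```

The ~67% break-even utilization is the same figure debated further down the thread: profitable on paper, but only if you can keep the GPU busy roughly two-thirds of the time.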


Can you keep that GPU 100% saturated at least 16 hours per day every day of the week?

If not, you aren't breaking even.


Note this is also assuming you

(1) Rent your GPUs.

(2) Pay list price, no volume breaks.

(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized.

Inference is extremely profitable at scale.


Assuming an 80GB H100 and running inference on an MoE model close to the size of the 80GB VRAM, you're going to see around 10k tokens/second fully batched and saturated. An example here might be Mixtral 8x7B.

You're generating about 36 million tokens/hour. The cost of Mixtral 8x7B on OpenRouter is $0.54/M input tokens and $0.54/M output tokens.

You're looking at potentially $38.88/hour return on that H100 GPU. This is probably the best case scenario.

In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.
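Under the comment's assumptions (10k tokens/second fully batched, $0.54/MTok on OpenRouter), the revenue arithmetic looks like this; note the $38.88/hour figure appears to count an equal volume of billed input (prompt) tokens on top of the generated output, which is an assumption about the traffic mix:

```python
tokens_per_sec = 10_000   # assumed fully-batched H100 throughput
price_per_mtok = 0.54     # Mixtral 8x7B price quoted on OpenRouter

output_tokens_hr = tokens_per_sec * 3600                 # 36,000,000 tokens/hour
output_revenue = output_tokens_hr / 1e6 * price_per_mtok
print(round(output_revenue, 2))       # 19.44 USD/hour from output tokens alone
# Billing an equal volume of input tokens doubles it:
print(round(output_revenue * 2, 2))   # 38.88 USD/hour, the figure in the comment
```

Either way, against a ~$4/hour rental the implied margin is large, which is the "best case scenario" caveat in the comment.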


$3.99/hour at 8x instances, with a minimum 2-week commitment. Good luck averaging 70% usage during that time. Useful when you're running a training round and can properly gauge demand, not so great when you're offering an API.

Is it not a good penciled number? It helps set the directional tone that the inference cost is being covered.

It says the numbers are theoretically possible. Requiring a 66% usage to break even when 100% usage will piss off customers by invoking a queue means it’s a balancing act.

“Technically correct. The best kind of correct.” So inference may technically be _capable_ of being profitable, but I have questions about it being profitable in _practice_.


This video is an absolute tour de force of communicating a complex concept.

All of 3Blue1Brown is - highly, highly recommend

I've seen most! Highlighting this one out of them all. Exemplary! : D

Seems like you could apply the clever transforms to generate a displacement map (that then allows you to move it across any source image and quickly get the Droste effect).

(I still have not made it all the way to the end of the video though, perhaps that is where they end up.)


"Complex concept"

I see what you did there.


I’ve had an awesome experience the last five years running instances for me and friends. So many nice interactions. I recommend running an instance for people you know well. It can still connect to everyone else, but you have your own little corner to feel more connected in.


Everyone else, as long as they don't defederate your server as petty tyrants.

Which has always been the drawback of the Fediverse.

Nostr has delivered what I had hoped to get from the Fediverse: actually decentralized, censorship-proof social media (and then some), wherein you actually maintain full ownership of your own identity (as it's a keypair, not an account). Where if you get banned from one relay or another, you just move to a new one, and everything comes with you. Where if nobody wants to platform you, you can literally run your own relay on your PHONE and stay connected to the network.

And yea, there is at least one bridge between Nostr and Mastodon (Mostr), so you don't even have to give up on talking to your Mastodon buddies.

That it also does so much more than social media is icing on the cake. Really leverages its existence as a protocol rather than a platform to use the Internet as it was always meant to be.


Then I don’t really care about them if they don’t want to hang out with me. It works as intended.


Nice starting storage bump

  MacBook Pro with M5 Pro now comes standard with 1TB of storage, while MacBook Pro with M5 Max now comes standard with 2TB. And the 14-inch MacBook Pro with M5 now comes standard with 1TB of storage.


It's not exactly a bump if they raise prices at the same time, though with the RAM situation I'm not mad.


Well 1TB MacBook Pro used to cost $1799, now 1TB is the base model and costs $1699, so it's actually a $100 price drop for 1TB storage.


Not if you compare Macbook Pro with Pro CPUs.


Visoid [0]| Software Engineer | Full-time | €55-80K | ONSITE Oslo | Typescript, Vue, Nuxt

We’re building a web-based application for architects that speeds up their design workflow and makes high-quality visualization accessible to a wider audience.

We have experienced rapid growth in recent months and we are hiring to further scale the company. We have customers in over 120 different countries, who collectively have generated over 6 million visualizations on our platform since 2023.

* Flexible work hours

* Flexible home-office arrangement

* 5 weeks vacation

Apply: https://visoid.homerun.co/software-engineer-3/en

[0]: https://www.visoid.com/


How is this a problem in your opinion?


This comes up here and there to discredit the developer, but having followed all the drama for many years now I just want to add that Dansup has apologized multiple times and has been far more open about his process. His communication has also changed for the better, over the last two years especially. It's not easy being human, and I think it's a good sign that he takes this seriously.


TikTok has ~1.59 billion global active users, if you go to Cambodia you will have a hard time not seeing it in the wild. It is painful that this is still seen as some teen trend.

People enjoy short form video, people should be able to enjoy things they like with dignity, which is in extremely short supply on algorithm and advert driven social media.

Loops is nice because it isn't algorithm oriented. You can follow folks and just see their things if you like, or see what's on the instance.

Loops doesn't need to 'slay TikTok'. Loops just needs to grow organically and support the niches that feel like using it, and it can take the time to do that at any pace. Its success is not determined by user numbers or series rounds.

I don't like to produce short form video, but it's been nice to now follow a few people on Loops from Mastodon. It's nice that the Fediverse allows multiple forms of expression.


> People enjoy short form video, people should be able to enjoy things they like with dignity, which is in extremely short supply on algorithm and advert driven social media.

People “enjoy” heroin, crack cocaine, fentanyl too, should they “enjoy” them with dignity too?


Yes, of course. That’s what Portugal’s drug policy did. By allowing a path for doing hard drugs safely and with dignity, you also allow a path for conversation, getting help, and leaving them behind.


So the argument here is like a harm reduction argument, as in the decriminalization/legalization of hard drugs?

That seems a little uncharitable, though it does get me wondering... Imagine if benevolent hackers took over The Algorithm. What might they achieve?

They could promote high quality educational content, as is done in certain other nations.

They could utilize the companies' infinite knowledge of Skinner box mechanics to discourage and even break screen addiction, rather than cultivate it.

The possibilities are endless.

Any volunteers? ;)


yes, a lot of the issues around this come because of the lack of dignity.


Both in the case of drugs and short form vertical video.

There's a lot of stuff which may loosely be termed "vices", e.g. alcohol and gambling, which have the property:

- many people never touch

- many people indulge without significant harm, getting enjoyment from the process

- some people over indulge messily

- a few people get their lives completely ruined, or ruin the lives of those around them

Then there's an uncomfortable, unreconcilable tension between the desire to punish/prevent the last group by banning the thing, versus the second group entirely reasonably saying that it's not a problem for them.


To be fair, sobriety has the same property; so does feature-length landscape-oriented cinema; so does involvement in religious and political affairs.

Many things that people get up to ostensibly "of their own accord" have these four groups of outcomes, in different proportions. Makes you figure.

I'm of the opinion that the main problem has always been the increasing powerlessness of the individual in the face of mass social phenomena that camouflage as "your life now" but are instead someone's viral PR campaign. In Germany this stuff passed in 10ish years, in Russia it passed in 80ish; California still countin'


> heroin ... with dignity

That discussion is already over since, what... 20 years? Heroin addicts get their fix from the state, with tax payer money, in many many countries these days. I can see the line waiting in front of my pharmacy every day in the morning...


Yes.


I love love love how you deny people both their enjoyment and their dignity. You are a truly moral person; I hope you have numerous progeny.


> I love love love how you deny people both their enjoyment and their dignity. You are a truly moral person; I hope you have numerous progeny.

Only enjoyment from drugs, they're free to do it with dignity without enjoyment.


The point is, you don't have a say in this.


Sure I do. I don’t have a say in how they spend their time, but if I catch a whiff that someone is doing hard drugs for fun then I’m going to treat them differently than someone addicted and going through a rehab.


Would you prefer that they do it without dignity?


I would prefer they don't do it at all, unless there's a medical urgency.


Ah yes, a reply in true hacker fashion, as if people were only that binary. Just don't use, then addiction wouldn't be a thing! Problem solved. We can see it all around; the now 55-year war on drugs has been a great success!

I'm not sure if you've ever had to deal with someone addicted close to you, but it is heartbreaking. They are already ashamed of themselves and suffer. The last thing you want to do is take away their dignity, because that shuts them out and pushes the path to recovery even further away. They are still humans, you know, just with a problem. They need help, not a trashing. They are already doing that to themselves.


That’s what I said, though? If they’re addicted and working on their addiction – there’s a medical reason why they do it. If they’re shooting heroin for fun, then they’ll get nothing but scorn from me.


Yes.


It is seen as some teen trend because... frankly... most users are rather infantile.


Down for me. Happy holidays!

