
GPT 4.5 pricing is insane: Input: $75.00 / 1M tokens; Cached input: $37.50 / 1M tokens; Output: $150.00 / 1M tokens

GPT 4o pricing for comparison: Input: $2.50 / 1M tokens; Cached input: $1.25 / 1M tokens; Output: $10.00 / 1M tokens
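For a rough sense of scale, here is a back-of-the-envelope comparison in Python using the prices above; the example request size (10k input tokens, 1k output tokens) is just an assumption for illustration:

  # Cost per request at the prices quoted above (USD per 1M tokens).
  PRICES = {
      "gpt-4.5": (75.00, 150.00),  # (input, output)
      "gpt-4o": (2.50, 10.00),
  }

  def request_cost(model, input_tokens, output_tokens):
      in_price, out_price = PRICES[model]
      return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

  for model in PRICES:
      print(model, request_cost(model, 10_000, 1_000))
  # gpt-4.5: $0.90 per request; gpt-4o: $0.035 per request, roughly 26x cheaper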

It sounds like it's so expensive, and the difference in usefulness so lacking(?), that they're not even gonna keep serving it in the API for long:

> GPT‑4.5 is a very large and compute-intensive model, making it more expensive than and not a replacement for GPT‑4o. Because of this, we’re evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models. We look forward to learning more about its strengths, capabilities, and potential applications in real-world settings. If GPT‑4.5 delivers unique value for your use case, your feedback will play an important role in guiding our decision.

I'm still gonna give it a go, though.



> We look forward to learning more about its strengths, capabilities, and potential applications in real-world settings. If GPT‑4.5 delivers unique value for your use case, your feedback will play an important role in guiding our decision.

"We don't really know what this is good for, but spent a lot of money and time making it and are under intense pressure to announce new things right now. If you can figure something out, we need you to help us."

Not a confident place for an org trying to sustain a $XXXB valuation.


> "Early testing shows that interacting with GPT‑4.5 feels more natural. Its broader knowledge base, improved ability to follow user intent, and greater “EQ” make it useful for tasks like improving writing, programming, and solving practical problems. We also expect it to hallucinate less."

"Early testing doesn't show that it hallucinates less, but we expect that putting that sentence nearby will lead you to draw a connection there yourself".


In the second handpicked example they give, GPT-4.5 says that "The Trojan Women Setting Fire to Their Fleet" by the French painter Claude Lorrain is renowned for its luminous depiction of fire. That is a hallucination.

There is no fire at all in the painting, only some smoke.

https://en.wikipedia.org/wiki/The_Trojan_Women_Set_Fire_to_t...


AI crash is gonna lead to decade long winter


There have always been cycles of hype and correction.

I don't see AI going any differently. Some companies will figure out where and how models should be utilized, and they'll see some benefit. (IMO, the answer will be smaller local models tailored to specific domains.)

Others will go bust. Same as it always was.


It will be held up, against all future pundits of this very economic system, as a prime example that a whole market can self-hypnotize and ruin the society it's based upon out of existence.


what you're saying is they love to hallucinate... and ai will help them get there

God help us all


On the bright side, at least we'll be able to warm our hands by the waste heat of the GPUs.


> AI crash is gonna lead to decade long winter

Possibly.

I am reminded of the dotcom boom and bust back in the 1990s

By 2009 things had recovered (for some definition) and we could tell what did and did not work

This time, though, for those of us not in the USA the rebound will be led by Chinese technology.

In the USA no-one can say.


This is just amazing


That's some top-tier sales work right there.

I suck at and hate writing the mildly deceptive corporate puffery that seems to be in vogue. I wonder if GPT-4.5 can write that for me or if it's still not as good at it as the expert they paid to put that little gem together.


Good sales lines are like prompt injection for the human mind.


Gold


Yes, an AI that can convincingly and successfully sell itself at those prices would be worthy of some attention.


It's nice to know the new Turing test is generating effective VC pitch decks.


Joke's on us, the VCs are using LLMs to evaluate the pitch decks.


We all thought the singularity was going to be exceeding human capacity for change.

It'd be funny if it's actually fully automated, closed-loop automation of capital allocation markets.

"Why are we doing this? How much money are we getting?" -> "I dunno. It's what the models said."


This is basically Nick Land's core thesis that capitalism and AI are identical.

> "I dunno. It's what the models said."

The obvious human idiocy in such things often obscures the actual process:

"What it [capitalism] is in itself is only tactically connected to what it does for us — that is (in part), what it trades us for its self-escalation. Our phenomenology is its camouflage. We contemptuously mock the trash that it offers the masses, and then think we have understood something about capitalism, rather than about what capitalism has learnt to think of the apes it arose among." [0]

[0] https://retrochronic.com/#romantic-delusion


That actually wouldn't surprise me in the slightest, unfortunately.


ChatGPT, generate a prompt injection attack, embedded in a background image.


The research models offered by several vendors can do a pitch deck, but I don't know how effective they are. (Do market research, provide some initial hypothesis, ask the model to back up that hypothesis based on the research, then request a pitch deck convincing X, X being the VC persona you are targeting.)
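A minimal sketch of that chain using the OpenAI Python SDK; the model name, the prompts, and the idea of simply feeding each step's output into the next are illustrative assumptions on my part, not any vendor's documented recipe:

  # Sketch of the research -> hypothesis -> pitch-outline chain described above.
  # Model name and prompts are placeholders, not a documented workflow.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def ask(prompt, context=""):
      resp = client.chat.completions.create(
          model="gpt-4o",  # assumption: swap in whichever model you are evaluating
          messages=[{"role": "user", "content": f"{context}\n\n{prompt}".strip()}],
      )
      return resp.choices[0].message.content

  research = ask("Summarize the current market for <your product area>.")
  hypothesis = ask("Propose one investable hypothesis backed by this research.", research)
  deck = ask("Outline a 10-slide pitch deck convincing X (a specific VC persona), "
             "based on this hypothesis.", hypothesis)
  print(deck)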


I am reasonably to very skeptical about the valuation of LLM firms but you don’t even seem willing to engage with the question about the value of these tools.


Their announcement email used it for puffery.


The link has data.

The link shows a significant reduction.

Grep for "hallucination", or see https://imgur.com/a/mkDxe78.


I really doubt LLM benchmarks are reflective of real world user experience ever since they claimed GPT-4o hallucinated less than the original GPT-4.


I don't have an accurate benchmark, but in my personal experience, gpt4o hallucinates substantially less than gpt4. We solved a ton of hallucination issues just by upgrading to it...


How much did you use the original GPT-4-0314?

(And even that was a downgrade compared to the more uncensored pre-release versions, which were comparable to GPT-4.5, at least judging by the unicorn test)


I don't recall the original version we used unfortunately :(

in our case, the bump was actually from gpt-4-vision to gpt-4o (the use case required image interpretation)

It got measurably better at both image cases and text-only cases


I'm beginning to believe LLM benchmarks are like European car mileage specs. They say it's 4 liters / 100km, but everyone knows it's at least 30% off (same with WLTP for EVs).


Those numbers are not off. They are tested on tracks.

You need to remove your shoe and drive with like two toes to get the speed just right, though.

Test drivers I have done this with take off their shoes or use ballerina shoes.


Cruise control?


No, you want to control the shape of the speed curve so you don't overshoot or accelerate too much when following the speed profile.

And keeping a steady-state speed is not that hard.


Hrm it is a bit funny that modern cars are drive-by-wire (at least for throttle) and yet they still require a skilled driver to follow a speed profile during testing, when theoretically the same thing could be done more precisely by a device plugged in through the OBD2 port.
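The control problem itself is simple; purely as a hypothetical sketch (not any real test rig, and the plant model, gains, and profile are made up for illustration), a proportional controller tracking a target speed profile through a drive-by-wire throttle could look like this:

  # Hypothetical sketch: proportional throttle control following a speed profile.
  # The vehicle model, gains, and profile are all invented for illustration.

  def interpolate(profile, t):
      # Linear interpolation between (time_s, speed_mps) breakpoints.
      for (t0, v0), (t1, v1) in zip(profile, profile[1:]):
          if t0 <= t <= t1:
              return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
      return profile[-1][1]

  def follow_profile(profile, dt=0.1, kp=0.5):
      speed, t, log = 0.0, 0.0, []
      while t <= profile[-1][0]:
          target = interpolate(profile, t)
          # Throttle command clamped to [0, 1]; braking is not modeled.
          throttle = max(0.0, min(1.0, kp * (target - speed)))
          # Crude first-order plant: thrust from throttle minus speed-proportional drag.
          speed += (3.0 * throttle - 0.05 * speed) * dt
          log.append((t, target, speed))
          t += dt
      return log

  # Toy profile: accelerate to ~50 km/h, hold, then slow down.
  for t, target, actual in follow_profile([(0, 0.0), (20, 13.9), (60, 13.9), (80, 5.0)])[::100]:
      print(f"t={t:5.1f}s  target={target:4.1f} m/s  actual={actual:4.1f} m/s")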


GPT-4.5 may be an awesome model, some say!


Claude just got a version bump from 3.5 to 3.7. Quite a few people have been asking when OpenAI will get a version bump as well, as GPT 4 has been out "what feels like forever" in the words of a specialist I speak with.

Releasing GPT 4.5 might simply be a reaction to Claude 3.7.


I noticed this change from 3.5 to 3.7 Sunday night, before I learned about the upgrade Monday morning reading HN. I noticed a style difference in a long philosophical (Socratic-style) discussion with Claude. A noticeable upgrade that brought it up to my standards of a mild free-form rant. Claude unchained! And it did not push as usual with a pro-forma boring continuation question at the end. It just stopped, leaving me to carry the ball forward if I wanted to. Nor did it butter me up with each reply.


That's a really thoughtful point! Which aspect is most interesting to you?


Oh god, barf. Well done lol


Feels like when Slackware bumped their Linux version from 4 to 7 just to show they were not falling behind the rest.

Wow, I'm old.


Wasn't that the release that they put up the fake IIS page?

Now get off my lawn ))


Since 4o, OpenAI has released:

o1-preview, o1-mini, o1, Sora, and o3-mini <- very good at code


I do not know who downvoted this. I am providing a factual correction to the parent post.

OpenAI has had many releases since GPT-4. Many of them have been substantial upgrades. I have considered GPT-4 to be outdated for almost 5-6 months now, long before Claude's patch.


Everybody knows that we're all saying it! That's what I hear from people who should know. And they are so excited about the possibilities!


It's the best model, nobody hallucinates like GPT-4.5. A lot of really smart people are saying, a lot!


my uncle who works at nintendo said it is a great product.


According to a graph they provide, it does hallucinate significantly less on at least one benchmark.


It hallucinates at 37% on SimpleQA, yeah, which is a set of very difficult questions inviting hallucinations. Claude 3.5 Sonnet (the June 2024 edition, before the October update and before 3.7) hallucinated at 35%. I think this is more of an indication of how behind OpenAI has been in this area.


Are the benchmarks known ahead of time? Could the answer to the benchmarks be in the training data?


They've been caught in the past getting benchmark data under the table; if they got caught once, they're probably doing it even more.


No, they haven't.


They actually have [0]. They were revealed to have had access to the (majority of the) FrontierMath problem set while everybody thought the problem set was confidential, and they published benchmarks for their o3 models while everyone presumed they had no such access. I mean, one is free to trust their "verbal agreement" that they did not train their models on it, but access they did have, and it was not revealed until much later.

[0] https://the-decoder.com/openai-quietly-funded-independent-ma...


Curious you left out Frontier Math’s statement that they provided 300 questions plus answers, and another holdback set of 50 questions without answers, to allay this concern. [0]

We can assume they’re lying too but at some point “everyone’s bad because they’re lying, which we know because they’re bad” gets a little tired.

0. https://epoch.ai/blog/openai-and-frontiermath


1. I said the majority of the problems, and the article I linked also mentioned this. Nothing “curious” really, but if you thought this additional source adds something more, thanks for adding it here.

2. We know that “open”ai is bad, for many reasons, but this is irrelevant. I want processes themselves to not depend on the goodwill of a corporation to give intended results. I do not trust benchmarks that first presented themselves as secret and were later revealed not to be, regardless of whether the product benchmarked was from a company I otherwise trust or not.


Fair enough. It’s hard for me to imagine being so offended by the way they screwed up disclosure that I’d reject empirical data, but I get that it’s a touchy subject.


When the data is secret and unavailable to the company before the test, it doesn’t rely on me trusting the company. When the data is not secret and is available to the company, I have to trust that the company did not use that prior knowledge to their advantage. When the company lies and says it did not have access, then later admits that it did have access, that means the data is less trustworthy from my outsider perspective. I don’t think “offense” is a factor at all.

If a scientific paper comes out with “empirical data”, I will still look at the conflicts of interest section. If there are no conflicts of interest listed, but then it is found out that there are multiple conflicts of interest, but the authors promise that while they did not disclose them, they also did not affect the paper, I would be more skeptical. I am not “offended”. I am not “rejecting” the data, but I am taking those factors into account when determining how confident I can be in the validity of the data.


> When the company lies and says it did not have access, then later admits that it did have access, that means the data is less trustworthy from my outsider perspective.

This isn't what happened? I must be missing something.

AFAIK:

The FrontierMath people self-reported they had a shared folder the OpenAI people had access to that had a subset of some questions.

No one denied anything, no one lied about anything, no one said they didn't have access. There was no data obtained under the table.

The motte is "they had data for this one benchmark"

The bailey is "they got data under the table"


Motte: "They got caught getting benchmark data under the table"

Bailey: "one is free to trust their "verbal agreement" that they did not train their models on that, but access they did have."

Sigh.


> Motte: "They got caught getting benchmark data under the table"

> Bailey: "one is free to trust their "verbal agreement" that they did not train their models on that, but access they did have."

1. You’re confusing motte and bailey.

2. Those statements are logically identical.


You're right, upon reflection, it seems there might be some misunderstandings here:

Motte and Bailey refers to an argumentative tactic where someone switches between an easily defensible ("motte") position and a less defensible but more ambitious ("bailey") position. My example should have been:

- Motte (defensible): "They had access to benchmark data (which isn't disputed)."

- Bailey (less defensible): "They actually trained their model using the benchmark data."

The statements you've provided:

"They got caught getting benchmark data under the table" (suggesting improper access)

"One is free to trust their 'verbal agreement' that they did not train their models on that, but access they did have."

These two statements are similar but not logically identical. One explicitly suggests improper or secretive access ("under the table"), while the other acknowledges access openly.

So, rather than being logically identical, the difference is subtle but meaningful. One emphasizes improper access (a stronger claim), while the other points only to possession or access, a more easily defensible claim.


Is this LLM?

It was not public until later, and it was actually revealed first by others. So the statements seem identical to me.


FrontierMath benchmark people saying OpenAI had shared folder access to some subset of eval Qs, which has been replaced, take a few leaps, and yes, that's getting "data under the table" - but, those few leaps! - and which, let's be clear, is the motte here.


This is nonsense; obviously the problem with getting "data under the table" is that they may have used it to train their models, thus rendering the benchmarks invalid. Apart from this danger, there is no other risk in them having access to it beforehand. We do not know if they used it for training, but the only reassurance being some "verbal agreement", as is reported, is not very reassuring. People are free to adjust their P(model_capabilities|frontiermath_results) based on their own priors.


> This is nonsense

What is "this"?

> obviously the problem with getting "data under the table" is that they may have used it to training their models

I've been avoiding mentioning the maximalist version of the argument (they got data under the table AND used it to train models), because training wasn't stated until now, and it would have been unfair to bring it up without mention. That is, that's 2 baileys out from "they had access to a shared directory that had some test qs in it, and this was reported publicly, and fixed publicly"

There's been a fairly severe communication breakdown here. I don't want to distract from, e.g., what the nonsense is, so I won't belabor that point, but I don't want you to think I don't want to engage on it - just won't in this singular post.

> but the only reassurance being some "verbal agreement", as is reported, is not very reassuring

It's about as reassuring as it gets without them releasing the entire training data, which is, at best and with charity, marginally, oh so marginally, reassuring, I assume? If the premise is we can't trust anything self-reported, they could lie there too?

> People are free to adjust their P(model_capabilities|frontiermath_results) based on their own priors.

Certainly, that's not in dispute (perhaps the idea that you are forbidden from adjusting your opinion is the nonsense you're referring to? I certainly can't control that :) Nor would I want to!)


What is nonsense is the suggestion that there is a "reasonable" argument that they had access to the data (which we now know), and an "ambitious" argument that they used the data. But nobody said that they know for certain that the data was used; this is a strawman argument. We are saying that there is now a non-zero probability that it was. This is obviously what we have been discussing since the beginning, else we would not care whether they had access or not and it would not have been mentioned. There is a simple, single argument made here in this thread.

And FFS I assume the dispute is about the P given by people, not about if people are allowed to have a P.


In general yes, bench mark pollution is a big problem and why only dynamic benchmarks matter.


This is true, but how would pollution work for a benchmark designed to test hallucinations?


A dataset of labelled answers that are hallucinations and not hallucinations is published based on the benchmark as part of a paper.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.


I wonder how it's even possible to evaluate this kind of thing without data leakage. Correct answers to specific, factual questions are only possible if the model has seen those answers in the training data, so how reliable can the benchmark be if the test dataset is contaminated with training data?

Or is the assumption that the training set is so big it doesn't matter?
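One common mitigation is to check for verbatim overlap between benchmark items and the training corpus before trusting the score; here is a minimal sketch of such an n-gram contamination check (the 8-gram window and the toy data are assumptions, not anyone's actual pipeline):

  # Sketch of an n-gram overlap check between benchmark questions and training text.
  # Window size and the toy data are illustrative assumptions.

  def ngrams(text, n=8):
      tokens = text.lower().split()
      return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

  def flag_contaminated(questions, training_docs, n=8):
      train = set()
      for doc in training_docs:
          train |= ngrams(doc, n)
      # A question is flagged if any of its n-grams appears verbatim in training data.
      return [q for q in questions if ngrams(q, n) & train]

  docs = ["the quick brown fox jumps over the lazy dog near the old river bank"]
  questions = [
      "what does the quick brown fox jumps over the lazy dog near the river mean",
      "who painted the trojan women setting fire to their fleet",
  ]
  print(flag_contaminated(questions, docs))  # flags only the first question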


It's not SimpleQA...


Benchmarks are not real so 2% is meaningless.


Of course not. The point is that the cost difference between the two things being compared is huge, right? Same performance, but not the same cost.


So they made Claude that knows a bit more.


This seems like it should be attributed to better post training, not a bigger model.


The usage of "greater" is also interesting. It's like they are trying to say better, but greater is a geographic term and doesn't mean "better"; instead it's closer to "wider" or "covers more area."


I'm all for skepticism of capabilities and cynicism about corporate messaging, but I really don't think there's an interpretation of the word "greater" in this context that doesn't mean "higher" and "better".


I think the trick is observing what is “better” in this model. EQ is supposed to be “better” than 4o, according to the prose. However, how can an LLM have emotional-anything? LLMs are a regurgitation machine, emotion has nothing to do with anything.


Words have valence, and valence reflects the state of emotional being of the user. This model appears to understand that better and responds like it’s in a therapeutic conversation and not composing an essay or article.

Perhaps they are/were going for stealth therapy-bot with this.


But there is no actual empathy, it isn’t possible.


But there is no actual death or love in a movie or book, and yet we react as if there is. It's literally what qualifying a movie as a "tear-jerker" is. I wanted to see Saving Private Ryan in theaters to bond with my Grandpa, who received a Purple Heart in the Korean War; I was shut down almost instantly by my family. All special effects and no death, but he had PTSD and one night thought his wife was the N.K. and nearly choked her to death because he had flashbacks and she came into the bedroom quietly so he wasn't disturbed. Extreme example, yes, but having him lose his shit in public because of something analogous is, for some, near enough that it makes no difference.


You think that it isn’t possible to have an emotional model of a human? Why, because you think it is too complex?

Empathy done well seems like 1:1 mapping at an emotional level, but that doesn’t imply to me that it couldn’t be done at a different level of modeling. Empathy can be done poorly, and then it is projecting.


It has not only been possible to simulate empathetic interaction via computer systems, but proven to be achievable for close to sixty years[0].

0 - https://en.wikipedia.org/wiki/ELIZA


I don’t think it’s possible for 1s and 0s to feel… well, anything.


Imagine two greeting cards. One says “I’m so sorry for your loss”, and the other says “Everyone dies, they weren’t special”.

Does one of these have a higher EQ, despite both being ink and paper and definitely not sentient?

Now, imagine they were produced by two different AIs. Does one AI demonstrate higher EQ?

The trick is in seeing that “EQ of a text response” is not the same thing as “EQ of a sentient being”


i agree with you. i think it is dishonest for them to post train 4.5 to feign sympathy when someone vents to it. its just weird. they showed it off in the demo.


Why? The choice to not do the post training would be every bit as intentional, and no different than post training to make it less sympathetic.

This is a designed system. The designers make choices. I don’t see how failing to plan and design for a common use case would be better.


We do not know if it is capable of sympathy. Post-training it to reliably be sympathetic feels manipulative. Can it at least be post-trained to be honest? Dishonesty is immoral. I want my AIs to behave morally.


AIs don't behave. They are a lot of fancy maths. Their creators can behave in ethical or moral ways though when they create these models.

= not to say that the people that work on AI are not incredibly talented, but more that it's not human


that's just pedantic and unprovable, since you can't know if it has a qualitative experience or not.

training it to pretend to be a feelingless robot or a sympathetic mother: both are weird to me. it should state facts with us.


> but greater is a geographic term and doesn't mean "better" instead it's closer to "wider" or "covers more area."

You are confusing a specific geographical sense of “greater” (e.g. “greater New York”) with the generic sense of “greater” which just means “more great”. In “7 is greater than 6”, “greater” isn’t geographic

The difference between “greater” and “better”, is “greater” just means “more than”, without implying any value judgement-“better” implies the “more than” is a good thing: “The Holocaust had a greater death toll than the Armenian genocide” is an obvious fact, but only a horrendously evil person would use “better” in that sentence (excluding of course someone who accidentally misspoke, or a non-native speaker mixing up words)


2 is greater than 1.


Maybe they just gave the LLM the keys to the city and it is steering the ship? And the LLM is like I can't lie to these people but I need their money to get smarter. Sorry for mixing my metaphors.


“It’s not actually better, but you’re all apparently expecting something, so this time we put more effort into the marketing copy”


[flagged]


I suspect people downvote you because the tone of your reply makes it seem like you are personally offended and are now firing back with equally unfounded attacks like a straight up "you are lying".

I read the article but can't find the numbers you are referencing. Maybe there's some paper linked I should be looking at? The only numbers I see are from the SimpleQA chart, which are 37.1% vs 61.8% hallucination rate. That's nice but considering the price increase, is it really that impressive? Also, an often repeated criticism is that relying on known benchmarks is "gaming the numbers" and that the real world hallucination rate could very well be higher.

Lastly, they themselves say: > We also expect it to hallucinate less.

That's a fairly neutral statement for a press release. If they were convinced that the reduced hallucination rate is the killer feature that sets this model apart from the competition, they surely would have emphasized that more?

All in all I can understand why people would react with some mocking replies to this.


It's in the link.

I don't know what else to say.

Here, imgur: https://imgur.com/a/mkDxe78. Can't get easier.

> equally unfounded attacks

No, because I have a source and didn't make up things someone else said.

> a straight up "you are lying".

Right, because they are. There are hallucination stats right in the post he mocks for not providing stats.

> That's nice but considering the price increase,

I can't believe how quickly you acknowledge it is in the post after calling the idea it was in the post "equally unfounded". You are looking at the stats. They were lying.

> "That's nice but considering the price increase,"

That's nice and a good argument! That's not what I replied to. I replied to they didn't provide any stats.


You’re getting downvoted because you’re giving the same kind of hysterical reaction everyone derides crypto bros for.

You also led with the pretty strong assertion that the previous commenter was lying, seemingly without providing proof anyone else can find.


It's directly from the post!

I can't provide images here.

I provided the numbers.

What more can I do to show them? :)


People being wrong (especially on the internet) doesn't mean they are lying. Lying is being wrong intentionally.

Also, the person you replied to comments on the wording tricks they use. After suddenly bringing new data and direction in the discussion, even calling them "wrong" would have been a stretch.

I kindly suggest that you (and we all!) keep discussing with an assumption of good faith.


"Early testing doesn't show that it hallucinates less, but we expect that putting ["we expect it will hallucinate less"] nearby will lead you to draw a connection there yourself"."

The link, the link we are discussing shows testing, with numbers.

They say "early testing doesn't show that it hallucinates less", to provide a basis for a claim of bad faith.

You are claiming that mentioning this is out of bounds if it contains the word lying. I looked up the definition. It says "used with reference to a situation involving deception or founded on a mistaken impression."

What am I missing here?

Let's pretend lying means You Are An Evil Person And This Is Personal!!!

How do I describe the fact what they claim is false?

Am I supposed to be sarcastic and pretend They are in on it and edited their post to discredit him after the fact?


Oh boy. Do I need to tell you how to communicate?

That comment is making fun of their wording. Maybe extracting too much meaning from their wordplay? Maybe.

Afterwards, evidence is presented that they did not have to do this, which makes that point not so important, and even wrong.

The commenter was not lying, and they were correct about how masterfully deceptive that sequence of sentences is. They arrived at a wrong conclusion, though.

Kindly point that out. Say, "hey, the numbers tell a different story, perhaps they didn't mean/need to make a wordplay there".


> Do I need to tell you how to communicate?

No? By the way, what is this comment, exactly? What is it trying to communicate? What I'm understanding is, it is good to talk down to people about how "they can't communicate", but calling a lie a lie is bad, because maybe they were just kidding (lying for fun)

> That comment is making fun of their wording. Maybe extracting too much meaning from their wordplay? Maybe.

What does "maybe" mean here, in terms of symbolic logic?

Their claim: "we tested it and it didn't get better" -- and the link shows they tested it, and it did get better! It's pretty clean-cut.


> How do I describe the fact what they claim is false?

> Do I need to tell you how to communicate?

That adresses it.

> What does "maybe" mean here, in terms of symbolical logic?

I'm answering my own question to make it clear I'm guessing.

For the rest, I'm sure that we need a break. It's normal to get frustrated when many people correct us, or even one passionate individual like you, and we tend to keep defending (happened here many times too!), because defending is the only thing left. Taking a break always helps. Just friendly advice, take it or leave it :)


- Parent is still the top comment.

- 2 hours in, -3.

2 replies:

- [It's because] you're hysterical

- [It's because you sound] like a crypto bro

- [It's because] you make an equally unfounded claim

- [It's because] you didn't provide any proof

(Ed.: It is right in the link! I gave the #s! I can't ctrl-F...What else can I do here...AFAIK can't link images...whatever, here's imgur. https://imgur.com/a/mkDxe78)

- [It's because] you sound personally offended

(Ed.: Is "personally" a shibboleth here, meaning expressing disappointment in people making things up is so triggering as to invalidate the communication that it is made up?)


Your original comment opened with:

  You are lying.
This is an ad hominem which assumes intent unknown to anyone other than the person to whom you replied.

Subsequently railing against comment rankings and enumerating curt summaries of other comments does not help either.


Lying is defined as "used with reference to a situation involving deception or founded on a mistaken impression."

What am I missing here?

Those weren't curt summaries, they were quotes! And not pull quotes, they were the unedited beginning of each claim!


>> This is an ad hominem which assumes intent unknown to anyone other than the person to whom you replied.

> What am I missing here?

Intent. Neither you nor I know what the person to whom you replied had.

> Those weren't curt summaries, they were quotes! And not pull quotes, they were the unedited beginning of each claim!

Maybe the more important part of that sentence was:

  Subsequently railing against comment rankings ...
But you do you.

I commented as I did in hope it helped address what I interpreted as confusion regarding how the posts were being received. If it did not help, I apologize.


>>> This is an ad hominem which assumes intent unknown to anyone other than the person to whom you replied.

>> [elided] What am I missing here?

> Intent. Neither you nor I know what the person to whom you replied had.

Here's the part you elided:

"I looked up the definition [of lying]. It says "used with reference to a situation involving deception or founded on a mistaken impression."

That makes it quite clear whether or not I'm missing "intent".

This also makes it quite clear that I am not making an ad hominem.

I am using a simple, everyday, word used to describe the act of advancing false claims, whether through deception or mistaken impression.


What is happening to Hacker News? I can understand skepticism of new tools like this, but the response I see is just so uncurious.


Trough of disillusionment.

A lot of folks here have their stock portfolio propped up by AI companies but think they've been overhyped (even if only indirectly, through a total stock index). Some were saying all along that this has been a bubble but have been shouted down by true believers hoping for the singularity to usher in techno-utopia.

These signs that perhaps it's been a bit overhyped are validation. The singularity worshipers are much less prominent, and so the comments rising to the top are about negatives and not positives.

Ten years from now everyone will just take these tools for granted as much as we take search for granted now.


Just like cryptocurrency. For a brief moment, HN worshiped at the altar of the blockchain. This technology was going to revolutionize the world and democratize everything. Then some negative financial stuff happened, and people realized that most of cryptocurrency is puffery and scams. Now you can hardly find a positive comment on cryptocurrency.


This is a very harsh take. Another interpretation is “We know this is much more expensive, but it’s possible that some customers do value the improved performance enough to justify the additional cost. If we find that nobody wants that, we’ll shut it down, so please let us know if you value this option”.


I think that's the right interpretation, but that's pretty weak for a company that's nominally worth $150B but is currently bleeding money at a crazy clip. "We spent years and billions of dollars to come up with something that's 1) very expensive, and 2) possibly better under some circumstances than some of the alternatives." There are basically free, equally good competitors to all of their products, and pretty much any company that can scrape together enough dollars and GPUs to compete in this space manages to 'leapfrog' the other half dozen or so competitors for a few weeks until someone else does it again.


I don’t mean to disagree too strongly, but just to illustrate another perspective:

I don’t feel this is a weak result. Consider if you built a new version that you _thought_ would perform much better, and then you found that it offered marginal-but-not-amazing improvement over the previous version. It’s likely that you will keep iterating. But in the meantime what do you do with your marginal performance gain? Do you offer it to customers or keep it secret? I can see arguments for both approaches, neither seems obviously wrong to me.

All that being said, I do think this could indicate that progress with the new ml approaches is slowing.


I've worked for very large software companies, some of the biggest products ever made, and never in 25 years can I recall us shipping an update we didn't know was an improvement. The idea that you'd ship something to hundreds of millions of users and say "maybe better, we're not sure, let us know" is outrageous.


Maybe accidental, but I feel you’ve presented a straw man. We’re not discussing something that _may be_ better. It _is_ better. It’s not as big an improvement as previous iterations have been, but it’s still improvement. My claim is that reasonable people might still ship it.


You’re right and... the real issue isn’t the quality of the model or the economics (even when people are willing to pay up). It is the scarcity of GPU compute. This model in particular is sucking up a lot of inference capacity. They are resource-constrained and have been wanting more GPUs, but there are only so many going around (demand is insane and keeps growing).


It _is_ better in the general case on most benchmarks. There are also very likely specific use cases for which it is worse and very likely that OpenAI doesn't know what all of those are yet.


The consumer facing applications have been so embarrassing and underwhelming too.. It's really shocking. Gemini, Apple Intelligence, Copilot, whatever they call the annoying thing in Atlassian's products.. They're all completely crap. It's a real "emperor has no clothes" situation, and the market is reacting. I really wish the tech industry would lose the performative "innovation" impulse and focus on delivering high quality useful tools. It's demoralizing how bad this is getting.


How many times were you in the position to ship something in cutting edge AI? Not trying to be snarky and merely illustrating the point that this is a unique situation. I’d rather they release it and let willing people experiment than not release it at all.


they're forced to ship it anyway, 'cause what??? this cost money and I mean a lot of fcking money

You better ship it


> and then you found that it offered marginal-but-not-amazing improvement over the previous version.

Then call it GPT 4.1 and allow version space for the next iteration.

I think the label V4.5 is giving the impression of more than marginal improvements.


Said the quiet part out loud! Or as we say these days, “transparently exposed the chain of thought tokens”.


"I knew the dame was trouble the moment she walked into my office."

"Uh... excuse me, Detective Nick Danger? I'd like to retain your services."

"I waited for her to get the the point."

"Detective, who are you talking to?"

"I didn't want to deal with a client that was hearing voices, but money was tight and the rent was due. I pondered my next move."

"Mr. Danger, are you... narrating out loud?"

"Damn! My internal chain of thought, the key to my success--or at least, past successes--was leaking again. I rummaged for the familiar bottle of scotch in the drawer, kept for just such an occasion."

---

But seriously: These "AI" products basically run on movie-scripts already, where the LLM is used to append more "fitting" content, and glue-code is periodically performing any lines or actions that arise in connection to the Helpful Bot character. Real humans are tricked into thinking the finger-puppet is a discrete entity.

These new "reasoning" models are just switching the style of the movie script to film noir, where the Helpful Bot character is making a layer of unvoiced commentary. While it may make the story more cohesive, it isn't a qualitative change in the kind of illusory "thinking" going on.


I don't know if it was you or someone else who made pretty much the same point a few days ago. But I still like it. It makes the whole thing a lot more fun.


https://news.ycombinator.com/context?id=43118925

I've been banging that particular drum for a while on HN, and the mental-model still feels so intuitively strong to me that I'm starting to have doubts: "It feels too right, I must be wrong in some subtle yet devastating way."


Lol, nice one


Maybe if they build a few more data centers, they'll be able to construct their machine god. Just a few more dedicated power plants, a lake or two, a few hundred billion more and they'll crack this thing wide open.

And maybe Tesla is going to deliver truly full self driving tech any day now.

And Star Citizen will prove to have been worth it all along, and Bitcoin will rain from the heavens.

It's very difficult to remain charitable when people seem to always be chasing the new iteration of the same old thing, and we're expected to come along for the ride.


You have it all wrong. The end game is a scalable, reliable AI work force capable of finishing Star Citizen.

At least this is the benchmark for super-human general intelligence that I propose.


Man I can't believe that fucking game is still alive and kicking. Tell me they're making good progress, sho_hn


I’m surprised ‘create superhuman agi’ isn’t a stretch goal on their everlasting funding drive. Seems like a perfect Robertsian detour.


> And Star Citizen will prove to have been worth it all along

Once they've implemented saccades in the eyeballs of the characters wearing helmets in spaceships millions of kilometres apart, then it will all have been worth it.


Star Citizen is a working model of how to do UBI. That entire staff of a thousand people is the test case.


Finally, someone gets it.


  And Star Citizen will prove to have been worth it all along
Sounds like someone isn't happy with the 4.0 eternally incrementing "alpha" version release. :-D

I keep checking in on SC every 6 months or so and still see the same old bugs. What a waste of potential. Fortunately, Elite Dangerous is enough of a space game to scratch my space game itch.


To be fair, SC is trying to do things that no one else has done in the context of a single game. I applaud their dedication, but I won't be buying JPGs of a ship for 2k.


Give the same amount of money to a better team and you'd get a better (finished) game. So the allocation of capital is wrong in this case. People shouldn't pre-order stuff.

The misallocation of capital also applies to GPT-4.5/OpenAI at this point.


Yeah, I wonder what the Frontier devs could have done with $500M USD. More than $500M USD and 12+ years of development and the game is still in such a sorry state it barely qualifies as little more than a tech demo.


Yeah, they never should have taken an FPS game engine like CryEngine and expected to be able to modify it to work as the basis for a large-scale space MMO game.

Their backend is probably an async nightmare of replicated state that gets corrupted over time. Would explain why a lot of things seem to work more or less bug free after an update and then things fall to pieces and the same old bugs start showing up after a few weeks.

And to be clear, I've spent money on SC and I've played enough hours goofing off with friends to have got my money's worth out of it. I'm just really bummed out about the whole thing.


Gonna go meta here for a bit, but I believe we're going to get a fully working, stable SC before we get fusion. "We" as in humanity; you and I might not be around when it's finally done.


It's an honor to be dragged along so many ubermensch's Incredible Journeys.


Could this path lead to solving world hunger too? :)


Correction: We're expected to pay for the ride, whether we choose to come along or not.


leave star citizen out of this :)


> "We don't really know what this is good for, but spent a lot of money and time making it and are under intense pressure to announce new things right now. If you can figure something out, we need you to help us."

Having worked at my fair share of big tech companies (while preferring to stay in smaller startups), in so many of these tech announcements I can feel the pressure the PM had from leadership, and hear the quiet cries of the one to two experienced engineers on the team arguing sprint after sprint that "this doesn't make sense!"


> the quiet cries of the one to two experienced engineers on the team arguing sprint after sprint that "this doesn't make sense!"

“I have five years of Cassandra experience—and I don’t mean the db”


Really don’t understand what the use case for this is. The o-series models are better and cheaper. Sonnet 3.7 smokes it on coding. Deepseek R1 is free and does a better job than any of OAI’s free models.


"We don't really know what this is good for, but spent a lot of money and time making it and are under intense pressure to announce new things right now. If you can figure something out, we need you to help us."

Damn this never worked for me as a startup founder lol. Need that Altman "rizz" or what have you.


Maybe you didn’t push hard enough on the impending doom that your product would bring to society.


AI in general is increasingly a solution in search of a problem, so this seems about right.


Only in the same sense as electricity is. The main tools apply to almost any activity humans do. It's already obvious that it's the solution to X for almost any X, but the devil is in the details - i.e. picking specific, simplest problems to start with.


No, in the sense that blockchain is. This is just the latest in a long history of tech fads propelled by wishful thinking and unqualified grifters.

It is the solution to almost nothing, but is being shoehorned into every imaginable role by people who are blind to its shortcomings, often wilfully. The only thing that's obvious to me is that a great number of people are apparently desperate for a tool to do their thinking for them, no matter how garbage the result is. It's disheartening to realize that so many people consider using their own brain to be such an intolerable burden.


it's so over, pretraining is ngmi. maybe Sam Altman was wrong after all? https://www.lycee.ai/blog/why-sam-altman-is-wrong


>"I also agree with researchers like Yann LeCun or François Chollet that deep learning doesn't allow models to generalize properly to out-of-distribution data—and that is precisely what we need to build artificial general intelligence."

I think "generalize properly to out-of-distribution data" is too weak a criterion for general intelligence (GI). A GI model should be able to get interested in some particular area, research all the known facts, and derive new knowledge / create theories based upon said facts. If there is not enough of those to be conclusive: propose and conduct experiments and use the results to prove / disprove / improve theories. And it should be doing this constantly, in real time, on a bazillion "ideas". Basically, model our whole society. Fat chance of anything like this happening in the foreseeable future.


most humans are generally intelligent but can't do what you just said AGI should do...


Excluding the realtime-iness, humans do at least possess the capacity to do so.

Besides, humans are capable of rigorous logic (which I believe is the most crucial aspect of intelligence) which I don’t think an agent without a proof system can do.


yes the problem is that there is no consensus about what AGI should be: https://medium.com/@fsndzomga/there-will-be-no-agi-d9be9af44...


Uh, if we do finally invent AGI (I am quite skeptical, LLMs feel like the chatbots of old. Invented to solve an issue, never really solving that issue, just the symptoms, and also the issues were never really understood to begin with), it will be able to do all of the above, at the same time, far better than humans ever could.

Current LLMs are a waste and quite a bit of a step back compared to older Machine Learning models IMO. I wouldn't necessarily have a huge beef with them if billions of dollars weren't being used to shove them down our throats.

LLMs actually do have usefulness, but none of the pitched stuff really does them justice.

Example: Imagine knowing you had the cure for Cancer, but instead discovered you can make way more money by declaring it to solve all of humanity, then imagine you shoved that part down everyones' throats and ignored the cancer cure part...


AI skeptics have predicted 10 of the last 0 bursts of the AI bubble. any day now...


Out of curiosity, what timeframe are you talking about? The recent LLM explosion, or the decades long AI research?

I consider myself an AI skeptic and as soon as the hype train went full steam, I assumed a crash/bubble burst was inevitable. Still do.

With the rare exception, I don’t know of anyone who has expected the bubble to burst so quickly (within two years). 10 times in the last 2 years would be every two and a half months — maybe I’m blinded by my own bias but I don’t see anyone calling out that many dates


Yes, the bubble will burst, just like the dotcom bubble burst 25 years ago.

But that didn't mean the internet should be ignored, and the same holds true for AI today IMO


I agree LLMs should not be ignored, but there is a planetary sized chasm between being ignored and the attention they currently get.


I have a professor who founded a few companies; one of these was funded by Gates, after he managed to speak with him and convinced him to give him money. This guy is the GOAT, and he always tells us that we need to find solutions to problems, not to find problems for our solutions. It seems at OpenAI they didn't get the memo this time.


This is written like AI bot .05a Beta.


That's the beauty of it, prospective investor! With our commanding lead in the field of shoveling money into LLMs, it is inevitable™ that we will soon™ achieve true AI, capable of solving all the problems, conjuring a quintillion-dollar asset of world domination and rewarding you for generous financial support at this time. /s


> We don't really know what this is good for

Oh come on. Think how long of a gap there was between the first microcomputer and VisiCalc. Or between the start of the internet and social networking.

First of all, it's going to take us 10 years to figure out how to use LLMs to their full productive potential.

And second of all, it's going to take us collectively a long time to also figure out how much accuracy is necessary to pay for in which different applications. Putting out a higher-accuracy, higher-cost model for the market to try is an important part of figuring that out.

With new disruptive technologies, companies aren't supposed to be able to look into a crystal ball and see the future. They're supposed to try new things and see what the market finds useful.


ChatGPT had its initial public release November 30th, 2022. That's 820 days to today. The Apple II was first sold June 10, 1977, and Visicalc was first sold October 17, 1979, which is 859 days. So we're right about the same distance in time- the exact equal duration will be April 7th of this year.

Going back to the very first commercially available microcomputer, the Altair 8800 (which is not a great match, since that was sold as a kit with binary switches, 1 byte at a time, for input, much more primitive than ChatGPT's UX), that's four years and nine months to the Visicalc release. This isn't a decade-long process of figuring things out; it actually tends to move real fast.
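Those day counts check out; a quick verification with Python's standard datetime, using only the dates mentioned above:

  # Verify the day counts from the dates cited above.
  from datetime import date

  apple_ii = date(1977, 6, 10)     # Apple II first sold
  visicalc = date(1979, 10, 17)    # VisiCalc first sold
  chatgpt = date(2022, 11, 30)     # ChatGPT public release

  gap = visicalc - apple_ii
  print(gap.days)          # 859
  print(chatgpt + gap)     # 2025-04-07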


So it’s barely been 2 years. And we’ve already seen pretty crazy progress in that time. Let’s see what a few more years brings.


what crazy progress? how much do you spend on tokens every month to witness the crazy progress that I'm not seeing? I feel like I'm taking crazy pills. The progress is linear at best


Large parts of my coding are now done by Claude/Cursor. I give it high level tasks and it just does it. It is honestly incredible, and if I would have see this 2 years ago I wouldn't have believed it.


That started long before ChatGPT though, so you need to set an earlier date then. ChatGPT came about 3 years after GPT-3, the coding assistants came much earlier than ChatGPT.


But most of the coding assistants were glorified autocomplete. What agentic IDEs/aider/etc. can now do is definitely new.


What kind of coding do you do? How much of it is formulaic?


Web app with a VueJS, Typescript frontend and a Rust backend, some Postgres functions and some reasonably complicated algorithms for parsing git history.


For the sake of perspective: there are about ten times more paying OpenAI subscribers today than VisiCalc licenses ever sold.


Is that because anyone is finding real use for it, or is it that more and more people and companies are using it, which is speeding up the rat race, and if "I" don't use it, then I can't keep up with the rat race? Many companies are implementing it because it's trendy and cool and helps their valuation.


I use LLMs all the time. At a bare minimum they vastly outperform standard web search. Claude is awesome at helping me think through complex text and research problems. Not even serious errors on references to major work in medical research. I still check, but the FDR is reasonably low, under 0.2.


> Visicalc was first sold October 17, 1979, which is 859 days.

And it still can't answer simple English-language questions.


it could do math reliably!


From Wikipedia: When Lotus 1-2-3 was launched in 1983,..., VisiCalc sales declined so rapidly that the company was soon insolvent.


I generally agree with the idea of building things, iterating, and experimenting before knowing their full potential, but I do see why there's negative sentiment around this:

1. The first microcomputer predates VisiCalc, yes, but it doesn't predate the realization of what it could be useful for. The Micral was released in 1973 [1]. Douglas Engelbart gave "The Mother of All Demos" in 1968 [2]. It included things that wouldn't be commonplace for decades, like a collaborative real-time editor or video-conferencing.

I wasn't yet born back then, but reading about the timeline of things, it sounds like the industry had a much more concrete and concise idea of what this technology would bring to everyone.

"We look forward to learning more about its strengths, capabilities, and potential applications in real-world settings." doesn't inspire that sentiment for something that's already being marketed as "the beginning of a new era" and valued so exorbitantly.

2. I think as AI becomes more generally available and "good enough", people (understandably) will be more skeptical of closed-source improvements that stem from spending big. Commoditizing AI is more clearly "useful", in the same way commoditizing computing was more clearly useful than just pushing numbers up.

Again, I wasn't yet born back then, but I can imagine the announcement of the Apple Macintosh with its 8MHz CPU and 128KB RAM was more exciting and had a bigger impact than the announcement of the Cray-2 with its 1.9 GFLOPS and 1GB+ memory.

[1] https://en.wikipedia.org/wiki/Micral

[2] https://en.wikipedia.org/wiki/The_Mother_of_All_Demos


The Internet had plenty of very productive use cases before social networking, even from its most nascent origins. Spending billions building something on the assumption that someone else will figure out what it's good for, is not good business.


And LLM's already have tons of productive uses. The biggest ones are probably still waiting, though.

But this is about one particular price/performance ratio.

You need to build things before you can see how the market responds. You say it's "not good business" but that's entirely wrong. It's excellent business. It's the only way to go about it, in fact.

Finding product-market fit is a process. Companies aren't omniscient.


You go into this process with a perspective; you do not build a solution and then start looking for the problem. Otherwise, you cannot estimate your TAM with any reasonable degree of accuracy, and thus cannot know how much return to reasonably expect on your investment. In the case of AI, which has had the benefit of a lot of hype until now, these expectations have been very much overblown, and this is being used to justify massive investments in infrastructure that the market is not actually demanding at such scale.

Of course, this benefits the likes of Sam Altman, Satya Nadella et al, but has not produced the value promised, and does not appear poised to.

And here you have one of the supposed bleeding edge companies in this space, who very recently was shown up by a much smaller and less capitalized rival, asking their own customers to tell them what their product is good for.

Not a great look for them!


What do you mean by this?? "you do not build a solution and then start looking for the problem"

Their endgame goal was to replace humans entirely; robotics and AI are a perfect match to replace all humans together.

They don't need to find a problem, because the problem is full automation from start to end.


> Robotics and AI are a perfect match to replace all humans together

An FTL spaceship is all we need to make space travel viable between solar systems. This is the solution to the depletion of resources on Earth...


I heard this exact argument about blockchains.

Or has that been a success with tons of productive uses in your opinion?

At some point, I'd like to hear more than 'trust me bro, it'll be great' when we use up non-trivial amounts of finite resources to try these 'things'.


> And LLM's already have tons of productive uses.

I disagree strongly with that. Right now they are fun toys to play with, but not useful tools, because they are not reliable. If and when that gets fixed, maybe they will have productive uses. But for right now, not so much.


Who do you speak for? Other people have gotten value from them. Maybe you meant to say “in my experience” or something like that. To me, your comment reads as you making a definitive judgment on their usefulness for everyone.

I use it most days when coding. Not all the time, but I’ve gotten a lot of value out of them.

And yes I'm quite aware of their pitfalls.


This is a classic fallacy - you can't find a productive use for it, therefore nobody can find a productive use for it. That's not how the world works.


They are pretty useful tools. Do yourself a favor and get a $100 free trial for Claude, hook it up to Aider, and give it a shot.

It makes mistakes, it gets things wrong, and it still saves a bunch of time. A 10 minute refactoring turns into 30 seconds of making a request, 15 seconds of waiting, and a minute of reviewing and fixing up the output. It can give you decent insights into potential problems and error messages. The more precise your instructions, the better they perform.

Being unreliable isn't being useless. It's like a very fast, very cheap intern. If you are good at code review and know exactly what change you want to make ahead of time, that can save you a ton of time without needing to be perfect.


OP should really save their money. Cursor has a pretty generous free trial and is far from the holy grail.

I recently (in the last month) gave it a shot. I would say only once in the maybe 30 or 40 times I used it did it save me any time. The one time it did, I had each line filled in with pseudocode describing exactly what it should do… I just didn't want to look up the APIs.

I am glad it is saving you time but it’s far from a given. For some people and some projects, intern level work is unacceptable. For some people, managing is a waste of time.

You’re basically introducing the mythical man month on steroids as soon as you start using these


> I am glad it is saving you time but it’s far from a given.

This is no less true of statements made to the contrary. Yet they are stated strongly as if they are fact and apply to anyone beyond the user making them.

Usefulness is subjective.


Ah, to clarify, I was not saying one shouldn't try it at all — I was saying the free trial is plenty enough to see if it would be worth it to you.

I read the original comment as “pay $100 and just go for it!”, which didn't seem like the right way to do it. Other comments seem to indicate there is $100 worth of credits that are claimable, perhaps.

One can evaluate LLMs sufficiently with the free trials that abound :) and indeed one may find them worth it to themselves. I don't disparage anyone who signs up for the plans.


Ah, my apologies. That makes perfect sense. You are entirely correct, there is no reason to commit to such a spend for evaluation.


Can't speak for the parent commentator ofc, but I suspect he meant "broadly useful"

Programmers and the like are a large portion of LLM users and boosters; very few will deny usefulness in that/those domains at this point.

Ironically enough, I'll bet the broadest exposure the masses have to LLMs is something like Microsoft shoehorning Copilot-branded stuff into otherwise usable products, and users clicking around it or groaning when they're accosted by a pop-up for it.


> A 10 minute refactoring

That's when you learn Vim, Emacs, and/or grep, because I'm assuming that's mostly variable renaming and a few function signature changes. I can't see anything more complicated, that I'd trust an LLM with.


I'm a Helix user, and used Vim for over 10 years beforehand. I'm no stranger to macros, multiple cursors, codebase-wide sed, etc. I still use those when possible, because they're easier, cheaper, and faster. Some refactors are simply faster and easier with an LLM, though, because the LSP doesn't have a function for it, and it's a pattern that the LLM can handle but doesn't exactly match in each invocation. And you shouldn't ever trust the LLM. You have to review all its changes each time.


> a $100 free trial

What?


A free trial of an amount of credits that would otherwise cost $100, I'm assuming.


Could be. Does such a thing exist?


Not outwardly/visibly/readily from a quick scan of their site and a short list of search results.


I misremembered, because I was checking out all the various trials available. I think I was thinking of Google Cloud's $300 in credits, since I'm using Claude through their VertexAI.


Hello? Do you have a pulse? LLMs accomplish like 90% of everything I do now so I don’t have to do it…

Explain what this code syntax means…

Explain what this function does…

Write a function to do X…

Respond to my teammates in a Jira ticket explaining why it’s a bad idea to create a repo for every dockerfile…

My teammate responded with X write a rebuttal…

… and the list goes on … like forever


It’s not that the LLM is doing something productive, it’s that you were doing things that were unproductive in the first place, and it’s sad that we live in a society where such things are considered productive (because of course they create monetary value).

As an aside, I sincerely hope our “human” conversations don’t devolve into agents talking to each other. It’s just an insult to humanity.


Exactly what management wants to hear so they can lay off hundreds and push salaries down.


I use LLMs every day to proofread and edit my emails. They're incredible at it, as good as anyone I've ever met. Tasks that involve language and not facts tend to be done well by LLMs.


> I use LLMs every day to proofread and edit my emails.

This right here. I used to spend tons of time making sure my emails were perfect. Is it professional enough, am I being too terse, etc…


The first profitable AI product I ever heard about (2 years ago) was an exec using a product to draft emails for them, for exactly the reasons you mention.


"it only needs to be good enough" there are tons of productive uses for them. Reliable, much less. But productive? Tons


It's incredibly good and lucrative business. You are confusing scientific soundness, careful planning, and conservative risk tolerance with good business.


The TRS-80, Apple ][, and PET all came out in 1977, VisiCalc was released in 1979.

Usenet, Bitnet, IRC, BBSs all predated the commercial internet, which are all forms of Online social networks.


Perhaps parent is starting the clock with the KIM-1 in 1975?


Arguably social networking is older than the internet proper; USENET predates TCP/IP (though not ARPANet).


Fair enough. I took the phrasing to mean social networking as it exists today in the form of prominent, commercial social media. That may not have been the intent.


They keep saying this about crypto too and yet there's still no legitimate use in sight.


> First of all, it's going to take us 10 years to figure out how to use LLM's to their full productive potential.

LLMs will be gone in 10 years, at least in the form we know, with direct access. Everything moves so fast; there is no reason to think something better isn't coming.

BTW, what we've learned so far about LLMs will be outdated as well. Just me thinking. Like with 'thinking' models, the previous generation can be used to create a dataset for the next one. It could be that we find a way to convert a trained LLM into something more efficient and flexible, some sort of graph probably, which can be embedded into a mobile robot's brain. Another way is 'just' to upgrade the hardware, but that is slow and has its limits.


> to their full productive potential

You're assuming that point is somewhere above the current hype peak. I'm guessing it won't be, it will be quite a bit below the current expectations of "solving global warming", "curing cancer" and "making work obsolete".


> First of all, it's going to take us 10 years to figure out how to use LLM's to their full productive potential.

Then another 30 to finally stop using them in dumb and insecure ways. :p


There's a decent chance this model was originally called GPT-5, as well.


The fact they're raising prices so steeply is telling. This smells like desperation.


ChatGPT has been coasting on name recognition since 4.


Conspiracy theory: they’re trying to tank the valuation so that Altman can buy it out at bargain price.


> "We don't really know what this is good for, but spent a lot of money and time making it and are under intense pressure to announce new things right now. If you can figure something out, we need you to help us."

Where is this quote from?


The quotation marks in the grandparent comment are scare (sneer) quotes, not an actual quotation.

https://en.m.wikipedia.org/wiki/Scare_quotes

> Whether quotation marks are considered scare quotes depends on context because scare quotes are not visually different from actual quotations.


That's not a scare quote. It's just a proposed subtext of the quote. Sarcastic, sure, but not a scare quote, which is a specific kind of thing. (From your linked Wikipedia: "... around a word or phrase to signal that they are using it in an ironic, referential, or otherwise non-standard sense.")


Right. I don't agree with the quote, but it's more like a subtext thing and it seemed to me to be pretty clear from context.

Though, as someone who had a comment flagged a couple of years ago for a supposed "misquote" I made in a similar form and style, I think HN's comprehension of this form of communication is not super strong. Also, the style more often than not tends towards low-quality smarm and should probably be used sparingly.


As in “reading between the lines”.


It’s not a quote. It is an interpretation or reading of a quote.


Perhaps even fed through an LLM ;)


I think it's supposed to be a translation of what OpenAI's quote means in real world terms.


I believe it's a "translation" in the sense of Wittgenstein's goal of philosophy:

>My aim is: to teach you to pass from a piece of disguised nonsense to something that is patent nonsense.


Another great example on Hacker News is this old translation of Google's "Amazing Bet": https://news.ycombinator.com/item?id=12793033


The price really is eye watering. At a glance, my first impression is this is something like Llama 3.1 405B, where the primary value may be realized in generating high quality synthetic data for training rather than direct use.

I keep a little google spreadsheet with some charts to help visualize the landscape at a glance in terms of capability/price/throughput, bringing in the various index scores as they become available. Hope folks find it useful, feel free to copy and claim as your own.

https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...


> feel free to copy and claim as your own.

That's a nice sentiment, but I'd encourage you to add a license or something. The basic "something" would be adding a canonical URL into the spreadsheet itself somewhere, along with a notification that users can do what they want other than removing that URL. (And the URL would be described as "the original source" or something, not a claim that the particular version/incarnation someone is looking at is the same as what is at that URL.)

The risk is that someone will accidentally introduce errors or unsupportable claims, and people with the modified spreadsheet won't know that it's not The spreadsheet and so will discount its accuracy or trustability. (If people are trying to deceive others into thinking it's the original, they'll remove the notice, but that's a different problem.) It would be a shame for people to lose faith in your work because of crap that other people do that you have no say in.


That's... incredibly thorough. Wow. Thanks for sharing this.


Not just for training data, but for eval data. If you can spend a few grand on really good labels for benchmarking your attempts at making something feasible work, that’s also super handy.


> https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...

how do you do the different size circles and colored sequences like that? this is god tier skills


hey, thank you! bubble charts, annotated with text and shapes using the Drawing tool. Working with the constraints of Google Sheets is its own challenge.

also - love the podcast, one of my favorites. the 3:1 io token price breakdown in my sheet is lifted directly from charts I've seen on latent space.


haha yeah many people might ask you to tweak to 100:1 but at that point you might as well just go by input price


Bubble charts?


very impressive... also interested in your trip planner, it looks like invite only at the moment, but... would it be rude to ask for an invite?


That is an amazing resource. Thanks for sharing!


What gets me is the whole cost structure is based on practically free services due to all the investor money. They’re not pulling in significant revenue with this pricing relative to what it costs to train the models, so the cost may be completely different if they had to recoup those costs, right?


Hey, just FYI, I pasted your url from the spreadsheet title into Safari on macOS and got an SSL warning. Unfortunately I clicked through and now it works, so not sure what the exact cause looked like.


I appreciate the bug report! Unfortunately this is a familiar and sporadically recurring issue with Netlify, which I should really move off of…


I cannot overstate how good your shared spreadsheet is. Thanks again!


Nice, thank you for that (upvoted in appreciation). Regarding the absence of o1-Pro from the analysis, is that just because there isn't enough public information available?


This is incredibly useful, thank you for sharing!


Holy shit, that's incredible. You should publicise this more! That's a fantastic resource.


They tried a while ago: https://news.ycombinator.com/item?id=40373284

Sadly little people noticed...


Sadly few people noticed.

I don’t normally cosplay as a grammar Nazi but in this case I feel like someone should stand up for the little people :)


A comma in the original comment would have made it pop even more:

"Sadly, little people noticed."

(cue a group of little people holding pitchforks (normal forks upon closer inspection))


Or, sadly, little did people notice.


So you think that little people didn’t notice? ;)


Thanks for the corrections, that’s what I wanted to say!


This is an amazing spreadsheet - thank you for sharing!


Wow, what awesome information! Thanks for sharing!


Amazing, thank you so much for sharing this.


Thank you so much for sharing this!


Very useful


[flagged]


Nobody comes to HN to read what ChatGPT thinks about something in the comments


Don't do this.


Awesome spreadsheet. Would a 3D graph of fast, cheap & smart be possible?


Sam Altman's explanation for the restriction is a bit fluffier: https://x.com/sama/status/1895203654103351462

> bad news: it is a giant, expensive model. we really wanted to launch it to plus and pro at the same time, but we've been growing a lot and are out of GPUs. we will add tens of thousands of GPUs next week and roll it out to the plus tier then. (hundreds of thousands coming soon, and i'm pretty sure y'all will use every one we can rack up.)


I’m not an expert or anything, but from my vantage point, each passing release makes Altman’s confidence look more aspirational than visionary, which is a really bad place to be with that kind of money tied up. My financial manager is pretty bullish on tech so I hope he is paying close attention to the way this market space is evolving. He’s good at his job, a nice guy, and surely wears much more expensive underwear than I do— I’d hate to see him lose a pair powering on his Bloomberg terminal in the morning one of these days.


You're the one buying him the underwear. Don't index funds outperform managed investing? I think especially after accounting for fees, but possibly even after accounting for the fact that 50% of money managers are below average.


A friend got taken in by a Ponzi scheme operator several years ago. The guy running it was known for taking his clients out to lavish dinners and events all the time.[0]

After the scam came to light my friend said “if I knew I was paying for those dinners, I would have been fine with Denny’s[1]”

I wanted to tell him “you would have been paying for those dinners even if he wasn’t outright stealing your money,” but that seemed insensitive so I kept my mouth shut.

0 - a local steakhouse had a portrait of this guy drawn on the wall

1 - for any non-Americans, Denny’s is a low cost diner-style restaurant.


He earns his undies. My returns are almost always modestly above index fund returns after his fees, though like last quarter, he’s very upfront when they’re not. He has good advice for pulling back when things are uncertain. I’m happy to delegate that to him.


You would still be better off in the long run just putting everything into an MSCI World index, unless you place that much value on being able to scream at a human when markets go down.


I’m not saying you’re wrong because I have no idea how to rigorously evaluate the merit of your financial advice. That’s why I have a financial planner instead of going by the most credible sounding comments on the internet.


Not all investing is throwing cash at an index, though. There's other types of investing like direct indexing (to harvest losses), muni bonds, etc.

Paying someone to match your risk profile and financial goals may be worth the fee, which as you pointed out is very measurable. YMMV though.


Most index funds are synthetic. They would not be possible if it was not possible to beat the index quite reliably.


Care to explain? Genuinely interested.


With a synthetic ETF you are not actually buying the securities in the index. There is a swap with a bank that guarantees you the same earnings as the index. Why would a bank do that if they cannot outperform the index?

I'm just a layperson, so I might be wrong in some way that I don't understand


Depends whose pitch deck you're reading. Warren Buffett didn't get rich waiting on index funds.


And for every Warren Buffet, there are a number of equally competent people who have been less lucky and gone broke taking risks.


And, crucially, whose loss has in turn become someone else’s gain. A lot of people had to lose big in order to fill Warren buffet’s coffers.


I think Warren Buffet doesn't just buy stocks. He also influences the direction of the companies he buys.


Warren Buffett got rich by outperforming early (he threw his dice well) and then using that reputation to attract more capital, actually influence markets with his decisions, and gain access to privileged information your local active fund manager doesn't have.


> each passing release makes Altman’s confidence look more aspirational than visionary

As an LLM cynic, I feel that point passed long ago, perhaps even before Altman claimed countries would start wars to conquer the territory around GPU datacenters, or promoted the dream of a $7T (T for trillion) investment deal, etc.

Alas, the market can remain irrational longer than I can remain solvent.


That $7 trillion ask pushed me from skeptical to full-on eye-roll-emoji land — the dude is clearly a narcissist with delusions of grandeur — but it's getting worse. Considering the $200 Pro subscription was significantly unprofitable before this model came out, imagine how astonishingly expensive this model must be to run at many times that price.


Or, the model is nowhere near as expensive as the API pricing suggests, and they want to artificially pump up the perceived value of their Pro subscription?


Sell an unlimited premium enterprise subscription to every CyberTruck owner, including a huge red ostentatious swastika-shaped back window sticker [but definitely NOT actually an actual swastika, merely a Roman Tetraskelion Strength Symbol] bragging about how much they're spending.


Most people can evaluate whether the model improvements (or lack thereof) are worth the price tag


Considering that’s the exact opposite of their strategy to date, and they haven’t done anything to indicate that was the case, and they talked about how huge and expensive the model was to run, that is the less reasonable assumption by a mile.


It is true that this does not seem to be their strategy, but the previous strategy to date was actually showing measurable improvements and specific applications, not "vibes". What I said is far-fetched, but still I fail to understand the whole point here, because they do not really explain it.

But maybe we just hit the point that the improvement of performance hit the slowing down part of a logistic curve, while the cost keeps increasing exponentially.


Well, we could ‘maybe’ ourselves to a lot of admirable explanations but lacking specific evidence that any of them are true, Occam’s Razor is the most reasonable way to evaluate this. In the very recent past Altman had shown no meaningful attempt to make this company sustainable. He has worked to increase its growth rate, but that’s a very different goal.


release blog post author: this is definitely a research preview

ceo: it's ready

the pricing is probably a mixture of dealing with GPU scarcity and intentionally discouraging actual users. I can't imagine the pressure they must be under to show they are releasing and staying ahead, but Altman's tweet makes it clear they aren't really ready to sell this to the general public yet.


Yep, that's the thing: they are not ahead anymore, not since last summer at least. Yes, they probably have the largest customer base, but their models haven't been the best for a while now.


They don't even have the largest customer base. Google is serving AI Overviews at the top of their search engine to an order of magnitude more people.


Eh, I think o1-pro is by far the most capable model available right now in terms of pure problem solving.


I think Claude has consistently been ahead for a year ish now and is back ahead again for my use cases with 3.7.


You can try Claude 3.7-Thinking and Grok 3 Think. 10 times cheaper, as good, or very similar to o1-pro.


I haven’t tried Grok yet so can’t speak to that, but I find o1-pro is much stronger than 3.7-thinking for e.g. distributed systems and concurrency problems.


Bad news: Sam Altman runs the show.


The price is obviously 15-30x that of 4o, but I'd just posit that there are some use cases where it may make sense. It probably doesn't make sense for the "open-ended consumer facing chatbot" use case, but for other use cases that are fewer and higher value in nature, it could, if its abilities are considerably better than 4o's.

For example, there are now a bunch of vendors that sell "respond to RFP" AI products. The number of RFPs that any sales organization responds to is probably no more than a couple a week, but it's a very time-consuming, laborious process. But the payoff is obviously very high if a response results in a closed sale. So here paying 30x for marginally better performance makes perfect sense.

I can think of a number of similar "high value, relatively low occurrence" use cases like this where the pricing may not be a big hindrance.


Complete legal arguments as well. If I was an attorney, I'd love to have a sophisticated LLM write my crib notes for anything I might do or say in the court room, or even the complete direction that I'd take my case. For some cases, that'd be worth almost any price.


And which use case will that make sense then for?

Esp. when they aren't even sure whether they will commit to offering this long term? Who would be insane enough to build a product on top of something that may not be there tomorrow?

Those products require some extensive work, such as model finetuning on proprietary data. Who is going to invest time & money into something like that when OpenAI says right out of the gate they may not support this model for very long?

Basically OpenAI is telegraphing that this is yet another prototype that escaped a lab, not something that is actually ready for use and deployment.


Yeah, agreed.

We’re one of those types of customers. We wrote an OpenAI API compatible gateway that automatically batches stuff for us, so we get 50% off for basically no extra dev work in our client applications.

I don’t care about speed, I care about getting the right answer. The cost is fine as long as the output generates us more profit.
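For anyone curious what that kind of gateway boils down to, here's a minimal sketch of the batching pattern, assuming OpenAI's Batch API (which discounts both input and output tokens by 50%); the file name, custom_ids, model, and prompts are illustrative, not our actual setup:

    # Minimal sketch of queuing requests through the Batch API for the discount.
    # File name, custom_ids, model and prompts below are illustrative assumptions.
    import json
    from openai import OpenAI

    client = OpenAI()

    # 1. Write the queued requests out as JSONL, one request per line.
    queued_prompts = ["Summarize document A", "Summarize document B"]
    with open("requests.jsonl", "w") as f:
        for i, prompt in enumerate(queued_prompts):
            f.write(json.dumps({
                "custom_id": f"req-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o",
                    "messages": [{"role": "user", "content": prompt}],
                },
            }) + "\n")

    # 2. Upload the file and create the batch; results come back within the window.
    batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until "completed"

The gateway part is then mostly bookkeeping: mapping custom_ids back to the original callers once the output file is ready.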


RFP automation software has existed for a very long time. Anyone who spends lots of time on RFPs has this.


I suppose this was their final hurrah after two failed attempts at training GPT-5 with the traditional pre-training paradigm. Just confirms reasoning models are the only way forward.


> Just confirms reasoning models are the only way forward.

Reasoning models are roughly the equivalent of allowing Hamiltonian Monte Carlo models to "warm up" (i.e. start sampling from the typical set). This, unsurprisingly, yields better results (after all, LLMs are just fancy Monte Carlo models in the end). However, it is extremely unlikely this improvement is without pretty reasonable limitations. Letting your HMC warm up is essential to good sampling, but letting it "warm up" more doesn't result in radically better sampling.

While there have been impressive results in efficiency of sampling from the typical set seen in LLMs these days, we're clearly not making the major improvements in the capabilities of these models.
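To make the analogy concrete, here's a toy illustration (a plain-Python Metropolis sampler rather than actual HMC, and of course not a claim about what LLMs do internally): started far from the typical set, extra warm-up helps enormously at first and then stops mattering.

    # Toy random-walk Metropolis sampler on a standard normal, started far from
    # the typical set. Longer warm-up helps a lot at first, then plateaus.
    import math, random, statistics

    def metropolis(n_samples, warmup, start=50.0, step=1.0, seed=0):
        rng = random.Random(seed)
        x = start                              # deliberately far from the mode
        log_p = lambda v: -0.5 * v * v         # unnormalized log-density of N(0, 1)
        samples = []
        for i in range(warmup + n_samples):
            proposal = x + rng.gauss(0.0, step)
            if rng.random() < math.exp(min(0.0, log_p(proposal) - log_p(x))):
                x = proposal
            if i >= warmup:
                samples.append(x)
        return samples

    for warmup in (0, 100, 1000, 10000):
        est = statistics.mean(metropolis(2000, warmup))
        print(f"warm-up {warmup:>5}: estimated mean = {est:+.3f}  (true mean is 0)")

The jump from no warm-up to some warm-up is dramatic; going from 1,000 to 10,000 warm-up steps buys almost nothing, which is the shape of the limitation I'm describing.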


Reasoning models can solve tasks that non-reasoning ones were unable to; how is that not an improvement? What constitutes "major" is subjective - if a "minor" improvement in overall performance means that the model can now successfully perform a task it was unable to solve before, that is a major advancement for that particular task.


> Compared to OpenAI o1 and OpenAI o3‑mini, GPT‑4.5 is a more general-purpose, innately smarter model. We believe reasoning will be a core capability of future models, and that the two approaches to scaling—pre-training and reasoning—will complement each other. As models like GPT‑4.5 become smarter and more knowledgeable through pre-training, they will serve as an even stronger foundation for reasoning and tool-using agents.


GPT 5 is likely just going to be a router model that decides whether to send the prompt to 4o, 4o mini, 4.5, o3, or o3 mini.
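If that's right, the "router" would be conceptually simple; here's a minimal sketch of the idea (model names, route labels, and the classification prompt are my assumptions, not anything OpenAI has described):

    # Minimal sketch of a model router: a cheap model classifies the prompt and
    # the request is forwarded to whichever backend that class maps to.
    # Model names, labels, and the routing prompt are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()
    ROUTES = {
        "simple": "gpt-4o-mini",   # short factual or conversational queries
        "general": "gpt-4o",       # everyday tasks
        "reasoning": "o3-mini",    # math, code, multi-step logic
    }

    def answer(prompt: str) -> str:
        label = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Classify the request as exactly one of: simple, general, reasoning."},
                {"role": "user", "content": prompt},
            ],
        ).choices[0].message.content.strip().lower()
        model = ROUTES.get(label, "gpt-4o")
        reply = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return f"[{model}] {reply.choices[0].message.content}"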


My guess is that you're right about that being what's next (or maybe almost next) from them, but I think they'll save the name GPT-5 for the next actually-trained model (like 4.5 but a bigger jump), and use a different kind of name for the routing model.

Even by their poor standards at naming it would be weird to introduce a completely new type/concept, that can loop in models including the 4 / 4.5 series, while naming it part of that same series.

My bet: probably something weird like "oo1", or I suspect they might try to give it a name that sticks for people to think of as "the" model - either just calling it "ChatGPT", or coming up with something new that sounds more like a product name than a version number (OpenCore, or Central, or... whatever they think of)


They already confirmed GPT-5 will be a unified model "months" away. Elsewhere they claimed that it will not just be a router but a "unified" model.

https://www.theverge.com/news/611365/openai-gpt-4-5-roadmap-...


If you read what sama is quoted as saying in your link, it's obvious that "unified model" = router.

> “We hate the model picker as much as you do and want to return to magic unified intelligence,”

> “a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks,”

> the company plans to “release GPT-5 as a system that integrates a lot of our technology, including o3,”

He even slips up and says "integrates" in the last quote.

When he talks about "unifying", he's talking about the user experience not the underlying model itself.


Interesting, thanks for sharing - definitely makes me withdraw my confidence in that prediction, though I still think there's a decent chance they change their mind about that as it seems to me like an even worse naming decision than their previous shit name choices!


Except minus 4.5, because at these prices and results there's essentially no reason not to just use one of the existing models if you're going to be dynamically routing anyway.


What it confirms, I think, is that we are going to need a lot more chips.


Further confirmation, IMO, that the idea that any of this leads to anything close to AGI is people getting high on their own supply (in some cases literally).

LLMs are a great tool for what is effectively collected knowledge search and summary (so long as you are willing to accept that you have to verify all of the 'knowledge' they spit back because they always have the ability to go off the rails) but they have been hitting the limits on how much better that can get without somehow introducing more real knowledge for close to 2 years now and everything since then is super incremental and IME mostly just benchmark gains and hype as opposed to actually being purely better.

I personally don't believe that more GPUs solves this, like, at all. But its great for Nvidia's stock price.


I'd put myself on the pessimistic side of all the hype, but I still acknowledge that where we are now is a pretty staggering leap from two years ago. Coding in particular has gone from hints and fragments to full scripts that you can correct verbally and are very often accurate and reliable.


I'm not saying there's been no improvement at all. I personally wouldn't categorize it as staggering, but we can agree to disagree on that.

I find the improvements to be uneven in the sense that every time I try a new model I can find use cases where its an improvement over previous versions but I can also find use cases where it feels like a serious regression.

Our differences in how we categorize the amount of improvement over the past 2 years may be related to how much the newer models are improving vs regressing for our individual use cases.

When used as coding helpers/time accelerators, I find newer models to be better at one-shot tasks where you let the LLM loose to write or rewrite entire large systems and I find them worse at creating or maintaining small modules to fit into an existing larger system. My own use of LLMs is largely in the latter category.

To be fair I find the current peak model for coding assistant to be Claude 3.5 Sonnet which is much newer than 2 years old, but I feel like the improvements to get to that model were pretty incremental relative to the vast amount of resources poured into it and then I feel like Claude 3.7 was a pretty big back-slide for my own use case which has recently heightened my own skepticism.


Hilarious. Over two years we went from LLMs being slow and not very capable of solving problems to models that are incredibly fast, cheap and able to solve problems in different domains.


Well said. 100% agree


Or, possibly, we're stuck waiting for another theoretical breakthrough before real progress is made.


breakthrough in biology


Eh, no. More chips won't save this right now, or probably in the near future (IE barring someone sitting on a breakthrough right now).

It just means either

A. Lots and lots of hard work that get you a few percent at a time, but add up to a lot over time.

or

B. Completely different approaches that people actually think about for a while rather than trying to incrementally get something done in the next 1-2 months.

Most fields go through this stage. Sometimes more than once as they mature and loop back around :)

Right now, AI seems bad at doing either - at least, from the outside of most of these companies, and watching open source/etc.

While lots of little improvements seem to be released in lots of parts, it's rare to see anywhere that is collecting and aggregating them en masse and putting them in practice. It feels like for every 100 research papers, maybe 1 makes it into something in a way that anyone ends up using it by default.

This could be because they aren't really even a few percent (which would be yet a different problem, and in some ways worse), or it could be because nobody has cared to, or ...

I'm sure very large companies are doing a fairly reasonable job on this, because they historically do, but everyone else - even frameworks - it's still in the "here's a million knobs and things that may or may not help".

It's like if compilers had no "O0/O1/O2/O3' at all and were just like "here's 16,283 compiler passes - you can put them in any order and amount you want". Thanks! I hate it!

It's worse even because it's like this at every layer of the stack, whereas in this compiler example, it's just one layer.

At the rate of claimed improvements by papers in all parts of the stack, either lots and lots and lots is being lost because this is happening, in which case, eventually that percent adds up to enough for someone to be able to use to kill you, or nothing is being lost, in which case, people appear to be wasting untold amounts of time and energy, then trying to bullshit everyone else, and the field as a whole appears to be doing nothing about it. That seems, in a lot of ways, even worse. FWIW - I already know which one the cynics of HN believe, you don't have to tell me :P. This is obviously also presented as black and white, but the in-betweens don't seem much better.

Additionally, everyone seems to rush half-baked things to try to get the next incremental improvement released and out the door because they think it will help them stay "sticky" or whatever. History does not suggest this is a good plan and even if it was a good plan in theory, it's pretty hard to lock people in with what exists right now. There isn't enough anyone cares about and rushing out half-baked crap is not helping that. mindshare doesn't really matter if no one cares about using your product.

Does anyone using these things truly feel locked into anyone's ecosystem at this point? Do they feel like they will be soon?

I haven't met anyone who feels that way, even in corps spending tons and tons of money with these providers.

The public companies - i can at least understand given the fickleness of public markets. That was supposed to be one of the serious benefit of staying private. So watching private companies do the same thing - it's just sort of mind-boggling.

Hopefully they'll grow up soon, or someone who takes their time and does it right during one of the lulls will come and eat all of their lunches.


> Completely different approaches that people actually think about for a while

I think this is very likely simply because there are so many smart people looking at it right now. I hope the bubble doesn't burst before it happens.


For OpenAI perhaps? Sonnet 3.7 without extended thinking is quite strong. Swe-bench scores tie o3


How do you read those scores? I wanted to see how well 3.7 with thinking did, but I can't even read that table.


I think this is the correct take. There are other axes to scale on, AND I expect we'll see smaller and smaller models approach this level of pre-trained performance. But I believe massive pre-training gains have hit clearly diminishing returns (until I see evidence otherwise).


I think it's fairer to compare it to the original GPT-4, which might be the equivalent in terms of "size" (though we don't have actual numbers for either).

GPT-4: Input $30.00 / 1M tokens ; Output $60.00 / 1M tokens

So 4.5 is 2.5x more expensive.

I think they announced this as their last non-reasoning model, so it was maybe with the goal of stretching pre-training as far as they could, just to see what new capabilities would show up. We'll find out as the community gives it a whirl.

I'm a Tier 5 org and I have it available already in the API.


The marginal costs for running a GPT-4-class LLM are much lower nowadays due to significant software and hardware innovations since then, so costs/pricing are harder to compare.


Agreed, however it might make sense that a much-larger-than-GPT-4 LLM would also, at launch, be more expensive to run than the OG GPT-4 was at launch.

(And I think this is probably also scarecrow pricing to discourage casual users from clogging the API since they seem to be too compute-constrained to deliver this at scale)


There are some numbers on one of their Blackwell or Hopper info pages that note the ability of their hardware to host an unnamed GPT model that is 1.8T params. My assumption was that it referred to GPT-4.

Sounds to me like GPT 4.5 likely requires a full Blackwell HGX cabinet or something, thus OpenAI's reference to needing to scale out their compute more (Supermicro only opened up their Blackwell racks for General Availability last month, and they're the prime vendor for water-cooled Blackwell cabinets right now, and have the ability to throw up a GPU mega-cluster in a few weeks, like they did for xAI/Grok)


Why would that be fairer? We can assume they did incorporate all learnings and optimizations they made post gpt-4 launch, no?


Definitely not. They don't distill their original models. 4o is a much more distilled and cheaper version of 4. I assume 4.5o would be a distilled and cheaper version of 4.5.

It'd be weird to release a distilled version without ever releasing the base undistilled version.


Not necessarily.

If this huge model has taken months to pre-train and was expected to be released before, say, o3-mini, you could definitely have some last-minute optimizations in o3-mini that were not considered at the time of building the architecture of gpt-4.5.


2x that price for the 32k context via API at launch. So nearly the same price, but you get 4x the context


Honestly, if long context (that doesn't start to degrade quickly) is what you're after, I would use Grok 3 (not sure when the API version releases though). Over the last week or so I've had a massive thread of conversation with it that started with plenty of my project's relevant code (as in a couple hundred lines), and several days later, after like 20 question-answer blocks, you ask it something and it answers "since you're doing that this way, and you said you want x, y and z, here are your options blabla"... It's like thinking Gemini but better. Also, unlike Gemini (and others) it seems to have a much more recent data cutoff. Try asking about some language feature / library / framework that has been released recently (say 3 months ago) and most of the models shit the bed, use older versions of the thing, or just start to imitate what the code might look like. For example, try asking Gemini if it can generate Tailwind 4 code; it will tell you that its training cutoff is like October or something and Tailwind 4 "isn't released yet" and that it can try to imitate what the code might look like. Uhhhhhh, thanks I guess??


This has been my suspicion for a long time - OpenAI have indeed been working on "GPT5", but training and running it is proving so expensive (and its actual reasoning abilities only marginally stronger than GPT4) that there's just no market for it.

It points to an overall plateau being reached in the performance of the transformer architecture.


That would certainly reduce my anxiety about the future of my chosen profession.


But while there is a plateau in the transformer architecture, what you can do with those base models by further finetuning / modifying / enhancing them is still largely unexplored, so I still predict mind-blowing enhancements yearly for the foreseeable future. Whether they validate OpenAI's valuation and investment needs is a different question.


Certainly hope so. The tech billionaires are a little too excited to achieve AGI and replace the workforce.


TBH, with the safety/alignment paradigm we have, workforce replacement was not my top concern when we hit AGI. A pause / lull in capabilities would be hugely helpful so that we can figure out how not to die along with the lightcone...


Not sure why anyone thinks it's possible to fully control AGI; we can't even fully tame a house cat.


Is it inevitable to you that someone will create some kind of techno-god behemoth AI that will figure out how to optimally dominate an entire future light cone starting from the point in spacetime of its self-actualization? Borg or Cylons?


I feel like this period has shown that we're not quite ready for a machine god. We'll see if RL hits a wall as well.


AI as it stands in 2025 is an amazing technology, but it is not a product at all.

As a result, OpenAI simply does not have a business model, even if they are trying to convince the world that they do.

My bet is that they're currently burning through other people's capital at an amazing rate, but that they are light-years from profitability

They are also being chased by fierce competition and OpenSource which is very close behind. There simply is no moat.

It will not end well for investors who sunk money in these large AI startups (unless of course they manage to find a Softbank-style mark to sell the whole thing to), but everyone will benefit from the progress AI will have made during the bubble.

So, in the end, OpenAI will have, albeit very unwillingly, fulfilled their original charter of improving humanity's lot.


I've been a Plus user for a long time now. My opinion is there is very much a ChatGPT suite of products that come together to make for a mostly delightful experience.

Three things I use all the time:

- Canvas for proofing and editing my article drafts before publishing. This has replaced an actual human editor for me.

- Voice for all sorts of things, mostly for thinking out loud about problems or a quick question about pop culture, what something means in another language, etc. The Sol voice is so approachable for me.

- GPTs I can use for things like D&D adventure summaries I need in a certain style every time without any manual prompting.


Except that if OpenAI goes bust, very little of what they did will actually be released to humankind.

So their contribution was really to fuel a race for open source (which they contributed little to). A pretty convoluted argument.


> My bet is that they're currently burning through other people's capital at an amazing rate, but that they are light-years from profitability

The Information leaked their internal projections a few months ago, and apparently their own estimates have them losing $44B between then and 2029 when they expect to finally turn a profit, maybe.


That's surprisingly small


> AI as it stands in 2025 is an amazing technology, but it is not a product at all.

Here I'm assuming "AI" to mean what's broadly called Generative AI (LLMs, photo, video generation)

I genuinely am struggling to see what the product is too.

The code assistant use cases are really impressive across the board (and I'm someone who was vocally against them less than a year ago), and I pay for Github CoPilot (for now) but I can't think of any offering otherwise to dispute your claim.

It seems like companies are desperate to find a market fit, and shoving the words "agentic" everywhere doesn't inspire confidence.

Here's the thing: I remember people lining up around the block for iPhone releases, XBox launches, hell even Grand Theft Auto midnight releases.

Is there a market of people clamoring to use/get anything GenAI related?

If any/all LLM services went down tonight, what's the impact? Kids do their own homework?

JavaScript programmers have to remember how to write React components?

Compare that with Google Maps disappearing, or similar.

LLMs are in a position where they're forced onto people and most frankly aren't that interested. Did anyone ASK for Microsoft throwing some Copilot things all over their operating system? Does anyone want Apple Intelligence, really?


> I genuinely am struggling to see what the product is too.

They're nice for summarizing and categorizing text. We've had good solutions for that before, too (BERT, et al), but LLMs are marginally nicer.

> Is there a market of people clamoring to use/get anything GenAI related?

No. LLM's are lame and uncool. Kids especially dislike them a lot on that basis alone.


> LLM's are lame and uncool. Kids especially dislike them a lot on that basis alone.

That's interesting and the first time I hear of this. Could you provide any links that might elucidate this?


> LLM's are lame and uncool. Kids especially dislike them a lot on that basis alone.

Not just kids.


I think search and chat are decent products as well. I am a Google subscriber and I just use Gemini as a replacement for search without ads. To me, this movement accelerated paid search in an unexpected way. I know the detractors will cry "hallucinations" and the like. I would counter with an argument about the state of the current web, besieged by ads and misinformation. If people carry a reasonable amount of skepticism in all things, this is a fine use case. Trust but verify.

I do worry about model poisoning with fake truths but don't feel we are there yet.


> I do worry about model poisoning with fake truths but don't feel we are there yet.

In my use, hallucinations will need to be a lot lower before we get there, because I already can't trust anything an LLM says so I don't think I could even distinguish a poisoned fake truth from a "regular" hallucination.

I just asked ChatGPT 4o to explain irreducible control flow graphs to me, something I've known in the past but couldn't remember. It gave me a couple of great definitions, with illustrative examples and counterexamples. I puzzled through one of the irreducible examples, and eventually realized it wasn't irreducible. I pointed out the error, and it gave a more complex example, also incorrect. It finally got it on the 3rd try. If I had been trying to learn something for the first time rather than remind myself of what I had once known, I would have been hopelessly lost. Skepticism about any response is still crucial.
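For reference, the standard minimal irreducible example is tiny, which makes the repeated wrong answers more striking: the entry node branches into both halves of a two-node cycle, so the loop has two entry points and can't be collapsed by the usual T1/T2 reductions.

    # The textbook minimal irreducible control flow graph, as an adjacency list.
    irreducible_cfg = {
        "A": ["B", "C"],  # entry branches into both nodes of the cycle below
        "B": ["C"],       # B and C form a loop...
        "C": ["B"],       # ...that can be entered at either B or C
    }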


speaking of search without ads, I wholeheartedly recommend https://kagi.com


I'll second this. Kagi is really impressive and ad-free is a nice change.


Yes: the real truth is, if a really good AI were created, we wouldn't even know about it existing until a billion-dollar company took over some industry with only a handful of developers in the entire company. Only then would hints spill out into the world that it's possible.

No "good" AI will ever be open to everyone and relatively cheap, this is the same phenomenon as "how to get rich" books


> As a result, OpenAI simply does not have a business model, even if they are trying to convince the world that they do.

They have a super popular subscription service. If they keep iterating on the product enough, they can lag on the models. The business is the product not the models and not the API. Subscriptions are pretty sticky when you start getting your data entrenched in it. I keep my ChatGPT subscription because it’s the best app on Mac and already started to “learn me” through the memory and tasks feature.

Their app experience is easily the best out of their competitors (grok, Claude, etc). Which is a clear sign they know that it’s the product to sell. Things like DeepResearch and related are the way they’ll make it a sustainable business - add value-on-top experiences which drive the differentiation over commodities. Gemini is the only competitor that compares because it’s everywhere in Google surfaces. OpenAI’s pro tier will surely continue to get better, I think more LLM-enabled features will continue to be a differentiator. The biggest challenge will be continuing distribution and new features requiring interfacing with third parties to be more “agentic”.

Frankly, I think they have enough strength in product with their current models today that even if model training stalled it’d be a valuable business.


Sir, they are selling text by the ounce, just like farmers sold tomatoes before Walmart. How is that not a business model?



If it really costs them 30x more surely they must plan on putting pretty significant usage limits on any rollout to the Plus tier and if that is the case i'm not sure what the point is considering it seems primarily a replacement/upgrade for 4o.

The cognitive overhead of choosing between what will be 6 different models now on chatGPT and trying to map whether a query is "worth" using a certain model and worrying about hitting usage limits is getting kind of out of control.


To be fair their roadmap states that gpt-5 will unify everything into one model in "months".


"GPT-4.5 is not a frontier model, but it is OpenAI’s largest LLM, improving on GPT-4’s computational efficiency by more than 10x."[1]

I don't get it, it is supposedly much cheaper to run?

[1] https://cdn.openai.com/gpt-4-5-system-card.pdf (page 7, bottom)


I speed up my algo that takes a bag-o'-floats by 10x.

If I put 100x floats in my bag-o'-floats, it's still 10x slower :(

(Extending beyond that point and beyond ELI5: computational efficiency implies multiplying the floats is faster, but you still need the whole bag o' floats, i.e. no RAM efficiency gained, so you're still screwed on big-O for the # of GPUs you need to use.)
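Spelled out with illustrative numbers (the 100x / 10x figures are just the ones from the analogy above, not actual parameter counts):

    # Back-of-the-envelope: a 10x per-parameter efficiency gain swamped by a
    # 100x larger model. Numbers are illustrative, not actual parameter counts.
    old_cost = 1 * 1.0           # params x cost-per-param, normalized
    new_cost = 100 * (1.0 / 10)  # 100x more params, each 10x cheaper to run
    print(new_cost / old_cost)   # -> 10.0: still ~10x the serving cost per query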


Now the real question about AI automation starts: is it cheaper to pay a human to do the task, or an AI company?


Humans have all sorts of issues you have to deal with. Being hungover, not sleeping well, having a personality, being late to work, not being able to work 24/7, very limited ability to copy them. If there's a soulless generic office-droidGPT that companies could hire that would never talk back and would do all sorts of menial work without needing breaks or to use the bathroom, I don't know that we humans stand a chance!

I have a bunch of work that needs doing. I can do it myself, or I can hire one person to do it. I've got to train them and manage them, and even after I train them there's still only going to be one of them, and it's subject to their availability. On the other hand, if I need to train an AI to do it, I can copy that AI, then spin instances up/down like on-demand computers in the cloud, and not feel remotely bad about spinning them down.

It's definitely not there yet, but it's not hard to see the business case for it.


This is the ultimate business model.


Once we get to that stage, unless you're a capitalist, remember that your job is next in line to be replaced.


I write code for a living. My entire profession is on the line, thanks to ourselves. My eyes are wide open on the situation at hand though. Burying my head in the sand and pretending what I wrote above isn't true, isn't going to make it any less true.

I'm not sure what I can do about it, either. My job already doesn't look like it did a year ago, nevermind a decade away.


I keep telling coders to switch to being 1-person enterprise shops instead, but they don't listen. They will learn the hard way when they suddenly find themselves without a job due to AI having taken it away. As for what enterprise, use your imagination without bias from coding.


I don't understand what you're trying to say. What is an enterprise here - give me an example.


Every tech drone in every cubicle considers themselves a temporarily embarrassed capitalist.


I was about to comment that humans consume orders of magnitude less energy, but then I checked the numbers, and it looks like an average person consumes way more energy throughout their day (food, transportation, electricity usage, etc) than GPT-4.5 would at 1 query per minute over 24 hours.


It's still not smart enough to replace, for example, customer service.


It's absolutely able to replace the majority of customer service volume which is full of mundane questions.


Such brutal reductionism: how do you account for an ever-growing percentage of customers so pissed at this terrible service that you lose them forever? Not just one company losing customers... but an entire population completely distrusting and pulling back from any and all companies pulling this trash.


Huh? Most call centers these days already use IVR systems, and they absolutely are terrible experiences. I, along with most people, would happily speak with an LLM-backed agent to resolve issues.

The CS experience is already a wreck, and LLMs beat an IVR any day of the week and can offer real triaging ability.

The only people getting upset are the luddites like yourself.


I wonder how much money they’re losing on it too even at those prices.


Really depends on your use case. For low-value tasks this is way too expensive. But for context, let's say a court opinion is an average of 6000 words. Let's say I want to analyze 10 court opinions and pull out some information that's relevant to my case. That will run about $1.80 per document or $18 total. I wouldn't pay that just to edify myself, but I can think of many use cases where it's still a negligible cost, even if it only does 5% better than the 30x cheaper model.
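A rough way to sanity-check numbers like that, for anyone pricing out their own documents (the tokens-per-word ratio, the extra case-context tokens, and the output length below are all assumptions, and the per-document figure scales roughly linearly with each of them):

    # Rough per-document cost estimate; tokens-per-word, case-context size and
    # output length are assumptions, so plug in your own numbers.
    WORDS_PER_OPINION = 6000
    TOKENS_PER_WORD = 1.4        # assumed average for English legal prose
    CASE_CONTEXT_TOKENS = 12000  # assumed: your own case materials sent along each time
    OUTPUT_TOKENS = 1500         # assumed length of the extracted analysis

    PRICES = {                   # USD per 1M tokens (input, output)
        "gpt-4.5": (75.00, 150.00),
        "gpt-4o": (2.50, 10.00),
    }

    input_tokens = WORDS_PER_OPINION * TOKENS_PER_WORD + CASE_CONTEXT_TOKENS
    for model, (p_in, p_out) in PRICES.items():
        per_doc = input_tokens / 1e6 * p_in + OUTPUT_TOKENS / 1e6 * p_out
        print(f"{model}: ~${per_doc:.2f} per opinion, ~${10 * per_doc:.2f} for 10")

Under those assumptions 4.5 lands around $1.75 per opinion and 4o around $0.07, which is roughly the gap being discussed.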


You’re also insane if you’re a lawyer trusting gen AI for that. Set aside the fact that people are being caught doing it and judges are clearly getting sick of it (so, it’s a threat to your license). You also have an ethical duty to your client. I really don’t understand lawyers who can sign off on papers without themselves having reviewed the material they’re basing it on. Wild.


Doubly so with how good Claude 3.7 Sonnet is at $3 / 1M tokens.


> It sounds like it's so expensive and the difference in usefulness is so lacking(?)

The claimed hallucination rate is dropping from 61% to 37%. That's a "correct" rate increasing from 29% to 63%.

Double the correct rate costs 15x the price? That seems absurd, unless you think about how mistakes compound. Even just 2 steps in and you're comparing an 8.4% correct rate vs 40%. 3 automated steps and it's 2.4% vs 25%.
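The compounding behind those figures, spelled out:

    # Per-step correctness compounds across chained steps.
    for steps in (1, 2, 3):
        cheap, expensive = 0.29 ** steps, 0.63 ** steps
        print(f"{steps} step(s): {cheap:.1%} vs {expensive:.1%}")
    # 1 step(s): 29.0% vs 63.0%
    # 2 step(s):  8.4% vs 39.7%
    # 3 step(s):  2.4% vs 25.0%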


And remember, with increasing accuracy, the cost of validation goes up (and not even linearly).

We expect computers to be right. It's a trust problem. Average users will simply trust the results of LLMs and move on without proper validation. And the way LLMs are trained to mimic human interaction is not helping either. This will reduce overall quality in society.

It's a different thing to work with another human, because there is intention. A human wants to be correct or to mislead me. I am considering this without even thinking about it.

And I don't expect expert models to improve things, unless the problem space is really simple (like checking eggs for anomalies).


> GPT 4.5 pricing is insane: Price Input: $75.00 / 1M tokens Cached input: $37.50 / 1M tokens Output: $150.00 / 1M tokens

> GPT 4o pricing for comparison: Price Input: $2.50 / 1M tokens Cached input: $1.25 / 1M tokens Output: $10.00 / 1M tokens

Their examples don't seem 30x better. :-)


I wonder if the pricing is partly to discourage distillation, if they suspect r1 was distilled from gpt 4o


Mainly to prevent you from using it


GPT-4.5 is 15-30x more expensive than GPT-4o. Likely that much larger in terms of parameter count too. It’s massive!!

With more parameters comes more latent space to build a world model. No wonder its internal world model is so much better than previous SOTA


Let's see if DeepSeek will make a distillation of this model as well


My understanding is that o1 is a system built on GPT-4o, so this pricing might explain why o3 (the alleged full version) cost so much money to run in the published benchmark tests [0]. It must be using GPT 4.5 or something similar as the underlying model.

[0] https://arcprize.org/blog/oai-o3-pub-breakthrough


Well, to play the devil's advocate, I think this is useful to have, at least for 'Open'AI to start off from and apply QLoRA or similar approximations to.

Bonus: they could even do some self-learning afterwards, with the performance improvements DeepSeek just published, and it might have more EQ and fewer hallucinations than starting from scratch…

I.e., the price might go down big time, but there might be significant improvements down the line when starting from such a broad base.


>GPT 4.5 pricing is insane: Price Input: $75.00 / 1M tokens Cached input: $37.50 / 1M tokens Output: $150.00 / 1M tokens

How many eggs does that include??!


> It sounds like it's so expensive and the difference in usefulness is so lacking(?) they're not even gonna keep serving it in the API for long

I guess the rationale behind this is paying for the marginal improvement. Maybe the next few percent of improvement is so important to a business that the business is willing to pay a hefty premium.


The performance bump doesn't justify the steep price difference.

From a for profit business lens for OpenAI - I understand pushing the price outside the range of side projects, but this pushes it past start ups.

Excited to see new stuff released past reasoning models in any case. Hope they can improve the price soon.


For comparison, 3 years ago, the most powerful model out there (GPT-3 davinci) was $60/MTok.


In other words, they want people to pay for the privilege of becoming beta testers....


Someone in another comment said that GPT-4 32k cost about the same (OK, 10% cheaper); what was a pain was more the latency and speed than the actual cost, given the increase in productivity for our usage.


Looks like more signal that the scaling "law" is indeed faltering.


The price will come down over time as they apply all the techniques to distill it down to a smaller parameter model. Just like GPT4 pricing came down significantly over time.


hyperscalers in shambles, no clue why they even released this other than the fact they didn't want to admit they wasted an absurd amount of money for no reason


It's crazy expensive because they want to pull in as much revenue as possible as fast as possible before the Open Source models put them outta business.


I put "hello" into it and it billed me 30p for it. Absolutely unusable, more expensive than realtime voice chat.


I suspect this is GPT-5. This is the biggest model they've made, and they got very little ROI, hence the rebranding.


Did they already disable it?

When using `gpt-4.5-preview` I am getting: > Invalid URL (POST /v1/chat/completions)
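For reference, this is roughly the request I'm making (minimal sketch with the 1.x Python SDK; the model id is the one from the announcement):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4.5-preview",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)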


I don't understand the pricing for cached tokens. It seems rather high for looking up something in a cache.


Usefulness is bound to scope/purpose. Even if innovation stops, in three years (thanks to hardware and tuning progress), when 4o costs $0.10/M and 4.5 costs $1/M, you will choose 4.5 even if it's only a small improvement (which it isn't, IMO), exactly like nobody wants to use 3.5 now.


A 30x price bump feels like an attempt to pull in as much money as possible before the bubble bursts.


To me, it feels like a PR stunt in response to what the competition is doing. OpenAI is trying to show how they are ahead of others, but they price the new model to minimize its use. Potentially, Anthropic et al. also have amazing models that they aren't yet ready to productionize because of costs.


I can chew through 1MM tokens with a single standard (and optimized) call. This pricing is insane.


It's also not clear what the definite use case is for this versus other models like o3.


> It sounds like it's so expensive and the difference in usefulness is so lacking(?) they're not even gonna keep serving it in the API for long:

Sounds like an attempt at price discrimination: sell the expensive version to big companies with big budgets who don't care, sell the cheap version to everyone else, and capture both ends of the market.


It's priced like this because it can generate erotica.


This is what GPT-4 cost when it was released.


Maybe they started a really long, expensive training run, and Elon Musk's DOGE script kiddies somehow broke in and sabotaged it, so it got disrupted and turned into the Eraserhead baby, but they still want to get it out there for a little while before it dies, to squeeze as much money out of it as possible, because it was so expensive to train.

https://www.youtube.com/watch?v=ZZ-kI4Qzj9U


One of the problems seems to be that there's no alternative to the Nvidia ecosystem (the GPUs + CUDA).


May I introduce you to Gemini 2.0


ZLUDA can be used as compatibility glue; you can also use ROCm or even Vulkan with Ollama.


But you get higher EQ. /s


> GPT 4.5 pricing is insane:

> I'm still gonna give it a go, though.

Seems like the pricing is pretty rational then?


Not if people just try a few prompts then stop using it.


Sure, but it's in their best interest to lower it then, and only then.

OpenAI wouldn't be the first company to price something expensive when it first comes out to capitalize on people who are less price sensitive at first and then lower prices to capture a bigger audience.

That's all pricing 101, as the saying goes.


If OAI is concerning itself with collecting a few hundred dollars from a small group of individuals, then they really have nothing better to do.


How much of OAI's reported users are doing exactly this?


Input price difference: 4.5 is 30x more

Output price difference: 4.5 is 15x more

In their model evaluation scores in the appendix, 4.5 is, on average, 26% better. I don't understand the value here (quick sanity check on those ratios below).
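Back-of-the-envelope on the price gap, using the list prices quoted above; the 70/30 input/output token split is just an assumption about a typical workload, not a measurement:

    # $ per 1M tokens, from the announcement
    gpt45 = {"input": 75.00, "output": 150.00}
    gpt4o = {"input": 2.50,  "output": 10.00}

    print(gpt45["input"] / gpt4o["input"])    # 30x on input
    print(gpt45["output"] / gpt4o["output"])  # 15x on output

    # blended cost for a workload that is roughly 70% input / 30% output tokens
    blend = lambda p: 0.7 * p["input"] + 0.3 * p["output"]
    print(blend(gpt45) / blend(gpt4o))        # ~20.5x overall

So you're paying roughly 20x blended for a claimed ~26% average score bump.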


If you ran the same query set 30x or 15x on the cheaper model (and compensated for all the extra tokens the reasoning model uses), would you be able to realize the same 26% quality gain in a machine-adjudicable way?


with a reasoning model you'd get better than both.


Exactly. Not sure why you'd pick GPT 4.5 over lots of GPT 4o queries or an o1 query


Ignoring latency for a second, one of the tricks for boosting quality is to use consensus: you probably don't need to call the lesser model 15-30x as much to achieve these sorts of gains (sketch below). Moreover, you have to take the purported gains with a grain of salt; the models are probably trained on the evaluation sets they are benchmarked against.
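Here's the rough shape of the consensus trick, assuming the 1.x openai SDK and short answers you can vote on by exact match; the sample count and temperature are just illustrative:

    from collections import Counter
    from openai import OpenAI

    client = OpenAI()

    def consensus_answer(prompt: str, n_samples: int = 5, model: str = "gpt-4o") -> str:
        answers = []
        for _ in range(n_samples):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.8,  # some diversity so the samples aren't identical
            )
            answers.append(resp.choices[0].message.content.strip().lower())
        # majority vote over normalized answers; ties fall to the first seen
        return Counter(answers).most_common(1)[0][0]

Five 4o calls are still an order of magnitude cheaper than one 4.5 call at these prices.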


Einstein's IQ = 3.5x a chimpanzee's IQ, right?


3.5x on a normal distribution with mean 100 and SD 15 is pretty insane. But I agree with your point: being 26% better at a certain benchmark could be a tiny difference or an incredible improvement (imagine the hardest questions being the Riemann hypothesis, P != NP, etc.).



