
$200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model. In other words, it's a con. I'm a paying Perplexity user, and Perplexity already does this same sort of reasoning. At first it seemed impressive, then I started noticing mistakes in topics I'm an expert in. After a while, I started realizing that these mistakes are present in almost all topics, if you check the sources and do the reasoning yourself.

LLMs are very good at giving plausible answers, but calling them "intelligent" is a misnomer. They're nothing more than predictive models, very useful for some things, but will ALWAYS be the wrong tool for the job when it comes to verifying truth and reasoning.



> In other words, it's a con.

A con like that wouldn't last very long.

This is for people who rely enough on ChatGPT Pro features that it becomes worth it. Whether they pay for it because they're freelance, or their employer does.

Just because an LLM doesn't boost your productivity doesn't mean it doesn't for people in other lines of work. Whether LLMs help you at your work is extremely domain-dependent.


> A con like that wouldn't last very long.

That's not a problem. OpenAI needs to get some cash from its product because the competition from free models is intense. Moreover, since they supposedly used most of the web's content and pirated whatever else they could, improvements in training will likely be only incremental.

All the while, now that the wow effect has passed, more people are starting to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash out now, because later it could be never.


A con? It's not that $200 is a con, their whole existence is a con.

They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.

> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]

Surprise surprise, they just raised the price.

[1] https://www.wheresyoured.at/oai-business/


They haven’t raised the price, they have added new models to the existing tier with better performance at the same price.

They have also added a new, even higher performance model which can leverage test time compute to scale performance if you want to pay for that GPU time. This is no different than AWS offering some larger ec2 instance tier with more resources and a higher price tag than existing tiers.


They haven't raised the price yet but NYT has seen internal documents saying they do plan to.

https://www.nytimes.com/2024/09/27/technology/openai-chatgpt...

> Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.

We'll have to see if the first bump to $22 this year ends up happening.


Reasoning through that from a customer perspective is interesting.

I'm hard pressed to identify any users to whom LLMs are providing enough value to justify $20/month, but not $44.

On the other hand, I can see a lot of people to whom it's not providing any value being unable to afford a higher price.

Guess we'll see which category most OpenAI users are in.


> We'll have to see if the bump to $22 this year ends up happening.

I can't read the article. Any mention of the API pricing?


You're technically right. New models will likely be incremental upgrades at a hefty premium. But considering the money they're losing, this pricing likely better reflects their costs.


They're throwing products at the wall to see what sticks. They're trying to rapidly morph from a research company into a product company.

Models are becoming a commodity. It's game theory. Every second place company (eg. Meta) or nation (eg. China) is open sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (eg. Hunyuan).


AI may be overhyped and it may have flaws (I think it is both)... but it may also be totally worth $200/month to many people. My brother is getting way more value than that out of it, for instance.

So the question is whether it's worth $200/month and to how many people, not whether it's overhyped or whether it has flaws. And whether that supports the level of investment being placed into these tools.


> the competition is intense from free models

Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].

The thing that matters is product.

[1] Llama, QwQ, Mistral, ...

[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.

[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.

[4] The research on world models is expansive and there are lots of open source models and weights in the space.


A better way to express it than a "con" is that it's a price-framing device. It's like listing a watch at an initial value of $2,000 so that people will feel content to buy it at $400.


That sounds like a con to me too.


The line between ‘con’ and ‘genuine value synthesised in the eye of the buyer using nothing but marketing’ is very thin. If people are happy, they are happy.


> A con like that wouldn't last very long.

The NFT market lasted for many years and was enormous.

Never underestimate the power of hype.


I think this is probably right but so far it seems that the areas in which an LLM is most effective do fine with the lower power models.

Example: 4o or Claude are great for coding, summarizing, and rewriting emails. So which domains require a slightly better model?

I suppose if the error rate in code or summary goes down even 10%, it might be worth $180/month.


A few days ago I had an issue with an IPsec VPN behind NAT. I spent a few hours Googling around and tinkering with the system; I had some rough understanding of what was going wrong, but not much, and I had no idea how to solve the issue.

I put a very exhaustive question to ChatGPT o1-preview, including all the information I thought was relevant, something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than I did.

I was ashamed but at the same time that's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.


It is really great when it works, but the challenge is that I've sometimes had it not understand a detailed programming question and confidently give an incorrect answer. Going back and forth a few times makes it clear it really doesn't know the answer, but I end up going in circles. I know LLMs can't really tell you "sorry, I don't know this one", but I wish they could.


The exhaustive question makes ChatGPT reconstruct your answer in real-time, while all you need to do is sleep; your brain will construct the answer and deliver it tomorrow morning.


The benefit of getting an answer immediately rather than tomorrow morning is why people are sometimes paid more for on-call rates rather than everyone being 9-5.

(Now that I think of the idiom: when did we switch to 9-6? I've never had a 9-5.)


I bet users won't pay for the power, but for a guarantee of access! I always hear about people running out of compute time for ChatGPT. The obvious answer is to charge more for a higher quality of service.


> A con like that wouldn't last very long.

Bernie Madoff ran his investment fund as a Ponzi scheme for over a decade (perhaps several decades)


Imo the con is picking the metric that makes others look artificially bad when it doesn't seem to be all that different (at least on the surface)

> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one

This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
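To make the effect concrete, a back-of-the-envelope sketch (my own, assuming independent attempts with per-question success probability p): pass@1 scores roughly p, while 4/4 reliability scores p^4.

    # How a 4/4-reliability metric deflates scores versus a single attempt,
    # assuming independent attempts with success probability p.
    for p in (0.9, 0.8, 0.5):
        print(f"p={p:.1f}  pass@1={p:.2f}  4/4={p**4:.2f}")
    # p=0.9  pass@1=0.90  4/4=0.66
    # p=0.8  pass@1=0.80  4/4=0.41
    # p=0.5  pass@1=0.50  4/4=0.06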


> ... or their employer does.

I suspect this is a key driver behind having a higher priced, individual user offering. It gives pricing latitude for enterprise volume licenses.


Ok.

Let's say I run a company called AndSoft.

AndSoft has about 2000 people on staff, maybe 1000 programmers.

This solution will cost $200k per month, or $2.4 million per year.

Llama 3 is effectively free, with some liberation. Is ChatGPT Pro $2.4 million a year better than Llama 3? Of course OpenAI will offer volume discounts.

I imagine if I was making north of 500k a year I'd subscribe as a curiosity... At least for a few months.

If your time is worth $250 an hour and this saves you an hour per month, it's well worth it.


> A con like that wouldn't last very long

As someone who has repeatedly written that I value the better LLMs as I would a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:

I've seen quite a few cases where expensive non-functional things keep making money even after experts demonstrate that they don't work.

My mum was very fond of homeopathic pills and Bach flower tinctures, for example.

* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.


Overcharging for a product to make it seem better than it really is has served Apple well for decades.


That's a tired trope that simply isn't true.

Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.

The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.


> consistent repair support

Now that is a trope when you're talking about Apple. They may use more premium materials and have a degree of improved construction leveraging those materials, but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing they existed.

I guess I don't follow the claim that the "Apple premium" (whether real or otherwise) isn't a factor in buyers' decisions. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?


I suspect they're saying that for a lot of us, Apple provides enough value compared to the competition that we buy them despite the premium prices (and, on iOS, the lock-in).

It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.

And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.

Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.


They only have to be consistently better than the competition, and they are, by far. I always look for reviews before buying anything, and even then I've been nothing but disappointed by the likes of Razer, LG, Samsung, etc.


I used to love to bash on Apple too. But ever since I’ve had the money all my hardware (except desktop PC) has been apple.

There’s something to be said for buying something and knowing it will interoperate with all your other stuff perfectly.


> consistent repair support

The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.

People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.


Here in Brazil Apple is very much all about showing off how rich you are. Especially since we have some of the most expensive Apple products in the world.

Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.


Same in Kazakhstan. It's all about status. Many poor people take out credit to buy iPhones because they want to look rich.


Apple products are expensive — sometimes to a degree that almost seems to be taking the piss.

But name one other company whose hardware truly matches Apple’s standards for precision and attention to detail.


Indeed


>Whether LLMs help you at your work is extremely domain-dependent.

I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.


That opinion made sense two years ago. It's plain weird to still hold it today.


There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.

EDIT: Added links.

https://www.cio.com/article/3540579/devs-gaining-little-if-a...

https://web.archive.org/web/20241205204237/https://llmreport...

(Archive link because the llmreporter site seems to have an expired TLS certificate at the moment.)

No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance...


I recently slapped 3 different 3-page SQL statements and their obscure errors (no line or context references) from Redshift into Claude, and it was 3 for 3 on telling me where in my query I was messing up. It saved me probably 5 minutes each time, but it really saved me from moving to a different task and coming back. So around $100 in value right there. I was impressed by it. I wish the query UI I was using just auto-ran it when I got an error. I should code that up as an extension.


$100 to save 15 minutes implies that you net at least $800,000 a year. Well done if so!


When forecasting developer and employee costs for a company I double their pay, but I'm not going to say what I make or whether I did that here. I also like to think that developers should be working on work that has many multiples of leverage over their pay to be effective. But thanks.


> but really saved me from moving to a different task and coming back

You missed this part. Being able to quickly fix things without deep thought while in flow saves you from the slowdowns of context switching.


That $100 of value likely cost them more like $0.10-$1 in API costs.


It didn't cost me anything, my employer paid for it. Math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in api costs. I can see justifying spending $200/month for devs actively using a tool like this.


I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy, since when it gives an answer it's easy to test whether it's right or not. If it gets it wrong, I work with Claude to get to the right answer.


First of all, that's moving the goalposts to the next state over, relative to what I replied to.

Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to discussion about SOTA LLMs - it's like evaluating performance of an excavator by giving 1:10 toy excavators models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.

Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).

Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).

--

[0] - Per the article you linked; I'm yet to find and read the actual study itself.


Do you have a link? I'm not finding it by searching.


I really need the source of this.


LLMs have become indispensable for many attorneys. I know many other professionals that have been able to offload dozens of hours of work per month to ChatGPT and Claude.


What on earth is this work that they're doing that's so resilient to the fallible nature of LLMs? Is it just document search with a RAG?


Everything. Drafting correspondence, pleadings, discovery, discovery responses. Reviewing all of the same. Reviewing depositions, drafting deposition outlines.

Everything that is “word processing,” and that’s a lot.


Well that's terrifying. Good luck to them.


To be honest, much of contract law is formal boilerplate. I can understand why they'd want to move their role to 'review' instead of 'generate'


So, instead of fixing the issue (legal documents becoming a barely manageable mess) they’re investing money into making it… even worse?

This world is so messed up.


Arguably the same problem occurs in programming: anything so formulaic and common that an LLM can regurgitate it with a decent level of reliability... is something that ought to have been folded into a method/library already.

Or it already exists in some howto documentation, but nobody wanted to skim the documentation.


They have no lever with which to fix the issue.


Why not just move over to forms with structured input?


As a customer of legal work for 20 years, it is also way (way, way) faster and cheaper to draft a contract with Claude (total work ~1 hour, even with complex back-and-forth; you don't want to try to one-shot it in a single prompt) and then pay a law firm their top dollar-per-hour consulting rate to review/amend the contract (you can get to the final version in a day).

Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.


Good law firms won't charge you for using their boilerplates, only for the time to customize them for your use case.

I always ask our lawyer whether or not they have a boilerplate when I need a contract written up. They usually do.


That's interesting. I've never had a law firm be straightforward about the (obvious) fact they'll be using a boilerplate.

I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)

Do you ask about the boilerplate before or after you ask for a quote?


I typically don’t ask for a quote upfront since they are very fair with their business and billing practices.

I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.

I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.

Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.


Yeah, the industries LLMs will disrupt the most are the ones that gatekeep busywork. SWE falls into this to some degree, but other professions are more guilty than us. They don't replace intelligence; they just surface jobs that never really required much intelligence to begin with.


I bet they still charge for all the hours though.


I use LLMs to do most of my donkey work.


Maybe not very long, but long enough is plausible.


HN has been just such an awful place to discuss AI. Everyone here is convinced its a grift, a con, and we're all "marks"

Just zero curiosity, only skepticism.


If you do a lot of work in an area that o1 is strong in - $200/month effectively rounds down to $0 - and a single good answer at the right time could justify that entire $200 in a single go.


I feel like a single bad answer at the wrong time could cost a heck of a lot more than $200. And these LLMs are riddled with bad answers.


Think of it as an intern. Don't trust everything they say.


It's so strange to me that in a forum full of programmers, people don't seem to understand that you set up systems to detect errors before they cause problems. That's why I find ChatGPT so useful for helping me with programming - I can tell if it makes a mistake because... the code doesn't do what I want it to do. I already have testing and linting set up to catch my own mistakes, and those things also catch AI's mistakes.


Thank you! I always feel weird actually using ChatGPT without any major issues while so many people keep claiming how awful it is; it's like people want it 100% perfect or nothing. For me, if it gets me 80% of the way there in 1/10th the time, and then I do the final 20%, that's still a heck of a productivity boost, basically for free.


Yep, I’m with you. I’m a solo dev who never went to college… o1 makes far fewer errors than I do! No chance I’d make it past round one of any sort of coding tournament. But I managed to bootstrap a whole saas company doing all the coding myself, which involved setting up a lot of guard rails to catch my own mistakes before they reached production. And now I can consult with a programming intelligence the likes of which I could never afford to hire if it was a person. It’s amazing.


Is it working?


Not sure what you're referring to exactly. But broadly yes it is working for me - the number of new features I get out to users has sped up greatly, and stability of my product has also gone up.


Are you making money with your saas idea?


Yep, been living off it for nine years now


Congratulations! That is not an easy task. I am just starting the journey.


Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.

What if your compiler only generated the right code 99% of the time? Or, if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?


> Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.


Using a 90/10 split: the first 90% of the work takes 10% of the time, and cutting that portion to a tenth of itself saves 9% of total time.

160 hours a month * $100/hr programmer * 9% = $1,440 savings, easily enough to justify $200/month.

Even if it fails 1/10th of the time, that is still ~8%, or ~$1,300 in savings.
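Spelled out (a back-of-the-envelope sketch of the assumptions above, not real billing data):

    # 90/10 split: the first 90% of the work takes 10% of the time.
    hours, rate = 160, 100             # hours/month, $/hr
    easy_time = 0.10                   # share of total time for the "easy 90%"
    kept = 0.10                        # LLM shrinks that portion to a tenth
    saved = easy_time * (1 - kept)     # = 0.09, i.e. 9% of total time
    print(hours * rate * saved)        # 1440.0 -> $/month saved
    print(hours * rate * saved * 0.9)  # 1296.0 -> with a 1-in-10 failure rate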


Does that count the time you spend on prompt engineering?


It depends what you’re doing.

For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.


I'd love to hear what that is.

So far, given my range of projects, I have seen it struggle with lower level mobile stuff and hardware (ESP32 + BLE + HID).

For things like web (front/back), DB, video games (web and Unity), it does work pretty well (at least 80% there on average).

And I'm talking of the free version, not this $200/mo one.


Well, that is a very specific set of skills. I bet the C-suite loves it.


> I always feel weird actually using ChatGPT without any major issues while so many people keep claiming how awful it is;

People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.


Well now at $200 it's a little farther away from free :P


What do you mean? ChatGPT is free, the Pro version isn't.

I'm talking of the generally available one, haven't had the chance to try this new version.


I could buy a car for that kind of money!


Of course, but for every thoroughly set up TDD environment, you have a hundred other people just blindly copy pasting LLM output into their code base and trusting the code based on a few quick sanity checks.


You assume programming software with an existing well-defined and correct test suite is all these will be used for.


>I can tell if it makes a mistake because... the code doesn't do what I want it to do

Sometimes it does what you want it to do, but still creates a bug.

Asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote some code that worked, but it did not address the fact that S3 delivers objects in pages of max 1000 items, so if the bucket contained less than 1000 objects (typical when first starting a project), things worked, but if the bucket contained more than 1000 objects (easy to do on S3 in a short amount of time), then that would be a subtle but important bug.
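For reference, a minimal sketch of the corrected listing, assuming boto3 and a hypothetical bucket name; the paginator follows the continuation tokens, which is exactly the step the generated code skipped:

    import boto3

    s3 = boto3.client("s3")

    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # keeps requesting pages via continuation tokens until the bucket
    # is exhausted, so buckets with more than 1000 objects are handled.
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket="my-bucket"):  # bucket name is a placeholder
        keys.extend(obj["Key"] for obj in page.get("Contents", []))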

Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.

I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.


So the AI wrote a bug; but if humans wouldn’t catch it in code review, then obviously they could have written the same bug. Which shouldn’t be surprising because LLMs didn’t invent the concept of bugs.

I use LLMs maybe a few times a month but I don’t really follow this argument against them.


Code reviewing is not the same thing as writing code. When you're writing code you're supposed to look at the documentation and do some exploration before the final code is pushed.

It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.

Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.


it also catches MY mistakes, so that saves time


So true, and people seem to gloss over this fact completely. They only talk about correcting the LLM's code while the opposite is much more common for me.


I would hesitate to hire an intern that makes incorrect statements with maximum confidence and with no ability to learn from their mistakes.


When you highlight only the negatives, yeah, it does sound like no one should hire that intern. But what if the same intern happens to have an encyclopedia for a brain and can pore over massive documents and codebases to spot and fix countless human errors in a snap?

There seems to be two camps: People who want nothing to do with such flawed interns - and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed, yet powerful interns. I'm choosing to be in the latter camp.


Those are fair points, I didn't mean to imply that there are only negatives, and I don't consider myself to be in the former camp you describe as wanting nothing to do with these "interns". I shouldn't have stuck with the intern analogy at all since it's difficult for me to compare the two, with one being fairly autonomous and the other being totally reliant on a prompter.

The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.


nothing chatgpt says is with maximum confidence. the EULA and terms of use are riddled with "no guarantee of accuracy" and "use at own risk"


No, they're right. ChatGPT (and all chatbots) responds confidently while making simple errors. Disclaimers at signup or in tiny corner text are completely at odds with the actual chat experience.


What I meant to say was that the model uses the verbiage of a maximally confident human. In my experience the interns worth having have some sense of the limits of their knowledge and will tell you "I don't know" or qualify information with "I'm not certain, but..."

If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk." That wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.


I think the point is that an LLM almost always responds with the appearance of high confidence. It will much quicker hallucinate than say "I don't know."


And we, as humans, are having a hard time compartmentalizing and forgetting our lifetimes of language cues, which typically correlate with attention to detail, intelligence, time investment, etc.

New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that; it's expensive.)

Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...


They aren't talking about EULAs. It's how they give out their answers.


If I have to do the work to double-check all the answers, why am I paying $200?


Why do companies hire junior devs? You still want a senior to review the PRs before they merge into the product right? But the net benefit is still there.


We hire junior devs as an investment, because at some point they turn into seniors. If they stayed juniors forever, I wouldn't hire them.


I started incorporating LLMs into my workflows around the time gpt-3 came out. By comparison to its performance at that point, it sure feels like my junior is starting to become a senior.


Are you implying this technology will remain static in its capabilities going forward despite it having seen significant improvement over the last few years?


No, I'm explicitly saying that gpt-4o-2024-11-20 won't get any smarter, no matter how much I use it.


Does that matter when you can just swap it for gpt-5-whatever at some point in the future?


Someone asked why I hire juniors. I said I hire juniors because they get better. I don't need to use the model for it to get better, I can just wait until it's good and use it then. That's the argument.


I suppose the counterargument would be your investment in OpenAI allows them to fund the better model down the road, but I get your drift :)


Genuinely curious, are you saying that your junior devs don't provide any value from the work they do?


They provide some value, but between the time they take in coaching, reviewing their work, support, etc, I'm fairly sure one senior developer has a much higher work per dollar ratio than the junior.


Because double-checking and occasionally hitting retry is still 10x faster than doing it myself.


Because you wouldn't have come up with the correct answer before you used up 200 dollars worth of salary or billable time.


because checking the work is much faster than generating it.


Because it's per month and not per hour for a specialist consultant.


I don't know anyone who does something and first says, "This will be a mistake." Maybe they say, "I am pretty sure this is the right thing to do," and then they make a mistake.

If it's easier mentally, just put that second sentence in front of every ChatGPT answer.

Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.


Yeah, but you personally don't pay $200/month out of your pocket for the intern. Heck in Canada, govt. actually rebates for hiring interns and co-ops.


Then the lesson you have learned is “don’t blindly trust the machine”

Which is a very valuable lesson, worth more than $200


Easy - don't trust the answers. Verify them


Even in this case, losing $200 + whatever vs. a tiny bit higher chance of losing $20 + whatever makes Pro seem a good deal.


Doesn't that completely depend on those chances and the magnitude of +whatever?

It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.

If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.


Compared to knowing things and not losing whatever, both are pretty bad deals.


What specific use cases are you referring to where that poses a risk? I've been using LLMs for years now (both directly and as part of applications) and can't think of a single instance where the output constituted a risk or where it was relied upon for critical decisions.


That's why you have a human in the loop responsible for the answer.


Presumably, this is what they want the marks buying the $200 plan to think. Whether it's actually capable of providing answers worth $200 and not just sweet talking is the whole question.


If I'm happy to pay $20 in retirement just for the odd bit of writing help, then I can easily imagine it being worth $200 to someone with a job.


Yep. I’m currently paying for both Claude and chatgpt because they’re good at different things. I can’t tell whether this is extremely cheap or expensive - last week Claude saved me about a day of time by writing a whole lot of very complex sql queries for me. The value is insane.


Yeah, as someone who is far from programming, the amount of time and money it has saved me by helping me write SQL queries and PHP code for WordPress is insane. It even helped me fix some WordPress plugins that had errors: you just copy-paste or even screenshot those errors until they get fixed! Used correctly and efficiently, the value is insane. I would say $20, even $200, is still cheap for such an amazing tool.


The problem isn't whether ChatGPT Pro can save you $200/mo (for most programmers it can.)

The problem is whether it can save you $180/mo more than Claude does.


I kind of feel this is a kick in the face.

Now I'll forever be using a second rate model because I'm not rich enough.

If I'm stuck using a second rate model I may go find someone else's model to use.


> In other words, it's a con. I'm a paying Perplexity user

I love this back-to-back pair of statements. It is like “You can never win three card monte. I pay a monthly subscription fee to play it.”


I pay $10/month for perplexity because I fully understand its limitations. I will not pay $200/month for an LLM.


I am CERTAIN you do not FULLY understand its limitations.


mkay


yeah, that's what i thought.


Wouldn't you say the same thing about most people? Most people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.

I think LLMs are at least more receptive to the idea that they may be wrong, and based on that, we can have N diverse LLMs argue more peacefully and build a more reliable consensus than N "intelligent" people.


The difference between a person and a bot is that a person has a stake in the outcome. A bot is like a person who's already put in their two weeks notice and doesn't have to be there to see the outcome of their work.


That’s still amazing quality output for someone working for under $1/hour?


It's not obvious that one should prefer that, versus not having that output at all.


Why does that matter?

Even if it was a consensus opinion among all HN users, which hardly seems to be the case, it would have little impact on the other billion plus potential customers…


The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure. LLMs, by default, seem to be extremely confident in their answers, and it's quite hard to get the "confidence" level out of them (if that metric is even applicable to LLMs). That's why they are so good at duping people into believing them after all.


> The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure.

People also pull this figure out of their ass, over or undertrust themselves, and lie. I'm not sure self-reported confidence is that interesting compared to "showing your work".


How is this a counter argument that LLMs are marketed as having intelligence when it’s more accurate to think of them as predictive models? The fact that humans are also flawed isn’t super relevant to a $200/month LLM purchasing decision.


Intelligent people will know they made a mistake if given a hint, and will figure out what went wrong.

An LLM will just pretend to care about the error and happily repeat it over and over.


> Wouldn't you say the same thing for most of the people? Most of the people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.

I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.

I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?

Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:

    services:
      app:
        # ...
        environment:
          - 'DB_HOST=mysql:3306'
        # ...
      mariadb:
        image: linuxserver/mariadb
        container_name: mariadb
        environment:
          - PUID=1000
          - PGID=1000
          - MYSQL_ROOT_PASSWORD=ROOT_ACCESS_PASSWORD
          - TZ=Europe/London
        volumes:
          - /home/user/appdata/mariadb:/config
        ports:
          - 3306:3306
        restart: unless-stopped
Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.
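The minimal fix, as a sketch of my own (not from the LinuxServer docs): drop the 'ports:' mapping and let the app reach the database over the stack's internal network:

      mariadb:
        image: linuxserver/mariadb
        # ...same environment and volumes as above...
        # no "ports:" entry; other services in this compose file can still
        # reach the container by service name on the internal network
        restart: unless-stopped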

Another Docker-related example would be a Dockerfile using 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Omitting it allows silent failures of the 'update' command, so it's possible to build containers with stale package versions if a transient error affects the 'update' command while the subsequent 'install' command succeeds.
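For illustration, a hedged Dockerfile fragment (the package is just an example):

    # --error-on=any makes `apt-get update` fail the build on any index
    # fetch error, instead of continuing with stale package lists that a
    # later `install` would silently use.
    RUN apt-get update --error-on=any \
        && apt-get install -y --no-install-recommends curl \
        && rm -rf /var/lib/apt/lists/*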

There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.

How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?

1. https://docs.linuxserver.io/general/docker-compose/#multiple...


I asked ChatGPT 4o if there's anything that can be improved in your docker-compose file. Among other (seemingly sensible) suggestions, it offered:

## Restrict Host Ports for Security

If app and mariadb are only communicating internally, you can remove 3306:3306 to avoid exposing the port to the host machine:

    ports:
      - 3306:3306  # Remove this unless external access is required.

So, apparently, ChatGPT doesn't need any more convincing.


Here GPT is saying the port is only exposed to the host machine (i.e., localhost), rather than the full local network.


Wow. I can honestly say I'm surprised it makes that suggestion. That's great!

I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?

I know how I do it. I read the Docker docs, I see that I don't think publishing that port is needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.


I suspect there is a solid corpus of advice online that mentions the exposed-ports risk, alongside the flawed examples you mentioned. A narrow request will trigger the right response. That's why LLMs still require a basic understanding of what exactly you plan to achieve.


Yeah, most people suck at verifying truth and reasoning. But most information technology employees, above intern level, are highly capable of reasoning and making decisions in their area of expertise.

Try asking an LLM complex questions in your area of expertise. Interview it as if you needed to be confident that it could do your job. You'll quickly find out that it can't do your job, and isn't actually capable of reasoning.


> they may argue more peacefully

bit of a stretch.


I would pay $200 for GPT-4o. Since GPT-4, ChatGPT has been absolutely necessary for my work and for my life. It changed every workflow, like Google once did. I'm paying $20 to remove ads from YouTube, which I watch maybe once a week, so $20 for ChatGPT was a steal.

That said, my "issue" might be that I usually work alone and I don't have anyone to consult with. I can bother people on forums, but these days forums are pretty much dead and full of trolls, so that's not very useful. ChatGPT is the thing that allows me to progress in this environment. If you work at Google and can ask Rob Pike about something, you probably don't need ChatGPT as much.


This is more or less my take too. If tomorrow both Claude and ChatGPT became $200/month, I would still pay. The value they provide me far, far exceeds that. So many cynics in this thread.


You don't have to be a cynic to be annoyed with a $200/month price. Just make a normal amount of money.


It’s like hiring an assistant. You could hire one for 60k/year. But you wouldn’t do it unless you knew how the assistant could help you make more than 60k per year. If you don’t know what to do with an employee then don’t hire them. If you don’t know what to do with expensive ai, don’t pay for it.


> $200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model.

Is it possible that they have subsidized the infrastructure for free and paid users and they realized that OpenAI requires a higher revenue to maintain the current demand?


Yes, it's entirely possible that they're scrambling to make money. That doesn't actually increase the value that they're offering though.


> $200 a month for this is insane

Losing $5-10b per year also is insane. People are still looking for the added value, it's been 2 whole years now


$200 a month is potentially a bargain since it comes with unlimited advanced voice. Via the API, $200 used to only get you 14 hours of advanced voice.


I've got unlimited "advanced voice" with Perplexity for $10/mo. You're defining a bargain based on the arbitrary limits set by the company offering you said bargain.


The advanced voice of ChatGPT is miles ahead of the Perplexity one. I subscribe to both.


Does it give unlimited API access though?


No (naturally). But my thought process is that if you use advanced voice even half an hour a day, it's probably a fair price based on API costs. If you use it more, for something like language learning or entertaining kids who love it, it's potentially a bargain.
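Rough arithmetic using the 14-hours-per-$200 API figure mentioned upthread (a sketch, not OpenAI's actual pricing):

    # ~$200 buys ~14 hours of advanced voice via the API (figure above).
    api_rate = 200 / 14                        # ~$14.3 per hour
    hours_per_month = 0.5 * 30                 # half an hour a day
    print(round(api_rate * hours_per_month))   # ~214: already above the $200 sub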


you'll be throttled and rate limited


Is it insane? It's the cost of a new laptop every year. There are about as many people who won't blink at that among practitioners in our field as people who will.

I think the ship has sailed on whether GPT is useful or a con; I've lost track of people telling me it's their first search now rather than Google.

I'd encourage skeptics who haven't read this yet to check out Nicholas' post here:

https://news.ycombinator.com/item?id=41150317


> It's the cost of a new laptop every year.

It's the cost of a new, shiny, Apple laptop every year.


If a model is good enough (I'm not saying this one is at that level) I could imagine individuals and businesses paying $20,000 a month. If they're answering questions at a PhD level (again, not saying this one is) then for a lot of areas this makes sense.


Let me know when the models are actually, verifiably, this good. They're barely good enough to replace interns at this point.


Let me know where you can find people that are individually capable at performing at intern level in every domain of knowledge and text-based activity known to mankind.

"Barely good enough to replace interns" is worth a lot to businesses already.

(On that note, a founder of a SAP competitor and a major IT corporation in Poland is fond of saying that "any specialist can be replaced by a finite number of interns". We'll soon get to see how true that is.)


Cześć!

Since when does SAP have competitors? ;-P

A friend of mine claims most research is nowadays done by undergraduates because all senior folks are too busy.


postdocs but yeah


Let me know what kind of intern you can keep around 24/7 for a total monthly outlay of $200, and then we can compare notes.


Probably one from the Philippines.


Not 24/7.


And probably not one that can guess (often poorly, but at least sometimes quite well, and usually at least very much in the right direction) about everything from nuances of seasoning taco meat to particle physics, and do so in ~an instant.

$200 seems pretty cheap for a 24/7 [remote] intern with these abilities. That kind of money doesn't even buy a month's worth of Big Macs to feed that intern with.

It just seems like a lot (or even absurd) for a subscription to a service on teh Interweb, akin to "$200 for access to a web site? lolwut?"


If true, $2,400/y isn't bad for a 24/7/365 intern.


My main concern with $200/mo is that, as a software dev using foundational LLMs to learn and solve problems, I wouldn't get that much incremental value over the $20/mo tier, which I'm happy to pay for. They'd have to do a pretty incredible job at selling me on the benefits for me to pay 10x the original price. 10x for something like a 5% marginal improvement seems sus.


> but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model

Or each user doing an o1 model prompt is probably like, really expensive and they need to charge for it until they can get cost down? Anybody have estimates on what a single request into o1 costs on their end? Like GPU, memory, all the "thought" tokens?


Perplexity does reasoning and searching, for $10/mo, so I have a hard time believing that it costs OpenAI 20x as much to do the same thing. Especially if OpenAI's model is really more advanced. But of course, no one except internal teams have all of the information about costs.


Do you also think $40K a year for Hubspot is insane? What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?

The truth is that there are people who value the marginal performance -- if you think it's insane, clearly it's not for you.


>What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?

Those people want to purchase status. Unless they ship you a fancy bow tie and a wine tasting at a wood cabin with your chatgpt subscription this isn't gonna last long.

This isn't about marginal performance, it's an increasingly desperate attempt to justify their spending in a market that's increasingly commodified and open sourced. Gotta convince Microsoft somehow to keep the lights on if you blew tens of billions to be the first guy to make a service that 20 different companies are soon gonna sell for pennies.


I'm extremely excited because this margin represents opportunity for all the other LLM startups.


Their demo video was uploading a picture of a birdhouse and asking how to build it


I would say using the performance of Perplexity as a benchmark for the quality of o1-pro is a stretch.


Find third party benchmarks of the relevant models and then this discussion is worth having. Otherwise, it's just speculation.


They claim unlimited access, but in practice couldn't a user wrap an API around the app and use it for a service? Or perhaps the client effectively throttles use pretty aggressively?

Interesting to compare this $200 pricing with the recent launch of Amazon Nova, which has not-equivalent-but-impressive performance for 1/10th the cost per million tokens. (Or perhaps OpenAI "shipmas" will include a competing product in the next few days, hence Amazon released early?)

See e.g.: https://mastodon.social/@mhoye/113595564770070726


> After a while, I started realizing that these mistakes are present in almost all topics.

A fun question I tried a couple of times is asking it to give me a list of famous talks about a topic, or a list of famous software engineers and the topics they work on.

A couple of the names typically exist, but many names and basically all of the talks are shamelessly made up.


If you understood the systems you’re using, you’d know the limitations and wouldn’t marvel at this. Use search engines for searching, calculators for calculating, and LLMs for generating text.


Whenever I’ve used ChatGPT for this exact thing it has been very accurate and didn’t make up anyone


I've actually hit an interesting situation a few times that makes use of this. If some language feature, argument, or configuration option doesn't exist, it will hallucinate one.

The hallucinated name is usually a very good choice for what the option/API should be called.


I've seen this before and it's frustrating to deal with chasing phantom APIs it invents.

I wish it could just say "There is not a good approximation of this API existing - I would suggest reviewing the following docs/sources:....".


I’d like to see more evidence that it’s a scam than just your feelings. Any data there?

I certainly don’t see why mere prediction can’t validate reasoning. Sure, it can’t do it perfectly all the time, but neither can people.


> I’d like to see more evidence that it’s a scam

Have you been introduced to their CEO yet? 5 minutes of Worldcoin research should assuage your curiosity.



So you’ve got feelings and guilt by association. And I’ve got a year of using ChatGPT, which has saved tens to hundreds of hours of tedious work.

Forgive me for not finding your argument persuasive.


Guilt by association? It's literally the same guy.


You’re saying the company’s product has no value because another company by the same guy produced no value. That is the literal definition of guilt by association: you are judging the chatgpt produced based on the worldcoin product’s value.

As a customer, I don’t care about the people. I’m not interested in either argument by authority (if Altman says it’s good it must be good) or ad hominem (that Altman guy is a jerk, nothing he does can have value).

The actual product. Have you tried it? With an open mind?


Ah, so you're one of the "I separate the art from the artist, so I'm allowed to listen to Kanye" kinda people. I respect that, at least when the product is something of subjective value like art. In this case, 3 months of not buying ChatGPT Pro would afford you the funding to build your own damn AI cluster.

To be honest, it doesn't matter what the price of producing AI is, though. $200/month is, and will remain, a stupid price to pay, because OpenAI already anchored a price point with half a billion users: free. When they charged $20/month, at least they weren't taking advantage of the mentally ill. This... this is a grift, and a textbook one at that.


It is true that I separate art from artist. Mostly because otherwise there would be very little art to enjoy.

You don’t sound like you’re very familiar with the chatgpt product. They have about 10m customers paying $20/month. I’m one of them, and I honestly get way more than $200/month value from it.

Perhaps I’m “mentally ill”, but I’d ask you to do some introspection and see if leaping to that characterization is really the best way to explain people who get value where you see none.


> In other words, it's a con.

Such a silly conclusion to draw from a gut feeling, and seeing all the comments piggyback on it like it's a given makes me feel like I'm going crazy. How can you all be so certain?


You don't have to be certain to be skeptical. But you should definitely be certain before you buy.


I am a moderately successful software consultant, and $200/month is not even 1% of my revenue. So it's definitely not insane if it delivers the value.

What I doubt, though, is that it can reach a mass market even in business. A good, large, high-resolution screen is something I consider to absolutely deliver the value it costs, yet most businesses don't think their employees deserve a 2k screen, even though it will last 6-10 years and thus costs just a fraction of this offering.

Apparently the majority of businesses don't believe in marginal gains.


I mean this in what I hope will be taken in the most helpful way possible: you should update your thinking to at least imagine that intelligent thoughtful people see some value in ChatGPT. Or alternately that some of the people who see value in ChatGPT are intelligent and thoughtful. That is, aim for the more intelligent "Interesting, why do so many people like this? Where is it headed? Given that, what is worth doing now, and what's worth waiting on?" over the "This doesn't meet my standards in my domain, ergo people are getting scammed."

I'll pay $200 a month, no problem; right now o1-preview does the work for me of a ... somewhat distracted graduate student who needs checking, all for under $1 / day. It's slow for an LLM, but SUPER FAST for a grad student. If I can get a more rarely distracted graduate student that's better at coding for $7/day, well, that's worth a try. I can always cancel.


[flagged]


I think you did make some strong inferences about others when you said "it's a con." But I'm actually not interested in the other people, or defending against an ad hominem attack - I'm comfortable making my own purchase decisions.

My intent was to say "you seem like a smart person, but you seem to have a blind spot here, might benefit you to stay more open minded."


Look in the mirror and read your comment back.


Could be a case of price discrimination [1], and a way to fuel the hype.

[1] https://www.investopedia.com/terms/p/price_discrimination.as...


target market is probably people who will write it off as a business expense


The performance difference seems minor, so this is a great way for the company to get more of its funding from whales versus increasing the base subscription fee.


Couldn't disagree more. I will be signing up for this as soon as I can; it's a complete no-brainer.


What will you be using it for? Where do you think you'll see the biggest benefit over the cheaper plan?


For programming. I've already signed up for it, and it seems quite good (the o1 pro model, I mean). I was also running into constraints on o1-preview before, so it will be nice to not have to worry about that either. I wish I could get a similar, more expensive plan for Claude 3.5 Sonnet that would let me make more requests.


The mega disappointment is that o1 is performing worse than o1-preview [1], and Claude 3.6 had already nearly caught up to o1-preview.

1. https://x.com/nrehiew_/status/1864763064374976928


Considering no one makes money in AI, maybe this is just economics.


Is $200 a lot if you end up using it quite often?

It makes me wonder why they don't want to offer a usage-based pricing model.

Is it because people really believe it would make for a much worse product offering?

Why not offer some of the same capability as pay-per-use?
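
For what it's worth, the pay-per-use math is easy to sketch. The rates below are made-up placeholders, not published prices, but they show how a metered plan could come in under or over the flat $200 depending on usage:

    # Hypothetical metered pricing - the rate is an assumed placeholder,
    # NOT a published OpenAI price.
    price_per_million_tokens = 60.00   # assumed USD rate for a premium model
    tokens_per_day = 100_000           # assumed heavy daily usage
    monthly_cost = 30 * tokens_per_day * price_per_million_tokens / 1_000_000
    print(f"${monthly_cost:.2f}/month")  # $180.00 under these assumptions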


I'm signing up when I get home tonight.


Remember the whole "how many r's in strawberry" thing?

Yeah, not really fixed: https://imgur.com/a/counting-letters-with-chatgpt-7cQAbu0
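
Which is unsurprising once you remember the model sees tokens, not characters, so it never directly observes the letters it's asked to count. Counting, by contrast, is deterministic string processing that a one-liner gets right every time:

    # Letter counting is plain string processing - no model needed.
    print("strawberry".count("r"))  # 3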


Exactly what I thought. People falsely equate high price with high quality. With the $200, you're basically just donating to their cloud bill.


it's literally the cost of a cup of coffee per day


This argument only works in isolation, and only for a subset of people. “Cost of a cup of coffee per day” makes it sound horrifically overpriced to me, given how much more expensive a coffee shop is than brewing at home.


Or the price of replacing your espresso machine on a monthly basis.


When you put it this way, I think I need to finally buy that espresso machine.


In America. If you drink your coffee from coffee shops.


> it's literally the cost of a cup of coffee per day

So, the AI market is capped by Starbucks' revenue/valuation.


I don’t drink coffee. But even if I did, and I drank it everyday at a coffeehouse or restaurant in my country (which would be significantly higher quality than something like a Starbucks), it wouldn’t come close to that cost.


I pay $1.50 USD per day on my coffee, and I'm an extreme outlier: I buy speciality beans from mom-and-pop roasters.


Not if you make coffee at home.


Maybe in an expensive coffee shop in the USA.

In Italy, an espresso is ca. 1€.


Or an avocado toast.


Not to be glib, but where do you live such that a single cup of coffee runs you seven USD?

Just to put that into perspective.

I also really don't find comparisons like this to be that useful. Any subscription can be converted into an exchange rate of coffee, or meals. So what?


You're right - at my coffee shop a cup of coffee is nine dollars.


Yeah but the coffee makes you more productive


What evidence or data, if you (hypothetically) saw it, do you think would disprove the thesis that "[LLMs] will ALWAYS be the wrong tool for the job"?


You're attempting to set goalposts for a logical argument, as if we were talking about religion or politics, and you've skipped the part where we mutually agree on definitions. Define what an LLM is, in technical terms, and you will have your answer about why it is not intelligent and not capable of reasoning.

It is a statistical language model that predicts the next token of a plausible response, one token at a time. No matter how you dress it up, that's all it can ever do, by definition. What would change my mind is if, instead of talking about LLMs, we were talking about some other technology that does not yet exist and is fundamentally different from an LLM.
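
Concretely, the entire generation procedure is just this loop. A minimal sketch in Python; `model` and `tokenizer` are hypothetical stand-ins, not any real library's API:

    def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            # The model only ever answers one question: which token is
            # most plausible next, given everything generated so far?
            probs = model.next_token_probs(tokens)
            next_token = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
            tokens.append(next_token)
        return tokenizer.decode(tokens)

Everything it produces, correct answers and hallucinations alike, falls out of that one step repeated.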


If we defined "LLM" as "any deep learning model which uses the GPT transformer architecture and is trained using autoregressive next-token prediction", and then we empirically observed that such a model proved the Riemann Hypothesis before any human mathematician, it would seem very silly to say that it was "not intelligent and not capable of reasoning" because of an a priori logical argument. To be clear, I think that probably won't happen! But I think it's ultimately an empirical question, not a logical or philosophical one. (Unless there's some sort of actual mathematical proof that would set upper bounds on the capabilities of such a model, which would be extremely interesting if true! But I haven't seen one.)


Let's talk when we've got LLMs proving the Riemann Hypothesis (or any open mathematical hypothesis) without the proof already being in the training data. I'm confident in my belief that an LLM can't do that and never will be able to. LLMs can barely solve elementary-school math problems reliably.


If the cure for cancer arrived to us in the form of the most probable token being predicted one at a time, would your view on the matter change in any way?

In other words, do you have proof that this medium of information output is doomed to forever be useless in producing information that adds value to the world?

These are of course rhetorical questions that neither you nor anyone else can answer today, but you seem to have a weirdly absolute position on this matter, as if a lot depended on your sentiment being correct.


My new intern is on his third day on the job, and he's still behind o1-preview given fewer than 25 prompts.


Sounds like you're the perfect customer for this offer then. Good luck!


I'm in a low-cost country, haha! So the intern is even cheaper.



