$200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model. In other words, it's a con. I'm a paying Perplexity user, and Perplexity already does this same sort of reasoning. At first it seemed impressive, then I started noticing mistakes in topics I'm an expert in. After a while, I started realizing that these mistakes are present in almost all topics, if you check the sources and do the reasoning yourself.
LLMs are very good at giving plausible answers, but calling them "intelligent" is a misnomer. They're nothing more than predictive models, very useful for some things, but will ALWAYS be the wrong tool for the job when it comes to verifying truth and reasoning.
This is for people who rely enough on ChatGPT Pro features that it becomes worth it. Whether they pay for it because they're freelance, or their employer does.
Just because an LLM doesn't boost your productivity doesn't mean it doesn't for people in other lines of work. Whether LLMs help you at your work is extremely domain-dependent.
That's not a problem. OpenAI needs to get some cash from its product because competition from free models is intense. Moreover, since they supposedly used most of the web's content and pirated whatever else they could, improvements in training will likely be only incremental.
All the while, now that the wow effect has passed, more people are starting to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash in now because the chance may never come again.
A con? It's not that $200 is a con, their whole existence is a con.
They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.
> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]
They haven't raised the price; they've added new models to the existing tier with better performance at the same price.
They have also added a new, even higher-performance model which can leverage test-time compute to scale performance if you want to pay for that GPU time. This is no different from AWS offering a larger EC2 instance tier with more resources and a higher price tag than existing tiers.
Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.
We'll have to see if the first bump to $22 this year ends up happening.
You're technically right. New models will likely be incremental upgrades at a hefty premium. But considering the money they're losing, this pricing likely better reflects their costs.
They're throwing products at the wall to see what sticks. They're trying to rapidly morph from a research company into a product company.
Models are becoming a commodity. It's game theory. Every second-place company (e.g. Meta) or nation (e.g. China) is open-sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (e.g. Hunyuan).
AI may be overhyped and it may have flaws (I think both are true)... but it may also be totally worth $200/month to many people. My brother is getting way more value than that out of it, for instance.
So the question is whether it's worth $200/month and to how many people, not whether it's overhyped or has flaws. And whether that supports the level of investment being placed into these tools.
Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].
The thing that matters is product.
[1] Llama, QwQ, Mistral, ...
[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.
[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.
[4] The research on world models is expansive and there are lots of open source models and weights in the space.
A better way to express it than a "con" is that it's a price-framing device. It's like listing a watch at an initial value of $2,000 so that people will feel content to buy it at $400.
The line between ‘con’ and ‘genuine value synthesised in the eye of the buyer using nothing but marketing’ is very thin. If people are happy, they are happy.
A few days ago I had an issue with an IPsec VPN behind NAT. I spent a few hours Googling around and tinkering with the system; I had a rough understanding of what was going wrong, but not much, and I had no idea how to solve it.
I wrote a very exhaustive question for ChatGPT o1-preview, including all the information I thought was relevant, something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than me.
I was ashamed but at the same time that's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.
It is really great when it works, but the challenge is that I've sometimes had it not understand a detailed programming question and confidently give an incorrect answer. Going back and forth a few times makes it clear it really doesn't know the answer, but I end up going in circles. I know LLMs can't really tell you "sorry, I don't know this one", but I wish they could.
The exhaustive question makes ChatGPT reconstruct your answer in real-time, while all you need to do is sleep; your brain will construct the answer and deliver it tomorrow morning.
The benefit of getting an answer immediately rather than tomorrow morning is why people are sometimes paid more for on-call rates rather than everyone being 9-5.
(Now that I think of the idiom, when did we switch to 9-6? I've never had a 9-5.)
I bet users will pay not for the power, but for a guarantee of access! I always hear about people running out of compute time for ChatGPT. The obvious answer is to charge more for a higher-quality service.
IMO the con is picking the metric that makes others look artificially bad when the model doesn't seem to be all that different (at least on the surface).
> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one
This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
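Back-of-the-envelope, assuming attempts are independent: if a model answers a given question correctly with probability p per attempt, 4/4 scoring gives roughly p^4, so a model at 90% per attempt falls to about 66% while one at 70% falls to about 24%. The stricter setting shrinks weaker models' numbers much more than stronger ones'.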
As someone who has repeatedly written that I value the better LLMs as if they were a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:
I've seen quite a few cases where expensive non-functional things, which experts demonstrate don't work, keep making money.
My mum was very fond of homeopathic pills and Bach flower tinctures, for example.
* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.
Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.
The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.
Now that is a trope when you're talking about Apple. They may use more premium materials and have a degree of improved construction leveraging those materials - but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing about them.
I guess I don't follow the claim that the "Apple Premium" (whether real or otherwise) isn't a factor in a buyer's decision. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?
I suspect they're saying that for a lot of us, Apple provides enough value compared to the competition that we buy them despite the premium prices (and, on iOS, the lock-in).
It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.
And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.
Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.
They only have to be consistently better than the competition, and they are, by far. I always look for reviews before buying anything, and even then I've been nothing but disappointed by the likes of Razer, LG, Samsung, etc.
The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.
People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.
Here in Brazil Apple is very much all about showing off how rich you are. Especially since we have some of the most expensive Apple products in the world.
Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.
>Whether LLM's help you at your work is extremely domain-dependent.
I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.
There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.
I recently slapped three different three-page SQL statements and their obscure errors (no line or context references) from Redshift into Claude, and it was 3 for 3 on telling me where in my query I was messing up. Saved me probably 5 minutes each time, but really saved me from moving to a different task and coming back. So around $100 in value right there. I was impressed by it. I wish the query UI I was using just auto-ran it through Claude when I got an error. I should code that up as an extension.
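Something like this is probably all that extension would need; a rough sketch assuming the Anthropic Python SDK, with the model name, prompt wording, and function name as placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def diagnose_sql_error(query: str, error: str) -> str:
    """Send a failing query plus its error message to Claude and return the diagnosis."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder; use whichever model you have access to
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "This Redshift query failed. Point to the exact part of the "
                f"query causing the error.\n\nQuery:\n{query}\n\nError:\n{error}"
            ),
        }],
    )
    return response.content[0].text
```

Wire that into the query tool's error handler and you'd get the auto-run-on-error behavior.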
When forecasting developer and employee costs for a company I double their pay, but I'm not going to say what I make or whether I did that math here. I also like to think that developers should be working on work with many multiples of leverage over their pay to be effective. But thanks.
It didn't cost me anything, my employer paid for it. Math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in api costs. I can see justifying spending $200/month for devs actively using a tool like this.
I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy because when it gives an answer it's easy to test whether it's right. If it gets it wrong, I work with Claude to get to the right answer.
First of all, that's moving the goalposts to next state over, relative to what I replied to.
Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per the article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to a discussion about SOTA LLMs - it's like evaluating the performance of an excavator by giving 1:10 toy excavator models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.
Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).
Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).
--
[0] - Per the article you linked; I'm yet to find and read the actual study itself.
LLMs have become indispensable for many attorneys. I know many other professionals that have been able to offload dozens of hours of work per month to ChatGPT and Claude.
Arguably the same problem occurs in programming: anything so formulaic and common that an LLM can regurgitate it with a decent level of reliability... is something that ought to have been folded into a method/library already.
Or it already exists in some howto documentation, but nobody wanted to skim the documentation.
As a customer of legal work for 20 years, it is also way (way way) faster and cheaper to draft a contract with Claude (total work ~1 hour, even with complex back-and-forth; you don't want to try to one-shot it in a single prompt) and then pay a law firm their top dollar-per-hour consulting to review/amend the contract (you can get to the final version in a day).
Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.
That's interesting. I've never had a law firm be straightforward about the (obvious) fact they'll be using a boilerplate.
I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)
Do you ask about the boilerplate before or after you ask for a quote?
I typically don’t ask for a quote upfront since they are very fair with their business and billing practices.
I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.
I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.
Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.
Yeah, the industries LLMs will disrupt the most are the ones that gatekeep busywork. SWE falls into this to some degree, but other professions are more guilty than us. They don't replace intelligence; they just surface jobs that never really required much intelligence to begin with.
If you do a lot of work in an area that o1 is strong in - $200/month effectively rounds down to $0 - and a single good answer at the right time could justify that entire $200 in a single go.
It's so strange to me that in a forum full of programmers, people don't seem to understand that you set up systems to detect errors before they cause problems. That's why I find ChatGPT so useful for helping me with programming - I can tell if it makes a mistake because... the code doesn't do what I want it to do. I already have testing and linting set up to catch my own mistakes, and those things also catch AI's mistakes.
Thank you! It always feels so weird to use ChatGPT without any major issues while so many people keep claiming how awful it is; it's like people want it 100% perfect or nothing. For me, if it gets me 80% there in 1/10th the time, and then I do the final 20%, that's still a heck of a productivity boost, basically for free.
Yep, I’m with you. I’m a solo dev who never went to college… o1 makes far fewer errors than I do! No chance I’d make it past round one of any sort of coding tournament. But I managed to bootstrap a whole saas company doing all the coding myself, which involved setting up a lot of guard rails to catch my own mistakes before they reached production. And now I can consult with a programming intelligence the likes of which I could never afford to hire if it was a person. It’s amazing.
Not sure what you're referring to exactly. But broadly yes it is working for me - the number of new features I get out to users has sped up greatly, and stability of my product has also gone up.
Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.
I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.
What if your compiler only generated the right code 99% of the time? Or, if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?
> Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.
This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.
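Spelled out: if the first 80% of the work normally takes 20% of the total time, doing it in a tenth of that time costs 2% of the total, so you save 20% - 2% = 18%.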
For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.
> It always feels so weird to use ChatGPT without any major issues while so many people keep claiming how awful it is
People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.
Of course, but for every thoroughly set up TDD environment, you have a hundred other people just blindly copy pasting LLM output into their code base and trusting the code based on a few quick sanity checks.
>I can tell if it makes a mistake because... the code doesn't do what I want it to do
Sometimes it does what you want it to do, but still creates a bug.
Asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote some code that worked, but it did not address the fact that S3 delivers objects in pages of max 1000 items, so if the bucket contained less than 1000 objects (typical when first starting a project), things worked, but if the bucket contained more than 1000 objects (easy to do on S3 in a short amount of time), then that would be a subtle but important bug.
Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.
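For what it's worth, the fix is small once you know about the page limit; a minimal sketch assuming boto3, with the bucket and prefix as whatever you pass in:

```python
import boto3

s3 = boto3.client("s3")

def list_all_keys(bucket: str, prefix: str = "") -> list[str]:
    """List every key in the bucket, following continuation pages past 1000 objects."""
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1000 objects; the paginator follows the continuation token for us.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```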
I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.
So the AI wrote a bug; but if humans wouldn’t catch it in code review, then obviously they could have written the same bug. Which shouldn’t be surprising because LLMs didn’t invent the concept of bugs.
I use LLMs maybe a few times a month but I don’t really follow this argument against them.
Code reviewing is not the same thing as writing code. When you're writing code you're supposed to look at the documentation and do some exploration before the final code is pushed.
It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.
Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.
So true, and people seem to gloss over this fact completely. They only talk about correcting the LLM's code while the opposite is much more common for me.
When you highlight only the negatives, yeah it does sound like no one should hire that intern. But what if the same intern happens to have an encyclopedia for a brain and can pore through massive documents and codebases to spot and fix countless human errors in a snap?
There seems to be two camps: People who want nothing to do with such flawed interns - and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed, yet powerful interns. I'm choosing to be in the latter camp.
Those are fair points, I didn't mean to imply that there are only negatives, and I don't consider myself to be in the former camp you describe as wanting nothing to do with these "interns". I shouldn't have stuck with the intern analogy at all since it's difficult for me to compare the two, with one being fairly autonomous and the other being totally reliant on a prompter.
The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.
No, they're right. ChatGPT (and all chatbots) responds confidently while making simple errors. Disclaimers upon signup or in tiny corner text are so at odds with the actual chat experience.
What I meant to say was that the model uses the verbiage of a maximally confident human. In my experience the interns worth having have some sense of the limits of their knowledge and will tell you "I don't know" or qualify information with "I'm not certain, but..."
If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk." That wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.
I think the point is that an LLM almost always responds with the appearance of high confidence. It will much quicker hallucinate than say "I don't know."
And we, as humans, are having a hard time compartmentalizing and forgetting our lifetimes of language cues, which typically correlate with attention to detail, intelligence, time investment, etc.
New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that, it's expensive.)
Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...
Why do companies hire junior devs? You still want a senior to review the PRs before they merge into the product right? But the net benefit is still there.
I started incorporating LLMs into my workflows around the time gpt-3 came out. By comparison to its performance at that point, it sure feels like my junior is starting to become a senior.
Are you implying this technology will remain static in its capabilities going forward despite it having seen significant improvement over the last few years?
Someone asked why I hire juniors. I said I hire juniors because they get better. I don't need to use the model for it to get better, I can just wait until it's good and use it then. That's the argument.
They provide some value, but between the time they take in coaching, reviewing their work, support, etc, I'm fairly sure one senior developer has a much higher work per dollar ratio than the junior.
I don't know anyone who does something and at first says, "This will be a mistake" Maybe they say, "I am pretty sure this is the right thing to do," then they make a mistake.
If it's easier mentally, just put that second sentence in front of every ChatGPT answer.
Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.
Doesn't that completely depend on those chances and the magnitude of +whatever?
It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.
If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.
What specific use cases are you referring to where that poses a risk? I've been using LLMs for years now (both directly and as part of applications) and can't think of a single instance where the output constituted a risk or where it was relied upon for critical decisions.
Presumably, this is what they want the marks buying the $200 plan to think. Whether it's actually capable of providing answers worth $200 and not just sweet talking is the whole question.
Yep. I’m currently paying for both Claude and chatgpt because they’re good at different things. I can’t tell whether this is extremely cheap or expensive - last week Claude saved me about a day of time by writing a whole lot of very complex sql queries for me. The value is insane.
Yeah, as someone who is far from programming, the amount of time and money it has saved me helping me write SQL queries and PHP code for WordPress is insane. It even helped me fix some WordPress plugins that had errors; you just copy-paste or even screenshot those errors until they get fixed. If used correctly and efficiently the value is insane; I would say $20, or even $200, is still cheap for such an amazing tool.
Wouldn't you say the same thing about most people? Most people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.
I think at least LLMs are more receptive to the idea that they may be wrong, and based on that, we can have N diverse LLMs and they may argue more peacefully and build a reliable consensus than N "intelligent" people.
The difference between a person and a bot is that a person has a stake in the outcome. A bot is like a person who's already put in their two weeks notice and doesn't have to be there to see the outcome of their work.
Even if it was a consensus opinion among all HN users, which hardly seems to be the case, it would have little impact on the other billion plus potential customers…
The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure. LLMs, by default, seem to be extremely confident in their answers, and it's quite hard to get the "confidence" level out of them (if that metric is even applicable to LLMs). That's why they are so good at duping people into believing them after all.
> The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure.
People also pull this figure out of their ass, over or undertrust themselves, and lie. I'm not sure self-reported confidence is that interesting compared to "showing your work".
How is this a counter argument that LLMs are marketed as having intelligence when it’s more accurate to think of them as predictive models? The fact that humans are also flawed isn’t super relevant to a $200/month LLM purchasing decision.
> Wouldn't you say the same thing about most people? Most people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.
I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.
I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?
Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:
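(Reconstructed from memory to show the shape of the pattern; the image name and credentials are placeholders.)

```yaml
services:
  app:
    image: lscr.io/linuxserver/someapp:latest   # placeholder app image
    depends_on:
      - db
  db:
    image: mariadb:10
    environment:
      MYSQL_ROOT_PASSWORD: changeme
    ports:
      - "3306:3306"   # publishes the database to the whole local network
```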
Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.
Another Docker related example would be a Dockerfile using 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Failing to do so allows silent failures of the 'update' command and it's possible to build containers with stale package versions if you have a transient error that affects the 'update' command, but succeeds on a subsequent 'install' command.
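For reference, the shape of what that looks like in a Dockerfile (the base image and package here are just placeholders):

```dockerfile
FROM debian:bookworm-slim

# Without --error-on=any, a transient fetch failure during 'update' is silent,
# and the subsequent 'install' can run against stale package lists.
RUN apt-get update --error-on=any \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```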
There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.
How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?
Wow. I can honestly say I'm surprised it makes that suggestion. That's great!
I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?
I know how I do it. I read the Docker docs, I see that I don't think publishing that port is needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.
I suspect there is a solid corpus of advice online that mentions the exposed-ports risk, alongside the flawed examples you mentioned. A narrow request will trigger the right response. That's why LLMs still require a basic understanding of what exactly you plan to achieve.
Yeah, most people suck at verifying truth and reasoning. But most information technology employees, above intern level, are highly capable of reasoning and making decisions in their area of expertise.
Try asking an LLM complex questions in your area of expertise. Interview it as if you needed to be confident that it could do your job. You'll quickly find out that it can't do your job, and isn't actually capable of reasoning.
I would pay $200 for GPT-4o. Since GPT-4, ChatGPT has been absolutely necessary for my work and my life. It changed every workflow the way Google did. I'm paying $20 to remove ads from YouTube, which I watch maybe once a week, so $20 for ChatGPT was a steal.
That said, my "issue" might be that I usually work alone and don't have anyone to consult with. I can bother people on forums, but these days forums are pretty much dead and full of trolls, so it's not very useful. ChatGPT is the thing that allows me to keep progressing in this environment. If you work at Google and can ask Rob Pike about something, you probably don't need ChatGPT as much.
This is more or less my take too. If Claude and ChatGPT both became $200/month tomorrow, I would still pay. The value they provide me far, far exceeds that. So many cynics in this thread.
It’s like hiring an assistant. You could hire one for 60k/year. But you wouldn’t do it unless you knew how the assistant could help you make more than 60k per year. If you don’t know what to do with an employee then don’t hire them. If you don’t know what to do with expensive ai, don’t pay for it.
> $200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model.
Is it possible that they have been subsidizing the infrastructure for free and paid users, and realized that OpenAI needs higher revenue to sustain the current demand?
I've got unlimited "advanced voice" with Perplexity for $10/mo. You're defining a bargain based on the arbitrary limits set by the company offering you said bargain.
No (naturally). But my thought process is that if you use advanced voice even half an hour a day, it's probably a fair price based on API costs. If you use it more, for something like language learning or entertaining kids who love it, it's potentially a bargain.
Is it insane? It's the cost of a new laptop every year. There are about as many people who won't blink at that among practitioners in our field as people who will.
I think the ship has sailed on whether GPT is useful or a con; I've lost track of people telling me it's their first search now rather than Google.
I'd encourage skeptics who haven't read this yet to check out Nicholas' post here:
If a model is good enough (I'm not saying this one is at that level) I could imagine individuals and businesses paying $20,000 a month. If it's answering questions at PhD level (again, not saying this one is), then for a lot of areas this makes sense.
Let me know where you can find people who are individually capable of performing at intern level in every domain of knowledge and text-based activity known to mankind.
"Barely good enough to replace interns" is worth a lot to businesses already.
(On that note, a founder of a SAP competitor and a major IT corporation in Poland is fond of saying that "any specialist can be replaced by a finite number of interns". We'll soon get to see how true that is.)
And probably not one that can guess (often poorly, but at least sometimes quite well, and usually at least very much in the right direction) about everything from nuances of seasoning taco meat to particle physics, and do so in ~an instant.
$200 seems pretty cheap for a 24/7 [remote] intern with these abilities. That kind of money doesn't even buy a month's worth of Big Macs to feed that intern with.
It just seems like a lot (or even absurd) for a subscription to a service on teh Interweb, akin to "$200 for access to a web site? lolwut?"
My main concern with $200/mo is that, as a software dev using foundational LLMs to learn and solve problems, I wouldn't get that much incremental value over the $20/mo tier, which I'm happy to pay for. They'd have to do a pretty incredible job at selling me on the benefits for me to pay 10x the original price. 10x for something like a 5% marginal improvement seems sus.
> but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model
Or each user doing an o1 model prompt is probably like, really expensive and they need to charge for it until they can get cost down? Anybody have estimates on what a single request into o1 costs on their end? Like GPU, memory, all the "thought" tokens?
Perplexity does reasoning and searching, for $10/mo, so I have a hard time believing that it costs OpenAI 20x as much to do the same thing. Especially if OpenAI's model is really more advanced. But of course, no one except internal teams have all of the information about costs.
Do you also think $40K a year for Hubspot is insane? What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?
The truth is that there are people who value the marginal performance -- if you think it's insane, clearly it's not for you.
>What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?
Those people want to purchase status. Unless they ship you a fancy bow tie and a wine tasting at a wood cabin with your chatgpt subscription this isn't gonna last long.
This isn't about marginal performance, it's an increasingly desperate attempt to justify their spending in a market that's increasingly commodified and open sourced. Gotta convince Microsoft somehow to keep the lights on if you blew tens of billions to be the first guy to make a service that 20 different companies are soon gonna sell for pennies.
They claim unlimited access, but in practice couldn't a user wrap an API around the app and use it for a service? Or perhaps the client effectively throttles use pretty aggressively?
Interesting to compare this $200 pricing with the recent launch of Amazon Nova, which has not-equivalent-but-impressive performance for 1/10th the cost per million tokens. (Or perhaps OpenAI "shipmas" will include a competing product in the next few days, hence Amazon released early?)
> After a while, I started realizing that these mistakes are present in almost all topics.
A fun question I tried a couple of times is asking it to give me a list of famous talks about a topic. Or a list of famous software engineers and the topics they work on.
A couple of names typically exist but many names and basically all talks are shamelessly made up.
If you understood the systems you’re using, you’d know the limitations and wouldn’t marvel at this. Use search engines for searching, calculators for calculating, and LLMs for generating text.
I've actually hit an interesting situation a few times that makes use of this. If some language feature, argument, or configuration option doesn't exist, it will hallucinate one.
The hallucinated name is usually a very good choice for what the option/API should be called.
You're saying the company's product has no value because another company by the same guy produced no value. That is the literal definition of guilt by association: you are judging the ChatGPT product based on the Worldcoin product's value.
As a customer, I don’t care about the people. I’m not interested in either argument by authority (if Altman says it’s good it must be good) or ad hominem (that Altman guy is a jerk, nothing he does can have value).
The actual product. Have you tried it? With an open mind?
Ah, so you're one of the "I separate the art from the artist, so I'm allowed to listen to Kanye" kinda people. I respect that, at least when the product is something of subjective value like art. In this case, 3 months of not buying ChatGPT Pro would afford you the funding to build your own damn AI cluster.
To be honest, it doesn't matter what the price of producing AI is, though. $200/month is, and will be a stupid price to pay because OpenAI already invented a price point with a half billion users - free. When they charged $10/month, at least they weren't taking advantage of the mentally ill. This... this is a grift, and a textbook one at that.
It is true that I separate art from artist. Mostly because otherwise there would be very little art to enjoy.
You don’t sound like you’re very familiar with the chatgpt product. They have about 10m customers paying $20/month. I’m one of them, and I honestly get way more than $200/month value from it.
Perhaps I’m “mentally ill”, but I’d ask you to do some introspection and see if leaping to that characterization is really the best way to explain people who get value where you see none.
Such a silly conclusion to draw based on a gut feeling, and seeing all the comments piggyback on it like it's a given makes me feel like I'm going crazy. How can you all be so certain?
I am a moderately successful software consultant and it is not even 1% of my revenue. So definitely not insane if it delivers the value.
What I doubt though is that it can reach a mass market even in business. A good large high resolution screen is something that I absolutely consider to deliver the value it costs. Most businesses don’t think their employees deserve a 2k screen which will last for 6-10 years and thus costs just a fraction of this offering.
Apparently the majority of businesses don’t believe in marginal gains
I mean this in what I hope will be taken in the most helpful way possible: you should update your thinking to at least imagine that intelligent thoughtful people see some value in ChatGPT. Or alternately that some of the people who see value in ChatGPT are intelligent and thoughtful. That is, aim for the more intelligent "Interesting, why do so many people like this? Where is it headed? Given that, what is worth doing now, and what's worth waiting on?" over the "This doesn't meet my standards in my domain, ergo people are getting scammed."
I'll pay $200 a month, no problem; right now o1-preview does the work for me of a ... somewhat distracted graduate student who needs checking, all for under $1 / day. It's slow for an LLM, but SUPER FAST for a grad student. If I can get a more rarely distracted graduate student that's better at coding for $7/day, well, that's worth a try. I can always cancel.
I think you did make some strong inferences about others when you said "it's a con." But I'm actually not interested in the other people, or defending against an ad hominem attack - I'm comfortable making my own purchase decisions.
My intent was to say "you seem like a smart person, but you seem to have a blind spot here, might benefit you to stay more open minded."
The performance difference seems minor, so this is a great way for the company to get more of its funding from whales versus increasing the base subscription fee.
For programming. I've already signed up for it and it seems quite good (the o1 pro model I mean). I was also running into constraints on o1-preview before so it will be nice to not have to worry about that either. I wish I could get a similar more expensive plan for Claude 3.5 Sonnet that would let me make more requests.
This argument only works in isolation, and only for a subset of people. “Cost of a cup of coffee per day” makes it sound horrifically overpriced to me, given how much more expensive a coffee shop is than brewing at home.
I don’t drink coffee. But even if I did, and I drank it everyday at a coffeehouse or restaurant in my country (which would be significantly higher quality than something like a Starbucks), it wouldn’t come close to that cost.
Not to be glib, but where do you live such that a single cup of coffee runs you seven USD?
Just to put that into perspective.
I also really don't find comparisons like this to be that useful. Any subscription can be converted into an exchange rate of coffee, or meals. So what?
You're attempting to set goal posts for a logical argument, like we're talking about religion or politics, and you've skipped the part about mutually agreeing on definitions. Define what an LLM is, in technical terms, and you will have your answer about why it is not intelligent, and not capable of reasoning. It is a statistical language model that predicts the next token of a plausible response, one token at a time. No matter how you dress it up, that's all it can ever do, by definition. The evidence or data that would change my mind is if instead of talking about LLMs, we were talking about some other technology that does not yet exist, but that is fundamentally different than an LLM.
If we defined "LLM" as "any deep learning model which uses the GPT transformer architecture and is trained using autoregressive next-token prediction", and then we empirically observed that such a model proved the Riemann Hypothesis before any human mathematician, it would seem very silly to say that it was "not intelligent and not capable of reasoning" because of an a-priori logical argument. To be clear, I think that probably won't happen! But I think it's ultimately an empirical question, not a logical or philosophical one. (Unless there's some sort of actual mathematical proof that would set upper bounds on the capabilities of such a model, which would be extremely interesting if true! but I haven't seen one.)
Let's talk when we've got LLMs proving the Riemann Hypothesis (or any mathematical hypothesis) without any proofs in the training data. I'm confident in my belief that an LLM can't do that, and will never be able to. LLMs can barely solve elementary school math problems reliably.
If the cure for cancer arrived to us in the form of the most probable token being predicted one at a time, would your view on the matter change in any way?
In other words, do you have proof that this medium of information output is doomed to forever be useless in producing information that adds value to the world?
These are of course rhetorical questions that you nor anyone else can answer today, but you seem to have a weird sort of absolute position on this matter, as if a lot depended on your sentiment being correct.