
$200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model. In other words, it's a con. I'm a paying Perplexity user, and Perplexity already does this same sort of reasoning. At first it seemed impressive, then I started noticing mistakes in topics I'm an expert in. After a while, I started realizing that these mistakes are present in almost all topics, if you check the sources and do the reasoning yourself.

LLMs are very good at giving plausible answers, but calling them "intelligent" is a misnomer. They're nothing more than predictive models, very useful for some things, but will ALWAYS be the wrong tool for the job when it comes to verifying truth and reasoning.



> In other words, it's a con.

A con like that wouldn't last very long.

This is for people who rely enough on ChatGPT Pro features that it becomes worth it. Whether they pay for it because they're freelance, or their employer does.

Just because an LLM doesn't boost your productivity doesn't mean it doesn't for people in other lines of work. Whether LLMs help you at your work is extremely domain-dependent.


> A con like that wouldn't last very long.

That's not a problem. OpenAI needs to get some cash from its product because the competition from free models is intense. Moreover, since they supposedly used most of the web's content and pirated whatever else they could, improvements in training will likely be only incremental.

All the while, now that the wow effect has passed, more people are starting to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash out now, because later it could be never.


A con? It's not that $200 is a con, their whole existence is a con.

They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.

> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]

Surprise surprise, they just raised the price.

[1] https://www.wheresyoured.at/oai-business/


They haven’t raised the price, they have added new models to the existing tier with better performance at the same price.

They have also added a new, even higher performance model which can leverage test time compute to scale performance if you want to pay for that GPU time. This is no different than AWS offering some larger ec2 instance tier with more resources and a higher price tag than existing tiers.


They haven't raised the price yet but NYT has seen internal documents saying they do plan to.

https://www.nytimes.com/2024/09/27/technology/openai-chatgpt...

> Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.

We'll have to see if the first bump to $22 this year ends up happening.


Reasoning through that from a customer perspective is interesting.

I'm hard pressed to identify any users to whom LLMs are providing enough value to justify $20/month, but not $44.

On the other hand, I can see a lot of people to whom it's not providing any value being unable to afford a higher price.

Guess we'll see which category most OpenAI users are in.


> We'll have to see if the bump to $22 this year ends up happening.

I can't read the article. Any mention of the API pricing?


You're technically right. New models will likely be incremental upgrades at a hefty premium. But considering the money they're losing, this pricing likely better reflects their costs.


They're throwing products at the wall to see what sticks. They're trying to rapidly morph from a research company into a product company.

Models are becoming a commodity. It's game theory. Every second place company (eg. Meta) or nation (eg. China) is open sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (eg. Hunyuan).


AI may be overhyped and it may have flaws (I think it is both)... but it may also be totally worth $200/month to many people. My brother is getting way more value than that out of it, for instance.

So the question is whether it's worth $200/month and to how many people, not whether it's overhyped or whether it has flaws. And whether that supports the level of investment being placed into these tools.


> the competition is intense from free models

Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].

The thing that matters is product.

[1] Llama, QwQ, Mistral, ...

[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.

[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.

[4] The research on world models is expansive and there are lots of open source models and weights in the space.


A better way to express it than a "con" is that it's a price-framing device. It's like listing a watch at an initial value of $2,000 so that people will feel content to buy it at $400.


That sounds like a con to me too.


The line between ‘con’ and ‘genuine value synthesised in the eye of the buyer using nothing but marketing’ is very thin. If people are happy, they are happy.


> A con like that wouldn't last very long.

The NFT market lasted for many years and was enormous.

Never underestimate the power of hype.


I think this is probably right but so far it seems that the areas in which an LLM is most effective do fine with the lower power models.

Example: 4o or Claude are great for coding, summarizing, and rewriting emails. So which domains require a slightly better model?

I suppose if the error rate in code or summary goes down even 10%, it might be worth $180/month.


A few days ago I had an issue with an IPsec VPN behind NAT. I spent a few hours Googling around and tinkering with the system; I had some rough understanding of what was going wrong, but not much, and I had no idea how to solve the issue.

I put a very exhaustive question to ChatGPT o1-preview, including all the information I thought was relevant, something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than I did.

I was ashamed but at the same time that's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.


It is really great when it works, but the challenge is that I've sometimes had it not understand a detailed programming question and confidently give an incorrect answer. Going back and forth a few times makes it clear it really doesn't know the answer, but I end up going in circles. I know LLMs can't really tell you "sorry, I don't know this one", but I wish they could.


The exhaustive question makes ChatGPT reconstruct your answer in real-time, while all you need to do is sleep; your brain will construct the answer and deliver it tomorrow morning.


The benefit of getting an answer immediately rather than tomorrow morning is why people are sometimes paid more for on-call rates rather than everyone being 9-5.

(Now that I think of the idiom: when did we switch to 9-6? I've never had a 9-5.)


I bet users won't pay for the power, but for a guarantee of access! I always hear about people running out of compute time for ChatGPT. The obvious answer is to charge more for a higher quality of service.


> A con like that wouldn't last very long.

Bernie Madoff ran his investment fund as a Ponzi scheme for over a decade (perhaps several decades)


Imo the con is picking the metric that makes others look artificially bad when it doesn't seem to be all that different (at least on the surface)

> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one

This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
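To make the effect concrete, a back-of-the-envelope sketch (my own, assuming independent attempts with per-question success probability p): pass@1 scores roughly p, while 4/4 reliability scores p^4.

    # How a 4/4-reliability metric deflates scores versus a single attempt,
    # assuming independent attempts with success probability p.
    for p in (0.9, 0.8, 0.5):
        print(f"p={p:.1f}  pass@1={p:.2f}  4/4={p**4:.2f}")
    # p=0.9  pass@1=0.90  4/4=0.66
    # p=0.8  pass@1=0.80  4/4=0.41
    # p=0.5  pass@1=0.50  4/4=0.06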


> ... or their employer does.

I suspect this is a key driver behind having a higher priced, individual user offering. It gives pricing latitude for enterprise volume licenses.


Ok.

Let's say I run a company called AndSoft.

AndSoft has about 2000 people on staff, maybe 1000 programmers.

This solution will cost $200k per month, or $2.4 million per year.

Llama 3 is effectively free, with some liberation. Is ChatGPT Pro $2.4 million a year better than Llama 3? Of course OpenAI will offer volume discounts.

I imagine if I was making north of 500k a year I'd subscribe as a curiosity... At least for a few months.

If your time is worth $250 an hour and this saves you an hour per month, it's well worth it.


> A con like that wouldn't last very long

As someone who has repeatedly written that I value the better LLMs as I would a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:

I've seen quite a few cases where expensive non-functional things keep making money even after experts demonstrate that they don't work.

My mum was very fond of homeopathic pills and Bach flower tinctures, for example.

* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.


Overcharging for a product to make it seem better than it really is has served Apple well for decades.


That's a tired trope that simply isn't true.

Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.

The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.


> consistent repair support

Now that is a trope when you're talking about Apple. They may use more premium materials and have a degree of improved construction leveraging those materials, but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing they existed.

I guess I don't follow the claim that the "Apple premium" (whether real or otherwise) isn't a factor in buyers' decisions. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?


I suspect they're saying that for a lot of us, Apple provides enough value compared to the competition that we buy them despite the premium prices (and, on iOS, the lock-in).

It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.

And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.

Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.


They only have to be consistently better than the competition, and they are, by far. I always look for reviews before buying anything, and even then I've been nothing but disappointed by the likes of Razer, LG, Samsung, etc.


I used to love to bash on Apple too. But ever since I’ve had the money all my hardware (except desktop PC) has been apple.

There’s something to be said for buying something and knowing it will interoperate with all your other stuff perfectly.


> consistent repair support

The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.

People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.


Here in Brazil Apple is very much all about showing off how rich you are. Especially since we have some of the most expensive Apple products in the world.

Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.


Same in Kazakhstan. It's all about status. Many poor people take out credit to buy iPhones because they want to look rich.


Apple products are expensive — sometimes to a degree that almost seems to be taking the piss.

But name one other company whose hardware truly matches Apple’s standards for precision and attention to detail.


Indeed


>Whether LLMs help you at your work is extremely domain-dependent.

I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.


That opinion made sense two years ago. It's plain weird to still hold it today.


There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.

EDIT: Added links.

https://www.cio.com/article/3540579/devs-gaining-little-if-a...

https://web.archive.org/web/20241205204237/https://llmreport...

(Archive link because the llmreporter site seems to have an expired TLS certificate at the moment.)

No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance...


I recently slapped 3 different 3-page SQL statements and their obscure errors (no line or context references) from Redshift into Claude, and it was 3 for 3 on telling me where in my query I was messing up. It saved me probably 5 minutes each time, but it really saved me from moving to a different task and coming back. So around $100 in value right there. I was impressed by it. I wish the query UI I was using just auto-ran it when I got an error. I should code that up as an extension.


$100 to save 15 minutes implies that you net at least $800,000 a year. Well done if so!


When forecasting developer and employee costs for a company I double their pay, but I'm not going to say what I make or whether I did that here. I also like to think that developers should be working on work that has many multiples of leverage over their pay to be effective. But thanks.


> but really saved me from moving to a different task and coming back

You missed this part. Being able to quickly fix things without deep thought while in flow saves you from the slowdowns of context switching.


That $100 of value likely cost them more like $0.10-$1 in API costs.


It didn't cost me anything, my employer paid for it. Math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in api costs. I can see justifying spending $200/month for devs actively using a tool like this.


I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy, since when it gives an answer it's easy to test whether it's right or not. If it gets it wrong, I work with Claude to get to the right answer.


First of all, that's moving the goalposts to the next state over, relative to what I replied to.

Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to discussion about SOTA LLMs - it's like evaluating performance of an excavator by giving 1:10 toy excavators models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.

Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).

Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).

--

[0] - Per the article you linked; I'm yet to find and read the actual study itself.


Do you have a link? I'm not finding it by searching.


I really need the source of this.


LLMs have become indispensable for many attorneys. I know many other professionals that have been able to offload dozens of hours of work per month to ChatGPT and Claude.


What on earth is this work that they're doing that's so resilient to the fallible nature of LLMs? Is it just document search with a RAG?


Everything. Drafting correspondence, pleadings, discovery, discovery responses. Reviewing all of the same. Reviewing depositions, drafting deposition outlines.

Everything that is “word processing,” and that’s a lot.


Well that's terrifying. Good luck to them.


To be honest, much of contract law is formal boilerplate. I can understand why they'd want to move their role to 'review' instead of 'generate'


So, instead of fixing the issue (legal documents becoming a barely manageable mess) they’re investing money into making it… even worse?

This world is so messed up.


Arguably the same problem occurs in programming: anything so formulaic and common that an LLM can regurgitate it with a decent level of reliability... is something that ought to have been folded into a method/library already.

Or it already exists in some howto documentation, but nobody wanted to skim the documentation.


They have no lever with which to fix the issue.


Why not just move over to forms with structured input?


As a customer of legal work for 20 years, it is also way (way, way) faster and cheaper to draft a contract with Claude (total work ~1 hour, even with complex back-and-forth; you don't want to try to one-shot it in a single prompt) and then pay a law firm their top dollar-per-hour consulting rate to review/amend the contract (you can get to the final version in a day).

Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.


Good law firms won't charge you for using their boilerplates, only for the time to customize them for your use case.

I always ask our lawyer whether or not they have a boilerplate when I need a contract written up. They usually do.


That's interesting. I've never had a law firm be straightforward about the (obvious) fact they'll be using a boilerplate.

I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)

Do you ask about the boilerplate before or after you ask for a quote?


I typically don’t ask for a quote upfront since they are very fair with their business and billing practices.

I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.

I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.

Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.


Yeah, the industries LLMs will disrupt the most are the ones that gatekeep busywork. SWE falls into this to some degree, but other professions are more guilty than us. They don't replace intelligence; they just surface jobs that never really required much intelligence to begin with.


I bet they still charge for all the hours though.


I use LLMs to do most of my donkey work.


Maybe not very long, but long enough is plausible.


HN has been just such an awful place to discuss AI. Everyone here is convinced its a grift, a con, and we're all "marks"

Just zero curiosity, only skepticism.


If you do a lot of work in an area that o1 is strong in - $200/month effectively rounds down to $0 - and a single good answer at the right time could justify that entire $200 in a single go.


I feel like a single bad answer at the wrong time could cost a heck of a lot more than $200. And these LLMs are riddled with bad answers.


Think of it as an intern. Don't trust everything they say.


It's so strange to me that in a forum full of programmers, people don't seem to understand that you set up systems to detect errors before they cause problems. That's why I find ChatGPT so useful for helping me with programming - I can tell if it makes a mistake because... the code doesn't do what I want it to do. I already have testing and linting set up to catch my own mistakes, and those things also catch AI's mistakes.


Thank you! I always feel weird actually using ChatGPT without any major issues while so many people keep claiming how awful it is; it's like people want it 100% perfect or nothing. For me, if it gets me 80% of the way there in 1/10th the time, and then I do the final 20%, that's still a heck of a productivity boost, basically for free.


Yep, I’m with you. I’m a solo dev who never went to college… o1 makes far fewer errors than I do! No chance I’d make it past round one of any sort of coding tournament. But I managed to bootstrap a whole saas company doing all the coding myself, which involved setting up a lot of guard rails to catch my own mistakes before they reached production. And now I can consult with a programming intelligence the likes of which I could never afford to hire if it was a person. It’s amazing.


Is it working?


Not sure what you're referring to exactly. But broadly yes it is working for me - the number of new features I get out to users has sped up greatly, and stability of my product has also gone up.


Are you making money with your saas idea?


Yep, been living off it for nine years now


Congratulations! That is not an easy task. I am just starting the journey.


Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.

What if your compiler only generated the right code 99% of the time? Or, if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?


> Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.


Using a 90/10 split: the first 90% of the work takes 10% of the time, and cutting that portion to a tenth of itself saves 9% of total time.

160 hours a month * $100/hr programmer * 9% = $1,440 savings, easily enough to justify $200/month.

Even if it fails 1/10th of the time, that is still ~8%, or ~$1,300 in savings.
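Spelled out (a back-of-the-envelope sketch of the assumptions above, not real billing data):

    # 90/10 split: the first 90% of the work takes 10% of the time.
    hours, rate = 160, 100             # hours/month, $/hr
    easy_time = 0.10                   # share of total time for the "easy 90%"
    kept = 0.10                        # LLM shrinks that portion to a tenth
    saved = easy_time * (1 - kept)     # = 0.09, i.e. 9% of total time
    print(hours * rate * saved)        # 1440.0 -> $/month saved
    print(hours * rate * saved * 0.9)  # 1296.0 -> with a 1-in-10 failure rate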


Does that count the time you spend on prompt engineering?


It depends what you’re doing.

For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.


I'd love to hear what that is.

So far, given my range of projects, I have seen it struggle with lower level mobile stuff and hardware (ESP32 + BLE + HID).

For things like web (front/back), DB, video games (web and Unity), it does work pretty well (at least 80% there on average).

And I'm talking of the free version, not this $200/mo one.


Well, that is a very specific set of skills. I bet the C-suite loves it.


> I always feel weird actually using ChatGPT without any major issues while so many people keep claiming how awful it is;

People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.


Well now at $200 it's a little farther away from free :P


What do you mean? ChatGPT is free, the Pro version isn't.

I'm talking of the generally available one, haven't had the chance to try this new version.


I could buy a car for that kind of money!


Of course, but for every thoroughly set up TDD environment, you have a hundred other people just blindly copy pasting LLM output into their code base and trusting the code based on a few quick sanity checks.


You assume programming software with an existing well-defined and correct test suite is all these will be used for.


>I can tell if it makes a mistake because... the code doesn't do what I want it to do

Sometimes it does what you want it to do, but still creates a bug.

Asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote some code that worked, but it did not address the fact that S3 delivers objects in pages of max 1000 items, so if the bucket contained less than 1000 objects (typical when first starting a project), things worked, but if the bucket contained more than 1000 objects (easy to do on S3 in a short amount of time), then that would be a subtle but important bug.
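For reference, a minimal sketch of the corrected listing, assuming boto3 and a hypothetical bucket name; the paginator follows the continuation tokens, which is exactly the step the generated code skipped:

    import boto3

    s3 = boto3.client("s3")

    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # keeps requesting pages via continuation tokens until the bucket
    # is exhausted, so buckets with more than 1000 objects are handled.
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket="my-bucket"):  # bucket name is a placeholder
        keys.extend(obj["Key"] for obj in page.get("Contents", []))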

Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.

I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.


So the AI wrote a bug; but if humans wouldn’t catch it in code review, then obviously they could have written the same bug. Which shouldn’t be surprising because LLMs didn’t invent the concept of bugs.

I use LLMs maybe a few times a month but I don’t really follow this argument against them.


Code reviewing is not the same thing as writing code. When you're writing code you're supposed to look at the documentation and do some exploration before the final code is pushed.

It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.

Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.


it also catches MY mistakes, so that saves time


So true, and people seem to gloss over this fact completely. They only talk about correcting the LLM's code while the opposite is much more common for me.


I would hesitate to hire an intern that makes incorrect statements with maximum confidence and with no ability to learn from their mistakes.


When you highlight only the negatives, yeah, it does sound like no one should hire that intern. But what if the same intern happens to have an encyclopedia for a brain and can pore over massive documents and codebases to spot and fix countless human errors in a snap?

There seems to be two camps: People who want nothing to do with such flawed interns - and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed, yet powerful interns. I'm choosing to be in the latter camp.


Those are fair points, I didn't mean to imply that there are only negatives, and I don't consider myself to be in the former camp you describe as wanting nothing to do with these "interns". I shouldn't have stuck with the intern analogy at all since it's difficult for me to compare the two, with one being fairly autonomous and the other being totally reliant on a prompter.

The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.


nothing chatgpt says is with maximum confidence. the EULA and terms of use are riddled with "no guarantee of accuracy" and "use at own risk"


No, they're right. ChatGPT (and all chatbots) responds confidently while making simple errors. Disclaimers at signup or in tiny corner text are completely at odds with the actual chat experience.


What I meant to say was that the model uses the verbiage of a maximally confident human. In my experience the interns worth having have some sense of the limits of their knowledge and will tell you "I don't know" or qualify information with "I'm not certain, but..."

If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk." That wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.


I think the point is that an LLM almost always responds with the appearance of high confidence. It will much quicker hallucinate than say "I don't know."


And we, as humans, are having a hard time compartmentalizing and forgetting our lifetimes of language cues, which typically correlate with attention to detail, intelligence, time investment, etc.

New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that; it's expensive.)

Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...


They aren't talking about EULAs. It's how they give out their answers.


If I have to do the work to double-check all the answers, why am I paying $200?


Why do companies hire junior devs? You still want a senior to review the PRs before they merge into the product right? But the net benefit is still there.


We hire junior devs as an investment, because at some point they turn into seniors. If they stayed juniors forever, I wouldn't hire them.


I started incorporating LLMs into my workflows around the time gpt-3 came out. By comparison to its performance at that point, it sure feels like my junior is starting to become a senior.


Are you implying this technology will remain static in its capabilities going forward despite it having seen significant improvement over the last few years?


No, I'm explicitly saying that gpt-4o-2024-11-20 won't get any smarter, no matter how much I use it.


Does that matter when you can just swap it for gpt-5-whatever at some point in the future?


Someone asked why I hire juniors. I said I hire juniors because they get better. I don't need to use the model for it to get better, I can just wait until it's good and use it then. That's the argument.


I suppose the counterargument would be your investment in OpenAI allows them to fund the better model down the road, but I get your drift :)


Genuinely curious, are you saying that your junior devs don't provide any value from the work they do?


They provide some value, but between the time they take in coaching, reviewing their work, support, etc, I'm fairly sure one senior developer has a much higher work per dollar ratio than the junior.


Because double-checking and occasionally hitting retry is still 10x faster than doing it myself.


Because you wouldn't have come up with the correct answer before you used up 200 dollars worth of salary or billable time.


because checking the work is much faster than generating it.


Because it's per month and not per hour for a specialist consultant.


I don't know anyone who does something and first says, "This will be a mistake." Maybe they say, "I am pretty sure this is the right thing to do," and then they make a mistake.

If it's easier mentally, just put that second sentence in front of every ChatGPT answer.

Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.


Yeah, but you personally don't pay $200/month out of your pocket for the intern. Heck in Canada, govt. actually rebates for hiring interns and co-ops.


Then the lesson you have learned is “don’t blindly trust the machine”

Which is a very valuable lesson, worth more than $200


Easy - don't trust the answers. Verify them


Even in this case, losing $200 + whatever vs. a tiny bit higher chance of losing $20 + whatever makes Pro seem a good deal.


Doesn't that completely depend on those chances and the magnitude of +whatever?

It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.

If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.


Compared to knowing things and not losing whatever, both are pretty bad deals.


What specific use cases are you referring to where that poses a risk? I've been using LLMs for years now (both directly and as part of applications) and can't think of a single instance where the output constituted a risk or where it was relied upon for critical decisions.


That's why you have a human in the loop responsible for the answer.


Presumably, this is what they want the marks buying the $200 plan to think. Whether it's actually capable of providing answers worth $200 and not just sweet talking is the whole question.


If I'm happy to pay $20 in retirement just for the odd bit of writing help, then I can easily imagine it being worth $200 to someone with a job.


Yep. I’m currently paying for both Claude and chatgpt because they’re good at different things. I can’t tell whether this is extremely cheap or expensive - last week Claude saved me about a day of time by writing a whole lot of very complex sql queries for me. The value is insane.


Yeah, as someone who is far from programming, the amount of time and money it has saved me by helping me write SQL queries and PHP code for WordPress is insane. It even helped me fix some WordPress plugins that had errors: you just copy-paste or even screenshot those errors until they get fixed! Used correctly and efficiently, the value is insane. I would say $20, even $200, is still cheap for such an amazing tool.


The problem isn't whether ChatGPT Pro can save you $200/mo (for most programmers it can.)

The problem is whether it can save you $180/mo more than Claude does.


I kind of feel this is a kick in the face.

Now I'll forever be using a second rate model because I'm not rich enough.

If I'm stuck using a second rate model I may go find someone else's model to use.


> In other words, it's a con. I'm a paying Perplexity user

I love this back-to-back pair of statements. It is like “You can never win three card monte. I pay a monthly subscription fee to play it.”


I pay $10/month for perplexity because I fully understand its limitations. I will not pay $200/month for an LLM.


I am CERTAIN you do not FULLY understand its limitations.


mkay


yeah, that's what i thought.


Wouldn't you say the same thing about most people? Most people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.

I think LLMs are at least more receptive to the idea that they may be wrong, and based on that, we can have N diverse LLMs argue more peacefully and build a more reliable consensus than N "intelligent" people.


The difference between a person and a bot is that a person has a stake in the outcome. A bot is like a person who's already put in their two weeks notice and doesn't have to be there to see the outcome of their work.


That’s still amazing quality output for someone working for under $1/hour?


It's not obvious that one should prefer that, versus not having that output at all.


Why does that matter?

Even if it was a consensus opinion among all HN users, which hardly seems to be the case, it would have little impact on the other billion plus potential customers…


The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure. LLMs, by default, seem to be extremely confident in their answers, and it's quite hard to get the "confidence" level out of them (if that metric is even applicable to LLMs). That's why they are so good at duping people into believing them after all.


> The issue is that most people, especially when prompted, can provide their level of confidence in the answer or even refuse to provide an answer if they are not sure.

People also pull this figure out of their ass, over or undertrust themselves, and lie. I'm not sure self-reported confidence is that interesting compared to "showing your work".


How is this a counter argument that LLMs are marketed as having intelligence when it’s more accurate to think of them as predictive models? The fact that humans are also flawed isn’t super relevant to a $200/month LLM purchasing decision.


Intelligent people will know they made a mistake if given a hint, and will figure out what went wrong.

An LLM will just pretend to care about the error and happily repeat it over and over.


> Wouldn't you say the same thing for most of the people? Most of the people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.

I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.

I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?

Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:

    services:
      app:
        # ...
        environment:
          - 'DB_HOST=mysql:3306'
        # ...
      mariadb:
        image: linuxserver/mariadb
        container_name: mariadb
        environment:
          - PUID=1000
          - PGID=1000
          - MYSQL_ROOT_PASSWORD=ROOT_ACCESS_PASSWORD
          - TZ=Europe/London
        volumes:
          - /home/user/appdata/mariadb:/config
        ports:
          - 3306:3306
        restart: unless-stopped
Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.
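The minimal fix, as a sketch of my own (not from the LinuxServer docs): drop the 'ports:' mapping and let the app reach the database over the stack's internal network:

      mariadb:
        image: linuxserver/mariadb
        # ...same environment and volumes as above...
        # no "ports:" entry; other services in this compose file can still
        # reach the container by service name on the internal network
        restart: unless-stopped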

Another Docker-related example would be a Dockerfile using 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Omitting it allows silent failures of the 'update' command, so it's possible to build containers with stale package versions if a transient error affects the 'update' command while the subsequent 'install' command succeeds.
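For illustration, a hedged Dockerfile fragment (the package is just an example):

    # --error-on=any makes `apt-get update` fail the build on any index
    # fetch error, instead of continuing with stale package lists that a
    # later `install` would silently use.
    RUN apt-get update --error-on=any \
        && apt-get install -y --no-install-recommends curl \
        && rm -rf /var/lib/apt/lists/*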

There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.

How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?

1. https://docs.linuxserver.io/general/docker-compose/#multiple...


I asked ChatGPT 4o if there's anything that can be improved in your docker-compose file. Among other (seemingly sensible) suggestions, it offered:

## Restrict Host Ports for Security

If app and mariadb are only communicating internally, you can remove 3306:3306 to avoid exposing the port to the host machine:

    ports:
      - 3306:3306  # Remove this unless external access is required.

So, apparently, ChatGPT doesn't need any more convincing.


Here GPT is saying the port is only exposed to the host machine (i.e., localhost), rather than the full local network.


Wow. I can honestly say I'm surprised it makes that suggestion. That's great!

I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?

I know how I do it. I read the Docker docs, I see that I don't think publishing that port is needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.


I suspect there is a solid corpus of advice online that mentions the exposed-ports risk, alongside the flawed examples you mentioned. A narrow request will trigger the right response. That's why LLMs still require a basic understanding of what exactly you plan to achieve.


Yeah, most people suck at verifying truth and reasoning. But most information technology employees, above intern level, are highly capable of reasoning and making decisions in their area of expertise.

Try asking an LLM complex questions in your area of expertise. Interview it as if you needed to be confident that it could do your job. You'll quickly find out that it can't do your job, and isn't actually capable of reasoning.


> they may argue more peacefully

bit of a stretch.


I would pay $200 for GPT-4o. Since GPT-4, ChatGPT has been absolutely necessary for my work and for my life. It changed every workflow, like Google once did. I'm paying $20 to remove ads from YouTube, which I watch maybe once a week, so $20 for ChatGPT was a steal.

That said, my "issue" might be that I usually work alone and I don't have anyone to consult with. I can bother people on forums, but these days forums are pretty much dead and full of trolls, so that's not very useful. ChatGPT is the thing that allows me to progress in this environment. If you work at Google and can ask Rob Pike about something, you probably don't need ChatGPT as much.


This is more or less my take too. If tomorrow both Claude and ChatGPT became $200/month, I would still pay. The value they provide me far, far exceeds that. So many cynics in this thread.


You don't have to be a cynic to be annoyed with a $200/month price. Just make a normal amount of money.


It’s like hiring an assistant. You could hire one for 60k/year. But you wouldn’t do it unless you knew how the assistant could help you make more than 60k per year. If you don’t know what to do with an employee then don’t hire them. If you don’t know what to do with expensive ai, don’t pay for it.


> $200 a month for this is insane, but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model.

Is it possible that they have subsidized the infrastructure for free and paid users and they realized that OpenAI requires a higher revenue to maintain the current demand?


Yes, it's entirely possible that they're scrambling to make money. That doesn't actually increase the value that they're offering though.


> $200 a month for this is insane

Losing $5-10b per year also is insane. People are still looking for the added value, it's been 2 whole years now


$200 a month is potentially a bargain since it comes with unlimited advanced voice. Via the API, $200 used to only get you 14 hours of advanced voice.


I've got unlimited "advanced voice" with Perplexity for $10/mo. You're defining a bargain based on the arbitrary limits set by the company offering you said bargain.


The advanced voice of ChatGPT is miles ahead of the Perplexity one. I subscribe to both.


Does it give unlimited API access though?


No (naturally). But my thought process is that if you use advanced voice even half an hour a day, it's probably a fair price based on API costs. If you use it more, for something like language learning or entertaining kids who love it, it's potentially a bargain.
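Rough arithmetic using the 14-hours-per-$200 API figure mentioned upthread (a sketch, not OpenAI's actual pricing):

    # ~$200 buys ~14 hours of advanced voice via the API (figure above).
    api_rate = 200 / 14                        # ~$14.3 per hour
    hours_per_month = 0.5 * 30                 # half an hour a day
    print(round(api_rate * hours_per_month))   # ~214: already above the $200 sub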


you'll be throttled and rate limited


Is it insane? It's the cost of a new laptop every year. There are about as many people who won't blink at that among practitioners in our field as people who will.

I think the ship has sailed on whether GPT is useful or a con; I've lost track of people telling me it's their first search now rather than Google.

I'd encourage skeptics who haven't read this yet to check out Nicholas' post here:

https://news.ycombinator.com/item?id=41150317


> It's the cost of a new laptop every year.

It's the cost of a new, shiny, Apple laptop every year.


If a model is good enough (I'm not saying this one is at that level) I could imagine individuals and businesses paying $20,000 a month. If they're answering questions at a PhD level (again, not saying this one is) then for a lot of areas this makes sense.


Let me know when the models are actually, verifiably, this good. They're barely good enough to replace interns at this point.


Let me know where you can find people that are individually capable at performing at intern level in every domain of knowledge and text-based activity known to mankind.

"Barely good enough to replace interns" is worth a lot to businesses already.

(On that note, a founder of a SAP competitor and a major IT corporation in Poland is fond of saying that "any specialist can be replaced by a finite number of interns". We'll soon get to see how true that is.)


Cześć!

Since when does SAP have competitors? ;-P

A friend of mine claims most research is nowadays done by undergraduates because all senior folks are too busy.


postdocs but yeah


Let me know what kind of intern you can keep around 24/7 for a total monthly outlay of $200, and then we can compare notes.


Probably one from the Philippines.


Not 24/7.


And probably not one that can guess (often poorly, but at least sometimes quite well, and usually at least very much in the right direction) about everything from nuances of seasoning taco meat to particle physics, and do so in ~an instant.

$200 seems pretty cheap for a 24/7 [remote] intern with these abilities. That kind of money doesn't even buy a month's worth of Big Macs to feed that intern with.

It just seems like a lot (or even absurd) for a subscription to a service on teh Interweb, akin to "$200 for access to a web site? lolwut?"


If true, $2,400/y isn't bad for a 24/7/365 intern.


My main concern with $200/mo is that, as a software dev using foundational LLMs to learn and solve problems, I wouldn't get that much incremental value over the $20/mo tier, which I'm happy to pay for. They'd have to do a pretty incredible job at selling me on the benefits for me to pay 10x the original price. 10x for something like a 5% marginal improvement seems sus.


> but I have a feeling that part of the reason they're charging so much is to give people more confidence in the model

Or each user doing an o1 model prompt is probably like, really expensive and they need to charge for it until they can get cost down? Anybody have estimates on what a single request into o1 costs on their end? Like GPU, memory, all the "thought" tokens?


Perplexity does reasoning and searching, for $10/mo, so I have a hard time believing that it costs OpenAI 20x as much to do the same thing. Especially if OpenAI's model is really more advanced. But of course, no one except internal teams have all of the information about costs.


Do you also think $40K a year for Hubspot is insane? What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?

The truth is that there are people who value the marginal performance -- if you think it's insane, clearly it's not for you.


>What about people who pay $1k in order to work on a field for 4 hours hitting a small ball with a stick?

Those people want to purchase status. Unless they ship you a fancy bow tie and a wine tasting at a wood cabin with your chatgpt subscription this isn't gonna last long.

This isn't about marginal performance, it's an increasingly desperate attempt to justify their spending in a market that's increasingly commodified and open sourced. Gotta convince Microsoft somehow to keep the lights on if you blew tens of billions to be the first guy to make a service that 20 different companies are soon gonna sell for pennies.


I'm extremely excited because this margin represents opportunity for all the other LLM startups.


Their demo video was uploading a picture of a birdhouse and asking how to build it


I would say using the performance of Perplexity as a benchmark for the quality of o1-pro is a stretch.


Find third party benchmarks of the relevant models and then this discussion is worth having. Otherwise, it's just speculation.


They claim unlimited access, but in practice couldn't a user wrap an API around the app and use it for a service? Or perhaps the client effectively throttles use pretty aggressively?

Interesting to compare this $200 pricing with the recent launch of Amazon Nova, which has not-equivalent-but-impressive performance for 1/10th the cost per million tokens. (Or perhaps OpenAI "shipmas" will include a competing product in the next few days, hence Amazon released early?)

See e.g.: https://mastodon.social/@mhoye/113595564770070726


> After a while, I started realizing that these mistakes are present in almost all topics.

A fun question I tried a couple of times is asking it to give me a list of famous talks about a topic, or a list of famous software engineers and the topics they work on.

A couple of the names typically exist, but many names and basically all of the talks are shamelessly made up.


If you understood the systems you’re using, you’d know the limitations and wouldn’t marvel at this. Use search engines for searching, calculators for calculating, and LLMs for generating text.


Whenever I’ve used ChatGPT for this exact thing it has been very accurate and didn’t make up anyone


I've actually hit an interesting situation a few times that makes use of this. If some language feature, argument, or configuration option doesn't exist, it will hallucinate one.

The hallucinated name is usually a very good choice for what the option/API should be called.


I've seen this before and it's frustrating to deal with chasing phantom APIs it invents.

I wish it could just say "There is not a good approximation of this API existing - I would suggest reviewing the following docs/sources:....".


I’d like to see more evidence that it’s a scam than just your feelings. Any data there?

I certainly don’t see why mere prediction can’t validate reasoning. Sure, it can’t do it perfectly all the time, but neither can people.


> I’d like to see more evidence that it’s a scam

Have you been introduced to their CEO yet? 5 minutes of Worldcoin research should assuage your curiosity.



So you’ve got feelings and guilt by association. And I’ve got a year of using ChatGPT, which has saved tens to hundreds of hours of tedious work.

Forgive me for not finding your argument persuasive.


Guilt by association? It's literally the same guy.


You’re saying the company’s product has no value because another company by the same guy produced no value. That is the literal definition of guilt by association: you are judging the chatgpt produced based on the worldcoin product’s value.

As a customer, I don’t care about the people. I’m not interested in either argument by authority (if Altman says it’s good it must be good) or ad hominem (that Altman guy is a jerk, nothing he does can have value).

The actual product. Have you tried it? With an open mind?


Ah, so you're one of the "I separate the art from the artist, so I'm allowed to listen to Kanye" kinda people. I respect that, at least when the product is something of subjective value like art. In this case, 3 months of not buying ChatGPT Pro would afford you the funding to build your own damn AI cluster.

To be honest, it doesn't matter what the price of producing AI is, though. $200/month is, and will remain, a stupid price to pay, because OpenAI already anchored a price point with half a billion users: free. When they charged $20/month, at least they weren't taking advantage of the mentally ill. This... this is a grift, and a textbook one at that.


It is true that I separate art from artist. Mostly because otherwise there would be very little art to enjoy.

You don’t sound like you’re very familiar with the chatgpt product. They have about 10m customers paying $20/month. I’m one of them, and I honestly get way more than $200/month value from it.

Perhaps I’m “mentally ill”, but I’d ask you to do some introspection and see if leaping to that characterization is really the best way to explain people who get value where you see none.


> In other words, it's a con.

Such a silly conclusion to draw from a gut feeling, and seeing all the comments piggyback on it like it's a given makes me feel like I'm going crazy. How can you all be so certain?


You don't have to be certain to be skeptical. But you should definitely be certain before you buy.


I am a moderately successful software consultant, and $200/month is not even 1% of my revenue. So it's definitely not insane if it delivers the value.

What I doubt, though, is that it can reach a mass market even in business. A good, large, high-resolution screen is something I consider to absolutely deliver the value it costs, yet most businesses don't think their employees deserve a 2k screen, even though it will last 6-10 years and thus costs just a fraction of this offering.

Apparently the majority of businesses don't believe in marginal gains.


I mean this in what I hope will be taken in the most helpful way possible: you should update your thinking to at least imagine that intelligent thoughtful people see some value in ChatGPT. Or alternately that some of the people who see value in ChatGPT are intelligent and thoughtful. That is, aim for the more intelligent "Interesting, why do so many people like this? Where is it headed? Given that, what is worth doing now, and what's worth waiting on?" over the "This doesn't meet my standards in my domain, ergo people are getting scammed."

I'll pay $200 a month, no problem; right now o1-preview does the work for me of a ... somewhat distracted graduate student who needs checking, all for under $1 / day. It's slow for an LLM, but SUPER FAST for a grad student. If I can get a more rarely distracted graduate student that's better at coding for $7/day, well, that's worth a try. I can always cancel.


[flagged]


I think you did make some strong inferences about others when you said "it's a con." But I'm actually not interested in the other people, or defending against an ad hominem attack - I'm comfortable making my own purchase decisions.

My intent was to say "you seem like a smart person, but you seem to have a blind spot here, might benefit you to stay more open minded."


Look in the mirror and read your comment back.


Could be a case of price discrimination [1], and a way to fuel the hype.

[1] https://www.investopedia.com/terms/p/price_discrimination.as...


target market is probably people who will write it off as a business expense


The performance difference seems minor, so this is a great way for the company to get more of its funding from whales versus increasing the base subscription fee.


Couldn't disagree more. I will be signing up for this as soon as I can; it's a complete no-brainer.


What will you be using it for? Where do you think you'll see the biggest benefit over the cheaper plan?


For programming. I've already signed up for it, and it seems quite good (the o1 pro model, I mean). I was also running into constraints on o1-preview before, so it will be nice to not have to worry about that either. I wish I could get a similar, more expensive plan for Claude 3.5 Sonnet that would let me make more requests.


The mega disappointment is that o1 is performing worse than o1-preview [1], and Claude 3.6 had already nearly caught up to o1-preview.

1. https://x.com/nrehiew_/status/1864763064374976928


Considering no one makes money in AI, maybe this is just economics.


Is $200 a lot if you end up using it quite often?

It makes me wonder why they don't want to offer a usage-based pricing model.

Is it because people really believe it would make for a much worse product offering?

Why not offer some of the same capability as pay-per-use?
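
For what it's worth, the pay-per-use math is easy to sketch. The rates below are made-up placeholders, not published prices, but they show how a metered plan could come in under or over the flat $200 depending on usage:

    # Hypothetical metered pricing - the rate is an assumed placeholder,
    # NOT a published OpenAI price.
    price_per_million_tokens = 60.00   # assumed USD rate for a premium model
    tokens_per_day = 100_000           # assumed heavy daily usage
    monthly_cost = 30 * tokens_per_day * price_per_million_tokens / 1_000_000
    print(f"${monthly_cost:.2f}/month")  # $180.00 under these assumptions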


I'm signing up when I get home tonight.


Remember the whole "how many r's in strawberry" thing?

Yeah, not really fixed: https://imgur.com/a/counting-letters-with-chatgpt-7cQAbu0
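
Which is unsurprising once you remember the model sees tokens, not characters, so it never directly observes the letters it's asked to count. Counting, by contrast, is deterministic string processing that a one-liner gets right every time:

    # Letter counting is plain string processing - no model needed.
    print("strawberry".count("r"))  # 3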


Exactly what I thought. People falsely equate high price with high quality. With the $200, you're basically just donating to their cloud bill.


it's literally the cost of a cup of coffee per day


This argument only works in isolation, and only for a subset of people. “Cost of a cup of coffee per day” makes it sound horrifically overpriced to me, given how much more expensive a coffee shop is than brewing at home.


Or the price of replacing your espresso machine on a monthly basis.


When you put it this way, I think I need to finally buy that espresso machine.


In America. If you drink your coffee from coffee shops.


> it's literally the cost of a cup of coffee per day

So, the AI market is capped by Starbucks' revenue/valuation.


I don’t drink coffee. But even if I did, and I drank it everyday at a coffeehouse or restaurant in my country (which would be significantly higher quality than something like a Starbucks), it wouldn’t come close to that cost.


I pay $1.50 USD per day on my coffee, and I'm an extreme outlier: I buy speciality beans from mom-and-pop roasters.


Not if you make coffee at home.


Maybe in an expensive coffee shop in the USA.

In Italy, an espresso is ca. 1€.


Or an avocado toast.


Not to be glib, but where do you live such that a single cup of coffee runs you seven USD?

Just to put that into perspective.

I also really don't find comparisons like this to be that useful. Any subscription can be converted into an exchange rate of coffee, or meals. So what?


You're right - at my coffee shop a cup of coffee is nine dollars.


Yeah but the coffee makes you more productive


What evidence or data, if you (hypothetically) saw it, do you think would disprove the thesis that "[LLMs] will ALWAYS be the wrong tool for the job"?


You're attempting to set goalposts for a logical argument, as if we were talking about religion or politics, and you've skipped the part where we mutually agree on definitions. Define what an LLM is, in technical terms, and you will have your answer about why it is not intelligent and not capable of reasoning.

It is a statistical language model that predicts the next token of a plausible response, one token at a time. No matter how you dress it up, that's all it can ever do, by definition. What would change my mind is if, instead of talking about LLMs, we were talking about some other technology that does not yet exist and is fundamentally different from an LLM.
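
Concretely, the entire generation procedure is just this loop. A minimal sketch in Python; `model` and `tokenizer` are hypothetical stand-ins, not any real library's API:

    def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            # The model only ever answers one question: which token is
            # most plausible next, given everything generated so far?
            probs = model.next_token_probs(tokens)
            next_token = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
            tokens.append(next_token)
        return tokenizer.decode(tokens)

Everything it produces, correct answers and hallucinations alike, falls out of that one step repeated.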


If we defined "LLM" as "any deep learning model which uses the GPT transformer architecture and is trained using autoregressive next-token prediction", and then we empirically observed that such a model proved the Riemann Hypothesis before any human mathematician, it would seem very silly to say that it was "not intelligent and not capable of reasoning" because of an a priori logical argument. To be clear, I think that probably won't happen! But I think it's ultimately an empirical question, not a logical or philosophical one. (Unless there's some sort of actual mathematical proof that would set upper bounds on the capabilities of such a model, which would be extremely interesting if true! But I haven't seen one.)


Let's talk when we've got LLMs proving the Riemann Hypothesis (or any open mathematical hypothesis) without the proof already being in the training data. I'm confident in my belief that an LLM can't do that and never will be able to. LLMs can barely solve elementary-school math problems reliably.


If the cure for cancer arrived to us in the form of the most probable token being predicted one at a time, would your view on the matter change in any way?

In other words, do you have proof that this medium of information output is doomed to forever be useless in producing information that adds value to the world?

These are of course rhetorical questions that neither you nor anyone else can answer today, but you seem to have a weirdly absolute position on this matter, as if a lot depended on your sentiment being correct.


My new intern is on his third day on the job, and he's still behind o1-preview given fewer than 25 prompts.


Sounds like you're the perfect customer for this offer then. Good luck!


I'm in a low-cost country, haha! So the intern is even cheaper.



