Hacker Newsnew | past | comments | ask | show | jobs | submit | enraged_camel's commentslogin

People always say stuff like this, but it is misleading. The reason it's misleading is because that remaining 5% makes a huge difference, and is where most of the value of using AI agents lies.

I'm not interested in using AI to write code that would have taken me 5-10 minutes to write myself. I use AI to debug complex bugs and develop large features that span multiple domains - stuff that normally takes hours, if not days/weeks. A model that is "enough for 95%" does not cut it for that, because the failures compound during long-horizon tasks and the thing becomes a mess.


That must be why Trump spent over $50B bombing Iran and agreed to pay them several hundred billion to go back to the status quo.

I'm not sure about that. Claude has some bugs, but Codex is not as polished and doesn't have as many features. For example, you need to add MCP servers manually. There's no Plugin/Skill/Connector marketplace that is accessible from within the app, like there is with Claude Desktop. The Cowork-equivalent is nowhere as powerful. And so on.

I still use Codex, but mostly when I need to check Opus 4.8's work. Pretty sure I will stop doing that soon, because during the short time Fable was available, Codex was not able to find any important issues with the code Fable wrote.


But how many plugins are people actually using? I can think of one MCP server I find valuable (context7) and one plugin that i've installed, but continuously think about uninstalling (obra/superpowers).

Both were trivial to set up with codex.


There are plugins in the app.

Haven’t tried Cowork, interesting. Isn’t it just the same agent minus the git worktree based UI?

Frankly, neither Claude nor Codex are as good as hype entails.


Personally I prefer GPT 5.5 writing style over Opus 4.8. It’s much more no nonsense and information denser.

That's the first time I saw someone prefering GPT-styled output over Claude ;) It's the complete opposite for me, GPT is way too verbose (even after telling it to STFU), overwhelms the user with thousands of options and doesn't just answer a question without shitting out thousands of paragraphs. Also the overall tone is way too enthusiastic.

I strongly prefer codex. Claude is annoying. Codex provides descriptions where I want them and more touchpoints to audit the quality of work. Claude code on experimental seems to not even show diffs when asked anymore, and it's much less clear what is being shipped.

Dunno, I prefer GPT 5.5 too for the same reasons as the parent. Extremely subjective but had better results with it too. Maybe I just got unlucky with Claude a few times, but even the latest Opus was dumb.

Fascinating how people have such complete diametrically opposed experiences. I guess both models have it in them to behave very differently in different circumstances and we have very little idea what pushes them in this or that direction. I guess it does boil down to luck!

Personally, Claude Opus (and in the few interactions I had with it, Fable) has been the far the superior experience. GPT-5.5 seems dumber and more certain about presenting me bullshit. Opus has better humor, and is less pretentious in its presentation. But this may all boil down to how the models react to my prompting.

What is without a doubt is that I wish they both were more intelligent – or maybe it is their wisdom I find lacking!


It's a good thing. I hate MCPs from the bottom of my heart because they always stay there and bloat the context window. Also, usually developers who develop them don't know what they're doing, so the MCP responses also bloat your context even further.

> For example, you need to add MCP servers manually. There's no Plugin/Skill/Connector marketplace that is accessible from within the app

This is all wrong.


i think codex is much better in that aspect. In claude there is skills, connector, capabilities and 4 places for browser... It is too much.

>> A true, but vapid speech.

PG got into an argument with AOC about it on Twitter. It sounded like he was personally offended by what she was saying. Which makes sense because, as someone who has helped startup founders become famously wealthy, he probably took her statement as an attack on his identity.

Perhaps PG should follow his own advice, though: https://paulgraham.com/identity.html


I'm using AI for most things. It has been an incredible improvement to both my quality of life and my wallet. Some of the most high profile items from just the past three months:

- I'm getting my roof replaced due to hail damage. Insurance originally covered only $5k due to depreciation. I fed the insurance policy to AI. I learned about the appraisal clause and invoked it. At the end, I got another $6,500 back.

- I was having issues with plumbing. Four different plumbers came, they all said the cast iron pipes under the house need to change. Quotes ranged from $35k to $55k. I had AI walk me through the process. It taught me about the yard line vs. under-slab distinction, and suggested getting just the yard line replaced first because it's much cheaper and can fix the issue. I did that and spent $6k. The issue was fixed. I "saved" $30k for now by deferring that massive month-long project. (For brevity, I'm omitting a ton of boring technical stuff I learned about plumbing that helped me make the optimal decision - none of the contractors bothered explaining any of it.)

- My 2010 Hyundai Santa Fe is starting to show its age. I've taken it to multiple different repair shops, then fed their diagnoses and recommendations to AI and figured out which ones are trying to fleece me and which ones are being more careful and conservative with their repair recommendations. Probably saved several thousand dollars there. Learned a lot about cars too!

- My partner and I are converting the backyard to a wildlife sanctuary. The AI helped us plan what to plant where (depending on lots of factors like sunlight location, irrigation access, etc.) and it has been going really well. Also planned out a dragonfly pond to deal with mosquitoes. AI created a project plan, including schematics, material purchase list and step-by-step instructions.

- I've been wanting to do various other home improvement projects, but only ones that make financial sense. I took photos of my house, both inside and outside, and fed them to AI, and said "give me a list of projects I can do that will have high ROI for when I decide to sell this house". It spent 15 mins doing deep research, then came back with a long, prioritized list. If I do all the projects, I'd be spending about $40k and it would improve the house valuation by about $90k.

I can go on. There's probably dozens of stuff that I've used it for over the past year that led to massive time and money savings, and I've learned a ton as well about topics I normally would not have been exposed to or bothered to research myself. And I'm not even including all the work-related usage, both for my employer and my side business. That would be its own very long list.


Great examples. I think people not using AI for issues like these lack imagination or more charitably, simply don't know that it works so well for these. Especially non-technical people can find great value out of AI, not just SWEs.

>> The article is presenting an idea, not a solution.

The article establishes an arbitrary standard, provides examples and criticizes them on the basis that they don't meet that arbitrary standard, and then... nothing.

It is easy to criticize something from the outside. Much harder to dig deep, learn the material and understand why it produces the status quo, and then propose workable solutions. That's where the actual value lies.


>> Something I've wondered: for all its claims of business-friendliness, why does Texas insist on attracting the lowest-margin industries?

Tech is one of the highest margin industries.


Not blockchain tech.

>> Another is who is going into the first IPO. Troubles for Anthropic IPO would channel all those money into OpenAI's one. Check financial interests of this admin. Hint - they aren't with Anthropic.

Yep. Kushner owns private shares of OpenAI.


I think it's a possibility, because labs trying to one-up each other is a fairly common phenomenon at this point. Previous Opus releases were immediately followed by GPT releases, for example. At some point the timing stops being a mere coincidence.

>> Power is not free.

There's actually an interesting thought experiment here: if it takes you a full day to build something that AI would otherwise build in a day, do you end up using more power, or less? What is the break-even point, purely from a power consumption perspective?


If an identical task takes a day on both sides, then the human route uses less energy, surely.

Brains are thousands or maybe even millions of times more fuel-efficient than computers and you are alive for the whole day either way, right? You probably eat about the same even.

The reason executives think AI is more efficient is that it more space efficient than a human and doesn't demand to be paid or work only a set number of hours. Everything with computing is more efficient if you resent having to give money to other humans. If they could just not have you be alive when they don't need you, it'd possibly be different.

Even though I think at a typical British freelance rate and a truly unsubsidised token price, the AI is possibly more expensive than me. And as a freelancer, from their perspective I really am not alive until they need me. (This is what it often feels like)

The reality is the human and the AI aren't used to build the same things anyway so it's a comparison you can't really make.


Brains are efficient, but civilized humans aren't. In the USA, adults consume at a rate of about 10kW -- only 1-2% of that being the human's metabolism, the rest being HVAC, electrical devices, etc.

For comparison, a modern frontier model like Gemini 3.5 Pro consumes about 15kW -- so only about 1.5x the fully loaded human. In an 8h workday, that model would crank through ~80M tokens (~$5k at API prices). That's ~4 major refactors of a 10k LOC codebase, so probably not a very realistic comparison to a single human dev.

I think a more useful comparison, based on my experience, is that an engineer with AI support can get one 8h day's worth of unassisted work done in 1h. So, the 25 kWh consumed during collaboration (conservatively assuming I keep the GPU hot for the whole hour) frees up the remaining 70 kWh I'll draw down for the day to be spent in some other way.


You forgot to mention that it takes a lot more energy to train that human before they're able to work.

The human in the scenario is on regardless. One has to assume. But I also think this sentence you typed is essentially a single line horror story and we should consider whether it is ever appropriate to say it out loud.

to be pedantic you'd need to think a lot about how you power your human. Did you fuel up your human with beef or beans? local or shipped? were they operating a day in climate control? have to commute? did they need equipment like a large monitor? etc .

in reality basically all those concerns come out in the wash when you factor pay. energy inputs throughout the chain tend to materialize as expense. if the human was paid less then likely they used less energy.


Studies on grandmaster chess players indicate that at most you burn 10% more calories when engaged in deep thought than when you're at rest. So the energy "attributable" to an hour of knowledge work is like 10 calories (average sedentary calorie burn is like 80-100 per hour; add a max of 10% for the thinking gets you 8-10 calories). A pound of potatoes is like a buck and is about 320 calories. So you're looking at like 3 cents an hour at most to cover that energy burn. It's definitely even less; I certainly don't think as hard as a grandmaster chess player.

Then, assume power costs 20 cents per kilowatt hour (US avwrage) To match the human 3 cents per hour, you need an average of 150 watts of power drawn per hour. That's in the range of a budget graphics card, but not much past there.

However, if you sleep instead of sitting around, you can probably make AI cost competitive. Sleeping drops your metabolic rate by more, and lying down in bed (as opposed to sitting) also reduces calorie burn. Combined, you can reduce your burn by like 30 calories an hour. At the new 9 cents per hour human cost, you can afford to run a higher end graphics card at ~450 watts per hour. That puts you in RTX 3090 range.


Excellent analysis, and

> However, if you sleep instead of sitting around, you can probably make AI cost competitive.

important consideration.


Yeah, it gets weird when you start trying to compare human versus AI energy demands because you can turn a computer off, but you can't really turn a human off. Most studies indicate that humans can do 3-4 hours of high-level knowledge work in a day. An AI is not limited by these restrictions; it could do 24 straight hours of work if you wanted it to. So do we count all the energy the human burned for the remaining 20 hours of the day and allocate it to those 4 hours? Or are we just comparing "energy spent working on the task" for the AI and the human?

:D

The jokes are aside now!

Maybe we gotta include all 24, since no one will ever* get the 4 without the 20 of sleep/eat/relax.

*least probably not long term with FTEs I assume

Though for employers who send employees home it should be OK to ignore what employees do outside of work…


What would you do for the rest of the day, power off your devices and go for a long bike ride?

Speaking personally: yes. That's literally what I'm planning to do this afternoon because it's noon and I'm already done with the coding tasks I had on my plate today.

Luckily the future is absolutely going to be that star trek one where technological abundance means we are all wealthy and have free time to develop personally, and not the future where all the money bubbles up into the hands of a thin-skinned malignant narcissist who wants to play with launching rockets and provoking racial violence /s

The question needs to be tweaked a little: it's not just human vs LLM, it's human vs human + LLM, which makes the calculations easier (and more correct because LLMs don't currently operate independently.)

I've run the napkin math, and assuming LLMs make humans even 5% more efficient, the power and water savings over time are significant, largely because humans are so resource intensive: https://news.ycombinator.com/item?id=46984659


There is no break even point, you always come out ahead doing it yourself because your caloric burn is the same for the day whether you build the tool or AI builds the tool. Only way the AI example might avoid that is if it tells you to jump off a cliff before starting the compute run.

I'm assuming that you need to feed the human being (i.e. you) regardless of whether you use that human being for writing code or not. So, by this metric, there is simply no breaking even point. The cost of human + AI is always going to be higher than the cost of human.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: