A calculator isn't software; it's hardware. Your inputs into a calculator are code.
Your interaction with LLMs is categorically closer to interactions with people than with a calculator. Your inputs into them are language.
Of course the two are different. A calculator is a computer; an LLM is not. Comparing the two is making the same kind of category error that so baffled Mr. Babbage, but in reverse.
(“On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.”)
AIs aren't intended to be used as calculators though?
You could say that when I use my spanner/wrench to tighten a nut it works 100% of the time, but as soon as I try to use a screwdriver it's terrible and full of problems: it can't even reliably do something as trivially easy as tightening a nut, even though a screwdriver works the same way, using torque to tighten a fastener.
Well, that's because one tool is designed for one thing, and the other is designed for another.
> AIs aren't intended to be used as calculators though?
Then why are we using them to write code, which should produce reliable output for a given input... much like a calculator?
Obviously we want the code to produce correct results for whatever input we give it, and as it stands now, I can't trust LLM output without reviewing it first. It's still a helpful tool, but ultimately I'd want them to be as accurate as a calculator, so they can be trusted enough not to need the review step.
Using an LLM and being OK with untrustworthy results would be like clicking the terminal icon in my dock and sometimes getting a terminal, sometimes a browser, and sometimes a silent failure, because there's no reproducible output for any given input to an LLM. To me that's a problem: output should be reproducible, especially if it's writing code.
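To make the non-reproducibility point concrete: these models sample the next token from a probability distribution, and with a nonzero temperature the same prompt can come back different on every run. A toy Python sketch (the token probabilities are made up for illustration; this isn't any real model's API):

```python
import random

# Toy next-token distribution for a single prompt
# (hypothetical probabilities, purely illustrative -- not from any real model)
next_token_probs = {"ls": 0.40, "pwd": 0.25, "htop": 0.20, "rm": 0.15}

def sample_next_token(probs, temperature=1.0):
    """Sample one token; with nonzero temperature, runs can differ."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# Same input, potentially different output on every run:
for run in range(5):
    print(run, sample_next_token(next_token_probs, temperature=0.8))
```

Greedy, temperature-zero decoding would be deterministic in principle, but that isn't how these tools are typically run, so in practice you get run-to-run variation.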
If we have an intern or junior dev on our team, do we expect them to be 100% correct all the time? Why do we have a culture of peer code review at all if we assume that everyone who commits code is foolproof and correct 100% of the time?
Truth is, we don't trust all the humans who write code to be perfect. As the old-as-the-hills saying goes, "we all make mistakes". So replace "LLM" in your comment above with "junior dev" and everything you said still applies, whether it's LLMs or inexperienced colleagues. With code there is very rarely a single "correct" answer to how to implement something (unlike the calculator tautology you suggest), so an LLM or an intern (or even an experienced colleague) absolutely nailing their PRs with zero review comments seems unusual to me.
So we come back to the original (and, I admit, quite philosophical) point: when will we be happy? We take on juniors because they do the low-level, boring work, and we keep an eye on their output until they learn and grow and improve... but we can't do the same for an LLM?
What we have today was literally science fiction not so long ago (the movie "Her" from 2013 is now pretty much a reality). Step back for a moment: the fact that we're even having the discussion of "yeah, it writes code but it needs to be checked" is mind-blowing; that it writes mostly-correct code at all is remarkable. Give things another couple of years and it's going to be even better.
And a solar-powered $2.99 drugstore wallet calculator gets it right 100% of the time, while billion-dollar LLMs can still get arithmetic wrong on occasion.
What you're being asked is to stop trying to hammer every single thing that comes into your vicinity. Smashing your computer with a hammer won't create code.
It's your option not to use it. However, this is a competitive environment, so we will see who pulls ahead: those who use AI as a productivity multiplier versus those who do not. Maybe that multiplier is less than 1; time will tell.
Agreed. The nice thing is that I am told by HN and Twitter that agentic workflows make coding tasks very easy, so if it turns out that using these tools multiplies productivity, then I can just start using them and it will be easy. Then I am caught up with the early adopters and don't need to worry about being out-competed by them.
I don't think that's the relevant comparison though. Do you expect StackOverflow or product documentation to be 100% accurate 100% of the time? I definitely don't.
I actually agree with this. I use LLMs often, and I don't compare them to a calculator.
Mainly I meant to push back against the reflexive comparison to a friend or family member or colleague. AI is a multi-purpose tool used for many different kinds of tasks. Some of those tasks are analogues of human tasks, where we should anticipate human error. Others are not, and yet we often ask an LLM to do them anyway.
Three decades, and I haven't had to do anything remotely resembling this on a calculator, much less find the calculator wrong. Same for the majority of the general population, I assume.
The person you're replying to pointed out that you shouldn't expect a calculator to be 100% accurate 100% of the time, especially not when faced with adversarial prompts.