A calculator isn't software; it's hardware. Your inputs into a calculator are code.
Your interaction with LLMs is categorically closer to interactions with people than with a calculator. Your inputs into them are language.
Of course the two are different. A calculator is a computer; an LLM is not. Comparing the two is making the same kind of category error that so baffled Mr. Babbage, but in reverse.
(“On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.”)
AIs aren't intended to be used as calculators though?
You could say that when I use my spanner/wrench to tighten a nut it works 100% of the time, but as soon as I try to use a screwdriver it's terrible and full of problems: it can't even reliably do something as trivially easy as tightening a nut, even though a screwdriver works the same way, using torque to tighten a fastener.
Well, that's because one tool is designed for one thing, and the other is designed for another.
> AIs aren't intended to be used as calculators though?
Then why are we using them to write code, which should produce reliable output for a given input... much like a calculator?
Obviously we want the code to produce correct results for whatever input we give it, and as it stands now, I can't trust LLM output without reviewing it first. It's still a helpful tool, but ultimately I'd want them to be as accurate as a calculator, so they can be trusted enough not to need the review step.
Using an LLM and being OK with untrustworthy results would be like clicking the terminal icon in my dock and sometimes getting a terminal, sometimes a browser, and sometimes a silent failure, because there's no reproducible output for any given input to an LLM. To me that's a problem: output should be reproducible, especially if it's writing code.
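To make the non-reproducibility point concrete: these models sample the next token from a probability distribution, and with a nonzero temperature the same prompt can come back different on every run. A toy Python sketch (the token probabilities are made up for illustration; this isn't any real model's API):

```python
import random

# Toy next-token distribution for a single prompt
# (hypothetical probabilities, purely illustrative -- not from any real model)
next_token_probs = {"ls": 0.40, "pwd": 0.25, "htop": 0.20, "rm": 0.15}

def sample_next_token(probs, temperature=1.0):
    """Sample one token; with nonzero temperature, runs can differ."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# Same input, potentially different output on every run:
for run in range(5):
    print(run, sample_next_token(next_token_probs, temperature=0.8))
```

Greedy, temperature-zero decoding would be deterministic in principle, but that isn't how these tools are typically run, so in practice you get run-to-run variation.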
If we have an intern or junior dev on our team, do we expect them to be 100% correct all the time? Why do we have a culture of peer code review at all if we assume that everyone who commits code is foolproof and correct 100% of the time?
Truth is, we don't trust all the humans who write code to be perfect. As the old-as-the-hills saying goes, "we all make mistakes". So replace "LLM" in your comment above with "junior dev" and everything you said still applies, whether it's LLMs or inexperienced colleagues. With code there is very rarely a single "correct" answer to how to implement something (unlike the calculator tautology you suggest), so an LLM or an intern (or even an experienced colleague) absolutely nailing their PRs with zero review comments seems unusual to me.
So we come back to the original (and, I admit, quite philosophical) point: when will we be happy? We take on juniors because they do the low-level, boring work, and we keep an eye on their output until they learn and grow and improve... but we can't do the same for an LLM?
What we have today was literally science fiction not so long ago (the movie "Her" from 2013 is now pretty much a reality). Step back for a moment: the fact that we're even having the discussion of "yeah, it writes code but it needs to be checked" is mind-blowing; that it writes mostly-correct code at all is remarkable. Give things another couple of years and it's going to be even better.
And a solar-powered $2.99 drugstore wallet calculator gets it right 100% of the time, while billion-dollar LLMs can still get arithmetic wrong on occasion.
What you're being asked is to stop trying to hammer every single thing that comes into your vicinity. Smashing your computer with a hammer won't create code.
It's your option not to use it. However, this is a competitive environment, so we will see who pulls ahead: those who use AI as a productivity multiplier versus those who do not. Maybe that multiplier is less than 1; time will tell.
Agreed. The nice thing is that I am told by HN and Twitter that agentic workflows make coding tasks very easy, so if it turns out that using these tools multiplies productivity, then I can just start using them and it will be easy. Then I am caught up with the early adopters and don't need to worry about being out-competed by them.
I don't think that's the relevant comparison though. Do you expect StackOverflow or product documentation to be 100% accurate 100% of the time? I definitely don't.
I actually agree with this. I use LLMs often, and I don't compare them to a calculator.
Mainly I meant to push back against the reflexive comparison to a friend or family member or colleague. AI is a multi-purpose tool used for many different kinds of tasks. Some of those tasks are analogues of human tasks, where we should anticipate human error. Others are not, and yet we often ask an LLM to do them anyway.
Three decades, and I haven't had to do anything remotely resembling this on a calculator, much less find the calculator wrong. Same for the majority of the general population, I assume.
The person you're replying to pointed out that you shouldn't expect a calculator to be 100% accurate 100% of the time, especially not when faced with adversarial prompts.