Hacker News

People say this, but in my experience it’s not true.

1) The cognitive burden is much lower when the AI can correctly do 90% of the work. Yes, the remaining 10% still takes effort, but your mind has more space for it.

2) For experts who have a clear mental model of the task requirements, it’s generally less effort to fix an almost-correct solution than to invent the entire thing from scratch. The “starting cost” in mental energy to go from a blank page/empty spreadsheet to something useful is significant. (I limit this to experts because I do think you have to have a strong mental framework you can immediately slot the AI output into, in order to be able to quickly spot errors.)

3) Even when the LLM gets it totally wrong, I’ve actually had experiences where a clearly flawed output was still a useful starting point, especially when I’m tired or busy. It nerd-snipes my brain from “I need another cup of coffee before I can even begin thinking about this” to “no you idiot, that’s not how it should be done at all, do this instead…”



>The cognitive burden is much lower when the AI can correctly do 90% of the work. Yes, the remaining 10% still takes effort, but your mind has more space for it.

I think their point is that 10%, 1%, whatever %, the type of problem is a huge headache. In something like a complicated spreadsheet it can quickly become hours of looking for needles in the haystack, a search that wouldn't be necessary if AI didn't get it almost right. In fact it's almost better if it just gets some big chunk wholesale wrong - at least you can quickly identify the issue and do that part yourself, which you would have had to in the first place anyway.

Getting something almost right, no matter how close, can often be worse than not doing it at all. Undoing/correcting mistakes can be more costly as well as labor intensive. "Measure twice cut once" and all that.

I think of how in video production (edits specifically) I can often get you 90% of the way there in about half the time it takes to get to 100%. Those last bits can be exponentially more time-consuming (such as an intense color grade or audio repair). The thing is, with a spreadsheet like that you can't accept a B+ or A-. If something is broken, the whole thing is broken. It needs to work more or less 100%. Closing that gap can be a huge process.

I'll stop now as I can tell I'm running a bit in circles lol


I understand the idea. My position is that this is a largely speculative claim from people who have not spent much time seriously applying agents for spreadsheet or video editing work (since those agents didn’t even exist until now).

“Getting something almost right, no matter how close, can often be worse than not doing it at all” - true with human employees and with low quality agents, but not necessarily true with expert humans using high quality agents. The cost to throw a job at an agent and see what happens is so small that in actual practice, the experience is very different and most people don’t realize this yet.


I guess we just have different ideas of what “high-quality agents” are and what they should regularly produce.


> The cognitive burden is much lower when the AI can correctly do 90% of the work.

It's a high cognitive burden if you don't know which 10% of the work the AI failed to do / did incorrectly, though.

I think you're picturing a percentage indicating what scope of the work the AI covered, but the parent was thinking about the accuracy of the work it did cover. But maybe what you're saying is that if you pick the right 90% subset, you'll get vastly better than 98% accuracy on that scope of work? Maybe we just need to improve our intuition for where LLMs are reliable and where they're not so reliable.

Though as others have pointed out, these are just made-up numbers we're tossing around. Getting 99% accuracy on 90% of the work is very different from getting 75% accuracy on 50% of the work. The real values vary so much by problem domain and user's level of prompting skill, but it will be really interesting as studies start to emerge that might give us a better idea of the typical values in at least some domains.
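To make the made-up numbers concrete, here's a toy sketch (all figures are illustrative, not measured): if you treat "coverage" as the fraction of the job the AI attempts and "accuracy" as how much of that it gets right, the human's leftover workload is everything the AI didn't attempt plus the errors it introduced.

```python
# Toy model with made-up numbers: fraction of the whole task the AI
# completed correctly is coverage * accuracy; the rest is human work.
def correct_fraction(coverage: float, accuracy: float) -> float:
    """Fraction of the total job the AI got right."""
    return coverage * accuracy

def human_workload(coverage: float, accuracy: float) -> float:
    """Fraction left for the human: untouched work plus the AI's errors."""
    return 1.0 - correct_fraction(coverage, accuracy)

# "99% accuracy on 90% of the work" vs "75% accuracy on 50% of the work"
a = human_workload(0.90, 0.99)  # ~0.109, i.e. ~11% left for the human
b = human_workload(0.50, 0.75)  # 0.625, i.e. ~63% left for the human
print(round(a, 3), round(b, 3))
```

Of course this treats every leftover unit of work as equally cheap to find and fix, which is exactly what the needle-in-the-haystack objection above disputes: a small fraction of hidden errors can cost more review time than a large fraction of honestly-undone work.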


A lot of people here also make the assumption that the human user would make no errors.

The error rate this same person would find when reviewing spreadsheets made by other people seems like a critical baseline before we can even discuss whether this is a problem or an achievement.



