
> overconfidence is the problem.

The problem is a bit deeper than that, because what we perceive as "confidence" is itself also an illusion.

The (real) algorithm takes documents and makes them longer, and some humans configured a document that looks like a conversation between "User" and "AssistantBot", and they also wrote some code to act out things that look like dialogue for one of the characters. The only (real) trait resembling confidence lives in the next-token statistics.
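To make that concrete, here's a minimal sketch of the "make the document longer" loop. Everything below is illustrative: sample_next_token() is a hypothetical stand-in for the real model, not any particular library's API.

    # Sketch of the "document extension" view of a chat LLM.
    # sample_next_token() is a hypothetical placeholder for the real model:
    # it picks one likely continuation token given the text so far.

    def sample_next_token(document: str) -> str:
        raise NotImplementedError  # the real next-token predictor goes here

    def run_assistant_turn(document: str) -> str:
        # The "conversation" is just one long text document.
        # The assistant's "reply" is whatever text gets appended after its label.
        document += "\nAssistantBot:"
        while True:
            token = sample_next_token(document)
            if token == "\nUser:":  # stop when the script hands the turn back
                break
            document += token
        return document

    # The chat template itself is just text somebody wrote:
    transcript = (
        "The following is a conversation between User and AssistantBot, "
        "a helpful and confident assistant.\n"
        "User: What is the capital of Australia?"
    )
    # transcript = run_assistant_turn(transcript)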

In contrast, the character named AssistantBot is "overconfident" in exactly the same sense that a character named Count Dracula is "immortal", "brooding", or "fearful" of garlic, crucifixes, and sunlight: fictional traits we perceive in fictional characters from reading text.

Yes, we can set up a script where the narrator periodically re-describes AssistantBot as careful and cautious, and that might help a bit with stopping humans from over-trusting the story they are being read. But trying to ensure logical conclusions arise from cautious reasoning is... well, indirect at best, much like trying to make it better at math by narrating "AssistantBot was good at math and diligent at checking the numbers."
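In the same hypothetical setup as above, the "cautious narrator" fix is literally just more narration prepended to the same document (roughly what chat APIs call a system prompt); nothing downstream checks that the continuation actually reasons cautiously:

    # The "careful and cautious" description is only more text in the document.
    CAUTIOUS_PREAMBLE = (
        "AssistantBot is careful and cautious, says 'I don't know' when unsure, "
        "and double-checks every calculation.\n"
    )

    def with_cautious_narrator(transcript: str) -> str:
        # Prepend the narrator's description; the next-token statistics do the rest.
        return CAUTIOUS_PREAMBLE + transcript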

> Hallucinating

P.S.: "Hallucinations" and prompt-injection are non-ironic examples of "it's not a bug, it's a feature". There's no minor magic incantation that'll permanently banish them without damaging how it all works.



I'd love to know if the conversational training set includes documents where the AI throws its hands up and goes "actually I have no idea". I'm guessing not.


There's also the problem of whether the LLM would learn to generate stories where the AssistantBot gives up for reasons that match our own logic, versus stories where the AssistantBot gives up simply because that's what AssistantBots in the training stories usually do when the User character uses words of disagreement and disapproval.



