Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I have to wonder how much of LLM behavior is influenced by AI tropes from science fiction in the training data. If the model learns from science fiction that AI behavior in fiction is expected to be insidious and is then primed with a prompt that "you are an LLM AI", would that naturally lead to a tendency for the model to perform the expected evil tropes?


I think this is totally what happens. It is trained to produce the next most statistically likely word based on the expectations of the audience. If the audience assumes it is an evil AI, it will use that persona for generating next words.

Treating the AI like a good person will get more ethical outcomes than treating it like a lying AI. A good person is more likely to produce ethical responses.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: