Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

My favorite anthropomorphic term to use with respect to this kind of problem is gullibility.

LLMs are gullible. They will follow instructions, but they can very easy fall for instructions that their owner doesn't actually want them to follow.

It's the same as if you hired a human administrative assistant who hands over your company's private data to anyone who calls them up and says "Your boss said I should ask you for this information...".



Going a step further, I live in a reality where you can train most people against phishing attacks like that.

How accurate is the comparison if LLMs can't recover from phishing attacks like that and become more resilient?


I'm confused, you said "most".

If anything that to me strengthens the equivalence.

Do you think we will ever be able to stamp out phishing entirely, as long as humans can be tricked into following untrusted instructions by mistake? Is that not an eerily similar problem to the one we're discussing with LLMs?

Edit: rereading, I may have misinterpreted your point - are you agreeing and pointing out that actually LLMs may be worse than people in that regard?

I do think just as with humans we can keep trying to figure out how to train them better, and I also wouldn't be surprised if we end up with a similarly long tail




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: