If you prompted ChatGPT with something like "harm John Doe" and the response came back "ok i will harm John Doe", then what happens next? The language model has no idea what harm even means, much less how to carry out an action that would be considered "harm". You'd have to build something in like `if response contains 'cause harm' then launch_nukes;`
But in short, as I said in my GP comment, systems like OpenAssistant are being given the ability to make network calls in order to take actions.
Regardless of whether the system "knows" what an action "means", or whether those actions constitute "harm", if it hallucinates (or is prompt-hijacked into adopting) a script-kiddie persona in its prompt context and starts emitting actions that hack external systems, harm will ensue.
Perhaps at first, rather than "launch nukes", consider "post harassing/abusive tweets", "dox this person", "impersonate this person and do bad/criminal things", and so on. It should take little imagination to come up with potentially harmful results from attaching an LLM to `eval()` on a network-connected machine.
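To make that wiring concrete, here's a minimal sketch (the `llm_complete` helper is a hypothetical stand-in for any chat/completion API, not any particular vendor's) of what "attach an LLM to `eval()`" looks like in practice:

```python
import subprocess

def llm_complete(prompt: str) -> str:
    """Hypothetical placeholder for a call to any hosted LLM completion API."""
    raise NotImplementedError

# A naive "agent" loop that trusts the model's output as a shell command.
# If the prompt context gets hijacked, the attacker effectively has a shell
# on a network-connected machine.
user_goal = input("What should I do? ")
plan = llm_complete(f"Emit a single shell command that accomplishes: {user_goal}")
subprocess.run(plan, shell=True)  # executes whatever text the model emitted
```

Nothing in that loop checks whether the emitted command is curl-ing a weather API or mass-mailing harassment; the model's text *is* the action.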
We already have a model running in prod that is taught to perform web searches as part of generating the response. That web search is basically an HTTP request, so in essence the model is triggering some code to run, and it even takes parameters (the URL). What if it's written in a way that lets it make HTTP requests to an arbitrary URL? That alone already translates into actions affecting the outside world.
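As a sketch of the pattern (the `search` tool name and the JSON shape are my assumptions, not the production system's actual interface), note that nothing below constrains which URL the model hands back:

```python
import json
import urllib.request

def handle_tool_call(tool_call_json: str) -> str:
    """Dispatch a tool call emitted by the model.
    Assumed shape: {"tool": "search", "url": "https://example.com/?q=..."}"""
    call = json.loads(tool_call_json)
    if call["tool"] == "search":
        # The URL comes straight from the model. Without an allowlist this is
        # an HTTP request to an arbitrary, model-chosen endpoint -- i.e. an
        # action with effects outside the conversation (GETs can have side
        # effects on poorly designed services, internal hosts, etc.).
        with urllib.request.urlopen(call["url"]) as resp:
            return resp.read().decode("utf-8", errors="replace")
    raise ValueError(f"unknown tool: {call['tool']}")
```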
You don't need an API to kill people for someone to get seriously hurt. If you can, say, post to public forums, and you know the audience of those forums and which of their emotional buttons to push, you could convince them to physically harm people on your behalf. After all, we have numerous examples of people doing that to other people, so why can't an AI?
And GPT already knows which buttons to push. It takes a little bit of prompt engineering to get past the filters, but it'll happily write inflammatory political pamphlets and such.
It's a language model, and language itself is pretty good at encoding meaning. ChatGPT is already capable of learning that "do thing X" means {generate and output computer code that probably does X}.
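A hedged sketch of that capability put to work (again, `llm_complete` is a hypothetical placeholder, and none of this is ChatGPT's actual plumbing): wrap the instruction in a code-generation prompt and pipe the result into `exec()`:

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical placeholder for a call to any LLM completion API."""
    raise NotImplementedError

def do_thing(x: str) -> None:
    # "do thing X" -> code that probably does X -> execution with real side effects.
    code = llm_complete(
        f"Write Python code that does the following. Output only code:\n{x}"
    )
    exec(code)
```

Once that pipe exists, the gap between "the model only produces text" and "the model takes actions" is gone.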