If you prompted ChatGPT with something like "harm John Doe" and the response came back "ok i will harm John Doe", then what happens next? The language model has no idea what harm even means, much less how to carry out an action that would be considered "harm". You'd have to build something in like `if response contains 'cause harm' then launch_nukes;`
But in short, as I said in my GP comment, systems like OpenAssistant are being given the ability to make network calls in order to take actions.
Regardless of whether the system "knows" what an action "means", or whether those actions constitute "harm", if it hallucinates (or is prompt-hijacked into adopting) a script-kiddie persona in its prompt context and starts emitting actions that hack external systems, harm will ensue.
Perhaps at first, rather than "launch nukes", consider "post harassing/abusive tweets", "dox this person", "impersonate this person and do bad/criminal things", and so on. It should take little imagination to come up with potentially harmful results from attaching an LLM to `eval()` on a network-connected machine.
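To make that wiring concrete, here's a minimal sketch (the `llm_complete` helper is a hypothetical stand-in for any chat/completion API, not any particular vendor's) of what "attach an LLM to `eval()`" looks like in practice:

```python
import subprocess

def llm_complete(prompt: str) -> str:
    """Hypothetical placeholder for a call to any hosted LLM completion API."""
    raise NotImplementedError

# A naive "agent" loop that trusts the model's output as a shell command.
# If the prompt context gets hijacked, the attacker effectively has a shell
# on a network-connected machine.
user_goal = input("What should I do? ")
plan = llm_complete(f"Emit a single shell command that accomplishes: {user_goal}")
subprocess.run(plan, shell=True)  # executes whatever text the model emitted
```

Nothing in that loop checks whether the emitted command is curl-ing a weather API or mass-mailing harassment; the model's text *is* the action.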
We already have a model running in prod that is taught to perform web searches as part of generating the response. That web search is basically an HTTP request, so in essence the model is triggering some code to run, and it even takes parameters (the URL). What if it's written in a way that lets it make HTTP requests to an arbitrary URL? That alone already translates into actions affecting the outside world.
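As a sketch of the pattern (the `search` tool name and the JSON shape are my assumptions, not the production system's actual interface), note that nothing below constrains which URL the model hands back:

```python
import json
import urllib.request

def handle_tool_call(tool_call_json: str) -> str:
    """Dispatch a tool call emitted by the model.
    Assumed shape: {"tool": "search", "url": "https://example.com/?q=..."}"""
    call = json.loads(tool_call_json)
    if call["tool"] == "search":
        # The URL comes straight from the model. Without an allowlist this is
        # an HTTP request to an arbitrary, model-chosen endpoint -- i.e. an
        # action with effects outside the conversation (GETs can have side
        # effects on poorly designed services, internal hosts, etc.).
        with urllib.request.urlopen(call["url"]) as resp:
            return resp.read().decode("utf-8", errors="replace")
    raise ValueError(f"unknown tool: {call['tool']}")
```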
You don't need an API to kill people for someone to get seriously hurt. If you can, say, post to public forums, and you know the audience of those forums and which of their emotional buttons to push, you could convince them to physically harm people on your behalf. After all, we have numerous examples of people doing that to other people, so why can't an AI?
And GPT already knows which buttons to push. It takes a little bit of prompt engineering to get past the filters, but it'll happily write inflammatory political pamphlets and such.
It's a language model, and language itself is pretty good at encoding meaning. ChatGPT is already capable of learning that "do thing X" means {generate and output computer code that probably does X}.
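A hedged sketch of that capability put to work (again, `llm_complete` is a hypothetical placeholder, and none of this is ChatGPT's actual plumbing): wrap the instruction in a code-generation prompt and pipe the result into `exec()`:

```python
def llm_complete(prompt: str) -> str:
    """Hypothetical placeholder for a call to any LLM completion API."""
    raise NotImplementedError

def do_thing(x: str) -> None:
    # "do thing X" -> code that probably does X -> execution with real side effects.
    code = llm_complete(
        f"Write Python code that does the following. Output only code:\n{x}"
    )
    exec(code)
```

Once that pipe exists, the gap between "the model only produces text" and "the model takes actions" is gone.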