> With this capability also comes potential risks. We strongly recommend building in user confirmation flows before taking actions that impact the world on behalf of users (sending an email, posting something online, making a purchase, etc).
"When performing actions such as sending emails or posting comments on internet forums, the language learning model may mistakenly attempt to take control of powerful weaponry and engage with humans in physical warfare. For this reason we strongly suggest building in user confirmation prompts for such functionality."
Bonus points if the LLM replies sinisterly with something like, "As a language learning model I am not legally liable for any property damage, loss of life, or sudden regime changes that may occur due to receiving user input. Are you sure that you would like to send this email?"
Altman's Law: Somewhere out there is a webhook connected through a Rube Goldberg series of systems to nuclear launch codes... and OpenAI will find it. One function call at a time.
This makes me wonder: why doesn't OpenAI build in a default mitigation that requires a confirmation they control? Why leave the mitigation to tool developers, many of whom have never heard of confused deputy attacks?
Seems like a missed opportunity to make things a little more secure.
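For what it's worth, the gate being asked for here doesn't have to be elaborate. A minimal sketch in Python, assuming the model's requested call has already been parsed into a name and an arguments dict by whatever SDK you're using (the `SIDE_EFFECT_FUNCTIONS` set and `confirm_with_user` helper are made up for illustration, not part of any OpenAI API):

```python
# Illustrative sketch of a client-side confirmation gate.
# Assumes the model's function call has already been parsed into
# a name and an arguments dict by your SDK of choice.

SIDE_EFFECT_FUNCTIONS = {"send_email", "post_comment", "make_purchase"}  # hypothetical

def confirm_with_user(name: str, arguments: dict) -> bool:
    """Ask a human before any call that changes the outside world."""
    print(f"The model wants to call {name} with {arguments!r}")
    return input("Proceed? [y/N] ").strip().lower() == "y"

def dispatch(name: str, arguments: dict, registry: dict):
    """Run read-only functions directly; gate side-effecting ones on approval."""
    if name in SIDE_EFFECT_FUNCTIONS and not confirm_with_user(name, arguments):
        return {"error": "user declined the action"}
    return registry[name](**arguments)
```

Nothing here is model-specific; it's just a dispatch layer that refuses to execute a side-effecting call without a human's yes.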
Their docs suggest you could allow the model to extract structured data by giving it the ability to call a function like `sql_query(query: string)`, which is presumably connected to your DB.
This seems wildly dangerous. I wonder how hard it would be for a user to convince the GPT to run a query like `DROP TABLE ...`
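One cheap layer of defence, sketched here under the assumption that the data lives in SQLite (the `analytics.db` filename is made up): hand the model a connection that the database itself treats as read-only, so a `DROP TABLE` fails no matter how persuasive the prompt is.

```python
import sqlite3

def sql_query(query: str):
    """Run a model-generated query against a read-only SQLite connection.

    With mode=ro the database rejects writes, so DROP TABLE, DELETE, etc.
    raise sqlite3.OperationalError regardless of what the model was talked
    into generating. ("analytics.db" is a hypothetical filename.)
    """
    conn = sqlite3.connect("file:analytics.db?mode=ro", uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

That's defence in depth, not a fix: it does nothing to stop exfiltration of whatever the connection can read, and it assumes your database supports a genuinely read-only connection.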
I think a good mental security model might be: if you wouldn't expose your function as an unsecured endpoint on the web, then you probably shouldn't expose it to an LLM.
Yeah, thanks for the heads up, Sam.