Comparing a stapler with a system that could theoretically (and very plausibly) carry out cyber attacks is problematic at best.
Putting a stapler in a wall socket probably electrocutes you.
Using Bing Chat to compromise a system actually accomplishes something that could have severe outcomes in the real world for people other than the person holding the tool.
The stapler is just a stapler. When you want to misuse the stapler, the worst it can do is limited by the properties of the stapler. You can use it as a blunt instrument to click a mouse button, but that doesn’t get you much. If you don’t already have a hack button, asking your stapler to hack into something will achieve nothing, because staplers don’t know how to hack things.
These language models know how to hack stuff, and the scenario here involves a different kind of tool entirely. You don't need to provide it with a button; it can build the button and then click it for you (if these models are ever allowed to interact with more tools).
These language models don't know how to hack stuff. They know that certain characters and words strung together can satisfy their training when someone asks them to pretend to hack something.
That's wildly different, and a lot less meaningful than "knows how to hack things".
Honestly, I think y'all would be blown away by what Metasploit is capable of on its own, if you think ChatGPT can "hack"...
> These language models don't know how to hack stuff. They know that certain characters and words strung together can satisfy their training when someone asks them to pretend to hack something.
It seems you're focused on the word "know" and how the concept of knowing something differs between humans and AI models, but that's not what I'm getting at here. Let me reframe what I wrote slightly to illustrate the point:
The model (via training) contains a representation of human knowledge such that a human can use natural language to steer the software and cause it to probabilistically generate working exploit code from that representation. If the software is also given the ability to execute arbitrary code, it can then run that code on the user's behalf. Combined, those two capabilities are a very risky feature set.
There's no "pretend" here. These models produce working code. If the software is allowed to execute the code it produces, it becomes a serious security risk.
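To make that concrete, here is a deliberately minimal Python sketch of the pattern being described. ask_model is a made-up stand-in for whatever code-generating model API the software wraps, not a real library call; the point is only how the "generate" step and the "execute" step compose.

    # Hypothetical sketch of the "generate, then execute" feature combination.
    # ask_model() is a placeholder, not a real API; it returns a harmless
    # string here so the sketch actually runs.
    import subprocess

    def ask_model(prompt: str) -> str:
        """Stand-in for an LLM call that returns generated code as text."""
        return 'print("generated code would run here")'

    def risky_agent(task: str) -> None:
        # Step 1: a human steers the model with plain language.
        generated = ask_model(f"Write a Python script that will {task}")
        # Step 2: the surrounding software executes whatever came back,
        # on the user's machine, with the user's privileges.
        subprocess.run(["python3", "-c", generated], check=False)

    risky_agent("scan the local network for open ports")

Neither step is exotic on its own; the risk comes from wiring them together and then handing that loop more tools to call.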
This is not an argument about sentience/intelligence/self-awareness. This is an argument about the risks associated with the features of the software in its current state, and how those risks are multiplied by adding new features. No philosophy required.
The point is that LLMs are not effective at "hacking" in any "obtaining unauthorized access to computer systems" sense.
They can regurgitate information about “hacking”, same as a library, but pointing an LLM at a server will achieve worse results than many existing specialized tools for vulnerability scanning and exploitation.
So as I said, the risks are overblown due to a misunderstanding.
It's just not meaningfully different from our current reality, and is therefore not any scarier.