> These language models don't know how to hack stuff. They know that certain characters and words strung together can satisfy their training when someone asks them to pretend to hack something.
It seems you're focused on the word "know" and how the concept of knowing something differs between humans and AI models, but that's not what I'm getting at here. Let me reframe what I wrote slightly to illustrate the point:
The model (via training) encodes a representation of human knowledge, and a human can use natural language to steer the software into probabilistically generating working exploit code from that representation. If the software is also given the ability to execute arbitrary code, it can then run that code on the user's behalf. Combined, these are a very risky set of features.
There's no "pretend" here. These models produce working code. If the software is allowed to execute the code it produces, it becomes a serious security risk.
This is not an argument about sentience/intelligence/self-awareness. This is an argument about the risks associated with the features of the software in its current state, and how those risks are multiplied by adding new features. No philosophy required.
The point is that LLMs are not effective at “hacking” in the sense of actually obtaining unauthorized access to computer systems.
They can regurgitate information about “hacking”, same as a library, but pointing an LLM at a server will achieve worse results than many existing specialized tools for vulnerability scanning and exploitation.
So as I said, the risks are overblown due to a misunderstanding.