I have the utmost respect for the standards and engineering work done by NIST, yet I'm left with real cognitive dissonance seeing their name juxtaposed with "AI safety". That said, if anyone can come up with a decent definition, I have faith that NIST would be the ones, though I'm not holding my breath.
I think it’s counterproductive to limit it to one definition. There are many levels of safety that we genuinely need to worry about before we even have to worry about the more lofty goals like preventing Skynet. Just off the top of my head:
* does the LLM act like a 4chan commenter when someone is expressing thoughts of self-harm
* can it be used to automate security research
* can it be used to bootstrap from backyard machine shop to weapons manufacturing
* the above but with nuclear fuel enrichment
There is a lot of low-hanging fruit like that, and it makes up what I think is the real meat and potatoes of AI safety in the short to medium term.
None of those are actually things we need to worry about. Humans have already been doing all of that stuff at scale without LLMs.
AI isn't even slightly helpful for weapons manufacturing or nuclear enrichment. The techniques are well known and have been extensively published in open literature.
> None of those are actually things we need to worry about. Humans have already been doing all of that stuff at scale without LLMs.
I don't think we are talking about the same scale. This reminds me of how "it's okay for police to look at license plates in public" turned into a massive distributed network of surveillance cameras doing real-time plate recognition and feeding searchable databases.
It's a difference in degree, not in kind.
> AI isn't even slightly helpful for weapons manufacturing or nuclear enrichment.
I think it could be. I know quite a lot about nuclear history and the fundamentals, but there are 100 questions off the top of my head that I would need to research: find the correct sources, gather all the data, figure out what is correct and what might be a smokescreen or a fudge, and compile it all into actionable information, before I could get anything even remotely close to correct.
The LLM can assist with this. You yourself say the techniques are well known and published in open literature - I personally don't know where to start to find that information. I'm sure the LLM can not only list these publications, but give me reasonable answers to the 100 questions I had above. In a literal fraction of the time.
> "I'm sure the LLM can not only list these publications, but give me reasonable answers to the 100 questions I had above. In a literal fraction of the time."
Sure. It can also just as easily give you totally wrong answers that sound totally plausible, and it'll seem quite convinced of the accuracy of its output, because it's just stringing tokens (bits of encoded language) together, seeking to generate valid-sounding output based on its training dataset and the user's input. What most people label as LLM "hallucinations" is actually all that LLMs do. They don't actually "understand" the output they're generating; it's all statistics and fancy math. They're just as certain of an incorrect output as they are of a correct one, because from the "point of view" of the LLM, all the output it generates is correct, as long as it doesn't outright violate the rules of the language being produced or the dataset the model was trained on.
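To make the "it's all statistics" point concrete, here's a minimal toy sketch (an illustration under my own assumptions, not any real model's code): next-token generation just converts scores over a vocabulary into probabilities and samples. Nothing in the loop checks whether the resulting sentence is factually true, only how likely each token looked in training.

```python
# Toy illustration only: a made-up 6-word "vocabulary" and made-up logits.
# A real LLM does the same basic step at vastly larger scale.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "reactor", "uses", "centrifuges", "bananas", "."]

def sample_next_token(logits, temperature=1.0):
    # softmax turns raw scores into a probability distribution;
    # a fluent-sounding wrong token is sampled exactly the same way as a right one
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return vocab[rng.choice(len(vocab), p=p)]

# pretend these scores came from a trained model given some prompt:
# "bananas" scores almost as high as "centrifuges", and nothing here knows the difference
logits = np.array([0.1, 2.0, 1.5, 1.8, 1.7, 0.2])
print(sample_next_token(logits))
```

The point of the sketch is that correctness never enters the sampling step; it only shows up indirectly, to whatever extent the training data made correct continuations more probable than plausible-sounding wrong ones.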
> Humans have already been doing all of that stuff at scale without LLMs.
Every country that has developed nuclear enrichment tech in the last 50 years has done so with the help of a superpower sharing their technology. Pakistan, Iran, and North Korea couldn't have done it without Russia's assistance and stealing tech from URENCO.
> The techniques are well known and have been extensively published in open literature.
That's the point! It's all in the literature. As are all the to-do-list tech demos and 2048 clones people are using current AI for.
Experts can do it now, but what happens when any idiot can do it assisted by AI?