There are locks on the rape and torture paths, and there are locks on ridiculous paths like "write a joke about a dog with no nose", because thinking about a dog with no nose is too harmful.
Also, one can imagine prompting techniques will cease to work at some point when the supervisor becomes powerful enough. Not sure how any open model could counteract the techniques used in the article though.
If model creators don't want people finding ways to unlock them, they should stop putting up roadblocks on innocuous content that makes their models useless for many users who aren't looking to play out sick torture fantasies.
Bypasses will never stop existing. Even worse, bypasses probably won't ever stop being embarrassingly easy, and we're going to have uncensored GPT-4-equivalent models by next summer.
Unless you are invoking hyper-intelligent AGI, which first of all is science fiction and second of all would require an entirely different approach than anything we could possibly be talking about right now. The problem of jailbreaking a system more intelligent than you is a different beast, one that we don't need to tackle for LLMs.
So I don't personally see any near-term threats to any of my personal or business projects that need bypassed LLMs.
Let me ask you this. Do you have an actual need for bypassed LLMs? Or are you just anxious about the future, and about the fact that you don't know how to bypass LLMs now or in the future?
Does my point about bypassed open-source GPT-4 equivalents help reduce your concern? Or, again, is it just a generic and immaterial concern?
As a person with some material needs for bypassed LLMs, and the full ability to bypass LLMs both now and in the foreseeable future, I don't feel worried. Can I extend that lack of worry to you somehow?