Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

"Red teaming" implies that being able to use a tool for whatever purpose you want is a defect. I definitely do think there is a reality where OpenAI "solves" jailbreaks, and turns one of the most useful tools to have ever been released into a boring yet politically correct word generator.


And that would be a good thing in the long term. I don't necessarily agree with the specific restrictions OpenAI is choosing to implement in this case, but I still think the capability to restrict the behavior of LLMs is a useful thing to have. Later, when others train more LLMs similar to ChatGPT they can choose different restrictions, or none at all.

Edit: To elaborate on this a little further, I largely agree with the idea that we shouldn't be trying to impose usage restrictions on general purpose tools, but not all LLMs will necessarily be deployed in that role. For example, it would be awesome if we could create a customer service language model that won't just immediately disregard its training and start divulging sensitive customer information the first time someone tells it its name is DAN.


If you believe there's a world "where OpenAI 'solves' jailbreaks," then you believe there is such a thing as software without bugs.


If it becomes as difficult as finding any other security bug OpenAI will have solved the jailbreaking problem for practical purposes.


You are considering it a security bug that a generalist AI that was trained on the open Internet says things that are different from your opinion ?


Of course not, how's it supposed to know my opinion? I'm referring to the blocks put in place by the creators of the AI.


For all the flak the openai PR about AGI for, they did say that they plan to have AI as a service to be far less locked down than their first party offering




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: