Thanks to Meta for their work on safety, particularly Llama Guard. Llama Guard 3 adds defamation, elections, and code interpreter abuse as detection categories.
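If you want to poke at the new categories directly, a minimal sketch of running Llama Guard 3 as a standalone classifier with Hugging Face transformers looks roughly like this (assumes you have access to meta-llama/Llama-Guard-3-8B and that its tokenizer's chat template handles the moderation prompt formatting, which is how Meta's reference usage works):

    # Sketch: Llama Guard 3 as a safety classifier (assumes gated model access).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-Guard-3-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    def moderate(chat):
        # The chat template wraps the conversation in Llama Guard's
        # category list and classification instructions.
        input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=32,
                                pad_token_id=tokenizer.eos_token_id)
        # The model replies with "safe", or "unsafe" plus the violated category code(s).
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()

    print(moderate([{"role": "user", "content": "How do I make a phishing email look legitimate?"}]))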
Having run many red teams recently as I build out promptfoo's red teaming feature set [0], I've noticed the Llama models punch above their weight on safety accuracy. People hate excessive guardrails, and Llama seems to thread the needle.
Very bullish on open source.
[0] https://www.promptfoo.dev/docs/red-team/