Thanks to Meta for their work on safety, particularly Llama Guard. Llama Guard 3 adds defamation, elections, and code interpreter abuse as detection categories.

Having run many red teams recently as I build out promptfoo's red-teaming feature set [0], I've noticed the Llama models punch above their weight on safety accuracy. People hate excessive guardrails, and Llama seems to thread the needle.

Very bullish on open source.

[0] https://www.promptfoo.dev/docs/red-team/



Is there a #2 to Llama Guard? Meta seems curiously alone in doing this kind of, let's call it, "practical safety" work.



