Hacker News new | past | comments | ask | show | jobs | submit login

Chatbot Arena is not a blind ranking.

Many, if not most, users intentionally ask the models questions to tease out their canned disclaimers: so they know exactly which model is answering.

On one hand it's fair to say disclaimers affect the usefulness of the model, but on the other I don't think most people are solely asking these LLMs to produce meth or say "fuck", and that has an outsized effect on the usefulness of Chatbot Arena as a general benchmark.

I personally recommend people use it at most as a way to directly test specific LLMs and ignore it as a benchmark.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: