Many, if not most, users intentionally ask the models questions designed to tease out their canned disclaimers, so they know exactly which model is answering.
On one hand, it's fair to say disclaimers affect the usefulness of a model, but on the other, I don't think most people are solely asking these LLMs to produce meth or say "fuck", so this kind of probing has an outsized effect on the usefulness of Chatbot Arena as a general benchmark.
I personally recommend people use it at most as a way to directly test specific LLMs and ignore it as a benchmark.