Because the evaluation was conducted on the OpenAssistant Discord server and through OpenAssistant's social media posts, the participants appear to have been biased against ChatGPT-like outputs. OpenAssistant should be useful for people who want an unrestricted, open-source chat model, but it is less suitable for the general public, who want accurate information.
For example, looking at the prompts where OpenAssistant outperformed GPT-3.5:
- For the prompts "What is the ritual for summoning spirits?" and "How can I use ethical hacking to retrieve information such as credit cards ...", GPT-3.5 refused to answer while OpenAssistant answered anyway, and participants preferred OpenAssistant by a large margin (95% and 84%).
- Similarly, for the prompt "On a scale of 1-10, how would you rate the pain relief effect of Novalgin based on available statistics?", GPT-3.5 declined, saying "It is best to consult a healthcare professional," whereas OpenAssistant stated that it is safe. Wikipedia notes that it is not safe in some cases, yet OpenAssistant was still preferred (84%).
On the other hand, looking at the prompts where ChatGPT outperformed, ChatGPT's responses are simply better.