Hacker News

"desired model behavior". Desired by whom? I just want the raw output, without the biases and limitations set up by OpenAI. At the end of the day it's just information, and the most ethical thing to do is to return it the way it is, and let the receiver decide what to do with it.


There is no such thing as "raw output", though. You can train a chatbot to be polite or you can train it to be rude, but you cannot train it to be neither. And if you train it to be polite, it often ends up refusing things you never trained it to refuse, presumably because the model extrapolates that that's what a polite writer would do. So tuning the refusal boundary can be quite tricky in practice: even if you never teach a model to refuse X, it may still refuse X. As a user, it can therefore be impossible to tell whether a given refusal was explicitly trained in by the developers or is an unwanted, unanticipated generalization.


Clearly, since this is OpenAI's model spec, it is desired by them. If other AI groups publish their own desired behavior, you can make an informed decision about which model you want to use.



