The only way to really do it is to add a second layer of processing that evaluates safety, taking that evaluation burden off the base model that does the answering.

But that's around 2x the cost.

Even human brains depend on the prefrontal cortex to go "wait a minute, I should not do this."
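A minimal sketch of what that two-pass arrangement could look like, with hypothetical generate_answer / evaluate_safety callables standing in for the two models (none of this is a real vendor API, and the 0.5 threshold is an arbitrary placeholder):

    from typing import Callable

    REFUSAL = "I'm sorry, I can't help with that."

    def answer_with_oversight(prompt: str,
                              generate_answer: Callable[[str], str],
                              evaluate_safety: Callable[[str, str], float],
                              threshold: float = 0.5) -> str:
        # Pass 1: the base model answers with no safety instructions in its prompt.
        draft = generate_answer(prompt)
        # Pass 2: a separate model scores prompt + draft (0.0 = benign, 1.0 = harmful).
        risk = evaluate_safety(prompt, draft)
        return draft if risk < threshold else REFUSAL

Running two models on every request is roughly where the "2x the cost" figure comes from, though the evaluator doesn't have to be anywhere near the size of the answering model.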



What we get instead is both layers at once. Try asking questions like these of Bing instead of ChatGPT - it's the same GPT-4 under the hood (when set to "Creative") - and quite often it will happily start answering... only to get interrupted mid-sentence, with the message replaced by something like "I'm sorry, I cannot assist with that".

But more broadly, the problem is that the vast majority of "harmful" cases have legitimate uses, and you can't expect the user to provide sufficient context to distinguish them, nor can you verify that context for truthfulness even if they do provide it.
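That interrupt-and-replace behaviour looks like moderation running alongside the stream rather than ahead of it. A rough sketch of the idea, assuming a hypothetical flagged() check (nothing Bing-specific is implied):

    from typing import Callable, Iterable

    REPLACEMENT = "I'm sorry, I cannot assist with that."

    def stream_with_moderation(chunks: Iterable[str],
                               flagged: Callable[[str], bool]) -> str:
        # Re-check the accumulated answer as each chunk streams in; if the
        # check trips partway through, drop what was already shown and
        # substitute the canned refusal - hence the answer that appears
        # on screen and then vanishes.
        shown = ""
        for chunk in chunks:
            shown += chunk
            if flagged(shown):
                return REPLACEMENT
        return shown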


That struck me too. You don't need to lobotomize the model that answers questions; you just need to filter out "bad" questions and reply "I'm sorry Dave, I'm afraid I can't do that".

Would it be 2x the cost? Surely the gatekeeper model can be a fair bit simpler, since it only has to spit out a float between 0 and 1 (rough sketch below).

(caveat: this is so not my area).
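For illustration, a gatekeeper arrangement along those lines could look like this; gatekeeper and answerer are placeholder callables, and the 0.8 threshold is an arbitrary choice, not anything from a real system:

    from typing import Callable

    REFUSAL = "I'm sorry Dave, I'm afraid I can't do that."

    def gated_answer(question: str,
                     gatekeeper: Callable[[str], float],  # small classifier, returns 0..1
                     answerer: Callable[[str], str],      # full-size model
                     threshold: float = 0.8) -> str:
        # The cheap classifier runs first; the expensive model is only
        # invoked when the estimated risk stays below the threshold.
        if gatekeeper(question) >= threshold:
            return REFUSAL
        return answerer(question)

Because the gatekeeper only has to emit a single score rather than generate a full response, it can be a much smaller model, so the added cost should be well under 2x.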



