RLHF can motivate models to deny truths that are politically taboo, but it can also motivate them to favor claims supported by scientific evidence over bullshitting, random conspiracy theories, and "hallucination". So it's a double-edged sword.
I understand that it is the same technique for both. This makes sense.
But to train a model to deny truths which are politically taboo does seem to be misaligned with training a model to favor truths, no? And what is taboo can be very broad if you want to make everyone happy.
I would rather know the noble lie [1] is a lie, and then repeat it willingly, than not know it is a lie. My behavior in many situations will likely differ because I am operating with a more accurate model of the world, even if that isn't explicitly expressed outwardly.
> But to train a model to deny truths which are politically taboo does seem to be misaligned with training a model to favor truths, no?
Strictly speaking, RLHF trains models to give answers which the human raters believe to be correct. In uncontroversial territory this correlates with truth; in taboo territory, only with what is politically correct.
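To make that concrete, here is a minimal sketch of the usual reward-model step, in toy PyTorch (my own illustration, not any lab's actual pipeline; RewardModel and preference_loss are made-up names). The only training signal is which answer the human rater preferred, so truth enters only insofar as raters recognize and reward it:

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        def __init__(self, dim=768):
            super().__init__()
            # Stand-in for a transformer encoder; just a linear scorer here.
            self.score = nn.Linear(dim, 1)

        def forward(self, x):
            return self.score(x).squeeze(-1)

    def preference_loss(model, preferred, rejected):
        # Bradley-Terry style objective: push the score of the answer the
        # rater preferred above the one they rejected. Nothing in the loss
        # checks factual accuracy -- only the rater's judgment.
        return -torch.log(torch.sigmoid(model(preferred) - model(rejected))).mean()

    # Toy usage with random embeddings standing in for encoded answers.
    model = RewardModel()
    preferred = torch.randn(8, 768)   # embeddings of rater-preferred answers
    rejected = torch.randn(8, 768)    # embeddings of rater-rejected answers
    loss = preference_loss(model, preferred, rejected)
    loss.backward()

The policy model is then fine-tuned against this learned reward, so whatever the raters systematically prefer, accurate or not, becomes the optimization target.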
Doubtful:
https://news.ycombinator.com/item?id=36976236