espadrine 11 months ago, on: Using reinforcement learning and $4.80 of GPU time...

That makes me wonder, though, what the best loss function would be. I assume you used MSE on the log score. I wonder whether a pairwise loss, i.e. a sigmoid over which of two articles has the higher score, would yield better results for the downstream RLHF task.
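
To make the two options concrete, here is a rough sketch of both losses for a single example. It is not the original author's code: the function names are hypothetical, the use of `log(1 + score)` as the regression target is an assumption (it just avoids `log(0)` for zero-score articles), and treating ties as zero loss is likewise an assumption.

```python
import math


def mse_log_score_loss(pred_log_score, true_score):
    """Pointwise regression loss: squared error against log(1 + score).

    Assumption: the target is log1p(score); the thread only says "log score".
    """
    target = math.log1p(true_score)
    return (pred_log_score - target) ** 2


def pairwise_sigmoid_loss(pred_a, pred_b, score_a, score_b):
    """Pairwise (Bradley-Terry style) loss: -log(sigmoid(winner - loser)).

    Only the ordering of the two true scores is used, not their magnitudes.
    """
    if score_a == score_b:
        return 0.0  # assumption: ties carry no preference signal
    margin = pred_a - pred_b if score_a > score_b else pred_b - pred_a
    # -log(sigmoid(m)) == log(1 + exp(-m)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Article A has the higher true score; the model ranks it higher too.
    print(mse_log_score_loss(pred_log_score=2.0, true_score=9))
    print(pairwise_sigmoid_loss(pred_a=2.0, pred_b=0.5, score_a=9, score_b=1))
```

The pairwise version is the same shape of objective commonly used to train reward models from preference pairs, which is why it might transfer better to the RLHF stage; whether it actually does here would need to be measured.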