Okay, so the certainty is combined with the raw answer into a single number betw...

yunwal · on Dec 7, 2022

Correct. Or, for more complex questions, you could model it as guessing a point in a multi-dimensional space. For example, say the answer to a question is a single word, and your loss function is the MSE of semantic similarity between the guess and the true answer. If you, the student, think it might be one of 3 words, you could take a weighted average of the vectors representing those words in the latent space of a large language model, where each weight is your estimate of the probability that the word is correct. That weighted average is the guess that minimizes your expected squared error.
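To make that concrete, here is a minimal sketch in Python. The 2-D vectors, the three words, and the probabilities are all made up for illustration (real embeddings would come from a model), and it assumes the loss is plain squared Euclidean distance in the embedding space, which may not match whatever similarity measure you actually use.

```python
# Hypothetical 2-D "embeddings" for three candidate words (made-up numbers).
candidates = {
    "cat": (1.0, 0.0),
    "dog": (0.0, 1.0),
    "fox": (1.0, 1.0),
}
# The student's estimated probability that each word is the true answer.
beliefs = {"cat": 0.5, "dog": 0.3, "fox": 0.2}


def weighted_average(vectors, weights):
    """Probability-weighted mean of the candidate vectors."""
    dims = len(next(iter(vectors.values())))
    return tuple(
        sum(weights[w] * vectors[w][i] for w in vectors) for i in range(dims)
    )


def expected_squared_error(guess, vectors, weights):
    """Expected squared distance from the guess to the true answer,
    assuming the true answer is word w with probability weights[w]."""
    return sum(
        weights[w] * sum((g - v) ** 2 for g, v in zip(guess, vectors[w]))
        for w in vectors
    )


blended = weighted_average(candidates, beliefs)
blended_loss = expected_squared_error(blended, candidates, beliefs)
single_losses = {
    w: expected_squared_error(v, candidates, beliefs)
    for w, v in candidates.items()
}

print("blended guess:", blended, "expected loss:", blended_loss)
print("committing to one word:", single_losses)
```

With these numbers, the blended guess has a lower expected loss than committing to any single candidate word, even the most likely one.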

Sorry, that was very wordy, but hopefully you get the point.

An easier-to-understand, but perhaps less sensible, example would be to do the same thing in a quiz about arithmetic, so answering 5+5=9 and 6+2=7 is less wrong than answering 5+5=10 and 6+2=1.
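A quick sketch of that arithmetic, assuming the quiz is scored by the mean squared error between each answer and the true sum (the function name is just illustrative):

```python
def mean_squared_error(guesses, truths):
    """Average of the squared differences between guesses and true values."""
    return sum((g - t) ** 2 for g, t in zip(guesses, truths)) / len(truths)


truths = [5 + 5, 6 + 2]  # 10 and 8

two_near_misses = mean_squared_error([9, 7], truths)   # off by 1 twice
one_big_miss = mean_squared_error([10, 1], truths)     # exact once, off by 7 once

print(two_near_misses, one_big_miss)  # 1.0 24.5
```

Two near misses score 1.0, while one exact answer plus one answer that is off by 7 scores 24.5, so the first set of answers counts as less wrong under this loss.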