Q, K, V are not the same. In self-attention, they are all computed by separate linear transformations of the same input (i.e., the previous layer's output). In cross-attention, even that is not true: K and V are computed by linear transformations of whatever is being cross-attended, while Q is computed by a linear transformation of the input as before.
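A minimal sketch of the point, assuming PyTorch; the module name and shapes here are just for illustration. The same three learned projections W_q, W_k, W_v are used in both cases; the only difference is where K and V take their input from:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Three distinct weight matrices: Q, K, V are *not* the same,
        # even when they are applied to the same input tensor.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.d_model = d_model

    def forward(self, x, context=None):
        # Self-attention: Q, K, V all projected from x (previous layer's output).
        # Cross-attention: Q projected from x, K and V projected from `context`
        # (e.g. the encoder output).
        kv_source = x if context is None else context
        q = self.w_q(x)
        k = self.w_k(kv_source)
        v = self.w_v(kv_source)
        scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
        return F.softmax(scores, dim=-1) @ v

attn = SimpleAttention(d_model=64)
x = torch.randn(2, 10, 64)         # decoder-side input
enc = torch.randn(2, 7, 64)        # hypothetical encoder output
self_out = attn(x)                 # K, V from x
cross_out = attn(x, context=enc)   # K, V from enc, Q still from x
```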
Yeah, a common misconception: because the input is the same, people forget that there is a separate pre-attention linear transformation for each of Q, K, and V (that's the decoder-only case; obviously the source of K and V differs in an encoder-decoder setup).