
Not sure there is any real lookup happening. Q, K are the same and sometimes even V is the same…


Q, K, V are not the same. In self-attention they are all computed by separate linear transformations of the same input (i.e. the previous layer’s output). In cross-attention not even that holds: K and V are computed by linear transformations of whatever is being cross-attended, while Q is computed by a linear transformation of the input as before.
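Roughly, in PyTorch-ish terms (single head, made-up sizes), the self-attention case looks something like this sketch:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model = 64  # hypothetical embedding size

    # Three separate learned projections; the input x is shared, Q/K/V are not.
    w_q = nn.Linear(d_model, d_model, bias=False)
    w_k = nn.Linear(d_model, d_model, bias=False)
    w_v = nn.Linear(d_model, d_model, bias=False)

    x = torch.randn(1, 10, d_model)   # (batch, seq_len, d_model)

    q, k, v = w_q(x), w_k(x), w_v(x)  # same input, three different projections

    scores = q @ k.transpose(-2, -1) / d_model ** 0.5
    out = F.softmax(scores, dim=-1) @ v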


Yeah, a common misconception: people see that the input is the same and forget that there is a pre-attention linear transformation for each of Q, K, and V (that's the decoder-only version; obviously with an encoder-decoder, BERT-style setup, K and V come from a different input).
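For the cross-attention case, a minimal sketch (again PyTorch-ish, dimensions made up) where K and V come from the encoder output and Q from the decoder input:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model = 64  # hypothetical embedding size
    w_q = nn.Linear(d_model, d_model, bias=False)
    w_k = nn.Linear(d_model, d_model, bias=False)
    w_v = nn.Linear(d_model, d_model, bias=False)

    enc_out = torch.randn(1, 12, d_model)  # encoder output (cross-attended)
    dec_x   = torch.randn(1, 10, d_model)  # decoder-side input

    q = w_q(dec_x)                         # Q from the decoder input
    k, v = w_k(enc_out), w_v(enc_out)      # K and V from the encoder output

    scores = q @ k.transpose(-2, -1) / d_model ** 0.5
    out = F.softmax(scores, dim=-1) @ v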


It’s still a stretch to call that a lookup.



