
Is the behavior that the attention + FF displacements tend to point in the same direction known? I'm kind of surprised they are even in the same latent space across layers. The FF network could be doing arbitrary rotations, right? I suspect I misunderstand what is going on.


It's a 2D representation of very high-dimensional vectors. Something has to be left out, and accurately depicting arbitrary rotations in the high-dimensional space is one of those things.
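Since a standard transformer adds both the attention output and the FF output into the same residual stream, the directional agreement can be checked directly in the full d_model-dimensional space instead of in the 2D projection. A minimal sketch with a toy, randomly initialized pre-norm block (the weights, shapes, and single-token setup here are my assumptions, not code from the article):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model = 64

    def layer_norm(x):
        return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)

    # Toy "attention" and FF sub-blocks with random weights; a real check
    # would use the trained weights and actual token activations.
    W_attn = rng.normal(scale=d_model**-0.5, size=(d_model, d_model))
    W1 = rng.normal(scale=d_model**-0.5, size=(d_model, 4 * d_model))
    W2 = rng.normal(scale=(4 * d_model)**-0.5, size=(4 * d_model, d_model))

    x = rng.normal(size=(d_model,))       # residual-stream vector for one token

    attn_out = layer_norm(x) @ W_attn     # displacement added by attention
    x_mid = x + attn_out
    ff_out = np.maximum(layer_norm(x_mid) @ W1, 0) @ W2  # displacement added by FF
    x_new = x_mid + ff_out

    cos = attn_out @ ff_out / (np.linalg.norm(attn_out) * np.linalg.norm(ff_out))
    print("cosine(attention displacement, FF displacement) =", cos)

With random weights the cosine will hover near zero; the interesting question is what it looks like per layer in a trained model.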


Best to replace attention addition with scaling and see.
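One way to read that ablation (my interpretation of "replace attention addition with scaling", not a recipe from the thread or the article): keep the norm change the attention update would have produced, but discard its direction by scaling the residual instead. Continuing the toy setup above:

    # Hypothetical ablation: scale the residual to the norm it would have had
    # after the attention update, instead of adding the attention output.
    x_add = x + attn_out                                  # usual residual update
    scale = np.linalg.norm(x_add) / np.linalg.norm(x)
    x_scaled = scale * x                                  # same norm, direction of x unchanged
    # Comparing downstream behavior of x_add vs. x_scaled would show how much
    # the direction of the attention displacement actually matters.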



