Is it known behavior that the attention + FF displacements tend to point in the same direction? I am kind of surprised they are even in the same latent space across layers. The FF network could be doing arbitrary rotations, right? I suspect I misunderstand what is going on.
It's a 2D representation of very high-dimensional vectors. Something has to be left out, and accurately depicting arbitrary rotations in the high-dimensional space is one of those things.
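To make the distortion concrete, here's a minimal sketch of the effect: random unit vectors in high dimensions are nearly orthogonal, but after projecting to 2D their angles get badly scrambled. The dimensionality, sample count, and PCA-style projection are all illustrative assumptions, not the actual method behind the figure being discussed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 768, 200  # assumed dimension and sample count, purely for illustration

# Random unit vectors in high dimensions are nearly orthogonal to one another.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Project to 2D via PCA (an assumed projection; the figure's method is unspecified).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T
X2 /= np.linalg.norm(X2, axis=1, keepdims=True)

def mean_abs_cos(A):
    # Mean absolute cosine similarity over all distinct pairs of rows.
    C = A @ A.T
    return np.abs(C[np.triu_indices(len(A), k=1)]).mean()

print(f"mean |cos| in {d}D: {mean_abs_cos(X):.3f}")   # ~0.03: nearly orthogonal
print(f"mean |cos| in 2D:   {mean_abs_cos(X2):.3f}")  # ~0.64: angles heavily distorted
```

So two displacement vectors that look aligned in the plot could easily be close to orthogonal in the full space; the projection just can't represent that.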