
That’s kind of what RNNs (and CNNs) did before the “Attention Is All You Need” paper introduced the attention mechanism. One of the breakthroughs that enabled GPT is giving each token direct access to every other token through self-attention, instead of letting earlier tokens get attenuated as they pass through some recurrent summarization mechanism (e.g. the fixed-size hidden state of a seq2seq encoder).
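A minimal numpy sketch of scaled dot-product self-attention, to make the contrast concrete. Using the input embeddings directly as Q, K, and V is an assumption for brevity; real models learn separate projection matrices, and GPT additionally masks attention to future positions:

    import numpy as np

    def self_attention(X):
        # X: (seq_len, d) token embeddings. Toy version: identity
        # projections for Q, K, V (a real model learns these).
        d = X.shape[-1]
        Q, K, V = X, X, X
        scores = Q @ K.T / np.sqrt(d)  # every token scores every other token
        # softmax over each row (numerically stable)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # each output is a direct mix of all tokens: no recurrent bottleneck
        return weights @ V

The softmax weights aren’t literally equal, but the key point holds: every token can reach every position in one step, rather than information having to survive a chain of recurrent updates.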



