
Just a quick clarification: attention of the same sort transformers use had already been employed in RNNs for a while, hence the title "Attention Is All You Need". It turned out you could simply remove the recurrent part, which was what made the network hard to train (recurrence forces you to process the sequence one step at a time, so it can't be parallelized).
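For what it's worth, here's a minimal NumPy sketch of scaled dot-product self-attention to illustrate the point (the function name and toy inputs are mine, not from the paper): every position attends to every other in a couple of matrix multiplies, with no loop over time steps.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # All positions attend to all others in one matrix multiply --
        # no sequential recurrence, so the sequence is processed in parallel.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V                              # weighted sum of value vectors

    # toy example: 4 tokens with 8-dim embeddings
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    out = scaled_dot_product_attention(x, x, x)         # self-attention: Q = K = V

Contrast with an RNN, where step t can't start until step t-1 finishes; here the whole (seq_len, seq_len) attention matrix is computed at once.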

