
We don't know that. No one has demonstrated it. It's very likely that at larger scales, for a given amount of compute, you cannot train a traditional RNN to be as good as a transformer.


We are saying the same thing. Transformers are more compute efficient than RNNs. Nobody is denying that, but the switch away from RNNs wasn't preceded by some performance wall (i.e. it's not like we were training bigger RNNs that weren't getting better).

We use Transformers today in large part because they got rid of recurrence, which in effect lets you massively parallelize compute across the sequence.
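
To make that concrete, here's a minimal numpy sketch (toy sizes, no masking or multi-head, names are my own) of why the RNN has a sequential bottleneck while self-attention doesn't:

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 6, 4                      # toy sequence length and hidden size
    x = rng.normal(size=(T, d))      # input embeddings, one row per token

    # RNN: hidden state at step t depends on step t-1, so the time loop
    # is inherently sequential and cannot be parallelized over T.
    W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    h = np.zeros(d)
    rnn_out = []
    for t in range(T):
        h = np.tanh(h @ W_h + x[t] @ W_x)
        rnn_out.append(h)
    rnn_out = np.stack(rnn_out)

    # Self-attention: every position attends to every other via batched
    # matrix multiplies, so all T positions are computed at once.
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    attn_out = weights @ V           # shape (T, d), no time loop needed

The RNN's matmuls are small and serialized over T; the attention path is a few large matmuls, which is exactly what GPUs are good at.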



