Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There's massive hardware and energy infra built out going on. None of that is specialized to run only transformers at this point, so wouldn't that create a huge incentive to find newer and better architectures to get the most out of all this hardware and energy infra?


>None of that is specialized to run only transformers at this point

isn't this what [etched](https://www.etched.com/) is doing?


Only being able to run transformers is a silly concept, because attention consists of two matrix multiplications, which are the standard operation in feed forward and convolutional layers. Basically, you get transformers for free.


devil is in the details




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: