
We know architecture and training procedures matter in practice.

MLPs and transformers are ultimately theoretically equivalent. That means for any given transformer there is an MLP that represents the same function. However, that MLP is hard to identify and train.

Also, the transformer itself contains MLPs as sublayers...
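To make that last point concrete: a standard transformer block interleaves self-attention with exactly such a per-token MLP. A minimal numpy sketch (toy sizes, single head, layer norm and biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # model width (toy size)
n = 4  # sequence length

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Single-head self-attention sublayer.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attention(X):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V

# The feed-forward sublayer: an ordinary two-layer ReLU MLP
# applied independently to each token.
W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

def mlp(X):
    return np.maximum(X @ W1, 0) @ W2

def transformer_block(X):
    X = X + attention(X)  # attention sublayer (residual connection)
    X = X + mlp(X)        # MLP sublayer (residual connection)
    return X

X = rng.standard_normal((n, d))
out = transformer_block(X)
print(out.shape)
```

The MLP here is the classic universal-approximator component; what attention adds is the ability to mix information across token positions with weights that depend on the input.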


