
Ah, that makes sense. So we treat two hidden layers more as "memory" or "buffers", and the rule itself is implemented in just one layer, at least for a single token.

