
If you just want to understand the Transformer, here is a clean implementation:

https://github.com/blue-season/pywarm/blob/master/examples/t...

and here's a breakdown of the architecture:

http://dugas.ch/artificial_curiosity/GPT_architecture.html

These 4 videos (~45 mins) do an excellent job of explaining attention, multi-head attention, and transformers: https://www.youtube.com/watch?v=yGTUuEx3GkA
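
For anyone who'd rather read code than watch videos first, the core idea fits in a few lines. Here's a minimal NumPy sketch of scaled dot-product attention (names and shapes are my own for illustration, not taken from the linked implementation); multi-head attention just runs several of these in parallel on learned projections of Q, K, V and concatenates the results:

    import numpy as np

    def softmax(x):
        # numerically stable softmax over the last axis
        x = x - x.max(axis=-1, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d_k) arrays of queries, keys, values
        d_k = Q.shape[-1]
        # each query's similarity to every key, scaled by sqrt(d_k)
        # so the dot products don't grow with the dimension
        scores = Q @ K.T / np.sqrt(d_k)
        weights = softmax(scores)  # each row sums to 1
        return weights @ V         # weighted average of the values

    # toy example: 4 tokens, 8-dim queries/keys/values
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)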