Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Going to go out on a limb and say they are probably referring to the gradient calculus required for updating the model.

https://en.wikipedia.org/wiki/Differentiable_programming

See automatic differentiation.



Correct, but note that if you subject a standard hash table algo to AD it won't magically become a transformer. (Hashes in the "normal construction" are discrete functions and thus aren't really continuous or differentiable, neither are lookup tables)




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: