Hacker News

This is precisely what I recommend to people starting out with LLMs: do not start with the architecture, start with their behavior. Use them for a while as a black box, then circle back and learn about transformers, cross-entropy loss functions, and so on. Bottom-up approaches to learning work well in other areas of computing, but not here: nothing in the architecture suggests the emergent behavior that we see.


This is more or less how I arrived at the mental model I referred to above. It helps me tremendously in knowing what to expect from every model I’ve used.



