Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think GP means that if you internalize the bitter lesson (more data more compute wins), you stop imagining how to squeeze SOTA minus 1 performance out of constrained compute environments.


This. When we ran out of speed on the CPU, we moved to the GPU. Same thing here. The more we work with (22T) models, quants, and decimating precision - the more we learn and find more novel ways to do things.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: