
I think this is way too simplistic.

First, LLMs still seem to be on a trajectory where more parameters is better. So the convergence is a man-made decision to stop, for the sake of memory usage and the money spent on training.

I don't think there is anything inherent in LLMs that prohibits adjusting weights over time, or branching out to another model that is updated more regularly and contains novel material. I could well imagine them integrating a "news model" that is retrained every 24h on text scraped from news sites and can be called from the main LLM, roughly as sketched below.
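
As a rough sketch of that idea (all names here are hypothetical; this is not any real OpenAI API, just a toy dispatch layer in Python):

    # Hypothetical sketch: route recency-sensitive prompts to a small,
    # frequently retrained "news" model, everything else to the big frozen
    # base model. Model classes and method names are made up.
    from datetime import datetime, timedelta

    class NewsAwareAssistant:
        def __init__(self, base_model, news_model,
                     retrain_interval=timedelta(hours=24)):
            self.base_model = base_model    # large, rarely retrained LLM
            self.news_model = news_model    # small model retrained daily on scraped news
            self.retrain_interval = retrain_interval
            self.last_retrain = datetime.utcnow()

        def maybe_retrain_news(self, fresh_corpus):
            # Only the small news model gets fine-tuned; the base model stays frozen.
            if datetime.utcnow() - self.last_retrain >= self.retrain_interval:
                self.news_model.finetune(fresh_corpus)
                self.last_retrain = datetime.utcnow()

        def answer(self, prompt):
            # Crude routing heuristic; a real system would use a classifier.
            if any(k in prompt.lower() for k in ("today", "latest", "news", "this week")):
                return self.news_model.generate(prompt)
            return self.base_model.generate(prompt)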

If we look at the human brain, we see that there are dedicated systems for specialized processing (think spatial reasoning vs. judging colors). I would imagine that OpenAI will eventually end up with a similarly more complex system structure.



An LLM with more parameters is basically a new model that you have to train from scratch. You can't pick up, say, GPT-3 and turn it into a Chinchilla-grade 175B-param LLM with just the difference in training between the two. The former has "converged": more effort and money put into it would bring negligible benefits. The progress is basically approaching an asymptote.
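
To make the "train from scratch" point concrete, here is a toy PyTorch illustration (not the actual GPT-3 or Chinchilla code, just stand-in layers): widening a model changes the weight shapes, so the old checkpoint simply doesn't load.

    # Toy illustration: weights of a smaller model cannot be loaded into a
    # wider one, so scaling up parameters means training a new model.
    import torch.nn as nn

    small = nn.Linear(768, 768)    # stand-in for one layer of a small LLM
    large = nn.Linear(1024, 1024)  # the "more parameters" version

    try:
        large.load_state_dict(small.state_dict())
    except RuntimeError as e:
        print("shape mismatch:", e)  # the old checkpoint is useless as-is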

And of course there is nothing saying future progress won't produce a new model that can learn on the go. But right now, the transformer architecture that made LLMs such a huge thing is inherently incapable of learning from its own output. Or at least, it isn't really feasible given how expensive that would be.
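
Roughly what that looks like in practice (PyTorch-flavoured sketch, assuming a generic model that returns per-position logits): inference runs with gradients disabled and the weights untouched, so nothing the model generates feeds back into its parameters unless you pay for a separate training run.

    # Sketch: generation happens with frozen weights and no gradients,
    # so the model cannot update itself from its own output.
    import torch

    @torch.no_grad()               # no gradient tracking during inference
    def generate(model, tokens, steps=32):
        for _ in range(steps):
            logits = model(tokens)                          # forward pass only
            next_tok = logits[:, -1].argmax(-1, keepdim=True)
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens                                       # weights unchanged throughout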



