
Yes, billions of parameters are necessary today. But large language models only came out about 5 years ago. I'm confident that 5 years from now, the number of parameters needed to match GPT-4 performance will have dropped by orders of magnitude.

At the very least, even if that's not the case, I suspect inference will be drastically less GPU-heavy by then.



There will also be hardware improvements (as always) and ASIC chips specifically designed for running this kind of model. For example, see this "Optical Transformers" paper [0] and its HN discussion [1] from last month.

[0] https://arxiv.org/abs/2302.10360

[1] https://news.ycombinator.com/item?id=34905210


I could also imagine a sort of two-tier approach, where the on-device model can handle the majority of queries, but recognize when it should pass the query on to a larger model running in the cloud.
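
Roughly, the routing could look like this (a minimal sketch; the model calls, confidence score, and threshold are made-up stand-ins, not any real API):

    def local_generate(prompt: str) -> tuple[str, float]:
        """Small on-device model: returns (answer, confidence in [0, 1])."""
        # In practice the confidence could come from average token log-probability.
        return "local answer to: " + prompt, 0.42   # placeholder values

    def cloud_generate(prompt: str) -> str:
        """Large remote model, used only as a fallback."""
        return "cloud answer to: " + prompt         # placeholder

    CONFIDENCE_THRESHOLD = 0.8  # illustrative; would be tuned empirically

    def answer(prompt: str) -> str:
        text, confidence = local_generate(prompt)
        if confidence >= CONFIDENCE_THRESHOLD:
            return text                   # cheap path, handled on device
        return cloud_generate(prompt)     # expensive path, escalated to the cloud

    print(answer("Summarise this email in one sentence."))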


Wait, so there's a way to make a model as smart as GPT but with fewer parameters? Isn't that why it's so good?


It's an older paper, but DeepMind argues in the Chinchilla paper that far better performance can be extracted with fewer parameters. To quote:

"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."

It's difficult to evaluate an LLM's performance since it's largely qualitative, but Meta's LLaMA has been doing quite well even at 13B parameters.
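
As a back-of-envelope illustration of the Chinchilla rule of thumb (~20 training tokens per parameter; the model sizes below are just examples):

    TOKENS_PER_PARAM = 20   # approximate Chinchilla-optimal ratio of tokens to parameters

    for n_params in (1e9, 13e9, 70e9, 175e9):
        tokens = TOKENS_PER_PARAM * n_params
        print(f"{n_params / 1e9:>5.0f}B params -> ~{tokens / 1e12:.2f}T training tokens")

    # By this rule, a 175B-parameter model like GPT-3 "wants" roughly 3.5T tokens;
    # GPT-3 was actually trained on about 0.3T, which is the sense in which such
    # models are "under-trained".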


Chinchilla is aimed at finding a cost-performance tradeoff, not the optimal amount of training in absolute terms. If cost is no barrier because the model will be used forever, then there's probably no amount of training that's "enough".


The rumor I've heard is that GPT-4 didn't meaningfully increase the parameter count over GPT-3.5, but instead focused on training and structural improvements.


Well, the inference time of GPT-4 seems to be far greater than GPT-3's, so it could hint at a difference in parameter count.


If you watch their announcement livestream video, it looked just as fast as normal ChatGPT.

I think what we have access to is a fair bit slower.


You can train a small model to behave like the large model at a subset of tasks.
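
This is essentially knowledge distillation. A minimal sketch of the idea in PyTorch, with toy models and random data standing in for a real teacher, student, and task:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Toy stand-ins: a frozen "large" teacher and a much smaller student.
    teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
    student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 2.0  # softening temperature (illustrative choice)

    for step in range(100):
        x = torch.randn(32, 128)            # stand-in for inputs from the target tasks
        with torch.no_grad():
            teacher_logits = teacher(x)
        student_logits = student(x)
        # Train the student to match the teacher's softened output distribution.
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        opt.zero_grad()
        loss.backward()
        opt.step()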


That's a complicated question to answer. What I'd say is that more parameters make the model more robust, but there are diminishing returns. Optimizations are underway.


Don't underestimate how many of those parameters are actually needed to support multiple languages.

If you focus on English only, this could easily reduce the parameter count fivefold.
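
One concrete place this shows up is the token-embedding tables, which scale with vocabulary size. A back-of-envelope sketch with purely illustrative numbers:

    D_MODEL = 4096  # model width; illustrative

    def embedding_params(vocab_size: int, tied: bool = False) -> int:
        tables = 1 if tied else 2        # input embedding + output projection
        return tables * vocab_size * D_MODEL

    for name, vocab in [("English-centric tokenizer", 32_000),
                        ("large multilingual tokenizer", 250_000)]:
        print(f"{name}: ~{embedding_params(vocab) / 1e9:.2f}B embedding parameters")

    # The embedding tables are only one piece, though; the transformer blocks
    # themselves don't grow with vocabulary size.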


Could you explain how supporting multiple languages increases the parameter count so much? I'm genuinely curious.

LLMs seem to be comfortable with hundreds of programming languages, DSLs, and application-specific syntaxes, so how does supporting a couple more natural languages become so expensive?

I see how more training data would be needed, but I don't understand how that maps to a greater parameter count.



