This is an older paper, but DeepMind argues in their Chinchilla paper that far better performance can be extracted with fewer parameters. To quote:
"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."
It's difficult to evaluate an LLM's performance since it's largely qualitative, but Meta's LLaMA has been doing quite well, even at 13B parameters.
Chinchilla is also aimed at finding a cost-performance tradeoff, not the optimal amount of training in absolute terms. If cost were no barrier because the model will be used forever, then there's probably no amount of training that's "good enough".
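For a rough sense of what "under-trained" means here, the Chinchilla result is often summarized as roughly 20 training tokens per parameter at the compute-optimal point, with training compute commonly estimated as about 6·N·D FLOPs. A minimal sketch of that arithmetic (the 20:1 ratio and the 6ND estimate are rule-of-thumb approximations, not the paper's fitted scaling law):

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens: ~20 tokens per parameter."""
    return 20 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common ~6 * N * D estimate of training compute in FLOPs."""
    return 6 * n_params * n_tokens

if __name__ == "__main__":
    # GPT-3-scale example: ~175B parameters trained on ~300B tokens.
    n = 175e9
    d_actual = 300e9
    d_optimal = chinchilla_optimal_tokens(n)  # ~3.5T tokens
    print(f"Compute-optimal tokens for 175B params: {d_optimal:.2e}")
    print(f"Actual GPT-3 training tokens:           {d_actual:.2e}")
    print(f"Under-trained by roughly {d_optimal / d_actual:.0f}x on this heuristic")

    # Chinchilla itself: 70B parameters on ~1.4T tokens, i.e. ~20 tokens/param.
    print(f"Chinchilla ratio: {1.4e12 / 70e9:.0f} tokens per parameter")
```

On this heuristic, a GPT-3-sized model would want an order of magnitude more data than it saw, which is the sense in which the paper calls such models under-trained, while Chinchilla (70B parameters, ~1.4T tokens) sits near the 20:1 ratio.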
"We find that current large language models are significantly under-trained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant."
It's difficult to evaluate a LLM's performance as it's all qualitative, but Meta's LLaMA has been doing quite well, at even 13B parameters.