
An idea I often hear in talks about LLMs is that training on a larger (assuming constant quality) and more varied dataset leads to the emergence of greater generalization and reasoning (if I may use this word) across task categories. While the general quality of a model correlates somewhat predictably with the amount of training, the point at which specific generalization and reasoning capabilities emerge is much less predictable.
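
A rough way to picture that distinction, purely as an illustrative sketch (the power-law constants, the emergence threshold, and the logistic shape below are all made-up numbers, not measurements from any real model): loss improves smoothly with compute, while accuracy on some specific task can sit near chance and then jump.

  import math

  # Illustrative sketch only: made-up constants, not fitted to any real model.
  # Smooth scaling: test loss roughly follows a power law in training
  # compute C, i.e. L(C) = a * C**(-alpha) (Kaplan-style scaling laws).
  a, alpha = 40.0, 0.04

  # "Emergent" capability: task accuracy stays near chance until the loss
  # crosses some threshold (unknown in advance), then climbs quickly.
  loss_threshold = 6.0  # hypothetical emergence point

  def test_loss(compute):
      return a * compute ** -alpha

  def task_accuracy(compute, chance=0.25, sharpness=8.0):
      # Logistic jump around the loss threshold: the input (loss) moves
      # smoothly, but the output (accuracy) changes abruptly.
      l = test_loss(compute)
      return chance + (1 - chance) / (1 + math.exp(sharpness * (l - loss_threshold)))

  for c in [1e18, 1e20, 1e22, 1e24, 1e26]:
      print(f"compute={c:.0e}  loss={test_loss(c):.2f}  accuracy={task_accuracy(c):.2f}")

Running this, the loss shrinks gradually at every step, but accuracy jumps from ~0.30 to ~1.00 between 1e20 and 1e22: from the loss curve alone you couldn't easily predict where that jump lands.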

