People often talk in terms of performance curves or "neural scaling laws". Every model architecture class exhibits a very similar scaling exponent because the data and the training procedure play the dominant role (every theoretical model that replicates the scaling laws exhibits this property). There are some discrepancies across architecture classes, but only within hard limits.
Theoretical models of neural scaling laws are still preliminary, of course, but all of this seems to be supported by experiments at smaller scales.
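To make "scaling exponent" concrete, here is a minimal sketch of how such an exponent is typically extracted: fit a power-law-plus-offset curve to loss measured at a handful of model sizes. The functional form, the synthetic data, and every number below are illustrative assumptions, not measurements from any particular model family.

```python
# Minimal sketch: estimating a scaling exponent from hypothetical loss-vs-size data.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, alpha, c):
    # Power-law-plus-irreducible-loss form commonly used to summarize performance curves.
    return a * N ** (-alpha) + c

# Synthetic "observations": loss at a few model sizes (parameter counts), generated
# from an assumed exponent plus a little noise.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
true_a, true_alpha, true_c = 400.0, 0.35, 1.7
noise = np.random.default_rng(0).normal(1.0, 0.01, N.size)
loss = scaling_law(N, true_a, true_alpha, true_c) * noise

# Fit the three free parameters; alpha is the "scaling exponent" discussed above.
(a_hat, alpha_hat, c_hat), _ = curve_fit(scaling_law, N, loss, p0=[100.0, 0.5, 1.0])
print(f"fitted exponent alpha ~ {alpha_hat:.3f} (true value {true_alpha})")
```

The claim in the paragraph above is that curves fit this way end up with nearly the same alpha across architecture classes, since the data distribution and training procedure, not the architecture, dominate the fit.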