
If you are referring to what is theoretically possible with arbitrary computation in the model, it's called Kolmogorov complexity and it's not computable.
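For reference, the standard textbook statement (general notation, not anything specific to this thread): the Kolmogorov complexity of a string x with respect to a universal machine U is the length of the shortest program that outputs x,

    K_U(x) = \min \{\, |p| : U(p) = x \,\}

and no algorithm computes K_U for all inputs, so any "theoretically optimal" compression of a dataset into model weights can only be bounded, never computed exactly.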


With a fixed architecture and a fixed dataset, as mentioned. So, a specific kind of neural network trained on a specific dataset.


You can estimate it empirically. However, large changes in model parameters/capacity tend to interact with the hyperparameters, so you would want to do runs with multiple hyperparameter values. And training processes give noisy results, so you might want multiple repetitions. And each run may take several GPU-days. So even a small experiment of 10 repetitions x 10 hyperparameter settings x 10 model sizes takes several thousand GPU-days (see the rough arithmetic sketched below). But there are many papers from the large labs that do exactly this.
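As a back-of-the-envelope sketch of that cost, using only the counts from this comment plus an assumed per-run cost (the 3 GPU-days figure is a placeholder, not a measurement):

    # Rough cost of a grid sweep for empirically estimating achievable loss.
    # All numbers are illustrative assumptions, not measurements.
    n_repetitions    = 10   # repeats to average out training noise
    n_hyperparams    = 10   # hyperparameter settings per model size
    n_model_sizes    = 10   # parameter counts to sweep
    gpu_days_per_run = 3    # assumed cost of a single training run

    total_runs = n_repetitions * n_hyperparams * n_model_sizes
    total_gpu_days = total_runs * gpu_days_per_run
    print(f"{total_runs} runs, ~{total_gpu_days} GPU-days")  # 1000 runs, ~3000 GPU-days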

And the whole result is also conditional on the optimization/training process used, which is an area where we have no reason to think we are anywhere near optimal. So we can do studies with practical results (given sufficient money), but we are far from being able to identify the actual maxima available.


The closest research would be the Chinchilla scaling laws, which estimate the final loss as a function of the number of parameters and the number of training tokens. Setting the number of tokens to infinity gives an estimate of the minimum achievable loss for a given parameter count.
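As a sketch: the Chinchilla paper fits a parametric form L(N, D) = E + A/N^alpha + B/D^beta, and letting D go to infinity drops the data term, leaving E + A/N^alpha as the floor for a fixed parameter count. The constants below are roughly the published fits from Hoffmann et al. (2022), quoted from memory, so treat them as illustrative:

    # Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
    # Constants are approximately the fits reported by Hoffmann et al. (2022);
    # treat them as illustrative rather than exact.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted final loss for a model with n_params trained on n_tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    def loss_floor(n_params: float) -> float:
        """Limit as n_tokens -> infinity: the B / D**beta term vanishes."""
        return E + A / n_params**alpha

    print(chinchilla_loss(70e9, 1.4e12))  # roughly the Chinchilla (70B params, 1.4T tokens) point
    print(loss_floor(70e9))               # estimated floor for 70B params with unlimited data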



