
If you are referring to what is theoretically possible with arbitrary computation in the model, it's called Kolmogorov complexity and it's not computable.
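For reference, the standard textbook statement (general notation, not anything specific to this thread): the Kolmogorov complexity of a string x with respect to a universal machine U is the length of the shortest program that outputs x,

    K_U(x) = \min \{\, |p| : U(p) = x \,\}

and no algorithm computes K_U for all inputs, so any "theoretically optimal" compression of a dataset into model weights can only be bounded, never computed exactly.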


With a fixed architecture and a fixed dataset, as mentioned. So, a specific kind of neural network trained on a specific dataset.


You can estimate it empirically. However, large changes in model parameters/capacity tend to interact with the hyperparameters, so you would want to do runs with multiple hyperparameter values. And training processes give noisy results, so you might want multiple repetitions. And each run may take several GPU-days. So even a small experiment of 10 repetitions x 10 hyperparameter settings x 10 model sizes takes several thousand GPU-days (see the rough arithmetic sketched below). But there are many papers from the large labs that do exactly this.
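As a back-of-the-envelope sketch of that cost, using only the counts from this comment plus an assumed per-run cost (the 3 GPU-days figure is a placeholder, not a measurement):

    # Rough cost of a grid sweep for empirically estimating achievable loss.
    # All numbers are illustrative assumptions, not measurements.
    n_repetitions    = 10   # repeats to average out training noise
    n_hyperparams    = 10   # hyperparameter settings per model size
    n_model_sizes    = 10   # parameter counts to sweep
    gpu_days_per_run = 3    # assumed cost of a single training run

    total_runs = n_repetitions * n_hyperparams * n_model_sizes
    total_gpu_days = total_runs * gpu_days_per_run
    print(f"{total_runs} runs, ~{total_gpu_days} GPU-days")  # 1000 runs, ~3000 GPU-days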

And the whole result is also conditional on the optimization/training process used, which is an area where we have no reason to think we are anywhere near optimal. So we can do studies with practical results (given sufficient money), but we are far from being able to identify the actual maxima available.


The closest research would be the Chinchilla scaling laws, which estimate the final loss as a function of the number of parameters and the number of training tokens. Setting the number of tokens to infinity gives an estimate of the minimum achievable loss for a given parameter count.
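As a sketch: the Chinchilla paper fits a parametric form L(N, D) = E + A/N^alpha + B/D^beta, and letting D go to infinity drops the data term, leaving E + A/N^alpha as the floor for a fixed parameter count. The constants below are roughly the published fits from Hoffmann et al. (2022), quoted from memory, so treat them as illustrative:

    # Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
    # Constants are approximately the fits reported by Hoffmann et al. (2022);
    # treat them as illustrative rather than exact.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def chinchilla_loss(n_params: float, n_tokens: float) -> float:
        """Predicted final loss for a model with n_params trained on n_tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    def loss_floor(n_params: float) -> float:
        """Limit as n_tokens -> infinity: the B / D**beta term vanishes."""
        return E + A / n_params**alpha

    print(chinchilla_loss(70e9, 1.4e12))  # roughly the Chinchilla (70B params, 1.4T tokens) point
    print(loss_floor(70e9))               # estimated floor for 70B params with unlimited data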



