If training is trivially fast, that allows you to iterate on architecture choices, hyperparameters, which data to include, etc.

Of course, that only works if the trial runs are representative of what your full-scale model will look like. But within those constraints, optimising training time seems very valuable.