If training is trivially fast, that lets you iterate on architecture choices, hyperparameters, which data to include, etc.
Of course, that only works if the trial runs are representative of what your full-scale model will look like. But within those constraints, optimising training time seems very valuable.