Sounds like we need some new training methods. If training could take place locally and asynchronously instead of globally through backpropagation, the amount of energy could probably be significantly reduced.
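To make that idea a bit more concrete, here's a minimal sketch of one possible interpretation of "local" training: each block gets its own auxiliary classifier and loss, and the input to the next block is detached, so no gradient ever flows end-to-end through the whole network. Everything here (architecture, losses, sizes) is a hypothetical toy setup, just to illustrate the concept.

```python
import torch
import torch.nn as nn

# Toy setup: each block learns from its own local loss, and detach() stops
# gradients from crossing block boundaries, so there is no global backward pass.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(256, 10)])  # local classifiers
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h.detach())        # detach: earlier blocks never see this gradient
    loss = loss_fn(head(h), y)   # purely local objective for this block
    opt.zero_grad()
    loss.backward()              # gradient stays inside this block and its head
    opt.step()
```

Because each block's update depends only on its own input and local loss, the updates could in principle run asynchronously rather than waiting on a full forward/backward pass.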
Yeah, I strongly agree. While Nvidia is working on better hardware (and they're doing a great job at it!), we believe better training methods can be a big source of efficiency gains. We've released a new PyTorch library for efficient training at http://github.com/mosaicml/composer.
Our combinations of methods can train models ~4x faster to the same accuracy on CV tasks, and ~2x faster to the same perplexity/GLUE score on NLP tasks!
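For anyone curious what that looks like in practice, here's a rough sketch of Composer's Trainer-plus-algorithms pattern. The model, dummy data, and specific algorithm choices are my own illustrative picks, and exact class/argument names can differ across library versions, so treat this as a sketch rather than copy-paste-ready code:

```python
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Dummy image-classification data, just to make the sketch self-contained.
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=64)

# Wrap a standard torchvision model so Composer can train it.
model = ComposerClassifier(torchvision.models.resnet18(num_classes=10), num_classes=10)

# Speed-up methods are passed as composable "algorithms" that Composer
# applies to the training loop without changes to the model code.
trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration="2ep",  # two epochs for this toy run
    algorithms=[BlurPool(), ChannelsLast(), LabelSmoothing()],
)
trainer.fit()
```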
The principled way of doing this is via ensemble learning, combining the predictions of multiple separately-trained models. But perhaps there are ways of improving that by including "global" training as well, where the "separate" models are allowed to interact while limiting overall training costs.
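In its simplest form that's just averaging the predictions of independently trained models. A minimal PyTorch sketch (the toy model and random data are placeholders, purely for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: K small models trained separately, then combined
# at prediction time by averaging their class probabilities.
def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

x_train, y_train = torch.randn(512, 20), torch.randint(0, 3, (512,))
models = [make_model() for _ in range(5)]

# Each member is trained independently (and could be trained in parallel).
for m in models:
    opt = torch.optim.Adam(m.parameters(), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(m(x_train), y_train)
        loss.backward()
        opt.step()

# Ensemble prediction: average the softmax outputs of all members.
x_test = torch.randn(8, 20)
with torch.no_grad():
    probs = torch.stack([m(x_test).softmax(dim=-1) for m in models]).mean(dim=0)
print(probs.argmax(dim=-1))
```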
That's like a person driving the Model T in 1908 saying "trying to improve gas mileage is so silly".
Why are people so dumb when it comes to planning for the future? Does it require a 1973 oil crisis to make people concerned about potential issues? Why can't people be preventative instead of reactive? Isn't the entire point of an engineer to optimize what they're building for the good of humanity?