Sounds like we need some new training methods. If training could take place locally and asynchronously instead of globally through backpropagation, the amount of energy could probably be significantly reduced.
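To make that idea a bit more concrete, here's a minimal sketch of one possible interpretation of "local" training: each block gets its own auxiliary classifier and loss, and the input to the next block is detached, so no gradient ever flows end-to-end through the whole network. Everything here (architecture, losses, sizes) is a hypothetical toy setup, just to illustrate the concept.

```python
import torch
import torch.nn as nn

# Toy setup: each block learns from its own local loss, and detach() stops
# gradients from crossing block boundaries, so there is no global backward pass.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(256, 10)])  # local classifiers
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h.detach())        # detach: earlier blocks never see this gradient
    loss = loss_fn(head(h), y)   # purely local objective for this block
    opt.zero_grad()
    loss.backward()              # gradient stays inside this block and its head
    opt.step()
```

Because each block's update depends only on its own input and local loss, the updates could in principle run asynchronously rather than waiting on a full forward/backward pass.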
Yeah, I strongly agree. While Nvidia is working on better hardware (and they're doing a great job at it!), we believe better training methods can be a big source of efficiency gains. We've released a new PyTorch library for efficient training at http://github.com/mosaicml/composer.
Our combinations of methods can train models ~4x faster to the same accuracy on CV tasks, and ~2x faster to the same perplexity/GLUE score on NLP tasks!
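For anyone curious what that looks like in practice, here's a rough sketch of Composer's Trainer-plus-algorithms pattern. The model, dummy data, and specific algorithm choices are my own illustrative picks, and exact class/argument names can differ across library versions, so treat this as a sketch rather than copy-paste-ready code:

```python
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Dummy image-classification data, just to make the sketch self-contained.
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=64)

# Wrap a standard torchvision model so Composer can train it.
model = ComposerClassifier(torchvision.models.resnet18(num_classes=10), num_classes=10)

# Speed-up methods are passed as composable "algorithms" that Composer
# applies to the training loop without changes to the model code.
trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration="2ep",  # two epochs for this toy run
    algorithms=[BlurPool(), ChannelsLast(), LabelSmoothing()],
)
trainer.fit()
```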
The principled way of doing this is via ensemble learning, combining the predictions of multiple separately-trained models. But perhaps there are ways of improving that by including "global" training as well, where the "separate" models are allowed to interact while limiting overall training costs.
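In its simplest form that's just averaging the predictions of independently trained models. A minimal PyTorch sketch (the toy model and random data are placeholders, purely for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: K small models trained separately, then combined
# at prediction time by averaging their class probabilities.
def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

x_train, y_train = torch.randn(512, 20), torch.randint(0, 3, (512,))
models = [make_model() for _ in range(5)]

# Each member is trained independently (and could be trained in parallel).
for m in models:
    opt = torch.optim.Adam(m.parameters(), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(m(x_train), y_train)
        loss.backward()
        opt.step()

# Ensemble prediction: average the softmax outputs of all members.
x_test = torch.randn(8, 20)
with torch.no_grad():
    probs = torch.stack([m(x_test).softmax(dim=-1) for m in models]).mean(dim=0)
print(probs.argmax(dim=-1))
```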
That's like a person driving the Model T in 1908 saying "trying to improve gas mileage is so silly".
Why are people so dumb when it comes to planning for the future? Does it require a 1973 oil crisis to make people concerned about potential issues? Why can't people be preventative instead of reactive? Isn't the entire point of an engineer to optimize what they're building for the good of humanity?