It really does. You've got to remember that a good SotA paper takes hundreds of training runs, at least.
I can't go into detail about budgets, but suffice to say if you think $1M is a university compute budget that lets you be a competitive research team on the cutting edge, you are __severely__ underestimating the amount of compute that leading corporate researchers are using. Orders of magnitude off.
On-prem is good for a while, until you're 18 months into your 3-year purchase cycle, still on K80s while the major research leaders are running V100s and TPUs, and you can't even fit the SotA model in your GPUs' memory any more.
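To make the memory point concrete, here's a rough sketch (my own numbers, not anything from this thread) of why a big model stops fitting on older cards: fp32 weights plus Adam-style optimizer state alone can blow past a K80's ~12 GB per GPU, before you even count activations.

```python
# Rough sketch, not exact: GPU memory needed just for weights + optimizer state.
# Assumes fp32 weights and Adam-style state (~3x extra on top of the weights);
# activations and workspace memory only make the picture worse.

def training_memory_gb(n_params, bytes_per_param=4, optimizer_multiplier=3):
    """Approximate GB of GPU memory for parameters plus optimizer state."""
    return n_params * bytes_per_param * (1 + optimizer_multiplier) / 1e9

for name, n_params in [("345M-param model", 345e6), ("1.5B-param model", 1.5e9)]:
    need = training_memory_gb(n_params)
    print(f"{name}: ~{need:.0f} GB of state "
          f"(a K80 has ~12 GB per GPU die, a V100 has 16-32 GB)")
```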
Longer training times can mean weeks or even months for one experiment - that slow iteration speed makes it very hard to stay on the cutting edge.
And this is before considering things like neural architecture search and internet-scale image/video/speech datasets, where costs skyrocket.
The boundary between corporate research and academia is incredibly porous and a big part of that is the cost of research (compute, but also things like data labelling and staffing ML talent).
Your goalposts just moved by a few figures. Furthermore, $1 million+ was not a university compute budget - that was money for a single lab on campus (at a general state school, no less) on a specific project.
You still have yet to provide any concrete sources to back up your claims. We're talking about contributing to research here. If multi-million-dollar training jobs are what it takes to be at the cutting edge, you should be able to provide ample sources for that claim.
- "Some of the models are so big that even in MILA we can’t run them because we don’t have the infrastructure for that. Only a few companies can run these very big models they’re talking about" [1]. NOTE: MILA is a very good AI research center and, while I don't know too much about him, that person being quoted has great credentials so I would generally trust them.
- "the current version of OpenAI Five has consumed 800 petaflop/s-days" [2].
- Check out the Green AI paper. They have good numbers on the amount of compute it takes to train a model, and you can translate that into dollar costs - a rough example of that conversion, applied to the figure from [2], follows this list.
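Here's a back-of-envelope sketch turning the 800 petaflop/s-days figure from [2] into GPU-hours and a rough dollar cost. The hardware, utilization and pricing assumptions are mine, not from the cited sources, so treat the output as an order-of-magnitude estimate only.

```python
# Back-of-envelope conversion of petaflop/s-days into GPU-hours and dollars.
# All constants below are my assumptions for illustration, not sourced figures.

PFS_DAYS = 800              # petaflop/s-days quoted in [2] for OpenAI Five
PEAK_TFLOPS_V100 = 125      # assumed V100 peak tensor-core throughput
UTILIZATION = 0.33          # assume roughly a third of peak is sustained
PRICE_PER_GPU_HOUR = 2.50   # assumed on-demand cloud price, USD

sustained_flops = PEAK_TFLOPS_V100 * 1e12 * UTILIZATION   # flops/s actually achieved
total_flops = PFS_DAYS * 1e15 * 86400                     # 1 pf/s-day = 1e15 flops/s for 86,400 s
gpu_hours = total_flops / sustained_flops / 3600

print(f"~{gpu_hours:,.0f} V100-hours, roughly "
      f"${gpu_hours * PRICE_PER_GPU_HOUR:,.0f} at on-demand prices")
```

With these (made-up) assumptions it comes out around half a million GPU-hours and a seven-figure dollar amount for a single headline result, which is in the same ballpark as the budgets being argued about above.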
I'm not an expert in on-prem ML costs, but I know many of the world's best on-prem ML users also use the cloud to handle the variability of their workloads, so I don't think on-prem is a magic bullet cost-wise.
$1M annually per project (vs. per lab) isn't bad at all. It's also way out of whack with what I saw when I was doing AI research in academia, but that was before the deep learning revolution, so what do I know.
Re: the moving goalposts - the distinction is between the cost of a single training run and the cost of a paper-worthy research result. Due to inherent run-to-run variability, architecture search, hyperparameter search and possibly data cleaning work, the total cost is a couple of orders of magnitude more than the cost of one training run (the multiplier will vary a lot by project and lab) - the toy calculation below shows how quickly that compounds.
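As a toy illustration (every count and price below is made up, not from any real project), here's how the per-paper multiplier builds up once you sweep architectures, hyperparameters and seeds:

```python
# Toy illustration of run-count explosion; all numbers are invented for the example.

single_run_cost = 10_000          # assumed cost of one full training run, USD

architecture_variants = 10        # architectures / ablations tried
hparam_configs_per_variant = 20   # hyperparameter settings per architecture
seeds_per_config = 3              # repeats to control run-to-run variance

total_runs = architecture_variants * hparam_configs_per_variant * seeds_per_config
paper_cost = total_runs * single_run_cost

print(f"{total_runs} runs -> ~${paper_cost:,.0f}, "
      f"i.e. {total_runs}x the cost of a single training run")
```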
I understand why you don't trust what I'm saying. I wish I could give hard numbers, but I'm limited in what I can say publicly so this is the best I can do.