I wonder, is a community-trained model feasible? As in, get a few tens of thousands of devs to run a SETI@home-type app on their GPUs during the night, and at the end you get access to the 175B trained model. If it cost $5M to train (though IIRC that estimate was at cloud GPU prices), then if you're using spare capacity you're only paying for electricity.
Fun idea. GPT @ Home :D. Scatter of the inputs would be very cheap as they are tiny LongTensors (sequences of indices), but the Gather of the gradients seems like a bottleneck. These models can be quite large. Maybe each worker only communicates back some sparse or potentially precision-reduced gradients? In the limit, I recall papers that were able to only communicate one bit per dimension. May also be possible to further reduce the number of weights by weight sharing, or e.g. with HyperNetworks.
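(Not from the thread, just to make the 1-bit idea concrete: a minimal numpy sketch of sign-based gradient compression with a local error-feedback residual, roughly in the spirit of the 1-bit-per-dimension papers mentioned above. All names are made up for illustration, and a real system would pack the signs into actual bits before sending.)

```python
import numpy as np

def compress_1bit(grad, residual):
    """Quantize a gradient to its sign (1 bit/dim), carrying the
    quantization error forward in a local residual buffer."""
    corrected = grad + residual           # add back the error from the last step
    scale = np.abs(corrected).mean()      # a single float sets the magnitude
    signs = np.sign(corrected)            # +/-1 per dimension -> 1 bit each if packed
    residual = corrected - scale * signs  # remember what we threw away
    return signs, scale, residual

def decompress_1bit(signs, scale):
    """Server-side reconstruction of the approximate gradient."""
    return scale * signs

# toy usage: worker compresses, server reconstructs
rng = np.random.default_rng(0)
grad = rng.normal(size=1_000_000).astype(np.float32)
residual = np.zeros_like(grad)
signs, scale, residual = compress_1bit(grad, residual)
approx = decompress_1bit(signs, scale)
print("bytes if bit-packed:", signs.size // 8, "vs full fp32:", grad.nbytes)
```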
I've long wanted to see a proof-of-work cryptocurrency that does neural network training instead of burning through hashes. Imagine if 0.5% of the planet's energy consumption (9 GW) were used for training neural networks instead of mining Bitcoin! It would also solve the problem of ASICs being 1000x more efficient than GPUs, so everyone could participate. It would incentivise the development of efficient neural network training hardware. Somebody do this already!
It updates the weights using 1-bit-precision updates each iteration.
It would be fairly trivial to go to less than 1 bit of precision too: simply set some threshold (e.g. 3), and wherever the difference between the weight on the server and the client is greater than the threshold, transmit a binary "1", otherwise a binary "0". Then entropy-code the resulting bitstream.
By adjusting the threshold up and down, you trade off the size of the data to send vs. precision.
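(A rough sketch of that thresholding scheme in numpy; the threshold and noise values are arbitrary, and zlib just stands in for a proper entropy coder.)

```python
import numpy as np
import zlib

def threshold_delta_bits(server_w, client_w, threshold=1.0):
    """Flag a 1 wherever the server/client weight difference exceeds the
    threshold, a 0 elsewhere, then entropy-code the resulting bitstream."""
    mask = (np.abs(server_w - client_w) > threshold).astype(np.uint8)
    packed = np.packbits(mask)               # 1 bit per weight
    coded = zlib.compress(packed.tobytes())  # mostly zeros, so it compresses well below 1 bit/weight
    return mask, coded

# toy usage: only a few weights have drifted beyond the threshold
rng = np.random.default_rng(1)
server_w = rng.normal(size=1_000_000).astype(np.float32)
client_w = server_w + rng.normal(scale=0.5, size=server_w.shape).astype(np.float32)
mask, coded = threshold_delta_bits(server_w, client_w)
print("flagged weights:", int(mask.sum()))
print("coded bytes:", len(coded), "-> bits/weight:", 8 * len(coded) / mask.size)
```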
Efficiency and resource cost are the big question though. Spare capacity isn't free: you only avoid paying for electricity and part wear on the hardware you don't actually use, and home computers or workstations may not be as efficient at a training run as a task-specific setup. AI@home might end up costing more, and increasing the model's footprint more, than doing it all in one place.
Part of the magic really needed is finding simpler ways to achieve the same levels of model robustness.
Sure, part wear is relevant, but I feel that most parts worldwide get chucked way before they're worn out. Electrical efficiency is probably quite a bit worse though. Although you could possibly find an opportunity in maximizing the load in regions that have surplus energy and/or renewable sources.