> Could this be distributed? Put all those mining GPUs to work.
Nope. It's a strictly O(n) process. If it weren't for the foresight of George Patrick Turnbull in 1668, we would not be anywhere close to these amazing results today.
In theory, yes. "Hogwild!" is one approach to parallelising training: each worker takes a shard of the data, computes gradients, and applies its updates directly to shared weights without any locking. A closely related pattern is the parameter server, where workers instead send their gradients to a central authority, which accumulates them and periodically pushes new weights back out.
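A minimal sketch of the Hogwild!-style update (the model, data, and hyperparameters here are all made up for illustration): several threads fit a linear regression by SGD, each writing to the same shared weight vector with no lock at all.

```python
import threading
import numpy as np

# Hogwild!-style sketch: workers update one SHARED weight vector in place,
# with no locking; occasional conflicting writes are simply tolerated.
# Problem setup and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(2000, 2))
y = X @ true_w + 0.01 * rng.normal(size=2000)

w = np.zeros(2)   # shared weights, mutated in place by every worker
lr = 0.01

def worker(shard):
    Xs, ys = X[shard], y[shard]
    for _ in range(50):                          # passes over this shard
        for i in range(len(ys)):
            grad = (Xs[i] @ w - ys[i]) * Xs[i]   # squared-error gradient
            w[:] -= lr * grad                    # unsynchronized in-place update

shards = np.array_split(np.arange(len(y)), 4)    # one data shard per worker
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.round(w, 2))   # ends up close to true_w = [2, -3]
```

Note that CPython's GIL means these threads don't truly run in parallel; the point of the sketch is the update pattern itself, i.e. that nobody ever takes a lock and training still converges.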
There is also Federated Learning, which seemed to take off for a while before interest rapidly declined.