Llama 1, 2, and 3 all have different architectures and needed to be trained from scratch.
Llama 1 was released in February 2023.
Same training story for OpenAI's Sora, DALL-E, and GPT-4o, all of Mistral's models, Mamba, KAN, and each version of RWKV (they're on version 6 now).
Note that this list is also a product of survivorship bias: it only counts published models, not the probably thousands of individual training experiments that go into producing each one.
Which is still absolutely nothing compared to something like YouTube's servers, which in turn is nothing next to something like the food industry.
Like, if a couple million people can use ChatGPT the way they do today, would it matter if that used up one household's yearly energy budget? Or ten?
Sure, the ~1M GPUs that Meta purchased, running 24/7 at full load (or, more likely, less), use more energy than a single household.
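Quick back-of-envelope on that, as a sketch: the 700 W figure is the H100 SXM TDP and ~10,500 kWh/year is a rough US household average, but every constant here is a ballpark assumption, not a reported number.

```python
# Back-of-envelope: a 1M-GPU fleet at full load vs. US household energy use.
# Every constant here is a ballpark assumption, not a reported figure.

GPU_COUNT = 1_000_000            # the ~1M figure above
GPU_POWER_W = 700                # H100 SXM TDP; real average load is lower
HOUSEHOLD_KWH_PER_YEAR = 10_500  # rough US average annual consumption
HOURS_PER_YEAR = 24 * 365

fleet_kwh_per_year = GPU_COUNT * (GPU_POWER_W / 1000) * HOURS_PER_YEAR
print(f"fleet: {fleet_kwh_per_year / 1e9:.1f} TWh/year")                  # ~6.1 TWh/year
print(f"households: {fleet_kwh_per_year / HOUSEHOLD_KWH_PER_YEAR:,.0f}")  # ~584,000
```

So at full load it's more like hundreds of thousands of households, not one; but with these assumptions, divided across ChatGPT-scale user counts, the per-person share is a tiny fraction of one household's yearly budget.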
The energy cost is in training, not inference.
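Same kind of sketch for the training side, plugging in ~31M H100-hours (the order of magnitude Meta reported for Llama 3.1 405B; treat it and the other constants as rough assumptions):

```python
# Back-of-envelope: energy of one large training run, in household-years.
# TRAIN_GPU_HOURS is roughly what Meta reported for Llama 3.1 405B; the
# rest are the same ballpark assumptions as above.

TRAIN_GPU_HOURS = 31_000_000     # ~31M H100-hours for one flagship run
GPU_POWER_W = 700                # H100 SXM TDP
HOUSEHOLD_KWH_PER_YEAR = 10_500  # rough US average annual consumption

train_kwh = TRAIN_GPU_HOURS * (GPU_POWER_W / 1000)
print(f"one run: {train_kwh / 1e6:.1f} GWh")                          # ~21.7 GWh
print(f"household-years: {train_kwh / HOUSEHOLD_KWH_PER_YEAR:,.0f}")  # ~2,067
```

Multiply that by every from-scratch model in the list above, plus all the unpublished experiments, and you can see why the bill concentrates in training.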