Llama 1, 2, and 3 all have different architectures and needed to be trained from scratch.
Llama 1 was released in February 2023.
Same training story for OpenAI's Sora, DALL-E, and GPT-4o, all of Mistral's models, Mamba, KAN, and each version of RWKV (they're on version 6 now).
Note that this list is also a product of survivorship bias: it only counts published models, not the probably thousands of individual training experiments that go into producing each one.
Which is still absolutely nothing compared to something like YouTube's servers, which in turn is nothing next to something like the food industry.
Like, if a couple million people can use ChatGPT the way they do today, would it matter if that used up one household's yearly energy budget? Or ten?
Sure, the ~1M GPUs that Meta purchased, running 24/7 at full load (or, more likely, less), use more energy than a single household.
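Quick back-of-envelope on that, as a sketch: the 700 W figure is the H100 SXM TDP and ~10,500 kWh/year is a rough US household average, but every constant here is a ballpark assumption, not a reported number.

```python
# Back-of-envelope: a 1M-GPU fleet at full load vs. US household energy use.
# Every constant here is a ballpark assumption, not a reported figure.

GPU_COUNT = 1_000_000            # the ~1M figure above
GPU_POWER_W = 700                # H100 SXM TDP; real average load is lower
HOUSEHOLD_KWH_PER_YEAR = 10_500  # rough US average annual consumption
HOURS_PER_YEAR = 24 * 365

fleet_kwh_per_year = GPU_COUNT * (GPU_POWER_W / 1000) * HOURS_PER_YEAR
print(f"fleet: {fleet_kwh_per_year / 1e9:.1f} TWh/year")                  # ~6.1 TWh/year
print(f"households: {fleet_kwh_per_year / HOUSEHOLD_KWH_PER_YEAR:,.0f}")  # ~584,000
```

So at full load it's more like hundreds of thousands of households, not one; but with these assumptions, divided across ChatGPT-scale user counts, the per-person share is a tiny fraction of one household's yearly budget.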
The energy cost is in training, not inference.
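Same kind of sketch for the training side, plugging in ~31M H100-hours (the order of magnitude Meta reported for Llama 3.1 405B; treat it and the other constants as rough assumptions):

```python
# Back-of-envelope: energy of one large training run, in household-years.
# TRAIN_GPU_HOURS is roughly what Meta reported for Llama 3.1 405B; the
# rest are the same ballpark assumptions as above.

TRAIN_GPU_HOURS = 31_000_000     # ~31M H100-hours for one flagship run
GPU_POWER_W = 700                # H100 SXM TDP
HOUSEHOLD_KWH_PER_YEAR = 10_500  # rough US average annual consumption

train_kwh = TRAIN_GPU_HOURS * (GPU_POWER_W / 1000)
print(f"one run: {train_kwh / 1e6:.1f} GWh")                          # ~21.7 GWh
print(f"household-years: {train_kwh / HOUSEHOLD_KWH_PER_YEAR:,.0f}")  # ~2,067
```

Multiply that by every from-scratch model in the list above, plus all the unpublished experiments, and you can see why the bill concentrates in training.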