The transformer is a simple, general architecture. Because it is such a flexible model, it makes few assumptions about the data distribution from the start and has to learn its "priors" from the data itself. The same architecture can predict protein folding and fluid dynamics; it's not specific to language.
We, on the other hand, are shaped by billions of years of genetic evolution and 200k years of cultural evolution. If you count the total number of words spoken by the 110 billion people who have ever lived, assuming an estimated 1B words per human over a lifetime, it comes out to 10 million times the size of GPT-4's training set.
So we spent 10 million times more words on discovery than it takes the transformer to catch up: GPT-4 used about 10 thousand people's worth of language to recover all that evolutionary fine-tuning.
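A quick back-of-envelope sketch of that arithmetic. The GPT-4 training-set size of ~10^13 tokens is an outside assumption, not a disclosed figure, and the other inputs are the rough estimates from the comment above:

```python
# Back-of-envelope check of the "10 million times" and "10 thousand people" claims.
# All inputs are rough estimates; GPT-4's corpus size is assumed, not known.
people_ever_lived = 110e9        # ~110 billion humans, ever
words_per_lifetime = 1e9         # ~1 billion words spoken per person
gpt4_training_tokens = 1e13      # assumed order of magnitude

total_human_words = people_ever_lived * words_per_lifetime    # ~1.1e20
ratio = total_human_words / gpt4_training_tokens              # ~1.1e7
people_equivalents = gpt4_training_tokens / words_per_lifetime  # ~1e4

print(f"total words ever spoken ≈ {total_human_words:.1e}")
print(f"ratio to GPT-4's corpus  ≈ {ratio:.1e}  (~10 million)")
print(f"people-equivalents in GPT-4's corpus ≈ {people_equivalents:.0f}")
```

Under these assumptions the ratio comes out to roughly 10^7 and the corpus to roughly 10,000 lifetimes of speech, matching the figures quoted above.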
> words spoken by the 110 billion people who have ever lived, assuming an estimated 1B words per human over a lifetime … comes out to 10 million times the size of GPT-4's training set
This assumption points in slightly the wrong direction, because no human could consume much more than about 1B words in a lifetime. So humanity could not have gained an advantage simply by multiplying one human's words by 100 billion. I think a more accurate estimate would be 1B words multiplied by 100.
I think current AI has already reached the size needed to become AGI, but to get there it probably needs a change in structure (though I'm not sure about this), and also some additional multidimensional dataset, not just text.
I might bet on 3D cinema, and/or on an automobile autopilot dataset, or something for real-life humanoid robots solving typical human tasks, like folding a shirt.