Yes, but the underlying point is that in this case you can train the AI in parallel, and there's a decent chance this or something like it will be true for future AI architectures too. What does it matter that the AI needs to be trained on 20 years of experiences if all of those 20 years can be experienced in 6 months given the right hardware?
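To make the claim concrete, here is the implied speedup factor, a back-of-the-envelope sketch taking the "20 years in 6 months" figures from the comment at face value:

```python
# Hypothetical figures from the comment above, not measured values.
experience_years = 20       # subjective experience the AI needs
wall_clock_months = 6       # real time available for training

# How many "experience streams" must run in parallel (ignoring overhead)?
speedup = (experience_years * 12) / wall_clock_months
print(speedup)  # → 40.0
```

So the claim amounts to assuming roughly 40-way parallelism over experience, which is modest if experience can be parallelized at all.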
I think we're talking at cross-purposes here. I understand you, but what if the type of learning that leads to intelligence is inherently serial in some important way and can't simply be parallelized? What if the fact that it takes a certain amount of chronological time actually matters? And so on.
What I'm trying to express is that we seem to want to cherry-pick certain features from nature while ignoring others that are inconvenient. That's understandable, but because our knowledge of biological systems is so incomplete, we really don't know which of these features (if any) gives rise to intelligence. For all we know, we could be doing training in a seemingly efficient way that completely precludes intelligence from ever emerging.
Estimates put GPT-4's training at something like 2,500 GPU-years, spread over roughly 10,000 GPUs. By that measure, 20 years would be a big improvement.
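Taking those estimates at face value, the implied wall-clock time works out like this (both inputs are assumed, unofficial figures):

```python
# Back-of-the-envelope check on the estimates above (assumed figures,
# not official numbers from OpenAI).
gpu_years = 2500     # estimated total compute for GPT-4 training
num_gpus = 10_000    # estimated GPUs running in parallel

# Wall-clock time if the work parallelizes perfectly across all GPUs.
wall_clock_years = gpu_years / num_gpus
print(f"{wall_clock_years} years ≈ {wall_clock_years * 12} months")
# → 0.25 years ≈ 3.0 months
```

That is, under these assumptions the 2,500 GPU-years collapse to about 3 months of real time, which is the same compression-through-parallelism argument made earlier in the thread.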