
You can indeed train state-of-the-art text models with this data, provided you come up with some better architecture than what we currently have. What I am saying is that you cannot train state-of-the-art "GPT" models like the link is saying.

Your link is for GPT-2; GPT-3 used much, much more data.

GPT-3 was trained in part on the "books2" corpus, which is not public but seems to be basically the same thing as this: 200k books * 100k words per book on average * ~3 tokens per word = 60B tokens; books2 is 55B tokens, so it checks out.
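A quick sketch of that back-of-the-envelope estimate (the book count, average length, and tokens-per-word ratio are rough assumptions from the comment above, not figures from the paper):

    # Rough estimate of the corpus size in tokens
    books = 200_000            # assumed number of books
    words_per_book = 100_000   # assumed average book length
    tokens_per_word = 3        # rough conversion factor used above
    total_tokens = books * words_per_book * tokens_per_word
    print(total_tokens)        # 60,000,000,000 -> ~60B, close to books2's reported 55B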

The total number of tokens GPT-3 was trained on from all sources is a combined 500B; this dataset is merely ~10% of that.

https://arxiv.org/pdf/2005.14165.pdf


