
You can indeed train state-of-the-art text models with this data, provided you come up with some better architecture than what we currently have. What I am saying is that you cannot train state-of-the-art "GPT" models like the link is saying.

Your link is for GPT-2; GPT-3 used much, much more data.

GPT-3 was trained in part on the "books2" corpus, which is not public but seems to be basically the same thing as this: 200k books * 100k words per book on average * ~3 tokens per word = 60B tokens; books2 is 55B tokens, so it checks out.
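A quick sketch of that back-of-the-envelope estimate (the book count, average length, and tokens-per-word ratio are rough assumptions from the comment above, not figures from the paper):

    # Rough estimate of the corpus size in tokens
    books = 200_000            # assumed number of books
    words_per_book = 100_000   # assumed average book length
    tokens_per_word = 3        # rough conversion factor used above
    total_tokens = books * words_per_book * tokens_per_word
    print(total_tokens)        # 60,000,000,000 -> ~60B, close to books2's reported 55B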

The total number of tokens GPT-3 was trained on from all sources is a combined 500B; this dataset is merely ~10% of that.

https://arxiv.org/pdf/2005.14165.pdf


