
The GPT family of models shines above 100B parameters. Almost nobody uses GPT-2 today; it's too weak.

If you want to go with a <1B model, you use BERT, which is bidirectional, or T5, which is easier to fine-tune on other tasks.
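
For concreteness, here's a minimal sketch of that choice in practice. It assumes the Hugging Face transformers library and the stock "bert-base-uncased" (~110M params) and "t5-small" (~60M params) checkpoints; neither library nor checkpoints are named above, they're just common examples in this size class.

    # Assumption: Hugging Face transformers; checkpoints are illustrative.
    from transformers import (
        AutoTokenizer,
        BertForSequenceClassification,
        T5ForConditionalGeneration,
    )

    # BERT: bidirectional encoder, a typical pick for classification
    # or other understanding tasks at this scale.
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # T5: text-to-text, so classification, summarization, translation,
    # etc. all reduce to the same seq2seq fine-tuning recipe.
    t5_tok = AutoTokenizer.from_pretrained("t5-small")
    t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

    inputs = t5_tok("summarize: some long document text", return_tensors="pt")
    outputs = t5.generate(**inputs, max_new_tokens=32)
    print(t5_tok.decode(outputs[0], skip_special_tokens=True))

The text-to-text framing is what makes T5's fine-tuning story simple: every downstream task uses the same loss and the same generate() interface, with only the input prefix changing.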


