
The GPT family of models shines above 100B parameters. Almost nobody uses GPT-2 today; it's too weak.

If you want to go with a <1B model, you use BERT, which is bidirectional, or T5, which is easier to fine-tune on other tasks.
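
For concreteness, here's a minimal sketch of that choice in practice. It assumes the Hugging Face transformers library and the stock "bert-base-uncased" (~110M params) and "t5-small" (~60M params) checkpoints; neither library nor checkpoints are named above, they're just common examples in this size class.

    # Assumption: Hugging Face transformers; checkpoints are illustrative.
    from transformers import (
        AutoTokenizer,
        BertForSequenceClassification,
        T5ForConditionalGeneration,
    )

    # BERT: bidirectional encoder, a typical pick for classification
    # or other understanding tasks at this scale.
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # T5: text-to-text, so classification, summarization, translation,
    # etc. all reduce to the same seq2seq fine-tuning recipe.
    t5_tok = AutoTokenizer.from_pretrained("t5-small")
    t5 = T5ForConditionalGeneration.from_pretrained("t5-small")

    inputs = t5_tok("summarize: some long document text", return_tensors="pt")
    outputs = t5.generate(**inputs, max_new_tokens=32)
    print(t5_tok.decode(outputs[0], skip_special_tokens=True))

The text-to-text framing is what makes T5's fine-tuning story simple: every downstream task uses the same loss and the same generate() interface, with only the input prefix changing.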


