
I think the argument here is about pedagogy, not performance.


minGPT is actually quite performant too; the "min" refers to the breadth of supported functionality (e.g., the absence of support for various additional conditioning, exotic masking, masked LMs, finetuning, pruning, etc.).


GPT training performance on a CPU is peculiar: the vocab size and context window size have a massive effect on both speed and accuracy.
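A rough back-of-envelope sketch of the speed side of this, assuming a GPT-2-style block (feed-forward width 4 * d_model, so about 12 * d_model^2 weights per layer), 2 FLOPs per multiply-add, and a backward pass costing about twice the forward pass. The model size and the two vocab sizes (65 for a character-level vocab, 50257 for the GPT-2 BPE vocab) are illustrative choices, not taken from the comment, and the function names are hypothetical.

```python
def forward_flops_per_token(n_layer, d_model, n_ctx, vocab_size):
    """Approximate forward-pass FLOPs per token for a GPT-style decoder.

    Assumptions (illustrative, not exact): d_ff = 4 * d_model, so each
    layer holds ~12 * d_model**2 non-embedding weights; 2 FLOPs per
    multiply-add; layer norms, softmax and biases are ignored.
    """
    n_params = 12 * n_layer * d_model ** 2        # non-embedding weights
    dense = 2 * n_params                          # matmuls inside the blocks
    attention = 2 * n_layer * n_ctx * d_model     # QK^T scores + weighted sum of V
    logits = 2 * d_model * vocab_size             # final projection onto the vocab
    return dense + attention + logits


def training_flops_per_token(n_layer, d_model, n_ctx, vocab_size):
    # Assumes backward ~= 2x forward, so one training step ~= 3x forward.
    return 3 * forward_flops_per_token(n_layer, d_model, n_ctx, vocab_size)


if __name__ == "__main__":
    # A small CPU-sized model (hypothetical configuration).
    for vocab in (65, 50257):
        for n_ctx in (64, 1024):
            f = training_flops_per_token(4, 128, n_ctx, vocab)
            print(f"vocab={vocab:>6} n_ctx={n_ctx:>5} ~{f / 1e6:.1f} MFLOPs/token")
```

Under these assumptions, for a model this small the output projection onto a 50257-token vocab costs roughly eight times as much as all the transformer blocks combined, and going from a 64- to a 1024-token context grows the attention term sixteenfold.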


Sure thing! I only meant to imply the relative ordering of considerations.



