I think the argument here is about pedagogy, not performance.

karpathy · on Aug 17, 2020

minGPT is actually quite performant too; the "min" refers to the breadth of supported functionality (e.g. the absence of support for various additional conditioning, exotic masking, masked LMs, finetuning, pruning, etc.).

minimaxir · on Aug 17, 2020

GPT training performance on the CPU is funny. The vocab size and context window size have a massive effect on both speed and accuracy.
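
To make the speed half of that concrete, here is a rough back-of-the-envelope sketch, not minGPT's own code. The function name `gpt_cost_estimate`, the default layer count and width, the untied output head, and the decision to ignore biases and LayerNorm are all assumptions chosen for illustration. It says nothing about accuracy.

```python
def gpt_cost_estimate(vocab_size, block_size, n_layer=12, n_embd=768):
    """Rough parameter count and forward FLOPs per sequence for a GPT-style decoder.

    Hypothetical helper: biases and LayerNorm are ignored, and the output
    head is assumed to be a separate (untied) matrix.
    """
    # Token embedding table plus learned position embedding table.
    embed_params = vocab_size * n_embd + block_size * n_embd
    # Per layer: attention projections (4 * d^2) plus a 4x-wide MLP (8 * d^2).
    block_params = n_layer * 12 * n_embd ** 2
    # Final projection from hidden state to vocabulary logits.
    head_params = n_embd * vocab_size
    params = embed_params + block_params + head_params

    # Each weight in a matmul costs one multiply-add (2 FLOPs) per token.
    matmul_flops = 2 * block_size * (block_params + head_params)
    # Attention scores (QK^T) and the weighted sum (AV): two T*T*d products per layer.
    attn_flops = 2 * n_layer * 2 * block_size ** 2 * n_embd
    return params, matmul_flops + attn_flops


if __name__ == "__main__":
    for vocab, ctx in [(256, 128), (50257, 128), (50257, 1024)]:
        params, flops = gpt_cost_estimate(vocab, ctx)
        print(f"vocab={vocab:>6} ctx={ctx:>5} params={params:,} fwd_flops={flops:,}")
```

Under these assumptions the vocabulary size scales the embedding and output head linearly, while the context window scales the matmul cost linearly and the attention cost quadratically, so both knobs matter a lot on a CPU.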

activatedgeek · on Aug 17, 2020

Sure thing! I only meant to imply the relative ordering of considerations.