diedyesterday 5 months ago | on: OpenAI says it has evidence DeepSeek used its mode...

Don't forget that this model probably has far fewer parameters than o1 or even 4o. This is compression/distillation, which means it frees up a lot of compute to build models much more powerful than o1. At least this allows further scaling compute-wise (if not in the amount of non-synthetic source material available for training).