People who can run the full 671B-parameter model will use the best model available to them. What DeepSeek really did was start an AI "space race" toward AGI with China, and that race is running on Nvidia GPUs.
Some hobbyists will run the smaller models, but if you can, why not use the bigger and better one?
Model distillation has been a thing for over a decade, and LLM distillation has been widespread since 2023 [1].
There is nothing new about leveraging a bigger model to enrich smaller ones. The idea that this is a breakthrough is what people who don't follow the AI space took away from the story, and it's clearly wrong.
OpenAI has smaller models too, with o1-mini and o4-mini, and phi-1 showed that distillation can make a model 10x smaller perform as well as a much bigger one. The issue with these small models is that they can't generalize as well. Bigger models will always win at first; then you can specialize them.
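For concreteness, here is a minimal sketch of classic logit-matching distillation in PyTorch. This is the Hinton-style recipe, not necessarily DeepSeek's own pipeline (their distilled checkpoints were reportedly fine-tuned on outputs generated by the big model), and every tensor shape and value below is an illustrative placeholder.

    # Minimal knowledge-distillation sketch (illustrative, not DeepSeek's actual recipe):
    # a small "student" is trained to match the softened output distribution of a
    # larger, frozen "teacher".
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then minimize KL divergence.
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)

    # Toy usage: batch of 4 examples, vocabulary of 32 tokens (made-up sizes).
    teacher_logits = torch.randn(4, 32)                          # would come from the big model
    student_logits = torch.randn(4, 32, requires_grad=True)      # stands in for the small model
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()

The point is only that the small model learns from the big one's outputs rather than from raw data alone, which is why a bigger teacher keeps mattering.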
DeepSeek also showed that Nvidia GPUs can be used far more memory-efficiently than people assumed, which catapults Nvidia even further ahead of competing hardware from Groq or AMD.
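To see why memory efficiency matters, here is a rough back-of-envelope comparison of attention KV-cache size per token with and without a compressed latent (the kind of trick DeepSeek leaned on). All the figures are illustrative assumptions, not DeepSeek's published numbers.

    # Back-of-envelope KV-cache memory per token (illustrative numbers, fp16 = 2 bytes).
    layers, heads, head_dim, dtype_bytes = 60, 128, 128, 2
    latent_dim = 512  # assumed size of a compressed per-layer latent

    standard_kv = layers * 2 * heads * head_dim * dtype_bytes   # full keys + values
    latent_kv = layers * latent_dim * dtype_bytes               # single compressed latent

    print(f"standard KV cache : {standard_kv / 1e6:.1f} MB per token")
    print(f"latent KV cache   : {latent_kv / 1e6:.1f} MB per token")
    print(f"reduction         : {standard_kv / latent_kv:.0f}x")

Shrinking the cache by an order of magnitude or more means longer contexts and bigger batches on the same Nvidia cards, which is the efficiency argument in a nutshell.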
[1] https://arxiv.org/abs/2305.02301