
So, can a distilled 8B model (say, DeepSeek-R1-Distill-Llama-8B or whatever) be "trained up" into a larger 16B-parameter model after distillation from a superior model, or is it forever stuck at 8B parameters that can only be fine-tuned?
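For what it's worth, the parameter count isn't frozen. One technique people have used to grow an existing checkpoint is "depth up-scaling" (the trick behind models like SOLAR 10.7B): duplicate some or all of the decoder layers to get a deeper model, then continue pretraining so the copied layers differentiate. Below is a minimal sketch using Hugging Face transformers; the repo id, the duplicate-every-layer scheme, and the output path are illustrative assumptions on my part, not anything from this thread, and loading plus copying an 8B model this way needs a lot of RAM.

    # Sketch of depth up-scaling: duplicate the decoder blocks of a distilled
    # 8B checkpoint to build a deeper (roughly 2x-parameter) model that can
    # then be trained further. Names and the layer scheme are illustrative.
    import copy
    import torch
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        torch_dtype=torch.bfloat16,
    )

    layers = base.model.layers   # stack of Llama decoder blocks (32 for the 8B)

    # Interleave each block with a deep copy of itself, doubling the depth.
    # Parameters roughly double (minus embeddings), landing around ~15B here,
    # in the ballpark of the 16B the question asks about.
    new_layers = torch.nn.ModuleList()
    for block in layers:
        new_layers.append(block)
        new_layers.append(copy.deepcopy(block))

    base.model.layers = new_layers
    base.config.num_hidden_layers = len(new_layers)

    print(f"parameters: {sum(p.numel() for p in base.parameters()) / 1e9:.1f}B")
    base.save_pretrained("llama-8b-upscaled")   # hypothetical output dir

The catch is that the copied layers are redundant until you do continued pretraining (or another round of distillation) on the enlarged model, so it's less "free capacity" and more "a warm start for training a bigger model."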




