
Self-distillation and mutual distillation are used in MoE models. What you can do is freeze all but one expert and then train the model. If you want to do it again, you first have to do self/mutual distillation to spread that expert's training result onto the other experts; a sketch of the idea is below.
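A minimal sketch of what that could look like, assuming a toy PyTorch MoE layer. The model, expert count, and the MSE-based output-matching distillation loss are all illustrative assumptions, not a specific library's API or the commenter's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Toy mixture-of-experts layer: softmax gate over linear experts."""
    def __init__(self, dim=64, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):
        weights = F.softmax(self.gate(x), dim=-1)                    # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)

model = ToyMoE()
x, y = torch.randn(32, 64), torch.randn(32, 64)   # dummy data

# Step 1: freeze every expert except expert 0, then train as usual.
for i, expert in enumerate(model.experts):
    for p in expert.parameters():
        p.requires_grad = (i == 0)

opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    F.mse_loss(model(x), y).backward()
    opt.step()

# Step 2: self/mutual distillation -- unfreeze the other experts and pull their
# outputs toward the trained expert's, spreading the update so another expert
# could be frozen-and-trained the same way afterwards.
for e in model.experts[1:]:
    for p in e.parameters():
        p.requires_grad = True

distill_opt = torch.optim.Adam(
    [p for e in model.experts[1:] for p in e.parameters()], lr=1e-3
)
for _ in range(100):
    distill_opt.zero_grad()
    with torch.no_grad():
        teacher = model.experts[0](x)        # trained expert acts as teacher
    sum(F.mse_loss(e(x), teacher) for e in model.experts[1:]).backward()
    distill_opt.step()
```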



