Is it? Last I checked, when you trained an LLM on another model's output, at best you matched the original's performance, and more likely you significantly degraded usefulness. (I'm not talking about distillation, where that tradeoff is accepted knowingly in exchange for a smaller, more efficient parameter set.)