I’m trying right now. The combination of small models, QLoRA, and GRPO has made it accessible to experimenters. I’m not using Unsloth yet, but I’ll probably start checking it out soon so that I can train larger models or increase the number of generations for GRPO.