I’m trying right now. The combination of small models, QLoRA, and GRPO has made it accessible to experimenters. I’m not using Unsloth yet, but I’ll probably start checking it out soon so that I can train larger models or increase the number of generations for GRPO.