I know (I should have included that in my earlier response, but editing it would have felt weird), but I still assume one should run the result natively, so I'm asking if, and where, there's some jumping around required.
The last time I tried running an LLM, I attempted both WSL and native on two machines and just got Lovecraftian-tier errors, so I'm waiting to see if I'm missing something obvious before going down that route again.
No idea if it will work in this case, but it does with llama.cpp: https://github.com/ggerganov/llama.cpp/issues/103