It's OK, I just followed your instructions, and with that model it works well. But are you sure that this uses CUDA? While the output is being generated, my CPU utilization is at 50% and my GPU utilization is at 1%.
The CMake build prints that it finds CUDA when I run the CMakeLists (it prints the location of the CUDA headers), but I don't see any noticeable difference between the CPU-only and CUDA builds. So if it's not working, then maybe there's a CLI option that's required, or maybe CUDA support is broken on Windows.