I’m in a similar boat: I produce about one video a day and will definitely give this a try. But if I want to caption them all, that means $149/mo, because the $15 plan has a cap of 20 videos/mo. At that price I’d be much more likely to pay $295 one time for DaVinci Resolve Studio and use its subtitle generator (which also fits into my existing workflow).
If there were a better pricing plan for someone who has yet to see a dime from videos, I’d consider this long term.
whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.
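As a rough idea of what such a script could start from, here is a minimal sketch that turns timed transcript segments into SRT subtitle text. It assumes you already have (start, end, text) tuples in seconds from whatever transcriber you use; the function names and the tuple layout are made up for illustration and are not part of whisper.cpp.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples.

    The tuple layout is an assumption for this sketch; adapt it to
    whatever your transcription step actually produces.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, caption text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Hello there."), (2.5, 5.0, "This is a caption.")]
    print(segments_to_srt(demo))
```

If you save the result as a file such as captions.srt, ffmpeg's subtitles filter should be able to burn it into the video, provided your ffmpeg build includes libass.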