
Asking because I’m lazy: if I need to transcribe audio in real time, is there a state-of-the-art model I can plug into?


https://github.com/ggerganov/whisper.cpp

https://github.com/Const-me/Whisper

I had fun with both of these. They will both do real-time transcription, but you will have to download the pretrained model files first.


I saw Whisper recommended, but I was curious how it compares to other robust ASR systems (like NVIDIA's NeMo + Riva). I found this Twitter thread that seemed relevant: https://nitter.net/lunixbochs/status/1574848899897884672

Long story short, it depends on what you want to use it for. Different models and different training sets optimize for different things. Also, if you're in a domain with very uncommon speech patterns (think doctor shorthand or radio lingo), you'll need to find out how hard it is to customize a model to do better in your space. I think NeMo + Riva does well at this, but I'm not as familiar with the other options.
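If you want to check this on your own domain, one rough approach is to transcribe a handful of your own recordings with each candidate and score them against hand-written references. Here is a minimal sketch using word error rate (WER), a common ASR metric. The function name, the lowercase-and-split normalization, and the sample sentences are my own assumptions for illustration, not part of any of the toolkits mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is deliberately naive (lowercase + whitespace split);
    real evaluations usually also strip punctuation and expand numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair; substitute your own transcripts.
    reference = "the patient has a fever"
    hypothesis = "the patient has fever"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Lower is better, and WER can exceed 1.0 when the model inserts a lot of extra words. Running the same references through each system gives you a like-for-like number for your own kind of audio.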


Yeah, I fine-tuned a model two years ago, but it was a big pain and performance never got better than 85%.


Whisper from OpenAI works great for me.



