For English-only speech-to-text, NVIDIA's Parakeet-V2 is significantly faster than Whisper, and I found it to be more accurate.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

For Apple Silicon (MLX): https://huggingface.co/senstella/parakeet-tdt-0.6b-v2-mlx



Compared to all Whisper models, or just the faster ones? And which version of Whisper? I'm all for a faster, more accurate model, but I need a bit more detail.


All of them, in my experience.


Fair. Looking at the ASR leaderboard, it really is better: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard. And NVIDIA's Canary might be even better? I'll try these out. Thanks for bringing them to my attention!



