
The metadata exists in special karaoke recordings (time-tagged lyrics), but assuming they're using original recordings that weren't created or modified for karaoke, they'd have to generate the timing on the fly:
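For context, that kind of timing metadata looks roughly like this (an enhanced-LRC-style sketch; the exact format varies by karaoke system, and these timestamps are made up for illustration):

  [00:12.00]<00:12.00>Never <00:12.40>gon<00:12.60>na <00:12.80>give <00:13.10>you <00:13.40>up

Each tag marks when the following word (or syllable) should light up, so playback just follows the clock.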

I'd guess it's done using the same speech-to-text system used by voice assistants, which can certainly show the words it hears in near-realtime -- and way more quickly when it already knows what words it's listening for.
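A minimal sketch of that idea, assuming the recognizer is represented by a hypothetical recognized_words() generator yielding (word, timestamp) pairs (a real system would use an actual streaming ASR or forced-alignment engine):

  from typing import Iterator, Tuple
  import time

  LYRICS = "never gonna give you up never gonna let you down".split()

  def recognized_words() -> Iterator[Tuple[str, float]]:
      """Hypothetical stand-in for a streaming speech-to-text engine."""
      for i, w in enumerate(LYRICS):
          time.sleep(0.3)          # pretend audio is arriving in real time
          yield w, i * 0.3         # (word heard, seconds since start)

  def highlight(lyrics, word_stream):
      """Advance a cursor through the known lyrics as words are heard.

      Because the expected words are known ahead of time, matching is just a
      comparison against the next expected word rather than open-vocabulary
      transcription, which is why it can keep up in near-realtime.
      """
      cursor = 0
      for heard, t in word_stream:
          if cursor < len(lyrics) and heard == lyrics[cursor]:
              line = " ".join(
                  f"[{w}]" if i == cursor else w for i, w in enumerate(lyrics)
              )
              print(f"{t:5.1f}s  {line}")
              cursor += 1

  highlight(LYRICS, recognized_words())

Running it prints the lyric line with the currently heard word bracketed, one step at a time, which is essentially what the on-screen highlight does.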

By the way, karaoke often highlights individual syllables, not just whole words.


