
Very interested in this. I have been contemplating building something similar, and am unaware of any existing services that do this. Haven't played with pyannote; how does it compare to Whisper? I also thought it might be useful to OCR screenshots and use the text to inform the summarization and transcription, especially for things like code snippets and domain-specific terms.


I remember Whisper large-v3 blowing my mind: it was able to properly transcribe a two-language monstrosity (przescreenować, which is the English verb "to screen", as in screening a candidate, but conjugated according to standard Polish rules). Once I saw that I thought, "it's finally time: truly good transcription has arrived".

So I view Whisper as SOTA, with excellent accuracy.

Now, for the type of transcription I need, telling speakers apart is much more valuable than word-for-word accurate transcription: it will be summarized anyway, and that tends to gloss over some of the errors.

That said, pyannote has also caught me off guard: it correctly annotated a lazily spoken "DP8" said with a non-native speaker's accent.

It looks really good.



