
It seems like Large-V2 is a huge improvement when transcribing Japanese. I tested it on "Macross Frontier - the Movie", and it no longer breaks down after 8 minutes as it did before:

Large-V1 (transcribed on 2022-10-02):

* SRT: https://pastes.io/yrggofqhof

* Transcript: https://pastes.io/rtp9buhsm0

Large-V2 (latest version as of 2022-12-07):

* SRT: https://pastes.io/uiqblpw1qk

* Transcript: https://pastes.io/rheqgnftzl

There are still some timing issues after a period of silence, but using a VAD (voice activity detection) as a workaround, as I do in my WebUI, may no longer be strictly necessary:

* https://github.com/openai/whisper/discussions/397
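
For anyone unfamiliar with the VAD workaround, here is a rough sketch of the general idea, not the code from my WebUI: find the spans that contain speech, transcribe each span on its own, then shift the resulting timestamps back by the span's start time so silence can't throw the timing off. The energy threshold detector below is a deliberately naive stand-in for a real VAD model, and the names `detect_speech` and `shift_timestamps` and all parameter values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def detect_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,        # assumed frame size, not a tuned value
    threshold: float = 0.01,   # assumed RMS threshold, not a tuned value
    max_gap_s: float = 0.5,    # merge speech spans separated by short pauses
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of spans whose RMS energy exceeds threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans: List[Tuple[float, float]] = []
    start = None
    last_end = 0.0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            if start is None:
                start = i / sample_rate
            last_end = (i + len(frame)) / sample_rate
        elif start is not None:
            spans.append((start, last_end))
            start = None
    if start is not None:
        spans.append((start, last_end))

    # Merge spans that are separated by only a short pause.
    merged: List[Tuple[float, float]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] <= max_gap_s:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged


def shift_timestamps(
    chunk_segments: Sequence[Tuple[float, float, str]],
    offset_s: float,
) -> List[Tuple[float, float, str]]:
    """Move (start, end, text) segments from chunk-relative time to full-file time."""
    return [(s + offset_s, e + offset_s, text) for s, e, text in chunk_segments]
```

In practice you would cut the audio at each detected span, run the transcription on that chunk alone, and pass the chunk's start time as `offset_s` before writing the SRT.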


