It seems like Large-V2 is a huge improvement when transcribing Japanese. I tested it on "Macross Frontier - the Movie", and it no longer breaks after 8 minutes as before:
Large-V1 (transcribed at 2022-10-02):
* SRT: https://pastes.io/yrggofqhof
* Transcript: https://pastes.io/rtp9buhsm0
Large-V2 (latest version at 2022-12-07):
* SRT: https://pastes.io/uiqblpw1qk
* Transcript: https://pastes.io/rheqgnftzl
There are still some timing issues after a period of silence, but using a VAD as a workaround (like I do in my WebUI) may no longer be strictly necessary:
* https://github.com/openai/whisper/discussions/397
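
For anyone who wants to run a similar comparison, here is a rough sketch of how an SRT could be produced with the `openai-whisper` Python package. It is not the exact script I used for the files above: the helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are made up for illustration, and it does no VAD preprocessing, so timing after long silences may still drift.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments (start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Entries are separated by a blank line.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "large-v2") -> str:
    """Transcribe Japanese audio and return the result as SRT text."""
    # Imported lazily so the helpers above can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language="ja")
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("movie_audio.wav")` (the file name is just a placeholder) once with `model_name="large"` or an older checkpoint and once with `"large-v2"` should give two SRT files that can be diffed side by side.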