I just ran some benchmarks - M1 Max, PyTorch, with a 1.29-second FLAC file (it looks like the matrix math was running on a single thread):
```
tiny      146.522ms detect_lang     549.131ms decode_one    0.057ms tokenizer
base      354.885ms detect_lang    1046.679ms decode_one    0.011ms tokenizer
small     803.892ms detect_lang    3194.503ms decode_one    0.017ms tokenizer
medium   2279.689ms detect_lang   10128.255ms decode_one    0.023ms tokenizer
large    3656.478ms detect_lang   17249.024ms decode_one    0.016ms tokenizer
```
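For context, here is a minimal sketch of the kind of timing harness these labels could come from, assuming the openai/whisper Python API. The mapping of `detect_lang`, `decode_one`, and `tokenizer` to `model.detect_language`, a single `whisper.decode` call over the padded 30-second window, and `tokenizer.encode` is my assumption, and `sample.flac` is a hypothetical stand-in for the 1.29-second clip:

```python
import time

import whisper
from whisper.tokenizer import get_tokenizer


def bench(label, fn, repeat=3):
    # Report the best wall-clock time of a few runs, in milliseconds.
    best = min(_time_once(fn) for _ in range(repeat))
    print(f"{best * 1000:12.3f}ms {label}")


def _time_once(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


audio = whisper.load_audio("sample.flac")   # hypothetical short clip
audio = whisper.pad_or_trim(audio)          # pad to the 30 s window Whisper expects

for name in ["tiny", "base", "small", "medium", "large"]:
    model = whisper.load_model(name)
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    tokenizer = get_tokenizer(model.is_multilingual)
    options = whisper.DecodingOptions(fp16=False)  # FP32 on CPU

    print(name)
    bench("detect_lang", lambda: model.detect_language(mel))
    bench("decode_one", lambda: whisper.decode(model, mel, options))
    bench("tokenizer", lambda: tokenizer.encode("hello world"))
```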