Okay this is super impressive. I just downloaded Whisper and fed it a random fla...

lunixbochs · on Sept 21, 2022

Looks like it defaults to the model called "small".

I just ran some benchmarks on an M1 Max with PyTorch, using a 1.29-second FLAC file (it looks like the matrix math was running on a single thread):

    tiny
    146.522ms detect_lang
    549.131ms decode_one
    0.057ms tokenizer

    base
    354.885ms detect_lang
    1046.679ms decode_one
    0.011ms tokenizer

    small
    803.892ms detect_lang
    3194.503ms decode_one
    0.017ms tokenizer

    medium
    2279.689ms detect_lang
    10128.255ms decode_one
    0.023ms tokenizer

    large
    3656.478ms detect_lang
    17249.024ms decode_one
    0.016ms tokenizer
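For anyone who wants to collect per-stage numbers in the same shape, here is a minimal timing sketch. It is not the harness used for the figures above. `time_ms` and `stand_in_stage` are hypothetical names made up for illustration, and the stand-in workload only exists so the snippet runs without Whisper installed.

```python
import time
from statistics import median


def time_ms(fn, *args, repeats=5, **kwargs):
    """Call fn several times; return (median wall-clock ms, last result)."""
    samples = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples), result


def stand_in_stage(n):
    # Hypothetical placeholder for a real stage such as language detection.
    return sum(i * i for i in range(n))


elapsed, value = time_ms(stand_in_stage, 100_000)
print(f"{elapsed:.3f}ms stand_in_stage")
```

With the `whisper` Python package installed, the same helper could presumably wrap calls such as `model.detect_language(mel)` or `whisper.decode(model, mel, options)` in place of the stand-in.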

adgjlsfhk1 · on Sept 22, 2022

For more benchmarks: on an RTX 2060 (6 GB), the "small" model runs at roughly 10x real-time for me, and the "tiny" model at 30x real-time.
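To put the M1 Max timings earlier in the thread on the same "x real-time" scale, here is a rough calculation. It assumes the speed factor is audio duration divided by processing time, and that total processing time is the sum of the three listed stages (detect_lang, decode_one, tokenizer). That summing is my assumption, not something the benchmark post states.

```python
# Clip length from the benchmark post: 1.29 seconds.
AUDIO_MS = 1290.0

# (detect_lang, decode_one, tokenizer) in milliseconds, copied from the post.
TIMINGS_MS = {
    "tiny": (146.522, 549.131, 0.057),
    "base": (354.885, 1046.679, 0.011),
    "small": (803.892, 3194.503, 0.017),
    "medium": (2279.689, 10128.255, 0.023),
    "large": (3656.478, 17249.024, 0.016),
}


def speed_factor(audio_ms, stage_ms):
    """Audio duration over total processing time; above 1.0 is faster than real-time."""
    return audio_ms / sum(stage_ms)


speeds = {name: speed_factor(AUDIO_MS, stages) for name, stages in TIMINGS_MS.items()}

for name, factor in speeds.items():
    print(f"{name:>6}: {factor:.2f}x real-time")
```

Under those assumptions, only "tiny" comes out faster than real-time on this very short clip, and every larger model is slower than real-time.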