
Yes, that's because Whisper - like pretty much all recently published ASR models - uses a Transformer encoder with Attention layers. And the Attention layers learn to look into the future.

And yes, what you describe could be done. But no, it won't reduce latency that much, because the model itself learns to delay the prediction w.r.t. the audio stream. That's why ASR-generated subtitles usually need to be re-aligned after the speech recognition step. And that's why there is research such as the FastEmit paper to prevent that, but then it is a trade-off between latency and quality again.

Also, running your "low-latency" model with 1s chunks means you now need to run the model 30x as often as you would with 30s chunks.
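To put a number on that, here is a rough sketch of the arithmetic. It assumes non-overlapping, fixed-size chunks; `model_calls` is a made-up helper for illustration, not part of Whisper or any library.

```python
import math

def model_calls(audio_seconds: float, chunk_seconds: float) -> int:
    """Evaluations needed to cover the audio with non-overlapping fixed-size chunks."""
    return math.ceil(audio_seconds / chunk_seconds)

audio = 30.0  # seconds of speech to transcribe

calls_30s = model_calls(audio, 30.0)  # one evaluation for the whole window
calls_1s = model_calls(audio, 1.0)    # one evaluation per second of audio

# How many times more often the model runs with 1s chunks.
ratio = calls_1s / calls_30s
print(calls_30s, calls_1s, ratio)
```

Overlapping chunks, which streaming setups often use to avoid cutting words in half, would push the count higher still.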



You just said the models pretty much all work the same way, then you said doing what I described won't help. I'm confused. Apple and Google both offer real time, on device transcription these days, so something clearly works. And if you say the models already all do this, then running it 30x as often isn't a problem anyways, since again... people are used to that.

I doubt people run online transcription for long periods of time on their phone very often, so the battery impact is irrelevant, and the model is ideally running (mostly) on a low power, high performance inference accelerator anyways, which is common to many SoCs these days.


I meant that most research that has been released in papers or code recently uses the same architecture. But all of those research papers use something different than Apple and Google.

As for running the AI 30x, on current hardware that'll make it slower than realtime. Plus all of those 1GB+ models won't fit into a phone anyway.
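"Slower than realtime" here means the real-time factor goes above 1: the model needs more than one second of compute per second of audio. A minimal sketch, under two loud assumptions: the cost per evaluation stays the same no matter how short the chunk is, and the 3 seconds per call is a purely hypothetical figure, not a measurement of any model or device.

```python
def real_time_factor(seconds_per_call: float, chunk_seconds: float) -> float:
    """Compute time spent per second of audio; above 1.0 means falling behind."""
    return seconds_per_call / chunk_seconds

def keeps_up(seconds_per_call: float, chunk_seconds: float) -> bool:
    """True if transcription can keep pace with incoming audio."""
    return real_time_factor(seconds_per_call, chunk_seconds) <= 1.0

# Hypothetical cost of one model evaluation, assumed independent of chunk length.
SECONDS_PER_CALL = 3.0

for chunk in (30.0, 1.0):
    rtf = real_time_factor(SECONDS_PER_CALL, chunk)
    print(f"{chunk:>4.0f}s chunks: RTF={rtf:.1f}, keeps up: {keeps_up(SECONDS_PER_CALL, chunk)}")
```

With those made-up numbers the 30s setup has plenty of headroom while the 1s setup falls behind; a faster model or accelerator changes the numbers, not the shape of the trade-off.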


> Plus all of those 1GB+ models won't fit into a phone anyway.

I don't think that's a requirement here. I've been playing with Whisper tonight, and even the tiny model drastically outperformed Siri dictation for me in my testing. YMMV, of course.



