First off, it seems the model can easily run on M1/M2 with minor modifications. However, the `aten::_index_put_impl_` operator is currently not supported, and the fallback always slows things down quite a lot.
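For anyone wanting to reproduce this, here is a minimal sketch of how the fallback can be switched on. It assumes the fallback in question is PyTorch's `PYTORCH_ENABLE_MPS_FALLBACK` environment variable, which runs unsupported MPS operators on the CPU. The `pick_device` helper is a hypothetical name of mine, not something from the script.

```python
import os

# Assumption: this is the fallback being discussed. It has to be set before
# torch is imported, otherwise unsupported MPS ops raise instead of falling
# back to the CPU.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device() -> str:
    """Return "mps" on Apple Silicon when PyTorch supports it, else "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed; nothing to select.
        return "cpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"using device: {device}")
```

Each fallback op means a round trip between GPU and CPU, which would be consistent with the slowdown described above.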
Second, is there a bug in how the script processes incoming audio segments? For a short 4-second clip, what I got was:
> [00:00.000 --> 00:03.760] Okay, Eunice, travel plans. I need to be in New York on Monday, L.A. on Tuesday, New York on Wednesday, L.A. on Thursday. You're knocking Friday. Got it?
> [00:03.760 --> 00:28.760] Got it.
However, the final segment should have been just shy of 1 second.
The script mistakenly thinks the last segment was 25 seconds long and makes you wait for processing.
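As a stopgap on the output side, the timestamps could be clamped to the real clip length. This is only a sketch: `clamp_segments` is a hypothetical helper, the dict layout with `start`/`end` in seconds is an assumption about the script's output, and the 4.5-second duration is a made-up stand-in for the actual clip length. It corrects the reported times but would not remove the extra processing wait.

```python
def clamp_segments(segments, audio_duration):
    """Clip segment timestamps so none extends past the end of the audio."""
    clamped = []
    for seg in segments:
        start = min(seg["start"], audio_duration)
        end = min(seg["end"], audio_duration)
        # Drop segments that lie entirely beyond the end of the clip.
        if end > start:
            clamped.append({**seg, "start": start, "end": end})
    return clamped


# Timestamps taken from the output above; the duration is assumed.
segments = [
    {"start": 0.0, "end": 3.76, "text": "Okay, Eunice, travel plans. ..."},
    {"start": 3.76, "end": 28.76, "text": "Got it."},
]
fixed = clamp_segments(segments, audio_duration=4.5)
for seg in fixed:
    print(f'[{seg["start"]:.3f} --> {seg["end"]:.3f}] {seg["text"]}')
```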