Could you expand on why CMUSphinx[0]/Julius[1] were not used? I'm all for using Rust ...

IshKebab · on Feb 22, 2018

Have you used Sphinx? Remember in the 90s and early 00's when speech recognition was laughable and barely worked, and it seemed like one of those things that would never really work?

Sphinx is like that.

hardwaresofton · on Feb 22, 2018

Yes, I've used Sphinx, though not recently. When I last used it (which was years ago), it was trivial, even as an amateur, to set up a working English install with the pre-provided models. From there the only problem is improving the model (which is obviously a pretty hard problem).

Are you saying that Sphinx is currently at the same level as the state of the art in the 90s? Surely that's hyperbole.

Also, using AI to generate even better models doesn't seem like a bad idea, if you're looking to improve it -- the fundamentals of speech recognition haven't changed, so why build a completely separate open source product instead of contributing to something already well understood and accessible?

yorwba · on Feb 22, 2018

As far as I can tell, CMU Sphinx is still based on HMMs, which were the state of the art before neural networks brought a breakthrough in model performance. So it is likely that CMU Sphinx is currently not much better than what was possible in the 90s. When I last looked into this, I found a mailing-list message by one of the maintainers, where he explicitly recommended using Kaldi if you want better results.

Kaldi does support neural network models in addition to good old HMMs, but it is very "researchy": everything is set up so that you can replace any step in the pipeline with your own components (and then publish a paper about your results). But that also means that you pretty much have to be an expert to correctly assemble a working product from the available components, and it's pretty much assumed that you will be training your own models, which can be difficult when you don't have access to lots of data.

So yes, the existing landscape for open-source speech recognition leaves something to be desired, and the focus of existing projects doesn't necessarily lend itself to turning them into what you want.

hardwaresofton · on Feb 22, 2018

Sorry, I wasn't clear -- what I meant to say is that even within the limitations of HMMs, it's absurd to imply that CMUSphinx has made no progress on the state of the art in the 90s. A greatly improved methodology/approach to HMMs is hard won, and it's unfair to minimize the effort that produced progress in an old method just because a new method has been developed.

Neural network models have their own downsides, the biggest of which is training -- why throw the baby out with the bathwater instead of taking the approach Kaldi has taken, possibly building a neural network model alternative inside Sphinx?

Let CMU plug away at making HMMs better, while you plug away at making neural networks better, but interoperate so everyone gets both benefits? You can even maybe make some headway with the neural network bootstrapping problem with some help from the progress CMU has already made.

oulipo · on Feb 22, 2018

We are using deep-learning models, which have much better accuracy for speech recognition.

hardwaresofton · on Feb 22, 2018

I understand that the project is using AI, but why not feed that learning into Sphinx, or some other tool? Couldn't this product have essentially been an extension to make one of those other open source, research-backed efforts smarter?

How does any other project benefit from the models you build? Or is that the business model -- produce open source software that no one else can really extend or use with anything else, but hopefully people will then buy into your modelling strategy + tooling?

I do realize that you have absolutely no obligation to any other voice recognition effort, but I wonder how easy it is for anyone else to use the model you're building.

oulipo · on Feb 22, 2018

Sphinx has its own models, and it is not easy to extend it with the frameworks we are using.

We will be open-sourcing more of the platform over time and giving back to the community; this will start with the NLU in the coming weeks.

hardwaresofton · on Feb 22, 2018

Thanks so much for being open about it. I see why you didn't go with trying to extend it.

Again, I want to express that you don't owe me anything (and it was entitled of me to imply that you did) -- but I wanted to know. Maybe in the future it will be possible to write something that can enrich other models.

__bee · on Feb 22, 2018

> deep-learning models

What kind of models do you use?