I first learned about CMUSphinx from the [Jasper Project](https://jasperproject.github.io/). While Jasper provided an image for the Pi, I decided to go ahead and make a scripted install of CMUSphinx. I spent something like 2 frustrating days attempting to get it installed by hand in a repeatable fashion before giving up.
This was 2 years ago, so maybe it's simple now, but I didn't find it "amazingly easy" back then.
I do have a number of projects where I could definitely use a local speech recognition library. I have used [Python SpeechRecognition](https://github.com/Uberi/speech_recognition/blob/master/exam...) to essentially record and transcribe from a scanner. I wanted to take it further, but Google at the time limited the number of requests per day. Today's announcement seems to indicate they will be expanding their free usage, but a local setup would be much better; I'd like to deploy this in a place that might not have reliable Internet.
In my experience, the issues with building CMU Sphinx are mainly unspecified dependencies, undocumented version requirements, and forgetting to sacrifice the goat when the MSVC redistributable installer pops up.
We've written detailed, up-to-date instructions [1] for installing CMU Sphinx, and now also provide prebuilt binaries [2]!
If you're interested in not sending your audio to Google, CMU Sphinx and other libraries (like Kaldi and Julius) are definitely worth a second look.
Yeah I'm gonna leave a reply here just in case I need to find this again (already opened tabs, but you never know). This might be big for a stalled project at work. If this can un-stall that, I'll sure owe you a beer ;)
Would you mind submitting this documentation to CMU? I get the feeling they'd love to at least host a link to it to enhance their own documentation.
That sounds like my experience with it from about 5 years ago. I gave up on it too. It also didn't help that CMUSphinx has had more than one version in development, in different languages.
Unfortunately, the situation hasn't improved much. Besides, even if you get it set up, the quality of the recognition isn't even close to Google's.
As someone who's worked with a lot of these engines, Nuance and IBM are the only really high-quality players in the space. CMUSphinx and Julius are fine for low-volume applications where accuracy isn't critical, but in my experience neither comes close when you need high accuracy.
As someone who has actually done objective tests, Google are by far the best, Nuance are a clear second. IBM Watson is awful though. Actually the worst I've tested.
Do you have a report of your tests? I'm interested in using speech recognition, but there are so many start-ups and big players that it would be quite time-consuming to do a quality/price analysis myself.
For the "dialect" of Spanish that we speak in Argentina, Watson misses every single word. So, to me, CMUSphinx is valuable in that it lets me tweak it, while IBM fails miserably at every word. Must've been trained on "neutral" Spanish from Spain or Mexico.
Google's engine also works fine (I've been trying it on phones), but the pricing may or may not be a deal breaker.
Is Julius really state-of-the-art? It looks like they use n-grams and HMMs, the methods that achieved SotA 5+ years ago. My understanding is that Google and Microsoft are using end-to-end (or nearly end-to-end) neural network models, which outperformed the older methods a few years ago. Not sure how CMUSphinx works under the hood.
They might not be considered state-of-the-art (if you consider both approaches in the same category), but they are definitely one valid approach to voice recognition, which works surprisingly well.
CMUSphinx is not a neural-network-based system; it uses traditional acoustic and language modeling.
CMUSphinx is really easy to set up, and then being able to train it for one's specific domain probably beats state of the art with one-size-fits-all training.
> It is amazingly easy to create speech recognition without going out to any API these days.
Not really. The hard part is not the algorithm, it is the millions of samples of training data that have gone behind Google's system. They pretty much have every accent and way of speaking covered in their system which is what allows them to deliver such a high-accuracy speaker-independent system.
CMUSphinx is remarkable as an academic milestone, but in all honesty it's basically unusable from a product standpoint. If your speech recognition is only 95% accurate, you're going to have a lot of very unhappy users. Average Joes are used to things like microwave ovens, which work 99.99% of the time, and expect new technology to "just work".
CMUSphinx is also an old algorithm; AFAIK Google is neural-network based.
I think we will have an easy to install OSS speech recognition library and accurate pretrained networks not far off from Google/Alexa/Baidu, running locally rather than in the cloud, within 1-2 years. Can't wait.
http://cmusphinx.sourceforge.net/
http://julius.osdn.jp/en_index.php
It is amazingly easy to create speech recognition without going out to any API these days.