We did build two free tools, which are geared towards non-native English speakers. You can find them at https://accentoracle.com and https://accentfilter.com. They're less effective for native English speakers, but could still be fun.
Is the approach being used to do accented TTS (or just reference recordings), and then a tone color conversion model that just changes the timbre? Because if I say a completely different sentence it still says the original words, haha.
Hmmm. Initially impressive but upon retries and reflection ... not that great. It doesn't even maintain timing ... unless that's part of the transform.
Indeed, yeah, that’s one of the key weaknesses of the approach that we’re using. It overrides the speaker's cadence and accent while keeping their voice profile / timbre in place. Different techniques may not do this, but they also may not copy the accent over to the resulting clip as effectively. So far we’re using this to support pedagogical (and lead-gen) use cases where we think it works well enough.
Let's put it a different way. I grew up in the UK till 24. I've lived in the USA for 36 years. The UK/US accent conversions dramatically altered my voice/accent; the AU one left it mostly unchanged.
We actually did something like this for non-native English speakers a few months back. Check out https://accentoracle.com (most mind-blowing if you're a non-native English speaker)
Well, it says I'm Finnish. But now I have a new game, where I put on my best Italian or Russian or Greek or Australian accent and try to see how close I am.
I'm terrible, according to the program. My Italian is Russian or Hungarian or Swedish, my Australian is English.
Fun. I have a strongly modulated North American midwestern accent so unsurprisingly it had me read several paragraphs before only being able to say with any certainty that my accent was 83% English with the rest being Spanish/Russian. It couldn't detect the country of origin.
Agreed, pretty meh. Tried my usual accent (the one where natives mostly can’t tell where I'm from) and got 78%. Then went full cartoon Russian ‘bad neighborhood’ mode and somehow scored 68%.
I would love to be able to explore combinations of X spoken language with Y accent, like for example I've always been curious how French sounds spoken with an Indian accent.
Was that right? Or what is the correct native language it should have predicted? Note the %s in the accent breakdown section are prediction probabilities.
Indeed, although the model's output reflects the ratings we trained it on. Those ratings were given by native speakers of American English, so this iteration of the model is centered on American accents more than on e.g. UK, Australian, or other accents of English from outside the US.
That's a fascinating idea! Definitely something to try out for our team. We actively and continuously do all sorts of experiments with our machine learning models to be able to extract the most useful insights. We will definitely share if we find something useful here.
Sure, that's fair. We apply labels that have a connotation of strength based on the distance, but the underlying calculation is indeed based on distance.
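To make the distance-versus-label point concrete, here is a rough sketch of how a strength label could be layered on top of a raw distance. Everything in it is an assumption for illustration: the Euclidean metric, the threshold values, the label names, and the function names are hypothetical and are not taken from the actual tool.

```python
from math import sqrt

# Hypothetical cut-offs: (exclusive upper bound on distance, label).
# These numbers and names are made up for illustration only.
STRENGTH_LABELS = [
    (0.5, "light"),
    (1.0, "moderate"),
    (float("inf"), "strong"),
]


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strength_label(distance):
    """Map a raw distance to the first label whose upper bound exceeds it."""
    for upper_bound, label in STRENGTH_LABELS:
        if distance < upper_bound:
            return label
    # Fallback for non-finite distances that match no bucket.
    return STRENGTH_LABELS[-1][1]


# Usage: the label is only a bucketed view of the underlying distance.
speaker = [0.2, 0.9]
reference = [0.0, 0.0]
d = euclidean_distance(speaker, reference)
print(round(d, 3), strength_label(d))
```

The label adds no information beyond the distance itself: two speakers just either side of a threshold get different words for nearly identical numbers.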