This space seems like one of those areas that would be really hard to break into, because their whole selling point is having had hundreds or thousands of people record and annotate an enormous amount of voice input. I assume that has to be hand-polished for every single exercise?

I'm sure some of it could be automated these days, and some parts may even use voice synthesis, but I'd expect it to take basically an army of people hand-crafting it for the experience not to end up very janky.