Incredible stuff, and yet TTS is still so robotic. Frankly, I assume it must be deliberate at this point, or at least it's deliberate that nobody's worked on it, because it's comparatively easy and dull?
(The context awareness of the current breed of generative AI seems to be exactly what TTS has always lacked; hence the awkward syllables and emphasis, pronunciation that would be correct sometimes but not after that particular word, etc.)