Are there any simple howtos anywhere that describe the process in the simplest possible terms, without assuming you already know the cool toolkits du jour?
Something like:
- Download these texts
- Record in WAV at 48 kHz or higher
- Record each line in a separate file
- Do 3 takes of each line: flat, happy, despair
Maybe even a minimal set and a full set depending on how much effort you are willing to put in.
A plain description of how to capture a raw base that, within the limits of reason and technology, could serve as a baseline for the most common toolkits.
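To make that concrete, here is a rough sketch of how such a checklist could be verified after recording, using only Python's standard `wave` module. The 48 kHz floor and the three takes come from the list above; the `<line id>_<take>.wav` naming scheme and the `check_recordings` helper are made up for illustration and are not taken from any toolkit.

```python
import wave
from pathlib import Path

MIN_RATE_HZ = 48_000                   # "at least 48 kHz" from the list above
TAKES = ("flat", "happy", "despair")   # the three takes from the list above


def check_recordings(folder, line_ids):
    """Return a list of problems; an empty list means the set looks complete."""
    problems = []
    for line_id in line_ids:
        for take in TAKES:
            # Hypothetical naming scheme: <line id>_<take>.wav, one file per take
            path = Path(folder) / f"{line_id}_{take}.wav"
            if not path.exists():
                problems.append(f"missing: {path.name}")
                continue
            # Read only the header to get the sample rate
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
            if rate < MIN_RATE_HZ:
                problems.append(
                    f"{path.name}: {rate} Hz is below {MIN_RATE_HZ} Hz"
                )
    return problems
```

Calling `check_recordings("recordings", ["0001", "0002"])` would list any missing takes or files recorded below 48 kHz. Whether a given toolkit actually wants this layout is a separate question; the point is only that the raw capture rules are simple enough to check mechanically.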
I have looked into this myself (for fun), but I felt I needed a very good understanding of the toolkits before I could even start feeding in data. And for my admittedly unimportant use, it seemed like a huge investment to create a corpus I was not even confident would work. I ended up taking the low road and using an existing voice.