
Author here: We were blown away too. This project started with a question in our minds about whether it was even possible for the stable diffusion model architecture to output something with the level of fidelity needed for the resulting audio to sound reasonable.


Any chance of spoken voice-work being possible? It would be interesting to see if a model could "speak" like James Earl Jones or Steve Blum.


James Earl Jones: https://fakeyou.com/tts/result/TR:9ek4x6eb80kq49e94grnhctk4g...

Steve Blum: https://fakeyou.com/tts/result/TR:xmjjq9ty0hnsyjrjnw806k6rnp...

Furiously working on voice-to-voice (web, real-time, and singing!). It should be out the door tomorrow!


Excellent work! Singing would be amazing - karaoke can finally sound good :p

Have you released a tool for volumetric capture? I'm applying this to LED lighting fixture setup for TV, film, and live shows, and 3D positioning is the last step toward fully automated configuration.

My goal is real-time sync between the 3D model and the real world.


Be careful not to choke on your aspirations :P


This already exists [1].

[1] https://www.respeecher.com/


Are there any open source models with good quality?

I had a look around several months ago, and it seems like everything is locked behind SaaS APIs.


Have a look at UberDuck; they do something like this.



