Modders have been hiring sound-alikes for the original Skyrim voice actors since forever. For example, male Khajiits in "Interesting NPCs" sound particularly convincing; this is clearly the work of another professional voice actor closely mimicking Andre Sogliuzzo's original work. As far as I'm aware, no one had any issues with that; using someone's voice without consent was and is an accepted practice. What has changed?
It's the same with other game assets: textures, models, animations, etc. In a mod, you just have to match the original game's style closely; there's no other way. The common view in the game industry is that it's fine to do because modders usually don't make money (at least they didn't before Bethesda and the like let them...).
Modders going out of their way to find voice actors who can accurately reproduce an out-of-budget voice actor sounds like a labor of love that would not scale without serious money.
Having a gaming PC with a modern-ish NVIDIA GPU (not hating on AMD, ROCm is just painful right now) scales very nicely when you apply ML to the problem.
It scales to the budget of bored, tinkering teenagers with a lot of compute from chasing gaming framerates. Gaming put mini ML workstations in a lot of kids' bedrooms, if you think about it.
As of right now, it is pretty much turnkey.
The trouble with this framing is that it obscures the other factors involved.
This isn’t just about who has enough money, but about the ethics of using an actor’s likeness at scale without their permission and without paying them.
I think the fact that other VAs have been hired by mod teams in the past is only relevant if you believe that using AI is somehow equivalent.
The definition of likeness extends beyond visual imagery.
Regarding copyright, it was previously not practical to copy a voice, so while it may be true that a voice is not copyrightable today, this is not a sufficient argument for the ethics of such copies, which may or may not be in sync with current laws. If copying voices had been possible when copyright laws were established, the legal landscape would probably look different than it does today, which hints at the necessity of changes going forward.
AI is breaking new ground, and at this stage of the conversation, precedent is interesting in terms of how it reveals the gaps in current legal frameworks, but this should not be mistaken for such gaps being acceptable.
Many of these new conversations will reveal two things:
1) The things happening right now are legal
2) There are legitimate questions about whether or not they should be
I didn't mention anything about restricting it. The law will eventually get around to handling this type of thing, probably in a DMCA-ish manner would be my guess.
Most anime dubs and videogames sound extremely samey. At least with digital voice engines we can retune them easily without the gatekeeping of the dubbing industry, which is extremely toxic.
It's the same ethical gap that dogs virtually any AI art project. In your example, some other actor did work in exchange for some satisfactory payment (possibly free). In the case of AI synthesis, actors' work is plagiarized without any compensation.
> In your example, some other actor did work in exchange for some satisfactory payment (possibly free).
In that example, someone else was paid instead of the original actor while using their voice, doing a job that neither the original voice actor nor the copyright holder approved (in the context of the NSFW mods the article is talking about). What makes it more ethical all of a sudden? What makes it any less plagiarism?
It seems to me that the actors in the article don't want their names to be associated with NSFW mods in particular (which is understandable), and the AI part is a red herring.
This isn't a new debate in the modding community. There were all kinds of opinions voiced. Robbie is not fond of AI voices, but he tracks NSFW mods because they are a natural concern.
Anyway, it is the custom there that Nexusmods moderation reacts to suspected mod asset theft, and such an issue can be raised by any website user (as mod authors often leave and are no longer around). However, for voice cloning or impersonation, it requires a notification from the original VA, or some other credible confirmation that the particular VA is against this.
If we want to compare this to libraries, it’s necessary to redefine the role of libraries.
For the analogy to work, libraries now provide the service of generating new books/stories based on works under the library’s care. As a library patron, you’ll give the librarian a prompt, and they’ll generate a book for you.
This book will be intrinsically dependent on the authors of all of the books in the library, but those authors will not be credited or paid for the new work.
“Librarian, please generate a fantasy novel in the style of George R. R. Martin that continues where A Dance with Dragons left off”.
If this is what libraries did, authors would not want their books there.
This is not what libraries do, and the analogy does not hold up.
It's worse than that. Once that's legal, AI will have no need for fresh input. It'll take over the business and surround the talent with sheer volume, so even if someone wants "real" there's no metric to easily find it without huge effort.
It plagiarizes then effectively puts them out of business.
> It plagiarizes then effectively puts them out of business.
Isn't that a concern for any artist? We've had this discussion with digital art and photography, and decades ago with electronic music, remixes and sampling.
AI is just enabling this on a larger scale, which will disrupt many fields, but copyright law will broaden, and artists will find ways to adapt or change careers.
> We've had this discussion with digital art and photography, and decades ago with electronic music, remixes and sampling
We had very different discussions about all of those things.
There is a certain structural similarity between AI and these past advances in the form of: new thing disrupts old thing.
But I think it’s deeply problematic to take that analogy much further. Take digital art. I don’t think it’s fair to compare the impact of the advent of digital painting tools with the advent of tools that systematically ingest all paintings and then remove the need for the original artist entirely.
If removing the artist entirely had been part of that discussion, I suspect the tooling and legal landscape would look rather different today.
> AI is just enabling this on a larger scale, which will disrupt many fields
“This” and “larger scale” are doing a lot of heavy lifting here.
Nuclear weapons just enable this (blowing things up) at a larger scale. But these weapons also show us that scale introduces risks and factors not present in any prior iteration of the profession of blowing things up.
My point is not that AI art tools are as dangerous as nuclear weapons, obviously, but that “it’s just x at larger scale” breaks down when the shift in scale is large enough.
The result is something entirely new, for which the past rules of engagement no longer apply.
I do agree in part. Scale matters, and the challenges humanity faces with AI are much greater than with any disruptive technology of the past.
That said, we've had similar challenges before, and society has adapted. I'm pessimistic about the long-term existential risks of AI, but the short-term disruptions to jobs and the legal changes that will be required seem manageable, and are not the doomsday scenario that the media makes them out to be.
> But I think it’s deeply problematic to take that analogy much further. Take digital art. I don’t think it’s fair to compare the impact of the advent of digital painting tools with the advent of tools that systematically ingest all paintings and then remove the need for the original artist entirely.
The invention of photography in the 19th century certainly had the same, if not greater, impact for painters. Yet artists adapted, and paintings were able to coexist with the new technology. Photography opened up new avenues for art, but it didn't eliminate the demand for the traditional art form.
The same will happen with AI-produced art, I think. The markets and our media feeds will be flooded with it, but the demand for human-produced digital art will still exist. It will be challenging to filter and curate human art, especially as the line will be blurred, and many human artists will take advantage of AI. But I don't think any of it will make human artists entirely obsolete.
Anyway, this is all speculation from my side, so I concede that I may be wrong, but it's interesting to think about, and time will tell.
There might be a Streisand Effect to this. Also, it's a bit hard to get morally outraged when, if the mod devs had hired voice actors who sounded like the Skyrim characters, it wouldn't be theft of voice, even though it would have had the same effect.
> Also, it's a bit hard to get morally outraged when, if the mod devs had hired voice actors who sounded like the Skyrim characters, it wouldn't be theft of voice, even though it would have had the same effect
Tom Waits successfully sued Frito-Lay for using a Tom Waits soundalike in a commercial.
There's an important distinction here - these ads were impersonating Tom Waits himself. There's something called publicity rights that relates to a person's right to control their public likeness which is presumably what Waits sued over.
Here, modders are impersonating a character played by a voice actor and the copyright on the character belongs to the game company, not the actor.
Assuming the game company declines to enforce their copyright, it falls back to the actor to enforce their publicity rights. Which I'm not sure they can, since it's not them but rather a character they play.
When modders hired soundalikes to extend a character, it was equivalent to finding a similar looking actor. Now, using AI, it's equivalent to using deepfake video. Legally, I assume there's a difference which might be important if the game company decided to sue the modders. But from the perspective of the voice actor, I don't think it matters. Their copyright isn't being infringed.
How do car dealers get away with using the voices of “ex-presidents” for their Presidents' Day sales then? I mean, while not exactly the ex-presidents' voices, they do mimic their voice styles and vocal mannerisms.
Typically ‘W’ and Clinton, though the ones for Lincoln and Washington are obviously imagined and presumably any rights would have expired.
I can't speak for video games, but for any work with multiple characters (novels, commercials, etc.), voicing IS acting. Different characters have different voices, at different times depending on their emotional state.
On the other hand, you could just have different voices for the different characters, which a book narrator can't do. I thought about doing that for my books, but it'd just be way too expensive.
Once the voice models allow for annotations (make this sarcastic, give a slow sigh here, make this sound sad and depressed) we'll see a lot of generated audio in audiobooks.
Apple already released some audiobooks using AI generated voice actors. I don’t know if they were annotated, but the AI seemed to adjust tone correctly for the context in the snippets I listened to.
I'm not saying it won't happen (after all, so much animation is crap and it's driven out the good stuff), but take a listen to the audio samples on my two books (searching Amazon for "Albert Cory" should find them).
It's more than just following an annotation about sarcasm or sighing or sounding depressed.
But yeah, that probably IS going to have to suffice for the vast majority of popular works.
I don't know, even the base model from ElevenLabs is doing a pretty good job at imitating Maxwell Glick. I had it read out your comment, see here: https://sndup.net/q3r5/
ElevenLabs announced a new professional audio model with a more involved training process. I'm curious to see how good that one is, and how much control it gives you.
That sounds like him all right. I didn't see an obvious way to use a different text file, but try this one: it's what I used for auditioning:
One morning in May 1979, Dan and Janet were going over a problem with the latest build when Grant knocked, “Hi, I can come back later if you're busy with something.” Dan looked at Janet to see if he should leave. “No, come in. Dan was just being a jerk as usual.” Dan tried to look hurt. Grant said, “I'm all settled in now. You should come up.” Dan looked confused, “Wait… I thought you lived up North.” “No, I transferred down here to the Tor project.” “Wow, I thought people only moved from South to North.” “I'm starting a new trend.” “Are you up on the second floor now?” “ Yep. Come up and visit sometime.” “What a thought. Is it like here?” Dan had never been on the second floor or anywhere else in the building. Grant went to the door and made a show of looking in both directions, “Looks very similar. Anyhow, what are you doing for no-driving day tomorrow?” He looked at Janet, who had a quizzical expression. “The company just said we're not supposed to drive alone tomorrow because of the gas rationing. They're going to station guards at the parking lots to check.” Dan already knew about this, but Janet didn't. “Don't know. Maybe Dan can come by and pick me up. Then, we'd be a carpool.” Grant flashed a look of displeasure for an instant but quickly erased it. He was hoping Janet would come to his house, and they could walk to work. “Do you live near Janet?” “Hermosa. Janet's right on the way.” Grant recovered, “I was going to suggest we meet at my house and walk to work. It's only about a half-hour walk.” Dan asked, “Where do you live? Hawthorne?” Grant looked vaguely insulted at the suggestion that he might live in Hawthorne, “Manhattan Beach, near Marine and Aviation.” “We could pick you up, too! You're right on the way.” Grant wasn't expecting this resolution, but there it was. He wrote his address on the whiteboard, which Dan copied down into his notebook. They agreed on a time tomorrow, and Dan and Grant both left.
This is just a single generation pass though, and the voice is a bit unstable and maybe not perfectly consistent in the direct speech. Ideally you would tweak it a bit, maybe break it into chunks, isolate the direct speech and then cut it together, etc.
Still, it's pretty good I'd say, and as I mentioned previously, being able to give specific directions is around the corner, along with a "proper"/professional training of the voice model on a specific voice.
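For what it's worth, the "break it into chunks, isolate the direct speech" step could look roughly like the sketch below. This is only an illustration in Python, assuming dialogue is always wrapped in matching quotation marks (curly or straight). `split_direct_speech` is a hypothetical helper name, not part of any ElevenLabs tooling, and actually sending each chunk to a voice model is left out.

```python
import re

# Matches one quoted span, either curly quotes or straight double quotes.
# Assumption: quotes are balanced and never nested.
QUOTE_RE = re.compile(r'“[^”]*”|"[^"]*"')


def split_direct_speech(text):
    """Split text into ordered (kind, chunk) pairs.

    kind is "speech" for quoted dialogue (quotes stripped)
    and "narration" for everything in between.
    """
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        # Text between the previous quote and this one is narration.
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        # Drop the surrounding quote characters from the dialogue.
        segments.append(("speech", match.group(0)[1:-1].strip()))
        pos = match.end()
    # Anything after the last quote is narration too.
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


sample = (
    "Dan looked confused, “Wait… I thought you lived up North.” "
    "“No, I transferred down here to the Tor project.”"
)
for kind, chunk in split_direct_speech(sample):
    print(f"{kind:9} | {chunk}")
```

Each chunk could then be generated separately (for instance with different settings for narration and dialogue) and cut together afterwards, which is the manual workflow described above.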
An interesting question: what counts as a "close enough" voice?
Voices aren't that unique, at least from the perspective of a human with average hearing. I believe that by analyzing every single frequency of a perfect recording, you can determine the speaker fairly precisely, but for a normal human ear, plenty of people sound "almost alike".
The danger here is that AI voice will continue to get better until one day...voice actors are simply not needed. Perhaps a better solution is for voice actors to license their voice to be used in AI. Imagine if you could license Morgan Freeman's voice for something, with plenty of rules and stipulations of course. The voice is recreated digitally and Mr. Freeman oversees the use of his voice.
I think it’s already happening for audiobooks to some extent. I’ve heard the latest voice AI and it’s extremely natural sounding, with tonality and all. And audiobooks are an extreme case too, with several hundred pages that would otherwise require hiring a voice. Now they can just upload it and have the result in a few hours or so, whether that's the publisher themselves or the audiobook service. Hell, they could give you your favorite voice. Male? Female? Young? Old? Maybe you want a book on AI read by Steve Jobs?
Here’s an audio AI example by ElevenLabs with celebrity voices.
It's not a danger, it's the dream for a huge number of creatives blocked by voice actor gatekeeping.
The indie game development community is adopting AI voices extremely fast, because voice actors blew budgets or were difficult to work with. Many voice actors feel too good for videogames.
I'm not surprised the gaming community is going hard and fast on AI voices, especially voice engines for the huge amount of text in lots of older games which would cost a fortune and a half to voice.
Think more that extended AI voice rights become part of standard contracts, and video games with a 60-hour main story and voice acting have hundreds to thousands of tailored extended questlines, powered by AI, that extend the original performances across content no single player is ever intended to see all of.
It's been pretty wild the past few years playing games and realizing the difference AI will make between gaming today and gaming tomorrow.
Contract negotiation and licensing just need to catch up.
Well, eventually we will reach the point where models don’t have to “mimic” anyone and can be guided. It really doesn’t matter.
It will become so diluted that using human actors will be seen as special and not the other way around. Kinda like synthetic diamonds vs blood diamonds.