A voice technology company that uses artificial intelligence (AI) to generate lifelike voices said it would introduce additional protections after its free tool was used to generate voices of celebrities reading highly inappropriate remarks.
ElevenLabs released its voice cloning tool earlier this month.
It allows users to upload clips of someone speaking, which are used to generate artificial voices.
The cloned voices can then be used with the company's text-to-speech feature, which by default offers a list of voices with various accents and can read up to 2,500 characters of text at a time.
It didn't take long for internet users to experiment with the tool, including on the notoriously anonymous image board 4chan, generating clips such as Harry Potter actress Emma Watson reading a passage from Adolf Hitler's Mein Kampf.
Other clips found by Sky News included what sounded like Joe Biden announcing that US troops would be entering Ukraine, and what sounded like David Attenborough bragging about a career as a Navy SEAL.
Film director James Cameron, Top Gun star Tom Cruise and podcaster Joe Rogan have also been targeted, as have fictional characters, with the clips often featuring highly offensive, racist or misogynistic messages.
In a statement on Twitter, ElevenLabs, founded last year by former Google engineer Piotr Dabkowski and former Palantir strategist Mati Staniszewski, asked for feedback on how to prevent its technology from being misused.
“Crazy weekend – thanks for trying out our beta platform,” it said.
"While we see our technology being overwhelmingly used for positive purposes, we are also seeing an increasing number of voice cloning misuse cases. We want to reach out to the Twitter community for thoughts and feedback!"
The company says that while it can “trace any generated audio” back to the user who produced it, it also wants to introduce “additional safeguards.”
Suggestions include requiring additional account checks, such as asking for payment details or ID; verifying that users hold the copyright to the clips they upload; or dropping the self-serve tool entirely and manually verifying each voice-cloning request.
But the tool remained online as of Tuesday morning.
The company's website suggests that its technology could one day be used to narrate articles, newsletters and books, and to provide voices for educational materials, video games and films.
Sky News has contacted ElevenLabs for further comment.
The dangers of AI-generated media
The proliferation of inappropriate speech clips serves as a reminder of the dangers of releasing AI tools into the public domain without adequate safeguards. A previous example was a Microsoft chatbot that had to be switched off after users quickly taught it to say offensive things.
Earlier this month, researchers at the tech giant announced they had developed a text-to-speech AI called VALL-E that can simulate a human voice based on just three seconds of audio.
They said they would not release the tool to the public because "it could pose potential risks," including people "spoofing voice identification or impersonating a specific speaker."
The technology poses many of the same challenges as deepfake videos, which are increasingly common on the internet.
Last year, a deepfake video of Volodymyr Zelenskyy appearing to tell Ukrainians to lay down their arms was shared online.
It came after the creator of a series of realistic Tom Cruise deepfakes warned viewers of the technology's potential, even though his light-hearted clips were only intended to show the actor performing magic tricks and playing golf.