‘Voice Skins’ Could Let Gamers Customise Their Voices Online

Online, we can be whoever we want (a bulked-up soldier, a freakish banana, Sonic the Hedgehog), but only until we start speaking. Over any online game’s built-in chat or Discord, our voices can immediately reveal information about our geographic region, our gender, even our ethnicity.

Now, a company that makes “voice skins” is trying to change that, too.

“You go online and have the freedom to design your avatar, choose your username, pick what communities you jump into. You can design your online persona completely separately from who you are in the real world,” said Modulate founder Mike Pappas.

“Modulate gives you the complete freedom to design your online persona from scratch instead of bringing the real world with you.”

Modulate was founded by two MIT graduates as a “What if?” exploration into the “spy movie” idea of using technology to change your voice. On its face, it’s not a new or original idea; Photoshop creator Adobe unveiled its own voice conversion program, “VoCo,” in 2016, for example.

On top of generic audio applications, like podcasts or voiceovers, Modulate is now gearing their product toward gaming. Their voice skins, as they call them, transform users’ voices live by filtering through their artificial intelligence software, which can be set to generate outputs resembling a certain gender, speaking style or celebrity (there’s a Barack Obama skin).

Pappas is hoping to integrate the technology into online games and, eventually, transform users’ voices into their favourite gaming characters’, too. Instead of sounding like a stilted text-to-speech robot, the audio outcome is more believably human, echoing a user’s excitement or sadness with faithful changes in tone or pacing.

“If Overwatch was interested in using it and we were built into their voice chat provider, maybe they’d be able to design some Overwatch-specific voice skins that would become available as a microtransaction to a player,” Pappas said.

Although Pappas explained that Modulate is working on partnering with gaming companies to offer their voice skins in-game, he said the company is also exploring a standalone app option.

Technology that transforms media and masks its original form always runs the risk of doing more harm than good. The more viral, explosive images are revealed to be Photoshopped, the more sceptical savvy internet dwellers might be about how real their content is.

Voice skins will be no different should they take off. When Kotaku asked Pappas how he’ll handle potential misuses of the technology, like catfishing, he said Modulate’s audio has a “watermark.” “It’s not something humans hear,” he said, “but there’s a detection algorithm that can be used in real time to detect the watermark.”

Unfortunately, he added, only Modulate’s detection technology can currently determine whether someone’s voice has been altered with its vocal skins. And as their AI software gets better, and the skins become more lifelike, Pappas added, “there will be fewer clues that this is synthetic except our watermark.”

Pappas says Modulate should be finding its way into games from this winter. He and his co-founder have already tried it in Dota 2 and League of Legends, they said, and it went smoothly until his co-founder began laughing. “The voice skins had never heard laughter because they were trained on data sets of speech. It came out as a stuttery ‘Ha. Ha. Ha. Ha,’” Pappas recalled. “We’ve now added laughter to the dataset to support that kind of thing.” 


  • We used to use the built-in voice modifier on the PS3 for shits and giggles.

    It’s kinda sad some folks need to do this to hide their identities though, especially when we are just talking about sitting down to play some games.

  • “If Overwatch was interested in using it and we were built into their voice chat provider, maybe they’d be able to design some Overwatch-specific voice skins that would become available as a microtransaction to a player,”

    Cool then people could buy a Doomfist voice skin and shout AND THEY SAY AND THEY SAY AND THEY SAY without spamming the voiceline

  • I’m curious to know how the watermarking process works because like with image processing tools, there are certain watermarking techniques that can easily be stripped or even cut out. Those who were dedicated to disguising themselves would certainly find ways to defeat it.

    Then again, the question before that is how would you even know it’s watermarked unless you were actively verifying someone’s voice? Are they saying that only certain chat programs can support their technology because both incoming and outgoing sounds need to be processed by their software? Maybe it’d be easier to just use speech-to-text-to-speech technology and make us all sound like the Google voice.

    • Nah. It’ll just detect audio and start shouting Fegelein, over and over again.

  • When Xbox Live first launched way back when, they had a similar feature which let you alter your voice according to several different modulations when you were playing online. It was notoriously unpopular and was subsequently removed. Not sure how this is different.

  • Ok, this is cool. But, I also see big problems with this, especially when you have people claiming that posting a meme with a black person is ‘digital blackface’ if you’re not black.

    The white guy makes their voice sound black so he can say the N-word as much as he likes. Personally, I don’t care either way, but people will find out and be outraged. It’s how the internet works.

    It could be used by scammers as well. Not to mention all the other ways it could be and will be misused.
