OpenAI, a prominent artificial intelligence research organization, published a blog post on March 29 describing its latest development, Voice Engine. The model, first developed in late 2022, can generate natural-sounding speech that closely resembles an original speaker's voice from a text input and a single 15-second audio sample. Although the technology is impressive, OpenAI is taking a cautious approach to a broader release because of its potential for misuse.
Voice Engine has already been used in various applications, such as powering preset voices in OpenAI’s text-to-speech API and enhancing ChatGPT Voice and Read Aloud features. To better understand the real-world applications of Voice Engine, OpenAI has been working with a select group of trusted partners since late 2022.
These collaborations have produced a range of early applications. Age of Learning uses Voice Engine to create personalized educational content, HeyGen uses it for video translation, and Dimagi uses it to give interactive feedback to community health workers. The technology has also been piloted in a clinical setting: the Norman Prince Neurosciences Institute at Lifespan has used it to help restore the voices of patients with speech impairments.
However, OpenAI acknowledges the risks of generating speech that closely mimics real people's voices, particularly in an election year. To address these concerns, the company has set usage policies for its partners that prohibit impersonating other people without their consent and require explicit consent from the original speaker. It has also implemented safety measures such as watermarking, which makes it possible to trace the origin of generated audio.
As synthetic speech technology advances, OpenAI is advocating for proactive measures to ensure its responsible deployment. These include phasing out voice-based authentication as a security measure for accessing sensitive information, educating the public about the capabilities and limitations of AI, and developing techniques to track the origin of audiovisual content.
In line with its commitment to AI safety, OpenAI has chosen to preview Voice Engine rather than release it widely at this time. By sharing these early results, the company aims to start a conversation about the future of synthetic voices and the steps needed to harness their potential while mitigating the risk of misuse.