The voices on Amazon's Alexa, Google Assistant and other AI assistants are far ahead of old-school GPS devices, but they still lack the rhythms, intonation and other qualities that make speech sound, well, human. NVIDIA has unveiled new research and tools that can capture those natural speech qualities by letting you train the AI system with your own voice, the company announced at the Interspeech 2021 conference.

To improve its AI voice synthesis, NVIDIA’s text-to-speech research team developed a model called RAD-TTS, a winning entry in a competition at the NAB broadcast convention to develop the most realistic avatar. The system lets an individual train a text-to-speech model on their own voice, capturing its pacing, tonality, timbre and more.

Another RAD-TTS feature is voice conversion, which lets a user deliver one speaker's words in another person's voice. The RAD-TTS interface gives fine, frame-level control over a synthesized voice’s pitch, duration and energy.

Using this technology, NVIDIA's researchers created more conversational-sounding narration for the company's own I Am AI video series, using synthesized rather than human voices. The aim was to match the narration to the tone and style of the videos, something that hasn't been done well in many AI-narrated videos to date. The results are still a bit robotic, but better than any AI narration I've ever heard.

"With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator’s voice. Using this baseline narration, the producer could then direct the AI like a voice actor — tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video’s tone," NVIDIA wrote.

NVIDIA is open-sourcing some of this research (optimized to run efficiently on NVIDIA GPUs, of course) for anyone who wants to try it. It is distributed through the NVIDIA NeMo Python toolkit for GPU-accelerated conversational AI, available on the company's NGC hub of containers and other software.

"Several of the models are trained with tens of thousands of hours of audio data on NVIDIA DGX systems. Developers can fine tune any model for their use cases, speeding up training using mixed-precision computing on NVIDIA Tensor Core GPUs," the company wrote.
