Baidu's Deep Voice AI Can Quickly Synthesize Realistic Human Speech
Chinese company Baidu has revealed that it is developing a text-to-speech system called Deep Voice, which it says is faster and more efficient than Google's WaveNet. The company says that Deep Voice can be trained to speak in only a few hours and with minimal human interaction.
Baidu also claims that the system can synthesize speech that sounds natural and realistic. To do this, the company uses deep learning techniques to convert text into phonemes, the smallest linguistic units of speech. The software then converts these phonemes into sounds.
For example, the system converts the word “hello” into (silence, HH), (HH, EH), (EH, L), (L, OW), (OW, silence) before it is spoken. Both steps rely on deep learning and require no human involvement. However, the system does not control which phonemes or syllables are emphasized or how long they are pronounced.
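The pairing of neighboring phonemes in the “hello” example can be sketched in a few lines of Python. This is a minimal illustration, not Baidu's implementation: the toy lexicon, the function names, and the "silence" marker are assumptions made for the example, and a real system would predict phonemes with a trained model rather than look them up in a table.

```python
from typing import Dict, List, Tuple

# Assumed marker for the silence before and after a word.
SILENCE = "silence"

# Hypothetical one-word lexicon, used only for illustration.
# A real system would predict phonemes with a trained model.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "EH", "L", "OW"],
}


def phoneme_pairs(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Pad the phoneme list with silence and pair each item with its successor."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return list(zip(padded, padded[1:]))


def word_to_pairs(word: str) -> List[Tuple[str, str]]:
    """Look up a word in the toy lexicon and return its adjacent phoneme pairs."""
    return phoneme_pairs(TOY_LEXICON[word.lower()])


if __name__ == "__main__":
    for pair in word_to_pairs("hello"):
        print(pair)
```

Running the script prints the five pairs listed above, from (silence, HH) through (OW, silence).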
“Our team in Baidu’s Silicon Valley AI Lab (SVAIL) has been working on a speech recognition system named Deep Speech, with an ambitious goal: advancing the state of the art in speech recognition with a model that translates audio input directly to transcribed output. Most speech systems use many steps to make the translation, using hand-engineered representations in between. Deep Speech instead uses an artificial neural network that learns what is relevant in the audio and how to transcribe it directly,” the Baidu team wrote.
This is where Baidu's specialists step in: they “switch” words to change the emotion being expressed. Of course, this requires serious computing power. To mimic human interaction, the computer must be able to generate the words to be spoken within 20 microseconds, Baidu's researchers explain.
However, the scientists believe that real-time speech synthesis is possible. They have already created samples and are collecting feedback through Amazon Mechanical Turk. They want more people to evaluate the quality of the service, and the results so far indicate that it is of excellent quality.
John Green is a writer at GizBrain.