Google Speech Synthesis Gets More Realistic
Google says it's made the most realistic computer speech simulation ever. It uses artificial intelligence to reproduce the way humans put words together.
The idea of Google's "Cloud Text-to-Speech" is to go beyond the traditional approach to speech synthesis. The traditional approach effectively boils down to recording a batch of sound files of different syllables, then patching them together to form words. That works well for some languages such as Japanese, where speech patterns are very regular, but not so well for languages such as English that have more complex pronunciation.
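To make the traditional approach concrete, here is a minimal sketch in Python. It is an illustration only, not how any real product works: the syllable bank, the sample rate, and the function names are all assumptions, and plain sine tones stand in for recorded syllables.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value for this sketch)

def fake_recording(freq_hz, duration_ms=100):
    """Stand-in for a recorded syllable: a plain sine tone."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical syllable bank; a real system would load recorded audio clips.
SYLLABLE_BANK = {
    "ko": fake_recording(220),
    "n": fake_recording(247),
    "ni": fake_recording(262),
    "chi": fake_recording(294),
    "wa": fake_recording(330),
}

def synthesize(syllables):
    """Patch the stored clips together end to end, with no blending."""
    samples = []
    for syllable in syllables:
        samples.extend(SYLLABLE_BANK[syllable])
    return samples

# "konnichiwa" split into the clips available in the bank
audio = synthesize(["ko", "n", "ni", "chi", "wa"])
```

Because each clip is dropped in unchanged, nothing adjusts how one syllable runs into the next or which syllable is stressed, which is the weakness with English.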
Full Sentences Analyzed
For example, the way that different syllables blend together isn't always consistent in English. There's also some variance in the way that some syllables are stressed more than others. That's why some speech synthesis still sounds like a machine talking.
Google says its approach involves taking recordings of human beings saying different words and then analyzing the audio wave patterns. Its system can take genuine examples of how different people say real words and combine them into a consistent voice. The company says that because it works from recordings of full speech rather than standalone syllables, the results have more realistic features, right down to the sounds of lips smacking on some words. (Source: theverge.com)
'DeepMind' Behind System
Taking this approach uses a huge amount of computing power, which is why Google's system works by remote processing on its servers rather than on individual computers or devices. It's powered by technology from "DeepMind," Google's artificial intelligence research arm, which aims to mirror the way humans learn different skills and make intelligent decisions. Using the remote servers gives enough power that 20 seconds' worth of audio can be generated in one second.
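As a rough illustration of what that speed means in practice, the arithmetic can be sketched in Python. The only figure taken from the article is the 20-to-1 ratio; the function name and the five-minute example are assumptions, and real performance would vary with load and network delay.

```python
# Ratio quoted above: 20 seconds of audio per 1 second of processing.
AUDIO_SECONDS_PER_COMPUTE_SECOND = 20

def generation_time(audio_seconds):
    """Estimated processing time, in seconds, for a clip of the given length."""
    return audio_seconds / AUDIO_SECONDS_PER_COMPUTE_SECOND

# A five-minute narration (300 seconds of audio)
five_minutes = generation_time(5 * 60)
```

Under that assumption, a five-minute clip would take about 15 seconds of server time.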
Though Google plans to license the system for websites and applications, it has a free trial page. Although the resulting voice certainly wouldn't be confused with a real person speaking, it is an improvement on some previous technologies. For example, it pronounces words in a more appropriate manner when they form part of a question rather than a mere statement. It's also good at capturing some regional variations, such as Australian speakers of English increasing their pitch throughout a sentence. (Source: google.com)
What's Your Opinion?
Does synthesized speech need to be improved? What uses can you see for it if it gets more realistic? Do you think computerized speech can ever sound completely believable?
Comments
Tech Support Scammers
Surely there will be many great things to come from this technology. That said, the one thing that came to mind when reading this article is that tech support scammers (especially from India) could use this technology to significantly improve their reach.
Let's look at an example: a dead giveaway that "something just isn't right" when dealing with these people over the phone (if and when they make a call to your home, claiming your PC has a virus) is that they are foreign, very difficult to understand, and they are asking for money to "fix" the "problem". Now, if you get rid of the "foreign" and "difficult to understand" bit, the scam seems that much more legitimate. "Microsoft Bob" from India now sounds like the legit "Microsoft Bob" from Redmond, Washington. When this technology improves even further, it will be used in real-time conversations.
I recently watched the TV show "Click" from the BBC, which demonstrated how video and speech are being analyzed and reprogrammed to generate fake video and speech. There's even a fake speech given by Barack Obama on YouTube.