The secret of Google's amazing voice recognition revealed: it works like a brain
Voice recognition technology like Siri and Google’s Voice Search in Android has come a really long way. It all started materializing for users with the iPhone 4S and the Android devices of that era, and now leading minds in tech like Apple co-founder Steve Wozniak and Microsoft’s Bill Gates agree that this is the one thing with immense potential to actually change the way we interact with our devices.
And while we were impressed with Siri when it first launched on the iPhone 4S in 2011, it was the swift and almost flawless voice recognition in Google’s Voice Search that set the bar this high for voice.
But how does it work and what makes Google’s Voice Search so good?
We've heard it before, and now we get one more confirmation that the inspiration for it comes from the neural networks in our brains. The ‘neural network’ implementation started in Jelly Bean and brought a whopping 25% drop in voice recognition errors. Here is how Wired describes it:

"It really is changing the way that people behave." ...When you talk to Android's voice recognition software, the spectrogram of what you've said is chopped up and sent to eight different computers housed in Google's vast worldwide army of servers. It's then processed, using the neural network models built by Vanhoucke and his team.

Basically, using such ample cloud processing power, Google can analyze a ton of patterns - which in the case of voice are spectrograms - and use them to predict new patterns, much like the neurons in the brain reconnect to accomplish new tasks.
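If you're curious what those spectrogram "patterns" look like in practice, here is a rough, illustrative Python sketch - with made-up frame sizes and a synthetic tone standing in for speech, not Google's actual pipeline - of slicing audio into overlapping frames and taking each frame's frequency content:

```python
# A minimal sketch (not Google's pipeline) of turning raw audio into a spectrogram:
# the time-frequency pattern the article says the neural network analyzes.
# The audio signal, sample rate, and frame sizes here are purely illustrative.
import numpy as np

def spectrogram(audio, frame_len=400, hop=160):
    """Split audio into overlapping frames and take the magnitude of each frame's FFT."""
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * np.hanning(frame_len)  # window to reduce leakage
        frames.append(np.abs(np.fft.rfft(frame)))                       # magnitude spectrum of this slice
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# Fake one second of 16 kHz audio: a 440 Hz tone plus noise, standing in for speech.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(sr)

spec = spectrogram(audio)
print(spec.shape)  # each row is one time slice, each column one frequency bin
```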
There are a couple of layers in processing speech. First, Google tries to identify the consonants and the vowels - that is the foundational layer. Next, it uses those to make intelligent guesses about the words, and from there it climbs to higher-level structures.
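Purely as an illustration of that layered idea - the weights, phoneme count, and vocabulary size below are placeholders, not anything Google has disclosed - a toy two-layer pass might look like this:

```python
# A toy sketch of the layering described above: a lower layer scores consonants/vowels
# (phonemes) for each spectrogram slice, and a higher layer combines those scores into
# word guesses. Weights are random placeholders, not a trained model.
import numpy as np

rng = np.random.default_rng(0)

NUM_FREQ_BINS = 201   # matches the spectrogram sketch above
NUM_PHONEMES = 40     # rough size of an English phoneme inventory
NUM_WORDS = 1000      # small illustrative vocabulary
CONTEXT = 10          # how many consecutive slices the word layer looks at

W_phoneme = rng.normal(size=(NUM_FREQ_BINS, NUM_PHONEMES))
W_word = rng.normal(size=(CONTEXT * NUM_PHONEMES, NUM_WORDS))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def recognize(spec):
    # Layer 1: each time slice -> probabilities over phonemes.
    phoneme_probs = softmax(spec @ W_phoneme)
    # Layer 2: a window of phoneme probabilities -> probabilities over words.
    window = phoneme_probs[:CONTEXT].reshape(-1)
    word_probs = softmax(window @ W_word)
    return phoneme_probs, word_probs

# Feed in a fake spectrogram of 50 time slices.
phoneme_probs, word_probs = recognize(rng.normal(size=(50, NUM_FREQ_BINS)))
print(phoneme_probs.shape, word_probs.shape)  # (50, 40) (1000,)
```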
The same approach is actually applied to image analysis, where the first layer detects edges in an image, the next checks for edges close to each other to find corners, and the layers above build up from there.
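Here is that image analogy as a tiny, hand-rolled sketch: simple edge filters as the first "layer", and corner candidates where horizontal and vertical edges meet as the next. Real systems learn these features from data rather than hard-coding them, so treat this strictly as an illustration.

```python
# A minimal sketch of the image analogy: filter an image for horizontal and vertical
# edges, then flag pixels where both responses are strong as corner candidates.
# A crude stand-in for learned, layered features, not any specific Google model.
import numpy as np

def filter2d(img, kernel):
    """Naive sliding-window filter ('valid' cross-correlation), enough for a toy example."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Layer 1: simple edge detectors (Sobel kernels).
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# A tiny test image: a bright square on a dark background, so it has clear corners.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0

edges_x = np.abs(filter2d(img, sobel_x))
edges_y = np.abs(filter2d(img, sobel_y))

# Layer 2: where horizontal and vertical edges meet, call it a corner candidate.
corners = (edges_x > 1.0) & (edges_y > 1.0)
print(np.argwhere(corners))  # roughly the four corners of the square
```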
It’s a fascinating, revealing look at the building blocks the future will be built on, and if you’re interested, you can hit the original article at Wired below for all the details.
source: Wired
‘"It really is changing the way that people behave." ...When you talk to Android's voice recognition software, the spectrogram of what you've said is chopped up and sent to eight different computers housed in Google's vast worldwide army of servers. It's then processed, using the neural network models built by Vanhoucke and his team.’