Amazon is trying to make Alexa more humanlike with dedicated news voice

Voice assistants are slowly but steadily finding their way into our homes in various shapes and forms, as people warm up to the idea of conversing with a piece of software when they need something. And since voice assistants rely solely on, well, their voice, that’s a pretty important aspect of their appeal.

Amazon’s Alexa is currently the most popular assistant when it comes to dedicated smart speakers, and while the way she speaks can hardly be mistaken for a human, she does a decent job of not sounding too robotic. Alexa’s creators think that’s not good enough and are working on giving her a more natural-sounding voice they call “newscaster style”, The Verge reports. Actually, Alexa is doing most of the work herself.

The team is using machine learning to add a more natural cadence to Alexa’s voice by letting her listen to hours of recordings of actual newscasters. Using Amazon’s “neural text-to-speech” (NTTS) technology, Alexa can pick up on the variations in pronunciation and apply them to her speech, making the words sound like they’re spoken in succession rather than stitched together from separate soundbites. The latter is what most voice assistants do today, and it’s called concatenative speech synthesis. You can hear the difference for yourself from the samples provided by Amazon:

Concatenative speaking style: [audio sample]

Newscaster NTTS: [audio sample]

The improved speaking style still won’t fool you into thinking it’s a human, but it does sound better than the current one, and you really get that news vibe from it. The reason why Amazon is focusing on “newscaster style” first is fairly simple: lots of people use Alexa to get the latest news while doing things around the house. And as is usual with machine learning, the software should keep getting better over time, and other styles will likely follow. Developers must be careful with how far they push this technology, however.

There’s an effect called the “uncanny valley” that occurs when a piece of technology is so similar to humans that it provokes a negative reaction. Nowadays, it’s usually observed when a company presents a realistic-looking robot with facial expressions, often deemed “creepy” by observers. Recently, Google showcased its Duplex technology, which can make calls for you. One feature Google deliberately added to make the conversations more natural was the “ah”, “oh”, “uh” and similar sounds the AI used when talking. While they had the desired effect, some people found it too much and even accused Google of trying to mislead people into thinking they were speaking to a human.

Alexa's new speaking style will be coming to Echo devices in the coming weeks. If you're not into it, make sure your family knows, as smart speakers are popular gifts this time of year.
