How Amazon brought Big B’s voice to Alexa – Times of India

Bengaluru: Bringing Amitabh Bachchan’s voice to Alexa posed two major technical challenges.
The first was that the voice had to sound exactly like Bachchan’s, because it is a voice Indians recognize very well.
As Manoj Sindhwani, VP of Alexa Speech at Amazon, says, “My mother is a huge fan of Mr. Bachchan. I worried that if there was a single flaw, I wouldn’t hear the end of it.”
The way Bachchan speaks made it even more complicated, says Sindhwani: his voice is very rich, and he speaks with a lot of emotion, which is hard to get perfect in a text-to-speech system.
The second big challenge was making ‘Amit ji’ work as a wake word. The wake word is the word you say to activate Alexa, which until now has been ‘Alexa’.
The Amazon team considered ‘Mr Bachchan’, ‘Bachchan ji’, ‘Amitabh Bachchan ji’ and ‘Amitabh ji’, among other wake words, but none sounded as exciting as ‘Amit ji’. The problem is that it is so short that many other words we use in everyday speech sound similar to it.
There may also be an elder in your house named Amit or Ajit. It would be annoying to have Alexa wake up frequently when it shouldn’t.
How well Amazon has solved these issues will become clear once people start using ‘Amit ji’ widely. Inside Amazon, the team is happy with what it has achieved.
Bachchan is only the fourth celebrity voice on Alexa, and the first outside the US. The first celebrity voice was that of American actor Samuel L. Jackson, launched in December 2019.
The work with Bachchan involved technical teams in India, Poland, the UK and the US, and the actor recorded his voice over several sessions so that artificial intelligence (AI) systems could be trained on it.
Much to everyone’s amusement, a sound engineer in Poland ended up on first-name terms with Bachchan, judging by their many interactions, says Puneesh Kumar, country leader for Alexa at Amazon India. He says Bachchan was a stickler for standards.
“There were so many occasions when we felt like, oh, that sounds great, it’s so close to your voice. And he was like, no, let’s try this one more time, I want to make it perfect.”
The core technology used to perfect Bachchan’s voice is a neural text-to-speech system. When you ask a question, the system first converts your speech to text, finds the answer, and then converts the answer text into Bachchan’s voice.
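The three steps described above can be pictured as a simple pipeline. This is a minimal sketch, not Amazon’s implementation: the function names `answer_in_voice`, `fake_asr`, `fake_find_answer` and `fake_tts` are hypothetical, and the stand-ins only pass text around. In a real system the last stage would be the neural text-to-speech model producing audio in Bachchan’s voice.

```python
from typing import Callable

# Hypothetical stage signatures; the real Alexa components are not public.
SpeechToText = Callable[[bytes], str]
FindAnswer = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def answer_in_voice(audio: bytes, asr: SpeechToText,
                    find_answer: FindAnswer, tts: TextToSpeech) -> bytes:
    """Run the three stages in order: speech -> text -> answer -> speech."""
    question = asr(audio)            # 1. convert the spoken question to text
    answer = find_answer(question)   # 2. look up an answer as text
    return tts(answer)               # 3. synthesize the answer in the chosen voice


# Toy stand-ins so the sketch runs end to end. A neural TTS model trained on
# the actor's recordings would replace fake_tts and return real audio.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already a transcript


def fake_find_answer(question: str) -> str:
    answers = {"what time is it": "It is nine o'clock."}
    return answers.get(question.lower(), "Sorry, I don't know that one.")


def fake_tts(text: str) -> bytes:
    return ("[amitabh-voice] " + text).encode("utf-8")  # a tag instead of audio


if __name__ == "__main__":
    reply = answer_in_voice(b"What time is it", fake_asr, fake_find_answer, fake_tts)
    print(reply.decode("utf-8"))
```

Keeping each stage as a separate, swappable function mirrors the point of the design: the speech recognition and answer lookup stay the same, and only the final voice model changes from one voice to another.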
“There are many ways to do text-to-speech, but the latest and greatest are based on deep neural networks, or deep learning,” says Sindhwani. Deep learning is one of the most advanced forms of machine learning, or AI.
“These training methods have been able to produce models that reproduce not only Mr. Bachchan’s voice, but also his speaking style: the way he emphasizes certain words, speeds up on some occasions and slows down on others. A lot of innovation and thought went into it,” he says.
Another complication was making sure Alexa recognizes both ‘Alexa’ and ‘Amit ji’ as wake words. Listening for two wake words would normally take a lot of memory and computation.
“So, we used what we call multi-target learning, where you have one input, and you try to predict multiple outputs. It’s super complex; it requires a lot of thought in how we model it. And on top of that, this is Covid, so I can’t collect a lot of data, yet it has to work for the unique environment in India, with all the noise that is usually around,” Sindhwani said.
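The idea of “one input, multiple outputs” can be illustrated with a toy model: a single shared hidden layer whose output feeds two separate detectors, one for ‘Alexa’ and one for ‘Amit ji’, trained together on the sum of their losses. Everything here (the class `MultiTargetDetector`, the four-number feature vectors, the layer sizes, the training data) is a hypothetical sketch in plain Python; the article does not describe Amazon’s actual model.

```python
import math
import random

# Hypothetical toy setup (not Amazon's model): each clip is summarized by a small
# feature vector, and ONE network predicts TWO targets from that single input:
#   output 0 -> "Alexa" was spoken, output 1 -> "Amit ji" was spoken.
N_IN, N_HIDDEN, N_TARGETS = 4, 6, 2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTargetDetector:
    """A shared hidden layer feeding one sigmoid output per wake word."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
        self.b1 = [0.0] * N_HIDDEN
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_TARGETS)]
        self.b2 = [0.0] * N_TARGETS

    def forward(self, x):
        # The shared representation is computed once per input...
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # ...and reused by both output heads instead of running two separate networks.
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def loss(self, data):
        """Binary cross-entropy summed over both targets, averaged over examples."""
        total = 0.0
        for x, targets in data:
            _, y = self.forward(x)
            for t, p in zip(targets, y):
                p = min(max(p, 1e-12), 1 - 1e-12)
                total -= t * math.log(p) + (1 - t) * math.log(1 - p)
        return total / len(data)

    def train_step(self, data, lr=0.5):
        """One full-batch gradient-descent step on the joint loss."""
        g_w1 = [[0.0] * N_IN for _ in range(N_HIDDEN)]
        g_b1 = [0.0] * N_HIDDEN
        g_w2 = [[0.0] * N_HIDDEN for _ in range(N_TARGETS)]
        g_b2 = [0.0] * N_TARGETS
        for x, targets in data:
            h, y = self.forward(x)
            d_out = [p - t for p, t in zip(y, targets)]  # sigmoid + BCE gradient
            for k in range(N_TARGETS):
                g_b2[k] += d_out[k]
                for j in range(N_HIDDEN):
                    g_w2[k][j] += d_out[k] * h[j]
            for j in range(N_HIDDEN):
                # Both heads push gradient back into the shared layer.
                d_h = sum(d_out[k] * self.w2[k][j] for k in range(N_TARGETS)) * (1 - h[j] ** 2)
                g_b1[j] += d_h
                for i in range(N_IN):
                    g_w1[j][i] += d_h * x[i]
        n = len(data)
        for j in range(N_HIDDEN):
            self.b1[j] -= lr * g_b1[j] / n
            for i in range(N_IN):
                self.w1[j][i] -= lr * g_w1[j][i] / n
        for k in range(N_TARGETS):
            self.b2[k] -= lr * g_b2[k] / n
            for j in range(N_HIDDEN):
                self.w2[k][j] -= lr * g_w2[k][j] / n


# Made-up training examples: (features, [is_alexa, is_amit_ji]).
DATA = [
    ([1.0, 0.0, 0.0, 0.0], [1, 0]),  # "Alexa"
    ([0.0, 1.0, 0.0, 0.0], [0, 1]),  # "Amit ji"
    ([0.0, 0.0, 1.0, 0.0], [0, 0]),  # background noise
    ([0.0, 0.0, 0.0, 1.0], [0, 0]),  # some other word, e.g. a name like "Ajit"
]

if __name__ == "__main__":
    model = MultiTargetDetector(seed=0)
    print("loss before:", round(model.loss(DATA), 3))
    for _ in range(2000):
        model.train_step(DATA)
    print("loss after:", round(model.loss(DATA), 3))
```

Because both outputs share the hidden layer, each input is processed once rather than by two separate networks, which is one way such an approach can save memory and computation when a device listens for two wake words.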
To overcome the data constraints, Amazon used transfer learning, in which a skill learned in one domain is transferred to another. “We transferred learning from something that works for moderate-vocabulary recognition to very specific recognition, which is surprising,” says Sindhwani.
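Transfer learning can be sketched in the same toy spirit: a small logistic-regression model is first trained on a larger “source” dataset, and its learned weights are then used as the starting point for training on just two “target” examples. The features, datasets and function names are made up for illustration, and real wake-word systems use far larger neural networks; this only shows the mechanics of reusing learned weights when target data is scarce.

```python
import math

# Hypothetical toy data (not Amazon's): three made-up features per clip.
# Source domain: more labelled examples from a broader recognition task.
SOURCE_POS = [[1.0, 0.8, -0.9], [0.9, 1.0, -0.7], [0.7, 0.9, -1.0]]
SOURCE = [(x, 1) for x in SOURCE_POS] + [([-v for v in x], 0) for x in SOURCE_POS]
# Target domain: only two labelled clips for the specific wake word.
TARGET = [([0.9, 0.6, -0.4], 1), ([-0.5, -0.3, 0.6], 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, data):
    """Mean binary cross-entropy of a logistic-regression model."""
    total = 0.0
    for x, t in data:
        p = min(max(predict(w, b, x), 1e-12), 1 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(data)


def train(data, w, b, lr=0.5, steps=200):
    """Full-batch gradient descent starting from the given weights (w, b)."""
    w = list(w)  # copy, so pretrained weights passed in are never modified
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in data:
            d = predict(w, b, x) - t
            gb += d
            gw = [g + d * xi for g, xi in zip(gw, x)]
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


if __name__ == "__main__":
    # 1. Learn on the source domain from scratch.
    w_src, b_src = train(SOURCE, [0.0, 0.0, 0.0], 0.0, steps=500)
    # 2. Transfer: start target training from the source weights.
    w_tgt, b_tgt = train(TARGET, w_src, b_src, steps=20)
    print("target loss, untrained:       ", round(loss([0.0] * 3, 0.0, TARGET), 3))
    print("target loss, pretrained start:", round(loss(w_src, b_src, TARGET), 3))
    print("target loss, after fine-tune: ", round(loss(w_tgt, b_tgt, TARGET), 3))
```

The design choice being illustrated is the starting point: instead of beginning from zero weights, target training begins from weights that already encode something useful from the related source task, so only a little target data is needed to adjust them.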
