Accent Identification in speech recognition has the potential to provide value-added phone services. English is the de facto international language of choice for conducting official business. For most people around the world it is a second language, so English is spoken in a myriad of accents. However, different accents in spoken language can prove problematic.
Consider for a moment that you are calling the customer service line of a global service provider. It has an automated answering service with built-in speech recognition that prompts you to state specific information in English.
If your accent does not match the one the automated speech recognition system was built for, guess what: you may end up cursing the customer service and asking why they cannot put you through to a human operator.
Consider another scenario: you are working with a multi-site global project team and are required to hold a teleconference with team members at least once a week. The team members speak English with non-native accents that come in different flavors, such as Chinese English, Indian English and East European English, to name a few. These multiple accents could prove to be an obstacle to effective and efficient communication within the team.
What if we had access to an Accent Identification service from our phone service provider?
The automated customer service speech recognition system could identify the caller's accent, match spoken English words more accurately, and ease the frustration caused by the absence of a human operator.
The teleconferencing service could normalize multiple accents of spoken English, and convert spoken English from multiple accents to intelligible text that is visible to all team members in real time.
However, accent identification is a difficult technical challenge. Essentially, many contemporary speech recognition systems use Hidden Markov Models (HMMs) to recognize the basic sound segments in speech, called phonemes. These are probabilistic models that work reasonably well when the accent is fixed, but tend to fail when the accent changes dramatically.
Fortunately, an HMM can be trained to recognize a specific accent. An Accent Identification system could therefore run several of these HMMs in parallel, each trained on one known accent of spoken English. When the incoming speech segments, or phonemes, are processed in parallel by the HMMs, the accent whose HMM yields the best match is taken as the identified accent.
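To make the parallel-HMM idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not how any production recognizer is built: each accent is represented by a small discrete HMM, the input is assumed to be already quantized into symbol indices (0, 1, 2), and the "best match" is taken to be the highest log-likelihood from the forward algorithm. The names `log_likelihood`, `identify_accent`, `accent_a` and `accent_b`, and all the probabilities, are made up for the example.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Forward variables for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
        # Propagate one step forward and apply the emission probability.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    total = sum(alpha)
    return log_prob + math.log(total) if total > 0.0 else float("-inf")

def identify_accent(obs, models):
    """Score obs against every accent model; return (best_accent, all_scores)."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy models: (start, trans, emit). All numbers are invented.
models = {
    "accent_a": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],  # favours symbol 0
    ),
    "accent_b": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.1, 0.2, 0.7], [0.1, 0.4, 0.5]],  # favours symbol 2
    ),
}

best, scores = identify_accent([0, 0, 1, 0], models)
print(best, scores)
```

In this toy setup a sequence dominated by symbol 0 scores higher under `accent_a`, and one dominated by symbol 2 scores higher under `accent_b`. A real system would work on continuous acoustic features and models trained from recorded speech, and the loop over `models` is where the per-accent scoring could run in parallel.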
However, the challenge lies in training tens, if not hundreds, of these HMMs to cover all the known accents of spoken English. On top of that, accents must be identified and speech recognized in real time for this to become a viable business case for phone service providers.
The anticipated payoffs from a successful Accent Identification service should be a strong reason to invest in advanced research into these technologies.
What do you think?