The Future of Speech
Derek Austin, Interactive Technologies Manager for Nuance Communications, looks at speech recognition software and how it will help shape the way people interact with their world in the future.
Speech recognition provides perhaps the most natural way for humans to interact with computers. Whether it is the fun-filled antics of Star Trek or the more serious exchanges of 2001: A Space Odyssey, we sense that somewhere in the future our destiny is to talk to computers rather than type at a keyboard.
But how far will speech recognition help define this future and what can we realistically expect in the next few years?
Speech recognition technologies are now generally accurate, and they come in two broad forms – speaker-independent and speaker-dependent. Speaker-independent technologies work anywhere and at any time.
For instance, you can dial into a call centre and the computer will usually recognise what you say without having to know who you are. This is the easiest way to talk to a computer. The drawback of speaker-independent technologies, however, is that their accuracy is limited in such situations.
The computer has a much harder job understanding what you say if you could be anywhere, speaking with any accent, in any language.
Speaker-dependent technologies offer the highest accuracy because the computer knows who you are: they keep a profile of how you speak and what you say. These systems provide the most accurate transcription for business or personal correspondence, documents, scientific articles and the like.
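By way of illustration, here is a minimal sketch in Python of the difference: a hypothetical speaker-dependent recogniser keeps a small profile of the words a user dictates often and uses it to re-rank the generic engine's guesses. The class, the scoring weight and the example names are all invented for this article, not any vendor's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Per-user profile: the words this speaker dictates often."""
    user_id: str
    vocabulary: dict[str, int] = field(default_factory=dict)

    def learn(self, text: str) -> None:
        """Update the profile after a confirmed transcription."""
        for word in text.lower().split():
            self.vocabulary[word] = self.vocabulary.get(word, 0) + 1


def best_transcription(hypotheses: list[tuple[str, float]],
                       profile: SpeakerProfile | None) -> str:
    """Pick a transcription from the engine's scored guesses.

    Speaker-independent mode: no profile, so trust the engine's scores.
    Speaker-dependent mode: boost guesses containing words this user
    is known to dictate frequently.
    """
    def score(hypothesis: tuple[str, float]) -> float:
        text, confidence = hypothesis
        if profile is None:
            return confidence
        boost = sum(profile.vocabulary.get(word, 0)
                    for word in text.lower().split())
        return confidence + 0.02 * boost

    return max(hypotheses, key=score)[0]


# The engine is torn between a rare name and a common near-homophone.
hypotheses = [("refer the patient to doctor osterholm", 0.48),
              ("refer the patient to doctor oyster home", 0.52)]

print(best_transcription(hypotheses, None))    # speaker-independent guess

doctor = SpeakerProfile("dr_smith")
for _ in range(5):
    doctor.learn("copy doctor osterholm on the discharge summary")

print(best_transcription(hypotheses, doctor))  # the profile tips the balance
```

The point of the sketch is simply that knowing the speaker gives the system extra evidence to lean on, which is why dictation products ask you to train a profile.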
Over time, these two forms of technology will converge so that computers will recognise what you say no matter who you are or what accent you use. The technology should even be able to tell which language you are speaking and perhaps identify you from your voiceprint.
The vision is a computer that you can talk to, one that understands and converses with you. Developing the computer’s ability to understand individuals ever more deeply will take time. Right now, however, speech recognition and text-to-speech technologies are mature and ready for action.
The challenge, then, is to add further intelligence to the system as a whole. We are already seeing this in early forms of personal assistants on smartphones: you can check the weather, book a flight, look up aircraft departure times, and so on.
In these systems, speech recognition is only the front end to a much wider world. The system is connected to other services in the cloud and is able to answer reasonably intelligent questions from its human correspondent.
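To make that architecture concrete, the toy sketch below shows the shape of the pipeline: recognised speech becomes an intent, the intent is answered by a back-end service, and the reply is spoken back. Every component is stubbed and every name is invented for illustration; none of it is taken from a real assistant's API.

```python
def speech_to_text(audio: bytes) -> str:
    """Front end: turn the user's utterance into text (stubbed)."""
    return "what time does flight ba117 depart"


def parse_intent(text: str) -> dict:
    """Crude intent extraction: pick an action and its parameters."""
    if "flight" in text and "depart" in text:
        flight = next(word for word in text.split()
                      if word[:2].isalpha() and word[2:].isdigit())
        return {"action": "flight_status", "flight": flight.upper()}
    return {"action": "unknown"}


def call_cloud_service(intent: dict) -> str:
    """Back end: a real assistant would query an airline or weather
    service in the cloud; here the answer is canned."""
    if intent["action"] == "flight_status":
        return f"Flight {intent['flight']} is scheduled to depart on time."
    return "Sorry, I did not understand that."


def text_to_speech(reply: str) -> None:
    """Output stage: synthesise the reply (stubbed as a print)."""
    print(reply)


# The recogniser is only the front end; the intelligence sits in the
# services behind it.
text = speech_to_text(b"...")
text_to_speech(call_cloud_service(parse_intent(text)))
```

The design point is that the recogniser itself stays simple; the value comes from the breadth of services the assistant can hand the recognised request to.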
In the home, you can already control the television, set up recording schedules and find programmes using your voice. It will not be too long before the rest of the house is similarly enabled. Thanks to home automation systems, there are already control interfaces that run on smartphones and can be used to turn on lights and appliances and adjust the heating.
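One plausible way to wire a voice front end to those controls, sketched here with invented device actions (nothing below reflects a real home-automation API), is simply to route recognised phrases onto the actions the smartphone app already exposes.

```python
from __future__ import annotations

from typing import Callable


def lights_on(room: str) -> str:
    return f"Lights on in the {room}."


def set_heating(temperature: str) -> str:
    return f"Heating set to {temperature} degrees."


# Routing table: a recognised phrase maps onto an existing control action.
COMMANDS: dict[str, Callable[[str], str]] = {
    "turn on the lights in the": lights_on,
    "set the heating to": set_heating,
}


def handle(utterance: str) -> str:
    """Dispatch a recognised utterance to the matching action."""
    for pattern, action in COMMANDS.items():
        if utterance.startswith(pattern):
            return action(utterance[len(pattern):].strip())
    return "Sorry, I cannot do that yet."


print(handle("turn on the lights in the kitchen"))   # Lights on in the kitchen.
print(handle("set the heating to 21"))               # Heating set to 21 degrees.
```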
Cars are no longer just a mode of transport. Increasingly, they are becoming intelligent systems that drivers can interact with. You can already use your voice to control aspects of the car, to operate the entertainment system and, of course, to navigate.
In the future, you will probably be able to converse with your car and simply tell it where you want to go so that it delivers you to your chosen destination. There is no doubt that, as we move forward, much of your world will be controlled by voice, via speech recognition software combined with smart systems.
Speech recognition software will also help shape interaction in the business environment. When you contact a company for assistance with a product or service, you will probably still use a phone to do so.
Increasingly, though, customer service is moving to the web, and you can talk to people over the Internet instead of a traditional phone line.
Smart systems will be able to answer your questions from extensive knowledge bases without you ever having to talk to a real human being. In healthcare, such systems have even been rated as more helpful and empathetic than their human counterparts.
The ultimate destination for speech recognition software combined with smart systems, though, is an environment where each piece of technology keeps track of how it is interacting with us and can hand that interaction over as we change location and task. For example, if you are in the car searching the web for a new refrigerator, the car could be reading pricing details back to you as you pull into your driveway.
Once you turn off the car and go inside, your TV set might take up the conversation and you can continue your search from the living room.
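A rough sketch of how such a hand-over might work, assuming a shared, notionally cloud-hosted session store and invented device and field names (this is not any real product's mechanism), is that each device writes the state of the conversation to the store and the next device reads it back:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """What the user is doing right now, shared between devices."""
    task: str
    details: dict = field(default_factory=dict)


# Stand-in for a cloud session store that all of the user's devices share.
SESSIONS: dict[str, ConversationState] = {}


class Device:
    def __init__(self, name: str) -> None:
        self.name = name

    def update(self, user: str, task: str, **details: object) -> None:
        """Record progress on the user's current task."""
        state = SESSIONS.setdefault(user, ConversationState(task))
        state.task = task
        state.details.update(details)
        print(f"[{self.name}] noted {task}: {details}")

    def resume(self, user: str) -> None:
        """Pick up wherever the last device left off."""
        state = SESSIONS.get(user)
        if state is None:
            print(f"[{self.name}] nothing to resume.")
        else:
            print(f"[{self.name}] resuming your {state.task}: {state.details}")


car = Device("car")
tv = Device("living-room TV")

# In the car, the refrigerator search is dictated and a shortlist noted...
car.update("alex", "refrigerator search", shortlist=["model A", "model B"])
# ...and once the user walks inside, the TV carries on the same conversation.
tv.resume("alex")
```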
Mainstream speech recognition is defining the start of a new era in which our devices are not only linked together, but are more intelligent and better able to help us interact effectively with the world.