When considering voice recognition, a distinction should first be made between Speaker Recognition and Speech Recognition. Speaker Recognition means that a digital device (such as a computer, PDA or mobile phone) recognises the person who is speaking to it: it recognises the sound of a person's voice and is able to identify it. Speech Recognition means recognising what is being said, that is, translating spoken words into written words.
To differentiate further, two subtasks are recognised within Speaker Recognition: Speaker Verification (one voice is compared to one template, that of the claimed speaker) and Speaker Identification (one voice is compared to N templates of stored voices).
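The one-to-one and one-to-N comparisons described above can be sketched as follows. This is a minimal illustration, not how any particular product works: it assumes each voice has already been reduced to a small numeric feature vector, and the cosine-similarity measure, the 0.9 threshold, the function names `verify` and `identify`, and the enrolled speakers are all hypothetical choices made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(sample, template, threshold=0.9):
    """1:1 check: compare the sample against one claimed speaker's template."""
    return cosine_similarity(sample, template) >= threshold

def identify(sample, templates, threshold=0.9):
    """1:N search: return the best-matching enrolled speaker, or None."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(sample, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical enrolled voice prints (toy 3-dimensional vectors).
enrolled = {"alice": [0.9, 0.1, 0.3], "bob": [0.2, 0.8, 0.5]}
sample = [0.88, 0.12, 0.31]

claimed_ok = verify(sample, enrolled["alice"])  # 1:1 comparison
who = identify(sample, enrolled)                # 1:N comparison
```

The 1:1 check only answers "is this the person they claim to be?", while the 1:N search has to score the sample against every stored template, which is why it becomes more costly and more error-prone as the number of enrolled voices grows.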
The term Voice Recognition is often used for the combination of Speaker Recognition and Speech Recognition. This applies when the recognition system is trained to a particular speaker, as is the case for most desktop recognition software: the system uses an element of speaker recognition, identifying the person speaking, to better recognise what is being said.
Various technologies have been developed to process and store voice prints. They include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation and decision trees. Ambient noise can impede the collection of both the initial and subsequent voice samples. Noise reduction algorithms can be employed to improve accuracy, but incorrect application can have the opposite effect.
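As an illustration of the simplest technique in the list above, frequency estimation, the sketch below estimates the dominant frequency of a signal by counting zero crossings. It is a toy example under stated assumptions: the function name `zero_crossing_frequency`, the 8000 Hz sample rate and the synthetic 200 Hz test tone are all invented for the illustration, and real voice prints rely on far richer features.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency from the number of sign changes in the signal."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if (prev < 0) != (cur < 0)
    )
    duration = len(samples) / sample_rate
    # A pure tone crosses zero twice per cycle.
    return crossings / (2.0 * duration)

sample_rate = 8000  # samples per second (assumed)
# One second of a synthetic 200 Hz tone with a small phase offset.
tone = [
    math.sin(2 * math.pi * 200 * n / sample_rate + 0.1)
    for n in range(sample_rate)
]

estimate = zero_crossing_frequency(tone, sample_rate)
```

On a clean tone the estimate lands close to the true frequency, but added noise creates spurious sign changes, which is one concrete way ambient noise degrades the collected samples.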
Nowadays, many computer programs have built-in voice recognition, which turns spoken words into written words. These often require a person to first read a certain amount of text aloud (a process called enrollment) so that the software can subsequently recognise that specific voice.
Speaker recognition is used as a way of identifying an individual. Example uses include gaining access to a secure system and identifying individuals on recordings in detective or police work.
Speech recognition has a wide range of possible uses, some of which are:
- Health care; reduce the need for medical transcriptionists and enable search options in Electronic Medical Records
- Military; reduce pilot workload by using voice commands for certain tasks; serve as a training aid for air traffic controllers
- Aiding people with disabilities; assist people with limited use of their hands (also RSI-sufferers); enable deaf telephony; assist people with learning disabilities (problems with thought-to-paper communication)
- Automatic translation
- Telecommunications and Video-games
- Internet-based services
- Technological improvement of voice-recognition software
- Increased capacity of computers
- Illiteracy; increased need for computer applications for illiterate people
- Increased attention for needs of people with disabilities
- Increased availability of computers worldwide
- The cost of developing voice recognition software
- Other biometric technologies (such as fingerprinting, iris-scans)
In the future voice recognition could make other forms of identification obsolete. It will surpass other forms of biometrics, such as fingerprinting and iris scans, as the technology is simpler.
Increased use of computers and the internet will speed up the development of voice-recognition software.
Voice-recognition could make the keyboard and mouse obsolete.
Voice-recognition can be very helpful to various services in regions where there is a high illiteracy rate, as users do not have to type to activate the services. This could be used in banking, healthcare, public administration, etc.
1870 This technology really began with Alexander Graham Bell's inventions in the 1870s.
1952 Bell Laboratories started to investigate speech recognition using zero crossings.
1959 Kyoto University, Japan, developed a “speech-recognition typewriter” utilizing the technology Bell Laboratories had developed.
1964 IBM presented an early speech recognition device, the IBM Shoebox, at the New York World's Fair.
1970s Russia and Japan simultaneously developed the DP matching method, which normalizes the time length of an utterance by using dynamic programming.
1980s Two distinct types of commercial products were available. The first offered speaker-independent recognition of small vocabularies. It was most useful for telephone transaction processing. The second, offered by Kurzweil Applied Intelligence, Dragon Systems, and IBM, focused on the development of large-vocabulary voice recognition systems so that text documents could be created by voice dictation.
1990s The U.S. Defense Advanced Research Projects Agency (DARPA) started a dictation program for speech recognition, which realized a question-and-answer voice recognition system using the n-gram method.
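The DP matching idea from the 1970s entry in the timeline above, aligning two utterances of different durations with dynamic programming, is widely known as dynamic time warping. The sketch below is a minimal textbook-style version, not a reconstruction of the original systems: it assumes each utterance is a plain list of numbers (real systems compare frames of acoustic features), and the name `dtw_distance` and the toy sequences are invented for the example.

```python
def dtw_distance(a, b):
    """Dynamic-programming alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Either sequence may be stretched, or both advance together.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]

slow = [1, 1, 2, 2, 3, 3]  # the same "word" spoken slowly
fast = [1, 2, 3]           # ... and spoken quickly
other = [3, 2, 1]          # a different "word"

same_word_cost = dtw_distance(slow, fast)
different_word_cost = dtw_distance(fast, other)
```

Because the alignment may stretch either sequence, the slow and fast versions of the same pattern match at zero cost even though their lengths differ, while the different pattern does not.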