Introduction
• Speech recognition is the process of converting an acoustic signal, captured
by a microphone or a telephone, to a set of words.
• The recognized words can be an end in themselves, as in applications such
as command and control, data entry, and document preparation.
• They can also serve as the input to further linguistic processing in order to
achieve speech understanding.
• It is also known as automatic speech recognition (ASR), computer speech
recognition, or speech-to-text (STT).
History
• Around since the 1960s, ASR has seen steady, incremental improvement
over the years.
• It has benefited greatly from increased processing speed of computers in
the last decade, entering the marketplace in the mid-2000s.
• Early systems were acoustic phonetics-based and worked with small
vocabularies to identify isolated words.
• Over the years, vocabularies have grown while ASR systems have become
statistics-based.
• They now have large vocabularies and can recognize continuous speech.
Basic Structure
Digital Sampling
• When you speak, you create vibrations in the air. The analog-to-digital
converter (ADC) translates this analog wave into digital data that the
computer can understand.
• To do this, it samples, or digitizes, the sound by taking precise
measurements of the wave at frequent intervals.
• The system filters the digitized sound to remove unwanted noise, and
sometimes to separate it into different bands of frequency.
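The sampling step can be sketched in a few lines. This is a minimal illustration, not how any particular ADC or recognizer works: the 440 Hz tone, the 16 kHz sample rate, the 16-bit depth and the function name sample_and_quantize are all assumptions chosen for the example.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a sine wave at regular intervals and round each value to an integer."""
    max_level = 2 ** (bits - 1) - 1            # largest signed value, e.g. 32767 for 16 bits
    n_samples = int(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th measurement, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # stand-in for the analog wave, in [-1, 1]
        samples.append(round(amplitude * max_level))      # quantized digital value
    return samples

# Assumed settings: a 440 Hz tone, 16,000 measurements per second, 10 ms of sound.
samples = sample_and_quantize(freq_hz=440, sample_rate_hz=16000, duration_s=0.01, bits=16)
print(len(samples), samples[:5])
```

A higher sample rate takes measurements more often, and a larger bit depth makes each measurement more precise.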
Acoustic model
• Next the signal is divided into small segments as short as a few
hundredths of a second, or even thousandths in the case of plosive
consonant sounds -- consonant stops produced by obstructing airflow
in the vocal tract -- like "p" or "t."
• The program then matches these segments to known phonemes in
the appropriate language.
• A phoneme is the smallest unit of sound that distinguishes one word
from another in a language -- the sounds we make and put together to
form meaningful expressions.
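Dividing the signal into short segments might look like the sketch below. The 25 ms frame length and 10 ms step are assumed values for illustration (the slides only say "a few hundredths of a second"), and split_into_frames is a hypothetical helper name.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=25, hop_ms=10):
    """Cut a list of samples into short, overlapping segments (frames)."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate_hz * hop_ms / 1000)       # samples between frame starts
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of placeholder samples at an assumed 16 kHz sample rate.
signal = list(range(16000))
frames = split_into_frames(signal, sample_rate_hz=16000)
print(len(frames), len(frames[0]))
```

Each frame would then be compared against the phonemes the system knows; that matching step is not shown here.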
Language model
• The program examines phonemes in the context of the other
phonemes around them.
• It runs the contextual phoneme plot through a complex statistical
model and compares it to a large library of known words, phrases
and sentences.
• The program then determines what the user was probably saying and
either outputs it as text or issues a computer command.
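A toy version of this idea is a bigram model, which scores a word sequence by how likely each word is to follow the previous one. The sketch below is an illustration only: the probabilities in BIGRAM_PROB are made up, and the names UNSEEN_PROB, sentence_log_prob and pick_most_likely are hypothetical.

```python
import math

# Hypothetical probabilities P(next word | previous word), invented for this example.
BIGRAM_PROB = {
    ("over", "there"): 0.20,
    ("over", "their"): 0.01,
    ("their", "house"): 0.05,
    ("there", "house"): 0.0001,
}
UNSEEN_PROB = 1e-6   # small fallback for word pairs missing from the table

def sentence_log_prob(words):
    """Sum the log-probabilities of each adjacent word pair."""
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        total += math.log(BIGRAM_PROB.get((prev, nxt), UNSEEN_PROB))
    return total

def pick_most_likely(candidates):
    """Return the candidate word sequence with the highest score."""
    return max(candidates, key=sentence_log_prob)

# Two candidates that sound the same but differ in spelling.
best = pick_most_likely([["over", "there"], ["over", "their"]])
print(" ".join(best))
```

Context decides between words that sound identical: the same table prefers "their house" over "there house".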
Statistical Modeling Systems
• These systems use probability and mathematical functions to
determine the most likely outcome.
• The two models that dominate the field today are the Hidden Markov
Model and Neural Networks.
• These methods involve complex mathematical functions, but
essentially, they take the information known to the system to figure
out the information hidden from it.
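One common way to write "the most likely outcome" is with Bayes' rule. The symbols are assumptions introduced here, not taken from the slides: X stands for the observed acoustic signal (the known information) and W for a candidate word sequence (the hidden information).

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

The last step drops P(X) because it is the same for every candidate W. In this reading, P(X | W) plays the role of the acoustic model and P(W) the role of the language model.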
Hidden Markov Model (HMM)
• In this model, each phoneme is like a link in a chain, and the
completed chain is a word.
• The chain branches off in different directions as the program
attempts to match the digital sound with the phoneme that's most
likely to come next.
• During this process, the program assigns a probability score to each
phoneme, based on its built-in dictionary and user training.
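Finding the most probable chain of phonemes is commonly done with the Viterbi algorithm. The sketch below is a minimal version under stated assumptions: the three phoneme states, the acoustic labels "x", "y", "z" and every probability are invented for illustration and do not come from a real dictionary or trained system.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Walk backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical model for the word "cat" (phonemes k, ae, t).
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"x": 0.7, "y": 0.2, "z": 0.1},
    "ae": {"x": 0.1, "y": 0.8, "z": 0.1},
    "t": {"x": 0.2, "y": 0.1, "z": 0.7},
}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The branching described above corresponds to the inner loop: at each step every phoneme is tried as the next link, and only the highest-scoring way of reaching it is kept.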
Markov Model
Neural Networks
A statistical model may be called "neural" if it

• consists of sets of adaptive weights, i.e. numerical parameters that
are tuned by a learning algorithm, and
• is capable of approximating non-linear functions of its inputs.
The adaptive weights are conceptually connection strengths between
neurons, which are activated during training and prediction.
Each circular node represents an artificial neuron and an arrow represents a
connection from the output of one neuron to the input of another.
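The two properties, adaptive weights and a non-linear function of the inputs, can be seen in a very small network. This is a sketch with made-up numbers: the weights are fixed by hand rather than tuned by a learning algorithm, and the names sigmoid and layer_forward are chosen for this example.

```python
import math

def sigmoid(x):
    """Non-linear activation that squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of its inputs, then applies sigmoid."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Hypothetical weights (connection strengths); training would normally adjust these.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]   # two hidden neurons, two inputs each
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]               # one output neuron, two inputs
output_biases = [0.0]

features = [1.0, 2.0]                        # placeholder input values
hidden = layer_forward(features, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)
print(output)
```

Each inner list of weights corresponds to the arrows arriving at one neuron.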
Program Training
• The process is more complicated for phrases and sentences -- the system
has to figure out where each word stops and starts.
• Statistical systems need large amounts of representative training data to
reach their optimal performance, sometimes on the order of thousands of
hours of human-transcribed speech and hundreds of megabytes of text.
• The training data are used to create acoustic models of words, word lists
and multi-word probability networks.
• The details can make the difference between a well-performing system and
a poorly-performing system -- even when using the same basic algorithm.
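As a rough illustration of how text training data could become multi-word probabilities, the sketch below counts adjacent word pairs in a tiny corpus. The three sentences and the function name train_bigram_model are assumptions for this example; real systems use far more data and more careful estimation.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(next word | previous word) from raw counts of adjacent word pairs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()}

# Tiny made-up corpus standing in for transcribed training text.
corpus = ["call my office", "call my home", "call the office"]
model = train_bigram_model(corpus)
print(model[("call", "my")])
```

With only three sentences the estimates are crude, which is one reason such systems need so much training text.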
Applications
• Transcription: dictation, information retrieval
• Command and control: data entry, device control, navigation, call routing
• Information access: airline schedules, stock quotes, directory assistance
• Problem solving: travel planning, logistics
Weaknesses and Flaws
• Low signal-to-noise ratio: The program needs to "hear" the words
spoken distinctly, and any extra noise introduced into the sound will
interfere with this.
• Overlapping speech: Current systems have difficulty separating
simultaneous speech from multiple users.
• Intensive use of computer power.
• Homophones, e.g. "there" and "their," "air" and "heir," "be" and "bee."
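Signal-to-noise ratio is usually expressed in decibels as ten times the base-10 logarithm of signal power over noise power. The sketch below computes it for two short lists of samples; the sample values and the function name snr_db are made up for illustration.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from the average power of each list of samples."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)

# Hypothetical samples: the speech is ten times larger in amplitude than the noise.
speech = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(speech, background), 1))
```

A lower value means the noise is closer in strength to the speech, which makes the words harder to pick out.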
Major Challenges
• Making a system that can flawlessly handle roadblocks like
slang, dialects, accents and background noise.
• The different grammatical structures used by languages can also pose
a problem. For example, Arabic sometimes uses single words to
convey ideas that are entire sentences in English.
The Future of Speech Recognition
• The Defense Advanced Research Projects Agency (DARPA) has three teams
of researchers working on Global Autonomous Language Exploitation
(GALE), a program that will take in streams of information from foreign
news broadcasts and newspapers and translate them.
• It hopes to create software that can instantly translate two languages with
at least 90 percent accuracy.
• DARPA is also funding an R&D effort called TRANSTAC to enable
soldiers to communicate more effectively with civilian populations in
non-English-speaking countries.
Conclusion
At some point in the future, speech recognition may become speech
understanding.
The statistical models that allow computers to decide what a person
just said may someday allow them to grasp the meaning behind the
words.
Although it is a huge leap in terms of computational power and
software sophistication, some researchers argue that speech
recognition development offers the most direct line from the
computers of today to true artificial intelligence.
References
• http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speechrecognition.htm
• http://project.uet.itgo.com/speech.htm
• http://www.hitl.washington.edu/scivw/EVE/I.D.2.d.VoiceRecognition.html
• http://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.plumvoice.com/resources/blog/speech-recognition/
• http://en.wikipedia.org/wiki/Hidden_Markov_model
• http://en.wikipedia.org/wiki/Automatic_translation
Speech Recognition Technology

