Juan Ortega
  10/20/09
   NTS490
Speaker recognition is the computing task of
validating a user’s claimed identity using
characteristics extracted from their voice.

Speaker recognition identifies who is speaking,
whereas speech recognition determines what is being said.

Voice recognition is a combination of the two:
it uses learned aspects of a speaker’s voice
to determine what is being said.
Speaker verification has co-evolved with the technologies of
speech recognition and speech synthesis (TTS) because of the
similar characteristics and challenges associated with each.

1960 – Gunnar Fant, a Swedish professor, published a
model describing the physiological components of
acoustic speech production, based on the analysis
of X-rays of individuals making specified phonic
sounds.

1970 – Dr. Joseph Perkell used motion X-rays, and
included the tongue and jaw, to expand on Fant’s
model. The original speaker recognition systems used
the average output of several analog filters to
perform matching, often aided by humans.
1976 – Texas Instruments built a prototype system
that was tested by the U.S. Air Force and The MITRE
Corporation.

Mid-1980s – The National Institute of Standards and
Technology (NIST) established the NIST Speech
Group to study and promote the use of speech
processing techniques.

Since 1996 – With funding from the NSA, the NIST
Speech Group has hosted yearly evaluations, the
NIST Speaker Recognition Workshop, to foster the
continued advancement of the speaker recognition
community.
The physiological component of voice recognition is
related to the physical shape of an individual’s vocal
tract, which consists of an airway and the soft-tissue
cavities from which vocal sounds originate.

The acoustic patterns of speech come from the physical
characteristics of the airways. Motion of the mouth and
pronunciation are the behavioral components of this
biometric.

The source sound is altered as it travels through the
vocal tract, which is configured differently based on the
position of the tongue, lips, mouth, and pharynx.
Speech samples are waveforms with time on the
horizontal axis and loudness on the vertical axis. The
speaker recognition system analyzes the frequency
content of the speech and compares characteristics
such as the quality, duration, intensity, dynamics, and
pitch of the signal.
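As a rough illustration (not from the original slides), the short sketch below computes two of these characteristics, frame intensity (RMS energy) and a crude pitch estimate via autocorrelation, directly from a waveform. The frame length, hop size, and sample rate are arbitrary assumptions.

import numpy as np

def frame_features(signal, sample_rate=16000, frame_len=400, hop=160):
    # Split a float waveform into 25 ms frames (10 ms hop at 16 kHz) and
    # compute intensity (RMS) plus a rough pitch estimate per frame.
    feats = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))                # intensity
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sample_rate // 400, sample_rate // 50    # ~50-400 Hz pitch range
        lag = lo + np.argmax(ac[lo:hi])
        feats.append((rms, sample_rate / lag))            # (intensity, rough F0 in Hz)
    return np.array(feats)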
r eh k ao g n ay z   s p iy ch

      "recognize speech"
r eh k ay n ay s b iy ch

     "wreck a nice beach"
Two major applications of speaker recognition
technologies and methodologies exist.

Speaker authentication (verification) is the task of
validating the identity a speaker claims.
Verification is a 1:1 match in which one speaker’s
voice is matched against one template (called a
“voiceprint” or “voice model”).

Speaker identification is the task of determining an
unknown speaker’s identity. Identification is a 1:N
match in which the voice is compared against N templates.
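A minimal sketch of the distinction, assuming a generic score(model, sample) function that returns a similarity or log-likelihood; the function names and the threshold are illustrative, not part of any particular system.

def verify(sample, claimed_model, score, threshold):
    # 1:1 match: accept or reject a claimed identity.
    return score(claimed_model, sample) >= threshold

def identify(sample, enrolled_models, score):
    # 1:N match: return the name of the best-scoring enrolled template.
    return max(enrolled_models, key=lambda name: score(enrolled_models[name], sample))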
Text-dependent methods require the speaker to utter
specific key words or sentences, with the same text
used for both training and recognition.

Text-independent methods are used when predetermined
key words cannot be relied on; human beings recognize
speakers irrespective of the content of the utterance.

Text-prompted methods prompt each user with a
new key sentence every time the system is used.
How can speaker recognition systems normalize the
variation of likelihood values in speaker verification?

To compensate for these variations, two types of
normalization technique have been tried:
parameter domain and likelihood domain.

Adapting the reference model, as well as the
verification threshold, for each speaker is
indispensable for maintaining high recognition
accuracy over a long period.
Parameter domain
Spectral equalization (“blind equalization”) has been
confirmed to be effective in reducing linear channel
effects and long-term spectral variation. This method is
especially effective for text-dependent speaker
recognition applications using sufficiently long
utterances.

Likelihood domain
The likelihood ratio is the ratio of the conditional
probability of the observed measurements of the utterance
given that the claimed identity is correct to the
conditional probability of the observed measurements
given that the speaker is an impostor.

The a posteriori probability method computes this
probability using a set of speakers that includes the
claimed speaker.
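In symbols, the ratio is L = p(X | claimed speaker) / p(X | impostor), usually handled as a difference of log-likelihoods. The sketch below approximates the impostor term with a cohort of other speakers, which is one common choice rather than a detail taken from the slides.

import numpy as np

def log_likelihood_ratio(logp_claimed, logp_cohort):
    # logp_claimed: log p(X | claimed speaker)
    # logp_cohort: list of log p(X | cohort speaker) values approximating the impostor term
    m = max(logp_cohort)
    # log of the average cohort likelihood (log-sum-exp for numerical stability)
    logp_impostor = m + np.log(np.mean(np.exp(np.array(logp_cohort) - m)))
    return logp_claimed - logp_impostor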
1) The quality, duration, loudness, and pitch features are
   extracted from the submitted sample.
2) The extracted features are compared to the claimed
   identity’s model and to other-speaker models. The
   other-speaker models contain the “states” of a variety
   of individuals, not including the claimed identity.
3) The input voice sample and the enrolled models are
   compared to produce a “likelihood ratio” indicating
   the likelihood that the input sample came from the
   claimed speaker.
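Putting the three steps together, a sketch under the same assumptions as the earlier ones: frame_features, a per-speaker score function, and log_likelihood_ratio are the illustrative helpers sketched above, and the acceptance threshold is arbitrary.

def verify_claim(waveform, claimed_model, other_models, score, threshold=0.0):
    feats = frame_features(waveform)                       # 1) feature extraction
    logp_claimed = score(claimed_model, feats)             # 2) claimed-identity model
    logp_others = [score(m, feats) for m in other_models]  # 2) other-speaker models
    llr = log_likelihood_ratio(logp_claimed, logp_others)  # 3) likelihood ratio
    return llr >= threshold                                # accept only if the ratio is high enough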
How can speaker models be updated to cope with the gradual
changes in people’s voices?

It is necessary to build each speaker model from a
small amount of data collected in a few sessions, and
then to update the model using speech data
collected as the system is used.

The reference template for each speaker is updated by
averaging new utterances with the present template
after time registration.

These methods have been extended and applied to
text-independent and text-prompted speaker
verification using HMMs.
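A minimal sketch of the averaging idea. The mixing weight alpha is an arbitrary assumption (the slides do not specify one), and time registration, i.e. aligning the new utterance to the template, is assumed to have been done already.

import numpy as np

def update_template(template, aligned_new_feats, alpha=0.9):
    # Blend the existing template with features from a newly collected,
    # time-aligned utterance; alpha controls how much of the old template is kept.
    return alpha * np.asarray(template) + (1.0 - alpha) * np.asarray(aligned_new_feats)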
A Hidden Markov Model (HMM) is a stochastic model that
provides a statistical representation of the sounds
produced by an individual. The HMM represents the
underlying variations and temporal changes found in the
speech states, using quality, duration, intensity-dynamics,
and pitch characteristics.

A Gaussian Mixture Model (GMM) is a state-mapping model
closely related to the HMM, often used for text-independent
recognition. It uses the speaker’s voice to create a
number of vector “states” representing the various
sound forms. These methods all compare the similarities
and differences between the input voice and the stored
voice “states” to produce a recognition decision.
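As an illustration of the GMM approach, a sketch using scikit-learn's GaussianMixture; the component count, covariance type, and feature layout (frames x dimensions) are arbitrary choices rather than values from the slides.

from sklearn.mixture import GaussianMixture

def enroll_speaker(feature_frames, n_components=8):
    # Fit a GMM of "states" to a speaker's enrollment features (frames x dims).
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(feature_frames)
    return gmm

def score_sample(gmm, feature_frames):
    # Average per-frame log-likelihood of a new sample under the speaker's model.
    return gmm.score(feature_frames)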
• Some companies use voiceprint recognition so people
can gain access to information or give authorization
without being physically present.

• Instead of stepping up to an iris scanner or hand
geometry reader, someone can give authorization by
making a phone call.

• Unfortunately, people can bypass some systems,
particularly those that work by phone, with a simple
recording of an authorized person's password. That is why
some systems use several randomly chosen voice
passwords or use general voiceprints instead of prints for
specific words.
• Except for text-prompted systems, speaker recognition
systems are susceptible to spoofing attacks that use a
recorded voice.

• Text-dependent systems are less suitable for public use.

• Background noise can be disruptive, although
equalizers may be used to mitigate this problem.

• Text-independent recognition is still under research,
although methods have been proposed that model rhythm,
speed, modulation, and intonation, which are shaped by
personality type and parental influence.

• Authentication is based on likelihood ratios and probabilities.

• Frequent re-enrollment is needed to cope with
voice changes.

• Someone who is deaf or mute cannot use this type of
biometric.
• All you need is software and a microphone.

• Many methods have been proposed:
      Text-Dependent
          DTW-Based Methods
          HMM-Based Methods
      Text-Independent
          Long-Term-Statistics-Based Methods
          VQ-Based Methods
          Ergodic-HMM-Based Methods
          Speech-Recognition-Based Methods

• Fast authentication.

• You can give someone else authorization remotely.
http://www.youtube.com/watch?v=0ec1Gtnlq1k
Speaker recognition. Retrieved October 20, 2009, from Wikipedia:
http://en.wikipedia.org/wiki/Speaker_recognition

Furui, S. (2008). Speaker recognition. Retrieved October 20, 2009,
from Scholarpedia:
http://www.scholarpedia.org/article/Speaker_recognition#DTW-Based_Methods

The Speaker Recognition Homepage. Retrieved October 20, 2009, from
http://www.speaker-recognition.org/

Speaker recognition. (2006). Retrieved October 20, 2009, from
Biometrics.gov: http://www.biometrics.gov/Documents/SpeakerRec.pdf

How speech recognition works. Retrieved October 21, 2009, from
HowStuffWorks:
http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm/printable

Wilson, T. Voiceprints (How Biometrics Works). Retrieved October 21, 2009,
from HowStuffWorks: http://science.howstuffworks.com/biometrics3.htm
