Under guidance of
Dr. G. Pradhan
NIT PATNA (ECE dept.)
Presented by -
Kamlesh Kalvaniya -(1104080)
Niranjan Kumar –(1104087)
Piyush Kumar-(1104091)
B.TECH 4th yr (ECE dept.)
4/30/2016 N.I.T. PATNA ECE, DEPTT. 1
1. Introduction
2. Baseline speaker verification system
3. Future Plan
Speaker recognition is the computing task of validating a person's identity claim from his/her voice.
Applications:
Authentication
Forensic test
Security system
ATM Security Key
Personalized user interface
Multi speaker tracking
Surveillance
Identification v/s verification
Phases of Speaker Verification
• Enrollment Session or Training Phase
• Operating Session or Testing Phase
Training & Testing Phase
[Block diagram]
Training: Speech → Pre-processing → Feature extraction → Model building → Reference model
Testing: Speech + identity claim → Pre-processing → Feature extraction → Comparison (with the reference model) → Decision logic → Accept/reject
Preprocessing
Preprocessing is an important step in a speaker verification system. This step is also called voice activity detection (VAD).
VAD separates speech regions from non-speech regions [2-3].
It is very difficult to implement a VAD algorithm that works consistently across different types of data.
VAD algorithms can be classified into two groups:
• Feature-based approach
• Statistical-model-based approach
Each VAD method has its own merits and demerits in terms of accuracy, complexity, etc.
Owing to its simplicity, most speaker verification systems use signal energy for VAD.
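A minimal sketch of an energy-based VAD, to make the idea concrete. The threshold is taken as a fraction of the average frame energy; the default factor of 0.6 mirrors the baseline configuration listed later in the deck, while the function names, frame sizes and toy signal are assumptions made for illustration only.

```python
def frame_energies(signal, frame_len, hop):
    """Short-time energy (mean of squared samples) of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def energy_vad(signal, frame_len, hop, factor=0.6):
    """Mark a frame as speech when its energy exceeds factor * average energy."""
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = factor * sum(energies) / len(energies)
    return [e > threshold for e in energies]


# Toy signal (assumption): silence, a loud burst, then silence again.
toy = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
decisions = energy_vad(toy, frame_len=4, hop=4)
```

Only the two frames covering the burst are flagged as speech. A real system would usually smooth these frame decisions over time.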
Feature Extraction
The speech signal, along with speaker information, contains much other redundant information, such as the recording sensor, channel and environment.
Sources of speaker-specific information in the speech signal [2]:
• Unique speech production system
• Physiological aspects
• Behavioral aspects
The feature extraction module transforms speech into a set of feature vectors of reduced dimension, in order to:
• Enhance speaker-specific information
• Suppress redundant information
Selection of Features
An ideal feature should:
• Be robust against noise and distortion
• Occur frequently and naturally in speech
• Be easy to measure from the speech signal
• Be difficult to impersonate/mimic
• Not be affected by the speaker's health or long-term variations in voice
Types Of Features
Feature Extraction Techniques
A wide range of approaches may be used to represent the speech signal parametrically for speaker recognition:
• Linear Predictive Coding (LPC)
• Linear Predictive Cepstral Coefficients (LPCC)
• Mel-Frequency Cepstral Coefficients (MFCC)
• Perceptual Linear Prediction (PLP)
• Neural Predictive Coding (NPC)
Most state-of-the-art speaker verification systems use Mel-frequency cepstral coefficients (MFCC), appended with their first- and second-order derivatives, as the feature vectors:
• Easy to extract
• Provide the best performance compared with other features
• MFCCs mostly contain information about the resonance structure of the vocal tract system
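The first- and second-order derivatives mentioned above are commonly computed with a regression over neighbouring frames. The sketch below assumes a window of two frames on each side and repeats the edge frames; both choices, and the function names, are illustrative assumptions rather than the deck's stated settings. The 13 static coefficients follow the baseline configuration given later in the deck.

```python
def delta(features, n=2):
    """Regression-based time derivative of a list of feature vectors.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the first and last frames repeated at the edges.
    """
    num_frames = len(features)
    dim = len(features[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = features[min(t + k, num_frames - 1)][d]
                behind = features[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_deltas(features, n=2):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(features, n)
    d2 = delta(d1, n)
    return [c + a + b for c, a, b in zip(features, d1, d2)]


# 13 static coefficients per frame become 39 after appending both derivatives.
static = [[float(t + i) for i in range(13)] for t in range(10)]
full = append_deltas(static)
```

For this toy input, which rises by one unit per frame, the delta of an interior frame is 1 and its delta-delta is 0.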
1. Analog to digital conversion
2. Pre-emphasis
3. Framing & windowing
4. Fast Fourier Transform
5. Mel-scale warping
6. MFCC
MFCC
Step 1: Analog-to-digital conversion: the speech signal is transformed to digital form by sampling it at a given frequency.
Step 2: Pre-emphasis: the energy present in the high frequencies (important for speech) is boosted.
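Pre-emphasis is usually implemented as a first-order high-pass filter, y[n] = x[n] - a·x[n-1]. The sketch below assumes a = 0.97, a common textbook value that the slides do not state.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


constant = pre_emphasis([1.0] * 5)            # slowly varying input is attenuated
alternating = pre_emphasis([1.0, -1.0] * 3)   # fast-changing input is boosted
```

After the first sample, the constant input drops to about 0.03 in magnitude, while the alternating input grows to about 1.97.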
Step 3: Framing: the signal is divided into frames of a given size.
MFCC FRAMING
[Figures: framing of the speech signal; figure labels: 25 ms and 10 ms]
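A small framing sketch, assuming the 25 ms and 10 ms figure labels denote the frame length and the frame shift respectively. The 8 kHz sampling rate and the function name are assumptions for illustration.

```python
def frame_signal(signal, sample_rate, frame_ms=25, shift_ms=10):
    """Split a signal into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frames.append(signal[start:start + frame_len])
    return frames


# One second of audio at an assumed 8 kHz sampling rate.
frames = frame_signal([0.0] * 8000, sample_rate=8000)
```

With these settings each frame holds 200 samples and consecutive frames overlap by 120 samples, giving 98 complete frames per second of audio.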
MFCC WINDOWING
• The next step is to window each individual frame to minimize the signal discontinuities at the beginning and end of the frame.
• The idea is to minimize spectral distortion by using the window to taper the signal toward zero at the beginning and end of each frame.
• We have used a Hamming window.
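The Hamming window has the closed form w[n] = 0.54 - 0.46·cos(2πn/(N-1)). The sketch below applies it to one frame; the frame length of 201 samples is an arbitrary choice for illustration.

```python
import math


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame):
    """Multiply a frame sample-by-sample with a Hamming window of equal length."""
    window = hamming(len(frame))
    return [s * w for s, w in zip(frame, window)]


windowed = apply_window([1.0] * 201)
```

Note that the Hamming window tapers the frame edges to about 0.08 rather than exactly zero, while the centre of the frame is left unchanged.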
MEL FILTERBANK
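The slide's figure is not available, so the following is only a sketch of one common construction: triangular filters whose centres are equally spaced on the mel scale, using mel = 2595·log10(1 + f/700). The filter count, FFT size and sampling rate are assumptions, not values from the deck.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale.

    Returns num_filters rows, each with n_fft // 2 + 1 weights.
    """
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (num_filters + 1)
    edges_hz = [mel_to_hz(low + i * step) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):        # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling slope, peak at the centre bin
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


bank = mel_filterbank(num_filters=20, n_fft=256, sample_rate=8000)
```

Each row would be multiplied with the power spectrum of a frame and summed to give one filterbank energy per filter.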
DCT
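As a sketch of this final step, the log filterbank energies are passed through a discrete cosine transform and the first few coefficients are kept as the MFCCs. The unnormalised DCT-II used here, the 20 filterbank channels and the energy floor are assumptions; the 13 retained coefficients follow the baseline configuration given later in the deck.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    n_points = len(values)
    return [sum(values[n] * math.cos(math.pi * k * (n + 0.5) / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def cepstrum(filterbank_energies, num_ceps=13, floor=1e-10):
    """Log-compress the filterbank energies, then keep the first DCT terms."""
    log_energies = [math.log(max(e, floor)) for e in filterbank_energies]
    return dct_ii(log_energies)[:num_ceps]


# Flat filterbank energies carry no spectral shape: only the 0th term is non-zero.
ceps = cepstrum([math.e] * 20)
```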
Speaker Modelling
• Vector Quantization
• Gaussian Mixture Model
• Gaussian Mixture Model-UBM
• Hidden Markov Model
• Artificial Neural Networks
• Support Vector Machines
• I-Vector
• The Gaussian mixture model assumes that the feature vectors follow a mixture of Gaussian distributions, characterized by mean vectors, covariance matrices and weights.
• Data unseen in training that appear in the test data will produce a low score.
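Speaker modelling captures the statistical information present in the feature vectors; it enhances the speaker information and suppresses the redundant information.
• A Gaussian mixture density is defined as
$$p(x \mid \lambda) = \sum_{i=1}^{M} w_i\, b_i(x)$$
• A (unimodal) Gaussian function for D dimensions is defined as
$$b_i(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_i)^{T}\,\Sigma_i^{-1}\,(x-\mu_i)\right)$$
where
D = 8, 16, 32, 64
$\lambda_i = \{w_i, \mu_i, \Sigma_i\}$
$w_i$ = weight
$\mu_i$ = mean
$\Sigma_i$ = covariance
i = index over the models (M = 356)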
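A sketch of how a weighted sum of Gaussian densities can be evaluated for one feature vector. It assumes diagonal covariance matrices (a common simplification that the slides do not state) and works in the log domain for numerical stability; the two-component toy model is purely illustrative.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log of a D-dimensional Gaussian density with diagonal covariance."""
    dim = len(x)
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (dim * math.log(2 * math.pi) + log_det + mahal)


def log_gmm_density(x, weights, means, variances):
    """log p(x | lambda) = log sum_i w_i * b_i(x), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy two-component model in two dimensions (assumption).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
near = log_gmm_density([0.1, -0.1], weights, means, variances)
far = log_gmm_density([10.0, 10.0], weights, means, variances)
```

The vector far from both component means receives a much lower log-density than the nearby one, which is the "low score for unseen data" behaviour noted in the slides.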
Gaussian Mixture Model
• For a sequence of T training vectors $X = \{x_1, x_2, \ldots, x_T\}$, the GMM likelihood can be defined as
$$P(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$$
• The speaker-specific GMM is estimated with the expectation-maximization (EM) algorithm.
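To illustrate the EM estimation step, here is a minimal sketch for a one-dimensional GMM. Real systems work on multi-dimensional MFCC vectors; the scalar data, the initial parameters and the variance floor are assumptions chosen to keep the example short.

```python
import math


def gaussian(x, mean, var):
    """One-dimensional Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """log P(X | lambda) = sum_t log sum_i w_i * b_i(x_t)."""
    return sum(math.log(sum(w * gaussian(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)


def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D GMM; returns updated (weights, means, variances)."""
    num = len(weights)
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[i] * gaussian(x, means[i], variances[i]) for i in range(num)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate parameters from the soft counts.
    new_w, new_m, new_v = [], [], []
    for i in range(num):
        count = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / count
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / count
        new_w.append(count / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))
    return new_w, new_m, new_v


data = [-2.1, -1.9, -2.0, -1.8, -2.2, 1.9, 2.1, 2.0, 2.2, 1.8]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(10):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

The log-likelihood in `history` does not decrease from one iteration to the next, and the two estimated means settle near the two clusters in the toy data.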
$\lambda_{target}$: X (MFCCs of the testing data) is from the hypothesized speaker S
$\lambda_{UBM}$: X (MFCCs of the testing data) is not from the hypothesized speaker S
• The likelihood ratio test is given by
$$LR(X) = \frac{P(X \mid \lambda_{target})}{P(X \mid \lambda_{UBM})}$$
• The probability of the alternative hypothesis is
$$P(X \mid \lambda_{UBM}) = F\big(P(X \mid \lambda_1), P(X \mid \lambda_2), \ldots, P(X \mid \lambda_M)\big)$$
where F( ) is a function, such as the average or maximum, of the likelihood values $P(X \mid \lambda_i)$ of the background speaker set.
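A sketch of the decision rule in the log domain, where the likelihood ratio becomes a difference of log-likelihoods. The background likelihoods are combined with F as either the average or the maximum, as described above; the threshold of 0 and the example scores are assumptions for illustration.

```python
import math


def combine_background(log_likelihoods, mode="average"):
    """log F(P(X|l_1), ..., P(X|l_M)) for F = average or maximum, in the log domain."""
    if mode == "maximum":
        return max(log_likelihoods)
    peak = max(log_likelihoods)
    total = sum(math.exp(v - peak) for v in log_likelihoods)
    return peak + math.log(total / len(log_likelihoods))


def verify(target_loglik, background_logliks, threshold=0.0, mode="average"):
    """Accept the claim when log LR(X) = log P(X|target) - log P(X|UBM) >= threshold."""
    score = target_loglik - combine_background(background_logliks, mode)
    return score, score >= threshold


# Hypothetical utterance-level log-likelihoods.
score, accepted = verify(-100.0, [-110.0, -120.0, -115.0])
impostor_score, impostor_accepted = verify(-130.0, [-110.0, -120.0, -115.0])
```

In practice the threshold is tuned on development data to trade false acceptances against false rejections.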
• Score normalisation
$$s_{norm} = \frac{s - \mu_I}{\sigma_I}$$
where
s = original score = log(LR(X))
$\mu_I$ = estimated mean of s
$\sigma_I$ = standard deviation of s
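A sketch of mean and standard-deviation score normalisation. It assumes the statistics are estimated from a set of impostor scores, which is one common reading of the subscript I; the slides do not say how they are obtained, and the numbers below are made up.

```python
import math


def estimate_stats(impostor_scores):
    """Mean and (population) standard deviation of a list of scores."""
    mean = sum(impostor_scores) / len(impostor_scores)
    var = sum((s - mean) ** 2 for s in impostor_scores) / len(impostor_scores)
    return mean, math.sqrt(var)


def normalise(score, mean, std):
    """s_norm = (s - mu_I) / sigma_I."""
    return (score - mean) / std


mu_i, sigma_i = estimate_stats([-2.0, 0.0, 2.0])
normalised = normalise(4.0, mu_i, sigma_i)
```

After normalisation, scores from different speaker models lie on a comparable scale, so a single decision threshold can be shared across speakers.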
PERFORMANCE EVALUATION
• NIST has conducted speaker recognition benchmarking on an annual basis since 1997.
• NIST has provided speech files as development data.
NIST 2003 data:
• Testing speech data: 2559
• Train speech data: 356
• UBM female speech data: 251
• UBM male speech data: 251
For the baseline speaker verification system, the following parameters are used:
• VAD: energy-based VAD (0.6 × average energy)
• Feature vector: 13-dimensional MFCC appended with delta and delta-delta
• Modeling: GMM
• GMM size: 8, 16, 32, 64
• Comparison: log-likelihood score
[Figure: DET plot for test 15 s and train 15 s]
[Figure: DET plot for test full and train 15 s]
[Figure: DET plot for test 15 s and train full]
[Figure: DET plot for test full and train full]
Comparison of training data model with Equal Error Rate

Equal error rate (%) by Gaussian size and test/train condition:

Gaussian size   Test 15 s /     Test full /     Test 15 s /     Test full /
                Train 15 s      Train 15 s      Train full      Train full
8               34.90           34.24           33.18           27.70
16              33.05           32.28           30.50           25.67
32              32.46           32.94           28.78           23.67
64              32.82           33.06           27.42           22.05
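For reference, the equal error rate is the operating point at which the false-acceptance rate equals the false-rejection rate. The sketch below sweeps a threshold over a handful of made-up scores; it is not the evaluation tool used to produce the figures above.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Sweep the threshold over all observed scores and return the point where
    the false-rejection and false-acceptance rates are closest, as a percentage."""
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 100.0 * (far + frr) / 2.0
    return eer


# Hypothetical scores: one target trial scores below one impostor trial.
targets = [2.0, 1.5, 1.0, 0.2]
impostors = [0.5, 0.0, -0.5, -1.0]
eer = equal_error_rate(targets, impostors)
```

Here one of four target trials and one of four impostor trials fall on the wrong side of the best threshold, so the function returns 25.0.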
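Conclusion
• Performance is more sensitive to the amount of training data than to the amount of test data: with 64 Gaussians, using full instead of 15 s training speech lowers the EER from 32.82% to 27.42% (15 s test), whereas using full instead of 15 s test speech leaves it at 33.06% (15 s training).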
Future Plan
• Synthetically generating training and testing speech from limited speech data.
• Validating the results on a state-of-the-art i-vector based speaker verification system.
Thank you
Speaker Recognition

  • 1. Under guidance of Dr. G. Pradhan NIT PATNA (ECE dept.) Presented by - Kamlesh Kalvaniya -(1104080) Niranjan Kumar –(1104087) Piyush Kumar-(1104091) B.TECH 4th yr (ECE dept.) 4/30/2016 N.I.T. PATNA ECE, DEPTT. 1
  • 2. 1. Introduction 2. Baseline speaker verification system 3. Future Plan
  • 3. Speaker recognition is the computing task of validating a person's identity claim from his or her voice. Applications: authentication, forensic testing, security systems, ATM security key, personalized user interfaces, multi-speaker tracking, surveillance.
  • 4. Identification vs. verification
  • 5. Phases of speaker verification: • Enrollment session (training phase) • Operating session (testing phase)
  • 6. Training and testing phases (block diagram). Training: speech → pre-processing → feature extraction → model building → reference model. Testing: speech and identity claim → pre-processing → feature extraction → comparison with the reference model → decision logic → accept/reject.
  • 7. Preprocessing: Preprocessing is an important step in a speaker verification system; here it refers to voice activity detection (VAD). VAD separates speech regions from non-speech regions [2-3]. It is very difficult to implement a VAD algorithm that works consistently across different types of data. VAD algorithms can be classified into two groups: • feature-based approaches • statistical-model-based approaches. Each VAD method has its own merits and demerits in terms of accuracy, complexity, etc. Because of its simplicity, most speaker verification systems use signal energy for VAD.
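The energy-based VAD mentioned on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the 0.6 factor follows the baseline settings listed later in the deck, while the function name `energy_vad` and the frame length and hop (200 and 80 samples, i.e. 25 ms and 10 ms at an assumed 8 kHz sampling rate) are assumptions made for the example.

```python
def energy_vad(signal, frame_len=200, hop=80, factor=0.6):
    """Return one boolean per frame: True where the frame is kept as speech.

    A frame counts as speech when its average energy exceeds
    factor * (mean energy over all frames). frame_len and hop are
    assumed values, not taken from the slides.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    if not energies:
        return []
    threshold = factor * (sum(energies) / len(energies))
    return [e > threshold for e in energies]


# Toy usage: 800 samples of silence followed by 800 samples of constant signal.
flags = energy_vad([0.0] * 800 + [1.0] * 800, frame_len=400, hop=400)
```

A fixed fraction of the average energy is easy to compute, but it depends on how much silence the recording contains, which is one reason such a detector may not behave consistently across data types.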
  • 8. Feature extraction: Along with speaker information, the speech signal carries much other information that is redundant for this task, such as the recording sensor, channel, and environment. Speaker-specific information in the speech signal [2] arises from: • the unique speech production system • physiological aspects • behavioral aspects. The feature extraction module transforms speech into a set of feature vectors of reduced dimension in order to: • enhance speaker-specific information • suppress redundant information.
  • 9. Selection of features. A good feature should: • be robust against noise and distortion • occur frequently and naturally in speech • be easy to measure from the speech signal • be difficult to impersonate or mimic • not be affected by the speaker's health or long-term variations in voice.
  • 10. Types of features
  • 11. Feature extraction techniques: A wide range of approaches can be used to represent the speech signal parametrically for speaker recognition: • Linear Prediction Coding (LPC) • Linear Predictive Cepstral Coefficients (LPCC) • Mel-Frequency Cepstral Coefficients (MFCC) • Perceptual Linear Prediction (PLP) • Neural Predictive Coding (NPC). Most state-of-the-art speaker verification systems use MFCCs appended with their first- and second-order derivatives as feature vectors, because they are: • easy to extract • found to give the best performance compared with the other features. MFCCs mostly carry information about the resonance structure of the vocal tract system.
  • 12. MFCC extraction steps: 1. Analog-to-digital conversion 2. Pre-emphasis 3. Framing and windowing 4. Fast Fourier Transform 5. Mel-scale warping 6. MFCC
  • 13. MFCC, Step 1 (analog-to-digital conversion): The analog speech signal is transformed to digital form by sampling it at a given frequency.
  • 14. MFCC, Step 2 (pre-emphasis): The energy in the high frequencies, which is important for speech, is boosted.
  • 15. MFCC, Step 3 (framing): The signal is divided into frames of a given size.
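Pre-emphasis is commonly realised as a first-order high-pass filter, y[n] = x[n] - a·x[n-1]. The sketch below assumes that form; the coefficient a = 0.97 is a typical textbook value rather than one stated in the slides, and `pre_emphasis` is a name chosen for the example.

```python
def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value. The first sample
    is passed through unchanged because it has no predecessor.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


# A constant (low-frequency) input is attenuated,
# while an alternating (high-frequency) input is amplified.
flat = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
alternating = pre_emphasis([1.0, -1.0, 1.0], alpha=0.5)
```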
  • 16. MFCC framing (figure)
  • 17. MFCC framing (figure)
  • 18. MFCC framing (figure)
  • 19. MFCC framing (figure; labels: 25 ms, 10 ms)
  • 20. MFCC windowing: • Each frame is then windowed to minimize signal discontinuities at its beginning and end. • The window tapers the signal toward zero at the frame edges, which reduces spectral distortion. • A Hamming window is used here.
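Framing and Hamming windowing can be sketched together as below. The reading of the "25 ms" and "10 ms" figure labels as frame length and frame shift is an assumption, as is the 8 kHz sampling rate; the function names are illustrative only.

```python
import math


def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1))
            for k in range(n)]


def frame_and_window(signal, sample_rate=8000, frame_ms=25, shift_ms=10):
    """Split the signal into overlapping frames and apply a Hamming window.

    sample_rate, frame_ms and shift_ms are assumed values; trailing
    samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# One second of a constant signal at the assumed 8 kHz rate.
frames = frame_and_window([1.0] * 8000)
```

Note that the Hamming window tapers to 0.08 rather than exactly zero at the frame edges, which is still enough to soften the discontinuities.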
  • 21. MFCC (figure)
  • 22. MFCC (figure)
  • 23. Mel filterbank (figure)
  • 24. MFCC: DCT (figure)
  • 25. MFCC: DCT (figure)
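The last two MFCC steps (mel-scale filterbank, then DCT of the log filterbank energies) can be sketched as follows, starting from one frame's power spectrum. Everything specific here is an assumption for illustration: the mel formula 2595·log10(1 + f/700), the 20 triangular filters, the 256-point FFT size, the 8 kHz sampling rate, and the unnormalised DCT-II. The slides only state that 13 coefficients are kept.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters, equally spaced on the mel scale, over n_fft//2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced in mel, mapped to FFT bins.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    edges = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = [0.0] * n_bins
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc_from_power_spectrum(power, bank, n_ceps=13):
    """Log filterbank energies followed by an unnormalised DCT-II."""
    log_e = [math.log(max(sum(p * w for p, w in zip(power, row)), 1e-10))
             for row in bank]
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / n)
                for j in range(n))
            for c in range(n_ceps)]


bank = mel_filterbank(n_filters=20, n_fft=256, sample_rate=8000)
ceps = mfcc_from_power_spectrum([1.0] * 129, bank)  # flat toy spectrum
```

The power spectrum itself would come from the FFT of each windowed frame; that step is left out here to keep the example dependency-free.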
  • 26. Speaker modelling: • Vector Quantization • Gaussian Mixture Model (GMM) • GMM-UBM • Hidden Markov Model • Artificial Neural Networks • Support Vector Machines • i-vector. The speaker model captures the statistical information present in the feature vectors; it enhances the speaker information and suppresses the redundant information. A Gaussian mixture model assumes that the feature vectors follow a mixture of Gaussian distributions, characterized by mean vectors, covariance matrices, and weights. Data unseen in training that appear in the test data will produce a low score.
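  • 27. Gaussian Mixture Model: A Gaussian mixture density is defined as $p(x \mid \lambda) = \sum_{i} w_i \, b_i(x)$, with weights satisfying $\sum_{i} w_i = 1$. Each component is a unimodal Gaussian function in D dimensions, defined as $b_i(x) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu_i)^{\top} \Sigma_i^{-1} (x-\mu_i)\right)$, where $\lambda = \{w_i, \mu_i, \Sigma_i\}$, $w_i$ = weight, $\mu_i$ = mean, $\Sigma_i$ = covariance. Slide annotations: unimodal Gaussian, D = 8, 16, 32, 64; i: no. of models (M = 356).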
  • 28. For a sequence of T training vectors $X = \{x_1, x_2, \ldots, x_T\}$, assumed independent, the GMM likelihood is defined as $p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$, or in the log domain $\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$. The expectation-maximization (EM) algorithm is used to estimate the speaker-specific GMM.
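As a rough sketch of how the GMM log-likelihood and one EM re-estimation step might look, the code below assumes diagonal covariance matrices (a common simplification; the slides do not say which covariance type was used) and adds a variance floor for numerical safety. All function names are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a D-dimensional Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def component_log_probs(x, weights, means, variances):
    """log(w_i * b_i(x)) for every mixture component i."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_log_likelihood(X, weights, means, variances):
    """log p(X | lambda) = sum over frames t of log sum_i w_i b_i(x_t)."""
    return sum(logsumexp(component_log_probs(x, weights, means, variances))
               for x in X)


def em_step(X, weights, means, variances, var_floor=1e-3):
    """One EM iteration; returns updated (weights, means, variances)."""
    n_mix, dim, n_frames = len(weights), len(X[0]), len(X)
    # E-step: posterior probability of each component for each frame.
    resp = []
    for x in X:
        lp = component_log_probs(x, weights, means, variances)
        norm = logsumexp(lp)
        resp.append([math.exp(v - norm) for v in lp])
    # M-step: re-estimate weights, means and (floored) variances.
    new_w, new_m, new_v = [], [], []
    for i in range(n_mix):
        n_i = max(sum(r[i] for r in resp), 1e-12)
        mean = [sum(r[i] * x[d] for r, x in zip(resp, X)) / n_i
                for d in range(dim)]
        var = [max(sum(r[i] * (x[d] - mean[d]) ** 2
                       for r, x in zip(resp, X)) / n_i, var_floor)
               for d in range(dim)]
        new_w.append(n_i / n_frames)
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v


# Toy 1-D data with two clusters and a deliberately poor starting model.
X = [[0.0], [0.2], [5.0], [5.2]]
w0, m0, v0 = [0.5, 0.5], [[1.0], [4.0]], [[1.0], [1.0]]
ll_before = gmm_log_likelihood(X, w0, m0, v0)
w1, m1, v1 = em_step(X, w0, m0, v0)
ll_after = gmm_log_likelihood(X, w1, m1, v1)
```

In practice the step would be repeated until the log-likelihood stops improving, with the initial model usually obtained from a clustering pass.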
  • 30. Hypotheses: $\lambda_{target}$: X (the MFCCs of the testing data) is from the hypothesized speaker S; $\lambda_{UBM}$: X is not from the hypothesized speaker S. The likelihood ratio test is given by $LR(X) = \frac{P(X \mid \lambda_{target})}{P(X \mid \lambda_{UBM})}$. The probability under the alternative hypothesis is $P(X \mid \lambda_{UBM}) = F(P(X \mid \lambda_1), P(X \mid \lambda_2), \ldots, P(X \mid \lambda_M))$, where $F(\cdot)$ is a function such as the average or the maximum of the likelihood values $P(X \mid \lambda_i)$ of the background speaker set.
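The likelihood ratio test can be sketched in the log domain, which is how such scores are normally handled to avoid underflow. The function names and the decision threshold of 0.0 are hypothetical; the slides do not give a threshold.

```python
import math


def log_likelihood_ratio(ll_target, ll_background, combine="average"):
    """log LR(X) = log P(X | target) - log F(P(X | lambda_1), ..., P(X | lambda_M)).

    ll_target and ll_background are log-likelihoods. F is either the
    average or the maximum of the background likelihoods.
    """
    if combine == "max":
        ll_alt = max(ll_background)
    else:
        # log of the arithmetic mean of the likelihoods, computed stably.
        top = max(ll_background)
        mean_scaled = sum(math.exp(v - top) for v in ll_background) / len(ll_background)
        ll_alt = top + math.log(mean_scaled)
    return ll_target - ll_alt


def decide(llr, threshold=0.0):
    """Accept the identity claim when the score reaches the (hypothetical) threshold."""
    return "accept" if llr >= threshold else "reject"


score = log_likelihood_ratio(-10.0, [-12.0, -12.0])
decision = decide(score)
```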
  • 31. Score normalisation: $s_{norm} = \frac{s - \mu_I}{\sigma_I}$, where $s$ = original score = $\log(LR(X))$, $\mu_I$ = estimated mean of $s$, $\sigma_I$ = standard deviation of $s$.
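A minimal sketch of this kind of normalisation is shown below. It assumes that the mean and standard deviation are estimated from a set of impostor scores (one common reading of the subscript I), which the slide does not state explicitly; `z_norm` is an illustrative name.

```python
import statistics


def z_norm(score, impostor_scores):
    """Normalise a raw score as (s - mu_I) / sigma_I.

    mu_I and sigma_I are estimated here from impostor scores, which is
    an assumption about how the statistics were obtained.
    """
    mu = statistics.mean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)
    return (score - mu) / sigma


normalised = z_norm(3.0, [0.0, 2.0])
```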
  • 32. Performance evaluation: NIST has conducted speaker recognition benchmarking on an annual basis since 1997 and provides speech files as development data. NIST 2003 data: testing speech data: 2559; training speech data: 356; UBM female speech data: 251; UBM male speech data: 251.
  • 33. The baseline speaker verification system uses the following parameters: • VAD: energy-based VAD (threshold 0.6 × average energy) • Feature vector: 13-dimensional MFCC appended with delta and delta-delta coefficients • Modelling: GMM • GMM size: 8, 16, 32, 64 • Comparison: log-likelihood score
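Appending delta and delta-delta coefficients turns each 13-dimensional MFCC vector into a 39-dimensional feature vector. The sketch below uses the usual regression formula over a window of ±N frames; N = 2 and the edge handling (repeating the first and last frames) are assumptions, as are the function names.

```python
def delta(frames, N=2):
    """Regression-based deltas: sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond the edges are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[0])):
            num = sum(n * (frames[min(t + n, T - 1)][d]
                           - frames[max(t - n, 0)][d])
                      for n in range(1, N + 1))
            row.append(num / denom)
        out.append(row)
    return out


def add_deltas(frames, N=2):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = delta(frames, N)
    d2 = delta(d1, N)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]


# Ten toy frames of 13 coefficients that grow linearly over time.
mfcc = [[float(t)] * 13 for t in range(10)]
features = add_deltas(mfcc)
```

For the linearly growing toy input, the delta in the middle of the sequence is 1 and the delta-delta is 0, as expected for a constant slope.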
  • 39. Equal error rate (%) versus Gaussian size for different amounts of training and testing data:
      Gaussian size | Test 15 s, Train 15 s | Test full, Train 15 s | Test 15 s, Train full | Test full, Train full
      8             | 34.90                 | 34.24                 | 33.18                 | 27.70
      16            | 33.05                 | 32.28                 | 30.50                 | 25.67
      32            | 32.46                 | 32.94                 | 28.78                 | 23.67
      64            | 32.82                 | 33.06                 | 27.42                 | 22.05
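The equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. One simple way to estimate it from two lists of scores is sketched below; the threshold sweep over observed scores and the function name are illustrative choices, not the NIST scoring tool or the presenters' code.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER (as a fraction) by sweeping the decision threshold.

    At each candidate threshold, a trial is accepted when its score is
    greater than or equal to the threshold. The EER is taken where the
    false acceptance and false rejection rates are closest.
    """
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores: one impostor (2.5) outscores one target (2.0).
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
```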
  • 40. Conclusion: Performance is more sensitive to the amount of training data than to the amount of testing data.
  • 41. Future plan: • Synthetically generate training and testing speech from limited speech data. • Validate the results on a state-of-the-art i-vector-based speaker verification system.
  • 42. Thank you