TEXT-INDEPENDENT SPEAKER RECOGNITION SYSTEM
Project Members
ASHOK SHARMA PAUDEL (066/BEX/405)
DEEPESH LEKHAK (066/BEX/414)
KESHAV BASHYAL (066/BEX/418)
SUSHMA SHRESTHA (066/BEX/444)
OVERVIEW OF PRESENTATION
1. Introduction
2. Objective
3. System Architecture
4. Methodology
5. Results and Analysis
6. Application Areas
7. Limitations
8. Problems Faced
9. Conclusion
1. INTRODUCTION
• Speech is a universal method of communication.
• A speech signal carries information at two levels:
1. High-level characteristics: syntax, dialect, style and the overall meaning of a spoken message.
2. Low-level characteristics: pitch and phonemic spectra, which are associated much more with the physiology of the vocal tract.
1. INTRODUCTION(2)
1. INTRODUCTION (3)
• Speech is a diverse field with many applications. From the same speech signal:
– Speech recognition gives the words: “How are you?”
– Language recognition gives the language name: English
– Speaker recognition gives the speaker name: “Deepesh”
1. INTRODUCTION (4)
What is Speaker Recognition?
• Recognition of who is speaking, based on the characteristics of their speech signal.
• It can be text-independent or text-dependent.
• Speaker identification: determines which registered speaker has spoken.
• Speaker verification: accepts or rejects the claimed identity of a speaker.
1. INTRODUCTION (5)
Biometric: a human-generated signal or attribute for authenticating a person’s identity.
Why voice?
– It is a natural signal to produce.
– It is the only biometric that allows users to authenticate remotely.
– It does not require a specialized input device, so implementation cost is low.
– It is ubiquitous: telephones and microphone-equipped PCs are widely available.
Strongest security
• Combine voice biometrics with other forms of security:
– Something you have: badge
– Something you know: password
– Something you are: voice
1. INTRODUCTION (6)
Why text-independent speaker recognition?
- Independent of the text: easy to access, and cannot be forgotten or misplaced.
- Independent of language, and acceptable to users.
2. OBJECTIVE
The main goal of the project is to design and implement a text-independent speaker recognition system on an FPGA.
The specific goals can be summarized as:
1. To learn about digital signal processing and FPGAs.
2. To implement and analyze the system in MATLAB.
3. To design and implement the system on an FPGA.
3. SYSTEM ARCHITECTURE
Input audio
-> Conditioning
-> Analog-to-Digital Conversion
-> Double Data Rate (DDR) SDRAM Storage
-> Pre-emphasis
-> Framing and Windowing
-> Fast Fourier Transform (FFT)
-> Mel-Spectrum
-> Mel-Frequency Cepstral Coefficients (MFCC)
-> Universal Asynchronous Receiver Transmitter (UART)
4. METHODOLOGY
Input signal (training data or testing data)
-> Feature extraction
-> Feature matching
-> Threshold
-> Output
4.1. System Implementation on MATLAB
4.1.1. Voice Capturing and Storage
- Input is captured through a microphone and saved in .wav format.
- Sound is stored as 22050 Hz, 16-bit PCM, mono channel.
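Below is a minimal Python sketch of loading a recording in the capture format above (22050 Hz, 16-bit PCM, mono) before feature extraction. The slides do not say how MATLAB read the files. `load_pcm16_mono` is a hypothetical helper built on the standard `wave` module, and scaling the samples to [-1, 1) is an assumed convention.

```python
import struct
import wave


def load_pcm16_mono(source, expected_rate: int = 22050) -> list[float]:
    """Read a 16-bit PCM mono .wav file (path or binary file object).

    Returns the samples scaled to the range [-1, 1).
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono recording")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wf.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
        raw = wf.readframes(wf.getnframes())
    # .wav stores PCM samples as little-endian signed 16-bit integers ('<h')
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    return [s / 32768.0 for s in samples]
```

Checking the format up front makes a mismatched recording (for example stereo or 44.1 kHz) fail loudly. Otherwise it would silently change the frame duration used later.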
4.1.2. Pre-Processing
1) Silence removal
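The slides do not say how silence was removed. One common, simple approach, assumed here rather than taken from the project, drops frames whose short-time energy falls below a fraction of the loudest frame's energy. `remove_silence`, the 512-sample frame size and `rel_threshold = 0.01` are all hypothetical choices.

```python
import numpy as np


def remove_silence(x: np.ndarray, frame_len: int = 512,
                   rel_threshold: float = 0.01) -> np.ndarray:
    """Drop non-overlapping frames whose energy is below rel_threshold * (max frame energy).

    Energy thresholding is an assumed method; trailing samples that do not
    fill a whole frame are discarded.
    """
    x = np.asarray(x, dtype=np.float64)
    n_frames = len(x) // frame_len
    if n_frames == 0:
        return x.copy()
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)          # short-time energy per frame
    keep = energy >= rel_threshold * energy.max()
    return frames[keep].reshape(-1)
```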
4.1.2. Pre-Processing(2)
1) Silence removal 2) Pre-emphasis
$$s'[n] = s[n] - a\,s[n-1]$$ [1]
where a is the pre-emphasis coefficient.
[1] Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine.
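The pre-emphasis filter is a first-order difference that boosts high frequencies. The sketch below uses Python with NumPy. The slides give no value for a, so the default `a = 0.97` is an assumption (values close to 1 are typical). Keeping the first sample unchanged is a boundary convention chosen here.

```python
import numpy as np


def pre_emphasis(s, a: float = 0.97) -> np.ndarray:
    """Apply s'[n] = s[n] - a * s[n-1]; s'[0] = s[0] because there is no previous sample."""
    s = np.asarray(s, dtype=np.float64)
    out = np.empty_like(s)
    if s.size == 0:
        return out
    out[0] = s[0]
    out[1:] = s[1:] - a * s[:-1]
    return out
```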
4.1.2. Pre-Processing(3)
1) Silence removal 2) Pre-emphasis 3) Framing
• Overlapping frames: each frame block is 23.22 ms long, i.e., 512 samples per frame at 22050 Hz, with 50% overlap (a hop of 256 samples).
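The frame arithmetic can be checked directly. 512 samples at 22050 Hz last 512 / 22050 ≈ 23.22 ms, and 50% overlap means consecutive frames start 256 samples apart. The sketch below builds the overlapping frames. `frame_signal` is a hypothetical helper, and dropping the incomplete last frame is an assumption.

```python
import numpy as np

FS = 22050              # sampling rate of the MATLAB implementation (Hz)
FRAME_LEN = 512         # samples per frame
HOP = FRAME_LEN // 2    # 50 % overlap -> hop of 256 samples


def frame_signal(x, frame_len: int = FRAME_LEN, hop: int = HOP) -> np.ndarray:
    """Split x into overlapping frames of shape (n_frames, frame_len); the partial tail is dropped."""
    x = np.asarray(x)
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row r holds sample indices r*hop .. r*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


frame_ms = 1000 * FRAME_LEN / FS   # duration of one frame, about 23.22 ms
```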
4.1.2. Pre-Processing(4)
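1) Silence removal 2) Pre-emphasis 3) Framing 4) Windowing
$$x[n] = s'[n]\,w[n-m], \qquad n = m, m+1, \ldots, m+N-1$$ [2]
where the window w[n] is defined for n = 0, 1, …, N−1, N is the frame length and m is the index of the first sample of the frame.
[2] Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine.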
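The slides do not name the window function w[n]. The sketch below assumes a Hamming window, a common choice in MFCC front ends, defined as w[n] = 0.54 − 0.46 cos(2πn/(N−1)) for n = 0, …, N−1. It applies the window to each frame, with one frame per row.

```python
import numpy as np


def hamming(N: int) -> np.ndarray:
    """Hamming window w[n] = 0.54 - 0.46 cos(2*pi*n/(N-1)), n = 0..N-1 (assumed window type)."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))


def window_frames(frames: np.ndarray) -> np.ndarray:
    """Multiply every frame (one per row) by the window: x[n] = s'[n] * w[n - m]."""
    return frames * hamming(frames.shape[1])[None, :]
```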
4.1.3. Feature Extraction using MFCC
MFCC: Mel-Frequency Cepstral Coefficients
• Perceptual approach: filters based on the human perception of speech are applied to the sample frames to extract the features of speech.
Steps for calculating MFCC
1. Discrete Fourier Transform using the FFT, and the power spectrum $|X[k]|^2$ of the signal.
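Step 1 can be sketched with NumPy's real-input FFT, which returns the N/2 + 1 non-redundant bins of a real frame. The function name is hypothetical. Applying no 1/N normalization is an assumed convention.

```python
import numpy as np


def power_spectrum(frame, n_fft: int = 512) -> np.ndarray:
    """Return |X[k]|^2 for k = 0..n_fft/2 (the non-redundant half for a real frame)."""
    X = np.fft.rfft(frame, n=n_fft)
    return np.abs(X) ** 2
```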
4.1.3. Feature Extraction using MFCC(2)
2. Mel scaling
• Mel scale: approximately linear up to 1 kHz and logarithmic above 1 kHz.
• The powers of the spectrum are mapped onto the Mel scale using a Mel filter bank, giving the Mel spectral coefficients G[k].
• Filter bank: overlapping windows
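The slides describe the Mel scale only qualitatively. A widely used formula with that shape is mel(f) = 2595 log10(1 + f/700), which is close to linear below 1 kHz and logarithmic above it. The sketch below assumes this formula, triangular filters that overlap by half, 24 filters and the bin-placement rule shown; the slides specify none of these. G[k] is then obtained as `H @ P` for a power spectrum P.

```python
import numpy as np


def hz_to_mel(f):
    """Common mel-scale formula (an assumption; the slides give no formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filter_bank(n_filters: int = 24, n_fft: int = 512, fs: int = 22050) -> np.ndarray:
    """Triangular, half-overlapping filters spaced uniformly on the mel scale.

    Returns an (n_filters, n_fft // 2 + 1) matrix H, so that G = H @ P,
    where P is the power spectrum |X[k]|^2 of one frame.
    """
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points, equally spaced in mel between 0 Hz and fs/2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    H = np.zeros((n_filters, n_bins))
    for j in range(1, n_filters + 1):
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        for k in range(left, center):        # rising edge of the triangle
            H[j - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge, peak of 1 at the center bin
            H[j - 1, k] = (right - k) / (right - center)
    return H
```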
4.1.3. Feature Extraction using MFCC(3)
3. The logarithm of the Mel spectral coefficients, log(G[k]), is taken.
4. The Discrete Cosine Transform (DCT) of log(G[k]) gives the Mel-cepstrum c[q].
(Source: Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine)
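Steps 3 and 4 can be sketched as follows. The equation used on the slide cannot be recovered from the text, so this sketch assumes the common unnormalized DCT-II:
$$c[q] = \sum_{k=0}^{K-1} \log(G[k]) \cos\!\left(\frac{\pi q (k + \tfrac{1}{2})}{K}\right), \qquad q = 0, \ldots, Q-1$$
Here K is the number of Mel filters and Q is the number of cepstral coefficients kept. The small `eps` added before the logarithm is also an assumption.

```python
import numpy as np


def mel_cepstrum(G, n_ceps: int = 20, eps: float = 1e-10) -> np.ndarray:
    """Steps 3-4: c[q] = sum_k log(G[k]) * cos(pi * q * (k + 0.5) / K), q = 0..n_ceps-1.

    Works on one frame (shape (K,)) or many frames (shape (T, K));
    eps guards against log(0) in silent bands.
    """
    G = np.asarray(G, dtype=np.float64)
    K = G.shape[-1]
    log_G = np.log(G + eps)
    q = np.arange(n_ceps)[:, None]
    k = np.arange(K)[None, :]
    basis = np.cos(np.pi * q * (k + 0.5) / K)   # (n_ceps, K) DCT-II basis vectors
    return log_G @ basis.T
```

For a flat Mel spectrum, all the energy lands in c[0]. The higher coefficients therefore describe only the shape of the spectral envelope.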
4.1.3. Feature Extraction using MFCC(4)
continuous speech -> Frame Blocking -> frame -> Windowing -> FFT -> spectrum -> Mel-frequency Warping -> mel spectrum -> Cepstrum -> mel cepstrum
4.1.4. Feature Matching using GMM
Gaussian Mixture Model
• Parametric probability density function
• Based on a soft clustering technique
• Mixture of Gaussian components
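A GMM with M components models the density of a feature vector x as p(x | λ) = Σ_{i=1}^{M} w_i N(x; μ_i, Σ_i), with weights w_i that sum to 1. The sketch below evaluates this density per frame. It assumes diagonal covariance matrices, a common choice for MFCC features; the slides do not say which form was used. It uses log-sum-exp to avoid numerical underflow.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances) -> np.ndarray:
    """Per-frame log p(x_t | lambda) for a diagonal-covariance GMM.

    X: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    """
    X = np.atleast_2d(np.asarray(X, dtype=np.float64))
    D = X.shape[1]
    # log N(x_t; mu_i, diag(var_i)) for every frame t and component i -> (T, M)
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (D * np.log(2 * np.pi)
                        + np.sum(np.log(variances), axis=1)[None, :]
                        + np.sum(diff ** 2 / variances[None, :, :], axis=2))
    log_weighted = np.log(weights)[None, :] + log_gauss
    # log-sum-exp over the components for numerical stability
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.sum(np.exp(log_weighted - m), axis=1, keepdims=True)))[:, 0]
```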
4.1.4. Feature Matching using GMM(2)
• GMM Training
4.1.4. Feature Matching using GMM(3)
The GMM modeling process consists of two steps:
1. Initialization: initial values of the means, covariances and weights are assigned.
2. Expectation Maximization (EM): the means, covariances and weights are re-estimated iteratively so as to maximize the likelihood of the training data.
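Below is a compact sketch of the two steps, assuming diagonal covariances. The slides do not describe the initialization. Taking M frames evenly spaced through the training data as the initial means is an assumption, as are the global variance as the initial variances, uniform initial weights, the iteration count and the variance floor `var_floor`.

```python
import numpy as np


def train_gmm(X, M: int = 8, n_iter: int = 50, var_floor: float = 1e-3):
    """Fit a diagonal-covariance GMM to feature vectors X of shape (T, D) with EM.

    Returns (weights (M,), means (M, D), variances (M, D)).
    """
    X = np.asarray(X, dtype=np.float64)
    T, D = X.shape
    # 1. Initialization (assumed): means = M frames evenly spaced through the data,
    #    variances = global per-dimension variance, weights = uniform
    means = X[np.linspace(0, T - 1, M).astype(int)].copy()
    variances = np.tile(np.maximum(X.var(axis=0), var_floor), (M, 1))
    weights = np.full(M, 1.0 / M)

    for _ in range(n_iter):
        # E-step: responsibilities gamma[t, i] = P(component i | x_t)
        diff = X[:, None, :] - means[None, :, :]
        log_p = np.log(weights)[None, :] - 0.5 * (
            D * np.log(2 * np.pi)
            + np.log(variances).sum(axis=1)[None, :]
            + (diff ** 2 / variances[None, :, :]).sum(axis=2)
        )
        log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exponentiating
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: maximum-likelihood re-estimates given the responsibilities
        n_k = gamma.sum(axis=0) + 1e-12
        weights = n_k / T
        means = (gamma.T @ X) / n_k[:, None]
        variances = (gamma.T @ X ** 2) / n_k[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)  # keep components from collapsing
    return weights, means, variances
```

In a speaker recognition setting, this would be run once per enrolled speaker on that speaker's MFCC vectors. M plays the role of the model order studied in the results.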
4.1.5. Identification & Verification
For speaker identification, the speaker model with the maximum posterior probability within a group of S speakers is selected.
For verification, a threshold on the log-likelihood of the claimed speaker is set adaptively.
Feature Extraction -> Feature Matching -> Decision: accept if > threshold, reject if < threshold
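The decision rules can be sketched on top of the per-frame log-likelihoods produced by each speaker's GMM. The sketch makes two assumptions. First, all S speakers have equal prior probabilities, so the maximum posterior choice reduces to the largest total log-likelihood. Second, the verification score is the average per-frame log-likelihood, so the threshold does not depend on utterance length. The speaker labels in the usage are hypothetical.

```python
import numpy as np


def identify(frame_log_likelihoods: dict[str, np.ndarray]) -> str:
    """Closed-set identification: the speaker whose model gives the largest total
    log-likelihood over the test frames (equal priors assumed, so this is also
    the maximum a posteriori choice)."""
    return max(frame_log_likelihoods,
               key=lambda s: float(np.sum(frame_log_likelihoods[s])))


def verify(frame_log_likelihoods: np.ndarray, threshold: float) -> bool:
    """Accept the claimed identity if the average per-frame log-likelihood exceeds the threshold."""
    return float(np.mean(frame_log_likelihoods)) > threshold
```

Usage: `identify({"speaker_A": ll_a, "speaker_B": ll_b})` returns the label of the best-scoring model. Here `ll_a` and `ll_b` are per-frame log-likelihood arrays.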
4.2. System Implementation on FPGA
Mic pre-amplification -> DC offset shifter -> Analog-to-digital conversion -> Temporary Buffer -> Framing and windowing -> Fast Fourier Transform -> Mel Spectrum -> Log -> Discrete Cosine Transform -> MFCC -> UART -> Computer (MATLAB)
4.2. System Implementation on FPGA(2)
• Sound Capture and Level Shifter
– The audio is captured using a condenser microphone and amplified using an op-amp.
– The DC offset of the input audio signal is shifted to 1.65 V.
• Analog-to-digital conversion and digital-to-analog conversion
– The Spartan-3E FPGA board has an ADC module with an SPI interface.
– 14-bit ADC sample values are obtained from the ADC at a rate of 25000 samples per second.
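This small sketch covers two details of the capture path. First, it assumes the ADC returns 14-bit two's-complement codes centered on the 1.65 V offset. We believe this matches the LTC1407A-1 used on the Spartan-3E Starter Kit, but check the board's user guide. `sign_extend_14` is a hypothetical helper. Second, it records that a 512-sample frame at 25000 samples per second lasts 20.48 ms, not the 23.22 ms of the 22050 Hz MATLAB setup.

```python
FS_FPGA = 25_000   # ADC sample rate on the FPGA (samples per second)
FRAME_LEN = 512    # frame length used on the FPGA (samples)


def sign_extend_14(raw: int) -> int:
    """Interpret the low 14 bits of raw as a two's-complement value in [-8192, 8191].

    Assumes 14-bit two's-complement ADC codes, where 0 corresponds to the
    1.65 V offset level.
    """
    raw &= 0x3FFF                       # keep the 14 data bits
    return raw - 0x4000 if raw & 0x2000 else raw


frame_ms = 1000 * FRAME_LEN / FS_FPGA   # 20.48 ms per 512-sample frame at 25 kHz
```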
4.2. System Implementation on FPGA(3)
• Double Data Rate SDRAM
– ADC samples are stored temporarily in DDR SDRAM before further processing.
– Burst mode 4 with burst length 2, i.e., 64 bits are written to the SDRAM at a time.
– The Wishbone protocol is used for communication with the DDR SDRAM.
4.2. System Implementation on FPGA(4)
• Framing and windowing
– ADC samples stored in DDR SDRAM are pre-emphasized.
– 50% overlapped frames with a frame length of 512 samples are used.
• Fast Fourier Transform
– A 512-point radix-2 FFT is computed using a Xilinx LogiCORE IP core.
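The Xilinx core is a pipelined hardware block. The Python model below only illustrates the radix-2 decimation-in-time structure the core is named after; a 512-point transform has log2 512 = 9 butterfly stages. The model could serve as a software reference when checking the core's output. The real core's scaling and output-ordering options are not modeled.

```python
import cmath


def fft_radix2(x: list[complex]) -> list[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly with twiddle factor W_N^k = exp(-2*pi*j*k/N)
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```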
4.2. System Implementation on FPGA(5)
FFT timing diagram
4.2. System Implementation on FPGA(6)
• Mel-Spectrum
– Spectrum (linear scale) => Mel spectrum
• Log calculation
– The natural log is computed using lookup tables.
– Input data: 24 bits; output: 12 bits.
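The slides give only the word widths of the LUT-based logarithm. The sketch below shows one hypothetical way such a unit can work. Split x = 2^e · (1 + f). Then e comes from the position of the leading one, e·ln 2 comes from a small table, and ln(1 + f) comes from a table indexed by the top bits of f. The 6-bit table index is an assumption. So is the unsigned Q5.7 output format (5 integer and 7 fractional bits, 12 bits in total), which is enough for ln(2^24) ≈ 16.6.

```python
import math

LUT_BITS = 6     # hypothetical: mantissa bits used to index the ln(1 + f) table
FRAC_BITS = 7    # hypothetical unsigned Q5.7 output: 5 integer + 7 fractional bits = 12 bits
SCALE = 1 << FRAC_BITS

# ln(1 + i / 2**LUT_BITS) for i = 0 .. 2**LUT_BITS - 1, stored in Q5.7
LN_MANTISSA = [round(math.log(1 + i / (1 << LUT_BITS)) * SCALE) for i in range(1 << LUT_BITS)]
# e * ln(2) for every possible exponent e = 0 .. 23 of a 24-bit input, stored in Q5.7
LN_EXPONENT = [round(e * math.log(2) * SCALE) for e in range(24)]


def ln_lut(x: int) -> int:
    """Approximate ln(x) for a 24-bit unsigned integer x > 0; returns a 12-bit Q5.7 code."""
    if not 0 < x < (1 << 24):
        raise ValueError("x must be a positive 24-bit integer")
    e = x.bit_length() - 1                  # x = 2**e * (1 + f), 0 <= f < 1
    mask = (1 << LUT_BITS) - 1
    if e >= LUT_BITS:
        idx = (x >> (e - LUT_BITS)) & mask  # top LUT_BITS bits of f (truncated)
    else:
        idx = (x << (LUT_BITS - e)) & mask  # f is exact when x has few bits
    return LN_EXPONENT[e] + LN_MANTISSA[idx]
```

With a 6-bit index, the truncation error is at most ln(1 + 1/64) ≈ 0.016. A wider table trades ROM for accuracy, which matters given the block RAM limits reported later.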
4.2. System Implementation on FPGA(7)
Discrete Cosine Transform (DCT)
• DCT core from opencores.org
• Input: 1 bit; Output: 16 bits (parallel)
Universal Asynchronous Receiver Transmitter (UART)
• Baud rate of 19.2 kbps
• Each 32-bit MFCC is divided into four 8-bit components.
• Implemented on an unused jumper pin, using the UART protocol via CDC.
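This sketch shows how a 32-bit MFCC value can be split into four bytes and then reassembled on the MATLAB side as int32. Two details are assumptions: the byte order (most significant byte first) and 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits per byte on the wire). Under these assumptions, a 19.2 kbps link carries 1920 bytes, or 480 MFCC values, per second.

```python
BAUD = 19_200
WIRE_BITS_PER_BYTE = 10   # assumed 8N1: 1 start bit + 8 data bits + 1 stop bit


def mfcc_to_bytes(value: int) -> bytes:
    """Split a signed 32-bit MFCC value into four 8-bit components, MSB first (assumed order)."""
    return value.to_bytes(4, byteorder="big", signed=True)


def bytes_to_mfcc(data: bytes) -> int:
    """Reassemble four received bytes into the signed 32-bit (int32) value."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    return int.from_bytes(data, byteorder="big", signed=True)


bytes_per_second = BAUD / WIRE_BITS_PER_BYTE   # 1920 bytes per second
mfccs_per_second = bytes_per_second / 4        # 480 MFCC values per second
```

For example, if 20 MFCCs were sent per frame, the link would carry 24 frames per second. A 256-sample hop at 25000 samples per second produces about 98 frames per second. Under these assumptions, the UART link rather than the sampling rate would limit how fast features reach MATLAB.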
4.3. Further processing in Matlab
MFCCs are received in MATLAB in int32 format.
Training phase: MFCC feature vectors => Gaussian Mixture Model
Testing phase: MFCC feature vectors => posterior probability (recognition)
5. RESULTS AND ANALYSIS
5.1. Output in MATLAB
• Training data: 31 speakers (20 male, 11 female)
• Training data length = 10-30 seconds
• Testing data length = 1-10 seconds
• No. of MFCCs = 8-20
• Up to 99% recognition when
training data length = 30 seconds
testing data length = 10 seconds
No. of MFCCs = 20
5.1. Output in MATLAB(2)
| Amount of training speech | Model order (M) | Testing speech 1 s | Testing speech 5 s | Testing speech 10 s |
|---|---|---|---|---|
| 10 s | 8 | 51.3% | 75.5% | 82.9% |
| 10 s | 13 | 60.3% | 83.5% | 88.4% |
| 10 s | 20 | 64.7% | 85.1% | 90.4% |
| 20 s | 8 | 67.3% | 86.3% | 93.6% |
| 20 s | 13 | 75.1% | 95.1% | 97.3% |
| 20 s | 20 | 78.3% | 95.4% | 97.4% |
| 30 s | 8 | 71.7% | 95.5% | 97.5% |
| 30 s | 13 | 79.2% | 97.8% | 98.5% |
| 30 s | 20 | 84.1% | 98.1% | 99.1% |
• The largest increase in performance occurs when the training data increases from 10 to 20 seconds; increasing it to 30 seconds improves performance only slightly.
• No more than 30 seconds of training speech is needed to maintain high performance.
• Performance rises sharply when the testing speech duration increases from 1 to 5 seconds, and only slightly from 5 to 10 seconds.
• Using more training data improves the performance.
5.1. Output in MATLAB(3)
• 77% of unknown female voices were matched with a female voice, and 85% of unknown male voices were matched with a male voice.
• During the experiments with four languages (English, Nepali, Hindi and German), the speaker was correctly recognized regardless of the spoken text and language.
5.1. Output in MATLAB(4)
• Total Error Rate (TER) = FAR + FRR, where FAR is the false acceptance rate and FRR is the false rejection rate.
• The threshold for speaker verification was determined empirically using FAR and FRR.
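Choosing the threshold empirically from FAR and FRR can be sketched as a sweep over candidate thresholds that keeps the one with the smallest TER. The sketch assumes the inputs are verification scores, such as average log-likelihoods. A trial is accepted when its score is greater than the threshold, as in the decision rule of Section 4.1.5. Using the observed scores as the candidate thresholds is an assumption.

```python
import numpy as np


def far_frr(genuine: np.ndarray, impostor: np.ndarray, threshold: float) -> tuple[float, float]:
    """FAR: fraction of impostor trials accepted; FRR: fraction of genuine trials rejected.

    A trial is accepted when its score is strictly greater than the threshold.
    """
    far = float(np.mean(impostor > threshold))
    frr = float(np.mean(genuine <= threshold))
    return far, frr


def best_threshold(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Pick the candidate threshold that minimizes TER = FAR + FRR (first one on ties)."""
    candidates = np.unique(np.concatenate([genuine, impostor]))
    ters = [sum(far_frr(genuine, impostor, t)) for t in candidates]
    return float(candidates[int(np.argmin(ters))])
```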
5.1. Output in MATLAB(5)
5.2. Output Analysis in FPGA
• The recognition rate is lower than that of the software implementation.
Overall resource utilization in the FPGA:
i. RAMs: 7
ii. ROMs: 3
iii. Multipliers: 15
iv. Adders/Subtractors: 18
v. Counters: 9
vi. Registers: 132
vii. Comparators: 20
viii. Multiplexers: 238
Device utilization summary
| Logic utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Flip-Flops | 8225 | 9312 | 88% |
| Number of 4-input LUTs | 8734 | 9312 | 93% |
| Number of occupied Slices | 2355 | 4656 | 54% |
| Number of Slices containing only related logic | 1325 | 1325 | 100% |
| Number of Slices containing unrelated logic | 0 | 1325 | 0% |
| Total Number of 4-input LUTs | 8903 | 9312 | 94% |
| Number of bonded IOBs | 215 | 232 | 94% |
| Number of RAMB16s | 7 | 20 | 35% |
| Number of BUFGMUXs | 2 | 24 | 8% |
| Number of MULT18X18SIOs | 15 | 20 | 75% |
| Average Fanout of Non-Clock Nets | 272 | | |
5.2. Output Analysis in FPGA (2)
6. APPLICATIONS
Security
• Forensics for voice sample matching
• Transaction authentication
• Toll fraud prevention
Information and physical facilities
• Telephone credit card purchases
• Remote time and attendance logging
• Information retrieval
• Audio indexing
• Voice dialing and voice mail
Monitoring
• Access control
• Access to confidential information areas
• Computer and data networks
• Remote access of computers
7. LIMITATIONS
• The duration of the speech signal limits the performance.
• Intrusion based on voice imitation cannot be detected.
• Choosing the optimal model order.
• The silence removal process is not efficient.
8. PROBLEMS FACED
• Limited resources in the Spartan-3E.
• Lack of sufficient block RAM and ROM memory.
• Synchronization problems between different modules/components.
9. CONCLUSION
• The system has been implemented using MFCC for feature extraction and GMM to model the speakers.
• The performance of the software implementation is very good.
• The performance of the FPGA implementation is not satisfactory.
• Noise reduction algorithms could be used to improve the performance of the system.
THANK YOU

 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Text-independent speaker recognition system

  • 1. Project members: ASHOK SHARMA PAUDEL (066/BEX/405), DEEPESH LEKHAK (066/BEX/414), KESHAV BASHYAL (066/BEX/418), SUSHMA SHRESTHA (066/BEX/444). TEXT-INDEPENDENT SPEAKER RECOGNITION SYSTEM
  • 2. OVERVIEW OF PRESENTATION: 1. Introduction 2. Objective 3. System Architecture 4. Methodology 5. Results and Analysis 6. Application Area 7. Limitations 8. Problems Faced 9. Conclusion
  • 3. 1. INTRODUCTION: Speech is a universal method of communication. A speech signal carries information at two levels: 1. High-level characteristics: syntax, dialect, style, and the overall meaning of a spoken message. 2. Low-level characteristics: pitch and phonemic spectra, which are associated much more with the physiology of the vocal tract.
  • 5. 1. INTRODUCTION (3): Speech is a diverse field with many applications. From the same speech signal, speech recognition yields the words (“How are you?”), language recognition yields the language name (English), and speaker recognition yields the speaker name (“Deepesh”).
  • 6. 1. INTRODUCTION (4): What is speaker recognition? Recognition of who is speaking based on the characteristics of their speech signal. It can be text-independent or text-dependent. Speaker identification determines which registered speaker has spoken; speaker verification accepts or rejects a speaker's claimed identity.
  • 7. 1. INTRODUCTION (5): Biometric: a human-generated signal or attribute used to authenticate a person's identity. Why voice? – It is a natural signal to produce. – It is the only biometric that allows users to authenticate remotely. – It does not require a specialized input device, so implementation cost is low. – It is ubiquitous: telephones and microphone-equipped PCs.
  • 8. 1. INTRODUCTION (6): The strongest security comes from combining the voice biometric with other forms of security: – something you have (a badge), – something you know (a password), – something you are (your voice). Why text-independent speaker recognition? – It is independent of the text: easy to access, and it cannot be forgotten or misplaced. – It is independent of language and acceptable to users.
  • 9. 2. OBJECTIVE: The main goal of the project is to design and implement a text-independent speaker recognition system on an FPGA. The specific goals are: 1. To learn about digital signal processing and FPGAs. 2. To implement and analyze the system in MATLAB. 3. To design and implement the system on an FPGA.
  • 10. 3. SYSTEM ARCHITECTURE: Figure: input audio → conditioning → analog-to-digital conversion → Double Data Rate (DDR) SDRAM storage → pre-emphasis → framing and windowing → Fast Fourier Transform → Mel-spectrum → Mel-Frequency Cepstral Coefficients → Universal Asynchronous Receiver Transmitter (UART).
  • 11. 4. METHODOLOGY: Figure: input signal (training data or testing data) → feature extraction → feature matching → threshold → output.
  • 12. 4.1. System Implementation in MATLAB. 4.1.1. Voice Capturing and Storage: Input is captured through a microphone and saved in .wav format (22050 Hz sampling rate, 16-bit PCM, mono channel).
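For readers reproducing the MATLAB stage in another environment, a minimal Python sketch of loading such a recording is shown below. It uses only the standard `wave` and `struct` modules and assumes a little-endian 16-bit PCM mono file; scaling the samples to [-1, 1) by dividing by 32768 is a convention chosen here, not something the slides specify.

```python
import io
import struct
import wave

def load_wav_mono16(source):
    """Read a 16-bit PCM mono WAV file; return (sample_rate, samples scaled to [-1, 1))."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit PCM, mono")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw)  # WAV stores samples little-endian
    return rate, [v / 32768.0 for v in ints]

# Usage: write a tiny synthetic 22050 Hz recording in memory, then read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(22050)
    wf.writeframes(struct.pack("<4h", 0, 16384, -16384, 32767))
buf.seek(0)
rate, samples = load_wav_mono16(buf)
```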
  • 14. 4.1.2. Pre-Processing (2): 1) Silence removal 2) Pre-emphasis: $s'[n] = s[n] - a\,s[n-1]$, where $a$ is the pre-emphasis coefficient [1]. [1] Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine.
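The two pre-processing steps can be sketched as follows. `pre_emphasis` implements the pre-emphasis equation directly; the coefficient a = 0.97 is an assumed, commonly used value (the slides do not give a). `remove_silence` is a hypothetical simple energy gate, since the slides do not describe the silence-removal method.

```python
def remove_silence(signal, block_len=256, ratio=0.01):
    """Hypothetical energy gate: drop non-overlapping blocks whose mean energy
    is below `ratio` times the mean energy of the loudest block."""
    blocks = [signal[i:i + block_len] for i in range(0, len(signal), block_len)]
    energies = [sum(x * x for x in b) / len(b) for b in blocks]
    if not energies:
        return []
    peak = max(energies)
    kept = []
    for block, energy in zip(blocks, energies):
        if energy >= ratio * peak:
            kept.extend(block)
    return kept

def pre_emphasis(signal, a=0.97):
    """Apply s'[n] = s[n] - a*s[n-1]; the first sample is passed through unchanged."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - a * signal[n - 1])
    return out
```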
  • 15. 4.1.2. Pre-Processing (3): 1) Silence removal 2) Pre-emphasis 3) Framing: overlapping frames of 23.22 ms with 50% overlap, i.e., 512 samples per frame at 22050 Hz.
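Framing with 50% overlap amounts to advancing a 512-sample window by 256 samples at a time. A minimal sketch, assuming trailing samples that do not fill a whole frame are discarded (the slides do not say how the last partial frame is handled):

```python
def frame_signal(signal, frame_len=512, hop=256):
    """Split a signal into overlapping frames; hop = frame_len / 2 gives 50% overlap.
    Trailing samples that do not fill a whole frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

# 512 samples at 22050 Hz last about 23.22 ms.
frame_ms = 1000 * 512 / 22050
```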
  • 16. 4.1.2. Pre-Processing (4): 1) Silence removal 2) Pre-emphasis 3) Framing 4) Windowing: $x[n] = s'[n]\,w[n-m]$ for $n = m, m+1, \dots, m+N-1$, where the window $w[n]$ is defined for $n = 0, 1, \dots, N-1$ and $m$ is the first sample of the frame [2]. [2] Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine.
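The slides do not name the window function w[n]; the sketch below assumes a Hamming window, a common choice for MFCC front ends. Multiplying an already-sliced frame sample by sample with the window is the same as x[n] = s'[n] w[n - m] for the frame that starts at sample m.

```python
import math

def hamming(N):
    """Assumed window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1 (N > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame, window):
    """Multiply one frame (samples m..m+N-1, re-indexed from 0) by the window."""
    return [s * w for s, w in zip(frame, window)]
```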
  • 17. 4.1.3. Feature Extraction using MFCC: MFCC stands for Mel-Frequency Cepstral Coefficients. A perceptual approach, modeled on the human perception of speech, is applied to the sample frames to extract the features of speech. Steps for calculating MFCCs: 1. Discrete Fourier Transform (computed with the FFT) and the power spectrum $|X[k]|^2$ of the signal.
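To make step 1 concrete, the sketch below computes the power spectrum |X[k]|^2 of one frame straight from the DFT definition, keeping bins 0 to N/2 because the spectrum of a real frame is symmetric. It is for illustration only; an FFT returns the same X[k] in O(N log N) instead of O(N^2).

```python
import cmath
import math

def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2, with X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2 + 1):
        X = sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(frame))
        spectrum.append(abs(X) ** 2)
    return spectrum

# Usage: a cosine completing 4 cycles in a 32-sample frame peaks at bin k = 4.
frame = [math.cos(2 * math.pi * 4 * n / 32) for n in range(32)]
spectrum = power_spectrum(frame)
```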
  • 18. 4.1.3. Feature Extraction using MFCC (2): 2. Mel scaling. The mel scale is approximately linear below 1 kHz and logarithmic above 1 kHz. The powers of the spectrum are mapped onto the mel scale using a mel filter bank of overlapping windows, giving the mel spectral coefficients $G[k]$.
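A possible mel filter bank is sketched below. It assumes the widely used conversion mel(f) = 2595 log10(1 + f/700) and triangular filters whose centres are equally spaced on the mel scale. The slides do not specify the conversion formula, the number of filters, or the filter shape beyond "overlapping windows", so all of these are assumptions.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale.
    Returns one weight row per filter over FFT bins 0..n_fft/2."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):          # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def mel_spectrum(power, bank):
    """G[j] = sum_k H_j[k] * |X[k]|^2 for each filter j."""
    return [sum(h * p for h, p in zip(row, power)) for row in bank]
```

With 512-point frames at 22050 Hz, `mel_filterbank(20, 512, 22050)` yields 20 rows of 257 weights each.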
  • 19. 4.1.3. Feature Extraction using MFCC (3): 3. Take the log of the mel spectral coefficients, $\log G[k]$. 4. Apply the Discrete Cosine Transform (DCT), Eq. (3.4), to obtain the mel-cepstrum $c[q]$. (Source: Shi-Huang Chen and Yu-Ren Luo, Speaker Verification Using MFCC and Support Vector Machine)
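Steps 3 and 4 can be sketched in a few lines. The code assumes an unnormalized DCT-II, c[q] = sum_j log(G[j]) cos(pi q (j + 0.5) / J), plus a small floor to avoid log(0). The exact DCT variant and scaling in the original Eq. (3.4) may differ.

```python
import math

def mfcc_from_mel(mel_energies, num_ceps=13, floor=1e-10):
    """Log of each mel energy G[j], then an (assumed) unnormalized DCT-II:
    c[q] = sum_j log(G[j]) * cos(pi * q * (j + 0.5) / J), q = 0..num_ceps-1."""
    logs = [math.log(max(g, floor)) for g in mel_energies]
    J = len(logs)
    return [sum(l * math.cos(math.pi * q * (j + 0.5) / J) for j, l in enumerate(logs))
            for q in range(num_ceps)]
```

A flat mel spectrum puts all its energy in c[0] and leaves the higher coefficients at zero, which is why the higher cepstral coefficients describe the spectral shape.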
  • 20. 4.1.3. Feature Extraction using MFCC (4): Figure: continuous speech → frame blocking → (frame) windowing → FFT → (spectrum) mel-frequency wrapping → (mel spectrum) cepstrum → mel cepstrum.
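  • 21. 4.1.4. Feature Matching using GMM: A Gaussian Mixture Model is a parametric probability density function based on a soft clustering technique: a weighted mixture of Gaussian components, $p(\mathbf{x}) = \sum_{i=1}^{M} w_i\,\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \Sigma_i)$ with $\sum_{i=1}^{M} w_i = 1$.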
  • 22. 4.1.4. Feature Matching using GMM (2): Figure: GMM training.
  • 23. 4.1.4. Feature Matching using GMM (3): The GMM modeling process consists of two steps: 1. Initialization: initial values are assigned to the means, covariances, and weights. 2. Expectation Maximization (EM): the means, covariances, and weights are re-estimated iteratively to maximize the likelihood of the training data.
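The two-step procedure can be illustrated with a small diagonal-covariance GMM trained by EM. This is a sketch, not the project's MATLAB code. The initialization (evenly spaced data points as means, the global variance for every component, equal weights), the fixed iteration count, and the variance floor are all assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))

def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def train_gmm(data, M, iters=50, var_floor=1e-3):
    """Fit an M-component diagonal-covariance GMM to a list of feature vectors by EM."""
    N, D = len(data), len(data[0])
    # 1. Initialization (assumed scheme).
    step = max(1, N // M)
    means = [list(data[(i * step) % N]) for i in range(M)]
    mu = [sum(x[d] for x in data) / N for d in range(D)]
    var = [max(var_floor, sum((x[d] - mu[d]) ** 2 for x in data) / N) for d in range(D)]
    variances = [list(var) for _ in range(M)]
    weights = [1.0 / M] * M
    # 2. EM iterations.
    for _ in range(iters):
        # E-step: responsibility of component i for frame t.
        gamma = []
        for x in data:
            lp = [math.log(weights[i]) + log_gauss_diag(x, means[i], variances[i])
                  for i in range(M)]
            total = logsumexp(lp)
            gamma.append([math.exp(v - total) for v in lp])
        # M-step: re-estimate weights, means and variances.
        for i in range(M):
            n_i = sum(g[i] for g in gamma) + 1e-12
            weights[i] = n_i / N
            means[i] = [sum(g[i] * x[d] for g, x in zip(gamma, data)) / n_i
                        for d in range(D)]
            variances[i] = [max(var_floor, sum(g[i] * (x[d] - means[i][d]) ** 2
                                               for g, x in zip(gamma, data)) / n_i)
                            for d in range(D)]
    return weights, means, variances

def avg_log_likelihood(data, model):
    """Average per-frame log p(x | model), usable as a matching score."""
    weights, means, variances = model
    return sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in data) / len(data)
```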
  • 24. 4.1.5. Identification & Verification: For speaker identification, the system selects the speaker model with the maximum a posteriori probability within a group of S speakers. For verification, a threshold on the speaker's log-likelihood is set adaptively. Figure: feature extraction → feature matching → decision (accept if score > threshold, reject if score < threshold).
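Identification and verification reduce to scoring the test frames against trained GMMs. In the sketch below, a model is a hypothetical `(weights, means, variances)` tuple with diagonal covariances. Identification picks the best average log-likelihood, which equals the maximum a posteriori choice when all speakers have equal priors; verification compares the score with a threshold. The slides do not describe how the project adapts its threshold, so here it is simply a parameter.

```python
import math

def gmm_avg_loglik(frames, model):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    weights, means, variances = model
    total = 0.0
    for x in frames:
        comps = [math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                                         for xi, mi, v in zip(x, mean, var))
                 for w, mean, var in zip(weights, means, variances)]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)

def identify(frames, models):
    """Return the enrolled speaker whose model scores the frames highest."""
    return max(models, key=lambda name: gmm_avg_loglik(frames, models[name]))

def verify(frames, model, threshold):
    """Accept the claimed identity if the score exceeds the threshold."""
    return gmm_avg_loglik(frames, model) > threshold

# Usage with two toy one-component, one-dimensional models.
models = {"speaker_a": ([1.0], [[0.0]], [[1.0]]),
          "speaker_b": ([1.0], [[5.0]], [[1.0]])}
```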
  • 25. 4.2. System Implementation on FPGA: Figure: microphone → pre-amplification → DC offset shifter → analog-to-digital conversion → temporary buffer → framing and windowing → Fast Fourier Transform → Mel spectrum → log → Discrete Cosine Transform → MFCCs → UART → computer (MATLAB).
  • 26. 4.2. System Implementation on FPGA (2): Sound capture and level shifter: the audio is captured with a condenser microphone and amplified with an op-amp, and the DC offset of the input audio signal is shifted to 1.65 V. Analog-to-digital and digital-to-analog conversion: the Spartan-3E FPGA board has an ADC module with an SPI interface, and 14-bit ADC samples are obtained at 25,000 samples per second.
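Two small details of the capture path can be checked in code. The sketch assumes the ADC delivers each sample as a 14-bit two's-complement word (an assumption about the board's ADC output format; check the board documentation) and shows how such a word is sign-extended. It also computes the frame duration implied by 512-sample frames at 25,000 samples per second, which is shorter than the 23.22 ms of the MATLAB path at 22050 Hz.

```python
def sign_extend_14(code):
    """Interpret a raw 14-bit word as a two's-complement integer in [-8192, 8191]."""
    code &= 0x3FFF                 # keep only the 14 data bits
    return code - 0x4000 if code & 0x2000 else code

SAMPLE_RATE = 25000                # ADC rate stated for the FPGA path
FRAME_LEN = 512                    # frame length used on the FPGA
frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE   # 20.48 ms per frame
```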
  • 27. 4.2. System Implementation on FPGA (3): Double Data Rate SDRAM: ADC samples are stored temporarily in DDR SDRAM before further processing. Burst mode 4 is used with burst length 2, i.e., 64 bits are written to the SDRAM. The Wishbone bus protocol is used to communicate with the DDR SDRAM.
  • 28. 4.2. System Implementation on FPGA (4): Framing and windowing: the ADC samples stored in DDR SDRAM are pre-emphasized and divided into 50% overlapped frames of 512 samples. Fast Fourier Transform: a 512-point radix-2 FFT is computed using the Xilinx LogiCORE FFT.
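The Xilinx core is used as a black box, but the radix-2 structure it is configured for can be sketched in software. The iterative decimation-in-time version below (bit-reversal reordering followed by log2(N) butterfly stages) is a floating-point illustration only; it does not model the core's fixed-point scaling or its interface.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    a = [complex(v) for v in x]
    N = len(a)
    if N & (N - 1):
        raise ValueError("length must be a power of two")
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(N) stages of butterflies; the butterfly span doubles each stage.
    size = 2
    while size <= N:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, N, size):
            w = 1.0
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```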
  • 29. 4.2. System Implementation on FPGA (5): Figure: FFT timing diagram.
  • 30. 4.2. System Implementation on FPGA (6): Mel spectrum: the linear-scale spectrum is converted to a mel spectrum. Log calculation: the natural log is computed using lookup tables, with 24-bit input and 12-bit output.
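The slides do not describe how the log lookup tables are organized. One common scheme, sketched below as an assumption, writes x = 2^p (1 + f): a small ROM holds p ln 2 for each leading-one position p, and a second ROM holds ln(1 + f) indexed by the top mantissa bits. The table size (6 index bits) and the output format (7 fraction bits, so every result fits in 12 unsigned bits) are hypothetical choices.

```python
import math

LUT_BITS = 6                 # hypothetical: mantissa bits that index the ln(1 + f) ROM
FRAC_BITS = 7                # hypothetical: fraction bits of the 12-bit unsigned output
SCALE = 1 << FRAC_BITS
# ROM 1: ln(1 + i / 2^LUT_BITS) for each truncated mantissa value i.
MANTISSA_LUT = [round(math.log(1 + i / (1 << LUT_BITS)) * SCALE) for i in range(1 << LUT_BITS)]
# ROM 2: p * ln(2) for each leading-one position p of a 24-bit input.
EXPONENT_LUT = [round(p * math.log(2) * SCALE) for p in range(24)]

def ln_lut(x):
    """Approximate ln(x) for a nonzero 24-bit unsigned x, as fixed point with FRAC_BITS fraction bits."""
    if not 1 <= x < (1 << 24):
        raise ValueError("input must be a nonzero 24-bit value")
    p = x.bit_length() - 1                      # leading-one position: x = 2^p * (1 + f)
    mask = (1 << LUT_BITS) - 1
    if p >= LUT_BITS:
        index = (x >> (p - LUT_BITS)) & mask    # top LUT_BITS bits after the leading one
    else:
        index = (x << (LUT_BITS - p)) & mask
    return EXPONENT_LUT[p] + MANTISSA_LUT[index]   # ln x = p*ln2 + ln(1 + f)
```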
  • 31. 4.2. System Implementation on FPGA (7): Discrete Cosine Transform (DCT): DCT core from opencores.org; input: 1 bit, output: 16-bit parallel. Universal Asynchronous Receiver Transmitter (UART): baud rate of 19.2 kbps; each 32-bit MFCC is divided into four 8-bit components; the UART is implemented on an unused jumper pin and reaches the PC via CDC.
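Splitting each 32-bit MFCC into four bytes, and rebuilding the int32 value on the PC side, can be sketched with Python's `struct` module. The most-significant-byte-first order is an assumption because the slides do not state the byte order. The timing estimate also assumes 8N1 framing (a start bit, 8 data bits and a stop bit, so 10 bits on the wire per byte).

```python
import struct

def split_mfcc(value):
    """Split one signed 32-bit MFCC word into four bytes, most significant first (assumed order)."""
    return list(struct.pack(">i", value))

def join_mfcc(four_bytes):
    """Reassemble four received bytes into the signed 32-bit integer seen on the PC side."""
    return struct.unpack(">i", bytes(four_bytes))[0]

BAUD = 19200
BITS_PER_BYTE = 10           # assumed 8N1 framing
ms_per_mfcc = 1000 * 4 * BITS_PER_BYTE / BAUD   # about 2.08 ms per coefficient
```

Under these assumptions the link carries at most about 480 coefficients per second.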
  • 32. 4.3. Further Processing in MATLAB: The MFCCs are received in MATLAB in int32 format. Training phase: MFCC feature vectors are used to train a Gaussian Mixture Model. Testing phase: MFCC feature vectors are scored to obtain the posterior probability used for recognition.
  • 33. 5. RESULTS AND ANALYSIS. 5.1. Output in MATLAB: Training data: 31 speakers (20 male, 11 female). Training data length: 10-30 seconds. Testing data length: 1-10 seconds. Number of MFCCs: 8-20. Up to 99% recognition with 30 seconds of training data, 10 seconds of testing data, and 20 MFCCs.
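  • 34. 5.1. Output in MATLAB (2): Recognition rate by amount of training speech, model order (M), and duration of testing speech:

| Training speech | Model order (M) | Testing 1 s | Testing 5 s | Testing 10 s |
|---|---|---|---|---|
| 10 s | 8 | 51.3% | 75.5% | 82.9% |
| 10 s | 13 | 60.3% | 83.5% | 88.4% |
| 10 s | 20 | 64.7% | 85.1% | 90.4% |
| 20 s | 8 | 67.3% | 86.3% | 93.6% |
| 20 s | 13 | 75.1% | 95.1% | 97.3% |
| 20 s | 20 | 78.3% | 95.4% | 97.4% |
| 30 s | 8 | 71.7% | 95.5% | 97.5% |
| 30 s | 13 | 79.2% | 97.8% | 98.5% |
| 30 s | 20 | 84.1% | 98.1% | 99.1% |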
  • 35. 5.1. Output in MATLAB (3): The largest increase in performance occurs when training data increases from 10 to 20 seconds; increasing it to 30 seconds improves performance only slightly. At most 30 seconds of training speech is needed to maintain high performance. Performance changes abruptly when the testing speech duration increases from 1 to 5 seconds, and only slightly from 5 to 10 seconds. Using more training data improves performance.
  • 36. 5.1. Output in MATLAB (4): 77% of unknown female voices were matched to a female speaker and 85% of unknown male voices to a male speaker. During the experiments with four languages (English, Nepali, Hindi, and German), speakers were recognized correctly regardless of the spoken text and language.
  • 37. 5.1. Output in MATLAB (5): Total Error Rate (TER) = FAR + FRR, where FAR is the false acceptance rate and FRR the false rejection rate. The threshold for speaker verification was determined empirically from FAR and FRR.
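One way to pick a verification threshold empirically from FAR and FRR is to sweep candidate thresholds and keep the one with the lowest TER. The sketch assumes separate lists of genuine (true-speaker) and impostor scores and that a trial is accepted when its score exceeds the threshold; the slides do not say exactly how the project searched for its threshold.

```python
def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor trials accepted; FRR: fraction of genuine trials rejected."""
    far = sum(s > threshold for s in impostor) / len(impostor)
    frr = sum(s <= threshold for s in genuine) / len(genuine)
    return far, frr

def best_threshold(genuine, impostor):
    """Try every observed score as a threshold; return (threshold, TER) with the lowest TER."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        ter = far + frr
        if best is None or ter < best[1]:
            best = (t, ter)
    return best

# Usage with made-up log-likelihood scores.
genuine = [-1.0, -0.5, 0.2, 0.8]
impostor = [-3.0, -2.5, -1.2, -0.8]
```

Minimizing FAR + FRR weights both errors equally; an application that fears impostors more (for example, transaction authentication) might weight FAR more heavily.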
  • 38. 5.2. Output Analysis in FPGA Recognition rate less than that of software implementation. overall resource utilization in FPGA : i. RAMs : 7 ii. ROMs : 3 iii. Multipliers : 15 iv. Adders/ Subtractors : 18 v. Counters : 9 vi. Registers : 132 vii. Comparators : 20 viii. Multiplexers : 238
  • 39. 5.2. Output Analysis in FPGA (2): Device utilization summary:

| Logic utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of slice flip-flops | 8225 | 9312 | 88% |
| Number of 4-input LUTs | 8734 | 9312 | 93% |
| Number of occupied slices | 2355 | 4656 | 54% |
| Number of slices containing only related logic | 1325 | 1325 | 100% |
| Number of slices containing unrelated logic | 0 | 1325 | 0% |
| Total number of 4-input LUTs | 8903 | 9312 | 94% |
| Number of bonded IOBs | 215 | 232 | 94% |
| Number of RAMB16s | 7 | 20 | 35% |
| Number of BUFGMUXs | 2 | 24 | 8% |
| Number of MULT18X18SIOs | 15 | 20 | 75% |
| Average fanout of non-clock nets | 272 | | |
  • 40. 6. APPLICATIONS: Security – forensics for voice sample matching; transaction authentication; toll fraud prevention. Information and physical facilities – telephone credit card purchases; remote time and attendance logging; information retrieval; audio indexing; voice dialing and voice mail. Monitoring – access control; access to confidential information areas; computer and data networks; remote access of computers.
  • 41. 7. LIMITATIONS: The duration of the speech signal limits performance. Intrusion based on voice imitation cannot be detected. Choosing the optimal model order is difficult. The silence removal process is not efficient.
  • 42. 8. PROBLEMS FACED: Limited resources in the Spartan-3E; lack of sufficient block RAM and ROM memory; synchronization problems between different modules/components.
  • 43. 9. CONCLUSION: The system was implemented using MFCCs for feature extraction and GMMs to model the speakers. The software implementation performs very well, reaching up to 99% recognition in MATLAB, but the FPGA implementation is not yet satisfactory, with a lower recognition rate than the software. Noise reduction algorithms could be used to improve the performance of the system.