International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
DOI: 10.5121/ijnlc.2013.2205
EFFECT OF MFCC BASED FEATURES FOR SPEECH
SIGNAL ALIGNMENTS
Jang Bahadur Singh, Radhika Khanna and Parveen Lehana*
Department of Physics and Electronics, University of Jammu, Jammu, India
E-mail: sonajbs@gmail.com, pklehanajournals@gmail.com.
ABSTRACT
The fundamental techniques used for man-machine communication include speech synthesis, speech
recognition, and speech transformation. Feature extraction techniques provide a compressed
representation of the speech signal. Harmonic plus noise model (HNM) analysis and synthesis provides
high-quality speech with a small number of parameters. Dynamic time warping (DTW) is a well-known
technique for aligning two given multidimensional sequences: it locates an optimal match between them,
and the improvement in the alignment is estimated from the corresponding distances. The objective of
this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based
alignments. Speech signals in the form of twenty-five phrases were recorded. The recorded material was
segmented manually and aligned at the sentence, word, and phoneme levels. The Mahalanobis distance (MD)
was computed between the aligned frames. The investigation showed better alignment in the HNM
parametric domain. It was also seen that effective speech alignment can be carried out even at the
phrase level.
KEYWORDS
Speech recognition, Speech transformation, MFCC, HNM, DTW
1. INTRODUCTION
Speech is one of the most natural and dominant means of communicating thoughts, ideas, and
emotions between individuals [1]. It is a complicated signal because various physiological
processes are involved in its generation. As a result, speech varies extensively across speakers
in terms of accent, pronunciation, articulation, nasality, pitch, volume, and speed [2], [3].
Speech signal processing is one of the fastest-growing areas of research in signal processing.
The aim of digital speech signal processing is to take advantage of digital computing techniques
to process the speech signal for increased understanding, improved communication, and increased
efficiency and productivity in speech-related activities. The major fields of speech signal
processing include speech synthesis, speech recognition, and speech/speaker transformation.
In order to extract features from a speech signal, it has to be divided into fixed-length frames
using window functions such as the Hanning, Hamming, cosine, and triangular windows. The frame
length can be varied depending upon the nature of the feature vectors to be extracted [4].
Consecutive windows are overlapped in order to maintain continuity. Feature extraction approaches
mostly depend upon modeling the human voice production and perception systems [5]. They provide a
compressed representation of the speech signal by extracting a sequence of feature vectors [6].
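As a minimal illustration of this framing step, the sketch below splits a signal into overlapping Hamming-windowed frames; the 16 kHz sampling rate, 25 ms frame length, and 10 ms hop are assumed values for illustration, not parameters reported in this paper.

```python
import numpy as np

def frame_signal(x, frame_len, hop_len, window=np.hamming):
    """Split a 1-D signal into overlapping frames and apply a window function."""
    n_frames = 1 + (len(x) - frame_len) // hop_len
    w = window(frame_len)
    return np.stack([x[i * hop_len : i * hop_len + frame_len] * w
                     for i in range(n_frames)])

fs = 16000                               # assumed sampling rate
x = np.random.randn(fs)                  # 1 s of placeholder audio
frames = frame_signal(x, frame_len=int(0.025 * fs), hop_len=int(0.010 * fs))
print(frames.shape)                      # (98, 400): 25 ms frames, 10 ms hop
```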
2. SPEECH RECOGNITION
Speech recognition, also known as automatic speech recognition (ASR), converts spoken language
into text and is an alternative and effective method of interacting with a machine. It has been
about two decades since ASR systems started moving from research labs to the real world. Speech
recognition is more difficult than speech generation: although computers can store and recall
enormous amounts of data, perform mathematical computations at very high speed, and do repetitive
tasks without losing efficiency, they lack the perceptual flexibility of human listeners, and this
limits the accuracy of speech recognition. There are two main steps in speech recognition: feature
extraction and feature matching. Isolated words of the input speech signal are analysed to obtain
speech parameters such as Mel frequency cepstral coefficients (MFCC) or line spectral frequencies
(LSF). These parameters provide information about the dynamically changing vocal tract during
speech production and are then matched against previously stored patterns of spoken words to
recognize the closest match. Similar steps may be used for identity matching of a given speaker
[7]. There are two types of speech recognition: speaker-dependent and speaker-independent. A
speaker-dependent system works by learning the distinctive characteristics of a single speaker, as
in voice recognition, while speaker-independent systems involve no user-specific training, as they
are designed to recognize anyone's voice. As the acoustic spaces of the speakers are
multidimensional, reducing their dimensionality is very important [8]. The most common method for
training a speaker recognition system is the hidden Markov model (HMM), whose latest variant is
the combined HMM-GMM [9]. For better alignment or matching, normalization of the sub-band temporal
modulation envelopes may be used [10]. The main factors responsible for stagnation in the field of
speech recognition are environmental noise, channel distortion, and speaker variability [10],
[11], [12].
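As a concrete illustration of the feature extraction step, MFCC vectors can be computed per frame with the librosa library as sketched below; the file name, sampling rate, and coefficient count are illustrative assumptions rather than settings reported in this paper.

```python
import librosa

# Load an utterance (path and sampling rate are illustrative assumptions)
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs per frame; n_fft and hop_length set the frame size and overlap
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
print(mfcc.shape)  # (13, number_of_frames)
```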
Fig. 1 Blocks of speech recognition as a pattern matching problem [13].
Let us consider simple speech recognition as a pattern matching problem. As shown in fig. 1, the
system accepts an input speech waveform. The feature extraction sub-block converts the input
waveform into a set of feature vectors that represent speech characteristics suitable for
detection, and these are then matched against reference patterns to find the closest sample. If
the input speech is produced by the same speaker, or under environmental conditions similar to
those of the reference patterns, the system is likely to find the correct pattern; otherwise it is
not [13], [14]. A few valuable approaches used for feature extraction are full-band and subband
energies [15], spectrum divergence between the speech signal and the background noise [16], pitch
estimation [17], zero crossing rate [18], and higher-order statistics [19], [20], [21], [22].
Moreover, using long-term speech information [23], [24] has shown considerable benefits for
identifying speech signals in noisy environments. Several methods used for formulating the rules
in the decision module are based on distance measures such as the Euclidean distance [25] and the
Itakura-Saito and Kullback-Leibler divergences [26]. Other methods are based on fuzzy logic [27],
support vector machines (SVM) [28], and genetic algorithms [29], [30].
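The pattern matching idea above can be reduced to a toy sketch: given an input feature vector, return the reference pattern with the smallest Euclidean distance [25]. The labels and vectors here are invented purely for illustration.

```python
import numpy as np

def nearest_template(features, templates):
    """Return the label of the reference pattern closest in Euclidean distance."""
    return min(templates, key=lambda label: np.linalg.norm(features - templates[label]))

# Invented reference patterns and input vector, purely for illustration.
templates = {"yes": np.array([1.0, 0.2]), "no": np.array([-0.5, 0.9])}
print(nearest_template(np.array([0.8, 0.1]), templates))  # -> "yes"
```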
3. METHODOLOGY
The analysis of the speech signals at the phrase, word, and phoneme levels was carried out on raw
recordings of six speakers in the Hindi language. The speakers were of different age groups and
from different regions of Jammu. A Sony recorder (ICD-UX513F), a 4 GB UX digital voice recorder
with expandable memory that provides high-quality voice recording, was used for recording the
speech. The investigation further included segmentation and labeling of the recorded speech at the
phrase, word, and phoneme levels, feature extraction, alignment of the source and target feature
vectors using the DTW technique, and finally calculation of the Mahalanobis distance between them.
Fig. 2 shows a sub-section of a speech signal waveform and its spectrogram labeled at the phoneme
level.
Fig. 2 Sub-section of speech signal waveform and spectrogram labeled at phoneme level.
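For reference, the Mahalanobis distance between two aligned feature vectors x and y is defined as follows, where Σ is a covariance matrix of the features (the paper does not state how Σ was estimated):

```latex
d_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} \, \Sigma^{-1} \, (\mathbf{x} - \mathbf{y})}
```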
The main experiment was thus divided into two sections. In the first section, feature vectors were
extracted from the segmented source and target speech using the MFCC algorithm; the extracted
features were aligned by means of the DTW technique, and the alignment error was calculated using
the Mahalanobis distance. In the second section, the segmented source and target speech were first
analysed by the HNM module to obtain the HNM parameters, after which the MFCC features were
calculated; the extracted features were again aligned by means of DTW, and the alignment error was
calculated using the Mahalanobis distance. These steps were carried out at the phrase, word, and
phoneme levels of the speech signal.
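A minimal sketch of this alignment step is given below: the classic DTW recursion with a Mahalanobis frame distance. The random stand-in features and the covariance estimated from pooled source and target frames are assumptions for illustration; the paper does not specify these details.

```python
import numpy as np

def mahalanobis(x, y, vi):
    """Mahalanobis distance between two feature vectors given an inverse covariance."""
    d = x - y
    return float(np.sqrt(d @ vi @ d))

def dtw_align(X, Y, dist):
    """Classic DTW between sequences X (n, d) and Y (m, d).

    Returns the accumulated alignment cost and the optimal warping path
    as a list of (source_frame, target_frame) index pairs."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(X[i - 1], Y[j - 1])
            D[i, j] = c + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    path, i, j = [], n, m                      # backtrack from the end
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# Source and target MFCC sequences (random stand-ins for real features).
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((40, 13)), rng.standard_normal((55, 13))

# Inverse covariance estimated from the pooled frames (an assumption; the
# paper does not state how the covariance for the Mahalanobis distance was set).
vi = np.linalg.inv(np.cov(np.vstack([X, Y]).T))

cost, path = dtw_align(X, Y, lambda a, b: mahalanobis(a, b, vi))
print(f"alignment error: {cost:.2f} over {len(path)} aligned frame pairs")
```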
4. RESULTS AND DISCUSSION
The results were analysed for the techniques employed, namely MFCC, HNM, and DTW, for the
alignment of segmented speech at the phrase, word, and phoneme levels. The investigations were
carried out with male to male, female to male, and female to female speaker combinations. In order
to compare the alignment techniques, the mean and standard deviation of the Mahalanobis distances
were calculated. Alignment errors, measured as Mahalanobis distances, for the various male-male
combinations are shown in fig. 3 to fig. 5. The speaker pairs used as source and target are
written at the bottom of each plot; e.g. aman-s-nakul represents source-target alignment without
using HNM at the sentence level, while aman-sh-nakul represents source-target alignment using HNM
at the sentence level.
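The comparison statistic can be illustrated as follows; the pair names echo the labeling convention above, but the distance values are hypothetical.

```python
import numpy as np

# Hypothetical Mahalanobis distances along the DTW path for one speaker pair,
# without HNM ("-s-") and with HNM ("-sh-"); values are invented.
pair_distances = {
    "aman-s-nakul":  np.array([2.1, 1.8, 2.4, 2.0]),
    "aman-sh-nakul": np.array([1.3, 1.1, 1.6, 1.2]),
}
for pair, d in pair_distances.items():
    print(f"{pair}: mean = {d.mean():.2f}, std = {d.std():.2f}")
```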
From the graphs below it can easily be seen that the alignment error decreases at all segmentation
levels of speech when the HNM model is used, as the Mahalanobis distances reduce. Therefore, using
the HNM model, alignment can be further improved in comparison with feature extraction based on
MFCC alone. A decrease in the Mahalanobis distance means a better alignment between the two
sequences.
The overall comparisons of the speech alignment are shown in the form of bar graphs. It can be
seen that better speech alignment is achieved with sentence-level segmentation than with
word-level or phoneme-level segmentation of speech.
Fig. 3 MFCC based speech alignment errors using the Mahalanobis distance at the sentence level
with the male to male combination. The first column shows the results for alignment without the
use of the HNM based method and the second column shows the results for the HNM based method.
Fig. 4 MFCC based speech alignment errors using the Mahalanobis distance at the word level with
the male to male combination. The first column shows the results for alignment without the use of
the HNM based method and the second column shows the results for the HNM based method.
Fig. 5 MFCC based speech alignment errors using the Mahalanobis distance at the phoneme level with
the male to male combination. The first column shows the results for alignment without the use of
the HNM based method and the second column shows the results for the HNM based method.
Fig. 6 Bar graph illustrating the averaged MFCC based mean and standard deviation of the speech
alignment errors using the Mahalanobis distance at the sentence, word, and phoneme levels with all
male to male, female to male, and female to female combinations. The symbols s, w, p show the
results for alignment without the use of the HNM based method and the symbols s”, w”, p” show the
results for the HNM based method.
5. CONCLUSION
Speech signals in the form of twenty phrases were recorded from each of six speakers.
Investigations were carried out with three different speaker combinations: male to male, female to
male, and female to female. The feature vectors were extracted using the MFCC and HNM techniques.
Speech alignment errors, measured as Mahalanobis distances between labeled phrases, words, and
phonemes aligned by means of DTW, were calculated. From the analysis of the results it can be
observed that the alignment error with the HNM model decreases at all levels of labeled speech.
Therefore, by implementing the HNM model, the alignment error can be further reduced in comparison
with feature extraction based on MFCC alone. A reduction in alignment error means a better
alignment or matching between two speech signals. In brief, this research work concludes the
following:
1) HNM based alignment is more promising than the other techniques examined.
2) The accuracy of speech recognition is not considerably increased even by labeling the
phrases at the phoneme level. Effective speech alignment can also be obtained at the phrase
level, which saves valuable time and an unnecessary step in the algorithm.
3) It is remarkable that the speech alignment error for the female-male combination is much
larger than for the male-male and female-female combinations. Such a combination should
therefore be avoided in speech recognition and speaker transformation.
REFERENCES
[1] Zaher A A (2009) “Recent Advances in Signal Processing”, InTech, pp. 544.
[2] Tebelskis J (1995) “Speech Recognition using Neural Networks”, Ph.D. dissertation, Dept. School
of Computer Science, Carnegie Mellon University Pittsburgh, Pennsylvania.
[3] http://www.learnartificialneuralnetworks.com/speechrecognition.html
[4] Lindholm S (2010) “A speech recognition system for Swedish running on Android”.
[5] Ursin M (2002) “Triphone Clustering in Finnish Continuous Speech Recognition”, M.S. thesis,
Department of Computer Science, Helsinki University of Technology, Finland.
[6] Trabelsi I and Ayed D Ben (2012) “On the Use of Different Feature Extraction Methods for Linear
and Non Linear kernels”, Proc. IEEE, Sciences of Electronic, Technologies of Information and
Telecommunications.
[7] Rabiner L R (1989) “A tutorial on hidden Markov models and selected applications in speech
recognition”, Proceedings of the IEEE, vol. 77, pp. 257.
[8] Wang X & Paliwal K K (2003) “Feature extraction and dimensionality reduction algorithms and
their applications in vowel recognition”, Pattern Recognition, vol. 36, pp. 2429.
[9] Muller F & Mertins A (2011) “Contextual invariant-integration features for improved speaker-
independent speech recognition”, Speech Communication, pp. 830.
[10] Lu X, Unoki M & Nakamura S (2011) “Sub-band temporal modulation envelopes and their
normalization for automatic speech recognition in reverberant environments”, Computer Speech
and Language, pp. 571.
[11] Mporas I, Ganchev T, Kocsis O & Fakotakis N (2011) “Context-adaptive pre-processing scheme
for robust speech recognition in fast-varying noise environments”, Signal Processing, vol. 91, pp.
2101.
[12] Dai P & Soon I Y (2011) “A temporal warped 2D psychoacoustic modeling for robust speech
recognition system”, Speech Communication, vol. 53, pp. 229.
[13] Rabiner L R & Juang B H (1993) “Fundamentals of Speech Recognition”, Prentice-Hall.
[14] Doh S J (2000) “Enhancements to Transformation-Based Speaker Adaptation: Principal
Component and Inter-Class Maximum Likelihood Linear Regression”, Ph.D. dissertation, Department of
Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh.
[15] Woo K, Yang T, Park K & Lee C (2000) “Robust voice activity detection algorithm for estimating
noise spectrum”, Electronics Letters, vol. 36, pp. 180.
[16] Marzinzik M & Kollmeier B (2002) “Speech pause detection for noise spectrum estimation by
tracking power envelope dynamics”, IEEE Trans. Speech Audio Processing, vol. 10, pp. 341.
[17] Tucker R (1992) “Voice activity detection using a periodicity measure”, IEE Proceedings I
(Communications, Speech and Vision), vol. 139, pp. 377.
[18] Bachu R G, Kopparthi S, Adapa B & Barkana B D (2008) “Separation of Voiced and Unvoiced
Using Zero Crossing Rate and Energy of the Speech Signal”, Proc. American Society for
Engineering Education, Zone Conference, pp. 1.
[19] Nemer E, Goubran R & Mahmoud S (2001) “Robust voice activity detection using higher order
statistics in the lpc residual domain”, IEEE Trans. Speech Audio Processing, vol. 9, pp. 217.
[20] Ramirez J, Gorriz J, Segura J C, Puntonet C G & Rubio A (2006) “Speech/Non-speech
Discrimination based on Contextual Information Integrated Bispectrum LRT”, IEEE Signal
Processing Letters, vol. 13, pp. 497.
[21] Gorriz J M, Ramirez J, Puntonet C G & Segura J C (2006) “Generalized LRT-based voice activity
detector”, IEEE Signal Processing Letters, vol. 13, pp. 636.
[22] Ramírez J, Gorriz J M & Segura J C (2007) “Statistical Voice Activity Detection Based on
Integrated Bispectrum Likelihood Ratio Tests”, Journal of the Acoustical Society of America, vol.
121, pp. 2946.
[23] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2004), “Efficient Voice Activity Detection
Algorithms Using Long-Term Speech Information”, Speech Communication, vol. 42, pp. 271.
[24] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2005) “An Effective Subband OSF-based
VAD with Noise Reduction for Robust Speech Recognition”, IEEE Transactions on Speech and
Audio Processing, vol. 13, pp. 1119.
[25] Gorriz J M, Ramirez J, Segura J C & Puntonet C G (2006) “An effective cluster-based model for
robust speech detection and speech recognition in noisy environments”, Journal of the Acoustical
Society of America, vol. 120, pp. 470.
[26] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2004) “A New Kullback- Leibler VAD for
Robust Speech Recognition”, IEEE Signal Processing Letters, vol. 11, pp. 266.
[27] Beritelli F, Casale S, Rugeri G & Serrano S (2002) “Performance evaluation and comparison of
G.729/AMR/fuzzy voice activity detectors”, IEEE Signal Processing Letters, vol. 9, pp. 85.
[28] Ramirez J, Yelamos P, Gorriz J M & Segura J C (2006) “SVM-based Speech Endpoint Detection
Using Contextual Speech Features”, IEE Electronics Letters, vol. 42.
[29] Estevez P A, Yoma N, Boric N & Ramirez J A (2005) “Genetic programming based voice activity
detection”, Electronics Letters, vol. 41, pp. 1141.
[30] Ramirez J, Gorriz J M & Segura J C (2007) “Voice Activity Detection. Fundamentals and Speech
Recognition System Robustness”, Robust Speech Recognition and Understanding, I-TECH
Education and Publishing, pp. 1.

Contenu connexe

Tendances

Estimating the quality of digitally transmitted speech over satellite communi...
Estimating the quality of digitally transmitted speech over satellite communi...Estimating the quality of digitally transmitted speech over satellite communi...
Estimating the quality of digitally transmitted speech over satellite communi...Alexander Decker
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionCSCJournals
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...TELKOMNIKA JOURNAL
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancementIAESIJEECS
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based featuresijsc
 
Speaker recognition in android
Speaker recognition in androidSpeaker recognition in android
Speaker recognition in androidAnshuli Mittal
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...tsysglobalsolutions
 
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis SystemEvaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis SystemIJERA Editor
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency Phan Duy
 

Tendances (17)

Estimating the quality of digitally transmitted speech over satellite communi...
Estimating the quality of digitally transmitted speech over satellite communi...Estimating the quality of digitally transmitted speech over satellite communi...
Estimating the quality of digitally transmitted speech over satellite communi...
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Ijetcas14 426
Ijetcas14 426Ijetcas14 426
Ijetcas14 426
 
52 57
52 5752 57
52 57
 
01 8445 speech enhancement
01 8445 speech enhancement01 8445 speech enhancement
01 8445 speech enhancement
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Speaker recognition in android
Speaker recognition in androidSpeaker recognition in android
Speaker recognition in android
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
 
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis SystemEvaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Speaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract FeaturesSpeaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract Features
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
 

En vedette

Presentación Estadística
Presentación EstadísticaPresentación Estadística
Presentación Estadísticaaamutuy
 
Aportes políticos y pedagócios
Aportes políticos y pedagóciosAportes políticos y pedagócios
Aportes políticos y pedagóciosgisepetracca
 
Product Work Log
Product Work Log Product Work Log
Product Work Log crobert18
 
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...CIM Grupo de Formación
 
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali, ...
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali,  ...Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali,  ...
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali, ...ProColombia
 
Instruccions inici de curs sec 12 13 (1)
Instruccions inici de curs sec  12 13 (1)Instruccions inici de curs sec  12 13 (1)
Instruccions inici de curs sec 12 13 (1)equiplicvoc
 
Workplace Pandemic
Workplace PandemicWorkplace Pandemic
Workplace Pandemicmulfingerj
 
13784156 Drug Decriminalization In Portugal Lessons For Creating Fair A...
13784156  Drug  Decriminalization In  Portugal  Lessons For  Creating  Fair A...13784156  Drug  Decriminalization In  Portugal  Lessons For  Creating  Fair A...
13784156 Drug Decriminalization In Portugal Lessons For Creating Fair A...Edward Dobson
 
Grace abbinante miss peregrine's home for peculiar children
Grace abbinante miss peregrine's home for peculiar childrenGrace abbinante miss peregrine's home for peculiar children
Grace abbinante miss peregrine's home for peculiar childrenRochesspp
 
Final INTED 2013 presentation 040313
Final INTED 2013 presentation 040313Final INTED 2013 presentation 040313
Final INTED 2013 presentation 040313Halafi01
 
Analisis de articulo Nativos Digitales
Analisis de articulo Nativos DigitalesAnalisis de articulo Nativos Digitales
Analisis de articulo Nativos DigitalesMARINELYS
 
A. 150 herramientas para crear materiales educativos con tics.
A.  150 herramientas para crear materiales educativos con tics.A.  150 herramientas para crear materiales educativos con tics.
A. 150 herramientas para crear materiales educativos con tics.imac_angel
 
Usos PedagóGicos Del Blog De Aula
Usos PedagóGicos Del Blog De AulaUsos PedagóGicos Del Blog De Aula
Usos PedagóGicos Del Blog De AulaJorge Halpern
 

En vedette (20)

Bachelorthese
BachelortheseBachelorthese
Bachelorthese
 
Presentación Estadística
Presentación EstadísticaPresentación Estadística
Presentación Estadística
 
Construct Gmay10
Construct Gmay10Construct Gmay10
Construct Gmay10
 
The Country Brand of Portugal
The Country Brand of PortugalThe Country Brand of Portugal
The Country Brand of Portugal
 
Mobi360 user guide
Mobi360 user guideMobi360 user guide
Mobi360 user guide
 
Aportes políticos y pedagócios
Aportes políticos y pedagóciosAportes políticos y pedagócios
Aportes políticos y pedagócios
 
Product Work Log
Product Work Log Product Work Log
Product Work Log
 
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...
CIM Formación: Legislación sobre animales de compañía de la Generalitat Valen...
 
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali, ...
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali,  ...Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali,  ...
Oportunidades alianza del pacífico para perú, manufacturas y prendas, cali, ...
 
Instruccions inici de curs sec 12 13 (1)
Instruccions inici de curs sec  12 13 (1)Instruccions inici de curs sec  12 13 (1)
Instruccions inici de curs sec 12 13 (1)
 
Workplace Pandemic
Workplace PandemicWorkplace Pandemic
Workplace Pandemic
 
Syllabus
SyllabusSyllabus
Syllabus
 
Grammar
GrammarGrammar
Grammar
 
13784156 Drug Decriminalization In Portugal Lessons For Creating Fair A...
13784156  Drug  Decriminalization In  Portugal  Lessons For  Creating  Fair A...13784156  Drug  Decriminalization In  Portugal  Lessons For  Creating  Fair A...
13784156 Drug Decriminalization In Portugal Lessons For Creating Fair A...
 
Grace abbinante miss peregrine's home for peculiar children
Grace abbinante miss peregrine's home for peculiar childrenGrace abbinante miss peregrine's home for peculiar children
Grace abbinante miss peregrine's home for peculiar children
 
Final INTED 2013 presentation 040313
Final INTED 2013 presentation 040313Final INTED 2013 presentation 040313
Final INTED 2013 presentation 040313
 
Lynbrook | Module #11 - Beating Stress
Lynbrook | Module #11 - Beating StressLynbrook | Module #11 - Beating Stress
Lynbrook | Module #11 - Beating Stress
 
Analisis de articulo Nativos Digitales
Analisis de articulo Nativos DigitalesAnalisis de articulo Nativos Digitales
Analisis de articulo Nativos Digitales
 
A. 150 herramientas para crear materiales educativos con tics.
A.  150 herramientas para crear materiales educativos con tics.A.  150 herramientas para crear materiales educativos con tics.
A. 150 herramientas para crear materiales educativos con tics.
 
Usos PedagóGicos Del Blog De Aula
Usos PedagóGicos Del Blog De AulaUsos PedagóGicos Del Blog De Aula
Usos PedagóGicos Del Blog De Aula
 

Similaire à EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS

Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemeskevig
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...IDES Editor
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accentsipij
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech SynthesisCSCJournals
 
Machine learning for Arabic phonemes recognition using electrolarynx speech
Machine learning for Arabic phonemes recognition using  electrolarynx speechMachine learning for Arabic phonemes recognition using  electrolarynx speech
Machine learning for Arabic phonemes recognition using electrolarynx speechIJECEIAES
 
Speech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineSpeech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineIJECEIAES
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingiosrjce
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...csandit
 
Speaker Identification
Speaker IdentificationSpeaker Identification
Speaker Identificationsipij
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01ijcsit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifiereSAT Publishing House
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifiereSAT Journals
 

Similaire à EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (20)

Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech Synthesis
 
Machine learning for Arabic phonemes recognition using electrolarynx speech
Machine learning for Arabic phonemes recognition using  electrolarynx speechMachine learning for Arabic phonemes recognition using  electrolarynx speech
Machine learning for Arabic phonemes recognition using electrolarynx speech
 
Speech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machineSpeech emotion recognition with light gradient boosting decision trees machine
Speech emotion recognition with light gradient boosting decision trees machine
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
H010625862
H010625862H010625862
H010625862
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law companding
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
Speaker Identification
Speaker IdentificationSpeaker Identification
Speaker Identification
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifier
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifier
 

Dernier

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Dernier (20)

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 DOI : 10.5121/ijnlc.2013.2205 51 EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS Jang Bahadur Singh, Radhika Khanna and Parveen Lehana* Department of Physics and Electronics, University of Jammu, Jammu, India E-mail: sonajbs@gmail.com, pklehanajournals@gmail.com. ABSTRACT The fundamental techniques used for man-machine communication include Speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signals. The HNM analyses and synthesis provides high quality speech with less number of parameters. Dynamic time warping is well known technique used for aligning two given multidimensional sequences. It locates an optimal match between the given sequences. The improvement in the alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrases, words, and phonemes based alignments. The speech signals in the form of twenty five phrases were recorded. The recorded material was segmented manually and aligned at sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in case of HNM parametric domain. It has been seen that effective speech alignment can be carried out even at phrase level. KEYWORDS Speech recognition, Speech transformation, MFCC, HNM, DTW 1. INTRODUCTION Speech is one of the most dominating and natural means of communicate or express thoughts, ideas, and emotions between individuals [1]. It is a complicated signal, naturally produced by human beings because various processes are involved in the generation of speech signal. As a result, verbal communication can be varied extensively in terms of their accent, pronunciation, articulation, nasality, pitch, volume, and speed [2], [3]. Speech signal processing is one of the biggest developing areas of research in signal processing. The aim of digital speech signal processing is to take advantage of digital computing techniques to process the speech signal for increased understanding, improved communication, and increased efficiency and productivity associated with speech activities. The major fields of speech signal processing consists of speech synthesis, speech recognition, and speech/speaker transformation. In order to extract features from speech signal it has to be divided into fixed frames called windows. The types of the window function used are Hanning, Hamming, cosine, half parallelogram, not with right angles. Length of frame can be varied, depending upon nature of feature vectors to be extracted [4] Overlapping of two consecutive windows is done in order to maintain continuity. Feature extraction approaches mostly depend upon modeling human voice production and modeling perception system [5]. They provide a compressed representation of the speech signal by extracting a sequence of feature vectors [6].
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 52 2. SPEECH RECOGNITION It is an alternative and effective method of interacting with a machine. It is also known as automatic speech recognition (ASR) converts spoken language in text. It has been two decades since the ASR have started moving from research labs to real-world. Speech recognition is more difficult than speech generation because of the fact that computers can store and recall enormous amounts of data, perform mathematical computations at very high speed, and do repetitive tasks without losing any type of efficiency. Because of these limitations, the accuracy of speech recognition is reduced. There are two main steps for speech recognition: feature extraction and feature matching. Isolated words of the input speech signal is analysed to obtain the speech parameters using Mel frequency cepstral coefficients (MFCC) or line spectral frequencies (LSF). These parameters provide the information related to the dynamically changing vocal tract during speech production. These parameters are then matched up to with earlier pattern of spoken words to recognize the closest match. Similar steps may be used for identity matching of a given speaker [7]. There are two types of speech recognition: speaker-dependent and speaker-independent. Speaker-dependent technique works by learning the distinctiveness of a single speaker like in case of voice recognition, while speaker-independent systems involves no training as they are designed to recognize anyone's voice. As the acoustic spaces of the speakers are multidimensional, reduction of their dimensionality is very important [8]. The most common method for training of the speaker recognition system is hidden Markov model (HNM) and its latest variant is (HMM-GMM) [9]. For better alignment or matching, normalization of the sub- band temporal modulation envelopes may be used [17]. The main factors responsible for the stagnation in the fields of speech recognition are environmental noise, channel distortion, and speaker variability [10], [11], [12]. Fig. 1 Blocks of speech recognition as pattern matching problem [13]. Let us consider simple speech recognition as a pattern matching problem. As shown in fig. 1, the system accepts an input speech waveform. The feature extraction sub-block converts input speech waveform to a set of feature vectors. These vectors represent speech characteristics/features suitable for detection and then match up to it with reference patterns to find a closest sample. If the input speech produced by the alike speaker or similar environmental condition from the reference patterns, the system is likely to find the correct pattern otherwise not [13], [14]. The few valuable approaches used for feature extraction are: full-band and subband energies [15], spectrum divergence between speech signal and noise in background [16], pitch estimation [17], zero crossing rate [18], and higher-order statistics [19], [20], [21], [22]. However, using long- term speech information [23], [24] has shown considerable benefits for identifying speech signal in noisy environments. Several methods used for the formulation of the rules in decision module based on distance measures techniques like the Euclidean distance [25] Itakura-Saito and Kullback-Leibler divergence [26]. Some other method are based on fuzzy logic [27] [support vector machines (SVM) [28] and genetic algorithms [29], [30].
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 53 3. METHODOLOGY The analysis process of the speech signals at phrase, word and phoneme levels was carried out with the raw recordings of six speakers in Hindi language. Speakers were of different age group, from different regions of Jammu. Sony recorder (ICD-UX513F) has been used for the recording of speech. It is a 4GB UX Digital Voice Recorder with expandable memory capabilities, and provides high quality voice recording. The investigation further includes the segmentation, labeling of the recorded speech at phrase, word and phoneme levels. Fig. 2 shows sub-section of speech signal waveform and spectrogram labeled at phoneme level. Feature extraction and lastly alignment of source and target feature vectors using DTW technique and then calculating Mahalanobis distance between them. Fig. 2 Sub-section of speech signal waveform and spectrogram labeled at phoneme level. Thus main experiment is separated into two main sections. In first section, segmented source and target speech, features vectors were extracted using MFCC algorithms. The features extracted were aligned separately by means of DTW techniques and alignment error was calculated using Mahalanobis distance. In second section, segmented source and target speech were analysed first by HNM module in order to obtain the HNM parameters afterward MFCC features were calculated. The extracted features were aligned by means of DTW techniques and alignment error was calculated using Mahalanobis distance. These steps were implemented on the phrase, word and phoneme levels of the speech signal. 4. RESULTS AND DISCUSSION In order to analyse the results based on the techniques namely MFCC, HNM, and DTW for the alignment of segmented speech at phrase, word, and phoneme levels. These investigations were carried out with male to male, female to male, and female to female speaker combinations. In order to compare alignment techniques, mean and standard deviation of Mahalanobis distances were calculated. Alignment error using Mahalanobis distances of various male-male combinations are represented from fig. 3 to fig. 5. The various combinations of speaker pair’s use as source and targets were written on the bottom of each plot, e.g. aman-s-nakul represents source target alignment without using HNM at sentence level while aman-sh-nakul represents source target alignment using HNM at sentence level.
From the graphs below it can readily be seen that the alignment error decreases at every segmentation level of the speech when the HNM model is used, since the Mahalanobis distances are reduced. The HNM model therefore improves the alignment further in comparison with feature extraction based on MFCC alone; a decrease in the Mahalanobis distance means better alignment between the two sequences. The overall comparisons of the speech alignment are shown as bar graphs, from which it can be seen that better speech alignment is achieved with sentence-level segmentation than with word-level or phoneme-level segmentation of the speech.

Fig. 3 MFCC based speech alignment errors using Mahalanobis distance at sentence level with male to male combination. The first column shows the results for alignment without the use of the HNM based method and the second column shows the results for the HNM based method.

Fig. 4 MFCC based speech alignment errors using Mahalanobis distance at word level with male to male combination. The first column shows the results for alignment without the use of the HNM based method and the second column shows the results for the HNM based method.
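Before the phoneme-level results of Fig. 5 and the overall summary of Fig. 6, it may help to sketch how such a mean-and-standard-deviation comparison is assembled; a minimal sketch, assuming matplotlib for plotting and placeholder arrays standing in for the measured per-phrase distances.

```python
# Sketch of a Fig. 6-style summary: mean and standard deviation of
# Mahalanobis distances, with and without HNM, at each segmentation level.
# The random arrays below are placeholders for the measured values.
import numpy as np
import matplotlib.pyplot as plt

levels = ["sentence", "word", "phoneme"]
md = {lvl: {"mfcc": np.random.rand(25), "hnm": np.random.rand(25) * 0.8}
      for lvl in levels}   # placeholder data: one distance per phrase

means_mfcc = [md[l]["mfcc"].mean() for l in levels]
stds_mfcc  = [md[l]["mfcc"].std()  for l in levels]
means_hnm  = [md[l]["hnm"].mean()  for l in levels]
stds_hnm   = [md[l]["hnm"].std()   for l in levels]

x = np.arange(len(levels))
plt.bar(x - 0.2, means_mfcc, 0.4, yerr=stds_mfcc, label="MFCC only")
plt.bar(x + 0.2, means_hnm,  0.4, yerr=stds_hnm,  label="HNM + MFCC")
plt.xticks(x, levels)
plt.ylabel("Mean Mahalanobis distance")
plt.legend()
plt.show()
```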
Fig. 5 MFCC based speech alignment errors using Mahalanobis distance at phoneme level with male to male combination. The first column shows the results for alignment without the use of the HNM based method and the second column shows the results for the HNM based method.

Fig. 6 Bar graph illustrating the averaged MFCC based mean and standard deviation of speech alignment errors using Mahalanobis distance at sentence, word, and phoneme levels with all male to male, female to male, and female to female combinations. The symbols s, w, p show the results for alignment without the use of the HNM based method and the symbols s”, w”, p” show the results for the HNM based method.

5. CONCLUSION

The speech signals in the form of twenty-five phrases were recorded from each of six speakers. Investigations were carried out with three different speaker combinations: male to male, female to male, and female to female. The feature vectors were extracted using the MFCC and HNM
techniques. Speech alignment errors, measured as Mahalanobis distances between the labeled phrases, words, and phonemes aligned by means of DTW, were calculated. From the analysis of the results it can be observed that the alignment error with the HNM model decreases at all levels of labeled speech. Therefore, by implementing the HNM model, the alignment error can be reduced further in comparison with feature extraction based on MFCC alone. A reduction in the alignment error means better alignment or matching between two speech signals. In brief, this research work leads to the following conclusions:

1) HNM based alignment is more promising than the other techniques examined.

2) The accuracy of speech recognition is not considerably increased even by labeling the phrases at phoneme level. Effective speech alignment can be obtained at phrase level as well, which saves valuable time and an unnecessary step in the algorithm.

3) It is remarkable that the speech alignment error in the female-male case is much larger than in the male-male and female-female combinations. Such a combination should therefore be avoided in speech recognition and speaker transformation.

REFERENCES

[1] Zaher A A (2009) “Recent Advances in Signal Processing”, InTech, pp. 544.
[2] Tebelskis J (1995) “Speech Recognition using Neural Networks”, Ph.D. dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.
[3] http://www.learnartificialneuralnetworks.com/speechrecognition.html
[4] Lindholm S (2010) “A speech recognition system for Swedish running on Android”.
[5] Ursin M (2002) “Triphone Clustering in Finnish Continuous Speech Recognition”, M.S. thesis, Department of Computer Science, Helsinki University of Technology, Finland.
[6] Trabelsi I & Ayed D Ben (2012) “On the Use of Different Feature Extraction Methods for Linear and Non Linear Kernels”, Proc. IEEE, Sciences of Electronics, Technologies of Information and Telecommunications.
[7] Rabiner L R (1989) “A tutorial on hidden Markov models and selected applications in speech recognition”, Proc. IEEE, vol. 77, pp. 257-286.
[8] Wang X & Paliwal K K (2003) “Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition”, Pattern Recognition, vol. 36, pp. 2429.
[9] Muller F & Mertins A (2011) “Contextual invariant-integration features for improved speaker-independent speech recognition”, Speech Communication, pp. 830.
[10] Lu X, Unoki M & Nakamura S (2011) “Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments”, Computer Speech and Language, pp. 571.
[11] Mporas I, Ganchev T, Kocsis O & Fakotakis N (2011) “Context-adaptive pre-processing scheme for robust speech recognition in fast-varying noise environments”, Signal Processing, vol. 91, pp. 2101.
[12] Dai P & Soon I Y (2011) “A temporal warped 2D psychoacoustic modeling for robust speech recognition system”, Speech Communication, vol. 53, pp. 229.
[13] Rabiner L R & Juang B H (1993) “Fundamentals of Speech Recognition”, Prentice-Hall.
[14] Doh S J (2000) “Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression”, Ph.D. dissertation, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh.
[15] Woo K, Yang T, Park K & Lee C (2000) “Robust voice activity detection algorithm for estimating noise spectrum”, Electronics Letters, vol. 36, pp. 180.
[16] Marzinzik M & Kollmeier B (2002) “Speech pause detection for noise spectrum estimation by tracking power envelope dynamics”, IEEE Trans. Speech Audio Processing, vol. 10, pp. 341.
[17] Tucker R (1992) “Voice activity detection using a periodicity measure”, IEE Proceedings I (Communications, Speech and Vision), vol. 139, pp. 377.
[18] Bachu R G, Kopparthi S, Adapa B & Barkana B D (2008) “Separation of Voiced and Unvoiced Using Zero Crossing Rate and Energy of the Speech Signal”, Proc. American Society for Engineering Education (ASEE) Zone Conference, pp. 1.
[19] Nemer E, Goubran R & Mahmoud S (2001) “Robust voice activity detection using higher order statistics in the LPC residual domain”, IEEE Trans. Speech Audio Processing, vol. 9, pp. 217.
[20] Ramirez J, Gorriz J M, Segura J C, Puntonet C G & Rubio A (2006) “Speech/Non-speech Discrimination based on Contextual Information Integrated Bispectrum LRT”, IEEE Signal Processing Letters, vol. 13, pp. 497.
[21] Gorriz J M, Ramirez J, Puntonet C G & Segura J C (2006) “Generalized LRT-based voice activity detector”, IEEE Signal Processing Letters, vol. 13, pp. 636.
[22] Ramirez J, Gorriz J M & Segura J C (2007) “Statistical Voice Activity Detection Based on Integrated Bispectrum Likelihood Ratio Tests”, Journal of the Acoustical Society of America, vol. 121, pp. 2946.
[23] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2004) “Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information”, Speech Communication, vol. 42, pp. 271.
[24] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2005) “An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition”, IEEE Transactions on Speech and Audio Processing, vol. 13, pp. 1119.
[25] Gorriz J M, Ramirez J, Segura J C & Puntonet C G (2006) “An effective cluster-based model for robust speech detection and speech recognition in noisy environments”, Journal of the Acoustical Society of America, vol. 120, pp. 470.
[26] Ramirez J, Segura J C, Benitez C, Torre A & Rubio A (2004) “A New Kullback-Leibler VAD for Robust Speech Recognition”, IEEE Signal Processing Letters, vol. 11, pp. 266.
[27] Beritelli F, Casale S, Ruggeri G & Serrano S (2002) “Performance evaluation and comparison of G.729/AMR/fuzzy voice activity detectors”, IEEE Signal Processing Letters, vol. 9, pp. 85.
[28] Ramirez J, Yelamos P, Gorriz J M & Segura J C (2006) “SVM-based Speech Endpoint Detection Using Contextual Speech Features”, IEE Electronics Letters, vol. 42.
[29] Estevez P A, Yoma N, Boric N & Ramirez J A (2005) “Genetic programming based voice activity detection”, Electronics Letters, vol. 41, pp. 1141.
[30] Ramirez J, Gorriz J M & Segura J C (2007) “Voice Activity Detection: Fundamentals and Speech Recognition System Robustness”, Robust Speech Recognition and Understanding, I-TECH Education and Publishing, pp. 1.