SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
International Journal of Computer Applications (0975 – 8887)
International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014)
39
Voiced/Unvoiced Detection using Short Term
Processing
S. Nandhini A. Shenbagavalli, Ph.D.
PG Scholar, Dept of ECE Professor and Head of ECE
National Engineering College National Engineering College
Kovilpatti, Tamilnadu, India. Kovilpatti, Tamilnadu, India.
ABSTRACT
A new method for identifying voiced and unvoiced speech
region is proposed. Voiced/unvoiced speech detection is
needed to extract information from the speech signal and it
is important in the area of speech analysis. Voiced and
unvoiced speech region has been identified using Short
Term Processing (STP) in this paper. Short Term
Processing of speech has been performed by viewing the
speech signal in frames, which has a size of 10-30ms.
Short Term Processing has been performed in both time
domain and frequency domain. Short Term Energy (STE),
Short Term Zero Crossing Rate (STZCR) and Short Term
Autocorrelation (STA) are computed from the time domain
processing of speech. The spectral components in the
speech signal are not apparent in the time domain. Hence,
frequency domain representation is needed which is
achieved using fourier transform. Conventional fourier
representation is inadequate to provide information about
the time varying nature of spectral components present in
speech. So, short term version of fourier transform is
needed, which is named as Short Term Fourier Transform
(STFT).
Keywords
Voiced speech, Unvoiced speech, Short Term Energy,
Short Term Zero Crossing Rate, Short Term
Autocorrelation, Short Term Fourier Transform.
1. INTRODUCTION
Speech is an acoustic signal produced from the speech
production system. In the human speech production
system, air enters into the lungs during inhalation via
articulators and speech organs. Articulators mainly include
tongue, lips and jaw. The speech production organs
include pharynx, glottis, vocal cord, larynx and trachea.
This air is expelled out from the lungs through trachea and
causes the vocal cord to vibrate. During this process, the
air flow is chopped up into quasi-periodic pulses to
generate the required excitation signal. Finally, the pulses
are frequency shaped by the oral cavity and the nasal
cavity to produce the desired speech signal.
The speech production system produces the output speech
signal by responding to the input excitation signal.
Depending upon the input excitation signal, the output
speech signal from the speech production system is
broadly divided into three classes which are discussed in
section 2.
Identification of voiced/unvoiced speech mainly depends
on the nature of the speech waveform, energy associated
with the speech waveform [1], [2], [7], [8] number of zero
crossings present in the speech waveform [1], [2], [5], [7],
[8] and the correlation among the successive samples in
the speech waveform [2], [5], [8].
Detection of voiced/unvoiced speech in the speech
waveform is important in speech analysis system. In
speech analysis, the voiced/unvoiced detection is usually
performed in extracting information from the speech
signals. But the voiced/unvoiced detection is critical,
because it is essential to know whether the speech
production system involves vibration of the vocal cords
[2], [10]. The periodicity of the vocal tract vibration makes
the voiced speech segment periodic and so distinguishable
from the noise-like unvoiced speech segments [6].
This paper describes about the three classes of speech
signal in section 2. Section 3 focuses on Short Term
Processing of speech. Short Term Processing has been
performed in both time domain and frequency domain.
Parameters like Short Term Energy (STE), Short Term
Zero Crossing Rate (STZCR) and Short Term
Autocorrelation (STA) are computed from the time
domain. Short Term Fourier Transform is computed from
the frequency domain. The results are discussed in section
4.
2. THREE CLASSES OF SPEECH
SIGNAL
The three classes are named as voiced speech, unvoiced
speech and non-speech (silence). In the first class (voiced
speech), the input excitation is nearly periodic in nature. In
the second class (unvoiced speech), the input excitation is
random noise-like nature. In the third class (non-speech),
there is no excitation to the system.
2.1 Voiced Speech
When the input excitation to the speech production system
is nearly periodic impulse sequence, then the output speech
looks visually like a periodic signal and is termed as
voiced speech.
Fig 1: Block diagram representation of voiced speech
production
International Journal of Computer Applications (0975 – 8887)
International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014)
40
During the production of voiced speech, the air breathe out
of lungs through the trachea is interrupted periodically by
the vibrating vocal cords. Due to this interruption, glottal
wave is generated which excites the speech production
system to produce voiced speech.
Voiced speech can be identified by the periodic nature of
the speech waveform, relatively high energy associated
with the speech waveform [7], [8], less number of zero
crossings present in the speech waveform [3], [5], [7], [8]
and more correlation among the successive samples in the
speech waveform [8].
Due to the periodic nature of voiced speech, a fundamental
frequency (pitch frequency) and its harmonics can be
predicted in the spectrum of voiced speech. The presence
of harmonic structure in the voiced spectrum is another
distinguishing factor.
2.2 Unvoiced Speech
When the input excitation to the speech production system
is random noise-like structure, then the output speech will
also be random noise-like without any periodic nature and
is termed as unvoiced Speech.
Fig 2: Block diagram representation of unvoiced speech
production
During the production of unvoiced speech, the air breathe
out of lungs through the trachea is not interrupted by the
vibrating vocal cords. However, obstruction of air flow
occurs scarcely from glottis somewhere along the length of
vocal tract to produce total or partial closure. Due to this
modification of air flow, stop or frication excitation
occurs. This excites the vocal tract system to produce
unvoiced speech.
Unvoiced speech can be identified by the non-periodic and
noise-like nature of the speech waveform, relatively low
energy associated with the speech waveform when
compared to voiced speech [7], [8] more number of zero
crossings present in the speech waveform when compared
to voiced speech [3], [5], [7], [8] and relatively less
correlation among successive samples in the speech
waveform [8].
Due to the noise-like nature of unvoiced speech, there is no
fundamental frequency (pitch frequency) and its harmonics
in the spectrum of unvoiced speech. The absence of
harmonic structure in the unvoiced spectrum is another
distinguishing factor.
2.3 Non-Speech (Silence)
When there is no input excitation to the speech production
system, then the output corresponds to no speech and is
termed as non-speech. Non-speech (silence) is present
between voiced and unvoiced speech. Hence, speech will
not be intelligible without the presence of non-speech
between voiced and unvoiced speech.
Fig 3: Block diagram representation of non-speech
(silence) production
During the production of non-speech, there is no excitation
supplied to the vocal tract and hence there is no speech
output.
Non-speech is identified by the absence of any signal
characteristics, lowest energy associated with the speech
waveform when compared to unvoiced and voiced speech
[8], relatively more number of zero crossings present in the
speech waveform when compared to unvoiced speech [3],
[8] and no correlation among successive samples in the
speech waveform [8].
Due to the absence of speech signal, there is no output in
the spectrum of non-speech. This is another distinguishing
factor for non-speech.
3. SHORT TERM PROCESSING
Speech is produced from a time varying vocal tract system
with time varying excitation. Due to this, the speech signal
is non-stationary in nature. Speech signal is stationary
when it is viewed in blocks of 10-30msec [8]. Short Term
Processing divides the input speech signal into short
analysis segments that are isolated and processed with
fixed (non-time varying) properties. These short analysis
segments called as analysis frames almost always overlap
one another. Short Term Processing of speech is performed
both in time domain and in frequency domain [4].
3.1 Time Domain Processing of Speech
Short Term Energy (STE), Short Term Zero Crossing Rate
(STZCR) and Short Term Autocorrelation (STA) are
computed from the time domain processing of speech.
3.1.1 Short Term Energy (STE)
Speech is time varying in nature. The energy associated
with voiced speech is large when compared to unvoiced
speech [7]. Silence speech will have least or negligible
energy when compared to unvoiced speech [8]. Hence,
Short Term Energy can be used for voiced, unvoiced and
silence classification of speech.
Short Term Energy is derived from the following equation,
𝐸 𝑇 = 𝑠2
𝑚∞
𝑚=−∞ (1)
where 𝐸 𝑇 is the total energy and 𝑠(𝑚) is the discrete time
signal [4], [8].
For Short Term Energy computation, speech is considered
in terms of short analysis frames whose size typically
ranges from 10-30 msec. Consider the samples in a frame
of speech as m=0 to m=N-1, where N is the length of
frame. Hence, equation (1) is written as,
𝐸 𝑇 = 𝑠2
−1
𝑚=−∞
𝑚 + 𝑠2
𝑚 + 𝑠2
∞
𝑚=𝑁
𝑁−1
𝑚=0
(𝑚)
(2)
The speech sample is zero outside the frame length. Hence,
equation (2) is written as,
𝐸 𝑇 = 𝑠2
(𝑚)𝑁−1
𝑚=0 (3)
The relation in equation (3) gives the total energy present
in the frame of speech from m=0 to m=N-1.
Short Term Energy is defined as the sum of squares of the
samples in a frame and it is given by,
No Input
Excitation
Speech production
system
No Speech
Output
International Journal of Computer Applications (0975 – 8887)
International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014)
41
𝑒(𝑛) = [𝑠𝑛 𝑚 ]2∞
𝑚=−∞ (4)
After framing and windowing, the 𝑛 𝑡𝑕
frame speech
becomes 𝑠 𝑛(𝑚) = 𝑠(𝑚). 𝑤(𝑛 − 𝑚) and hence Short
Term Energy in equation (4) is given by [4], [7], [8],
𝑒(𝑛) = [𝑠 𝑚 . 𝑤 𝑛 − 𝑚 ]2∞
𝑚=−∞ (5)
where 𝑤(𝑛) represents the windowing function of finite
duration and 𝑛 represents the frame shift or rate in number
of samples. This frame shift is as small as one sample or as
large as frame size.
3.1.2 Short Term Zero Crossing Rate (STZCR)
Zero Crossing Rate is defined as the number of times the
zero axes is crossed per frame. If the number of zero
crossings is more in a given signal, then the signal is
changing rapidly and accordingly the signal may contain
high frequency information which is termed as unvoiced
speech. On the other hand, if the number of zero crossing
is less, then the signal is changing slowly and accordingly
the signal may contain low frequency information which is
termed as voiced speech [7], [8]. Thus, Zero Crossing Rate
gives indirect information about the frequency content of
the signal. Zero Crossing Rate is computed using typical
frame size of 10-30msec with half the frame size as frame
shift.
Short Term Zero Crossing Rate is defined as the weighted
average of number of times the speech signal changes sign
within the time window [4] and it is given by [1], [4], [5],
[7],
𝑍 𝑛 = │𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│. 𝑤(𝑛 − 𝑚)
∞
𝑚=−∞
(6)
where 𝑠𝑔𝑛[𝑠 𝑚 ] = 1 𝑖𝑓 𝑠 𝑚 ≥ 0
= −1 𝑖𝑓 𝑠(𝑚) < 0,
and 𝑤 𝑛 =
1
2𝑁
𝑖𝑓 0 ≤ 𝑛 ≤ 𝑁 − 1
= 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒.
If successive samples 𝑠(𝑚) and 𝑠(𝑚 − 1) have different
algebraic signs then,
│𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│ in (6) becomes equal to 1
and hence Zero Crossing Rate is counted.
If successive samples 𝑠(𝑚) and 𝑠(𝑚 − 1) have same
algebraic signs then,
│𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│ in (6) becomes equal to 0
and hence Zero Crossing Rate is not counted.
3.1.3 Short Term Autocorrelation (STA)
Correlation can be used for finding the similarity among
one or two sequences. Autocorrelation is achieved by
providing different time lag for the sequence and
computing with the given sequence as reference. The
autocorrelation 𝑟𝑋𝑋 𝑘 of a stationary sequence is given
by,
𝑟𝑋𝑋 𝑘 = 𝑋 𝑚 . 𝑋(𝑚 + 𝑘)∞
𝑚=−∞ (7)
Due to the non-stationary nature of speech, a short term
version of the autocorrelation is needed. The Short Term
Autocorrelation of a non-stationary sequence 𝑠(𝑚) is
defined as the deterministic autocorrelation function of the
sequence 𝑠 𝑛(𝑚) = 𝑠(𝑚). 𝑤(𝑛 − 𝑚) that is selected by
the window shifted to time 𝑛 and it is given by [4], [8],
𝑟𝑠𝑠 𝑛, 𝑘 = [𝑠 𝑚 . 𝑤 𝑛 − 𝑚∞
𝑚=−∞ ]. [𝑠 𝑘 +
𝑚 . 𝑤 𝑛 − 𝑘 − 𝑚 ] (8)
where 𝑠 𝑛 𝑚 = 𝑠 𝑚 . 𝑤 𝑛 − 𝑚 is the windowed version
of 𝑠 𝑚 .
The nature of Short Term Autocorrelation is primarily
different for voiced and unvoiced speech. The
autocorrelation of voiced speech will have periodic
waveform representing the pitch period [6], [9]. The
autocorrelation of unvoiced speech will have random
noise-like structure. Therefore, information from the
autocorrelation sequence is used for discriminating voiced
and unvoiced speech.
3.2 Frequency Domain Processing of
Speech
The short term time domain analysis is useful for
computing the time domain features like energy, zero
crossing rate and autocorrelation at the gross level. The
different frequency or spectral components that are present
in the speech signal are not directly apparent in the time
domain. Hence, frequency domain representation is needed
which is achieved using fourier transform. The
conventional fourier representation is inadequate to
provide information about the time varying nature of
spectral information present in speech. Hence, short term
version of fourier transform commonly known as Short
Term Fourier Transform (STFT) is needed [4].
3.2.1 Discrete Time Fourier Transform (DTFT)
Discrete Time Fourier Transform (DTFT) is used to obtain
the frequency domain representation [4]. If 𝑋(𝑤) is the
Discrete Time Fourier Transform of 𝑥(𝑛), then the DTFT
is given by,
𝑋 𝑤 = 𝑥(𝑛)𝑒−𝑗𝜔𝑛∞
𝑛=−∞ (9)
Where 𝑥(𝑛) is a discrete time signal and it is given by,
𝑥 𝑛 =
1
2𝛱
𝑋(𝑤)
𝛱
−𝛱
𝑒 𝑗𝜔𝑛
𝑑𝜔 (10)
As 𝑋 𝑤 is continuous function of frequency, it cannot be
computed. To make it as a discrete version of the DTFT
termed as Discrete Fourier Transform (DFT), we have to
uniformly sample 𝑋 𝑤 as,
𝑋 𝑤 = 𝑋(𝑤 𝑘 ) (11)
Where 𝑤 𝑘 =
2𝛱
𝑁
𝑘, k = 0, 1... (N-1), where N is the number
of samples of 𝑋(𝑤).
In speech, large magnitude frequency components
dominate in the linear magnitude spectrum to give a false
picture of small magnitude frequency components. To
overcome this, the logarithmic version of the magnitude
spectrum is taken. Although large and small magnitude
frequency components are visible in log magnitude
spectrum, it does not tell when a particular frequency
component is present. Hence, the timing information is
completely lost. Speech is a non-stationery signal where
the frequency components change with time.
International Journal of Computer Applications (0975 – 8887)
International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014)
42
Therefore, a representation that will give time varying
spectral information is needed for speech.
3.2.2 Short Term Fourier Transform (STFT)
To obtain the time varying spectral information, Short
Term Fourier Transform approach is employed. DTFT is
computed using DFT for a block size of 20ms and this
process is repeated for all the blocks of speech signal. All
the spectra computed are stacked together as a function of
time and frequency to observe the time varying spectra.
This leads to Short Term Fourier Transform which is given
by [4], [8],
𝑋 𝑤, 𝑛 = 𝑥 𝑚 𝑤(𝑛 − 𝑚)𝑒−𝑗𝜔𝑚∞
𝑚=−∞ (12)
where 𝑤(𝑛) is the window function for Short Term
Processing, [𝑥 𝑚 . 𝑤 𝑛 − 𝑚 ] represents the windowed
segment at the time instant 𝑛.
In STFT, the spectral amplitude and phase are function of
both frequency and time whereas it is the only function of
frequency in DTFT. STFT magnitude spectra is a three
dimensional (3D) plot of spectral amplitude versus time
and frequency. By fixing the time 𝑛 at a particular value
𝑛 = 𝑛0 and observing STFT gives spectrum of that
segment of speech. Alternatively, by fixing the frequency
𝑤 at a particular value 𝑤 = 𝑤0 and then observing STFT
gives time varying nature of that particular frequency
component. The spectral amplitude associated with a
frequency component varies as a function of time.
4. RESULTS AND DISCUSSIONS
A clean speech signal without any noise added to it is
taken as input.
Fig 4: Input speech waveform
Fig. 4 shows the input speech waveform of a clean speech
signal. Short Term Energy, Short Term Zero Crossing
Rate, Short Term Autocorrelation are computed from time
domain processing of speech. Short Term Fourier
transform is computed from frequency domain processing
of speech which overcomes Discrete Time Fourier
Transform.
4.1 Short Term Energy Computation
Fig 5: Short Term Energy contour for the speech signal
with 20ms frame size
Fig. 5 depicts that waveform region with high amplitude
and high energy is detected as voiced speech and the
waveform region with low amplitude and low energy is
detected as unvoiced speech.
4.2 Short Term Zero Crossing Rate
Computation
Fig 6: Short Term Zero Crossing Rate contour for the
speech signal with 20ms frame size
Fig. 6 depicts that the waveform region with more number
of zero crossings is detected as unvoiced speech and the
waveform region with less number of zero crossings is
detected as voiced speech.
4.3 Short Term Autocorrelation
Computation
Fig 7: Short Term Autocorrelation for the speech
signal with 30ms time duration
Fig. 7 depicts that the speech signal waveform within
30ms duration is periodic and hence it is detected as
voiced speech where as the speech signal waveform after
30ms duration is random noise-like structure and hence it
is detected as unvoiced speech.
4.4 Discrete Time Fourier Transform
Computation
Fig 8: Linear magnitude DTFT spectrum of a speech
signal
International Journal of Computer Applications (0975 – 8887)
International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014)
43
Fig. 8 shows that large magnitude frequency component
dominates the small magnitude frequency component.
Hence, linear magnitude DTFT spectrum cannot be used
for spectral analysis.
Fig 9: Log magnitude DTFT spectrum of a speech
signal
Fig. 9 shows both large magnitude and small magnitude
frequency component. However, the log magnitude DTFT
spectrum does not tell when a particular frequency
component is present. Hence, timing information is
completely lost.
4.5 Short Term Fourier Transform
Computation
Fig 10: 3D plot of Short Term Fourier Transform
Fig. 10 shows time varying frequency components. Here,
low frequency region with high amplitude and energy is
detected as voiced speech and high frequency region with
low amplitude and energy is detected as unvoiced speech.
5. CONCLUSION AND FUTURE
WORK
Detection of voiced and unvoiced speech region is
important in speech analysis. Voiced/unvoiced
identification is usually performed to extract information
from the speech signal. Short Term Processing (STP) was
performed to detect the voiced and unvoiced regions of
speech. Short Term Processing divides the input speech
signal into short analysis frames of size 10-30ms with a
frame shift of 10ms. Short Term processing was computed
both in time domain and in frequency domain. Short Term
Energy (STE), Short Term Zero Crossing Rate (STZCR),
Short Term Autocorrelation (STA) was computed from the
time domain. In the time domain, the spectral components
of the speech signal are not apparent. Hence, frequency
domain representation was needed and achieved using
fourier transform. Hence, Short Term Fourier Transform
(STFT) was computed to provide information about the
time varying nature of spectral components present in the
speech signal. This classification of speech into voiced and
unvoiced regions can be used for applications like speaker
identification and verification, speech coding, speech
synthesis and speech enhancement in future.
6. REFERENCES
[1] A.E.Mahdi and E.Jafer, “Two-Feature
Voiced/Unvoiced Classifier Using Wavelet
Transform”, The Open Electrical and Electronic
Engineering Journal, No.2, pp.8-13, 2008.
[2] Bishnu S.Atal and Lawrence R.Rabiner, “A Pattern
Recognition Approach to Voiced-Unvoiced-Silence
Classification with Applications to Speech
Recognition”, IEEE Transactions on Acoustics,
Speech and Signal Processing, Vol.24, No.3, 2003,
pp. 201-212.
[3] D.G.Childers, M.Hahn and J.N.Larar “Silent and
Voiced/Unvoiced/Mixed Excitation (Four-way)
Classification of Speech”, IEEE Transactions on
Acoustics, Speech and Signal Processing, Vol.37,
No.11, November 1989.
[4] Lawrence R.Rabiner and Ronald W.Schafer,
“Introduction to Digital Speech Processing”,
Foundations and Trends in Signal Processing, Vol.1,
No.33-53, 2007.
[5] Mojtaba Radmard, Mahdi Hadavi and Mohammad
Mahdi Nayebi, “A New Method of Voiced/Unvoiced
Classification Based on Clustering”, Journal of Signal
and Information Processing, 2011, Vol.2, pp.336-347.
[6] N.Dhananjaya and B.Yegnanarayana, “Voiced/ Non-
voiced Detection Based on Robustness of Voiced
Epochs”, IEEE Signal Processing Letters, Vol.17,
No.3, March 2010.
[7] R.G.Bachu, S.Kopparthi, B.Adapa and B.D.Barkana,
“Voiced/Unvoiced Decision for Speech Signals Based
on Zero-Crossing Rate and Energy”, Advanced
Techniques in Computing Sciences and Software
Engineering, pp.279-282, 2010.
[8] Ronald W.Schafer and Lawrence R.Rabiner, “Digital
Representations of Speech Signals”, Proceedings of
the IEEE, Vol.63, No.4, April 1975.
[9] S.Ahmadi and A.S.Spanias, “Cepstrum Based Pitch
Detection Using a New Statistical V/UV
Classification Algorithm”, IEEE Transactions on
Speech and audio Processing, Vol.7, No.3, pp.333-
338, 2002.
[10] Yingyong Qi and Bobby R.Hunt, “Voiced-Unvoiced-
Silence Classifications of Speech Using Hybrid
Features and a Network Classifier”, IEEE
Transactions on Speech and Audio Processing, Vol.1,
No.2, pp. 250-255, 2002.

Contenu connexe

Tendances

VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...
VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...
VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...ijistjournal
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...IRJET Journal
 
Speech signal analysis for linear filter banks of different orders
Speech signal analysis for linear filter banks of different ordersSpeech signal analysis for linear filter banks of different orders
Speech signal analysis for linear filter banks of different orderseSAT Journals
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET Journal
 
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...CSCJournals
 
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...IOSR Journals
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
 
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURESSPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURESacijjournal
 
Automatic identification of silence, unvoiced and voiced chunks in speech
Automatic identification of silence, unvoiced and voiced chunks in speechAutomatic identification of silence, unvoiced and voiced chunks in speech
Automatic identification of silence, unvoiced and voiced chunks in speechcsandit
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...acijjournal
 
Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
Speech Enhancement Using Spectral Flatness Measure Based Spectral SubtractionSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
Speech Enhancement Using Spectral Flatness Measure Based Spectral SubtractionIOSRJVSP
 
Study of acoustic properties of nasal and nonnasal vowels in temporal domain
Study of acoustic properties of nasal and nonnasal vowels in temporal domainStudy of acoustic properties of nasal and nonnasal vowels in temporal domain
Study of acoustic properties of nasal and nonnasal vowels in temporal domaincsandit
 
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAIN
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAINSTUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAIN
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAINcscpconf
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundTELKOMNIKA JOURNAL
 

Tendances (18)

VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...
VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...
VOWEL PHONEME RECOGNITION BASED ON AVERAGE ENERGY INFORMATION IN THE ZEROCROS...
 
Lr3620092012
Lr3620092012Lr3620092012
Lr3620092012
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
 
Speech signal analysis for linear filter banks of different orders
Speech signal analysis for linear filter banks of different ordersSpeech signal analysis for linear filter banks of different orders
Speech signal analysis for linear filter banks of different orders
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time Domain
 
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
 
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk System
 
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURESSPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
SPEAKER VERIFICATION USING ACOUSTIC AND PROSODIC FEATURES
 
Automatic identification of silence, unvoiced and voiced chunks in speech
Automatic identification of silence, unvoiced and voiced chunks in speechAutomatic identification of silence, unvoiced and voiced chunks in speech
Automatic identification of silence, unvoiced and voiced chunks in speech
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
 
Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
Speech Enhancement Using Spectral Flatness Measure Based Spectral SubtractionSpeech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
Speech Enhancement Using Spectral Flatness Measure Based Spectral Subtraction
 
Study of acoustic properties of nasal and nonnasal vowels in temporal domain
Study of acoustic properties of nasal and nonnasal vowels in temporal domainStudy of acoustic properties of nasal and nonnasal vowels in temporal domain
Study of acoustic properties of nasal and nonnasal vowels in temporal domain
 
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAIN
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAINSTUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAIN
STUDY OF ACOUSTIC PROPERTIES OF NASAL AND NONNASAL VOWELS IN TEMPORAL DOMAIN
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
 

Similaire à Iciiecs1461

An Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresAn Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresSivaranjan Goswami
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features ijsc
 
Novel cochlear filter based cepstral coefficients for classification of unvoi...
Novel cochlear filter based cepstral coefficients for classification of unvoi...Novel cochlear filter based cepstral coefficients for classification of unvoi...
Novel cochlear filter based cepstral coefficients for classification of unvoi...ijnlc
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...CSCJournals
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)Jeff Nelson
 
Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...eSAT Journals
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsIJMTST Journal
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicijnlc
 
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...kevig
 
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...IRJET Journal
 
To Improve Speech Recognition in Noise for Cochlear Implant Users
To Improve Speech Recognition in Noise for Cochlear Implant UsersTo Improve Speech Recognition in Noise for Cochlear Implant Users
To Improve Speech Recognition in Noise for Cochlear Implant UsersIOSR Journals
 
3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...k srikanth
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlabArcanjo Salazaku
 

Similaire à Iciiecs1461 (20)

Int journal 01
Int journal 01Int journal 01
Int journal 01
 
An Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresAn Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech features
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features
 
Novel cochlear filter based cepstral coefficients for classification of unvoi...
Novel cochlear filter based cepstral coefficients for classification of unvoi...Novel cochlear filter based cepstral coefficients for classification of unvoi...
Novel cochlear filter based cepstral coefficients for classification of unvoi...
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
 
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...Isolated English Word Recognition System: Appropriate for Bengali-accented En...
Isolated English Word Recognition System: Appropriate for Bengali-accented En...
 
Bz33462466
Bz33462466Bz33462466
Bz33462466
 
Bz33462466
Bz33462466Bz33462466
Bz33462466
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
F010334548
F010334548F010334548
F010334548
 
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
 
Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...Speech enhancement using spectral subtraction technique with minimized cross ...
Speech enhancement using spectral subtraction technique with minimized cross ...
 
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive InstrumentsA Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
A Novel Method for Silence Removal in Sounds Produced by Percussive Instruments
 
An expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabicAn expert system for automatic reading of a text written in standard arabic
An expert system for automatic reading of a text written in standard arabic
 
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...
Investigation of the Effect of Obstacle Placed Near the Human Glottis on the ...
 
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
 
To Improve Speech Recognition in Noise for Cochlear Implant Users
To Improve Speech Recognition in Noise for Cochlear Implant UsersTo Improve Speech Recognition in Noise for Cochlear Implant Users
To Improve Speech Recognition in Noise for Cochlear Implant Users
 
3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 

Dernier

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Dernier (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Iciiecs1461

  • 1. International Journal of Computer Applications (0975 – 8887) International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014) 39 Voiced/Unvoiced Detection using Short Term Processing S. Nandhini A. Shenbagavalli, Ph.D. PG Scholar, Dept of ECE Professor and Head of ECE National Engineering College National Engineering College Kovilpatti, Tamilnadu, India. Kovilpatti, Tamilnadu, India. ABSTRACT A new method for identifying voiced and unvoiced speech region is proposed. Voiced/unvoiced speech detection is needed to extract information from the speech signal and it is important in the area of speech analysis. Voiced and unvoiced speech region has been identified using Short Term Processing (STP) in this paper. Short Term Processing of speech has been performed by viewing the speech signal in frames, which has a size of 10-30ms. Short Term Processing has been performed in both time domain and frequency domain. Short Term Energy (STE), Short Term Zero Crossing Rate (STZCR) and Short Term Autocorrelation (STA) are computed from the time domain processing of speech. The spectral components in the speech signal are not apparent in the time domain. Hence, frequency domain representation is needed which is achieved using fourier transform. Conventional fourier representation is inadequate to provide information about the time varying nature of spectral components present in speech. So, short term version of fourier transform is needed, which is named as Short Term Fourier Transform (STFT). Keywords Voiced speech, Unvoiced speech, Short Term Energy, Short Term Zero Crossing Rate, Short Term Autocorrelation, Short Term Fourier Transform. 1. INTRODUCTION Speech is an acoustic signal produced from the speech production system. In the human speech production system, air enters into the lungs during inhalation via articulators and speech organs. Articulators mainly include tongue, lips and jaw. The speech production organs include pharynx, glottis, vocal cord, larynx and trachea. This air is expelled out from the lungs through trachea and causes the vocal cord to vibrate. During this process, the air flow is chopped up into quasi-periodic pulses to generate the required excitation signal. Finally, the pulses are frequency shaped by the oral cavity and the nasal cavity to produce the desired speech signal. The speech production system produces the output speech signal by responding to the input excitation signal. Depending upon the input excitation signal, the output speech signal from the speech production system is broadly divided into three classes which are discussed in section 2. Identification of voiced/unvoiced speech mainly depends on the nature of the speech waveform, energy associated with the speech waveform [1], [2], [7], [8] number of zero crossings present in the speech waveform [1], [2], [5], [7], [8] and the correlation among the successive samples in the speech waveform [2], [5], [8]. Detection of voiced/unvoiced speech in the speech waveform is important in speech analysis system. In speech analysis, the voiced/unvoiced detection is usually performed in extracting information from the speech signals. But the voiced/unvoiced detection is critical, because it is essential to know whether the speech production system involves vibration of the vocal cords [2], [10]. The periodicity of the vocal tract vibration makes the voiced speech segment periodic and so distinguishable from the noise-like unvoiced speech segments [6]. This paper describes about the three classes of speech signal in section 2. Section 3 focuses on Short Term Processing of speech. Short Term Processing has been performed in both time domain and frequency domain. Parameters like Short Term Energy (STE), Short Term Zero Crossing Rate (STZCR) and Short Term Autocorrelation (STA) are computed from the time domain. Short Term Fourier Transform is computed from the frequency domain. The results are discussed in section 4. 2. THREE CLASSES OF SPEECH SIGNAL The three classes are named as voiced speech, unvoiced speech and non-speech (silence). In the first class (voiced speech), the input excitation is nearly periodic in nature. In the second class (unvoiced speech), the input excitation is random noise-like nature. In the third class (non-speech), there is no excitation to the system. 2.1 Voiced Speech When the input excitation to the speech production system is nearly periodic impulse sequence, then the output speech looks visually like a periodic signal and is termed as voiced speech. Fig 1: Block diagram representation of voiced speech production
  • 2. International Journal of Computer Applications (0975 – 8887) International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014) 40 During the production of voiced speech, the air breathe out of lungs through the trachea is interrupted periodically by the vibrating vocal cords. Due to this interruption, glottal wave is generated which excites the speech production system to produce voiced speech. Voiced speech can be identified by the periodic nature of the speech waveform, relatively high energy associated with the speech waveform [7], [8], less number of zero crossings present in the speech waveform [3], [5], [7], [8] and more correlation among the successive samples in the speech waveform [8]. Due to the periodic nature of voiced speech, a fundamental frequency (pitch frequency) and its harmonics can be predicted in the spectrum of voiced speech. The presence of harmonic structure in the voiced spectrum is another distinguishing factor. 2.2 Unvoiced Speech When the input excitation to the speech production system is random noise-like structure, then the output speech will also be random noise-like without any periodic nature and is termed as unvoiced Speech. Fig 2: Block diagram representation of unvoiced speech production During the production of unvoiced speech, the air breathe out of lungs through the trachea is not interrupted by the vibrating vocal cords. However, obstruction of air flow occurs scarcely from glottis somewhere along the length of vocal tract to produce total or partial closure. Due to this modification of air flow, stop or frication excitation occurs. This excites the vocal tract system to produce unvoiced speech. Unvoiced speech can be identified by the non-periodic and noise-like nature of the speech waveform, relatively low energy associated with the speech waveform when compared to voiced speech [7], [8] more number of zero crossings present in the speech waveform when compared to voiced speech [3], [5], [7], [8] and relatively less correlation among successive samples in the speech waveform [8]. Due to the noise-like nature of unvoiced speech, there is no fundamental frequency (pitch frequency) and its harmonics in the spectrum of unvoiced speech. The absence of harmonic structure in the unvoiced spectrum is another distinguishing factor. 2.3 Non-Speech (Silence) When there is no input excitation to the speech production system, then the output corresponds to no speech and is termed as non-speech. Non-speech (silence) is present between voiced and unvoiced speech. Hence, speech will not be intelligible without the presence of non-speech between voiced and unvoiced speech. Fig 3: Block diagram representation of non-speech (silence) production During the production of non-speech, there is no excitation supplied to the vocal tract and hence there is no speech output. Non-speech is identified by the absence of any signal characteristics, lowest energy associated with the speech waveform when compared to unvoiced and voiced speech [8], relatively more number of zero crossings present in the speech waveform when compared to unvoiced speech [3], [8] and no correlation among successive samples in the speech waveform [8]. Due to the absence of speech signal, there is no output in the spectrum of non-speech. This is another distinguishing factor for non-speech. 3. SHORT TERM PROCESSING Speech is produced from a time varying vocal tract system with time varying excitation. Due to this, the speech signal is non-stationary in nature. Speech signal is stationary when it is viewed in blocks of 10-30msec [8]. Short Term Processing divides the input speech signal into short analysis segments that are isolated and processed with fixed (non-time varying) properties. These short analysis segments called as analysis frames almost always overlap one another. Short Term Processing of speech is performed both in time domain and in frequency domain [4]. 3.1 Time Domain Processing of Speech Short Term Energy (STE), Short Term Zero Crossing Rate (STZCR) and Short Term Autocorrelation (STA) are computed from the time domain processing of speech. 3.1.1 Short Term Energy (STE) Speech is time varying in nature. The energy associated with voiced speech is large when compared to unvoiced speech [7]. Silence speech will have least or negligible energy when compared to unvoiced speech [8]. Hence, Short Term Energy can be used for voiced, unvoiced and silence classification of speech. Short Term Energy is derived from the following equation, 𝐸 𝑇 = 𝑠2 𝑚∞ 𝑚=−∞ (1) where 𝐸 𝑇 is the total energy and 𝑠(𝑚) is the discrete time signal [4], [8]. For Short Term Energy computation, speech is considered in terms of short analysis frames whose size typically ranges from 10-30 msec. Consider the samples in a frame of speech as m=0 to m=N-1, where N is the length of frame. Hence, equation (1) is written as, 𝐸 𝑇 = 𝑠2 −1 𝑚=−∞ 𝑚 + 𝑠2 𝑚 + 𝑠2 ∞ 𝑚=𝑁 𝑁−1 𝑚=0 (𝑚) (2) The speech sample is zero outside the frame length. Hence, equation (2) is written as, 𝐸 𝑇 = 𝑠2 (𝑚)𝑁−1 𝑚=0 (3) The relation in equation (3) gives the total energy present in the frame of speech from m=0 to m=N-1. Short Term Energy is defined as the sum of squares of the samples in a frame and it is given by, No Input Excitation Speech production system No Speech Output
  • 3. International Journal of Computer Applications (0975 – 8887) International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014) 41 𝑒(𝑛) = [𝑠𝑛 𝑚 ]2∞ 𝑚=−∞ (4) After framing and windowing, the 𝑛 𝑡𝑕 frame speech becomes 𝑠 𝑛(𝑚) = 𝑠(𝑚). 𝑤(𝑛 − 𝑚) and hence Short Term Energy in equation (4) is given by [4], [7], [8], 𝑒(𝑛) = [𝑠 𝑚 . 𝑤 𝑛 − 𝑚 ]2∞ 𝑚=−∞ (5) where 𝑤(𝑛) represents the windowing function of finite duration and 𝑛 represents the frame shift or rate in number of samples. This frame shift is as small as one sample or as large as frame size. 3.1.2 Short Term Zero Crossing Rate (STZCR) Zero Crossing Rate is defined as the number of times the zero axes is crossed per frame. If the number of zero crossings is more in a given signal, then the signal is changing rapidly and accordingly the signal may contain high frequency information which is termed as unvoiced speech. On the other hand, if the number of zero crossing is less, then the signal is changing slowly and accordingly the signal may contain low frequency information which is termed as voiced speech [7], [8]. Thus, Zero Crossing Rate gives indirect information about the frequency content of the signal. Zero Crossing Rate is computed using typical frame size of 10-30msec with half the frame size as frame shift. Short Term Zero Crossing Rate is defined as the weighted average of number of times the speech signal changes sign within the time window [4] and it is given by [1], [4], [5], [7], 𝑍 𝑛 = │𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│. 𝑤(𝑛 − 𝑚) ∞ 𝑚=−∞ (6) where 𝑠𝑔𝑛[𝑠 𝑚 ] = 1 𝑖𝑓 𝑠 𝑚 ≥ 0 = −1 𝑖𝑓 𝑠(𝑚) < 0, and 𝑤 𝑛 = 1 2𝑁 𝑖𝑓 0 ≤ 𝑛 ≤ 𝑁 − 1 = 0 𝑜𝑡𝑕𝑒𝑟𝑤𝑖𝑠𝑒. If successive samples 𝑠(𝑚) and 𝑠(𝑚 − 1) have different algebraic signs then, │𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│ in (6) becomes equal to 1 and hence Zero Crossing Rate is counted. If successive samples 𝑠(𝑚) and 𝑠(𝑚 − 1) have same algebraic signs then, │𝑠𝑔𝑛[𝑠 𝑚 ] − 𝑠𝑔𝑛[𝑠 𝑚 − 1 ]│ in (6) becomes equal to 0 and hence Zero Crossing Rate is not counted. 3.1.3 Short Term Autocorrelation (STA) Correlation can be used for finding the similarity among one or two sequences. Autocorrelation is achieved by providing different time lag for the sequence and computing with the given sequence as reference. The autocorrelation 𝑟𝑋𝑋 𝑘 of a stationary sequence is given by, 𝑟𝑋𝑋 𝑘 = 𝑋 𝑚 . 𝑋(𝑚 + 𝑘)∞ 𝑚=−∞ (7) Due to the non-stationary nature of speech, a short term version of the autocorrelation is needed. The Short Term Autocorrelation of a non-stationary sequence 𝑠(𝑚) is defined as the deterministic autocorrelation function of the sequence 𝑠 𝑛(𝑚) = 𝑠(𝑚). 𝑤(𝑛 − 𝑚) that is selected by the window shifted to time 𝑛 and it is given by [4], [8], 𝑟𝑠𝑠 𝑛, 𝑘 = [𝑠 𝑚 . 𝑤 𝑛 − 𝑚∞ 𝑚=−∞ ]. [𝑠 𝑘 + 𝑚 . 𝑤 𝑛 − 𝑘 − 𝑚 ] (8) where 𝑠 𝑛 𝑚 = 𝑠 𝑚 . 𝑤 𝑛 − 𝑚 is the windowed version of 𝑠 𝑚 . The nature of Short Term Autocorrelation is primarily different for voiced and unvoiced speech. The autocorrelation of voiced speech will have periodic waveform representing the pitch period [6], [9]. The autocorrelation of unvoiced speech will have random noise-like structure. Therefore, information from the autocorrelation sequence is used for discriminating voiced and unvoiced speech. 3.2 Frequency Domain Processing of Speech The short term time domain analysis is useful for computing the time domain features like energy, zero crossing rate and autocorrelation at the gross level. The different frequency or spectral components that are present in the speech signal are not directly apparent in the time domain. Hence, frequency domain representation is needed which is achieved using fourier transform. The conventional fourier representation is inadequate to provide information about the time varying nature of spectral information present in speech. Hence, short term version of fourier transform commonly known as Short Term Fourier Transform (STFT) is needed [4]. 3.2.1 Discrete Time Fourier Transform (DTFT) Discrete Time Fourier Transform (DTFT) is used to obtain the frequency domain representation [4]. If 𝑋(𝑤) is the Discrete Time Fourier Transform of 𝑥(𝑛), then the DTFT is given by, 𝑋 𝑤 = 𝑥(𝑛)𝑒−𝑗𝜔𝑛∞ 𝑛=−∞ (9) Where 𝑥(𝑛) is a discrete time signal and it is given by, 𝑥 𝑛 = 1 2𝛱 𝑋(𝑤) 𝛱 −𝛱 𝑒 𝑗𝜔𝑛 𝑑𝜔 (10) As 𝑋 𝑤 is continuous function of frequency, it cannot be computed. To make it as a discrete version of the DTFT termed as Discrete Fourier Transform (DFT), we have to uniformly sample 𝑋 𝑤 as, 𝑋 𝑤 = 𝑋(𝑤 𝑘 ) (11) Where 𝑤 𝑘 = 2𝛱 𝑁 𝑘, k = 0, 1... (N-1), where N is the number of samples of 𝑋(𝑤). In speech, large magnitude frequency components dominate in the linear magnitude spectrum to give a false picture of small magnitude frequency components. To overcome this, the logarithmic version of the magnitude spectrum is taken. Although large and small magnitude frequency components are visible in log magnitude spectrum, it does not tell when a particular frequency component is present. Hence, the timing information is completely lost. Speech is a non-stationery signal where the frequency components change with time.
  • 4. International Journal of Computer Applications (0975 – 8887) International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014) 42 Therefore, a representation that will give time varying spectral information is needed for speech. 3.2.2 Short Term Fourier Transform (STFT) To obtain the time varying spectral information, Short Term Fourier Transform approach is employed. DTFT is computed using DFT for a block size of 20ms and this process is repeated for all the blocks of speech signal. All the spectra computed are stacked together as a function of time and frequency to observe the time varying spectra. This leads to Short Term Fourier Transform which is given by [4], [8], 𝑋 𝑤, 𝑛 = 𝑥 𝑚 𝑤(𝑛 − 𝑚)𝑒−𝑗𝜔𝑚∞ 𝑚=−∞ (12) where 𝑤(𝑛) is the window function for Short Term Processing, [𝑥 𝑚 . 𝑤 𝑛 − 𝑚 ] represents the windowed segment at the time instant 𝑛. In STFT, the spectral amplitude and phase are function of both frequency and time whereas it is the only function of frequency in DTFT. STFT magnitude spectra is a three dimensional (3D) plot of spectral amplitude versus time and frequency. By fixing the time 𝑛 at a particular value 𝑛 = 𝑛0 and observing STFT gives spectrum of that segment of speech. Alternatively, by fixing the frequency 𝑤 at a particular value 𝑤 = 𝑤0 and then observing STFT gives time varying nature of that particular frequency component. The spectral amplitude associated with a frequency component varies as a function of time. 4. RESULTS AND DISCUSSIONS A clean speech signal without any noise added to it is taken as input. Fig 4: Input speech waveform Fig. 4 shows the input speech waveform of a clean speech signal. Short Term Energy, Short Term Zero Crossing Rate, Short Term Autocorrelation are computed from time domain processing of speech. Short Term Fourier transform is computed from frequency domain processing of speech which overcomes Discrete Time Fourier Transform. 4.1 Short Term Energy Computation Fig 5: Short Term Energy contour for the speech signal with 20ms frame size Fig. 5 depicts that waveform region with high amplitude and high energy is detected as voiced speech and the waveform region with low amplitude and low energy is detected as unvoiced speech. 4.2 Short Term Zero Crossing Rate Computation Fig 6: Short Term Zero Crossing Rate contour for the speech signal with 20ms frame size Fig. 6 depicts that the waveform region with more number of zero crossings is detected as unvoiced speech and the waveform region with less number of zero crossings is detected as voiced speech. 4.3 Short Term Autocorrelation Computation Fig 7: Short Term Autocorrelation for the speech signal with 30ms time duration Fig. 7 depicts that the speech signal waveform within 30ms duration is periodic and hence it is detected as voiced speech where as the speech signal waveform after 30ms duration is random noise-like structure and hence it is detected as unvoiced speech. 4.4 Discrete Time Fourier Transform Computation Fig 8: Linear magnitude DTFT spectrum of a speech signal
  • 5. International Journal of Computer Applications (0975 – 8887) International conference on Innovations in Information, Embedded and Communication Systems (ICIIECS-2014) 43 Fig. 8 shows that large magnitude frequency component dominates the small magnitude frequency component. Hence, linear magnitude DTFT spectrum cannot be used for spectral analysis. Fig 9: Log magnitude DTFT spectrum of a speech signal Fig. 9 shows both large magnitude and small magnitude frequency component. However, the log magnitude DTFT spectrum does not tell when a particular frequency component is present. Hence, timing information is completely lost. 4.5 Short Term Fourier Transform Computation Fig 10: 3D plot of Short Term Fourier Transform Fig. 10 shows time varying frequency components. Here, low frequency region with high amplitude and energy is detected as voiced speech and high frequency region with low amplitude and energy is detected as unvoiced speech. 5. CONCLUSION AND FUTURE WORK Detection of voiced and unvoiced speech region is important in speech analysis. Voiced/unvoiced identification is usually performed to extract information from the speech signal. Short Term Processing (STP) was performed to detect the voiced and unvoiced regions of speech. Short Term Processing divides the input speech signal into short analysis frames of size 10-30ms with a frame shift of 10ms. Short Term processing was computed both in time domain and in frequency domain. Short Term Energy (STE), Short Term Zero Crossing Rate (STZCR), Short Term Autocorrelation (STA) was computed from the time domain. In the time domain, the spectral components of the speech signal are not apparent. Hence, frequency domain representation was needed and achieved using fourier transform. Hence, Short Term Fourier Transform (STFT) was computed to provide information about the time varying nature of spectral components present in the speech signal. This classification of speech into voiced and unvoiced regions can be used for applications like speaker identification and verification, speech coding, speech synthesis and speech enhancement in future. 6. REFERENCES [1] A.E.Mahdi and E.Jafer, “Two-Feature Voiced/Unvoiced Classifier Using Wavelet Transform”, The Open Electrical and Electronic Engineering Journal, No.2, pp.8-13, 2008. [2] Bishnu S.Atal and Lawrence R.Rabiner, “A Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification with Applications to Speech Recognition”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol.24, No.3, 2003, pp. 201-212. [3] D.G.Childers, M.Hahn and J.N.Larar “Silent and Voiced/Unvoiced/Mixed Excitation (Four-way) Classification of Speech”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol.37, No.11, November 1989. [4] Lawrence R.Rabiner and Ronald W.Schafer, “Introduction to Digital Speech Processing”, Foundations and Trends in Signal Processing, Vol.1, No.33-53, 2007. [5] Mojtaba Radmard, Mahdi Hadavi and Mohammad Mahdi Nayebi, “A New Method of Voiced/Unvoiced Classification Based on Clustering”, Journal of Signal and Information Processing, 2011, Vol.2, pp.336-347. [6] N.Dhananjaya and B.Yegnanarayana, “Voiced/ Non- voiced Detection Based on Robustness of Voiced Epochs”, IEEE Signal Processing Letters, Vol.17, No.3, March 2010. [7] R.G.Bachu, S.Kopparthi, B.Adapa and B.D.Barkana, “Voiced/Unvoiced Decision for Speech Signals Based on Zero-Crossing Rate and Energy”, Advanced Techniques in Computing Sciences and Software Engineering, pp.279-282, 2010. [8] Ronald W.Schafer and Lawrence R.Rabiner, “Digital Representations of Speech Signals”, Proceedings of the IEEE, Vol.63, No.4, April 1975. [9] S.Ahmadi and A.S.Spanias, “Cepstrum Based Pitch Detection Using a New Statistical V/UV Classification Algorithm”, IEEE Transactions on Speech and audio Processing, Vol.7, No.3, pp.333- 338, 2002. [10] Yingyong Qi and Bobby R.Hunt, “Voiced-Unvoiced- Silence Classifications of Speech Using Hybrid Features and a Network Classifier”, IEEE Transactions on Speech and Audio Processing, Vol.1, No.2, pp. 250-255, 2002.