speech processing basics

Speech Processing
• Fundamentals of Digital Speech Processing
1. Anatomy and physiology of speech organs
2. The process of speech production
3. The acoustic theory of speech production
4. Digital models for speech signals

Applications of Speech Processing
• 1. Speech recognition: speech to text
• 2. Speech understanding: the meaning matters rather than
the exact words, e.g. speech translation
• 3. Speech synthesis: text to speech, so the computer can
speak to you
• 4. Word processing: checking and correcting spelling,
grammar and style
• 5. Text prediction: speeds up word processing
• 6. Automatic summarization: topic identification and
summary generation
• 7. Text mining: extracting the necessary data from text

• Anatomy: the study of the structure of the bodies of people or animals
• Physiology: the study of how the bodies of people and animals function;
here it includes understanding the higher-order mechanisms within the human
central nervous system that account for speech production
• Acoustics: the scientific study of sound
• Phonetics: the study of the sounds of words and of the sounds that are
used in languages
• Phonemes: the smallest units of sound that are significant in a
language
• Articulation: the action of producing a sound or word clearly, in speech
or music
• Linguistics: the study of the way in which language works
• Semantics: the branch of linguistics that deals with the meanings of
words and sentences

Speech Processing (related fields)
• Signal processing: Fourier transforms, discrete-time filters, AR(MA) models
• Information theory: entropy, communication theory, rate-distortion theory
• Statistical SP: stochastic models
• Acoustics: psychoacoustics, room acoustics, speech production
• Phonetics
• Algorithms (programming)

ASR: Application
© James Glass, MIT

Recognition
[Figure: block diagram of a recognition system. Blocks: voice input, analog-to-digital conversion, acoustic model, language model, speech engine, display, feedback]

Speech Generation
• First, the talker formulates a message (in his mind) that
he wants to transmit to the listener via speech.
• Message formulation can be viewed as the creation of
printed text expressing the words of the message.
• The next step is the conversion of the message into a
language code.
• This roughly corresponds to converting the
printed text of the message into a sequence of
phonemes corresponding to the sounds that make up
the words, together with the pitch accent associated
with those sounds.

• Once the language code is chosen, the talker
must execute a series of neuromuscular
commands that cause the vocal cords to vibrate
when appropriate and shape the vocal tract
so that the proper sequence of speech
sounds is created and spoken, producing
an acoustic signal as the final output.
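
The text-to-language-code step described above can be sketched as a dictionary lookup. This is only an illustration: the lexicon below is a hypothetical four-word toy, the function name `text_to_phonemes` is made up for this example, and pitch accent is ignored. The phoneme symbols follow the two-letter style used in the phoneme hierarchy of this deck.

```python
# Hypothetical mini-lexicon (an assumption for illustration):
# word -> phoneme sequence in the two-letter symbol style used in this deck.
LEXICON = {
    "see": ["s", "iy"],
    "sit": ["s", "ih", "t"],
    "me": ["m", "iy"],
    "two": ["t", "uw"],
}


def text_to_phonemes(text):
    """Convert printed text to a flat phoneme sequence via dictionary lookup."""
    phonemes = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"word not in toy lexicon: {word!r}")
        phonemes.extend(LEXICON[word])
    return phonemes


print(text_to_phonemes("See me sit"))
# prints ['s', 'iy', 'm', 'iy', 's', 'ih', 't']
```

A real system would need a full pronunciation dictionary or letter-to-sound rules; the lookup only shows the shape of the mapping from words to phonemes.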

Speech Recognition
• First, the listener processes the acoustic signal along
the basilar membrane in the inner ear, which
provides a running spectrum analysis of the
incoming signal.
• The neural activity along the auditory nerve is then
converted into a language code at higher
centers of processing within the brain, and
message comprehension is achieved.
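
A rough engineering analogue of a "running spectrum analysis" is the short-time Fourier transform: the signal is cut into overlapping frames and a spectrum is computed for each one. The sketch below is not a model of the basilar membrane. The frame length, hop size, Hann window and the 1 kHz test tone are all assumptions chosen for illustration, and it uses only the standard library (NumPy's FFT would be the usual choice in practice).

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of the DFT of one frame, for bins 0 .. N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def running_spectrum(signal, frame_len=64, hop=32):
    """Hann-windowed short-time spectra: one list of magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectra.append(dft_magnitudes(frame))
    return spectra


# Assumed test signal: a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
spectra = running_spectrum(tone)

# Strongest bin of the first frame, converted to Hz (bin spacing = fs / frame_len).
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(len(spectra), peak_bin * fs / 64)
# prints 7 1000.0
```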

• The lungs and the associated muscles act as the source
of air for exciting the vocal mechanism.
• Muscle force pushes air out of the lungs (shown as a
piston pushing up within a cylinder) and through the
bronchi and trachea.
• When the vocal cords are tensed, the air flow causes
them to vibrate, producing so-called voiced speech
sounds.
• When the vocal cords are relaxed, the air flow must
pass through a constriction in the vocal tract to produce
a sound; it thereby becomes turbulent, producing
so-called unvoiced speech sounds.
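
The two excitation types described above are often approximated digitally as a periodic impulse train (voiced) and white noise (unvoiced), each passed through a filter standing in for the vocal tract. The following is a minimal sketch under stated assumptions: an 8 kHz sampling rate, a 100 Hz pitch, and a single two-pole resonator at 500 Hz in place of a full vocal-tract model. The function names are illustrative, not taken from the slides.

```python
import math
import random


def excitation(voiced, n_samples, fs=8000, f0=100, seed=0):
    """Source stand-in: impulse train if voiced, white noise if unvoiced."""
    if voiced:
        period = fs // f0  # samples between successive pulses
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(x, fs=8000, freq=500, bandwidth=100):
    """Two-pole resonator standing in for a single vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius from bandwidth
    a1 = 2 * r * math.cos(2 * math.pi * freq / fs)
    a2 = -r * r
    y = []
    y1 = y2 = 0.0  # previous two output samples
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


voiced = resonator(excitation(True, 800))     # repeats every 80 samples once settled
unvoiced = resonator(excitation(False, 800))  # noise-like, no repeating pattern
```

Comparing `voiced[400:480]` with `voiced[480:560]` shows the periodic structure of the voiced output; the unvoiced output has no such repetition.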

Classifications
• 1. Silence (S): no speech is produced
• 2. Unvoiced (U): the vocal cords are not vibrating, so the
speech signal is aperiodic or random in nature
• 3. Voiced (V): the vocal cords vibrate
periodically as air flows from the lungs, so the
speech signal is quasi-periodic
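
One common way to approximate this three-way classification frame by frame uses short-time energy (to detect silence) and zero-crossing rate (noise-like unvoiced frames cross zero far more often than voiced ones). The sketch below is an assumption-laden illustration, not a method given in the slides: both thresholds and the synthetic test frames are made up, and real speech would need tuned thresholds.

```python
import math
import random


def classify_frame(frame, energy_floor=1e-4, zcr_split=0.25):
    """Label a frame 'S', 'U' or 'V' from short-time energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return "S"  # too little energy: treat as silence
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return "U" if zcr > zcr_split else "V"


# Assumed 30 ms test frames at 8 kHz.
fs = 8000
rng = random.Random(0)
silence = [0.0] * 240
voiced_like = [math.sin(2 * math.pi * 120 * n / fs) for n in range(240)]
unvoiced_like = [rng.uniform(-1, 1) for _ in range(240)]

print(classify_frame(silence), classify_frame(voiced_like), classify_frame(unvoiced_like))
# prints S V U
```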

Speech Waveform Characteristics
• Loudness
• Voiced/Unvoiced.
• Pitch.
– Fundamental frequency.
• Spectral envelope.
– Formants.
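
Pitch, listed above as the fundamental frequency, can be estimated from a voiced frame by finding the lag at which the frame best matches a shifted copy of itself. The sketch below uses a plain autocorrelation search. The 60 to 400 Hz search range and the synthetic three-harmonic test frame are assumptions for illustration; practical pitch trackers add normalization and voicing checks.

```python
import math


def estimate_f0(frame, fs, f_min=60, f_max=400):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(fs / f_max)  # shortest period considered
    lag_max = int(fs / f_min)  # longest period considered
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


fs = 8000
# Assumed test frame: 125 Hz fundamental plus two harmonics (period = 64 samples).
frame = [
    math.sin(2 * math.pi * 125 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 250 * n / fs)
    + 0.3 * math.sin(2 * math.pi * 375 * n / fs)
    for n in range(512)
]

print(estimate_f0(frame, fs))
# prints 125.0
```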

Speech Waveform Characteristics (cont.)
[Figure: example waveforms of voiced speech, /ih/, and unvoiced speech, /s/]

Phoneme Hierarchy
Speech sounds (language dependent; about 50 in English)
• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
• Diphthongs: ay, ey, oy, aw
• Consonants
– Plosive: p, b, t, d, k, g
– Nasal: m, n, ng
– Fricative: f, v, th, dh, s, z, sh, zh, h
– Retroflex liquid: r
– Lateral liquid: l
– Glide: w, y

• Speech signals are composed of a sequence of
sounds.
• The arrangement of these sounds is governed by the
rules of language. The study of these rules and their
implications in human communication is the domain of
linguistics.
• The study and classification of the sounds of
speech is called phonetics.
