Research Issues in Speech Processing

Dr. M. Sabarimalai Manikandan
msm.sabari@gmail.com
Speech Production: the source-filter model
Speech signal conveys the information contained in the spoken word
         highly non-stationary signal
         short segments of speech (20 to 30 ms) can be treated as approximately stationary
         acoustical energy is mostly in the frequency range of 100-6000 Hz

        Vocal tract transfer function can be modeled by an all-pole filter
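
The all-pole model can be illustrated with linear prediction. The sketch below is a minimal example, not the author's method: it estimates the coefficients of A(z) in the filter 1/A(z) from one short frame using the autocorrelation method and the Levinson-Durbin recursion. The function names `autocorr` and `lpc` are hypothetical, and the frame is assumed to be short enough (20 to 30 ms) to be treated as stationary.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (unnormalized sums)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns (a, err), where a = [1, a1, ..., ap] are the coefficients of
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p in the all-pole filter 1 / A(z),
    and err is the final prediction-error energy.
    """
    r = autocorr(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For a frame generated by x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n], the estimate should come out close to a = [1, -1.3, 0.6]. Practical systems usually apply pre-emphasis and a window before this step; both are omitted here for brevity.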
Speech Processing Tasks


Speech recognition (recognizing lexical content)
Speech synthesis (text-to-speech)
Speaker recognition (recognizing who is speaking)
Speech understanding and vocal dialog
Speech coding (data rate reduction)
Speech enhancement (noise reduction)
Speech transmission (noise-free communication)
Voice conversion
Speech Processing
Speech measurements
       Short-time energy (STE)
       Zero crossing rate (ZCR)
       Autocorrelation (AC)
       Pitch period or frequency
       Formants
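
As an illustration of how autocorrelation relates to the pitch period, the sketch below picks the lag with the largest autocorrelation inside a plausible pitch range and converts it to a frequency. The function name `estimate_pitch` and the 60 to 400 Hz search range are assumptions made for this example, and the frame is assumed to be voiced.

```python
def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch in Hz from the autocorrelation peak of one frame.

    The search is limited to lags that correspond to f_min..f_max
    (an assumed range; adjust it for the speakers at hand).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag
```

The pitch period in samples is `best_lag`; dividing the sampling rate by it gives the pitch frequency. Real estimators typically add a voicing check and guard against octave errors, which this sketch does not attempt.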

Speech signal components
       Speech-Silence or Non-speech
       Voiced speech-Unvoiced speech
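
Short-time energy and zero crossing rate are often combined to separate silence, voiced speech, and unvoiced speech. The sketch below is one crude way to do that, under stated assumptions: the function names are hypothetical, and the two thresholds (`energy_floor`, `zcr_split`) are illustrative values that would need tuning on real recordings.

```python
def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_floor=1e-4, zcr_split=0.25):
    """Label a frame as silence, voiced, or unvoiced (assumed thresholds)."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_split else "voiced"
```

The design rests on two common observations: silence has low energy, and unvoiced sounds such as fricatives are noise-like and cross zero far more often than voiced sounds.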
Speech Processing
Speech representations or models
       Temporal features
          •   Low energy rate
          •   Zero crossing rate (ZCR)
          •   4Hz modulation energy
          •   Pitch contour

       Spectral features
           •    Spectral Centroid (sharpness)
           •    Spectral Flux (rate of change)
           •    Spectral Roll-Off (spectral shape)
           •    Spectral Flatness (deviation of the spectral form)
       Linear Predictive Coefficients (LPC)
       Cepstral coefficients
       Mel Frequency Cepstral Coefficients (MFCC): human auditory system
       Harmonic features: sinusoidal harmonic modelling
       Perceptual features: model of the human hearing process
       First order derivative (DELTA)
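
The spectral features listed above can be computed from the magnitude spectrum of a single frame. The sketch below uses a naive DFT so that it needs only the standard library. Definitions vary across the literature, so the choices here (magnitude weighting for the centroid, an 85% roll-off fraction, flatness computed on the power spectrum) are assumptions, as are the function names.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    return sum(k * sample_rate / n * m for k, m in enumerate(mag)) / sum(mag)


def spectral_rolloff(mag, sample_rate, n, fraction=0.85):
    """Lowest frequency below which `fraction` of the magnitude sum lies."""
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k * sample_rate / n
    return (len(mag) - 1) * sample_rate / n


def spectral_flatness(mag, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0 to 1)."""
    power = [m * m + eps for m in mag]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def spectral_flux(mag_prev, mag_cur):
    """Euclidean distance between the spectra of two consecutive frames."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(mag_prev, mag_cur)))
```

A pure tone gives a centroid at the tone frequency and a flatness near 0, while a noise-like or impulsive frame gives a flatness near 1. Here `n` is the frame length used for the DFT.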
Elements of the speech signal
Phonemes: the smallest units of speech sounds
       Vowels and Consonants
       ~12 to 21 different vowel sounds used in the English language

       Consonants involve rapid and sometimes subtle changes in sound
              according to the manner of articulation:
                   •    plosive (p, b, t, etc.)
                   •    fricative (f, s, sh, etc.)
                   •    nasal (m, n, ng)
                   •    liquid (r, l) and
                   •    semivowel (w, y)

       Consonants are more independent of language than vowels are.

Syllable: one or more phonemes

Word: one or more syllables
Automatic Speech Recognition
There are two uses for speech recognition systems:

    Dictation: translation of the spoken word into written text
    Computer Control: control of the computer, and software
    applications by speaking commands

    Speaker dependent system: to operate for a single speaker
    Speaker independent system: to operate for any speaker
    of a particular type
    Speaker adaptive system: to adapt its operation to the
    characteristics of new speakers

    The size of vocabulary affects the complexity, processing
    requirements and the accuracy of the system
Speech Recognition: Applications

Automatic translation
Vehicle navigation systems
Human-computer interaction
Content-based spoken audio search
Home automation
Pronunciation evaluation
Robotics
Video games
Transcription of speech into mobile text messages
People with disabilities
Speech Recognition System

Sampling of speech

Acoustic signal processing:
   •     Linear Prediction Cepstral Coefficients (LPCC)
   •     Mel Frequency Cepstral Coefficients (MFCC)
   •     Perceptual Linear Prediction Cepstral Coefficients (PLPCC)

Recognition of phonemes, groups of phonemes and words:
   •    Dynamic Time Warping (DTW)
   •    Hidden Markov models (HMMs)
   •    Gaussian mixture models (GMMs)
   •    Neural Networks (NNs)
   •    Expert systems and combinations of techniques
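
Of the techniques listed, dynamic time warping is the simplest to show in a few lines. The sketch below computes the cumulative alignment cost between two sequences, so that the same word spoken at different rates can still match. It is a minimal illustration: it compares scalar sequences with an absolute-difference local cost, whereas a recognizer would compare sequences of feature vectors such as MFCCs. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Cumulative DTW cost between two 1-D sequences with |x - y| local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In template-based recognition, the input is compared against one stored template per word and the template with the lowest cost is chosen. A time-stretched copy of a sequence, for example `[1, 1, 2, 2, 3, 3]` against `[1, 2, 3]`, aligns at zero cost.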
Automatic Speaker Recognition
Speaker recognition: the process of automatically recognizing who is
speaking by using the speaker-specific information included in speech
sounds

Speaker identity: physiological and behavioral characteristics of the speech
production model of an individual speaker
         the spectral envelope (vocal tract characteristics)
         the supra-segmental features (voice source characteristics) of
         speech

Applications:
    •    banking over a telephone network
    •    telephone shopping and database access services
    •    voice dialing and voice mail
    •    information and reservation services
    •    security control for confidential information
    •    forensics and surveillance applications
Speaker Recognition
Speaker identification: the process of determining which registered speaker
provides input speech sounds

[Figure: speaker identification. Input speech → feature extraction → similarity scoring against the reference template or model of each registered speaker (#1, #2, ..., #N) → maximum selection → identification result (speaker ID)]
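
The maximum-selection step of speaker identification can be sketched as follows. This is an illustration under assumptions: each registered speaker is represented by a single reference feature vector, and cosine similarity stands in for whatever similarity measure a real system would use. The names `cosine_similarity` and `identify` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def identify(features, references):
    """Return (speaker_id, score) of the most similar registered reference.

    `references` maps each registered speaker ID to its reference vector.
    """
    scores = {spk: cosine_similarity(features, ref)
              for spk, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because the result is always one of the registered speakers, this corresponds to closed-set identification.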
Speaker Recognition
Speaker verification: the process of accepting or rejecting the
identity claim of a speaker.
[Figure: speaker verification. Input speech → feature extraction → similarity against the reference template or model of the claimed speaker (#M) → decision against a threshold → verification result (accept/reject)]




         Open Set and Closed Set Recognition

         Text-dependent and Text-independent Recognition
                 •   Vector quantization
                 •   Gaussian mixture models (GMM)
                 •   Dynamic time warping (DTW)
                 •   Hidden Markov model (HMM)
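
The threshold decision in speaker verification can be sketched in a few lines. The example assumes that the input and the claimed speaker's model are both fixed-length feature vectors and uses Euclidean distance as the (inverse) similarity. The function names and the threshold values are hypothetical; in practice the threshold is tuned to balance false acceptances against false rejections.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def verify(features, claimed_reference, threshold):
    """Accept the identity claim when the input is close enough to the claimed model."""
    return euclidean_distance(features, claimed_reference) <= threshold
```

Unlike identification, only the model of the claimed speaker is consulted, so the cost of a decision does not grow with the number of registered speakers.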
Text-to-Speech (TTS) System
    Synthesis of speech for effective human-machine communication
                     reading email messages
                     call center help desks and customer care
                     announcement machines



[Figure: TTS pipeline. Raw or tagged text → text analysis (document structure detection, text normalization, linguistic analysis) → phonetic analysis (homograph disambiguation, grapheme-to-phoneme conversion) → prosodic analysis (pitch, duration) → speech synthesis (voice rendering) → synthetic speech]


              Synthetic speech should be intelligible and natural
Speech Synthesis

Text-to-speech (TTS) synthesis systems
       Approach
       TTS system performance measure
          • Synthetic Speech Intelligibility
          • Synthetic speech naturalness

Speech Intelligibility Tests
      Segmental level analysis
          • the Rhyme Test
          • the Modified Rhyme Test
          • the Diagnostic Rhyme Test
      Supra-segmental analysis
          • the Harvard Psychoacoustic Sentences (HPS)
          • the Haskins syntactic sentences
Speech Coding (Compression)
Speech Coding for efficient transmission and storage of speech
           narrowband and broadband wired telephony
           cellular communications
           Voice over IP (VoIP) to utilize the Internet
           Telephone answering machines
           IVR systems
           Prerecorded messages
Speech-Assisted Translation Corrector System

 Objective: Develop a speech-assisted translation corrector (SATC)
 system that produces a grammatically correct sentence from a
 sentence translated by a machine translation system
[Figure: SATC pipeline. Input sentence (example: "He came here") → multilingual machine translation → translated sentence with grammatical errors → speech-assisted translation corrector system → grammatically correct sentence (text, speech, storage). Translator: a speech signal is produced from the words in the translated sentence.]



“A MT system is correct and complete if it can analyze all of the grammatical structures
encountered in the source language, and it can generate all of the grammatical structures
necessary in the target language translation.”
8/25/2011                                                                                    16
SATC System: Requirements and Challenging Tasks

   Creation of large-scale, rich multilingual speech databases is a crucial
 task for research and development in language and speech technology

            Indian languages
            speakers (10 Males and 10 Females)
            age groups ( <20, 15-40, >40)
            audio format: 16-bit stereo, and sampling rate of 44.1 kHz
            annotation and assessment of speech databases
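
The audio format above fixes the raw data rate of the recordings. As a quick worked calculation, assuming uncompressed linear PCM (the slide does not name a container or codec), the bit rate and storage per minute follow directly. The helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def storage_bytes(bit_rate, seconds):
    """Bytes needed to store `seconds` of audio at `bit_rate` bits per second."""
    return bit_rate * seconds // 8


rate = pcm_bit_rate(44_100, 16, 2)      # 1,411,200 bit/s
per_minute = storage_bytes(rate, 60)    # 10,584,000 bytes, about 10.6 MB
print(rate, per_minute)
```

At roughly 10.6 MB per minute of speech, the storage budget for a multi-speaker, multi-language database is worth estimating before recording begins.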


   Development of multilingual text to speech interface

   Development of spoken word matching module

   Development of speech signal processing (SSP) tools



Major Problems in Speech Processing
Acoustic variability: the same phoneme pronounced in
different contexts has different acoustic realizations
(coarticulation effect)

The signal is different when speech is uttered in various
environments:
       noise
       reverberation
       different types of microphones.

Speaking variability: when the same speaker speaks normally,
shouts, whispers, uses a creaky voice, or has a cold

Speaker variability: different speakers have different
timbres and different speaking habits
Major Problems in Speech Processing
Linguistic variability: the same sentence can be pronounced
in many different ways, using many different words,
synonyms, and many different syntactic structures and
prosodic schemes

Phonetic variability: due to the different possible
pronunciations of the same words by speakers having
different regional accents

Lombard effect: noise modifies the utterance of the words (as
people tend to speak louder)
Major Problems in Speech Processing
Continuous speech:
   words are connected together (not separated by pauses or
   silences).

   It is difficult to find the start and end points of words

   The production of each phoneme is affected by the
   production of surrounding phonemes

   The start and end of words are affected by the preceding
   and following words

   The rate of speech also matters (fast speech tends to be harder to recognize)

Contenu connexe

Tendances

Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentationhimanshubhatti
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overviewVarun Jain
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionManthan Gandhi
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognitionfathitarek
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technologyKalluri Madhuri
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data miningJimit Rupani
 
Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1Samiul Parag
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overviewsajanazoya
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxNsaroj kumar
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition systemavinash raibole
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signalVinodhini
 

Tendances (20)

Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technology
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
 
Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptx
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 

En vedette

Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizyLizy Abraham
 
Essential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic OrganizerEssential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic Organizersheilacook
 
Ppt on speech processing by ranbeer
Ppt on speech processing by ranbeerPpt on speech processing by ranbeer
Ppt on speech processing by ranbeerRanbeer Tyagi
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speechRaghu Veer
 
Radio communication presentation
Radio communication presentationRadio communication presentation
Radio communication presentationrandan88
 
Radio Communication
Radio CommunicationRadio Communication
Radio CommunicationJohn Grace
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processingsandhya jois
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGSnehal Hedau
 
Gsm.....ppt
Gsm.....pptGsm.....ppt
Gsm.....pptbalu008
 

En vedette (10)

Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizy
 
Essential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic OrganizerEssential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic Organizer
 
Ppt on speech processing by ranbeer
Ppt on speech processing by ranbeerPpt on speech processing by ranbeer
Ppt on speech processing by ranbeer
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speech
 
Radio communication presentation
Radio communication presentationRadio communication presentation
Radio communication presentation
 
Radio Presentation
Radio PresentationRadio Presentation
Radio Presentation
 
Radio Communication
Radio CommunicationRadio Communication
Radio Communication
 
presentation on digital signal processing
presentation on digital signal processingpresentation on digital signal processing
presentation on digital signal processing
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSING
 
Gsm.....ppt
Gsm.....pptGsm.....ppt
Gsm.....ppt
 

Similaire à Speech processing

Speech Technology Overview
Speech Technology OverviewSpeech Technology Overview
Speech Technology Overviewamr0mt
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniquessonukumar142
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognitionananth
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speechBilgin Aksoy
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...csandit
 
General Speereo Technology
General Speereo TechnologyGeneral Speereo Technology
General Speereo TechnologyDaniel Ischenko
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognitionsunnysyed
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsR Systems International
 
dialogue act modeling for automatic tagging and recognition
 dialogue act modeling for automatic tagging and recognition dialogue act modeling for automatic tagging and recognition
dialogue act modeling for automatic tagging and recognitionVipul Munot
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languageiosrjce
 

Similaire à Speech processing (20)

Automatic Speech Recognion
Automatic Speech RecognionAutomatic Speech Recognion
Automatic Speech Recognion
 
Speech Technology Overview
Speech Technology OverviewSpeech Technology Overview
Speech Technology Overview
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniques
 
Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
lec26_audio.pptx
lec26_audio.pptxlec26_audio.pptx
lec26_audio.pptx
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Web AI.pptx
Web AI.pptxWeb AI.pptx
Web AI.pptx
 
Assign
AssignAssign
Assign
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
General Speereo Technology
General Speereo TechnologyGeneral Speereo Technology
General Speereo Technology
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
 
dialogue act modeling for automatic tagging and recognition
 dialogue act modeling for automatic tagging and recognition dialogue act modeling for automatic tagging and recognition
dialogue act modeling for automatic tagging and recognition
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 

Dernier

Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.EnglishCEIPdeSigeiro
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
Presentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a ParagraphPresentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a ParagraphNetziValdelomar1
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxKatherine Villaluna
 
How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17Celine George
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfTechSoup
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptxraviapr7
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...CaraSkikne1
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17Celine George
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptxSandy Millin
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesCeline George
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17Celine George
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice documentXsasf Sfdfasd
 
Benefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive EducationBenefits & Challenges of Inclusive Education
Benefits & Challenges of Inclusive EducationMJDuyan
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 

Dernier (20)

Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
Presentation on the Basics of Writing. Writing a Paragraph
Speech processing

  • 1. Research Issues in Speech Processing Dr. M. Sabarimalai Manikandan msm.sabari@gmail.com
  • 2. Speech Production: the source-filter model
       The speech signal conveys the information contained in the spoken word.
         • It is a highly non-stationary signal.
         • Short segments of speech (20 to 30 ms) can be treated as approximately stationary.
         • Most of the acoustical energy lies in the frequency range of 100-6000 Hz.
       The vocal tract transfer function can be modeled by an all-pole filter.
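As an illustration of the all-pole vocal tract model, the sketch below estimates the filter coefficients of a short frame with the autocorrelation method and the Levinson-Durbin recursion. The slides do not specify an estimation method, so this choice is an assumption; the function name `lpc` and the demo filter 1 / (1 - 1.3 z^-1 + 0.8 z^-2) are hypothetical and chosen only so that the true coefficients are known.

```python
import numpy as np

def lpc(frame, order):
    """Estimate all-pole coefficients a[0..order] (with a[0] = 1) using the
    autocorrelation method and the Levinson-Durbin recursion.
    Assumes the frame is not all zeros (r[0] > 0)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Demo: impulse response of a known all-pole filter 1 / (1 - 1.3 z^-1 + 0.8 z^-2)
h = np.zeros(400)
h[0] = 1.0
h[1] = 1.3 * h[0]
for n in range(2, len(h)):
    h[n] = 1.3 * h[n - 1] - 0.8 * h[n - 2]

a, err = lpc(h, order=2)
print(np.round(a, 3))  # estimated denominator coefficients [1, a1, a2]
```

For real speech, the frame would typically be a windowed 20 to 30 ms segment and the order would be higher; those values depend on the sampling rate and are not given in the slides.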
  • 3. Speech Processing Tasks
         • Speech recognition (recognizing lexical content)
         • Speech synthesis (text-to-speech)
         • Speaker recognition (recognizing who is speaking)
         • Speech understanding and vocal dialog
         • Speech coding (data rate reduction)
         • Speech enhancement (noise reduction)
         • Speech transmission (noise-free communication)
         • Voice conversion
  • 4. Speech Processing
       Speech measurements
         • Short-time energy (STE)
         • Zero crossing rate (ZCR)
         • Autocorrelation (AC)
         • Pitch period or frequency
         • Formants
       Speech signal components
         • Speech vs. silence or non-speech
         • Voiced speech vs. unvoiced speech
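Two of the measurements listed above, short-time energy and zero crossing rate, can be sketched as follows. The helper names, the 30 ms frame length, and the 10 ms hop are assumptions chosen for illustration; the slides do not prescribe them.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (requires len(x) >= frame_len)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Demo: a 100 Hz tone sampled at 8 kHz, 30 ms frames with a 10 ms hop
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)
frames = frame_signal(tone, frame_len=240, hop=80)
ste = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
print(frames.shape, ste[:3], zcr[:3])
```

Low energy combined with a high zero crossing rate is a common heuristic indicator of unvoiced speech, while high energy and a low rate suggest voiced speech; actual thresholds would have to be tuned on data.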
  • 5. Speech Processing
       Speech representations or models
         • Temporal features
             • Low energy rate
             • Zero crossing rate (ZCR)
             • 4 Hz modulation energy
             • Pitch contour
         • Spectral features
             • Spectral centroid (sharpness)
             • Spectral flux (rate of change)
             • Spectral roll-off (spectral shape)
             • Spectral flatness (deviation of the spectral form)
         • Linear Predictive Coefficients (LPC)
         • Cepstral coefficients
         • Mel Frequency Cepstral Coefficients (MFCC): human auditory system
         • Harmonic features: sinusoidal harmonic modelling
         • Perceptual features: model of the human hearing process
         • First-order derivative (DELTA)
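The four spectral features named above can be computed from the power spectrum of a frame. The sketch below uses common textbook definitions; the Hann window, the 85% roll-off fraction, and the function names are assumptions, since the slides give only the feature names.

```python
import numpy as np

def spectral_features(frame, fs, rolloff_fraction=0.85):
    """Return (centroid in Hz, roll-off in Hz, flatness in [0, 1]) for one frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Centroid: power-weighted mean frequency
    centroid = np.sum(freqs * power) / (power.sum() + 1e-12)
    # Roll-off: lowest frequency below which the given fraction of power lies
    cumulative = np.cumsum(power)
    idx = np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])
    rolloff = freqs[idx]
    # Flatness: geometric mean over arithmetic mean of the power spectrum
    flatness = np.exp(np.mean(np.log(power + 1e-12))) / (np.mean(power) + 1e-12)
    return centroid, rolloff, flatness

def spectral_flux(prev_frame, frame):
    """Squared difference between normalized magnitude spectra of two frames."""
    def norm_mag(f):
        f = np.asarray(f, dtype=float)
        mag = np.abs(np.fft.rfft(f * np.hanning(len(f))))
        return mag / (mag.sum() + 1e-12)
    return float(np.sum((norm_mag(frame) - norm_mag(prev_frame)) ** 2))

# Demo: a pure 1 kHz tone versus white noise, 256 samples at 8 kHz
fs = 8000
n = np.arange(256)
tone = np.sin(2 * np.pi * 1000 * n / fs)
noise = np.random.default_rng(0).standard_normal(256)
print(spectral_features(tone, fs))
print(spectral_features(noise, fs))
print(spectral_flux(tone, noise))
```

A tone concentrates its power in a few bins, so its flatness is close to zero, whereas noise spreads power across the spectrum and yields a much larger value.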
  • 6. Elements of the speech signal
       Phonemes: the smallest units of speech sounds
         • Vowels and consonants
         • About 12 to 21 different vowel sounds are used in the English language.
         • Consonants involve rapid and sometimes subtle changes in sound, classified according to the manner of articulation:
             • plosive (p, b, t, etc.)
             • fricative (f, s, sh, etc.)
             • nasal (m, n, ng)
             • liquid (r, l)
             • semivowel (w, y)
         • Consonants are more independent of language than vowels are.
       Syllable: one or more phonemes
       Word: one or more syllables
  • 7. Automatic Speech Recognition
       There are two uses for speech recognition systems:
         • Dictation: translation of the spoken word into written text
         • Computer control: control of the computer and software applications by speaking commands
       Types of system:
         • Speaker-dependent system: operates for a single speaker
         • Speaker-independent system: operates for any speaker of a particular type
         • Speaker-adaptive system: adapts its operation to the characteristics of new speakers
       The size of the vocabulary affects the complexity, processing requirements, and accuracy of the system.
  • 8. Speech Recognition: Applications
         • Automatic translation
         • Vehicle navigation systems
         • Human-computer interaction
         • Content-based spoken audio search
         • Home automation
         • Pronunciation evaluation
         • Robotics
         • Video games
         • Transcription of speech into mobile text messages
         • Support for people with disabilities
  • 9. Speech Recognition System
       Sampling of speech
       Acoustic signal processing:
         • Linear Prediction Cepstral Coefficients (LPCC)
         • Mel Frequency Cepstral Coefficients (MFCC)
         • Perceptual Linear Prediction Cepstral Coefficients (PLPCC)
       Recognition of phonemes, groups of phonemes, and words:
         • Dynamic Time Warping (DTW)
         • Hidden Markov Models (HMMs)
         • Gaussian Mixture Models (GMMs)
         • Neural Networks (NNs)
         • Expert systems and combinations of techniques
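Of the recognition techniques listed above, dynamic time warping is the simplest to show in a few lines. The sketch below aligns two feature sequences of different lengths and returns the accumulated distance; the function name, the Euclidean local distance, and the unconstrained step pattern are assumptions, not details taken from the slides.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW distance between two sequences of feature vectors.
    Each sequence is shaped (T, D); 1-D inputs are treated as (T, 1)."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Allow a match, an insertion, or a deletion
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Demo: the same contour spoken at two rates aligns with zero cost
template = [1.0, 2.0, 3.0]
slow = [1.0, 2.0, 2.0, 3.0]
other = [3.0, 1.0, 0.0]
print(dtw_distance(template, slow), dtw_distance(template, other))
```

In a template-based isolated-word recognizer, the input utterance would be compared against one stored template per word and the word with the smallest distance selected.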
  • 10. Automatic Speaker Recognition
       Speaker recognition: the process of automatically recognizing who is speaking by using the speaker-specific information included in speech sounds
       Speaker identity: physiological and behavioral characteristics of the speech production model of an individual speaker
         • the spectral envelope (vocal tract characteristics)
         • the supra-segmental features (voice source characteristics) of speech
       Applications:
         • banking over a telephone network
         • telephone shopping and database access services
         • voice dialing and voice mail
         • information and reservation services
         • security control for confidential information
         • forensics and surveillance applications
  • 11. Speaker Recognition
       Speaker identification: the process of determining which registered speaker provides the input speech sounds
       [Figure: speaker identification block diagram. Input speech -> feature extraction -> similarity against the reference template or model of each registered speaker (#1, #2, ..., #N) -> maximum selection -> identification result (speaker ID)]
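The identification scheme on this slide (score the input against every registered speaker, then pick the maximum) can be sketched as below. Representing each speaker model as a single mean feature vector and using negative Euclidean distance as the similarity are simplifying assumptions; the names `identify_speaker` and `models` are hypothetical.

```python
import numpy as np

def identify_speaker(features, models):
    """Return (best speaker id, scores) for an utterance.
    features: (T, D) array of frame-level feature vectors.
    models: dict mapping speaker id -> (D,) reference vector."""
    utterance = np.mean(np.asarray(features, dtype=float), axis=0)
    scores = {
        spk: -float(np.linalg.norm(utterance - np.asarray(ref, dtype=float)))
        for spk, ref in models.items()
    }
    # Maximum selection over all registered speakers
    best = max(scores, key=scores.get)
    return best, scores

# Demo with toy 2-D features
models = {"speaker_1": [0.0, 0.0], "speaker_2": [5.0, 5.0]}
utterance_features = [[4.5, 5.2], [5.1, 4.9]]
speaker_id, scores = identify_speaker(utterance_features, models)
print(speaker_id, scores)
```

This is closed-set identification: the input is always assigned to one of the registered speakers, even if none is a good match.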
  • 12. Speaker Recognition
       Speaker verification: the process of accepting or rejecting the identity claim of a speaker
       [Figure: speaker verification block diagram. Input speech -> feature extraction -> similarity against the reference template or model of the claimed speaker (#M) -> decision against a threshold -> verification result (accept/reject)]
       Recognition modes:
         • Open-set and closed-set recognition
         • Text-dependent and text-independent recognition
       Modeling techniques:
         • Vector quantization
         • Gaussian mixture models (GMM)
         • Dynamic time warping (DTW)
         • Hidden Markov model (HMM)
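Verification differs from identification in that a single similarity score is compared with a threshold. The sketch below assumes a mean-vector speaker model and cosine similarity purely for illustration; the function name and the threshold value of 0.8 are hypothetical and would need to be tuned on real data.

```python
import numpy as np

def verify_speaker(features, claimed_model, threshold):
    """Accept or reject an identity claim.
    features: (T, D) array of frame-level feature vectors.
    claimed_model: (D,) reference vector of the claimed speaker.
    Returns (accepted, score), where score is a cosine similarity."""
    utterance = np.mean(np.asarray(features, dtype=float), axis=0)
    ref = np.asarray(claimed_model, dtype=float)
    denom = np.linalg.norm(utterance) * np.linalg.norm(ref) + 1e-12
    score = float(np.dot(utterance, ref) / denom)
    return bool(score >= threshold), score

# Demo with toy 2-D features
claimed = [1.0, 0.0]
genuine = [[0.9, 0.1], [1.1, -0.1]]
impostor = [[0.0, 1.0], [0.1, 0.9]]
print(verify_speaker(genuine, claimed, threshold=0.8))
print(verify_speaker(impostor, claimed, threshold=0.8))
```

Raising the threshold lowers the false-acceptance rate at the cost of more false rejections, so the operating point depends on the application.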
  • 13. Text-to-Speech (TTS) System
       Synthesis of speech for effective human-machine communication:
         • reading email messages
         • call center help desks and customer care
         • announcement machines
       [Figure: TTS block diagram. Raw or tagged text -> text analysis (document structure detection, text normalization, linguistic analysis) -> phonetic analysis (homograph disambiguation, grapheme-to-phoneme conversion) -> prosodic analysis (pitch, duration) -> speech synthesis (voice rendering) -> synthetic speech]
       Synthetic speech should be intelligible and natural.
  • 14. Speech Synthesis
       Text-to-speech (TTS) synthesis systems
       Approach
       TTS system performance measures
         • Synthetic speech intelligibility
         • Synthetic speech naturalness
       Speech intelligibility tests
         • Segmental-level analysis
             • the Rhyme Test
             • the Modified Rhyme Test
             • the Diagnostic Rhyme Test
         • Supra-segmental analysis
             • the Harvard Psychoacoustic Sentences (HPS)
             • the Haskins syntactic sentences
  • 15. Speech Coding (Compression)
       Speech coding for efficient transmission and storage of speech:
         • narrowband and broadband wired telephony
         • cellular communications
         • Voice over IP (VoIP) to utilize the Internet
         • telephone answering machines
         • IVR systems
         • prerecorded messages
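The slides do not name a particular codec. As one simple example of data rate reduction, the sketch below applies mu-law companding, the nonlinearity behind the ITU-T G.711 mu-law telephony codec, followed by uniform 8-bit quantization. It uses the continuous companding formula rather than the exact segmented G.711 encoding, and the function names are hypothetical.

```python
import numpy as np

MU = 255.0  # companding constant used for 8-bit mu-law

def mulaw_encode(x, bits=8):
    """Compress samples in [-1, 1] with the mu-law curve and quantize uniformly."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    levels = 2 ** bits
    return np.round((y + 1.0) / 2.0 * (levels - 1)).astype(np.int64)

def mulaw_decode(codes, bits=8):
    """Map integer codes back to samples in [-1, 1] (inverse of mulaw_encode)."""
    levels = 2 ** bits
    y = 2.0 * np.asarray(codes, dtype=float) / (levels - 1) - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

# Demo: round trip over the full amplitude range
x = np.linspace(-1.0, 1.0, 1001)
codes = mulaw_encode(x)
x_hat = mulaw_decode(codes)
print(codes.min(), codes.max(), np.max(np.abs(x - x_hat)))
```

The logarithmic curve spends more quantization levels on low-amplitude samples, where speech spends most of its time. At 8 bits per sample and an 8 kHz sampling rate, the resulting bit rate is 64 kbit/s.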
  • 16. Speech-Assisted Translation Corrector System
       Objective: develop a speech-assisted translation corrector (SATC) system that provides a grammatically correct sentence for a translated sentence produced by machine translation
       [Figure: SATC block diagram. Input sentence -> multilingual machine translation system -> translated sentence with grammatical errors -> speech-assisted translation corrector -> grammatically correct sentence. Other labels: text ("He came here"), speech storage, translator]
       A speech signal is produced from the words in the translated sentence.
       “An MT system is correct and complete if it can analyze all of the grammatical structures encountered in the source language, and it can generate all of the grammatical structures necessary in the target language translation.”
  • 17. SATC System: Requirements and Challenging Tasks
       Creation of large-scale, rich multilingual speech databases is a crucial task for research and development in language and speech technology:
         • speakers of Indian languages (10 males and 10 females)
         • age groups (<20, 15-40, >40)
         • audio format: 16-bit stereo at a sampling rate of 44.1 kHz
         • annotation and assessment of speech databases
       Development of a multilingual text-to-speech interface
       Development of a spoken word matching module
       Development of speech signal processing (SSP) tools
  • 18. Major Problems in Speech Processing
       Acoustic variability: the same phonemes pronounced in different contexts have different acoustic realizations (coarticulation effect). The signal also differs when speech is uttered in various environments:
         • noise
         • reverberation
         • different types of microphones
       Speaking variability: the same speaker may speak normally, shout, whisper, use a creaky voice, or have a cold
       Speaker variability: different speakers have different timbres and different speaking habits
  • 19. Major Problems in Speech Processing
       Linguistic variability: the same sentence can be pronounced in many different ways, using many different words, synonyms, syntactic structures, and prosodic schemes
       Phonetic variability: the same words can be pronounced differently by speakers with different regional accents
       Lombard effect: noise modifies the utterance of the words (people tend to speak louder)
  • 20. Major Problems in Speech Processing
       Continuous speech: words are connected together (not separated by pauses or silences)
         • It is difficult to find the start and end points of words.
         • The production of each phoneme is affected by the production of surrounding phonemes.
         • The start and end of words are affected by the preceding and following words.
         • Recognition also depends on the rate of speech (fast speech tends to be harder to recognize).