Deep Learning For Speech Recognition

•

2 likes•3,548 views

Deep Learning techniques have enabled exciting novel applications. Recent advances hold lot of promise for speech based applications that include synthesis and recognition. This slideset is a brief overview that presents a few architectures that are the state of the art in contemporary speech research. These slides are brief because most concepts/details were covered using the blackboard in a classroom setting. These slides are meant to supplement the lecture.

Technology

DEEP LEARNING FOR SPEECH
RECOGNITION
Anantharaman Palacode Narayana Iyer
JNResearch
ananth@jnresearch.com
15 April 2016

AGENDA
 Types of Speech Recognition and applications
 Traditional implementation pipeline
 Deep Learning for Speech Recognition
 Future directions

SPEECH APPLICATIONS
 Speech recognition:
 Hands-free in a car
 Commands for Personal assistants – e.g Siri
 Gaming
 Conversational agents
 E.g. agent for flight schedule enquiry, bookings etc
 Speaker identification
 E.g Forensics
 Extracting emotions and social meanings
 Text to speech

TYPES OF RECOGNITIONTASKS
 Isolated word recognition
 Connected words recognition
 Continuous speech recognition (LVCSR)
 The above can be realized as:
 Speaker independent implementation
 Speaker dependent implementation

SPEECH RECOGNITION IS PROBABILISTIC
Steps:
 Train the system
 Cross validate, finetune
 Test
 Deploy
Speech Recognizer
(ASR)
Speech Signal
Probabilistic match
between input and a set
of words

ISOLATED WORD RECOGNITION
 From the audio signal generate features. MFCC or
Filter banks are quite common
 Perform any additional pre-processing
 Using a code book of a given size, convert these
features in to discrete symbols.This is the vector
quantization procedure that can be implemented
with k-means clustering
 Train HMM’s using BaumWelch algorithm
 For each word in the vocabulary, instantiate a HMM
 Intuitively choose the number of states
 The set of symbols are all valid values of the code
book
 Use the HMM to predict unseen input
HMM 1
HMM 2
HMM n
Argmax λ
P(O|λ)
Observations
Predicted
Word

CONTINUOUS SPEECH RECOGNITION
• ASR for continuous speech is
traditionally built using Gaussian
Mixture Models (GMM)
• The emission probability table that
we used for discrete symbols is now
replaced by GMM
• The parameters of this model are
learnt as a part of the training using
BaumWelch procedure

KNOWLEDGE INTEGRATION FOR SPEECH
RECOGNITION
Feature
Analysis
Unit
Matching
System
Lexical
Hypothesis
Syntactic
Hypothesis
Semantic
Hypothesis
Utterence
Verifier
Speech
Recognized utterance
Inventory of
speech
recognition
units
Word
Dictition
ary
Gramm
ar
Task
Model

SOME CHALLENGES
 We don’t know the number of words
 We don’t know the boundaries
 They are fuzzy and non unique
 ForV word reference patterns and L positions there are
exponential combinatorial possibilities

USING DEEP NETWORKS FOR ASR
 Replace the GMM with a
Deep Neural Networks that
directly provides the
likelihood estimates
 Interface the DNN with a
HMM decoder
 Issues:
 We still need the HMM with
its underlying assumptions
for tractable computation

EMERGINGTRENDS
 HMM-free ASRs
 Avoids phoneme prediction and hence the need to have a
phoneme database
 Active area of research
 Current state of the art adopted by the industry uses DNN-HMM
 Future ASRs are likely to be fully neural networks based

What's hot

Emotion Recognition Based On Audio SpeechIOSR Journals

Speech recognition final presentationhimanshubhatti

Voice RecognitionAmrita More

Speech recognitionCharu Joshi

Deep Learning for Speech Recognition - Vikrant Singh TomarWithTheBest

Speech recognition system seminarDiptimaya Sarangi

Speech to text conversionankit_saluja

Speaker recognition using MFCCHira Shaukat

Speech Recognition Goa App

Speech recognition an overviewVarun Jain

Speech Recognition TechnologySeminar Links

SPEECH CODINGShradheshwar Verma

Speech synthesis technologyKalluri Madhuri

Automatic speech recognition system using deep learningAnkan Dutta

Adaptive Noise Cancellationtazim68

Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectDiego Rios

Speech emotion recognitionsaniya shaikh

Speech Recognition Systemcurrently enjoying the closing of the year and not working

SPEECH BASED EMOTION RECOGNITION USING VOICEVamshidharSingh

Mel frequency cepstral coefficient (mfcc)BushraShaikh44

What's hot (20)

Emotion Recognition Based On Audio Speech

Speech recognition final presentation

Voice Recognition

Speech recognition

Deep Learning for Speech Recognition - Vikrant Singh Tomar

Speech recognition system seminar

Speech to text conversion

Speaker recognition using MFCC

Speech Recognition

Speech recognition an overview

Speech Recognition Technology

SPEECH CODING

Speech synthesis technology

Automatic speech recognition system using deep learning

Adaptive Noise Cancellation

Emotion Speech Recognition - Convolutional Neural Network Capstone Project

Speech emotion recognition

Speech Recognition System

SPEECH BASED EMOTION RECOGNITION USING VOICE

Mel frequency cepstral coefficient (mfcc)

Viewers also liked

Overview of TensorFlow For Natural Language Processingananth

Word representation: SVD, LSA, Word2Vecananth

Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Universitat Politècnica de Catalunya

Convolutional Neural Networks: Part 1ananth

Natural Language Processing: L03 maths fornlpananth

Natural Language Processing: L02 wordsananth

Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth

An overview of Hidden Markov Models (HMM)ananth

End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya

Natural Language Processing: L01 introductionananth

A Simple Introduction to Word EmbeddingsBhaskar Mitra

L05 language model_part2ananth

Machine Learning Lecture 3 Decision Treesananth

Recurrent Neural Networks, LSTM and GRUananth

67 Weeks of TensorFlowAltoros

Speech recognition project reportSarang Afle

Reasoning Over Knowledge BaseShubham Agarwal

사회 연결망의 링크 예측Kyunghoon Kim

Multi Object Tracking | Presentation 2 | ID 103001Md. Minhazul Haque

Overview Of Video Object Tracking SystemEditor IJMTER

Viewers also liked (20)

Overview of TensorFlow For Natural Language Processing

Word representation: SVD, LSA, Word2Vec

Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...

Convolutional Neural Networks: Part 1

Natural Language Processing: L03 maths fornlp

Natural Language Processing: L02 words

Deep Learning For Practitioners, lecture 2: Selecting the right applications...

An overview of Hidden Markov Models (HMM)

End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...

Natural Language Processing: L01 introduction

A Simple Introduction to Word Embeddings

L05 language model_part2

Machine Learning Lecture 3 Decision Trees

Recurrent Neural Networks, LSTM and GRU

67 Weeks of TensorFlow

Speech recognition project report

Reasoning Over Knowledge Base

사회 연결망의 링크 예측

Multi Object Tracking | Presentation 2 | ID 103001

Overview Of Video Object Tracking System

Similar to Deep Learning For Speech Recognition

lec26_audio.pptxKarimdabbabi

Thinking about nlpPan Xiaotong

Integration of speech recognition with computer assisted translationChamani Shiranthika

sr.pptKaleemKashif1

Voice recognitionr.pptSahidKhan61

sr.pptchalachew5

Intelligent speech based sms systemKamal Spring

Speech recognizers & generatorsPaul Kahoro

speech enhancementsenthilrajvlsi

STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...kevig

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...kevig

Kf2517971799IJERA Editor

scribgy.pptNishanthNayakaNR

AUTOMATIC SPEECH RECOGNITION- A SURVEYIJCERT

Deciphering voice of customer through speech analyticsR Systems International

NLP Techniques for Speech Recognition.docxKevinSims18

Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma

DeepPavlov 2019Mikhail Burtsev

CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation

Similar to Deep Learning For Speech Recognition (20)

lec26_audio.pptx

Thinking about nlp

Integration of speech recognition with computer assisted translation

sr.ppt

Voice recognitionr.ppt

sr.ppt

Intelligent speech based sms system

Speech recognizers & generators

speech enhancement

STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...

Kf2517971799

scribgy.ppt

AUTOMATIC SPEECH RECOGNITION- A SURVEY

Deciphering voice of customer through speech analytics

NLP Techniques for Speech Recognition.docx

Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit

DeepPavlov 2019

CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...

Recently uploaded

[BuildWithAI] Introduction to Gemini.pdfSandro Moreira

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays

ICT role in 21st century education and its challengesrafiqahmad00786416

AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh

Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz

JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37

DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays

Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021

CNIC Information System with Pakdata Cf In Pakistandanishmna97

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

ICT role in 21st century education and its challenges

AWS Community Day CPH - Three problems of Terraform

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model

Finding Java's Hidden Performance Traps @ DevoxxUK 2024

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...

JohnPollard-hybrid-app-RailsConf2024.pptx

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...

Six Myths about Ontologies: The Basics of Formal Ontology

CNIC Information System with Pakdata Cf In Pakistan

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf

Deep Learning For Speech Recognition

1. DEEP LEARNING FOR SPEECH RECOGNITION Anantharaman Palacode Narayana Iyer JNResearch ananth@jnresearch.com 15 April 2016

2. REFERENCES

3. AGENDA  Types of Speech Recognition and applications  Traditional implementation pipeline  Deep Learning for Speech Recognition  Future directions

4. SPEECH APPLICATIONS  Speech recognition:  Hands-free in a car  Commands for Personal assistants – e.g Siri  Gaming  Conversational agents  E.g. agent for flight schedule enquiry, bookings etc  Speaker identification  E.g Forensics  Extracting emotions and social meanings  Text to speech

5. TYPES OF RECOGNITIONTASKS  Isolated word recognition  Connected words recognition  Continuous speech recognition (LVCSR)  The above can be realized as:  Speaker independent implementation  Speaker dependent implementation

6. SPEECH RECOGNITION IS PROBABILISTIC Steps:  Train the system  Cross validate, finetune  Test  Deploy Speech Recognizer (ASR) Speech Signal Probabilistic match between input and a set of words

7. ISOLATED WORD RECOGNITION  From the audio signal generate features. MFCC or Filter banks are quite common  Perform any additional pre-processing  Using a code book of a given size, convert these features in to discrete symbols.This is the vector quantization procedure that can be implemented with k-means clustering  Train HMM’s using BaumWelch algorithm  For each word in the vocabulary, instantiate a HMM  Intuitively choose the number of states  The set of symbols are all valid values of the code book  Use the HMM to predict unseen input HMM 1 HMM 2 HMM n Argmax λ P(O|λ) Observations Predicted Word

8. CONTINUOUS SPEECH RECOGNITION • ASR for continuous speech is traditionally built using Gaussian Mixture Models (GMM) • The emission probability table that we used for discrete symbols is now replaced by GMM • The parameters of this model are learnt as a part of the training using BaumWelch procedure

9. KNOWLEDGE INTEGRATION FOR SPEECH RECOGNITION Feature Analysis Unit Matching System Lexical Hypothesis Syntactic Hypothesis Semantic Hypothesis Utterence Verifier Speech Recognized utterance Inventory of speech recognition units Word Dictition ary Gramm ar Task Model

10. SOME CHALLENGES  We don’t know the number of words  We don’t know the boundaries  They are fuzzy and non unique  ForV word reference patterns and L positions there are exponential combinatorial possibilities

11. USING DEEP NETWORKS FOR ASR  Replace the GMM with a Deep Neural Networks that directly provides the likelihood estimates  Interface the DNN with a HMM decoder  Issues:  We still need the HMM with its underlying assumptions for tractable computation

12. EMERGINGTRENDS  HMM-free ASRs  Avoids phoneme prediction and hence the need to have a phoneme database  Active area of research  Current state of the art adopted by the industry uses DNN-HMM  Future ASRs are likely to be fully neural networks based

Deep Learning For Speech Recognition

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Deep Learning For Speech Recognition

Similar to Deep Learning For Speech Recognition (20)

More from ananth

More from ananth (15)

Recently uploaded

Recently uploaded (20)

Deep Learning For Speech Recognition