SlideShare a Scribd company logo
1 of 12
Download to read offline
DEEP LEARNING FOR SPEECH
RECOGNITION
Anantharaman Palacode Narayana Iyer
JNResearch
ananth@jnresearch.com
15 April 2016
REFERENCES
AGENDA
 Types of Speech Recognition and applications
 Traditional implementation pipeline
 Deep Learning for Speech Recognition
 Future directions
SPEECH APPLICATIONS
 Speech recognition:
 Hands-free in a car
 Commands for Personal assistants – e.g Siri
 Gaming
 Conversational agents
 E.g. agent for flight schedule enquiry, bookings etc
 Speaker identification
 E.g Forensics
 Extracting emotions and social meanings
 Text to speech
TYPES OF RECOGNITIONTASKS
 Isolated word recognition
 Connected words recognition
 Continuous speech recognition (LVCSR)
 The above can be realized as:
 Speaker independent implementation
 Speaker dependent implementation
SPEECH RECOGNITION IS PROBABILISTIC
Steps:
 Train the system
 Cross validate, finetune
 Test
 Deploy
Speech Recognizer
(ASR)
Speech Signal
Probabilistic match
between input and a set
of words
ISOLATED WORD RECOGNITION
 From the audio signal generate features. MFCC or
Filter banks are quite common
 Perform any additional pre-processing
 Using a code book of a given size, convert these
features in to discrete symbols.This is the vector
quantization procedure that can be implemented
with k-means clustering
 Train HMM’s using BaumWelch algorithm
 For each word in the vocabulary, instantiate a HMM
 Intuitively choose the number of states
 The set of symbols are all valid values of the code
book
 Use the HMM to predict unseen input
HMM 1
HMM 2
HMM n
Argmax λ
P(O|λ)
Observations
Predicted
Word
CONTINUOUS SPEECH RECOGNITION
• ASR for continuous speech is
traditionally built using Gaussian
Mixture Models (GMM)
• The emission probability table that
we used for discrete symbols is now
replaced by GMM
• The parameters of this model are
learnt as a part of the training using
BaumWelch procedure
KNOWLEDGE INTEGRATION FOR SPEECH
RECOGNITION
Feature
Analysis
Unit
Matching
System
Lexical
Hypothesis
Syntactic
Hypothesis
Semantic
Hypothesis
Utterence
Verifier
Speech
Recognized utterance
Inventory of
speech
recognition
units
Word
Dictition
ary
Gramm
ar
Task
Model
SOME CHALLENGES
 We don’t know the number of words
 We don’t know the boundaries
 They are fuzzy and non unique
 ForV word reference patterns and L positions there are
exponential combinatorial possibilities
USING DEEP NETWORKS FOR ASR
 Replace the GMM with a
Deep Neural Networks that
directly provides the
likelihood estimates
 Interface the DNN with a
HMM decoder
 Issues:
 We still need the HMM with
its underlying assumptions
for tractable computation
EMERGINGTRENDS
 HMM-free ASRs
 Avoids phoneme prediction and hence the need to have a
phoneme database
 Active area of research
 Current state of the art adopted by the industry uses DNN-HMM
 Future ASRs are likely to be fully neural networks based

More Related Content

What's hot

Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechIOSR Journals
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentationhimanshubhatti
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Deep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh TomarDeep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh TomarWithTheBest
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminarDiptimaya Sarangi
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition Goa App
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overviewVarun Jain
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technologyKalluri Madhuri
 
Automatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAutomatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAnkan Dutta
 
Adaptive Noise Cancellation
Adaptive Noise CancellationAdaptive Noise Cancellation
Adaptive Noise Cancellationtazim68
 
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectEmotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectDiego Rios
 
Speech emotion recognition
Speech emotion recognitionSpeech emotion recognition
Speech emotion recognitionsaniya shaikh
 
SPEECH BASED EMOTION RECOGNITION USING VOICE
SPEECH BASED  EMOTION RECOGNITION USING VOICESPEECH BASED  EMOTION RECOGNITION USING VOICE
SPEECH BASED EMOTION RECOGNITION USING VOICEVamshidharSingh
 
Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)BushraShaikh44
 

What's hot (20)

Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Deep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh TomarDeep Learning for Speech Recognition - Vikrant Singh Tomar
Deep Learning for Speech Recognition - Vikrant Singh Tomar
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
SPEECH CODING
SPEECH CODINGSPEECH CODING
SPEECH CODING
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technology
 
Automatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAutomatic speech recognition system using deep learning
Automatic speech recognition system using deep learning
 
Adaptive Noise Cancellation
Adaptive Noise CancellationAdaptive Noise Cancellation
Adaptive Noise Cancellation
 
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone ProjectEmotion Speech Recognition - Convolutional Neural Network Capstone Project
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
 
Speech emotion recognition
Speech emotion recognitionSpeech emotion recognition
Speech emotion recognition
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
SPEECH BASED EMOTION RECOGNITION USING VOICE
SPEECH BASED  EMOTION RECOGNITION USING VOICESPEECH BASED  EMOTION RECOGNITION USING VOICE
SPEECH BASED EMOTION RECOGNITION USING VOICE
 
Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)
 

Viewers also liked

Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processingananth
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1ananth
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpananth
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 wordsananth
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)ananth
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsBhaskar Mitra
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
67 Weeks of TensorFlow
67 Weeks of TensorFlow67 Weeks of TensorFlow
67 Weeks of TensorFlowAltoros
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project reportSarang Afle
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge BaseShubham Agarwal
 
사회 연결망의 링크 예측
사회 연결망의 링크 예측사회 연결망의 링크 예측
사회 연결망의 링크 예측Kyunghoon Kim
 
Multi Object Tracking | Presentation 2 | ID 103001
Multi Object Tracking | Presentation 2 | ID 103001Multi Object Tracking | Presentation 2 | ID 103001
Multi Object Tracking | Presentation 2 | ID 103001Md. Minhazul Haque
 
Overview Of Video Object Tracking System
Overview Of Video Object Tracking SystemOverview Of Video Object Tracking System
Overview Of Video Object Tracking SystemEditor IJMTER
 

Viewers also liked (20)

Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech a...
 
Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
 
Natural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlpNatural Language Processing: L03 maths fornlp
Natural Language Processing: L03 maths fornlp
 
Natural Language Processing: L02 words
Natural Language Processing: L02 wordsNatural Language Processing: L02 words
Natural Language Processing: L02 words
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
67 Weeks of TensorFlow
67 Weeks of TensorFlow67 Weeks of TensorFlow
67 Weeks of TensorFlow
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Reasoning Over Knowledge Base
Reasoning Over Knowledge BaseReasoning Over Knowledge Base
Reasoning Over Knowledge Base
 
사회 연결망의 링크 예측
사회 연결망의 링크 예측사회 연결망의 링크 예측
사회 연결망의 링크 예측
 
Multi Object Tracking | Presentation 2 | ID 103001
Multi Object Tracking | Presentation 2 | ID 103001Multi Object Tracking | Presentation 2 | ID 103001
Multi Object Tracking | Presentation 2 | ID 103001
 
Overview Of Video Object Tracking System
Overview Of Video Object Tracking SystemOverview Of Video Object Tracking System
Overview Of Video Object Tracking System
 

Similar to Deep Learning For Speech Recognition

Thinking about nlp
Thinking about nlpThinking about nlp
Thinking about nlpPan Xiaotong
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationChamani Shiranthika
 
Voice recognitionr.ppt
Voice recognitionr.pptVoice recognitionr.ppt
Voice recognitionr.pptSahidKhan61
 
Intelligent speech based sms system
Intelligent speech based sms systemIntelligent speech based sms system
Intelligent speech based sms systemKamal Spring
 
Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generatorsPaul Kahoro
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...kevig
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...kevig
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYIJCERT
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsR Systems International
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxKevinSims18
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation
 

Similar to Deep Learning For Speech Recognition (20)

lec26_audio.pptx
lec26_audio.pptxlec26_audio.pptx
lec26_audio.pptx
 
Thinking about nlp
Thinking about nlpThinking about nlp
Thinking about nlp
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translation
 
sr.ppt
sr.pptsr.ppt
sr.ppt
 
Voice recognitionr.ppt
Voice recognitionr.pptVoice recognitionr.ppt
Voice recognitionr.ppt
 
sr.ppt
sr.pptsr.ppt
sr.ppt
 
Intelligent speech based sms system
Intelligent speech based sms systemIntelligent speech based sms system
Intelligent speech based sms system
 
Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
 
Kf2517971799
Kf2517971799Kf2517971799
Kf2517971799
 
Kf2517971799
Kf2517971799Kf2517971799
Kf2517971799
 
scribgy.ppt
scribgy.pptscribgy.ppt
scribgy.ppt
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docx
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
DeepPavlov 2019
DeepPavlov 2019DeepPavlov 2019
DeepPavlov 2019
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 

More from ananth

Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsananth
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networksananth
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networksananth
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier ananth
 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligenceananth
 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligenceananth
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligenceananth
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learningananth
 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overviewananth
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distanceananth
 
L05 word representation
L05 word representationL05 word representation
L05 word representationananth
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionananth
 

More from ananth (15)

Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier
 
Mathematical Background for Artificial Intelligence
Mathematical Background for Artificial IntelligenceMathematical Background for Artificial Intelligence
Mathematical Background for Artificial Intelligence
 
Search problems in Artificial Intelligence
Search problems in Artificial IntelligenceSearch problems in Artificial Intelligence
Search problems in Artificial Intelligence
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
 
L05 word representation
L05 word representationL05 word representation
L05 word representation
 
Deep Learning Primer - a brief introduction
Deep Learning Primer - a brief introductionDeep Learning Primer - a brief introduction
Deep Learning Primer - a brief introduction
 

Recently uploaded

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Deep Learning For Speech Recognition

  • 1. DEEP LEARNING FOR SPEECH RECOGNITION Anantharaman Palacode Narayana Iyer JNResearch ananth@jnresearch.com 15 April 2016
  • 3. AGENDA  Types of Speech Recognition and applications  Traditional implementation pipeline  Deep Learning for Speech Recognition  Future directions
  • 4. SPEECH APPLICATIONS  Speech recognition:  Hands-free in a car  Commands for Personal assistants – e.g Siri  Gaming  Conversational agents  E.g. agent for flight schedule enquiry, bookings etc  Speaker identification  E.g Forensics  Extracting emotions and social meanings  Text to speech
  • 5. TYPES OF RECOGNITIONTASKS  Isolated word recognition  Connected words recognition  Continuous speech recognition (LVCSR)  The above can be realized as:  Speaker independent implementation  Speaker dependent implementation
  • 6. SPEECH RECOGNITION IS PROBABILISTIC Steps:  Train the system  Cross validate, finetune  Test  Deploy Speech Recognizer (ASR) Speech Signal Probabilistic match between input and a set of words
  • 7. ISOLATED WORD RECOGNITION  From the audio signal generate features. MFCC or Filter banks are quite common  Perform any additional pre-processing  Using a code book of a given size, convert these features in to discrete symbols.This is the vector quantization procedure that can be implemented with k-means clustering  Train HMM’s using BaumWelch algorithm  For each word in the vocabulary, instantiate a HMM  Intuitively choose the number of states  The set of symbols are all valid values of the code book  Use the HMM to predict unseen input HMM 1 HMM 2 HMM n Argmax λ P(O|λ) Observations Predicted Word
  • 8. CONTINUOUS SPEECH RECOGNITION • ASR for continuous speech is traditionally built using Gaussian Mixture Models (GMM) • The emission probability table that we used for discrete symbols is now replaced by GMM • The parameters of this model are learnt as a part of the training using BaumWelch procedure
  • 9. KNOWLEDGE INTEGRATION FOR SPEECH RECOGNITION Feature Analysis Unit Matching System Lexical Hypothesis Syntactic Hypothesis Semantic Hypothesis Utterence Verifier Speech Recognized utterance Inventory of speech recognition units Word Dictition ary Gramm ar Task Model
  • 10. SOME CHALLENGES  We don’t know the number of words  We don’t know the boundaries  They are fuzzy and non unique  ForV word reference patterns and L positions there are exponential combinatorial possibilities
  • 11. USING DEEP NETWORKS FOR ASR  Replace the GMM with a Deep Neural Networks that directly provides the likelihood estimates  Interface the DNN with a HMM decoder  Issues:  We still need the HMM with its underlying assumptions for tractable computation
  • 12. EMERGINGTRENDS  HMM-free ASRs  Avoids phoneme prediction and hence the need to have a phoneme database  Active area of research  Current state of the art adopted by the industry uses DNN-HMM  Future ASRs are likely to be fully neural networks based