Deep learning in practice: Speech recognition and beyond
Abdel HEBA
27 September 2017
Outline
● Part 1: Basics of Machine Learning (Deep and Shallow) and of Signal Processing
● Part 2: Speech Recognition
● Acoustic representation
● Probabilistic speech recognition
● Part 3: Neural Network Speech Recognition
● Hybrid neural networks
● End-to-End architecture
● Part 4: Kaldi
Reading Material
Books:
● Bengio, Yoshua (2009). "Learning Deep Architectures for AI".
● L. Deng and D. Yu (2014). "Deep Learning: Methods and Applications". http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
● D. Yu and L. Deng (2014). "Automatic Speech Recognition: A Deep Learning Approach" (Publisher: Springer).
Part I: Machine Learning (Deep/Shallow) and Signal Processing
Current view of Artificial Intelligence, Machine Learning & Deep Learning
Edureka blog – what-is-deep-learning
Current view of Machine Learning foundations & disciplines
Edureka blog – what-is-deep-learning
Machine Learning Paradigms: An Overview
(Figure: machine learning in relation to data analysis/statistics and programs.)
Supervised Machine Learning (classification)
Training phase (usually offline):
● Training data set: measurements (features) & associated 'class' labels (colors used to show class labels)
● Training algorithm
● Learned model: parameters/weights (and sometimes structure)
Test phase (run time, online):
● Input: a test data point, measurements (features) only
● Learned model: structure + parameters
● Output: predicted class label or label sequence (e.g. a sentence)
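The training and test phases above can be illustrated with a deliberately simple classifier. This sketch uses a nearest-centroid rule, an assumption chosen for brevity rather than a method from the slides: training estimates one mean feature vector per class, and testing assigns the label of the closest mean. The function names and toy data are hypothetical.

```python
from collections import defaultdict

def train(features, labels):
    """Training phase (offline): estimate one centroid (mean vector) per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # The learned model: structure (nearest centroid) is fixed, parameters are the centroids.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(model, x):
    """Test phase (run time): only the measurements (features) are available."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical 2-D training data with two classes.
model = train([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]],
              ["blue", "blue", "red", "red"])
```

Calling `predict(model, [0.2, 0.3])` returns `"blue"`. Real classifiers learn far richer parameters, but the split between an offline training step and a run-time prediction step is the same.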
What Is Deep Learning?
(Figure: deep learning as a subset of machine learning.)
Deep learning (also called deep machine learning, deep structured learning, hierarchical learning, or simply DL) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations.[1](p198)[2][3][4]
Evolution of Machine Learning
(Slide from: Yoshua Bengio)
Face Recognition
(Figure, modified from Y. LeCun and M.A. Ranzato: learning models arranged from shallow to deep and grouped into neural networks and probabilistic models, in successive builds that also separate supervised from unsupervised methods. Models shown: Perceptron, Neural Net, Deep Neural Net, Conv. Net, RNN, AE, D-AE, RBM, DBN, DBM, Sparse Coding, GMM, BayesNP, Bayes Nets, SVM, Decision Tree, Boosting.)
Part II: Speech Recognition
Human Communication: verbal & non-verbal information
Speech recognition problem
● Automatic speech recognition
● Spontaneous vs read speech
● Large vocabulary
● In noise
● Low resource
● Far-field
● Accent-independent
● Speaker-adaptive
● Speaker identification
● Speech enhancement
● Speech separation
Speech representation
● Same word: "Appeler" (French for "to call")
We want a low-dimensional representation that is invariant to the speaker, background noise, speaking rate, etc.
● Fourier analysis shows energy in different frequency bands
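The Fourier-analysis bullet can be made concrete with a short sketch: compute the power spectrum of one frame and sum it over frequency bands. This is an illustration only. The naive DFT, the equal-width bands and the 64-sample frame are simplifying assumptions; real front ends apply a window and an FFT (for example `numpy.fft.rfft`) to short overlapping frames, typically about 25 ms long every 10 ms.

```python
import cmath
import math

def dft_power(frame):
    """Power spectrum |X[k]|^2 of a real frame for k = 0 .. N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_energies(frame, n_bands):
    """Sum the power spectrum over n_bands equal-width frequency bands."""
    power = dft_power(frame)
    width = len(power) / n_bands
    return [sum(power[int(b * width):int((b + 1) * width)]) for b in range(n_bands)]

# A pure tone at DFT bin 10 of a 64-sample frame (hypothetical test signal).
frame = [math.sin(2 * math.pi * 10 * t / 64) for t in range(64)]
energies = band_energies(frame, 4)
```

With 33 spectral bins split into 4 bands, bin 10 falls in the second band, so nearly all the energy of this tone ends up in `energies[1]`.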
Acoustic representation
Vowel triangle as seen from the first two formants (F1 and F2)
● Features used in speech recognition
● Mel Frequency Cepstral Coefficients – MFCC
● Perceptual Linear Prediction – PLP
● RASTA-PLP
● Filter Bank Coefficients – F-BANKs
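F-BANK and MFCC features both start from a bank of triangular filters spaced on the mel scale. The sketch below assumes the common mel formula 2595·log10(1 + f/700), which is equivalent to the 1127·ln(1 + f/700) form used by Kaldi, together with 26 filters and a 512-point FFT at 16 kHz. These are typical settings, not values given in the slides. MFCCs are then obtained by applying a DCT to the log filter-bank energies and keeping the first coefficients.

```python
import math

def hz_to_mel(f):
    """Mel scale: 2595 * log10(1 + f / 700), i.e. 1127 * ln(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale.

    Returns n_filters weight vectors over the n_fft // 2 + 1 FFT bins.
    """
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each mel point to the nearest FFT bin.
    bins = [int(round(mel_to_hz(m) * n_fft / sample_rate)) for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, center):  # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right + 1):  # falling edge, peak 1.0 at center
            weights[k] = (right - k) / max(right - center, 1)
        filters.append(weights)
    return filters

def log_fbank(power_spectrum, filters, floor=1e-10):
    """Log filter-bank energies (F-BANK features) for one frame."""
    return [math.log(max(sum(w * p for w, p in zip(f, power_spectrum)), floor))
            for f in filters]

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II; applied to log F-BANKs it yields MFCC-style coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_coeffs)]

fb = mel_filterbank(26, 512, 16000)
```

Libraries differ in details (mel-scale variant, filter normalization, liftering), so this should be read as one representative variant rather than a reference implementation.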
Speech Recognition as transduction: from signal to language
Probabilistic speech recognition
● The speech signal is represented as a sequence of acoustic observations O
● We want to find the most likely word sequence W: W* = argmax_W P(W|O) = argmax_W P(O|W) P(W) (acoustic model × language model)
● We model this with a Hidden Markov Model (HMM):
● The system has a set of discrete states
● It moves from state to state according to transition probabilities (Markovian: memoryless)
● The acoustic observation emitted at each transition is conditioned on the current state alone: P(o|c)
● We seek to recover the state sequence and, from it, the word sequence
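A minimal Viterbi sketch for recovering the most likely state sequence of an HMM. To keep it short, it uses discrete observation symbols and a two-state toy HMM whose probabilities are invented for illustration. In a speech recognizer the emission term `emit_p[s][o]` would be the acoustic score P(o|c) produced by a GMM or a neural network for a continuous feature vector.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely HMM state sequence for `obs`, computed in the log domain.

    start_p[s], trans_p[s][s2] and emit_p[s][o] are probabilities (assumed > 0).
    Returns (best_log_prob, best_state_path).
    """
    # delta[s]: best log-probability of any path ending in state s at the current frame
    delta = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev_delta, delta, bp = delta, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev_delta[p] + math.log(trans_p[p][s]))
            delta[s] = (prev_delta[best_prev] + math.log(trans_p[best_prev][s])
                        + math.log(emit_p[s][o]))
            bp[s] = best_prev
        backpointers.append(bp)
    # Backtrack from the best final state.
    last = max(states, key=delta.get)
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return delta[last], path[::-1]

# Hypothetical toy HMM: two states, three observation symbols.
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.4, "c": 0.1}, "s2": {"a": 0.1, "b": 0.3, "c": 0.6}}
log_p, path = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
```

Working with log probabilities avoids numerical underflow, which matters for utterances that span hundreds of frames.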
Speech Recognition as transduction: Phone Recognition
● Training algorithm (N iterations):
● Align data and text
● Compute the probabilities P(o|p) of each segment o
● Update the boundaries
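The training loop above is essentially Viterbi (segmental k-means) training. This sketch assumes 1-D features, one unit-variance Gaussian per phone (so only a mean is learned) and a known phone sequence for the utterance; all names are hypothetical. It starts from a flat (uniform) segmentation, then alternates between estimating each phone's model from its aligned frames and re-aligning to update the boundaries.

```python
def viterbi_align(obs, means):
    """Left-to-right forced alignment of frames to phones (each phone >= 1 frame).

    The cost of a frame in phone s is (o - mean_s)^2, the negative log-likelihood of a
    unit-variance 1-D Gaussian up to a constant. Returns one phone index per frame.
    """
    T, S = len(obs), len(means)
    INF = float("inf")
    cost = [[INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    cost[0][0] = (obs[0] - means[0]) ** 2
    for t in range(1, T):
        for s in range(S):
            stay = cost[t - 1][s]
            advance = cost[t - 1][s - 1] if s > 0 else INF
            best, prev = (stay, s) if stay <= advance else (advance, s - 1)
            if best < INF:
                cost[t][s] = best + (obs[t] - means[s]) ** 2
                back[t][s] = prev
    states = [S - 1]  # the alignment must end in the last phone
    for t in range(T - 1, 0, -1):
        states.append(back[t][states[-1]])
    return states[::-1]

def train_segmental(obs, n_phones, n_iter=3):
    """Viterbi training: estimate models from the alignment, then re-align."""
    T = len(obs)
    # Flat start: split the utterance into equal-length segments (needs T >= n_phones).
    states = [t * n_phones // T for t in range(T)]
    for _ in range(n_iter):
        # Estimate each phone's model from the frames currently aligned to it.
        means = []
        for s in range(n_phones):
            seg = [o for o, st in zip(obs, states) if st == s]
            means.append(sum(seg) / len(seg))
        # Update the boundaries by re-aligning with the new models.
        states = viterbi_align(obs, means)
    return means, states

# Hypothetical utterance: three phones with true lengths 3, 6 and 3 frames.
obs = [0.0] * 3 + [5.0] * 6 + [10.0] * 3
means, states = train_segmental(obs, 3)
```

Here the flat start misplaces both boundaries, and a single realignment already moves them to frames 3 and 9.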
Speech Recognition as transduction: Lexicon
● Represent the lexicon as a graph using Weighted Finite-State Transducers (WFSTs)
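A toy lexicon transducer, to show what "phones in, words out" means as a graph. The representation (a dict of arcs, with costs in the tropical semiring) and the two-word lexicon are assumptions made for illustration. Real systems build L with a WFST library such as OpenFst, and Kaldi's lexicon FST additionally handles optional silence and disambiguation symbols.

```python
EPS = "<eps>"  # epsilon label: the arc emits (or consumes) nothing

def build_lexicon_fst(prons):
    """Toy lexicon transducer L mapping phone sequences to words.

    prons maps each word to its phone list. Each pronunciation becomes a chain of
    arcs that leaves state 0 and returns to it, so L accepts any sequence of words.
    The word is output on the first arc; later arcs output epsilon.
    Arcs are (src, input, output, weight, dst); weights are costs (-log prob).
    """
    arcs, next_state = [], 1
    for word, phones in prons.items():
        src = 0
        for k, phone in enumerate(phones):
            is_last = k == len(phones) - 1
            dst = 0 if is_last else next_state
            if not is_last:
                next_state += 1
            arcs.append((src, phone, word if k == 0 else EPS, 0.0, dst))
            src = dst
    return {"start": 0, "finals": {0: 0.0}, "arcs": arcs}

def transduce(fst, inputs):
    """Cheapest (cost, outputs) for an input sequence, or None if it is rejected.

    Viterbi-style search over states; assumes the FST has no epsilon input labels.
    """
    best = {fst["start"]: (0.0, [])}
    for sym in inputs:
        nxt = {}
        for src, i, o, w, dst in fst["arcs"]:
            if i == sym and src in best:
                cost, out = best[src]
                cand = (cost + w, out + ([] if o == EPS else [o]))
                if dst not in nxt or cand[0] < nxt[dst][0]:
                    nxt[dst] = cand
        best = nxt
    ends = [(c + fst["finals"][q], out) for q, (c, out) in best.items() if q in fst["finals"]]
    return min(ends) if ends else None

# Hypothetical two-word lexicon.
L = build_lexicon_fst({"yes": ["y", "eh", "s"], "no": ["n", "ow"]})
```

For example, `transduce(L, ["y", "eh", "s", "n", "ow"])` yields the words `["yes", "no"]`, while an incomplete pronunciation ends in a non-final state and is rejected.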
Speech Recognition as transduction
● Compose the lexicon FST with the grammar FST: L ∘ G
● Transduction via composition:
● Map the output labels of the lexicon to the input labels of the language model.
● Join and optimize the end-to-end graph.
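Composition L ∘ G can be sketched in a few lines for the simple case where the grammar has no epsilon inputs. The hand-written toy L and G below are hypothetical, and G's costs play the role of −log P(word) under a unigram language model. Production toolkits (e.g. OpenFst, which Kaldi uses) handle epsilons on both sides with composition filters and then determinize and minimize the result, which corresponds to the "join and optimize" step.

```python
from collections import deque

EPS = "<eps>"

def compose(a, b):
    """Weighted composition a ∘ b in the tropical semiring (costs add along arcs).

    Arcs are (src, input, output, weight, dst). Output labels of `a` are matched
    against input labels of `b`. Simplification: `b` has no epsilon inputs, so an
    epsilon output of `a` simply advances `a` and no epsilon filter is needed.
    """
    start = (a["start"], b["start"])
    arcs, finals = [], {}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for src, i, o, w, dst in a["arcs"]:
            if src != qa:
                continue
            if o == EPS:
                moves = [(i, EPS, w, (dst, qb))]
            else:
                moves = [(i, o2, w + w2, (dst, dst2))
                         for src2, i2, o2, w2, dst2 in b["arcs"]
                         if src2 == qb and i2 == o]
            for ilabel, olabel, weight, nxt in moves:
                arcs.append(((qa, qb), ilabel, olabel, weight, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

# Hypothetical toy lexicon L (phones -> words) and unigram grammar G (word acceptor).
L = {"start": 0, "finals": {0: 0.0}, "arcs": [
    (0, "y", "yes", 0.0, 1), (1, "eh", EPS, 0.0, 2), (2, "s", EPS, 0.0, 0),
    (0, "n", "no", 0.0, 3), (3, "ow", EPS, 0.0, 0)]}
G = {"start": 0, "finals": {0: 0.0}, "arcs": [
    (0, "yes", "yes", 0.5, 0), (0, "no", "no", 1.2, 0)]}
LG = compose(L, G)
```

In the result, the arc that reads the first phone of a word already carries that word's language-model cost, which is what lets the decoder apply the grammar while it is still consuming phones.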
Different steps of acoustic modeling
Decoding
● We want to find the most likely word sequence W given the observations o, by searching the decoding graph
Part III: Neural Networks for Speech Recognition
Three main paradigms for neural networks in speech recognition
● Use neural networks to compute a nonlinear feature representation
● "Bottleneck" or "tandem" features
● Use neural networks to estimate phonetic-unit probabilities (hybrid networks)
● Use end-to-end neural networks
Neural network features
● Train a neural network to discriminate classes.
● Use its output, or the representation at a low-dimensional bottleneck layer, as features.
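A sketch of bottleneck feature extraction, assuming a small fully connected network whose layer sizes (13 inputs, an 8-unit bottleneck, 40 output classes) are hypothetical. The weights here are random. In practice the network would first be trained as a classifier, and only then would the bottleneck activations be used as features for a downstream (e.g. HMM-GMM) system. Applying tanh to every hidden layer, including the bottleneck, is a simplifying choice made here.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights: one row of n_in weights per output unit."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(layers, x, stop_at=None):
    """Forward pass; if stop_at is given, return that layer's activations instead."""
    for i, (w, b) in enumerate(layers):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        x = softmax(z) if i == len(layers) - 1 else [math.tanh(v) for v in z]
        if i == stop_at:
            return x
    return x

rng = random.Random(0)
sizes = [13, 64, 8, 64, 40]  # hypothetical: 13 inputs, 8-unit bottleneck, 40 classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
BOTTLENECK = 1  # index of the 64 -> 8 layer

frame = [0.1] * 13
bottleneck_features = forward(layers, frame, stop_at=BOTTLENECK)  # 8 values per frame
posteriors = forward(layers, frame)  # 40 class probabilities
```

The same network therefore offers two feature options, matching the slide: the class posteriors at the output, or the compact 8-dimensional code at the bottleneck.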
Hybrid Speech Recognition System
● Train the network as a classifier with a softmax across the phonetic units.
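In a hybrid system the softmax outputs are posteriors P(s|x), whereas the HMM expects likelihoods p(x|s). A standard bridge is Bayes' rule: p(x|s) = P(s|x) p(x) / P(s). Because p(x) does not depend on the state, the decoder can use the scaled likelihood P(s|x) / P(s) instead. The sketch below estimates the priors P(s) by counting states in hypothetical training alignments; the add-one smoothing is an assumption made here.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def state_priors(alignment, n_states):
    """Estimate P(s) by counting states in training alignments (add-one smoothing)."""
    counts = [1] * n_states
    for s in alignment:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]

def scaled_log_likelihoods(logits, priors):
    """Hybrid trick: log p(x|s) = log P(s|x) - log P(s) + const."""
    return [lp - math.log(p) for lp, p in zip(log_softmax(logits), priors)]

# Hypothetical frame-level alignment in which state 0 dominates.
priors = state_priors([0, 0, 0, 0, 0, 0, 1, 1, 2], 3)
scores = scaled_log_likelihoods([0.0, 0.0, 0.0], priors)
```

With equal network outputs, dividing by the prior favors the rarest state, which counteracts the tendency of a classifier trained on unbalanced data to over-predict frequent states such as silence.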
Neural network architectures for speech recognition
● Fully connected
● Convolutional Networks (CNNs)
● Recurrent neural networks (RNNs)
● LSTMs
● GRUs
● Convolutional Neural network
● Recurrent Neural Network
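The RNN slides show the architecture only as figures. As a minimal sketch, an Elman-style recurrent layer carries a hidden state from frame to frame, so each output depends on the entire input history. The sizes and random weights are hypothetical. LSTMs and GRUs replace the single tanh update with gated updates but keep the same frame-by-frame loop.

```python
import math
import random

def rnn_forward(xs, w_xh, w_hh, b_h):
    """Elman RNN over a sequence of feature frames.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0.
    Returns the list of hidden states, one per frame.
    """
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        # The comprehension reads the previous h throughout; h is replaced afterwards.
        h = [math.tanh(sum(w * v for w, v in zip(w_xh[j], x))
                       + sum(w * v for w, v in zip(w_hh[j], h))
                       + b_h[j])
             for j in range(len(b_h))]
        states.append(h)
    return states

rng = random.Random(0)
n_in, n_hidden = 3, 4  # hypothetical sizes
w_xh = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
b_h = [0.0] * n_hidden
frames = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(5)]
hs = rnn_forward(frames, w_xh, w_hh, b_h)
```

Changing only the first frame changes the last hidden state, which is the "memory" that makes RNNs suitable for modeling speech over time.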
End-to-End Neural Networks for Speech Recognition: CTC Loss Function
End-to-End Speech Recognition: CTC Input
● Grapheme-based model: c ∈ {A, B, C, …, Z, blank, space}
● P(c = HHH_E_LL_LO___ | x) = P(c₁=H|x) P(c₂=H|x) … P(c₆=blank|x) …, where "_" denotes the blank and each factor is the network's output distribution at one frame
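The path above collapses to the transcript through the CTC mapping: merge repeated symbols, then delete blanks, so HHH_E_LL_LO___ becomes HELLO. The blank between the two L groups (LL_L) is what lets CTC emit a genuine double letter. The sketch below implements this mapping and the factorized path probability; the frame probabilities used in it are made up.

```python
import math

BLANK = "_"

def ctc_collapse(path):
    """CTC mapping B: merge consecutive repeated symbols, then remove blanks."""
    out, prev = [], None
    for c in path:
        if c != prev and c != BLANK:
            out.append(c)
        prev = c
    return "".join(out)

def path_log_prob(path, frame_probs):
    """log P(path|x) = sum over frames t of log P(c_t|x) (frames independent given x)."""
    return sum(math.log(frame_probs[t][c]) for t, c in enumerate(path))

# Made-up per-frame distributions over {A, blank} for a 2-frame input.
frame_probs = [{"A": 0.5, BLANK: 0.5}, {"A": 0.25, BLANK: 0.75}]
```

For instance, `ctc_collapse("HHEELLOO")` gives `"HELO"`: without a blank, the repeated L is merged.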
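Connectionist Temporal Classification (CTC)
● CTC loss function: L = −log P(y|x) = −log Σ_{π ∈ B⁻¹(y)} ∏_t P(π_t|x), where B maps a frame-level path π to a label sequence by merging repeats and removing blanks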
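Enumerating every path π that collapses to y grows exponentially with the number of frames, so P(y|x) is normally computed with a forward (alpha) recursion over the label sequence extended with blanks. The following is a minimal probability-domain sketch for illustration. Production implementations (for example PyTorch's `torch.nn.CTCLoss`) work in log space and also provide the gradient used for backpropagation.

```python
import math

BLANK = "_"

def ctc_forward(probs, target):
    """P(target|x) summed over all CTC alignments, via the forward (alpha) recursion.

    probs[t][c] is the network's per-frame probability of symbol c (including BLANK).
    Computed in the probability domain for clarity; real implementations use log space.
    """
    ext = [BLANK]
    for c in target:
        ext += [c, BLANK]  # extended label sequence: _ c1 _ c2 _ ... _
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the final blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

def ctc_loss(probs, target):
    return -math.log(ctc_forward(probs, target))

# Made-up 2-frame example: paths AA, A_ and _A all collapse to "A".
probs = [{"A": 0.6, BLANK: 0.4}, {"A": 0.3, BLANK: 0.7}]
p_a = ctc_forward(probs, "A")  # 0.18 + 0.42 + 0.12, about 0.72
```

For a doubled label such as "AA" over three frames, only the path A_A is valid, and the recursion reproduces this because it forbids skipping the blank between identical labels.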
● Updating the network with the CTC loss function:
● Backpropagation:
Take-home message
● Speech recognition systems:
● Traditional HMM-GMM system
● Hybrid ASR system
● Use neural networks for feature representation
● Or use neural networks for phoneme recognition
● End-to-end neural network system
● Grapheme-based model
● Needs a lot of data to perform well
● Complex modeling
Part IV: Kaldi
The Kaldi Toolkit
● Kaldi is designed specifically for speech recognition research and applications
● Kaldi training tools
● Data preparation (linking transcripts to wav files, speakers to utterances, etc.)
● Feature extraction and transforms: MFCC, PLP, F-BANKs, pitch, LDA, HLDA, fMLLR, MLLT, VTLN, etc.
● Scripts for building finite-state transducers: converting the lexicon and language model to FST format
● HMM-GMM traditional system
● Hybrid system
● Online decoding
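The data-preparation step refers to Kaldi's data directory, where plain-text files map utterance IDs to transcripts (`text`), to audio (`wav.scp`) and to speakers (`utt2spk`), with the inverse speaker-to-utterances map in `spk2utt`. The sketch below writes these four files from a hypothetical list of utterances. In a real recipe one would typically derive `spk2utt` with Kaldi's `utils/utt2spk_to_spk2utt.pl` and check the directory with `utils/validate_data_dir.sh`.

```python
import os
import tempfile
from collections import defaultdict

def write_kaldi_data_dir(data_dir, utterances):
    """Write Kaldi-style text, wav.scp, utt2spk and spk2utt files.

    utterances: iterable of (utt_id, spk_id, wav_path, transcript) tuples.
    Kaldi expects each file sorted by its first field (C-locale order); using the
    speaker ID as a prefix of the utterance ID keeps utt2spk and spk2utt consistent.
    """
    os.makedirs(data_dir, exist_ok=True)
    spk2utt = defaultdict(list)
    with open(os.path.join(data_dir, "text"), "w") as text, \
         open(os.path.join(data_dir, "wav.scp"), "w") as wav_scp, \
         open(os.path.join(data_dir, "utt2spk"), "w") as utt2spk:
        for utt_id, spk_id, wav_path, transcript in sorted(utterances):
            text.write(f"{utt_id} {transcript}\n")
            wav_scp.write(f"{utt_id} {wav_path}\n")
            utt2spk.write(f"{utt_id} {spk_id}\n")
            spk2utt[spk_id].append(utt_id)
    with open(os.path.join(data_dir, "spk2utt"), "w") as f:
        for spk_id in sorted(spk2utt):
            f.write(f"{spk_id} {' '.join(spk2utt[spk_id])}\n")

# Hypothetical utterances (IDs, paths and transcripts are made up).
with tempfile.TemporaryDirectory() as d:
    write_kaldi_data_dir(d, [
        ("spk2-utt1", "spk2", "/data/wav/spk2-utt1.wav", "NON"),
        ("spk1-utt2", "spk1", "/data/wav/spk1-utt2.wav", "OUI MERCI"),
        ("spk1-utt1", "spk1", "/data/wav/spk1-utt1.wav", "BONJOUR"),
    ])
    with open(os.path.join(d, "spk2utt")) as f:
        spk2utt_content = f.read()
```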
Kaldi Architecture
LinSTT uses Kaldi
Site      WER (%)  Audio corpus  #states  #gaussians  #pronunciations
CLIPS     40.7     90h           1,500    24k         38k
ENST      45.4     90h           114      14k         118k
IRENE     35.4     90h           6,000    200k        118k
LIA       26.7     90h           3,600    230k        130k
LIMSI     11.9     90h + 100h    12,000   370k        276k
LIUM      23.6     90h + 90h     7,000    154k        107k
LORIA     27.6     90h           6,000    90k         112k
Linagora  26.23    90h           15,000   500k        105k
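The WER column is the word error rate: the minimum number of substitutions (S), deletions (D) and insertions (I) needed to turn the recognized word sequence into the reference, divided by the number of reference words N. A minimal sketch of that computation as a word-level Levenshtein distance, assuming whitespace-tokenized transcripts. The scoring tools used in evaluations also apply text normalization, which is omitted here.

```python
def word_errors(ref, hyp):
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    r, h = ref.split(), hyp.split()
    # d[j]: edit distance between the current prefix of r and h[:j] (one DP row)
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = min(d[j] + 1,                              # deletion
                      d[j - 1] + 1,                          # insertion
                      prev_diag + (r[i - 1] != h[j - 1]))    # match or substitution
            prev_diag, d[j] = d[j], cur
    return d[len(h)]

def wer(ref, hyp):
    """Word error rate in percent: 100 * (S + D + I) / N, N = reference length."""
    return 100.0 * word_errors(ref, hyp) / len(ref.split())
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion, so the WER is 2/6, about 33.3 %. Because insertions are counted, WER can exceed 100 %.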
Thanks for your attention
LINAGORA – headquarters
80, rue Roque de Fillol
92800 PUTEAUX
FRANCE
Phone : +33 (0)1 46 96 63 63
Info : info@linagora.com
Web : www.linagora.com
facebook.com/Linagora/
@linagora

Contenu connexe

Tendances

Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Rohit Kumar
 
Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fieldslswing
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in PythonImry Kissos
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extractionskylian
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxKarimdabbabi
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationkokanechandrakant
 
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaUnsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaPyData
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
Introduction to pattern recognization
Introduction to pattern recognizationIntroduction to pattern recognization
Introduction to pattern recognizationAjharul Abedeen
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice남주 김
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Learning set of rules
Learning set of rulesLearning set of rules
Learning set of rulesswapnac12
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)cairo university
 

Tendances (20)

Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.Pattern recognition and Machine Learning.
Pattern recognition and Machine Learning.
 
Introduction to pattern recognition
Introduction to pattern recognitionIntroduction to pattern recognition
Introduction to pattern recognition
 
Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fields
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
Word embedding
Word embedding Word embedding
Word embedding
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extraction
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptx
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguation
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena SharovaUnsupervised Anomaly Detection with Isolation Forest - Elena Sharova
Unsupervised Anomaly Detection with Isolation Forest - Elena Sharova
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Introduction to pattern recognization
Introduction to pattern recognizationIntroduction to pattern recognization
Introduction to pattern recognization
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Convolutional neural network in practice
Convolutional neural network in practiceConvolutional neural network in practice
Convolutional neural network in practice
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Learning set of rules
Learning set of rulesLearning set of rules
Learning set of rules
 
Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)Machine Learning lecture6(regularization)
Machine Learning lecture6(regularization)
 

En vedette

Blockchain Economic Theory
Blockchain Economic TheoryBlockchain Economic Theory
Blockchain Economic TheoryMelanie Swan
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageRoelof Pieters
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalBhaskar Mitra
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document RankingBhaskar Mitra
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain ExplainedBlockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain ExplainedMelanie Swan
 
Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)Jaemin Cho
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS MeetupLINAGORA
 
Cs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelCs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelYanbin Kong
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingYanbin Kong
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep LearningMelanie Swan
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchBhaskar Mitra
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRoelof Pieters
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Roelof Pieters
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHenning Spjelkavik
 

En vedette (20)

Blockchain Economic Theory
Blockchain Economic TheoryBlockchain Economic Theory
Blockchain Economic Theory
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain ExplainedBlockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
 
Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
 
Cs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelCs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative Model
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and Understanding
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep Learning
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.no
 

Similaire à Deep Learning in practice : Speech recognition and beyond - Meetup

Deep learning Techniques JNTU R20 UNIT 2
Deep learning Techniques JNTU R20 UNIT 2Deep learning Techniques JNTU R20 UNIT 2
Deep learning Techniques JNTU R20 UNIT 2EXAMCELLH4
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma
 
Looking into the Black Box - A Theoretical Insight into Deep Learning Networks
Looking into the Black Box - A Theoretical Insight into Deep Learning NetworksLooking into the Black Box - A Theoretical Insight into Deep Learning Networks
Looking into the Black Box - A Theoretical Insight into Deep Learning NetworksDinesh V
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...Yun-Nung (Vivian) Chen
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022Kwanghee Choi
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Sepformer&DPTNet.pdf
Sepformer&DPTNet.pdfSepformer&DPTNet.pdf
Sepformer&DPTNet.pdfssuser849b73
 
deeplearning
deeplearningdeeplearning
deeplearninghuda2018
 
Short story presentation
Short story presentationShort story presentation
Short story presentationStutiAgarwal36
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningBigDataCloud
 
Speaker identification
Speaker identificationSpeaker identification
Speaker identificationTriloki Gupta
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Reviewchangedaeoh
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...kevig
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...kevig
 

Similaire à Deep Learning in practice : Speech recognition and beyond - Meetup (20)

Deep learning Techniques JNTU R20 UNIT 2
Deep learning Techniques JNTU R20 UNIT 2Deep learning Techniques JNTU R20 UNIT 2
Deep learning Techniques JNTU R20 UNIT 2
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
Looking into the Black Box - A Theoretical Insight into Deep Learning Networks
Looking into the Black Box - A Theoretical Insight into Deep Learning NetworksLooking into the Black Box - A Theoretical Insight into Deep Learning Networks
Looking into the Black Box - A Theoretical Insight into Deep Learning Networks
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
Deep Learning in practice: Speech recognition and beyond - Meetup

  • 1. Deep learning in practice: Speech recognition and beyond. Abdel HEBA, 27 September 2017
  • 2. Outline ● Part 1: Basics of Machine Learning (Deep and Shallow) and of Signal Processing ● Part 2: Speech Recognition ● Acoustic representation ● Probabilistic speech recognition ● Part 3: Neural Network Speech Recognition ● Hybrid neural networks ● End-to-End architecture ● Part 4: Kaldi
  • 3. Reading Material
  • 4. Reading Material ● Books: ● Bengio, Yoshua (2009). "Learning Deep Architectures for AI". ● L. Deng and D. Yu (2014). "Deep Learning: Methods and Applications". http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf ● D. Yu and L. Deng (2014). "Automatic Speech Recognition: A Deep Learning Approach" (Publisher: Springer).
  • 5. Reading Material
  • 6. Part I: Machine Learning (Deep/Shallow) and Signal Processing
  • 7. Current view of Artificial Intelligence, Machine Learning & Deep Learning (source: Edureka blog, what-is-deep-learning)
  • 8. Current view of Machine Learning foundations & disciplines (source: Edureka blog, what-is-deep-learning)
  • 9. Machine Learning Paradigms: An Overview [Figure: Machine learning, Data Analysis/Statistics, Programs]
  • 10. Supervised Machine Learning (classification): training phase (usually offline). A training data set of measurements (features) with associated 'class' labels (colors are used to show the class labels) is fed to a training algorithm, which produces the learned model: its parameters/weights (and sometimes its structure).
  • 11. Supervised Machine Learning (classification): test phase (run time, online). An input test data point, made of measurements (features) only, is fed to the learned model (structure + parameters), whose output is a predicted class label or label sequence (e.g. a sentence).
  • 12. What Is Deep Learning? [Figure: Deep learning, Machine learning] Deep learning (deep machine learning, deep structured learning, hierarchical learning, or sometimes DL) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations.[1](p198)[2][3][4]
  • 13. Evolution of Machine Learning (slide from Yoshua Bengio)
  • 14. Face Recognition
  • 15. [Figure: map of models from SHALLOW to DEEP: Perceptron, SVM, GMM, BayesNP, Decision Tree, Boosting, Sparse Coding, RBM, AE, D-AE, DBN, DBM, Neural Net, Conv. Net, RNN, Bayes Nets. Modified from Y. LeCun and M.A. Ranzato]
  • 16. [Figure: the same map, split into Neural Networks vs. Probabilistic Models, with Deep Neural Net. Modified from Y. LeCun and M.A. Ranzato]
  • 17. [Figure: the same map, with the models further marked as Supervised or Unsupervised. Modified from Y. LeCun and M.A. Ranzato]
  • 18. Part II: Speech Recognition
  • 19. Human Communication: verbal & non-verbal information
  • 20. Speech recognition problem
  • 21. Speech recognition problem ● Automatic speech recognition ● Spontaneous vs. read speech ● Large vocabulary ● In noise ● Low resource ● Far-field ● Accent-independent ● Speaker-adaptive ● Speaker identification ● Speech enhancement ● Speech separation
  • 22. Speech representation ● Same word: "Appeler" (French for "to call")
  • 23. Speech representation ● We want a low-dimensional representation, invariant to the speaker, background noise, speaking rate, etc. ● Fourier analysis shows the energy in different frequency bands
  • 24. Acoustic representation ● Vowel triangle as seen from formants 1 & 2
  • 25. Acoustic representation ● Features used in speech recognition ● Mel-Frequency Cepstral Coefficients (MFCC) ● Perceptual Linear Prediction (PLP) ● RASTA-PLP ● Filter-bank coefficients (F-BANKs)
  • 26. Speech Recognition as transduction: from signal to language
  • 27. Speech Recognition as transduction: from signal to language
  • 28. Speech Recognition as transduction: from signal to language
  • 29. Probabilistic speech recognition ● The speech signal is represented as a sequence of acoustic observations ● We want to find the most likely word sequence W ● We model this with a Hidden Markov Model (HMM) ● The system has a set of discrete states ● It moves from state to state according to transition probabilities (Markovian: memoryless) ● The acoustic observation emitted when making a transition is conditioned on the state alone: P(o|c) ● We seek to recover the state sequence and, from it, the word sequence
  • 30. Speech Recognition as transduction: Phone Recognition ● Training algorithm (N iterations) ● Align data & text ● Compute the probabilities P(o|p) of each segment o ● Update the boundaries
  • 31. Speech Recognition as transduction: Lexicon ● Construct a graph using Weighted Finite-State Transducers (WFST)
  • 32. Speech Recognition as transduction ● Compose the lexicon FST with the grammar FST: L ∘ G ● Transduction via composition ● Map the output labels of the lexicon to the input labels of the language model ● Join and optimize the end-to-end graph
  • 33. Different steps of acoustic modeling
  • 35. Decoding ● We want to find the most likely word sequence W given the observations o in the graph
  • 36. Part III: Neural Networks for Speech Recognition
  • 37. Three main paradigms for neural networks in speech ● Use neural networks to compute a nonlinear feature representation ● "Bottleneck" or "tandem" features ● Use neural networks to estimate phonetic unit probabilities (hybrid networks) ● Use end-to-end neural networks
  • 38. Neural network features ● Train a neural network to discriminate classes ● Use its output, or a low-dimensional bottleneck layer representation, as features
  • 39. Hybrid Speech Recognition System ● Train the network as a classifier with a softmax across the phonetic units
  • 40. Hybrid Speech Recognition System
  • 41. Neural network architectures for speech recognition ● Fully connected ● Convolutional neural networks (CNNs) ● Recurrent neural networks (RNNs) ● LSTMs ● GRUs
  • 42. Neural network architectures for speech recognition ● Convolutional neural network
  • 43. Neural network architectures for speech recognition ● Recurrent neural network
  • 44. Neural network architectures for speech recognition ● Recurrent neural network
  • 45. Neural network architectures for speech recognition ● Recurrent neural network
  • 46. Neural network architectures for speech recognition ● Recurrent neural network
  • 47. End-to-End Neural Networks for Speech Recognition: CTC Loss Function
  • 48. End-to-End Speech Recognition: CTC Input ● Grapheme-based model: c ∈ {A, B, C, …, Z, Blank, Space} ● P(c = HHH_E_LL_LO___ | x) = P(c₁=H|x) P(c₂=H|x) … P(c₆=blank|x) …
  • 49. Connectionist Temporal Classification (CTC) ● CTC loss function:
  • 50. Connectionist Temporal Classification (CTC) ● Updating the network with the CTC loss function: ● Backpropagation:
  • 51. Take-home message ● Speech recognition systems ● Traditional HMM-GMM system ● Hybrid ASR system ● Use neural networks for feature representation ● Or use neural networks for phoneme recognition ● End-to-end neural network system ● Grapheme-based model ● Needs a lot of data to perform well ● Complex modeling
  • 52. Part IV: Kaldi
  • 53. The Kaldi Toolkit ● Kaldi is specifically designed for speech recognition research applications ● Kaldi training tools ● Data preparation (linking text to wav, speaker to utterance, etc.) ● Feature extraction: MFCC, PLP, F-BANKs, Pitch, LDA, HLDA, fMLLR, MLLT, VTLN, etc. ● Scripts for building finite-state transducers: converting the lexicon & language model to FST format ● Traditional HMM-GMM system ● Hybrid system ● Online decoding
  • 54. Kaldi Architecture
  • 55. LinSTT uses Kaldi
      Site       WER (%)   Audio corpus   #states   #Gaussians   #pronunciations
      CLIPS      40.7      90h            1,500     24k          38k
      ENST       45.4      90h            114       14k          118k
      IRENE      35.4      90h            6,000     200k         118k
      LIA        26.7      90h            3,600     230k         130k
      LIMSI      11.9      90h + 100h     12,000    370k         276k
      LIUM       23.6      90h + 90h      7,000     154k         107k
      LORIA      27.6      90h            6,000     90k          112k
      Linagora   26.23     90h            15,000    500k         105k
  • 56. Thanks for your attention. LINAGORA headquarters: 80, rue Roque de Fillol, 92800 PUTEAUX, FRANCE ● Phone: +33 (0)1 46 96 63 63 ● Info: info@linagora.com ● Web: www.linagora.com ● facebook.com/Linagora/ ● @linagora
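Slides 23 and 25 describe the front end: short-time Fourier analysis followed by F-BANK or MFCC features. The following is a minimal NumPy sketch of that pipeline. It assumes common textbook defaults: 16 kHz audio, 25 ms Hamming windows every 10 ms, a 512-point FFT, 23 HTK-style mel filters and 13 cepstra. These settings and the function names are illustrative choices, not the ones used in the talk or in Kaldi. Pre-emphasis, dithering, liftering and DCT normalization are left out.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fbank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def fbank_and_mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
                   n_fft=512, n_filters=23, n_ceps=13):
    frame_len = sample_rate * frame_ms // 1000        # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000                # 160 samples at 16 kHz
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2       # power spectrum per frame
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_fbank = np.log(energies + 1e-10)                    # F-BANK features
    # A (non-normalized) DCT-II of the log filter-bank energies gives the MFCCs
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_matrix = np.cos(np.pi * k * (n + 0.5) / n_filters)
    mfcc = log_fbank @ dct_matrix.T
    return log_fbank, mfcc

# Usage: one second of a synthetic 1 kHz tone
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
log_fbank, mfcc = fbank_and_mfcc(tone, sample_rate=sr)
print(log_fbank.shape, mfcc.shape)   # (98, 23) (98, 13)
```

The log filter-bank energies are the F-BANK features. The DCT is usually motivated as a way to decorrelate them, which suited diagonal-covariance GMMs. This is one reason neural acoustic models are often fed F-BANKs directly.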
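Slides 29 and 35 state the decoding goal informally. The usual noisy-channel form is written out below. Here O = o_1 … o_T is the observation sequence and c_t is the HMM state at frame t (the P(o|c) of slide 29). P(W) is supplied by the language model (the grammar G), and P(O|W) by the acoustic model through the lexicon.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)

P(O \mid W) = \sum_{c_1,\dots,c_T} \prod_{t=1}^{T} P(c_t \mid c_{t-1})\,P(o_t \mid c_t)
            \approx \max_{c_1,\dots,c_T} \prod_{t=1}^{T} P(c_t \mid c_{t-1})\,P(o_t \mid c_t)
```

P(O) can be dropped because it does not depend on W. The sum runs over the state sequences allowed by the HMM of W, with P(c_1 | c_0) read as the initial-state probability. Decoders typically use the max (Viterbi) approximation.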
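The Viterbi algorithm computes the max approximation used in decoding (slides 29 and 35). A small log-domain NumPy sketch follows. The three-state left-to-right model and all its probabilities are made up for the example. In a real system the emission scores would come from GMMs or from a neural network.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely HMM state sequence, computed in the log domain.

    log_init:  (S,)   log P(c_1 = s)
    log_trans: (S, S) log P(c_t = j | c_{t-1} = i)
    log_emit:  (T, S) log P(o_t | c_t = s)
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]              # best log-score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans     # scores[i, j]: come from i, go to j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(S)] + log_emit[t]
    # Trace back from the best final state
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(delta))

# Toy left-to-right HMM with 3 states and 5 frames (all numbers are made up)
with np.errstate(divide="ignore"):              # log(0) = -inf marks forbidden events
    log_init = np.log([1.0, 0.0, 0.0])
    log_trans = np.log([[0.6, 0.4, 0.0],
                        [0.0, 0.6, 0.4],
                        [0.0, 0.0, 1.0]])
    log_emit = np.log([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.0],
                       [0.1, 0.8, 0.1],
                       [0.0, 0.3, 0.7],
                       [0.0, 0.1, 0.9]])
path, score = viterbi(log_init, log_trans, log_emit)
print(path)   # [0, 0, 1, 2, 2]
```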
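Slides 31 and 32 build the decoding graph by composing a lexicon transducer L with a grammar G. The toy pure-Python sketch below shows what composition does. It pairs states of L and G, lets L's output words drive G's inputs, and adds tropical (-log probability) weights along the way.

The dictionary FST format, the homophone lexicon and the costs are all hypothetical. The sketch omits the optimization step (determinization, minimization) and general epsilon handling. Real toolkits such as OpenFst, which Kaldi uses, provide both.

```python
from collections import deque

EPS = "<eps>"

def compose(a, b):
    """Compose FSTs a and b: a's output labels are matched against b's input labels.

    Hypothetical format: {"start": s, "finals": {state: weight},
                          "arcs": {state: [(ilabel, olabel, weight, next_state)]}}
    Weights are tropical (-log probabilities), so they add along a path.
    Simplification: epsilons are handled only on a's output side, so b must not
    have input epsilons.
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        out = arcs.setdefault((qa, qb), [])
        if qa in a["finals"] and qb in b["finals"]:
            finals[(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for ilab, olab, w, na in a["arcs"].get(qa, []):
            if olab == EPS:                      # a emits nothing: b stays put
                new_arcs = [(ilab, EPS, w, (na, qb))]
            else:                                # b must consume a's output label
                new_arcs = [(ilab, o2, w + w2, (na, nb))
                            for i2, o2, w2, nb in b["arcs"].get(qb, []) if i2 == olab]
            for arc in new_arcs:
                out.append(arc)
                if arc[3] not in seen:
                    seen.add(arc[3])
                    queue.append(arc[3])
    return {"start": start, "finals": finals, "arcs": arcs}

def best_output(fst, inputs):
    """Cheapest output sequence for an input sequence (assumes no input epsilons)."""
    hyps = {fst["start"]: (0.0, [])}             # state -> (cost, outputs so far)
    for sym in inputs:
        nxt = {}
        for state, (cost, outs) in hyps.items():
            for ilab, olab, w, ns in fst["arcs"].get(state, []):
                if ilab == sym:
                    cand = (cost + w, outs + ([] if olab == EPS else [olab]))
                    if ns not in nxt or cand[0] < nxt[ns][0]:
                        nxt[ns] = cand
        hyps = nxt
    done = [(c + fst["finals"][s], o) for s, (c, o) in hyps.items() if s in fst["finals"]]
    return min(done, key=lambda h: h[0]) if done else None

# Toy lexicon L: homophones /r ay t/ -> "right" or "write", word emitted on the first arc
L = {"start": 0, "finals": {0: 0.0}, "arcs": {
    0: [("r", "right", 0.0, 1), ("r", "write", 0.0, 3)],
    1: [("ay", EPS, 0.0, 2)], 2: [("t", EPS, 0.0, 0)],
    3: [("ay", EPS, 0.0, 4)], 4: [("t", EPS, 0.0, 0)]}}
# Toy unigram grammar G (costs are made-up -log P(word) values)
G = {"start": 0, "finals": {0: 0.0}, "arcs": {
    0: [("right", "right", 0.5, 0), ("write", "write", 1.5, 0)]}}

LG = compose(L, G)
print(best_output(LG, ["r", "ay", "t"]))   # (0.5, ['right'])
```

G gives "right" a lower cost than "write". As a result, the composed graph resolves the homophone in favor of "right", even though L alone cannot tell the two apart.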
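Slide 38's bottleneck features can be illustrated with a forward pass through a classifier that has a narrow hidden layer. After training, the network is cut at that layer and its activations are used as features. The layer sizes, the linear bottleneck and the ReLU activations below are assumptions. Random weights stand in for a trained network, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 440 inputs (e.g. 11 spliced frames of 40 F-BANKs), a 40-unit
# bottleneck, and 2000 output classes (e.g. tied HMM states).
SIZES = [440, 1024, 1024, 40, 1024, 2000]
BOTTLENECK_LAYER = 2        # the layer mapping 1024 -> 40

# Random weights stand in for a trained classifier
layers = [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(SIZES[:-1], SIZES[1:])]

def forward(x, stop_at_bottleneck=False):
    """Class posteriors (T, 2000), or bottleneck features (T, 40) if requested."""
    h = x
    for idx, (w, b) in enumerate(layers):
        z = h @ w + b
        if idx == BOTTLENECK_LAYER and stop_at_bottleneck:
            return z                          # linear bottleneck outputs = new features
        if idx == len(layers) - 1:            # softmax over classes (training target)
            z = z - z.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)
        h = z if idx == BOTTLENECK_LAYER else np.maximum(0.0, z)   # ReLU elsewhere

frames = rng.normal(size=(100, 440))          # 100 frames of fake spliced input
bn_features = forward(frames, stop_at_bottleneck=True)
posteriors = forward(frames)
print(bn_features.shape, posteriors.shape)    # (100, 40) (100, 2000)
```

In a tandem or bottleneck setup, the 40-dimensional vectors would then feed a conventional GMM-HMM system, possibly appended to the original features.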
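In a hybrid system (slides 39 and 40), the HMM needs likelihoods p(o|s), whereas the softmax gives posteriors P(s|o). By Bayes' rule, p(o|s) = P(s|o) p(o) / P(s), and p(o) is the same for every state. A common approach therefore divides the posteriors by state priors estimated from a training alignment. The NumPy sketch below does this. The function names, the prior floor and the toy alignment are illustrative.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def estimate_priors(alignment, n_states):
    """State priors as relative frequencies in a frame-level alignment
    (e.g. one produced by a GMM-HMM system)."""
    counts = np.bincount(alignment, minlength=n_states).astype(float)
    return counts / counts.sum()

def scaled_log_likelihoods(logits, state_priors, prior_floor=1e-8):
    """log p(o|s) up to a per-frame constant: log P(s|o) - log P(s).

    logits:       (T, S) network outputs before the softmax
    state_priors: (S,)   prior probability of each state
    """
    return log_softmax(logits) - np.log(np.maximum(state_priors, prior_floor))

alignment = np.array([0, 0, 0, 0, 1, 1, 2, 2])   # made-up frame-to-state alignment
priors = estimate_priors(alignment, n_states=3)
logits = np.zeros((1, 3))                         # a frame where the network is undecided
print(priors, scaled_log_likelihoods(logits, priors))
```

With an undecided network, the rarer states 1 and 2 get the higher scaled likelihood. This counteracts the network's bias toward frequent states.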
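Slides 41 to 46 list recurrent architectures. The NumPy sketch below runs a plain (Elman) RNN and a GRU over a feature sequence, showing how the hidden state carries context from frame to frame. The GRU follows one common convention; some references swap the roles of z and 1 - z. The sizes and random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(x, w_xh, w_hh, b_h):
    """Elman RNN over a (T, D) sequence: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h)."""
    h = np.zeros(w_hh.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_xh + h @ w_hh + b_h)
        states.append(h)
    return np.stack(states)                     # (T, H)

def gru_forward(x, p):
    """GRU over a (T, D) sequence; gates decide how much of h_{t-1} to overwrite."""
    h = np.zeros(p["U_z"].shape[0])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ p["W_z"] + h @ p["U_z"] + p["b_z"])              # update gate
        r = sigmoid(x_t @ p["W_r"] + h @ p["U_r"] + p["b_r"])              # reset gate
        h_cand = np.tanh(x_t @ p["W_h"] + (r * h) @ p["U_h"] + p["b_h"])   # candidate state
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 50, 13, 8                             # e.g. 50 frames of 13 MFCCs, 8 hidden units
x = rng.normal(size=(T, D))
w_xh, w_hh, b_h = rng.normal(0, 0.3, (D, H)), rng.normal(0, 0.3, (H, H)), np.zeros(H)
rnn_states = rnn_forward(x, w_xh, w_hh, b_h)
gru_params = {f"{m}_{g}": rng.normal(0, 0.3, (D if m == "W" else H, H))
              for m in ("W", "U") for g in ("z", "r", "h")}
gru_params.update({f"b_{g}": np.zeros(H) for g in ("z", "r", "h")})
gru_states = gru_forward(x, gru_params)
print(rnn_states.shape, gru_states.shape)       # (50, 8) (50, 8)
```

Both models are causal: the state at frame t depends only on frames 1 to t. An LSTM follows the same pattern, with a separate memory cell and an additional output gate.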
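Two ideas on slide 48 can be checked in a few lines: the path probability, and the many-to-one mapping from frame-level paths to transcripts that CTC relies on. The sketch assumes '_' as the blank symbol and ' ' as the space, matching the slide's alphabet. The function names are illustrative.

```python
import numpy as np

BLANK = "_"
ALPHABET = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + [BLANK, " "]   # letters, blank, space

def ctc_collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then delete blanks."""
    out, prev = [], None
    for c in path:
        if c != prev and c != BLANK:
            out.append(c)
        prev = c
    return "".join(out)

def path_log_prob(path, log_probs):
    """log P(c|x) = sum_t log P(c_t|x): given x, CTC treats frames as independent.

    log_probs: (T, len(ALPHABET)) per-frame log-softmax outputs of the network.
    """
    assert len(path) == log_probs.shape[0]
    return float(sum(log_probs[t, ALPHABET.index(c)] for t, c in enumerate(path)))

print(ctc_collapse("HHH_E_LL_LO___"))   # HELLO
print(ctc_collapse("HHH_E_L_LO___"))    # HELLO too: many paths map to one transcript
print(ctc_collapse("HHHEELLLO"))        # HELO: a blank is needed between the two Ls
```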
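The CTC loss of slides 49 and 50 is -log P(y|x). Here P(y|x) sums the probabilities of every path that collapses to the transcript y. The NumPy sketch below computes it with the standard forward (alpha) recursion over the blank-interleaved target. On a tiny random example, it then checks the result against brute-force enumeration. The names and the choice of index 0 for the blank are assumptions.

```python
import itertools
import numpy as np

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """-log P(target | x), summed over all alignments with the CTC forward recursion.

    log_probs: (T, K) per-frame log-softmax outputs; target: label ids without blanks.
    """
    T = log_probs.shape[0]
    ext = [blank]                               # blank-interleaved target: _ l1 _ l2 _ ...
    for label in target:
        ext += [label, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)            # alpha[t, s]: log-prob of prefixes ending at ext[s]
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            prev = [alpha[t - 1, s]]                        # stay on the same symbol
            if s >= 1:
                prev.append(alpha[t - 1, s - 1])            # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(alpha[t - 1, s - 2])            # skip a blank between different labels
            alpha[t, s] = np.logaddexp.reduce(prev) + log_probs[t, ext[s]]
    # A valid path ends on the last label or on the trailing blank
    final = alpha[T - 1, 0] if S == 1 else np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
    return -float(final)

def ctc_brute_force(log_probs, target, blank=0):
    """Same quantity by enumerating all K**T paths (only feasible for tiny T and K)."""
    T, K = log_probs.shape
    total = -np.inf
    for path in itertools.product(range(K), repeat=T):
        labels = [c for i, c in enumerate(path) if c != blank and (i == 0 or c != path[i - 1])]
        if labels == list(target):
            total = np.logaddexp(total, sum(log_probs[t, c] for t, c in enumerate(path)))
    return -float(total)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))                # T=4 frames, K=3 symbols (0 = blank)
log_probs = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
for target in ([1, 2], [1, 1], [2]):
    print(target, ctc_neg_log_likelihood(log_probs, target), ctc_brute_force(log_probs, target))
```

For backpropagation (slide 50), the gradient comes either from the matching backward (beta) recursion or from automatic differentiation. PyTorch, for instance, provides this loss as `torch.nn.CTCLoss`.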
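Slide 53's data-preparation step produces a Kaldi data directory. The Python sketch below writes the four basic files (wav.scp, text, utt2spk, spk2utt) in the one-record-per-line format that Kaldi recipes use. The helper name and the demo corpus are hypothetical. A real recipe would normally check the result with Kaldi's utils/validate_data_dir.sh; utils/fix_data_dir.sh can repair sorting problems.

```python
import os
import tempfile
from collections import defaultdict

def write_kaldi_data_dir(out_dir, utterances):
    """Write the four basic files of a Kaldi data directory.

    utterances: iterable of (utt_id, spk_id, wav_path, transcript).
      wav.scp : <utt_id> <wav_path>
      text    : <utt_id> <transcript>
      utt2spk : <utt_id> <spk_id>
      spk2utt : <spk_id> <utt_id1> <utt_id2> ...
    Kaldi expects each file sorted on its first field (C-locale order, which
    matches Python's string order for ASCII ids). Starting utterance ids with
    the speaker id keeps utt2spk and spk2utt consistently sorted.
    """
    os.makedirs(out_dir, exist_ok=True)
    rows = sorted(utterances)                   # sorted by utt_id
    spk2utt = defaultdict(list)
    with open(os.path.join(out_dir, "wav.scp"), "w") as wav_f, \
         open(os.path.join(out_dir, "text"), "w") as text_f, \
         open(os.path.join(out_dir, "utt2spk"), "w") as u2s_f:
        for utt, spk, wav, words in rows:
            wav_f.write(f"{utt} {wav}\n")
            text_f.write(f"{utt} {words}\n")
            u2s_f.write(f"{utt} {spk}\n")
            spk2utt[spk].append(utt)
    with open(os.path.join(out_dir, "spk2utt"), "w") as s2u_f:
        for spk in sorted(spk2utt):
            s2u_f.write(f"{spk} {' '.join(spk2utt[spk])}\n")

# Hypothetical corpus: ids, paths and transcripts are made up
demo_dir = os.path.join(tempfile.mkdtemp(), "train")
write_kaldi_data_dir(demo_dir, [
    ("spk2-utt1", "spk2", "/corpus/spk2/utt1.wav", "BONJOUR"),
    ("spk1-utt2", "spk1", "/corpus/spk1/utt2.wav", "APPELER MAMAN"),
    ("spk1-utt1", "spk1", "/corpus/spk1/utt1.wav", "APPELER"),
])
with open(os.path.join(demo_dir, "spk2utt")) as f:
    print(f.read())
```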
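The WER column of slide 55 is conventionally defined as follows: the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. A minimal pure-Python sketch of that standard definition follows. It is not Kaldi's own scoring tool (compute-wer), which also reports the individual error counts.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein distance; assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("sat") and one substitution ("the" -> "a") over 6 reference words
print(100 * word_error_rate("the cat sat on the mat", "the cat on a mat"))  # about 33.3
```

Because insertions are counted, WER is not bounded by 100%. A short reference with a long, wrong hypothesis can exceed it.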