SlideShare une entreprise Scribd logo
1  sur  31
Proprietary and confidential. Do not distribute.
End-to-end speech recognition with neon
Anthony Ndirango & Tyler Lee
MAKING MACHINES
SMARTER.™
now part of
Nervana Systems Proprietary
2
• Deep learning synopsis
• Large vocabulary continuous speech recognition
• End-to-end speech recognition systems
• Integrating weighted finite-state transducers for decoding
Nervana Systems Proprietary
3
Back-propagation
End-to-end
Resnet
ImageNet
Word2Vec
Regularization
Convolution
Unrolling
RNN
Generalization
hyperparameters
Video recognition
dropout
Pooling
LSTM
AlexNet
Speech recognition
download neon!
https://github.com/NervanaSystems/neon
git clone git@github.com:NervanaSystems/neon.git
Nervana’s deep learning tutorials:
https://www.nervanasys.com/deep-learning-tutorials/
We are hiring!
https://www.nervanasys.com/careers/
Nervana Systems Proprietary
4
A method for extracting features at multiple
levels of abstraction
• Features are discovered from data
• Performance improves with more data
• Network can express complex transformations
• High degree of representational power
Nervana Systems Proprietary
5
Nervana Systems Proprietary
6
Healthcare: Tumor detection
Automotive: Speech interfaces Finance: Time-series search engine
Positive:
Negative:
Agricultural Robotics Oil & Gas
Positive:
Negative:
Proteomics: Sequence analysis
Query:
Results:
Nervana Systems Proprietary
7
Nervana Systems Proprietary
8
kwɪk
braʊn
fɒks
quick
brown
fox
Nervana Systems Proprietary
9
Nervana Systems Proprietary
10
download aeon!
https://github.com/NervanaSystems/aeon
• Train directly from raw audio, extracting spectral features on-the-fly
• Handles arbitrarily large datasets
• Loads data from disk to device with minimal latency
• Also supports image and video data
Nervana Systems Proprietary
11
Acoustic models in neon complete source available at
https://github.com/NervanaSystems/deepspeech
Nervana Systems Proprietary
12
• The basic problem to be solved involves mapping a sequence of audio
features to a sequence of characters, with no obvious relationship
between the lengths of the sequences.
• CTC works around this problem by first defining a ”collapse” function.
Definition by example:
Collapse(_NNN_ _EE_ _R_ _VVV_AAA_N_AAAA_) = NERVANA
Nervana Systems Proprietary
13
• For each utterance, model outputs a matrix of frame-wise character probabilities
• Given the ”ground truth” transcript, the CTC algorithm:
1. finds all paths which collapse onto ground truth
2. uses the probability matrix to weight each path
Nervana Systems Proprietary
14
- input audio with 5 frames
- ground truth: “CAB”
- find all strings of length 5, including blank characters that collapse onto “CAB”
CTC Cost
Nervana Systems Proprietary
15
1.do an argmax for each column
2.concatenate the resulting characters to obtain
a string
3.“collapse” the string to get the output
Nervana Systems Proprietary
16
decoded outputs ground truth
younited presidentiol is a lefe in surance company united presidential is a life insurance company
that was sertainly true last week that was certainly true last week
we're now ready to say we're intechnical default a
spokesman said
we're not ready to say we're in technical default a
spokesman said
Nervana Systems Proprietary
17
So we have probabilities of each character at each frame. Now
what?
If CER (character error
rate) is nearly perfect…
We’re pretty much set.
Just use the best
character at each frame.
If CER is too high…
We should enforce some rules
from the language. E.g:
- All words must be valid
- Favor likely word sequences
Nervana Systems Proprietary
18
Weighted finite state transducers:
Automata whose state transitions map a
sequence of input symbols to a sequence of
output symbols
- Directed graph structure
- States enforce language structure
- Transitions choose amongst valid
symbols
For an in-depth review, see Mohri, Pereira & Riley, 2008
Nervana Systems Proprietary
19
- A lot of decoding concepts map nicely to FSTs.
- CTC, lexicon (vocabulary) and grammar (language model) can all be easily
represented.
- Efficient algorithms exist to combine FSTs, giving a single decoding graph
𝐷 = 𝑇 ∘ 𝐿 ∘ 𝐺
Decoding
graph
CTC
graph
Lexicon
graph
Grammar
graph
Nervana Systems Proprietary
20
Removes repeated characters and blanks (“_”)
𝑇
C_
A A A _
B B _
Input: “C _ A A A _ B B _”
C
C A
C A B
Output: “C A B”
Nervana Systems Proprietary
21
Maps a sequence of characters or phonemes to words
𝐿
Nervana Systems Proprietary
22
- Less end-to-end: A large number of
parameters learned completely
separate from the acoustic model
- Memory issues with large vocabularies
or complex language models
Graph # States # Arcs
CTC 31 91
Lexicon 30,629 40,516
Trigram 3,538,579 10,213,039
Composed
Trigram
26,817,696 54,104,686
Nervana Systems Proprietary
23
Reference
CER
(no LM)
WER
(no LM)
WER
(trigram LM)
WER
(trigram LM w/
enhancements)
Hannun, et al.
(2014)
10.7 35.8 14.1 N/A
Graves-Jaitly
(2014)
9.2 30.1 N/A 8.7
Hwang-Sung (2016) 10.6 38.4 8.88 8.1
Miao et al. (2015)
[Eesen]
N/A N/A 9.1 7.3
Bahdanau et al.
(2016)
6.4 18.6 10.8 9.3
Nervana-Speech 8.64 32.5 8.4 N/A
younited presidentiol is a lefe in surance
company
united presidential is a life insurance
company
that was sertainly true last week
that was certainly true last week
we're now ready to say we're intechnical
default a spokesman said
we're not ready to say we're in technical
default a spokesman said
Nervana Systems Proprietary
24
Nervana’s deep learning tutorials:
https://www.nervanasys.com/deep-learning-tutorials/
Acoustic model source available at:
https://github.com/NervanaSystems/deepspeech
Github page:
https://github.com/NervanaSystems/neon
For more information, contact:
info@nervanasys.com
Nervana Systems Proprietary
25
Nervana Systems Proprietary
26
• github.com/NervanaSystems/ModelZoo
• model files, parameters
Nervana Systems Proprietary
27
THANK YOU!
QUESTIONS?
Nervana Systems Proprietary
28
Nervana Systems Proprietary
0 1 2
3 4 5
6 7 8
0 1
2 3
19 25
37 43
0 1 3 4 0 1 2 3 19
• Each element in the output is the result of a dot product between two vectors
29
input filter output
Nervana Systems Proprietary
30
0 1 2
3 4 5
6 7 8
0 1
2 3
19 25
37 43
0
1
2
3
4
5
6
7
8
19
0
2
3
1
0
2
3
1
0
2
3
1
0
2
3
1
25
37
43
Nervana Systems Proprietary
31

Contenu connexe

Tendances

Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognitionIntel Nervana
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
Introduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntroduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntel Nervana
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataIntel Nervana
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowS N
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsRoelof Pieters
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetAmazon Web Services
 
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...Edureka!
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetAmazon Web Services
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017StampedeCon
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to KerasJohn Ramey
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
DIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe WorkshopDIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe Workshopodsc
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsHPCC Systems
 
Introduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabIntroduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabImry Kissos
 

Tendances (20)

Anil Thomas - Object recognition
Anil Thomas - Object recognitionAnil Thomas - Object recognition
Anil Thomas - Object recognition
 
ODSC West
ODSC WestODSC West
ODSC West
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
Introduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will ConstableIntroduction to Deep Learning with Will Constable
Introduction to Deep Learning with Will Constable
 
Using neon for pattern recognition in audio data
Using neon for pattern recognition in audio dataUsing neon for pattern recognition in audio data
Using neon for pattern recognition in audio data
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlow
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
 
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
Keras Tutorial For Beginners | Creating Deep Learning Models Using Keras In P...
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to Keras
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
DIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe WorkshopDIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe Workshop
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
Introduction to deep learning in python and Matlab
Introduction to deep learning in python and MatlabIntroduction to deep learning in python and Matlab
Introduction to deep learning in python and Matlab
 

En vedette

High-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningHigh-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningIntel Nervana
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Intel Nervana
 
An Analysis of Convolution for Inference
An Analysis of Convolution for InferenceAn Analysis of Convolution for Inference
An Analysis of Convolution for InferenceIntel Nervana
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Sean Everett
 
Some experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiSome experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiMaho Nakata
 
Intel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel IT Center
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloadsinside-BigData.com
 
Video Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleVideo Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleIntel Nervana
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesWithTheBest
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangPAPIs.io
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016Intel Nervana
 
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크지운 배
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
The AI Era Ignited by GPU Deep Learning
The AI Era Ignited by GPU Deep Learning The AI Era Ignited by GPU Deep Learning
The AI Era Ignited by GPU Deep Learning NVIDIA
 

En vedette (17)

High-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep LearningHigh-Performance GPU Programming for Deep Learning
High-Performance GPU Programming for Deep Learning
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
 
An Analysis of Convolution for Inference
An Analysis of Convolution for InferenceAn Analysis of Convolution for Inference
An Analysis of Convolution for Inference
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016
 
Some experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiSome experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon Phi
 
Intel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare ppt
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
 
Video Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model ExampleVideo Activity Recognition and NLP Q&A Model Example
Video Activity Recognition and NLP Q&A Model Example
 
Enabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. LowndesEnabling Artificial Intelligence - Alison B. Lowndes
Enabling Artificial Intelligence - Alison B. Lowndes
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
 
Region Of Interest Extraction
Region Of Interest ExtractionRegion Of Interest Extraction
Region Of Interest Extraction
 
RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016RE-Work Deep Learning Summit - September 2016
RE-Work Deep Learning Summit - September 2016
 
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
Deepcheck, 딥러닝 기반의 얼굴인식 출석체크
 
Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)Deep Learning for Computer Vision: Attention Models (UPC 2016)
Deep Learning for Computer Vision: Attention Models (UPC 2016)
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
The AI Era Ignited by GPU Deep Learning
The AI Era Ignited by GPU Deep Learning The AI Era Ignited by GPU Deep Learning
The AI Era Ignited by GPU Deep Learning
 

Similaire à Intel Nervana Artificial Intelligence Meetup 11/30/16

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...changedaeoh
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Lviv Startup Club
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupLINAGORA
 
UnCoRe-YvesSerena
UnCoRe-YvesSerenaUnCoRe-YvesSerena
UnCoRe-YvesSerenaSerena King
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
 
FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)IRJET Journal
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderAkira Tamamori
 
DNA Splice site prediction
DNA Splice site predictionDNA Splice site prediction
DNA Splice site predictionsageteam
 
Scaling Genomic Analyses
Scaling Genomic AnalysesScaling Genomic Analyses
Scaling Genomic Analysesfnothaft
 
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...ssuser849b73
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsMatīss ‎‎‎‎‎‎‎  
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSHAMNAHAMNA8
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...Yun-Nung (Vivian) Chen
 

Similaire à Intel Nervana Artificial Intelligence Meetup 11/30/16 (20)

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
Convolutional Neural Networks for Natural Language Processing / Stanford cs22...
 
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
Grant Reaber “Wavenet and Wavenet 2: Generating high-quality audio with neura...
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
 
UnCoRe-YvesSerena
UnCoRe-YvesSerenaUnCoRe-YvesSerena
UnCoRe-YvesSerena
 
Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
 
FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)FORECASTING MUSIC GENRE (RNN - LSTM)
FORECASTING MUSIC GENRE (RNN - LSTM)
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
 
DNA Splice site prediction
DNA Splice site predictionDNA Splice site prediction
DNA Splice site prediction
 
Scaling Genomic Analyses
Scaling Genomic AnalysesScaling Genomic Analyses
Scaling Genomic Analyses
 
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
 

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Intel Nervana Artificial Intelligence Meetup 11/30/16

  • 1. Proprietary and confidential. Do not distribute. End-to-end speech recognition with neon Anthony Ndirango & Tyler Lee MAKING MACHINES SMARTER.™ now part of
  • 2. Nervana Systems Proprietary 2 • Deep learning synopsis • Large vocabulary continuous speech recognition • End-to-end speech recognition systems • Integrating weighted finite-state transducers for decoding
  • 3. Nervana Systems Proprietary 3 Back-propagation End-to-end Resnet ImageNet Word2Vec Regularization Convolution Unrolling RNN Generalization hyperparameters Video recognition dropout Pooling LSTM AlexNet Speech recognition download neon! https://github.com/NervanaSystems/neon git clone git@github.com:NervanaSystems/neon.git Nervana’s deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/ We are hiring! https://www.nervanasys.com/careers/
  • 4. Nervana Systems Proprietary 4 A method for extracting features at multiple levels of abstraction • Features are discovered from data • Performance improves with more data • Network can express complex transformations • High degree of representational power
  • 6. Nervana Systems Proprietary 6 Healthcare: Tumor detection Automotive: Speech interfaces Finance: Time-series search engine Positive: Negative: Agricultural Robotics Oil & Gas Positive: Negative: Proteomics: Sequence analysis Query: Results:
  • 10. Nervana Systems Proprietary 10 download aeon! https://github.com/NervanaSystems/aeon • Train directly from raw audio, extracting spectral features on-the-fly • Handles arbitrarily large datasets • Loads data from disk to device with minimal latency • Also supports image and video data
  • 11. Nervana Systems Proprietary 11 Acoustic models in neon complete source available at https://github.com/NervanaSystems/deepspeech
  • 12. Nervana Systems Proprietary 12 • The basic problem to be solved involves mapping a sequence of audio features to a sequence of characters, with no obvious relationship between the lengths of the sequences. • CTC works around this problem by first defining a ”collapse” function. Definition by example: Collapse(_NNN_ _EE_ _R_ _VVV_AAA_N_AAAA_) = NERVANA
  • 13. Nervana Systems Proprietary 13 • For each utterance, model outputs a matrix of frame-wise character probabilities • Given the ”ground truth” transcript, the CTC algorithm: 1. finds all paths which collapse onto ground truth 2. uses the probability matrix to weight each path
  • 14. Nervana Systems Proprietary 14 - input audio with 5 frames - ground truth: “CAB” - find all strings of length 5, including blank characters that collapse onto “CAB” CTC Cost
  • 15. Nervana Systems Proprietary 15 1.do an argmax for each column 2.concatenate the resulting characters to obtain a string 3.“collapse” the string to get the output
  • 16. Nervana Systems Proprietary 16 decoded outputs ground truth younited presidentiol is a lefe in surance company united presidential is a life insurance company that was sertainly true last week that was certainly true last week we're now ready to say we're intechnical default a spokesman said we're not ready to say we're in technical default a spokesman said
  • 17. Nervana Systems Proprietary 17 So we have probabilities of each character at each frame. Now what? If CER (character error rate) is nearly perfect… We’re pretty much set. Just use the best character at each frame. If CER is too high… We should enforce some rules from the language. E.g: - All words must be valid - Favor likely word sequences
  • 18. Nervana Systems Proprietary 18 Weighted finite state transducers: Automata whose state transitions map a sequence of input symbols to a sequence of output symbols - Directed graph structure - States enforce language structure - Transitions choose amongst valid symbols For an in-depth review, see Mohri, Pereira & Riley, 2008
  • 19. Nervana Systems Proprietary 19 - A lot of decoding concepts map nicely to FSTs. - CTC, lexicon (vocabulary) and grammar (language model) can all be easily represented. - Efficient algorithms exist to combine FSTs, giving a single decoding graph 𝐷 = 𝑇 ∘ 𝐿 ∘ 𝐺 Decoding graph CTC graph Lexicon graph Grammar graph
  • 20. Nervana Systems Proprietary 20 Removes repeated characters and blanks (“_”) 𝑇 C_ A A A _ B B _ Input: “C _ A A A _ B B _” C C A C A B Output: “C A B”
  • 21. Nervana Systems Proprietary 21 Maps a sequence of characters or phonemes to words 𝐿
  • 22. Nervana Systems Proprietary 22 - Less end-to-end: A large number of parameters learned completely separate from the acoustic model - Memory issues with large vocabularies or complex language models Graph # States # Arcs CTC 31 91 Lexicon 30,629 40,516 Trigram 3,538,579 10,213,039 Composed Trigram 26,817,696 54,104,686
  • 23. Nervana Systems Proprietary 23 Reference CER (no LM) WER (no LM) WER (trigram LM) WER (trigram LM w/ enhancements) Hannun, et al. (2014) 10.7 35.8 14.1 N/A Graves-Jaitly (2014) 9.2 30.1 N/A 8.7 Hwang-Sung (2016) 10.6 38.4 8.88 8.1 Miao et al. (2015) [Eesen] N/A N/A 9.1 7.3 Bahdanau et al. (2016) 6.4 18.6 10.8 9.3 Nervana-Speech 8.64 32.5 8.4 N/A younited presidentiol is a lefe in surance company united presidential is a life insurance company that was sertainly true last week that was certainly true last week we're now ready to say we're intechnical default a spokesman said we're not ready to say we're in technical default a spokesman said
  • 24. Nervana Systems Proprietary 24 Nervana’s deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/ Acoustic model source available at: https://github.com/NervanaSystems/deepspeech Github page: https://github.com/NervanaSystems/neon For more information, contact: info@nervanasys.com
  • 26. Nervana Systems Proprietary 26 • github.com/NervanaSystems/ModelZoo • model files, parameters
  • 29. Nervana Systems Proprietary 0 1 2 3 4 5 6 7 8 0 1 2 3 19 25 37 43 0 1 3 4 0 1 2 3 19 • Each element in the output is the result of a dot product between two vectors 29 input filter output
  • 30. Nervana Systems Proprietary 30 0 1 2 3 4 5 6 7 8 0 1 2 3 19 25 37 43 0 1 2 3 4 5 6 7 8 19 0 2 3 1 0 2 3 1 0 2 3 1 0 2 3 1 25 37 43

Notes de l'éditeur

  1. Emphasize that we are just replicating DS2’s acoustic model
  2. don’t have to make a model from scratch - many examples of pre-trained models mention Yinyin-Fast-RCNN, Sathish C3D, babI
  3. To understand Convolution networks, we should understand convolution operation first and then see how such operation is implemented in a network structure