AWS provides a wide range of language tools, including machine translation, summarization, speech recognition, and optical character recognition. These tools use sequence models built from object embeddings, LSTMs and convolutions, and attention mechanisms to process input sequences efficiently and produce structured outputs through techniques like beam search. MXNet and Gluon provide a flexible engine for building and optimizing these kinds of models at scale on AWS infrastructure.
2. Thanks
Hassan Sawaf, Zornitsa Kozareva, Hyokun Yun, Hagen
Fürstenau, Daniel Marcu, Mu Li, Sheng Zha, Dimitris
Soulios, Vlad Zhukov, Vikram Anbazhagan, Yakov
Kronrod, Yaser Al-Onaizan
… and many others …
3. Language in AWS
Output \ Input   Text                    Audio                Image
Text             Machine Translation,    Speech Recognition   OCR
                 Summarization, Dialog
Audio            Synthetic Voice         —                    —
Image            Printing                —                    —
Structure        Sentiment, Topics,      —                    —
                 Language, Parsing
4. Language in AWS - for everyone
• We love Open Source …
• Apache MXNet Deep Learning Framework
http://www.mxnet.io
• Sockeye Machine Translation Toolkit (seq2seq)
https://github.com/awslabs/sockeye
• more soon …
• … on fast infrastructure
• G2 (Kepler), P2 (Kepler), G3 (Maxwell), P3 (Volta), C5 (Skylake)
5. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
6. Basic Idea
AWS is awesome. AWS est magnifique.
• Sequence Input
• Embed words
(indicator, word2vec, cLSTM)
• Sequences of objects
(bag of words, state update,
convolutions)
• Sequence Output
• Embed outputs
(word2vec, sound outputs)
• Beam search for decoding
(‘to wreck a nice beach’)
• Structured output
(Tree LSTM)
• Mechanics
• Attention
• Dynamic state
updates (Q&A)
• Table lookup
7. Sequence Input
• Bag of words (until 2010)
• No prior knowledge required
• Tokenize words, ignore word order
• Linear model
AWS is awesome.
Grandma and I eat. And I eat grandma.
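The order-blindness of the bag-of-words slide above can be shown in a few lines of plain Python (a toy sketch with a made-up vocabulary and tokenizer, not AWS code):

```python
from collections import Counter

def bag_of_words(sentence, vocab):
    # Tokenize on whitespace and count words, ignoring order entirely.
    counts = Counter(sentence.lower().rstrip(".").split())
    return [counts[w] for w in vocab]

vocab = ["and", "eat", "grandma", "i"]
# Word order (and who eats whom!) is lost: both sentences
# map to the identical vector [1, 1, 1, 1].
bag_of_words("Grandma and I eat.", vocab)
bag_of_words("And I eat grandma.", vocab)
```

This is exactly why a linear model over such counts cannot tell the two sentences apart.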
8. Sequence Input
• Bag of embeddings (word2vec)
• Pretrain embeddings on more data
• Tokenize words, ignore word order
• (Usually non)linear model
AWS is awesome.
Grandma and I eat. And I eat grandma.
9. Sequence Input
• Order Matters - update state after every word (LSTM)
Hochreiter & Schmidhuber, 1997
AWS is awesome.
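The per-word state update can be sketched with a simplified recurrence (a plain Elman-style cell with made-up scalar weights; a real LSTM adds input, forget, and output gates on top of this update):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    # New state depends on the old state and the current word embedding.
    return math.tanh(w_h * h + w_x * x)

def encode(embeddings):
    h = 0.0
    for x in embeddings:  # update state after every word
        h = rnn_step(h, x)
    return h

# Unlike a bag of words, order now matters:
# encode([1.0, -1.0]) and encode([-1.0, 1.0]) yield different states.
```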
10. Sequence Input
• Order Matters (BLSTM)
• but sometimes we only know later what was relevant
• use a bidirectional LSTM - often multiple layers
The president of the United States of America
The president of the Kansas Rabbit Breeding club
11. Example - Amazon Comprehend
• Named Entity Recognition
• Key-Phrase Extraction
• Language Identification
• Sentiment Analysis
• Topic Modeling
16. Sequence Input
• Sequence of embeddings
• Use the last vector in sequence to encode all?
AWS is awesome
AWS offers a wide range of services, it is
highly scalable, reliable and cost effective …
• Average over all vectors?
I like this shirt
My friend thought that Amazon Basics shirts
look cheap but I really like their designs.
Only pay attention to relevant parts.
17. Sequence Input
• Attention Mechanism (Bahdanau et al., 2015)
I like this shirt
My friend thought that Amazon Basics shirts look
cheap but I really like their designs.
Only pay attention to relevant parts.
Learn to pay attention
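"Learn to pay attention" boils down to a softmax-weighted average over the input positions. A minimal sketch with toy 2-d embeddings (dot-product scoring here for brevity; Bahdanau et al. use a small learned network to produce the scores):

```python
import math

def attend(query, keys, values):
    # Score each input position against the query, softmax the scores,
    # and return the weighted average of the value vectors.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# The query lines up with the first "word", so almost all
# attention weight lands on it: only the relevant part is read.
keys = values = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attend([5.0, 0.0], keys, values)
```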
18. Using it for simple outputs
• Encode input as described
• Estimate, e.g. for
• Sentiment
• Category
• LanguageID
• Tagging and parsing
• More issues
• Large vocabulary (cLSTM and backoff)
• Convolutions vs. LSTMs for speed
Amazon
Comprehend
on AWS
19. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
20. Sequence Output
• Many applications
• Machine Translation (Amazon Translate)
• Optical Character Recognition
• Speech Recognition (Amazon Lex)
• Text to Speech (Amazon Polly)
• Key problems
• Efficient decoding
• State space
• Variable output length (audio vs. text, MT)
21. Sequence Output
• Text Annotation
(named entity tagging, etc.)
• Input and output have the
same length (good)
• Simple sequence to sequence
model (decode one at a time)
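When input and output lengths match, "decode one at a time" is just a per-position argmax over tags. A sketch with a hypothetical scorer standing in for the trained model (the tag set and `toy_score` are made up for illustration):

```python
def tag_sequence(words, score, tags=("ORG", "O")):
    # Equal input/output length: pick the best-scoring tag per position.
    return [max(tags, key=lambda t: score(w, t)) for w in words]

# Hypothetical stand-in for a model's per-token scores:
# capitalized words look like entity mentions.
def toy_score(word, tag):
    return float(word[0].isupper() == (tag == "ORG"))

tag_sequence(["Amazon", "is", "awesome"], toy_score)
# → ['ORG', 'O', 'O']
```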
22. Sequence Output
• Decoding
• In theory we could just decode one word at a time
• The state space is too large, so we use an approximate search
• This makes decoding approximate: we can no longer decode exactly
• Beam search
• GAN-style samplers (need different loss)
Compress relevant state
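Beam search makes that approximation concrete: keep only the few best partial outputs at each step instead of the full exponential state space. A minimal sketch (the `toy_step` model is invented for illustration; a real decoder would score next tokens with the network):

```python
def beam_search(step, width=2, length=3):
    # beams holds (log-probability, partial sequence) pairs.
    beams = [(0.0, [])]
    for _ in range(length):
        candidates = [(logp + tok_logp, seq + [tok])
                      for logp, seq in beams
                      for tok, tok_logp in step(seq)]
        # Prune: keep only the `width` best partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
    return beams[0][1]

# Hypothetical toy model: "a" always has higher log-probability than "b".
def toy_step(seq):
    return [("a", -0.1), ("b", -2.0)]

beam_search(toy_step)
# → ['a', 'a', 'a']
```

With `width=1` this degenerates to greedy decoding; larger widths trade compute for a better approximation of the true argmax.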
23. Sequence Output
• One size does not fit all
AWS is awesome
AWS offers a wide range of services, it is highly
scalable, reliable and cost effective …
• Attention for nonparametric models
(update attention pointer A to select where to attend next)
the same fixed-size embedding for every input length is no good
24. Sequence Output
• Machine Translation
• word order is different
the white house - la casa blanca
• number of words is different
town wall - Stadtmauer
• context matters
he took it along - er nahm sie mit
• Attention Mechanism for decoding
• multiple pointers, hierarchical attention
for encoding and decoding, …
26. Sequence Output
• Text to Speech (e.g. Polly)
• Input is short and discrete (words)
• Output is wave function
• Encode
BLSTM or convolution as before
• Decode
• LSTM autoregressive model
• Attention on source text
AWS is awesome.
27. Amazon Transcribe
00:00:00,100 --> 00:00:02,949
You you have said one moment can make a
movement.
00:00:02,949 --> 00:00:07,170
What was that moment for you?
What do you looking at that moment right
00:00:07,170 --> 00:00:07,540
now?
00:00:16,460 --> 00:00:20,109
I think that what i meant by that and
what i mean by that is that any moment
00:00:20,109 --> 00:00:23,709
you can change the course of your life
you can change the direction of what
00:00:23,709 --> 00:00:24,730
you're going in.
00:00:00,000 --> 00:00:08,449
What would be the best policy response to dealing with
those who have been displaced in obviously trade
restrictions
00:00:08,449 --> 00:00:19,620
Bring with it a whole lot of difficulties. What what should
government do to address this problem which does lead
to quite a lot of disquiet in in the general public? There
are
00:00:19,620 --> 00:00:28,179
two sorts of policies that i think help one of them is
00:00:28,179 --> 00:00:34,100
conventional safety that policies very important to make
sure that if
00:00:34,100 --> 00:00:43,490
if jobs are displaced in an industry that losing those jobs
doesn't mean losing health care doesn't mean losing
your retirement benefits doesn't mean that
28. More language tools
• Amazon Transcribe
• Convert audio content into text.
• Hybrid system with deep Bidirectional LSTMs
• CTC-based encoder-decoder system
• Amazon Lex
• Extract intent of human language input (textual or audible user
requests) and convert into workflow.
• Beyond …
• Graphs and Knowledge Bases (Vertex embeddings)
• Lookup tables (Translation memories, Dictionaries, Interpolative TTS)
• Attention (Memory networks in dialog, Structured text)
29. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
31. Symbolic vs. Imperative
• Symbolic
• easy to optimize
• portable
• easy to serialize
• hard to debug
• no dynamic graphs
• no native code
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10) * 2)
• Imperative
• easy to code
• easy to debug
• dynamic graphs / native code
• hard to optimize
• hard to serialize
• JIT compiler fixes this
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
print(c)
d = c + 1
32. Performance Optimization
• Hybridization (JIT Compiler)
• Compile compute graph/sidestep the Python interpreter
• Flexibility when model changes (often for language)
• Dynamic Batching
Aggregate data automatically to deal with variable length
of graphs in execution
• Kernel Fusion
Combine operators, e.g. fuse (A += B; A += C) into a single A += B + C pass
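In pure Python the effect of kernel fusion looks like this (only a sketch of the idea; MXNet performs the fusion on compiled GPU kernels, not interpreted loops):

```python
def add_unfused(a, b, c):
    # Two separate operators: A += B, then A += C (two memory passes).
    a = [x + y for x, y in zip(a, b)]
    a = [x + y for x, y in zip(a, c)]
    return a

def add_fused(a, b, c):
    # One fused operator: A += B + C in a single pass over memory.
    return [x + y + z for x, y, z in zip(a, b, c)]
```

The results are identical; the fused version reads and writes each element once instead of twice, which is where the speedup comes from.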