AWS provides a wide range of language tools, including machine translation, summarization, speech recognition, and optical character recognition. These tools use sequence models built from object embeddings, LSTMs and convolutions, and attention mechanisms to process input sequences efficiently and produce structured outputs through techniques like beam search. MXNet and Gluon provide a flexible engine for building and optimizing these kinds of models at scale on AWS infrastructure.
2. Thanks
Hassan Sawaf, Zornitsa Kozareva, Hyokun Yun, Hagen
Fürstenau, Daniel Marcu, Mu Li, Sheng Zha, Dimitris
Soulios, Vlad Zhukov, Vikram Anbazhagan, Yakov
Kronrod, Yaser Al-Onaizan
… and many others …
3. Language in AWS
Output \ Input   Text                    Audio                Image
Text             Machine Translation,    Speech Recognition   OCR
                 Summarization, Dialog
Audio            Synthetic Voice         —                    —
Image            Printing                —                    —
Structure        Sentiment, Topics,      —                    —
                 Language, Parsing
4. Language in AWS - for everyone
• We love Open Source …
• Apache MXNet Deep Learning Framework
http://www.mxnet.io
• Sockeye Machine Translation Toolkit (seq2seq)
https://github.com/awslabs/sockeye
• more soon …
• … on fast infrastructure
• G2 (Kepler), P2 (Kepler), G3 (Maxwell), P3 (Volta), C5 (Skylake)
5. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
6. Basic Idea
AWS is awesome. AWS est magnifique.
• Sequence Input
• Embed words
(indicator, word2vec, cLSTM)
• Sequences of objects
(bag of words, state update,
convolutions)
• Sequence Output
• Embed outputs
(word2vec, sound outputs)
• Beam search for decoding
(‘to wreck a nice beach’)
• Structured output
(Tree LSTM)
• Mechanics
• Attention
• Dynamic state
updates (Q&A)
• Table lookup
7. Sequence Input
• Bag of words (until 2010)
• No prior knowledge required
• Tokenize words, ignore word order
• Linear model
AWS is awesome.
Grandma and I eat. And I eat grandma.
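The order-blindness of the bag-of-words slide above can be shown in a few lines of plain Python (a toy sketch with a made-up vocabulary and tokenizer, not AWS code):

```python
from collections import Counter

def bag_of_words(sentence, vocab):
    # Tokenize on whitespace and count words, ignoring order entirely.
    counts = Counter(sentence.lower().rstrip(".").split())
    return [counts[w] for w in vocab]

vocab = ["and", "eat", "grandma", "i"]
# Word order (and who eats whom!) is lost: both sentences
# map to the identical vector [1, 1, 1, 1].
bag_of_words("Grandma and I eat.", vocab)
bag_of_words("And I eat grandma.", vocab)
```

This is exactly why a linear model over such counts cannot tell the two sentences apart.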
8. Sequence Input
• Bag of embeddings (word2vec)
• Pretrain embeddings on more data
• Tokenize words, ignore word order
• (Usually non)linear model
AWS is awesome.
Grandma and I eat. And I eat grandma.
9. Sequence Input
• Order Matters - update state after every word (LSTM)
Hochreiter & Schmidhuber, 1997
AWS is awesome.
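The per-word state update can be sketched with a simplified recurrence (a plain Elman-style cell with made-up scalar weights; a real LSTM adds input, forget, and output gates on top of this update):

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    # New state depends on the old state and the current word embedding.
    return math.tanh(w_h * h + w_x * x)

def encode(embeddings):
    h = 0.0
    for x in embeddings:  # update state after every word
        h = rnn_step(h, x)
    return h

# Unlike a bag of words, order now matters:
# encode([1.0, -1.0]) and encode([-1.0, 1.0]) yield different states.
```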
10. Sequence Input
• Order Matters (BLSTM)
• but sometimes we only know later what was relevant
• use a bidirectional LSTM - often multiple layers
The president of the United States of America
The president of the Kansas Rabbit Breeding club
11. Example - Amazon Comprehend
• Named Entity Recognition
• Key-Phrase Extraction
• Language Identification
• Sentiment Analysis
• Topic Modeling
16. Sequence Input
• Sequence of embeddings
• Use the last vector in sequence to encode all?
AWS is awesome
AWS offers a wide range of services, it is
highly scalable, reliable and cost effective …
• Average over all vectors?
I like this shirt
My friend thought that Amazon Basics shirts
look cheap but I really like their designs.
Only pay attention to relevant parts.
17. Sequence Input
• Attention Mechanism (Bahdanau et al., 2015)
I like this shirt
My friend thought that Amazon Basics shirts look
cheap but I really like their designs.
Only pay attention to relevant parts.
Learn to pay attention
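"Learn to pay attention" boils down to a softmax-weighted average over the input positions. A minimal sketch with toy 2-d embeddings (dot-product scoring here for brevity; Bahdanau et al. use a small learned network to produce the scores):

```python
import math

def attend(query, keys, values):
    # Score each input position against the query, softmax the scores,
    # and return the weighted average of the value vectors.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# The query lines up with the first "word", so almost all
# attention weight lands on it: only the relevant part is read.
keys = values = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attend([5.0, 0.0], keys, values)
```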
18. Using it for simple outputs
• Encode input as described
• Estimate, e.g. for
• Sentiment
• Category
• LanguageID
• Tagging and parsing
• More issues
• Large vocabulary (cLSTM and backoff)
• Convolutions vs. LSTMs for speed
Amazon
Comprehend
on AWS
19. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
20. Sequence Output
• Many applications
• Machine Translation (Amazon Translate)
• Optical Character Recognition
• Speech Recognition (Amazon Lex)
• Text to Speech (Amazon Polly)
• Key problems
• Efficient decoding
• State space
• Variable output length (audio vs. text, MT)
21. Sequence Output
• Text Annotation
(named entity tagging, etc.)
• Input and output have the
same length (good)
• Simple sequence to sequence
model (decode one at a time)
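When input and output lengths match, "decode one at a time" is just a per-position argmax over tags. A sketch with a hypothetical scorer standing in for the trained model (the tag set and `toy_score` are made up for illustration):

```python
def tag_sequence(words, score, tags=("ORG", "O")):
    # Equal input/output length: pick the best-scoring tag per position.
    return [max(tags, key=lambda t: score(w, t)) for w in words]

# Hypothetical stand-in for a model's per-token scores:
# capitalized words look like entity mentions.
def toy_score(word, tag):
    return float(word[0].isupper() == (tag == "ORG"))

tag_sequence(["Amazon", "is", "awesome"], toy_score)
# → ['ORG', 'O', 'O']
```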
22. Sequence Output
• Decoding
• In theory we could just decode one word at a time
• The state space is too large, so we use an approximate search
• This makes decoding approximate: we can no longer decode exactly
• Beam search
• GAN-style samplers (need different loss)
Compress relevant state
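Beam search makes that approximation concrete: keep only the few best partial outputs at each step instead of the full exponential state space. A minimal sketch (the `toy_step` model is invented for illustration; a real decoder would score next tokens with the network):

```python
def beam_search(step, width=2, length=3):
    # beams holds (log-probability, partial sequence) pairs.
    beams = [(0.0, [])]
    for _ in range(length):
        candidates = [(logp + tok_logp, seq + [tok])
                      for logp, seq in beams
                      for tok, tok_logp in step(seq)]
        # Prune: keep only the `width` best partial sequences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
    return beams[0][1]

# Hypothetical toy model: "a" always has higher log-probability than "b".
def toy_step(seq):
    return [("a", -0.1), ("b", -2.0)]

beam_search(toy_step)
# → ['a', 'a', 'a']
```

With `width=1` this degenerates to greedy decoding; larger widths trade compute for a better approximation of the true argmax.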
23. Sequence Output
• One size does not fit all
AWS is awesome
AWS offers a wide range of services, it is highly
scalable, reliable and cost effective …
• Attention for nonparametric models
(update attention pointer A to select where to attend next)
the same fixed-size embedding for every input length is no good
24. Sequence Output
• Machine Translation
• word order is different
the white house - la casa blanca
• number of words is different
town wall - Stadtmauer
• context matters
he took it along - er nahm sie mit
• Attention Mechanism for decoding
• multiple pointers, hierarchical attention
for encoding and decoding, …
26. Sequence Output
• Text to Speech (e.g. Polly)
• Input is short and discrete (words)
• Output is wave function
• Encode
BLSTM or convolution as before
• Decode
• LSTM autoregressive model
• Attention on source text
AWS is awesome.
27. Amazon Transcribe
00:00:00,100 --> 00:00:02,949
You you have said one moment can make a
movement.
00:00:02,949 --> 00:00:07,170
What was that moment for you?
What do you looking at that moment right
00:00:07,170 --> 00:00:07,540
now?
00:00:16,460 --> 00:00:20,109
I think that what i meant by that and
what i mean by that is that any moment
00:00:20,109 --> 00:00:23,709
you can change the course of your life
you can change the direction of what
00:00:23,709 --> 00:00:24,730
you're going in.
00:00:00,000 --> 00:00:08,449
What would be the best policy response to dealing with
those who have been displaced in obviously trade
restrictions
00:00:08,449 --> 00:00:19,620
Bring with it a whole lot of difficulties. What what should
government do to address this problem which does lead
to quite a lot of disquiet in in the general public? There
are
00:00:19,620 --> 00:00:28,179
two sorts of policies that i think help one of them is
00:00:28,179 --> 00:00:34,100
conventional safety that policies very important to make
sure that if
00:00:34,100 --> 00:00:43,490
if jobs are displaced in an industry that losing those jobs
doesn't mean losing health care doesn't mean losing
your retirement benefits doesn't mean that
28. More language tools
• Amazon Transcribe
• Convert audio content into text.
• Hybrid system with deep Bidirectional LSTMs
• CTC-based encoder-decoder system
• Amazon Lex
• Extract intent of human language input (textual or audible user
requests) and convert into workflow.
• Beyond …
• Graphs and Knowledge Bases (Vertex embeddings)
• Lookup tables (Translation memories, Dictionaries, Interpolative TTS)
• Attention (Memory networks in dialog, Structured text)
29. Outline
• Sequence Input
• Object embeddings (words, sound, images)
• Sequences of objects (LSTMs, Convolutions)
• Sequence Output
• Words, sound, structures
• Beam search
• Attention, convolutions, lookup tables
• Gluon.mxnet.io - the engine
small & flexible set of tools
for many applications
31. Symbolic vs. Imperative
• Symbolic
• easy to optimize
• portable
• easy to serialize
• hard to debug
• no dynamic graphs
• no native code
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10) * 2)
• Imperative
• easy to code
• easy to debug
• dynamic graphs / native code
• hard to optimize
• hard to serialize
• JIT compiler fixes this
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
print(c)
d = c + 1
32. Performance Optimization
• Hybridization (JIT Compiler)
• Compile compute graph/sidestep the Python interpreter
• Flexibility when model changes (often for language)
• Dynamic Batching
Aggregate data automatically to deal with variable length
of graphs in execution
• Kernel Fusion
Combine operators, e.g. fuse (A += B; A += C) into a single A += B + C pass
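In pure Python the effect of kernel fusion looks like this (only a sketch of the idea; MXNet performs the fusion on compiled GPU kernels, not interpreted loops):

```python
def add_unfused(a, b, c):
    # Two separate operators: A += B, then A += C (two memory passes).
    a = [x + y for x, y in zip(a, b)]
    a = [x + y for x, y in zip(a, c)]
    return a

def add_fused(a, b, c):
    # One fused operator: A += B + C in a single pass over memory.
    return [x + y + z for x, y, z in zip(a, b, c)]
```

The results are identical; the fused version reads and writes each element once instead of twice, which is where the speedup comes from.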