1© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning at AWS: Embeddings & Attention Models
Leo Dirac, Principal Engineer
July 20, 2017
Goals of this talk
• Inspire you to think big!
• Explain some key Deep Learning concepts
• Share impressive research results
• Applications at Amazon
How similar are these products?
• Identical?
• Different {sizes, styles} of the same product?
• Different products?
Supervised ML
Training Data Labels
Supervised ML
Model
Learning Code
Model
Training
Data
Algorithm Code
ML models as Code
• Linear Models (e.g. Logistic Regression)
– Very simple algorithm: SUMPRODUCT
– Fast to run, pretty easy to train
• Deep Neural Networks
– Arbitrarily complex algorithm
– Tricky & slow to train, requires GPU hardware
Floating Point Performance
Multiply two (10,000 x 10,000) matrices
(400MB each, 32-bit)
• Native BLAS (python numpy): ~30 seconds*
• Java (naïve triple for-loop): ~5 hours*
• P2.xlarge GPU: ~0.6 seconds
*Tested on a 4-core (8 w/ HT) iMac w/ Intel Core i7 @ 3.4GHz; similar to c4.2xl
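For a sense of scale, multiplying two N x N matrices takes roughly 2·N³ floating-point operations (one multiply and one add per inner step). The sketch below spells out the naive triple loop and the arithmetic behind the figures on this slide. The function names are illustrative and not from the talk.

```python
def naive_matmul(a, b):
    """Multiply two matrices (lists of rows) with the naive triple for-loop."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out


def matmul_flops(n):
    """Approximate operation count for an n x n product (multiply + add per step)."""
    return 2 * n ** 3


def matrix_megabytes(n, bytes_per_value=4):
    """Storage for an n x n matrix of 32-bit values, in MB (10^6 bytes)."""
    return n * n * bytes_per_value / 1e6


print(naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(matmul_flops(10_000))      # 2 * 10^12 operations
print(matrix_megabytes(10_000))  # 400.0
```

Dividing about 2 × 10^12 operations by the ~0.6 seconds quoted for the GPU gives roughly 3.3 × 10^12 operations per second.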
EC2 p2.16xlarge
68,000,000,000,000
operations/second
(w/ 16 GPUs, each about 4 TFlops)
Clustering
• Finds “similar” items
• What is Similar?
• Vector Distance
– Data points are coordinates
Euclidean Distance / L2-norm
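The formula on this slide did not survive as text. The standard definition is D(u, v) = sqrt(Σ (u_i − v_i)²). A minimal sketch follows; the function name is an assumption chosen for illustration.

```python
import math


def euclidean_distance(u, v):
    """L2 distance between two equal-length coordinate lists."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```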
Pixel Similarity Distance
0.483
1.412
1.770
Preparing your data for Math
NxD matrix
“Embedding”
“Encoding”
“Latent Features”
“Feature Embedding”
“Feature Vector”
“Vector”
“Point”
“Coordinates”
☐Image Embeddings:
☐Word Embedding…
tricky
Word Embedding
• Why not just use char[]?
Not semantically meaningful
s(“Duck”) = [68.00, 117.00, 99.00, 107.00, 0.00, 0.00, 0.00, 0.00]
• Closest to “Euck”, “Dudk”, “Dtck”.
• Not very similar to “duck” or “Ducks”.
• Very far from “Goose”.
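The character-code encoding above can be reproduced in a few lines. This is a sketch that assumes zero-padding to 8 positions, as the s(“Duck”) vector suggests; the helper names are illustrative.

```python
import math


def char_embedding(word, length=8):
    """Encode a word as its character codes, zero-padded to a fixed length."""
    codes = [float(ord(c)) for c in word[:length]]
    return codes + [0.0] * (length - len(codes))


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


duck = char_embedding("Duck")
print(duck)  # [68.0, 117.0, 99.0, 107.0, 0.0, 0.0, 0.0, 0.0]
for other in ["Euck", "Dudk", "Dtck", "duck", "Ducks", "Goose"]:
    print(other, round(distance(duck, char_embedding(other)), 1))
```

The one-letter variants land at distance 1, while the lowercase “duck” is 32 away: the distance tracks spelling, not meaning.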
Bag of Words / 1-Hot
Holds in higher dimensions
Everything is equidistant
D(“king”, “queen”) = 1.4142
D(“king”, “kings”) = 1.4142
D(“king”, “small”) = 1.4142
D(“small”, “tiny”) = 1.4142
D(“frog”, “diesel”) = 1.4142
D(“soccer”, “ball”) = 1.4142
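Why every pair comes out at 1.4142: two different one-hot vectors differ in exactly two positions, so their distance is always sqrt(1² + 1²) = sqrt(2). A small sketch, using a toy vocabulary assembled from the words on this slide:

```python
import math


def one_hot(index, size):
    """Vector of zeros with a single 1 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocab = ["king", "queen", "kings", "small", "tiny", "frog", "diesel", "soccer", "ball"]
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

for a, b in [("king", "queen"), ("small", "tiny"), ("frog", "diesel")]:
    print(a, b, round(distance(vectors[a], vectors[b]), 4))  # 1.4142 each time
```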
Semantic meaning in geometry
D(“king”, “queen”) = 0.188
D(“king”, “kings”) = 0.052
D(“king”, “small”) = 1.385
D(“small”, “tiny”) = 0.165
Word2vec Embedding
W2v(“king”) = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, …, 0.866]
• float[128]
• Meaningless to a human
• Like a hash code
• Pre-computed
• Map<String,float[]>
• Takes a long time to train
Similar words: nearby embeddings
W2v(“king”) = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, …, 0.866]
W2v(“queen”) = [-3.101, -0.057, 3.800, 4.862, 3.632, -4.157, 0.549, 2.064, 3.428, …, 0.884]
D(W2v(“king”), W2v(“queen”)) = 0.188
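As a rough check, the distance can be computed over just the ten components printed on the slide. The elided components are left out, so this covers only part of the 128-dimensional vectors and can only be a lower bound on the full distance. Variable names are illustrative.

```python
import math

# Only the components shown on the slide; the "…" entries are not filled in.
king_shown = [-3.168, -0.136, 3.770, 4.767, 3.558, -4.168, 0.464, 2.034, 3.411, 0.866]
queen_shown = [-3.101, -0.057, 3.800, 4.862, 3.632, -4.157, 0.549, 2.064, 3.428, 0.884]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


partial = distance(king_shown, queen_shown)
print(round(partial, 3))  # 0.187
```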
Algebra
W2v(“king”) – W2v(“queen”) + W2v(“aunt”) = [5.409, 5.281, -1.331, 3.714, -1.727, -3.167, -2.130, 1.213, -3.285, …, -2.000]
≈ W2v(“uncle”)
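The analogy lookup can be sketched as vector arithmetic followed by a nearest-neighbor search. The 3-dimensional vectors below are hypothetical toy values invented for illustration; they are not real Word2Vec output, and the function name is an assumption.

```python
import math

# Hypothetical toy embeddings (not trained values).
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "aunt": [0.0, -1.0, 1.0],
    "uncle": [0.0, 1.0, 1.0],
    "frog": [-1.0, 0.0, -1.0],
}


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def analogy(a, b, c, table):
    """Return the word whose vector is nearest to table[a] - table[b] + table[c]."""
    target = [x - y + z for x, y, z in zip(table[a], table[b], table[c])]
    candidates = [w for w in table if w not in (a, b, c)]
    return min(candidates, key=lambda w: distance(target, table[w]))


print(analogy("king", "queen", "aunt", toy))  # uncle
```

Excluding the three query words from the candidates is a common convention in such lookups, adopted here as an assumption.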
Analogies in Geometry
Analogies
Word2Vec Embedding
Training data: large text corpus (like Wikipedia)
Word Embeddings: Word2Vec
☐Image Embeddings: ?
Goal: Semantic Similarity
0.058
0.731
0.782
ImageNet
Training data: 10^6 <Image,Noun> pairs
Noun vocabulary: 1000
Convolutional Neural Network
X → F(X) ≈ Y (“Leopard”)
F = ConvNet (CNN)
Human-level performance
Dark Knowledge
Training Data → Training Algorithm → Predictions
Predictions: [0.001, 0.000, 0.685, 0.013, …, 0.004, 0.134, 0.000, …, 0.007]
Labels shown: grille, convertible
Predictions as Embedding?
[0.001, 0.000, 0.685, 0.013, …, 0.004, 0.134, 0.000, …, 0.007]
Labels shown: grille, convertible
Image Features
X → F(X) → y
Features are read from the penultimate layer of F(X), just before the output y.
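A toy sketch of reading features from the penultimate layer. The network below is a tiny fully connected stand-in with made-up weights, not a ConvNet and not anything from the talk; it only shows that the same forward pass yields both the class predictions and the vector one layer earlier.

```python
import math

# Hypothetical weights for illustration: 3 inputs -> 4 hidden units -> 2 classes.
W_HIDDEN = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0], [0.2, 0.2, 0.2], [-0.5, -0.5, 1.0]]
W_OUT = [[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, 0.0, 0.5]]


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(xs):
    return [max(0.0, v) for v in xs]


def softmax(zs):
    """Numerically stable softmax: non-negative values that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x):
    features = relu(matvec(W_HIDDEN, x))      # penultimate layer: used as the embedding
    probs = softmax(matvec(W_OUT, features))  # output layer: class predictions
    return features, probs


features, probs = forward([1.0, 2.0, 3.0])
print(len(features), len(probs))
```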
Best Linearly Separable Space
Learned features
Penultimate Layer
Dim1: Four legs?
Dim2: Straps?
Dim3: Brown & furry?
Dim4: Human leg?
Dim5: Standing in grass?
Dim6: Person holding it?
Dim7: Has laces?
…
Dim4096: In the sky?
Output Layer
Dim1: Is this an aardvark?
Dim2: Is this an airplane?
Dim3: Is this an apple?
…
Dim 258: Is this a dress shoe?
…
Dim721: Is this a sandal?
…
Dim 1000: Is this a zebra?
Image Embedding
Embedding Features
Dim1: Four legs?
Dim2: Straps?
Dim3: Brown & furry?
Dim4: Human leg?
Dim5: Standing in grass?
Dim6: Person holding it?
Dim7: Has laces?
…
Dim4096: In the sky?
Word Embeddings: Word2Vec w/ Wikipedia
Image Embeddings: ConvNet w/ ImageNet data
Word Embeddings: Word2Vec w/ Wikipedia
Image Embeddings: ConvNet w/ ImageNet data
☐Phrase Embeddings: ?
Machine Translation
Training data: list of <English Phrase, French Phrase> pairs
Encoder/Decoder Network
RNN = Recurrent Neural Network
Encoder RNNs (English words in): seriously, technique, powerful
→ Phrase Embedding →
Decoder RNNs (French words out): technique, au, puissant, sérieux
Embeddings as Interfaces
English Words → English Word2Vec → Encoder RNN → Joint English/French Phrase Embedding → Decoder RNN → French Word2Vec → French Words
“Joint Embedding”
Combines two kinds of data into the same embedding space.
Here: English & French phrases.
Or…
Word Embeddings: Word2Vec
Image Embeddings: ConvNet
Phrase Embeddings: Encoder/Decoder RNN
☐Image/Phrase joint embedding: ?
Neural Image Captioning
Training Data: list of <Image,Phrase> pairs
Composition of Neural Networks
Image → Raw Pixel Encoding → Image ConvNet → Joint Image/Phrase Embedding → Language Decoder RNN → English Word2Vec → Descriptive Phrase
Composition of Neural Networks
Image → Raw Pixel Encoding → Image ConvNet → Joint Image/Phrase Embedding → RNN → RNN → RNN
The RNN steps emit Word1, Word2, Word3 (English Word2Vec)
NIC examples
Word Embeddings: Word2Vec
Image Embeddings: ConvNet
Phrase Embeddings: Encoder/Decoder RNN
Image captioning: ConvNet + Decoder RNN
☐Limits of embedding models
Phrase embeddings don’t work well
Why?
ℝ^512 is too small
You’re nuts!
Information content in embeddings
• How many points can be organized in ℝ^2?
Information content in embeddings
• How many points can be organized in 2 single-precision floats?
2^64 = 18,446,744,073,709,551,616
Information content in embeddings
• Only using 1 bit per dimension
2^512 ≈ 10^154
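The two counts on these slides can be checked with exact integer arithmetic. A short sketch, with variable names chosen for illustration:

```python
# Two single-precision floats carry 2 * 32 = 64 bits.
points_two_floats = 2 ** (2 * 32)
print(f"{points_two_floats:,}")  # 18,446,744,073,709,551,616

# One bit per dimension across 512 dimensions.
one_bit_per_dim = 2 ** 512
print(len(str(one_bit_per_dim)) - 1)  # 154, i.e. about 10^154
```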
Cover’s Function Counting Theorem (1965)
http://www.cns.nyu.edu/~eorhan/notes/covers-theorem.pdf
Simplification: up to O(N) points in ℝ^N are probably linearly separable.
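The counting result behind that simplification: for P points in general position in N dimensions, the number of labelings that a hyperplane through the origin can separate is C(P, N) = 2 · Σ_{k=0}^{N−1} binom(P−1, k), out of 2^P possible labelings. The sketch below evaluates that fraction; the function names are illustrative.

```python
from math import comb


def separable_dichotomies(num_points, dim):
    """Cover's count of linearly separable labelings (points in general position)."""
    return 2 * sum(comb(num_points - 1, k) for k in range(dim))


def separable_fraction(num_points, dim):
    """Fraction of all 2^P labelings that are linearly separable."""
    return separable_dichotomies(num_points, dim) / 2 ** num_points


for p in (256, 512, 1024, 2048):
    print(p, separable_fraction(p, 512))
```

The fraction is exactly 1 while P ≤ N, drops to exactly 1/2 at P = 2N, and falls off quickly beyond that.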
Word Embeddings: Word2Vec
Image Embeddings: ConvNet
Phrase Embeddings: Encoder/Decoder RNN
Image captioning: ConvNet + Decoder RNN
☐Attention models
Seq2Seq Network
Encoder RNNs (English words in): seriously, technique, powerful
→ Phrase Embedding →
Decoder RNNs (French words out): technique, au, puissant, sérieux
Seq2Seq with Attention
Attention Model
f(Decoder_state, Input_Word) -> [0,1]
How relevant is this input word to the current output?
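One way to realize such a relevance function is sketched below. It scores each input position with a dot product against the decoder state and normalizes with a softmax, so every weight falls in [0, 1] and the weights sum to 1. Dot-product scoring is a simplification chosen for brevity; the NMT paper linked on a later slide (arXiv 1409.0473) uses a small learned network for the score instead. All vectors here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Weight each input position by relevance and blend them into a context vector."""
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Hypothetical encoder outputs for three input words, and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
print([round(w, 3) for w in weights])
```

The context vector, rather than a single fixed phrase embedding, is what the decoder consumes at each output step.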
NMT attention
https://arxiv.org/pdf/1409.0473.pdf
Attention in NIC
https://arxiv.org/pdf/1502.03044.pdf
Attention in NIC
https://arxiv.org/pdf/1502.03044.pdf
Far from perfect
https://arxiv.org/pdf/1502.03044.pdf
Word Embeddings: Word2Vec
Image Embeddings: ConvNet
Phrase Embeddings: Encoder/Decoder RNN
Image captioning: ConvNet + Decoder RNN
Attention models
☐Amazon Applications
Product2Vec
Product 1 Features (Title, Description) → NN → Product Embedding
Product 2 Features (Title, Description) → NN → Product Embedding
distance between the two embeddings ↔ Product 1-2 Similarity (observed in aggregate customer behavior)
Product Embeddings w/ Images
Product 1 Image → CNN → Image Embedding
Product 2 Image → CNN → Image Embedding
Product 1 Features + Image Embedding → NN → Product Embedding
Product 2 Features + Image Embedding → NN → Product Embedding
distance between the two embeddings ↔ Product 1-2 Similarity
Analogies
- + =
Analogies
Dave's Killer Bread - 21 Whole Grains Bread - 2 loaves - USDA Organic
− Stroehmann King Bread Loaf Pack of 2
+ Quaker Chewy Variety Pack 60 Granola Bars
= Nature's Path Organic Chewy Granola Bars
Linear Combinations
+
=
2
Image2Vec on Product Images
Understanding Points in High-Dimensional Space
• Excel
• Clustering – assigns integer values
• Projection – maps to 2D space (or 3D, 4D, etc.)
– PCA
– “t-SNE” is a learned projection
t-SNE Clustering Product Images
Lessons
• Embeddings need a context to have meaning
– Similarity & Distance become relevant
• Supervised ML can create useful embeddings
– Weak labels are often good enough
• Neural networks are composable
– Re-use network architectures or trained networks
• Attention mechanisms extend embeddings
– Embeddings have limited capacity
– Attention provides interpretability
Think big!
It’s still day 1 for Deep Learning.
THANKS!

Contenu connexe

Tendances

ABD210 deloitte amtrak case study
ABD210 deloitte amtrak case studyABD210 deloitte amtrak case study
ABD210 deloitte amtrak case study
Amazon Web Services
 

Tendances (20)

AWS Services for Data Migration - AWS Online Tech Talks
AWS Services for Data Migration - AWS Online Tech TalksAWS Services for Data Migration - AWS Online Tech Talks
AWS Services for Data Migration - AWS Online Tech Talks
 
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
Operation Monitoring and Alerting at Scale in GE Transportation - ENT340 - re...
 
Oracle Enterprise Solutions on AWS (GPSCT203) - AWS re:Invent 2018
Oracle Enterprise Solutions on AWS (GPSCT203) - AWS re:Invent 2018Oracle Enterprise Solutions on AWS (GPSCT203) - AWS re:Invent 2018
Oracle Enterprise Solutions on AWS (GPSCT203) - AWS re:Invent 2018
 
ABD210 deloitte amtrak case study
ABD210 deloitte amtrak case studyABD210 deloitte amtrak case study
ABD210 deloitte amtrak case study
 
MSC202_Learn How Salesforce Used ADCs for App Load Balancing for an Internati...
MSC202_Learn How Salesforce Used ADCs for App Load Balancing for an Internati...MSC202_Learn How Salesforce Used ADCs for App Load Balancing for an Internati...
MSC202_Learn How Salesforce Used ADCs for App Load Balancing for an Internati...
 
Storage and Backup on AWS - Hebrew Webinar November 2017
Storage and Backup on AWS - Hebrew Webinar November 2017Storage and Backup on AWS - Hebrew Webinar November 2017
Storage and Backup on AWS - Hebrew Webinar November 2017
 
Deployment of SAP Solutions on AWS (Level 200)
Deployment of SAP Solutions on AWS (Level 200)Deployment of SAP Solutions on AWS (Level 200)
Deployment of SAP Solutions on AWS (Level 200)
 
Automate the Provisioning of Secure Developer Environments on AWS PPT
 Automate the Provisioning of Secure Developer Environments on AWS PPT Automate the Provisioning of Secure Developer Environments on AWS PPT
Automate the Provisioning of Secure Developer Environments on AWS PPT
 
Born in the Cloud, Built like a Startup
Born in the Cloud, Built like a StartupBorn in the Cloud, Built like a Startup
Born in the Cloud, Built like a Startup
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
 
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdfDEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
DEV305_Manage Your Applications with AWS Elastic Beanstalk.pdf
 
GPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of ManufacturingGPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
GPSTEC326-GPS Industry 4.0 AI and the Future of Manufacturing
 
Cost Optimisation Solutions on AWS
Cost Optimisation Solutions on AWS Cost Optimisation Solutions on AWS
Cost Optimisation Solutions on AWS
 
Deep Dive on AWS Migration Hub - AWS Online Tech Talks
Deep Dive on AWS Migration Hub - AWS Online Tech TalksDeep Dive on AWS Migration Hub - AWS Online Tech Talks
Deep Dive on AWS Migration Hub - AWS Online Tech Talks
 
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesCMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
 
GPSTEC319-Build Once Deploy Many Architecting and Building Automated Reusable...
GPSTEC319-Build Once Deploy Many Architecting and Building Automated Reusable...GPSTEC319-Build Once Deploy Many Architecting and Building Automated Reusable...
GPSTEC319-Build Once Deploy Many Architecting and Building Automated Reusable...
 
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
 
[REPEAT 1] Executing a Large-Scale Migration to AWS (ENT205-R1) - AWS re:Inve...
[REPEAT 1] Executing a Large-Scale Migration to AWS (ENT205-R1) - AWS re:Inve...[REPEAT 1] Executing a Large-Scale Migration to AWS (ENT205-R1) - AWS re:Inve...
[REPEAT 1] Executing a Large-Scale Migration to AWS (ENT205-R1) - AWS re:Inve...
 
ARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active ArchitectureARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active Architecture
 
Builders' Day- Mastering Kubernetes on AWS
Builders' Day- Mastering Kubernetes on AWSBuilders' Day- Mastering Kubernetes on AWS
Builders' Day- Mastering Kubernetes on AWS
 

Similaire à Deep Learning at AWS: Embedding & Attention Models

Building Global Serverless Backends
Building Global Serverless BackendsBuilding Global Serverless Backends
Building Global Serverless Backends
Amazon Web Services
 

Similaire à Deep Learning at AWS: Embedding & Attention Models (20)

Advanced Design Patterns for Amazon DynamoDB - DAT403 - re:Invent 2017
Advanced Design Patterns for Amazon DynamoDB - DAT403 - re:Invent 2017Advanced Design Patterns for Amazon DynamoDB - DAT403 - re:Invent 2017
Advanced Design Patterns for Amazon DynamoDB - DAT403 - re:Invent 2017
 
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
 
Tensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS SagemakerTensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS Sagemaker
 
Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017
 
Building Global Serverless Backends
Building Global Serverless BackendsBuilding Global Serverless Backends
Building Global Serverless Backends
 
Journey Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million UsersJourney Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million Users
 
Use Amazon Rekognition to Build a Facial Recognition System
Use Amazon Rekognition to Build a Facial Recognition SystemUse Amazon Rekognition to Build a Facial Recognition System
Use Amazon Rekognition to Build a Facial Recognition System
 
Use Amazon Rekognition to Build a Facial Recognition System
Use Amazon Rekognition to Build a Facial Recognition SystemUse Amazon Rekognition to Build a Facial Recognition System
Use Amazon Rekognition to Build a Facial Recognition System
 
Building Multiregion Serverless Backends
Building Multiregion Serverless BackendsBuilding Multiregion Serverless Backends
Building Multiregion Serverless Backends
 
Working with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingWorking with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model Training
 
AI / ML Services - re:Invent Comes to London 2.0
AI / ML Services - re:Invent Comes to London 2.0AI / ML Services - re:Invent Comes to London 2.0
AI / ML Services - re:Invent Comes to London 2.0
 
Amazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San FranciscoAmazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San Francisco
 
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
 
Artificial Intelligence (Machine Learning) on AWS: How to Start
Artificial Intelligence (Machine Learning) on AWS: How to StartArtificial Intelligence (Machine Learning) on AWS: How to Start
Artificial Intelligence (Machine Learning) on AWS: How to Start
 
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
Workshop Build an Image-Based Automatic Alert System with Amazon Rekognition:...
 
Moving Forward with AI
Moving Forward with AIMoving Forward with AI
Moving Forward with AI
 
Rethink Your Graphics Workstation Strategy with Amazon AppStream 2.0 - BAP311...
Rethink Your Graphics Workstation Strategy with Amazon AppStream 2.0 - BAP311...Rethink Your Graphics Workstation Strategy with Amazon AppStream 2.0 - BAP311...
Rethink Your Graphics Workstation Strategy with Amazon AppStream 2.0 - BAP311...
 
Artificial Intelligence (Machine Learning) on AWS: How to Start
Artificial Intelligence (Machine Learning) on AWS: How to StartArtificial Intelligence (Machine Learning) on AWS: How to Start
Artificial Intelligence (Machine Learning) on AWS: How to Start
 
SageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine LearningSageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine Learning
 
CON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersCON203_Driving Innovation with Containers
CON203_Driving Innovation with Containers
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Deep Learning at AWS: Embedding & Attention Models

  • 1. 1© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Learning at AWS: Embeddings & Attention Models Leo Dirac, Principal Engineer July 20, 2017
  • 2. 2© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Goals of this talk • Inspire you to think big! • Explain some key Deep Learning concepts • Share impressive research results • Applications at Amazon
  • 3. 3© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How similar are these products? • Identical? • Different {sizes, styles} of the same product? • Different products?
  • 4. 4© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Supervised ML Training Data Labels
  • 5. 6© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Supervised ML Model
  • 6. 7© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Learning Code Model Training Data Algorithm Code
  • 7. 8© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ML models as Code • Linear Models (i.e. Logistic Regression) – Very simple algorithm: SUMPRODUCT – Fast to run, pretty easy to train • Deep Neural Networks – Arbitrarily complex algorithm – Tricky & slow to train, requires GPU hardware
  • 8. 9© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Floating Point Performance Multiply two (10,000 x 10,000) matrices (400MB each, 32-bit) • Native BLAS (python numpy): ~30 seconds* • Java (Naïve triple for-loop): • P2.xlarge GPU: ~0.6 seconds *Tested on a 4-core (8 w/ HT) iMac w/ Intel Core i7 @ 3.4GHz; similar to c4.2xl ~5 hours*
  • 9. 10© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EC2 p2.16xlarge 68,000,000,000,000 operations/second (w/ 16 GPU’s, each about 4 TFlops)
  • 10. 11© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Clustering • Finds “similar” items • What is Similar? • Vector Distance – Data points are coordinates
  • 11. 12© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Euclidean Distance / L2-norm
  • 12. Pixel Similarity Distance 0.483 1.412 1.770
  • 13. Preparing your data for Math NxD matrix
  • 14. “Embedding” “Encoding” “Latent Features” “Feature Embedding” “Feature Vector” “Vector” “Point” “Coordinates”
  • 15. ☐Image Embeddings: ☐Word Embedding… tricky
  • 16. Word Embedding • Why not just use char[]?
  • 17. Not semantically meaningful s(“Duck”) = [68.00, 117.00, 99.00, 107.00, 0.00, 0.00, 0.00, 0.00] • Closest to “Euck” “Dudk” “Dtck”. • Not very similar to “duck” or “Ducks”. • Very far from “Goose”.
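The character-code encoding above can be reproduced in a few lines. This sketch assumes zero-padding to a width of 8, matching the vector shown for “Duck”; `char_encode` and `distance` are illustrative names.

```python
import math

def char_encode(word, width=8):
    """Encode a word as its character codes, zero-padded to a fixed width."""
    codes = [float(ord(c)) for c in word[:width]]
    return codes + [0.0] * (width - len(codes))

def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

duck = char_encode("Duck")
print(duck)  # [68.0, 117.0, 99.0, 107.0, 0.0, 0.0, 0.0, 0.0]

# A one-letter typo lands closer than a simple change of case.
for other in ["Euck", "duck", "Goose"]:
    print(other, round(distance(duck, char_encode(other)), 3))
```

The distances reflect spelling only, which is why this encoding carries no semantic meaning.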
  • 18. Bag of Words / 1-Hot
  • 19. 21© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 20. 22© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 21. Holds in higher dimensions
  • 22. Everything is equidistant D(“king”, “queen”) = 1.4142 D(“king”, “kings”) = 1.4142 D(“king”, “small”) = 1.4142 D(“small”, “tiny”) = 1.4142 D(“frog”, “diesel”) = 1.4142 D(“soccer”, “ball”) = 1.4142
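Why every pair comes out at 1.4142: two different one-hot vectors differ in exactly two positions, so their distance is always the square root of 2. A small sketch, using a made-up seven-word vocabulary:

```python
import math

# Assumption: a tiny illustrative vocabulary; real vocabularies are far larger.
vocab = ["king", "queen", "kings", "small", "tiny", "frog", "diesel"]

def one_hot(word):
    """Vector with a single 1.0 at the word's vocabulary index."""
    return [1.0 if w == word else 0.0 for w in vocab]

def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

for a, b in [("king", "queen"), ("king", "small"), ("frog", "diesel")]:
    print(a, b, round(distance(one_hot(a), one_hot(b)), 4))  # 1.4142 each time
```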
  • 23. Semantic meaning in geometry D(“king”, “queen”) = 0.188 D(“king”, “kings”) = 0.052 D(“king”, “small”) = 1.385 D(“small”, “tiny”) = 0.165
  • 24. Word2vec Embedding W2v(“king”) = [-3.168 -0.136 3.770 4.767 3.558 -4.168 0.464 2.034 3.411 … 0.866] • float[128] • Meaningless to a human • Like a hash code • Pre-computed • Map<String,float[]> • Takes long time to train
  • 25. Similar words: nearby embeddings W2v(“king”) = [-3.168 -0.136 3.770 4.767 3.558 -4.168 0.464 2.034 3.411 … 0.866] W2v(“queen”) = [-3.101 -0.057 3.800 4.862 3.632 -4.157 0.549 2.064 3.428 … 0.884] D(W2v(“king”), W2v(“queen”)) = 0.188
  • 26. Algebra W2v(“king”) – W2v(“queen”) + W2v(“aunt”) = [5.409 5.281 -1.331 3.714 -1.727 -3.167 -2.130 1.213 -3.285 … -2.000] ≈ W2v(“uncle”)
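The analogy arithmetic can be sketched as a nearest-neighbour lookup over a word-to-vector map. The vectors below are hand-made 3-D toys, not real Word2Vec output, and the `analogy` helper is hypothetical.

```python
import math

# Assumption: hand-made 3-D toy vectors, not real Word2Vec output.
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "uncle": [0.1, 0.8, 0.2],
    "aunt":  [0.1, -0.8, 0.2],
    "small": [-0.7, 0.0, 0.9],
}

def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), skipping the inputs."""
    target = [x - y + z for x, y, z in zip(toy[a], toy[b], toy[c])]
    candidates = [w for w in toy if w not in (a, b, c)]
    return min(candidates, key=lambda w: distance(toy[w], target))

print(analogy("king", "queen", "aunt"))  # uncle
```

With real embeddings the result is only approximately equal to the target word's vector, which is why the lookup returns the nearest neighbour and not an exact match.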
  • 27. Analogies in Geometry
  • 28. Analogies
  • 29. Word2Vec Embedding Training data: Large text corpus (like Wikipedia)
  • 30. Word Embeddings: Word2Vec ☐Image Embeddings: ?
  • 31. Goal: Semantic Similarity 0.058 0.731 0.782
  • 32. ImageNet Training data: 10^6 <Image,Noun> pairs Noun vocabulary: 1000
  • 33. Convolutional Neural Network X F(X) Y ←Leopard ≈ ConvNet CNN
  • 34. Human-level performance
  • 35. Dark Knowledge Training Data [0.001, 0.000, 0.685, 0.013, … 0.004, 0.134, 0.000, … 0.007] grille  grille convertible Predictions Training Algorithm
  • 36. Predictions as Embedding? [0.001, 0.000, 0.685, 0.013, … 0.004, 0.134, 0.000, … 0.007] grille  convertible
  • 37. Image Features X F(X) ↑ penultimate layer y
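Reading the embedding off the penultimate layer amounts to stopping the forward pass one step early. The sketch below uses a toy two-layer network with random weights and made-up sizes (64 inputs, 16 features, 5 classes); a real ConvNet would be trained and far larger.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumption: toy sizes and random weights standing in for a trained network.
W_hidden = rng.normal(size=(64, 16))  # input pixels -> penultimate features
W_out = rng.normal(size=(16, 5))      # penultimate features -> class scores

def forward(x):
    """Return (embedding, class_probabilities) for a flattened input x."""
    embedding = np.maximum(0.0, x @ W_hidden)  # ReLU penultimate layer
    scores = embedding @ W_out
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    return embedding, exp / exp.sum()

x = rng.random(64)
embedding, probs = forward(x)
print(embedding.shape, probs.shape)  # (16,) (5,)
```

The `embedding` vector is what would be kept as the image feature; `probs` is the usual class prediction.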
  • 38. Best Linearly Separable Space
  • 39. Learned features Penultimate Layer Dim1: Four legs? Dim2: Straps? Dim3: Brown & furry? Dim4: Human leg? Dim5: Standing in grass? Dim6: Person holding it? Dim7: Has laces? … Dim4096: In the sky? Output Layer Dim1: Is this an aardvark? Dim2: Is this an airplane? Dim3: Is this an apple? … Dim 258: Is this a dress shoe? … Dim721: Is this a sandal? … Dim 1000: Is this a zebra?
  • 40. Image Embedding Embedding Features Dim1: Four legs? Dim2: Straps? Dim3: Brown & furry? Dim4: Human leg? Dim5: Standing in grass? Dim6: Person holding it? Dim7: Has laces? … Dim4096: In the sky?
  • 41. Word Embeddings: Word2Vec w/ Wikipedia Image Embeddings: ConvNet w/ ImageNet data
  • 42. Word Embeddings: Word2Vec w/ Wikipedia Image Embeddings: ConvNet w/ ImageNet data ☐Phrase Embeddings: ?
  • 43. Machine Translation Training data: list of <English Phrase, French Phrase> pairs
  • 44. Encoder/Decoder Network RNN Recurrent Neural Network RNN RNN seriously technique RNN powerful RNN RNN RNN technique au puissant RNN sérieux Phrase Embedding
  • 45. Embeddings as Interfaces Encoder RNN English Words Decoder RNN French Words English Word2Vec French Word2Vec Joint English/French Phrase Embedding
  • 46. “Joint Embedding” Combines two kinds of data into the same embedding space. Here: English & French phrases. Or…
  • 47. Word Embeddings: Word2Vec Image Embeddings: ConvNet Phrase Embeddings: Encoder/Decoder RNN ☐Image/Phrase joint embedding: ?
  • 48. Neural Image Captioning Training Data: list of <Image,Phrase> pairs
  • 49. Composition of Neural Networks Image ConvNet Image Language Decoder RNN Descriptive Phrase English Word2Vec Joint Image/Phrase Embedding Raw Pixel Encoding
  • 50. Composition of Neural Networks Image ConvNet Image RNN Word1 English Word2Vec Joint Image/Phrase Embedding Raw Pixel Encoding Word2 Word3 RNN RNN
  • 51. NIC examples
  • 52. Word Embeddings: Word2Vec Image Embeddings: ConvNet Phrase Embeddings: Encoder/Decoder RNN Image captioning: ConvNet + Decoder RNN ☐Limits of embedding models
  • 53. Phrase embeddings don’t work well
  • 54. Why? ℝ^512 is too small. You’re nuts!
  • 55. Information content in embeddings • How many points can be organized in ℝ^2?
  • 56. Information content in embeddings • How many points can be organized in 2 single-precision floats? 2^64 = 18,446,744,073,709,551,616
  • 57. Information content in embeddings • Only using 1 bit per dimension 2^512 ≈ 10^154
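Both counts are quick to check with integer arithmetic. The sketch assumes 32 bits per single-precision float, so two floats give 64 bits; the variable names are illustrative.

```python
import math

# Two single-precision floats carry 2 * 32 = 64 bits of state.
bits_two_floats = 2 * 32
print(2 ** bits_two_floats)  # 18446744073709551616

# One bit in each of 512 dimensions gives 2**512 distinct points.
exponent = 512 * math.log10(2)
print(round(exponent, 1))  # 154.1, so 2**512 is about 10**154
```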
  • 58. Cover’s Function Counting Theorem (1965) http://www.cns.nyu.edu/~eorhan/notes/covers-theorem.pdf Simplification: ℝ^N is probably linearly separable for up to O(N) points.
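The O(N) simplification can be made concrete with Cover's counting formula, which gives the number of linearly separable labelings of P points in N dimensions as 2 · Σ_{k=0}^{N−1} C(P−1, k). The sketch below assumes points in general position and separating hyperplanes through the origin; the function names are illustrative.

```python
from math import comb

def separable_dichotomies(p, n):
    """Cover's count of linearly separable labelings of p points in
    general position in n dimensions (hyperplanes through the origin)."""
    return 2 * sum(comb(p - 1, k) for k in range(n))

def separable_fraction(p, n):
    """Fraction of all 2**p labelings that a linear separator can realize."""
    return separable_dichotomies(p, n) / 2 ** p

n = 20
for p in (n, 2 * n, 4 * n):
    print(p, separable_fraction(p, n))
```

Every labeling is separable up to P = N, exactly half are at P = 2N, and the fraction collapses soon after, which is the sense in which capacity grows only linearly with dimension.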
  • 59. Word Embeddings: Word2Vec Image Embeddings: ConvNet Phrase Embeddings: Encoder/Decoder RNN Image captioning: ConvNet + Decoder RNN ☐Attention models
  • 60. Seq2Seq Network RNN RNN seriously technique RNN powerful RNN RNN RNN technique au puissant RNN sérieux Phrase Embedding
  • 61. Seq2Seq with Attention
  • 62. Attention Model f(Decoder_state, Input_Word) -> [0,1] How relevant is this input word to the current output?
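A minimal sketch of the relevance function f: score every input word against the current decoder state, then normalize so each weight falls in [0, 1]. The dot-product scoring used here is an assumption swapped in for simplicity; the NMT paper linked on the next slide learns its scoring function with a small network. The vectors are made up.

```python
import math

def attention_weights(decoder_state, input_vectors):
    """Score each input vector against the decoder state, then softmax
    so every weight lies in [0, 1] and the weights sum to 1."""
    scores = [sum(d * x for d, x in zip(decoder_state, vec)) for vec in input_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, input_vectors):
    """Attention-weighted average of the input vectors."""
    dims = len(input_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, input_vectors)) for i in range(dims)]

# Assumption: made-up 3-D encoder states for three input words.
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.1, 2.0, 0.1]
weights = attention_weights(state, inputs)
print([round(w, 3) for w in weights])
print(context_vector(weights, inputs))
```

The decoder consumes the context vector at each output step, so it is no longer limited to a single fixed-size phrase embedding.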
  • 63. NMT attention https://arxiv.org/pdf/1409.0473.pdf
  • 64. Attention in NIC https://arxiv.org/pdf/1502.03044.pdf
  • 65. Attention in NIC https://arxiv.org/pdf/1502.03044.pdf
  • 66. Far from perfect https://arxiv.org/pdf/1502.03044.pdf
  • 67. Word Embeddings: Word2Vec Image Embeddings: ConvNet Phrase Embeddings: Encoder/Decoder RNN Image captioning: ConvNet + Decoder RNN Attention models ☐Amazon Applications
  • 68. Product2Vec Product Embedding NN Product 1 Features -Title -Description Product 1-2 Similarity (observed in aggregate customer behavior) NN Product 2 Features -Title -Description distance
  • 69. Product Embeddings w/ Images Product Embedding NN Product 1 Features Product 1-2 Similarity NN Product 2 Features distance Product 1 Image Product 2 Image CNN CNN Image Embedding
  • 70. Analogies - + =
  • 71. Analogies Dave's Killer Bread - 21 Whole Grains Bread - 2 loaves - USDA Organic Stroehmann King Bread Loaf Pack of 2 Quaker Chewy Variety Pack 60 Granola Bars - + = Nature's Path Organic Chewy Granola Bars
  • 72. Linear Combinations + = 2
  • 73. Image2Vec on Product Images
  • 74. Understanding Points in High Dimensional Space • Excel • Clustering – assigns integer values • Projection – maps to 2D space (or 3D, 4D, etc) – PCA – “t-SNE” is a learned projection
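Of the projection options listed, PCA is the simplest to sketch: center the N x D matrix and project onto its top principal directions. The data below is random and only stands in for real embeddings; `pca_project` is an illustrative name.

```python
import numpy as np

def pca_project(points, dims=2):
    """Project an N x D matrix onto its top `dims` principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))  # assumption: random stand-in data
projected = pca_project(embeddings)
print(projected.shape)  # (100, 2)
```

The two output columns can be plotted directly as x and y coordinates.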
  • 75. t-SNE Clustering Product Images
  • 76. Lessons • Embeddings need a context to have meaning – Similarity & Distance become relevant • Supervised ML can create useful embeddings – Weak labels are often good enough • Neural networks are composable – Re-use network architectures or trained networks • Attention mechanisms extend embeddings – Embeddings have limited capacity. – Attention provides interpretability
  • 77. Think big! It’s still day 1 for Deep Learning.