O'Reilly Artificial Intelligence Conference San Francisco 2018
How to use transfer learning to bootstrap image
classification and question answering (QA)
Danielle Dean PhD, Wee Hyong Tok PhD
Principal Data Scientist Lead
Microsoft
@danielleodean | @weehyong
Inspired by “Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud Defense”, Mark Russinovich, RSA Conference 2018
Textbook ML development
Choosing the Learning Task → Defining Data Input → Applying Data Transforms → Choosing the Learner → Choosing Output → Choosing Run Options → View Results → Debug and Visualize Errors → Analyze Model Predictions
Fact | Industry-grade ML solutions are highly exploratory
The same pipeline (choosing the learning task through analyzing model predictions) is repeated over and over across Attempt 1, Attempt 2, Attempt 3, Attempt 4, ... Attempt n.
Traditional versus Transfer Learning
In traditional machine learning, a separate learning system is trained from scratch for each different task. In transfer learning, knowledge gained from source tasks is reused by the learning system for the target task.
Source: Pan, Sinno Jialin, and Qiang Yang. "A Survey on Transfer Learning." IEEE Transactions on Knowledge and Data Engineering.
Why are we talking about transfer learning?
Chart: drivers of ML success in industry, plotted as commercial success over time. Supervised learning is the dominant driver today (circa 2016); transfer learning is positioned as the next driver of commercial success, with unsupervised learning and reinforcement learning further out.
Source: Ruder, Sebastian. "Transfer Learning - Machine Learning's Next Frontier".
Transfer Learning in Computer Vision
Can we leverage knowledge of processing images to help with new
tasks?
• What’s in the picture?
• Where is the bike located?
• Can you find a similar bike?
• How many bikes are there?
Before Deep Learning
• Researchers took a traditional machine learning approach
• Manual creation of a variety of different visual feature extractors
• Followed by traditional ML classifiers
• Features not very generalizable to other vision tasks – not easy to transfer
• Example: HoG Detectors
- Histogram of oriented
gradients (HoG) features
- Sliding window detector
- SVM Classifier
- Very fast OpenCV
implementation (<100ms)
Deep Neural Networks
ImageNet: 14,197,122 images, 21,841 synsets. Diverse images, lots of labels!
Transfer Learning for Computer Vision
Train a deep learning model for computer vision using data from ImageNet, then apply the model to other domains such as retail and manufacturing.
Example – Visualizing the different layers
Source: Olah, et al., "Feature Visualization", Distill, 2017
https://distill.pub/2017/feature-visualization/
Other fun sites:
https://deepart.io/nips/submissions/random/
http://cs231n.stanford.edu/
Clothing texture dataset:
• 1,716 images from Bing, manually annotated into three classes: striped, argyle, dotted
Transfer Learning – How to get started?
• Standard DNN: featurization layers initialized randomly; output layer initialized randomly; no transfer learning; train featurization and output jointly.
• Headless DNN: featurization layers learned using another task; output is a separate ML algorithm; uses the features learned on a related task; use those features to train a separate classifier.
• Fine-Tune DNN: featurization layers learned using another task; output layer initialized randomly; uses and fine-tunes the features learned on a related task; train featurization and output jointly with a small learning rate.
• Multi-Task DNN: featurization and output layers initialized randomly; the learned features need to solve many related tasks; share a featurization network across the tasks and train all networks jointly with a loss function that is the sum of the individual task losses (a minimal sketch follows below).
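As an illustration of the multi-task row, a minimal Keras sketch of a shared featurization network with two output heads whose losses are summed (layer sizes, task names and class counts are placeholders, not from the talk):

from keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from keras.models import Model

inputs = Input(shape=(224, 224, 3))
x = Conv2D(32, (3, 3), activation='relu')(inputs)
x = MaxPooling2D()(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = GlobalAveragePooling2D()(x)          # shared featurization network

texture_out = Dense(3, activation='softmax', name='texture')(x)   # task 1 head
color_out = Dense(5, activation='softmax', name='color')(x)       # task 2 head

model = Model(inputs, [texture_out, color_out])
model.compile(optimizer='adam',
              loss={'texture': 'categorical_crossentropy',
                    'color': 'categorical_crossentropy'},
              loss_weights={'texture': 1.0, 'color': 1.0})   # total loss = sum of task losses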
Pre-Built CNN from General Task on Millions of Images
The original output layer (cat? YES, dog? NO, car? NO) is stripped, and a separate classifier, e.g. an SVM, is trained on the network's features to answer the new question (dotted?).
Moving through the network, the layers capture low-level features (lines, edges, color fields), then high-level features (corners, contours, simple shapes), then object parts (wheels, faces, windows, etc.), and finally complex objects and scenes (people, animals, cars, beach scenes, etc.).
Outputs of the penultimate layer of an ImageNet-trained CNN provide excellent general-purpose image features.
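A rough sketch of this headless approach (not the exact code behind the texture results; the model choice and names such as train_images and train_labels are assumptions), using an ImageNet-trained CNN as a fixed featurizer with an SVM on top:

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from sklearn.svm import LinearSVC

# ImageNet-trained CNN with the output layer stripped;
# pooling='avg' returns the penultimate-layer feature vector for each image
base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def featurize(images):
    # images: float array of shape (n, 224, 224, 3)
    return base_model.predict(preprocess_input(images.copy()))

features_train = featurize(train_images)        # train_images / train_labels assumed to exist
classifier = LinearSVC()
classifier.fit(features_train, train_labels)    # e.g. dotted vs. not dotted
predictions = classifier.predict(featurize(test_images))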
Pre-Built CNN from General Task on Millions of Images
Alternatively, with the original output layer (cat? / dog? / car?) stripped, train one or more layers of the new network itself to answer the new question (dotted?).
Using a pre-trained DNN, an accurate model can be achieved with thousands (or fewer) of labeled examples instead of millions.
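A minimal fine-tuning sketch in the same spirit (the class count, number of frozen layers and learning rate are illustrative assumptions): replace the stripped output layer with a new classifier and train with a small learning rate.

from keras.layers import Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.applications.resnet50 import ResNet50

base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')
for layer in base_model.layers[:-20]:
    layer.trainable = False              # keep the earliest layers frozen (a design choice)

outputs = Dense(3, activation='softmax')(base_model.output)   # e.g. striped / argyle / dotted
model = Model(inputs=base_model.input, outputs=outputs)
model.compile(optimizer=Adam(lr=1e-4),   # small learning rate for fine-tuning
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels_onehot, epochs=5, batch_size=32)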
Transfer Learning Results - Texture Dataset
• DNN featurization, input image size 224x224 pixels: area under curve 0.59, classification accuracy 69.0%
• Fine-tuning (full CNN), input image size 224x224 pixels: area under curve 0.76, classification accuracy 77.4%
• Fine-tuning (full CNN), input image size 896x886 pixels: area under curve 0.83, classification accuracy 88.2%
Transfer Learning for Similarity
Full code:
https://github.com/miguelgfierro/sciblog_support/blob/master/A_Gentle_Introduction_to_Transfer_Learning/Intro_Transfer_Learning.ipynb
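The linked notebook has the complete workflow; the core idea can be sketched in a few lines (assuming feature vectors have already been extracted with a pre-trained CNN, as in the headless example above): compare images by the cosine similarity of their feature vectors.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# query_features and catalog_features are assumed to be feature vectors
# extracted with a pre-trained CNN, as in the headless example above
scores = [cosine_similarity(query_features, f) for f in catalog_features]
best_match = int(np.argmax(scores))      # index of the most similar image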
Example Applications in Computer Vision
• Aerial use classification
• ESmart - connected drone
• Jabil - defect inspection
• Lung cancer detection
• Void detection: https://github.com/MattKleinsmith/void-detector
Label Noise
Traditional Method: Manual Verification
Applying Transfer Learning
Read more details: https://www.microsoft.com/en-us/research/blog/using-transfer-learning-to-address-label-noise-for-large-scale-image-classification/
Computer Vision is not a “solved problem”
The knowledge being “transferred” can be very useful, but it is not the same as how humans learn to see.
Recap: Transfer Learning for Image Classification
Define the learning task → identify a pre-trained model → decide whether to further fine-tune or use it as a headless DNN → freeze top layers, re-train the classifier → validate the model → deploy the model
Deep Learning on Different Types of Data
• Audio spectrograms and images: rich, high-dimensional datasets
• Text: sparse data (depending on the encoding), e.g. the characters of “I see a big cat” one-hot encoded (a tiny sketch follows below)
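To make the sparsity point concrete, a tiny illustrative sketch (the sentence and the character-level encoding are just an example):

import numpy as np

text = "i see a big cat"
vocab = sorted(set(text))                        # distinct characters
one_hot = np.zeros((len(text), len(vocab)))      # one row per character
for position, char in enumerate(text):
    one_hot[position, vocab.index(char)] = 1.0   # a single 1 per row; everything else is 0
# Most entries are zero: encoded text is sparse, unlike dense pixel arrays or spectrograms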
How do we apply
Transfer Learning to NLP?
Different Types of NLP Tasks
And many more….
Transfer Learning for Text
Define the learning task → identify a pre-trained model → decide whether to further fine-tune → freeze top layers, re-train the classifier → validate the model → deploy the model
Key questions: What kind of pre-trained model? What does the top layer encode?
Word Embeddings
Example relationships captured by the embedding space: Male - Female, Verb Tense, Country - Capital
Source: Tensorflow Tutorial - https://www.tensorflow.org/tutorials/representation/word2vec
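These relationships can be probed directly with pre-trained vectors. A small sketch using gensim (the embedding file name is an assumption; any word2vec-format file works):

from gensim.models import KeyedVectors

# File name is an assumption; any word2vec-format embedding file works
vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# Male - Female direction: king - man + woman is close to queen
print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))
# Country - Capital direction: Paris - France + Japan is close to Tokyo
print(vectors.most_similar(positive=['Paris', 'Japan'], negative=['France'], topn=3))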
Word Embeddings
Timeline of pre-trained word embedding methods: 2013, 2014-2015, 2017
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
Compute an index mapping words to known embeddings:

import os
import numpy as np

embeddings_index = {}
f = open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'))
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

Compute the embedding matrix:

embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in the embedding index will be all-zeros
        embedding_matrix[i] = embedding_vector
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
from keras.layers import Embedding
embedding_layer = Embedding(len(word_index) + 1,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=False)
Load the Embedding
Matrix into an
Embedding Layer
Prevent weights from being
updated during training
Using Pre-trained Embeddings
Text Classification using 20 Newsgroup dataset
Source: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from keras.models import Model

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(35)(x)  # global max pooling
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(len(labels_index), activation='softmax')(x)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=2, batch_size=128)
Build a small 1D
convnet to solve the
classification problem
From initializing the first layers to pre-training the entire model (and learning higher-level semantic concepts)
Transfer Learning for NLP - ULMFiT
Source: Universal Language Model Fine-tuning for Text Classification, Jeremy Howard, Sebastian Ruder, ACL 2018
Train a language model using a large general-domain corpus → fine-tune the language model → fine-tune the classifier
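A hedged sketch of those three stages using the fastai v1 text API (data_lm and data_clas are assumed, pre-built fastai data bunches; exact calls and hyperparameters depend on the library version):

from fastai.text import language_model_learner, text_classifier_learner, AWD_LSTM

# data_lm / data_clas: assumed language-model and classification data bunches
# 1. Start from a language model pre-trained on a large general-domain corpus (WikiText-103)
lm_learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
# 2. Fine-tune the language model on target-domain text
lm_learn.fit_one_cycle(1, 1e-2)
lm_learn.unfreeze()
lm_learn.fit_one_cycle(1, 1e-3)
lm_learn.save_encoder('ft_enc')
# 3. Fine-tune the classifier, reusing the adapted encoder and gradually unfreezing
clf_learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf_learn.load_encoder('ft_enc')
clf_learn.fit_one_cycle(1, 1e-2)
clf_learn.freeze_to(-2)
clf_learn.fit_one_cycle(1, 1e-3)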
Transfer Learning for NLP - ELMo
Source: Deep contextualized word representations, Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt
Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer., NAACL 2018
Diagram: train bidirectional language models (biLMs) on a large corpus, then enhance the usual inputs of a downstream task (e.g. the tokens “have a nice ...”) with ELMo representations.
ELMo Pre-trained Models
Source: https://allennlp.org/elmo
Using ELMo with TensorFlow Hub
Source: https://www.tensorflow.org/hub/modules/google/elmo/2
elmo = hub.Module("https://tfhub.dev/google/elmo/2",
trainable=True)
embeddings = elmo(
["the cat is on the mat", "dogs are in the fog"],
signature="default",
as_dict=True)["elmo"]
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
["dogs", "are", "in", "the", "fog", ""]]
tokens_length = [6, 5]
embeddings = elmo(
inputs={
"tokens": tokens_input,
"sequence_len": tokens_length
},
signature="tokens",
as_dict=True)["elmo"]
The ELMo module accepts either untokenized sentences (signature="default") or pre-tokenized input (signature="tokens") and returns a dictionary of representations:
• Character-based word representation
• First LSTM hidden state
• Second LSTM hidden state
• elmo (weighted sum of the 3 layers)
• Fixed mean-pooling of the contextualized word representations
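A small usage sketch of picking one of those outputs and evaluating it (the dictionary keys follow the TensorFlow Hub module documentation linked above; treat them as assumptions if the module version differs):

import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)
outputs = elmo(["the cat is on the mat"], signature="default", as_dict=True)

per_token = outputs["elmo"]        # weighted sum of the 3 layers, one vector per token
sentence = outputs["default"]      # fixed mean-pooling over the contextualized word vectors

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(sentence).shape)    # e.g. (1, 1024)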
Transfer Learning for MRC tasks
Source:
Transfer Learning for Machine Reading Comprehension - https://bit.ly/2Cmiffy
Transfer Learning for MRC
Train an MRC model using data from Wikipedia, then apply the model to other domains such as news articles and customer support data.
SQuAD
Stanford Question Answering Dataset (SQuAD): a reading comprehension dataset based on Wikipedia articles, with crowdsourced questions. The answer to each question is a segment of text (a span) from the corresponding reading passage, or the question may have no answer.
Question-answer pairs
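For reference, a short sketch of reading one question-answer pair from the SQuAD v1.1 JSON layout (the field names match the published dataset format; the file name is the standard download):

import json

with open('train-v1.1.json') as f:              # standard SQuAD v1.1 training file
    squad = json.load(f)

paragraph = squad['data'][0]['paragraphs'][0]
context = paragraph['context']                  # the reading passage
qa = paragraph['qas'][0]
print(qa['question'])
answer = qa['answers'][0]
print(answer['text'], answer['answer_start'])   # answer span and its character offset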
MRC Datasets
Transfer Learning for MRC using SynNet
Train using a large MRC dataset (e.g. SQuAD) → apply the pre-trained model to a new domain (e.g. NewsQA) → validate the model → deploy the model
More comparisons between different MRC approaches: Transfer Learning for MRC - Survey - https://bit.ly/2JAt1h0
SynNet
Stage 1 – Answer Synthesis module: uses a bi-directional LSTM to predict IOB tags on the input paragraph, marking out semantic concepts that are likely answers.
Stage 2 – Question Synthesis module: uses a uni-directional LSTM to generate the questions.
Source: ACL 2017, https://www.microsoft.com/en-us/research/publication/two-stage-synthesis-networks-transfer-learning-machine-comprehension/
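Not SynNet itself, but a minimal Keras sketch of the stage-1 idea: a bidirectional LSTM that emits an IOB tag for every token of the paragraph (vocabulary size, embedding size and sequence length below are placeholders):

from keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from keras.models import Model

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 128, 300   # placeholder sizes
tokens = Input(shape=(MAX_LEN,), dtype='int32')
x = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(tokens)
x = Bidirectional(LSTM(128, return_sequences=True))(x)
iob_tags = TimeDistributed(Dense(3, activation='softmax'))(x)   # I / O / B tag per token

tagger = Model(tokens, iob_tags)
tagger.compile(optimizer='adam', loss='categorical_crossentropy')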
SynNet – Question/Answer Generation Example
O'Reilly Artificial Intelligence Conference San Francisco 2018
How to use transfer learning to
bootstrap image classification and
question answering (QA)
Summary
1. Transfer Learning and
Applications
2. How to use Transfer Learning for
Image Classification
3. How to use Transfer Learning for
NLP tasks
O'Reilly Artificial Intelligence Conference San Francisco 2018
How to use transfer learning to
bootstrap image classification and
question answering (QA)
Danielle Dean PhD, Wee Hyong Tok PhD
Principal Data Scientist Lead
Microsoft
@danielleodean | @weehyong
Thank You!