A Survey on Transfer Learning

Sinno Jialin Pan
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Joint work with Prof. Qiang Yang
Transfer Learning? (DARPA 05)

Transfer Learning (TL): the ability of a system to recognize and apply
knowledge and skills learned in previous tasks to novel tasks (in new domains).

It is motivated by human learning: people can often transfer knowledge learnt
previously to novel situations.
• Chess → Checkers
• Mathematics → Computer Science
• Table Tennis → Tennis
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Traditional ML vs. TL (P. Langley 06)

[Figure: "Traditional ML in multiple domains" vs. "Transfer of learning across
domains". In traditional ML, each domain has its own training items and test
items; in transfer learning, knowledge learnt from one domain's training items
is applied to another domain's test items.]

Humans can learn in many domains. Humans can also transfer from one domain to
other domains.
Traditional ML vs. TL

[Figure: learning process of traditional ML vs. transfer learning. In
traditional ML, each set of training items feeds its own learning system; in
transfer learning, knowledge extracted from the source learning systems is fed,
together with the target training items, into the target learning system.]
Notation

Domain: a domain consists of two components: a feature space $\mathcal{X}$ and a
marginal probability distribution $P(X)$, i.e., $\mathcal{D} = \{\mathcal{X}, P(X)\}$.
In general, if two domains are different, then they may have different feature
spaces or different marginal distributions.

Task: given a specific domain $\mathcal{D}$ and a label space $\mathcal{Y}$, the task is,
for each $x$ in the domain, to predict its corresponding label $y$, i.e.,
$\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, where the predictive function $f(\cdot)$ can be
interpreted as $P(Y \mid X)$.
In general, if two tasks are different, then they may have different label
spaces or different conditional distributions.
Notation

For simplicity, we consider at most two domains and two tasks.

Source domain: $\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$

Task in the source domain: $\mathcal{T}_S$

Target domain: $\mathcal{D}_T = \{(x_{T_1}, y_{T_1}), \ldots, (x_{T_{n_T}}, y_{T_{n_T}})\}$

Task in the target domain: $\mathcal{T}_T$
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Why Transfer Learning?
• In some domains, labeled data are in short supply.
• In some domains, the calibration effort is very expensive.
• In some domains, the learning process is time-consuming.

How can we extract knowledge learnt from related domains to help learning in a
target domain that has only a few labeled examples? How can we extract
knowledge learnt from related domains to speed up learning in a target domain?

Transfer learning techniques may help!
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Settings of Transfer Learning

Transfer learning setting        Labeled data in     Labeled data in     Tasks
                                 a source domain     a target domain
Inductive Transfer Learning      × or √              √                   Classification, regression, ...
Transductive Transfer Learning   √                   ×                   Classification, regression, ...
Unsupervised Transfer Learning   ×                   ×                   Clustering, ...
An overview of various settings of transfer learning

[Figure: decision tree over transfer learning settings]
• Inductive Transfer Learning: labeled data are available in the target domain.
   Case 1: no labeled data in the source domain → Self-taught Learning.
   Case 2: labeled data are available in the source domain; the source and
   target tasks are learnt simultaneously → Multi-task Learning.
• Transductive Transfer Learning: labeled data are available only in the
  source domain.
   Assumption: different domains but a single task → Domain Adaptation.
   Assumption: a single domain and a single task → Sample Selection Bias /
   Covariate Shift.
• Unsupervised Transfer Learning: no labeled data in either the source or the
  target domain.
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Approaches to Transfer Learning

Transfer learning approach        Description
Instance-transfer                 Re-weight some labeled data in the source
                                  domain for use in the target domain.
Feature-representation-transfer   Find a "good" feature representation that
                                  reduces the difference between the source and
                                  the target domain or minimizes model error.
Model-transfer                    Discover shared parameters or priors of
                                  models between the source and target domains.
Relational-knowledge-transfer     Build a mapping of relational knowledge
                                  between the source and target domains.
Approaches to Transfer Learning

                                  Inductive TL   Transductive TL   Unsupervised TL
Instance-transfer                      √               √
Feature-representation-transfer        √               √                  √
Model-transfer                         √
Relational-knowledge-transfer          √
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
   – Inductive Transfer Learning
   – Transductive Transfer Learning
   – Unsupervised Transfer Learning
Inductive Transfer Learning
--- Instance-transfer Approaches

• Assumption: the source domain and target domain data use exactly the same
  features and labels.

• Motivation: although the source domain data cannot be reused directly, some
  parts of the data can still be reused by re-weighting.

• Main Idea: discriminatively adjust the weights of the source domain data for
  use in the target domain.
Inductive Transfer Learning
--- Instance-transfer Approaches
Non-standard SVMs [Wu and Dietterich ICML-04]

[Figure: uniform weights vs. correcting the decision boundary by re-weighting]

The objective combines a loss function on the target domain data, a loss
function on the source domain data, and a regularization term, differentiating
the cost of misclassifying target data from that of misclassifying source data.
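The objective itself appears on the slide only as an image; a hedged
reconstruction, under the assumption that $C_T$ and $C_S$ (with $C_T > C_S$)
are separate misclassification costs for the two data sets:

$$
\min_{f}\;\; C_T \sum_{i \in \mathcal{D}_T} L\big(f(x_i),\, y_i\big)
        + C_S \sum_{j \in \mathcal{D}_S} L\big(f(x_j),\, y_j\big)
        + \lambda\,\|f\|^2 .
$$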
Inductive Transfer Learning
--- Instance-transfer Approaches
TrAdaBoost [Dai et al. ICML-07]

• Hedge(β) [Freund et al. 1997]: decreases the weights of misclassified data;
  applied to the source domain labeled data.
• AdaBoost [Freund et al. 1997]: increases the weights of misclassified data;
  applied to the target domain labeled data.

[Figure: the whole training set is the union of the source domain labeled data
and the target domain labeled data; classifiers are trained on the re-weighted
labeled data and applied to the target domain unlabeled data.]
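A minimal sketch of the TrAdaBoost weight updates, assuming binary labels in
{0, 1}; the decision-stump weak learner and all variable names are our own
choices, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xs, ys, Xt, yt, n_rounds=20):
    """Xs, ys: source labeled data; Xt, yt: (few) target labeled data."""
    n, m = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(n + m) / (n + m)           # one weight per training example
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))  # Hedge(β)
    learners, betas = [], []
    for _ in range(n_rounds):
        p = w / w.sum()                    # normalize weights each round
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        err = np.abs(h.predict(X) - y)     # 0/1 loss per example
        # weighted error is measured on the *target* data only
        eps = np.sum(p[n:] * err[n:]) / p[n:].sum()
        eps = min(max(eps, 1e-10), 0.499)  # keep the update well-defined
        beta_t = eps / (1.0 - eps)
        # source: decrease weights of misclassified data (Hedge);
        # target: increase weights of misclassified data (AdaBoost)
        w[:n] *= beta_src ** err[:n]
        w[n:] *= beta_t ** (-err[n:])
        learners.append(h)
        betas.append(beta_t)
    # predict on target unlabeled data by voting with the later-round learners
    return learners, betas
```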
Inductive Transfer Learning
--- Feature-representation-transfer Approaches
Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

Assumption: if t tasks are related to each other, then they may share some
common features which can benefit all of the tasks.

Input: t tasks, each with its own training data.

Output: common features learnt across the t tasks, and t models, one per task.
Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

[Equation: the objective is the average empirical error across the t tasks plus
a regularization term that makes the shared representation sparse, subject to
orthogonality constraints on the learnt feature transformation.]
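A hedged reconstruction of that objective, following the multi-task feature
learning formulation of Argyriou et al. (the matrix $U$ collects the learnt
features, $A$ stacks the per-task coefficients $a_t$, and $\gamma$ is the
regularization parameter):

$$
\min_{A,\,U}\;\sum_{t=1}^{T}\sum_{i=1}^{m}
  L\big(y_{ti},\, \langle a_t,\, U^{\top} x_{ti} \rangle\big)
  + \gamma\,\|A\|_{2,1}^{2}
\qquad \text{s.t. } U^{\top} U = I ,
$$

where the (2,1)-norm $\|A\|_{2,1} = \sum_j \|a^j\|_2$ sums the Euclidean norms
of the rows of $A$; it encourages whole rows to vanish, so all tasks rely on a
small set of shared features.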
Inductive Transfer Learning
--- Feature-representation-transfer Approaches
Unsupervised Feature Construction [Raina et al. ICML-07]

Three steps:
• Apply a sparse coding algorithm [Lee et al. NIPS-07] to learn a higher-level
  representation from unlabeled data in the source domain.
• Transform the target data into the new representation using the bases learnt
  in the first step.
• Apply traditional discriminative models to the new representations of the
  target data with their corresponding labels.
Unsupervised Feature Construction [Raina et al. ICML-07]

Step 1 (sparse coding on the source domain):
Input: source domain data $X_S = \{x_{S_i}\}$ and coefficient $\beta$.
Output: new representations of the source domain data $A_S = \{a_{S_i}\}$ and
new bases $B = \{b_j\}$.

Step 2 (encoding the target domain over the learnt bases):
Input: target domain data $X_T = \{x_{T_i}\}$, coefficient $\beta$, and bases
$B = \{b_j\}$.
Output: new representations of the target domain data $A_T = \{a_{T_i}\}$.
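The two optimization problems are shown on the slide only as images; a hedged
reconstruction following the self-taught learning paper:

$$
\text{Step 1:}\;\;
\min_{a,\,b}\ \sum_i \Big\| x_{S_i} - \sum_j a_{S_i}^{j}\, b_j \Big\|_2^2
  + \beta \sum_i \|a_{S_i}\|_1
\quad \text{s.t. } \|b_j\|_2 \le 1\ \ \forall j ,
$$

$$
\text{Step 2:}\;\;
a_{T_i}^{*} = \arg\min_{a}\ \Big\| x_{T_i} - \sum_j a^{j}\, b_j \Big\|_2^2
  + \beta\,\|a\|_1 .
$$

A minimal end-to-end sketch of the three steps; using scikit-learn's
dictionary learning (rather than the Lee et al. solver), with toy data shapes
of our own choosing:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

# Toy stand-ins: plenty of unlabeled source data, a small labeled target set.
Xs = np.random.randn(500, 64)
Xt, yt = np.random.randn(40, 64), np.random.randint(0, 2, 40)

# Step 1: learn bases B (dl.components_) and sparse codes on the source domain.
dl = DictionaryLearning(n_components=128, transform_algorithm="lasso_lars",
                        transform_alpha=0.5).fit(Xs)
# Step 2: re-encode the target data over the learnt bases.
At = dl.transform(Xt)
# Step 3: train an ordinary discriminative model on the new representation.
clf = LogisticRegression(max_iter=1000).fit(At, yt)
```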
Inductive Transfer Learning
--- Model-transfer Approaches
Regularization-based Method [Evgeniou and Pontil, KDD-04]

Assumption: if t tasks are related to each other, then they may share some
parameters among their individual models.

Assume $f_t(x) = w_t \cdot x$ is a hyper-plane for task $t$, where
$t \in \{S, T\}$, and decompose $w_t = w_0 + v_t$ into a common part $w_0$
shared by all tasks and a specific part $v_t$ for each individual task.
Separate regularization terms control the two parts, and the whole model is
encoded into SVMs.
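The SVM encoding is shown only as an image; a hedged reconstruction of the
regularized multi-task SVM, with $\lambda_1, \lambda_2$ trading off
task-specific versus shared regularization:

$$
\min_{w_0,\,v_t,\,\xi}\;
  \sum_{t \in \{S,T\}} \sum_{i=1}^{n_t} \xi_{ti}
  + \lambda_1 \sum_{t} \|v_t\|^2
  + \lambda_2\,\|w_0\|^2
\quad \text{s.t. } y_{ti}\,(w_0 + v_t)\cdot x_{ti} \ge 1 - \xi_{ti},\;\;
  \xi_{ti} \ge 0 .
$$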
Inductive Transfer Learning
--- Relational-knowledge-transfer Approaches
TAMAR [Mihalkova et al. AAAI-07]

Assumption: if the target domain and source domain are related, then some
relationships in the source domain may have similar counterparts in the target
domain, which can be used for transfer learning.

Input:
1. Relational data in the source domain and a statistical relational model, a
   Markov Logic Network (MLN), learnt in the source domain.
2. Relational data in the target domain.

Output: a new statistical relational model, an MLN, for the target domain.

Goal: to learn an MLN in the target domain more efficiently and effectively.
TAMAR [Mihalkova et al. AAAI-07]
Two Stages:
1. Predicate Mapping
   – Establish the mapping between predicates in the source and target
     domains. Once a mapping is established, clauses from the source domain
     can be translated into the target domain.
2. Revising the Mapped Structure
   – Clauses mapped directly from the source domain may not be completely
     accurate; they may need to be revised, augmented, and re-weighted in
     order to properly model the target data.
TAMAR [Mihalkova et al. AAAI-07]

[Figure: predicate mapping from the source (academic) domain to the target
(movie) domain. AdvisedBy(Student B, Professor A) maps to WorkedFor(Actor A,
Director B), and the Publication relations linking Student/Professor to Paper T
map to MovieMember relations linking Actor/Director to Movie M; the mapped
clauses are then revised to fit the target data.]
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
   – Inductive Transfer Learning
   – Transductive Transfer Learning
   – Unsupervised Transfer Learning
Transductive Transfer Learning
--- Instance-transfer Approaches
Sample Selection Bias / Covariate Shift [Zadrozny ICML-04, Schwaighofer JSPI-00]

Input: plenty of labeled data in the source domain and no labeled data in the
target domain.

Output: models for use on the target domain data.

Assumption: the source and target tasks are the same; that is,
$P(Y_S \mid X_S)$ and $P(Y_T \mid X_T)$ are the same, while $P(X_S)$ and
$P(X_T)$ may differ because of the different sampling processes that produced
the training and test data.

Main Idea: re-weight (importance sampling) the source domain data.
Sample Selection Bias / Covariate Shift

To correct sample selection bias, each source domain example is given a weight
$\beta(x) = P(X_T = x)\,/\,P(X_S = x)$ when training the model.

How do we estimate $\beta(x)$? One straightforward solution is to estimate
$P(X_S)$ and $P(X_T)$ separately. However, density estimation is itself a hard
problem.
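The re-weighted training objective appears on the slide only as an image; a
hedged reconstruction of the standard importance-weighted empirical risk:

$$
\theta^{*} = \arg\min_{\theta}\;
  \sum_{i=1}^{n_S} \frac{P(X_T = x_{S_i})}{P(X_S = x_{S_i})}\,
  \ell\big(x_{S_i},\, y_{S_i},\, \theta\big) ,
$$

so source examples that are likely under the target distribution but rare under
the source distribution count more during training.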
Sample Selection Bias / Covariate Shift
Kernel Mean Matching (KMM) [Huang et al. NIPS 2006]

Main Idea: KMM estimates the weights $\beta$ directly instead of estimating
the density functions.

It can be proved that $\beta$ can be estimated by solving a quadratic
programming (QP) optimization problem that matches the means of the training
and test data in a reproducing kernel Hilbert space (RKHS).

Theoretical Support: Maximum Mean Discrepancy (MMD) [Borgwardt et al.
Bioinformatics-06]. The distance between two distributions can be measured by
the Euclidean distance between their mean vectors in an RKHS.
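The QP is shown only as an image; a hedged reconstruction from Huang et al.,
where $K_{ij} = k(x_{S_i}, x_{S_j})$,
$\kappa_i = \frac{n_S}{n_T}\sum_j k(x_{S_i}, x_{T_j})$, and $B$, $\epsilon$ are
hyper-parameters bounding the weights:

$$
\min_{\beta}\;\tfrac{1}{2}\,\beta^{\top} K \beta - \kappa^{\top} \beta
\qquad \text{s.t.}\quad 0 \le \beta_i \le B,\;\;
\Big| \sum_{i=1}^{n_S} \beta_i - n_S \Big| \le n_S\,\epsilon .
$$

A minimal implementation sketch under those assumptions; the RBF kernel, the
cvxopt solver, and the default $\epsilon = B/\sqrt{n_S}$ are our own choices:

```python
import numpy as np
from cvxopt import matrix, solvers
from sklearn.metrics.pairwise import rbf_kernel

def kmm_weights(Xs, Xt, gamma=1.0, B=10.0, eps=None):
    """Estimate importance weights beta for the source data Xs."""
    ns, nt = len(Xs), len(Xt)
    eps = B / np.sqrt(ns) if eps is None else eps
    K = rbf_kernel(Xs, Xs, gamma=gamma)                 # source-source kernel
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma=gamma).sum(axis=1)
    # Inequality constraints G beta <= h encode:
    #   sum(beta) <= ns*(1+eps), -sum(beta) <= -ns*(1-eps), 0 <= beta_i <= B.
    G = np.vstack([np.ones(ns), -np.ones(ns), np.eye(ns), -np.eye(ns)])
    h = np.hstack([ns * (1 + eps), ns * (eps - 1), B * np.ones(ns), np.zeros(ns)])
    sol = solvers.qp(matrix(K), matrix(-kappa), matrix(G), matrix(h))
    return np.array(sol["x"]).ravel()                   # weights for source data
```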
Transductive Transfer Learning
--- Feature-representation-transfer Approaches
Domain Adaptation
[Blitzer et al. EMNLP-06, Ben-David et al. NIPS-07, Daumé III ACL-07]

Assumption: a single task across domains; that is, $P(Y_S \mid X_S)$ and
$P(Y_T \mid X_T)$ are the same, while $P(X_S)$ and $P(X_T)$ may differ because
the feature representations differ across domains.

Main Idea: find a "good" feature representation that reduces the "distance"
between the domains.

Input: plenty of labeled data in the source domain and only unlabeled data in
the target domain.

Output: a common representation for the source and target domain data, and a
model on the new representation for use in the target domain.
Domain Adaptation
Structural Correspondence Learning (SCL)
[Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]

Motivation: if two domains are related to each other, then there may exist
"pivot" features that occur in both domains. Pivot features are features that
behave in the same way for discriminative learning in both domains.

Main Idea: identify correspondences among features from different domains by
modeling their correlations with the pivot features. Non-pivot features from
different domains that are correlated with many of the same pivot features are
assumed to correspond, and they are treated similarly by a discriminative
learner.
SCL
[Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]

a) Heuristically choose m pivot features (a task-specific choice), transform
   each pivot feature into a binary value, and create a corresponding
   prediction problem.
b) Learn the parameters of each pivot prediction problem.
c) Perform an eigen-decomposition on the matrix of learnt parameters to obtain
   a linear mapping function.
d) Use the learnt mapping function to construct new features and train
   classifiers on the new representations.
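A minimal sketch of these four steps; the binarization rule, the choice of
linear classifier, and the mapping dimension h are our own assumptions (it also
assumes each binarized pivot takes both values in the unlabeled data):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_mapping(X_unlabeled, pivot_idx, h=25):
    """X_unlabeled: unlabeled rows from both domains; pivot_idx: pivot columns."""
    nonpivot_idx = np.setdiff1d(np.arange(X_unlabeled.shape[1]), pivot_idx)
    X_np = X_unlabeled[:, nonpivot_idx]
    W = []
    for p in pivot_idx:                              # one prediction problem per pivot
        y_p = (X_unlabeled[:, p] > 0).astype(int)    # binarized pivot value
        clf = SGDClassifier(loss="modified_huber").fit(X_np, y_p)
        W.append(clf.coef_.ravel())                  # learnt pivot-predictor weights
    W = np.column_stack(W)                           # (n_nonpivot, m) parameter matrix
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # eigen-decomposition step
    theta = U[:, :h].T                               # linear mapping (h, n_nonpivot)
    # New representation: original features augmented with the mapped features.
    return lambda X: np.hstack([X, X[:, nonpivot_idx] @ theta.T])
```

The returned function is then applied to the labeled source data before
training the final classifier, and to the target data at test time.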
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
   – Inductive Transfer Learning
   – Transductive Transfer Learning
   – Unsupervised Transfer Learning
Unsupervised Transfer Learning
--- Feature-representation-transfer Approaches
Self-taught Clustering (STC) [Dai et al. ICML-08]

Input: plenty of unlabeled data in a source domain and a few unlabeled data in
a target domain.

Goal: cluster the target domain data.

Assumption: the source domain and target domain data share some common
features, which can help clustering in the target domain.

Main Idea: extend the information-theoretic co-clustering algorithm
[Dhillon et al. KDD-03] to transfer learning.
Self-taught Clustering (STC) [Dai et al. ICML-08]

[Figure: the source domain data and the target domain data are co-clustered
against a shared set of common features; co-clustering in the source domain and
co-clustering in the target domain are coupled through the shared feature
clusters. The objective function to be minimized combines the two co-clustering
losses, and the output is the set of cluster functions.]
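The objective is shown only as an image; a hedged reconstruction from the STC
paper, where $Z$ is the shared feature variable, tildes denote clusterings,
$I(\cdot,\cdot)$ is mutual information, and $\lambda$ weights the source
co-clustering:

$$
J(\tilde{X}_T, \tilde{X}_S, \tilde{Z}) =
  \big[\, I(X_T, Z) - I(\tilde{X}_T, \tilde{Z}) \,\big]
  + \lambda\,\big[\, I(X_S, Z) - I(\tilde{X}_S, \tilde{Z}) \,\big] ,
$$

where the first bracket is the information lost by co-clustering the target
domain and the second that of the source domain; minimizing $J$ over the
cluster functions yields the target clustering.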
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Negative Transfer
• Most approaches to transfer learning assume that transferring knowledge
  across domains is always beneficial.
• However, in some cases, when two tasks are too dissimilar, brute-force
  transfer may even hurt performance on the target task; this is called
  negative transfer [Rosenstein et al. NIPS-05 Workshop].
• Some researchers have studied how to measure relatedness among tasks
  [Ben-David and Schuller NIPS-03, Bakker and Heskes JMLR-03].
• How to design a mechanism that avoids negative transfer still needs to be
  studied theoretically.
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Conclusion

                                  Inductive TL   Transductive TL   Unsupervised TL
Instance-transfer                      √               √
Feature-representation-transfer        √               √                  √
Model-transfer                         √
Relational-knowledge-transfer          √

How to avoid negative transfer needs to attract more attention!


Editor's notes

  1. It seems the change involved in moving to new tasks is more radical than moving to new domains.
  2. Generality means humans can learn to perform a variety of tasks.