Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
February 22, 2018
Meta-Learning
Which is Aconitum napellus?
Same information, but this version of the task is impossible
for humans. We clearly have something that helps us process
new visual information.
Some humans have (meta-)learned to answer this question.
Meta-learning can occur using acquired knowledge.
Previous Deep Meta-Learning Methods
Metric Learning [1, 2, 3, 4]
Learn a metric in image space
Specific to few-shot classification (Omniglot, miniImageNet, etc.)
Learning = nearest neighbor, Meta-Learning = metric
[1] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. "Siamese Neural Networks for One-shot Image Recognition". In: ICML (2015).
[2] Oriol Vinyals et al. "Matching Networks for One Shot Learning". In: NIPS (2016).
[3] Jake Snell, Kevin Swersky, and Richard S. Zemel. "Prototypical Networks for Few-shot Learning". In: NIPS (2017).
[4] Flood Sung et al. "Learning to Compare: Relation Network for Few-Shot Learning". In: arXiv (2017).
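The learning/meta-learning split above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the exact model of any of [1]–[4]): the embedding `W` stands in for the meta-learned metric, and task-specific "learning" is nothing more than nearest-neighbor lookup in the embedded space.

```python
import numpy as np

def embed(x, W):
    # hypothetical meta-learned embedding: a linear map with a ReLU
    return np.maximum(W @ x, 0.0)

def classify(query, support_x, support_y, W):
    # "Learning" is only nearest-neighbor search in embedding space;
    # all meta-learned knowledge lives in the metric induced by W.
    q = embed(query, W)
    dists = [np.linalg.norm(q - embed(s, W)) for s in support_x]
    return support_y[int(np.argmin(dists))]

# toy 2-way 1-shot episode with a fixed (untrained) embedding
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
support_x = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
support_y = ["class_a", "class_b"]
print(classify(np.array([1.0, 0.0, 0.0]), support_x, support_y, W))
```

Meta-training would tune `W` across many episodes; at meta-test time no gradient steps are needed, which is why these methods are fast but specific to classification.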
RNNs as learners [6, 7]
Should be able to approximate any learning algorithm.
Temporal convolutions [5] have also been used in a similar way.
Learning = RNN rollforward, Meta-Learning = RNN weights
[5] Nikhil Mishra et al. "A Simple Neural Attentive Meta-Learner". In: ICLR (2018).
[6] Adam Santoro et al. "One-shot Learning with Memory-Augmented Neural Networks". In: ICML (2016).
[7] Yan Duan et al. "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning". In: arXiv (2016).
Optimizer Learning [8, 9]
Learn a parameter update rule given gradients (the search space includes SGD, RMSProp, Adam, etc.)
Applicable to any architecture/task
Learning = generalized SGD with the learned optimizer, Meta-Learning = optimizer parameters
[8] Marcin Andrychowicz et al. "Learning to learn by gradient descent by gradient descent". In: NIPS (2016).
[9] Sachin Ravi and Hugo Larochelle. "Optimization as a Model for Few-shot Learning". In: ICLR (2017).
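As a sketch of "generalized SGD" (not the LSTM optimizers of [8] or [9]), here is a hypothetical update rule whose meta-learned parameters `phi` span SGD with momentum; richer parameterizations (e.g. an RNN over gradient histories) would also cover RMSProp/Adam-like rules:

```python
def learned_update(grad, state, phi):
    # phi = (lr, mu) are the meta-learned optimizer parameters.
    lr, mu = phi
    state = mu * state + grad      # momentum accumulator
    return -lr * state, state      # (parameter delta, new optimizer state)

# inner loop: minimize f(w) = (w - 3)^2 with the learned rule
phi = (0.1, 0.9)                   # would be tuned by the outer loop
w, state = 0.0, 0.0
for _ in range(200):
    delta, state = learned_update(2 * (w - 3), state, phi)
    w += delta
print(w)
```

The outer (meta-learning) loop would differentiate the final loss with respect to `phi`; here `phi` is simply fixed to show the inner loop converging.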
Gradient-Based Meta-Learning
MAML [10]
[10] Chelsea Finn, Pieter Abbeel, and Sergey Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". In: ICML (2017).
Can approximate any learning algorithm [11]
Can be interpreted as hierarchical Bayes [12]
Unlike other methods, learning and meta-learning happen in the same parameter space.
Learning = SGD, Meta-Learning = initial parameters
[11] Chelsea Finn and Sergey Levine. "Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm". In: ICLR (2018).
[12] Erin Grant et al. "Recasting Gradient-Based Meta-Learning as Hierarchical Bayes". In: ICLR (2018).
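On toy 1-D quadratic tasks, the MAML loop can be sketched as follows. This is a first-order approximation for brevity (MAML proper backpropagates through the inner SGD step); the quadratic tasks are illustrative, not from the paper:

```python
def grad(w, t):
    # gradient of the per-task loss (w - t)^2
    return 2 * (w - t)

def maml_meta_step(w0, tasks, alpha=0.1, beta=0.05):
    # Learning = one SGD step from the shared initialization w0;
    # Meta-learning = moving w0 along the post-adaptation gradient
    # (first-order: second derivatives are dropped).
    meta_grad = 0.0
    for t in tasks:
        w_adapted = w0 - alpha * grad(w0, t)   # inner-loop SGD
        meta_grad += grad(w_adapted, t)
    return w0 - beta * meta_grad / len(tasks)

w0 = 0.0
tasks = [-1.0, 2.0, 5.0]   # hypothetical task distribution
for _ in range(500):
    w0 = maml_meta_step(w0, tasks)
print(round(w0, 2))   # the initialization settles near the task mean, 2.0
```

Both loops are plain gradient descent in the same parameter space, which is the point of the slide above.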
Implicit assumption: meta-learning and learning require the same number of parameters.
Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Yoonho Lee, Seungjin Choi
arXiv:1801.05558, submitted to ICML 2018
MT-nets
Idea: task-specific learning should require fewer degrees of freedom than meta-learning.
From a task-specific learner's point of view, T alters the activation space.
Proposition 1
Fix x and A. Let U be a d-dimensional subspace of R^n (d ≤ n). There exist configurations of T, W, and ζ such that the span of y_new − y is U while satisfying A = TW.
Proposition 2
Fix x, A, and a loss function L_T. Let U be a d-dimensional subspace of R^n, and g(·, ·) a metric tensor on U. There exist configurations of T, W, and ζ such that the vector y_new − y is in the steepest direction of descent on L_T with respect to the metric g.
Experiments
Ablation. All components are necessary.
Robust to the step size α, since T can change the effective step size.
3 meta-tasks: regression to polynomials of order n (n ∈ {0, 1, 2}).
MT-nets choose to update more parameters for more complicated meta-tasks.
miniImageNet one-shot classification: 5-way 1-shot classification accuracy.
Summary
MT-nets are robust to step size because of T, and the mask
M reflects the complexity of the meta-task.
MT-nets achieve state-of-the-art performance on a
challenging few-shot learning task.
Future Work
Our work shows that gradient-based meta-learning can benefit from additional structure. Other architectures for meta-learners?
Our method performs gradient descent under a learned metric that makes learning faster; this might relate to natural gradients [13].
Our metric is learned layerwise, which is similar to how a recent work [14] factors the parameter space to tractably approximate natural gradients.
[13] Shun-Ichi Amari. "Natural gradient works efficiently in learning". In: Neural Computation 10.2 (1998), pp. 251–276.
[14] James Martens and Roger Grosse. "Optimizing neural networks with kronecker-factored approximate curvature". In: ICML (2015).
References I
[1] Shun-Ichi Amari. “Natural gradient works efficiently in
learning”. In: Neural computation 10.2 (1998), pp. 251–276.
[2] Marcin Andrychowicz et al. “Learning to learn by gradient
descent by gradient descent”. In: NIPS (2016).
[3] Yan Duan et al. "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning". In: arXiv (2016).
[4] Chelsea Finn, Pieter Abbeel, and Sergey Levine.
“Model-Agnostic Meta-Learning for Fast Adaptation of Deep
Networks”. In: ICML (2017).
[5] Chelsea Finn and Sergey Levine. “Meta-Learning and
Universality: Deep Representations and Gradient Descent
can Approximate any Learning Algorithm”. In: ICLR (2018).
[6] Erin Grant et al. “Recasting Gradient-Based Meta-Learning
as Hierarchical Bayes”. In: ICLR (2018).
References II
[7] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov.
“Siamese Neural Networks for One-shot Image Recognition”.
In: ICML (2015).
[8] James Martens and Roger Grosse. “Optimizing neural
networks with kronecker-factored approximate curvature”.
In: ICML. 2015.
[9] Nikhil Mishra et al. “A Simple Neural Attentive
Meta-Learner”. In: ICLR (2018).
[10] Sachin Ravi and Hugo Larochelle. “Optimization as a Model
for Few-shot Learning”. In: ICLR (2017).
[11] Adam Santoro et al. “One-shot Learning with
Memory-Augmented Neural Networks”. In: ICML (2016).
[12] Jake Snell, Kevin Swersky, and Richard S. Zemel.
“Prototypical Networks for Few-shot Learning”. In: NIPS
(2017).
References III
[13] Flood Sung et al. “Learning to Compare: Relation Network
for Few-Shot Learning”. In: arXiv (2017).
[14] Oriol Vinyals et al. “Matching Networks for One Shot
Learning”. In: NIPS (2016).
Thank You
Pseudocode
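The original slide's pseudocode figure is not reproduced here. Below is a hedged sketch of the inner loop of an MT-net-style layer, simplified from the paper: the mask M is held fixed rather than sampled, and only one layer and one task are shown. The layer computes T(Wx); only mask-selected entries of W receive task-specific gradient updates, so learning happens in a subspace whose geometry is shaped by the meta-learned T.

```python
import numpy as np

def mtnet_inner_step(W, T, M, x, y, alpha=0.1):
    # Forward pass: pred = T @ (W @ x). T is meta-learned and frozen
    # during task-specific learning; M masks which entries of W change.
    pred = T @ (W @ x)
    err = pred - y
    grad_W = T.T @ np.outer(err, x)   # d(0.5*||pred - y||^2)/dW
    return W - alpha * (M * grad_W)   # update only the masked subspace

rng = np.random.default_rng(0)
T = 0.5 * np.eye(2)                   # stand-in for a meta-learned T
W = rng.standard_normal((2, 3))
M = np.array([[1.0, 1.0, 0.0],        # illustrative fixed binary mask
              [0.0, 1.0, 1.0]])
x, y = np.array([1.0, 2.0, 0.5]), np.array([1.0, -1.0])
for _ in range(300):
    W = mtnet_inner_step(W, T, M, x, y)
print(np.round(T @ (W @ x), 3))
```

In the full method, T, M, and the initialization of W are all meta-learned across tasks, MAML-style; the masked entries of W are all that a task-specific learner ever touches.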

 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 

Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace

  • 1. 1 / 26 Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology February 22, 2018
  • 2. 2 / 26 Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
  • 4. 4 / 26 Meta-Learning Which is Aconitum napellus?
  • 5. 5 / 26 Meta-Learning Which is Aconitum napellus? Same information, but this version of the task is impossible for humans. We clearly have something that helps us process new visual information.
  • 6. 6 / 26 Meta-Learning Which is Aconitum napellus? Some humans have (meta-)learned to answer this question. Meta-learning can occur using acquired knowledge.
  • 8. 8 / 26 Previous Deep Meta-Learning Methods Metric Learning [1, 2, 3, 4] Learn a metric in image space Specific to few-shot classification (Omniglot, miniImageNet, etc.) Learning = nearest neighbor, Meta-Learning = metric [1] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015). [2] Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016). [3] Jake Snell, Kevin Swersky, and Richard S. Zemel. “Prototypical Networks for Few-shot Learning”. In: NIPS (2017). [4] Flood Sung et al. “Learning to Compare: Relation Network for Few-Shot Learning”. In: arXiv (2017).
  • 9. 9 / 26 Previous Deep Meta-Learning Methods RNNs as learners [6, 7] Should be able to approximate any learning algorithm. Temporal convolutions [5] have also been used in a similar way. Learning = RNN rollforward, Meta-Learning = RNN weights [5] Nikhil Mishra et al. “A Simple Neural Attentive Meta-Learner”. In: ICLR (2018). [6] Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016). [7] Yan Duan et al. “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: arXiv (2016).
  • 10. 10 / 26 Previous Deep Meta-Learning Methods Optimizer Learning [8, 9] Learn the parameter update given gradients (search space includes SGD, RMSProp, Adam, etc.) Applicable to any architecture/task Learning = generalized SGD with the learned optimizer, Meta-Learning = optimizer parameters [8] Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016). [9] Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017).
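The optimizer-learning setup above can be sketched with a toy parameterized update rule. The two-parameter form and all names here are illustrative assumptions, not the parameterizations used in the cited papers; the point is only that the search space over `phi` contains standard optimizers.

```python
def learned_update(grad, state, phi):
    """A toy parameterized optimizer whose search space contains
    plain SGD (phi = (lr, 0)) and SGD with momentum (phi = (lr, mu)).
    Meta-learning would tune phi across tasks.
    """
    new_state = phi[1] * state + grad       # momentum-style buffer
    return -phi[0] * new_state, new_state   # (parameter delta, new state)

# phi = (0.1, 0.0) reduces to a plain SGD step of size 0.1:
delta, state = learned_update(2.0, 0.0, (0.1, 0.0))   # delta == -0.2

# phi = (0.1, 0.9) reuses the buffer, i.e. SGD with momentum:
delta2, state2 = learned_update(1.0, state, (0.1, 0.9))
```

In the cited works, a neural network replaces this hand-written rule, so the meta-learner can discover update rules beyond the SGD family.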
  • 11. 11 / 26 Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
  • 12. 12 / 26 Gradient-Based Meta-Learning MAML [10] [10] Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: ICML (2017).
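The MAML update can be sketched on a 1-D toy problem. The quadratic tasks, learning rates, and iteration count below are illustrative choices, not from the paper; the structure (inner SGD step per task, outer step on the initialization, gradient taken through the inner step) is MAML's.

```python
def maml_step(theta, tasks, inner_lr=0.01, outer_lr=0.01):
    """One MAML meta-update on a 1-D toy problem.

    Each task c has loss L_c(w) = (w - c)**2. MAML takes one inner
    SGD step per task starting from theta, then moves theta to lower
    the average post-adaptation loss.
    """
    meta_grad = 0.0
    for c in tasks:
        grad_inner = 2.0 * (theta - c)               # dL_c/dw at theta
        w_adapted = theta - inner_lr * grad_inner    # inner SGD step
        # Chain rule through the inner step:
        # d(w_adapted)/d(theta) = 1 - 2 * inner_lr  (second-order term)
        meta_grad += 2.0 * (w_adapted - c) * (1.0 - 2.0 * inner_lr)
    return theta - outer_lr * meta_grad / len(tasks)

theta = 0.0
tasks = [-1.0, 0.5, 2.0]
for _ in range(2000):
    theta = maml_step(theta, tasks)
# theta converges toward the initialization that adapts best on
# average after one inner step (here, the mean of the task minima).
```

The factor `(1 - 2 * inner_lr)` is the derivative of the inner update, i.e. the term MAML backpropagates through; learning and meta-learning both act on the same parameter `theta`.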
  • 13. 13 / 26 Gradient-Based Meta-Learning Can approximate any learning algorithm [11] Can be interpreted as hierarchical Bayes [12] Unlike other methods, learning and meta-learning happen in the same parameter space. Learning = SGD, Meta-Learning = initial parameters [11] Chelsea Finn and Sergey Levine. “Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm”. In: ICLR (2018). [12] Erin Grant et al. “Recasting Gradient-Based Meta-Learning as Hierarchical Bayes”. In: ICLR (2018).
  • 14. 14 / 26 Gradient-Based Meta-Learning Implicit assumption: meta-learning and learning require the same number of parameters.
  • 15. 15 / 26 Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace Yoonho Lee, Seungjin Choi. arXiv:1801.05558, submitted to ICML 2018
  • 16. 16 / 26 MT-nets Idea: task-specific learning should require fewer degrees of freedom than meta-learning.
  • 18. 18 / 26 MT-nets From a task-specific learner’s point of view, T alters the activation space.
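A minimal numpy sketch of an MT-net-style layer, where shapes, the mask pattern, and the squared-error loss are illustrative assumptions: the meta-learned transformation `T` stays fixed during task-specific learning, and only the masked entries of `W` are updated, so the resulting change in the effective weights `A = T W` is confined to a subspace shaped by `T` and the mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Meta-learned quantities (fixed during task-specific learning):
T = rng.standard_normal((4, 4))                 # layerwise transformation
M = np.array([1.0, 1.0, 0.0, 0.0])[:, None] * np.ones((4, 3))  # binary mask

# Task-specific weights, initialized from meta-learned values:
W = rng.standard_normal((4, 3))

def forward(x):
    # The effective weight seen by the next layer is A = T @ W.
    return T @ (W @ x)

# One inner-loop step with a squared-error loss: the gradient flows
# through T, but only masked entries of W are updated.
x = rng.standard_normal(3)
y_target = np.zeros(4)
y = forward(x)
grad_y = 2.0 * (y - y_target)          # dL/dy for squared error
grad_W = T.T @ np.outer(grad_y, x)     # backprop through fixed T
W_new = W - 0.1 * (M * grad_W)         # masked task-specific update
```

Since the effective weights change by `T @ (M * grad_W)`, the task-specific learner moves activations only within the subspace that `T` and `M` carve out, which is the sense in which `T` "alters the activation space."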
  • 19. 19 / 26 MT-nets Proposition: Fix x and A. Let U be a d-dimensional subspace of R^n (d ≤ n). There exist configurations of T, W, and ζ such that the span of y_new − y is U while satisfying A = TW. Proposition: Fix x, A, and a loss function L_T. Let U be a d-dimensional subspace of R^n, and g(·, ·) a metric tensor on U. There exist configurations of T, W, and ζ such that the vector y_new − y is in the steepest direction of descent on L_T with respect to the metric g.
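The second proposition can be motivated by a short calculation, written here as a sketch consistent with the decomposition A = TW and with T held fixed during task-specific learning (not the paper's full proof):

```latex
% Chain rule through the fixed transformation T:
\nabla_W \mathcal{L}_T = T^\top \nabla_A \mathcal{L}_T ,
% so one SGD step on W with step size \alpha moves the
% effective weights A = TW by
\Delta A = T \, \Delta W = -\alpha \, T T^\top \nabla_A \mathcal{L}_T .
```

An update of the form −αB∇L with B = TT⊤ positive semi-definite both stays in the column span of T and is the steepest-descent direction with respect to a metric determined by B on that subspace, so T jointly encodes the subspace of the first proposition and the metric of the second.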
  • 20. 20 / 26 Experiments Ablation. All components are necessary.
  • 21. 21 / 26 Experiments Robust to step size α, since T can change effective step size.
  • 22. 22 / 26 Experiments 3 meta-tasks: regression to polynomials of order n (n ∈ {0, 1, 2}). MT-nets choose to update more parameters for more complicated meta-tasks.
  • 23. 23 / 26 Experiments miniImagenet one-shot classification
  • 24. 24 / 26 Experiments miniImagenet one-shot classification 5-way 1-shot classification accuracy.
  • 25. 25 / 26 Summary MT-nets are robust to step size because of T, and the mask M reflects the complexity of the meta-task. MT-nets achieve state-of-the-art performance on a challenging few-shot learning task.
  • 26. 26 / 26 Future Work Our work shows that gradient-based meta-learning can benefit from additional structure. Other architectures for meta-learners? Our method performs gradient descent under a learned metric that makes learning faster; this may relate to natural gradients [13]. Our metric is learned layerwise, which is similar to how a recent work [14] factors parameter space to tractably approximate natural gradients. [13] Shun-Ichi Amari. “Natural gradient works efficiently in learning”. In: Neural Computation 10.2 (1998), pp. 251–276. [14] James Martens and Roger Grosse. “Optimizing neural networks with Kronecker-factored approximate curvature”. In: ICML (2015).
  • 27. 27 / 26 References I [1] Shun-Ichi Amari. “Natural gradient works efficiently in learning”. In: Neural Computation 10.2 (1998), pp. 251–276. [2] Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016). [3] Yan Duan et al. “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: arXiv (2016). [4] Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: ICML (2017). [5] Chelsea Finn and Sergey Levine. “Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm”. In: ICLR (2018). [6] Erin Grant et al. “Recasting Gradient-Based Meta-Learning as Hierarchical Bayes”. In: ICLR (2018).
  • 28. 28 / 26 References II [7] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015). [8] James Martens and Roger Grosse. “Optimizing neural networks with Kronecker-factored approximate curvature”. In: ICML (2015). [9] Nikhil Mishra et al. “A Simple Neural Attentive Meta-Learner”. In: ICLR (2018). [10] Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017). [11] Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016). [12] Jake Snell, Kevin Swersky, and Richard S. Zemel. “Prototypical Networks for Few-shot Learning”. In: NIPS (2017).
  • 29. 29 / 26 References III [13] Flood Sung et al. “Learning to Compare: Relation Network for Few-Shot Learning”. In: arXiv (2017). [14] Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016).