Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
February 22, 2018
Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Meta-Learning
Meta-Learning
Which is Aconitum napellus?
Meta-Learning
Which is Aconitum napellus?
Same information, but this version of the task is impossible
for humans. We clearly have something that helps us process
new visual information.
Meta-Learning
Which is Aconitum napellus?
Some humans have (meta-)learned to answer this question.
Meta-learning can occur using acquired knowledge.
Meta-Learning
Previous Deep Meta-Learning Methods
Metric Learning¹,²,³,⁴
Learn a metric in image space
Specific to few-shot classification (Omniglot, miniImageNet, etc.)
Learning = nearest neighbor, Meta-learning = metric
¹ Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015).
² Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016).
³ Jake Snell, Kevin Swersky, and Richard S. Zemel. “Prototypical Networks for Few-shot Learning”. In: NIPS (2017).
⁴ Flood Sung et al. “Learning to Compare: Relation Network for Few-Shot Learning”. In: arXiv (2017).
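A toy sketch of this recipe (my illustration, not any specific paper's code): learning is nearest-neighbor lookup in an embedding space, and meta-learning shapes the embedding f.

import jax.numpy as jnp

def nearest_neighbor_predict(f, support_x, support_y, query_x):
    # Embed support and query points with the (meta-learned) map f, then
    # label each query with the class of its nearest support embedding.
    s = f(support_x)   # (N, D) support embeddings
    q = f(query_x)     # (M, D) query embeddings
    d = jnp.sum((q[:, None, :] - s[None, :, :]) ** 2, axis=-1)  # (M, N) squared distances
    return support_y[jnp.argmin(d, axis=1)]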
Previous Deep Meta-Learning Methods
RNNs as learners⁶,⁷
Should be able to approximate any learning algorithm.
Temporal convolutions⁵ have also been used in a similar way.
Learning = RNN rollforward, Meta-learning = RNN weights
⁵ Nikhil Mishra et al. “A Simple Neural Attentive Meta-Learner”. In: ICLR (2018).
⁶ Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016).
⁷ Yan Duan et al. “RL²: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: arXiv (2016).
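A minimal sketch of this recipe (names and dimensions are my assumptions): the support set is fed through the RNN, whose hidden state accumulates task knowledge, and the meta-learned quantities are the RNN weights.

import jax.numpy as jnp

def rnn_learner(params, support_xy, query_x):
    # params = (Wh, Wx, Wo): recurrent, input, and output weights.
    Wh, Wx, Wo = params
    h = jnp.zeros(Wh.shape[0])
    for xy in support_xy:                 # "learning" = rolling the RNN forward
        h = jnp.tanh(Wh @ h + Wx @ xy)    # xy is a concatenated (x, y) pair
    return Wo @ jnp.concatenate([h, query_x])  # predict on the query input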
Previous Deep Meta-Learning Methods
Optimizer Learning⁸,⁹
Learn the parameter update given gradients (search space includes SGD, RMSProp, Adam, etc.)
Applicable to any architecture/task
Learning = generalized SGD with the learned optimizer, Meta-learning = optimizer parameters
⁸ Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016).
⁹ Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017).
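As a concrete instance of such an update rule (the functional form here is invented for illustration, not taken from these papers): a parametric function of the gradient that reduces to plain SGD for one setting of its parameters.

import jax.numpy as jnp

def learned_update(params, grads, opt_params):
    # opt_params = (alpha, beta): step scale and gradient-nonlinearity gain.
    # For small beta, tanh(beta * g) / beta ≈ g, recovering plain SGD; the
    # meta-learner is free to choose other behaviors.
    alpha, beta = opt_params
    return params - alpha * jnp.tanh(beta * grads) / beta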
Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Gradient-Based Meta-Learning
MAML¹⁰
¹⁰ Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: ICML (2017).
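(The slide's figure is not reproduced here.) MAML meta-learns an initialization such that a few SGD steps on a new task's support set already perform well on its query set. A minimal sketch in JAX, assuming a toy linear model; all names are mine:

import jax
import jax.numpy as jnp

def predict(params, x):
    w, b = params          # toy model: y = w*x + b
    return w * x + b

def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def inner_update(params, x_s, y_s, alpha=0.01):
    # One task-specific SGD step: MAML's "learning".
    grads = jax.grad(loss)(params, x_s, y_s)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def maml_objective(params, x_s, y_s, x_q, y_q):
    # Query loss after adapting on the support set.
    return loss(inner_update(params, x_s, y_s), x_q, y_q)

# Meta-gradient w.r.t. the initialization: MAML's "meta-learning".
meta_grad_fn = jax.grad(maml_objective)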
Gradient-Based Meta-Learning
Can approximate any learning algorithm¹¹
Can be interpreted as hierarchical Bayes¹²
Unlike other methods, learning and meta-learning happen in the same parameter space.
Learning = SGD, Meta-learning = initial parameters
¹¹ Chelsea Finn and Sergey Levine. “Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm”. In: ICLR (2018).
¹² Erin Grant et al. “Recasting Gradient-Based Meta-Learning as Hierarchical Bayes”. In: ICLR (2018).
Gradient-Based Meta-Learning
Implicit assumption: meta-learning and learning require the
same number of parameters.
Gradient-Based Meta-Learning with Learned
Layerwise Metric and Subspace
Yoonho Lee, Seungjin Choi
arXiv:1801.05558, submitted to ICML 2018
MT-nets
Idea: task-specific learning should require fewer degrees of freedom than meta-learning.
MT-nets
MT-nets
From a task-specific learner’s point of view, T alters the
activation space.
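As I read it from the propositions below, each layer's effective weight factors as A = TW: only W is updated during task-specific learning (optionally only in the coordinates selected by a binary mask M), while T and M are meta-learned and held fixed during adaptation. A hedged sketch with my own names:

import jax
import jax.numpy as jnp

def mt_layer(T, W, b, x):
    # Effective weight A = T @ W; T reshapes the activation space that the
    # task-specific learner sees.
    return T @ (W @ x) + b

def task_loss(W, T, b, x, y):
    return jnp.mean((mt_layer(T, W, b, x) - y) ** 2)

def inner_step(W, T, b, M, x, y, alpha=0.01):
    # Gradient step on W only, masked by M: masked-out entries keep their
    # meta-learned values, so learning happens in a subspace.
    g = jax.grad(task_loss)(W, T, b, x, y)
    return W - alpha * M * g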
MT-nets
Proposition
Fix x and A. Let U be a d-dimensional subspace of ℝⁿ (d ≤ n). There exist configurations of T, W, and ζ such that the span of y_new − y is U while satisfying A = TW.

Proposition
Fix x, A, and a loss function L_T. Let U be a d-dimensional subspace of ℝⁿ, and g(·, ·) a metric tensor on U. There exist configurations of T, W, and ζ such that the vector y_new − y is in the steepest direction of descent on L_T with respect to the metric g.
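For reference, steepest descent under a metric with positive-definite matrix G premultiplies the ordinary gradient by G⁻¹; the short derivation below (mine, not the slide's) shows how T realizes such a metric.

% Steepest descent under a metric tensor g with matrix G (standard fact):
\Delta y \;\propto\; -\,G^{-1}\nabla_{y}\,\mathcal{L}_{T}(y)
% With A = TW we have \nabla_W \mathcal{L}_T = T^{\top}\nabla_A \mathcal{L}_T,
% so one SGD step on W moves the effective weight A by
\Delta A = T\,\Delta W = -\,\alpha\, T T^{\top}\,\nabla_{A}\mathcal{L}_{T},
% i.e. steepest descent on A under the metric G = (T T^{\top})^{-1}.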
Experiments
Ablation. All components are necessary.
Experiments
Robust to step size α, since T can change effective step size.
Experiments
3 meta-tasks: regression to polynomials of order n (n ∈ {0, 1, 2}).
MT-nets choose to update more parameters for more
complicated meta-tasks.
Experiments
miniImageNet one-shot classification
Experiments
miniImageNet one-shot classification
5-way 1-shot classification accuracy.
Summary
MT-nets are robust to step size because of T, and the mask
M reflects the complexity of the meta-task.
MT-nets achieve state-of-the-art performance on a
challenging few-shot learning task.
Future Work
Our work shows that gradient-based meta-learning can benefit from additional structure. Other architectures for meta-learners?
Our method performs gradient descent under a learned metric that makes learning faster; this might relate to natural gradients¹³.
Our metric is learned layerwise, which is similar to how a recent work¹⁴ factors parameter space to tractably approximate natural gradients.
¹³ Shun-Ichi Amari. “Natural gradient works efficiently in learning”. In: Neural computation 10.2 (1998), pp. 251–276.
¹⁴ James Martens and Roger Grosse. “Optimizing neural networks with Kronecker-factored approximate curvature”. In: ICML (2015).
References I
[1] Shun-Ichi Amari. “Natural gradient works efficiently in learning”. In: Neural computation 10.2 (1998), pp. 251–276.
[2] Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016).
[3] Yan Duan et al. “RL²: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: arXiv (2016).
[4] Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: ICML (2017).
[5] Chelsea Finn and Sergey Levine. “Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm”. In: ICLR (2018).
[6] Erin Grant et al. “Recasting Gradient-Based Meta-Learning as Hierarchical Bayes”. In: ICLR (2018).
References II
[7] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015).
[8] James Martens and Roger Grosse. “Optimizing neural networks with Kronecker-factored approximate curvature”. In: ICML (2015).
[9] Nikhil Mishra et al. “A Simple Neural Attentive Meta-Learner”. In: ICLR (2018).
[10] Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017).
[11] Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016).
[12] Jake Snell, Kevin Swersky, and Richard S. Zemel. “Prototypical Networks for Few-shot Learning”. In: NIPS (2017).
References III
[13] Flood Sung et al. “Learning to Compare: Relation Network for Few-Shot Learning”. In: arXiv (2017).
[14] Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016).
Thank You
Pseudocode
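The pseudocode on this backup slide was an image and is not recoverable; below is a hedged reconstruction of the meta-training loop as I understand it (a MAML-style outer loop over the MT-net inner step sketched earlier; M is treated as a fixed mask here, whereas the paper meta-learns it).

import jax

def meta_objective(meta_params, task):
    # meta_params: task-specific init W plus meta-learned T, b, and mask M.
    W, T, b, M = meta_params
    x_s, y_s, x_q, y_q = task
    W_adapted = inner_step(W, T, b, M, x_s, y_s)   # adapt on the support set
    return task_loss(W_adapted, T, b, x_q, y_q)    # evaluate on the query set

def meta_step(meta_params, tasks, beta=0.001):
    # Average the meta-gradient over a batch of tasks, then take one SGD
    # step on all meta-parameters (inner_step/task_loss are defined above).
    grads = [jax.grad(meta_objective)(meta_params, t) for t in tasks]
    mean_grads = jax.tree_util.tree_map(lambda *g: sum(g) / len(g), *grads)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g,
                                  meta_params, mean_grads)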
