
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
(ICML2019)


  1. Breaking Inter-Layer Co-Adaptation by Classifier Anonymization. Ikuro Sato†, Kohta Ishikawa†, Guoqing Liu†, Masayuki Tanaka‡ (ICML2019). Presented by Masayuki Tanaka.
  2. Meta-reviewer's comment: "…This paper seems to me like a perfect example of a 'High Risk High Reward' paper, …" Acceptance ratio of ICML2019: 773/3424 = 22.6%. We have taken that as a compliment. That is what research is about!
  3. What I'm going to talk about. Consider a classification task: an input $x$ is mapped by a feature extractor $F_\phi(x)$ to a feature $\xi$, and a classifier $C_\theta(\xi)$ maps $\xi$ to the output $\eta$; the whole pipeline is trained as an end-to-end DNN. [Figure: two feature spaces $\xi$, one with loosely scattered + and - samples, one with the + and - samples tightly clustered.] Which is better? Why? How can we obtain good features? (A minimal code sketch of this two-part model appears after the slide list.)
  4. Summary
     About what? Breaking the co-adaptation between the feature extractor and the classifier.
     How? By a classifier anonymization technique.
     Theory? Proved: the features form a simple point-like distribution.
     In reality? The point-like property is largely confirmed on real datasets.
  5. What is co-adaptation? In the end-to-end DNN above ($x \to F_\phi(x) \to \xi \to C_\theta(\xi) \to \eta$), co-adaptation means that the feature extractor adapts to one particular classifier, and the classifier adapts to one particular feature extractor. [Figure: a feature space $\xi$ with a single decision boundary through scattered samples vs. a feature space where many classifiers all separate tight clusters.] To break co-adaptation, the feature extractor should be trained against many classifiers.
  6. Proposed algorithm: FOCA (Feature-extractor Optimization through Classifier Anonymization). FOCA trains the feature extractor so that any weak classifier generated for the given feature extractor becomes strong. Under several conditions, we theoretically proved that FOCA trains a feature extractor that projects each class onto a single point. [Figure: feature space $\xi$ with each class collapsed into a point-like cluster.]
  7. Message of FOCA, as an analogy. Traditional training pairs the feature extractor (a junior researcher) with one strong classifier (a smart boss); FOCA training pairs it with a variety of weak classifiers (many different bosses). This makes the feature extractor itself strong, which pays off in transfer learning (new boss, new domain).
  8. Weak classifier assumption. Definition: a weak classifier is only slightly better than random guessing. A strong classifier is strong on the entire data distribution: $\theta_\phi^* = \arg\min_\theta \mathbb{E}_{(x,t)\sim p(x,t)}\left[L\left(C_\theta(F_\phi(x)), t\right)\right]$. Optimizing instead on a small sample set $B$ of the entire data gives $\theta_\phi^B = \arg\min_\theta \sum_{(x,t)\in B} L\left(C_\theta(F_\phi(x)), t\right)$. Weak classifier assumption: a classifier that is strong on the small sample set $B$ is a weak classifier on the entire data. (A code sketch of this construction appears after the slide list.)
  9. Practical FOCA algorithm. Repeat two steps: (1) weak classifier generation: sample a small set $B$ from the training data and optimize the classifier $C_\theta(\xi)$ on it with the previous feature extractor $\bar{F}_\phi(x)$ held fixed; (2) feature-extractor update: update $F_\phi(x)$ on a mini-batch through the freshly generated weak classifier. (See the training-loop sketch after the slide list.)
  10. Experimental validation. Two-step training: first train the feature extractor; then train the classifier on top of the fixed feature extractor. [Figure: co-adapted feature space vs. point-like feature space.] Under co-adaptation, many samples are required to train the classifier; with a point-like feature distribution, a few samples are good enough. (A sketch of this few-sample probe appears after the slide list.)
  11. Results
  12. Poster as a summary
  13. Links
     Official proceedings of ICML2019: http://proceedings.mlr.press/v97/
     arXiv: Breaking Inter-Layer Co-Adaptation by Classifier Anonymization: https://arxiv.org/abs/1906.01150
     Twitter: Masayuki Tanaka: https://twitter.com/likesilkto
     Twitter: Ikuro Sato: https://twitter.com/ikuro_s
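
The sketches below illustrate the slides' technical steps. They are minimal, hedged reconstructions in PyTorch, not the authors' code: all module names, architectures, sample sizes, and hyperparameters are illustrative assumptions. First, the two-part model from slide 3: a feature extractor $F_\phi$ feeding a classifier $C_\theta$, trained end-to-end.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """F_phi: maps an input x to a feature xi (architecture is a placeholder)."""
    def __init__(self, in_dim=784, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):
    """C_theta: maps a feature xi to class scores eta."""
    def __init__(self, feat_dim=64, n_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, xi):
        return self.fc(xi)

F_phi, C_theta = FeatureExtractor(), Classifier()
x = torch.randn(32, 784)   # dummy mini-batch of flattened inputs
eta = C_theta(F_phi(x))    # end-to-end forward pass: eta has shape (32, 10)
```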
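Next, the weak-classifier construction from slide 8: fit a fresh classifier on a small sample set $B$ while $F_\phi$ stays frozen, approximating $\theta_\phi^B$. The sample size, optimizer, and step count are assumptions, and the helper reuses the modules sketched above.

```python
import torch
import torch.nn.functional as nnF

def fit_weak_classifier(feat_extractor, make_classifier, inputs, targets,
                        n_samples=64, steps=50, lr=0.1):
    """Approximate theta_phi^B = argmin_theta sum_{(x,t) in B} L(C_theta(F_phi(x)), t).
    Strong on the small set B, assumed weak on the entire data."""
    idx = torch.randperm(inputs.size(0))[:n_samples]  # draw the small sample set B
    with torch.no_grad():                             # F_phi is frozen here
        xi = feat_extractor(inputs[idx])
    clf = make_classifier()
    opt = torch.optim.SGD(clf.parameters(), lr=lr)
    for _ in range(steps):                            # optimize theta on B only
        opt.zero_grad()
        loss = nnF.cross_entropy(clf(xi), targets[idx])
        loss.backward()
        opt.step()
    return clf
```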
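The practical FOCA loop from slide 9, again as a hedged sketch built on the helpers above: each step generates a fresh weak classifier against a frozen copy of the previous feature extractor, then updates $F_\phi$ on a mini-batch through that classifier, so $\phi$ never co-adapts to any single $\theta$.

```python
import copy
import torch
import torch.nn.functional as nnF

def foca_step(feat_extractor, make_classifier, inputs, targets, opt_phi,
              batch_size=128):
    # 1) Weak classifier generator: fit C_theta on small samples B against a
    #    frozen copy of the previous feature extractor (\bar{F}_phi).
    prev_F = copy.deepcopy(feat_extractor).eval()
    weak_clf = fit_weak_classifier(prev_F, make_classifier, inputs, targets)
    for p in weak_clf.parameters():
        p.requires_grad_(False)  # the weak classifier stays fixed in step 2

    # 2) Update F_phi on a mini-batch through the fixed weak classifier.
    idx = torch.randperm(inputs.size(0))[:batch_size]
    opt_phi.zero_grad()
    loss = nnF.cross_entropy(weak_clf(feat_extractor(inputs[idx])), targets[idx])
    loss.backward()              # gradients reach phi only
    opt_phi.step()
    return loss.item()

# Assumed usage: opt_phi = torch.optim.SGD(F_phi.parameters(), lr=0.01)
# then call foca_step(F_phi, Classifier, X, y, opt_phi) repeatedly.
```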
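Finally, the few-sample probe behind slide 10's two-step validation: freeze the trained feature extractor, retrain only a classifier on a handful of samples, and measure test accuracy. If the features are point-like, accuracy should stay high even for tiny sample counts. The protocol details here (sample count, step count) are assumptions.

```python
import torch

def few_sample_accuracy(feat_extractor, make_classifier,
                        train_x, train_t, test_x, test_t, n_samples=50):
    """Retrain only the classifier on n_samples points over frozen features,
    then report accuracy on held-out test data."""
    clf = fit_weak_classifier(feat_extractor, make_classifier,
                              train_x, train_t, n_samples=n_samples, steps=200)
    with torch.no_grad():
        pred = clf(feat_extractor(test_x)).argmax(dim=1)
    return (pred == test_t).float().mean().item()
```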
