Learning Discrete Representations via
Information Maximizing
Self-Augmented Training
Weihua Hu, Takeru Miyato, Seiya Tokui,
Eiichi Matsumoto, Masashi Sugiyama
Intelligent Information Processing II
Nov 20, 2017
University of Tokyo, RIKEN AIP, Preferred Networks, Inc.
Proceedings of the 34th International Conference on Machine Learning
Presented by Shunsuke KITADA
Why I chose this paper
● Achieves high accuracy (98%!) on MNIST classification
with unsupervised learning.
● Published by the University of Tokyo (Sugiyama lab)
and Preferred Networks.
● VAT is used as an effective regularization term.
● Accepted at ICML 2017.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Introduction
● Unsupervised discrete representation learning
○ The goal is to obtain a function that maps similar (or dissimilar) data
into similar (or dissimilar) discrete representations.
○ The similarity of data is defined according to the application of
interest.
Introduction
● Clustering and Hash learning
○ Clustering
■ Widely applied to data-driven application
domains. [Berkhin 2006]
○ Hash learning
■ Popular for approximate nearest neighbor search in
large-scale information retrieval. [Wang+ 2016]
Introduction
● Development of deep neural networks
○ Scalability and flexibility
■ They can learn complex features and non-linear
decision boundaries.
○ Their model complexity is huge
■ Regularization of the networks is crucial for learning
meaningful representations of data.
Introduction
● In unsupervised representation learning
○ Target representations are not provided.
○ There are no constraining conditions.
➔ We need to regularize the networks in order to learn useful
representations that exhibit the intended invariance for
applications of interest.
◆ e.g., invariance to small perturbations or affine transformations
Introduction | In this paper
● Use data augmentation to model the invariance of
learned data representations
○ Map data points into their discrete representations with a deep
neural network.
○ Regularize it by encouraging its predictions to be invariant to data
augmentation.
Information Maximizing
Self-Augmented Training
● Regularized Information
Maximization (RIM)
Maximize the information-theoretic
dependence between inputs and their
mapped outputs, while regularizing the
mapping function.
● Self-Augmented Training
(SAT)
Encourage the predicted
representations of augmented data
points to be close to those of the original
data points in an end-to-end fashion.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Related work | Clustering & Hash Learning
● Representative clustering and hashing methods
○ K-means clustering and hashing [He+ 2013]
○ Gaussian mixture model clustering, iterative quantization [Gong+ 2013]
○ Minimal-loss hashing [Norouzi & Fleet 2011]
These methods can only model linear boundaries between
different representations.
Related work | Clustering & Hash Learning
● Methods that can model the non-linearity of data
○ Kernel-based [Xu+ 2014; Kulis & Darrell 2009]
○ Spectral clustering [Xu+ 2014; Kulis & Darrell 2009]
These methods are difficult to scale to large datasets.
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Clustering
■ Jointly learn feature representations and
cluster assignments [Xie+ 2016]
■ Model the data generation process with deep
generative models using Gaussian mixture models as
the prior distribution [Dilokthanakul+ 2016; Zheng+ 2016]
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Hash learning
■ Supervised hash learning
[Xia+ 2014; Lai+ 2015; Zhang+ 2015; Xu+2015; Li+ 2015]
■ Unsupervised hash learning
● Stacked RBM [Salakhutdinov & Hinton 2009]
● Use DL for the mapping function [Erin Liong+ 2015]
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Hash learning
■ These unsupervised methods did not explicitly
intend to impose invariance on the learned
representations.
■ The predicted representations may therefore not be useful
for applications of interest.
Related work | Data Augmentation
● About data augmentation
○ In supervised and semi-supervised learning
■ Applying data augmentation to a supervised learning problem
is equivalent to adding a regularization term to the original cost
function. [Leen 1995]
■ Applying data augmentation to semi-supervised learning
achieves state-of-the-art performance.
[Bachman+ 2014; Miyato+ 2016; Sajjadi+ 2016]
Related work | Data Augmentation
● About data augmentation
○ In unsupervised learning
■ Dosovitskiy+ proposed using data augmentation to model the
invariance of learned representations. [Dosovitskiy+ 2014]
Related work | Data Augmentation
● Differences between Dosovitskiy+ and IMSAT
○ IMSAT directly imposes the invariance on the learned representations
■ Dosovitskiy+ imposes invariance on surrogate classes, not
directly on the learned representations.
○ IMSAT focuses on learning discrete representations that are directly
usable for clustering and hash learning
■ Dosovitskiy+ focused on learning continuous representations
that are then used for other tasks such as classification and
clustering.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Method | about RIM
The RIM [Gomes+ 2010] learns a probabilistic classifier pθ(y|x) such that the
mutual information [Cover & Thomas 2012] between inputs and cluster
assignments is maximized. At the same time, it regularizes the complexity of
the classifier. Let X and Y ∈ {1, ..., K} denote random variables for data and
cluster assignments, respectively, where K is the number of clusters. The RIM
objective to minimize is

    R(θ) − λ I(X; Y),

where R(θ) is a regularization penalty on the classifier and λ > 0 is a
trade-off parameter.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Method | about IMSAT
● Information maximization for learning discrete representations
Extend the RIM and consider learning M-dimensional discrete representations
of data. Let the output domain be Y = Y1 × ... × YM, where Ym = {0, 1, ..., Vm − 1}.
Let Y = (Y1, ..., YM) be a random variable for the discrete representation.
Method | about IMSAT
● Information maximization for learning discrete representations
The goal is to learn a multi-output probabilistic classifier pθ(y1, ..., yM | x)
that maps similar inputs into similar representations, modeling the conditional
probability with a deep neural network. Under the model, the output dimensions
are conditionally independent given x:

    pθ(y1, ..., yM | x) = ∏m pθ(ym | x)
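A minimal numpy sketch of this factorization (the M = 3 heads with V = 2 values
each and the p_full helper are illustrative assumptions, not the authors' code):

```python
from itertools import product

import numpy as np

# Per-head distributions p_theta(y_m | x) for a single input x,
# standing in for the network's softmax heads.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 2))                     # M x V unnormalized scores
head_probs = np.exp(logits)
head_probs /= head_probs.sum(axis=1, keepdims=True)  # per-head softmax

def p_full(y, head_probs):
    # Conditional independence: p(y1, ..., yM | x) = prod_m p(ym | x).
    return np.prod([head_probs[m, y[m]] for m in range(len(y))])

# Sanity check: the 2^M full-code probabilities sum to 1.
total = sum(p_full(y, head_probs) for y in product(range(2), repeat=3))
print(round(total, 6))  # 1.0
```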
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Method | about IMSAT
● Regularization of deep neural networks via SAT
SAT uses data augmentation to impose the intended invariance on the data
representation. Let T(x) denote a pre-defined data augmentation under which
the data representations should be invariant. The regularization of SAT made
on data point x is

    R_SAT(θ; x, T(x)) = − Σm Σym pθ̂(ym | x) log pθ(ym | T(x)),

where pθ̂(ym | x) is the prediction for the original data point x (with the
current parameter θ̂ held fixed) and pθ(ym | T(x)) is the prediction for the
augmented data point T(x).
Method | about IMSAT
● Regularization of deep neural networks via SAT
The regularization by SAT is then the average of R_SAT(θ; x, T(x)) over all the
training data points:

    R_SAT(θ; T) = (1/N) Σn R_SAT(θ; xn, T(xn))

The augmentation function T adds a small perturbation r to the input:

    T(x) = x + r
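A PyTorch sketch of R_SAT for a single K-way softmax output (assumptions:
`classifier` returns logits of shape (batch, K); for the multi-output case the
same loss would be summed over the M heads):

```python
import torch
import torch.nn.functional as F

def sat_loss(classifier, x, r):
    """Cross-entropy between the fixed prediction on the original point x
    and the prediction on the perturbed point x + r."""
    with torch.no_grad():  # p_theta_hat(y|x) is treated as a constant target
        p_orig = F.softmax(classifier(x), dim=1)
    log_p_aug = F.log_softmax(classifier(x + r), dim=1)
    return -(p_orig * log_p_aug).sum(dim=1).mean()
```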
Method | about IMSAT
● Regularization of deep neural networks via SAT
Two representative regularization methods are based on local perturbations:
● Random Perturbation Training (RPT) [Bachman+ 2014]
● Virtual Adversarial Training (VAT) [Miyato+ 2016]
In RPT, the perturbation r is sampled randomly. In VAT, r is chosen to be an
adversarial direction:

    r = argmax_{r': ||r'||₂ ≤ ε} R_SAT(θ; x, x + r')
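A sketch of the usual one-step (power-iteration style) approximation of the
VAT direction; the `xi` and `eps` values and the logits-returning `classifier`
are assumptions, not the paper's exact hyperparameters:

```python
import torch
import torch.nn.functional as F

def vat_direction(classifier, x, xi=1e-6, eps=1.0):
    """Approximate the adversarial direction r with one gradient step
    from a random unit perturbation d."""
    with torch.no_grad():
        p = F.softmax(classifier(x), dim=1)  # fixed reference prediction
    d = torch.randn_like(x)
    d = d / d.flatten(1).norm(dim=1).view(-1, *([1] * (x.dim() - 1)))
    d.requires_grad_(True)
    log_q = F.log_softmax(classifier(x + xi * d), dim=1)
    kl = F.kl_div(log_q, p, reduction="batchmean")  # KL(p || q), p held fixed
    grad = torch.autograd.grad(kl, d)[0]
    r = eps * grad / grad.flatten(1).norm(dim=1).view(-1, *([1] * (x.dim() - 1)))
    return r.detach()
```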
Method | for Clustering
In clustering, we can directly apply the RIM.
By representing mutual information as the difference between marginal entropy
and conditional entropy [Cover & Thomas 2012], we have the objective to
minimize:

    R_SAT(θ; T) − λ [H(Y) − H(Y|X)]

The two entropy terms can be calculated as

    H(Y) = h(pθ(y)) = h( (1/N) Σi pθ(y | xi) ),
    H(Y|X) = (1/N) Σi h(pθ(y | xi))
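A numpy sketch of the two entropy terms from a batch of softmax outputs (the
batch-mean marginal anticipates the mini-batch approximation described later):

```python
import numpy as np

def mutual_information(probs, eps=1e-12):
    """Estimate I(X; Y) = H(Y) - H(Y|X); `probs` has shape (N, K),
    where row i is p_theta(y | x_i)."""
    p_y = probs.mean(axis=0)                       # marginal over the batch
    h_y = -(p_y * np.log(p_y + eps)).sum()         # H(Y): prefers uniform clusters
    h_y_given_x = -(probs * np.log(probs + eps)).sum(axis=1).mean()  # H(Y|X)
    return h_y - h_y_given_x

# Confident, balanced assignments give high mutual information.
probs = np.array([[0.999, 0.001], [0.001, 0.999]])
print(mutual_information(probs))  # ~0.685, close to log 2 = 0.693
```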
Method | for Clustering
Here, h is the entropy function h(p) = −Σj pj log pj.
● Increasing the marginal entropy H(Y)
○ Encourages the cluster sizes to be uniform
● Decreasing the conditional entropy H(Y|X)
○ Encourages unambiguous cluster assignments [Bridle+ 1991]
Previous research shows that we can incorporate prior knowledge on cluster
sizes by modifying H(Y) [Gomes+ 2010].
Method | for Clustering
H(Y) can be rewritten as follows:

    H(Y) = log K − KL[pθ(y) || U],

where U is the uniform distribution over the K clusters. Maximizing H(Y) is
therefore equivalent to minimizing this KL divergence, which encourages the
predicted cluster distribution pθ(y) to be close to U.
Replacing U in the KL term with any specified class prior q(y) encourages
pθ(y) to be close to q(y) instead. We consider the following constrained
optimization problem:

    min_θ R_SAT(θ; T) + λ H(Y|X)   s.t.   KL[pθ(y) || q(y)] ≤ δ
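A numpy sketch of the quantity in the constraint, KL between the
batch-estimated cluster marginal and a specified prior q (the function name is
illustrative; treating it as a penalty one could add is an assumption):

```python
import numpy as np

def prior_kl(probs, q, eps=1e-12):
    """KL(p_theta(y) || q(y)) with the marginal estimated as the batch mean;
    `probs` has shape (N, K) and `q` has shape (K,)."""
    p_y = probs.mean(axis=0)
    return (p_y * (np.log(p_y + eps) - np.log(q + eps))).sum()

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
print(prior_kl(probs, q=np.array([0.5, 0.5])))  # small for near-balanced clusters
```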
Method | for Hash Learning
In hash learning, each data point is mapped into a D-bit binary code, so the
original RIM is not directly applicable: computing the mutual information of a
D-bit code is intractable for large D because it involves a summation over an
exponential number of terms.
[Brown 2009] shows that mutual information can be expanded as a sum of
interaction information terms:

    I(X; Y1, ..., YD) = Σm I(X; Ym) + Σ (higher-order interaction information)

It follows from the definition of interaction information and the conditional
independence of the ym given x that the second-order interaction terms reduce
to −I(Ym; Ym′). Considering the output space of the augmented data and keeping
terms up to second order gives the approximation used below.
Method | for Hash Learning
In summary, our approximated objective to minimize is

    R_SAT(θ; T) − λ [ Σm I(X; Ym) − Σ_{m≠m′} I(Ym; Ym′) ]

● First term
○ Regularizes the neural network
● Second term
○ Maximizes the mutual information between data and each hash bit
● Third term
○ Removes the redundancy among the hash bits
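The redundancy term I(Ym; Ym′) between two hash bits can be estimated from
per-bit activation probabilities over a batch. A numpy sketch (the function
name and Bernoulli parameterization are illustrative assumptions):

```python
import numpy as np

def pairwise_bit_mi(p_m, p_k, eps=1e-12):
    """I(Y_m; Y_k) between two bits, where p_m[i] = p_theta(y_m = 1 | x_i).
    Under the model's conditional independence, the joint over the batch is
    the average of the per-sample products."""
    dist_m = np.stack([1 - p_m, p_m], axis=1)  # (N, 2): P(bit = 0), P(bit = 1)
    dist_k = np.stack([1 - p_k, p_k], axis=1)
    joint = np.einsum("na,nb->ab", dist_m, dist_k) / len(p_m)  # (2, 2)
    marg_m, marg_k = joint.sum(axis=1), joint.sum(axis=0)
    return (joint * (np.log(joint + eps)
                     - np.log(np.outer(marg_m, marg_k) + eps))).sum()

p = np.array([0.95, 0.05, 0.9, 0.1])
print(pairwise_bit_mi(p, p))                # > 0: duplicated bits are redundant
print(pairwise_bit_mi(p, np.full(4, 0.5)))  # 0.0: independent of the first bit
```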
Method | Marginal Distribution
Computing mutual information requires the marginal distribution pθ(y). Exact
computation uses the entire dataset, which is not suitable for mini-batch SGD.
Therefore, we use the following approximation over a mini-batch B:

    pθ(y) ≈ (1/|B|) Σ_{x∈B} pθ(y | x)

In the case of clustering, the approximated objective that we actually minimize
is an upper bound of the exact objective that we try to minimize.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Experiments | Overview
● About implementation
● About clustering
● About hash learning
Experiments | about implementation
● Clustering
○ Set the network dimensionality to d-1200-1200-M
○ Use softmax as the output layer
● Hash learning
○ Use smaller network sizes to ensure fast computation of mapping
data into hash codes (shown later).
○ Use sigmoid as the output layer
● Use Adam, ReLU, BatchNorm
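A PyTorch sketch of the clustering network under the stated d-1200-1200-M
sizing (the ReLU/BatchNorm ordering is an assumption, not taken from the
paper):

```python
import torch.nn as nn

def build_clustering_net(d, m):
    """d-1200-1200-M multilayer perceptron; returns logits, so softmax is
    applied in the loss."""
    return nn.Sequential(
        nn.Linear(d, 1200), nn.ReLU(), nn.BatchNorm1d(1200),
        nn.Linear(1200, 1200), nn.ReLU(), nn.BatchNorm1d(1200),
        nn.Linear(1200, m),
    )

net = build_clustering_net(d=784, m=10)  # e.g. MNIST: 784 inputs, 10 clusters
```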
Experiments | clustering
● About baseline models
Experiments | clustering
● About datasets
Experiments | clustering
● About evaluation metric
○ Evaluate with unsupervised clustering accuracy (ACC)
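ACC is the accuracy under the best one-to-one matching between predicted
cluster indices and ground-truth labels. A standard sketch using scipy's
Hungarian solver (illustrative, not the authors' code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy (ACC)."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                      # count co-occurrences
    row, col = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])       # a relabeling of the truth
print(clustering_accuracy(y_true, y_pred))  # 1.0
```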
Experiments | clustering
● Experiment results
Experiments | hash learning
● About datasets
○ MNIST / CIFAR-10
● About baseline models
○ Spectral hashing [Weiss+ 2009]
○ PCA-ITQ [Gong+ 2013]
○ Deep Hash [Erin Liong+ 2015]
○ Linear RIM / Deep RIM / IMSAT(VAT)
Experiments | hash learning
● About evaluation metric
○ Mean Average Precision (mAP)
○ Precision at N = 500 samples
○ Precision within a given Hamming distance
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
Conclusion | IMSAT
● Proposed “IMSAT”
○ An information-theoretic method for unsupervised discrete
representation learning using deep neural networks
● Directly introduces invariance to data augmentation in
an end-to-end fashion
○ Learns discrete representations that are robust to small perturbations
and affine transformations