(1) The document discusses using autoencoders for image classification. Autoencoders are neural networks trained to encode inputs so they can be reconstructed, learning useful features in the process. (2) Stacked autoencoders and convolutional autoencoders are evaluated on the MNIST handwritten digit dataset. Greedy layerwise training is used to construct deep pretrained networks. (3) Visualization of hidden unit activations shows the features learned by the autoencoders. The main difference between autoencoders and convolutional networks is that convolutional networks have more hardwired topological constraints due to the convolutional and pooling operations.
6. Theoretical Background
• Artificial Neural Networks
• Deep Neural Networks and Deep Learning
• Autoencoders and Sparsity
• Convolutional Networks
7. artificial neural networks
• The central idea is to extract linear combinations of the inputs as derived features and then model the target as a nonlinear function of these features.
• A feedforward neural network of depth n is an n-stage regression or classification model.
9. The outputs of layer l are called activations and are computed from linear combinations of the inputs and the bias unit:
a^(l+1) = f(W^(l) a^(l) + b^(l))
[The slide shows the corresponding encoding and decoding equations.]
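The activation rule above can be sketched in NumPy. This is an illustrative sketch, not the authors' code; the sigmoid choice of f and the toy layer sizes are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    # elementwise logistic function, a common choice for f
    return 1.0 / (1.0 + np.exp(-z))

def layer_activations(a_prev, W, b):
    # activations of layer l+1: f(W a + b), with b the bias unit's weights
    return sigmoid(W @ a_prev + b)

# toy example: 3 inputs mapped to 2 hidden units
x = np.array([1.0, 0.5, -0.5])
W = np.zeros((2, 3))   # all-zero weights purely for illustration
b = np.zeros(2)
a = layer_activations(x, W, b)
```

With zero weights and biases the pre-activation is zero, so each unit outputs sigmoid(0) = 0.5.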
10. Two types of activation functions are used: the sigmoid activation and the soft-max activation. The soft-max activation function is used in the last layer (the classifier) for K-class classification.
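A minimal sketch of the two activation functions (the function names are my own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability;
    # the K outputs are non-negative and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.0]))  # class scores -> probabilities
```

The soft-max outputs can be read as class probabilities, which is why it suits the final K-class classifier layer.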
11. When training feedforward networks we use the average sum-of-squared errors as the error function. To prevent overfitting we add a regularization term to the error function.
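A sketch of such an error function, assuming an L2 (weight-decay) regularizer and the usual 1/2 scaling; the function name and constants are illustrative assumptions:

```python
import numpy as np

def regularized_sse(outputs, targets, weights, lam):
    # average sum-of-squared errors over the m training examples
    m = len(outputs)
    sse = 0.5 / m * np.sum((outputs - targets) ** 2)
    # L2 weight-decay term penalizes large weights to curb overfitting
    reg = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return sse + reg

out = np.array([1.0, 0.0])
tgt = out.copy()                     # perfect predictions -> zero data error
ws = [np.array([[1.0, 1.0]])]
val = regularized_sse(out, tgt, ws, 0.1)
```

Here the data term vanishes, so the returned value is purely the regularization penalty 0.5 * 0.1 * 2 = 0.1.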
12. deep neural networks and deep learning
• Deep vanilla neural networks perform worse than neural networks with one or two hidden layers.
• In theory deep neural networks have at least the same expressive power as shallow neural networks, but in practice they get stuck in local optima during the training phase.
• It is important to use a non-linear activation function f(x) in each hidden layer.
13. autoencoders and sparsity
• An autoencoder is a neural network that is
trained to encode an input x into some
representation c(x) so that the input can be
reconstructed from that representation
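The encode/decode structure can be sketched as a single forward pass. This is a hedged illustration, not the authors' implementation; the sigmoid nonlinearity and the layer sizes are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencode(x, W1, b1, W2, b2):
    # encoder: project the input x to the hidden representation c(x)
    c = sigmoid(W1 @ x + b1)
    # decoder: reconstruct the input from c(x)
    x_hat = sigmoid(W2 @ c + b2)
    return c, x_hat

rng = np.random.default_rng(0)
x = rng.random(6)                    # 6-dimensional toy input
W1 = rng.normal(0, 0.1, (3, 6))      # compress to 3 hidden units
b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (6, 3))      # expand back to 6 dimensions
b2 = np.zeros(6)
code, recon = autoencode(x, W1, b1, W2, b2)
```

Training would adjust the weights so that `recon` matches `x`; the hidden code is then the learned feature representation.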
15. After successful training, the autoencoder should decompose the inputs into a combination of hidden-layer activations. A trained autoencoder has thereby learned features.
16. We can measure the average activation of each neuron in the second layer,
ρ̂_j = (1/m) Σ_{i=1}^{m} a_j(x^(i)),
and add a penalty* to the error function which prevents the activations from straying too far from some desired mean activation ρ (the sparsity parameter).
* the Kullback-Leibler divergence
18. The resulting autoencoder is called a sparse autoencoder. β is called the sparsity constraint and controls the strength of the sparsity penalty.
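The KL-divergence sparsity penalty can be sketched as follows, under the standard sparse-autoencoder formulation (Bernoulli KL between the target activation ρ and each unit's average activation ρ̂); the function name is my own:

```python
import numpy as np

def kl_sparsity_penalty(activations, rho, beta):
    # activations: (m, h) hidden-layer activations over m training examples
    rho_hat = activations.mean(axis=0)   # average activation of each hidden unit
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat), per unit
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    # beta weights the penalty relative to the reconstruction error
    return beta * kl.sum()

# if every unit's mean activation already equals rho, the penalty is zero
acts = np.full((4, 2), 0.05)
penalty = kl_sparsity_penalty(acts, 0.05, 3.0)
```

The penalty grows as ρ̂ drifts away from ρ, pushing the hidden units toward mostly-off (sparse) behavior.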
24. convolutional networks
• Perform better than vanilla neural networks on image tasks.
• Inspired by the structure of the human visual system; they work by exploiting local connections through two operations: convolution and sub-sampling (pooling).
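The two operations can be sketched directly in NumPy. This is a minimal illustration (single channel, no padding, kernel unflipped as in typical CNN implementations), not the architecture used in the slides:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # 'valid' 2-D convolution: slide the kernel over every position where
    # it fits entirely inside the image (cross-correlation, no kernel flip)
    kh, kw = kernel.shape
    H = image.shape[0] - kh + 1
    W = image.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # non-overlapping max pooling: keep the strongest response per patch
    H, W = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:H * size, :W * size].reshape(H, size, W, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
feat = conv2d_valid(img, np.ones((2, 2)))  # 3x3 feature map
pooled = max_pool(feat, 2)                 # subsampled to 1x1
```

Each output unit only sees a small image patch (local connectivity), and pooling shrinks the representation, which is exactly where the parameter savings come from.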
35. greedy layer wise training
• To construct a deep pretrained network of n layers, divide the learning into n stages.
• In the first stage, train an autoencoder on the provided training data without labels.
• Next, map the training data to the feature space.
• The mapped data is then used to train the next-stage autoencoder.
• Training proceeds layer by layer until the last one.
• The last layer is trained as a classifier (not as an autoencoder) using supervised learning.
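The staging logic above can be sketched as follows. Note the hedge: `train_autoencoder` here is a stand-in (a random projection) so the sketch runs; in the real procedure it would minimize reconstruction error as in the earlier slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(data, hidden):
    # PLACEHOLDER for real unsupervised autoencoder training: we only draw
    # a random projection here so the stage-by-stage data flow is runnable
    W = rng.normal(0, 1.0 / np.sqrt(data.shape[1]), (hidden, data.shape[1]))
    return lambda X: np.tanh(X @ W.T)

def greedy_layerwise(data, layer_sizes):
    # one stage per layer; each stage trains on the previous stage's features
    encoders, features = [], data
    for hidden in layer_sizes:
        encode = train_autoencoder(features, hidden)  # unsupervised, no labels
        features = encode(features)                   # map data to feature space
        encoders.append(encode)
    return encoders, features

X = rng.random((100, 20))                  # 100 unlabeled training examples
encoders, feats = greedy_layerwise(X, [16, 8, 4])
```

The final `feats` would then feed a supervised classifier layer, and the stacked encoders form the pretrained deep network.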
40. After training the last stage, the networks n1 through n4 are stacked to form a deep neural network. The full training set is then used to train this deep network end to end – this final step is called fine-tuning.
42. architecture-2
• Instead of training the network on the full
image we can exploit local connectivity via
convolutional networks, and additionally
restrict the number of trainable parameters
with the use of pooling.
53. difference of cnns and autoencoders
• The main difference between an autoencoder and a convolutional network is the level of network hardwiring. Convolutional nets are largely hardwired: the convolution operation is local in the image domain, which means far greater sparsity in the number of connections from a neural-network point of view. The pooling (subsampling) operation is likewise a hardwired set of neural connections. These are topological constraints on the network structure. Given such constraints, training a CNN learns the best weights for the convolution operation (in practice there are multiple filters). CNNs are usually used for image and speech tasks, where the convolutional constraints are a good assumption.
54. • In contrast, autoencoders specify almost nothing about the topology of the network; they are much more general. The idea is to find a good neural transformation for reconstructing the input. They are composed of an encoder (which projects the input to the hidden layer) and a decoder (which reprojects the hidden layer to the output). The hidden layer learns a set of latent features or latent factors. Linear autoencoders span the same subspace as PCA: given a dataset, they learn a set of basis vectors that explain the underlying pattern of the data.
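The PCA connection can be sketched numerically. A linear autoencoder with k hidden units trained to minimize squared reconstruction error ends up reconstructing through the top-k principal subspace; the sketch below computes that optimal reconstruction directly via the SVD rather than by training (an assumption: centered data, squared-error objective):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)        # center the data, as PCA does

k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[:k].T                   # top-k principal directions (the learned "basis")

code = Xc @ V                  # encoder: project onto the k-dim subspace
recon = code @ V.T             # decoder: reproject back to input space
```

Because the columns of V are orthonormal, re-encoding the reconstruction returns the same code, confirming that everything lives in the k-dimensional subspace.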