(1) The document discusses using autoencoders for image classification. Autoencoders are neural networks trained to encode inputs so they can be reconstructed, learning useful features in the process. (2) Stacked autoencoders and convolutional autoencoders are evaluated on the MNIST handwritten digit dataset. Greedy layerwise training is used to construct deep pretrained networks. (3) Visualization of hidden unit activations shows the features learned by the autoencoders. The main difference between autoencoders and convolutional networks is that convolutional networks have more hardwired topological constraints due to the convolutional and pooling operations.
6. Theoretical Background
• Artificial Neural Networks
• Deep Neural Networks and Deep Learning
• Autoencoders and Sparsity
• Convolutional Networks
7. artificial neural networks
• The central idea is to extract linear combinations of the inputs as derived features and then model the target as a nonlinear function of these features.
• A feedforward neural network of depth n is an n-stage regression or classification model.
9. The outputs of layer l are called activations and are computed from linear combinations of the inputs and the bias unit:
a^(l+1) = f(W^(l) a^(l) + b^(l))
[The slide shows the corresponding encoding and decoding equations.]
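The activation rule above can be sketched in NumPy. This is an illustrative sketch, not the authors' code; the sigmoid choice of f and the toy layer sizes are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    # elementwise logistic function, a common choice for f
    return 1.0 / (1.0 + np.exp(-z))

def layer_activations(a_prev, W, b):
    # activations of layer l+1: f(W a + b), with b the bias unit's weights
    return sigmoid(W @ a_prev + b)

# toy example: 3 inputs mapped to 2 hidden units
x = np.array([1.0, 0.5, -0.5])
W = np.zeros((2, 3))   # all-zero weights purely for illustration
b = np.zeros(2)
a = layer_activations(x, W, b)
```

With zero weights and biases the pre-activation is zero, so each unit outputs sigmoid(0) = 0.5.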
10. Two types of activation functions are used: the sigmoid activation and the soft-max activation. The soft-max activation function is used in the last layer (the classifier) for K-class classification.
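A minimal sketch of the two activation functions (the function names are my own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # squashes each value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability;
    # the K outputs are non-negative and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.0]))  # class scores -> probabilities
```

The soft-max outputs can be read as class probabilities, which is why it suits the final K-class classifier layer.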
11. When training feedforward networks we use the average sum-of-squared errors as the error function. To prevent overfitting we add a regularization term to the error function.
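A sketch of such an error function, assuming an L2 (weight-decay) regularizer and the usual 1/2 scaling; the function name and constants are illustrative assumptions:

```python
import numpy as np

def regularized_sse(outputs, targets, weights, lam):
    # average sum-of-squared errors over the m training examples
    m = len(outputs)
    sse = 0.5 / m * np.sum((outputs - targets) ** 2)
    # L2 weight-decay term penalizes large weights to curb overfitting
    reg = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return sse + reg

out = np.array([1.0, 0.0])
tgt = out.copy()                     # perfect predictions -> zero data error
ws = [np.array([[1.0, 1.0]])]
val = regularized_sse(out, tgt, ws, 0.1)
```

Here the data term vanishes, so the returned value is purely the regularization penalty 0.5 * 0.1 * 2 = 0.1.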
12. deep neural networks and deep learning
• Deep vanilla neural networks perform worse than neural networks with one or two hidden layers.
• In theory deep neural networks have at least the same expressive power as shallow neural networks, but in practice they get stuck in local optima during the training phase.
• It is important to use a non-linear activation function f(x) in each hidden layer.
13. autoencoders and sparsity
• An autoencoder is a neural network that is
trained to encode an input x into some
representation c(x) so that the input can be
reconstructed from that representation
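The encode/decode structure can be sketched as a single forward pass. This is a hedged illustration, not the authors' implementation; the sigmoid nonlinearity and the layer sizes are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencode(x, W1, b1, W2, b2):
    # encoder: project the input x to the hidden representation c(x)
    c = sigmoid(W1 @ x + b1)
    # decoder: reconstruct the input from c(x)
    x_hat = sigmoid(W2 @ c + b2)
    return c, x_hat

rng = np.random.default_rng(0)
x = rng.random(6)                    # 6-dimensional toy input
W1 = rng.normal(0, 0.1, (3, 6))      # compress to 3 hidden units
b1 = np.zeros(3)
W2 = rng.normal(0, 0.1, (6, 3))      # expand back to 6 dimensions
b2 = np.zeros(6)
code, recon = autoencode(x, W1, b1, W2, b2)
```

Training would adjust the weights so that `recon` matches `x`; the hidden code is then the learned feature representation.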
15. After successful training, the autoencoder should decompose the inputs into a combination of hidden-layer activations. A trained autoencoder has thereby learned features.
16. We can measure the average activation of each neuron in the second layer,
ρ̂_j = (1/m) Σ_{i=1}^{m} a_j(x^(i)),
and add a penalty* to the error function which prevents the activations from straying too far from some desired mean activation ρ (the sparsity parameter).
* the Kullback-Leibler divergence
18. The resulting autoencoder is called a sparse autoencoder. β is called the sparsity constraint and controls the strength of the sparsity penalty.
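The KL-divergence sparsity penalty can be sketched as follows, under the standard sparse-autoencoder formulation (Bernoulli KL between the target activation ρ and each unit's average activation ρ̂); the function name is my own:

```python
import numpy as np

def kl_sparsity_penalty(activations, rho, beta):
    # activations: (m, h) hidden-layer activations over m training examples
    rho_hat = activations.mean(axis=0)   # average activation of each hidden unit
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat), per unit
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    # beta weights the penalty relative to the reconstruction error
    return beta * kl.sum()

# if every unit's mean activation already equals rho, the penalty is zero
acts = np.full((4, 2), 0.05)
penalty = kl_sparsity_penalty(acts, 0.05, 3.0)
```

The penalty grows as ρ̂ drifts away from ρ, pushing the hidden units toward mostly-off (sparse) behavior.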
24. convolutional networks
• Perform better than vanilla neural networks on image tasks.
• Inspired by the structure of the human visual system; they work by exploiting local connections through two operations: convolution and sub-sampling (pooling).
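The two operations can be sketched directly in NumPy. This is a minimal illustration (single channel, no padding, kernel unflipped as in typical CNN implementations), not the architecture used in the slides:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # 'valid' 2-D convolution: slide the kernel over every position where
    # it fits entirely inside the image (cross-correlation, no kernel flip)
    kh, kw = kernel.shape
    H = image.shape[0] - kh + 1
    W = image.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # non-overlapping max pooling: keep the strongest response per patch
    H, W = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:H * size, :W * size].reshape(H, size, W, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
feat = conv2d_valid(img, np.ones((2, 2)))  # 3x3 feature map
pooled = max_pool(feat, 2)                 # subsampled to 1x1
```

Each output unit only sees a small image patch (local connectivity), and pooling shrinks the representation, which is exactly where the parameter savings come from.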
35. greedy layer wise training
• To construct a deep pretrained network of n layers, divide the learning into n stages.
• In the first stage, train an autoencoder on the provided training data without labels.
• Next, map the training data to the feature space.
• The mapped data is then used to train the next-stage autoencoder.
• Training proceeds layer by layer until the last one.
• The last layer is trained as a classifier (not as an autoencoder) using supervised learning.
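The staging logic above can be sketched as follows. Note the hedge: `train_autoencoder` here is a stand-in (a random projection) so the sketch runs; in the real procedure it would minimize reconstruction error as in the earlier slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(data, hidden):
    # PLACEHOLDER for real unsupervised autoencoder training: we only draw
    # a random projection here so the stage-by-stage data flow is runnable
    W = rng.normal(0, 1.0 / np.sqrt(data.shape[1]), (hidden, data.shape[1]))
    return lambda X: np.tanh(X @ W.T)

def greedy_layerwise(data, layer_sizes):
    # one stage per layer; each stage trains on the previous stage's features
    encoders, features = [], data
    for hidden in layer_sizes:
        encode = train_autoencoder(features, hidden)  # unsupervised, no labels
        features = encode(features)                   # map data to feature space
        encoders.append(encode)
    return encoders, features

X = rng.random((100, 20))                  # 100 unlabeled training examples
encoders, feats = greedy_layerwise(X, [16, 8, 4])
```

The final `feats` would then feed a supervised classifier layer, and the stacked encoders form the pretrained deep network.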
40. After training the last stage, the networks n1 through n4 are stacked to form a deep neural network. The full training set is then used to train this deep network end to end – this final step is called fine-tuning.
42. architecture-2
• Instead of training the network on the full
image we can exploit local connectivity via
convolutional networks, and additionally
restrict the number of trainable parameters
with the use of pooling.
53. difference of cnns and autoencoders
• The main difference between an autoencoder and a convolutional network is the level of network hardwiring. Convolutional nets are largely hardwired: the convolution operation is local in the image domain, which means far greater sparsity in the number of connections from a neural-network point of view. The pooling (subsampling) operation is likewise a hardwired set of neural connections. These are topological constraints on the network structure. Given such constraints, training a CNN learns the best weights for the convolution operation (in practice there are multiple filters). CNNs are usually used for image and speech tasks, where the convolutional constraints are a good assumption.
54. • In contrast, autoencoders specify almost nothing about the topology of the network; they are much more general. The idea is to find a good neural transformation for reconstructing the input. They are composed of an encoder (which projects the input to the hidden layer) and a decoder (which reprojects the hidden layer to the output). The hidden layer learns a set of latent features or latent factors. Linear autoencoders span the same subspace as PCA: given a dataset, they learn a set of basis vectors that explain the underlying pattern of the data.
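The PCA connection can be sketched numerically. A linear autoencoder with k hidden units trained to minimize squared reconstruction error ends up reconstructing through the top-k principal subspace; the sketch below computes that optimal reconstruction directly via the SVD rather than by training (an assumption: centered data, squared-error objective):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)        # center the data, as PCA does

k = 2
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt[:k].T                   # top-k principal directions (the learned "basis")

code = Xc @ V                  # encoder: project onto the k-dim subspace
recon = code @ V.T             # decoder: reproject back to input space
```

Because the columns of V are orthonormal, re-encoding the reconstruction returns the same code, confirming that everything lives in the k-dimensional subspace.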