Unsupervised Feature Learning:
A Literature Review
By: Amgad Muhammad & Mohamed EL Fadly

Outline
• Background
• Problem Definition
• Unsupervised Feature Learning
• Our Work
• Sparse Auto-encoder
• Preprocessing: PCA and Whitening
• Self-Taught Learning and Unsupervised Feature Learning

• References

Background
• Machine learning is one of the cornerstone fields in Artificial Intelligence, where machines learn to act autonomously and react to new situations without being pre-programmed.
• Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas.
• There are many learning algorithms; among them are [1]:
1) Supervised learning
2) Unsupervised learning
Problem Definition
• The target of the supervised learning method can be summarized as follows:
  • Regression
  • Classification
• The first step to train a machine using the supervised learning method is collecting the data set, which in most cases is a very difficult and expensive process.
• The alternative approach is to measure and use everything, which leads to other problems, e.g. noisy data [2].
Unsupervised feature learning
• The unsupervised feature learning approach learns a higher-level representation of the unlabeled data's features by detecting patterns using various algorithms, e.g. the sparse coding algorithm [3].
• It is a self-taught learning framework developed to transfer knowledge from unlabeled data, which is much easier to obtain, to be used as a preprocessing step to enhance supervised inductive models.
• This framework is developed to tackle present issues in the supervised learning model and to increase its accuracy regardless of the domain of interest (vision, sound, and text) [4].
Our Work
• We will present some of the methods for unsupervised feature learning and deep learning, each of which automatically learns a good representation of the input from unlabeled data.
• We will be concentrating on the following algorithms, with more details in the following slides:
  • Sparse Autoencoder
  • PCA and Whitening
  • Self-Taught Learning
• We will also be focusing on the application of these algorithms to learn features from images.
Sparse Autoencoders

Sparse Auto-encoder

Autoencoder [6]
Neural Network
Before we get further into the details of the algorithm, we need to quickly go through neural networks.
To describe neural networks, we will begin with the simplest possible neural network, one that comprises a single "neuron." We will use the following diagram to denote a single neuron [5].

Single Neuron [8]

Neural Network

Sigmoid Activation Function

Sigmoid Function [8]
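This slide was image-only; for reference, the logistic sigmoid shown in the figure is the standard

```latex
f(z) = \frac{1}{1 + e^{-z}}, \qquad f'(z) = f(z)\,\bigl(1 - f(z)\bigr)
```

which squashes its input into the range (0, 1).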

Tanh Activation Function

Tanh Function [8]
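Likewise, the hyperbolic tangent activation in the figure is

```latex
f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad f'(z) = 1 - f(z)^{2}
```

which squashes its input into the range (-1, 1).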

Neural Network Model
• A neural network is put together by hooking together many of our simple "neurons," so that the output of one neuron can be the input of another. For example, here is a small neural network.
• The circles labeled "+1" are called bias units and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer. The middle layer of nodes is called the hidden layer, because its values are not observed in the training set [8].

Small Neural Network[8]
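As a concrete reading of the figure (using the notation of the cited tutorial [8], with f the activation function), the forward pass of this three-layer network computes

```latex
a^{(2)} = f\!\left(W^{(1)} x + b^{(1)}\right), \qquad h_{W,b}(x) = f\!\left(W^{(2)} a^{(2)} + b^{(2)}\right)
```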

Neural Network Model

Autoencoders and Sparsity

Autoencoders and Sparsity Algorithm

Autoencoders and Sparsity Algorithm –cont’d

Autoencoders and Sparsity Algorithm –cont’d

KL Function [6]
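The KL function in the figure is the Kullback-Leibler divergence used as the sparsity penalty in [6]; it measures how far the average activation of hidden unit j is from the sparsity target ρ:

```latex
\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}
```

This term is zero when the average activation equals ρ and grows as the two diverge, pushing hidden units to be inactive most of the time.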
Autoencoders and Sparsity Algorithm – Cont’d

Autoencoder Implementation
• We implemented a sparse autoencoder, trained on 8×8 image patches, using the L-BFGS optimization algorithm.

Step 1: Generate the training set
The first step is to generate a training set:
A random sample of 200 patches from the dataset.
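A minimal sketch of how such patch sampling can be done (not our original code; it assumes the dataset is a NumPy array `images` of grayscale images, and the normalization follows the UFLDL exercise [8]):

```python
import numpy as np

def sample_patches(images, num_patches=200, patch_size=8, seed=0):
    """Randomly crop patch_size x patch_size patches from a stack of images."""
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    patches = np.empty((num_patches, patch_size * patch_size))
    for i in range(num_patches):
        img = rng.integers(n)                 # pick a random image
        r = rng.integers(h - patch_size + 1)  # random top-left corner
        c = rng.integers(w - patch_size + 1)
        patch = images[img, r:r + patch_size, c:c + patch_size]
        patches[i] = patch.ravel()            # flatten to a 64-dim vector
    # Normalize to roughly [0.1, 0.9] for a sigmoid output layer,
    # as done in the UFLDL exercise [8]: truncate to +/-3 std, rescale.
    patches -= patches.mean()
    std = 3 * patches.std()
    patches = np.clip(patches, -std, std) / std
    return 0.1 + 0.4 * (patches + 1)
```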

Autoencoder Implementation
Step 2: Compute the sparse autoencoder objective
Compute the sparse autoencoder cost function Jsparse(W,b) and the corresponding derivatives of Jsparse with respect to the different parameters.
Step 3: Train the sparse autoencoder
After computing Jsparse and its derivatives, we minimize Jsparse with respect to its parameters and thereby train our sparse autoencoder. We trained our sparse autoencoder with the L-BFGS algorithm. Our neural network for training has 64 input units, 25 hidden units, and 64 output units.
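For concreteness, a condensed sketch of Steps 2 and 3 (not our original code; hyperparameter values follow the UFLDL exercise defaults [8], and `patches` comes from the sampling sketch above):

```python
import numpy as np
from scipy.optimize import minimize

n_in, n_hid = 64, 25
lam, rho, beta = 1e-4, 0.01, 3.0  # weight decay, sparsity target, sparsity weight

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    """Split the flat parameter vector into W1, W2, b1, b2."""
    i = n_hid * n_in
    W1 = theta[:i].reshape(n_hid, n_in)
    W2 = theta[i:2 * i].reshape(n_in, n_hid)
    b1 = theta[2 * i:2 * i + n_hid]
    b2 = theta[2 * i + n_hid:]
    return W1, W2, b1, b2

def cost_and_grad(theta, X):
    """J_sparse(W, b) and its gradient; X holds one example per column."""
    W1, W2, b1, b2 = unpack(theta)
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])   # hidden activations
    a3 = sigmoid(W2 @ a2 + b2[:, None])  # reconstruction of the input
    rho_hat = a2.mean(axis=1)            # average activation per hidden unit
    # squared reconstruction error + weight decay + KL sparsity penalty
    J = (0.5 / m) * np.sum((a3 - X) ** 2) \
        + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) \
        + beta * np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    # backpropagation, with the extra sparsity term on the hidden layer
    d3 = (a3 - X) * a3 * (1 - a3)
    sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse[:, None]) * a2 * (1 - a2)
    grad = np.concatenate([((d2 @ X.T) / m + lam * W1).ravel(),
                           ((d3 @ a2.T) / m + lam * W2).ravel(),
                           d2.mean(axis=1), d3.mean(axis=1)])
    return J, grad

# Random initialization, then minimize J_sparse with L-BFGS.
r = np.sqrt(6.0 / (n_in + n_hid + 1))
rng = np.random.default_rng(0)
theta0 = np.concatenate([rng.uniform(-r, r, 2 * n_in * n_hid),
                         np.zeros(n_hid + n_in)])
X = patches.T                            # (64, 200) training matrix
res = minimize(cost_and_grad, theta0, args=(X,), jac=True,
               method="L-BFGS-B", options={"maxiter": 400})
W1_opt = unpack(res.x)[0]                # rows visualize as edge detectors
```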

Autoencoder Implementation Results
After training, the sparse autoencoder successfully learned a set of edge detectors.

CPU: Intel Core i7 quad-core processor, 2.7 GHz
RAM: 6 GB
Training set: 200 patches of 8×8 images
Neural network for training: 64 input units, 25 hidden units, and 64 output units

Autoencoder Implementation Results
Training time: 39 seconds
Expected time [1]: less than a minute

Principal Component Analysis – PCA

Principal Component Analysis – PCA
• PCA is a dimensionality reduction mechanism used to eliminate highly correlated variables without sacrificing much of the detail [7].

PCA – Example
Example
• Given the 2D data example.
• This data has already been pre-processed using mean normalization.
• We want to find the principal directions of variation.

2D data example[8]
PCA – Example (Cont’d)

[Figure: the first and second principal directions, u1 and u2, overlaid on the 2D data example [8]]
PCA – Math
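The math on this slide was rendered as an image; as a brief reconstruction from the cited tutorial [8], PCA forms the covariance matrix of the mean-normalized data and takes its eigenvectors as the principal directions:

```latex
\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \bigl(x^{(i)}\bigr)^{\mathsf{T}}, \qquad \Sigma = U \Lambda U^{\mathsf{T}}, \qquad x_{\text{rot}} = U^{\mathsf{T}} x
```

The columns u1, u2, ... of U, ordered by decreasing eigenvalue, are the directions of variation seen in the previous figure.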

PCA – Math

2D data example[8]

PCA – Dimensionality Reduction
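These slides were image-only; the step they cover [8] keeps only the top k components of the rotated data, with k commonly chosen to retain, say, 99% of the variance:

```latex
\tilde{x} = \bigl(x_{\text{rot},1}, \ldots, x_{\text{rot},k}\bigr)^{\mathsf{T}}, \qquad \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \geq 0.99
```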

Whitening
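The whitening slide was also image-only; a minimal sketch of PCA whitening, assuming mean-normalized data `X` with one example per column and a small regularizer `epsilon` as in [8]:

```python
import numpy as np

def pca_whiten(X, k=None, epsilon=1e-5):
    """Rotate the data into the PCA basis and rescale to unit variance."""
    n, m = X.shape
    sigma = (X @ X.T) / m           # covariance of mean-normalized data
    U, S, _ = np.linalg.svd(sigma)  # columns of U: principal directions
    k = k or n                      # optionally drop trailing components
    x_rot = U[:, :k].T @ X          # rotated (and reduced) coordinates
    return x_rot / np.sqrt(S[:k, None] + epsilon)  # unit variance per coordinate
```

The `epsilon` term keeps the division stable when trailing eigenvalues are close to zero.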

Self-Taught Learning

Self-Taught learning and Unsupervised feature learning
Given an unlabeled data set, we can start by training a sparse autoencoder to extract features that give us a better, condensed representation of the data.

Neural Network[8]

Self-Taught learning and Unsupervised feature learning
• Once training is done, the network is ready to produce better features to represent the input, using the activations of the network's hidden layer [8].

Input layer of Neural Network[8]
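A one-line sketch of this step (reusing `sigmoid` and the trained `W1`, `b1` from the autoencoder sketch earlier):

```python
def extract_features(X, W1, b1):
    """New representation: hidden-layer activations a2 = f(W1 x + b1)."""
    return sigmoid(W1 @ X + b1[:, None])  # one feature vector per column of X
```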

Self-Taught Learning Application
• We used the self-taught learning paradigm with the sparse autoencoder and a softmax classifier to build a classifier for handwritten digits.
• The goal is to distinguish between the digits 0 to 4. We use the digits 5 to 9 as our "unlabeled" dataset; we then use a labeled dataset of the digits 0 to 4 to train the softmax classifier.

Self-Taught Learning Implementation
Step 1: Generate the input and test data sets
We used the datasets from the MNIST Handwritten Digit Database for this project.
Step 2: Train the sparse autoencoder
We used the unlabeled data (the digits 5 to 9) to train a sparse autoencoder. After training is complete, the results show a visualization of pen strokes, like the image shown to the right.
Step 3: Extract features
After the sparse autoencoder is trained, we use it to extract features from the handwritten digit images.
Step 4: Train and test the softmax regression model
We train a softmax classifier using the training-set features and labels, and finally compute the predictions and accuracy. A sketch of this pipeline is shown below.
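A condensed sketch of the whole pipeline under the assumptions above; `train_autoencoder` is a hypothetical wrapper around the L-BFGS training shown earlier, and scikit-learn's multinomial logistic regression stands in for the softmax classifier (the original exercise implements softmax regression directly):

```python
from sklearn.linear_model import LogisticRegression

# Digits 5-9 serve as the unlabeled set for feature learning.
W1, b1 = train_autoencoder(X_unlabeled)            # hypothetical wrapper

# Digits 0-4: represent each labeled example by its hidden activations.
train_feats = extract_features(X_train, W1, b1).T  # one row per example
test_feats = extract_features(X_test, W1, b1).T

clf = LogisticRegression(max_iter=1000).fit(train_feats, y_train)
print(f"Accuracy: {100 * clf.score(test_feats, y_test):.2f}%")
```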
Self-Taught Learning Setup Environment
CPU: Intel Core i7 quad-core processor, 2.7 GHz
RAM: 6 GB
Training set: 60,000 examples from the MNIST database
Unlabeled set: 29,404 examples
Supervised training set: 15,298 examples
Supervised testing set: 15,298 examples

Self-Taught Learning Results
The results after training is complete show a visualization of pen strokes, like the image below:

Self-Taught Learning Analysis
We compared our application's outputs with the Stanford course tutorial's outputs [8].

                               Our classifier    Tutorial's classifier
Training time                  16 minutes        25 minutes
Classifier score (accuracy)    98.208916%        98%

Future Work
We propose that if we were able to parallelize our code, or run the training on a GPU for example, it would boost performance and decrease the time needed to train the classifier.

References
[1] Taiwo Oladipupo Ayodele. New Advances in Machine Learning. InTech, 2010.
[2] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas. Supervised machine learning: A review of classification techniques. Informatica, 31:249-268, 2007.
[3] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems, pages 801-808, 2006.
[4] Bruno A. Olshausen et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
[5] Simon O. Haykin. "Multilayer Perceptron," in Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2009.
[6] Andrew Ng. CS294A lecture notes, topic: "Sparse autoencoder," Stanford University, Jan. 11, 2011. Available: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf. [Accessed Dec. 10, 2013].
[7] Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer. "Principal components and whitening," in Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, Vol. 39, Springer-Verlag, 2009, pp. 97-137.
[8] Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, and Caroline Suen. "UFLDL Tutorial," April 7, 2013. [Online]. Available: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial. [Accessed Dec. 10, 2013].

Thank You!
