Wits presentation 6_28072015
1. Object Recognition Tutorial
Beatrice van Eden
- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.
2. Research Problem
• Hierarchical concept formation
• This research will allow a robot to learn about its environment autonomously
• Build a concept about these environments, even if it has not seen that specific instance previously
3. Why Object Recognition
• Environments are built up from different objects
• An RGB-D sensor is used for perception
• Concept formation needs some baseline to work from
• Exposure to ML techniques
• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine
4. Index: Cascading Classifiers
• Cascading classifiers
• Haar-like features
• Local binary patterns
• Implementation
• Results
5. Cascading classifiers
• Cascading is a particular case of ensemble learning based on the concatenation of several classifiers: all information collected from the output of a given classifier is used as additional information for the next classifier in the cascade.
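The rejection behaviour this gives is easy to sketch: a window is passed along the cascade and dropped at the first stage that says "no", so most of the image is discarded cheaply. A minimal illustration (the stage functions here are toy thresholds, not real detectors):

```python
# Illustrative sketch of cascade evaluation: each stage is a cheap
# classifier; a window is rejected as soon as any stage says "no".
def cascade_predict(window, stages):
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: most windows exit here
    return True

# Toy stages: thresholds on increasingly "expensive" scores.
stages = [
    lambda w: sum(w) > 10,        # very cheap first test
    lambda w: max(w) > 5,         # slightly stronger test
    lambda w: sorted(w)[-2] > 4,  # most expensive, rarely reached
]

print(cascade_predict([6, 6, 1], stages))  # all stages pass -> True
print(cascade_predict([1, 1, 1], stages))  # rejected at stage 1 -> False
```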
6. Haar-like features
• The difference between the sums of pixel values in adjacent rectangular areas of the detection window
• The values indicate certain characteristics of a particular area
of the image.
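These rectangle sums are what makes Haar features fast: with an integral image, the sum over any rectangle costs four lookups regardless of its size. A small sketch (the 4x4 image and the two-rectangle "edge" feature are just examples):

```python
import numpy as np

# A Haar-like feature is the difference between pixel sums of adjacent
# rectangles. With an integral image, any rectangle sum costs 4 lookups.
def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] computed from the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
# Two-rectangle (edge) feature: left half minus right half of the window.
feature = rect_sum(ii, 0, 0, 4, 2) - rect_sum(ii, 0, 2, 4, 4)
print(feature)  # -16.0, same as img[:, :2].sum() - img[:, 2:].sum()
```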
7. Haar-like features
• The Viola-Jones detector is a strong binary classifier built from several weak detectors
• Each weak detector answers: does a certain sub-region of the original image contain an instance of the object of interest or not?
8. Local binary patterns
• Divide the examined window into cells (e.g. 16x16 pixels for
each cell).
• For each pixel in a cell, compare the pixel to each of its 8
neighbours (on its left-top, left-middle, left-bottom, right-top,
etc.). Follow the pixels along a circle, i.e. clockwise or counter-
clockwise.
• Where the centre pixel's value is greater than the neighbour's
value, write "1". Otherwise, write "0". This gives an 8-digit
binary number.
• Compute the histogram, over the cell, of the frequency of
each "number" occurring.
• Optionally normalize the histogram.
• Concatenate (normalized) histograms of all cells. This gives the
feature vector for the window.
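The steps above can be sketched directly in code. This is a bare-bones version (interior pixels only, plain 8-neighbour comparison, no uniform-pattern refinement):

```python
import numpy as np

# Minimal LBP following the steps above: compare each pixel to its 8
# neighbours clockwise, threshold into an 8-bit code, then histogram
# the codes. (Interior pixels only, for simplicity.)
def lbp_codes(img):
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i, j] > img[i + di, j + dj]:  # centre > neighbour -> 1
                    code |= 1 << bit
            codes[i - 1, j - 1] = code
    return codes

def lbp_histogram(codes):
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()  # optional normalization

img = np.array([[1, 2, 3],
                [4, 9, 6],
                [7, 8, 5]], dtype=float)
codes = lbp_codes(img)
print(codes)  # centre 9 beats all 8 neighbours -> code 255
```

Per-cell histograms computed this way are concatenated over all cells to form the feature vector for the window.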
9. Local binary patterns
• A powerful feature for texture classification
• LBP is faster but less accurate than Haar.
• LBP does all its calculations in integers; Haar uses floats.
• LBP needs a few hours of training; Haar can take a few days.
10. Implementation
• SAMPLES - How many images do we need?
• It depends on a variety of factors, including the quality of the images, the object you want to recognize, the method used to generate the samples, the CPU power you have, and probably some magic.
• Positive images: 50, expanded to 1500 generated samples; listed in a .txt file.
• Negative images: 1500, listed in a .txt file.
11. Implementation
• Create samples with OpenCV: it generates a large number of positive samples from our positive images by applying transformations and distortions. A Perl script was used to combine each positive image with the negative images.
• A *.vec file is created for each image; merge them into one.
• opencv_haartraining and opencv_traincascade: opencv_traincascade supports both Haar [Viola2001] and LBP [Liao2007] (Local Binary Patterns) features.
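The toolchain above can be invoked roughly as follows. This is a sketch only: the file names, sample counts, and window sizes are placeholders, and the merging of multiple .vec files is omitted.

```shell
# Generate distorted positive samples from a positive image over the
# negative (background) images, writing them to a .vec file:
opencv_createsamples -img positive.png -bg negatives.txt \
    -vec samples.vec -num 1500 -w 24 -h 24

# Train the cascade; -featureType LBP selects Local Binary Patterns
# (use -featureType HAAR for Haar features):
opencv_traincascade -data cascade/ -vec samples.vec -bg negatives.txt \
    -numPos 1400 -numNeg 1500 -numStages 20 -featureType LBP -w 24 -h 24
```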
15. Index: CNN
• Convolutional Neural Networks
• Example
• Overview and Intuition
• Implementation
• Results
16. Convolutional Neural Networks
• Neural network vs. Convolutional neural network
• Layers used to build ConvNets
• Convolutional Layer, Pooling Layer, and Fully-Connected Layer
(exactly as seen in regular Neural Networks).
17. Example
• Input:
• Image: width 32, height 32, three colour channels.
• CONV layer:
• Local filter over previous layer
• Dot product between weights and sliding region in the input volume.
[32x32x12]
• RELU layer:
• Apply an elementwise activation function, such as the max(0,x)
thresholding at zero. This leaves the size of the volume unchanged.
• POOL layer:
• Down sampling operation along the spatial dimensions (width,
height). [16x16x12]
• FC layer:
• Compute the class scores. As with ordinary Neural Networks each
neuron in this layer will be connected to all the numbers in the
previous volume.
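The shape bookkeeping in this example is worth seeing concretely. A small NumPy walk-through, using random data in place of learned CONV outputs: ReLU leaves the volume size unchanged, and a 2x2 max pool with stride 2 halves the spatial dimensions, [32x32x12] to [16x16x12].

```python
import numpy as np

# Shape walk-through of the example above (random data stands in for
# learned CONV outputs).
rng = np.random.default_rng(0)
conv_out = rng.standard_normal((32, 32, 12))   # [32x32x12] CONV output

relu_out = np.maximum(0, conv_out)             # elementwise max(0, x)
assert relu_out.shape == (32, 32, 12)          # size unchanged

# 2x2 max pool, stride 2: group each 2x2 spatial patch, take its max.
pooled = relu_out.reshape(16, 2, 16, 2, 12).max(axis=(1, 3))
print(pooled.shape)  # (16, 16, 12)
```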
18. Convolutional Neural Networks
• CNN is a type of feed-forward artificial neural network where
the individual neurons are tiled in such a way that they
respond to overlapping regions in the visual field.
19. Overview and Intuition
• CONV layer's parameters consist of a set of learnable filters
• Every filter is small spatially (along width and height), but
extends through the full depth of the input volume
• As we slide the filter across the input, we compute the dot product between the entries of the filter and the input
• Intuitively, the network will learn filters that activate when
they see some specific type of feature at some spatial position
in the input
• Stacking these activation maps for all filters along the depth
dimension forms the full output volume
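The sliding dot product described above can be written out naively for a single channel (stride 1, no padding; the 5x5 input and the toy filter are just examples):

```python
import numpy as np

# Each output value is the dot product between a small filter and the
# input patch under it.
def conv2d(x, f):
    H, W = x.shape
    FH, FW = f.shape
    out = np.zeros((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + FH, j:j + FW] * f)  # dot product
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
f = np.array([[1., 0.], [0., -1.]])   # toy "diagonal edge" filter
out = conv2d(x, f)
print(out.shape)  # (4, 4): one activation map for one filter
```

One such activation map is produced per filter; stacking them along depth gives the output volume.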
20. Convolutional Neural Networks
• Three hyperparameters control the size of the output volume:
the depth, stride and zero-padding
• Depth of the output volume is a hyperparameter that we can
pick. It controls the number of neurons in the Conv layer that
connect to the same region of the input volume.
• We specify the stride with which we allocate depth columns
around the spatial dimensions (width and height).
• Zero-padding allows us to control the spatial size of the output
volumes.
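These three hyperparameters determine the spatial output size through the standard formula (W - F + 2P)/S + 1, for input size W, filter size F, stride S, and padding P:

```python
# Spatial output size of a CONV layer from its hyperparameters:
# input size W, filter size F, stride S, zero-padding P.
def conv_output_size(W, F, S, P):
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

print(conv_output_size(32, 5, 1, 2))  # 32: padding of 2 preserves the size
print(conv_output_size(32, 2, 2, 0))  # 16: stride 2 halves it
```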
Example filters learned
23. Index: SVM
• Support Vector Machine
• Histogram of Oriented Gradients
• Implementation
• Results
24. Support Vector Machine
• Given a set of training examples, each marked for belonging to
one of two categories, an SVM training algorithm builds a
model that assigns new examples into one category or the
other.
25. What is the goal of the Support
Vector Machine (SVM)?
• The goal of a support vector machine is to find the optimal
separating hyperplane which maximizes the margin of the
training data.
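A toy maximum-margin fit makes this concrete. The sketch below trains a linear separator on four 2D points with hinge-loss subgradient descent; it is a bare-bones stand-in for a real SVM solver, with arbitrary learning-rate and regularization constants.

```python
import numpy as np

# Toy linear SVM: hinge loss + L2 regularization, subgradient descent.
# Labels are +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
b = 0.0
lr, lam = 0.1, 0.01
for _ in range(200):
    for xi, yi in zip(X, y):
        margin = yi * (w @ xi + b)
        if margin < 1:                   # inside the margin: hinge is active
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                            # correctly classified: only shrink w
            w -= lr * lam * w

pred = np.sign(X @ w + b)
print(pred)  # [ 1.  1. -1. -1.]
```

The regularization term shrinks w, which is what pushes the solution toward the widest margin rather than just any separating hyperplane.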
26. Histogram of Oriented Gradients
• The technique counts occurrences of gradient orientation in
localized portions of an image
• The descriptor is made up of M*N cells covering the image
window in a grid.
• Each cell is represented by a histogram of edge orientations,
where the number of discretized edge orientations is a parameter
(usually 9).
• The cell histogram is visualized by a 'star' showing the strength of
the edge orientations in the histogram: the stronger a specific
orientation, the longer it is relative to the others.
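A single HOG cell, following the description above, can be computed in a few lines: take gradients, discretize the unsigned orientation into 9 bins over 0-180 degrees, and vote with the gradient magnitude (interpolation between bins, used by real implementations, is omitted here):

```python
import numpy as np

# One HOG cell: magnitude-weighted histogram of gradient orientations.
def cell_hog(cell, n_bins=9):
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    hist = np.zeros(n_bins)
    bin_width = 180.0 / n_bins
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // bin_width) % n_bins] += m     # magnitude-weighted vote
    return hist

# Vertical edge: intensity changes only along x -> all energy in one bin.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_hog(cell)
print(np.argmax(hist))  # 0: the gradient points along +x (0 degrees)
```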
27. Histogram of Oriented Gradients
• Note that there are various normalization schemes:
• Local schemes, in which the cell is normalized with respect to
neighboring cells only [Dalal-Triggs]
• Global schemes, in which the orientation length is normalized by
all the cells
• Also note that some authors use multiple local normalizations per
cell
• The example below shows a model of a bike (from Felzenszwalb et al.)
with HoG consisting of 7*11 cells, each with 8 orientations
28. Histogram of Oriented Gradients
• (a) Test image
• (b) Gradient image of the test image
• (c) Orientation and magnitude of the gradient in each cell
• (d) HoG of cells
• (e) Average gradient image over the training examples
• (f) Weights of positive SVM in the block
• (g) HoG descriptor weighted by the positive SVM weights