Wits presentation 6_28072015
1. Object Recognition Tutorial
Beatrice van Eden
- Part-time PhD student at the University of the Witwatersrand.
- Full-time employee of the Council for Scientific and Industrial Research.
2. Research Problem
• Hierarchical concept formation
• This research will allow a robot to learn about its environment autonomously
• Build a concept about these environments, even if it has not seen that specific instance previously
3. Why Object Recognition
• Environments are built up from different objects
• An RGB-D sensor is used for perception
• Concept formation needs some baseline to work from
• Exposure to ML techniques
• Cascading classifiers
• Convolutional Neural Networks
• Support Vector Machine
4. Index: Cascading Classifiers
• Cascading classifiers
• Haar-like features
• Local binary patterns
• Implementation
• Results
5. Cascading classifiers
• Cascading is a particular case of ensemble learning based on the concatenation of several classifiers: all information collected from the output of a given classifier is used as additional information for the next classifier in the cascade.
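The rejection behaviour this gives is easy to sketch: a window is passed along the cascade and dropped at the first stage that says "no", so most of the image is discarded cheaply. A minimal illustration (the stage functions here are toy thresholds, not real detectors):

```python
# Illustrative sketch of cascade evaluation: each stage is a cheap
# classifier; a window is rejected as soon as any stage says "no".
def cascade_predict(window, stages):
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: most windows exit here
    return True

# Toy stages: thresholds on increasingly "expensive" scores.
stages = [
    lambda w: sum(w) > 10,        # very cheap first test
    lambda w: max(w) > 5,         # slightly stronger test
    lambda w: sorted(w)[-2] > 4,  # most expensive, rarely reached
]

print(cascade_predict([6, 6, 1], stages))  # all stages pass -> True
print(cascade_predict([1, 1, 1], stages))  # rejected at stage 1 -> False
```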
6. Haar-like features
• The difference between the sums of pixel values in adjacent rectangular areas of the detection window
• The values indicate certain characteristics of a particular area
of the image.
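These rectangle sums are what makes Haar features fast: with an integral image, the sum over any rectangle costs four lookups regardless of its size. A small sketch (the 4x4 image and the two-rectangle "edge" feature are just examples):

```python
import numpy as np

# A Haar-like feature is the difference between pixel sums of adjacent
# rectangles. With an integral image, any rectangle sum costs 4 lookups.
def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] computed from the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
# Two-rectangle (edge) feature: left half minus right half of the window.
feature = rect_sum(ii, 0, 0, 4, 2) - rect_sum(ii, 0, 2, 4, 4)
print(feature)  # -16.0, same as img[:, :2].sum() - img[:, 2:].sum()
```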
7. Haar-like features
• The Viola-Jones detector is a strong binary classifier built from several weak detectors
• Each weak detector answers: does a certain sub-region of the original image contain an instance of the object of interest or not?
8. Local binary patterns
• Divide the examined window into cells (e.g. 16x16 pixels for
each cell).
• For each pixel in a cell, compare the pixel to each of its 8
neighbours (on its left-top, left-middle, left-bottom, right-top,
etc.). Follow the pixels along a circle, i.e. clockwise or counter-
clockwise.
• Where the centre pixel's value is greater than the neighbour's
value, write "1". Otherwise, write "0". This gives an 8-digit
binary number.
• Compute the histogram, over the cell, of the frequency of
each "number" occurring.
• Optionally normalize the histogram.
• Concatenate (normalized) histograms of all cells. This gives the
feature vector for the window.
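The steps above can be sketched directly in code. This is a bare-bones version (interior pixels only, plain 8-neighbour comparison, no uniform-pattern refinement):

```python
import numpy as np

# Minimal LBP following the steps above: compare each pixel to its 8
# neighbours clockwise, threshold into an 8-bit code, then histogram
# the codes. (Interior pixels only, for simplicity.)
def lbp_codes(img):
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbour offsets, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i, j] > img[i + di, j + dj]:  # centre > neighbour -> 1
                    code |= 1 << bit
            codes[i - 1, j - 1] = code
    return codes

def lbp_histogram(codes):
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()  # optional normalization

img = np.array([[1, 2, 3],
                [4, 9, 6],
                [7, 8, 5]], dtype=float)
codes = lbp_codes(img)
print(codes)  # centre 9 beats all 8 neighbours -> code 255
```

Per-cell histograms computed this way are concatenated over all cells to form the feature vector for the window.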
9. Local binary patterns
• A powerful feature for texture classification
• LBP is faster but less accurate than Haar.
• LBP does all its calculations in integers; Haar uses floats.
• LBP needs a few hours of training; Haar can take a few days.
10. Implementation
• SAMPLES - How many images do we need?
• It depends on a variety of factors, including the quality of the images, the object you want to recognize, the method used to generate the samples, the CPU power you have, and probably some magic.
• Positive images: 50, expanded to 1500 generated samples; listed in a .txt file.
• Negative images: 1500, listed in a .txt file.
11. Implementation
• Create samples with OpenCV: it generates a large number of positive samples from our positive images by applying transformations and distortions. A Perl script was used to combine each positive image with the negative images.
• A *.vec file is created for each image; merge them into one.
• opencv_haartraining and opencv_traincascade: opencv_traincascade supports both Haar [Viola2001] and LBP [Liao2007] (Local Binary Patterns) features.
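The toolchain above can be invoked roughly as follows. This is a sketch only: the file names, sample counts, and window sizes are placeholders, and the merging of multiple .vec files is omitted.

```shell
# Generate distorted positive samples from a positive image over the
# negative (background) images, writing them to a .vec file:
opencv_createsamples -img positive.png -bg negatives.txt \
    -vec samples.vec -num 1500 -w 24 -h 24

# Train the cascade; -featureType LBP selects Local Binary Patterns
# (use -featureType HAAR for Haar features):
opencv_traincascade -data cascade/ -vec samples.vec -bg negatives.txt \
    -numPos 1400 -numNeg 1500 -numStages 20 -featureType LBP -w 24 -h 24
```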
15. Index: CNN
• Convolutional Neural Networks
• Example
• Overview and Intuition
• Implementation
• Results
16. Convolutional Neural Networks
• Neural network vs. Convolutional neural network
• Layers used to build ConvNets
• Convolutional Layer, Pooling Layer, and Fully-Connected Layer
(exactly as seen in regular Neural Networks).
17. Example
• Input:
• Image: width 32, height 32, three colour channels.
• CONV layer:
• Local filter over previous layer
• Dot product between weights and sliding region in the input volume.
[32x32x12]
• RELU layer:
• Apply an elementwise activation function, such as the max(0,x)
thresholding at zero. This leaves the size of the volume unchanged.
• POOL layer:
• Down sampling operation along the spatial dimensions (width,
height). [16x16x12]
• FC layer:
• Compute the class scores. As with ordinary Neural Networks each
neuron in this layer will be connected to all the numbers in the
previous volume.
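The shape bookkeeping in this example is worth seeing concretely. A small NumPy walk-through, using random data in place of learned CONV outputs: ReLU leaves the volume size unchanged, and a 2x2 max pool with stride 2 halves the spatial dimensions, [32x32x12] to [16x16x12].

```python
import numpy as np

# Shape walk-through of the example above (random data stands in for
# learned CONV outputs).
rng = np.random.default_rng(0)
conv_out = rng.standard_normal((32, 32, 12))   # [32x32x12] CONV output

relu_out = np.maximum(0, conv_out)             # elementwise max(0, x)
assert relu_out.shape == (32, 32, 12)          # size unchanged

# 2x2 max pool, stride 2: group each 2x2 spatial patch, take its max.
pooled = relu_out.reshape(16, 2, 16, 2, 12).max(axis=(1, 3))
print(pooled.shape)  # (16, 16, 12)
```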
18. Convolutional Neural Networks
• CNN is a type of feed-forward artificial neural network where
the individual neurons are tiled in such a way that they
respond to overlapping regions in the visual field.
19. Overview and Intuition
• CONV layer's parameters consist of a set of learnable filters
• Every filter is small spatially (along width and height), but
extends through the full depth of the input volume
• As we slide the filter across the input, we compute the dot product between the entries of the filter and the input
• Intuitively, the network will learn filters that activate when
they see some specific type of feature at some spatial position
in the input
• Stacking these activation maps for all filters along the depth
dimension forms the full output volume
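The sliding dot product described above can be written out naively for a single channel (stride 1, no padding; the 5x5 input and the toy filter are just examples):

```python
import numpy as np

# Each output value is the dot product between a small filter and the
# input patch under it.
def conv2d(x, f):
    H, W = x.shape
    FH, FW = f.shape
    out = np.zeros((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + FH, j:j + FW] * f)  # dot product
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
f = np.array([[1., 0.], [0., -1.]])   # toy "diagonal edge" filter
out = conv2d(x, f)
print(out.shape)  # (4, 4): one activation map for one filter
```

One such activation map is produced per filter; stacking them along depth gives the output volume.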
20. Convolutional Neural Networks
• Three hyperparameters control the size of the output volume:
the depth, stride and zero-padding
• Depth of the output volume is a hyperparameter that we can
pick. It controls the number of neurons in the Conv layer that
connect to the same region of the input volume.
• We specify the stride with which we allocate depth columns
around the spatial dimensions (width and height).
• Zero-padding allows us to control the spatial size of the output
volumes.
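These three hyperparameters determine the spatial output size through the standard formula (W - F + 2P)/S + 1, for input size W, filter size F, stride S, and padding P:

```python
# Spatial output size of a CONV layer from its hyperparameters:
# input size W, filter size F, stride S, zero-padding P.
def conv_output_size(W, F, S, P):
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

print(conv_output_size(32, 5, 1, 2))  # 32: padding of 2 preserves the size
print(conv_output_size(32, 2, 2, 0))  # 16: stride 2 halves it
```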
Example filters learned
23. Index: SVM
• Support Vector Machine
• Histogram of Oriented Gradients
• Implementation
• Results
24. Support Vector Machine
• Given a set of training examples, each marked for belonging to
one of two categories, an SVM training algorithm builds a
model that assigns new examples into one category or the
other.
25. What is the goal of the Support
Vector Machine (SVM)?
• The goal of a support vector machine is to find the optimal
separating hyperplane which maximizes the margin of the
training data.
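A toy maximum-margin fit makes this concrete. The sketch below trains a linear separator on four 2D points with hinge-loss subgradient descent; it is a bare-bones stand-in for a real SVM solver, with arbitrary learning-rate and regularization constants.

```python
import numpy as np

# Toy linear SVM: hinge loss + L2 regularization, subgradient descent.
# Labels are +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
b = 0.0
lr, lam = 0.1, 0.01
for _ in range(200):
    for xi, yi in zip(X, y):
        margin = yi * (w @ xi + b)
        if margin < 1:                   # inside the margin: hinge is active
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                            # correctly classified: only shrink w
            w -= lr * lam * w

pred = np.sign(X @ w + b)
print(pred)  # [ 1.  1. -1. -1.]
```

The regularization term shrinks w, which is what pushes the solution toward the widest margin rather than just any separating hyperplane.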
26. Histogram of Oriented Gradients
• The technique counts occurrences of gradient orientation in
localized portions of an image
• The descriptor is made up of M*N cells covering the image
window in a grid.
• Each cell is represented by a histogram of edge orientations,
where the number of discretized edge orientations is a parameter
(usually 9).
• The cell histogram is visualized by a 'star' showing the strength of
the edge orientations in the histogram: the stronger a specific
orientation, the longer it is relative to the others.
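A single HOG cell, following the description above, can be computed in a few lines: take gradients, discretize the unsigned orientation into 9 bins over 0-180 degrees, and vote with the gradient magnitude (interpolation between bins, used by real implementations, is omitted here):

```python
import numpy as np

# One HOG cell: magnitude-weighted histogram of gradient orientations.
def cell_hog(cell, n_bins=9):
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    hist = np.zeros(n_bins)
    bin_width = 180.0 / n_bins
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // bin_width) % n_bins] += m     # magnitude-weighted vote
    return hist

# Vertical edge: intensity changes only along x -> all energy in one bin.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_hog(cell)
print(np.argmax(hist))  # 0: the gradient points along +x (0 degrees)
```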
27. Histogram of Oriented Gradients
• Note that there are various normalization schemes:
• Local schemes, in which the cell is normalized with respect to
neighboring cells only [Dalal-Triggs]
• Global schemes, in which the orientation length is normalized by
all the cells
• Also note that some authors use multiple local normalizations per
cell
• The example below shows a model of a bike (from Felzenszwalb et al.)
with HoG consisting of 7*11 cells, each with 8 orientations
28. Histogram of Oriented Gradients
• (a) Test image
• (b) Gradient image of the test image
• (c) Orientation and magnitude of the gradient in each cell
• (d) HoG of cells
• (e) Average gradient image over the training examples
• (f) Weights of positive SVM in the block
• (g) HoG descriptor weighted by the positive SVM weights