Multi-class Classification on Manifolds for Video Surveillance

Multi-class Classification on Riemannian
Manifolds for Video Surveillance
D. Tosato, M. Farenzena, M. Cristani, M. Spera and V. Murino
Dipartimento di Informatica, University of Verona, Italy
Istituto Italiano di Tecnologia (IIT), Genova, Italy

The Problem
In video surveillance, classification of visual data can be very hard.
What kind of situations do we want to tackle?
• Small objects (<< 50x50, < 60x40 pixels)
• Low resolution
• Occlusions
• Bad lighting conditions

The goal of this work is …
• finding a feature able to describe visual objects at prohibitively low resolutions.
• building a robust multi-class learning framework that suits the selected object description.
Related works:
• O. Tuzel, F. Porikli, P. Meer. Pedestrian detection via classification on Riemannian manifolds.
IEEE PAMI, 2008.
• J. Orozco, S. Gong, T. Xiang. Head pose classification in crowded scenes. BMVC 2009.
• N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. CVPR 2005.

Outline
• Problem overview
• Feature layout: ARCO
• Multi-class classification framework on
Riemannian Manifolds
• Experiments
• Computational considerations
• Conclusions and future work

Overview
Assume:
• Data are typically at low resolution, often in crowded scenarios.
• Depending on the surveillance task, only a coarse categorization can be achieved.
• Data must be roughly aligned for training purposes.

Overview
The method in a nutshell …
1. Features and their integral representations are calculated.
2. For a set of overlapping patches on a regular grid, covariance matrices are built as object descriptors.
3. Covariance matrices are handled by introducing sectional curvature analysis (SCA).
4. For each patch, a multi-class LogitBoost classifier is instantiated.
5. Majority voting is used to label an image.
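The final voting step can be illustrated with a minimal sketch. It assumes each patch classifier has already produced one label; the function name `majority_vote`, the example labels, and the tie-breaking rule (first label to reach the top count wins) are assumptions for illustration, not details given in the slides.

```python
from collections import Counter

def majority_vote(patch_labels):
    """Return the label predicted by most patch classifiers.

    Ties go to the label that appears first in the list (an assumption;
    the slides do not state a tie-breaking rule).
    """
    if not patch_labels:
        raise ValueError("need at least one patch prediction")
    counts = Counter(patch_labels)
    best = max(counts.values())
    for label in patch_labels:
        if counts[label] == best:
            return label

# Hypothetical per-patch predictions for one head image.
print(majority_vote(["left", "front", "left", "back", "left"]))
```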


ARCO (ARray of COvariances)
• An image is organized into a grid of uniformly spaced and
overlapping patches.
• This layout does not need to find regions or points of interest and
is efficiently computed.
• The patches of pixels, on a fixed grid of pixels steps.
• We achieve the best classification performance both for
pedestrians and heads using where is the image
dimension.
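A regular grid of overlapping patches can be enumerated as in the sketch below. The patch size and step used here (16 and 8 pixels on a 32x32 image) are placeholder values chosen only for illustration, and `patch_grid` is a hypothetical helper name.

```python
def patch_grid(width, height, patch_size, step):
    """List (x, y, w, h) boxes of square patches on a regular grid.

    Patches overlap whenever step < patch_size.
    """
    if patch_size > min(width, height) or step <= 0:
        raise ValueError("invalid patch_size or step")
    boxes = []
    for y in range(0, height - patch_size + 1, step):
        for x in range(0, width - patch_size + 1, step):
            boxes.append((x, y, patch_size, patch_size))
    return boxes

# Placeholder values: 32x32 image, 16x16 patches, 8-pixel step.
boxes = patch_grid(width=32, height=32, patch_size=16, step=8)
print(len(boxes), boxes[0], boxes[-1])
```

Because no interest-point detection is involved, the layout depends only on the image size and can be computed once and reused for every image.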

ARCO (ARray of COvariances)
• Each patch is described by a covariance matrix of image
features.
• These have been exploited as powerful descriptors of
pedestrians [Tuzel et al. PAMI2008].
• Their effectiveness has been explicitly investigated in a
comparative study [Paisitkriangkrai et al. IET-CV2008].
• Their versatility has been shown in [Tuzel et al. ECCV2006].
• The feature set is task-dependent: one set for head pose classification/detection, one for pedestrian detection.
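A covariance descriptor for one patch can be sketched as follows. The feature set used here (pixel coordinates, intensity, and absolute first derivatives) is only an illustrative assumption, not the task-specific sets from the slides; the small `eps` regularisation and the function names are likewise hypothetical.

```python
import numpy as np

def patch_features(patch):
    """Stack per-pixel features as rows: [x, y, I, |Ix|, |Iy|] (illustrative set)."""
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(patch)  # derivatives along rows (y) and columns (x)
    feats = [xs, ys, patch, np.abs(ix), np.abs(iy)]
    return np.stack([f.ravel() for f in feats], axis=1)  # shape (h*w, d)

def covariance_descriptor(patch, eps=1e-6):
    """d x d covariance of the pixel features, regularised to stay positive definite."""
    f = patch_features(patch)
    c = np.cov(f, rowvar=False)
    return c + eps * np.eye(c.shape[0])

rng = np.random.default_rng(0)
c = covariance_descriptor(rng.random((16, 16)))
print(c.shape)
```

The descriptor size depends only on the number of features, not on the patch size, which is one reason it remains usable on very small objects.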

Working with covariances
• An object is described with a set of covariance
matrices.
• Covariance matrices live on a Riemannian manifold
and typical machine learning techniques are not
directly usable.
• Covariances have to be projected on local manifold
views (vector spaces) for learning purposes.
• In the current state of the art, classifiers are learned on the
local views and combined with boosting.

Interesting issues in using covariances
• On the Riemannian manifold of covariances, the distance between points is the geodesic distance.
• On a tangent space, the distance is the usual Euclidean distance.
So, the question is:
- is the Euclidean distance on a tangent space a good approximation of the geodesic distance?
- if so, which tangent space must be chosen?
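The two distances can be compared numerically. The sketch below assumes the affine-invariant metric commonly used for covariance matrices, where the geodesic distance is sqrt(tr(log^2(X^-1/2 Y X^-1/2))), and it measures the tangent-space distance at a point Z in whitened coordinates log(Z^-1/2 X Z^-1/2). These conventions and all function names are assumptions for illustration; the slide's own formulas are not reproduced in this text.

```python
import numpy as np

def spd_apply(m, fun):
    """Apply a scalar function to a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fun(w)) @ v.T

def whitened_log(x, z):
    """Tangent coordinates of X at Z: log(Z^-1/2 X Z^-1/2)."""
    z_inv_sqrt = spd_apply(z, lambda w: 1.0 / np.sqrt(w))
    m = z_inv_sqrt @ x @ z_inv_sqrt
    return spd_apply((m + m.T) / 2.0, np.log)  # symmetrise against round-off

def geodesic_distance(x, y):
    """Affine-invariant geodesic distance between SPD matrices X and Y."""
    return float(np.linalg.norm(whitened_log(y, x), "fro"))

def tangent_distance(x, y, z):
    """Euclidean distance between X and Y after mapping both to the tangent space at Z."""
    return float(np.linalg.norm(whitened_log(x, z) - whitened_log(y, z), "fro"))

a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, -0.3], [-0.3, 3.0]])
print(geodesic_distance(a, b), tangent_distance(a, b, np.eye(2)))
```

When Z coincides with one of the two points the two distances agree exactly; for any other Z the tangent-space value can only underestimate the geodesic one, and the size of that gap is what the curvature controls.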

Working efficiently with covariances
• Covariances are very powerful descriptors with some
characteristics:
– covariances’ calculation is fast thanks to integral tensor
representation [Tuzel et al., ECCV2006]
– the computational burden necessary to utilize them is a
drawback
• The nonlinear manifold of covariances can be turned
into a flat one using the Log-Euclidean metric, but the
quality of the approximation has to be estimated.
Sectional Curvature Analysis - SCA
I. Chavel, Riemannian Geometry - A modern introduction. Cambridge Univ. Press, 2006.

Sectional Curvature Analysis - SCA
• The space of covariance matrices can be
equipped with a Riemannian metric.
• SCA is a way to describe the curvature of a Riemannian
manifold that naturally generalizes the classical
Gaussian curvature for surfaces.
• The space of covariance matrices is a homogeneous symmetric space, therefore
its negative sectional curvature can be computed at a single point.
• Idea: if the SCA coefficient is close to 0, the
Riemannian manifold is almost flat (= Euclidean space).

SCA (2)
Let and their
logarithm mapping at ( e.g. ), and
an approximated (geodesic) distance
It is a non-negative function that depends
on the sectional curvature

SCA (3)
• Experimentally, the mean value is -10^-3, which is
far from the standard negative curvature -1.
• Under these conditions, one can choose any point
on which to map the dataset.
Figure: sectional curvature.


Multi-class Boosting Framework
• For each patch a multi-class boosting classifier is
learned.
• To reach a computationally feasible solution we
exploit the result of SCA in a multi-class
LogitBoost learning framework:
• Given , where , the
Riemannian Manifold of covariance matrices and
a set of labels, data is mapped to
by:
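One common way to realise such a mapping is sketched below. It assumes the identity matrix as the single projection point, so the logarithm map reduces to the matrix logarithm, and it vectorises the symmetric result by keeping the upper triangle with off-diagonal entries scaled by sqrt(2). Both choices, and the name `covariance_to_vector`, are assumptions for illustration rather than the mapping stated on the slide.

```python
import numpy as np

def log_spd(x):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(x)
    return (v * np.log(w)) @ v.T

def covariance_to_vector(x):
    """Map a d x d covariance to a vector of length d(d+1)/2.

    Assumes projection at the identity; off-diagonal terms are scaled by
    sqrt(2) so the vector norm equals the Frobenius norm of log(X).
    """
    s = log_spd(np.asarray(x, dtype=float))
    rows, cols = np.triu_indices(s.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return s[rows, cols] * scale

x = np.array([[2.0, 0.5], [0.5, 1.0]])
print(covariance_to_vector(x))
```

With a single fixed projection point, every covariance is mapped once and any standard vector-space learner can then be trained on the resulting vectors.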

LogitBoost [J. Friedman, T. Hastie, and R. Tibshirani, Ann Statist. 2000]
• LB is a genuine (not 1-vs-all) multi-class boosting
framework that iteratively fits an additive
symmetric logistic model to estimate the posterior
over the classes.
• The update step combines the weak
classification responses coming from each class.
Multi-class weak classifier
Binary weak classifier

LogitBoost (2)
• At each iteration, LB combines binary weak
classification responses, each fitting its
own linear/non-linear regressor .
• Each multi-class weak learner
focuses on a sub-window on an overlapped
regular grid of Np patches.
• We assign a class label with a estimating
where
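The multi-class LogitBoost loop of Friedman et al. can be sketched as below. This is a toy version under stated assumptions: the weak regressor is a weighted regression stump (a one-split stand-in for the regression trees used in the talk), the working response is clipped at a hypothetical `z_max`, and the input is plain feature vectors rather than mapped covariances. All function names are illustrative.

```python
import numpy as np

def fit_stump(x, z, w):
    """Weighted least-squares stump: returns (feature, threshold, left, right)."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:  # candidate thresholds
            left = x[:, j] <= t
            cl = np.dot(w[left], z[left]) / w[left].sum()
            cr = np.dot(w[~left], z[~left]) / w[~left].sum()
            err = np.dot(w, (z - np.where(left, cl, cr)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, cl, cr)
    if best is None:  # no valid split: fall back to the weighted mean
        c = np.dot(w, z) / w.sum()
        return (0, np.inf, c, c)
    return best[1:]

def predict_stump(stump, x):
    j, t, cl, cr = stump
    return np.where(x[:, j] <= t, cl, cr)

def combine(stumps, x, n_classes):
    """Symmetrised per-class update: (J-1)/J * (f_j - mean_k f_k)."""
    raw = np.column_stack([predict_stump(s, x) for s in stumps])
    return (n_classes - 1) / n_classes * (raw - raw.mean(axis=1, keepdims=True))

def softmax(f):
    e = np.exp(f - f.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def logitboost_fit(x, y, n_classes, n_rounds=20, z_max=4.0):
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    y_star = np.eye(n_classes)[y]  # one-hot targets
    f = np.zeros((x.shape[0], n_classes))
    p = np.full_like(f, 1.0 / n_classes)
    model = []
    for _ in range(n_rounds):
        w = np.clip(p * (1.0 - p), 1e-10, None)       # Newton weights
        z = np.clip((y_star - p) / w, -z_max, z_max)  # working response
        stumps = [fit_stump(x, z[:, j], w[:, j]) for j in range(n_classes)]
        f += combine(stumps, x, n_classes)
        p = softmax(f)                                # updated posteriors
        model.append(stumps)
    return model

def logitboost_predict(model, x, n_classes):
    x = np.asarray(x, dtype=float)
    f = sum(combine(stumps, x, n_classes) for stumps in model)
    return f.argmax(axis=1)

# Toy 1-D data with three well-separated classes (hypothetical).
x = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [2.0], [2.1], [2.2]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
model = logitboost_fit(x, y, n_classes=3)
print(logitboost_predict(model, x, n_classes=3))
```

Each round fits one regressor per class on the same data, which is what makes the scheme a single multi-class model rather than a set of independent 1-vs-all classifiers.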

Multi-class Boosting Framework (2)
• We eliminate the need to use boosting as a
feature selector by reinforcing the weak learning
strategy: Weighted Regression Trees.
• By adding an extra class with negative examples and
using the rejection cascade*, we can build a robust
multi-class detector.
• We have established an automatic stopping rule for
the learning process:
* Viola and Jones, CVPR 2001

Weighted Regression Trees
• WRTs* are binary trees that can be applied to
efficiently tackle the weighted nonlinear
regression problem.
• WRT growth has been strongly limited in
order to use them as weak classifiers.
• Boosting weights are injected into a WRT to
refine the regression result.
* L. Breiman et al., Classification and Regression Trees, CRC Press, 1984


Experiments
Datasets
Head Pose Classification:
• QMUL 4 Head Pose Dataset:
– 5 classes (back/front/left/right/background)
– 4000 examples/class automatically collected
– Image resolution: 128x64 pixels
• Additional Examples from INRIA Person Dataset
– 2736 head examples / ~ 2000 background examples
– head examples are manually classified in 4 classes
– Image resolution: 32x pixels
Pedestrian Detection:
• INRIA Person Dataset:
– 3580 pedestrians / 1671 person-free images;
– Pedestrian ROI resolution 128x64 pixels.

Experiments
Head Pose Classification
Four classes from the QMUL 4 Head Pose Dataset are used (no background). Some examples:
Performance using different feature subsets

Experiments
Head Pose Classification
Ours: avg = .94, std = .05; Orozco et al.: avg = .82, std = .11

Experiments
Head Pose Detection
• Five classes from the QMUL 4 Head Pose Dataset are used.
Ours: avg = .90, std = .09; Orozco et al.: avg = .67, std = .36
• The five classes from the previous dataset are extended with ~500 extra
examples/class for the foreground (FG) and ~2000 for the background (BG)
from a more general dataset.
* Data provided by QMUL * Data extracted from the INRIA Person Dataset

Experiments
Pedestrian Detection
• We use the INRIA Person Dataset where a person is
contained in a ROI of 128x64 pixels and its actual
average dimension is 50x50 pixels.
• To achieve the best performance in terms of FPPW,
we employ a cascade of 5 levels.
miss rate = False Negative / (True Positive + False Negative)
FPPW = False Positive / (True Negative + False Positive)
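Both evaluation measures follow directly from the confusion counts. The sketch below assumes the standard per-window definitions (miss rate as the fraction of pedestrian windows that are rejected, FPPW as the fraction of person-free windows that are accepted); the counts in the example are made up for illustration.

```python
def miss_rate(true_pos, false_neg):
    """Fraction of positive windows the detector fails to report."""
    return false_neg / (true_pos + false_neg)

def fppw(false_pos, true_neg):
    """False positives per window: fraction of negative windows accepted."""
    return false_pos / (false_pos + true_neg)

# Hypothetical counts: 90 hits, 10 misses, 1 false alarm in 10,000 negative windows.
print(miss_rate(true_pos=90, false_neg=10), fppw(false_pos=1, true_neg=9999))
```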


Computational considerations
• By fixing the feature layout, ARCO decreases the
computational complexity of the learning phase
by one order of magnitude with respect to the state-of-the-art
boosting framework that embeds feature
selection: from to .
• Using as unique projection point, we reduce
the computational complexity of projection from
to .
Quantities annotated in the complexity expressions: number of weak learners (WL) per class, number of classes, number of candidate features, SVD.
In time: from 2 weeks to 2 hours.

Computational considerations (2)
• The computation of the image integral tensors takes
, where is the number of features
considered and and are the image
dimensions.
• The complexity of using regression trees as weak
learners (fixing a priori the number of elements
per terminal node) is , with the
number of samples.
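The integral-tensor idea can be sketched in the style of Tuzel et al. (ECCV 2006): one table of first-order feature sums and one of second-order products, from which the covariance of any rectangle follows with a handful of lookups. The array layout, the zero-padding convention, and the function names below are assumptions for illustration.

```python
import numpy as np

def integral_tensors(feats):
    """Build zero-padded integral tables from an (H, W, d) feature image.

    Returns P with shape (H+1, W+1, d) and Q with shape (H+1, W+1, d, d).
    """
    h, w, d = feats.shape
    p = np.zeros((h + 1, w + 1, d))
    q = np.zeros((h + 1, w + 1, d, d))
    p[1:, 1:] = feats.cumsum(axis=0).cumsum(axis=1)
    outer = feats[:, :, :, None] * feats[:, :, None, :]  # per-pixel d x d products
    q[1:, 1:] = outer.cumsum(axis=0).cumsum(axis=1)
    return p, q

def region_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 with four lookups."""
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])

def region_covariance(p, q, top, left, bottom, right):
    """Unbiased covariance of the features inside the rectangle."""
    n = (bottom - top) * (right - left)
    s = region_sum(p, top, left, bottom, right)   # shape (d,)
    ss = region_sum(q, top, left, bottom, right)  # shape (d, d)
    return (ss - np.outer(s, s) / n) / (n - 1)

rng = np.random.default_rng(0)
feats = rng.random((8, 10, 3))
p, q = integral_tensors(feats)
print(region_covariance(p, q, 2, 3, 6, 9).shape)
```

In this sketch the tables are built in one pass over the image, after which the cost of each patch covariance no longer depends on the patch area.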

Conclusions and future work
• We have proposed the novel general-purpose ARCO
descriptor.
• We have built an effective and efficient multi-class
LogitBoost framework able to work on Riemannian
Manifolds.
• SCA is introduced as an effective tool to analyze the
curvature of a Riemannian Manifold.
• ARCO is able to describe visual objects at prohibitively
low resolutions.
• ARCO will be used with more general classes of objects.
• In the future, we plan to devise a novel and more
powerful classification technique to replace
the LogitBoost framework.
