CVPR 2010: Advanced IT in CVPR in a Nutshell: Part 3: Feature Selection
1. Advanced Information Theory in CVPR "in a Nutshell"
CVPR Tutorial
June 13-18, 2010
San Francisco, CA
High-Dimensional Feature Selection:
Images, Genes & Graphs
Francisco Escolano
2. High-Dimensional Feature Selection
Background. Feature selection in CVPR is mainly motivated by: (i)
dimensionality reduction for improving the performance of classifiers,
and (ii) feature-subset pursuit for a better understanding/description
of the patterns at hand (either using generative or discriminative
methods).
Exploring the domain of images [Zhu et al.,97] and other patterns
(e.g. micro-array/gene expression data) [Wang and Gotoh,09] implies
dealing with thousands of features [Guyon and Elisseeff, 03]. In terms
of Information Theory (IT) this task demands pursuing the most
informative features, not only individually but also capturing their
statistical interactions, beyond taking into account pairs of features.
This has been a big challenge since the early 1970s, due to its
intrinsic combinatorial nature.
3. Feature Classification
Strong and Weak Relevance
1. Strong relevance: a feature f_i is strongly relevant if its removal
degrades the performance of the Bayes optimum classifier.
2. Weak relevance: f_i is weakly relevant if it is not strongly
relevant and there exists a subset of features F such that the
performance on F ∪ {f_i} is better than the performance on F alone.
Redundancy and Noise
1. Redundant features: those which can be removed because another
feature or subset of features already supplies the same information
about the class.
2. Noisy features: neither redundant nor informative.
5. Wrappers and Filters
Wrappers vs Filters
1. Wrapper feature selection (WFS) consists of selecting features
according to the classification results that these features yield
(e.g. using cross validation (CV)). Therefore wrapper feature
selection is a classifier-dependent approach.
2. Filter feature selection (FFS) is classifier-independent, as it is
based on statistical analysis of the input variables (features),
given the classification labels of the samples.
3. Wrappers build classifiers each time a feature set has to be
evaluated. This makes them more prone to overfitting than
filters.
4. In FFS the classifier itself is built and tested after FS.
6. FS is a Complex Task
The Complexity of FS
The only way to assure that a feature set is optimal (following what
criterion?) is exhaustive search among feature combinations.
The curse of dimensionality limits this search, as the complexity is
$$O(z), \qquad z = \sum_{i=1}^{n} \binom{n}{i} = 2^n - 1.$$
Filter design is desirable in order to avoid overfitting, but filters
must be as multivariate as wrappers. However, this implies designing
and estimating a cost function that captures the high-order
interactions of many variables. In the following we will focus on FFS.
7. Mutual Information Criterion
A Criterion is Needed
The primary problem of feature selection is the criterion which
evaluates a feature set. It must decide whether a feature subset is
suitable for the classification problem, or not. The optimal criterion
for such purpose would be the Bayesian error rate for the subset of
selected features:
$$E(S) = \int_S p(S)\left(1 - \max_i\, p(c_i \mid S)\right) dS,$$
where S is the vector of selected features and $c_i \in C$ is a class from
all the possible classes C existing in the data.
8. Mutual Information Criterion (2)
Bounding Bayes Error
An upper bound of the Bayesian error, which was obtained by
Hellman and Raviv (1970) is:
$$E(S) \leq \frac{H(C \mid S)}{2}.$$
This bound is related to mutual information (MI), because mutual
information can be expressed as
$$I(S; C) = H(C) - H(C \mid S),$$
and H(C) is the entropy of the class labels, which does not depend on
the feature subspace S.
10. Mutual Information Criterion (4)
MI and KL Divergence
From the definition of MI we have that:
$$I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
= \sum_{y \in Y} p(y) \sum_{x \in X} p(x \mid y) \log \frac{p(x \mid y)}{p(x)}
= \mathbb{E}_Y\left[\mathrm{KL}\left(p(x \mid y)\,\|\,p(x)\right)\right].$$
Then, maximizing MI is equivalent to maximizing the expectation of
the KL divergence between the class-conditional densities P(S|C )
and the density of the feature subset P(S).
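As a quick numerical check of this identity, the following sketch (ours; the toy joint table is arbitrary) computes I(X;Y) both directly and as the expectation over Y of the KL divergence:

```python
import numpy as np

# Toy joint distribution p(x, y): 3 feature values (rows) x 2 classes (columns).
p_xy = np.array([[0.20, 0.10],
                 [0.15, 0.25],
                 [0.05, 0.25]])   # entries sum to 1

p_x = p_xy.sum(axis=1)            # marginal p(x)
p_y = p_xy.sum(axis=0)            # marginal p(y)

# Direct definition: I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]
mi_direct = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Expected-KL form: E_Y[ KL( p(x|y) || p(x) ) ]
p_x_given_y = p_xy / p_y          # column y holds p(x|y)
kl_per_y = np.sum(p_x_given_y * np.log(p_x_given_y / p_x[:, None]), axis=0)
mi_expected_kl = np.sum(p_y * kl_per_y)

print(mi_direct, mi_expected_kl)  # identical (~0.082 nats for this table)
```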
11. Mutual Information Criterion (5)
The Bottleneck
Estimating mutual information between a high-dimensional
continuous set of features, and the class labels, is not
straightforward, due to the entropy estimation.
Simplifying assumptions: (i) dependency is non-informative
about the class; (ii) redundancies between features are
independent of the class [Vasconcelos & Vasconcelos, 04].
The basic idea is to simplify the computation of the MI. For
instance:
$$I(S^*; C) \approx \sum_{x_i \in S} I(x_i; C).$$
More realistic assumptions involving interactions are needed!
12. The mRMR Criterion
Definition
Maximize the relevance I(x_j; C) of each individual feature x_j ∈ F
and simultaneously minimize the redundancy between x_j and the
rest of the selected features x_i ∈ S, i ≠ j. This criterion is known as
the min-Redundancy Max-Relevance (mRMR) criterion and its
formulation for the selection of the m-th feature is [Peng et al., 2005]:
$$\max_{x_j \in F - S_{m-1}} \left[ I(x_j; C) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right].$$
Thus, second-order interactions are considered!
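A minimal sketch of the greedy mRMR loop. The function name mrmr and the incremental redundancy cache are our choices (this is not Peng et al.'s code), and scikit-learn's kNN-based estimators mutual_info_classif and mutual_info_regression stand in for I(x_j; C) and I(x_j; x_i):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, m):
    """Greedily select m features from X (N x n) given class labels y."""
    n = X.shape[1]
    relevance = mutual_info_classif(X, y)     # I(x_j; C) for every feature
    selected = [int(np.argmax(relevance))]    # start with the most relevant one
    red_sum = np.zeros(n)                     # running sum of I(x_j; x_i), x_i in S
    while len(selected) < m:
        # Add the redundancies against the feature selected last.
        red_sum += mutual_info_regression(X, X[:, selected[-1]])
        score = relevance - red_sum / len(selected)
        score[selected] = -np.inf             # never re-select a feature
        selected.append(int(np.argmax(score)))
    return selected
```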
13. The mRMR Criterion (2)
Properties
Can be embedded in a forward greedy search, that is, at each
iteration of the algorithm the next feature optimizing the
criterion is selected.
Doing so, this criterion is equivalent to a first-order incremental
search under the Maximum Dependency (MD) criterion:
$$\max_{S \subseteq F} I(S; C).$$
Then, the m-th feature is selected according to:
$$\max_{x_j \in F - S_{m-1}} I(S_{m-1}, x_j; C).$$
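A sketch of this greedy MD search (names are ours; mutual_information(S, labels) stands for any estimator of I(S; C) over a growing feature block S, which is precisely the bottleneck discussed next):

```python
import numpy as np

def forward_md(X, labels, mutual_information, m):
    """Greedily add the feature x_j maximizing I({S_{m-1}, x_j}; C)."""
    selected, candidates = [], list(range(X.shape[1]))
    for _ in range(m):
        scores = [mutual_information(X[:, selected + [j]], labels)
                  for j in candidates]
        selected.append(candidates.pop(int(np.argmax(scores))))
    return selected
```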
14. Efficient MD
Need of an Entropy Estimator
Whilst in mRMR the mutual information is incrementally
estimated by estimating it between two one-dimensional
variables, in MD the estimation of I(S; C) is not trivial
because S could consist of a large number of features.
If we express the MI in terms of the conditional entropy
H(S|C) we have:
$$I(S; C) = H(S) - H(S \mid C) = H(S) - \sum_{c \in C} p(C = c)\, H(S \mid C = c).$$
Thus the class-conditional entropies H(S|C = c), weighted by the
priors p(C = c), have to be calculated. Either way, we must
calculate multi-dimensional entropies!
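In code, the decomposition reads as follows (a sketch; mutual_information and entropy are our names, the latter standing for any multi-dimensional entropy estimator, such as the one introduced on the next slides):

```python
import numpy as np

def mutual_information(S, labels, entropy):
    """I(S; C) = H(S) - sum_c p(C=c) H(S | C=c), for a feature block S (N x d)."""
    h_conditional = sum(
        np.mean(labels == c) * entropy(S[labels == c])   # p(C=c) * H(S|C=c)
        for c in np.unique(labels))
    return entropy(S) - h_conditional
```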
15. Leonenko’s Estimator
Theoretical Bases
Recently, Leonenko et al. published an extensive study [Leonenko et
al, 08] about Rényi and Tsallis entropy estimation, also considering
the limit α → 1 for obtaining the Shannon entropy.
Their construction relies on the integral
$$I_\alpha = \mathbb{E}\{f^{\alpha-1}(X)\} = \int_{\mathbb{R}^d} f^\alpha(x)\, dx,$$
where f(·) refers to the density of a set of N i.i.d. samples
X = {X_1, X_2, ..., X_N}. The integral is valid for α ≠ 1;
however, the limits for α → 1 are also calculated in order to cover
the Shannon entropy estimation.
16. Leonenko’s Estimator (2)
Entropy Maximization Distributions
The α-entropy maximizing distributions are only defined for
0 < α < 1, where the entropy Hα is a concave function. The
maximizing distributions are defined under some constraints.
The uniform distribution maximizes α-entropy under the
constraint that the distribution has a finite support. For
distributions with a given covariance matrix the maximizing
distribution is Student-t, if d/(d + 2) < α < 1, for any number
of dimensions d ≥ 1.
This is a generalization of the property that the Gaussian
distribution maximizes the Shannon entropy H.
17. Leonenko’s Estimator (3)
Rényi, Tsallis and Shannon Entropy Estimates
For α ≠ 1, the estimate of I_α is
$$\hat{I}_{N,k,\alpha} = \frac{1}{N} \sum_{i=1}^{N} \left(\zeta_{N,i,k}\right)^{1-\alpha},$$
with
$$\zeta_{N,i,k} = (N-1)\, C_k\, V_d\, \left(\rho^{(i)}_{k,N-1}\right)^d,$$
where $\rho^{(i)}_{k,N-1}$ is the Euclidean distance from X_i to its k-th nearest
neighbour among the remaining N − 1 samples,
$V_d = \pi^{d/2} / \Gamma(d/2 + 1)$ is the volume of the unit ball B(0, 1) in $\mathbb{R}^d$,
and $C_k = \left[\Gamma(k)/\Gamma(k+1-\alpha)\right]^{\frac{1}{1-\alpha}}$.
18. Leonenko’s Estimator (4)
Rényi, Tsallis and Shannon Entropy Estimates (cont.)
The estimator $\hat{I}_{N,k,\alpha}$ is asymptotically unbiased, which is to say
that $\mathbb{E}[\hat{I}_{N,k,\alpha}] \to I_\alpha$ as N → ∞. It is also consistent under mild
conditions.
Given these conditions, the estimated Rényi entropy $H_\alpha$ of f is
$$\hat{H}_{N,k,\alpha} = \frac{\log \hat{I}_{N,k,\alpha}}{1 - \alpha}, \quad \alpha \neq 1,$$
and for the Tsallis entropy $S_\alpha = \frac{1}{\alpha - 1}\left(1 - \int f^\alpha(x)\, dx\right)$ the estimator is
$$\hat{S}_{N,k,\alpha} = \frac{1 - \hat{I}_{N,k,\alpha}}{\alpha - 1}, \quad \alpha \neq 1.$$
As N → ∞, both estimators are asymptotically unbiased
and consistent.
19. Leonenko’s Estimator (5)
Shannon Entropy Estimate
The limit of the Tsallis entropy estimator as α → 1 gives the
Shannon entropy H estimator:
$$\hat{H}_{N,k,1} = \frac{1}{N} \sum_{i=1}^{N} \log \xi_{N,i,k}, \qquad \xi_{N,i,k} = (N-1)\, e^{-\Psi(k)}\, V_d\, \left(\rho^{(i)}_{k,N-1}\right)^d,$$
where Ψ(k) is the digamma function: $\Psi(1) = -\gamma$, with
$\gamma \approx 0.5772$ the Euler–Mascheroni constant, and
$\Psi(k) = -\gamma + A_{k-1}$, where $A_0 = 0$ and $A_j = \sum_{i=1}^{j} \frac{1}{i}$.
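A sketch of both estimators (the function names, the default k = 5, and the use of scipy's cKDTree for the k-NN distances are our choices):

```python
import numpy as np
from scipy.special import gamma, digamma
from scipy.spatial import cKDTree

def _knn_distances(X, k):
    """rho^{(i)}_{k,N-1}: distance from each X_i to its k-th nearest neighbour."""
    # k + 1 because each point is returned as its own zero-distance neighbour.
    return cKDTree(X).query(X, k=k + 1)[0][:, k]

def shannon_entropy(X, k=5):
    """H_{N,k,1} of Leonenko et al. (2008), in nats; X is an (N, d) sample."""
    N, d = X.shape
    V_d = np.pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the unit ball in R^d
    xi = (N - 1) * np.exp(-digamma(k)) * V_d * _knn_distances(X, k) ** d
    return np.mean(np.log(xi))

def renyi_entropy(X, k=5, alpha=0.9):
    """H_{N,k,alpha} for alpha != 1, via the estimate of I_alpha."""
    N, d = X.shape
    V_d = np.pi ** (d / 2) / gamma(d / 2 + 1)
    C_k = (gamma(k) / gamma(k + 1 - alpha)) ** (1 / (1 - alpha))
    zeta = (N - 1) * C_k * V_d * _knn_distances(X, k) ** d
    return np.log(np.mean(zeta ** (1 - alpha))) / (1 - alpha)
```

As a sanity check, for samples from a 1-D standard Gaussian, shannon_entropy(X) should approach (1/2) log(2πe) ≈ 1.419 nats as N grows.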
22. Image Categorization
Problem Statement
Design a minimal-complexity and maximal-accuracy classifier
based on forward MI-FS for coarse localization [Lozano et al, 09].
We have a bag of filters (say we have K different filters):
Harris detector, Canny, horizontal and vertical gradient, gradient
magnitude, 12 color filters based on Gaussians over the hue range,
and sparse depth.
We take the statistics of each filter response and compose a
histogram with, say, B bins.
Therefore, the total number of features NF per image is
NF = K × (B − 1).
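A sketch of this feature composition (ours; we assume each normalized B-bin histogram contributes B − 1 free values because its bins sum to one, which matches NF = K × (B − 1)):

```python
import numpy as np

def image_features(filter_responses, B=4):
    """Build the per-image feature vector from K filter response maps."""
    feats = []
    for resp in filter_responses:            # one 2-D response map per filter
        hist, _ = np.histogram(resp.ravel(), bins=B)
        hist = hist / hist.sum()             # normalize: bins sum to 1
        feats.extend(hist[:-1])              # keep the B-1 free bins
    return np.asarray(feats)                 # length K * (B - 1)
```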
23. Image Categorization (2)
Figure: Examples of filter responses. From top to bottom and left to
right: input image, depth, vertical gradient, gradient magnitude, and
five colour filters.
24. Image Categorization (3)
Figure: Greedy feature selection. Left: finding the optimal number of
bins B for K = 16 filters (10-fold CV error (%) vs. number of selected
features, for 16 filters × 2, 4 and 6 bins). Right: testing whether
|C| = 6 is a correct choice (4-bin histograms, for 2, 4, 6 and 8 classes).
31. Image Categorization (10)
Results
With mRMR, the lowest CV error (8.05%) is achieved with a
set of 17 features, while with MD a CV error of 4.16% is
achieved with a set of 13 features.
The CV errors yielded by MmD are very similar to those of MD.
On the other hand, test errors present a slightly different
evolution.
The best error is given by MD, and the worst one is given by
mRMR.
This may be explained by the fact that the entropy estimator
we use does not degrade its accuracy as the number of
dimensions increases.
32. Gene Selection
Microarray Analysis
The basic setup of a microarray platform is a glass slide which
is spotted with DNA fragments or oligonucleotides representing
some specific gene coding regions. Then, purified RNA is
labelled fluorescently or radioactively and it is hybridized to the
slide.
The hybridization can be done simultaneously with reference
RNA to enable comparison of data with multiple experiments.
Then washing is performed and the raw data are obtained by
scanning with a laser or by autoradiographic imaging. The
obtained expression levels of tens of thousands of genes are
numerically quantified and used for statistical studies.
33. Gene Selection (2)
Figure: An illustration of the basic idea of a microarray platform:
sample RNA and reference cDNA are fluorescently labeled, combined,
and hybridized to the slide.
34. Gene Selection (3)
MD Feature Selection on Microarray
Figure: MD-FS performance on the NCI microarray data set with 6,380
features (LOOCV error (%) and I(S;C) vs. number of features). Lowest
LOOCV error with a set of 39 features [Bonev et al.,08].
35. Gene Selection (4)
Comparison of FS criteria
Figure: Performance of FS criteria (MD, MmD, mRMR and a LOOCV
wrapper) on the NCI microarray, as LOOCV error (%) vs. number of
features. Only the selection of the first 36 features (out of 6,380)
is represented.
36. Gene Selection (5)
Figure: Feature selection on the NCI DNA microarray data: class
(disease) vs. selected genes, for MD (left) and mRMR (right) feature
selection. Features (genes) selected by both criteria (e.g. genes 135,
2080 and 6145) are marked with an arrow.
37. Gene Selection (6)
Results
Leukemia: 38 bone marrow samples, 27 Acute Lymphoblastic
Leukemia (ALL) and 11 Acute Myeloid Leukemia (AML), over 7,129
probes (features) from 6,817 human genes. Test data with 34 further
samples (20 ALL and 14 AML) are also provided. We obtained a 2.94%
test error with 7 features, while other authors report the same error
selecting 49 features via individualized markers. Other recent works
report test errors higher than 5% for the Leukemia data set.
Central Nervous System Embryonal Tumors: 60 patient samples,
21 survivors and 39 failures. There are 7,129 genes in the data
set. We selected 9 genes, achieving a 1.67% LOOCV error. The best
result in the literature is an 8.33% CV error with 100 features.
38. Gene Selection (7)
Results (cont)
Colon: 62 samples collected from colon-cancer patients. Among
them, 40 biopsies are from tumors and 22 are from healthy parts of
the colons of the same patients. From 6,500 genes, 2,000 were
selected for this data set, based on the confidence in the measured
expression levels. In our feature selection experiments, 15 of them
were selected, resulting in a 0% LOOCV error, while recent works
report errors higher than 12%.
Prostate: 25 tumor and 9 normal samples, with 12,600 genes.
Our experiments achieved the best error with only 5 features,
with a test error of 5.88%. Other works report test errors
higher than 6%.
39. Structural Recognition
Reeb Graphs
Given a surface S and a real function f : S → R, the Reeb
graph (RG) represents the topology of S through a graph
structure whose nodes correspond to the critical points of f.
The Extended Reeb Graph (ERG) [Biasotti, 05] is an
approximation of the RG obtained by using a fixed number of
level sets (63 in this work) that divide the surface into a set of
regions; critical regions, rather than critical points, are
identified according to the behaviour of f along the level sets; ERG
nodes correspond to critical regions, while the arcs are detected
by tracking the evolution of the level sets.
40. Structural Recognition (2)
Figure: Extended Reeb Graphs. Image by courtesy of Daniela Giorgi and
Silvia Biasotti. a) geodesic, b) distance to center, c) sphere.
41. Structural Recognition (3)
Catalog of Features
Subgraph node centrality: quantifies the degree of participation
of a node i in a subgraph (see the sketch after this list):
$$C_S(i) = \sum_{k=1}^{n} \varphi_k(i)^2\, e^{\lambda_k},$$
where n = |V|, λ_k is the k-th eigenvalue of the adjacency
matrix A, and φ_k its corresponding eigenvector.
Perron-Frobenius eigenvector: φn (the eigenvector
corresponding to the largest eigenvalue of A). The components
of this vector denote the importance of each node.
Adjacency spectra: the magnitudes |λ_k| of the (leading)
eigenvalues of A have been experimentally validated for
graph embedding [Luo et al.,03].
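A sketch of these three adjacency-based features from a single eigendecomposition (the function name and return layout are ours):

```python
import numpy as np

def adjacency_features(A):
    """Subgraph centrality, Perron-Frobenius eigenvector and spectrum of A."""
    lam, phi = np.linalg.eigh(A)          # ascending eigenvalues; phi[:, k] = eigenvector k
    # Subgraph centrality: C_S(i) = sum_k phi_k(i)^2 * exp(lambda_k)
    centrality = (phi ** 2) @ np.exp(lam)
    # Perron-Frobenius eigenvector (largest eigenvalue); fix the sign so the
    # components, i.e. the node importances, are non-negative.
    perron = phi[:, -1] * np.sign(phi[:, -1].sum())
    spectrum = np.sort(np.abs(lam))[::-1] # eigenvalue magnitudes, leading first
    return centrality, perron, spectrum
```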
42. Structural Recognition (4)
Catalog of Features (cont.)
Spectra from Laplacians: both from the un-normalized and
normalized ones. The Laplacian spectrum plays a fundamental
role in the development of regularization kernels for graphs.
Fiedler vector: the eigenvector corresponding to the first
non-trivial eigenvalue of the Laplacian (φ_2 in connected
graphs). It encodes the connectivity structure of the graph
(it is actually the core of graph-cut methods).
Commute times, coming from either the un-normalized or the
normalized Laplacian. They encode the path-length distribution
of the graph.
The Heat-Flow Complexity trace as seen in the section above.
43. Structural Recognition (5)
Backwards Filtering
The entropies H(·) of a set with a large number n of features
can be efficiently estimated using the k-NN-based method
developed by Leonenko et al.
Thus, we take the data set with all its features and determine
which feature to discard in order to produce the smallest
decrease of I (Sn−1 ; C ).
We then repeat the process for the features of the remaining
feature set, until only one feature is left [Bonev et al.,10].
Then, the subset yielding the minimum 10-fold CV error is selected.
Most of the spectral features are histogrammed into a variable
number of bins: 9 · 3 · (2 + 4 + 6 + 8) = 540 features.
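A sketch of this backward elimination loop (ours; mutual_information is any estimator of I(S; C), e.g. the entropy-based one built on Leonenko's estimator above):

```python
import numpy as np

def backward_elimination(X, labels, mutual_information):
    """Drop, at each step, the feature whose removal least decreases I(S; C).

    Returns the nested subsets, from all n features down to one; the subset
    with the minimum 10-fold CV error is then picked among them.
    """
    remaining = list(range(X.shape[1]))
    subsets = [list(remaining)]
    while len(remaining) > 1:
        # I(S_{n-1}; C) for each candidate reduced set (one feature removed).
        scores = [mutual_information(np.delete(X[:, remaining], j, axis=1), labels)
                  for j in range(len(remaining))]
        remaining.pop(int(np.argmax(scores)))   # keep the most informative set
        subsets.append(list(remaining))
    return subsets
```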
44. Experimental Results
Elements
SHREC database (15 classes × 20 objects). Each of the 300
samples is characterized by 540 features, and has a class label
l ∈ {human, cup, glasses, airplane, chair, octopus, table, hand,
fish, bird, spring, armadillo, buste, mechanic, four-leg }
The errors are measured by 10-fold cross validation (10-fold
CV). MI is maximized as the number of selected features grows.
A high number of features degrades the classification
performance.
However, neither the MI curve nor the selected features depend
on the classifier, since MI is a purely information-theoretic
measure.
46. Structural Recognition (8)
Figure: Feature selection on the 15-class experiment (left) and the feature
statistics for the best-error feature set (right).
47. Structural Recognition (9)
Analysis
For the 15-class problem, the minimal error (23.3%) is achieved
with a set of 222 features.
The coloured areas in the plot represent how much a feature is
used with respect to the remaining ones (the height on the Y
axis is arbitrary).
For the 15-class experiment, in the feature sets smaller than
100 features the most important feature is the Fiedler vector, in
combination with the remaining features. Commute time is also
an important feature. Features that are not relevant include
node centrality and the complexity flow. Turning our attention
to the graph types, all three appear relevant.
48. Structural Recognition (10)
Analysis (cont)
We can see that the four different binnings of the features do
have importance for graph characterization.
These conclusions concerning the relevance of each feature
cannot be drawn without performing some additional
experiments with different groups of graph classes.
We perform different 3-class experiments. The classes share
some structural similarities; for example, the 3 classes of the
first experiment have a head and limbs.
Although in each experiment the minimum error is achieved
with very different numbers of features, the participation of
each feature is highly consistent with the 15-class experiment.
51. Structural Recognition (14)
Comparison of forward and backward F.S.
Figure: Forward vs. backward FS (CV error (%) vs. number of
features, for forward selection and backward elimination).
52. Structural Recognition (15)
Figure: Bootstrapping experiments on forward (left) and backward
(right) feature selection, with 25 bootstraps. Top: test error (%)
and its standard deviation vs. number of features. Bottom: number of
feature coincidences among bootstraps vs. number of selected features.
53. Structural Recognition (16)
Conclusions
The main difference among experiments is that node centrality
seems to be more important for discerning among elongated
objects with long, sharp characteristics. Although all three
graph types are relevant, the sphere graph performs best for
blob-shaped objects.
Bootstrapping shows that the backward method is clearly better.
The IT approach not only yields good classification but also
reveals the role of each spectral feature in it!
Given other points of view (size functions, eigenfunctions, ...),
it is desirable to exploit them!
54. References
[Zhu et al.,97] Zhu, S.C., Wu, Y.N. and Mumford, D. (1997). Filters,
Random Fields And Maximum Entropy (FRAME): towards a unified
theory for texture modeling. IJCV, 27(2):107–126.
[Wang and Gotoh, 09] Wang, X. and Gotoh, O. (2009). Accurate
molecular classification of cancer using simple rules. BMC Medical
Genomics, 64(2).
[Guyon and Elisseeff, 03] Guyon, I. and Elisseeff, A. (2003). An
introduction to variable and feature selection. Journal of Machine
Learning Research, 3:1157–1182.
[Vasconcelos and Vasconcelos, 04] Vasconcelos, N. and Vasconcelos,
M. (2004). Scalable discriminant feature selection for image retrieval
and recognition. In CVPR 2004, pages 770–775.
55. References (2)
[Peng et al., 2005] Peng, H., Long, F., and Ding, C. (2005). Feature
selection based on mutual information: criteria of max-dependency,
max-relevance, and min-redundancy. IEEE Trans. on PAMI,
27(8):1226–1238.
[Leonenko et al, 08] Leonenko, N., Pronzato, L., and Savani, V.
(2008). A class of Rényi information estimators for multidimensional
densities. The Annals of Statistics, 36(5):2153–2182.
[Lozano et al, 09] Lozano, M.A., Escolano, F., Bonev, B., Suau, P.,
Aguilar, W., Sáez, J.M., Cazorla, M. (2009). Region and constell.
based categorization of images with unsupervised graph learning.
Image and Vision Computing, 27(7):960–978.
56. References (3)
[Bonev et al.,08] Bonev, B., Escolano, F., and Cazorla, M. (2008).
Feature selection, mutual information, and the classification of
high-dimensional patterns. Pattern Analysis and Applications,
11(3-4):304–319.
[Biasotti, 05] Biasotti, S. (2005). Topological coding of surfaces with
boundary using Reeb graphs. Computer Graphics and Geometry,
7(1):31–45.
[Luo et al.,03] Luo, B., Wilson, R., and Hancock, E. (2003). Spectral
embedding of graphs. Pattern Recognition, 36(10):2213–2223.
[Bonev et al.,10] Bonev, B., Escolano, F., Giorgi, D., Biasotti, S.
(2010). High-dimensional spectral feature selection for 3D object
recognition based on Reeb graphs. SSPR 2010 (accepted).