Advanced Information Theory in CVPR "in a Nutshell"
CVPR Tutorial
June 13-18 2010
San Francisco, CA
High-Dimensional Feature Selection:
Images, Genes & Graphs

Francisco Escolano
High-Dimensional Feature Selection

Background. Feature selection in CVPR is mainly motivated by: (i)
dimensionality reduction for improving the performance of classifiers,
and (ii) feature-subset pursuit for a better understanding/description
of the patterns at hand (either using generative or discriminative
methods).
Exploring the domain of images [Zhu et al.,97] and other patterns
(e.g. micro-array/gene expression data) [Wang and Gotoh,09] implies
dealing with thousands of features [Guyon and Elisseeff, 03]. In terms
of Information Theory (IT) this task demands pursuing the most
informative features, not only individually but also capturing their
statistical interactions, beyond taking into account pairs of features.
This has been a major challenge since the early 70s, owing to its intrinsic
combinatorial nature.
                                                                          2/56
Features Classification

Strong and Weak Relevance
 1. Strong relevance: A feature fi is strongly relevant if its removal
    degrades the performance of the Bayes optimum classifier.
 2. Weak relevance: fi is weakly relevant if it is not strongly
    relevant and there exists a subset of features F such that the
    performance on F ∪ {fi } is better than the one on just F .

Redundancy and Noise
 1. Redundant features: those which can be removed because there
    is another feature or subset of features which already supply the
    same information about the class.
 2. Noisy features: not redundant and not informative.
                                                                         3/56
Features Classification (2)

Figure: Feature classification. Concentric regions: strongly relevant features at the core, surrounded by weakly relevant features, with irrelevant features in the outermost region.

                                               4/56
Wrappers and Filters

Wrappers vs Filters
 1. Wrapper feature selection (WFS) selects features according to
    the classification results that these features yield (e.g. using
    Cross Validation (CV)). Wrapper feature selection is therefore a
    classifier-dependent approach.
 2. Filter feature selection (FFS) is classifier-independent: it is
    based on a statistical analysis of the input variables (features),
    given the classification labels of the samples.
 3. Wrappers build a classifier each time a feature set has to be
    evaluated. This makes them more prone to overfitting than
    filters.
 4. In FFS the classifier itself is built and tested after FS (both
    strategies are sketched below).
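As a toy illustration of the contrast (my own sketch, not part of the tutorial; it assumes scikit-learn and synthetic data), a filter ranks features once by their estimated I(x_i; C), while a wrapper repeatedly re-trains and cross-validates the classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=30, n_informative=5, random_state=0)

# Filter (FFS): rank features once by their estimated I(x_i; C); no classifier involved.
mi = mutual_info_classif(X, y, random_state=0)
filter_subset = np.argsort(mi)[::-1][:5]

# Wrapper (WFS): greedy forward search scored by 10-fold CV of the classifier itself.
clf = KNeighborsClassifier(n_neighbors=5)
selected = []
for _ in range(5):
    remaining = [f for f in range(X.shape[1]) if f not in selected]
    scores = [cross_val_score(clf, X[:, selected + [f]], y, cv=10).mean() for f in remaining]
    selected.append(remaining[int(np.argmax(scores))])

print("filter subset:", sorted(filter_subset), "wrapper subset:", sorted(selected))
```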
                                                                       5/56
FS is a Complex Task

The Complexity of FS
The only way to ensure that a feature set is optimal (according to which
criterion?) is an exhaustive search over all feature combinations.
The curse of dimensionality limits this search, since the complexity is

$$O(z), \quad z = \sum_{i=1}^{n} \binom{n}{i}$$

Filter design is desirable in order to avoid overfitting, but filters must
be as multivariate as wrappers. However, this implies designing and
estimating a cost function that captures the high-order interactions of
many variables. In the following we will focus on FFS.

                                                                          6/56
Mutual Information Criterion

A Criterion is Needed
The primary problem of feature selection is the criterion which
evaluates a feature set. It must decide whether a feature subset is
suitable for the classification problem, or not. The optimal criterion
for such purpose would be the Bayesian error rate for the subset of
selected features:

$$E(S) = \int_{S} p(S)\,\Big(1 - \max_{i}\, p(c_i \mid S)\Big)\, dS,$$


where S is the vector of selected features and ci ∈ C is a class from
all the possible classes C existing in the data.

                                                                        7/56
Mutual Information Criterion (2)

Bounding Bayes Error
An upper bound of the Bayesian error, which was obtained by
Hellman and Raviv (1970) is:

$$E(S) \le \frac{H(C \mid S)}{2}.$$
This bound is related to mutual information (MI), because mutual
information can be expressed as

                    I (S; C ) = H(C ) − H(C |S)

and H(C) is the entropy of the class labels, which does not depend on
the feature subspace S.
                                                                      8/56
Mutual Information Criterion (3)




(Venn-style diagram of features x1, x2, x3, x4 and the class C, illustrating the information I(X;C) shared with the class and the redundancy among features.)


      Figure: Redundancy and MI
                                        9/56
Mutual Information Criterion (4)

MI and KL Divergence
From the definition of MI we have that:
$$\begin{aligned}
I(X;Y) &= \sum_{y \in Y}\sum_{x \in X} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\\
       &= \sum_{y \in Y} p(y)\sum_{x \in X} p(x \mid y)\,\log\frac{p(x \mid y)}{p(x)}\\
       &= E_Y\big(\mathrm{KL}(p(x \mid y)\,\|\,p(x))\big).
\end{aligned}$$

Then, maximizing MI is equivalent to maximizing the expectation of
the KL divergence between the class-conditional densities P(S|C )
and the density of the feature subset P(S).
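A quick numeric sanity check of this identity on a small, made-up discrete joint distribution (the numbers below are illustrative only):

```python
import numpy as np

# Made-up joint distribution p(x, y) with |X| = 3 and |Y| = 2 (rows: x, columns: y).
pxy = np.array([[0.10, 0.20],
                [0.25, 0.05],
                [0.15, 0.25]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Left-hand side: I(X;Y) from its definition.
mi = np.sum(pxy * np.log(pxy / np.outer(px, py)))

# Right-hand side: E_Y[ KL(p(x|y) || p(x)) ].
px_given_y = pxy / py                                   # column y holds p(x|y)
e_kl = np.sum(py * np.sum(px_given_y * np.log(px_given_y / px[:, None]), axis=0))

assert np.isclose(mi, e_kl)                             # the two expressions coincide
```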
                                                                       10/56
Mutual Information Criterion (5)

The Bottleneck
    Estimating mutual information between a high-dimensional
    continuous set of features and the class labels is not
    straightforward, because of the entropy estimation involved.
    Simplifying assumptions: (i) dependency is non-informative
    about the class; (ii) redundancies between features are
    independent of the class [Vasconcelos & Vasconcelos, 04].
    The basic idea is to simplify the computation of the MI. For
    instance:
$$I(S^{*};C) \approx \sum_{x_i \in S} I(x_i;C)$$

    More realistic assumptions involving interactions are needed!
                                                                    11/56
The mRMR Criterion

Definition
Maximize the relevance I(x_j; C) of each individual feature x_j ∈ F
and simultaneously minimize the redundancy between x_j and the
rest of the selected features x_i ∈ S, i ≠ j. This criterion is known as the
min-Redundancy Max-Relevance (mRMR) criterion and its
formulation for the selection of the m-th feature is [Peng et al., 2005]:

$$\max_{x_j \in F - S_{m-1}} \left[\, I(x_j;C) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j;x_i) \,\right]$$

Thus, second-order interactions are considered!
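A minimal sketch of the resulting greedy search (my own code, not [Peng et al., 2005]'s; scikit-learn's MI estimators stand in for the I(·;·) terms, and `n_select` is an illustrative parameter):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, n_select):
    """Greedy forward mRMR: at each step pick the feature maximizing relevance - mean redundancy."""
    relevance = mutual_info_classif(X, y, random_state=0)            # I(x_j; C) for every feature
    S = [int(np.argmax(relevance))]                                   # first pick: most relevant feature
    while len(S) < n_select:
        candidates = [j for j in range(X.shape[1]) if j not in S]
        scores = []
        for j in candidates:
            # redundancy term: (1/|S|) * sum_i I(x_j; x_i) over the already selected features
            red = np.mean([mutual_info_regression(X[:, [i]], X[:, j], random_state=0)[0] for i in S])
            scores.append(relevance[j] - red)
        S.append(candidates[int(np.argmax(scores))])
    return S
```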

                                                                        12/56
The mRMR Criterion (2)

Properties
    Can be embedded in a forward greedy search, that is, at each
    iteration of the algorithm the next feature optimizing the
    criterion is selected.
    Doing so, this criterion is equivalent to a first-order incremental
    search under the Maximum Dependency (MD) criterion:

    $$\max_{S \subseteq F}\; I(S;C)$$

    Then, the m-th feature is selected according to:

    $$\max_{x_j \in F - S_{m-1}}\; I(S_{m-1}, x_j;\, C)$$

                                                                       13/56
Efficient MD

Need of an Entropy Estimator
    Whilst in mRMR the mutual information is estimated incrementally,
    between pairs of one-dimensional variables, in MD the estimation
    of I(S;C) is not trivial because S may consist of a large number
    of features.
    If we express the MI in terms of the conditional entropy
    H(S|C) we have:

                      I (S; C ) = H(S) − H(S|C ).

    To do this, H(S|C) = Σ_c H(S|C = c) p(C = c) has to be computed,
    i.e. one conditional entropy per class. Either way, we must
    estimate multi-dimensional entropies!
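Once a multi-dimensional entropy estimator is available (the k-NN estimator of the following slides is one option), evaluating I(S;C) for a candidate subset is a short routine. The sketch below is mine, not the tutorial's code; `entropy` stands for any such estimator, and a concrete one is sketched after the Leonenko slides:

```python
import numpy as np

def mutual_information_md(X_S, y, entropy, k=5):
    """I(S;C) = H(S) - sum_c p(C=c) H(S|C=c); X_S holds only the columns of the candidate subset S."""
    h_s = entropy(X_S, k=k)
    h_s_given_c = sum((np.sum(y == c) / len(y)) * entropy(X_S[y == c], k=k)
                      for c in np.unique(y))
    return h_s - h_s_given_c
```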
                                                                    14/56
Leonenko’s Estimator

Theoretical Bases
Recently, Leonenko et al. published an extensive study [Leonenko et
al, 08] on Rényi and Tsallis entropy estimation, also covering the
limit α → 1, which yields the Shannon entropy.
Their construction relies on the integral

$$I_\alpha = E\{f^{\alpha-1}(X)\} = \int_{\mathbb{R}^d} f^{\alpha}(x)\, dx,$$

where f(·) is the density of a set of N i.i.d. samples
X = {X_1, X_2, ..., X_N}. The integral is valid for α ≠ 1;
the limit as α → 1 is also derived in order to cover
Shannon entropy estimation.

                                                                       15/56
Leonenko’s Estimator (2)

Entropy Maximization Distributions
    The α-entropy maximizing distributions are only defined for
    0 < α < 1, where the entropy Hα is a concave function. The
    maximizing distributions are defined under some constraints.
    The uniform distribution maximizes α-entropy under the
    constraint that the distribution has a finite support. For
    distributions with a given covariance matrix the maximizing
    distribution is Student-t, if d/(d + 2) < α < 1, for any number
    of dimensions d ≥ 1.
    This is a generalization of the property that the Gaussian
    distribution maximizes the Shannon entropy H.

                                                                  16/56
Leonenko’s Estimator (3)

Rényi, Tsallis and Shannon Entropy Estimates
For α ≠ 1, the estimate of I_α is

$$\hat{I}_{N,k,\alpha} = \frac{1}{N} \sum_{i=1}^{N} \big(\zeta_{N,i,k}\big)^{1-\alpha},$$

with

$$\zeta_{N,i,k} = (N-1)\, C_k\, V_d\, \big(\rho^{(i)}_{k,N-1}\big)^{d},$$

where ρ^{(i)}_{k,N−1} is the Euclidean distance from X_i to its k-th nearest
neighbour among the remaining N − 1 samples,
V_d = π^{d/2} / Γ(d/2 + 1) is the volume of the unit ball B(0,1) in R^d, and
C_k = [Γ(k)/Γ(k + 1 − α)]^{1/(1−α)}.
                                                                       17/56
Leonenko’s Estimator (4)

Rényi, Tsallis and Shannon Entropy Estimates (cont.)
    The estimator Î_{N,k,α} is asymptotically unbiased, that is,
    E[Î_{N,k,α}] → I_α as N → ∞. It is also consistent under mild
    conditions.
    Given these conditions, the estimated Rényi entropy H_α of f is

    $$\hat{H}_{N,k,\alpha} = \frac{\log \hat{I}_{N,k,\alpha}}{1-\alpha}, \quad \alpha \ne 1,$$

    and for the Tsallis entropy S_α = \frac{1}{\alpha-1}\big(1 - \int f^{\alpha}(x)\,dx\big) it is

    $$\hat{S}_{N,k,\alpha} = \frac{1-\hat{I}_{N,k,\alpha}}{\alpha-1}, \quad \alpha \ne 1.$$

    As N → ∞, both estimators are asymptotically unbiased
    and consistent.

                                                                        18/56
Leonenko’s Estimator (5)

Shannon Entropy Estimate
The limit of the Tsallis entropy estimator as α → 1 gives the
Shannon entropy H estimator:

$$\hat{H}_{N,k,1} = \frac{1}{N} \sum_{i=1}^{N} \log \xi_{N,i,k}, \qquad \xi_{N,i,k} = (N-1)\, e^{-\Psi(k)}\, V_d\, \big(\rho^{(i)}_{k,N-1}\big)^{d},$$

where Ψ(k) is the digamma function: Ψ(1) = −γ with γ ≈ 0.5772, and
Ψ(k) = −γ + A_{k−1}, where A_0 = 0 and A_j = \sum_{i=1}^{j} \frac{1}{i}.
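The Shannon estimate above maps almost line by line onto SciPy primitives; in particular Ψ(k) = −γ + A_{k−1} is exactly `scipy.special.digamma(k)`. The following is a minimal sketch of my own, not Leonenko et al.'s reference code; it assumes the samples contain no exact duplicates (so that ρ > 0):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma, digamma

def knn_entropy(X, k=5):
    """k-NN (Kozachenko-Leonenko style) estimate of the Shannon entropy of the samples X (N x d), in nats."""
    N, d = X.shape
    # rho_i: distance from X_i to its k-th nearest neighbour among the other N-1 samples
    rho = cKDTree(X).query(X, k=k + 1)[0][:, k]
    V_d = np.pi ** (d / 2) / gamma(d / 2 + 1)              # volume of the unit ball B(0,1) in R^d
    xi = (N - 1) * np.exp(-digamma(k)) * V_d * rho ** d    # xi_{N,i,k} as in the slide
    return float(np.mean(np.log(xi)))                      # H_hat_{N,k,1}
```

On Gaussian samples this can be checked against the closed-form entropy (1/2) log((2πe)^d det Σ), which is essentially what the comparison in the next figure does.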


                                                                      19/56
Leonenko’s Estimator (6)
(Plots: estimated entropy vs number of samples for 1-D, 2-D, 3-D and 4-D Gaussian distributions; curves: 5-NN estimator, k-d partitioning estimator, Gaussian entropy with Σ = I, Gaussian entropy with the real Σ.)

Figure: Comparison: k-NN vs k-d partitioning estimator for Gaussians.
                                                                                                                                                                       20/56
Reminder: Visual Localization




Figure: 3D+2D map for coarse-to-fine localization
                                                   21/56
Image Categorization

Problem Statement
   Design a minimal-complexity and maximal-accuracy classifier
   based on forward MI-FS for coarse localization [Lozano et al, 09].
   We have a bag of filters (say we have K different filters):
   Harris detector, Canny, horizontal and vertical gradient, gradient
   magnitude, 12 colour filters based on Gaussians over the hue range,
   and sparse depth.
   We take the statistics of each filter and compose a histogram
   with, say, B bins.
   Therefore, the total number of features NF per image is
   NF = K × (B − 1).
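A hedged sketch of how such a feature vector might be assembled; the slide does not spell out the exact convention, so the code assumes that each normalized B-bin histogram contributes B − 1 free values (which would explain NF = K × (B − 1)):

```python
import numpy as np

def image_features(responses, n_bins=4):
    """responses: list of K 2-D filter-response maps for one image.
    Each map is summarized by a normalized B-bin histogram; keeping B-1 bins per
    filter (the last one is determined by the rest) gives K * (B - 1) features."""
    feats = []
    for r in responses:
        hist, _ = np.histogram(np.ravel(r), bins=n_bins)
        feats.extend(hist[:-1] / max(hist.sum(), 1))       # B - 1 values per filter
    return np.asarray(feats)
```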

                                                                         22/56
Image Categorization (2)




Figure: Examples of filter responses. From top to bottom and left to right: input
image, depth, vertical gradient, gradient magnitude, and five colour filters.
                                                                          23/56
Image Categorization (3)

(Plots: 10-fold CV error (%) vs number of selected features. Left: greedy feature selection on different feature sets, 16 filters × 6, 4 and 2 bins. Right: greedy feature selection on 4-bin histograms for 2, 4, 6 and 8 classes.)

Figure: Left: finding the optimal number of bins B for K = 16 filters.
Right: testing whether |C| = 6 is a correct choice.

                                                                                                                                                                    24/56
Image Categorization (4)
(Plot: % error vs number of features for MD, MmD and mRMR feature selection; both the 10-fold CV error and the test error are shown for each criterion.)

Figure: FS performance on image histograms data with 48 features.
                                                                                                   25/56
Image Categorization (5)


               C#1      C#2       C#3      C#4      C#5       C#6
       C#1      26       0         0        0        0         0
       C#2     2/3     63/56      1/4      0/3       0         0
       C#3       0      1/0      74/67     1/9       0         0
       C#4     4/12     5/6      10/0     96/95     0/2        0
       C#5       0       0         0        0        81        0
       C#6       0       0         0        0      30/23     78/85
Table: Key question: What is the best classifier after Feature Selection?
K-NN / SVM Confusion Matrix




                                                                           26/56
Image Categorization (6)

Figure: Test images and their 1st to 5th nearest neighbours.
                                                          27/56
Image Categorization (7)




Figure: Confusion trajectories at coarse (left) and fine (right) localization


                                                                               28/56
Image Categorization (8)




Figure: Confusion trajectory using only coarse localization

                                                              29/56
Image Categorization (9)




Figure: Confusion trajectory after coarse-to-fine localization

                                                                30/56
Image Categorization (10)

Results
    With mRMR, the lowest CV error (8.05%) is achieved with a
    set of 17 features, while with MD a CV error of 4.16% is
    achieved with a set of 13 features.
    The CV errors yielded by MmD are very similar to those of MD.
    On the other hand, test errors present a slightly different
    evolution.
    The best error is given by MD, and the worst one is given by
    mRMR.
    This may be explained by the fact that the entropy estimator
    we use does not degrade its accuracy as the number of
    dimensions increases.
                                                                   31/56
Gene Selection

Microarray Analysis
    The basic setup of a microarray platform is a glass slide which
    is spotted with DNA fragments or oligonucleotides representing
    some specific gene coding regions. Then, purified RNA is
    labelled fluorescently or radioactively and it is hybridized to the
    slide.
    The hybridization can be done simultaneously with reference
    RNA to enable comparison of data with multiple experiments.
    Then washing is performed and the raw data are obtained by
    laser scanning or autoradiographic imaging. The obtained
    expression levels of tens of thousands of genes are
    numerically quantified and used for statistical studies.

                                                                     32/56
Gene Selection (2)


(Diagram: sample RNA and reference cDNA are fluorescently labelled, then combined and hybridized to the array.)


    Figure: An illustration of the basic idea of a microarray platform.


                                                                          33/56
Gene Selection (3)
(Plot: % error and MI vs number of features; curves: LOOCV error (%) and I(S;C).)
Figure: MD-FS performance on the NCI microarray data set with 6,380
features. Lowest LOOCV error with a set of 39 features [Bonev et al.,08]
                                                                                                              34/56
Gene Selection (4)
(Plot: % LOOCV error vs number of features; curves: MD criterion, MmD criterion, mRMR criterion and LOOCV wrapper.)
Figure: Performance of FS criteria on the NCI microarray. Only the
selection of the first 36 features (out of 6,380) is represented.
                                                                                                  35/56
Gene Selection (5)
(Heatmaps: expression levels of the genes selected by MD (left) and by mRMR (right); rows are samples labelled by disease class, columns are the indices of the selected genes. Genes 135, 2080 and 6145 are marked with arrows in both panels.)
Figure: Feature Selection on the NCI DNA microarray data. Features
(genes) selected by MD and mRMR are marked with an arrow.

                                                                                                                        36/56
Gene Selection (6)

Results
    Leukemia: 38 bone marrow samples, 27 Acute Lymphoblastic
    Leukemia (ALL) and 11 Acute Myeloid Leukemia (AML), over 7,129
    probes (features) from 6,817 human genes. Also 34 samples testing
    data is provided, with 20 ALL and 14 AML. We obtained a 2.94%
    test error with 7 features, while other authors report the same error
    selecting 49 features via individualized markers. Other recent works
    report test errors higher than 5% for the Leukemia data set.
    Central Nervous System Embryonal Tumors: 60 patient samples,
    21 are survivors and 39 are failures. There are 7,129 genes in the data
    set. We selected 9 genes achieving a 1.67% LOOCV error. Best result
    in the literature is a 8.33% CV error with 100 features is reported.

                                                                            37/56
Gene Selection (7)

Results (cont)
    Colon: 62 samples collected from colon-cancer patients. Among
    them, 40 tumor biopsies are from tumors and 22 normal biopsies are
    from healthy parts of the colons of the same patients. From 6,500
    genes, 2,000 were selected for this data set, based on the confidence
    in the measured expression levels. In the feature selection experiments
    we performed, 15 of them were selected, resulting in a 0% LOOCV
    error, while recent works report errors higher than 12%.
    Prostate: 25 tumor and 9 normal samples, with 12,600 genes.
    Our experiments achieved the best error with only 5 features,
    with a test error of 5.88%. Other works report test errors
    higher than 6%.

                                                                          38/56
Structural Recognition

Reeb Graphs
   Given a surface S and a real function f : S → R, the Reeb
   graph (RG) represents the topology of S through a graph
   structure whose nodes correspond to the critical points of f .
   The Extended Reeb Graph (ERG) [Biasotti, 05] is an
   approximation of the RG obtained by using a fixed number of level
   sets (63 in this work) that divide the surface into a set of
   regions; critical regions, rather than critical points, are
   identified according to the behaviour of f along level sets; ERG
   nodes correspond to critical regions, while the arcs are detected
   by tracking the evolution of level sets.

                                                                    39/56
Structural Recognition (2)




Figure: Extended Reeb Graphs. Image by courtesy of Daniela Giorgi and
Silvia Biasotti. a) Geodesic, b) Distance to center, c) Sphere



                                                                        40/56
Structural Recognition (3)

Catalog of Features
    Subgraph node centrality: quantifies the degree of participation
    of a node i in a subgraph:

    $$C_S(i) = \sum_{k=1}^{n} \phi_k(i)^2\, e^{\lambda_k},$$

    where n = |V|, λ_k is the k-th eigenvalue of A and φ_k its
    corresponding eigenvector.
    Perron-Frobenius eigenvector: φn (the eigenvector
    corresponding to the largest eigenvalue of A). The components
    of this vector denote the importance of each node.
    Adjacency Spectra: the magnitudes |λ_k| of the (leading)
    eigenvalues of A have been experimentally validated for
    graph embedding [Luo et al.,03].
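A sketch of how these three entries of the catalog (subgraph centrality, the Perron-Frobenius eigenvector and the adjacency spectrum) could be computed from an adjacency matrix with NumPy. This is my own code, not the tutorial's; sign and ordering conventions are assumptions:

```python
import numpy as np

def spectral_features(A):
    """A: symmetric (n x n) adjacency matrix of a graph."""
    lam, phi = np.linalg.eigh(A)                                 # eigenvalues ascending; phi[:, k] is phi_k
    subgraph_centrality = (phi ** 2 * np.exp(lam)).sum(axis=1)   # C_S(i) = sum_k phi_k(i)^2 e^{lambda_k}
    perron = np.abs(phi[:, -1])                                  # eigenvector of the largest eigenvalue of A
    spectrum = np.abs(lam)[::-1]                                 # leading eigenvalue magnitudes, largest first
    return subgraph_centrality, perron, spectrum
```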
                                                                   41/56
Structural Recognition (4)

Catalog of Features (cont.)
    Spectra from Laplacians: both from the un-normalized and
    normalized ones. The Laplacian spectrum plays a fundamental
    role in the development of regularization kernels for graphs.
    Fiedler vector: eigenvector corresponding to the first
    non-trivial eigenvalue of the Laplacian (φ2 in connected
    graphs). It encodes the connectivity structure of the graph
    (it is actually the core of graph-cut methods).
    Commute Times, either from the un-normalized or the
    normalized Laplacian. They encode the path-length distribution
    of the graph.
    The Heat-Flow Complexity trace as seen in the section above.
                                                                    42/56
Structural Recognition (5)

Backwards Filtering
    The entropies H(·) of a set with a large number n of features
    can be efficiently estimated using the k-NN-based method
    developed by Leonenko.
    Thus, we take the data set with all its features and determine
    which feature to discard in order to produce the smallest
    decrease of I (Sn−1 ; C ).
    We then repeat the process for the features of the remaining
    feature set, until only one feature is left [Bonev et al.,10].
    Then, the subset yielding the minimum 10-CV error is selected.
    Most of the spectral features are histogrammed into a variable
    number of bins: 9 · 3 · (2 + 4 + 6 + 8) = 540 features.
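A condensed sketch of this backward procedure (mine, not the code of [Bonev et al.,10]); `mutual_information` is any callable implementing I(S;C), e.g. the H(S) − H(S|C) routine sketched earlier:

```python
import numpy as np

def backward_selection(X, y, mutual_information, n_keep=1):
    """Greedy backward elimination: repeatedly drop the feature whose removal
    yields the smallest decrease of I(S; C)."""
    S = list(range(X.shape[1]))
    history = [(list(S), mutual_information(X[:, S], y))]
    while len(S) > n_keep:
        scores = [mutual_information(X[:, [f for f in S if f != j]], y) for j in S]
        S.remove(S[int(np.argmax(scores))])                # drop the least informative feature
        history.append((list(S), max(scores)))
    return history   # afterwards, pick the subset with the lowest cross-validation error
```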
                                                                     43/56
Experimental Results

Elements
   SHREC database (15 classes × 20 objects). Each of the 300
   samples is characterized by 540 features, and has a class label
   l ∈ {human, cup, glasses, airplane, chair, octopus, table, hand,
   fish, bird, spring, armadillo, buste, mechanic, four-leg }
   The errors are measured by 10-fold cross validation (10-fold
   CV). MI is maximized as the number of selected features grows.
   A high number of features degrades the classification
   performance.
   However, the MI curve, as well as the selected features, does not
   depend on the classifier, since MI is a purely information-theoretic
   measure.
                                                                      44/56
Structural Recognition (7)




Figure: Classification error vs Mutual Information.

                                                     45/56
Structural Recognition (8)




Figure: Feature selection on the 15-class experiment (left) and the feature
statistics for the best-error feature set (right).


                                                                          46/56
Structural Recognition (9)

Analysis
    For the 15-class problem, the minimal error (23.3%) is achieved
    with a set of 222 features.
    The coloured areas in the plot represent how much a feature is
    used with respect to the remaining ones (the height on the Y
    axis is arbitrary).
    For the 15-class experiment, in feature sets smaller than
    100 features the most important feature is the Fiedler vector, in
    combination with the remaining features. Commute time is also
    an important feature. Features that are not relevant include
    node centrality and heat-flow complexity. Turning our
    attention to the graph types, all three appear relevant.
                                                                     47/56
Structural Recognition (10)

Analysis (cont)
    We can see that the four different binnings of the features do
    have importance for graph characterization.
    These conclusions concerning the relevance of each feature
    cannot be drawn without performing some additional
    experiments with different groups of graph classes.
    We perform four different 3-class experiments. The classes share
    some structural similarities, for example the 3 classes of the
    first experiment have a head and limbs.
    Although in each experiment the minimum error is achieved
    with very different numbers of features, the participation of
    each feature is highly consistent with the 15-class experiment.
                                                                      48/56
Structural Recognition (11)




Figure: Feature Selection on 3-class experiments:
Human/Armadillo/Four-legged, Aircraft/Fish/Bird


                                                       49/56
Structural Recognition (12)




Figure: Feature Selection on 3-class experiments: Cup/Bust/Mechanic,
Chair/Octopus/Table.

                                                                       50/56
Structural Recognition (14)

(Plot: CV error (%) vs number of features for forward selection and backward elimination.)
                        Figure: Forward vs Backwards FS

                                                                                          51/56
Structural Recognition (15)
(Plots, top: % test error and its standard deviation vs number of features, for forward FS (left) and backward FS (right). Bottom: number of feature coincidences across bootstraps vs number of selected features, for both directions.)
Figure: Bootstrapping experiments on forward (on the left) and backward
(on the right) feature selection, with 25 bootstraps.

                                                                                                                                                                                         52/56
Structural Recognition (16)

Conclusions
    The main difference among experiments is that node centrality
    seems to be more important for discerning among elongated
    objects with long, sharp characteristics. Although all three
    graph types are relevant, the sphere graph performs best for
    blob-shaped objects.
    Bootstrapping shows that the backward method performs better.
    The IT approach not only yields good classification but also
    reveals the role of each spectral feature in it!
    Given other points of view (size functions, eigenfunctions, ...),
    it is desirable to exploit them too!

                                                                     53/56
References

[Zhu et al.,97] Zhu, S.C., Wu, Y.N. and Mumford, D. (1997). Filters,
Random Fields And Maximum Entropy (FRAME): towards a unified
theory for texture modeling. IJCV 27 (2) 107–126
[Wang and Gotoh, 2009] Wang, X. and Gotoh, O. (2009). Accurate
molecular classification of cancer using simple rules. BMC Medical
Genomics, 64(2)
[Guyon and Elisseeff, 03] Guyon, I. and Elisseeff, A. (2003). An
introduction to variable and feature selection. Journal of Machine
Learning Research, volume 3, pages 1157–1182.
[Vasconcelos and Vasconcelos,04] Vasconcelos, N. and Vasconcelos,
M. (2004). Scalable discriminant feature selection for image retrieval
and recognition. In CVPR04, pages 770–775.

                                                                         54/56
References (2)

[Peng et al., 2005] Peng, H., Long, F., and Ding, C. (2005). Feature
selection based on mutual information: Criteria of max-dependency,
max-relevance, and min-redundancy. In IEEE Trans. on PAMI, 27(8),
pp.1226-1238.
[Leonenko et al, 08] Leonenko, N., Pronzato, L., and Savani, V.
(2008). A class of Rényi information estimators for multidimensional
densities. The Annals of Statistics, 36(5):2153–2182.
[Lozano et al, 09] Lozano, M.A., Escolano, F., Bonev, B., Suau, P.,
Aguilar, W., Sáez, J.M., Cazorla, M. (2009). Region and constell.
based categorization of images with unsupervised graph learning.
Image Vision Comput. 27(7): 960–978


                                                                       55/56
References (3)

[Bonev et al.,08] Bonev, B., Escolano, F., and Cazorla, M. (2008).
Feature selection, mutual information, and the classification of
high-dimensional patterns. Pattern Analysis and Applications 11(3-4):
304–319.
[Biasotti, 05] Biasotti, S. (2005). Topological coding of surfaces with
boundary using Reeb graphs. Computer Graphics and Geometry,
7(1):31–45.
[Luo et al.,03] Luo, B., Wilson, R., and Hancock, E. (2003). Spectral
embedding of graphs. Pattern Recognition, 36(10):2213–2223.
[Bonev et al.,10] Bonev, B., Escolano, F., Giorgi, D., Biasotti, S.
(2010). High-dimensional Spectral Feature Selection for 3D Object
Recognition based on Reeb Graphs, SSPR'2010 (accepted)

                                                                       56/56
