Feature Selection and Extraction:
An Introduction with emphasis on
Principal Component Analysis
Dr. N. B. Venkateswarlu, AITAM, Tekkali
What I am going to cover?
• What is feature selection/extraction
• Need and discussion
• Methodologies
• PCA
Feature Selection/Extraction
[Pipeline: Data → Pre-Processing → Feature Selection/Extraction → Classifier]
What Is Feature Selection?
• Selecting the most “relevant” subset of attributes
according to some selection criteria.
Why Feature Selection?
• High-dimensional data often contain irrelevant or
redundant features
– reduce the accuracy of data mining algorithms
– slow down the mining process
– cause problems in storage and retrieval
– make the results hard to interpret
Why is feature selection important?
• May improve performance of learning
algorithm
• The learning algorithm may not scale up to the
size of the full feature set, either in samples or
in time
• Allows us to better understand the domain
• Cheaper to collect a reduced set of features
What is feature selection?

Example 1. Task: classify whether a document is about cats. Data: word counts in the document.
Full feature set X: cat 2, and 35, it 20, kitten 8, electric 2, trouble 4, then 5, several 9, feline 2, while 4, …, lemon 2
Reduced X: cat 2, kitten 8, feline 2

Example 2. Task: predict chances of lung disease. Data: medical history survey.
Full feature set X: Vegetarian No, Plays video games Yes, Family history No, Athletic No, Smoker Yes, Sex Male, Lung capacity 5.8L, Hair color Red, Car Audi, …, Weight 185 lbs
Reduced X: Family history No, Smoker Yes
Characterising features
• Generally, features are characterized as:
– Relevant: features which have an influence on the output
and whose role cannot be assumed by the rest.
– Irrelevant: features not having any influence on the output,
whose values are generated at random for each example.
– Redundant: a redundancy exists whenever a feature can
take the role of another (perhaps the simplest way to
model redundancy).
Discussion example: Remote Sensing Surveys
Challenges in Feature Selection (1)
• Dealing with ultra-high dimensional data and feature
interactions
Traditional feature selection encounters two major problems when the
dimensionality runs into tens or hundreds of thousands:
1. the curse of dimensionality
2. the relative shortage of instances.
Challenges in Feature Selection (2)
• Dealing with active instances (Liu et al., 2005)
When the dataset is huge, feature selection performed on
the whole dataset is inefficient,
so instance selection is necessary:
– Random sampling (pure random sampling without
exploiting any data characteristics)
– Active feature selection (selective sampling using
data characteristics achieves better or equally good
results with a significantly smaller number of
instances).
Challenges in Feature Selection (3)
• Dealing with new data types (Liu et al., 2005)
– traditional data type: an N*M data matrix
Due to the growth of computer and Internet/Web techniques,
new data types are emerging:
– text-based data (e.g., e-mails, online news, newsgroups)
– semistructured data (e.g., HTML, XML)
– data streams.
Challenges in Feature Selection (4)
• Unsupervised feature selection
– Feature selection vs classification: almost
every classification algorithm
– Subspace method with the curse of
dimensionality in classification
– Subspace clustering.
Challenges in Feature Selection (5)
• Dealing with predictive-but-unpredictable
attributes in noisy data
– Attribute noise is difficult to process, and removing
noisy instances is dangerous
– Predictive attributes: essential to classification
– Unpredictable attributes: cannot be predicted by the
class and other attributes
• Noise identification, cleansing, and
measurement need special attention [Yang et
al., 2004]
Feature Selection Methods
• Feature selection is an optimization problem.
– Search the space of possible feature subsets.
– Pick the one that is optimal or near-optimal with respect to a
certain criterion.
Search strategies: Optimal, Heuristic, Randomized
Evaluation strategies: Filter methods, Wrapper methods
Evaluation Strategies
• Filter Methods
– Evaluation is independent of the classification algorithm or its
error criteria.
• Wrapper Methods
– Evaluation uses a criterion related to the classification
algorithm.
• Wrapper methods provide more accurate solutions
than filter methods, but in general are more
computationally expensive.
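For flavor, a simple filter criterion can be written in a few lines of MATLAB (a minimal sketch, not from the slides; assumes X is an l-by-d data matrix, y an l-by-1 numeric label vector, and m the number of features to keep):

Xc = X - repmat(mean(X), size(X,1), 1);                % center each feature
yc = y - mean(y);                                      % center the label
r  = abs(Xc' * yc) ./ (sqrt(sum(Xc.^2))' * norm(yc));  % |correlation| per feature
[~, order] = sort(r, 'descend');
top = order(1:m);                                      % indices of the m best-ranked features

No classifier is consulted, which is exactly what makes this a filter method.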
Typical Feature Selection – First step: Generation
[Flowchart: Original Feature Set → (1) Generation → subset → (2) Evaluation → goodness of the subset → (3) Stopping Criterion → No: back to Generation / Yes → (4) Validation]
Generation produces a subset of features for evaluation. It can start with:
• no features
• all features
• a random subset of features
Typical Feature Selection – Second step: Evaluation
(same flowchart as above)
Evaluation measures the goodness of the subset and compares it with the previous best subset; if the new subset is found better, it replaces the previous best subset.
Typical Feature Selection – Third step: Stopping criterion
(same flowchart as above)
Based on the generation procedure:
• a pre-defined number of features
• a pre-defined number of iterations
Based on the evaluation function:
• whether addition or deletion of a feature no longer produces a better subset
• whether an optimal subset, based on some evaluation function, is achieved
Typical Feature Selection – Fourth step: Validation
(same flowchart as above)
Validation is basically not part of the feature selection process itself: compare results with already established results, or with results from competing feature selection methods.
Exhaustive Search
• Assuming m features, an exhaustive search would require:
– examining all $\binom{m}{n}$ possible subsets of size n;
– selecting the subset that performs the best according to the
criterion function.
• The number of subsets grows combinatorially, making
exhaustive search impractical.
• Iterative procedures are often used, but they cannot
guarantee the selection of the optimal subset.
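As a quick illustration of the growth (standard binomial-coefficient values, checked in MATLAB):

nchoosek(10, 5)    % ans = 252
nchoosek(20, 10)   % ans = 184756
nchoosek(40, 20)   % ans = 137846528820; already hopeless to enumerate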
Naïve Search
• Sort the given d features in order of their probability of
correct recognition.
• Select the top m features from this sorted list.
• Disadvantage
– Feature correlation is not considered.
– The best pair of features may not even contain the best
individual feature.
Sequential forward selection
(SFS)
(heuristic search)
• First, the best single feature is selected
(i.e., using some criterion function).
• Then, pairs of features are formed using
one of the remaining features and this
best feature, and the best pair is
selected.
• Next, triplets of features are formed
using one of the remaining features and
these two best features, and the best
triplet is selected.
• This procedure continues until a
predefined number of features are
selected.
SFS performs
best when the
optimal subset is
small.
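A minimal MATLAB sketch of SFS (hypothetical: criterion is any user-supplied function handle that scores a vector of feature indices, e.g. cross-validated accuracy; save as sfs.m):

function best = sfs(d, m, criterion)
% Sequential forward selection: greedily grow a subset from empty to size m.
best = [];
while numel(best) < m
    remaining = setdiff(1:d, best);
    scores = zeros(1, numel(remaining));
    for i = 1:numel(remaining)
        scores(i) = criterion([best remaining(i)]);  % score subset + one candidate
    end
    [~, k] = max(scores);
    best = [best remaining(k)];                      % keep the best addition
end
end

For example: sfs(28, 5, @(S) myCrossValAccuracy(X(:,S), labels)), where myCrossValAccuracy is a placeholder for your own evaluation routine.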
Example
[Figure: results of sequential forward feature selection for classification of a
satellite image using 28 features. The x-axis shows the classification accuracy
(%) and the y-axis shows the features added at each iteration (the first iteration
is at the bottom). The highest accuracy value is shown with a star.]
Sequential backward selection
(SBS)
(heuristic search)
• First, the criterion function is computed
for all d features.
• Then, each feature is deleted one at a
time, the criterion function is computed
for all subsets with d − 1 features, and
the worst feature is discarded.
• Next, each feature among the remaining
d − 1 is deleted one at a time, and the
worst feature is discarded to form a
subset with d − 2 features.
• This procedure continues until a
predefined number of features are left.
SBS performs
best when the
optimal subset is
large.
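The mirror-image SBS sketch, under the same assumptions (hypothetical criterion handle; save as sbs.m):

function best = sbs(d, m, criterion)
% Sequential backward selection: greedily shrink from all d features to size m.
best = 1:d;
while numel(best) > m
    scores = zeros(1, numel(best));
    for i = 1:numel(best)
        trial = best;
        trial(i) = [];                 % tentatively remove feature i
        scores(i) = criterion(trial);
    end
    [~, k] = max(scores);              % best score after a removal...
    best(k) = [];                      % ...identifies the worst feature: drop it
end
end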
Example
[Figure: results of sequential backward feature selection for classification of a
satellite image using 28 features. The x-axis shows the classification accuracy
(%) and the y-axis shows the features removed at each iteration (the first
iteration is at the bottom). The highest accuracy value is shown with a star.]
Plus-L minus-R selection (LRS)
• A generalization of SFS and SBS
– If L>R, LRS starts from the empty set and
repeatedly adds L features and removes R
features.
– If L<R, LRS starts from the full set and
repeatedly removes R features and adds L
features.
• Comments
– LRS attempts to compensate for the
weaknesses of SFS and SBS with some
backtracking capabilities.
– How to choose the optimal values of L and
R?
Bidirectional Search (BDS)
• BDS applies SFS and SBS
simultaneously:
– SFS is performed from the empty
set
– SBS is performed from the full set
• To guarantee that SFS and SBS
converge to the same solution
– Features already selected by SFS
are not removed by SBS
– Features already removed by SBS
are not selected by SFS
Sequential floating selection
(SFFS and SFBS)
• An extension to LRS with flexible backtracking
capabilities
– Rather than fixing the values of L and R, floating methods
determine these values from the data.
– The dimensionality of the subset during the search can be
thought to be “floating” up and down
• There are two floating methods:
– Sequential floating forward selection (SFFS)
– Sequential floating backward selection (SFBS)
P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature
selection, Pattern Recognition Lett. 15 (1994) 1119–1125.
Sequential floating selection
(SFFS and SFBS)
• SFFS
– Sequential floating forward selection (SFFS) starts from
the empty set.
– After each forward step, SFFS performs backward steps
as long as the objective function increases.
• SFBS
– Sequential floating backward selection (SFBS) starts from
the full set.
– After each backward step, SFBS performs forward steps
as long as the objective function increases.
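One common reading of the SFFS control loop, sketched in the same style (not the authors' exact code; reuses the hypothetical criterion handle, and a production version would also guard against cycling):

function best = sffs(d, m, criterion)
% Sequential floating forward selection: forward steps with adaptive backtracking.
best = [];
while numel(best) < m
    % forward step: add the single best remaining feature
    remaining = setdiff(1:d, best);
    scores = arrayfun(@(f) criterion([best f]), remaining);
    [~, k] = max(scores);
    best = [best remaining(k)];
    % backward steps: keep dropping features while that improves the criterion
    improved = true;
    while improved && numel(best) > 2
        n = numel(best);
        scores = arrayfun(@(i) criterion(best([1:i-1, i+1:n])), 1:n);
        [v, k] = max(scores);
        improved = v > criterion(best);
        if improved
            best(k) = [];
        end
    end
end
end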
Argument for wrapper methods
• The estimated accuracy of the learning
algorithm is the best available heuristic for
measuring the values of features.
• Different learning algorithms may perform
better with different feature sets, even if
they are using the same training set.
Wrapper selection algorithms (1)
• The simplest method is forward selection
(FS). It starts with the empty set and
greedily adds features one at a time
(without backtracking).
• Backward stepwise selection (BS) starts
with all features in the feature set and
greedily removes them one at a time
(without backtracking).
Wrapper selection algorithms (2)
• The Best First search starts with an empty set of features and
generates all possible single feature expansions. The subset with
the highest evaluation is chosen and is expanded in the same
manner by adding single features (with backtracking). The Best First
search can be combined with forward (BFFS) or backward (BFBS)
selection.
• Genetic algorithm selection. A solution is typically a fixed length
binary string representing a feature subset—the value of each
position in the string represents the presence or absence of a
particular feature. The algorithm is an iterative process where each
successive generation is produced by applying genetic operators
such as crossover and mutation to the members of the current
generation.
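A toy illustration of the bit-string encoding (hypothetical names; a real GA would add selection, crossover and mutation on a population of such strings):

d = 8;
chromosome = [1 0 1 1 0 0 0 1];             % 1 = feature included, 0 = excluded
subset = find(chromosome)                   % -> [1 3 4 8]
fitness = @(chrom) criterion(find(chrom));  % fitness = score of the encoded subset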
Disadvantages of Support Vector Machines
"Perhaps the biggest limitation of the support vector approach
lies in choice of the kernel."
Burges (1998)
"A second limitation is speed and size, both in training and
testing."
Burges (1998)
"Discrete data presents another problem..."
Burges (1998)
"...the optimal design for multiclass SVM classifiers is a
further area for research."
Burges (1998)
"Although SVMs have good generalization performance, they can be
abysmally slow in test phase, a problem addressed in (Burges, 1996;
Osuna and Girosi, 1998)."
Burges (1998)
"Besides the advantages of SVMs - from a practical point of view - they
have some drawbacks. An important practical question that is not entirely
solved, is the selection of the kernel function parameters - for Gaussian
kernels the width parameter [sigma] - and the value of [epsilon] in the
[epsilon]-insensitive loss function...[more]"
Horváth (2003) in Suykens et al.
"However, from a practical point of view perhaps the most serious
problem with SVMs is the high algorithmic complexity and extensive
memory requirements of the required quadratic programming in large-
scale tasks."
Horváth (2003) in Suykens et al. p 392
Principal Component Analysis
• Other names for PCA:
1) Karhunen-Loève Transform (KLT);
2) Hotelling Transform;
3) Eigenvector Analysis.
• Properties of PCA:
1) data decorrelation;
2) dimensionality reduction.
Important Objective
• The goal of principal component analysis is
to take n variables x1, x2,…, xn and find
linear combinations of these variables to
produce a new set of variables y1, y2, …, yn
that are uncorrelated. The transformed
variables are indexed or ordered so that y1
shows the largest amount of variation, y2
has the second largest amount of variation,
and so on.
PCA – the general idea
• PCA finds an orthogonal basis that best represents the given data set.
[Figure: 2-D point set shown in the standard (x, y) axes and along the principal (x', y') axes.]
PCA – the general idea
• PCA finds an orthogonal basis that best represents the given data set.
• PCA finds a best approximating plane (again, in terms of $\sum \mathrm{distances}^2$).
[Figure: 3-D point set in the standard (x, y, z) basis.]
Application: finding a tight bounding box
• An axis-aligned bounding box: agrees with the axes.
[Figure: 2-D point set with an axis-aligned box spanning (minX, maxX) × (minY, maxY).]
Usage of bounding boxes (bounding volumes)
 Serve as a very simple “approximation” of the object
 Fast collision detection, visibility queries
 Whenever we need to know the dimensions (size) of the object
 The models consist of thousands of polygons; to quickly test that they don't intersect, the bounding boxes are tested
 Sometimes a hierarchy of BBs is used
 The tighter the BB, the fewer “false alarms” we have
Centered data points x in n-dimensional space:
$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \mu = E\{x\} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$
where $\mu$ is the mean value of the vector x (zero after centering).
Covariance matrix C for the centered data:
$C = E\{x \cdot x'\}, \qquad c_{i,j} = E\{x_i x_j\}$
Here $E\{f(x)\}$ is the expectation value of $f(x)$.
Projection of the data point x onto the direction w:
$P_w(x) = \langle x, w \rangle = x'w$
[Figure: a data point x in the $(x_1, x_2)$ plane and its projection $P_w(x)$ onto the direction w.]
The variance of the projection onto the direction w:
$\sigma_w^2 = E\{P_w(x)^2\} = E\{(x'w)^2\} = E\{w'x \cdot x'w\} = w'\,E\{x x'\}\,w = w'Cw$
So,
$\sigma_w^2 = w'Cw.$
The vector w should be normalized:
$\|w\|^2 = w'w = 1.$
Hence, finding the normalized direction of maximal variance reduces to the following computation.
Maximizing variance: the normalized direction w that maximizes the variance can be found by solving the problem
$\max_w \{ w'Cw \} \quad \text{subject to} \quad w'w = \|w\|^2 = 1.$
The constrained optimization problem is reduced to an unconstrained one using the method of Lagrange multipliers:
$\max_w \{\, w'Cw - \lambda (w'w - 1) \,\}$
Condition for a maximum of the function:
$\frac{\partial}{\partial w}\left( w'Cw - \lambda w'w \right) = 2\,(Cw - \lambda w) = 0.$
We have to solve the following equation:
$Cw = \lambda w$
and find the eigenvalues $\lambda_i$ and eigenvectors $w_i$ of the covariance matrix C.
The covariance matrix C is symmetric, so the equation $Cw = \lambda w$ has n solutions:
• n eigenvectors $(w_1, w_2, \ldots, w_n)$ that form an orthonormal basis in the n-dimensional space:
$w_i'w_j = \begin{cases} 1, & i = j; \\ 0, & i \ne j. \end{cases}$
• n non-negative eigenvalues that are the data variances along the corresponding eigenvectors:
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$

• The direction of the maximum variance is given by the eigenvector $w_1$ corresponding to the largest eigenvalue $\lambda_1$, and the variance of the projection onto this direction equals $\lambda_1$.
• The direction $w_1$ is called the first principal axis, $w_2$ the second principal axis, and so on.
[Figure: data cloud with the principal axes $w_1$ and $w_2$.]
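In MATLAB the principal axes are obtained directly from the covariance matrix (a minimal sketch, assuming a data matrix X with one observation per row):

Xc = X - repmat(mean(X), size(X,1), 1);    % center the data
C  = (Xc' * Xc) / size(X,1);               % covariance matrix (1/l normalization)
[W, L] = eig(C);                           % columns of W are eigenvectors
[lambda, idx] = sort(diag(L), 'descend');  % order by decreasing variance
W = W(:, idx);                             % W(:,1) is the first principal axis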
1) Decorrelation property of PCA
• Let's represent the vector x as a linear combination of the n eigenvectors with coefficients $a_i$:
$x = a_1 w_1 + a_2 w_2 + \cdots + a_n w_n,$
where the coefficients $a_i \equiv P_{w_i}(x) = x'w_i = (w_i)'x$ are computed as the projection of the vector x onto the basis vectors $w_i$.

• Calculate the correlation of the coefficients $a_i$ and $a_j$:
$E\{a_i a_j\} = E\{w_i'x \cdot x'w_j\} = w_i'\,E\{xx'\}\,w_j = w_i' C w_j.$
• From the main property of eigenvalues and eigenvectors it follows that any pair of the coefficients is uncorrelated:
$E\{a_i a_j\} = w_i' C w_j = \lambda_j w_i'w_j = \lambda_j \delta_{i,j},$
where $\delta_{i,j}$ is the Kronecker symbol:
$\delta_{i,j} = \begin{cases} 1, & i = j; \\ 0, & i \ne j. \end{cases}$
• Projections are pairwise uncorrelated: $E\{a_i a_j\} = 0,\ i \ne j.$
• The variance of each projection is the eigenvalue: $E\{a_i^2\} = \lambda_i.$
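The decorrelation property is easy to check numerically (continuing the variables from the sketch above):

A = Xc * W;                   % coefficients a_i for every data point (one row each)
S = (A' * A) / size(A, 1);    % sample covariance of the coefficients
% S is diagonal up to rounding, and diag(S) matches lambda.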
2) PCA dimensionality reduction:
• The objective of PCA is to perform dimensionality reduction while preserving as much of the data in the high-dimensional space as possible:
* for visualization,
* for compression,
* to discard data containing little or no information.
Demonstrations with 1-D and 2-D data points in 3-D space on transparencies.
2) PCA dimensionality reduction: Main idea
• Find the m first eigenvectors corresponding to the m largest eigenvalues.
• Project the data points onto the subspace spanned by the first m eigenvectors.

• Input data (expanded in the full eigenvector basis):
$x = \sum_{k=1}^{n} (x'w_k)\, w_k$
• Data projected onto the subspace spanned by the first m eigenvectors:
$P_{w_1,\ldots,w_m}(x) = \sum_{k=1}^{m} (x'w_k)\, w_k$
• Error caused by the dimensionality reduction:
$\delta x = x - P_{w_1,\ldots,w_m}(x) = \sum_{k=m+1}^{n} (x'w_k)\, w_k$
• Variance of the error:
$\sigma_m^2 = E\{\|x - P_{w_1,\ldots,w_m}(x)\|^2\} = \sum_{k=m+1}^{n} \lambda_k$

• The variance of the error is equal to the sum of the eigenvalues for the dropped-out dimensions.
• PCA gives an (in some sense) optimal representation of the data in a low-dimensional subspace of the original high-dimensional pattern space: PCA provides the minimal mean squared error among all linear transformations.
• This subspace is spanned by the first m eigenvectors of the covariance matrix C corresponding to the m largest eigenvalues.
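The projection and the error-variance formula, in the same notation (a sketch continuing the earlier snippet; m is the number of retained components):

Wm  = W(:, 1:m);                  % first m principal axes
P   = Xc * (Wm * Wm');            % data projected back into the original space
err = Xc - P;                     % error delta-x for every point
mean(sum(err.^2, 2))              % ~ sum(lambda(m+1:end)), the dropped eigenvalues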
Data "whitening"
Let's introduce a new basis by scaling the eigenvectors:
$u_k = \frac{w_k}{\sqrt{\lambda_k}}.$
• The new u basis vectors are mutually orthogonal: $u_i'u_j = 0$ for $i \ne j$.
• Variances of the projections in the new basis u are equal to unity: $\sigma_u^2 = 1.$
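Whitening in the same notation (a sketch; assumes all eigenvalues are strictly positive):

U = W * diag(1 ./ sqrt(lambda));  % scaled basis u_k = w_k / sqrt(lambda_k)
Z = Xc * U;                       % whitened data
(Z' * Z) / size(Z, 1)             % ~ identity: unit variance, no correlation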
Scatter matrix - eigendecomposition
• S is symmetric
⇒ S has an eigendecomposition: $S = V \Lambda V^T$, where $V = [v_1\ v_2\ \cdots\ v_n]$ holds the eigenvectors as columns and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.
• The eigenvectors form an orthogonal basis.
Principal components
• S measures the "scatterness" of the data.
• Eigenvectors that correspond to big eigenvalues are the directions in which the data has strong components.
• If the eigenvalues are more or less the same, there is no preferable direction.
Principal components
• There is no preferable direction: S looks like
$\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$
and any vector is an eigenvector.
• There is a clear preferable direction: S looks like
$V \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix} V^T,$
where $\mu$ is close to zero, much smaller than $\lambda$.
How to use what we got
• For finding an oriented bounding box, we simply compute the bounding box with respect to the axes defined by the eigenvectors. The origin is at the mean point m.
[Figure: oriented bounding box aligned with the eigenvectors $v_1$, $v_2$, $v_3$.]
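A 2-D sketch of that computation (continuing the earlier variables; W holds the eigenvector columns):

mu = mean(X);
A  = (X - repmat(mu, size(X,1), 1)) * W;   % coordinates along the principal axes
lo = min(A);  hi = max(A);                 % box extents per axis
corners = [lo(1) lo(2); hi(1) lo(2); hi(1) hi(2); lo(1) hi(2)] * W' + repmat(mu, 4, 1);
% back-projected corners of the oriented bounding box (W orthogonal, so inv(W) = W')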
For approximation
[Figure: 2-D data with principal axes $v_1$, $v_2$. The projected data set approximates the original data set; the line segment along $v_1$ approximates the original data set.]
For approximation
• In general dimension d, the eigenvalues are
sorted in descending order:
λ1 ≥ λ2 ≥ … ≥ λd
• The eigenvectors are sorted accordingly.
• To get an approximation of dimension d’ <
d, we take the d’ first eigenvectors and look
at the subspace they span (d’ = 1 is a line,
d’ = 2 is a plane…)
For approximation
• To get an approximating set, we project the original data points onto the chosen subspace:
$x_i = m + \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_{d'} v_{d'} + \cdots + \alpha_d v_d$
Projection:
$x_i' = m + \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_{d'} v_{d'} + 0 \cdot v_{d'+1} + \cdots + 0 \cdot v_d$
Optimality of approximation
• The approximation is optimal in the least-squares sense: it minimizes
$\sum_{k=1}^{n} \|x_k - x_k'\|^2.$
• The projected points have maximal variance.
[Figure: the original set, its projection on an arbitrary line, and its projection on the $v_1$ axis.]
• A line graph of the (ordered) eigenvalues is called a scree plot.
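A scree plot is a one-liner given the sorted eigenvalues lambda from the earlier snippets:

plot(lambda, '-o'); xlabel('component'); ylabel('eigenvalue'); title('Scree plot');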
How to work in practice?
• We have a training set X of size l (l n-dimensional, centered vectors), arranged row-wise:
$X = \begin{pmatrix} x_{1,1} & \ldots & x_{1,n} \\ \vdots & & \vdots \\ x_{l,1} & \ldots & x_{l,n} \end{pmatrix}$
• Evaluate the covariance matrix using the training set X:
$C = \frac{1}{l} X'X, \quad \text{where} \quad c_{k,j} = \frac{1}{l} \sum_{i=1}^{l} x_{i,k}\, x_{i,j}.$
• Find the eigenvalues and eigenvectors of the covariance matrix.
Principal Component Analysis (PCA)
takes an initial subset of the principal axes of the training data and projects the data (both training and test) into the space spanned by this set of eigenvectors.
• The data is projected onto the subspace spanned by the m first eigenvectors of the covariance matrix. The new coordinates are known as principal coordinates, with the eigenvectors referred to as principal axes.
Algorithm:
Input: dataset $X = \{x_1, x_2, \ldots, x_l\} \subseteq \mathbb{R}^n$
Process:
$\mu = \frac{1}{l} \sum_{i=1}^{l} x_i$
$C = \frac{1}{l} \sum_{i=1}^{l} (x_i - \mu)(x_i - \mu)'$
$[W, \Lambda] = \mathrm{eig}(C)$
$\tilde{x}_i = W \cdot x_i, \quad i = 1, 2, \ldots, l$
Output: transformed data $\tilde{S} = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_l\}$, keeping the first k components $\tilde{x} = \{\tilde{x}_1, \ldots, \tilde{x}_k\}$ of each vector.
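In code, the projection step of the algorithm (a sketch reusing Xc, W and lambda from the earlier snippets; k is the number of retained components, and W is transposed so that its rows are the principal axes, matching x̃ = W·x above):

Wk = W(:, 1:k)';      % rows = the k first principal axes
Xt = (Wk * Xc')';     % one transformed vector x~ per row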
Example: 8 vectors in 2-D space
X=[1,2; 3,3; 3,5; 5,4; 5,6; 6,5; 8,7; 9,8];
• Mean values:
$\mu_1 = \frac{1}{8}(1 + 3 + 3 + 5 + 5 + 6 + 8 + 9) = \frac{40}{8} = 5$
$\mu_2 = \frac{1}{8}(2 + 3 + 5 + 4 + 6 + 5 + 7 + 8) = \frac{40}{8} = 5$
• Centered data: [-4,-3; -2,-2; -2,0; 0,-1; 0,1; 1,0; 3,2; 4,3]
• Covariance matrix C:
$c_{1,1} = \frac{1}{8}\left((-4)(-4) + (-2)(-2) + (-2)(-2) + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 1 + 3 \cdot 3 + 4 \cdot 4\right) = \frac{50}{8} = 6.25$
$c_{2,2} = \frac{1}{8}\left((-3)(-3) + (-2)(-2) + 0 \cdot 0 + (-1)(-1) + 1 \cdot 1 + 0 \cdot 0 + 2 \cdot 2 + 3 \cdot 3\right) = \frac{28}{8} = 3.5$
$c_{1,2} = c_{2,1} = \frac{1}{8}\left((-4)(-3) + (-2)(-2) + (-2) \cdot 0 + 0 \cdot (-1) + 0 \cdot 1 + 1 \cdot 0 + 3 \cdot 2 + 4 \cdot 3\right) = \frac{34}{8} = 4.25$
$C = \begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}$
Example: 8 vectors in 2-D space
X=[1,2; 3,3; 3,5; 5,4; 5,6; 6,5; 8,7; 9,8];
Covariance matrix C:
$C = \begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}$
Find the eigenvalues and eigenvectors of the covariance matrix C: $Cw = \lambda w$, i.e.
$\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \lambda \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$
1) Find eigenvalues
Solve the characteristic equation $\det(C - \lambda I_n) = 0$:
$\det \begin{pmatrix} 6.25-\lambda & 4.25 \\ 4.25 & 3.5-\lambda \end{pmatrix} = (6.25-\lambda)(3.5-\lambda) - 4.25 \cdot 4.25 = 0$
Eigenvalues:
$\lambda_1 = 9.34, \quad \lambda_2 = 0.41$
2) Find eigenvectors for the eigenvalues:
a) $\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix} \begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} = 9.34 \begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} \;\Rightarrow\; w_{11} = 1.376 \cdot w_{12}$
b) $\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix} \begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} = 0.41 \begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} \;\Rightarrow\; w_{22} = -1.376 \cdot w_{21}$
Normalize the eigenvectors:
$w_{11}^2 + w_{12}^2 = 1 \;\Rightarrow\; (1.376^2 + 1)\, w_{12}^2 = 1 \;\Rightarrow\; \begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} = \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix}$
$w_{21}^2 + w_{22}^2 = 1 \;\Rightarrow\; (1 + 1.376^2)\, w_{21}^2 = 1 \;\Rightarrow\; \begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} = \begin{pmatrix} -0.59 \\ 0.81 \end{pmatrix}$
Normalized eigenvectors:
$\begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{pmatrix} = \begin{pmatrix} 0.81 & -0.59 \\ 0.59 & 0.81 \end{pmatrix}$
Check orthogonality:
$w_2'w_1 = (-0.59 \quad 0.81) \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix} = 0$
Check normalization:
$w_1'w_1 = (0.81 \quad 0.59) \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix} = 1, \qquad w_2'w_2 = (-0.59 \quad 0.81) \begin{pmatrix} -0.59 \\ 0.81 \end{pmatrix} = 1$
Orthonormal basis:
$W = \begin{pmatrix} w_1' \\ \vdots \\ w_k' \end{pmatrix}$
Transformation (projection) into the new basis: $\tilde{x} = Wx$, i.e.
$\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = \begin{pmatrix} w_1' \\ w_2' \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
Example: $\tilde{x} = Wx$
$\begin{pmatrix} 0.81 & 0.59 \\ -0.59 & 0.81 \end{pmatrix} \begin{pmatrix} -4 \\ -3 \end{pmatrix} = \begin{pmatrix} -5.0 \\ -0.07 \end{pmatrix}$
(the first centered data point, expressed in the principal-axes coordinates)
Eigenvectors and eigenvalues:
$w_1 = \begin{pmatrix} 0.8086 \\ 0.5883 \end{pmatrix}, \quad w_2 = \begin{pmatrix} -0.5883 \\ 0.8086 \end{pmatrix}, \quad \lambda_1 = 9.34, \quad \lambda_2 = 0.41$
[Figure: the principal axes $w_1$, $w_2$ for the test data.]
The example with Matlab
X=[1,2; 3,3; 3,5; 5,4; 5,6; 6,5; 8,7; 9,8];
X1=X(:,1);
X2=X(:,2);
X1=X1-mean(X1); % Centered data
X2=X2-mean(X2); % Centered data
C=cov(X1,X2)    % Divided by (l-1), not l!

C =
    7.1429    4.8571
    4.8571    4.0000

[W,Lambda]=eig(C)

W =
   -0.5883   -0.8086
    0.8086   -0.5883

Lambda =
    0.4664         0
         0   10.6764

Eigenvectors do not depend on the scaling of the covariance matrix; eigenvalues do.
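Rescaling recovers the 1/l-normalized eigenvalues from the hand computation above:

sort(diag(Lambda), 'descend') * 7/8   % ans ~ [9.34; 0.41]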
Limitations of PCA
[Figure: a symmetric data set whose eigenvalues are all equal, $\lambda_{1,2,3} = 1/3$: PCA finds no preferable direction.]
Case Study 1: Gender Classification
• Determine the gender of a subject from facial images.
– Race, age, facial expression, hair style, etc.
Z. Sun, G. Bebis, X. Yuan, and S. Louis, "Genetic Feature Subset
Selection for Gender Classification: A Comparison Study", IEEE
Workshop on Applications of Computer Vision, pp. 165-170,
Orlando, December 2002.
Feature Extraction Using PCA
• PCA maps the data in a lower-dimensional space
using a linear transformation.
• The columns of the projection matrix are the “best”
eigenvectors (i.e., eigenfaces) of the covariance
matrix of the data.
Which eigenvectors encode mostly gender information?
[Images: eigenfaces EV#1, EV#2, EV#3, EV#4, EV#5, EV#6, EV#8, EV#10, EV#12, EV#14, EV#19, EV#20.]
Dataset
• 400 frontal images from 400 different people
– 200 male, 200 female
– Different races, lighting conditions, and facial expressions
• Images were registered and normalized
– No hair information
– Account for different lighting conditions
Experiments
• Classifiers
– LDA
– Bayes classifier
– Neural Networks (NNs)
– Support Vector Machines (SVMs)
• Comparison with SFBS
• Three-fold cross validation
– Training set: 75% of the data
– Validation set: 12.5% of the data
– Test set: 12.5% of the data
Error Rates
ERM: error rate using top eigenvectors
ERG: error rate using GA-selected eigenvectors
[Chart: ERM vs ERG per classifier; reported values: 17.7%, 11.3%, 22.4%, 13.3%, 14.2%, 9%, 8.9%, 4.7%, 6.7%.]
Ratio of Features - Information Kept
RN: percentage of eigenvectors in the feature subset selected.
RI: percentage of information contained in the eigenvector subset selected.
[Chart: RN/RI per classifier; reported value pairs: 17.6%/38%, 13.3%/31%, 36.4%/61.2%, 8.4%/32.4%, 42.8%/69.0%.]
Distribution of Selected Eigenvectors
[Histograms: (a) LDA, (b) Bayes, (c) NN, (d) SVMs.]
Reconstructed Images
Reconstructed faces using GA-selected EVs do not contain information
about identity but disclose strong gender information!
Original images
Using top 30 EVs
Using EVs selected
by SVM+GA
Using EVs selected
by NN+GA
Comparison with SFBS
Original images
Top 30 EVs
EVs selected
by SVM+GA
EVs selected
by SVM+SFBS
Case Study 2: Vehicle Detection
• Low-light camera, rear views.
• The non-vehicle class is much larger than the vehicle class.
Z. Sun, G. Bebis, and R. Miller, "Object Detection Using
Feature Subset Selection", Pattern Recognition,
vol. 37, pp. 2165-2176, 2004.
Ford Motor Company
Which eigenvectors encode the most important vehicle features?
Experiments
• Training data set (collected in Fall 2001)
 2102 images (1051 vehicles and 1051 non-vehicles)
• Test data sets (collected in Summer 2001)
 231 images (vehicles and non-vehicles)
• Comparison with SFBS
• Three-fold cross-validation
• SVM for classification
Error Rate
[Chart: test error rates 10.24% and 6.49%.]
Histograms of Selected Eigenvectors
[Histograms: SFBS-SVM and GA-SVM.]
Number of eigenvectors selected by SFBS: 87 (43.5% information)
Number of eigenvectors selected by GA: 46 (23% information)
Vehicle Detection
[Images: Original; Top 50 EVs; EVs selected by SFBS; EVs selected by GAs.]
Reconstructed images using the selected feature subsets.
- Lighting differences have been disregarded by the GA approach.

Contenu connexe

Tendances

[1312.5602] Playing Atari with Deep Reinforcement Learning
[1312.5602] Playing Atari with Deep Reinforcement Learning[1312.5602] Playing Atari with Deep Reinforcement Learning
[1312.5602] Playing Atari with Deep Reinforcement LearningSeung Jae Lee
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsLionel Briand
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIAnand Joshi
 
[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand ManipulationSeung Jae Lee
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingLionel Briand
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesLionel Briand
 
Chaos Presentation
Chaos PresentationChaos Presentation
Chaos PresentationAlbert Yang
 
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Dongmin Lee
 
The “Bellwether” Effect and Its Implications to Transfer Learning
The “Bellwether” Effect and Its Implications to Transfer LearningThe “Bellwether” Effect and Its Implications to Transfer Learning
The “Bellwether” Effect and Its Implications to Transfer LearningRahul Krishna
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISCOGS Presentations
 
Feature recognition and classification
Feature recognition and classificationFeature recognition and classification
Feature recognition and classificationSooraz Sresta
 
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLOCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLLionel Briand
 
Feature Selection for Document Ranking
Feature Selection for Document RankingFeature Selection for Document Ranking
Feature Selection for Document RankingAndrea Gigli
 
Keynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingKeynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingLionel Briand
 

Tendances (15)

[1312.5602] Playing Atari with Deep Reinforcement Learning
[1312.5602] Playing Atari with Deep Reinforcement Learning[1312.5602] Playing Atari with Deep Reinforcement Learning
[1312.5602] Playing Atari with Deep Reinforcement Learning
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance Systems
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAI
 
[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation[1808.00177] Learning Dexterous In-Hand Manipulation
[1808.00177] Learning Dexterous In-Hand Manipulation
 
Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven Strategies
 
Chaos Presentation
Chaos PresentationChaos Presentation
Chaos Presentation
 
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
 
The “Bellwether” Effect and Its Implications to Transfer Learning
The “Bellwether” Effect and Its Implications to Transfer LearningThe “Bellwether” Effect and Its Implications to Transfer Learning
The “Bellwether” Effect and Its Implications to Transfer Learning
 
Unit1 pg math model
Unit1 pg math modelUnit1 pg math model
Unit1 pg math model
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGIS
 
Feature recognition and classification
Feature recognition and classificationFeature recognition and classification
Feature recognition and classification
 
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLOCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
 
Feature Selection for Document Ranking
Feature Selection for Document RankingFeature Selection for Document Ranking
Feature Selection for Document Ranking
 
Keynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingKeynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based Testing
 

Similaire à Nbvtalkonfeatureselection

few common Feature of Size Datum Features are bores, cylinders, slots, or tab...
few common Feature of Size Datum Features are bores, cylinders, slots, or tab...few common Feature of Size Datum Features are bores, cylinders, slots, or tab...
few common Feature of Size Datum Features are bores, cylinders, slots, or tab...DrPArivalaganASSTPRO
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...IRJET Journal
 
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...IJTET Journal
 
Dimensionality Reduction in Machine Learning
Dimensionality Reduction in Machine LearningDimensionality Reduction in Machine Learning
Dimensionality Reduction in Machine LearningRomiRoy4
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Dimensionality Reduction.pptx
Dimensionality Reduction.pptxDimensionality Reduction.pptx
Dimensionality Reduction.pptxPriyadharshiniG41
 
A Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature SelectionA Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature SelectionDavide Nardone
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmVaibhav Varshney
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine LearningMehwish690898
 
C11BD 22-23 data ana-Exploration II.pptx
C11BD 22-23 data ana-Exploration II.pptxC11BD 22-23 data ana-Exploration II.pptx
C11BD 22-23 data ana-Exploration II.pptxTariqqandeel
 
background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaLuca Marignati
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Feature selection for classification
Feature selection for classificationFeature selection for classification
Feature selection for classificationefcastillo744
 

Similaire à Nbvtalkonfeatureselection (20)

few common Feature of Size Datum Features are bores, cylinders, slots, or tab...
few common Feature of Size Datum Features are bores, cylinders, slots, or tab...few common Feature of Size Datum Features are bores, cylinders, slots, or tab...
few common Feature of Size Datum Features are bores, cylinders, slots, or tab...
 
Feature Selection.pdf
Feature Selection.pdfFeature Selection.pdf
Feature Selection.pdf
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...Network Based Intrusion Detection System using Filter Based Feature Selection...
Network Based Intrusion Detection System using Filter Based Feature Selection...
 
Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...Optimization Technique for Feature Selection and Classification Using Support...
Optimization Technique for Feature Selection and Classification Using Support...
 
Dimensionality Reduction in Machine Learning
Dimensionality Reduction in Machine LearningDimensionality Reduction in Machine Learning
Dimensionality Reduction in Machine Learning
 
Module-4_Part-II.pptx
Module-4_Part-II.pptxModule-4_Part-II.pptx
Module-4_Part-II.pptx
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Dimensionality Reduction.pptx
Dimensionality Reduction.pptxDimensionality Reduction.pptx
Dimensionality Reduction.pptx
 
A Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature SelectionA Sparse-Coding Based Approach for Class-Specific Feature Selection
A Sparse-Coding Based Approach for Class-Specific Feature Selection
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine Learning
 
C11BD 22-23 data ana-Exploration II.pptx
C11BD 22-23 data ana-Exploration II.pptxC11BD 22-23 data ana-Exploration II.pptx
C11BD 22-23 data ana-Exploration II.pptx
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
Presentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in InformaticaPresentazione Tesi Laurea Triennale in Informatica
Presentazione Tesi Laurea Triennale in Informatica
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Feature selection for classification
Feature selection for classificationFeature selection for classification
Feature selection for classification
 
Data reduction
Data reductionData reduction
Data reduction
 
DMW.pptx
DMW.pptxDMW.pptx
DMW.pptx
 

Plus de Nagasuri Bala Venkateswarlu

Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Nagasuri Bala Venkateswarlu
 
Let us explore How To solve Technical Education in India.
Let us explore How To solve Technical Education in India.Let us explore How To solve Technical Education in India.
Let us explore How To solve Technical Education in India.Nagasuri Bala Venkateswarlu
 
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...Do We need to rejuvenate our self in Statistics to herald the 21st Century re...
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...Nagasuri Bala Venkateswarlu
 

Plus de Nagasuri Bala Venkateswarlu (20)

Building mathematicalcraving
Building mathematicalcravingBuilding mathematicalcraving
Building mathematicalcraving
 
Nbvtalkonmoocs
NbvtalkonmoocsNbvtalkonmoocs
Nbvtalkonmoocs
 
Nbvtalkatbzaonencryptionpuzzles
NbvtalkatbzaonencryptionpuzzlesNbvtalkatbzaonencryptionpuzzles
Nbvtalkatbzaonencryptionpuzzles
 
Nbvtalkon what is engineering(Revised)
Nbvtalkon what is engineering(Revised)Nbvtalkon what is engineering(Revised)
Nbvtalkon what is engineering(Revised)
 
Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.
 
Nbvtalkon what is engineering
Nbvtalkon what is engineeringNbvtalkon what is engineering
Nbvtalkon what is engineering
 
Fourth paradigm
Fourth paradigmFourth paradigm
Fourth paradigm
 
Let us explore How To solve Technical Education in India.
Let us explore How To solve Technical Education in India.Let us explore How To solve Technical Education in India.
Let us explore How To solve Technical Education in India.
 
Nbvtalkstaffmotivationataitam
NbvtalkstaffmotivationataitamNbvtalkstaffmotivationataitam
Nbvtalkstaffmotivationataitam
 
top 10 Data Mining Algorithms
top 10 Data Mining Algorithmstop 10 Data Mining Algorithms
top 10 Data Mining Algorithms
 
Dip
DipDip
Dip
 
Anits dip
Anits dipAnits dip
Anits dip
 
Bglrsession4
Bglrsession4Bglrsession4
Bglrsession4
 
Gmrit2
Gmrit2Gmrit2
Gmrit2
 
Introduction to socket programming nbv
Introduction to socket programming nbvIntroduction to socket programming nbv
Introduction to socket programming nbv
 
Clusteryanam
ClusteryanamClusteryanam
Clusteryanam
 
Nbvtalkataitamimageprocessingconf
NbvtalkataitamimageprocessingconfNbvtalkataitamimageprocessingconf
Nbvtalkataitamimageprocessingconf
 
Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
 
Webinaron muticoreprocessors
Webinaron muticoreprocessorsWebinaron muticoreprocessors
Webinaron muticoreprocessors
 
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...Do We need to rejuvenate our self in Statistics to herald the 21st Century re...
Do We need to rejuvenate our self in Statistics to herald the 21st Century re...
 

Dernier

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 

Dernier (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Nbvtalkonfeatureselection

  • 1. Feature Selection and Extraction :Feature Selection and Extraction : An Introduction with emphasis onAn Introduction with emphasis on Principal Component AnalysisPrincipal Component Analysis Dr. N. B.Venkateswarlu, AITAM,Tekkali
  • 2. What I am going to cover? • What is feature selection/extraction • Need and discussion • Methodologies • PCA
  • 4. 4 What Is Feature Selection? • Selecting the most “relevant” subset of attributes according to some selection criteria.
  • 5. 5 Why Feature Selection? • High-dimensional data often contain irrelevant or redundant features – reduce the accuracy of data mining algorithms – slow down the mining process – be a problem in storage and retrieval – hard to interpret
  • 6. Why feature selection is important? • May improve performance of learning algorithm • Learning algorithm may not scale up to the size of the full feature set either in sample or time • Allows us to better understand the domain • Cheaper to collect a reduced set of features
  • 7. What is feature selection? cat 2 and 35 it 20 kitten 8 electric 2 trouble 4 then 5 several 9 feline 2 while 4 … lemon 2 cat 2 kitten 8 feline 2 Vegetarian No Plays video games Yes Family history No Athletic No Smoker Yes Sex Male Lung capacity 5.8L Hair color Red Car Audi … Weight 185 lbs Family history No Smoker Yes Task: classify whether a document is about cats Data: word counts in the document Task: predict chances of lung disease Data: medical history survey X X Reduced X Reduced X
  • 8. Characterising features • Generally, features are characterized as: – Relevant: These are features which have an influence on the output and their role can not be assumed by the rest – Irrelevant: Irrelevant features are defined as those features not having any influence on the output, and whose values are generated at random for each example. – Redundant: A redundancy exists whenever a feature can take the role of another (perhaps the simplest way to model redundancy).
  • 10.
  • 12. 12 Challenges in Feature Selection (1)Challenges in Feature Selection (1) • Dealing with ultra-high dimensional data and feature interactions Traditional feature selection encounter two major problems when the dimensionality runs into tens or hundreds of thousands: 1. curse of dimensionality 2. the relative shortage of instances.
  • 13. 13 Challenges in Feature Selection (2)Challenges in Feature Selection (2) • Dealing with active instances (Liu et al., 2005) When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary: – Random sampling (pure random sampling without exploiting any data characteristics) – Active feature selection (selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances).
  • 14. 14 Challenges in Feature Selection (3)Challenges in Feature Selection (3) • Dealing with new data types (Liu et al., 2005) – traditional data type: an N*M data matrix Due to the growth of computer and Internet/Web techniques, new data types are emerging: – text-based data (e.g., e-mails, online news, newsgroups) – semistructure data (e.g., HTML, XML) – data streams.
  • 15. 15 Challenges in Feature Selection (4)Challenges in Feature Selection (4) • Unsupervised feature selection – Feature selection vs classification: almost every classification algorithm – Subspace method with the curse of dimensionality in classification – Subspace clustering.
  • 16. 16 Challenges in Feature Selection (5)Challenges in Feature Selection (5) • Dealing with predictive-but-unpredictable attributes in noisy data – Attribute noise is difficult to process, and removing noisy instances is dangerous – Predictive attributes: essential to classification – Unpredictable attributes: cannot be predicted by the class and other attributes • Noise identification, cleansing, and measurement need special attention [Yang et al., 2004]
  • 17. Feature Selection Methods
  • Feature selection is an optimization problem:
  – Search the space of possible feature subsets.
  – Pick the subset that is optimal or near-optimal with respect to a certain criterion.
  • Search strategies: optimal, heuristic, randomized.
  • Evaluation strategies: filter methods, wrapper methods.
  • 18. Evaluation Strategies
  • Filter methods: evaluation is independent of the classification algorithm and its error criteria.
  • Wrapper methods: evaluation uses a criterion related to the classification algorithm.
  • Wrapper methods generally provide more accurate solutions than filter methods, but are more computationally expensive.
  • 19. Typical Feature Selection – First step: Generation
  [Flow diagram: Original Feature Set → (1) Generation → subset → (2) Evaluation → goodness of the subset → (3) Stopping Criterion → loop back to Generation on "No", proceed to (4) Validation on "Yes".]
  • Generates a subset of features for evaluation. Can start with:
  – no features,
  – all features, or
  – a random subset of features.
  • 20. Typical Feature Selection – Second step: Evaluation
  • Measures the goodness of the subset.
  • Compares it with the previous best subset; if the new one is found better, it replaces the previous best subset.
  • 21. Typical Feature Selection – Third step: Stopping Criterion
  • Based on the generation procedure:
  – a pre-defined number of features, or
  – a pre-defined number of iterations.
  • Based on the evaluation function:
  – whether addition or deletion of a feature no longer produces a better subset, or
  – whether an optimal subset under the chosen evaluation function has been reached.
  • 22. Typical Feature Selection – Fourth step: Validation
  • Not strictly part of the feature selection process itself: compare the result with already established results, or with results from competing feature selection methods.
  • 23. Exhaustive Search
  • Assuming m features, an exhaustive search would require:
  – examining all $\binom{m}{n}$ possible subsets of size n;
  – selecting the subset that performs best according to the criterion function.
  • The number of subsets grows combinatorially, making exhaustive search impractical.
  • Iterative procedures are often used instead, but they cannot guarantee selection of the optimal subset.
  • 24. Naïve Search
  • Sort the given d features in order of their individual probability of correct recognition.
  • Select the top m features from this sorted list (sketch below).
  • Disadvantages:
  – Feature correlation is not considered.
  – The best pair of features may not even contain the best individual feature.
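  A minimal Matlab/Octave sketch of this ranking scheme. Here featureScore is a placeholder for whichever per-feature criterion is used (for example, the recognition rate of a classifier trained on that single feature); X, y, and m are assumed to be the l-by-d data matrix, the class labels, and the number of features to keep.

  % Naive (ranking) selection -- a sketch; featureScore is a
  % user-supplied placeholder criterion, not a built-in function.
  d = size(X, 2);
  scores = zeros(1, d);
  for j = 1:d
      scores(j) = featureScore(X(:, j), y);  % rate feature j in isolation
  end
  [~, order] = sort(scores, 'descend');
  selected = order(1:m);                     % the m top-ranked features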
  • 25. Sequential forward selection (SFS) (heuristic search)
  • First, the best single feature is selected (i.e., using some criterion function).
  • Then, pairs of features are formed using this best feature together with each of the remaining features, and the best pair is selected.
  • Next, triplets of features are formed using these two best features together with each of the remaining features, and the best triplet is selected.
  • This procedure continues until a predefined number of features is selected; a sketch follows below.
  • SFS performs best when the optimal subset is small.
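  A minimal Matlab/Octave sketch of SFS, assuming J is a handle for the criterion function to be maximized (for example, class separability or cross-validated accuracy on the candidate subset); d is the total number of features and m the number to select.

  % Sequential forward selection -- a sketch.
  function selected = sfs(J, d, m)
      selected  = [];                        % start from the empty set
      remaining = 1:d;
      for step = 1:m
          bestScore = -inf; bestFeat = remaining(1);
          for f = remaining                  % try adding each candidate
              s = J([selected f]);           % criterion on enlarged subset
              if s > bestScore
                  bestScore = s; bestFeat = f;
              end
          end
          selected  = [selected bestFeat];   % keep the best addition
          remaining = setdiff(remaining, bestFeat);
      end
  end

  SBS, described next, is the mirror image: start from all d features and repeatedly drop the feature whose removal degrades the criterion the least.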
  • 26. Example 26 Results of sequential forward feature selection for classification of a satellite image using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the features added at each iteration (the first iteration is at the bottom). The highest accuracy value is shown with a star.
  • 27. Sequential backward selection (SBS) (heuristic search) • First, the criterion function is computed for all d features. • Then, each feature is deleted one at a time, the criterion function is computed for all subsets with d − 1 features, and the worst feature is discarded. • Next, each feature among the remaining d − 1 is deleted one at a time, and the worst feature is discarded to form a subset with d − 2 features. • This procedure continues until a predefined number of features are left. 27 SBS performs best when the optimal subset is large.
  • 28. Example 28 Results of sequential backward feature selection for classification of a satellite image using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the features removed at each iteration (the first iteration is at the bottom). The highest accuracy value is shown with a star.
  • 29. Plus-L minus-R selection (LRS) • A generalization of SFS and SBS – If L>R, LRS starts from the empty set and repeatedly adds L features and removes R features. – If L<R, LRS starts from the full set and repeatedly removes R features and adds L features. • Comments – LRS attempts to compensate for the weaknesses of SFS and SBS with some backtracking capabilities. – How to choose the optimal values of L and R?
  • 30. Bidirectional Search (BDS) • BDS applies SFS and SBS simultaneously: – SFS is performed from the empty set – SBS is performed from the full set • To guarantee that SFS and SBS converge to the same solution – Features already selected by SFS are not removed by SBS – Features already removed by SBS are not selected by SFS
  • 31. Sequential floating selection (SFFS and SFBS) • An extension to LRS with flexible backtracking capabilities – Rather than fixing the values of L and R, floating methods determine these values from the data. – The dimensionality of the subset during the search can be thought to be “floating” up and down • There are two floating methods: – Sequential floating forward selection (SFFS) – Sequential floating backward selection (SFBS) P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection, Pattern Recognition Lett. 15 (1994) 1119–1125.
  • 32. Sequential floating selection (SFFS and SFBS) • SFFS – Sequential floating forward selection (SFFS) starts from the empty set. – After each forward step, SFFS performs backward steps as long as the objective function increases. • SFBS – Sequential floating backward selection (SFBS) starts from the full set. – After each backward step, SFBS performs forward steps as long as the objective function increases.
  • 33. Argument for wrapper methods • The estimated accuracy of the learning algorithm is the best available heuristic for measuring the values of features. • Different learning algorithms may perform better with different feature sets, even if they are using the same training set.
  • 34. Wrapper selection algorithms (1) • The simplest method is forward selection (FS). It starts with the empty set and greedily adds features one at a time (without backtracking). • Backward stepwise selection (BS) starts with all features in the feature set and greedily removes them one at a time (without backtracking).
  • 35. Wrapper selection algorithms (2)
  • The best-first search starts with an empty set of features and generates all possible single-feature expansions. The subset with the highest evaluation is chosen and is expanded in the same manner by adding single features (with backtracking). Best-first search can be combined with forward (BFFS) or backward (BFBS) selection.
  • Genetic algorithm selection. A solution is typically a fixed-length binary string representing a feature subset; the value of each position in the string represents the presence or absence of a particular feature. The algorithm is an iterative process in which each successive generation is produced by applying genetic operators such as crossover and mutation to the members of the current generation (sketch below).
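  As an illustration of the genetic-algorithm wrapper, here is a minimal Matlab/Octave sketch. The function fitness is a placeholder for the wrapper criterion (for instance, the validation accuracy of a classifier trained on the features whose bits are set); d, popSize, nGen, and pMut are illustrative settings, and standard refinements such as elitism are omitted.

  % GA feature selection -- a sketch; each individual is a binary
  % mask over the d candidate features.
  d = 20; popSize = 30; nGen = 50; pMut = 1/d;   % illustrative settings
  pop = rand(popSize, d) > 0.5;                  % random initial masks
  fit = zeros(popSize, 1);
  for g = 1:nGen
      for i = 1:popSize
          fit(i) = fitness(pop(i, :));           % evaluate each subset
      end
      newPop = false(popSize, d);
      for i = 1:popSize
          c = randi(popSize, 2, 2);              % two 2-way tournaments
          [~, k1] = max(fit(c(:, 1))); p1 = pop(c(k1, 1), :);
          [~, k2] = max(fit(c(:, 2))); p2 = pop(c(k2, 2), :);
          cut = randi(d - 1);                    % single-point crossover
          child = [p1(1:cut) p2(cut+1:end)];
          child = xor(child, rand(1, d) < pMut); % bit-flip mutation
          newPop(i, :) = child;
      end
      pop = newPop;
  end
  for i = 1:popSize, fit(i) = fitness(pop(i, :)); end
  [~, best] = max(fit);
  bestMask = pop(best, :);                       % selected feature subset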
  • 36. Disadvantages of Support Vector Machines
  • "Perhaps the biggest limitation of the support vector approach lies in choice of the kernel." Burges (1998)
  • "A second limitation is speed and size, both in training and testing." Burges (1998)
  • "Discrete data presents another problem..." Burges (1998)
  • "...the optimal design for multiclass SVM classifiers is a further area for research." Burges (1998)
  • 37. "Although SVMs have good generalization performance, they can be abysmally slow in test phase, a problem addressed in (Burges, 1996; Osuna and Girosi, 1998)." Burgess (1998) "Besides the advantages of SVMs - from a practical point of view - they have some drawbacks. An important practical question that is not entirely solved, is the selection of the kernel function parameters - for Gaussian kernels the width parameter [sigma] - and the value of [epsilon] in the [epsilon]-insensitive loss function...[more]" Horváth (2003) in Suykens et al. "However, from a practical point of view perhaps the most serious problem with SVMs is the high algorithmic complexity and extensive memory requirements of the required quadratic programming in large- scale tasks." Horváth (2003) in Suykens et al. p 392
  • 39. • Other names for PCA: 1) Karhunen-Loève Transform (KLT); 2) Hotelling Transform; 3) Eigenvector Analysis.
  • Properties of PCA: 1) data decorrelation; 2) dimensionality reduction.
  • 40. Important Objective • The goal of principal component analysis is to take n variables x1, x2,…, xn and find linear combinations of these variables to produce a new set of variables y1, y2, …, yn that are uncorrelated. The transformed variables are indexed or ordered so that y1 shows the largest amount of variation, y2 has the second largest amount of variation, and so on.
  • 41. PCA – the general idea
  • PCA finds an orthogonal basis that best represents a given data set.
  [Figure: a 2-D point cloud with the original axes (x, y) and the rotated principal axes (x′, y′).]
  • 42. PCA – the general idea
  • PCA finds an orthogonal basis that best represents a given data set.
  • PCA finds the best approximating plane (again, in terms of $\sum \mathrm{distances}^2$).
  [Figure: a 3-D point set in the standard basis (x, y, z).]
  • 44. Application: finding a tight bounding box
  • An axis-aligned bounding box: agrees with the axes.
  [Figure: a 2-D point set enclosed by the box (minX, maxX) × (minY, maxY).]
  • 45. Usage of bounding boxes (bounding volumes)
   Serve as a very simple "approximation" of the object.
   Fast collision detection, visibility queries.
   Useful whenever we need to know the dimensions (size) of the object.
   The models consist of thousands of polygons; to quickly test that two models do not intersect, their bounding boxes are tested first.
   Sometimes a hierarchy of BBs is used.
   The tighter the BB, the fewer "false alarms" we get.
  • 46. Centered data points x in n-dimensional space:
  $\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \boldsymbol{\mu} = E\{\mathbf{x}\} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$
  where μ is the mean value of the vector x and E{f(x)} denotes the expectation of f(x).
  • Covariance matrix C for the centered data:
  $C = E\{\mathbf{x}\mathbf{x}'\}, \qquad c_{i,j} = E\{x_i x_j\}.$
  • 47. Projection of the data point x onto the direction w:
  $P_w(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w}\rangle = \mathbf{x}'\mathbf{w}.$
  • The variance of the projection onto the direction w:
  $\sigma_w^2 = E\{P_w(\mathbf{x})^2\} = E\{(\mathbf{w}'\mathbf{x})(\mathbf{x}'\mathbf{w})\} = \mathbf{w}'\,E\{\mathbf{x}\mathbf{x}'\}\,\mathbf{w} = \mathbf{w}'C\mathbf{w}.$
  • 48. So, $\sigma_w^2 = \mathbf{w}'C\mathbf{w}$. The vector w should be normalized: $\|\mathbf{w}\|^2 = \mathbf{w}'\mathbf{w} = 1$.
  • Hence, finding the normalized direction of maximal variance reduces to the following problem. Maximizing variance: the normalized direction w that maximizes the variance is found by solving
  $\max_{\mathbf{w}}\{\mathbf{w}'C\mathbf{w}\} \quad \text{subject to} \quad \mathbf{w}'\mathbf{w} = \|\mathbf{w}\|^2 = 1.$
  • 49. The constrained optimization problem is reduced to an unconstrained one using the method of Lagrange multipliers:
  $\max_{\mathbf{w}}\{\mathbf{w}'C\mathbf{w} - \lambda(\mathbf{w}'\mathbf{w} - 1)\}.$
  • Condition for a maximum of the function:
  $\frac{\partial}{\partial \mathbf{w}}\big(\mathbf{w}'C\mathbf{w} - \lambda\,\mathbf{w}'\mathbf{w}\big) = 2(C\mathbf{w} - \lambda\mathbf{w}) = 0.$
  • We therefore have to solve the equation $C\mathbf{w} = \lambda\mathbf{w}$ and find the eigenvalues λi and eigenvectors wi of the covariance matrix C.
  • 50. The covariance matrix C is symmetric, so the equation $C\mathbf{w} = \lambda\mathbf{w}$ has n solutions:
  – n eigenvectors (w1, w2, …, wn) that form an orthonormal basis of the n-dimensional space: $\mathbf{w}_i'\mathbf{w}_j = 1$ for $i = j$ and $0$ for $i \neq j$;
  – n non-negative eigenvalues, which are the data variances along the corresponding eigenvectors: λ1 ≥ λ2 ≥ … ≥ λn ≥ 0.
  • 51. The direction of maximum variance is given by the eigenvector w1 corresponding to the largest eigenvalue λ1, and the variance of the projection onto this direction equals λ1.
  • The direction w1 is called the first principal axis, the direction w2 the second principal axis, and so on.
  • 52. 1) Decorrelation property of PCA
  • Let us represent the vector x as a linear combination of the n eigenvectors with coefficients ai:
  $\mathbf{x} = a_1\mathbf{w}_1 + a_2\mathbf{w}_2 + \dots + a_n\mathbf{w}_n,$
  where the coefficients $a_i \equiv P_{w_i}(\mathbf{x}) = \mathbf{x}'\mathbf{w}_i = \mathbf{w}_i'\mathbf{x}$ are computed as the projections of the vector x onto the basis vectors wi.
  • 53. 1) Decorrelation property of PCA
  • Calculate the correlation of the coefficients ai and aj:
  $E\{a_i a_j\} = E\{\mathbf{w}_i'\mathbf{x}\mathbf{x}'\mathbf{w}_j\} = \mathbf{w}_i'\,E\{\mathbf{x}\mathbf{x}'\}\,\mathbf{w}_j = \mathbf{w}_i'C\mathbf{w}_j = \lambda_j\,\mathbf{w}_i'\mathbf{w}_j = \lambda_j\,\delta_{i,j},$
  where δi,j is the Kronecker symbol: 1 for i = j, 0 for i ≠ j.
  • Variance of each projection is the corresponding eigenvalue: $E\{a_i^2\} = \lambda_i$.
  • Projections are pairwise uncorrelated: $E\{a_i a_j\} = 0$ for $i \neq j$. A small numerical check follows below.
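  The decorrelation property is easy to check numerically. A minimal Matlab/Octave sketch on a synthetic correlated sample (the mixing matrix is arbitrary); the centering step uses implicit expansion, available in Octave and in Matlab R2016b and later.

  % Decorrelation check: the covariance of the projected coefficients
  % should be (nearly) diagonal, with the eigenvalues on the diagonal.
  X  = randn(500, 3) * [2 0 0; 1 1 0; 0.5 0.5 0.2];  % correlated sample
  Xc = X - mean(X);                  % center the data
  C  = (Xc' * Xc) / size(Xc, 1);     % covariance (divided by l, as above)
  [W, Lambda] = eig(C);
  A = Xc * W;                        % coefficients a_i = x' w_i
  covA = (A' * A) / size(A, 1)       % ~ diag(lambda_1, ..., lambda_n)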
  • 54. 2) PCA dimensionality reduction:
  • The objective of PCA is to perform dimensionality reduction while preserving as much of the information in the high-dimensional space as possible:
  – for visualization,
  – for compression,
  – to discard data components carrying little or no information.
  • Demonstrations with 1-D and 2-D data points on transparencies in 3-D space.
  • 55. 2) PCA dimensionality reduction: main idea
  • Find the m eigenvectors corresponding to the m largest eigenvalues.
  • Project the data points onto the subspace spanned by the first m eigenvectors:
  $P_{\mathbf{w}_1,\dots,\mathbf{w}_m}(\mathbf{x}) = \sum_{k=1}^{m} (\mathbf{x}'\mathbf{w}_k)\,\mathbf{w}_k.$
  • 56. • Input data: $\mathbf{x} = \sum_{k=1}^{n} (\mathbf{x}'\mathbf{w}_k)\,\mathbf{w}_k.$
  • Data projected onto the subspace spanned by the first m eigenvectors:
  $P_{\mathbf{w}_1,\dots,\mathbf{w}_m}(\mathbf{x}) = \sum_{k=1}^{m} (\mathbf{x}'\mathbf{w}_k)\,\mathbf{w}_k.$
  • Error caused by the dimensionality reduction:
  $\delta\mathbf{x} = \mathbf{x} - P_{\mathbf{w}_1,\dots,\mathbf{w}_m}(\mathbf{x}) = \sum_{k=m+1}^{n} (\mathbf{x}'\mathbf{w}_k)\,\mathbf{w}_k.$
  • Variance of the error:
  $\sigma^2 = E\big\{\|\mathbf{x} - P_{\mathbf{w}_1,\dots,\mathbf{w}_m}(\mathbf{x})\|^2\big\} = \sum_{k=m+1}^{n} \lambda_k.$
  • 57. The variance of the error is equal to the sum of the eigenvalues of the dropped dimensions:
  $\sigma^2 = E\big\{\|\mathbf{x} - P_{\mathbf{w}_1,\dots,\mathbf{w}_m}(\mathbf{x})\|^2\big\} = \sum_{k=m+1}^{n} \lambda_k.$
  • In this sense PCA gives an optimal representation of the data in a low-dimensional subspace of the original high-dimensional pattern space: it provides the minimal mean squared error among all linear transformations (numerical check below).
  • This subspace is spanned by the first m eigenvectors of the covariance matrix C corresponding to the m largest eigenvalues.
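  Continuing the sketch above, the claim can be checked numerically: the mean squared reconstruction error should match the sum of the dropped eigenvalues up to sampling noise.

  [lam, idx] = sort(diag(Lambda), 'descend');
  W = W(:, idx);                          % reorder by decreasing variance
  m = 1;                                  % number of components kept
  Xhat = (Xc * W(:, 1:m)) * W(:, 1:m)';   % project, then reconstruct
  mse  = mean(sum((Xc - Xhat).^2, 2))     % mean squared error per point
  sum(lam(m+1:end))                       % ~ mse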
  • 58. Data "whitening"
  • Introduce a new basis by scaling the eigenvectors:
  $\mathbf{u}_k = \frac{\mathbf{w}_k}{\sqrt{\lambda_k}}.$
  • The new basis u remains orthogonal ($\mathbf{u}_i'\mathbf{u}_j = 0$ for $i \neq j$), and the variances of the projections onto the new basis vectors are all equal to one: $\sigma_u^2 = 1$ (sketch below).
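  A whitening step can be appended to the same sketch; it assumes all eigenvalues are strictly positive.

  U = W ./ sqrt(lam');             % u_k = w_k / sqrt(lambda_k), columnwise
  Z = Xc * U;                      % whitened coordinates
  covZ = (Z' * Z) / size(Z, 1)     % ~ identity matrix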
  • 59. Scatter matrix – eigendecomposition
  • S is symmetric ⇒ S has the eigendecomposition $S = V\Lambda V^T$, where the columns of $V = [\mathbf{v}_1\ \mathbf{v}_2\ \dots\ \mathbf{v}_n]$ are the eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$.
  • The eigenvectors form an orthogonal basis.
  • 60. Principal components
  • S measures the "scatterness" of the data.
  • Eigenvectors that correspond to large eigenvalues are the directions in which the data has strong components.
  • If the eigenvalues are all more or less the same, there is no preferred direction.
  • 61. Principal components
  • If there is no preferred direction, S looks like $V \begin{pmatrix}\lambda & \\ & \lambda\end{pmatrix} V^T = \lambda I$, and any vector is an eigenvector.
  • If there is a clear preferred direction, S looks like $V \begin{pmatrix}\lambda & \\ & \mu\end{pmatrix} V^T$, where μ is close to zero, much smaller than λ.
  • 62. How to use what we got
  • For finding an oriented bounding box, we simply compute the bounding box with respect to the axes defined by the eigenvectors (v1, v2, v3). The origin is at the mean point m. A sketch follows below.
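  A minimal sketch of the oriented-bounding-box computation, assuming P is an N-by-d matrix of points and V holds the eigenvectors of the scatter matrix as orthonormal columns (the variable names are illustrative).

  mu = mean(P);                 % mean point m
  Q  = (P - mu) * V;            % coordinates in the eigenvector basis
  lo = min(Q); hi = max(Q);     % box extents along each principal axis
  % a coordinate vector q within [lo, hi] maps back as p = mu + q * V'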
  • 63. For approximation
  [Figure: a 2-D point set with its principal axes v1 and v2; projecting onto v1 yields a line segment, and the projected data set approximates the original data set.]
  • 64. For approximation
  • In general dimension d, the eigenvalues are sorted in descending order: λ1 ≥ λ2 ≥ … ≥ λd.
  • The eigenvectors are sorted accordingly.
  • To get an approximation of dimension d′ < d, we take the first d′ eigenvectors and look at the subspace they span (d′ = 1 is a line, d′ = 2 is a plane, and so on).
  • 65. For approximation
  • To get an approximating set, we project the original data points onto the chosen subspace:
  $\mathbf{x}_i = \mathbf{m} + \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \dots + \alpha_{d'}\mathbf{v}_{d'} + \dots + \alpha_d\mathbf{v}_d$
  • Projection:
  $\mathbf{x}_i' = \mathbf{m} + \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \dots + \alpha_{d'}\mathbf{v}_{d'} + 0\cdot\mathbf{v}_{d'+1} + \dots + 0\cdot\mathbf{v}_d$
  • 66. Optimality of approximation
  • The approximation is optimal in the least-squares sense: it gives the minimum of
  $\sum_{k=1}^{n} \|\mathbf{x}_k - \mathbf{x}_k'\|^2.$
  • The projected points have maximal variance.
  [Figure: the original set projected onto an arbitrary line vs. onto the v1 axis.]
  • 67. A line graph of the (ordered) eigenvalues is called a scree plot; a pronounced "elbow" in it suggests how many components to keep (sketch below).
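  Given some covariance matrix C, a scree plot takes only a few lines in Matlab/Octave:

  lam = sort(eig(C), 'descend');   % eigenvalues, largest first
  plot(lam, '-o');
  xlabel('component index'); ylabel('eigenvalue');
  title('Scree plot');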
  • 68. How does this work in practice?
  • We have a training set X of size l (l n-dimensional vectors):
  $X = \begin{pmatrix} x_{1,1} & \dots & x_{1,n} \\ \vdots & & \vdots \\ x_{l,1} & \dots & x_{l,n} \end{pmatrix}$
  • Evaluate the covariance matrix using the training set X:
  $C = \frac{1}{l}\,X'X, \qquad c_{j,k} = \frac{1}{l}\sum_{i=1}^{l} x_{i,j}\,x_{i,k}.$
  • Find the eigenvalues and eigenvectors of the covariance matrix.
  • 69. Principal Component Analysis (PCA) takes an initial subset of the principal axes of the training data and projects the data (both training and test) into the space spanned by this set of eigenvectors.
  • The data is projected onto the subspace spanned by the first m eigenvectors of the covariance matrix. The new coordinates are known as principal coordinates, with the eigenvectors referred to as principal axes.
  • 70. Algorithm:
  Input: dataset $X = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_l\} \subseteq \mathbb{R}^n$
  Process:
  $\boldsymbol{\mu} = \frac{1}{l}\sum_{i=1}^{l}\mathbf{x}_i$
  $C = \frac{1}{l}\sum_{i=1}^{l}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})'$
  $[W, \Lambda] = \mathrm{eig}(C)$
  $\tilde{\mathbf{x}}_i = W\cdot\mathbf{x}_i, \quad i = 1, 2, \dots, l$
  Output: transformed data $\tilde{S} = \{\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \dots, \tilde{\mathbf{x}}_l\}$, where each $\tilde{\mathbf{x}} = (\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_k)$ retains the first k components. (Implementation sketch below.)
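  A direct Matlab/Octave rendering of this algorithm box, as a sketch. Unlike the box, the data are centered before projecting, which is the usual convention; the centering line needs implicit expansion (Octave or Matlab R2016b+).

  % Rows of X are the l data points; k is the number of components kept.
  function Xt = pcaTransform(X, k)
      l  = size(X, 1);
      mu = mean(X, 1);                           % sample mean
      Xc = X - mu;                               % center the data
      C  = (Xc' * Xc) / l;                       % covariance matrix
      [W, Lambda] = eig(C);
      [~, idx] = sort(diag(Lambda), 'descend');  % order by variance
      W  = W(:, idx);
      Xt = Xc * W(:, 1:k);                       % first k principal coords
  end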
  • 71. Example: 8 vectors in 2-D space
  X = [1,2; 3,3; 3,5; 5,4; 5,6; 6,5; 8,7; 9,8]
  • Mean values:
  $\mu_1 = \tfrac{1}{8}(1+3+3+5+5+6+8+9) = \tfrac{40}{8} = 5, \qquad \mu_2 = \tfrac{1}{8}(2+3+5+4+6+5+7+8) = \tfrac{40}{8} = 5.$
  • Centered data: [−4,−3; −2,−2; −2,0; 0,−1; 0,1; 1,0; 3,2; 4,3]
  • Covariance matrix C:
  $c_{1,1} = \tfrac{1}{8}\big((-4)(-4)+(-2)(-2)+(-2)(-2)+0+0+1\cdot1+3\cdot3+4\cdot4\big) = \tfrac{50}{8} = 6.25$
  $c_{2,2} = \tfrac{1}{8}\big((-3)(-3)+(-2)(-2)+0+(-1)(-1)+1\cdot1+0+2\cdot2+3\cdot3\big) = \tfrac{28}{8} = 3.5$
  $c_{1,2} = c_{2,1} = \tfrac{1}{8}\big((-4)(-3)+(-2)(-2)+0+0+0+0+3\cdot2+4\cdot3\big) = \tfrac{34}{8} = 4.25$
  $C = \begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}$
  • 72. Example: 8 vectors in 2-D space (continued)
  • Find the eigenvalues and eigenvectors of the covariance matrix C, i.e., solve $C\mathbf{w} = \lambda\mathbf{w}$:
  $\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \lambda\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$
  • 73. 1) Find the eigenvalues
  • Solve the characteristic equation $\det(C - \lambda I_n) = 0$:
  $\det\begin{pmatrix} 6.25-\lambda & 4.25 \\ 4.25 & 3.5-\lambda \end{pmatrix} = (6.25-\lambda)(3.5-\lambda) - 4.25\cdot4.25 = 0$
  • Eigenvalues: $\lambda_1 = 9.34, \quad \lambda_2 = 0.41.$
  • 74. 2) Find the eigenvectors for the eigenvalues:
  a) $\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}\begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} = 9.34\begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} \;\Rightarrow\; w_{11} = 1.376\,w_{12}$
  b) $\begin{pmatrix} 6.25 & 4.25 \\ 4.25 & 3.5 \end{pmatrix}\begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} = 0.41\begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} \;\Rightarrow\; w_{22} = -1.376\,w_{21}$
  • Normalize the eigenvectors, e.g. $w_{12}^2\,(1.376^2 + 1) = 1$:
  $\begin{pmatrix} w_{11} \\ w_{12} \end{pmatrix} = \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix}, \qquad \begin{pmatrix} w_{21} \\ w_{22} \end{pmatrix} = \begin{pmatrix} -0.59 \\ 0.81 \end{pmatrix}$
  • 75. Eigenvectors:
  $\mathbf{w}_1 = \begin{pmatrix} 0.81 \\ 0.59 \end{pmatrix}, \qquad \mathbf{w}_2 = \begin{pmatrix} -0.59 \\ 0.81 \end{pmatrix}$
  • Check orthogonality: $\mathbf{w}_1'\mathbf{w}_2 = (0.81)(-0.59) + (0.59)(0.81) = 0.$
  • Check normalization: $\mathbf{w}_1'\mathbf{w}_1 = 1, \quad \mathbf{w}_2'\mathbf{w}_2 = 1.$
  • 76. Orthonormal basis:
  $W = \begin{pmatrix} \mathbf{w}_1' \\ \vdots \\ \mathbf{w}_k' \end{pmatrix}$
  • Transformation (projection) into the new basis: $\tilde{\mathbf{x}} = W\mathbf{x}$, i.e.
  $\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{w}_1' \\ \mathbf{w}_2' \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
  • Example, for the centered point (4, 3):
  $\begin{pmatrix} 0.81 & 0.59 \\ -0.59 & 0.81 \end{pmatrix}\begin{pmatrix} 4 \\ 3 \end{pmatrix} \approx \begin{pmatrix} 5.0 \\ 0.07 \end{pmatrix}$
  • 78. The example with Matlab
  X=[1,2; 3,3; 3,5; 5,4; 5,6; 6,5; 8,7; 9,8];
  X1=X(:,1); X2=X(:,2);
  X1=X1-mean(X1);   % centered data
  X2=X2-mean(X2);   % centered data
  C=cov(X1,X2)      % note: cov divides by (l-1), not l!
  [W,Lambda]=eig(C)
  • This yields
  $C = \begin{pmatrix} 7.1429 & 4.8571 \\ 4.8571 & 4.0 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 0.4664 & 0 \\ 0 & 10.6764 \end{pmatrix}, \qquad W = \begin{pmatrix} -0.5883 & -0.8086 \\ 0.8086 & -0.5883 \end{pmatrix}$
  (the columns of W are unit eigenvectors; their signs are arbitrary and may differ between versions).
  • Eigenvectors do not depend on the scaling of the covariance matrix; eigenvalues do.
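  A possible continuation of the example: order the components by decreasing eigenvalue and project the centered data; the variances of the projected coordinates should reproduce the eigenvalues.

  [lam, idx] = sort(diag(Lambda), 'descend');
  Ws = W(:, idx);             % components ordered by variance
  Y  = [X1 X2] * Ws;          % principal coordinates of the 8 points
  var(Y)                      % ~ [10.6764 0.4664], matching the eigenvalues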
  • 81. Case Study 1: Gender Classification
  • Determine the gender of a subject from facial images; this is made challenging by variation in race, age, facial expression, hair style, etc.
  Z. Sun, G. Bebis, X. Yuan, and S. Louis, "Genetic Feature Subset Selection for Gender Classification: A Comparison Study", IEEE Workshop on Applications of Computer Vision, pp. 165-170, Orlando, December 2002.
  • 82. Feature Extraction Using PCA • PCA maps the data in a lower-dimensional space using a linear transformation. • The columns of the projection matrix are the “best” eigenvectors (i.e., eigenfaces) of the covariance matrix of the data.
  • 83. Which eigenvectors encode mostly gender information?
  [Figure: eigenfaces EV#1–EV#6, EV#8, EV#10, EV#12, EV#14, EV#19, EV#20.]
  • 84. Dataset • 400 frontal images from 400 different people – 200 male, 200 female – Different races, lighting conditions, and facial expressions • Images were registered and normalized – No hair information – Account for different lighting conditions
  • 85. Experiments • Classifiers – LDA – Bayes classifier – Neural Networks (NNs) – Support Vector Machines (SVMs) • Comparison with SFBS • Three-fold cross validation – Training set: 75% of the data – Validation set: 12.5% of the data – Test set: 12.5% of the data
  • 86. Error Rates
  • ERM: error rate using the top eigenvectors.
  • ERG: error rate using GA-selected eigenvectors.
  [Figure: ERM vs. ERG for each classifier; the reported rates are 17.7%, 11.3%, 22.4%, 13.3%, 14.2%, 9%, 8.9%, 4.7%, and 6.7%.]
  • 87. Ratio of Features – Information Kept
  • RN: percentage of eigenvectors in the selected feature subset.
  • RI: percentage of information contained in the selected eigenvector subset.
  [Figure: RN vs. RI for each classifier; the reported values are 17.6%, 38%, 13.3%, 31%, 36.4%, 61.2%, 8.4%, 32.4%, 42.8%, and 69.0%.]
  • 88. Distribution of Selected Eigenvectors
  [Figure: histograms for (a) LDA, (b) Bayes, (c) NN, (d) SVMs.]
  • 89. Reconstructed Images
  [Figure rows: original images; reconstructions using the top 30 EVs; using EVs selected by SVM+GA; using EVs selected by NN+GA.]
  • Reconstructed faces using GA-selected EVs do not contain information about identity but disclose strong gender information!
  • 90. Comparison with SFBS
  [Figure rows: original images; top 30 EVs; EVs selected by SVM+GA; EVs selected by SVM+SFBS.]
  • 91. Case Study 2: Vehicle Detection
  • Rear views from a low-light camera; the non-vehicle class is much larger than the vehicle class. (Ford Motor Company.)
  Z. Sun, G. Bebis, and R. Miller, "Object Detection Using Feature Subset Selection", Pattern Recognition, vol. 37, pp. 2165-2176, 2004.
  • 92. Which eigenvectors encode the most important vehicle features?
  • 93. Experiments
  • Training data set (collected in Fall 2001): 2102 images (1051 vehicles and 1051 non-vehicles).
  • Test data sets (collected in Summer 2001): 231 images (vehicles and non-vehicles).
  • Comparison with SFBS; three-fold cross-validation; SVM for classification.
  • 95. Histograms of Selected Eigenvectors (SFBS-SVM vs. GA-SVM)
  • Number of eigenvectors selected by SFBS: 87 (43.5% of the information).
  • Number of eigenvectors selected by GA: 46 (23% of the information).
  • 96. Vehicle Detection
  [Figure rows: original images; reconstructions using the top 50 EVs; EVs selected by SFBS; EVs selected by GAs.]
  • Reconstructed images using the selected feature subsets: lighting differences have been disregarded by the GA approach.

Editor's notes

  1. Curse of dimensionality: as most existing feature selection algorithms have quadratic or higher time complexity in N, it is difficult to scale up with high dimensionality.
  2. Relative shortage of instances: the dimensionality N can sometimes greatly exceed the number of instances I.