PCA for the uninitiated

8/22/13 PCA for the uninitiated
benmabey.com/presentations/pca-tutorial/#2 1/51
PCAfortheuninitiated
Intuitive motivation via maximum variance interpretation
Ben Mabey
benmabey.com
github.com/bmabey
@bmabey
D
ow
nload

For PDF viewers...
This deck can be found in its original (and better) HTML5 form at
benmabey.com/presentations/pca-tutorial/
.N.B.: The deck isn't completely standalone since I don't explain every step made as I did when
actually presenting it. That said I think the deck should be useful for anyone who wants to get a quick
idea of what PCA is and the math behind it (I only take into account conventional PCA, not
probabilistic interpretations). I am inconsistent with some of my equations to make some of the
algebra easier (all legal though!) which I explained during the actual presentation. For people who
want to go deeper and follow the math more closely I highly recommend the tutorial by Jonathan
Shlens which is where I got most of my derivations.
See the last slide of the deck for additional resources.
2/51

The ubiquitous & versatile PCA
Dimensionality Reduction
Noise Reduction
Exploration
Feature Extraction
Regression (Orthogonal)
·
Data Visualization
Learn faster
Lossy Data Compression
-
-
-
·
·
·
·
Unsupervised Learning Algorithm
K-Means
Computer Graphics (e.g. Bounded Volumes)
and many more across various domains...
·
Anomaly Detection (not the best)
Matching/Distance (e.g. Eigenfaces, LSI)
-
-
·
·
·
3/51

Majority of PCA tutorials...
1. Organize dataset as matrix.
2. Subtract off the mean for each measurement.
3. Calculate the covariance matrix and perform eigendecomposition.
4. Profit!
4/51

3. Calculate the covariance correlation matrix and perform eigendecomposition.
4. Profit!
5/51

3. Calculate the covariance correlation matrix and perform eigendecomposition.
4. Perform SVD.
5. Profit!
6/51

7/51

The intuitive Magic Math behind PCA
Maximize the variance.
Minimize the projection error.
·
·
8/51

=Pm×m Xm×n Ym×n
9/51

http://www.squidoo.com/noise-sources-signal-noise-ratio-snr-and-a-look-at-them-in-the-frequency-domain

SNR =
2 2
signal
2 2
noise
11/51

Rotate to maximize variance
13/51

14/51

library(PerformanceAnalytics)
chart.Correlation(iris[-5],bg=iris$Species,pch=21)
15/51

chart.Correlation(decorrelated.iris,bg=iris$Species,pch=21)
16/51

Variance and Covariance
MATHEMATICALLY USEFUL INTUITIVE
Dispersion
Relationship
unitless measure
or is if and only if and are uncorrelated.
= var(A)2 2
A
=
=
E[(A J ]+A
)
2
( J1
n ∑
i=1
n
ai +A
)
2
= stddev(A) =2A var(A)_ _____R
= cov(A, B)2AB =
=
E[(A J )(B J )]+A
+B
( J )( J )
1
n ∑
i=1
n
ai +A
bi +B
= =0AB
2AB
2A 2B
cov(AB)
stddev(A) stddev(B)
(J1.0..1.0)
cov(A, A) = var(A)
2AB 0AB
0 A B
17/51

Covariance Matrix
Preprocess so that it has zero mean. Now
¤ =
⎡
⎣
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎢
21,1
22,1
'
2n,1
21,2
22,2
'
2n,2
(
(
*
(
21,n
22,n
'
2n,n
⎤
⎦
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎥
X =2AB
1
n
I n
i=1
ai
bi
= X¤
X
1
n
X
T
18/51

center<-function(x)x-mean(x)
iris.centered<-apply(as.matrix(iris[-5]),2,center)
(t(iris.centered)%*%iris.centered)/(nrow(iris)-1)
## Sepal.LengthSepal.WidthPetal.LengthPetal.Width
##Sepal.Length 0.68569 -0.04243 1.2743 0.5163
##Sepal.Width -0.04243 0.18998 -0.3297 -0.1216
##Petal.Length 1.27432 -0.32966 3.1163 1.2956
##Petal.Width 0.51627 -0.12164 1.2956 0.5810
19/51

center<-function(x)x-mean(x)
m.centered<-apply(as.matrix(iris[-5]),2,center)
(t(m.centered)%*%m.centered)/(nrow(iris)-1)
##Sepal.Length 0.68569 -0.04243 1.2743 0.5163
##Sepal.Width -0.04243 0.18998 -0.3297 -0.1216
##Petal.Length 1.27432 -0.32966 3.1163 1.2956
##Petal.Width 0.51627 -0.12164 1.2956 0.5810
cov(iris[-5])
##Sepal.Length 0.68569 -0.04243 1.2743 0.5163
##Sepal.Width -0.04243 0.18998 -0.3297 -0.1216
##Petal.Length 1.27432 -0.32966 3.1163 1.2956
##Petal.Width 0.51627 -0.12164 1.2956 0.5810
20/51

What would our ideal look like?
i.e. is decorrelated.
¤
Y
PX = Y
=¤
Y
⎡
⎣
⎢
⎢
⎢
⎢
⎢
⎢
⎢
2 2
1
2 2
2
0
0
*
2 2
n
⎤
⎦
⎥
⎥
⎥
⎥
⎥
⎥
⎥
Y
21/51

Our goal...
Find some orthonormal matrix in
such that is a
diagonal matrix. The rows of are
the principal components of .
Note, that I transposed the design matrix (the data) so that covariance
calculation is also reversed. This will make our life easier...
P
PX = Y = Y¤
Y Y
T
Yn P
X
22/51

Rewrite in terms of the unknown...¤
Y
¤
Y
¤
Y
=
=
=
=
=
Y
1
n
Y
T
(PX)(PX
1
n
)
T
PX
1
n
X
T
P
T
P( X )
1
n
X
T
P
T
P¤
X
P
T
23/51

Spectral Theorem / Principal Axis Theorem
Every symmetric matrix has the
eigendecomposition (i.e. can be
diagnolized) of:
A = Q| = Q|
Q
J1
Q
T

Remember, we are choosing what is...P
PX = Y
25/51

Remember, we are choosing what is...
Let every row, , be an eigenvector of . What this
means is that
where comes from the eigendecomposition of .
P
pi
¤
X
P = Q
T
Q ¤
X
= Q|
¤
X Q
T
26/51

Turn the Algebra crank...
¤
Y
¤
Y
=
=
=
=
=
=
P¤
X P
T
P(Q| )Q
T
P
T
P( |P)P
T
P
T
(P )|(P )P
T
P
T
I|I
|
¤
X
The principal components are linear combinations of original features of .
The principal components of are the eigenvectors of .
The corresponding eigenvaules lie in and represent the variance.
· X
· X ¤
X
· ¤
Y
27/51

Manual PCA in R
iris.eigen=eigen(cov(iris.centered))
rownames(iris.eigen$vectors)=colnames(iris.centered)
colnames(iris.eigen$vectors)=c("PC1","PC2","PC3","PC4")
iris.eigen
##$values
##[1]4.228240.242670.078210.02384
##
##$vectors
## PC1 PC2 PC3 PC4
##Sepal.Length 0.36139-0.65659-0.58203 0.3155
##Sepal.Width -0.08452-0.73016 0.59791-0.3197
##Petal.Length 0.85667 0.17337 0.07624-0.4798
##Petal.Width 0.35829 0.07548 0.54583 0.7537
28/51

Make the contributions intuitive...
iris.eigen$vectors^2
## PC1 PC2 PC3 PC4
##Sepal.Length0.1306000.4311090.3387590.09953
##Sepal.Width 0.0071440.5331360.3574970.10222
##Petal.Length0.7338850.0300580.0058120.23025
##Petal.Width 0.1283710.0056970.2979320.56800
29/51

squared<-iris.eigen$vectors^2
sorted.squares<-squared[order(squared[,1]),1]
dotplot(sorted.squares,main="VariableContributionstoPC1",cex=1.5,col="red")
30/51

#library(FactoMineR);iris.pca<-PCA(iris,quali.sup=5)
plot(iris.pca,choix="var",title="CorrelationCircle")
31/51

#res.pca<-PCA(decathlon,quanti.sup=11:12,quali.sup=13)
plot(res.pca,choix="var",title="CorrelationCircle")
32/51

What does the variance (eigenvaules) tell us?
iris.eigen$values #ThevarianceforeachcorrespondingPC
##[1]4.228240.242670.078210.02384
33/51

#library(FactoMineR);iris.pca<-PCA(iris,quali.sup=5)
plot(iris.pca,habillage=5,col.hab=c("green","blue","red"),title="DatasetprojectedontoPC1-2Su
34/51

How many components should you keep?
Ratio of variance retained (e.g. 99% is common):
Ik
i=1
2i
In
i=1
2i
cumsum(iris.eigen$values/sum(iris.eigen$values))
##[1]0.92460.97770.99481.0000
35/51

The Elbow Test
iris.prcomp<-prcomp(iris[-5],center=TRUE,scale=FALSE)
screeplot(iris.prcomp,type="line",main="ScreePlot")
36/51

Kaiser Criterion
Keep only the components whose eigenvalue is larger than the average eigenvalue. For a correlation
PCA, this rule boils down to the standard advice to "keep only the eigenvalues larger than 1".
eigen(cor(iris.centered))$values
##[1]2.918500.914030.146760.02071
37/51

Remeber, always...
CROSS
VALIDATE!PCA is overused and commonly misused, so always verify it is helping by cross validating.
38/51

Lots of other ways to aid interpretation...
iris.prcomp<-prcomp(iris[-5],center=TRUE,scale=FALSE)
biplot(iris.prcomp)
39/51

Learn more...
40/51

42/51

How will PCA perform?
scaled.iris<-iris
scaled.iris$Petal.Length<-iris$Petal.Length/1000
scaled.iris$Petal.Width<-iris$Petal.Width/1000
scaled.iris$Sepal.Width<-iris$Sepal.Width*10
43/51

Scale Matters
44/51

Correlation Matrix - Standardize the data
#(Inpracticejustusethebuilt-incorfunction)
standardize<-function(x){
centered<-x-mean(x)
centered/sd(centered)
}
scaled.iris.standardized<-apply(as.matrix(scaled.iris[-5]),2,standardize)
(t(scaled.iris.standardized)%*%scaled.iris.standardized)/(nrow(iris)-1)
##Sepal.Length 1.0000 -0.1176 0.8718 0.8179
##Sepal.Width -0.1176 1.0000 -0.4284 -0.3661
##Petal.Length 0.8718 -0.4284 1.0000 0.9629
##Petal.Width 0.8179 -0.3661 0.9629 1.0000
45/51

Ok, so why SVD? And how is it equivalent?
Short answer on why:
SVD is more numerically stable
More efficient
Especially when operating on a wide matrix.. you skip the step of calculating the covariance matrix
There are a lot of SVD algoritms and implementations to choose from
·
·
·
46/51

"absolutely a high point of linear algebra"
Every matrix has the singular value
decomposition (SVD) of:
A = UDV
T

Hey, and look familar...
Recall that eigendecomposition for an orthonormal matrix is .
Therefore are the eigenvectors of and are the eigenvalues.
Likewise are the eigenvectors of and are the eigenvalues.
AA
T
AA
T
A
AA
T
AA
T
=
=
=
=
=
UDV
T
UD (UDV
T
V
T
)
T
UD VV
T
D
T
U
T
UD ( V = I since V, and U, are orthonormal)D
T
U
T
V
T
U (since D is a diagnol matrix)D
2
U
T
A = Q|Q
T
U AA
T
D
2
V AA
T
D
2
48/51

Turn the crank once more...
Let a new matrix where each column of is mean centered.
So, if we run SVD on our then will contain the eigenvectors of ... 's principal components!
Our eigenvalues, the variances, will be .
Y =
1
n
√
X
T
Y
YY
T
YY
T
=
=
=
( ( )
1
n__√
X
T
)
T
1
n__√
X
T
X
1
n
X
T
¤
X
Y V ¤
X X
D
2
49/51

Tada!
y<-iris.centered/sqrt(nrow(iris)-1)
y.svd<-svd(y)
pcs<-y.svd$v
rownames(pcs)=colnames(iris.centered)
colnames(pcs)=c("PC1","PC2","PC3","PC4")
pcs
## PC1 PC2 PC3 PC4
##Sepal.Length 0.36139-0.65659 0.58203 0.3155
##Sepal.Width -0.08452-0.73016-0.59791-0.3197
##Petal.Length 0.85667 0.17337-0.07624-0.4798
##Petal.Width 0.35829 0.07548-0.54583 0.7537
y.svd$d^2 #variances
##[1]4.228240.242670.078210.02384
50/51

References and Resources
1. Jon Shlens (versions 2.0 and 3.1), Tutorial on Principal Component Analysis
2. H Abdi and L J Williams (2010), Principal component analysis
3. Andrew Ng (2009), cs229 Lecture Notes 10
4. Andrew Ng (2009), cs229 Lectures 14 & 15
5. Christopher Bishop (2006), Pattern Recognition and Machine Learning,
section 12.1
6. Steve Pittard (2012), Principal Components Analysis Using R
7. Quick-R, Principal Components and Factor Analysis (good pointers to
additional R packages)
8. C Ding, X He (2004), K-means Clustering via Principal Component Analysis

PCA for the uninitiated

Recommandé

Recommandé

Contenu connexe

Similaire à PCA for the uninitiated

Similaire à PCA for the uninitiated (20)

Plus de Ben Mabey

Plus de Ben Mabey (6)

Dernier

Dernier (20)

PCA for the uninitiated