Principal Component Analysis
(Dimensionality Reduction)
Overview:
• What is Principal Component Analysis
• Computing the components in PCA
• Dimensionality Reduction using PCA
• A 2D example in PCA
• Applications of PCA in computer vision
• Importance of PCA in analysing data in higher dimensions
• Questions
Feature reduction vs feature selection
• Feature reduction
– All original features are used
– The transformed features are linear combinations
of the original features.
• Feature selection
– Only a subset of the original features are used.
Why feature reduction?
• Most machine learning and data mining
techniques may not be effective for high-dimensional data
– Curse of Dimensionality
– Query accuracy and efficiency degrade rapidly as the
dimension increases.
• The intrinsic dimension may be small.
– For example, the number of genes responsible for a
certain type of disease may be small.
Why feature reduction?
• Visualization: projection of high-dimensional data
onto 2D or 3D.
• Data compression: efficient storage and retrieval.
• Noise removal: positive effect on query accuracy.
Applications of feature reduction
• Face recognition
• Handwritten digit recognition
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
• We study phenomena that cannot be directly observed
– ego, personality, intelligence in psychology
– Underlying factors that govern the observed data
• We want to identify and operate with underlying latent
factors rather than the observed data
– E.g. topics in news articles
– Transcription factors in genomics
• We want to discover and exploit hidden relationships
– “beautiful car” and “gorgeous automobile” are closely related
– So are “driver” and “automobile”
– But does your search engine know this?
– Reduces noise and error in results
Why Factor or Component Analysis?
• We have too many observations and dimensions
– To reason about or obtain insights from
– To visualize
– Too much noise in the data
– Need to “reduce” them to a smaller set of factors
– Better representation of data without losing much information
– Can build more effective data analyses on the reduced-dimensional space:
classification, clustering, pattern recognition
• Combinations of observed variables may be more
effective bases for insights, even if physical meaning is
obscure
Why Factor or Component Analysis?
• Discover a new set of factors/dimensions/axes against
which to represent, describe or evaluate the data
– For more effective reasoning, insights, or better visualization
– Reduce noise in the data
– Typically a smaller set of factors: dimension reduction
– Better representation of data without losing much information
– Can build more effective data analyses on the reduced-dimensional space:
classification, clustering, pattern recognition
• Factors are combinations of observed variables
– May be more effective bases for insights, even if physical
meaning is obscure
– Observed data are described in terms of these factors rather than
in terms of original variables/dimensions
Factor or Component Analysis
Basic Concept
Areas of variance in data are where items can be best discriminated and key
underlying phenomena observed
Areas of greatest “signal” in the data
If two items or dimensions are highly correlated or dependent
They are likely to represent highly related phenomena
If they tell us about the same underlying variance in the data, combining them to
form a single measure is reasonable
Parsimony
Reduction in Error
So we want to combine related variables, and focus on uncorrelated or
independent ones, especially those along which the observations have high
variance
We want a smaller set of variables that explain most of the variance in the
original data, in more compact and insightful form
Basic Concept
What if the dependences and correlations are not so strong or
direct?
And suppose you have 3 variables, or 4, or 5, or 10000?
Look for the phenomena underlying the observed covariance/co-
dependence in a set of variables
Once again, phenomena that are uncorrelated or independent, and
especially those along which the data show high variance
These phenomena are called “factors” or “principal components”
or “independent components,” depending on the methods used
Factor analysis: based on variance/covariance/correlation
Independent Component Analysis: based on independence
Principal Component Analysis
Most common form of factor analysis
The new variables/dimensions
Are linear combinations of the original ones
Are uncorrelated with one another
Orthogonal in original dimension space
Capture as much of the original variance in the data as
possible
Are called Principal Components
What are the new axes?
• Orthogonal directions of greatest variance in data
• Projections along PC1 discriminate the data most along any one axis
Principal Components
•First principal component is the direction of greatest
variability (variance) in the data
• Second is the next orthogonal (uncorrelated) direction of greatest variability
– So first remove all the variability along the first
component, and then find the next direction of
greatest variability
• And so on …
Principal Components Analysis (PCA)
• Principle
– Linear projection method to reduce the number of parameters
– Transform a set of correlated variables into a new set of uncorrelated
variables
– Map the data into a space of lower dimensionality
– Form of unsupervised learning
• Properties
–It can be viewed as a rotation of the existing axes to new positions in the
space defined by original variables
–New axes are orthogonal and represent the directions with maximum
variability
Computing the Components
• Data points are vectors in a multidimensional space
• Projection of vector x onto an axis (dimension) u is u·x
•Direction of greatest variability is that in which the average square of the
projection is greatest
– I.e. u such that E((u·x)²) over all x is maximized
–(we subtract the mean along each dimension, and center the original axis
system at the centroid of all data points, for simplicity)
– This direction of u is the direction of the first Principal Component
Computing the Components
• E((u·x)²) = E((u·x)(u·x)ᵀ) = E(u·x·xᵀ·uᵀ)
•The matrix C = x·xᵀ contains the correlations (similarities) of the original axes based
on how the data values project onto them
• So we are looking for the u that maximizes uCuᵀ, subject to u being unit-length
• It is maximized when u is the principal eigenvector of the matrix C, in which case
–uCuᵀ = uλuᵀ = λ if u is unit-length, where λ is the principal eigenvalue of the
correlation matrix C
– The eigenvalue denotes the amount of variability captured along that dimension
Why the Eigenvectors?
Maximise uᵀxxᵀu s.t. uᵀu = 1
Construct the Lagrangian uᵀxxᵀu – λ(uᵀu – 1)
Set the vector of partial derivatives to zero:
xxᵀu – λu = (xxᵀ – λI)u = 0
As u ≠ 0, u must be an eigenvector of xxᵀ with eigenvalue λ
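A minimal NumPy sketch of this claim (not from the slides; the synthetic 2-D data and the random-direction comparison are assumptions): the principal eigenvector of the covariance matrix maximises the average squared projection.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.2], [1.2, 1.0]], size=500)
X = X - X.mean(axis=0)                    # center the data

C = X.T @ X / len(X)                      # covariance matrix of the centered data
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
u_star = eigvecs[:, -1]                   # principal (unit-length) eigenvector

def mean_sq_projection(u):
    """Average squared projection of the data onto direction u, i.e. E((u.x)^2)."""
    return np.mean((X @ u) ** 2)

best = mean_sq_projection(u_star)         # equals the largest eigenvalue
for _ in range(1000):                     # no random unit direction does better
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)
    assert mean_sq_projection(u) <= best + 1e-9
print(best, eigvals[-1])                  # the two numbers agree
```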
Computing the Components
• Similarly for the next axis, etc.
•So, the new axes are the eigenvectors of the matrix of correlations of the
original variables, which captures the similarities of the original variables
based on how data samples project to them
• Geometrically: centering followed by rotation
– Linear transformation
PCs, Variance and Least-Squares
• The first PC retains the greatest amount of variation in the sample
• The kth PC retains the kth greatest fraction of the variation in the sample
•The kth largest eigenvalue of the correlation matrix C is the variance in the
sample along the kth PC
•The least-squares view: PCs are a series of linear least
squares fits to a sample, each orthogonal to all previous ones
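A small sketch, again on assumed synthetic data, checking the claim that the kth largest eigenvalue equals the sample variance along the kth PC:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated 4-D sample
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)                 # 4x4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X @ eigvecs                        # data projected onto the PCs
print(np.var(scores, axis=0, ddof=1))       # variance along PC1..PC4
print(eigvals)                              # the same numbers
```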
How Many PCs?
•For n original dimensions, correlation matrix is
n×n, and has up to n eigenvectors. So n PCs.
•Where does dimensionality reduction come
from?
Dimensionality Reduction
Can ignore the components of lesser significance.
You do lose some information, but if the eigenvalues are small, you don’t lose
much
– n dimensions in original data
– calculate n eigenvectors and eigenvalues
– choose only the first p eigenvectors, based on their eigenvalues
– final data set has only p dimensions
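One common way to pick p (the 95% threshold below is an assumption, not something the slides prescribe) is to keep enough eigenvectors to explain a chosen fraction of the total variance; a sketch:

```python
import numpy as np

def choose_p(eigenvalues, threshold=0.95):
    """Smallest p such that the top-p eigenvalues explain >= threshold of the variance."""
    vals = np.sort(np.asarray(eigenvalues))[::-1]
    explained = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(explained, threshold) + 1)

# Eigenvalues from the 2-D worked example later in the deck:
print(choose_p([1.28402771, 0.0490833989]))   # -> 1 (PC1 alone explains ~96%)
```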
Eigenvectors of a Correlation Matrix
PCA Example –STEP 1
• Subtract the mean
from each of the data dimensions. All the x values
have x̄ (the mean of x) subtracted and all the y values
have ȳ subtracted from them. This produces a data set
whose mean is zero.
Subtracting the mean makes the variance and covariance
calculations easier by simplifying their equations. The
variance and covariance values are not affected by
the mean value.
PCA Example –STEP 1
DATA:
x y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9
ZERO MEAN DATA:
x y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
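A minimal NumPy sketch of STEP 1, using the 10 data points listed above:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

mean = data.mean(axis=0)      # [1.81, 1.91]
zero_mean_data = data - mean  # matches the ZERO MEAN DATA table above
print(zero_mean_data)
```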
PCA Example –STEP 1
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
PCA Example –STEP 2
• Calculate the covariance matrix
cov = .616555556 .615444444
.615444444 .716555556
• Since the non-diagonal elements in this covariance
matrix are positive, we should expect that the x
and y variables increase together.
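A sketch reproducing the covariance matrix above with NumPy (np.cov uses the n−1 denominator, matching the slide's values):

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

cov = np.cov(data, rowvar=False)   # rows = observations, columns = variables
print(cov)
# [[0.61655556 0.61544444]
#  [0.61544444 0.71655556]]
```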
PCA Example –STEP 3
• Calculate the eigenvectors and eigenvalues of
the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
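The same eigen-decomposition in NumPy; note that eigenvector signs are arbitrary, so they may differ from the slide by a factor of −1:

```python
import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending eigenvalues
print(eigenvalues)     # [0.0490834  1.28402771]
print(eigenvectors)    # columns are the unit-length eigenvectors (signs arbitrary)
```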
PCA Example –STEP 3
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
•eigenvectors are plotted as
diagonal dotted lines on the
plot.
•Note they are
perpendicular to each other.
•Note one of the
eigenvectors goes through
the middle of the points, like
drawing a line of best fit.
•The second eigenvector
gives us the other, less
important, pattern in the
data, that all the points
follow the main line, but are
off to the side of the main
line by some amount.
PCA Example –STEP 4
• Reduce dimensionality and form feature vector
The eigenvector with the highest eigenvalue is the principal
component of the data set.
In our example, the eigenvector with the largest eigenvalue
was the one that pointed down the middle of the data.
Once eigenvectors are found from the covariance matrix, the
next step is to order them by eigenvalue, highest to lowest.
This gives you the components in order of significance.
PCA Example –STEP 4
Now, if you like, you can decide to ignore the components of
lesser significance.
You do lose some information, but if the eigenvalues are
small, you don’t lose much
• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first p eigenvectors
• final data set has only p dimensions.
PCA Example –STEP 4
• Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)
We can either form a feature vector with both of the
eigenvectors:
-.677873399  -.735178656
-.735178656   .677873399
or, we can choose to leave out the smaller, less
significant component and only have a single
column:
-.677873399
-.735178656
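A sketch of STEP 4: sort the eigenvectors by eigenvalue and keep the first p of them as the columns of the feature vector (here p = 1):

```python
import numpy as np

cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])
eigenvalues, eigenvectors = np.linalg.eigh(cov)

order = np.argsort(eigenvalues)[::-1]        # indices, largest eigenvalue first
eigenvectors = eigenvectors[:, order]        # PC1 in the first column

p = 1                                        # keep only the most significant PC
feature_vector = eigenvectors[:, :p]         # one column per retained eigenvector
print(feature_vector)
```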
PCA Example –STEP 5
• Deriving the new data
FinalData = RowFeatureVector x RowZeroMeanData
RowFeatureVector is the matrix with the eigenvectors in the
columns transposed so that the eigenvectors are now in the
rows, with the most significant eigenvector at the top
RowZeroMeanData is the mean-adjusted data
transposed, i.e. the data items are in each column,
with each row holding a separate dimension.
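A sketch of STEP 5 on the same data; the result matches the table on the next slide, up to possible sign flips of each component:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
zero_mean_data = data - data.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]            # columns sorted, PC1 first

row_feature_vector = eigenvectors.T              # eigenvectors as rows, PC1 on top
row_zero_mean_data = zero_mean_data.T            # one data item per column

final_data = row_feature_vector @ row_zero_mean_data
print(final_data.T)                              # the table on the next slide
```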
PCA Example –STEP 5
FinalData transpose: dimensions
along columns
x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
PCA Example –STEP 5
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
Reconstruction of original Data
• If we reduced the dimensionality, obviously, when
reconstructing the data we would lose those
dimensions we chose to discard. In our example let
us assume that we kept only the transformed x dimension, i.e. the first principal component…
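A sketch of that reconstruction with only the first principal component retained; since the eigenvectors are orthonormal, transposing the feature vector undoes the rotation, and the mean is added back at the end:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
mean = data.mean(axis=0)
zero_mean_data = data - mean

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
pc1 = eigenvectors[:, np.argmax(eigenvalues)]    # principal eigenvector, shape (2,)

scores = zero_mean_data @ pc1                    # FinalData kept for PC1 only
reconstructed = np.outer(scores, pc1) + mean     # lossy reconstruction from one PC
print(reconstructed)                             # all points lie on the PC1 line
```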
Reconstruction of original Data
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
x
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
Applications in computer vision
PCA to find patterns
• 20 face images: N×N size
• One image represented as follows
• Putting all 20 images in 1 big matrix as follows
• Performing PCA to find patterns in the face images
• Identifying faces by measuring differences along the new axes (PCs)
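A hedged sketch of the eigenface workflow described above; the image size, the random stand-in images and the use of SVD (rather than forming the large pixel covariance matrix) are assumptions, not details from the slides:

```python
import numpy as np

N = 64                                   # assumed image size (N x N)
faces = np.random.rand(20, N, N)         # stand-in for 20 loaded face images

X = faces.reshape(20, -1)                # one flattened image per row
X_centered = X - X.mean(axis=0)          # subtract the "mean face"

# The eigenvectors of the pixel covariance matrix are the "eigenfaces" (the PCs).
# SVD of the centered data gives them directly, without building the huge
# (N*N x N*N) covariance matrix; rows of Vt are the eigenfaces, PC1 first.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
eigenfaces = Vt

weights = X_centered @ eigenfaces.T      # each face as coordinates along the PCs
# Faces can then be compared/identified by distances between their weight vectors.
print(weights.shape)                     # (20, 20)
```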
PCA for image compression:
• Compile a dataset of 20 images
• Build the covariance matrix of 20 dimensions
• Compute the eigenvectors and eigenvalues
• Based on the eigenvalues, 5 dimensions can be left out, those with the smallest
eigenvalues.
• Keeping 15 of the 20 components means 1/4th of the space is saved.
Importance of PCA
• In data of high dimensions, where graphical representation is difficult, PCA is a
powerful tool for analysing data and finding patterns in it.
• Data compression is possible using PCA
• The most efficient expression of data is by the use of perpendicular components,
as done in PCA.
Questions:
• What do the eigenvectors of the covariance matrix give us when computing the
principal components?
• At what point in the PCA process can we decide to compress the data?
• Why are the principal components orthogonal?
• How many different covariance values can you calculate for an n-dimensional data
set?
THANK YOU