Principal component analysis.pptx

27 Oct 2022

  1. Principal component analysis (PCA)
  2. • Principal component analysis (PCA) is a statistical technique that is useful for the compression and classification of data.
     • Its purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables that is smaller than the original set.
     • The new variables retain most of the information in the sample, where information means the variation present in the data, as captured by the correlations between the original variables.
     • The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total variance each retains.
     • Each principal component is a linear combination of the original features. Components are ranked by the variance they capture in the data itself; PCA does not use an output (target) variable.
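The properties listed on this slide can be written compactly. The notation below is an assumption made for illustration and is not taken from the slides: x is a mean-centered observation with p features, Σ is its covariance matrix, and w_k, λ_k are the eigenvectors and eigenvalues of Σ.

```latex
\begin{align}
\Sigma w_k &= \lambda_k w_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \qquad w_j^\top w_k = \delta_{jk} \\
z_k &= w_k^\top x \qquad \text{(the $k$-th principal component, a linear combination of the features)} \\
\operatorname{Var}(z_k) &= \lambda_k, \qquad \operatorname{Cov}(z_j, z_k) = 0 \quad (j \ne k) \\
\text{fraction of variance retained by } z_k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
\end{align}
```

Keeping only the first few z_k gives the smaller set of variables, and the sum of their fractions shows how much of the total variation is retained.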
  3. • The first principal component is the direction (a linear combination of the features) along which the data varies the most. The second principal component is the direction of highest remaining variance that is uncorrelated with the first, and so on.
     • Traditionally, PCA is performed on a square symmetric matrix, such as the covariance or correlation matrix of the variables.
     • PCA reduces the attribute space from a larger number of variables to a smaller number of factors. It is a "non-dependent" procedure: it does not assume that any variable is a dependent (response) variable.
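As a small illustration of the square symmetric matrix mentioned on this slide, the sketch below builds a covariance matrix from synthetic data and inspects how the leading component weights the original features. The data, the random seed, and all variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical data: two correlated features, 500 samples.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])            # shape (500, 2)

# The covariance matrix is p x p (here 2 x 2) no matter how many samples there are.
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
first_pc = eigvecs[:, -1]                # weights of the leading component

# The eigenvalues sum to the total variance (the trace of the covariance matrix).
total_variance = np.trace(cov)

print("covariance matrix:\n", cov)
print("first principal component weights:", first_pc)
```

Because the two features are strongly correlated, both weights in `first_pc` are far from zero, so the leading component mixes the features instead of picking one of them.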
  4. • Step 1: Get some data
     • Step 2: Subtract the mean
     • Step 3: Calculate the covariance matrix
     • Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
     • Step 5: Choose components and form a feature vector (the matrix of retained eigenvectors)
     • Step 6: Derive the new data set
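The six steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca_eig`, the synthetic data, and the choice of two components are all hypothetical.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA by eigendecomposition; X has shape (n_samples, n_features)."""
    # Step 2: subtract the mean of each feature.
    X_centered = X - X.mean(axis=0)

    # Step 3: covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)

    # Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 5: sort by decreasing eigenvalue and keep the leading eigenvectors
    # as columns of the feature vector (projection matrix).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    feature_vector = eigvecs[:, :n_components]

    # Step 6: project the centered data onto the retained components.
    new_data = X_centered @ feature_vector
    explained = eigvals[:n_components] / eigvals.sum()
    return new_data, feature_vector, explained

# Step 1: get some data (synthetic: three features driven by one latent factor).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(200, 3))

Z, W, explained = pca_eig(X, n_components=2)
print("new data shape:", Z.shape)
print("fraction of variance per component:", explained)
```

`np.linalg.eigh` is used here because the covariance matrix is symmetric; it returns eigenvalues in ascending order, which is why the explicit sort is needed before choosing components.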
  5. • Advantages:
       1. Removes correlated features
       2. Can improve algorithm performance (fewer features to process)
       3. Can reduce overfitting
       4. Improves visualization
     • Disadvantages:
       1. Independent variables become less interpretable
       2. Data must be standardized before PCA when features are measured on different scales
       3. Information loss
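The standardization point can be illustrated with a short sketch. It assumes two hypothetical correlated features whose scales differ by a factor of about 1000; the helper `first_component` and the data are made up for the example.

```python
import numpy as np

def first_component(X):
    """Return the leading eigenvector of the covariance matrix of X."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    return eigvecs[:, -1]

# Hypothetical data: feature b is correlated with a but about 1000 times larger.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = 1000 * (0.8 * a + 0.6 * rng.normal(size=300))
X = np.column_stack([a, b])

# Without standardization the large-scale feature dominates the first component.
raw_pc = first_component(X)

# Standardize each feature to zero mean and unit variance, then repeat.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_pc = first_component(X_std)

print("first component, raw data:         ", raw_pc)
print("first component, standardized data:", std_pc)
```

On the raw data the first component points almost entirely along `b`, simply because `b` has the larger units. After standardization both features receive equal weight, which reflects their correlation and not their measurement scale.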