Dimensionality Reduction using Principal Components Analysis
Rumman Chowdhury, Senior Data Scientist
@ruchowdh
rummanchowdhury.com
thisismetis.com
Who is Rumman? What’s a Metis?
Me:
Political Science PhD, Data Scientist, Teacher, Do-Gooder. Check me out on Twitter: @ruchowdh, or on my website: rummanchowdhury.com (psst, I post cool jobs there)
What’s Metis?
Metis accelerates the careers of data scientists by providing full-time immersive bootcamps, evening part-time professional development courses, online training, and corporate programs.
What is PCA?
Why do we need dimensionality reduction?
Intuition behind Principal Components Analysis
Coding example
What is Principal Components Analysis?
What is PCA?
- A shift in perspective
- A reduction in the number of dimensions
Why do we need dimensionality reduction?
Curse of Dimensionality
One dimension (Cigarettes per day): Small space. Being close quite probable.
Two dimensions (Height, Cigarettes per day): More space but still not so much. Being close not improbable.
Three dimensions (Height, Cigarettes per day, Exercise): Much larger space. Being close less probable.
Four dimensions (Age, Height, Cigarettes per day, Exercise): Omg so much space. Being close quite improbable.
Thousand dimensions: Helloooo… hellooo.. helloo… Can anybody hear meee.. mee.. mee.. mee.. So alone….
Thousand dimensions: I specified you with such high resolution, with so much detail, that you don’t look like anybody else anymore. You’re unique.
Classification, clustering and other analysis methods become exponentially difficult with increasing dimensions.
To understand how to divide that huge space, we need a whole lot more data (usually much more than we do or can have).
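The "being close becomes improbable" idea can be checked with a small simulation. This is only a sketch under assumptions that are not in the slides: the points are drawn uniformly from a unit hypercube, there are 200 of them, and the helper name `mean_nearest_neighbor_distance` is made up for illustration. It uses only the Python standard library.

```python
import math
import random


def mean_nearest_neighbor_distance(n_points, n_dims, seed=0):
    """Average distance from each point to its nearest neighbor.

    Points are drawn uniformly from the unit hypercube [0, 1]^n_dims
    (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force search over all other points.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / n_points


# The same number of "people", described by more and more features.
results = {d: mean_nearest_neighbor_distance(200, d) for d in (1, 2, 3, 4, 1000)}
for d, distance in results.items():
    print(f"{d:>4} dimensions: mean nearest-neighbor distance = {distance:.3f}")
```

With the sample size held fixed, the nearest neighbor drifts further away as dimensions are added, which is one way to see why much more data is needed to cover the space.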
Dimensionality Reduction
Lots of features, lots of data is best. But what if you don’t have the luxury of ginormous amounts of data?
Not all features provide the same amount of information. We can reduce the dimensions (compress the data) without necessarily losing too much information.
Feature Extraction
Do I have to choose the dimensions among existing features?
[Figure: Height vs. Cigarettes per day]
Why do we need dimensionality reduction?
- To better perform analyses
- …without sacrificing the information we get from our features
- To better visualize our data
What is the intuition behind PCA?
[Figure: Height vs. Cigarettes per day (Variable 1 vs. Variable 2), with the axes PC 1 and PC 2 drawn through the data]
Ducks and Bunnies
[Figure: Height vs. Cigarettes per day, with the new axis 0.398 (Height) + 0.602 (Cigarettes)]
2D
P(Healthy) = logistic( β1(Height) + β2(Cigarettes per day) )
Feature selection 1D
P(Healthy) = logistic( β1(Height) )
Feature extraction 1D
P(Healthy) = logistic( β1(0.4*Height + 0.6*Cigarettes per day) )
Advantage: You retain more information
Disadvantage: You lose interpretability
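The "retain more information" claim can be made concrete by comparing how much variance one original feature keeps against how much the first principal component keeps. The sketch below assumes NumPy is available and uses synthetic, standardized data; the correlation of 0.6 and all variable names are invented for illustration. Note that the weights PCA computes have unit length (their squares sum to 1), so the 0.398 and 0.602 on the slide, which sum to 1, are best read as illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for Height and Cigarettes per day (assumed correlation 0.6).
true_cov = np.array([[1.0, 0.6],
                     [0.6, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# Standardize so both features contribute a variance of exactly 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
w = eigenvectors[:, -1]                          # direction of largest variance

selected = Z[:, 0]   # feature selection: keep the first feature only
extracted = Z @ w    # feature extraction: w[0]*Height + w[1]*Cigarettes

total_variance = np.trace(cov)
share_selected = selected.var(ddof=1) / total_variance
share_extracted = extracted.var(ddof=1) / total_variance

print("weights of the extracted feature:", np.round(w, 3))
print(f"variance kept by one original feature: {share_selected:.1%}")
print(f"variance kept by the extracted feature: {share_extracted:.1%}")
```

The extracted feature can never keep less variance than a single standardized original feature, but its meaning is now a blend of both, which is the interpretability cost.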
3D → 2D Feature Extraction (PCA)
[Figure: data points in 3D with axes Height, Cigarettes and Exercise, and the optimum plane through them]
The two axes of the optimum plane are linear combinations of the original features:
A1*(Height) + B1*(Cigarettes) + C1*(Exercise)
A2*(Height) + B2*(Cigarettes) + C2*(Exercise)
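A minimal sketch of the 3D → 2D step, assuming NumPy and synthetic data that lies close to a plane (the mixing matrix, noise level, and variable names are made up for illustration). The rows of `components` play the role of (A1, B1, C1) and (A2, B2, C2).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Synthetic 3D data that lies near a 2D plane, plus a little noise.
latent = rng.normal(size=(n, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, -0.7]])
X = latent @ mixing + 0.05 * rng.normal(size=(n, 3))

# Center the data, then take the SVD of the centered matrix.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]          # rows: (A1, B1, C1) and (A2, B2, C2)
X_2d = Xc @ components.T     # coordinates of each point on the fitted plane
X_back = X_2d @ components + mean  # map the 2D points back into 3D

retained = (S[:2] ** 2).sum() / (S ** 2).sum()
print("plane axes (rows):\n", np.round(components, 3))
print(f"share of variance retained in 2D: {retained:.3f}")
```

Because the two rows are orthonormal, projecting and mapping back only loses the part of each point that sticks out of the plane.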
PCA Math
Singular Value Decomposition
The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the "core" of a PCA: the eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude.
In other words, the eigenvalues explain the variance of the data along the new feature axes.
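The link between the eigendecomposition described here and the singular value decomposition in the slide heading can be checked numerically. This sketch assumes NumPy and synthetic data; it is not the talk's own code. For a centered data matrix with n rows, the squared singular values divided by n - 1 equal the eigenvalues of the covariance matrix, and the right singular vectors match the eigenvectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data with three clearly different variances along its main axes.
mixing = np.array([[2.0, 0.3, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
order = np.argsort(eigvals)[::-1]        # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = S ** 2 / (n - 1)

# Variance of the data along each new axis equals the matching eigenvalue.
scores = Xc @ eigvecs
score_variances = scores.var(axis=0, ddof=1)

print("eigenvalues (covariance):", np.round(eigvals, 4))
print("eigenvalues (from SVD):  ", np.round(svd_eigvals, 4))
print("variance along new axes: ", np.round(score_variances, 4))
```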
Matrix Selection
Correlation or Covariance Matrix?
Use the correlation matrix to calculate the principal components if variables are measured by different scales and you want to standardize them or if the variances differ widely between variables. You can use the covariance or correlation matrix in all other situations.
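A small sketch of why the choice matters, assuming NumPy and synthetic data in which height is recorded in millimetres and cigarettes in units per day (the units, distributions, and the helper name `first_component` are assumptions made for illustration). With the covariance matrix, the first component is dominated by whichever variable has the larger numeric variance; the correlation matrix treats both on an equal footing and is the same as the covariance matrix of the standardized data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

# Two unrelated synthetic features on very different scales.
height_mm = rng.normal(1700, 100, n)   # variance around 10,000
cigarettes = rng.normal(10, 3, n)      # variance around 9
X = np.column_stack([height_mm, cigarettes])


def first_component(matrix):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)  # ascending eigenvalues
    return vectors[:, -1]


pc1_cov = first_component(np.cov(X, rowvar=False))
pc1_corr = first_component(np.corrcoef(X, rowvar=False))

# Standardizing first and then using the covariance matrix gives the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print("PC 1 weights, covariance matrix: ", np.round(pc1_cov, 3))
print("PC 1 weights, correlation matrix:", np.round(pc1_corr, 3))
```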
How do I know how many dimensions to reduce by?
Kaiser Method
Retain any components with eigenvalues greater than 1 (this assumes the PCA was run on the correlation matrix, where each standardized variable contributes a variance of 1).
Scree Test
Bar plot that shows the variance explained by each component. Ideally you will see a clear drop-off (elbow).
Percent Variance Explained
Calculate the cumulative sum of the variance explained by each component, and stop when you reach a threshold you chose in advance.
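The three rules can be sketched in a few lines, assuming NumPy. The function name `choose_n_components`, the 90% threshold, and the synthetic six-feature data set driven by two hidden factors are all assumptions made for illustration, not part of the talk. The scree values are printed as a text bar chart rather than plotted.

```python
import numpy as np


def choose_n_components(X, variance_threshold=0.9):
    """Return eigenvalues, explained-variance shares, and two suggested counts.

    Works on the correlation matrix, so the Kaiser cut-off of 1 applies.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending
    explained = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(explained)

    kaiser_k = int((eigenvalues > 1).sum())
    # First position where the cumulative share reaches the threshold.
    threshold_k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    threshold_k = min(threshold_k, len(eigenvalues))
    return eigenvalues, explained, kaiser_k, threshold_k


# Synthetic data: six observed features driven by two hidden factors.
rng = np.random.default_rng(4)
factors = rng.normal(size=(500, 2))
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = factors @ loadings + 0.3 * rng.normal(size=(500, 6))

eigenvalues, explained, kaiser_k, threshold_k = choose_n_components(X)

# Text version of a scree plot: look for the drop-off (elbow).
for i, share in enumerate(explained, start=1):
    print(f"PC {i}: {share:6.1%} " + "#" * int(round(share * 50)))
print("Kaiser method keeps:", kaiser_k)
print("90% variance threshold keeps:", threshold_k)
```

On data like this the rules agree; on real data they often do not, which is why the end goal of the analysis should guide the final choice.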
What is the intuition behind PCA?
- We are attempting to resolve the curse of dimensionality
- by shifting our perspective
- and keeping the eigenvectors that explain the highest amount of variance.
- We select those components based on our end goal, or by particular methods (Kaiser, Scree, % Variance).

Principal Components Analysis - PyBay 2016
