DIMENSION REDUCTION
KaziToufiqWadud
kazitoufiq@gmail.com
Twitter: @KaziToufiqWadud
WHAT IS DIMENSION REDUCTION?
Process of converting a data set with a large number of dimensions into a data set
with fewer dimensions, while ensuring that it still conveys similar
information concisely
DIMENSION REDUCTION :WHY ?
Curse of Dimensionality
what is the curse?
UNDERSTANDING THE CURSE
Consider a 3-class pattern recognition problem.
1) Start with 1 dimension/feature
2) Divide the feature space into uniform bins
3) Compute the ratio of examples for each class in each bin
4) For a new example, find its bin and choose the predominant class in that bin
In this case, we decide to start with one feature and divide the real line into 3 bins
- but there is overlap between the classes, so let's add a 2nd feature to improve discrimination.
UNDERSTANDING THE CURSE
 2 dimensions: moving to two dimensions increases the number of bins from 3 to 3^2 = 9
QUESTION: What should we hold constant?
The total number of examples? This results in a 2D scatter plot - reduced overlap, but higher sparsity.
To address sparsity, what about keeping the density of examples per bin constant (say 3)? This increases the
number of examples from 9 to 27 (9 bins x 3 = 27, at least)
UNDERSTANDING THE CURSE
Moving to 3 features
 The number of bins grows to 3^3 = 27
 For the same number of examples, the 3D scatter plot is
almost empty
Constant density:
 To keep the initial density of 3, the required number of examples is 27 x 3 = 81
IMPLICATIONS OF CURSE OF
DIMENSIONALITY
Exponential growth with dimensionality in the number of examples
required to accurately estimate a function.
In practice, the curse of dimensionality means that, for a given sample size, there is a
maximum number of features above which the performance of a classifier will degrade
rather than improve.
In most cases, the information that is lost by discarding some features is compensated for by
a more accurate mapping in the lower-dimensional space.
MULTICOLLINEARITY
Multicollinearity is a state of very high
intercorrelations or inter-associations among the
independent variables. It is therefore a type of
disturbance in the data; if it is present, the
statistical inferences made about the data may not
be reliable.
UNDERSTANDING MULTI-COLLINEARITY
Let’s look at the image shown below.
It shows 2 dimensions, x1 and x2, which are,
let us say, measurements of several objects -
in cm (x1) and in inches (x2).
Now, if you were to use both of these dimensions in machine learning,
they would convey similar information and introduce a lot of noise into the
system.
We are better off using just one dimension. Here we have reduced
the dimensionality of the data from 2D (x1 and x2) to 1D (z1), which has made the data
relatively easier to explain.
BENEFITS OF APPLYING DIMENSION
REDUCTION
 Data Compression; Reduction of storage space
 Less computing; Faster processing
 Removal of multi-collinearity (redundant features) to reduce noise for better
model fit
 Better visualization and interpretation
APPROACHES FOR DIMENSION
REDUCTION
 Feature selection:
choosing a subset of all the features
 Feature extraction:
creating new features by combining existing ones
In either case, the goal is to find a low-dimensional representation of the data that preserves (most of)
the information or structure in the data
COMMON METHODS FOR DIMENSION
REDUCTION
 MISSING VALUES
While exploring data, if we encounter missing values, what do we do? Our first step should be to
identify the reason, then impute the missing values or drop the variables using appropriate methods.
But what if we have too many missing values? Should we impute them or drop the variables?
Maybe dropping is a good option, because such a variable would not carry much detail about the data set.
It would also not help in improving the power of the model.
Next question: is there any threshold of missing values for dropping a variable?
It varies from case to case. If the information contained in the variable is not that significant, you can
drop the variable if it has more than ~40-50% missing values.
COMMON METHODS FOR DIMENSION
REDUCTION
 MISSING VALUE
R code:
summary(data)
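Beyond summary(data), a minimal base R sketch for quantifying missingness per variable (the data frame name data and the 50% cutoff are placeholders, not from the original deck):
colSums(is.na(data))                    # number of missing values per variable
round(colMeans(is.na(data)) * 100, 1)   # percentage of missing values per variable
# drop variables above a chosen threshold, e.g. 50% missing
data_reduced <- data[, colMeans(is.na(data)) <= 0.5]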
COMMON METHODS FOR DIMENSION
REDUCTION
 LOW VARIANCE
Let’s think of a scenario where we have a constant variable (all observations have the
same value, say 5) in our data set. Do you think it can improve the power of the model?
Of course NOT, because it has zero variance. When the number of dimensions is high,
we should drop variables having low variance compared to the others, because these
variables will not explain the variation in the target variable.
In R, the caret function nearZeroVar identifies such predictors.
EXAMPLE AND CODE
 To identify these types of predictors, the following two metrics can be calculated:
- the frequency of the most prevalent value over the second most frequent value
(called the "frequency ratio"), which would be near one for well-behaved predictors
and very large for highly unbalanced data;
- the "percent of unique values", i.e. the number of unique values divided by the total
number of samples (times 100), which approaches zero as the granularity of the data
increases.
 If the frequency ratio is greater than a pre-specified threshold and the unique value
percentage is less than a threshold, we might consider a predictor to be near zero
variance.
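A minimal base R sketch of these two metrics for a single predictor x (the variable name is a placeholder; caret's nearZeroVar, used on the next slide, computes the same idea for every column):
freq <- sort(table(x), decreasing = TRUE)                # assumes x has at least two distinct values
freq_ratio <- freq[1] / freq[2]                          # most common vs. second most common value
pct_unique <- 100 * length(unique(x)) / length(x)        # unique values as a share of all samples
near_zero <- (freq_ratio > 95 / 5) & (pct_unique < 10)   # flag using caret-like default cutoffs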
NEAR ZERO VARIANCE
library(caret)                                     # provides nearZeroVar() and the mdrr data
data(mdrr)
data.frame(table(mdrrDescr$nR11))                  # an example of a highly unbalanced predictor
nzv <- nearZeroVar(mdrrDescr, saveMetrics = TRUE)  # frequency ratio and percent unique per column
nzv[nzv$nzv, ][1:10, ]
dim(mdrrDescr)
nzv <- nearZeroVar(mdrrDescr)                      # just the column indices this time
filteredDescr <- mdrrDescr[, -nzv]                 # drop the near zero variance predictors
dim(filteredDescr)
COMMON METHODS FOR DIMENSION
REDUCTION
 DECISION TREE
A decision tree can be used to tackle multiple challenges at once, such as missing
values, outliers and identifying significant variables.
- How? We need to understand how a Decision Tree works and the
concepts of Entropy and Information Gain.
DECISION TREE – DATA SET
DECISION TREE: WHAT DOES IT LOOK LIKE?
DECISION TREE AND ENTROPY
DECISION TREE: ENTROPY AND
INFORMATION GAIN
To build a decision tree, we need to calculate 2 types of entropy using frequency tables:
a) Entropy using the frequency table of one attribute
DECISION TREE: ENTROPY AND
INFORMATION GAIN
b) Entropy using the frequency table of two attributes:
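A minimal base R sketch of both calculations and the resulting information gain, assuming a data frame w with a target column Play and a predictor column Outlook (names borrowed from the weather example used later in the deck):
entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
# a) entropy of the target from the frequency table of one attribute
e_target <- entropy(prop.table(table(w$Play)))
# b) entropy from the frequency table of two attributes: the class entropy within each
#    Outlook branch, weighted by the branch size
tab <- table(w$Outlook, w$Play)
e_split <- sum(rowSums(tab) / sum(tab) * apply(prop.table(tab, 1), 1, entropy))
info_gain <- e_target - e_split   # gain from splitting on Outlook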
ENTROPY ….. CONTD.
ENTROPY CALCULATION
DECISION TREE: ENTROPY AND
INFORMATION GAIN
Similarly:
KEY POINTS:
 A branch with entropy 0 is a leaf node.
KEY POINTS:
 A branch with entropy greater than 0 needs further splitting.
The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
DECISION TREE: PROS AND CONS
R CODE
library(rattle)                                   # provides fancyRpartPlot()
fancyRpartPlot(fit)                               # fit is assumed to be an rpart decision tree (sketch below)
## to calculate information gain
library(FSelector)
weights <- information.gain(Play ~ ., data = w)   # information gain of each attribute for Play
print(weights)
subset <- cutoff.k(weights, 3)                    # keep the 3 most informative attributes
f <- as.simple.formula(subset, "Play")
print(f)
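The fit object plotted above is not built in the original deck; a minimal sketch of how it might be fitted on the same weather data (weather.csv and the Play column are assumptions carried over from the random forest example that follows):
library(rpart)
library(rattle)               # fancyRpartPlot()
w <- read.csv("weather.csv", header = TRUE)
w$Play <- as.factor(w$Play)   # classification target
fit <- rpart(Play ~ Outlook + Temperature + Humidity + Windy, data = w, method = "class",
control = rpart.control(minsplit = 2, cp = 0))    # relaxed controls for a very small data set
fancyRpartPlot(fit)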
COMMON METHODS FOR DIMENSION
REDUCTION
 Random Forest
Similar to the decision tree is the Random Forest. I would also recommend using the built-in
feature importance provided by random forests to select a smaller subset of input
features.
Just be careful: random forests have a tendency to be biased towards variables that have
more distinct values, i.e. they favour numeric variables over binary/categorical variables.
HOW DOES RANDOM FOREST WORK?
W/ AND W/O REPLACEMENT
HOW A RANDOM FOREST IS BUILT
R CODE
library(randomForest)
set.seed(12345)
w <- read.csv("weather.csv", header = TRUE)
str(w)
w <- w[names(w)[1:5]]         # keep only the first five columns
w$Play <- as.factor(w$Play)   # ensure the target is a factor so randomForest does classification
fit1 <- randomForest(Play ~ Outlook + Temperature + Humidity + Windy, data = w,
importance = TRUE, ntree = 20)
varImpPlot(fit1)              # plot variable importance
COMMON METHODS FOR DIMENSION
REDUCTION
 High Correlation
Dimensions exhibiting high correlation can lower the performance of the model.
Moreover, it is not good to have multiple variables carrying similar information or variation;
this is known as “Multicollinearity”.
You can use a Pearson (continuous variables) or polychoric (discrete variables) correlation
matrix to identify the variables with high correlation, and select one of them using the
VIF (Variance Inflation Factor). Variables having a higher value (VIF > 5) can be dropped.
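A minimal sketch of this correlation-based filter using caret's findCorrelation (the data frame name and the 0.9 cutoff are illustrative):
library(caret)
num_vars <- data[, sapply(data, is.numeric)]              # Pearson correlation needs numeric columns
cor_mat <- cor(num_vars, use = "pairwise.complete.obs")
high_cor <- findCorrelation(cor_mat, cutoff = 0.9)        # indices of columns worth dropping
filtered <- if (length(high_cor) > 0) num_vars[, -high_cor] else num_vars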
IMPACT OF MULTICOLLINEARITY
 R demonstration with an example
 13 independent variables against a response variable of crime rate
- a discrepancy is noticeable on studying the regression equation with regard to
expenditure on police services between the years 1959 and 1960. Why should police
expenditure in one year be associated with an increase in crime rate, and expenditure in the
previous year with a decrease? It does not make sense.
IMPACT OF MULTICOLLINEARITY
2nd - even though the F statistic is highly significant, which provides evidence for the presence
of a linear relationship between all 13 variables and the response variable, the β coefficients of both
expenditures, for 1959 and 1960, have nonsignificant t ratios. A non-significant t means there is no
slope! In other words, police expenditure has no effect whatsoever on crime rate!
HIGH CORRELATION - VIF
The most widely used diagnostic for multicollinearity is the variance inflation factor (VIF).
- The VIF may be calculated for each predictor by doing a linear regression of that predictor on all
the other predictors, and then obtaining the R2 from that regression. The VIF is just 1/(1 - R2).
- It’s called the variance inflation factor because it estimates how much the variance of a
coefficient is “inflated” because of linear dependence with other predictors. Thus, a VIF of 1.8 tells
us that the variance (the square of the standard error) of a particular coefficient is 80% larger than
it would be if that predictor were completely uncorrelated with all the other predictors.
- The VIF has a lower bound of 1 but no upper bound. Authorities differ on how high the VIF has to
be to constitute a problem. Personally, I tend to get concerned when a VIF is greater than 2.50,
which corresponds to an R2 of .60 with the other variables.
VIF – R CODE
 vif(fit)
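Here vif() is assumed to come from the car package and fit to be a previously fitted lm model; a minimal sketch, with the 1/(1 - R2) definition also computed by hand for one predictor (the crime data frame and variable names are placeholders):
library(car)
fit <- lm(CrimeRate ~ ., data = crime)    # hypothetical model
vif(fit)                                  # VIF for every predictor
# the same quantity by hand for one predictor, say Ex1959:
r2 <- summary(lm(Ex1959 ~ . - CrimeRate, data = crime))$r.squared
1 / (1 - r2)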
KEY POINTS:
 The variables with high VIFs are control variables, and the variables of interest
do not have high VIFs.
Suppose the sample consists of U.S. colleges. The dependent variable is graduation
rate, and the variable of interest is an indicator (dummy) for public vs. private. Two
control variables are average SAT scores and average ACT scores for entering
freshmen. These two variables have a correlation above .9, which corresponds to
VIFs of at least 5.26 for each of them. But the VIF for the public/private indicator is
only 1.04. So there’s no problem to be concerned about, and no need to delete one
or the other of the two controls.
http://statisticalhorizons.com/multicollinearity
COMMON METHODS FOR DIMENSION
REDUCTION
 Backward Feature Elimination / Forward Feature Selection
 In backward elimination, we start with all n dimensions. We compute the sum of squared errors
(SSR) after eliminating each variable in turn (n times), then identify the variable whose
removal produced the smallest increase in the SSR and remove it, leaving
us with n-1 input features.
 Repeat this process until no other variables can be dropped.
The reverse of this is the “Forward Feature Selection” method. In this method, we
select one variable at a time and analyse the performance of the model as each new variable is added.
Here, the selection of a variable is based on the larger improvement in model performance.
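A minimal sketch of automating both directions with base R's step(), which uses AIC rather than a raw SSR comparison (the model formula and data frame name are placeholders):
full <- lm(y ~ ., data = d)                                 # start from all n predictors
backward <- step(full, direction = "backward", trace = 0)   # drop the weakest variable at each step
null <- lm(y ~ 1, data = d)
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)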
COMMON METHODS FOR DIMENSION
REDUCTION
 Factor Analysis
Let’s say some variables are highly correlated. These variables can be grouped by their
correlations, i.e. all variables in a particular group can be highly correlated among themselves but
have low correlation with variables of other group(s). Here each group represents a single
underlying construct or factor. These factors are small in number compared to the large number of
dimensions. However, these factors are difficult to observe directly.
There are basically two methods of performing factor analysis:
 EFA (Exploratory Factor Analysis)
 CFA (Confirmatory Factor Analysis)
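A minimal sketch of EFA with base R's factanal(); the number of factors and the data frame name are illustrative, and CFA would typically use a dedicated package such as lavaan:
efa <- factanal(num_vars, factors = 3, rotation = "varimax", scores = "regression")
print(efa$loadings, cutoff = 0.4)   # which variables load on which factor
reduced <- efa$scores               # factor scores as a lower-dimensional representation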
COMMON METHODS FOR DIMENSION
REDUCTION
 Principal Component Analysis (PCA)
In this technique, variables are transformed into a new set of variables, which are linear combinations of the original
variables. This new set of variables is known as the principal components. They are obtained in such a way that the
first principal component accounts for most of the possible variation in the original data, after which each succeeding
component has the highest possible remaining variance.
The second principal component must be orthogonal to the first principal component. In other words, it does its
best to capture the variance in the data that is not captured by the first principal component. For a two-dimensional
dataset, there can be only two principal components.
The principal components are sensitive to the scale of measurement; to fix this issue we should always
standardize the variables before applying PCA. Because the components are linear combinations of the original
variables, the transformed data loses the meaning of the original features:
if interpretability of the results is important for your analysis, PCA is not the right technique for your project.
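A minimal sketch with base R's prcomp(), standardizing the variables as recommended above (the data frame name is a placeholder):
pca <- prcomp(num_vars, center = TRUE, scale. = TRUE)   # scale. = TRUE standardizes each variable
summary(pca)                                            # proportion of variance explained per component
scores <- pca$x[, 1:2]                                  # keep, say, the first two principal components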
UNDERSTANDING PCA:
PREREQUISITE
Standard Deviation
“The average distance from the mean of the data set to a point”
(Statisticians are usually concerned with taking a sample of a population.)
Mean
Variance
UNDERSTANDING PCA:
PREREQUISITE
Covariance: covariance is always measured between 2 dimensions.
Covariance Matrix
EXAMPLE: HOW TO CALCULATE
COVARIANCE
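A minimal sketch of the calculation for two placeholder vectors x1 and x2, first from the sample covariance definition and then with base R's cov():
x1 <- c(2.5, 0.5, 2.2, 1.9, 3.1)   # small made-up sample
x2 <- c(2.4, 0.7, 2.9, 2.2, 3.0)
sum((x1 - mean(x1)) * (x2 - mean(x2))) / (length(x1) - 1)   # covariance by the definition
cov(x1, x2)                                                 # the same value from base R
cov(cbind(x1, x2))                                          # the 2 x 2 covariance matrix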
EIGENVECTORS & EIGENVALUES OF A
MATRIX
Eigenvector
Eigenvalue matrix
PROPERTIES:
- Eigenvectors and eigenvalues always come in pairs.
- Eigenvectors can only be found for square matrices, but not all square matrices have eigenvectors.
- For an n x n matrix that has them, there are n eigenvectors.
- Another property of eigenvectors is that even if I scale the vector by some amount
before I multiply it, I still get the same multiple of it as a result. This
is because if you scale a vector by some amount, all you are doing is making it longer, not changing its
direction.
PROPERTIES:
all the eigenvectors of a matrix are perpendicular,
ie. at right angles to each other, no matter how many dimensions you have. By the way,
another word for perpendicular, in maths talk, is orthogonal
This is important ;
That means we can express data in terms of these perpendicular eigenvectors, instead of expressing
them in terms of the x axes and y axes.
PROPERTIES:
Another important thing to know is that when mathematicians find eigenvectors,
they like to find the eigenvectors whose length is exactly one. This is because, as
you know, the length of a vector doesn’t affect whether it’s an eigenvector or not,
whereas the direction does. So, in order to keep eigenvectors standard, whenever
we find an eigenvector we usually scale it to have a length of 1, so that all
eigenvectors have the same length.
HOW DOES ONE GO ABOUT FINDING THESE
MYSTICAL EIGENVECTORS?
- Unfortunately, it’s only easy(ish) if you have a rather small matrix.
- The usual way to find the eigenvectors is by some complicated iterative method,
which is beyond the scope of this tutorial.
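In R that iterative work is done by eigen(); a minimal sketch on a small symmetric matrix (the values are illustrative):
A <- matrix(c(2, 1, 1, 2), nrow = 2)   # small symmetric matrix
e <- eigen(A)
e$values                               # eigenvalues, largest first
e$vectors                              # unit-length eigenvectors, one per column
A %*% e$vectors[, 1]                   # equals e$values[1] * e$vectors[, 1]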
REVISE:
METHOD
 1. Get some data
 2. Subtract the mean
 3. Calculate the covariance matrix
METHOD ….CONTD
 4. Calculate the eigenvectors and eigenvalues of the covariance matrix
PLOT
METHOD:
 Step 5: Deriving the new data set
RowFeatureVector is the matrix with the eigenvectors in the columns, transposed
so that the eigenvectors are now in the rows, with the most significant eigenvector
at the top.
RowDataAdjust is the mean-adjusted data, transposed, i.e. the data
items are in the columns, with each row holding a separate dimension.
FinalData = RowFeatureVector x RowDataAdjust
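A minimal sketch of the whole method on a small placeholder data set, following the steps above (subtract the mean, covariance matrix, eigen-decomposition, then project into the new basis):
x1 <- c(2.5, 0.5, 2.2, 1.9, 3.1)                  # same made-up sample as in the covariance sketch
x2 <- c(2.4, 0.7, 2.9, 2.2, 3.0)
X <- cbind(x1, x2)
X_adj <- scale(X, center = TRUE, scale = FALSE)   # step 2: subtract the mean of each dimension
e <- eigen(cov(X_adj))                            # steps 3-4: covariance matrix and its eigenvectors
RowFeatureVector <- t(e$vectors)                  # eigenvectors in rows, most significant at the top
RowDataAdjust <- t(X_adj)                         # mean-adjusted data, one dimension per row
FinalData <- RowFeatureVector %*% RowDataAdjust   # step 5: data expressed in the eigenvector basis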
PCA
Thanks
kazitoufiq@gmail.com
Twitter @KaziToufiqWadud