Machine Learning
Dr.M.Pyingkodi
Dept of MCA
Kongu Engineering College
Erode, Tamilnadu, India
Basics of Feature Engineering - Unit II
Basics of Feature Engineering
2
• the process of translating a data set into features such that these features are able to represent
the data set more effectively and result in a better learning performance.
• refers to manipulation — addition, deletion, combination, mutation — of your data set to improve
machine learning model training, leading to better performance and greater accuracy.
• a part of the preparatory activities
• It is responsible for taking raw input data and converting that to well-aligned features which are
ready to be used by the machine learning models.
• FE encapsulates various data engineering techniques such as selecting relevant features, handling
missing data, encoding the data, and normalizing it.
• It has two major elements
❖ feature transformation
❖ feature subset selection
3
• A feature is an attribute of a data set that is used in a machine learning process.
• selection of the subset of features which are meaningful for machine learning is a sub-area of
feature engineering.
• The features in a data set are also called its dimensions.
• a data set having ‘n’ features is called an n-dimensional data set.
In the example figure, Species is the class variable, the remaining attributes are the predictor variables, and the data set as a whole is a 5-dimensional data set.
4
Apply a mathematical formula to a particular column(feature) and transform the values which are
useful for our further analysis.
Creating new features from existing features that may help in improving the model performance.
Transforms the original set of ‘m’ features to create a new set of ‘n’ features while retaining as much information as possible
It has two major elements:
1. feature construction
2. feature extraction
Both are sometimes known as feature discovery
There are two distinct goals of feature transformation:
1. Achieving best reconstruction of the original features in the data set
2. Achieving highest efficiency in the learning task
5
Involves transforming a given set of input features to generate a new set of more powerful features.
The data set has three features –
apartment length,
apartment breadth, and
price of the apartment.
If it is used as an input to a regression problem, such data can be training data for the regression model.
So given the training data, the model should be able to predict the price of an apartment whose price is not
known or which has just come up for sale.
However, instead of using the length and breadth of the apartment as predictors, it is much more convenient and
makes more sense to use the area of the apartment, which is not an existing feature of the data set.
So such a feature, namely apartment area, can be added to the data set.
In other words, we transform the three-dimensional data set to a four-dimensional data set,
with the newly ‘discovered’ feature apartment area being added to the original data set.
6
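A minimal sketch of this kind of feature construction with pandas. The column names and values below are made up for illustration; they are not taken from the slide's figure:

```python
import pandas as pd

# Hypothetical apartment data set with the three original features
df = pd.DataFrame({
    "apartment_length": [30, 40, 25],    # in feet
    "apartment_breadth": [20, 25, 18],   # in feet
    "apartment_price": [450000, 700000, 320000],
})

# Construct the new feature 'apartment_area' from length and breadth
df["apartment_area"] = df["apartment_length"] * df["apartment_breadth"]

print(df)  # the data set is now four-dimensional
```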
• when features have categorical value and machine learning needs numeric value inputs
• when features have numeric (continuous) values that need to be converted to ordinal values
• when text-specific feature construction needs to be done
Ordinal data are discrete integers that can be ranked or sorted
7
Encoding categorical (nominal) variables
FIG. 4.3 Feature construction (encoding nominal variables)
8
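A short sketch of encoding a nominal variable with pandas one-hot (dummy) encoding. The feature name "city" and its values are assumptions made for illustration, not the variable shown in the figure:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Erode", "Chennai", "Madurai", "Erode"]})

# One-hot encode the nominal variable: one new dummy feature per category
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
print(encoded)
```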
• The grade is an ordinal variable
with values A, B, C, and D.
• To transform this variable to a
numeric variable, we can create
a feature num_grade mapping a
numeric value against each
ordinal value.
• mapped to values 1, 2, 3, and 4
in the transformed variable
9
• Consider a real estate price category prediction,
which is a classification problem.
• In that case, we can ‘bin’ the numerical data
into multiple categories based on the data
range.
• In the context of the real estate price
prediction example, the original data set has a
numerical feature apartment_price
10
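One possible way to ‘bin’ a numerical feature into categories is pandas.cut; the bin edges and category labels below are assumptions chosen only to illustrate the idea:

```python
import pandas as pd

prices = pd.Series([250000, 480000, 720000, 1500000], name="apartment_price")

# Bin the continuous price into ordinal categories based on the data range
price_category = pd.cut(
    prices,
    bins=[0, 300000, 800000, float("inf")],
    labels=["low", "medium", "high"],
)
print(price_category)
```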
Text is arguably the most predominant medium of communication.
Text mining is an important area of research
unstructured nature of the data
All machine learning models need numerical data as input.
So the text data in the data sets need to be transformed into numerical features
EX:
In Facebook, micro-blogging channels like Twitter, emails, or short messaging services such as
WhatsApp, text plays a major role in the flow of information.
Vectorization:
Turning text into vectors/arrays, i.e. converting text into vectors of integers (or booleans, or floating-point numbers).
A vector here is simply a list with n positions.
11
I want to turn my text into data.
One-hot encoding for “I want to turn my text into data”
One-hot encoding for “I want my data”.
One-hot encoding only records whether each token is “present” or “not present”.
12
1. Tokenize
In order to tokenize a corpus, the blank spaces and punctuations are used as delimiters to separate
out the words, or tokens
A corpus is a collection of authentic text or audio organized into datasets.
2. Count
Then the number of occurrences of each token is counted, for each document.
3. Normalize
Tokens are weighted with reducing importance when they occur in the majority of the documents.
A matrix is then formed with each token representing a column and a specific document of the
corpus representing each row.
Each cell contains the count of occurrence of the token in a specific document.
This matrix is known as a document-term matrix / term- document matrix
13
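A small sketch of the tokenize–count–normalize pipeline using scikit-learn. CountVectorizer produces the raw document-term matrix, and TfidfVectorizer down-weights tokens that occur in most documents, which corresponds to the ‘normalize’ step above. The two-document corpus is made up:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "I want to turn my text into data",
    "I want my data",
]

# Raw counts: rows = documents, columns = tokens
counts = CountVectorizer().fit_transform(corpus)

# Tf-idf weighting reduces the importance of tokens common to most documents
vectorizer = TfidfVectorizer()
dtm = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # the tokens (columns of the matrix)
print(dtm.toarray())                       # the weighted document-term matrix
```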
Typical document- term matrix which forms an
input to a machine learning model.
14
New features are created from a combination of original features.
Operators for combining the original features include
1. For Boolean features:
Conjunctions, Disjunctions, Negation, etc.
2. For nominal features:
Cartesian product, M of N, etc.
3. For numerical features:
Min, Max, Addition, Subtraction, Multiplication, Division, Average, Equivalence, Inequality, etc.
15
16
Dimensionality-reduction method.
A data set has multiple attributes or dimensions, many of which might have similarity with each other.
EX: If the height is more, generally the weight is more, and vice versa.
In PCA, a new set of features is extracted from the original features; the new features are quite dissimilar to each other.
transforming a large set of variables into a smaller one.
reduce the number of variables of a data set, while preserving as much information as possible.
Objective:
1. The new features are distinct, i.e. the covariance between the new features (the principal components) is 0.
2. The principal components are generated in order of the variability in the data that they capture. Hence, the
first principal component should capture the maximum variability, the second principal component should
capture the next highest variability, and so on.
3. The sum of variance of the new features or the principal components should be equal to the sum of
variance of the original features.
17
converts the observations of correlated features into a set of linearly uncorrelated features with the
help of orthogonal transformation
These new transformed features are called the Principal Components.
It retains the important variables and drops the least important ones.
Examples:
image processing, movie recommendation system, optimizing the power allocation in various
communication channels.
Eigenvectors are the principal components of the data set.
Eigenvectors and eigenvalues exist in pairs: every eigenvector has a corresponding eigenvalue.
An eigenvector gives the direction of the line (vertical, horizontal, 45 degrees, etc.).
An eigenvalue is a number telling you how much variance there is in the data in that direction,
i.e. how spread out the data is along that line.
18
Dimensionality
It is the number of features or variables present in the given dataset.
Correlation
It signifies how strongly two variables are related to each other: if one changes, the other variable also
changes. The correlation value ranges from -1 to +1.
Here, -1 indicates that the variables are inversely related, and +1 indicates that the variables are
directly related.
Orthogonal
It means that the variables are not correlated with each other, and hence the correlation between the pair of
variables is zero.
Eigenvectors
A non-zero column vector whose direction is unchanged by a linear transformation.
The corresponding eigenvalue can be regarded as the strength (scaling factor) of the transformation in that direction.
Covariance Matrix
A matrix containing the covariance between the pair of variables is called the Covariance Matrix.
19
1. Standardizing the data
Without standardization, variables with larger ranges will dominate over those with small ranges (for example, a
variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1).
2. Covariance Matrix Calculation
how the variables of the input data set are varying from the mean with respect to each other.
if positive then : the two variables increase or decrease together (correlated)
if negative then : One increases when the other decreases (Inversely correlated)
20
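A minimal sketch of steps 1 and 2 with NumPy, on a made-up two-feature (height, weight) data set:

```python
import numpy as np

# Made-up data: each row is an instance, columns are height (cm) and weight (kg)
X = np.array([[170, 65], [160, 58], [180, 75], [175, 70]], dtype=float)

# Step 1: standardize each feature (zero mean, unit variance)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)
print(cov)  # positive off-diagonal here, since height and weight increase together
```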
Steps for PCA algorithm
3.Compute The Eigenvectors & Eigenvalues of The Covariance Matrix To Identify The Principal
Components
Principal components are new variables that are constructed as linear combinations or mixtures of
the initial variables.
These combinations are done in such a way that the new variables (i.e., principal components) are
uncorrelated and most of the information within the initial variables is squeezed or compressed
into the first components.
principal components represent the directions of the data that explain a maximal amount of
variance
21
4. The eigenvector having the next highest eigenvalue represents the direction in which data has
the highest remaining variance and also orthogonal to the first direction. So this helps in
identifying the second principal component.
5. Like this, identify the top ‘k’ eigenvectors having top ‘k’ eigenvalues so as to get the ‘k’ principal
components.
eigenvectors of the Covariance matrix are actually the directions of the axes where there is the
most variance(most information) and that we call Principal Components.
eigenvalues are simply the coefficients attached to eigenvectors, which give the amount of
variance carried in each Principal Component.
22
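A compact sketch of steps 3 to 5: eigendecomposition of the covariance matrix and projection onto the top k principal components, continuing the NumPy style above. The random data set and the choice k = 2 are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                      # made-up 4-feature data set
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)             # eigenpairs of the (symmetric) covariance matrix

order = np.argsort(eigvals)[::-1]                  # sort by decreasing variance captured
k = 2
top_k = eigvecs[:, order[:k]]                      # top-k principal directions

X_pca = X_std @ top_k                              # project onto the k principal components
print(X_pca.shape)                                 # (100, 2)
```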
• Singular value decomposition (SVD) of a matrix is a factorization of that matrix into three matrices.
• It is widely used for analysing linear transformations.
SVD of a matrix A (m × n) is a factorization of the form A = U Σ Vᵀ, i.e. a product of
• two orthogonal matrices U and V and a diagonal matrix Σ,
• where U and V are orthonormal matrices:
• U is an m × m unitary matrix,
• V is an n × n unitary matrix and
• Σ is an m × n rectangular diagonal matrix.
• The diagonal entries of Σ are known as the singular values of matrix A.
• The columns of U and V are called the left-singular and right-singular vectors of matrix A.
• The singular values are the square roots of the eigenvalues of AᵀA (equivalently, of AAᵀ).
23
1. Patterns in the attributes are captured by the right-singular vectors, i.e. the columns of V.
2. Patterns among the instances are captured by the left-singular vectors, i.e. the
columns of U.
3. The larger a singular value, the larger is the part of the matrix A that it (together with its
associated vectors) accounts for.
4. A new data matrix with k attributes is obtained using the equation
D' = D × [v1, v2, … , vk]
Thus, the dimensionality gets reduced to k.
24
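A short NumPy sketch of SVD-based reduction to k attributes. The data matrix is made up, and the first k columns of V play the role of [v1, …, vk] above:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(50, 6))          # made-up data matrix: 50 instances, 6 attributes

U, s, Vt = np.linalg.svd(D, full_matrices=False)   # D = U @ diag(s) @ Vt

k = 2
Vk = Vt[:k].T                         # columns v1..vk: the top-k right-singular vectors
D_reduced = D @ Vk                    # new data matrix with k attributes
print(D_reduced.shape)                # (50, 2)
```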
The objective of PCA is to capture the data set variability;
PCA calculates the eigenvalues of the covariance matrix of the data set.
LDA, in contrast, focuses on class separability.
It reduces the number of features to a more manageable number before classification.
It is commonly used for supervised classification problems.
It separates the features based on class separability so as to avoid over-fitting of the machine
learning model.
• LDA calculates eigenvalues and eigenvectors of the intra-class (within-class) and inter-class (between-class) scatter matrices.
25
1. Calculate the mean vectors for the individual classes.
2. Calculate the intra-class and inter-class scatter matrices:
S_W = Σ_i Σ_{x in class i} (x − m_i)(x − m_i)ᵀ and S_B = Σ_i N_i (m_i − m)(m_i − m)ᵀ
where m_i is the sample mean (mean vector) of the i-th class, m is the overall mean of the data set,
and N_i is the sample size of each class.
3. Calculate the eigenvalues and eigenvectors for S_W and S_B, where S_W is the intra-class scatter matrix and S_B
is the inter-class scatter matrix.
4. Identify the top k eigenvectors having the top k eigenvalues.
26
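A brief sketch of LDA for dimensionality reduction using scikit-learn, which performs the scatter-matrix eigen-analysis internally. The Iris data set is used only as a convenient example:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project onto at most (number of classes - 1) = 2 discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (150, 2)
```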
to select a subset of attributes or features that makes the most meaningful contribution to a
machine learning activity.
the process of reducing the number of input variables when developing a predictive model.
It is related to dimensionality-reduction techniques in that both methods seek fewer input variables for a predictive
model.
Why feature subset selection?
to reduce the number of input variables to both reduce the computational cost of modeling and, in
some cases, to improve the performance of the model.
It removes unnecessary and repetitive data,
27
Suppose we are trying to predict the weight of students
based on past information about similar
students.
The student weight data set has features
such as Roll Number, Age, Height, and
Weight.
Roll Number has no influence in predicting
the weight of the students;
therefore we eliminate Roll Number from the
data set.
28
A data set with a high number of variables, attributes, or features is known as a high-dimensional
data set.
The high-dimensional spaces often have hundreds or thousands of dimensions or attributes
Ex: DNA analysis, geographic information systems (GIS), social networking, etc.
EX dataset :
DNA
DNA microarray data can have up to 450,000 variables (gene probes).
social networking sites, emails
The text data generated from different sources also have extremely high dimensions.
29
• Having a faster and more cost-effective (i.e. less need for computational resources) learning model
• Improving the efficiency of the learning model
• Having a better understanding of the underlying model that generated the data
To identify a subset of original features from a given dataset while removing irrelevant and/or
redundant features
The simplest algorithm is to test each possible subset of features and find the one which minimizes
the error rate.
30
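A naive sketch of this exhaustive strategy, scoring every non-empty feature subset by cross-validated accuracy. It is only practical for a handful of features, which is exactly why approximate searches (discussed later under subset generation) are used instead; the data set and classifier choice are assumptions:

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for r in range(1, n_features + 1):
    for subset in combinations(range(n_features), r):
        cols = list(subset)
        score = cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, round(best_score, 3))  # the subset with the lowest error rate
```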
Techniques that calculate a score for all the input features for a given model —
The scores simply represent the “importance” of each feature.
A higher score means that the specific feature will have a larger effect on the model that is being used to predict a certain
variable.
Each of the predictor variables is expected to contribute information to decide the value of the class label.
In case a variable is not contributing any information, it is said to be irrelevant.
In case the information contribution for prediction is very little, the variable is said to be weakly relevant.
Remaining variables, which make a significant contribution to the prediction task are said to be strongly relevant variables.
In unsupervised learning, there is no training data set or labelled data.
Grouping of similar data instances are done and similarity of data instances are evaluated based on the value of different
variables.
Certain variables do not contribute any useful information for deciding the similarity or dissimilarity of data instances.
Hence, those variables make no significant information contribution in the grouping process.
These variables are marked as irrelevant variables in the context of the unsupervised machine learning task.
31
A feature may contribute information that is similar to the information contributed by one or more
other features.
Such duplicated features (or information) are said to be redundant.
For example, in the student weight data set discussed earlier,
both the features Age and Height contribute similar information.
This is because, with an increase in age, weight is expected to increase.
Similarly, with an increase in height, weight is also expected to increase.
So, in the context of that problem, Age and Height contribute similar information.
In other words, irrespective of whether the feature Height is present or not, the learning model will give the
same results.
In this kind of situation where one feature is similar to another feature, the feature is said to be potentially
redundant in the context of a machine learning problem.
32
Supervised Learning
Mutual information is considered as a good measure of information contribution of a feature to
decide the value of the class label.
Mutual information:
A quantity called mutual information measures the amount of information one can obtain from one
random variable given another.
a measure of dependence between two random variables
it is a good indicator of the relevance of a feature with respect to the class variable.
The higher the value of mutual information of a feature, the more relevant is that feature.
Mutual information between the class variable C and a feature f is computed as
MI(C, f) = H(C) + H(f) − H(C, f)
where H(C) is the marginal entropy of the class, H(f) is the marginal entropy of the feature, and H(C, f) is their joint entropy.
33
The marginal entropy of the class, H(C) = −Σ_{i=1..K} p(C = i) log2 p(C = i)
The marginal entropy of the feature ‘x’, H(f) = −Σ_{x} p(f = x) log2 p(f = x)
where K = number of classes, C = class variable, and f = a feature that takes discrete values.
In the case of unsupervised learning, the entropy of the set of features without one feature at a
time is calculated for all features.
Then the features are ranked in descending order of information gain from a feature and the top
'beta' percentage (value of beta is a design parameter of the algorithm) of features are
selected as relevant features.
The entropy of a feature f is calculated using Shannon’s formula
34
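A small sketch computing Shannon entropy for a discrete feature and the mutual information with the class via the identity MI(C, f) = H(C) + H(f) − H(C, f) used above; the toy feature values and class labels are made up:

```python
from collections import Counter

import numpy as np

def entropy(values):
    """Shannon entropy H = -sum(p * log2 p) over the observed value frequencies."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

f = ["A", "A", "B", "B", "C", "C"]                    # discrete feature values
C = ["pass", "pass", "pass", "fail", "fail", "fail"]  # class labels

joint = list(zip(f, C))                               # joint (feature, class) outcomes
mi = entropy(C) + entropy(f) - entropy(joint)
print(round(mi, 3))  # higher value => feature f is more relevant to the class
```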
Similar information contribution by multiple features.
Methods:
1. Correlation-based measures
2. Distance-based measures
3. Other coefficient-based measure
Correlation-based similarity measure
Correlation is a measure of linear dependency between two random variables.
Pearson’s product moment correlation coefficient is one of the most popular and accepted
measures of correlation between two random variables.
35
For two random feature variables F1 and F2, the Pearson correlation coefficient is defined as
correlation(F1, F2) = cov(F1, F2) / (σ_F1 · σ_F2)
Correlation values range between +1 and –1. A correlation of 1 (+ / –) indicates perfect correlation,
i.e. the two features have a perfect linear relationship.
In case the correlation is 0, then the features seem to have no linear relationship.
36
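A quick check of the Pearson correlation between two feature columns with NumPy; the arrays are illustrative:

```python
import numpy as np

F1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
F2 = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly 2 * F1, so correlation is near +1

r = np.corrcoef(F1, F2)[0, 1]
print(round(r, 3))   # values close to +1 or -1 suggest the two features are redundant
```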
The Euclidean distance between two features F1 and F2 is calculated as
d(F1, F2) = sqrt( Σ_i (F1i − F2i)² )
where F1 and F2 are features of an n-dimensional data set and F1i, F2i are their values for the i-th instance.
37
A more generalized form of the Euclidean distance is the Minkowski distance:
d(F1, F2) = ( Σ_i |F1i − F2i|^r )^(1/r)
The Minkowski distance takes the form of the Euclidean distance (also called the L2 norm) when r = 2.
At r = 1, it takes the form of the Manhattan distance (also called the L1 norm).
A special case of the Manhattan distance, used more frequently to calculate the distance between binary vectors, is the
Hamming distance. For example, the Hamming distance
between the two vectors 01101011 and 11001001 is 3, as they differ in three bit positions.
38
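A compact sketch of the three distances with NumPy; the real-valued vectors are made up, and the binary vectors are the ones from the Hamming example above:

```python
import numpy as np

F1 = np.array([2.0, 4.0, 6.0])
F2 = np.array([1.0, 7.0, 5.0])

euclidean = np.sqrt(((F1 - F2) ** 2).sum())      # Minkowski with r = 2 (L2 norm)
manhattan = np.abs(F1 - F2).sum()                # Minkowski with r = 1 (L1 norm)

a = np.array([0, 1, 1, 0, 1, 0, 1, 1])           # 01101011
b = np.array([1, 1, 0, 0, 1, 0, 0, 1])           # 11001001
hamming = int((a != b).sum())                    # number of positions where the bits differ

print(euclidean, manhattan, hamming)             # hamming == 3
```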
The Jaccard index is a measure of similarity between two features.
The Jaccard distance, a measure of dissimilarity between two features, is complementary to the Jaccard index.
For two features having binary values, the Jaccard index is measured as
J = n11 / (n01 + n10 + n11)
where, n11 = number of cases where both the features have value 1
n01 = number of cases where the feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where the feature 1 has value 1 and feature 2 has value 0
Jaccard distance, dJ = 1 - J
39
The simple matching coefficient (SMC), unlike the Jaccard index, also includes the cases where both the features have a value of 0:
SMC = (n11 + n00) / (n00 + n01 + n10 + n11)
where, n11 = number of cases where both the features have value 1
n01 = number of cases where the feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where the feature 1 has value 1 and feature 2 has value 0
n00 = number of cases where both the features have value 0
40
Cosine similarity is one of the most popular similarity measures in text classification.
For two vectors x and y, cos(x, y) = (x · y) / (||x|| ||y||).
If the cosine similarity value is 1,
the two features are perfectly similar (the vectors point in the same direction);
if the cosine similarity value is 0,
no similarity is found between the features (the vectors are orthogonal).
41
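A minimal sketch of the Jaccard index, the simple matching coefficient, and cosine similarity for two binary feature vectors; the vectors are made up for illustration:

```python
import numpy as np

f1 = np.array([1, 1, 0, 0, 1, 0])
f2 = np.array([1, 0, 0, 1, 1, 0])

n11 = int(((f1 == 1) & (f2 == 1)).sum())   # both features are 1
n00 = int(((f1 == 0) & (f2 == 0)).sum())   # both features are 0
n10 = int(((f1 == 1) & (f2 == 0)).sum())
n01 = int(((f1 == 0) & (f2 == 1)).sum())

jaccard = n11 / (n01 + n10 + n11)
smc = (n11 + n00) / (n00 + n01 + n10 + n11)
cosine = f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2))

print(jaccard, smc, round(float(cosine), 3))
```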
Feature selection is the process of selecting a subset of features in a data set
steps:
1. generation of possible subsets
2. subset evaluation
3. stop searching based on some stopping criterion
4. validation of the result
42
1. Subset Generation
Produce all possible candidate subsets.
For an n-dimensional data set, 2^n subsets can be generated.
When the value of ‘n’ becomes high, finding an optimal subset from all the 2^n candidate subsets becomes
intractable.
Hence, approximate search strategies are employed to find candidate subsets for evaluation.
Sequential forward selection
The search may start with an empty set and keep adding features.
Sequential backward elimination
A search may start with a full set and successively remove features
Bi-directional selection
The search starts from both ends, adding and removing features simultaneously.
43
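A brief sketch of sequential forward selection using scikit-learn's SequentialFeatureSelector (available in recent versions); the estimator, data set, and number of features to select are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Start from an empty set and greedily add features; direction="backward" would
# instead start from the full set and successively remove features
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
)
sfs.fit(X, y)
print(sfs.get_support())   # boolean mask of the selected features
```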
2. Subset Evaluation
Each candidate subset is then evaluated and compared with the previous best performing subset based on
certain evaluation criterion. If the new subset performs better, it replaces the previous one.
3. Stopping Criterion
This cycle of subset generation and evaluation continues till a pre-defined stopping criterion is fulfilled. Some
commonly used stopping criteria are
1. the search completes
2. some given bound (e.g. a specified number of iterations) is reached
3. subsequent addition (or deletion) of the feature is not producing a better subset
4. a sufficiently good subset (e.g. a subset having better classification accuracy than the existing
benchmark) is selected
4. Result validation
The selected best subset is validated either against prior benchmarks or by experiments using real-life or
synthetic but authentic data sets.
44
The feature subset is selected based on statistical measures done to assess the merits of the
features from the data perspective.
No learning algorithm is employed to evaluate the goodness of the feature selected.
Methods
• Pearson’s correlation
• information gain
• Fisher score
• analysis of variance (ANOVA)
• Chi-square, etc.
45
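A short sketch of a filter-style selection with scikit-learn's SelectKBest, scoring each feature against the class label with the ANOVA F-statistic; the data set and the choice k = 2 are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score every feature with a statistical test and keep the top k;
# no learning algorithm is involved in judging the features themselves
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)      # ANOVA F-score per feature
print(X_selected.shape)      # (150, 2)
```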
The wrapper approach uses the induction algorithm as a black box.
For every candidate subset, the learning model is trained and the result is evaluated by running the
learning algorithm; hence, the wrapper approach is computationally very expensive.
46
Hybrid
Takes advantage of both filter and wrapper approaches.
A typical hybrid algorithm makes use of both the statistical tests as used in filter approach to decide the best
subsets for a given cardinality and a learning algorithm to select the final best subset among the best
subsets across different cardinalities
Embedded
Similar to the wrapper approach, as it also uses an inductive algorithm to evaluate the generated feature subsets.
47
Contenu connexe

Tendances

Seminar(Pattern Recognition)
Seminar(Pattern Recognition)Seminar(Pattern Recognition)
Seminar(Pattern Recognition)anurodhsinha
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IMachine Learning Valencia
 
Machine Learning
Machine LearningMachine Learning
Machine LearningShrey Malik
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural NetworksYogendra Tamang
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning AlgorithmsDezyreAcademy
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsPalin analytics
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & UnderfittingSOUMIT KAR
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)butest
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine LearningSri Ambati
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningTamir Taha
 
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...Edge AI and Vision Alliance
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning TechniquesBabu Priyavrat
 
Lecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptxLecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptxGauravSonawane51
 

Tendances (20)

Seminar(Pattern Recognition)
Seminar(Pattern Recognition)Seminar(Pattern Recognition)
Seminar(Pattern Recognition)
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Image classification with Deep Neural Networks
Image classification with Deep Neural NetworksImage classification with Deep Neural Networks
Image classification with Deep Neural Networks
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
 
Machine learning
Machine learningMachine learning
Machine learning
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
 
Supervised learning
  Supervised learning  Supervised learning
Supervised learning
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
“Person Re-Identification and Tracking at the Edge: Challenges and Techniques...
 
Text Detection and Recognition
Text Detection and RecognitionText Detection and Recognition
Text Detection and Recognition
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 
Lecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptxLecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptx
 

Similaire à Unit_2_Feature Engineering.pdf

Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningPyingkodi Maran
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
dimension reduction.ppt
dimension reduction.pptdimension reduction.ppt
dimension reduction.pptDeadpool120050
 
Feature selection using PCA.pptx
Feature selection using PCA.pptxFeature selection using PCA.pptx
Feature selection using PCA.pptxbeherasushree212
 
Singular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxSingular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxrajalakshmi5921
 
EDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxEDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxrajalakshmi5921
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2Gokulks007
 
DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS Adams Sidibe
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data miningkavitha muneeshwaran
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data PreprocessingT Kavitha
 
Dimensionality Reduction.pptx
Dimensionality Reduction.pptxDimensionality Reduction.pptx
Dimensionality Reduction.pptxPriyadharshiniG41
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET Journal
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 

Similaire à Unit_2_Feature Engineering.pdf (20)

Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
dimension reduction.ppt
dimension reduction.pptdimension reduction.ppt
dimension reduction.ppt
 
Feature selection using PCA.pptx
Feature selection using PCA.pptxFeature selection using PCA.pptx
Feature selection using PCA.pptx
 
Singular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptxSingular Value Decomposition (SVD).pptx
Singular Value Decomposition (SVD).pptx
 
EDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptxEDAB Module 5 Singular Value Decomposition (SVD).pptx
EDAB Module 5 Singular Value Decomposition (SVD).pptx
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS DATA STRUCTURE AND ALGORITHMS
DATA STRUCTURE AND ALGORITHMS
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Dimensionality Reduction.pptx
Dimensionality Reduction.pptxDimensionality Reduction.pptx
Dimensionality Reduction.pptx
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 

Plus de Pyingkodi Maran

Health Monitoring System using IoT.doc
Health Monitoring System using IoT.docHealth Monitoring System using IoT.doc
Health Monitoring System using IoT.docPyingkodi Maran
 
IoT Industry Adaptation of AI.ppt
IoT Industry Adaptation of AI.pptIoT Industry Adaptation of AI.ppt
IoT Industry Adaptation of AI.pptPyingkodi Maran
 
Creation of Web Portal using DURPAL
Creation of Web Portal using DURPALCreation of Web Portal using DURPAL
Creation of Web Portal using DURPALPyingkodi Maran
 
AWS Relational Database Instance
AWS Relational Database InstanceAWS Relational Database Instance
AWS Relational Database InstancePyingkodi Maran
 
Creation of AWS Instance in Cloud Platform
Creation of AWS Instance in Cloud PlatformCreation of AWS Instance in Cloud Platform
Creation of AWS Instance in Cloud PlatformPyingkodi Maran
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing IntroductionPyingkodi Maran
 
Supervised Machine Learning Algorithm
Supervised Machine Learning AlgorithmSupervised Machine Learning Algorithm
Supervised Machine Learning AlgorithmPyingkodi Maran
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningPyingkodi Maran
 
Relational Database and Relational Algebra
Relational Database and Relational AlgebraRelational Database and Relational Algebra
Relational Database and Relational AlgebraPyingkodi Maran
 
IoT Real world Applications.pdf
IoT Real world Applications.pdfIoT Real world Applications.pdf
IoT Real world Applications.pdfPyingkodi Maran
 

Plus de Pyingkodi Maran (20)

Health Monitoring System using IoT.doc
Health Monitoring System using IoT.docHealth Monitoring System using IoT.doc
Health Monitoring System using IoT.doc
 
IoT Industry Adaptation of AI.ppt
IoT Industry Adaptation of AI.pptIoT Industry Adaptation of AI.ppt
IoT Industry Adaptation of AI.ppt
 
IoT_Testing.ppt
IoT_Testing.pptIoT_Testing.ppt
IoT_Testing.ppt
 
Azure Devops
Azure DevopsAzure Devops
Azure Devops
 
Creation of Web Portal using DURPAL
Creation of Web Portal using DURPALCreation of Web Portal using DURPAL
Creation of Web Portal using DURPAL
 
AWS Relational Database Instance
AWS Relational Database InstanceAWS Relational Database Instance
AWS Relational Database Instance
 
AWS S3 Buckets
AWS S3  BucketsAWS S3  Buckets
AWS S3 Buckets
 
Creation of AWS Instance in Cloud Platform
Creation of AWS Instance in Cloud PlatformCreation of AWS Instance in Cloud Platform
Creation of AWS Instance in Cloud Platform
 
Amazon Web Service.pdf
Amazon Web Service.pdfAmazon Web Service.pdf
Amazon Web Service.pdf
 
Cloud Security
Cloud SecurityCloud Security
Cloud Security
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introduction
 
Supervised Machine Learning Algorithm
Supervised Machine Learning AlgorithmSupervised Machine Learning Algorithm
Supervised Machine Learning Algorithm
 
Unsupervised Learning in Machine Learning
Unsupervised Learning in Machine LearningUnsupervised Learning in Machine Learning
Unsupervised Learning in Machine Learning
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
Relational Database and Relational Algebra
Relational Database and Relational AlgebraRelational Database and Relational Algebra
Relational Database and Relational Algebra
 
Transaction in DBMS
Transaction in DBMSTransaction in DBMS
Transaction in DBMS
 
IoT_Frameworks_.pdf
IoT_Frameworks_.pdfIoT_Frameworks_.pdf
IoT_Frameworks_.pdf
 
IoT Real world Applications.pdf
IoT Real world Applications.pdfIoT Real world Applications.pdf
IoT Real world Applications.pdf
 
IoT_Introduction.pdf
IoT_Introduction.pdfIoT_Introduction.pdf
IoT_Introduction.pdf
 
Keys in DBMS
Keys in DBMSKeys in DBMS
Keys in DBMS
 

Dernier

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 

Dernier (20)

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 

Unit_2_Feature Engineering.pdf

  • 1. Machine Learning Dr.M.Pyingkodi Dept of MCA Kongu Engineering College Erode, Tamilnadu, India Basics of Feature Engineering - Unit II
  • 2. Basics of Feature Engineering 2
  • 3. • the process of translating a data set into features such that these features are able to represent the data set more effectively and result in a better learning performance. • refers to manipulation — addition, deletion, combination, mutation — of your data set to improve machine learning model training, leading to better performance and greater accuracy. • a part of the preparatory activities • It is responsible for taking raw input data and converting that to well-aligned features which are ready to be used by the machine learning models. • FE encapsulates various data engineering techniques such as selecting relevant features, handling missing data, encoding the data, and normalizing it. • It has two major elements ❖ feature transformation ❖ feature subset selection 3
  • 4. • A feature is an attribute of a data set that is used in a machine learning process. • selection of the subset of features which are meaningful for machine learning is a sub-area of feature engineering. • The features in a data set are also called its dimensions. • a data set having ‘n’ features is called an n-dimensional data set. Class variable - Species predictor variables 5 dimensional dataset 4
  • 5. Apply a mathematical formula to a particular column(feature) and transform the values which are useful for our further analysis. Creating new features from existing features that may help in improving the model performance. A set of features (m) to create a new feature set (n) while retaining as much information as possible It has two major elements: 1. feature construction 2. feature extraction Both are sometimes known as feature discovery There are two distinct goals of feature transformation: 1. Achieving best reconstruction of the original features in the data set 2. Achieving highest efficiency in the learning task 5
  • 6. Involves transforming a given set of input features to generate a new set of more powerful features. The data set has three features – apartment length, apartment breadth, and price of the apartment. If it is used as an input to a regression problem, such data can be training data for the regression model. So given the training data, the model should be able to predict the price of an apartment whose price is not known or which has just come up for sale. However, instead of using length and breadth of the apartment as a predictor, it is much convenient and makes more sense to use the area of the apartment, which is not an existing feature of the data set. So such a feature, namely apartment area, can be added to the data set. In other words, we transform the three- dimensional data set to a four-dimensional data set, with the newly ‘discovered’ feature apartment area being added to the original data set. 6
  • 7. • when features have categorical value and machine learning needs numeric value inputs • when features having numeric (continuous) values and need to be converted to ordinal values • when text-specific feature construction needs to be done Ordinal data are discrete integers that can be ranked or sorted 7
  • 8. Encoding categorical (nominal) variables FIG. 4.3 Feature construction (encoding nominal variables) 8
  • 9. • The grade is an ordinal variable with values A, B, C, and D. • To transform this variable to a numeric variable, we can create a feature num_grade mapping a numeric value against each ordinal value. • mapped to values 1, 2, 3, and 4 in the transformed variable 9
  • 10. • As a real estate price category prediction, which is a classification problem. • In that case, we can ‘bin’ the numerical data into multiple categories based on the data range. • In the context of the real estate price prediction example, the original data set has a numerical feature apartment_price 10
  • 11. Text is arguably the most predominant medium of communication. Text mining is an important area of research unstructured nature of the data All machine learning models need numerical data as input. So the text data in the data sets need to be transformed into numerical features EX: Facebook or micro-blogging channels like Twitter or emails or short messaging services such as Whatsapp, Text plays a major role in the flow of information. Vectorization:   Turning text into vectors/ arrays To turn text to integer (or boolean, or floating numbers) vectors.  vectors are lists with n positions.  11
  • 12. I want to turn my text into data. One-hot encoding for “I want to turn my text into data” One-hot encoding for “I want my data”. One hot encoding only treats values as “present” and “not present”.  12
  • 13. 1. Tokenize In order to tokenize a corpus, the blank spaces and punctuations are used as delimiters to separate out the words, or tokens A corpus is a collection of authentic text or audio organized into datasets. 2. Count Then the number of occurrences of each token is counted, for each document. 3. Normalize Tokens are weighted with reducing importance when they occur in the majority of the documents. A matrix is then formed with each token representing a column and a specific document of the corpus representing each row. Each cell contains the count of occurrence of the token in a specific document. This matrix is known as a document-term matrix / term- document matrix 13
  • 14. Typical document- term matrix which forms an input to a machine learning model. 14
  • 15. New features are created from a combination of original features. Operators for combining the original features include 1. For Boolean features: Conjunctions, Disjunctions, Negation, etc. 2. For nominal features: Cartesian product, M of N, etc. 3. For numerical features: Min, Max, Addition, Subtraction,Multiplication, Division, Average, Equivalence, Inequality, etc. 15
  • 16. 16
  • 17. Dimensionality-reduction method. has multiple attributes or dimensions – many of which might have similarity with each other. EX: If the height is more, generally weight is more and vice versa In PCA, a new set of features are extracted from the original features which are quite dissimilar in nature. transforming a large set of variables into a smaller one. reduce the number of variables of a data set, while preserving as much information as possible. Objective: 1.The new features are distinct, i.e. the covariance between the new features, i.e. the principal components is 0. 2. The principal components are generated in order of the variability in the data that it captures. Hence, the first principal component should capture the maximum variability, the second principal component should capture the next highest variability etc. 3. The sum of variance of the new features or the principal components should be equal to the sum of variance of the original features. 17
  • 18. converts the observations of correlated features into a set of linearly uncorrelated features with the help of orthogonal transformation These new transformed features are called the Principal Components. it contains the important variables and drops the least important variable. Examples: image processing, movie recommendation system, optimizing the power allocation in various communication channels. Eigen vectors are the principal components of the data set. Eigenvectors and values exist in pairs: every eigen vector has a corresponding eigenvalue. Eigen vector is the direction of the line (vertical, horizontal, 45 degrees etc.). An eigenvalue is a number, telling you how much variance there is in the data in that direction, Eigenvalue is a number telling you how spread out the data is on the line. 18
  • 19. Dimensionality It is the number of features or variables present in the given dataset. Correlation It signifies that how strongly two variables are related to each other. Such as if one changes, the other variable also gets changed. The correlation value ranges from -1 to +1. Here, -1 occurs if variables are inversely proportional to each other, and +1 indicates that variables are directly proportional to each other. Orthogonal It defines that variables are not correlated to each other, and hence the correlation between the pair of variables is zero. Eigenvectors column vector. Eigenvalue can be referred to as the strength of the transformation Covariance Matrix A matrix containing the covariance between the pair of variables is called the Covariance Matrix. 19
  • 20. 1. Standardizing the data those variables with larger ranges will dominate over those with small ranges (For example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 2. Covariance Matrix Calculation how the variables of the input data set are varying from the mean with respect to each other. if positive then : the two variables increase or decrease together (correlated) if negative then : One increases when the other decreases (Inversely correlated) 20
  • 21. Steps for PCA algorithm 3.Compute The Eigenvectors & Eigenvalues of The Covariance Matrix To Identify The Principal Components Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables. These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components. principal components represent the directions of the data that explain a maximal amount of variance 21
  • 22. 4. The eigenvector having the next highest eigenvalue represents the direction in which data has the highest remaining variance and also orthogonal to the first direction. So this helps in identifying the second principal component. 5. Like this, identify the top ‘k’ eigenvectors having top ‘k’ eigenvalues so as to get the ‘k’ principal components. eigenvectors of the Covariance matrix are actually the directions of the axes where there is the most variance(most information) and that we call Principal Components. eigenvalues are simply the coefficients attached to eigenvectors, which give the amount of variance carried in each Principal Component. 22
  • 23. • a matrix is a factorization of that matrix into three matrices. • linear transformations SVD of a matrix A (m × n) is a factorization of the form: • two orthogonal matrices U and V and diagonal matrix D. • where, U and V are orthonormal matrices • U is an m × m unitary matrix, • V is an n × n unitary matrix and • Σ is an m × n rectangular diagonal matrix. • The diagonal entries of Σ are known as singular values of matrix A. • The columns of U and V are called the left-singular and right-singular vectors of matrix A • The square roots of these eigenvalues are called singular values. 23
  • 24. 1. Patterns in the attributes are captured by the right-singular vectors, i.e. the columns of V.
2. Patterns among the instances are captured by the left-singular vectors, i.e. the columns of U.
3. The larger a singular value, the larger the part of matrix A that it (together with its associated vectors) accounts for.
4. A new data matrix with k attributes is obtained using the equation D' = D × [v1, v2, …, vk]. Thus, the dimensionality gets reduced to k. 24
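A minimal NumPy sketch of the reduction D' = D × [v1, …, vk] described in point 4; the data matrix D and the value of k are assumptions made for illustration.

```python
# Illustrative sketch of SVD-based dimensionality reduction.
import numpy as np

D = np.random.rand(100, 20)          # 100 instances, 20 attributes
k = 5

# Full SVD: D = U @ np.diag(S) @ Vt (singular values in S are sorted descending)
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# Columns of V (rows of Vt) are the right-singular vectors; keep the top k
V_k = Vt[:k].T                        # shape (20, k)

# Project the data onto the top-k right-singular vectors: D' = D x [v1, ..., vk]
D_reduced = D @ V_k
print(D_reduced.shape)                # (100, 5)
```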
  • 25. PCA aims to capture the data set variability and calculates the eigenvalues of the covariance matrix of the data set, whereas Linear Discriminant Analysis (LDA) focuses on class separability. LDA is used to reduce the number of features to a more manageable number before classification and is commonly used for supervised classification problems. It separates the features based on class separability so as to avoid over-fitting of the machine learning model. • LDA calculates the eigenvalues and eigenvectors of the intra-class (within-class) and inter-class (between-class) scatter matrices. 25
  • 26. Steps in LDA:
1. Calculate the mean vectors for the individual classes.
2. Calculate the intra-class and inter-class scatter matrices: S_W = Σᵢ Σ_{x∈classᵢ} (x − mᵢ)(x − mᵢ)ᵀ and S_B = Σᵢ Nᵢ (mᵢ − m)(mᵢ − m)ᵀ, where mᵢ is the sample mean (mean vector) of the i-th class, m is the overall mean of the data set, and Nᵢ is the sample size of each class.
3. Calculate the eigenvalues and eigenvectors of S_W⁻¹S_B, where S_W is the intra-class scatter matrix and S_B is the inter-class scatter matrix.
4. Identify the top k eigenvectors having the top k eigenvalues. 26
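Rather than building the scatter matrices by hand, the same reduction can be sketched with scikit-learn's LinearDiscriminantAnalysis; the iris data set and n_components=2 are assumptions made only for illustration.

```python
# Hedged sketch of LDA for supervised dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)           # 4 features, 3 classes

# LDA can produce at most (number of classes - 1) discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)             # supervised: uses the class labels y

print(X_lda.shape)                          # (150, 2)
```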
  • 27. Feature subset selection: to select a subset of attributes or features that makes the most meaningful contribution to a machine learning activity; the process of reducing the number of input variables when developing a predictive model. It is related to dimensionality reduction techniques in that both methods seek fewer input variables to a predictive model. Why feature subset selection? To reduce the number of input variables, which both reduces the computational cost of modeling and, in some cases, improves the performance of the model. It removes unnecessary and repetitive data. 27
  • 28. Example: trying to predict the weight of students based on past information about similar students. The student weight data set has features such as Roll Number, Age, Height, and Weight. Roll Number can have no influence on predicting the weight of the students; therefore we eliminate Roll Number from the data set. 28
  • 29. A data set with a high number of variables, attributes, or features is known as a high-dimensional data set. High-dimensional spaces often have hundreds or thousands of dimensions or attributes. Examples: DNA analysis, geographic information systems (GIS), social networking, etc. For instance, DNA microarray data can have up to 450,000 variables (gene probes). Text data generated from different sources, such as social networking sites and emails, also has extremely high dimensions. 29
  • 30. Benefits of feature subset selection:
• having a faster and more cost-effective (i.e. less need for computational resources) learning model
• improving the efficiency of the learning model
• having a better understanding of the underlying model that generated the data
The objective is to identify a subset of the original features from a given data set while removing irrelevant and/or redundant features. The simplest algorithm is to test each possible subset of features and find the one which minimizes the error rate. 30
  • 31. Feature relevance: techniques that calculate a score for all the input features for a given model; the scores simply represent the "importance" of each feature. A higher score means that the specific feature will have a larger effect on the model that is being used to predict a certain variable.
In supervised learning, each of the predictor variables is expected to contribute information to decide the value of the class label. If a variable is not contributing any information, it is said to be irrelevant. If the information contribution for prediction is very little, the variable is said to be weakly relevant. The remaining variables, which make a significant contribution to the prediction task, are said to be strongly relevant variables.
In unsupervised learning, there is no training data set or labelled data. Grouping of similar data instances is done, and the similarity of data instances is evaluated based on the values of different variables. Certain variables do not contribute any useful information for deciding the similarity or dissimilarity of data instances; hence, those variables make no significant information contribution to the grouping process. These variables are marked as irrelevant variables in the context of the unsupervised machine learning task. 31
  • 32. Feature redundancy: a feature may contribute information that is similar to the information contributed by one or more other features, i.e. duplicated features or information. For example, in the Student Data set (refer slide no. 8) both the features Age and Height contribute similar information. This is because with an increase in Age, Weight is expected to increase; similarly, with an increase in Height, Weight is also expected to increase. So, in the context of that problem, Age and Height contribute similar information. In other words, irrespective of whether the feature Height is present or not, the learning model will give the same results. In this kind of situation, where one feature is similar to another feature, the feature is said to be potentially redundant in the context of the machine learning problem. 32
  • 33. Measuring feature relevance in supervised learning: mutual information is considered a good measure of the information contribution of a feature in deciding the value of the class label. Mutual information measures the amount of information one can obtain about one random variable given another; it is a measure of dependence between two random variables and hence a good indicator of the relevance of a feature with respect to the class variable. The higher the value of the mutual information of a feature, the more relevant that feature is. Mutual information: MI(C, f) = H(C) + H(f) − H(C, f), where H(C) is the marginal entropy of the class, H(f) is the marginal entropy of the feature, and H(C, f) is the joint entropy of the class and the feature. 33
  • 34. The entropies are calculated using Shannon's formula: marginal entropy of the class H(C) = −Σᵢ₌₁ᴷ p(Cᵢ) log₂ p(Cᵢ), and marginal entropy of the feature 'x', H(f) = −Σₓ p(f = x) log₂ p(f = x), where K = number of classes, C = class variable, and f = a feature that takes discrete values.
In the case of unsupervised learning, the entropy of the set of features without one feature at a time is calculated for all features. The features are then ranked in descending order of the information gain from each feature, and the top 'beta' percentage (the value of beta is a design parameter of the algorithm) of features are selected as relevant features. 34
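A small sketch, assuming scikit-learn and the iris data set, of ranking features by their mutual information with the class label as described above.

```python
# Hedged sketch: ranking features by mutual information with the class label.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# Estimate MI(C, f) for every feature f against the class variable C
mi_scores = mutual_info_classif(X, y, random_state=0)

# Higher mutual information => more relevant feature
for name, score in sorted(zip(feature_names, mi_scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```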
  • 35. Measuring feature redundancy: similar information contribution by multiple features. Methods:
1. Correlation-based measures
2. Distance-based measures
3. Other coefficient-based measures
Correlation-based similarity measure: correlation is a measure of linear dependency between two random variables. Pearson's product moment correlation coefficient is one of the most popular and accepted measures of correlation between two random variables. 35
  • 36. For two random feature variables F1 and F2, the Pearson correlation coefficient is defined as ρ(F1, F2) = cov(F1, F2) / (σ_F1 · σ_F2), i.e. the covariance of F1 and F2 divided by the product of their standard deviations. Correlation values range between +1 and −1. A correlation of +1 or −1 indicates perfect correlation, i.e. the two features have a perfect linear relationship. If the correlation is 0, the features have no linear relationship. 36
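A small pandas sketch, with made-up data and an assumed threshold of 0.9, showing how the Pearson correlation can be used to flag potentially redundant feature pairs.

```python
# Hedged sketch: flagging potentially redundant feature pairs via correlation.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "age":    [10, 11, 12, 13, 14, 15],
    "height": [140, 146, 152, 158, 163, 168],
    "marks":  [55, 72, 60, 88, 47, 91],
})

corr = df.corr(method="pearson").abs()       # absolute correlation matrix

# Keep only the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant_pairs = [(r, c, float(upper.loc[r, c]))
                   for r in upper.index for c in upper.columns
                   if upper.loc[r, c] > 0.9]
print(redundant_pairs)   # age and height are nearly perfectly correlated
```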
  • 37. Distance-based similarity measure: the most common distance measure is the Euclidean distance, which, between two features F1 and F2, is calculated as d(F1, F2) = √( Σᵢ₌₁ⁿ (F1ᵢ − F2ᵢ)² ), where F1 and F2 are features of an n-dimensional data set. 37
  • 38. A more generalized form of the Euclidean distance is the Minkowski distance, d(F1, F2) = ( Σᵢ₌₁ⁿ |F1ᵢ − F2ᵢ|ʳ )^(1/r). The Minkowski distance takes the form of the Euclidean distance (also called the L2 norm) when r = 2, and at r = 1 it takes the form of the Manhattan distance (also called the L1 norm). A specific case of the Manhattan distance, used more frequently to calculate the distance between binary vectors, is the Hamming distance. For example, the Hamming distance between the two vectors 01101011 and 11001001 is 3, as they differ in 3 bit positions. 38
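A short NumPy sketch of the Minkowski family and the Hamming distance; the real-valued example vectors are assumptions, while the binary vectors are the two from the slide above.

```python
# Hedged sketch of the distance measures discussed above.
import numpy as np

f1 = np.array([2.0, 4.0, 6.0, 8.0])
f2 = np.array([1.0, 3.0, 7.0, 9.0])

def minkowski(a, b, r):
    """Minkowski distance: reduces to Manhattan for r=1 and Euclidean for r=2."""
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)

print(minkowski(f1, f2, r=1))   # Manhattan (L1) distance -> 4.0
print(minkowski(f1, f2, r=2))   # Euclidean (L2) distance -> 2.0

# Hamming distance between two binary vectors = number of differing positions
b1 = np.array([0, 1, 1, 0, 1, 0, 1, 1])   # 01101011
b2 = np.array([1, 1, 0, 0, 1, 0, 0, 1])   # 11001001
print(np.sum(b1 != b2))         # 3
```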
  • 39. Jaccard index: a measure of similarity between two features. The Jaccard distance, a measure of dissimilarity between two features, is complementary to the Jaccard index. For two features having binary values, the Jaccard index is measured as J = n11 / (n11 + n01 + n10), where
n11 = number of cases where both features have value 1,
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1,
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0.
Jaccard distance: dJ = 1 − J. 39
  • 40. Simple Matching Coefficient (SMC): unlike the Jaccard index, it also includes the cases where both features have a value of 0, SMC = (n11 + n00) / (n11 + n01 + n10 + n00), where
n11 = number of cases where both features have value 1,
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1,
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0,
n00 = number of cases where both features have value 0. 40
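A short sketch computing both the Jaccard index and the simple matching coefficient from the counts n11, n01, n10, n00; the two binary feature vectors are assumed example values.

```python
# Hedged sketch: Jaccard index and Simple Matching Coefficient for binary features.
import numpy as np

f1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])
f2 = np.array([1, 0, 0, 1, 1, 0, 0, 0])

n11 = np.sum((f1 == 1) & (f2 == 1))   # both 1
n00 = np.sum((f1 == 0) & (f2 == 0))   # both 0
n10 = np.sum((f1 == 1) & (f2 == 0))   # feature 1 is 1, feature 2 is 0
n01 = np.sum((f1 == 0) & (f2 == 1))   # feature 1 is 0, feature 2 is 1

jaccard = n11 / (n11 + n01 + n10)
smc = (n11 + n00) / (n11 + n01 + n10 + n00)

print(f"Jaccard index = {jaccard:.2f}, Jaccard distance = {1 - jaccard:.2f}")
print(f"Simple Matching Coefficient = {smc:.2f}")
```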
  • 41. Cosine similarity: one of the most popular similarity measures in text classification. For two feature vectors x and y, cos(x, y) = (x · y) / (‖x‖ ‖y‖). A cosine similarity value of 1 indicates that the two features are perfectly similar (the vectors point in the same direction); a value of 0 indicates no similarity between the features (the vectors are orthogonal). 41
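A tiny sketch of the cosine similarity formula above; the two vectors stand in for made-up term counts of two documents.

```python
# Hedged sketch of cosine similarity between two feature vectors.
import numpy as np

x = np.array([3, 0, 2, 0, 1])
y = np.array([1, 1, 2, 0, 0])

# cos(x, y) = dot product divided by the product of the vector norms
cosine = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(round(float(cosine), 3))   # 1.0 = same direction, 0.0 = orthogonal
```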
  • 42. Overall feature selection process: feature selection is the process of selecting a subset of features in a data set. Steps:
1. generation of possible subsets
2. subset evaluation
3. stopping the search based on some stopping criterion
4. validation of the result 42
  • 43. 1. Subset generation: produce all possible candidate subsets. For an n-dimensional data set, 2ⁿ subsets can be generated. When the value of 'n' becomes high, finding an optimal subset from all the 2ⁿ candidate subsets becomes intractable, so approximate search strategies are employed to find candidate subsets for evaluation (a sketch of one such strategy follows the next slide).
Sequential forward selection: the search starts with an empty set and keeps adding features.
Sequential backward elimination: the search starts with the full set and successively removes features.
Bi-directional selection: the search starts from both ends, adding and removing features simultaneously. 43
  • 44. 2. Subset evaluation: each candidate subset is evaluated and compared with the previous best performing subset based on a certain evaluation criterion. If the new subset performs better, it replaces the previous one.
3. Stopping criterion: this cycle of subset generation and evaluation continues till a pre-defined stopping criterion is fulfilled. Some commonly used stopping criteria are:
1. the search completes
2. some given bound (e.g. a specified number of iterations) is reached
3. subsequent addition (or deletion) of a feature does not produce a better subset
4. a sufficiently good subset (e.g. a subset having better classification accuracy than the existing benchmark) is selected
4. Result validation: the selected best subset is validated either against prior benchmarks or by experiments using real-life or synthetic but authentic data sets. 44
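A minimal sketch of how subset generation (sequential forward selection), subset evaluation, and a stopping criterion fit together. The data set, the model, and cross-validated accuracy as the evaluation criterion are all assumptions made for illustration.

```python
# Hedged sketch of greedy sequential forward selection.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

selected, best_score = [], 0.0
while len(selected) < n_features:
    candidates = [f for f in range(n_features) if f not in selected]
    # Subset evaluation: score every subset formed by adding one more feature
    scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in candidates}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:      # stopping criterion: no improvement
        break
    selected.append(f_best)
    best_score = scores[f_best]

print("selected feature indices:", selected, "accuracy:", round(best_score, 3))
```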
  • 45. Filter approach: the feature subset is selected based on statistical measures that assess the merit of the features from the data perspective. No learning algorithm is employed to evaluate the goodness of the selected features. Methods:
• Pearson's correlation
• information gain
• Fisher score
• analysis of variance (ANOVA)
• Chi-square, etc. 45
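A brief scikit-learn sketch of a filter-style selection using SelectKBest with the ANOVA F-test; the data set and the choice of k=2 are assumptions.

```python
# Hedged sketch of a filter approach: scores come from a statistical test only.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score every feature with the ANOVA F-test (no learning model involved)
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print("scores:", selector.scores_.round(1))
print("kept feature indices:", selector.get_support(indices=True))
```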
  • 46. Wrapper approach: uses the induction (learning) algorithm as a black box. For every candidate subset, the learning model is trained and the result is evaluated by running the learning algorithm; for this reason the wrapper approach is computationally very expensive. 46
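One method commonly grouped with wrapper-style selection is recursive feature elimination (RFE), sketched here with scikit-learn: the learning algorithm itself drives which features are dropped. The data set, the estimator, and n_features_to_select=2 are assumptions.

```python
# Hedged sketch of wrapper-style selection with recursive feature elimination.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print("selected features:", rfe.support_)     # boolean mask of kept features
print("feature ranking:  ", rfe.ranking_)     # 1 = selected
```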
  • 47. Hybrid approach: takes advantage of both the filter and wrapper approaches. A typical hybrid algorithm makes use of the statistical tests used in the filter approach to decide the best subsets for a given cardinality, and a learning algorithm to select the final best subset among the best subsets across different cardinalities.
Embedded approach: similar to the wrapper approach, as it also uses an inductive algorithm to evaluate the generated feature subsets; here, however, the feature selection is built into (embedded in) the model training process itself. 47