PRACTICAL ISSUES
in Machine Learning
Partha Sarathi Kar
IVSM 166777
CONTENTS
1. Importance of Good Features
2. Irrelevant and Redundant Features
3. Feature Pruning and Normalization
4. Evaluating Model Performance
5. Cross Validation
6. Hypothesis Testing and Statistical Significance
7. Debugging Learning Algorithms
8. Bias/Variance Trade-off
IMPORTANCE OF GOOD FEATURES
Feature:
• a feature is an individual measurable property
• features are the basis of a model
Importance of features:
• choosing features poorly will result in an unreliable model
Figure: Machine learning workflow
FEATURE EXTRACTION EXAMPLE
object recognition from images
pixel representation
• a 100 x 100 pixel RGB image = 30,000-dimension vector
• each dimension corresponds to the red, green, or blue value of one pixel
• Like feature(1.1) is ..
patch representation
• the unit of interest is a small rectangular block rather than a single pixel
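The pixel and patch representations can be illustrated with a minimal sketch. It assumes an image stored as rows of (R, G, B) tuples and uses non-overlapping square patches; the function names and the all-black test image are hypothetical and not taken from the slides.

```python
def flatten_pixels(image):
    """Pixel representation: one feature per color channel of every pixel."""
    return [value for row in image for pixel in row for value in pixel]


def extract_patches(image, size):
    """Patch representation: non-overlapping size x size blocks of pixels."""
    patches = []
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            patches.append([image[r][c]
                            for r in range(top, top + size)
                            for c in range(left, left + size)])
    return patches


# Hypothetical all-black 100 x 100 RGB image, stored as rows of (R, G, B) tuples.
image = [[(0, 0, 0) for _ in range(100)] for _ in range(100)]

pixel_features = flatten_pixels(image)   # 100 * 100 * 3 values
patches = extract_patches(image, 10)     # a 10 x 10 grid of patches

print(len(pixel_features))               # 30000
print(len(patches), len(patches[0]))     # 100 100
```

Other patch schemes (overlapping windows, different sizes) are equally possible; the point is only that the unit of interest changes from a single pixel to a block.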
Figure: pixel representation
Figure: patch representation
FEATURE EXTRACTION EXAMPLE
object recognition from images
shape representation
• throw out all color and pixel information
• simply provide a bounding polygon
text categorization
• bag of words representation
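As a rough illustration of the bag of words representation for text categorization, the sketch below counts how often each vocabulary word occurs in a document and ignores word order. The vocabulary, the example sentence, and the function name are assumptions made for the example.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and course review.
vocabulary = ["the", "course", "was", "good", "bad"]
vector = bag_of_words("The course was good the lectures were good", vocabulary)

print(vector)  # [2, 1, 1, 2, 0]
```

Words outside the vocabulary ("lectures", "were") are simply dropped, which is one common design choice among several.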
Figure: shape representation
Figure: text categorization
IRRELEVANT AND REDUNDANT FEATURES
Irrelevant Feature:
• an irrelevant feature is one that is completely uncorrelated with the prediction task
• eg: the presence of the word “the” might be largely irrelevant for predicting whether a course review is positive or negative
IRRELEVANT AND REDUNDANT FEATURES
Redundant Feature:
• two features are redundant if they are highly correlated
• eg: having a bright red pixel in an image at position (20, 93) is probably highly redundant with having a bright red pixel at position (21, 93)
• both might be useful, eg: for identifying fire hydrants
Figure: fire hydrants
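One way to make "uncorrelated" and "highly correlated" concrete is the Pearson correlation coefficient. The sketch below is only an illustration: the toy feature columns and labels are invented, and the slides do not prescribe a particular correlation measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one value per training example.
label          = [1, 1, 0, 0, 1, 0, 1, 0]   # the prediction target
noise_feature  = [1, 1, 1, 1, 0, 0, 0, 0]   # unrelated to the label
red_at_20_93   = [1, 1, 0, 0, 1, 0, 1, 0]   # bright red pixel at (20, 93)
red_at_21_93   = [1, 1, 0, 0, 1, 0, 1, 1]   # bright red pixel at (21, 93)

irrelevance = pearson(noise_feature, label)        # close to 0: irrelevant
redundancy = pearson(red_at_20_93, red_at_21_93)   # close to 1: redundant

print(irrelevance, redundancy)
```

A feature that is constant across all examples has zero variance, so this function would divide by zero; real code would need to handle that case.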
FEATURE PRUNING AND NORMALIZATION
Feature Pruning:
• eg: the word “good” appears in exactly one training document, which is positive
• it’s hard to tell with just one training example whether it is really correlated with the positive class or is just noise
• pruning such rare features reduces the size of decision trees
• it also reduces the complexity of the final classifier
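A minimal sketch of this kind of pruning removes every word that appears in fewer than a chosen number of training documents. The threshold `min_docs`, the function name, and the toy documents are assumptions for illustration only.

```python
from collections import Counter


def prune_rare_words(documents, min_docs):
    """Drop words that occur in fewer than min_docs documents."""
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))   # count each word once per document
    keep = {word for word, count in doc_freq.items() if count >= min_docs}
    return [[word for word in words if word in keep] for words in documents]


# Hypothetical tokenized training documents.
docs = [
    ["a", "good", "course"],
    ["a", "boring", "course"],
    ["a", "course", "with", "a", "boring", "lecturer"],
]
pruned = prune_rare_words(docs, min_docs=2)

print(pruned[0])  # ['a', 'course']
```

Here “good” occurs in only one document, so it is removed, matching the example above.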
FEATURE PRUNING AND NORMALIZATION
Normalization:
• the goal is to make it easier for your learning algorithm to learn
• eg: the height of the “A” has been reduced from 8 to 6 pixels, while the width has been reduced from 7 to 5 pixels
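The slide does not say which normalization is meant. One common form, shown here purely as an assumption, centers each feature at zero and scales it to unit variance; the function name and the sample values are hypothetical.

```python
import math


def standardize(column):
    """Center a feature column at zero and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n          # constant feature: nothing to scale
    return [(v - mean) / std for v in column]


# Hypothetical feature column, e.g. character heights in pixels.
heights = [8.0, 6.0, 7.0, 7.0]
scaled = standardize(heights)

print(scaled)
```

After this step every feature lives on a comparable scale, which tends to help learners that are sensitive to feature magnitude.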
EVALUATING MODEL PERFORMANCE
Purpose:
• to obtain a highly accurate classifier
• eg: Medical Diagnosis, Spam Detection
There are two major types of binary classification problems:
1. “X versus Y.” For instance, positive versus negative sentiment.
2. “X versus not-X.” For instance, spam versus non-spam.
CROSS VALIDATION
• used for evaluating and comparing learning algorithms
• estimates how a model will perform in the future
• works by dividing data into two segments: one used to learn or train a model and the other used to validate the model
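The train/validate division can be repeated so that every example is used for validation exactly once. The sketch below generates such splits for k folds; the interleaved fold assignment and the function name are assumptions, and the training and scoring of a model on each split are left out.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_examples))
    for fold in range(k):
        validation = indices[fold::k]          # every k-th example
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


splits = list(k_fold_splits(10, 5))
for train, validation in splits:
    # A model would be trained on `train` and scored on `validation` here.
    print(validation)
```

Averaging the validation scores over the folds gives the cross validation estimate of future performance.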
HYPOTHESIS TESTING AND STATISTICAL SIGNIFICANCE
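• machine learning, just as statistical hypothesis testing, must ask whether an observed difference is real or due to chance
• eg: in cross validation, compare 7% error with 6.9% error over 1000 examples: the difference amounts to a single example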
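To see how small the gap between 7% and 6.9% error is, one can run a simple two-proportion z-test. This is a sketch under a strong assumption: it treats the two error rates as coming from independent samples of 1000 examples each, whereas results measured on the same cross validation examples would normally call for a paired test. The function name is hypothetical.

```python
import math


def two_proportion_z(errors_a, errors_b, n):
    """z statistic for the difference between two error rates, n examples each."""
    p_a, p_b = errors_a / n, errors_b / n
    pooled = (errors_a + errors_b) / (2 * n)
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p_a - p_b) / std_err


# 7% and 6.9% error over 1000 examples are 70 and 69 mistakes.
z = two_proportion_z(70, 69, 1000)
significant = abs(z) > 1.96   # two-sided test at the 5% level

print(round(z, 3), significant)  # 0.088 False
```

A z value this far below 1.96 means that a one-example difference gives no evidence that one system is really better than the other.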
DEBUGGING LEARNING ALGORITHMS
• Learning algorithms are notoriously hard to debug
• when a model performs poorly, it is unclear whether there’s a bug, the problem is too hard, or there’s too much noise
• Moreover, sometimes bugs lead to learning algorithms performing better
BIAS/VARIANCE TRADE-OFF
• a trade-off between estimation error and approximation error
• let f be the learned classifier, selected from a set F of “all possible classifiers using a fixed representation,” and let f* be the optimal classifier in F
• estimation error measures how far the actual learned classifier f is from the optimal classifier f*
• approximation error measures the quality of the model family F
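The two terms can be written as a decomposition of the error of the learned classifier, obtained by adding and subtracting the error of the best classifier in F. The notation error(·) for a classifier's expected error is an assumption introduced here; the slide does not define it.

```latex
\operatorname{error}(f)
  = \underbrace{\Bigl[\operatorname{error}(f) - \min_{f^* \in F} \operatorname{error}(f^*)\Bigr]}_{\text{estimation error}}
  + \underbrace{\Bigl[\min_{f^* \in F} \operatorname{error}(f^*)\Bigr]}_{\text{approximation error}}
```

A richer family F tends to lower the second term but makes the best member harder to find from limited data, which raises the first; that tension is the trade-off.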
THANKS
