SlideShare une entreprise Scribd logo
1  sur  33
Machine Learning with
DECISION TREES
Ramandeep Kaur
Software Consultant
Knoldus Software LLP
Agenda
● What are Decision trees?
● Appropriate problems for DTrees
● Information gain , Entropy
● Issues with DTrees
● Overfitting
● Pruning
● Demo
What are Decision trees?
● A decision tree is a tree in which each branch
node represents a choice between a number
of alternatives, and each leaf node represents
a decision.
● A type of supervised learning algorithm.
● It is one of the most widely used and practical
methods for Inductive Inference.
Example
Appropriate problems for DTrees
● Instances are represented by attribute-value pairs
● The target function has discrete output values
● The training data may contain errors
● The training data may have missing attribute values
How to select the deciding node?
Which is the best Classifier?
Which node can be described easily?
➔ Less impure node requires less information to describe it.
➔ More impure node requires more information.
===> Information theory is a measure to
define this degree of disorganization in a
system known as Entropy.
So, we can conclude
Entropy - measuring
homogeneity of a learning set
● Entropy is a measure of the uncertainty about a source of
messages.
● Given a collection S, containing positive and negative
examples of some target concept, the entropy of S relative to
this classification.
where, pi is the proportion of S belonging to class i
●
Entropy is 0 if all the members
of S belong to the same class.
● Entropy is 1 when the collection
contains an equal no. of +ve
and -ve examples.
●
Entropy is between 0 and 1 if
the collection contains unequal
no. of +ve and -ve examples.
Information gain
● Decides which attribute goes into a decision node.
● To minimize the decision tree depth, the attribute with the
most entropy reduction is the best choice!
Where:
● S is each value v of all possible values of attribute A
● Sv = subset of S for which attribute A has value v
● |Sv| = number of elements in Sv
● |S| = number of elements in S
The information gain, Gain(S,A) of an attribute A,
Issues in Decision Tree
Learning
ISSUES
● How deeply to grow the decision tree?
● Handling continuous attributes
● Choosing an appropriate attribute selection
measure
● Handling training data with missing attribute
values
Overfitting in Decision Trees
● If a decision tree is fully grown, it may lose
some generalization capability.
● This is a phenomenon known as overfitting.
Overfitting
A hypothesis overfits the training examples if
some other hypothesis that fits the training
examples less well actually performs better
over the entire distribution of instances (i.e,
including instances beyond the training
examples)
Why ??
Causes of Overfitting
● Overfitting Due to Presence of Noise -
Mislabeled instances may contradict the class
labels of other similar records.
● Overfitting Due to Lack of Representative
Instances - Lack of representative instances
in the training data can prevent refinement of
the learning algorithm.
Overfitting Due to Noise: An
Example
Overfitting Due to Noise
Overfitting Due to Noise
Overfitting Due to Lack of
Samples
Overfitting Due to Lack of
Samples
● Although the model’s training error
is zero, its error rate on the test set
is 30%.
● Humans, elephants, and dolphins
are misclassified because the
decision tree classifies all
warmblooded vertebrates that do not
hibernate as non-mammals.
A good model must not only fit the
training data well
but also accurately classify records
it has never seen.
Avoiding overfitting in decision
tree learning
● approaches that stop growing the tree earlier,
before it reaches the point where it perfectly
classifies the training data.
● approaches that allow the tree to overfit the
data, and then post-prune the tree.
Approaches to implement
● separate set of examples
● a statistical test: chi-square test
● a heuristic called the Minimum Description
Length principle - (explicit measure of the
complexity for encoding the training examples
and the decision tree, halting growth of the
tree when this encoding size is minimized.)
PRUNING
● Consider each of the decision nodes in the tree to be candidates for
pruning.
● Pruning a decision node consists of removing the subtree rooted at
that node, making it a leaf node, and assigning it the most common
classification of the training examples affiliated with that node.
● Nodes are removed only if the resulting pruned tree performs no
worse than the original over the validation set.
● Pruning of nodes continues until further pruning is harmful (i.e.,
decreases accuracy of the tree over the validation set).
DEMO
References
● Machine Learning – Tom Mitchell
● https://www3.nd.edu/~rjohns15/cse40647.sp14/ww
Thank You

Contenu connexe

Tendances

Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learningHaris Jamil
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent methodSanghyuk Chun
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentationAyanaRukasar
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classificationKrish_ver2
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Curse of dimensionality
Curse of dimensionalityCurse of dimensionality
Curse of dimensionalityNikhil Sharma
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learningSANTHOSH RAJA M G
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsAndrew Ferlitsch
 

Tendances (20)

Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Random forest
Random forestRandom forest
Random forest
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Cnn
CnnCnn
Cnn
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Curse of dimensionality
Curse of dimensionalityCurse of dimensionality
Curse of dimensionality
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learning
 
Back propagation
Back propagationBack propagation
Back propagation
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble Methods
 

Similaire à Machine Learning with Decision trees

20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptxkibriaswe
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...zohebmusharraf
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forestsViet-Trung TRAN
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.DrezzingGaming
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - BengaluruKunal Jain
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision TreesRupak Roy
 
AI -learning and machine learning.pptx
AI  -learning and machine learning.pptxAI  -learning and machine learning.pptx
AI -learning and machine learning.pptxGaytriDhingra1
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest Rupak Roy
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Sherri Gunder
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...DurgaDevi310087
 

Similaire à Machine Learning with Decision trees (20)

MLF-2.pptx
MLF-2.pptxMLF-2.pptx
MLF-2.pptx
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
Unit-V.pptx DVD is a great way to get sbi and more jobs available review and ...
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision Trees
 
Machine Learning
Machine Learning Machine Learning
Machine Learning
 
AI -learning and machine learning.pptx
AI  -learning and machine learning.pptxAI  -learning and machine learning.pptx
AI -learning and machine learning.pptx
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
Decision tree
Decision tree Decision tree
Decision tree
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
 

Plus de Knoldus Inc.

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxKnoldus Inc.
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxKnoldus Inc.
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxKnoldus Inc.
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxKnoldus Inc.
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationKnoldus Inc.
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationKnoldus Inc.
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIsKnoldus Inc.
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II PresentationKnoldus Inc.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAKnoldus Inc.
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Knoldus Inc.
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxKnoldus Inc.
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinKnoldus Inc.
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks PresentationKnoldus Inc.
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Knoldus Inc.
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxKnoldus Inc.
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance TestingKnoldus Inc.
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxKnoldus Inc.
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationKnoldus Inc.
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.Knoldus Inc.
 

Plus de Knoldus Inc. (20)

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptx
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptx
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptx
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptx
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake Presentation
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics Presentation
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIs
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II Presentation
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRA
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptx
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and Kotlin
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks Presentation
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptx
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance Testing
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower Presentation
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.
 

Dernier

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 

Dernier (20)

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 

Machine Learning with Decision trees

  • 1. Machine Learning with DECISION TREES Ramandeep Kaur Software Consultant Knoldus Software LLP
  • 2. Agenda ● What are Decision trees? ● Appropriate problems for DTrees ● Information gain , Entropy ● Issues with DTrees ● Overfitting ● Pruning ● Demo
  • 3. What are Decision trees? ● A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. ● A type of supervised learning algorithm.
  • 4. ● It is one of the most widely used and practical methods for Inductive Inference.
  • 6. Appropriate problems for DTrees ● Instances are represented by attribute-value pairs ● The target function has discrete output values ● The training data may contain errors ● The training data may have missing attribute values
  • 7. How to select the deciding node? Which is the best Classifier?
  • 8. Which node can be described easily?
  • 9. ➔ Less impure node requires less information to describe it. ➔ More impure node requires more information. ===> Information theory is a measure to define this degree of disorganization in a system known as Entropy. So, we can conclude
  • 10. Entropy - measuring homogeneity of a learning set ● Entropy is a measure of the uncertainty about a source of messages. ● Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this classification. where, pi is the proportion of S belonging to class i
  • 11. ● Entropy is 0 if all the members of S belong to the same class. ● Entropy is 1 when the collection contains an equal no. of +ve and -ve examples. ● Entropy is between 0 and 1 if the collection contains unequal no. of +ve and -ve examples.
  • 12. Information gain ● Decides which attribute goes into a decision node. ● To minimize the decision tree depth, the attribute with the most entropy reduction is the best choice!
  • 13. Where: ● S is each value v of all possible values of attribute A ● Sv = subset of S for which attribute A has value v ● |Sv| = number of elements in Sv ● |S| = number of elements in S The information gain, Gain(S,A) of an attribute A,
  • 14. Issues in Decision Tree Learning
  • 15. ISSUES ● How deeply to grow the decision tree? ● Handling continuous attributes ● Choosing an appropriate attribute selection measure ● Handling training data with missing attribute values
  • 16. Overfitting in Decision Trees ● If a decision tree is fully grown, it may lose some generalization capability. ● This is a phenomenon known as overfitting.
  • 17. Overfitting A hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances (i.e, including instances beyond the training examples)
  • 19. Causes of Overfitting ● Overfitting Due to Presence of Noise - Mislabeled instances may contradict the class labels of other similar records. ● Overfitting Due to Lack of Representative Instances - Lack of representative instances in the training data can prevent refinement of the learning algorithm.
  • 20. Overfitting Due to Noise: An Example
  • 23. Overfitting Due to Lack of Samples
  • 24. Overfitting Due to Lack of Samples ● Although the model’s training error is zero, its error rate on the test set is 30%. ● Humans, elephants, and dolphins are misclassified because the decision tree classifies all warmblooded vertebrates that do not hibernate as non-mammals.
  • 25. A good model must not only fit the training data well but also accurately classify records it has never seen.
  • 26. Avoiding overfitting in decision tree learning ● approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data. ● approaches that allow the tree to overfit the data, and then post-prune the tree.
  • 27. Approaches to implement ● separate set of examples ● a statistical test: chi-square test ● a heuristic called the Minimum Description Length principle - (explicit measure of the complexity for encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized.)
  • 28. PRUNING ● Consider each of the decision nodes in the tree to be candidates for pruning. ● Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node. ● Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set. ● Pruning of nodes continues until further pruning is harmful (i.e., decreases accuracy of the tree over the validation set).
  • 29.
  • 30.
  • 31. DEMO
  • 32. References ● Machine Learning – Tom Mitchell ● https://www3.nd.edu/~rjohns15/cse40647.sp14/ww