Data Mining
Decision Tree
Dr. Sura Zakie
By: Aya Jamal Haidy
What We Will Talk About
• Introduction
• Decision Tree Algorithms
• Hunt's algorithm
• Methods for Expressing Test Condition
• Advantages and Disadvantages of the Decision Tree
• Measures for Selecting the Best Split
Introduction
• Classification
A classification technique is a systematic approach to building classification models from an input data set. Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data. The model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before.
Classifying a test record is straightforward once a decision tree has been constructed. Starting from the root node, we apply the test condition to the record and follow the appropriate branch based on the outcome of the test.
Introduction
• Decision Tree
A decision tree classifier is a simple yet widely used classification technique. It has a hierarchical structure consisting of nodes and directed edges. The tree has three types of nodes:
• A root node: has no incoming edges and zero or more outgoing edges.
• Internal nodes: each has exactly one incoming edge and two or more outgoing edges.
• Leaf or terminal nodes: each has exactly one incoming edge and no outgoing edges.
In a decision tree, each leaf node is assigned a class label. The nonterminal nodes, which include the root and other internal nodes, contain attribute test conditions to separate records that have different characteristics.
Decision Tree Diagram
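To make the traversal described above concrete, here is a minimal Python sketch (an assumption, not from the slides) of a decision tree as nodes and directed edges, with a classify function that starts at the root, applies each test condition, and follows the matching branch down to a leaf. The tree built at the end is hypothetical, loosely shaped like the loan example that appears later in these slides.

```python
# Minimal sketch (assumed, not from the slides) of a decision tree and the
# record-classification walk described above.

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at a nonterminal node
        self.children = children or {}  # test outcome -> child Node
        self.label = label              # class label, set only at leaf nodes

def classify(node, record):
    """Start at the root, apply each test condition, follow the branch."""
    while node.label is None:           # nonterminal node: keep descending
        node = node.children[record[node.attribute]]
    return node.label                   # leaf reached: return its class label

# Hypothetical tree, loosely based on the loan table shown later:
leaf_no, leaf_yes = Node(label="No"), Node(label="Yes")
tree = Node(attribute="Home Owner", children={
    "Yes": leaf_no,
    "No": Node(attribute="Marital Status", children={
        "Married": leaf_no, "Single": leaf_yes, "Divorced": leaf_yes})})

print(classify(tree, {"Home Owner": "No", "Marital Status": "Single"}))  # Yes
```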
• Decision Tree Classification Algorithms
Several efficient algorithms have been developed for inducing decision trees. These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use when partitioning the data. One such algorithm is Hunt's algorithm, which is the basis of many existing decision tree induction algorithms, including ID3, C4.5, and CART.
• Hunt's algorithm
In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records into successively purer subsets. Let D_t be the set of training records associated with node t and y = {y1, y2, ..., yc} be the class labels. The following is a recursive definition of Hunt's algorithm:
Step 1: If all the records in D_t belong to the same class y_t, then t is a leaf node labeled as y_t.
Step 2: If D_t contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in D_t are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.
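A minimal recursive sketch of the two steps above, reusing the Node class from the earlier snippet. The choose_test helper is an assumption: it stands in for the test-condition selection measures covered later in these slides, returning an attribute plus a function that partitions the records by test outcome, or None when no useful split exists.

```python
from collections import Counter

def hunts_algorithm(records, labels, choose_test):
    # Step 1: all records belong to one class -> t is a leaf with that label.
    if len(set(labels)) == 1:
        return Node(label=labels[0])
    # choose_test is an assumed helper (see the split measures later on).
    result = choose_test(records, labels)
    if result is None:  # no further split possible: fall back to majority class
        return Node(label=Counter(labels).most_common(1)[0][0])
    attribute, partition = result
    # Step 2: one child per test outcome; recurse on each subset of records.
    node = Node(attribute=attribute)
    for outcome, (subset, sub_labels) in partition(records, labels).items():
        node.children[outcome] = hunts_algorithm(subset, sub_labels, choose_test)
    return node
```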
Tid | Home Owner | Marital Status | Annual Income | Defaulted Borrower
 1  | Yes        | Single         | 125K          | No
 2  | No         | Married        | 100K          | No
 3  | No         | Single         | 70K           | No
 4  | Yes        | Married        | 120K          | No
 5  | No         | Divorced       | 95K           | Yes
 6  | No         | Married        | 60K           | No
 7  | Yes        | Divorced       | 220K          | No
 8  | No         | Single         | 85K           | Yes
 9  | No         | Married        | 75K           | No
10  | No         | Single         | 90K           | Yes
• Methods for Expressing Test Condition
⦁ Depends on attribute types:
– Nominal
– Ordinal
– Continuous
⦁ Depends on number of ways to split:
– 2-way split
– Multi-way split
• Attribute types
Binary Attributes:
The test condition for a binary attribute generates two potential outcomes.
Example: Body Temperature → {Warm-blooded, Cold-blooded}
• Attribute types
Nominal Attributes:
Since a nominal attribute can have many values, its test condition can be expressed in two ways:
1. Multiway split: use as many partitions as there are distinct values. Example: Marital Status → Single / Married / Divorced.
2. Binary split (by grouping attribute values): divide the values into two subsets and find the optimal partitioning. Example: Marital Status → {Single, Married} vs {Divorced}, OR {Married} vs {Single, Divorced}, OR {Single} vs {Married, Divorced}.
• Attribute types
Ordinal Attributes:
Multi-way split: use as many partitions as there are distinct values.
Binary split: divides the values into two subsets, but must preserve the order property among the attribute values.
Example (Shirt Size): {Small} vs {Medium, Large, Extra Large} and {Small, Medium} vs {Large, Extra Large} are valid splits; {Small, Large} vs {Medium, Extra Large} is invalid because it violates the order, as the snippet below illustrates.
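A tiny illustration (with assumed values) of why ordinal binary splits are restricted: only contiguous cuts preserve the order, so n ordered values admit just n − 1 candidate binary splits.

```python
# Only contiguous cuts preserve the order of an ordinal attribute.
sizes = ["Small", "Medium", "Large", "Extra Large"]
for i in range(1, len(sizes)):
    print(sizes[:i], "vs", sizes[i:])
# {Small, Large} vs {Medium, Extra Large} never appears: it breaks the order.
```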
⦁ Continuous Attributes
Different ways of handling:
– Discretization to form an ordinal categorical attribute. Ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles), or clustering.
• Static – discretize once at the beginning
• Dynamic – repeat at each node
Example of a multi-way split on Annual Income: ≤ 10K, (10K, 25K], (25K, 50K], (50K, 80K], > 80K.
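A short sketch of the two static bucketing schemes named above, using NumPy and the Annual Income values from the earlier table (in thousands); the number of bins is an illustrative choice.

```python
import numpy as np

income = np.array([125, 100, 70, 120, 95, 60, 220, 85, 75, 90])

# Equal interval bucketing: 4 bins of equal width across the value range.
width_edges = np.linspace(income.min(), income.max(), num=5)
# Equal frequency bucketing: edges at percentiles, ~equal records per bin.
freq_edges = np.percentile(income, [0, 25, 50, 75, 100])

# Map each record to its bin index (interior edges only).
print(np.digitize(income, width_edges[1:-1]))
print(np.digitize(income, freq_edges[1:-1]))
```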
⦁ Continuous Attributes
– Binary decision: (A < v) or (A ≥ v). Consider all possible splits and find the best cut; this can be more compute intensive.
Example of a binary split: Annual Income > 80K? Yes / No.
• How should the splitting procedure stop?
Stop splitting if all the records belong to the same class or have identical attribute values, or apply early termination.
• Measures for Selecting the Best Split
There are many measures that can be used to determine the best way to split the records. These measures are defined in terms of the class distribution of the records before and after splitting.
The measures developed for selecting the best split are often based on the degree of impurity of the child nodes. The smaller the degree of impurity, the more skewed the class distribution. For example, a node with class distribution (0, 1) has zero impurity, whereas a node with uniform class distribution (0.5, 0.5) has the highest impurity. Examples of impurity measures include:
Entropy(t) = − Σ_i p(i|t) log2 p(i|t)
Gini(t) = 1 − Σ_i [p(i|t)]²
Classification error(t) = 1 − max_i [p(i|t)]
where the sums run over i = 0, …, c − 1, c is the number of classes, and 0 log2 0 = 0 in entropy calculations.
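The three measures as small Python functions (a sketch, using the conventions above: p is the list of class proportions p(i|t) at a node, and 0 · log2(0) is treated as 0).

```python
import math

def entropy(p):
    # -sum_i p(i|t) * log2 p(i|t); terms with p == 0 contribute 0.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    # 1 - sum_i p(i|t)^2
    return 1.0 - sum(pi * pi for pi in p)

def classification_error(p):
    # 1 - max_i p(i|t)
    return 1.0 - max(p)

# Boundary cases quoted above: (0, 1) is pure, (0.5, 0.5) is maximally impure.
print(gini([0.0, 1.0]), gini([0.5, 0.5]))        # 0.0  0.5
print(entropy([0.0, 1.0]), entropy([0.5, 0.5]))  # 0.0  1.0 (first may show -0.0)
```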
• Measures for Selecting the Best Split
Examples of computing the different impurity measures, for a two-class node with counts (C0, C1):
(0, 6): Gini = 0, Entropy = 0, Error = 0
(1, 5): Gini = 0.278, Entropy = 0.650, Error = 0.167
(3, 3): Gini = 0.5, Entropy = 1, Error = 0.5
• Measures for Selecting the Best Split
To determine how well a test condition performs, we need to compare the degree of impurity of the parent node (before splitting) with the degree of impurity of the child nodes (after splitting). The larger their difference, the better the test condition. The gain, Δ, is a criterion that can be used to determine the goodness of a split:
Δ = I(parent) − Σ_{j=1..k} [N(v_j) / N] · I(v_j)
where I(·) is the impurity measure of a given node, N is the total number of records at the parent node, k is the number of attribute values, and N(v_j) is the number of records associated with the child node v_j.
Decision tree induction algorithms often choose a test condition that maximizes the gain Δ. Since I(parent) is the same for all test conditions, maximizing the gain is equivalent to minimizing the weighted average impurity of the child nodes. Finally, when entropy is used as the impurity measure, the difference in entropy is known as the information gain, Δ_info.
• Measures for Selecting the Best Split
Parent
C0 6
C1 6
GINI=0.5
N1 N2
C0 4 2
C1 3 3
Gini=0.486
Splits into two partitions (child nodes)
Effect of Weighing partitions:
Larger and purer partitions are sought
Gini Index for (A)
Gini(N1)
= 1 – (4/7)2 – (3/7)2
= 0.489
• Splitting of Binary Attributes
Node N1 Node N2
A
YES NO
Gini(N2)
= 1 – (2/5)2 – (3/5)2
= 0.48
Weighted Gini of N1 N2
= 7/12 * 0.489 +
5/12 * 0.48
= 0.486
Gain = 0.5 – 0.486 = 0.014
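A short check (the helper name is an assumption) that reproduces the numbers on this slide, using the gini function defined earlier; each child is given as a list of per-class counts.

```python
def weighted_impurity(children, impurity=gini):
    # sum_j N(v_j)/N * I(v_j), where each child is a list of class counts.
    total = sum(sum(c) for c in children)
    return sum((sum(c) / total) * impurity([x / sum(c) for x in c])
               for c in children)

parent = [6, 6]                  # C0 = 6, C1 = 6
children = [[4, 3], [2, 3]]      # N1: C0=4, C1=3; N2: C0=2, C1=3
g_parent = gini([c / sum(parent) for c in parent])   # 0.5
g_split = weighted_impurity(children)                # ~0.486
print(round(g_parent - g_split, 3))                  # gain: 0.014
```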
• Splitting of Nominal Attributes
For each distinct value, gather counts for each class in the dataset, then use the count matrix to make decisions.

Binary split: Car Type ∈ {Sports, Luxury} vs {Family}
      {Sports, Luxury}  {Family}
C0            9             1
C1            7             3
Gini = 0.468

Binary split: Car Type ∈ {Sports} vs {Family, Luxury}
      {Sports}  {Family, Luxury}
C0        8             2
C1        0            10
Gini = 0.167

Multi-way split: Car Type = Family / Sports / Luxury
      Family  Sports  Luxury
C0       1       8       1
C1       3       0       7
Gini = 0.163
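Reusing weighted_impurity from the previous sketch to check the three Car Type splits above (class counts taken from the slide); the first value prints as 0.469, matching the slide's 0.468 up to rounding.

```python
# Count matrices: one [C0, C1] list per child node.
splits = {
    "{Sports, Luxury} vs {Family}": [[9, 7], [1, 3]],
    "{Sports} vs {Family, Luxury}": [[8, 0], [2, 10]],
    "Family / Sports / Luxury":     [[1, 3], [8, 0], [1, 7]],
}
for name, children in splits.items():
    print(f"{name}: Gini = {weighted_impurity(children):.3f}")
# -> about 0.469, 0.167, and 0.163; the multi-way split is lowest here.
```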
Splitting of Continuous Attributes
• Use binary decisions based on one value v.
• Several choices for the splitting value:
– Number of possible splitting values = number of distinct values.
• Each splitting value v has a count matrix associated with it:
– Class counts in each of the partitions, A ≤ v and A > v.
• Simple method to choose the best v:
– For each v, scan the database to gather the count matrix and compute its Gini index.
– Computationally inefficient! It repeats a lot of work.
Splitting of Continuous Attributes
For efficient computation, for each attribute:
– Sort the attribute on its values.
– Linearly scan these values, each time updating the count matrix and computing the Gini index.
– Choose the split position that has the least Gini index, as in the sketch below.
(The original slide shows the sorted values, the candidate split positions, and the class counts at each position.)
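A sketch of the sort-and-scan procedure above, applied to the Annual Income column of the earlier table (in thousands; class "Yes" means the borrower defaulted). It reuses weighted_impurity from before and takes candidate split values at the midpoints between consecutive sorted values, an assumed but common choice.

```python
records = [(125, "No"), (100, "No"), (70, "No"), (120, "No"), (95, "Yes"),
           (60, "No"), (220, "No"), (85, "Yes"), (75, "No"), (90, "Yes")]
records.sort()                       # sort once on the attribute values
labels = [cls for _, cls in records]
n, total_yes = len(records), labels.count("Yes")

best_gini, best_v = float("inf"), None
left_yes = 0
for i in range(1, n):                # linear scan, updating counts as we go
    left_yes += labels[i - 1] == "Yes"
    left = [left_yes, i - left_yes]                      # counts with A <= v
    right = [total_yes - left_yes, (n - i) - (total_yes - left_yes)]
    g = weighted_impurity([left, right])
    v = (records[i - 1][0] + records[i][0]) / 2          # midpoint split value
    if g < best_gini:
        best_gini, best_v = g, v
print(best_v, round(best_gini, 3))   # 97.5 0.3 -> split on Income <= 97.5
```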
Advantages of the Decision Tree:
• It is simple to understand, as it follows the same process a human follows when making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes of a problem.
• It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree:
• The decision tree can contain many layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.
Thank you for listening!
Reach out for any questions.