2. What We Will Talk About
• Introduction
• Decision Tree Algorithms
• Hunt's algorithm
• Methods for Expressing Test Condition
• Advantages and Disadvantages of the Decision Tree
• Measures for Selecting the Best Split
3. Introduction
• A classification technique is a systematic approach to building classification models from an input
data set. Each technique employs a learning algorithm to identify the model that best fits the
relationship between the attribute set and the class label of the input data. The model generated by a
learning algorithm should both fit the input data well and correctly predict the class labels of records
it has never seen before.
• Classifying a test record is straightforward once a decision tree has been constructed. Starting from
the root node, we apply the test condition to the record and follow the appropriate branch based on the
outcome of the test.
• Classification
4. • A decision tree classifier is a simple yet widely used classification technique. It is a
hierarchical structure consisting of nodes and directed edges. The tree has three types of nodes:
• A root node: has no incoming edges and zero or more outgoing edges.
• Internal nodes: each of which has exactly one incoming edge and two or more outgoing edges.
• Leaf or terminal nodes: each of which has exactly one incoming edge and no outgoing edges.
Introduction
• Decision Tree
5. In a decision tree, each leaf node is assigned a class label.
The nonterminal nodes, which include the root and other internal nodes, contain attribute test
conditions to separate records that have different characteristics.
Decision Tree Diagram
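To make the traversal concrete, the following is a minimal Python sketch (not part of the original slides) of a decision tree built from such nodes, together with a classify routine that starts at the root and follows the branch selected by each test condition. The Node/classify names and the particular tree are illustrative assumptions; the attributes mirror the loan-default example used later in these slides.

# Minimal sketch of a decision tree and record classification, assuming internal
# nodes hold a test function and leaves hold a class label (illustrative only).

class Node:
    def __init__(self, label=None, test=None, children=None):
        self.label = label               # class label (leaf nodes only)
        self.test = test                 # function: record -> branch key (internal nodes)
        self.children = children or {}   # branch key -> child Node

def classify(record, node):
    """Start at the root and follow branches until a leaf is reached."""
    while node.label is None:
        node = node.children[node.test(record)]
    return node.label

# A hand-built tree for illustration: Home Owner, then Marital Status, then Annual Income.
root = Node(
    test=lambda r: r["Home Owner"],
    children={
        "Yes": Node(label="No"),
        "No": Node(
            test=lambda r: "Married" if r["Marital Status"] == "Married" else "Single/Divorced",
            children={
                "Married": Node(label="No"),
                "Single/Divorced": Node(
                    test=lambda r: r["Annual Income"] > 80,
                    children={True: Node(label="Yes"), False: Node(label="No")},
                ),
            },
        ),
    },
)

print(classify({"Home Owner": "No", "Marital Status": "Single", "Annual Income": 90}, root))  # -> "Yes"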
6. Several efficient algorithms have been developed for inducing decision trees. These algorithms
usually employ a greedy strategy that grows a decision tree by making a series of locally optimal
decisions about which attribute to use when partitioning the data. One such algorithm is Hunt's
algorithm, which is the basis of many existing decision tree induction algorithms, including
ID3, C4.5, and CART.
• Decision Tree Classification Algorithms
7. In Hunt's algorithm, a decision tree is grown in a recursive fashion by partitioning the training records
into successively purer subsets. Let Dt be the set of training records that are associated with node t and
y = {y1, y2, ..., yc} be the class labels. The following is a recursive definition of Hunt's algorithm
(a minimal code sketch follows below).
• Hunt's algorithm
Step 1: If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
Step 2: If Dt contains records that belong to more than one class, an attribute test condition is selected to
partition the records into smaller subsets. A child node is created for each outcome of the test condition and
the records in Dt are distributed to the children based on the outcomes. The algorithm is then recursively
applied to each child node.
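Below is a minimal, illustrative Python sketch of this recursion. The helper choose_test is an assumed placeholder that selects an attribute test condition and returns a function mapping a record to a branch outcome; how such a test is chosen (e.g., by Gini or entropy) is covered later in these slides.

# Illustrative sketch of Hunt's algorithm. `choose_test` is an assumed helper that
# returns a branching function for the records at a node, or None if no split is possible.
from collections import Counter

def hunt(records, labels, choose_test):
    # Step 1: all records belong to the same class -> leaf node with that label.
    if len(set(labels)) == 1:
        return {"label": labels[0]}

    # Guard: no usable test (e.g., identical attribute values) -> majority-class leaf.
    test = choose_test(records, labels)
    if test is None:
        return {"label": Counter(labels).most_common(1)[0][0]}

    # Step 2: split the records on the test outcomes and recurse on each child.
    partitions = {}
    for rec, lab in zip(records, labels):
        partitions.setdefault(test(rec), []).append((rec, lab))

    children = {}
    for outcome, part in partitions.items():
        recs, labs = zip(*part)
        children[outcome] = hunt(list(recs), list(labs), choose_test)
    return {"test": test, "children": children}

In practice, choose_test would evaluate candidate splits with one of the impurity measures introduced later in these slides and return None when no further split is possible.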
8. Tid | Home Owner | Marital Status | Annual Income | Defaulted Borrower
    1  | Yes        | Single         | 125K          | No
    2  | No         | Married        | 100K          | No
    3  | No         | Single         | 70K           | No
    4  | Yes        | Married        | 120K          | No
    5  | No         | Divorced       | 95K           | Yes
    6  | No         | Married        | 60K           | No
    7  | Yes        | Divorced       | 220K          | No
    8  | No         | Single         | 85K           | Yes
    9  | No         | Married        | 75K           | No
   10  | No         | Single         | 90K           | Yes
10. • Methods for Expressing Test Condition
⦁ Depends on attribute type:
   ⦁ Nominal
   ⦁ Ordinal
   ⦁ Continuous
⦁ Depends on number of ways to split:
   ⦁ 2-way split
   ⦁ Multi-way split
12. Attribute Types: Nominal Attributes
Since a nominal attribute can have many values, its test condition can be expressed in two ways:
1. Multiway split: use as many partitions as there are distinct values
   (e.g., Marital Status -> Single / Divorced / Married).
2. Binary split (by grouping attribute values): divide the values into two subsets; the optimal
   partitioning must be found
   (e.g., {Married} vs. {Single, Divorced}, OR {Single} vs. {Married, Divorced}, OR
   {Single, Married} vs. {Divorced}).
13. Attribute Types: Ordinal Attributes
Multi-way split: use as many partitions as there are distinct values.
Binary split: divides the values into two subsets, but must preserve the order property among the
attribute values (e.g., for Shirt Size: {Small, Medium} vs. {Large, Extra Large} or {Small} vs.
{Medium, Large, Extra Large} are valid, whereas {Small, Large} vs. {Medium, Extra Large} violates
the order).
14. ⦁ Continuous Attributes
Different ways of handling:
– Discretization to form an ordinal categorical attribute. Ranges can be found by equal interval
bucketing, equal frequency bucketing (percentiles), or clustering.
• Static: discretize once at the beginning.
• Dynamic: repeat the discretization at each node.
(Example buckets for Annual Income: 10K–25K, 25K–50K, 50K–80K, > 80K.)
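As a small sketch of static discretization (not from the slides), the snippet below buckets the Annual Income values from the earlier training set with pandas: pd.cut gives equal-interval buckets and pd.qcut gives equal-frequency buckets.

# Static discretization of a continuous attribute into ordinal buckets.
# Values are the Annual Income column (in thousands) from the training-set slide.
import pandas as pd

income = pd.Series([125, 100, 70, 120, 95, 60, 220, 85, 75, 90])

equal_width = pd.cut(income, bins=4)   # equal interval bucketing
equal_freq = pd.qcut(income, q=4)      # equal frequency bucketing (quartiles)

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())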
15. ⦁ Continuous Attributes
– Binary decision: (A < v) or (A ≥ v)
– Consider all possible splits and find the best cut; this can be more compute intensive.
– Example test condition: Annual Income > 80K, with Yes / No branches.
• How should the splitting procedure stop?
Stop splitting if all the records belong to the same class or have identical attribute values
(early termination).
16. There are many measures that can be used to determine the best way to split the records. These measures are
defined in terms of the class distribution of the records before and after splitting.
The measures developed for selecting the best split are often based on the degree of impurity of the child nodes. The
smaller the degree of impurity, the more skewed the class distribution. For example, a node with class distribution (0, 1)
has zero impurity, whereas a node with a uniform class distribution (0.5, 0.5) has the highest impurity. Examples of impurity
measures include:
Entropy(t) = - Σ_i p(i|t) log2 p(i|t)
Gini(t) = 1 - Σ_i [p(i|t)]^2
Classification error(t) = 1 - max_i [p(i|t)]
where the sums run over the c classes (i = 0, ..., c-1), p(i|t) is the fraction of records of class i at node t,
and 0 log2 0 = 0 in entropy calculations.
• Measures for Selecting the Best Split
17. Examples of computing the different impurity measures (see the sketch below).
• Measures for Selecting the Best Split
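The sketch below (illustrative, not from the slides) computes entropy, Gini, and classification error for a few example class-count distributions; the specific node counts such as (0, 6) and (3, 3) are assumptions chosen to show the extremes of the measures.

# Impurity measures for a node, given the class counts at that node.
import math

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)  # 0 * log2(0) = 0

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def classification_error(counts):
    n = sum(counts)
    return 1.0 - max(counts) / n

for counts in [(0, 6), (1, 5), (3, 3)]:   # increasingly uniform class distributions
    print(counts, entropy(counts), gini(counts), classification_error(counts))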
18. To determine how well a test condition performs, we need to compare the degree of impurity of the parent node
(before splitting) with the degree of impurity of the child nodes (after splitting). The larger their difference, the
better the test condition. The gain, Δ, is a criterion that can be used to determine the goodness of a split:
Δ = I(parent) - Σ_{j=1..k} [N(v_j) / N] * I(v_j)
where I(.) is the impurity measure of a given node, N is the total number of records at the parent node, k is
the number of attribute values, and N(v_j) is the number of records associated with the child node v_j.
Decision tree induction algorithms often choose a test condition that maximizes the gain. Since I(parent) is
the same for all test conditions, maximizing the gain is equivalent to minimizing the weighted average
impurity of the child nodes. Finally, when entropy is used as the impurity measure, the difference in entropy
is known as the information gain, Δ_info.
• Measures for Selecting the Best Split
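The following is a minimal sketch of the gain criterion above. The gini helper and the (6, 6) parent split into (4, 3) and (2, 3) children are taken from the worked example on the next slide; the function names themselves are illustrative assumptions.

# Gain of a split: impurity of the parent minus the weighted impurity of the children.
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gain(parent_counts, child_counts_list, impurity=gini):
    n_parent = sum(parent_counts)
    weighted_child = sum(
        (sum(child) / n_parent) * impurity(child) for child in child_counts_list
    )
    return impurity(parent_counts) - weighted_child

# Example: a parent with 6 records of each class, split into (4, 3) and (2, 3) children.
print(round(gain((6, 6), [(4, 3), (2, 3)]), 3))  # ≈ 0.014, as computed on the next slide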
19. • Splitting of Binary Attributes
Splitting on a binary attribute A creates two partitions (child nodes), N1 and N2.
Effect of weighting the partitions: larger and purer partitions are sought.

Parent: C0 = 6, C1 = 6, Gini = 0.5

Gini index for the split on A (Yes -> Node N1, No -> Node N2):
        N1   N2
C0       4    2
C1       3    3

Gini(N1) = 1 - (4/7)^2 - (3/7)^2 = 0.489
Gini(N2) = 1 - (2/5)^2 - (3/5)^2 = 0.480
Weighted Gini of N1, N2 = 7/12 * 0.489 + 5/12 * 0.480 = 0.486
Gain = 0.5 - 0.486 = 0.014
20. • Splitting of Nominal Attributes
• For each distinct value, gather counts for each class in the dataset.
• Use the count matrix to make decisions.

Multi-way split on Car Type:
        Family  Sports  Luxury
C0           1       8       1
C1           3       0       7
Gini = 0.163

Binary split (grouping values), Car Type {Sports, Luxury} vs. {Family}:
        {Sports, Luxury}  {Family}
C0                     9         1
C1                     7         3
Gini = 0.468

Binary split (grouping values), Car Type {Sports} vs. {Family, Luxury}:
        {Sports}  {Family, Luxury}
C0             8                 2
C1             0                10
Gini = 0.167
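To illustrate how the optimal binary grouping can be found, the sketch below enumerates the two-subset partitions of the Car Type values and scores each with the weighted Gini, using the per-value class counts from the multi-way matrix above; the enumeration approach itself is an assumption, not something stated on the slides.

# Enumerate binary groupings of a nominal attribute and score each with weighted Gini.
from itertools import combinations

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_gini(groups):
    """groups: list of per-child class-count tuples."""
    total = sum(sum(g) for g in groups)
    return sum((sum(g) / total) * gini(g) for g in groups)

# Class counts (C0, C1) per Car Type value, taken from the multi-way matrix above.
counts = {"Family": (1, 3), "Sports": (8, 0), "Luxury": (1, 7)}
values = list(counts)

best = None
for r in range(1, len(values) // 2 + 1):
    for left in combinations(values, r):
        right = [v for v in values if v not in left]
        g_left = tuple(sum(c) for c in zip(*(counts[v] for v in left)))
        g_right = tuple(sum(c) for c in zip(*(counts[v] for v in right)))
        score = weighted_gini([g_left, g_right])
        print(set(left), "vs", set(right), "-> Gini =", round(score, 3))
        if best is None or score < best[0]:
            best = (score, left, right)

print("Best binary grouping:", best)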
21. Splitting of Continuous Attributes
• Use binary decisions based on one value v.
• Several choices for the splitting value:
– the number of possible splitting values = the number of distinct values.
• Each splitting value v has a count matrix associated with it:
– class counts in each of the partitions, A ≤ v and A > v.
• Simple method to choose the best v:
– for each v, scan the database to gather the count matrix and compute its Gini index.
– This is computationally inefficient, because it repeats work.
22. Splitting of Continuous Attributes
• For efficient computation, for each attribute:
– Sort the attribute values.
– Linearly scan these values, each time updating the count matrix and computing the Gini index.
– Choose the split position that has the least Gini index (see the sketch below).
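A minimal sketch of this sort-then-scan procedure, using the Annual Income values and Defaulted labels from the training-set slide; taking candidate split points midway between adjacent distinct values is a common convention assumed here, not stated on the slide.

# Efficient best-split search for a continuous attribute: sort once, then scan,
# updating the class counts on each side of the candidate split incrementally.
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_split(values, labels, classes=("No", "Yes")):
    pairs = sorted(zip(values, labels))
    left = {c: 0 for c in classes}                    # counts for A <= v
    right = {c: labels.count(c) for c in classes}     # counts for A >  v
    best = None
    for i in range(len(pairs) - 1):
        v, lab = pairs[i]
        left[lab] += 1
        right[lab] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                  # cannot split between equal values
        split = (pairs[i][0] + pairs[i + 1][0]) / 2   # midpoint candidate
        n = len(pairs)
        w = ((i + 1) / n) * gini(tuple(left.values())) + ((n - i - 1) / n) * gini(tuple(right.values()))
        if best is None or w < best[1]:
            best = (split, w)
    return best

income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]   # Annual Income (K)
defaulted = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_split(income, defaulted))                     # (best cut value, weighted Gini)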
24. Advantages of the Decision Tree:
• It is simple to understand, as it follows the same process a human follows when making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think through all the possible outcomes of a problem.
• It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree:
• The decision tree can contain many layers, which makes it complex.
• It may have an overfitting issue, which can be mitigated by using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.
25. Thank you for listening!
Reach out for any
questions.