# Decision tree

Assistant Professor at G D Goenka University Gurgaon
31 Mar 2020

### Decision tree

• 1. DECISION TREE 3/31/2020 Shivani Saluja 1
• 2. INTRODUCTION • Decision trees are a type of supervised machine learning. • Decision tree analysis is a general-purpose predictive modelling tool. • The data is split repeatedly according to a chosen parameter. • Decision trees are constructed by an algorithmic approach that identifies ways to split a data set based on different conditions.
• 3. RULES • The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. • The decision rules generally take the form of if-then-else statements. • The deeper the tree, the more complex the rules and the closer the fit to the training data.
• 4. TERMINOLOGIES • Root Node: represents the entire population or sample; it is further divided into two or more homogeneous sets. • Splitting: the process of dividing a node into two or more sub-nodes. • Decision Node: a sub-node that splits into further sub-nodes. • Leaf / Terminal Node: a node with no children (no further splits). • Pruning: reducing the size of a decision tree by removing nodes (the opposite of splitting). • Branch / Sub-Tree: a subsection of the decision tree. • Parent and Child Node: a node that is divided into sub-nodes is the parent of those sub-nodes; the sub-nodes are its children.
• 5. ENTITIES • Decision nodes: the points where the data is split. • Leaves: the decisions or final outcomes.
• 6. TYPES OF DECISION TREES Classification trees (Yes/No types) • The fit/unfit example above is a classification tree: the outcome is a categorical variable such as 'fit' or 'unfit'. Regression trees (Continuous data types) • Here the decision or outcome variable is continuous, e.g. a number like 123.
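The two tree types above correspond to two estimators in scikit-learn (the library is an assumption here; the slides do not name one). A minimal sketch with a toy one-feature dataset:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]

# Classification tree: the outcome is categorical ('fit' / 'unfit')
clf = DecisionTreeClassifier().fit(X, ["unfit", "unfit", "fit", "fit"])
print(clf.predict([[2.5]]))   # ['fit']

# Regression tree: the outcome is continuous (a number like 123)
reg = DecisionTreeRegressor().fit(X, [10.0, 20.0, 120.0, 123.0])
print(reg.predict([[3]]))     # [123.]
```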
• 7. EXPRESSIVENESS OF DECISION TREES • Decision trees can represent any Boolean function of the input attributes. • For example, a decision tree can compute the functions AND and OR.
• 8. DECISION TREE FOR OR
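The OR tree from this slide can be sketched as nested conditionals: the root splits on the first attribute, and only the false branch needs to test the second attribute (the function name `or_tree` is illustrative, not from the slides).

```python
# Decision tree for boolean OR: root node tests A; the A=False branch
# contains a second decision node that tests B. Each return is a leaf.
def or_tree(a: bool, b: bool) -> bool:
    if a:               # root node: split on attribute A
        return True     # leaf: A is true, so A OR B is true
    else:
        if b:           # decision node: split on attribute B
            return True # leaf: B is true
        return False    # leaf: both false

print([or_tree(a, b) for a in (False, True) for b in (False, True)])
# [False, True, True, True]
```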
• 9. SELECT THE BEST ATTRIBUTE → A • The best attribute is the one with the highest information gain: a measure of how well an attribute splits the data into groups by class. • ID3 is a greedy algorithm that grows the tree top-down, at each node selecting the attribute that best classifies the local training examples. This process continues until the tree perfectly classifies the training examples or until all attributes have been used.
• 10. ENTROPY • Entropy, also called Shannon entropy and denoted H(S) for a finite set S, measures the amount of uncertainty or randomness in the data: H(S) = −Σ p(x) log2 p(x), summed over the classes x in S. • It tells us how predictable a certain event is: lower values imply less uncertainty, higher values imply more. • If the sample is completely homogeneous the entropy is zero; if the sample is equally divided between two classes its entropy is one.
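The entropy definition above can be sketched as a small helper over a list of class labels (the function name is illustrative; note that p·log2(1/p) is used instead of −p·log2(p) so a homogeneous sample yields exactly 0.0):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits.

    Uses p * log2(1/p), which equals -p * log2(p) term by term.
    """
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

# Equally divided sample: maximum uncertainty, entropy = 1 bit
print(entropy(["yes"] * 7 + ["no"] * 7))  # 1.0

# Completely homogeneous sample: no uncertainty, entropy = 0
print(entropy(["yes"] * 14))              # 0.0
```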
• 11. INFORMATION GAIN • Information gain, denoted IG(S, A) for a set S, is the effective change in entropy after splitting on a particular attribute A: IG(S, A) = H(S) − Σx P(x) H(x). • H(S) is the entropy of the entire set; the second term is the weighted entropy after applying attribute A, where P(x) is the probability (proportion) of value x of A and H(x) is the entropy of the corresponding subset. • It measures the relative change in entropy with respect to the independent variables, and can be viewed in terms of the Kullback-Leibler divergence.
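The formula IG(S, A) = H(S) − Σx P(x) H(x) can be sketched directly, with rows represented as dicts (the small weather sample below is illustrative, not the slides' table):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) in bits; p * log2(1/p) equals -p * log2(p) term by term."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """IG(S, A) = H(S) - sum over values v of A of |S_v|/|S| * H(S_v)."""
    n = len(rows)
    gain = entropy([r[target] for r in rows])          # H(S)
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)    # P(x) * H(x)
    return gain

data = [
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Strong", "Play": "No"},
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Strong", "Play": "Yes"},
]
print(round(information_gain(data, "Wind", "Play"), 3))  # 0.311
```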
• 12. DECISION TREE LEARNING ALGORITHM (ID3) • Builds decision trees using a top-down, greedy approach. • Select the best attribute → A. • Assign A as the decision attribute (test case) for the NODE. • For each value of A, create a new descendant of the NODE. • Sort the training examples to the appropriate descendant node leaf. • If the examples are perfectly classified, then STOP; otherwise iterate over the new leaf nodes.
• 13. EXAMPLE • Consider a piece of data collected over the course of 14 days, where the features are Outlook, Temperature, Humidity, and Wind, and the outcome variable is whether golf was played on the day. Our job is to build a predictive model that takes in the above 4 parameters and predicts whether golf will be played on the day. We will build a decision tree to do that using the ID3 algorithm.
• 14. EXAMPLE (the 14-day data table is shown as an image on this slide)
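The slides' 14-day table is an image and is not reproduced here, but the same kind of model can be sketched with scikit-learn (an assumption; the slides build the tree by hand). The few weather rows below are made up for illustration; `criterion="entropy"` makes the splits use information gain, as ID3 does.

```python
# Illustrative only: a handful of invented weather rows, not the slides' data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Overcast", "Rain", "Sunny",  "Rain",   "Overcast"],
    "Humidity": ["High",  "Normal",   "High", "Normal", "Normal", "High"],
    "Play":     ["No",    "Yes",      "No",   "Yes",    "Yes",    "Yes"],
})

# One-hot encode the categorical features for scikit-learn
X = pd.get_dummies(data[["Outlook", "Humidity"]])
y = data["Play"]

# criterion="entropy" selects splits by information gain, as in ID3
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```

`export_text` prints the learned if-then-else rules, which is a convenient way to inspect a small tree.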

### Editor's notes

1. An example of a decision tree can be explained using the binary tree above. Say you want to predict whether a person is fit, given information such as their age, eating habits, and physical activity. The decision nodes are questions like 'What is their age?', 'Do they exercise?', 'Do they eat a lot of pizza?', and the leaves are outcomes such as 'fit' or 'unfit'. This is a binary classification problem (a yes/no type problem).
2. Example: consider a coin toss where the probability of heads is 0.5 and the probability of tails is 0.5. Here the entropy is the highest possible, since there is no way of determining what the outcome will be. Alternatively, consider a coin with heads on both sides: the outcome of such an event can be predicted perfectly, since we know beforehand that it will always be heads. In other words, this event has no randomness, so its entropy is zero.