5. DECISION TREES
A decision tree is a learning algorithm that builds a tree of
decision rules from training data.
Decision trees are popular because:
• They are naturally non-linear, so you can use them to solve
complex problems
• They are easy to visualise
• How they work is easy to explain
• They can be used for regression (predict a number) and
classification (predict a class)
A decision tree algorithm is, in effect, an explicit, learned version of the
guessing game “twenty questions”.
6. A TOY EXAMPLE
Weekend?   Evening?   Food?   Date?
Yes        Yes        Yes     Yes
No         Yes        No      No
No         No         Yes     No
Yes        No         Yes     Yes
Yes        No         No      No
Yes        Yes        No      No
(Date? is the label we want to predict.)
7-9. BUILDING A DECISION TREE
Basic approach (starting with a root node):
Loop over all leaf nodes:
1. Select the best attribute A
2. Assign A as the decision attribute for the node we are currently
traversing
3. For each value of A, create a descendant (leaf) node
4. Sort the training examples down to the new leaf nodes
5. If a stopping criterion is hit, stop; else continue
(A recursive sketch of this loop follows.)
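In code, this loop is most naturally written as a recursion, with each
recursive call playing the role of one leaf-node visit. A minimal sketch in
Python, assuming training examples are (attribute-dict, label) pairs and that
the selection criterion best_attribute is supplied separately:

def build_tree(examples, attributes, best_attribute):
    # Stopping criterion (step 5): all labels agree, or no attributes left.
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:
        return max(set(labels), key=labels.count)  # leaf: majority label
    a = best_attribute(examples, attributes)       # step 1: pick best attribute
    node = {'attribute': a, 'children': {}}        # step 2: assign it to this node
    for value in {ex[a] for ex, _ in examples}:    # step 3: one child per value
        subset = [(ex, lbl) for ex, lbl in examples
                  if ex[a] == value]               # step 4: sort examples down
        remaining = [b for b in attributes if b != a]
        node['children'][value] = build_tree(subset, remaining, best_attribute)
    return node

ID3, which appears below, is exactly this loop with information gain as the
selection criterion.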
10. DECISION TREES IN SCIKIT-LEARN
sklearn.tree.DecisionTreeClassifier(
    criterion='gini',
    splitter='best',
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features=None,
    random_state=None,
    max_leaf_nodes=None,
    min_impurity_split=1e-07,
    class_weight=None,
    presort=False
)
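Most of these arguments can be left at their defaults. A minimal usage sketch
on the toy table, encoding Yes/No as 1/0 because scikit-learn expects numeric
features; criterion='entropy' makes the splits information-gain based, in the
spirit of ID3 (under the hood, scikit-learn implements an optimised version
of CART):

from sklearn.tree import DecisionTreeClassifier

# The toy table: columns are Weekend?, Evening?, Food? (1 = Yes, 0 = No).
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]  # Date? (1 = Yes)

clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X, y)

# Weekend, not an evening, food available -> should predict a date.
print(clf.predict([[1, 0, 1]]))  # [1]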
11-20. BUILDING A DECISION TREE (ID3)
[Figure sequence: using the toy table from slide 6, the tree is grown one
node at a time over these slides: first the root W (Weekend?), then its
Yes/No branches, then F (Food?) and E (Evening?) decision nodes with
Date / No Date leaves. A finished tree consistent with the table:]

Weekend?
├── No  → No Date
└── Yes → Food?
          ├── Yes → Date
          └── No  → No Date
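ID3 defines “the best attribute” as the one with the highest information
gain: the drop in label entropy achieved by splitting on it. A worked sketch
on the toy table (same 1/0 encoding as above):

from math import log2

# Weekend?, Evening?, Food? -> Date? (1 = Yes, 0 = No).
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]

def entropy(labels):
    # Shannon entropy of a non-empty list of labels, in bits.
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(col):
    # Label entropy minus the weighted entropy after splitting on one column.
    gain = entropy(y)
    for value in {row[col] for row in X}:
        subset = [lbl for row, lbl in zip(X, y) if row[col] == value]
        gain -= len(subset) / len(y) * entropy(subset)
    return gain

for name, col in [('Weekend?', 0), ('Evening?', 1), ('Food?', 2)]:
    print(name, round(information_gain(col), 3))
# ID3 places the attribute with the largest gain at the current node;
# Evening?, which carries no information about Date? here, scores 0.0.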
21. The ID3 algorithm, like many other decision tree algorithms, is
prone to overfitting: trees grow too deep and start to capture noise
in the training data.
Overfitting means the trained model will fail to generalise well to
new examples.
One way of combating overfitting is to use an ensemble method.
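Before the ensemble fix, the overfitting itself is easy to reproduce. An
illustrative sketch on synthetic, deliberately noisy data (the dataset and
numbers are arbitrary): an unconstrained tree memorises its training set,
while capping max_depth, one of the DecisionTreeClassifier parameters listed
earlier, narrows the train/test gap:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, so memorising the training set is harmful.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(depth, clf.score(X_train, y_train), clf.score(X_test, y_test))
# Expect the unbounded tree to score ~1.0 on training data yet do worse
# on the test set than the depth-limited one.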
26. RANDOM FORESTS
A random forest is an ensemble method based on decision trees.
The basic idea is deceptively simple:
1. Construct N decision trees
• Randomly sample a subset of the training data (with
replacement)
• Construct/train a decision tree using the decision tree
algorithm and the sampled subset of data
2. Predict by asking all trees in the forest for their opinion
• For regression problems, take the mean (average) of all
trees’ predictions
• For classification problems, take the mode of all trees’
predictions (i.e. vote)
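A minimal hand-rolled sketch of exactly this recipe for classification,
bagging scikit-learn trees (in practice you would reach for
sklearn.ensemble.RandomForestClassifier, which does this, plus per-split
feature subsampling, for you):

import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    # Step 1: N trees, each trained on a bootstrap sample (with replacement).
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        tree = DecisionTreeClassifier(max_features='sqrt')  # random feature subsets
        tree.fit([X[i] for i in idx], [y[i] for i in idx])
        forest.append(tree)
    return forest

def predict_forest(forest, x):
    # Step 2: every tree votes; the mode wins. (For regression: take the mean.)
    votes = [tree.predict([x])[0] for tree in forest]
    return Counter(votes).most_common(1)[0][0]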
29. SUMMARY
Decision trees are easy-to-understand learning algorithms that can
be used for regression and classification, even for non-linear
problems.
Random forests are ensemble learning algorithms that help prevent
overfitting by creating many decision trees and averaging their
predictions.
If you are just getting started with machine learning, decision trees
are an excellent starting point.