Decision trees

DATA WAREHOUSING
AND DATA MINING
DECISION TREE

Contents
• Introduction
• Decision Tree
• Decision Tree Algorithm
• Decision Tree Based Algorithm
• Algorithm
• Decision Tree Advantages and Disadvantages

Introduction
• Classification is one of the most familiar and
popular data mining techniques.
• Classification applications include image and
pattern recognition, loan approval, and detecting
faults in industrial applications.
• All approaches to performing classification
assume some knowledge of the data.
• A training set is used to develop the specific
parameters required by the technique.

Decision Tree

• Decision Tree (DT):
▫ Tree where the root and each internal node is
labeled with a question.
▫ The arcs represent each possible answer to the
associated question.
▫ Each leaf node represents a prediction of a solution
to the problem.
• Popular technique for classification: each leaf node
indicates the class to which the corresponding tuple
belongs.

Decision Tree
• A Decision Tree Model is a computational
model consisting of three parts:
▫ Decision Tree
▫ Algorithm to create the tree
▫ Algorithm that applies the tree to data
• Creation of the tree is the most difficult part.
• Applying the tree is basically a search similar to
that in a binary search tree (although a DT need not
be binary).
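
The "algorithm that applies the tree to data" can be sketched in Python as a walk from the root to a leaf. This is a minimal illustration, not a definitive implementation: the `Leaf` and `Node` classes, the `classify` function, and the loan-approval attributes (`income`, `has_collateral`) and threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    # The question maps a tuple (here a dict) to an answer, i.e. an arc label.
    question: Callable[[dict], str]
    arcs: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, tuple_):
    """Follow the arc matching each answer until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.arcs[node.question(tuple_)]
    return node.label


# Hypothetical loan-approval tree (illustrative values only).
tree = Node(
    question=lambda t: "high" if t["income"] >= 50000 else "low",
    arcs={
        "high": Leaf("approve"),
        "low": Node(
            question=lambda t: "yes" if t["has_collateral"] else "no",
            arcs={"yes": Leaf("approve"), "no": Leaf("reject")},
        ),
    },
)

print(classify(tree, {"income": 30000, "has_collateral": False}))
```

Because `arcs` is a dictionary keyed by answer, a node may have any number of children, which reflects the point that a DT need not be binary.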

Algorithm Definition

• The decision tree approach is most useful in
classification problems. With this technique, a
tree is constructed to model the classification
process.
• Once the tree is built, it is applied to each tuple
in the database and results in a classification for
that tuple.
• There are two basic steps in this technique:
building the tree and applying the tree to the
database.

• The decision tree approach to classification is to
divide the search space into rectangular regions.
A tuple is classified based on the region into
which it falls.
• Definition: Given a database D = {t1, ..., tn},
where ti = <ti1, ..., tih>, a database schema
consisting of the attributes {A1, A2, ..., Ah},
and a set of classes C = {C1, ..., Cm}, a decision
tree DT (or classification tree) is a tree associated
with D that has the following properties:
▫ Each internal node is labeled with an attribute Ai.
▫ Each arc is labeled with a predicate that can be
applied to the attribute associated with the parent node.
▫ Each leaf node is labeled with a class Cj.
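
To make the "rectangular regions" idea concrete, the sketch below enumerates the region that each leaf of a small tree covers. It assumes numeric attributes and arcs labeled with the predicates "attribute < threshold" and "attribute >= threshold". The tuple encoding of the tree, the `regions` function, and the thresholds are illustrative assumptions; only the names A1, A2, C1, and C2 follow the notation of the definition above.

```python
import math

# A tree is either a class label (str) or a tuple
# (attribute, threshold, left_subtree, right_subtree).
# The left arc carries "attribute < threshold", the right arc ">=".
tree = ("A1", 5.0,
        "C1",
        ("A2", 2.0, "C2", "C1"))


def regions(tree, bounds):
    """Yield (bounds, class) for the axis-aligned region of each leaf.

    bounds maps each attribute to a half-open interval [low, high).
    """
    if isinstance(tree, str):  # leaf: the current box gets this class
        yield bounds, tree
        return
    attr, thr, left, right = tree
    lo, hi = bounds[attr]
    # Each split only tightens one attribute's interval, so every
    # region stays a rectangle (a box in higher dimensions).
    yield from regions(left, {**bounds, attr: (lo, min(hi, thr))})
    yield from regions(right, {**bounds, attr: (max(lo, thr), hi)})


unbounded = {"A1": (-math.inf, math.inf), "A2": (-math.inf, math.inf)}
for box, label in regions(tree, unbounded):
    print(label, box)
```

Each leaf yields exactly one box, and a tuple is classified by finding the box that contains it.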

Algorithm
• Input:
   D // Training data
• Output:
   T // Decision tree
• DTBuild algorithm
   // Simplistic algorithm to illustrate a naive
   // approach to building a DT

• T = ∅;
   Determine best splitting criterion;
   T = Create root node and label it with the splitting
       attribute;
   T = Add an arc to the root node for each split predicate
       and label it;
   for each arc do
      D’ = database created by applying the splitting
           predicate to D;
      if stopping point reached for this path then
         T’ = Create leaf node and label it with the
              appropriate class;
      else
         T’ = DTBuild(D’);
      T = Add T’ to arc;
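
The DTBuild outline leaves the splitting criterion and the stopping point unspecified. The Python sketch below fills them in with assumptions that are not from the slides: categorical attributes, one arc per attribute value, information gain (lowest weighted entropy) as the splitting criterion, and "partition is pure or no attributes remain" as the stopping point. The function names and the sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy of the class labels in rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def partition(rows, attr):
    """Group rows by their value of attr (one group per arc)."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r)
    return parts


def dt_build(rows, attrs, target):
    labels = Counter(r[target] for r in rows)
    # Stopping point (assumed): pure partition or no attributes left.
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # leaf: majority class

    # Splitting criterion (assumed): lowest weighted entropy after the
    # split, which is equivalent to highest information gain.
    def remainder(attr):
        return sum(len(p) / len(rows) * entropy(p, target)
                   for p in partition(rows, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    # One arc per value of the splitting attribute; recurse on each subset.
    return {best: {value: dt_build(part, rest, target)
                   for value, part in partition(rows, best).items()}}


# Hypothetical training data (illustrative only).
rows = [
    {"income": "high", "collateral": "no", "class": "approve"},
    {"income": "high", "collateral": "yes", "class": "approve"},
    {"income": "high", "collateral": "no", "class": "approve"},
    {"income": "low", "collateral": "yes", "class": "approve"},
    {"income": "low", "collateral": "no", "class": "reject"},
]
tree = dt_build(rows, ["income", "collateral"], "class")
print(tree)
```

The tree is returned as nested dictionaries of the form `{attribute: {value: subtree_or_class}}`. Each recursive call receives only the rows that satisfy its arc's predicate, which corresponds to applying the splitting predicate to D in the outline.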

DT Advantages/Disadvantages
• Advantages:
▫ Easy to understand.
▫ Easy to generate rules.
• Disadvantages:
▫ May suffer from overfitting.
▫ Classifies by rectangular partitioning.
▫ Does not easily handle nonnumeric data.
▫ Can be quite large – pruning is necessary.
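
The "easy to generate rules" advantage can be illustrated by turning every root-to-leaf path into one IF-THEN rule. The sketch below assumes a tree stored as nested dictionaries of the form `{attribute: {value: subtree_or_class}}`. That encoding, the `rules` function, and the loan-approval example are hypothetical and chosen only for illustration.

```python
def rules(tree, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: tree is a class label
        lhs = " AND ".join(conditions) or "TRUE"
        yield f"IF {lhs} THEN class = {tree}"
        return
    (attr, arcs), = tree.items()  # internal node tests a single attribute
    for value, subtree in arcs.items():
        yield from rules(subtree, conditions + (f"{attr} = {value}",))


# Hypothetical tree (illustrative only).
tree = {"income": {"high": "approve",
                   "low": {"collateral": {"yes": "approve",
                                          "no": "reject"}}}}
for rule in rules(tree):
    print(rule)
```

The number of rules equals the number of leaves, so a large unpruned tree produces a correspondingly long rule set.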
