DATA MINING
TECHNIQUES
(DECISION TREES)
Presented by:
Shweta Ghate
MIT College of Engineering
What is Data Mining?
• Data mining is about automating the process of searching for patterns in data.
• Data mining is the discovery of hidden knowledge, unexpected patterns and new rules in large databases.
Data Mining Techniques
Key techniques:
• Association
• Classification
  ◦ Decision Trees
• Clustering Techniques
• Regression
Classification
• Classification is one of the most familiar and most popular data mining techniques.
• Classification applications include image and pattern recognition, loan approval, and fault detection in industrial applications.
• All approaches to classification assume some knowledge of the data.
• A training set is used to develop the specific parameters required by the technique.
• The goal of classification is to build a concise model that can be used to predict the class of records whose class label is not known.
Classification
Classification consists of assigning a class label to a set of unclassified cases.
1. Supervised classification
The set of possible classes is known in advance.
2. Unsupervised classification
The set of possible classes is not known. After classification, we can try to assign a name to each class. Unsupervised classification is called clustering.
Decision tree
• A classification scheme
• Generates a tree and a set of rules
• The set of records is divided into two subsets:
  ◦ training set (used to derive the classifier)
  ◦ test set (used to measure the accuracy of the classifier)
• Attributes are of two types:
  ◦ numerical attributes
  ◦ categorical attributes
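The split into a training set and a test set can be sketched as follows. This is a minimal illustration, not the deck's procedure: the names `train_test_split` and `accuracy`, the 0.4 test fraction, the random shuffling and the records themselves are all hypothetical. The records mix a categorical attribute (`outlook`) with a numerical one (`humidity`), and the classifier is a placeholder that always predicts "play".

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]  # attribute name -> value, including the class label


def train_test_split(records: Sequence[Record], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of `records` and divide it into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classify: Callable[[Record], object], test_set: Sequence[Record],
             label_key: str = "class") -> float:
    """Fraction of test records whose predicted class matches the stored label."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for r in test_set if classify(r) == r[label_key])
    return correct / len(test_set)


# Hypothetical records: "outlook" is categorical, "humidity" is numerical.
records = [
    {"outlook": "sunny", "humidity": 70, "class": "play"},
    {"outlook": "sunny", "humidity": 90, "class": "don't play"},
    {"outlook": "overcast", "humidity": 85, "class": "play"},
    {"outlook": "rain", "humidity": 80, "class": "play"},
    {"outlook": "rain", "humidity": 75, "class": "don't play"},
]
train, test = train_test_split(records, test_fraction=0.4, seed=1)
# Placeholder classifier; a real one would be derived from `train` only.
print(len(train), len(test), accuracy(lambda r: "play", test))
```

Keeping the test records out of training is the point of the split: accuracy measured on records the classifier has already seen would overstate how well it handles unknown cases.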
Decision tree
• Decision tree
  ◦ A flowchart-like tree structure
  ◦ Each internal node denotes a test on an attribute
  ◦ Each branch represents an outcome of the test
  ◦ Leaf nodes represent class labels, class distributions or rules
• Use of a decision tree: classifying an unknown sample
  ◦ Test the attribute values of the sample against the decision tree
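The node/branch/leaf structure described above maps naturally onto a small data type. The sketch below is one possible encoding, not taken from the slides: each internal `Node` holds a test function that maps a sample to an outcome, each outcome keys a branch, and each `Leaf` holds a class label. The loan-approval attributes and the 30000 threshold are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Union


@dataclass
class Leaf:
    label: str  # class label predicted at this leaf


@dataclass
class Node:
    description: str                  # human-readable name of the test
    test: Callable[[dict], Hashable]  # maps a sample to one outcome of the test
    branches: Dict[Hashable, Tree]    # one subtree per possible outcome


Tree = Union[Node, Leaf]


def classify(tree: Tree, sample: dict) -> str:
    """Walk from the root to a leaf, following the branch matching each test outcome."""
    while isinstance(tree, Node):
        outcome = tree.test(sample)
        if outcome not in tree.branches:
            raise KeyError(f"no branch for outcome {outcome!r} of test {tree.description!r}")
        tree = tree.branches[outcome]
    return tree.label


# Hypothetical loan-approval tree: one categorical test, one numerical test.
loan_tree = Node(
    "credit_history",
    lambda s: s["credit_history"],
    {
        "bad": Leaf("reject"),
        "good": Node(
            "income > 30000",
            lambda s: s["income"] > 30000,
            {True: Leaf("approve"), False: Leaf("reject")},
        ),
    },
)

print(classify(loan_tree, {"credit_history": "good", "income": 45000}))  # approve
```

Storing the test as a function lets the same node type cover both a categorical test (return the attribute value) and a numerical one (return the result of a comparison).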
Training Dataset
Output: A Decision Tree
OUTLOOK
  sunny → HUMIDITY
    <= 75 → PLAY
    > 75 → NO PLAY
  overcast → PLAY
  rain → WINDY
    true → NO PLAY
    false → PLAY
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then don't play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then don't play.
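Rule extraction is a simple recursive walk: every root-to-leaf path becomes one rule whose condition is the conjunction of the tests along the path. The sketch below assumes a hypothetical nested-tuple encoding of the golf tree (a leaf is a class label; an internal node is `(attribute, {outcome: subtree})`), with numerical outcomes written as comparisons such as `"<= 75"`. The names `GOLF_TREE`, `extract_rules` and `_condition` are illustrative only.

```python
from typing import Dict, List, Tuple, Union

# Leaf: a class label (str). Internal node: (attribute, {outcome: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

GOLF_TREE: Tree = (
    "outlook",
    {
        "sunny": ("humidity", {"<= 75": "play", "> 75": "don't play"}),
        "overcast": "play",
        "rain": ("windy", {"false": "play", "true": "don't play"}),
    },
)


def _condition(attribute: str, outcome: str) -> str:
    # Numerical tests store the comparison in the outcome; categorical ones store a value.
    if outcome.startswith(("<", ">")):
        return f"{attribute} {outcome}"
    return f"{attribute} = {outcome}"


def extract_rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, str):  # leaf: the conditions collected so far form the rule
        antecedent = " AND ".join(path) or "TRUE"
        return [f"IF {antecedent} THEN {tree}"]
    attribute, branches = tree
    rules: List[str] = []
    for outcome, subtree in branches.items():
        rules.extend(extract_rules(subtree, path + (_condition(attribute, outcome),)))
    return rules


for i, rule in enumerate(extract_rules(GOLF_TREE), start=1):
    print(f"RULE {i}: {rule}")
```

Because Python dictionaries keep insertion order, the five printed rules come out in the same order as RULE 1 to RULE 5 above.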
Output: A decision tree for deciding whether to play golf
Example
• An unknown input vector is classified by traversing the tree from the root node to a leaf node.
• e.g., outlook = rain, temp = 70, humidity = 65 and windy = true. What is the value of the class attribute?
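The traversal for this example can be traced in code. The sketch below uses a hypothetical encoding of the golf tree in which each internal node is `(attribute, outcome_fn, branches)`, where `outcome_fn` turns the sample's attribute value into a branch key (for humidity, the comparison with 75), and each leaf is a class label. It records every test passed on the way down.

```python
# Hypothetical encoding of the golf tree: a leaf is a class-label string;
# an internal node is (attribute, outcome_fn, branches).
GOLF_TREE = (
    "outlook",
    lambda v: v,
    {
        "sunny": ("humidity", lambda v: "<= 75" if v <= 75 else "> 75",
                  {"<= 75": "play", "> 75": "don't play"}),
        "overcast": "play",
        "rain": ("windy", lambda v: v, {True: "don't play", False: "play"}),
    },
)


def classify(tree, sample):
    """Return (class label, list of tests passed on the way from root to leaf)."""
    path = []
    while not isinstance(tree, str):
        attribute, outcome_of, branches = tree
        outcome = outcome_of(sample[attribute])
        path.append(f"{attribute}: {outcome}")
        tree = branches[outcome]
    return tree, path


example = {"outlook": "rain", "temp": 70, "humidity": 65, "windy": True}
label, path = classify(GOLF_TREE, example)
print(" -> ".join(path), "=>", label)  # outlook: rain -> windy: True => don't play
```

Only two attributes are tested for this input: temp never appears in the tree, and humidity is tested only on the sunny branch, so the class is decided by outlook = rain and windy = true.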
Tree Construction Principles
• Splitting attribute
• Splitting criterion
Three main phases:
- Construction phase
- Pruning phase
- Processing the pruned tree to improve understandability
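The slide does not name a splitting criterion. Two common ones are entropy, whose reduction (information gain) is used by ID3, and the Gini index, used by CART. The sketch below computes both for a list of class labels and scores a split by weighting each partition's impurity by its share of the records; the function names and the sample labels are illustrative assumptions.

```python
from collections import Counter
from math import log2
from typing import Callable, Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy in bits of the class distribution: -sum(p * log2(p))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index of the class distribution: 1 - sum(p^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(partitions: Sequence[Sequence[Hashable]],
                      impurity: Callable[[Sequence[Hashable]], float]) -> float:
    """Impurity of a split: each partition's impurity weighted by its share of records."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * impurity(p) for p in partitions if p)


def information_gain(parent: Sequence[Hashable],
                     partitions: Sequence[Sequence[Hashable]]) -> float:
    """Reduction in entropy obtained by splitting `parent` into `partitions`."""
    return entropy(parent) - weighted_impurity(partitions, entropy)


parent = ["play", "play", "don't", "don't"]
print(entropy(parent), gini(parent))  # 1.0 0.5
print(information_gain(parent, [["play", "play"], ["don't", "don't"]]))  # 1.0
print(information_gain(parent, [["play", "don't"], ["play", "don't"]]))  # 0.0
```

A split that separates the classes perfectly gains the full parent entropy, while a split whose partitions keep the parent's class mix gains nothing; the construction phase prefers the former.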
The Generic Algorithm
• Let the training data set be T, with class labels {C1, C2, …, Ck}.
• The tree is built by repeatedly partitioning the training data set.
• The process continues until all the records in a partition belong to the same class.
T is homogeneous
- T contains cases that all belong to a single class Cj. The decision tree for T is a leaf identifying class Cj.
T is not homogeneous
- T contains cases that belong to a mixture of classes.
- A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, …, On}.
- T is partitioned into subsets T1, T2, …, Tn, where Ti contains all those cases in T that have outcome Oi of the chosen test.
- The decision tree for T consists of a decision node identifying the test, and one branch for each possible outcome.
- The same tree-building method is applied recursively to each subset of training cases.
- If n is taken to be 2, a binary decision tree is generated.
T is trivial
- T contains no cases.
- The decision tree for T is a leaf, but the class to be associated with the leaf must be determined from information other than T.
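The three cases translate directly into a recursive function. The sketch below is one possible reading under several assumptions that the slides leave open: attributes are categorical with known value domains, the test is chosen by lowest weighted entropy (highest information gain, as in ID3), an empty subset takes the majority class of its parent as the "information other than T", and a mixed subset with no attributes left becomes a majority-class leaf. The training rows are hypothetical, not the deck's training table.

```python
from collections import Counter
from math import log2

# A leaf is a class label; an internal node is (attribute, {value: subtree}).


def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def majority_class(rows, target):
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def build_tree(rows, attributes, target, domains, parent_majority=None):
    """Recursive partitioning following the three cases on the slides."""
    # T is trivial: no cases. The class comes from outside T; here, the
    # parent's majority class (an assumption).
    if not rows:
        return parent_majority
    classes = {r[target] for r in rows}
    # T is homogeneous: a leaf identifying the single class Cj.
    if len(classes) == 1:
        return classes.pop()
    # No attribute left to test (not covered on the slides): majority leaf.
    if not attributes:
        return majority_class(rows, target)

    def split_entropy(attr):
        # Weighted entropy of the partition induced by testing `attr`.
        parts = [[r for r in rows if r[attr] == v] for v in domains[attr]]
        return sum(len(p) / len(rows) * entropy(p, target) for p in parts if p)

    # T is not homogeneous: choose a single-attribute test (lowest weighted
    # entropy, i.e. highest information gain; an assumption).
    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    default = majority_class(rows, target)
    # One branch per outcome; recurse on each subset Ti.
    branches = {
        v: build_tree([r for r in rows if r[best] == v], remaining, target, domains, default)
        for v in domains[best]
    }
    return (best, branches)


# Hypothetical training cases.
rows = [
    {"outlook": "sunny", "windy": "false", "class": "play"},
    {"outlook": "sunny", "windy": "true", "class": "play"},
    {"outlook": "sunny", "windy": "true", "class": "play"},
    {"outlook": "rain", "windy": "false", "class": "play"},
    {"outlook": "rain", "windy": "true", "class": "don't play"},
]
domains = {"outlook": ["sunny", "overcast", "rain"], "windy": ["false", "true"]}
print(build_tree(rows, ["outlook", "windy"], "class", domains))
```

In this run no training case has outlook = overcast, so that branch hits the "T is trivial" case and its leaf is labeled with the parent's majority class.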
Decision Tree Construction Algorithms
• CART (Classification And Regression Trees)
• ID3 (Iterative Dichotomiser 3)
• C4.5
Advantages
• Generate understandable rules
• Able to handle both numerical and categorical attributes
• Provide a clear indication of which fields are most important for prediction or classification
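Handling a numerical attribute means turning it into a test such as the slide's humidity <= 75. The sketch below shows one simple way to choose the threshold: try each distinct observed value (except the largest) as t for the binary test `value <= t` and keep the one with the lowest weighted entropy. The function name and the humidity readings are hypothetical; real algorithms differ in how they pick candidate thresholds (some use midpoints between adjacent values).

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Pick t for the binary test `value <= t` that minimises weighted entropy.

    Candidates are the distinct observed values except the largest (splitting
    there would leave the right-hand side empty). Returns (threshold,
    weighted_entropy), or None if all values are equal.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    n = len(values)
    best = None
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical humidity readings and classes.
print(best_threshold([65, 70, 80, 90], ["play", "play", "don't play", "don't play"]))  # (70, 0.0)
```

Once a threshold is chosen, the numerical attribute behaves like a two-valued categorical one, so the same tree-building procedure can handle both kinds of attribute.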
Weaknesses
• Some decision trees can only deal with binary-valued target classes.
• Others can assign records to an arbitrary number of classes, but are error-prone when the number of training examples per class gets small.
• The process of growing a decision tree is computationally expensive.
References
• http://www.ibm.com/developerworks/opensource/library/ba-data-mining-techniques/index.html
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Chapter 7 slides for the textbook), Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada.
• Arun K. Pujari, Data Mining Techniques, second edition.