An Algorithm for Building Decision Trees (C4.5)

1. Let T be the set of training instances.
2. Choose the attribute that best differentiates the instances in T.
3. Create a tree node whose value is the chosen attribute.
   - Create child links from this node, where each link represents a unique value of the chosen attribute.
   - Use the child link values to further subdivide the instances into subclasses.
4. For each subclass created in step 3:
   - If the instances in the subclass satisfy predefined criteria, or if the set of remaining attribute choices for this path is empty, specify the classification for new instances following this decision path.
   - Otherwise, if at least one attribute remains to further subdivide this path of the tree, let T be the current set of subclass instances and return to step 2.
Entropy Example

Given a set R of objects,

    Entropy(R) = Σ (−p(I) log2 p(I))

where p(I) is the proportion of set R that belongs to class I.

An example: if set R is a collection of 14 objects, 9 of which belong to class A and 5 to class B, then

    Entropy(R) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

For a two-class problem, entropy ranges from 0 (perfectly classified) to 1 (totally random); with more classes the maximum is log2 of the number of classes.
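For concreteness, here is a minimal Python sketch of the entropy formula (the function name and counts-based signature are my own choice); it reproduces the 0.940 figure above:

```python
import math

def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution, from raw counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

print(round(entropy([9, 5]), 3))  # 0.94, matching the example above
```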
Information Gain Example

Suppose there are 14 objects in set R: 9 belong to the class Evil and 5 to the class Good. Each object has an attribute Size, which is either Big or Small. Of the 14 objects, 8 have Size = Big and 6 have Size = Small. Of the 8 Big objects, 6 are Evil and 2 are Good; of the 6 Small objects, 3 are Evil and 3 are Good.

First compute the entropy of each subset:

    Entropy(R_Big) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811
    Entropy(R_Small) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.00

Then the information gain from splitting R on attribute Size is:

    Gain(R, Size) = Entropy(R) − (8/14)·Entropy(R_Big) − (6/14)·Entropy(R_Small)
                  = 0.940 − (8/14)·0.811 − (6/14)·1.00
                  = 0.048
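A sketch of the same computation in code, reusing the `entropy` helper from the previous snippet; the counts are the Evil/Good split above:

```python
def information_gain(parent_counts, child_counts_list):
    """Gain = parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child)
                    for child in child_counts_list)
    return entropy(parent_counts) - remainder

# Parent: 9 Evil / 5 Good.  Children: Big = 6 Evil / 2 Good, Small = 3 Evil / 3 Good.
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048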
Which Attribute to Use as the Split Point for a Node?

At the node, calculate the information gain for each candidate attribute. Choose the attribute with the highest information gain and use it as the split point. In the preceding example the attribute Size has only two possible values; an attribute can have more, in which case the expected entropy is simply a weighted sum over all of its values.
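To make the selection rule concrete, a hedged sketch that works over instances stored as Python dicts (the row/attribute representation is my own choice, not anything prescribed by C4.5; this `entropy` variant takes labels rather than counts):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def best_attribute(rows, attrs, target):
    """The attribute with the highest information gain at this node."""
    return max(attrs, key=lambda a: gain(rows, a, target))
```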
A Decision Tree Example

The weather data example: 14 instances with attributes Outlook, Temperature, Humidity, and Windy, and the decision "play: yes/no".
Information Gained by Knowing the Result of a Decision

In the weather data example, there are 9 instances for which the decision to play is "yes" and 5 instances for which it is "no". The information gained by knowing the result of the decision is therefore

    −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits
Information Further Required If "Outlook" Is Placed at the Root

[Tree diagram: Outlook at the root, with branches sunny (2 yes, 3 no), overcast (4 yes, 0 no), and rainy (3 yes, 2 no).]

The information still required after this split is the size-weighted average of the branch entropies:

    (5/14)·0.971 + (4/14)·0 + (5/14)·0.971 = 0.693 bits
Information Gained by Placing Each of the 4 Attributes at the Root

Gain(outlook) = 0.940 bits − 0.693 bits = 0.247 bits
Gain(temperature) = 0.029 bits
Gain(humidity) = 0.152 bits
Gain(windy) = 0.048 bits
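These four numbers can be checked with the `gain` helper sketched earlier, run over the standard 14-instance weather dataset (reproduced here from the well-known Witten & Frank example; the variable names are my own):

```python
COLS = ["outlook", "temperature", "humidity", "windy", "play"]
weather = [dict(zip(COLS, r)) for r in [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]]

for attr in COLS[:-1]:
    print(attr, round(gain(weather, attr, "play"), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```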
The Strategy for Selecting an Attribute to Place at a Node

Select the attribute that gives the largest information gain. In this example, it is the attribute "Outlook".

[Tree diagram: Outlook at the root; the sunny branch holds 2 "yes" / 3 "no", overcast holds 4 "yes", rainy holds 3 "yes" / 2 "no".]
The Recursive Procedure for Constructing a Decision Tree

The operation discussed above is applied to each branch recursively to construct the decision tree. For example, for the branch "Outlook = sunny" (5 instances, entropy 0.971), we evaluate the information gained by each of the 3 remaining attributes:

Gain(Outlook=sunny; Temperature) = 0.971 − 0.400 = 0.571
Gain(Outlook=sunny; Humidity) = 0.971 − 0 = 0.971
Gain(Outlook=sunny; Windy) = 0.971 − 0.951 = 0.020
Similarly, we evaluate the information gained by each of the 3 remaining attributes for the branch "Outlook = rainy":

Gain(Outlook=rainy; Temperature) = 0.971 − 0.951 = 0.020
Gain(Outlook=rainy; Humidity) = 0.971 − 0.951 = 0.020
Gain(Outlook=rainy; Windy) = 0.971 − 0 = 0.971

So Humidity is chosen under the sunny branch and Windy under the rainy branch, while the overcast branch is already pure (all "yes") and becomes a leaf.
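Putting the pieces together, a minimal ID3-style recursion over the same weather data (C4.5 proper refines this with gain ratio, continuous attributes, and pruning); it reuses `best_attribute`, `Counter`, and the `weather` rows from the sketches above:

```python
def build_tree(rows, attrs, target):
    """Recursive construction: stop on a pure node or when no attributes
    remain; otherwise split on the highest-gain attribute."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:                      # pure subset: make a leaf
        return labels[0]
    if not attrs:                                  # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attrs, target)
    rest = [a for a in attrs if a != attr]
    return {attr: {
        value: build_tree([r for r in rows if r[attr] == value], rest, target)
        for value in sorted({row[attr] for row in rows}, key=str)
    }}

print(build_tree(weather, COLS[:-1], "play"))
# {'outlook': {'overcast': 'yes',
#              'rainy': {'windy': {False: 'yes', True: 'no'}},
#              'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}}}}
```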
Over-fitting and Pruning

If we recursively grow the decision tree on our training set until every leaf is pure, we have most likely over-fitted the data. To avoid over-fitting, we set aside part of the training data as a held-out set, test the decision tree against it, and prune (delete) the branches that give poor predictions.
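As a sketch of the "test the tree on held-out data" step, two helpers for the nested-dict trees produced by `build_tree` above (the `default` fallback for unseen attribute values is my own assumption); a pruning pass would collapse any subtree whose replacement by a leaf does not lower this held-out accuracy:

```python
def classify(tree, row, default="yes"):
    """Walk the nested-dict tree for one instance; `default` covers
    attribute values never seen during training (an assumption here)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))            # the attribute tested at this node
        tree = tree[attr].get(row[attr], default)
    return tree

def accuracy(tree, rows, target):
    """Fraction of held-out rows the tree classifies correctly."""
    return sum(classify(tree, r) == r[target] for r in rows) / len(rows)
```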
The Over-fitting Issue

Over-fitting arises when decision rules fit the training set accurately but are based on too few samples. As a result, these rules may not work well on more general cases.
Evaluation

Training accuracy: how many training instances can be correctly classified from the available data? It is high when the tree is deep/large, or when there is little conflict among the training instances. However, higher training accuracy does not imply good generalization.

Testing accuracy: given a number of new instances, how many of them can we correctly classify? Cross-validation estimates this by repeatedly holding out part of the data for testing.
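Cross-validation is easy to demonstrate with an off-the-shelf learner; note that scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, so this only illustrates the evaluation protocol, not the algorithm in these slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# 10-fold cross-validation: train on 9 folds, test on the held-out fold.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores.mean(), scores.std())
```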
[Figure: a partial decision tree with root node = income range.]

[Figure: a partial decision tree with root node = credit card insurance.]

[Figure: a three-node decision tree for the credit card database.]