Marketing Campaign Effectiveness
Classification and Decision Tree Classifier
CIS 435
Francisco E. Figueroa
I. Introduction
Classification is a data mining task or function that assigns objects to one of several
predefined categories or classes. Classification models have diverse applications, such as
identifying loan applicants as having low, medium, or high credit scores or detecting
spam email messages based on the message header. The classification model sits in the
middle of the process: an input attribute set (x) goes through the classification model
to produce the output class label (y). The
classification task begins with a data set in which the class assignments are known. The
class labels are discrete and do not imply any type of order. If the class label is a continuous
attribute, a regression model is used as the predictive model instead. The simplest type of
classification problem is binary, where the target has only two possible values. When the
target has more than two values, the problem is multiclass. (Tan, 2006)
When building the classification model, after preparing the data, the training process is
key: it is where the classification algorithm finds the relationships between the values of the
predictors and the values of the target. Descriptive modeling supports the training process
because it serves as an explanatory tool to distinguish between objects of different classes.
Predictive modeling, in turn, is used to predict the class label of unknown records. It is
important to point out that classification techniques are best suited for predicting or
describing data sets with binary or nominal categories. (SAS, 2016)
In general, a classification technique requires a learning algorithm to identify a model
that best fits the relationship between the attribute set and the class label of the input data. The
objective of the algorithm is to build models with good generalization capability. To solve
classification problems, a model is built from a training set of records whose class labels are
known and is then applied to the test set, which consists of records with unknown class labels.
The evaluation of the performance of the classification model is based on the confusion matrix.
The classification model has many applications in customer segmentation, business
modeling, marketing, and credit analysis, among others.
II. Overview of Decision Tree
The decision tree is a classifier and a powerful way to perform multiple variable
analysis. Decision trees are produced by algorithms that identify various ways of splitting a data
set into branch-like segments. Multiple variable analyses allow us to predict, explain, describe,
or classify an outcome (or target). An example of a multiple variable analysis is the probability of
a sale or the likelihood of a response to a marketing campaign as a result of the combined effects
of multiple input variables, factors, or dimensions. This multiple variable analysis capability of
decision trees enables us to go beyond simple one-cause, one-effect relationships and to discover
and describe things in the context of multiple influences. (SAS, 2016)
A decision tree is created from a series of questions and their possible answers that
are organized in a hierarchical structure consisting of nodes and directed edges. The tree has
three types of nodes: a) root node - has no incoming edges and zero or more outgoing edges;
b) internal nodes - each of which has exactly one incoming edge and two or more outgoing
edges; and c) leaf or terminal nodes - each of which has exactly one incoming edge and no
outgoing edges.
Efficient algorithms have been developed to induce reasonably accurate decision
trees. The algorithms usually employ a greedy strategy that grows a decision tree by making a
series of locally optimal decisions about which attribute to use for partitioning the data.
Hunt's algorithm is the basis of many existing decision tree induction algorithms.
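As a rough illustration of the greedy strategy, the following Python sketch grows a tree
in the spirit of Hunt's algorithm. It is not the implementation of any particular tool: the
use of weighted Gini impurity as the split measure, multiway splits on categorical attributes,
the nested-dict tree representation, the function names, and the toy records (with the
hypothetical attributes `housing` and `loan`) are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_attribute(records, attributes):
    """Greedy, locally optimal choice: lowest weighted Gini after a multiway split."""
    def weighted_gini(attr):
        groups = {}
        for features, label in records:
            groups.setdefault(features[attr], []).append(label)
        return sum(len(g) / len(records) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)


def grow_tree(records, attributes):
    """Recursively partition the records; a leaf is just a class label."""
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes are left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(records, attributes)
    remaining = [a for a in attributes if a != attr]
    partitions = {}
    for features, label in records:
        partitions.setdefault(features[attr], []).append((features, label))
    return {
        "attribute": attr,
        "default": majority,  # used for attribute values never seen in training
        "children": {value: grow_tree(subset, remaining)
                     for value, subset in partitions.items()},
    }


def predict(tree, features):
    """Follow the edges from the root until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(features[tree["attribute"]], tree["default"])
    return tree


# Hypothetical toy records: (attribute values, class label).
toy = [
    ({"housing": "yes", "loan": "no"}, "no"),
    ({"housing": "yes", "loan": "no"}, "no"),
    ({"housing": "no", "loan": "no"}, "yes"),
    ({"housing": "no", "loan": "yes"}, "no"),
    ({"housing": "no", "loan": "no"}, "yes"),
]
tree = grow_tree(toy, ["housing", "loan"])
print(tree["attribute"])
print(predict(tree, {"housing": "no", "loan": "no"}))
```

On these toy records the root test is on `housing`, because that split leaves the purest
child nodes; the sketch has no pruning or other stopping rule, which practical algorithms add.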
One of the biggest questions is how to split the training records and when to stop
splitting. The tree induction algorithm must provide a method for expressing an attribute
test condition and its corresponding outcomes for different attribute types. There are measures
that can be used to determine the best way to split the records. These measures are defined in
terms of the class distribution of the records before and after splitting, and they are often
based on the degree of impurity of the child nodes. Examples of impurity measures include
Gini(t) and Entropy(t). (Tan, 2006) Entropy
is a quantitative measure of disorder in a system. It is used to measure how homogeneous the
data set is when dividing it into several classes. When all records in a node belong to
only one class, the entropy is zero; when the disorder of the data set is high or the classes are
equally divided, the entropy is maximal. This helps in making the split decision at each stage.
(Gulati, 2016) The gain ratio reduces the bias of information gain toward attributes with many
distinct values. The Gini index is used by CART and is an impurity measure of the data set; it is
an alternative to information gain. Entropy and Gini are the primary measures of data impurity
for classification. Entropy is considered better suited to categorical attributes and Gini to
numeric and continuous attributes.
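The impurity measures above can be sketched in a few lines of Python. This is a minimal
example using the standard textbook definitions (base-2 entropy, Gini index, information
gain, and gain ratio); the function names and the ten-record toy split are assumptions made
for illustration and do not come from the exercise data.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(t) = -sum p_i * log2(p_i) over the classes present in the node."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini(t) = 1 - sum p_i^2 over the classes present in the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by the split information of the partition."""
    n = len(parent)
    split_info = sum(-(len(ch) / n) * math.log2(len(ch) / n)
                     for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0


# Toy node: classes equally divided, so impurity is maximal for two classes.
parent = ["yes"] * 5 + ["no"] * 5
# A candidate split that separates the classes fairly well.
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(entropy(parent), gini(parent))             # maximal two-class impurity
print(entropy(["yes"] * 5), gini(["yes"] * 5))   # pure node: both are zero
print(round(information_gain(parent, children), 4))
print(round(gain_ratio(parent, children), 4))
```

For a two-class node the entropy ranges from 0 to 1 and the Gini index from 0 to 0.5, so the
two measures are not directly comparable in magnitude, only in how they rank candidate splits.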
III. Parameters Used for Model Accuracy
The evaluation metrics available for binary classification models are Accuracy,
Precision, Recall, and AUC. The evaluation module in Azure Machine Learning outputs a
confusion matrix showing the number of true
positives, false negatives, false positives, and true negatives, as well as ROC, Precision/Recall,
and Lift curves. Accuracy is the proportion of correctly classified instances
and is usually the first metric you look at when evaluating a classifier. When the data is
unbalanced (where most of the instances belong to one of the classes), or you are more
interested in the performance on one of the classes, accuracy doesn't really capture the
effectiveness of a classifier.
The precision of the model tells us what proportion of the records classified as positive
are truly positive: TP/(TP+FP). The recall tells us what proportion of the actual positive
records the classifier identified correctly: TP/(TP+FN). There is a trade-off between
precision and recall. Another valuable way to assess the model is to inspect
the true positive rate vs. the false positive rate in the Receiver Operating Characteristic
(ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is
to the upper left corner, the better the classifier's performance (that is, maximizing the true
positive rate while minimizing the false positive rate). (Azure, 2016)
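To make the ROC and AUC description concrete, here is a small Python sketch that builds the
ROC points by lowering the decision threshold over the predicted scores and then integrates
the curve with the trapezoidal rule. The function names, the 0/1 label encoding, and the six
toy scores are assumptions for illustration; this is not how Azure Machine Learning or Weka
computes the curve internally.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    Assumes labels are 1 (positive) or 0 (negative) and that both classes occur.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last record sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: higher scores mean "more likely positive".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points[-1])              # the curve always ends at (1.0, 1.0)
print(round(auc(points), 4))   # 8 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 1.0 corresponds to a curve that passes through the upper left corner, while 0.5
corresponds to the diagonal of a classifier that ranks records no better than chance.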
IV. Weka Exercises
In this exercise, we are trying to predict whether a client will subscribe to a term deposit.
When we evaluate on the training set with all the attributes, we obtain the following results:
Correctly Classified Instances      4023    88.9847 %
Incorrectly Classified Instances     498    11.0153 %
Confusion matrix (rows: actual class, columns: predicted class):
              Predicted No    Predicted Yes
Actual No     3838 (TN)       162 (FP)
Actual Yes    336 (FN)        185 (TP)
Accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = 0.8898. The decision tree has 104
leaves and the size of the tree is 146.
After eliminating the contact, day, month, and duration attributes, we obtained the following:
Correctly Classified Instances      4025    89.029 %
Incorrectly Classified Instances     496    10.971 %
Confusion matrix (rows: actual class, columns: predicted class):
              Predicted No    Predicted Yes
Actual No     3961 (TN)       39 (FP)
Actual Yes    457 (FN)        64 (TP)
Accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = 0.8903. The decision tree has 30 leaves
and the size of the tree is 42. In summary, the model trained without the contact, day,
month, and duration attributes is marginally more accurate (89.03% vs. 88.98%) and the decision
tree is much less complex (30 leaves vs. 104).
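Because the data is unbalanced (4,000 "No" vs. 521 "Yes" records), it may help to look beyond
accuracy. The following Python sketch applies the formulas from Section III to the two
confusion matrices reported above; the counts are taken from the Weka output, while the
function and variable names are chosen only for this example.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive ("Yes") class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts copied from the two confusion matrices in this section.
all_attributes = metrics(tp=185, fp=162, fn=336, tn=3838)
reduced = metrics(tp=64, fp=39, fn=457, tn=3961)

for name, result in [("all attributes", all_attributes), ("reduced", reduced)]:
    print(name, {key: round(value, 4) for key, value in result.items()})
```

Under these formulas the reduced model gains a little accuracy and precision (about 0.62 vs.
0.53) but its recall on the "Yes" class falls from roughly 0.36 to roughly 0.12, so it finds
far fewer of the clients who actually subscribe.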
V. Use Cases
Decision trees are one of the successful data mining techniques used in the diagnosis of heart
disease, yet their accuracy is not perfect. Most research applies the J4.8 decision tree, which is
based on gain ratio and binary discretization. (Shouman, 2011) Another application is in
marketing, when a marketing manager at a company needs to analyze whether a customer with a
given profile will buy a new item.
References
Gulati, P., Sharma, A., Gupta, M. Theoretical Study of Decision Tree Algorithms to Identify Pivotal
Factors for Performance Improvement: A Review. May 2016. International Journal of Computer
Applications. Vol. 141, No. 14.
Magee, J. Decision Trees for Decision Making.
Microsoft Azure. How to evaluate model performance in Azure Machine Learning. Retrieved
from
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/
SAS. Decision Trees - What are They. Retrieved from
http://support.sas.com/publishing/pubcat/chaps/57587.pdf
Shouman, M., Turner, T., Stocker, R. Using Decision Tree for Diagnosing Heart Disease Patients.
Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf

Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

Classification and decision tree classifier machine learning

  • 1. Marketing Campaign Effectiveness
Classification and Decision Tree Classifier
CIS 435
Francisco E. Figueroa

I. Introduction

Classification is a data mining task that assigns objects to one of several predefined categories or classes. Classification models have diverse applications, such as rating loan applicants as having low, medium, or high credit scores, or detecting spam email messages based on the message header. The classification model sits in the middle of the process: an input attribute set (x) goes through the model to produce the output class label (y). The classification task begins with a data set in which the class assignments are known. The classes are discrete and do not imply any type of order. If the class label is a continuous attribute, regression is used as the predictive model instead. The simplest type of classification problem is binary, where the target has two possible values. With more than two values, the problem is multiclass. (Tan, 2006)

When building the classification model, after the data is prepared, the training process is how the classification algorithm finds the relationships between the values of the predictors and the values of the target. Descriptive modeling supports this process by serving as an explanatory tool to distinguish between objects of different classes. Predictive modeling, in turn, is used to predict the class label of unknown records. Classification techniques are best suited for predicting or describing data sets with binary or nominal categories. (SAS,2016)

In general, a classification technique requires a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data. The objective of the algorithm is to build models with good generalization capability.
To solve a classification problem, a model is built from a training set, in which the class labels are known, and then applied to a test set, which consists of records with unknown class labels. The performance of the classification model is evaluated using the confusion matrix. Classification has many applications in customer segmentation, business modeling, marketing, and credit analysis, among others.

II. Overview of Decision Tree

The decision tree is a classifier and a powerful way to perform multiple variable analysis. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. Multiple variable analyses allow us to predict, explain, describe, or classify an outcome (or target). An example of a multiple variable analysis is the probability of a sale, or the likelihood of a response to a marketing campaign, as a result of the combined effects of multiple input variables, factors, or dimensions. This capability lets decision trees go beyond simple one-cause, one-effect relationships and discover and describe things in the context of multiple influences. (SAS,2016)
  • 2. A decision tree is created from a series of questions and their possible answers, organized in a hierarchical structure consisting of nodes and directed edges. The tree has three types of nodes:
a) root node - has no incoming edges and zero or more outgoing edges;
b) internal nodes - each of which has exactly one incoming edge and two or more outgoing edges; and
c) leaf or terminal nodes - each of which has exactly one incoming edge and no outgoing edges.

Efficient algorithms have been developed to induce reasonably accurate decision trees. These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimum decisions about which attribute to use for partitioning the data. Hunt’s algorithm is the basis of many existing decision tree induction algorithms. Two of the biggest questions are how to split the training records and when to stop splitting. The decision tree induction algorithm must provide a method for expressing an attribute test condition and its corresponding outcomes for different attribute types. There are measures that can be used to determine the best way to split the records. They are defined in terms of the class distribution of the records before and after splitting, and are often based on the degree of impurity of the child nodes. Examples of impurity measures include Gini(t) and Entropy(t). (Tan,2006)

Entropy is a quantitative measure of disorder in a system. It is used to measure the homogeneity of a data set when dividing it into several classes. When all records at a node belong to only one class, entropy is zero. When the disorder of the data set is high, or the classes are equally divided, entropy is maximal. This helps in making decisions at each stage of tree construction. (Gulati,2016) The information gain ratio reduces the bias of information gain toward attributes with many distinct values.
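The impurity measures named above, Entropy(t) and Gini(t), can be illustrated with a short Python sketch. It is not taken from the paper: the function names and the toy "yes"/"no" records are hypothetical, and the information gain shown is the plain version (parent impurity minus the size-weighted impurity of the children), not the gain ratio.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(t) = -sum(p_i * log2(p_i)), written here as sum(p_i * log2(1 / p_i))."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def gini(labels):
    """Gini(t) = 1 - sum(p_i ** 2) over the classes present at node t."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 4 "yes" and 4 "no" records (classes equally divided).
parent = ["yes"] * 4 + ["no"] * 4
# Hypothetical split that separates the two classes perfectly.
pure_split = [["yes"] * 4, ["no"] * 4]
# Hypothetical split that leaves each child as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(entropy(parent))                          # 1.0 (maximal for two classes)
print(gini(parent))                             # 0.5 (maximal for two classes)
print(information_gain(parent, pure_split))     # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A greedy induction algorithm would evaluate each candidate split this way and keep the one with the largest gain. Passing `impurity=gini` to `information_gain` gives the corresponding Gini-based reduction.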
The Gini index is used by CART and is an impurity measure of a data set. It is an alternative to information gain. Entropy and Gini are the primary measures of data impurity for classification. Entropy is considered best for categorical attributes and Gini for numeric and continuous attributes.

III. Parameters Used for Model Accuracy

The evaluation metrics available for binary classification models are Accuracy, Precision, Recall, and AUC. The module outputs a confusion matrix showing the number of true positives, false negatives, false positives, and true negatives, as well as ROC, Precision/Recall, and Lift curves. Accuracy is the proportion of correctly classified instances and is usually the first metric to look at when evaluating a classifier. When the data is unbalanced (most of the instances belong to one of the classes), or when performance on one particular class matters more, accuracy does not really capture the effectiveness of a classifier. Precision is the proportion of records classified as positive that are actually positive: TP/(TP+FP). Recall is the proportion of actual positives that the classifier classifies correctly: TP/(TP+FN). There is a trade-off between precision and recall. Another useful view of model performance is the true positive rate versus the false positive rate in the Receiver Operating Characteristic (ROC) curve, together with the corresponding Area Under the Curve (AUC) value. The closer this curve is
  • 3. to the upper left corner, the better the classifier’s performance (that is, maximizing the true positive rate while minimizing the false positive rate). (Azure,2016)

IV. Weka Exercises

In this exercise, we are trying to predict whether a client will subscribe to a term deposit. Applying the training set with all the attributes, we obtained the following results:

Correctly Classified Instances      4023    88.9847 %
Incorrectly Classified Instances     498    11.0153 %

              Predicted No    Predicted Yes
Actual No     3838 (TN)       162 (FP)
Actual Yes    336 (FN)        185 (TP)

Accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = .889. The decision tree has 104 leaves and the size of the tree is 146.

After eliminating contact, day, month, and duration, we obtained the following:

Correctly Classified Instances      4025    89.029 %
Incorrectly Classified Instances     496    10.971 %

              Predicted No    Predicted Yes
Actual No     3961 (TN)       39 (FP)
Actual Yes    457 (FN)        64 (TP)

Accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = .890. The decision tree has 30 leaves and the size of the tree is 42.

In summary, eliminating contact, day, month, and duration yields slightly higher accuracy (89.03% versus 88.98%) and a much less complex decision tree (30 leaves versus 104).

V. Use Cases

The decision tree is one of the successful data mining techniques used in the diagnosis of heart disease, yet its accuracy is not perfect. Most research applies the J4.8 Decision Tree, which is based on Gain Ratio and binary discretization. (Shouman,2011) Another application is in marketing, where a marketing manager at a company needs to predict whether a customer with a given profile will buy a new item.
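As an illustration of the Section III formulas, the following Python sketch recomputes accuracy, precision, and recall from the two Weka confusion matrices in Section IV. The function name `binary_metrics` and the variable names are hypothetical; only the four counts per matrix come from the exercise. The sketch assumes at least one predicted positive and one actual positive, so no division by zero is handled.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (no zero-division handling).
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # (TP + TN) / (P + N)
        "precision": tp / (tp + fp),     # TP / (TP + FP)
        "recall": tp / (tp + fn),        # TP / (TP + FN)
    }


# Counts copied from the two confusion matrices in Section IV.
all_attributes = binary_metrics(tp=185, fp=162, fn=336, tn=3838)
reduced_attributes = binary_metrics(tp=64, fp=39, fn=457, tn=3961)

for name, m in [("all attributes", all_attributes),
                ("without contact, day, month, duration", reduced_attributes)]:
    print(f"{name}: accuracy={m['accuracy']:.3f} "
          f"precision={m['precision']:.3f} recall={m['recall']:.3f}")
```

Under these counts, accuracy is nearly identical for the two models, while precision rises (about 0.53 to about 0.62) and recall falls (about 0.36 to about 0.12) once the four attributes are removed. This is the kind of difference that accuracy alone does not show on unbalanced data, as noted in Section III.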
  • 4. References

Gulati, P., Sharma, A., Gupta, M. Theoretical Study of Decision Tree Algorithms to Identify Pivotal Factors for Performance Improvement: A Review. International Journal of Computer Applications, Vol. 141, No. 14, May 2016.

Magee, J. Decision Trees for Decision Making.

Microsoft Azure. How to evaluate model performance in Azure Machine Learning. Retrieved from https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/

SAS. Decision Trees - What are They. Retrieved from http://support.sas.com/publishing/pubcat/chaps/57587.pdf

Shouman, M., Turner, T., Stocker, R. Using Decision Tree for Diagnosing Heart Disease Patients. Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf