Marketing Campaign Effectiveness
Classification and Decision Tree Classifier
CIS 435
Francisco E. Figueroa
I. Introduction
Classification is a data mining task that assigns objects to one of several predefined
categories or classes. Classification models encompass diverse applications, such as
classifying loan applicants as low, medium, or high credit risks, or detecting spam email
messages based on the message header, among other examples. The classification model sits in
the middle of the process: an input attribute set (x) goes through the classification model to
produce an output class label (y). The classification task begins with a data set in which the
class assignments are known. The class labels are discrete and do not imply any type of order.
If the class label is a continuous attribute, a regression model is used as the predictive
model instead. The simplest type of classification problem is binary, where only two class
values are possible; when there are more values, we have a multiclass problem. (Tan, 2006)
When building the classification model, after preparing the data, the training process is
key: it is how the classification algorithm finds the relationships between the values of the
predictors and the values of the target. Descriptive modeling supports the training process
because it serves as an explanatory tool to distinguish between objects of different classes.
Predictive modeling, in contrast, is used to predict the class label of unknown records. It is
important to point out that classification techniques are best suited for predicting or
describing data sets with binary or nominal categories. (SAS, 2016)
In general, a classification technique requires a learning algorithm to identify a model
that best fits the relationship between the attribute set and the class label of the input
data. The objective of the algorithm is to build models with good generalization capability.
To solve classification problems we use a training set to build the model, which is then
applied to the test set, consisting of records with unknown class labels. The performance of
the classification model is evaluated using a confusion matrix.
The classification model has many applications in customer segmentation, business
modeling, marketing, and credit analysis, among others.
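The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: the "model" is a deliberately trivial majority-class rule, and the data is invented, purely to show the train/test split and the confusion-matrix summary.

```python
# Minimal sketch of the classification workflow: learn a model from a labeled
# training set, apply it to a test set, and summarize the results in a
# confusion matrix. The majority-class "model" and the data are illustrative.

from collections import Counter

def train_majority(labels):
    """'Training': return the most frequent class label in the training set."""
    return Counter(labels).most_common(1)[0][0]

def confusion_matrix(actual, predicted, positive="yes"):
    """Return (TP, FP, FN, TN) counts for a binary classification problem."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return tp, fp, fn, tn

train_labels = ["no", "no", "no", "yes"]   # training set: class labels are known
model = train_majority(train_labels)        # the learned "model" is just "no"

test_labels = ["no", "yes", "no"]           # ground truth withheld from the model
predictions = [model] * len(test_labels)    # predict a label for every test record
print(confusion_matrix(test_labels, predictions))  # (0, 0, 1, 2)
```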
II. Overview of Decision Tree
The decision tree is a classifier and a powerful way to perform multiple variable
analysis. Decision trees are produced by algorithms that identify various ways of splitting a
data set into branch-like segments. Multiple variable analyses allow us to predict, explain,
describe, or classify an outcome (or target). An example of a multiple variable analysis is
the probability of a sale, or the likelihood of responding to a marketing campaign, as a
result of the combined effects of multiple input variables, factors, or dimensions. This
multiple variable analysis capability of decision trees enables us to go beyond simple
one-cause, one-effect relationships and to discover and describe things in the context of
multiple influences. (SAS, 2016)
A decision tree is created from a series of questions and their possible answers,
organized in a hierarchical structure consisting of nodes and directed edges. The tree has
three types of nodes: a) the root node, which has no incoming edges and zero or more outgoing
edges; b) internal nodes, each of which has exactly one incoming edge and two or more outgoing
edges; and c) leaf or terminal nodes, each of which has exactly one incoming edge and no
outgoing edges.
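The node structure above can be sketched directly. In this illustration, internal nodes hold an attribute test and a leaf holds a class label; classifying a record means following outgoing edges from the root until a leaf is reached. The attribute names ("home_owner", "marital_status") and the tree itself are invented for the example.

```python
# Sketch of the decision tree node types described above: the root and internal
# nodes carry an attribute test with outgoing edges; leaves carry a class label.

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute        # attribute tested at root/internal nodes
        self.children = children or {}    # outgoing edges, keyed by attribute value
        self.label = label                # class label, set only at leaf nodes

    def is_leaf(self):
        return not self.children          # a leaf has no outgoing edges

def classify(node, record):
    """Follow edges from the root down to a leaf and return its class label."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label

# The root tests "home_owner"; one child is a leaf, the other an internal node.
tree = Node("home_owner", {
    "yes": Node(label="no"),
    "no": Node("marital_status", {
        "single": Node(label="yes"),
        "married": Node(label="no"),
    }),
})
print(classify(tree, {"home_owner": "no", "marital_status": "single"}))  # yes
```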
Efficient algorithms have been developed to induce reasonably accurate decision
trees. These algorithms usually employ a greedy strategy that grows a decision tree by making
a series of locally optimum decisions about which attribute to use for partitioning the data.
Hunt's algorithm is the basis of many existing decision tree induction algorithms.
Two of the biggest questions are how to split the training records and when to stop
splitting. The decision tree induction algorithm must provide a method for expressing an
attribute test condition and its corresponding outcomes for different attribute types. There
are measures that can be used to determine the best way to split the records, defined in
terms of the class distribution of the records before and after the split. The measures
developed for selecting the best split are often based on the degree of impurity of the child
nodes. Examples of impurity measures include Gini(t) and Entropy(t). (Tan, 2006) Entropy
is a quantitative measure of disorder in a system. It is used to assess the homogeneity of the
dataset when dividing it into several classes: when a node belongs to only one class, the
entropy is zero; when the disorder of the dataset is high, or the classes are equally divided,
the entropy is maximal. This helps in making decisions at several stages of tree growth.
(Gulati, 2016) The information gain ratio reduces the bias of information gain. The Gini
index, used by CART, is an impurity measure of the dataset and an alternative to information
gain. Entropy and Gini are the primary measures of data impurity for classification; entropy
is best suited for categorical attributes, while Gini suits numeric and continuous attributes
better.
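The two impurity measures named above have standard definitions: for a node t with class proportions p_i, Entropy(t) = -Σ p_i log2(p_i) and Gini(t) = 1 - Σ p_i². A short sketch computing both from raw class counts:

```python
# Impurity measures for a node, computed from its class counts. A pure node
# (all records in one class) scores 0 on both; an equally divided binary node
# is maximal: entropy 1.0 and Gini 0.5.

from math import log2

def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs) if len(probs) > 1 else 0.0

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(entropy([5, 5]))   # 1.0 -- classes equally divided, maximal disorder
print(entropy([10, 0]))  # 0.0 -- node belongs to only one class
print(gini([5, 5]))      # 0.5
print(gini([10, 0]))     # 0.0
```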
III. Parameters Used for Model Accuracy
The evaluation metrics available for binary classification models are Accuracy,
Precision, Recall, and AUC. The evaluation module outputs a confusion matrix showing the
number of true positives, false negatives, false positives, and true negatives, as well as
ROC, Precision/Recall, and Lift curves. Accuracy is the proportion of correctly classified
instances, and it is usually the first metric used to evaluate a classifier. However, when the
data is unbalanced (most of the instances belong to one of the classes), or when you are more
interested in the performance on one particular class, accuracy does not really capture the
effectiveness of a classifier.
The precision of the model tells us what proportion of the records classified as positive
actually are positive: TP/(TP+FP). Recall tells us how many of the positive records the
classifier classified correctly: TP/(TP+FN). Interestingly, there is a trade-off between
precision and recall. Another way to assess model accuracy is to inspect the true positive
rate vs. the false positive rate in the Receiver Operating Characteristic (ROC) curve and the
corresponding Area Under the Curve (AUC) value. The closer this curve is to the upper left
corner, the better the classifier's performance (that is, maximizing the true positive rate
while minimizing the false positive rate). (Azure, 2016)
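These metric definitions, plus the unbalanced-data caveat above, can be demonstrated with a small sketch. The counts below (990 negatives, 10 positives) are invented to show how accuracy can look strong while recall on the positive class is zero.

```python
# Accuracy, precision, and recall computed from confusion-matrix counts, as
# defined in the text: precision = TP/(TP+FP), recall = TP/(TP+FN).

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Unbalanced example: 990 negatives, 10 positives, and a classifier that
# predicts every record as negative. Accuracy is high, yet it finds no
# positives at all -- exactly the failure mode accuracy hides.
print(accuracy(tp=0, tn=990, fp=0, fn=10))  # 0.99
print(recall(tp=0, fn=10))                  # 0.0
```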
IV. Weka Exercises
According to the exercise, we are trying to predict if a client will subscribe to a term deposit. In
this case when we apply the training set with all the attributes we obtained the following results:
Correctly Classified Instances: 4023 (88.9847%)
Incorrectly Classified Instances: 498 (11.0153%)

                Predicted No    Predicted Yes
Actual No       3838 (TN)       162 (FP)
Actual Yes      336 (FN)        185 (TP)

The Accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = 0.889. The decision tree has 104
leaves and the size of the tree is 146.
When eliminating the contact, day, month, and duration attributes we obtained the following:

Correctly Classified Instances: 4025 (89.029%)
Incorrectly Classified Instances: 496 (10.971%)

                Predicted No    Predicted Yes
Actual No       3961 (TN)       39 (FP)
Actual Yes      457 (FN)        64 (TP)

The Accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = 0.890. The decision tree has 30
leaves and the size of the tree is 42. In summary, eliminating the contact, day, month, and
duration attributes makes the training data slightly more effective in terms of accuracy, and
the decision tree is less complex.
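The two accuracy figures reported above can be reproduced directly from the confusion-matrix counts, as a quick sanity check of the Weka output:

```python
# Recompute the accuracies for both Weka runs from the confusion-matrix counts
# reported in the tables above (4,521 test instances in each run).

def accuracy(tp, tn, total):
    return (tp + tn) / total

full = accuracy(tp=185, tn=3838, total=4521)     # all attributes
reduced = accuracy(tp=64, tn=3961, total=4521)   # without contact/day/month/duration

print(round(full, 4))     # 0.8898 -- matches 88.9847%
print(round(reduced, 4))  # 0.8903 -- matches 89.029%
```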
V. Use Cases
The decision tree is one of the successful data mining techniques used in the diagnosis of
heart disease, yet its accuracy is not perfect. Most research applies the J4.8 decision tree,
which is based on gain ratio and binary discretization. (Shouman, 2011) Another application is
in marketing, when a marketing manager at a company needs to analyze whether a customer with a
given profile will buy a new item.
References
Gulati, P., Sharma, A., Gupta, M. (May 2016). Theoretical Study of Decision Tree Algorithms to
Identify Pivotal Factors for Performance Improvement: A Review. International Journal of
Computer Applications, Vol. 141, No. 14.
Magee, J. Decision Trees for Decision Making.
Microsoft Azure. How to evaluate model performance in Azure Machine Learning. Retrieved from
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/
SAS. Decision Trees - What are They. Retrieved from
http://support.sas.com/publishing/pubcat/chaps/57587.pdf
Shouman, M., Turner, T., Stocker, R. Using Decision Tree for Diagnosing Heart Disease
Patients. Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf