C4.5 algorithm and Multivariate Decision Trees

                                                   Thales Sehn Korting

                 Image Processing Division, National Institute for Space Research – INPE
                                     São José dos Campos – SP, Brazil

                                                tkorting@dpi.inpe.br


Abstract

The aim of this article is to give a brief description of the C4.5 algorithm, which is used to create Univariate Decision Trees. We also discuss Multivariate Decision Trees and the process by which they classify instances using more than one attribute per tree node. We discuss how both kinds of trees work and how to implement the algorithms that build them, including examples of Univariate and Multivariate results.



1. Introduction

In the Pattern Recognition process, the goal is to learn (or to "teach" a machine) how to classify objects through the analysis of a set of instances whose classes¹ are known [5].

Since we know the classes of such an instance set (or training set), we can use several algorithms to discover how the attribute vectors of the instances behave, and then estimate the classes of new instances. One way to do this is through Decision Trees (DTs).

Figure 1. Simple example of a classification process.

A tree is either a leaf node labeled with a class, or a structure containing a test, linked to two or more nodes (or subtrees) [5]. So, to classify an instance, we first take its attribute vector and apply it to the tree. The tests are performed on these attributes, reaching one leaf or another and completing the classification process, as in Figure 1.
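To make the traversal concrete, here is a minimal Python sketch. It is our own illustration, not code from the paper: the nested-tuple tree encoding, the attribute indexing, and the toy values are all assumptions.

```python
def classify(tree, attributes):
    """Walk a DT: each internal node tests one attribute, leaves hold a class."""
    while isinstance(tree, tuple):          # internal node: (attr_index, children)
        attr, children = tree
        tree = children[attributes[attr]]   # follow the branch for this value
    return tree                             # reached a leaf: its class label

# Hand-built toy tree: first test attribute 0, then (on one branch) attribute 1.
toy_tree = (0, {"sunny": "play",
                "rainy": (1, {"windy": "stay", "calm": "play"})})
print(classify(toy_tree, ["rainy", "calm"]))  # -> play
```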
If we have n attributes for our instances, we have an n-dimensional space for the classes, and the DT creates hyperplanes (or partitions) to divide this space among the classes. A 2D space is shown in Figure 2, where the lines represent the hyperplanes in this dimension.

Figure 2. Partitions created in a DT.

DTs can deal with one attribute per test node or with more than one. The former approach is called the Univariate DT, and the latter the Multivariate method. This article explains the construction of Univariate DTs and the C4.5 algorithm used to build such trees (Section 2). After this, we discuss the Multivariate approach and how to construct such trees (Section 3). At the end of each part (Uni- and Multivariate), we show some results for different test cases.

¹ Mutually exclusive labels, such as "buildings", "deforestment", etc.
2. C4.5 Algorithm

This section explains one of the algorithms used to create Univariate DTs. This one, called C4.5, is based on the ID3² algorithm, which tries to find small (or simple) DTs. We start by presenting some premises on which this algorithm is based, and afterwards we discuss the inference of the weights and tests in the nodes of the trees.

² ID3 stands for Iterative Dichotomiser 3.
2.1. Construction

Some premises guide this algorithm, such as the following [4]:

• if all cases are of the same class, the tree is a leaf, and the leaf is returned labelled with this class;
• for each attribute, calculate the potential information provided by a test on the attribute (based on the probabilities of each case having a particular value for the attribute). Also calculate the gain in information that would result from a test on the attribute (based on the probabilities of each case with a particular value for the attribute being of a particular class);
• depending on the current selection criterion, find the best attribute to branch on (a minimal sketch of the resulting recursive procedure is given after this list).
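The following Python sketch shows how these premises turn into a recursive construction. It is our own illustration rather than the paper's code: the dataset layout and the helper names are assumptions, and the selection criterion is passed in as a function, so the gain measure of Section 2.2 can be plugged in.

```python
from collections import Counter

def build_tree(rows, labels, attrs, gain):
    """Recursive univariate DT construction following the premises above.

    rows   -- list of attribute tuples, one per training instance
    labels -- class label of each row
    attrs  -- indices of the attributes still available for testing
    gain   -- callable(rows, labels, attr) -> float, the selection criterion
    """
    if len(set(labels)) == 1:                # premise 1: one class -> leaf
        return labels[0]
    if not attrs:                            # nothing left to test -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # premises 2 and 3: score every attribute, branch on the best one
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    children = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     [a for a in attrs if a != best],
                                     gain)
    return (best, children)
```

Internal nodes are returned as (attribute, children) tuples, matching the traversal sketch in the Introduction.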
2.2. Counting gain

This process uses the "Entropy", i.e. a measure of the disorder of the data. The Entropy of y is calculated by

$$\mathrm{Entropy}(y) = -\sum_{j=1}^{n} \frac{|y_j|}{|y|} \log \frac{|y_j|}{|y|},$$

iterating over all possible values of y. The conditional Entropy is

$$\mathrm{Entropy}(j\,|\,y) = \frac{|y_j|}{|y|} \log \frac{|y_j|}{|y|},$$

and finally, we define the Gain by

$$\mathrm{Gain}(y, j) = \mathrm{Entropy}(y) - \mathrm{Entropy}(j\,|\,y).$$

The aim is to maximize the Gain, i.e. to reduce as much as possible the overall entropy by splitting the attribute y on the value j.
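As a concrete check of these definitions, here is a small worked example in Python. It is our own code: base-2 logarithms are an assumption, and we use the usual weighted form of the conditional entropy when aggregating over a split.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(y) = -sum_j (|y_j|/|y|) log2 (|y_j|/|y|)."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

def gain(rows, labels, attr):
    """Gain of splitting on `attr`, using the weighted conditional entropy."""
    total = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    conditional = sum(len(p) / total * entropy(p) for p in parts.values())
    return entropy(labels) - conditional

# Attribute 0 separates the two classes perfectly, so its gain equals
# the full entropy of the labels: 1 bit for this balanced example.
rows = [("sunny",), ("sunny",), ("rainy",), ("rainy",)]
labels = ["good", "good", "bad", "bad"]
print(entropy(labels))        # 1.0
print(gain(rows, labels, 0))  # 1.0
```

This `gain` function can be passed directly to the `build_tree` sketch of Section 2.1.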
2.3. Pruning

This step is important to the result because of outliers. Every data set contains a small subset of instances that are not well defined and differ from their neighbors.

After the complete creation of the tree, which must classify all the instances in the training set, the tree is pruned. This reduces the classification errors caused by specialization in the training set, and makes the tree more general.

2.4. Results

To show concrete examples of the application of the C4.5 algorithm, we used the WEKA system [6]. One training set considers some aspects of working people, such as vacation time, working hours, and health plan. The resulting classes describe the working conditions, i.e. good or bad. Figure 3 shows the resulting DT, using the C4.5 implementation from WEKA.

Another example deals with levels of contact lenses, according to some characteristics of the patients. Results are shown in Figure 4.
Figure 3. Simple Univariate DT, created by the C4.5 algorithm. In blue are the tests; in green and red, the resulting classes.

Figure 4. Another Univariate DT, created by the C4.5 algorithm. In blue are the tests, and in red the resulting classes.


3. Multivariate DTs

Multivariate DTs, built by inductive learning, are able to generalize well when dealing with attribute correlation. Also, the results are easy for humans to interpret, i.e. we can understand the influence of each attribute on the whole process [2].

One problem when using simple (or Univariate) DTs is that, along a path, they may test some attributes more than once. Sometimes this hurts the performance of the system: with a simple transformation of the data, such as principal components, we can reduce the correlation between the attributes and achieve the same classification with a single test. The aim of Multivariate DTs is thus to perform different tests with the data, as illustrated in Figure 5.

The purpose of the Multivariate approach is to use more than one attribute in each test node. In the example of Figure 5, we can replace the whole set of tests by the single test x + y ≥ 8. But how can we develop an algorithm able to "discover" such planes? This is the content of the following sections.

We can think of this approach as a linear combination of the attributes at each internal node. For example, consider an instance with attributes y = y1, y2, ..., yn belonging to class Cj. The test at each node of the tree has the form

$$\sum_{i=1}^{n+1} w_i y_i > 0,$$

where w1, w2, ..., wn+1 are real-valued coefficients [3].
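As a small illustration (our own code, not from the paper), the Figure 5 test x + y ≥ 8 fits this form if we append a constant feature 1 to the instance and choose the weights (1, 1, −8); the strict inequality differs from ≥ only on the boundary.

```python
def multivariate_test(weights, attributes):
    """Evaluate sum_{i=1}^{n+1} w_i * y_i > 0 on an instance padded with a 1."""
    augmented = list(attributes) + [1.0]    # constant feature for the threshold
    return sum(w * y for w, y in zip(weights, augmented)) > 0

# The Figure 5 test "x + y >= 8" becomes 1*x + 1*y - 8*1 > 0.
print(multivariate_test([1.0, 1.0, -8.0], [5.0, 4.0]))  # True  (9 > 8)
print(multivariate_test([1.0, 1.0, -8.0], [3.0, 2.0]))  # False (5 < 8)
```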
Let us also consider that the attributes y1, y2, ..., yn can be real-valued; some approaches also deal with symbolic attributes, most of the time by mapping them onto a numeric scale.

Multivariate and Univariate DTs share some properties when modelling the tree, especially at the stage of pruning statistically invalid branches.

3.1. Tree Construction

The first step in this phase is to have a set of training instances. Each of them has a attributes and an associated class. This is the default procedure for all classification methods.

Through a top-down decision tree algorithm and a merit selection criterion, the process chooses the best test to split the data, creating a branch. After the first split we have two partitions, on which the algorithm performs the same top-down analysis to create more partitions, according to the criteria.

One of the stopping criteria is a partition containing just a single class; such a node becomes a leaf with its associated class.

But we want to know how the process splits the data, and here lies the difference between Multi- and Univariate DTs. Considering a multiclass instance set, we can represent the multivariate tests with a Linear Machine (LM) [2].
Figure 5. Problem with the Univariate approach [2]: it performs several tests, while the blue line (Multivariate) is much more efficient.


LM: Let y be an instance description consisting of 1 and the n features that describe the instance. Then each discriminant function gi(y) has the form

$$g_i(y) = w_i^T y,$$

where wi is a vector of n + 1 coefficients. The LM infers that instance y belongs to class i iff

$$(\forall j,\; j \neq i)\quad g_i(y) > g_j(y).$$

Some methods for training an LM have been proposed. We can start the weight vectors with a default value for all wi, i = 1, ..., N. Here we show the absolute error correction rule and the thermal perceptron.
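A minimal sketch of this decision rule, with our own variable names (each row of W holds one discriminant, including the coefficient for the constant feature 1; ties are broken toward the lowest class index):

```python
import numpy as np

def lm_classify(W, x):
    """Linear Machine decision rule: pick argmax_i g_i(y), g_i(y) = w_i^T y.

    W -- (num_classes, n + 1) matrix, one discriminant weight vector per class
    x -- (n,) feature vector; the constant 1 is prepended as defined above
    """
    y = np.concatenate(([1.0], x))
    return int(np.argmax(W @ y))
```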
3.1.1. Absolute Error Correction rule: One approach for updating the weights of the discriminant functions is the absolute error correction rule, which adjusts wi, where i is the class to which the instance belongs, and wj, where j is the class to which the LM incorrectly assigns the instance. The correction is accomplished by

$$w_i \leftarrow w_i + cy \qquad\text{and}\qquad w_j \leftarrow w_j - cy,$$

where

$$c = \frac{(w_j - w_i)^T y}{2\, y^T y},$$

rounded up to the smallest integer such that the updated LM will classify the instance correctly.
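A sketch of one correction step under these definitions (our own function names; W is assumed to be a float array, and c is taken as the smallest integer strictly greater than the ratio, so the instance ends up on the correct side of the boundary):

```python
import math
import numpy as np

def absolute_error_correction(W, x, true_class):
    """One update of the absolute error correction rule (in place).

    W          -- (num_classes, n + 1) float weight matrix of the LM
    x          -- (n,) feature vector of a misclassified instance
    true_class -- index i of the class the instance really belongs to
    """
    y = np.concatenate(([1.0], x))
    j = int(np.argmax(W @ y))        # class the LM wrongly assigns
    if j == true_class:
        return                        # already correct: nothing to adjust
    ratio = (W[j] - W[true_class]) @ y / (2.0 * (y @ y))
    c = math.floor(ratio) + 1         # smallest integer that flips the decision
    W[true_class] += c * y
    W[j] -= c * y
```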

3.1.2. Thermal Perceptron: For instances that are not linearly separable, one method is the "thermal perceptron" [1], which also adjusts wi and wj, and deals with the constants

$$c = \frac{B}{B + k} \qquad\text{and}\qquad k = \frac{(w_j - w_i)^T y}{2\, y^T y}.$$

The process follows this algorithm:

1. B = 2;
2. If the LM is correct for all instances, or B < 0.001, RETURN;
3. Otherwise, for each misclassified instance:
   3.1. Compute the correction c and update w[i] and w[j];
   3.2. Adjust B <- aB - b, with a = 0.99 and b = 0.0005;
4. Go back to step 2.

The basic idea of this algorithm is to correct the weight vectors until all instances are classified correctly or, in the worst case, until a certain number of iterations is reached. The iteration bound is represented by the update of B, which decreases according to B = aB − b: the factor a = 0.99 gives a geometric decay, and b = 0.0005 adds a small linear decrease.
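A runnable sketch of this loop under the constants above (our own names; the zero initialization and the exact placement of the stopping checks are simplifying assumptions relative to [1]):

```python
import numpy as np

def thermal_perceptron(X, targets, num_classes, a=0.99, b=0.0005, B_min=0.001):
    """Train a Linear Machine with the thermal perceptron rule sketched above.

    X           -- (m, n) array of training instances
    targets     -- (m,) true class index of each instance
    num_classes -- number of classes the LM discriminates
    """
    Y = np.hstack([np.ones((len(X), 1)), X])   # prepend the constant feature 1
    W = np.zeros((num_classes, Y.shape[1]))    # default initial weights
    B = 2.0                                    # step 1
    while B >= B_min:                          # step 2: annealing bound
        mistakes = 0
        for y, i in zip(Y, targets):
            j = int(np.argmax(W @ y))
            if j == i:
                continue                       # correctly classified instance
            mistakes += 1
            k = (W[j] - W[i]) @ y / (2.0 * (y @ y))   # step 3.1: compute c
            c = B / (B + k)
            W[i] += c * y
            W[j] -= c * y
            B = a * B - b                      # step 3.2: cool the "temperature"
            if B < B_min:
                break
        if mistakes == 0:                      # step 2: LM correct everywhere
            break
    return W
```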
                                                              ate tests, instead of prunnnig the whole node. [2] says
3.2. Pruning

When pruning Multivariate DTs, one must consider that this can result in more classification errors than it gains in generalization. Generally, just some features (or attributes) are removed from the multivariate tests, instead of pruning the whole node. [2] states that a multivariate test with n − 1 features is more general than one based on n features.
3.3. Results

Figure 6 shows a good example: the classification is done with simple tests, even on a complicated data set.

Figure 6. Multivariate DT, created by the OC1 algorithm (Oblique Classifier 1) [3].

4. Conclusion

In this article we discussed Decision Trees, in both the Univariate and the Multivariate approaches. The C4.5 algorithm implements one way to build Univariate DTs, and some results were shown. For the Multivariate approach, we first discussed the advantages of using it, and then showed how to build such trees with the Linear Machine approach, using the Absolute Error Correction and the Thermal Perceptron rules.

DTs are a powerful tool for classification, especially when the results need to be interpreted by humans. Multivariate DTs deal well with attribute correlation, presenting advantages in the tests compared with the Univariate approach.

References

[1] C. Brodley and P. Utgoff. Multivariate Versus Univariate Decision Trees. 1992.
[2] C. Brodley and P. Utgoff. Multivariate decision trees. Machine Learning, 19(1):45–77, 1995.
[3] S. Murthy, S. Kasif, and S. Salzberg. A System for Induction of Oblique Decision Trees. arXiv preprint cs.AI/9408103, 1994.
[4] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
[5] J. Quinlan. Learning decision tree classifiers. ACM Computing Surveys (CSUR), 28(1):71–72, 1996.
[6] WEKA (Data Mining Software). Available at http://www.cs.waikato.ac.nz/ml/weka/. 2006.





