A Principled Evaluation of Ensembles of Learning
            Machines for Software Effort Estimation

                                   Leandro Minku, Xin Yao
                              {L.L.Minku,X.Yao}@cs.bham.ac.uk

               CERCIA, School of Computer Science, The University of Birmingham




Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk   Ensembles for Software Effort Estimation   1 / 22
Outline




            Introduction (Background and Motivation)
            Research Questions (Aims)
            Experiments (Method and Results)
            Answers to Research Questions (Conclusions)
            Future Work




Introduction

     Software cost estimation:
            Set of techniques and procedures that an organisation uses to
            arrive at an estimate.
            The major contributing factor is effort (in person-hours,
            person-months, etc.).
            Overestimation vs. underestimation.

     Several software cost/effort estimation models have been proposed.

     ML models have been receiving increased attention:
            They make no or minimal assumptions about the data and the
            function being modelled.


Introduction
     Ensembles of Learning Machines are groups of learning machines
     trained to perform the same task and combined with the aim of
     improving predictive performance.

     Studies comparing ensembles against single learners in software
     effort estimation are contradictory:
            Braga et al. (IJCNN’07) claim that Bagging slightly improves
            the effort estimates produced by single learners.
            Kultur et al. (KBS’09) claim that an adapted Bagging provides
            large improvements.
            Kocaguneli et al. (ISSRE’09) claim that combining different
            learners does not improve effort estimates.

     These studies either omit statistical tests or do not report
     their parameter choices. None of them analyses the reasons for
     the results obtained.
Research Questions

     Question 1
     Do readily available ensemble methods generally improve the
     effort estimates given by single learners? Which of them would be
     most useful?

            The current studies are contradictory.
            They either do not perform statistical comparisons or do not
            explain their parameter choices.
            It is worth investigating different ensemble approaches.
            We build upon current work by considering these points.

     Question 2
     If a particular method is singled out, what insight into how to
     improve effort estimates can we gain by analysing its behaviour
     and the reasons for its better performance?

            Principled experiments, not just intuition or speculation.

     Question 3
     How can one determine which model to use for a particular data
     set?

            Our study complements previous work; parameter choice is
            important.
Data Sets and Preprocessing



            Data sets: cocomo81, nasa93, nasa, cocomo2, desharnais, and
            7 ISBSG organisation-type subsets.
                    These cover a wide range of features.
                    In particular, the ISBSG subsets’ productivity rates are
                    statistically different from each other.
            Attributes: COCOMO attributes for the PROMISE data;
            functional size, development type and language type for
            ISBSG.
            Missing values: deletion for PROMISE, k-NN imputation for
            ISBSG.
            Outliers: k-means detection/elimination.
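The k-NN imputation step used for ISBSG could be sketched as follows. This is a minimal illustration assuming purely numeric attributes, Euclidean distance over the attributes present in both rows, and mean aggregation over the neighbours; the slides do not specify k or the distance function:

```python
import math

def knn_impute(rows, k=3):
    """Fill missing values (None) with the mean of the k nearest
    complete rows, comparing only attributes present in both rows.
    Hypothetical helper; the study's exact settings may differ."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        def dist(c):
            return math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(r, c) if a is not None))
        neighbours = sorted(complete, key=dist)[:k]
        filled.append([v if v is not None
                       else sum(n[i] for n in neighbours) / len(neighbours)
                       for i, v in enumerate(r)])
    return filled

rows = [[1.0, 2.0], [1.1, 2.2], [5.0, 9.0], [1.05, None]]
print(knn_impute(rows, k=2))
```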
Experimental Framework – Step 1: choice of learning
machines


            Single learners:
                    MultiLayer Perceptrons (MLPs) – universal approximators;
                    Radial Basis Function networks (RBFs) – local learning; and
                    Regression Trees (RTs) – simple and comprehensible.


            Ensemble learners:
                    Bagging with MLPs, with RBFs and with RTs – widely and
                    successfully used;
                    Random with MLPs – use full training set for each learner; and
                    Negative Correlation Learning (NCL) with MLPs – regression.




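Bagging's core idea (train each ensemble member on a bootstrap sample drawn with replacement, then average the members' predictions) can be sketched as below. A 1-nearest-neighbour regressor stands in for the MLP/RBF/RT base learners of the study, purely for brevity:

```python
import random

def one_nn(train):
    """Stand-in base learner: 1-nearest-neighbour regression on
    (size, effort) pairs. The study uses MLPs, RBFs and RTs instead."""
    def predict(x):
        nearest = min(train, key=lambda p: abs(p[0] - x))
        return nearest[1]
    return predict

def bagging(train, n_learners=10, seed=0):
    """Train each learner on a bootstrap sample of the training set
    and average the learners' predictions."""
    rng = random.Random(seed)
    learners = [one_nn([rng.choice(train) for _ in train])
                for _ in range(n_learners)]
    def predict(x):
        return sum(m(x) for m in learners) / n_learners
    return predict

train = [(size, 2.0 * size) for size in range(1, 11)]  # toy size -> effort data
model = bagging(train)
print(model(4.5))
```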
Experimental Framework – Step 2: choice of evaluation
method
     Executions were done in 30 rounds, with 10 projects held out for
     testing and the remainder used for training, as suggested by
     Menzies et al. TSE’06.

     Evaluation was done in two steps:
       1   Menzies et al. TSE’06’s survival/rejection rules:

                    If the MMREs are significantly different according to a
                    paired t-test at 95% confidence, the best model is the
                    one with the lowest average MMRE.
                    If not, the best method is the one with the best:
                       1   Correlation
                       2   Standard deviation
                       3   PRED(N)
                       4   Number of attributes
       2   Wilcoxon tests at 95% confidence to compare the two methods
           most often among the best in terms of MMRE and PRED(25).
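The 30-round holdout protocol could be sketched as follows; the `mean_model` baseline is a hypothetical stand-in for any training routine, not a method from the study:

```python
import random

def holdout_rounds(projects, fit, n_rounds=30, test_size=10, seed=1):
    """Repeatedly hold out `test_size` projects for testing, train on
    the rest, and record each round's MMRE (mean |pred - actual| / actual).
    Sketch of the protocol suggested by Menzies et al. TSE'06."""
    rng = random.Random(seed)
    mmres = []
    for _ in range(n_rounds):
        shuffled = projects[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:test_size], shuffled[test_size:]
        model = fit(train)
        mres = [abs(model(x) - y) / y for x, y in test]
        mmres.append(sum(mres) / len(mres))
    return mmres

def mean_model(train):
    """Toy baseline: always predict the mean training effort."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

projects = [(i, 100 + 10 * i) for i in range(40)]   # (feature, effort)
scores = holdout_rounds(projects, mean_model)
print(len(scores), round(sum(scores) / len(scores), 3))
```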
Experimental Framework – Step 2: choice of evaluation
method

     Mean Magnitude of the Relative Error:
     $MMRE = \frac{1}{T}\sum_{i=1}^{T} MRE_i$, where
     $MRE_i = \frac{|predicted_i - actual_i|}{actual_i}$

     Percentage of estimates within N% of the actual values:
     $PRED(N) = \frac{1}{T}\sum_{i=1}^{T}
     \begin{cases} 1, & \text{if } MRE_i \le \frac{N}{100} \\
     0, & \text{otherwise} \end{cases}$

     Correlation between estimated and actual effort:
     $CORR = \frac{S_{pa}}{\sqrt{S_p S_a}}$, where
     $S_{pa} = \frac{\sum_{i=1}^{T}(predicted_i - \bar{p})(actual_i - \bar{a})}{T-1}$,
     $S_p = \frac{\sum_{i=1}^{T}(predicted_i - \bar{p})^2}{T-1}$,
     $S_a = \frac{\sum_{i=1}^{T}(actual_i - \bar{a})^2}{T-1}$,
     $\bar{p} = \frac{\sum_{i=1}^{T} predicted_i}{T}$,
     $\bar{a} = \frac{\sum_{i=1}^{T} actual_i}{T}$.
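These metrics translate directly into code; a small sketch consistent with the definitions above, on made-up prediction/actual pairs:

```python
import math

def mmre(pred, actual):
    """Mean Magnitude of the Relative Error."""
    mres = [abs(p - a) / a for p, a in zip(pred, actual)]
    return sum(mres) / len(mres)

def pred_n(pred, actual, n=25):
    """Fraction of estimates within n% of the actual values."""
    hits = sum(1 for p, a in zip(pred, actual)
               if abs(p - a) / a <= n / 100)
    return hits / len(actual)

def corr(pred, actual):
    """Pearson correlation between estimated and actual effort."""
    T = len(pred)
    pbar, abar = sum(pred) / T, sum(actual) / T
    spa = sum((p - pbar) * (a - abar)
              for p, a in zip(pred, actual)) / (T - 1)
    sp = sum((p - pbar) ** 2 for p in pred) / (T - 1)
    sa = sum((a - abar) ** 2 for a in actual) / (T - 1)
    return spa / math.sqrt(sp * sa)

pred   = [100, 240, 310, 95]    # toy estimates
actual = [110, 200, 300, 120]   # toy actual efforts
print(round(mmre(pred, actual), 3),
      pred_n(pred, actual),
      round(corr(pred, actual), 3))
```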
Experimental Framework – Step 3: choice of parameters




            Preliminary experiments using 5 runs.
            Each approach was run with all combinations of 3 or 5
            candidate values per parameter.
            The parameter settings with the lowest MMRE were chosen for
            the further 30 runs.
            Base learners will not necessarily have the same parameters
            as the corresponding single learners.
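Step 3 amounts to a small grid search. A sketch, where the `evaluate` stub and the parameter grid are hypothetical (in the study it would train the approach and return one preliminary run's MMRE):

```python
import itertools

def grid_search(param_grid, evaluate, n_prelim_runs=5):
    """Try every combination of candidate parameter values over a few
    preliminary runs and keep the combination with the lowest mean MMRE.
    `evaluate(params, run)` is assumed to return one run's MMRE."""
    best, best_score = None, float("inf")
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = sum(evaluate(params, r)
                    for r in range(n_prelim_runs)) / n_prelim_runs
        if score < best_score:
            best, best_score = params, score
    return best

# hypothetical MLP grid; the stub scores smaller/slower-learning nets better
grid = {"hidden_nodes": [3, 5, 9], "learning_rate": [0.1, 0.2, 0.3]}
best = grid_search(grid, lambda p, r: p["hidden_nodes"] * p["learning_rate"])
print(best)
```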
Comparison of Learning Machines – Menzies et al.
TSE’06’s survival rejection rules



     Table: Number of Data Sets in which Each Method Survived. Methods
     that never survived are omitted.
                               PROMISE Data      ISBSG Data       All Data
                               RT:         2     MLP:         2   RT:           3
                               Bag + MLP: 1      Bag + RTs:   2   Bag + MLP:    2
                               NCL + MLP: 1      Bag + MLP:   1   NCL + MLP:    2
                               Rand + MLP: 1     RT:          1   Bag + RTs:    2
                                                 Bag + RBF:   1   MLP:          2
                                                 NCL + MLP:   1   Rand + MLP:   1
                                                                  Bag + RBF:    1



            No approach is consistently the best, even considering
            ensembles!



Comparison of Learning Machines

     What methods are usually among the best?

     Table: Number of Data Sets in which Each Method Was Ranked First
     or Second According to MMRE and PRED(25). Methods never among the
     first and second are omitted.

              (a) According to MMRE
     PROMISE Data        ISBSG Data           All Data
     RT:           4     RT:           5      RT:           9
     Bag + MLP:    3     Bag + MLP:    5      Bag + MLP:    8
     Bag + RT:     2     Bag + RBF:    3      Bag + RBF:    3
     MLP:          1     MLP:          1      MLP:          2
                         Rand + MLP:   1      Bag + RT:     2
                         NCL + MLP:    1      Rand + MLP:   1
                                              NCL + MLP:    1

              (b) According to PRED(25)
     PROMISE Data        ISBSG Data           All Data
     Bag + MLP:    3     RT:           5      RT:           6
     Rand + MLP:   3     Rand + MLP:   3      Rand + MLP:   6
     Bag + RT:     2     Bag + MLP:    2      Bag + MLP:    5
     RT:           1     MLP:          2      Bag + RT:     3
     MLP:          1     RBF:          2      MLP:          3
                         Bag + RBF:    1      RBF:          2
                         Bag + RT:     1      Bag + RBF:    1

     RTs and bag+MLPs are more frequently among the best considering
     MMRE than considering PRED(25).

     The first ranked method's MMRE is statistically different from
     the others in 35.16% of the cases.

     The second ranked method's MMRE is statistically different from
     the lower ranked methods in 16.67% of the cases.

     RTs and bag+MLPs are usually statistically equal in terms of
     MMRE and PRED(25).
Research Questions – Revisited



     Question 1
     Do readily available ensemble methods generally improve effort
     estimations given by single learners? Which of them would be
     more useful?
            Even though bag+MLPs is frequently among the best methods,
            it is statistically similar to RTs.
            RTs are more comprehensible and faster to train.
            Bag+MLPs seem to have more potential for improvement.




Why Were RTs Singled Out?


            Hypothesis: as RTs split based on information gain, they may
            give more importance to more relevant attributes.
            A further study using correlation-based feature selection
            revealed that RTs usually place the features ranked higher
            by the feature selection method in higher-level splits of
            the tree.
            Feature selection by itself was not always able to improve
            accuracy.

     It may be important to weight features when using ML approaches.



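One plausible way to realise the feature-weighting suggestion (an illustration only; the slides do not prescribe a weighting scheme) is to weight each attribute by the absolute Pearson correlation between its values and the recorded effort:

```python
def correlation_weights(rows, efforts):
    """Weight each attribute by the absolute Pearson correlation
    between its column and the effort column. This particular formula
    is an assumption, not taken from the study."""
    n = len(rows)
    weights = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mc, me = sum(col) / n, sum(efforts) / n
        cov = sum((c - mc) * (e - me) for c, e in zip(col, efforts))
        var_c = sum((c - mc) ** 2 for c in col)
        var_e = sum((e - me) ** 2 for e in efforts)
        denom = (var_c * var_e) ** 0.5
        weights.append(abs(cov / denom) if denom else 0.0)
    return weights

rows = [[10, 1], [20, 5], [30, 2], [40, 9]]      # [size-like attr, noise attr]
efforts = [100, 210, 290, 405]
print([round(w, 2) for w in correlation_weights(rows, efforts)])
```

The size-like attribute ends up with a weight near 1, while the noise attribute gets a clearly lower weight, which is the behaviour a distance- or split-based learner would want to exploit.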
Why Were RTs Singled Out?


     Table: Correlation-Based Feature Selection and RT Attributes Relative
     Importance for Cocomo81.
                Attributes ranking              First tree level in which the attribute   Percentage of
                                                appears in more than 50% of the trees     trees
                LOC                             Level 0                                   100.00%
                Development mode
                Required software reliability   Level 1                                   90.00%
                 Modern programming practices
                Time constraint for cpu         Level 2                                   73.33%
                Data base size                  Level 2                                   83.34%
                Main memory constraint
                Turnaround time
                Programmers capability
                Analysts capability
                Language experience
                Virtual machine experience
                Schedule constraint
                Application experience          Level 2                                   66.67%
                Use of software tools
                Machine volatility




Why Were Bag+MLPs Singled Out?

            Hypothesis: bag+MLPs may have led to a more adequate level
            of diversity.
            Using correlation as the diversity measure, bag+MLPs usually
            had more moderate values when they were the 1st or 2nd
            ranked method in terms of MMRE.
            However, the correlation between diversity and MMRE was
            usually quite low.
  Table:  Correlation Considering Data Sets in which
  Bag+MLPs Were Ranked 1st or 2nd.                           Table:    Correlation Considering All Data Sets.

         Approach       Correlation interval                     Approach       Correlation interval
                        across different data sets                               across different data sets
         Bag+MLP        0.74-0.92                                Bag+MLP        0.47-0.98
         Bag+RBF        0.40-0.83                                Bag+RBF        0.40-0.83
         Bag+RT         0.51-0.81                                Bag+RT         0.37-0.88
         NCL+MLP        0.59-1.00                                NCL+MLP        0.59-1.00
         Rand+MLP       0.93-1.00                                Rand+MLP       0.93-1.00




Taking a Closer Look...




     Table: Correlations between ensemble covariance (diversity) and
     train/test MMRE for the data sets in which bag+MLP obtained the best
     MMREs and was ranked 1st or 2nd against the data sets in which it
     obtained the worst MMREs.
                                                           Cov. vs       Cov. vs
                                                         Test MMRE     Train MMRE
                              Best MMRE (desharnais)         0.24          0.14
                              2nd best MMRE (org2)           0.70           0.38
                              2nd worst MMRE (org7)         -0.42          -0.37
                              Worst MMRE (cocomo2)          -0.99          -0.99




     Diversity is not only affected by the ensemble method, but also
     by the data set:
            Software effort estimation data sets are very different from
            each other.

     The correlation between diversity and performance on the test set
     follows the tendency on the training set.
            Why do we have a negative correlation in the worst cases?
            Could a method that self-adapts diversity help to improve
            estimates? How?
Research Questions – Revisited


     Question 2
     If a particular method is singled out, what insight on how to
     improve effort estimations can we gain by analysing its behaviour
     and the reasons for its better performance?
            RTs give more importance to more relevant features.
            Weighting attributes may be helpful when using ML for
            software effort estimation.
            Ensembles seem to have more room for improvement in
            software effort estimation.
            A method that self-adapts diversity might help to improve
            estimates.



Research Questions – Revisited


     Question 3
     How can one determine which model to use for a particular data
     set?
            Effort estimation data sets dramatically affect the behaviour
            and performance of different learning machines, even
            ensembles.
            It would therefore be necessary to run experiments (parameter
            choice is important) on existing data from the particular
            company to determine which method is likely to be the best.
            If the software manager does not have enough knowledge of
            the models, RTs are a good choice.



Risk Analysis


     The learning machines singled out (RTs and bagging+MLPs) were
     further tested on the outlier projects.
            MMRE: similar or lower (better), usually better than for the
            outlier-free data sets.
            PRED(25): similar or lower (worse), usually lower.

     Even though outliers are projects for which the learning machines
     have more difficulty predicting within 25% of the actual effort,
     they are not the projects for which they give the worst estimates.




Conclusions and Future Work

            RQ1 – readily available ensembles do not provide generally
            better effort estimations.
                    Principled experiments (parameters, statistical analysis, several
                    data sets, more ensemble approaches) to deal with validity
                    issues.
            RQ2 – RTs + feature weighting; bagging with MLPs +
            self-adapting diversity.
                    Insight based on experiments, not just intuition or speculation.
            RQ3 – principled experiments to choose the model; RTs if
            resources are lacking.
                    No universally good model, even when using ensembles;
                    parameter choice matters in the framework.
            Future work:
                    Learning feature weights in ML for effort estimation.
                    Can we use self-tuning diversity in ensembles of learning
                    machines to improve estimations?

Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk   Ensembles for Software Effort Estimation   21 / 22
Acknowledgements




            Search Based Software Engineering (SEBASE) research group.
            Dr. Rami Bahsoon.
            This work was funded by EPSRC grant No. EP/D052785/1.




Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk   Ensembles for Software Effort Estimation   22 / 22

 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for Software Effort Estimation"

  • 1. A Principled Evaluation of Ensembles of Learning Machines for Software Effort Estimation Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk CERCIA, School of Computer Science, The University of Birmingham Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 1 / 22
  • 2. Outline Introduction (Background and Motivation) Research Questions (Aims) Experiments (Method and Results) Answers to Research Questions (Conclusions) Future Work Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 2 / 22
  • 3. Introduction Software cost estimation: Set of techniques and procedures that an organisation uses to arrive at an estimate. Major contributing factor is effort (in person-hours, person-month, etc). Overestimation vs. underestimation. Several software cost/effort estimation models have been proposed. ML models have been receiving increased attention: They make no or minimal assumptions about the data and the function being modelled. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 3 / 22
  • 4. Introduction Ensembles of Learning Machines are groups of learning machines trained to perform the same task and combined with the aim of improving predictive performance. Studies comparing ensembles against single learners in software effort estimation are contradictory: Braga et al IJCNN’07 claims that Bagging slightly improves the effort estimations produced by single learners. Kultur et al KBS’09 claims that an adapted Bagging provides large improvements. Kocaguneli et al ISSRE’09 claims that combining different learners does not improve effort estimations. These studies either lack statistical tests or do not report their parameter choices. None of them analyses the reasons for the results achieved. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 4 / 22
  • 8. Research Questions Question 1 Do readily available ensemble methods generally improve the effort estimations given by single learners? Which of them would be most useful? The current studies are contradictory: they either do not perform statistical comparisons or do not explain their parameter choices, and it is worth investigating further ensemble approaches. We build upon current work by considering these points. Question 2 If a particular method is singled out, what insight on how to improve effort estimations can we gain by analysing its behaviour and the reasons for its better performance? Principled experiments, not just intuition or speculation. Question 3 How can someone determine which model to use for a particular data set? Our study complements previous work; parameter choice is important. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 5 / 22
  • 14. Data Sets and Preprocessing Data sets: cocomo81, nasa93, nasa, cocomo2, desharnais, 7 ISBSG organization type subsets. Cover a wide range of features. In particular, ISBSG subsets’ productivity rate is statistically different. Attributes: cocomo attributes for PROMISE data, functional size, development type and language type for ISBSG. Missing values: delete for PROMISE, k-NN imputation for ISBSG. Outliers: K-means detection / elimination. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 6 / 22
  • 15. Experimental Framework – Step 1: choice of learning machines Single learners: MultiLayer Perceptrons (MLPs) – universal approximators; Radial Basis Function networks (RBFs) – local learning; and Regression Trees (RTs) – simple and comprehensible. Ensemble learners: Bagging with MLPs, with RBFs and with RTs – widely and successfully used; Random with MLPs – uses the full training set for each learner; and Negative Correlation Learning (NCL) with MLPs – regression. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 7 / 22
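The Bagging scheme listed above can be sketched as follows. This is an illustrative implementation only, not the authors' code: `OneNN` is a hypothetical toy base learner standing in for the MLPs, RBFs and RTs actually used in the study, and `bagging` shows the core idea of training each member on a bootstrap sample and averaging the members' outputs.

```python
import random

class OneNN:
    """Toy base learner (hypothetical stand-in for MLP/RBF/RT):
    predicts the effort of the nearest training project."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        i = min(range(len(self.X)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(self.X[j], x)))
        return self.y[i]

def bagging(X, y, make_learner, n_learners=10, seed=0):
    """Train each member on a bootstrap sample of the training set;
    the ensemble's estimate is the mean of the members' estimates."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        members.append(make_learner().fit([X[i] for i in idx],
                                          [y[i] for i in idx]))
    return lambda x: sum(m.predict(x) for m in members) / len(members)
```

Note that "Random with MLPs", by contrast, would train every member on the full training set, relying only on the learners' random initialisation for diversity.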
  • 16. Experimental Framework – Step 2: choice of evaluation method Executions were done in 30 rounds, with 10 projects for testing and the remainder for training, as suggested by Menzies et al. TSE’06. Evaluation was done in two steps: 1 Menzies et al. TSE’06’s survival rejection rules: if MMREs are significantly different according to a paired t-test at 95% confidence, the best model is the one with the lowest average MMRE; if not, the best method is the one with the best: 1 Correlation 2 Standard deviation 3 PRED(N) 4 Number of attributes. 2 Wilcoxon tests at 95% confidence to compare the two methods most often among the best in terms of MMRE and PRED(25). Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 8 / 22
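The 30-round evaluation protocol can be sketched as below. This is a minimal sketch, not the authors' code; the function name `holdout_rounds` is an assumption made here for illustration. Each round holds out 10 randomly chosen projects for testing and trains on the rest.

```python
import random

def holdout_rounds(projects, n_rounds=30, test_size=10, seed=0):
    """Yield (train, test) splits: test_size random projects for testing,
    the remainder for training, repeated n_rounds times."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        shuffled = projects[:]
        rng.shuffle(shuffled)  # random partition of the data set
        yield shuffled[test_size:], shuffled[:test_size]
```

The per-round metric values then feed the survival rules and the Wilcoxon comparisons described above.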
  • 19. Experimental Framework – Step 2: choice of evaluation method Mean Magnitude of the Relative Error: $MMRE = \frac{1}{T} \sum_{i=1}^{T} MRE_i$, where $MRE_i = \frac{|predicted_i - actual_i|}{actual_i}$. Percentage of estimations within $N\%$ of the actual values: $PRED(N) = \frac{1}{T} \sum_{i=1}^{T} \begin{cases} 1, & \text{if } MRE_i \le \frac{N}{100} \\ 0, & \text{otherwise} \end{cases}$. Correlation between estimated and actual effort: $CORR = \frac{S_{pa}}{\sqrt{S_p S_a}}$, where $S_{pa} = \frac{\sum_{i=1}^{T} (predicted_i - \bar{p})(actual_i - \bar{a})}{T-1}$, $S_p = \frac{\sum_{i=1}^{T} (predicted_i - \bar{p})^2}{T-1}$, $S_a = \frac{\sum_{i=1}^{T} (actual_i - \bar{a})^2}{T-1}$, $\bar{p} = \frac{\sum_{i=1}^{T} predicted_i}{T}$, $\bar{a} = \frac{\sum_{i=1}^{T} actual_i}{T}$. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 9 / 22
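The three evaluation metrics can be sketched in Python as follows (an illustrative implementation of the standard definitions, not the authors' code):

```python
def mmre(predicted, actual):
    """Mean Magnitude of the Relative Error over T projects."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

def pred(n, predicted, actual):
    """Fraction of estimates whose MRE is within n% of the actual effort."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(p - a) / a <= n / 100)
    return hits / len(actual)

def corr(predicted, actual):
    """Pearson correlation between estimated and actual effort."""
    t = len(actual)
    pbar = sum(predicted) / t
    abar = sum(actual) / t
    spa = sum((p - pbar) * (a - abar)
              for p, a in zip(predicted, actual)) / (t - 1)
    sp = sum((p - pbar) ** 2 for p in predicted) / (t - 1)
    sa = sum((a - abar) ** 2 for a in actual) / (t - 1)
    return spa / (sp * sa) ** 0.5
```

Lower MMRE is better, while higher PRED(25) and CORR are better, which is why the survival rules consider the metrics in combination.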
  • 22. Experimental Framework – Step 3: choice of parameters Preliminary experiments using 5 runs. Each approach was run with all the combinations of 3 or 5 values per parameter. The parameters with the lowest MMRE were chosen for a further 30 runs. Base learners will not necessarily have the same parameters as single learners. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 10 / 22
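The parameter-selection step can be sketched as a grid search over all combinations, keeping the setting with the lowest average MMRE over the preliminary runs. This is a minimal sketch under the slide's description; `run_mmre` is a hypothetical callback that trains the approach with the given parameters and returns the MMRE of one run.

```python
from itertools import product

def choose_parameters(param_grid, run_mmre, n_runs=5):
    """Try every combination in the grid; return the one with the lowest
    average MMRE over n_runs preliminary runs."""
    best, best_mmre = None, float("inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        avg = sum(run_mmre(params) for _ in range(n_runs)) / n_runs
        if avg < best_mmre:
            best, best_mmre = params, avg
    return best
```

The winning combination would then be re-evaluated over the full 30 rounds, and the grid for an ensemble's base learners may differ from the grid used for the corresponding single learner.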
  • 23. Comparison of Learning Machines – Menzies et al. TSE’06’s survival rejection rules Table: Number of Data Sets in which Each Method Survived. Methods that never survived are omitted.

    PROMISE Data       ISBSG Data         All Data
    RT: 2              MLP: 2             RT: 3
    Bag + MLP: 1       Bag + RTs: 2       Bag + MLP: 2
    NCL + MLP: 1       Bag + MLP: 1       NCL + MLP: 2
    Rand + MLP: 1      RT: 1              Bag + RTs: 2
    Bag + RBF: 1       NCL + MLP: 1       MLP: 2
                                          Rand + MLP: 1
                                          Bag + RBF: 1

    No approach is consistently the best, even considering ensembles! Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 11 / 22
  • 25. Comparison of Learning Machines What methods are usually among the best? Table: Number of Data Sets in which Each Method Was Ranked First or Second According to MMRE and PRED(25). Methods never among the first and second are omitted.

    (a) According to MMRE
    PROMISE Data       ISBSG Data         All Data
    RT: 4              RT: 5              RT: 9
    Bag + MLP: 3       Bag + MLP: 5       Bag + MLP: 8
    Bag + RT: 2        Bag + RBF: 3       Bag + RBF: 3
    MLP: 1             MLP: 1             MLP: 2
    Rand + MLP: 1                         Bag + RT: 2
    NCL + MLP: 1                          Rand + MLP: 1
                                          NCL + MLP: 1

    (b) According to PRED(25)
    PROMISE Data       ISBSG Data         All Data
    Bag + MLP: 3       RT: 5              RT: 6
    Rand + MLP: 3      Rand + MLP: 3      Rand + MLP: 6
    Bag + RT: 2        Bag + MLP: 2       Bag + MLP: 5
    RT: 1              MLP: 2             Bag + RT: 3
    MLP: 1             RBF: 2             MLP: 3
    Bag + RBF: 1                          RBF: 2
                                          Bag + RBF: 1

    RTs and bag+MLPs are more frequently among the best considering MMRE than considering PRED(25). The first ranked method’s MMRE is statistically different from the others in 35.16% of the cases. The second ranked method’s MMRE is statistically different from the lower ranked methods in 16.67% of the cases. RTs and bag+MLPs are usually statistically equal in terms of MMRE and PRED(25). Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 12 / 22
  • 26. Research Questions – Revisited Question 1 Do readily available ensemble methods generally improve the effort estimations given by single learners? Which of them would be most useful? Even though bag+MLPs is frequently among the best methods, it is statistically similar to RTs. RTs are more comprehensible and have faster training. Bag+MLPs seem to have more potential for improvements. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 13 / 22
  • 27. Why Were RTs Singled Out? Hypothesis: as RTs have splits based on information gain, they may work in such a way as to give more importance to more relevant attributes. A further study using correlation-based feature selection revealed that RTs usually put the features ranked higher by the feature selection method in higher-level splits of the tree. Feature selection by itself was not always able to improve accuracy. It may be important to give weights to features when using ML approaches. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 14 / 22
  • 28. Why Were RTs Singled Out? Table: Correlation-Based Feature Selection and RT Attributes Relative Importance for Cocomo81.

    Attributes (ranking)                                    First tree level in which the attribute     Percentage
                                                            appears in more than 50% of the trees       of trees
    LOC                                                     Level 0                                     100.00%
    Development mode, Required software reliability,        Level 1                                      90.00%
    Modern programming practices
    Time constraint for CPU                                 Level 2                                      73.33%
    Data base size                                          Level 2                                      83.34%
    Main memory constraint, Turnaround time,                Level 2                                      66.67%
    Programmers capability, Analysts capability,
    Language experience, Virtual machine experience,
    Schedule constraint, Application experience,
    Use of software tools, Machine volatility

    Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 15 / 22
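A simplified stand-in for the feature-ranking idea above can be sketched as follows. This is not the authors' correlation-based feature selection method; it is an illustrative sketch that ranks each attribute by the absolute Pearson correlation between that attribute and the actual effort.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def rank_features(X, y, names):
    """Rank attributes by |correlation| with effort, strongest first."""
    scores = {name: abs(pearson([row[i] for row in X], y))
              for i, name in enumerate(names)}
    return sorted(names, key=lambda n: -scores[n])
```

Comparing such a ranking against the levels at which an RT splits on each attribute is one way to check whether the tree gives more importance to more relevant attributes.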
  • 29. Why Were Bag+MLPs Singled Out? Hypothesis: bag+MLPs may have led to a more adequate level of diversity. If we use correlation as the diversity measure, we can see that bag+MLPs usually had more moderate values when it was the 1st or 2nd ranked MMRE method. However, the correlation between diversity and MMRE was usually quite low.

    Table: Correlation Considering Data Sets in which Bag+MLPs Were Ranked 1st or 2nd.
    Approach     Correlation interval across different data sets
    Bag+MLP      0.74-0.92
    Bag+RBF      0.40-0.83
    Bag+RT       0.51-0.81
    NCL+MLP      0.59-1.00
    Rand+MLP     0.93-1.00

    Table: Correlation Considering All Data Sets.
    Approach     Correlation interval across different data sets
    Bag+MLP      0.47-0.98
    Bag+RBF      0.40-0.83
    Bag+RT       0.37-0.88
    NCL+MLP      0.59-1.00
    Rand+MLP     0.93-1.00

    Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 16 / 22
  • 30. Taking a Closer Look... Table: Correlations between ensemble covariance (diversity) and train/test MMRE for the data sets in which bag+MLP obtained the best MMREs and was ranked 1st or 2nd, against the data sets in which it obtained the worst MMREs.

                                  Cov. vs Test MMRE    Cov. vs Train MMRE
    Best MMRE (desharnais)         0.24                 0.14
    2nd best MMRE (org2)           0.70                 0.38
    2nd worst MMRE (org7)         -0.42                -0.37
    Worst MMRE (cocomo2)          -0.99                -0.99

    Diversity is affected not only by the ensemble method but also by the data set: software effort estimation data sets are very different from each other. The correlation between diversity and performance on the test set follows the tendency on the training set. Why do we have a negative correlation in the worst cases? Could a method that self-adapts diversity help to improve estimations? How? Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 17 / 22
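Using correlation between ensemble members as the diversity measure, as the slides above do, can be sketched as follows. This is an illustrative sketch, not the authors' code: it averages the pairwise Pearson correlation between members' predictions, so values near 1 indicate low diversity and lower values indicate higher diversity.

```python
def ensemble_diversity(member_predictions):
    """Average pairwise Pearson correlation between the members'
    prediction vectors on the same set of projects."""
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5
    m = len(member_predictions)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return sum(pearson(member_predictions[i], member_predictions[j])
               for i, j in pairs) / len(pairs)
```

Tracking such a measure alongside MMRE across rounds is one way to study whether a more adequate level of diversity explains the better performance of a given ensemble method.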
  • 33. Research Questions – Revisited Question 2 If a particular method is singled out, what insight on how to improve effort estimations can we gain by analysing its behaviour and the reasons for its better performance? RTs give more importance to more important features. Weighting attributes may be helpful when using ML for software effort estimation. Ensembles seem to have more room for improvement for software effort estimation. A method to self-adapt diversity might help to improve estimations. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 18 / 22
  • 34. Research Questions – Revisited Question 3 How can someone determine which model to use for a particular data set? Effort estimation data sets dramatically affect the behaviour and performance of different learning machines, even considering ensembles. So it is necessary to run experiments (parameter choice is important) using existing data from a particular company to determine which method is likely to be the best. If the software manager does not have enough knowledge of the models, RTs are a good choice. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 19 / 22
  • 35. Risk Analysis The learning machines singled out (RTs and bagging+MLPs) were further tested on the outlier projects. MMRE was similar or lower (better), usually better than for the outlier-free data sets. PRED(25) was similar or lower (worse), usually lower. Even though outliers are projects for which the learning machines have more difficulty predicting within 25% of the actual effort, they are not the projects for which they give the worst estimates. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 20 / 22
  • 37. Conclusions and Future Work RQ1 – readily available ensembles do not generally provide better effort estimations. Principled experiments (parameters, statistical analysis, several data sets, more ensemble approaches) to deal with validity issues. RQ2 – RTs + weighting features; bagging with MLPs + self-adapting diversity. Insight based on experiments, not just intuition or speculation. RQ3 – principled experiments to choose the model; RTs if no resources. No universally good model, even when using ensembles; parameter choice in the framework. Future work: learning feature weights in ML for effort estimation. Can we use self-tuning diversity in ensembles of learning machines to improve estimations? Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 21 / 22
  • 41. Acknowledgements Search Based Software Engineering (SEBASE) research group. Dr. Rami Bahsoon. This work was funded by EPSRC grant No. EP/D052785/1. Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 22 / 22