The document discusses evaluating ensembles of learning machines for software effort estimation. It aims to determine if ensemble methods improve upon single learners, which ensembles perform best, and how to select models for different datasets. The study uses several public software effort estimation datasets and evaluates multiple ensemble techniques, including bagging and negative correlation learning, against single learners like decision trees. Statistical tests are used to rigorously compare the performance of different models.
Promise 2011: "A Principled Evaluation of Ensembles of Learning Machines for Software Effort Estimation"
1. A Principled Evaluation of Ensembles of Learning
Machines for Software Effort Estimation
Leandro Minku, Xin Yao
{L.L.Minku,X.Yao}@cs.bham.ac.uk
CERCIA, School of Computer Science, The University of Birmingham
Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 1 / 22
2. Outline
Introduction (Background and Motivation)
Research Questions (Aims)
Experiments (Method and Results)
Answers to Research Questions (Conclusions)
Future Work
3. Introduction
Software cost estimation:
Set of techniques and procedures that an organisation uses to
arrive at an estimate.
The major contributing factor is effort (in person-hours, person-months, etc.).
Overestimation vs. underestimation.
Several software cost/effort estimation models have been proposed.
ML models have been receiving increased attention:
They make no or minimal assumptions about the data and the
function being modelled.
4. Introduction
Ensembles of Learning Machines are groups of learning machines
trained to perform the same task and combined with the aim of
improving predictive performance.
Studies comparing ensembles against single learners in software
effort estimation are contradictory:
Braga et al. IJCNN'07 claim that Bagging slightly improves the
effort estimates produced by single learners.
Kultur et al. KBS'09 claim that an adapted Bagging provides
large improvements.
Kocaguneli et al. ISSRE'09 claim that combining different
learners does not improve effort estimates.
These studies either omit statistical tests or do not report their
parameter choices. None of them analyses the reasons for the
results obtained.
8. Research Questions
Question 1
Do readily available ensemble methods generally improve the effort
estimates given by single learners? Which of them would be
more useful?
The current studies are contradictory.
They either do not perform statistical comparisons or do not
explain their parameter choices.
It is worth investigating the use of different ensemble
approaches.
We build upon current work by considering these points.
Question 2
If a particular method is singled out, what insight into how to
improve effort estimates can we gain by analysing its behaviour
and the reasons for its better performance?
Principled experiments, not just intuition or speculation.
Question 3
How can one determine which model to use for a particular data set?
Our study complements previous work; parameter choice is
important.
14. Data Sets and Preprocessing
Data sets: cocomo81, nasa93, nasa, cocomo2, desharnais, and 7
ISBSG organization type subsets.
They cover a wide range of features.
In particular, the ISBSG subsets' productivity rates are statistically
significantly different from each other.
Attributes: COCOMO attributes for the PROMISE data; functional
size, development type and language type for ISBSG.
Missing values: deletion for PROMISE, k-NN imputation for
ISBSG.
Outliers: detection/elimination via K-means.
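The k-NN imputation used for the ISBSG data can be sketched as follows. This is a minimal illustration under assumed conventions (missing values encoded as None, plain Euclidean distance over the attributes both rows share), not the study's actual implementation:

```python
import math

def knn_impute(rows, k=3):
    """Fill missing values (None) with the mean of that attribute over
    the k nearest complete rows, measuring distance only on the
    attributes both rows have present."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        neighbours = sorted(complete, key=lambda c: dist(r, c))[:k]
        filled.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(r)
        ])
    return filled

rows = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [1.05, None]]
print(knn_impute(rows, k=2))  # the missing value is imputed from the two nearest rows
```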
15. Experimental Framework – Step 1: choice of learning
machines
Single learners:
MultiLayer Perceptrons (MLPs) – universal approximators;
Radial Basis Function networks (RBFs) – local learning; and
Regression Trees (RTs) – simple and comprehensible.
Ensemble learners:
Bagging with MLPs, with RBFs and with RTs – widely and
successfully used;
Random ensembles with MLPs – use the full training set for each learner; and
Negative Correlation Learning (NCL) with MLPs – designed for regression.
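Bagging trains each base learner on a bootstrap sample of the training set and, for regression, averages the learners' outputs. A minimal sketch, with a hypothetical 1-nearest-neighbour base learner standing in for the MLPs/RBFs/RTs used in the study:

```python
import random

class OneNN:
    """Trivial 1-nearest-neighbour regressor used as a stand-in base learner."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        i = min(range(len(self.X)), key=lambda j: abs(self.X[j] - x))
        return self.y[i]

def bagging_fit(X, y, n_learners=10, make_learner=OneNN, seed=0):
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        # Bootstrap: sample len(X) indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        learners.append(make_learner().fit([X[i] for i in idx], [y[i] for i in idx]))
    return learners

def bagging_predict(learners, x):
    # Average the base learners' outputs (regression combination rule).
    return sum(m.predict(x) for m in learners) / len(learners)

X = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
ens = bagging_fit(X, y, n_learners=25)
print(bagging_predict(ens, 2.1))
```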
16. Experimental Framework – Step 2: choice of evaluation
method
Executions were done in 30 rounds, with 10 projects for testing and
the remaining ones for training, as suggested by Menzies et al. TSE'06.
Evaluation was done in two steps:
1 Menzies et al. TSE'06's survival rejection rules:
If the MMREs are significantly different according to a paired
t-test at 95% confidence, the best model is the one with
the lowest average MMRE.
If not, the best method is the one with the best:
1 Correlation
2 Standard deviation
3 PRED(N)
4 Number of attributes
2 Wilcoxon tests at 95% confidence to compare the two
methods most often among the best in terms of MMRE and
PRED(25).
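The survival rejection cascade can be sketched as follows. The critical value 2.045 is the two-tailed 95% point of Student's t with 29 degrees of freedom (30 rounds); the model dictionaries and tie-breaker fields are illustrative, not the study's code:

```python
import math

def paired_t(a, b):
    """t statistic for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)

def better_model(m1, m2, t_crit=2.045):
    """Survival rule sketch: m1, m2 hold per-round 'mmre' lists and
    scalar tie-breakers. Returns the surviving model."""
    t = paired_t(m1["mmre"], m2["mmre"])
    if abs(t) > t_crit:  # MMREs significantly different: lowest mean MMRE wins
        return m1 if sum(m1["mmre"]) < sum(m2["mmre"]) else m2
    # Otherwise fall through the tie-breaking cascade.
    for key, prefer_high in [("corr", True), ("std", False),
                             ("pred25", True), ("n_attrs", False)]:
        if m1[key] != m2[key]:
            return m1 if (m1[key] > m2[key]) == prefer_high else m2
    return m1  # complete tie

a = {"mmre": [0.30, 0.32, 0.31, 0.29], "corr": 0.8, "std": 0.1, "pred25": 0.6, "n_attrs": 5}
b = {"mmre": [0.60, 0.65, 0.58, 0.59], "corr": 0.5, "std": 0.2, "pred25": 0.3, "n_attrs": 5}
print(better_model(a, b)["corr"])  # a survives: significantly lower MMRE
```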
19. Experimental Framework – Step 2: choice of evaluation
method
Mean Magnitude of the Relative Error:
MMRE = (1/T) · Σ_{i=1}^{T} MRE_i, where MRE_i = |predicted_i − actual_i| / actual_i
Percentage of estimates within N% of the actual values:
PRED(N) = (1/T) · Σ_{i=1}^{T} h_i, where h_i = 1 if MRE_i ≤ N/100, and 0 otherwise
Correlation between estimated and actual effort:
CORR = S_pa / √(S_p · S_a), where
S_pa = Σ_{i=1}^{T} (predicted_i − p̄)(actual_i − ā) / (T − 1),
S_p = Σ_{i=1}^{T} (predicted_i − p̄)² / (T − 1),
S_a = Σ_{i=1}^{T} (actual_i − ā)² / (T − 1),
p̄ = Σ_{i=1}^{T} predicted_i / T, ā = Σ_{i=1}^{T} actual_i / T.
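The three measures can be computed directly from their definitions; a minimal sketch:

```python
def mmre(predicted, actual):
    """Mean Magnitude of the Relative Error."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

def pred(predicted, actual, n=25):
    """Fraction of estimates whose MRE is within n% of the actual value."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) / a <= n / 100)
    return hits / len(actual)

def corr(predicted, actual):
    """Pearson correlation between estimated and actual effort."""
    t = len(actual)
    pm = sum(predicted) / t
    am = sum(actual) / t
    spa = sum((p - pm) * (a - am) for p, a in zip(predicted, actual)) / (t - 1)
    sp = sum((p - pm) ** 2 for p in predicted) / (t - 1)
    sa = sum((a - am) ** 2 for a in actual) / (t - 1)
    return spa / (sp * sa) ** 0.5

predicted = [100.0, 240.0, 90.0]
actual = [110.0, 200.0, 100.0]
print(mmre(predicted, actual), pred(predicted, actual), corr(predicted, actual))
```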
22. Experimental Framework – Step 3: choice of parameters
Preliminary experiments using 5 runs.
Each approach was run with all combinations of 3 or 5 values per
parameter.
The parameters with the lowest MMRE were chosen for the final 30
runs.
Base learners within ensembles do not necessarily get the same
parameters as the corresponding single learners.
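The exhaustive parameter sweep in this step can be sketched as a grid search that keeps the combination with the lowest MMRE over the preliminary runs; `fake_eval` below is a hypothetical stand-in for training and evaluating a learner:

```python
from itertools import product

def choose_parameters(grid, train_and_eval, runs=5):
    """Try every combination of parameter values and keep the one with
    the lowest MMRE averaged over the preliminary runs."""
    best, best_score = None, float("inf")
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = sum(train_and_eval(params, run) for run in range(runs)) / runs
        if score < best_score:
            best, best_score = params, score
    return best, best_score

# Hypothetical evaluation: pretend MMRE is minimised at lr=0.1, hidden=8.
def fake_eval(params, run):
    return abs(params["lr"] - 0.1) + abs(params["hidden"] - 8) / 10

grid = {"lr": [0.01, 0.1, 0.5], "hidden": [4, 8, 16]}
print(choose_parameters(grid, fake_eval))  # ({'hidden': 8, 'lr': 0.1}, 0.0)
```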
23. Comparison of Learning Machines – Menzies et al.
TSE’06’s survival rejection rules
Table: Number of Data Sets in which Each Method Survived. Methods
that never survived are omitted.
PROMISE Data: RT: 2; Bag+MLP: 1; NCL+MLP: 1; Rand+MLP: 1
ISBSG Data: MLP: 2; Bag+RT: 2; Bag+MLP: 1; RT: 1; Bag+RBF: 1; NCL+MLP: 1
All Data: RT: 3; Bag+MLP: 2; NCL+MLP: 2; Bag+RT: 2; MLP: 2; Rand+MLP: 1; Bag+RBF: 1
No approach is consistently the best, even considering
ensembles!
25. Comparison of Learning Machines
Table: Number of Data Sets in which Each Method Was Ranked First or
Second According to MMRE and PRED(25). Methods never among the
first and second are omitted.
(a) According to MMRE
PROMISE Data: RT: 4; Bag+MLP: 3; Bag+RT: 2; MLP: 1; Rand+MLP: 1; NCL+MLP: 1
ISBSG Data: RT: 5; Bag+MLP: 5; Bag+RBF: 3; MLP: 1
All Data: RT: 9; Bag+MLP: 8; Bag+RBF: 3; MLP: 2; Bag+RT: 2; Rand+MLP: 1; NCL+MLP: 1
(b) According to PRED(25)
PROMISE Data: Bag+MLP: 3; Rand+MLP: 3; Bag+RT: 2; RT: 1; MLP: 1; Bag+RBF: 1
ISBSG Data: RT: 5; Rand+MLP: 3; Bag+MLP: 2; MLP: 2; RBF: 2; Bag+RT: 1
All Data: RT: 6; Rand+MLP: 6; Bag+MLP: 5; Bag+RT: 3; MLP: 3; RBF: 2; Bag+RBF: 1
What methods are usually among the best?
RTs and bag+MLPs are more frequently among the best considering
MMRE than considering PRED(25).
The first-ranked method's MMRE is statistically different from the
others' in 35.16% of the cases.
The second-ranked method's MMRE is statistically different from the
lower-ranked methods' in 16.67% of the cases.
RTs and bag+MLPs are usually statistically equal in terms of MMRE
and PRED(25).
26. Research Questions – Revisited
Question 1
Do readily available ensemble methods generally improve the effort
estimates given by single learners? Which of them would be
more useful?
Even though bag+MLPs are frequently among the best
methods, they are statistically similar to RTs.
RTs are more comprehensible and faster to train.
Bag+MLPs seem to have more potential for improvement.
27. Why Were RTs Singled Out?
Hypothesis: as RTs split based on information gain, they may
give more importance to more relevant attributes.
A further study using correlation-based feature selection
revealed that RTs usually place the features ranked higher by
the feature selection method in higher-level splits of the tree.
Feature selection by itself was not always able to improve
accuracy.
It may be important to weight features when using ML
approaches.
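A per-feature correlation ranking against effort can be sketched as below. Note the study uses correlation-based feature selection (CFS); this simple |Pearson| ranking is only a simplified stand-in, with toy data:

```python
def pearson(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    cov = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    vx = sum((x - xm) ** 2 for x in xs)
    vy = sum((y - ym) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def rank_features(X, y, names):
    """Rank features by |Pearson correlation| with the target effort."""
    scores = {name: abs(pearson([row[j] for row in X], y))
              for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)

# Toy projects: effort tracks LOC almost exactly, team size only loosely.
X = [[10, 3], [20, 5], [30, 4], [40, 6]]  # [LOC (k), team size]
y = [105, 198, 306, 401]                  # effort
print(rank_features(X, y, ["LOC", "team_size"]))  # LOC ranked first
```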
28. Why Were RTs Singled Out?
Table: Correlation-Based Feature Selection and RT Attributes Relative
Importance for Cocomo81.
Attribute (ranking order) | First tree level in which the attribute appears in more than 50% of the trees | Percentage of trees
LOC | Level 0 | 100.00%
Development mode | |
Required software reliability | Level 1 | 90.00%
Modern programming practices | |
Time constraint for cpu | Level 2 | 73.33%
Data base size | Level 2 | 83.34%
Main memory constraint | |
Turnaround time | |
Programmers capability | |
Analysts capability | |
Language experience | |
Virtual machine experience | |
Schedule constraint | |
Application experience | Level 2 | 66.67%
Use of software tools | |
Machine volatility | |
29. Why Were Bag+MLPs Singled Out?
Hypothesis: bag+MLPs may have led to a more adequate
level of diversity.
If we use correlation as the diversity measure, we can see that
bag+MLPs usually had more moderate values when they were the
1st or 2nd ranked method in terms of MMRE.
However, the correlation between diversity and MMRE was
usually quite low.
Table: Correlation Considering Data Sets in which Bag+MLPs Were Ranked 1st or 2nd.
Approach | Correlation interval across different data sets
Bag+MLP | 0.74-0.92
Bag+RBF | 0.40-0.83
Bag+RT | 0.51-0.81
NCL+MLP | 0.59-1.00
Rand+MLP | 0.93-1.00
Table: Correlation Considering All Data Sets.
Approach | Correlation interval across different data sets
Bag+MLP | 0.47-0.98
Bag+RBF | 0.40-0.83
Bag+RT | 0.37-0.88
NCL+MLP | 0.59-1.00
Rand+MLP | 0.93-1.00
30. Taking a Closer Look...
Table: Correlations between ensemble covariance (diversity) and
train/test MMRE for the data sets in which bag+MLP obtained the best
MMREs and was ranked 1st or 2nd, against the data sets in which it
obtained the worst MMREs.
Data set | Cov. vs Test MMRE | Cov. vs Train MMRE
Best MMRE (desharnais) | 0.24 | 0.14
2nd best MMRE (org2) | 0.70 | 0.38
2nd worst MMRE (org7) | -0.42 | -0.37
Worst MMRE (cocomo2) | -0.99 | -0.99
Diversity is affected not only by the ensemble method, but also by
the data set:
Software effort estimation data sets are very different from
each other.
The correlation between diversity and performance on the test set
follows the tendency on the training set.
Why do we have a negative correlation in the worst cases?
Could a method that self-adapts diversity help to improve
estimates? How?
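Diversity here is measured via the correlation between base learners' outputs (lower mean pairwise correlation means a more diverse ensemble); a minimal sketch of that measurement on made-up predictions:

```python
def pearson(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    cov = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    vx = sum((x - xm) ** 2 for x in xs)
    vy = sum((y - ym) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mean_pairwise_correlation(predictions):
    """predictions[i] = estimates by base learner i on the test set.
    Lower mean pairwise correlation = more diverse ensemble."""
    m = len(predictions)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return sum(pearson(predictions[i], predictions[j]) for i, j in pairs) / len(pairs)

# Two near-identical learners and one that disagrees:
preds = [
    [100.0, 200.0, 300.0],
    [101.0, 199.0, 302.0],
    [300.0, 100.0, 210.0],
]
print(mean_pairwise_correlation(preds))
```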
33. Research Questions – Revisited
Question 2
If a particular method is singled out, what insight into how to
improve effort estimates can we gain by analysing its behaviour
and the reasons for its better performance?
RTs give more importance to more relevant features.
Weighting attributes may be helpful when using ML for
software effort estimation.
Ensembles seem to have more room for improvement for
software effort estimation.
A method that self-adapts diversity might help to improve
estimates.
34. Research Questions – Revisited
Question 3
How can one determine which model to use for a particular data set?
Effort estimation data sets dramatically affect the behaviour
and performance of different learning machines, even
considering ensembles.
So, it is necessary to run experiments (parameter choice is
important) using existing data from a particular company to
determine which method is likely to be the best.
If the software manager does not have enough knowledge of
the models, RTs are a good choice.
35. Risk Analysis
The learning machines singled out (RTs and bagging+MLPs) were
further tested on the outlier projects.
MMRE was similar or lower (better), usually better than for the
outlier-free data sets.
PRED(25) was similar or lower (worse), usually lower.
Even though outliers are the projects for which the learning machines
have more difficulty predicting within 25% of the actual effort,
they are not the projects for which they give the worst estimates.
37. Conclusions and Future Work
RQ1 – readily available ensembles do not generally provide
better effort estimates.
Principled experiments (parameters, statistical analysis, several
data sets, more ensemble approaches) to deal with validity
issues.
RQ2 – RTs + feature weighting; bagging with MLPs + self-adapting
diversity.
Insight based on experiments, not just intuition or speculation.
RQ3 – principled experiments to choose the model; RTs if
resources are lacking.
No universally good model, even when using ensembles;
parameter choice is part of the framework.
Future work:
Learning feature weights in ML for effort estimation.
Can we use self-tuning diversity in ensembles of learning
machines to improve estimates?
41. Acknowledgements
Search Based Software Engineering (SEBASE) research group.
Dr. Rami Bahsoon.
This work was funded by EPSRC grant No. EP/D052785/1.