Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which the availability of data is limited, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to the heterogeneity of projects, prediction accuracy is not always good.
This paper proposes a novel, multi-objective approach for cross-project defect prediction, based on a multi-objective logistic regression model built using a genetic algorithm. Instead of providing the software engineer with a single predictive model, the multi-objective approach allows software engineers to choose predictors that achieve a compromise between the number of likely defect-prone artifacts (effectiveness) and the LOC to be analyzed/tested (a proxy for the cost of code inspection).
Results of an empirical evaluation on 10 datasets from the Promise repository indicate the superiority and usefulness of the multi-objective approach with respect to single-objective predictors. The proposed approach also outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
4. Indicators of defects
• Cached history information (Kim et al., ICSE 2007)
• Change metrics (Moser et al., ICSE 2008)
• A metrics suite for object-oriented design (Chidamber et al., TSE 1994)
8. Defect Prediction Methodology
[Diagram: in within-project prediction, a predicting model is trained on a project's training set and classifies the classes of its test set as defect-prone or not (Class1: YES, Class2: YES, Class3: NO, …, ClassN: …); in cross-project prediction, the model is trained on past projects and applied to a new project.]
9.-10. Defect Prediction Methodology (animation builds of the previous diagram: the model trained on Project A is applied to Project B)
11. Cost Effectiveness
1) Cross-project prediction does not necessarily work worse than within-project prediction
2) Better precision (accuracy) does not translate into lower inspection cost
3) Traditional prediction model: logistic regression
Recalling the "imprecision" of cross-project defect prediction, Rahman et al., FSE 2012
13. Cost Effectiveness: an example
[Diagram: the system contains Class A, Class B, Class C, and Class D. Predicting model 1 flags Class A (100 LOC) and Class B (10,000 LOC); predicting model 2 flags Class A, Class C, and Class D (100 LOC each). Model 2 identifies more classes while requiring far less code to be inspected.]
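The trade-off above can be made concrete with a small sketch. Class names and LOC sizes are taken from the slide; which classes are actually defect-prone is an assumption made for illustration:

```python
# Hypothetical illustration of the slide's example: inspection cost is the
# total LOC of the classes a model flags as defect-prone, while effectiveness
# is how many flagged classes are actually defect-prone.
loc = {"A": 100, "B": 10_000, "C": 100, "D": 100}
actually_defective = {"A", "C", "D"}          # assumed ground truth

model1_flags = {"A", "B"}                     # model 1 flags A and B
model2_flags = {"A", "C", "D"}                # model 2 flags A, C and D

def inspection_cost(flags):
    """LOC a developer must inspect/test if they trust the model."""
    return sum(loc[c] for c in flags)

def effectiveness(flags):
    """Number of truly defect-prone classes the model catches."""
    return len(flags & actually_defective)

print(inspection_cost(model1_flags), effectiveness(model1_flags))  # 10100 1
print(inspection_cost(model2_flags), effectiveness(model2_flags))  # 300 3
```

Under these assumptions, model 2 catches three defective classes for 300 LOC of inspection, while model 1 catches one for 10,100 LOC.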
19. Building Predicting Model on Training Set
Training Set:
        P1   P2   …
Class1  m11  m12  …
Class2  m21  m22  …
Class3  m31  m32  …
Class4  …    …    …
…       …    …    …

Logistic Regression:
Pred_i = e^(a + b·m_i1 + c·m_i2 + …) / (1 + e^(a + b·m_i1 + c·m_i2 + …))

The fitted model yields a Pred. value for each class C1, C2, C3, C4, …
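The logistic prediction formula can be sketched in Python; the coefficient values and metric values below are illustrative, not taken from any fitted model:

```python
import math

def logistic_pred(coeffs, metrics):
    """Defect-proneness score: e^z / (1 + e^z), with
    z = a + b*m_i1 + c*m_i2 + ...
    `coeffs` = (a, b, c, ...), `metrics` = (m_i1, m_i2, ...)."""
    z = coeffs[0] + sum(b * m for b, m in zip(coeffs[1:], metrics))
    return math.exp(z) / (1 + math.exp(z))

# A class is typically predicted defect-prone when Pred exceeds 0.5.
p = logistic_pred((2, 3, 4), (0.1, 0.2))   # z = 2 + 0.3 + 0.8 = 3.1
print(p > 0.5)   # True
```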
20. Building Predicting Model on Training Set
Training Set: the same metrics table (Class1…ClassN × P1, P2, …).
Two candidate logistic regression models with different coefficients produce different predictions:

Model 1: Pred_i = e^(2 + 3·m_i1 + 4·m_i2 + …) / (1 + e^(2 + 3·m_i1 + 4·m_i2 + …))
  Pred.: C1 = 1, C2 = 1, C3 = 0, C4 = 1, … = 0

Model 2: Pred_i = e^(1 - 2·m_i1 + 1·m_i2 + …) / (1 + e^(1 - 2·m_i1 + 1·m_i2 + …))
  Pred.: C1 = 0, C2 = 0, C3 = 1, C4 = 1, … = 1
21. Building Predicting Model on Training Set
Training Set:
        P1   P2   …
Class1  m11  m12  …
Class2  m21  m22  …
Class3  m31  m32  …
Class4  …    …    …
…       …    …    …

The Logistic Regression predictions are compared against the actual values:
      Pred.  Actual Val
C1    1      1
C2    1      0
C3    0      1
C4    1      1
…     0      0
22. Building Predicting Model on Training Set
(Same training set and Pred./Actual comparison as the previous slide.)
GOAL: minimizing the prediction error (i.e., maximizing precision)
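The comparison step can be sketched with the Pred./Actual columns shown on the slide; computing precision here as a proxy for the single-objective fitness is an assumption for illustration:

```python
# Predicted labels from the fitted logistic model are compared with the
# actual ones on the training set; the single-objective fit minimizes
# the prediction error.
predicted = [1, 1, 0, 1, 0]   # Pred. column from the slide
actual    = [1, 0, 1, 1, 0]   # Actual Val column

errors = sum(p != a for p, a in zip(predicted, actual))
error_rate = errors / len(actual)

# Precision: fraction of classes flagged defect-prone that really are.
flagged = sum(predicted)
true_positives = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
precision = true_positives / flagged

print(errors, error_rate)  # 2 0.4
```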
28. Multi-objective Genetic Algorithm
Chromosome: (a, b, c, …), the coefficients of the logistic model
Pred_i = e^(a + b·m_i1 + c·m_i2 + …) / (1 + e^(a + b·m_i1 + c·m_i2 + …))

Fitness function, with two objectives:
  min InspectionCost(Pred_i)   (LOC of the classes predicted defect-prone)
  max Effectiveness(Actual_i)  (actual defect-prone classes identified)

Multiple objectives are optimized using Pareto-efficient approaches.
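A minimal sketch of the Pareto comparison such a multi-objective GA relies on; the function names and the example population are assumptions for illustration, not the paper's implementation:

```python
# Each chromosome (a, b, c, ...) yields a predictor scored on two objectives:
# (inspection cost, to minimize; effectiveness, to maximize). The GA keeps a
# front of non-dominated solutions rather than a single best model.
def dominates(s1, s2):
    """s = (inspection_cost, effectiveness); s1 dominates s2 if it is no
    worse on both objectives and strictly better on at least one."""
    c1, e1 = s1
    c2, e2 = s2
    return (c1 <= c2 and e1 >= e2) and (c1 < c2 or e1 > e2)

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

pop = [(10_100, 1), (300, 3), (50, 1), (300, 2)]
print(pareto_front(pop))  # [(300, 3), (50, 1)]
```

The software engineer then picks a point on the front, trading inspection cost against effectiveness.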
36. Experiment outline
• Cross-project defect prediction (RQ1):
  train the model on nine projects and test on the remaining one (10 times)
• Within-project defect prediction (RQ1):
  10-fold cross-validation
37. Experiment outline
• Cross-project defect prediction (RQ1):
  train the model on nine projects and test on the remaining one (10 times)
• Within-project defect prediction (RQ1):
  10-fold cross-validation
• Local prediction (RQ2):
  K-means clustering algorithm
  Silhouette Coefficient
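The cross-project setup above amounts to leave-one-project-out validation; a minimal sketch (the project names are placeholders, not the actual Promise datasets):

```python
# Train on nine projects, test on the held-out one, repeated for each of
# the 10 datasets.
projects = [f"project_{i}" for i in range(10)]

for held_out in projects:
    training = [p for p in projects if p != held_out]
    # fit the predictor on `training`, evaluate it on `held_out` (not shown)
    assert len(training) == 9
```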
53. Experiment outline
• Multi-objective Logistic Regression and GA Settings (cross-project defect validation):
  Population size = 100
  Max number of generations = 400
  Mutation probability = 0.05
  Crossover function = arithmetic crossover
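These settings could be wired into a GA loop roughly as follows; this is a sketch under the stated settings, and the mutation scale and blending factor alpha are assumptions not given on the slide:

```python
import random

# GA settings from the slide.
POP_SIZE, MAX_GEN, P_MUT = 100, 400, 0.05

def arithmetic_crossover(parent1, parent2, alpha=0.5):
    """Arithmetic crossover: child = alpha*p1 + (1-alpha)*p2, gene by gene."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(parent1, parent2)]

def mutate(chrom, p=P_MUT, scale=0.1):
    """Perturb each gene with probability p (Gaussian noise, assumed scale)."""
    return [g + random.gauss(0, scale) if random.random() < p else g
            for g in chrom]

child = arithmetic_crossover([2.0, 3.0, 4.0], [1.0, -2.0, 1.0])
print(child)  # [1.5, 0.5, 2.5]
```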