Revisiting Common Bug Prediction Findings Using Effort-Aware Models
Yasutaka Kamei, Shinsuke Matsumoto, Akito Monden, Ken-ichi Matsumoto, Bram Adams, Ahmed E. Hassan
Bugs are Everywhere
These Bugs …
 Have expensive consequences for a company
 Damage the company’s reputation
OCTOPUS “Paul”
Correctly predicted the outcome of all 8 German games in the 2010 World Cup
Measure the Source Code…
 Complexity
 Cohesion
 Churn
 . . .
 # Previous bugs
…and Build a Prediction Model
 Statistical or machine learning techniques
Predict a Bug
# BUGS: 2
# BUGS: 7
# BUGS: 0
Do We Consider Effort?
File A: # BUGS: 5, Effort: 1
File B: # BUGS: 6, Effort: 20
Effort-aware Models*
* T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of the European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118.
File A: # BUGS: 5, Effort: 1 → 5 = 5 / 1
File B: # BUGS: 6, Effort: 20 → 0.30 = 6 / 20
Effort is approximated by SLOC.
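The slide's ranking idea can be sketched in a few lines: score each file by predicted bugs per unit of effort (effort approximated by size, as above) and inspect the highest-scoring files first. The file names and values below are the slide's toy example.

```python
# Effort-aware ranking: bugs found per unit of inspection effort.
files = {
    "FileA": {"bugs": 5, "effort": 1},   # small file, 5 predicted bugs
    "FileB": {"bugs": 6, "effort": 20},  # large file, 6 predicted bugs
}

def risk(f):
    """Predicted bugs per unit of effort."""
    return f["bugs"] / f["effort"]

# FileA (5 / 1 = 5.0) outranks FileB (6 / 20 = 0.30) despite fewer bugs.
ranked = sorted(files, key=lambda name: risk(files[name]), reverse=True)
```

A classifier that ignores effort would pick File B (more bugs); the effort-aware ranking flips the order.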
Major Findings in Prediction Studies
 Process metrics are better defect predictors than product metrics (RQ1)
 Package-level prediction has higher precision and recall than file-level prediction (RQ2)
 …
[Schroter2006ISESE], [Zimmermann2007PROMISE], …
[Graves2000TSE], [Nagappan2006ICSE], [Moser2008ICSE], …
Case Study
Platform (5,000 files), JDT (3,000 files), PDE (1,000 files)
Ver. 3.0, 3.1 and 3.2
Cross-release Prediction
Training data (Platform 3.1): measure metrics and bugs, then build a prediction model.
Testing data (Platform 3.2): measure metrics and predict bugs (e.g., BUGS: 5, BUGS: 7, BUGS: 0).
Cumulative Lift Chart
All modules are ordered by decreasing predicted RR(x).
[Figure: cumulative lift chart; x-axis: KSLOC (= effort), y-axis: #bugs. The first 20% of the effort covers 54% of the bugs.]
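The chart's construction can be sketched directly: order modules by decreasing predicted risk, walk down the list accumulating effort and actual bugs, and read off the bug fraction at a given effort budget. The module values below are invented for illustration.

```python
# Cumulative lift: (predicted_risk, ksloc, actual_bugs) per module,
# ordered by decreasing predicted risk before accumulating.
modules = [
    (5.0, 1.0, 4),
    (2.0, 0.5, 1),
    (0.9, 2.0, 2),
    (0.3, 6.5, 1),
]
modules.sort(key=lambda m: m[0], reverse=True)

total_effort = sum(m[1] for m in modules)
total_bugs = sum(m[2] for m in modules)

def bugs_found_at(effort_fraction):
    """Fraction of bugs found when inspecting the riskiest modules
    up to the given fraction of the total effort."""
    budget = effort_fraction * total_effort
    spent = 0.0
    found = 0
    for _, effort, bugs in modules:
        if spent + effort > budget:
            break
        spent += effort
        found += bugs
    return found / total_bugs
```

With these toy values, 20% of the effort already covers 62.5% of the bugs; the deck reports 54% at 20% effort on the real data.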
Research Questions
RQ1: Are process metrics still more effective than product metrics in effort-aware models?
RQ2: Are package-level predictions still more effective than file-level predictions?
RQ1: Process vs. Product Metrics
Compare prediction models based on process and product metrics at the file level.
[Slides: tables of the process metrics and product metrics used]
Model Building Approach
 Linear model
 Regression tree
 Random forest
Process Metrics are still more effective than Product Metrics
[Figure: cumulative lift chart; x-axis: KSLOC (= effort), y-axis: #bugs. At 20% of the effort, the process-metric model finds 74% of the bugs and the product-metric model 29%, a factor of 2.6 (= 74/29).]
Impact of Process and Product Metrics
The top five metrics are all process metrics.
[Figure: random forest variable importance (IncNodePurity) over all process and product metrics; the top five are Revisions, BugFixes, Age, LOCDeleted and Codechurn, all process metrics.]
Research Questions
RQ1: Are process metrics still more effective than product metrics in effort-aware models? YES
RQ2: Are package-level predictions still more effective than file-level predictions?
RQ2: Package-level vs. File-level
Model Building Approach
B1 Package-level metrics
B2 Lift file-level metrics to package-level
B3 Lift file-level predictions to package-level
B1 Package-level Metrics
Compute Martin’s package metrics and build a model directly at package-level: Martin metrics → build a model → package-level predictions.
B2 Lift File-level Metrics to Package-level
Lift the file-level metrics to package-level, then build a model on the lifted metrics: file-level metrics → lift metrics → package-level metrics → build a model → package-level predictions.
Example: Package A contains files a, b and c with complexity 9, 4 and 5; the lifted package-level complexity is 6 = (9 + 4 + 5) / 3.
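The B2 lifting step above (averaging file-level values per package) is a one-liner; the package and file names below follow the slide's toy example.

```python
from statistics import mean

# B2: a package-level metric is the mean of its files' values.
file_complexity = {"file_a": 9, "file_b": 4, "file_c": 5}

# (9 + 4 + 5) / 3 = 6, as on the slide.
package_complexity = mean(file_complexity.values())
```

The same averaging is applied to every file-level metric before the package-level model is trained.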
B3 Lift File-level Predictions to Package-level
Build a model at file-level, then lift the file-level predictions to package-level: file-level metrics → build a model → file-level predictions → lift predictions → package-level predictions.
Example: Package A contains files a, b and c with predicted #bugs 6, 3 and 2 and sizes 1.0, 0.5 and 1.5 KSLOC; these are combined into a single package-level prediction (shown as 2.9 on the slide).
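One plausible way to lift the predictions, sketched here as an assumption (the paper defines the exact lifting formula): sum the files' predicted bugs and divide by their summed size, giving a package-level bug density. The file names and values are the slide's toy example.

```python
# B3 (assumed lifting: summed predicted bugs over summed KSLOC;
# the paper's exact formula may differ).
predicted_bugs = {"file_a": 6, "file_b": 3, "file_c": 2}
ksloc = {"file_a": 1.0, "file_b": 0.5, "file_c": 1.5}

# 11 predicted bugs across 3.0 KSLOC of code in Package A.
package_prediction = sum(predicted_bugs.values()) / sum(ksloc.values())
```

Either way, the key property of B3 is that the model itself is trained at file-level; only its outputs are aggregated.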
Summary of Model Building Approaches
B1 Martin Metrics: package-level metrics → build a model at package-level → package-level predictions
B2 LiftUp(Input): file-level metrics → lift metrics → package-level metrics → build a model at package-level → package-level predictions
B3 LiftUp(Pred): file-level metrics → build a model at file-level → file-level predictions → lift predictions → package-level predictions
Lifting Predictions yields the Best Performance at Package-level
[Figure: cumulative lift chart; x-axis: KSLOC (= effort), y-axis: #bugs. At 20% of the effort: B3 LiftUp(Prediction) finds 62% of the bugs, B2 LiftUp(Input) 57%, and B1 Martin Metrics 19%.]
Impact of Martin Metrics
[Figure: random forest variable importance (IncNodePurity) for Martin’s package metrics (Ce, D, I, NC, Ca, NA, A), process metrics and product metrics; Revisions, SLOC and BugFixes rank highest, while the Martin metrics rank low.]
File-level Predictions are more effective than Package-level
[Figure: cumulative lift chart; x-axis: KSLOC (= effort), y-axis: #bugs. At 20% of the effort, file-level prediction finds 74% of the bugs versus 62% for the best package-level approach, B3 LiftUp(Pred.).]
Research Questions
RQ1: Are process metrics still more effective than product metrics in effort-aware models? YES
RQ2: Are package-level predictions still more effective than file-level predictions? NO
Why is RQ2 not Supported?
The larger the package, the more likely a bug is introduced, so package-level aggregation hides small, bug-dense files.
Package-level: # BUGS: 8, SLOC: 20. File-level: # BUGS: 2, SLOC: 0.5.
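The slide's point as quick arithmetic: the small file's bug density is ten times the package's, so an effort-aware ranking would inspect the file first, which package-level aggregation cannot express. The numbers are the slide's toy values.

```python
# Bug density comparison from the slide's example.
package = {"bugs": 8, "sloc": 20}
small_file = {"bugs": 2, "sloc": 0.5}

def density(m):
    """Bugs per unit of code."""
    return m["bugs"] / m["sloc"]

# density(package) = 0.4, density(small_file) = 4.0: ten times denser.
```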
Backup slides: Reference, Studied Projects, SZZ Algorithm, Example of Counting the Number of Bugs [timeline figures showing bug introduction and bug fix relative to the v3.0, v3.1 and v3.2 releases], Martin’s Package Metrics, Product Metrics, and Used Metrics (process metrics, product metrics and Martin’s package metrics).

Contenu connexe

Similaire à Icsm2010 kamei

ProDebt's Lessons Learned from Planning Technical Debt Strategically
ProDebt's Lessons Learned from Planning Technical Debt StrategicallyProDebt's Lessons Learned from Planning Technical Debt Strategically
ProDebt's Lessons Learned from Planning Technical Debt StrategicallyQAware GmbH
 
Introduction to goodenuffR
Introduction to goodenuffRIntroduction to goodenuffR
Introduction to goodenuffRMartinFrigaard
 
Fse2012 shihab
Fse2012 shihabFse2012 shihab
Fse2012 shihabSAIL_QU
 
How much time it takes for my feature to arrive?
How much time it takes for my feature to arrive?How much time it takes for my feature to arrive?
How much time it takes for my feature to arrive?Daniel Alencar
 
Icsme14danieletal 150722141344-lva1-app6891
Icsme14danieletal 150722141344-lva1-app6891Icsme14danieletal 150722141344-lva1-app6891
Icsme14danieletal 150722141344-lva1-app6891SAIL_QU
 
Esem2010 shihab
Esem2010 shihabEsem2010 shihab
Esem2010 shihabSAIL_QU
 
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experiments
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experimentsSFScon 2020 - Hlib Babii - DVC version control your datasets and ML experiments
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experimentsSouth Tyrol Free Software Conference
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Martin Pinzger
 
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...Michael Gallo
 
se01.ppt
se01.pptse01.ppt
se01.pptxiso
 
PhD public defense: A Measurement Framework for Analyzing Technical Lag in ...
PhD public defense: A Measurement Framework for  Analyzing Technical Lag in  ...PhD public defense: A Measurement Framework for  Analyzing Technical Lag in  ...
PhD public defense: A Measurement Framework for Analyzing Technical Lag in ...Ahmed Zerouali
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Feng Zhang
 
Estimating Security Risk Through Repository Mining
Estimating Security Risk Through Repository MiningEstimating Security Risk Through Repository Mining
Estimating Security Risk Through Repository MiningTamas K Lengyel
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...SAIL_QU
 
Icse2013 shang
Icse2013 shangIcse2013 shang
Icse2013 shangSAIL_QU
 
Otto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialOtto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialTEST Huddle
 

Similaire à Icsm2010 kamei (20)

ProDebt's Lessons Learned from Planning Technical Debt Strategically
ProDebt's Lessons Learned from Planning Technical Debt StrategicallyProDebt's Lessons Learned from Planning Technical Debt Strategically
ProDebt's Lessons Learned from Planning Technical Debt Strategically
 
Introduction to goodenuffR
Introduction to goodenuffRIntroduction to goodenuffR
Introduction to goodenuffR
 
Fse2012 shihab
Fse2012 shihabFse2012 shihab
Fse2012 shihab
 
The Knowledgeable Software Engineer
The Knowledgeable Software EngineerThe Knowledgeable Software Engineer
The Knowledgeable Software Engineer
 
How much time it takes for my feature to arrive?
How much time it takes for my feature to arrive?How much time it takes for my feature to arrive?
How much time it takes for my feature to arrive?
 
Icsme14danieletal 150722141344-lva1-app6891
Icsme14danieletal 150722141344-lva1-app6891Icsme14danieletal 150722141344-lva1-app6891
Icsme14danieletal 150722141344-lva1-app6891
 
Cocomomodel
CocomomodelCocomomodel
Cocomomodel
 
Cocomo model
Cocomo modelCocomo model
Cocomo model
 
COCOMO Model
COCOMO ModelCOCOMO Model
COCOMO Model
 
Esem2010 shihab
Esem2010 shihabEsem2010 shihab
Esem2010 shihab
 
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experiments
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experimentsSFScon 2020 - Hlib Babii - DVC version control your datasets and ML experiments
SFScon 2020 - Hlib Babii - DVC version control your datasets and ML experiments
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)
 
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...Simple is Not Necessarily Better:  Why Software Productivity Factors Can Lead...
Simple is Not Necessarily Better: Why Software Productivity Factors Can Lead...
 
se01.ppt
se01.pptse01.ppt
se01.ppt
 
PhD public defense: A Measurement Framework for Analyzing Technical Lag in ...
PhD public defense: A Measurement Framework for  Analyzing Technical Lag in  ...PhD public defense: A Measurement Framework for  Analyzing Technical Lag in  ...
PhD public defense: A Measurement Framework for Analyzing Technical Lag in ...
 
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
Cross-project Defect Prediction Using A Connectivity-based Unsupervised Class...
 
Estimating Security Risk Through Repository Mining
Estimating Security Risk Through Repository MiningEstimating Security Risk Through Repository Mining
Estimating Security Risk Through Repository Mining
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
Icse2013 shang
Icse2013 shangIcse2013 shang
Icse2013 shang
 
Otto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialOtto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement Potential
 

Plus de SAIL_QU

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...SAIL_QU
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...SAIL_QU
 
Improving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsImproving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsSAIL_QU
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...SAIL_QU
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...SAIL_QU
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...SAIL_QU
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...SAIL_QU
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...SAIL_QU
 
Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?SAIL_QU
 
Towards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesTowards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesSAIL_QU
 
The Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesSAIL_QU
 
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...SAIL_QU
 
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...SAIL_QU
 
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...SAIL_QU
 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...SAIL_QU
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...SAIL_QU
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?SAIL_QU
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...SAIL_QU
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsSAIL_QU
 
On the Unreliability of Bug Severity Data
On the Unreliability of Bug Severity DataOn the Unreliability of Bug Severity Data
On the Unreliability of Bug Severity DataSAIL_QU
 

Plus de SAIL_QU (20)

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
Improving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsImproving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load tests
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...
 
Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?
 
Towards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesTowards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log Changes
 
The Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution Analyses
 
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
 
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
 
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
 
On the Unreliability of Bug Severity Data
On the Unreliability of Bug Severity DataOn the Unreliability of Bug Severity Data
On the Unreliability of Bug Severity Data
 

Icsm2010 kamei

  • 1. Yasutaka Kamei Shinsuke Matsumoto Akito Monden Ken-ichi Matsumoto Bram Adams Ahmed E. Hassan Revisiting Common Bug Prediction Findings Using Effort-Aware Models
  • 3. These Bugs …  Have expensive consequences for a company  Affect its reputation 2
  • 4. OCTOPUS “Paul” 3 Correctly predicts win-loss of all 8 German games in World Cup 2010
  • 5. Measure the Source Code…  Complexity  Cohesion  Churn  . . .  # Previous bugs 4
  • 6. …and Build a Prediction Model 5  Statistical or  Machine learning techniques
  • 9. Predict a Bug 8 # BUGS: 2 # BUGS: 7 # BUGS: 0
  • 10. Predict a Bug 9 # BUGS: 2 # BUGS: 7 # BUGS: 0
  • 11. Do We Consider Effort? 10
  • 12. Do We Consider Effort? 11 File A File B
  • 13. Do We Consider Effort? 12 # BUGS: 5 Effort : 1 File A File B # BUGS: 6 Effort : 20
  • 14. Do We Consider Effort? 13 File A File B # BUGS: 5 Effort : 1 # BUGS: 6 Effort : 20
  • 15. Effort-aware Models* * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118. 14
  • 16. Effort-aware Models* * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118. 15 # BUGS: 5 Effort : 1 # BUGS: 6 Effort : 20 File A File B
  • 17. Effort-aware Models* * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118. 16 5 = 5 / 1 0.30 = 6 / 20 # BUGS: 5 Effort : 1 # BUGS: 6 Effort : 20 File A File B
  • 18. Effort-aware Models* * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118. 17 5 = 5 / 1 0.30 = 6 / 20 # BUGS: 5 Effort : 1 # BUGS: 6 Effort : 20 File A File B
  • 19. Effort-aware Models* * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118. 18 # BUGS: 5 Effort : 1 # BUGS: 6 Effort : 20 5 = 5 / 1 0.30 = 6 / 20 SLOC File A File B
  • 20. Major Findings in Prediction Studies  Process metrics are better defect predictors than product metrics  Package-level prediction has higher precision and recall than file-level prediction  … 19 [Schroter2006ISESE], [Zimmermann2007PROMISE], … [Graves2000TSE], [Nagappan2006ICSE], [Moser2008ICSE], …
  • 21. Major Findings in Prediction Studies  Process metrics are better defect predictors than product metrics  Package-level prediction has higher precision and recall than file-level prediction  … 20 RQ1 RQ2 [Schroter2006ISESE], [Zimmermann2007PROMISE], … [Graves2000TSE], [Nagappan2006ICSE], [Moser2008ICSE], …
  • 22. Case Study Platform (5,000 files) JDT (3,000 files) PDE (1,000 files) Ver. 3.0, 3.1 and 3.2 21
  • 23. Case Study Platform (5,000 files) JDT (3,000 files) PDE (1,000 files) Ver. 3.0, 3.1 and 3.2 22
  • 25. Cross-release Prediction 24 Build a prediction model Time Platform 3.1 Platform 3.2 Measure Metrics Bugs Training data
  • 26. Cross-release Prediction 25 BUGS : 5 BUGS : 7 BUGS : 0 Build a prediction model Predict a bug Time Platform 3.1 Platform 3.2 Measure Metrics Bugs Measure Metrics Bugs Training data Testing data
  • 27. Cumulative Lift Chart. All modules are ordered by decreasing predicted RR(x). 26 [chart: cumulative #Bugs vs. KSLOC (= effort)]
  • 28. Cumulative Lift Chart. All modules are ordered by decreasing predicted RR(x). 27 [chart: cumulative #Bugs vs. KSLOC (= effort); with 20% of the effort, 54% of all bugs are detected]
  • 29. Research Questions RQ1: Are process metrics still more effective than product metrics in effort-aware models? RQ2: Are package-level predictions still more effective than file-level predictions? 28
  • 30. Research Questions RQ1: Are process metrics still more effective than product metrics in effort-aware models? RQ2: Are package-level predictions still more effective than file-level predictions? 29
  • 31. RQ1: Process vs. Product Metrics Compare prediction models based on process and product metrics at the file-level 30 Process metrics Product metrics
  • 34. Model Building Approach  Linear model  Regression tree  Random forest 33
  • 35. Model Building Approach  Linear model  Regression tree  Random forest 34
  • 36. Process Metrics are still more effective than Product Metrics 35 [chart: cumulative #Bugs vs. KSLOC (= effort); at 20% of the effort, process metrics detect 74% of all bugs vs. 29% for product metrics]
  • 37. Process Metrics are still more effective than Product Metrics 36 [chart: at 20% of the effort, 74% vs. 29%, a factor of 2.6 (= 74/29)]
  • 38. Impact of Process and Product Metrics. The top five metrics are all process metrics 37 [random-forest variable importance chart (IncNodePurity); process metrics such as Revisions, BugFixes, Age, LOCDeleted, Codechurn and LOCAdded rank above the product metrics; legend: Process Metrics, Product Metrics]
  • 39. Research Questions RQ1: Are process metrics still more effective than product metrics in effort-aware models? RQ2: Are package-level predictions still more effective than file-level predictions? 38 YES
  • 40. RQ2: Package-level vs. File-level Package-level File-level 39
  • 41. Model Building Approach 40 B1 Package-level metrics B2 Lift file-level metrics to package-level B3 Lift file-level predictions to package-level
  • 42. B1 Package-level Metrics RQ2: Model Building Approach 41 Martin metrics Build a model Package-level predictions
  • 44. B1 Package-level Metrics RQ2: Model Building Approach 43 Martin metrics Build a model Package-level predictions
  • 45. Model Building Approach 44 B1 Package-level metrics B2 Lift file-level metrics to package-level B3 Lift file-level predictions to package-level
  • 46. B2 Lift File-level Metrics to Package-level 45 RQ2: Model Building Approach Lift Metrics File-level metrics Package-level metrics
  • 47. B2 Lift File-level Metrics to Package-level 46 RQ2: Model Building Approach Package A File a File b File c Lift Metrics File-level metrics Package-level metrics
  • 48. B2 Lift File-level Metrics to Package-level 47 RQ2: Model Building Approach Package A File a File b File c Complexity: 9 4 5 6 = 9 + 4 + 5 3 Lift Metrics File-level metrics Package-level metrics
  • 49. B2 Lift File-level Metrics to Package-level 48 RQ2: Model Building Approach Lift Metrics File-level metrics Package-level metrics
  • 50. B2 Lift File-level Metrics to Package-level 49 RQ2: Model Building Approach Lift Metrics File-level metrics Package-level metrics Build a model Package-level predictions
  • 51. Model Building Approach 50 B1 Package-level metrics B2 Lift file-level metrics to package-level B3 Lift file-level predictions to package-level
  • 52. B3 Lift File-level Predictions to Package-level 51 RQ2: Model Building Approach Lift Predictions Build a model File-level predictions File-level metrics
  • 53. B3 Lift File-level Predictions to Package-level 52 RQ2: Model Building Approach Lift Predictions Build a model File-level predictions File-level metrics
  • 54. B3 Lift File-level Predictions to Package-level 53 RQ2: Model Building Approach Package A File a File b File c #bugs 6 3 2 Lift Predictions
  • 55. B3 Lift File-level Predictions to Package-level 54 RQ2: Model Building Approach Package A File a File b File c #bugs 6 3 2 KSLOC: 1.0 0.5 1.5 Lift Predictions
  • 56. B3 Lift File-level Predictions to Package-level 55 RQ2: Model Building Approach Package A File a File b File c #bugs 6 3 2 KSLOC: 1.0 0.5 1.5 2.9 = (6+3+2) / (1.0+0.5+1.5) Lift Predictions
  • 57. B3 Lift File-level Predictions to Package-level 56 RQ2: Model Building Approach Build a model File-level predictions Lift Predictions Package-level predictions File-level metrics
  • 58. Summary of Model Building Approaches 57 RQ2: Model Building Approach Package-level metrics Build a model at Package-level Package-level predictions B1 Martin Metrics
  • 59. Summary of Model Building Approaches 58 RQ2: Model Building Approach Package-level metrics File-level metrics Build a model at Package-level Package-level predictions Lift MetricsB2 LiftUp(Input) B1 Martin Metrics
  • 60. Summary of Model Building Approaches 59 RQ2: Model Building Approach Package-level metrics File-level metrics Build a model at Package-level Package-level predictions Build a model at File-level File-level predictions Lift Predictions Lift MetricsB2 LiftUp(Input) B3 LiftUp(Pred) B1 Martin Metrics
  • 61. Lifting Predictions yields the Best Performance at Package-level 60 [chart: cumulative #Bugs vs. KSLOC (= effort); at 20% of the effort, B3 LiftUp(Prediction) detects 62% of all bugs]
  • 62. Lifting Predictions yields the Best Performance at Package-level 61 [chart: at 20% of the effort, B3 LiftUp(Prediction) 62% vs. B2 LiftUp(Input) 57%]
  • 63. Lifting Predictions yields the Best Performance at Package-level 62 [chart: at 20% of the effort, B3 LiftUp(Prediction) 62%, B2 LiftUp(Input) 57%, B1 Martin Metrics 19%]
  • 64. Impact of Martin Metrics 63 [random-forest variable importance chart; legend: Martin Metrics (Ce, D, I, NC, Ca, NA, A), Process Metrics (Refactorings, Revisions, BugFixes, LOCDeleted, Codechurn, LOCAdded, Age), Product Metrics]
  • 65. File-level Predictions are more effective than Package-level 64 [chart: at 20% of the effort, file-level predictions detect 74% of all bugs vs. 62% for package-level B3 LiftUp(Pred.)]
  • 66. Research Questions RQ1: Are process metrics still more effective than product metrics in effort-aware models? RQ2: Are package-level predictions still more effective than file-level predictions? 65 YES NO
  • 67. Why is RQ2 not Supported? 66 Package-level File-level
  • 68. Why is RQ2 not Supported? The larger the package, the more likely a bug is introduced. 67 Package-level File-level # BUGS: 8 SLOC : 20 # BUGS: 2 SLOC : 0.5
  • 76. Example of Counting the Number of Bugs 75 [timeline figure: bug introduction and bug fix events for files A and B across the v3.0, v3.1 and v3.2 releases]
  • 79. Used Metrics 78 Process metrics Product metrics Martin’s Package Metrics

Editor's notes

  1. Hello, my name is Yasutaka Kamei. I'm a postdoc at Queen's University, Canada. The title of my talk is “Revisiting Common Bug Prediction Findings Using Effort-Aware Models”. We revisit some of the major findings in the bug prediction literature by taking into account the cost of additional software quality assurance.
  2. Bugs are everywhere: in laptop applications, on the iPhone, on YouTube, and so on. Sometimes, just as we are writing a research paper, a laptop suddenly shuts down.
  3. Software bugs in released products have expensive consequences for a company. For example, software field defects cost the U.S. economy an estimated $60 billion annually. These bugs also seriously hurt a company's reputation. Unfortunately, a company has only limited developers and limited time for software quality assurance activities. What can we do?
  4. Please ask Paul! He is not just an octopus but a super octopus: he correctly predicted the win-loss outcome of all 8 German games in World Cup 2010, so he will help us predict bugs. But we have one piece of bad news: the lifespan of an octopus is only about three years, so we will need another solution in one or two years.
  5. Another approach is fault prediction techniques. Many studies show that these techniques can be effective for software quality assurance. When we use these techniques, we first measure source code metrics: for example, complexity, cohesion, churn and past bugs.
  6. Then, we build a prediction model using statistical or machine learning techniques.
  7. Then, given new files, we measure their source code metrics. [click] We predict the number of bugs in each file using the prediction model. [click] We can allocate our limited testing or reviewing effort to the middle file, because its predicted number of bugs is the largest.
  8. Then, given new files, we measure their source code metrics. [click] We predict the number of bugs in each file using the prediction model. [click] We can allocate our limited testing or reviewing effort to the middle file, because its predicted number of bugs is the largest.
  9. Then we predict the number of bugs in each file using this prediction model.
  10. We can allocate our limited testing or reviewing effort to the middle file, because its predicted number of bugs is the largest.
  11. But do we consider effort? We should always take into account the effort needed to review a file to detect a bug. [click] Do a small file and a large file require the same amount of effort to review? We don't think so, and we should therefore consider the effectiveness of reviewing files in terms of effort. [click] For example, we get predicted numbers of bugs of 5 and 6. [click] And we need 1 unit of effort for one file and 20 for the other. In terms of effectiveness, we should review file A, but previous prediction approaches tell us to allocate more effort to file B.
  12. Do a small file and a large file require the same amount of effort to review? We don't think so, and we should therefore consider the effectiveness of reviewing files in terms of effort. [click]
  13. For example, we get predicted numbers of bugs of 5 and 6. [click]
  14. And we need 1 unit of effort for file A and 20 for file B. In terms of effectiveness, we should review file A, but previous prediction approaches tell us to allocate more effort to file B, because the number of bugs in file B is larger.
  15. To solve this problem, effort-aware models have been proposed. This equation expresses how many bugs we can detect per unit of effort, and we use it as the dependent variable when building a prediction model, instead of the number of bugs. [click] We show an example using this case. [click] We can finish a review of file A with 1 unit of effort and detect 5 bugs; on the other hand, we need 20 units of effort to detect 6 bugs in file B, so its relative risk is just 0.30. We believe this result is more intuitive. [click] In this study, we consider the number of lines of code as the effort. This effort-aware prediction model offers a totally new interpretation and a practical view of bug prediction results.
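The relative-risk ranking this note describes can be sketched in a few lines of Python. The file names and numbers mirror the File A / File B example from the slides, and effort is taken as KSLOC as the talk states; the dictionary layout is just one illustrative choice.

```python
# Effort-aware ranking: relative risk RR(x) = predicted #bugs / effort.
# Numbers mirror the File A / File B example; effort is measured in KSLOC.
files = {
    "File A": {"bugs": 5, "effort": 1},
    "File B": {"bugs": 6, "effort": 20},
}

def relative_risk(f):
    return f["bugs"] / f["effort"]

# File A (RR = 5.0) outranks File B (RR = 0.3), even though
# File B has more predicted bugs.
ranked = sorted(files, key=lambda name: relative_risk(files[name]), reverse=True)
print(ranked)  # ['File A', 'File B']
```

Ranking by RR rather than by raw bug counts is exactly what reverses the review order of the two files in the slide's example.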
  16. We show an example using this case.
  17. We can finish a review of file A with 1 unit of effort and detect 5 bugs. On the other hand, we need 20 units of effort to detect 6 bugs in file B, so the relative risk is just 0.30. We believe this result is more intuitive.
  18. These effort-aware models indicate that we should allocate more effort to file A. We believe this result is more intuitive.
  19. There are many ways to measure effort. One easy starting point is to consider the lines of code as the effort. This effort-aware prediction model offers a totally new interpretation and a practical view of bug prediction results.
  20. We'd like to revisit some of the major findings in the prediction literature when considering effort. Among the major findings in prediction studies: process metrics, which are measured from the version control system, are better defect predictors than product metrics; and package-level prediction has been shown to have higher precision and recall than file-level prediction. We revisit these two major findings.
  21. The target of our study is three subprojects of the Eclipse software system, one of the well-known open development platforms. Each subproject is pretty large: they have 5,000 files, 3,000 files and 1,000 files. [click] In this presentation, we show the results for Platform, because it is the largest of the three projects.
  22. In this presentation, we show the results for Platform, because it is the largest of the three projects.
  23. We perform a cross-release prediction analysis to study the performance in our experiments. [click] We build a prediction model using a training dataset collected from a past release of a project, [click] and evaluate its performance using a test dataset collected from the following release. This cross-release setup leads to an evaluation in a more practical setting. [click]
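The cross-release setup can be sketched as follows. This is a minimal illustration only: a one-variable least-squares fit stands in for the talk's actual models (linear model, regression tree, random forest built in R), and the (revisions, bugs) pairs are invented.

```python
# Cross-release prediction sketch: fit on release 3.1, predict on 3.2.
# Training data: (revisions, bugs) pairs from the "past" release (invented).
train = [(1, 0), (3, 2), (5, 4), (8, 7)]
test_metrics = [2, 6, 10]  # revisions measured in the "following" release

# Closed-form least squares for y = slope * x + intercept.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

# Apply the model trained on the old release to the new release's metrics.
predicted_bugs = [slope * x + intercept for x in test_metrics]
print(predicted_bugs)  # [1.0, 5.0, 9.0]
```

The point is the split, not the model: nothing from the test release is used during fitting, which is what makes the evaluation practical.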
  24. We evaluate the prediction performance using a cumulative lift chart. We order all files by decreasing predicted fault density. [click] Then we calculate what percentage of all bugs we detect when we use 20% of the effort, where 100% of the effort means the effort it takes to test all files. This example shows that we can detect 54% of all bugs.
  25. We calculate what percentage of all bugs we detect when we use 20% of the effort, where 100% of the effort means the effort it takes to test all files. This example shows that we can detect 54% of all bugs.
  26. So we are interested in addressing the following two research questions. Q1: process > product. Q2: file level > package level.
  27. So we are interested in addressing the following two research questions
  28. Let me talk about research question 1: process metrics versus product metrics. We compare models based on product metrics to models based on process metrics at the file level.
  29. As product metrics, we measured these metrics using the Eclipse Metrics plug-in.
  30. As process metrics, we measured seven kinds of metrics, such as code churn, the number of revisions of a file, and the number of times a file has been refactored. We wrote a script that obtains these measures from the version control system. As product metrics, we measured the product metrics using the Eclipse Metrics plug-in.
  31. We use three modeling techniques (linear model, regression tree and random forest) to build the prediction models. Our environment is R.
  32. We use three modeling techniques (linear model, regression tree and random forest) to build the prediction models. Our environment is R.
  33. I'll show the result of process metrics versus product metrics. The solid line shows the result for process metrics and the dashed line the result for product metrics. Using bug predictions based on process metrics, we only need to spend 20% of the effort to detect up to 74% of all faults; using product metrics, we detect only up to 29% of all faults. Process metrics outperform product metrics by a factor of 2.6 (= 74/29) when considering effort.
  34. To illustrate the impact of product metrics and process metrics, we build a random forest using both process and product metrics and use the IncNodePurity in the output. A higher IncNodePurity means that a variable plays a more important role in the built prediction model. This figure shows the IncNodePurity of the metrics, sorted decreasingly from top to bottom; process metrics are colored red. The top five metrics are all process metrics (Revisions, BugFixes, Age, LOCDeleted and Codechurn), and the lines of code added is ranked sixth. Only the number of refactorings is not important.
  35. Next, we talk about research question 2.
  36. Package-level vs. file-level: we compare package-level predictions to file-level predictions.
  37. When building a file-level prediction model, product metrics and process metrics can be used as-is. However, since a package consists of multiple files, we either need to lift the file-level metrics up to the package level or use special package-level metrics.
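The lifting of file-level metrics (approach B2) can be sketched as an average over a package's files, which is the slide's worked example: complexity (9 + 4 + 5) / 3 = 6. The dictionary names are illustrative, not from the paper.

```python
# B2 sketch: lift a file-level metric to the package level by averaging
# over the package's files (the slide's example: (9 + 4 + 5) / 3 = 6).
package_a = {
    "File a": {"complexity": 9},
    "File b": {"complexity": 4},
    "File c": {"complexity": 5},
}

def lift_metric(package, metric):
    values = [f[metric] for f in package.values()]
    return sum(values) / len(values)

print(lift_metric(package_a, "complexity"))  # 6.0
```

The lifted values then feed a package-level model in place of specially designed package metrics such as the Martin suite.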
  38. Martin's package design metrics indicate the instability and abstractness of packages, based on measures such as the number of abstract and concrete classes and the fan-in and fan-out of packages. To our knowledge, no study has reported the effects of the Martin metrics on bug prediction.
  39. We show an overview of the model building approaches. When we use Martin Metrics, we measure the metrics and build a model at the package level. When we use LiftUp(Input), we lift the metrics up to the package level. When we use LiftUp(Prediction), we lift the prediction results up to the package level.
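The third approach, LiftUp(Prediction) (B3), aggregates file-level outputs after prediction. A plausible sketch is to pool predicted bugs over pooled effort (sum of predicted bugs divided by sum of KSLOC); the numbers here are invented, and the exact aggregation in the talk may differ in detail.

```python
# B3 sketch: lift file-level predictions to a package-level bug density
# as total predicted bugs / total KSLOC. Data are invented for illustration.
predictions = {
    "File a": {"bugs": 4.0, "ksloc": 1.0},
    "File b": {"bugs": 2.0, "ksloc": 1.0},
    "File c": {"bugs": 2.0, "ksloc": 2.0},
}

def package_density(files):
    total_bugs = sum(f["bugs"] for f in files.values())
    total_ksloc = sum(f["ksloc"] for f in files.values())
    return total_bugs / total_ksloc  # bugs per KSLOC for the whole package

print(package_density(predictions))  # 2.0
```

Because the model itself stays at the file level, B3 keeps the fine-grained signal and only coarsens the output, which is consistent with it performing best among the package-level approaches.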
  40. None of the Martin metrics show up in the top five metrics; two process and three product metrics do. One of the Martin metrics is ranked in the top 10, but the others are not important for building a model.
  41. Why is RQ2 not supported? [click] Normally, a package is larger and has more bugs than a file. If we don't consider effort, it is easier to detect a bug in a package, because the larger the package, the more likely a bug has been introduced. That is why, in previous work, package-level prediction has higher precision and recall than file-level prediction.
  42. Normally, a package is larger and has more bugs than a file. If we don't consider effort, it is easier to detect a bug in a package, because the larger the package, the more likely a bug has been introduced. That is why, in previous work, package-level prediction has higher precision and recall than file-level prediction.
  43. In this paper, we use effort-aware models to revisit some of the major findings in the prediction literature. RQ1 is process metrics versus product metrics; RQ2 is package-level prediction versus file-level prediction. We introduce three building approaches for package-level prediction. To study the two research questions, we evaluate prediction performance using datasets from the Eclipse project. We find that the first finding holds when effort is considered, while the second does not.
  44. To study the two research questions, we evaluate prediction performance using datasets from the Eclipse project. We find that the first finding holds when effort is considered, while the second does not.
  45. List of questions: Why do you use the number of lines of code as the effort to test files? Package-level prediction has some characteristics that differ from file-level prediction. Do you use bug density as the dependent variable? If yes, what is the new point of your research? How do you collect bug information from CVS?
  46. Martin's package design metrics indicate the instability and abstractness of packages, such as the number of classes inside a package that depend on classes outside it and the number of abstract classes.
  47. As process metrics, we measured seven kinds of metrics, such as code churn, the number of revisions of a file, and the number of times a file has been refactored. We wrote a script that obtains these measures from the version control system. As product metrics, we measured the product metrics using the Eclipse Metrics plug-in.
  48. For package-level predictions, in addition to product metrics and process metrics, we use the metrics suite proposed by Martin.