1. ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems
Kenichi Kobayashi (Fujitsu Laboratories)
Akihiko Matsuo (Fujitsu Laboratories)
Katsuro Inoue (Osaka University)
Yasuhiro Hayase (University of Tsukuba)
Manabu Kamimura (Fujitsu Laboratories)
Toshiaki Yoshino (Fujitsu)
2. Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
ICSM2011 @ Williamsburg, 2011-09-27 1 Copyright 2011 FUJITSU LABORATORIES LIMITED
3. Background
Fault prediction in maintenance is a difficult task, and product metrics alone do not give sufficient predictive performance.
Product metrics are metrics extracted from the software product, such as source code.
Therefore, process metrics, such as code churn and logical coupling, have been combined with product metrics.
Process metrics are metrics extracted from the software process, such as change histories.
Practitioners’ Point of View
However, in enterprise maintenance settings, documents, change histories, bug reports, and specialists’ knowledge are often lost, out of date, or unusable.
4. Goals
Problem
Process metrics cannot always be obtained.
Motivation
To achieve high predictive performance using only product metrics extractable from source code
Goals
To define a new product metric
To show the effectiveness of the metric
5. Basic Idea
Software dependency is one of the factors that allow faults to survive even after release.
We assumed that change impact analysis enables us to extract implicit dependencies.
Change Impact Analysis
A technique to identify the areas affected when some part of the software is changed.
(Figure: a change propagates through implicit dependencies to several modules that need a fix; in one of them the fix is forgotten, a missed fix.)
Weakness: high computational cost.
However, we need not identify the affected areas themselves; we only need to know their scale.
Hypothesis
A metric that quantifies the scale of change impact, ImpactScale (abbrev. IS), can improve the performance of fault prediction.
6. Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
7. Overview of ImpactScale Definition
Dependencies are extracted from the target software, and a Propagation Graph is built.
Propagation Model
• Probabilistic propagation
• Relation-sensitive propagation
ImpactScale is the sum of all quantities of change impact.
(Figure: a Propagation Graph of code nodes and data nodes linked by dependencies; a change at code node C propagates a quantity of change impact from C to A.)
9. Propagation Graph
① Build a dependency graph extracted from the target software.
② Add reverse edges to build the Propagation Graph.
(Figure: 《Dependency Graph》 and the resulting 《Propagation Graph》.)
Code nodes: module, class, function, source code
Data nodes: DB table, global variable
Dependency edges, each with a relation type: CALL, READ, WRITE
Change impact analysis for ImpactScale is performed on the Propagation Graph.
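Steps ① and ② can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge-list input format, the `build_propagation_graph` name, and the "~" prefix that marks reverse edges (so later steps can still see the relation type and direction of each hop) are all assumptions of the sketch.

```python
from collections import defaultdict

# Hypothetical representation: a dependency edge is (source, target, relation),
# with relation one of "CALL", "READ", "WRITE" as on the slide.
Edge = tuple[str, str, str]


def build_propagation_graph(edges: list[Edge]) -> dict[str, list[tuple[str, str]]]:
    """Return {node: [(neighbor, relation), ...]} containing every dependency
    edge plus one reverse edge per dependency edge.

    Reverse edges are tagged with a "~" prefix (e.g. "~CALL" means
    callee -> caller); this tagging convention is an assumption of the sketch.
    """
    graph = defaultdict(list)
    for src, dst, rel in edges:
        graph[src].append((dst, rel))        # step 1: original dependency edge
        graph[dst].append((src, "~" + rel))  # step 2: added reverse edge
    return dict(graph)


# Module A calls B, B writes table T, and C reads T.
g = build_propagation_graph([("A", "B", "CALL"), ("B", "T", "WRITE"), ("C", "T", "READ")])
print(g["T"])  # [('B', '~WRITE'), ('C', '~READ')]
```

Keeping the relation type on each reverse edge is what makes relation-sensitive propagation possible on the same graph.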
10. Probabilistic Propagation
We assume that change impact propagates probabilistically from one node to another, as in several ripple-effect studies [Hanny72] [Tsantalis05] [Sharafat07].
(Figure: starting from the changed node, the quantity of change impact is multiplied by the propagation probability, ×0.5, at each hop.)
In this presentation, the propagation probability is always 0.5.
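A minimal sketch of probabilistic propagation with p = 0.5. The slides do not say how quantities arriving over several paths are combined, so this sketch assumes the quantity reaching a node is p raised to its shortest hop distance from the changed node. The function name `impact_scale` and the `max_distance` parameter (a path-finding limit) are hypothetical. Relation types are ignored here.

```python
from collections import deque


def impact_scale(graph, changed, p=0.5, max_distance=None):
    """Sketch of ImpactScale for a single changed node (relation types ignored).

    Assumption (not from the slides): the quantity of change impact that
    reaches a node is p ** d, where d is the node's shortest hop distance from
    the changed node; ImpactScale sums these quantities over all other nodes.
    graph maps each node to its neighbors in the propagation graph.
    """
    dist = {changed: 0}
    queue = deque([changed])
    while queue:                      # breadth-first search from the change
        node = queue.popleft()
        if max_distance is not None and dist[node] >= max_distance:
            continue                  # do not search beyond the distance limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sum(p ** d for node, d in dist.items() if node != changed)


# Chain C - B - A (reverse edges already added): 0.5 for B plus 0.25 for A.
chain = {"C": ["B"], "B": ["C", "A"], "A": ["B"]}
print(impact_scale(chain, "C"))  # 0.75
```

Because p < 1, distant nodes contribute geometrically less, which keeps the metric bounded relative to the size of the reachable region.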
11. Relation-sensitive Propagation
To avoid overestimation, we use context information to eliminate unlikely propagation.
To keep computation time low, we use an edge’s relation type as minimal context information.
Cut Rules determine whether propagation from the current node to its next node is cut, by referring to the relation types of the previous edge and the next edge.
(Figure: previous node → current node → next node; the Cut Rule refers to the previous relation type and the next relation type.)
We call such controlled propagation relation-sensitive propagation.
The computational complexity is low in practice.
12. Example of Cut Rules
(Figure: two examples of propagation, one from a change at node “C” and one from a change at node “F”.)
Cut Rule 1: While finding callees, don’t find callers.
Cut Rule 2: While finding callers, don’t find callees.
Cut Rule 3: Don’t find beyond READ edges.
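Relation-sensitive propagation can be sketched as a breadth-first search whose state is the pair (node, relation of the edge used to reach it), because each Cut Rule looks at the previous and next relation types. This is a hypothetical encoding, not the authors' rule set: "CALL" means caller to callee and "~CALL" callee to caller (likewise for READ and WRITE). Cut Rule 3 is read here as "a node reached over a READ edge is counted but not expanded further". The p ** shortest-distance quantity is also an assumption.

```python
from collections import deque


def is_cut(prev_rel, next_rel):
    """Hypothetical encoding of the three example Cut Rules.

    "CALL" goes caller -> callee (finding callees) and "~CALL" goes
    callee -> caller (finding callers); "READ"/"~READ" and "WRITE"/"~WRITE"
    follow the same convention. prev_rel is None on the first hop.
    """
    if prev_rel == "CALL" and next_rel == "~CALL":
        return True   # Cut Rule 1: while finding callees, don't find callers
    if prev_rel == "~CALL" and next_rel == "CALL":
        return True   # Cut Rule 2: while finding callers, don't find callees
    if prev_rel in ("READ", "~READ"):
        return True   # Cut Rule 3 (one reading): don't go beyond a READ edge
    return False


def relation_sensitive_impact_scale(graph, changed, p=0.5, max_distance=None):
    """graph: {node: [(neighbor, relation), ...]} propagation graph.

    The search runs over (node, previous relation) states because whether a
    hop is allowed depends on the edge used to reach the current node.
    """
    best = {changed: 0}                  # shortest allowed distance per node
    seen = {(changed, None)}
    queue = deque([(changed, None, 0)])
    while queue:
        node, prev_rel, d = queue.popleft()
        if max_distance is not None and d >= max_distance:
            continue
        for nxt, rel in graph.get(node, []):
            if is_cut(prev_rel, rel) or (nxt, rel) in seen:
                continue
            seen.add((nxt, rel))
            best.setdefault(nxt, d + 1)  # BFS order: first visit is the shortest
            queue.append((nxt, rel, d + 1))
    return sum(p ** dist for node, dist in best.items() if node != changed)


# A and D both call B. A change in A reaches its callee B, but Cut Rule 1
# stops the search from climbing back up to B's other caller D.
calls = {
    "A": [("B", "CALL")],
    "D": [("B", "CALL")],
    "B": [("A", "~CALL"), ("D", "~CALL")],
}
print(relation_sensitive_impact_scale(calls, "A"))  # 0.5
```

Without Cut Rule 1, D would also be reached at distance 2 and the result would be 0.75. The number of states is at most the number of nodes times the number of relation types, which is consistent with the slide's remark that the complexity stays low.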
13. Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
14. Data Sets for Evaluations
Two enterprise accounting systems from different companies
Data Set  #Modules  Total LOC  #Faults  #Faulty Modules  Faulty Module Rate  Fault-Collected Term
DS1       5.8k      1.6M       269      215              3.7%                40 months
DS2       7.6k      3.7M       250      208              2.7%                40 months
Common Properties
• Language: COBOL
• Age: over 20 years
Collected Metrics
• 7 existing metrics: LOC, WMC, MaxVG, Sections, Calls, Fan-in, Fan-out
• ImpactScale
15. Real Example of Calculating ImpactScale
DS1 (#modules: 5.8k)
(Figure: ImpactScale calculation visualized on DS1. Each square-shaped group of modules is a sub-system.)
22. Measurement Results
Distribution of ImpactScale
(Figure: histogram of ImpactScale in bins of 50 from 0 to 950; y-axis: number of modules, up to 4000. The distribution is long-tailed.)
Data Set  Mean IS  Max IS
DS1       86.0     2989.6
DS2       156.5    3338.2
Calculation time is practically short:
• DS1: about 10 sec.
• DS2: about 30 sec.
A spike in the distribution indicates either:
• a system-wide dispatcher, or
• a symptom of a bad smell
23. ImpactScale and Faults
The top 20% of modules by ImpactScale contain 48.8% of the faults.
IS is highly correlated with faults.
(Figure: modules and database tables ranked by ImpactScale from high to low, grouped into deciles.)
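The slide does not spell out how the 48.8% figure is computed. The following is a sketch of one straightforward way to get such a cumulative fault share, assuming hypothetical per-module (ImpactScale, fault count) pairs: rank modules by ImpactScale, highest first, and measure the share of all faults that fall in the top fraction.

```python
def fault_share_of_top(modules, fraction=0.2):
    """modules: list of (impact_scale, fault_count) pairs (hypothetical input).

    Ranks modules by ImpactScale, highest first, and returns the share of all
    faults that fall in the top `fraction` of modules (count rounded).
    """
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    top_k = round(len(ranked) * fraction)
    total = sum(faults for _, faults in ranked)
    if total == 0:
        return 0.0
    return sum(faults for _, faults in ranked[:top_k]) / total


# Toy data: 10 modules, 5 faults; the top 2 modules by IS hold 4 of them.
toy = [(100, 3), (90, 1), (50, 0), (40, 1), (30, 0),
       (20, 0), (10, 0), (5, 0), (2, 0), (1, 0)]
print(fault_share_of_top(toy))  # 0.8
```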
24. Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
25. Overview of Evaluations
Evaluation Procedure
• Random sub-sampling validation repeated 100 times
Evaluations
Fault Prediction
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance?
• Predicting faulty or not faulty
• Effort-aware fault prediction
RQ2: Comparison between ImpactScale and network measures
Validating the ImpactScale Definition
RQ3: Is considering distant nodes meaningful?
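A minimal sketch of the evaluation procedure, random sub-sampling validation repeated 100 times. Only the 100 repetitions come from the slide. The 2/3 training fraction, the fixed seed, and the `random_subsampling` name are assumptions, because the split ratio is not stated. Each repetition would fit a model on the training indices and score it on the test indices.

```python
import random


def random_subsampling(n_items, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated random sub-sampling.

    n_repeats=100 follows the slide; train_fraction and seed are assumptions,
    since the split ratio is not stated. Each repetition draws a fresh random
    split, so test sets of different repetitions may overlap.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    n_train = round(n_items * train_fraction)
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Slicing copies, so later shuffles do not alter yielded splits.
        yield indices[:n_train], indices[n_train:]


splits = list(random_subsampling(30))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 100 20 10
```

Repeating the split yields 100 paired scores per model, which is the kind of paired sample that a Wilcoxon signed-rank test compares.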
26. Predicting Faulty or Not Faulty
Faults are predicted using logistic regression.
MET = Model without ImpactScale / MET+IS = Model with ImpactScale
Performance Measure  DS1 MET  DS1 MET+IS  Improvement by IS
Precision            0.148    0.168       +0.020
Recall               0.315    0.392       +0.077
F1                   0.200    0.234       +0.034

Performance Measure  DS2 MET  DS2 MET+IS  Improvement by IS
Precision            0.139    0.162       +0.023
Recall               0.253    0.334       +0.081
F1                   0.177    0.216       +0.039
All improvements are significant in Wilcoxon’s signed-rank test.
Adding IS improves all performance measures, which supports answering RQ1 with YES.
Practitioners’ Point of View
In practice, these precision/recall/F1 evaluations are not very useful, because in maintenance, modules with high estimated fault-proneness tend to be large.
In the case of DS2, for example, the top 10% of modules by estimated fault-proneness contain 24% of the LOC, which is not effort-effective.
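For reference, precision, recall, and F1 for the faulty/not-faulty prediction can be computed from per-module predictions as follows. The function name and the boolean input format are illustrative.

```python
def precision_recall_f1(predicted, actual):
    """predicted, actual: sequences of booleans per module (True = faulty)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # predicted faulty, is faulty
    fp = sum(1 for p, a in pairs if p and not a)      # predicted faulty, is not
    fn = sum(1 for p, a in pairs if a and not p)      # missed faulty module
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(precision_recall_f1([True, True, False, False], [True, False, True, False]))
# (0.5, 0.5, 0.5)
```

The table values are presumably averages over the 100 runs. That would explain why inserting the averaged precision and recall into the F1 formula only approximately reproduces the averaged F1, for example 0.201 versus 0.200 for DS1 MET.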
27. Effort-aware Fault Prediction Model
Problem
In maintenance, modules estimated as faulty tend to be large, and a large module needs a large effort to review or test.
Practitioners’ Opinion
“Budget and schedule are very demanding. We want to find more faults with less effort.”
Therefore, effort-effectiveness is our main concern.
We use an “effort-aware model” [Arisholm06] [Menzies10] [Mende10].
It prioritizes modules in order of relative risk, $\#\mathrm{errors}(x) / \mathrm{Effort}(x)$, to maximize effort-effectiveness.
Poisson regression is used to learn the relative risk.
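One common way to learn relative risk with Poisson regression is to include log(effort) as an offset. Under that assumption, the fitted rate exp(bias + w · m) is the expected number of faults per unit of effort, which is exactly the relative risk used for ranking. The slides do not confirm this exact formulation. The weights below are hypothetical stand-ins for a fitted model.

```python
import math


def relative_risk(weights, bias, metrics):
    """Predicted faults per unit of effort (e.g. per LOC).

    Assumes a Poisson regression with a log(effort) offset:
    E[faults] = effort * exp(bias + w . m). Dividing by effort leaves
    exp(bias + w . m) as the relative risk #errors(x) / Effort(x).
    weights and bias would come from an already-fitted model.
    """
    return math.exp(bias + sum(w * m for w, m in zip(weights, metrics)))


def prioritize(modules, weights, bias):
    """modules: list of (name, metrics) pairs. Returns module names in
    descending order of relative risk, i.e. the suggested inspection order."""
    ranked = sorted(modules, key=lambda nm: relative_risk(weights, bias, nm[1]),
                    reverse=True)
    return [name for name, _ in ranked]


# Hypothetical fitted model with a single (e.g. standardized) metric.
mods = [("a", [0.1]), ("b", [2.0]), ("c", [1.0])]
print(prioritize(mods, weights=[1.0], bias=0.0))  # ['b', 'c', 'a']
```

Ranking by faults per LOC, rather than by faults, prevents large modules from dominating the top of the inspection list simply because of their size.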
28. Results of Effort-aware Evaluation
(Figure: 《Effort-based Cumulative Lift Chart of DS1》; x-axis: effort (LOC inspected), y-axis: faults detected; curves: Optimal, DS1-MET (ddr10 = 0.186), and DS1-MET+IS (ddr10 = 0.296).)
AUC is the area under the curve of the lift chart. AUC shows overall predictive performance; a high AUC means high performance.
ddr10 is the “detected defect rate in the first 10% of effort”. ddr10 shows predictive performance under limited effort; a high ddr10 means high performance.
Performance Measure  DS1-MET  DS1-MET+IS  Improvement by IS
AUC                  0.635    0.680       +0.045
ddr10                0.186    0.296       ×1.60
Practitioners’ Point of View
In maintenance, budget, schedule, and effort are always limited; therefore, ddr10 is more important.
All improvements are significant in Wilcoxon’s signed-rank test.
29. Results of Effort-aware Evaluation
(Figure: 《Effort-based Cumulative Lift Chart of DS1》 and 《Effort-based Cumulative Lift Chart of DS2》; x-axis: effort (LOC inspected), y-axis: faults detected; curves: Optimal, MET, and MET+IS.)
Performance Measure  DS1-MET  DS1-MET+IS  Improvement by IS  DS2-MET  DS2-MET+IS  Improvement by IS
AUC                  0.635    0.680       +0.045             0.669    0.714       +0.045
ddr10                0.186    0.296       ×1.60              0.225    0.343       ×1.53
All improvements are significant in Wilcoxon’s signed-rank test.
RQ1: The answer to “Does adding ImpactScale to existing product metrics improve predictive performance?” is YES.
30. Comparison with Network Measures
Network Measures
Recently, [Zimmermann et al. ICSE08] applied social network analysis (SNA) to a software dependency graph representing relationships between binary modules of software systems.
Over 50 network measures were used, for example:
• in/out degrees
• network diameter
• PageRank
• closeness
• eigenvector centrality, etc.
Zimmermann et al. and some replication studies [Tosun09][Nguyen10] reported that network measures work well in some cases.
RQ2
“Does adding ImpactScale to existing product metrics and network
measures improve predictive performance?”
31. ImpactScale vs. Network Measures
Hierarchical model comparison based on the effort-aware model
(Figure: comparison tree starting from the model with existing metrics; one branch adds ImpactScale and then network measures, the other adds network measures and then ImpactScale.)
Adding ImpactScale improves performance.
Models are learned using principal component Poisson regression.
All improvements and deteriorations are significant in Wilcoxon’s signed-rank test (*: p < 0.05, **: p < 0.01, unmarked: p < 0.001).
32. ImpactScale vs. Network Measures
RQ2: The answer to “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” is YES.
33. Validating ImpactScale
RQ3: Is considering distant nodes meaningful?
Test method: compare models that use ImpactScale variants whose maximum path-finding distance is limited.
(Figure: ddr10 versus the limit of the maximum path-finding distance, 1 to 10, one panel each for DS1 and DS2. The “Limit=1” variant is almost equivalent to fan-in + fan-out.)
Answer: YES.
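The “Limit=1” remark can be checked with a short derivation. Assume, for illustration only, that the quantity of change impact reaching node u from a changed node v is p^{d(v,u)}, where d is the shortest hop distance in the Propagation Graph. Let N_in(v) and N_out(v) be the nodes with a dependency edge into and out of v. Cut rules that refer to the previous edge cannot apply on the first hop, because there is no previous edge, so:

```latex
\mathrm{IS}_{\le 1}(v) = \sum_{u \in N(v)} p^{1} = p\,\lvert N(v)\rvert,
\qquad N(v) = N_{\mathrm{in}}(v) \cup N_{\mathrm{out}}(v)

\text{if } N_{\mathrm{in}}(v) \cap N_{\mathrm{out}}(v) = \emptyset:\quad
\mathrm{IS}_{\le 1}(v) = 0.5\,\bigl(\text{fan-in}(v) + \text{fan-out}(v)\bigr)
```

Under this assumption, the remaining difference (“almost”) would come from nodes that are both callers and callees of v. It would also come from READ/WRITE edges to data nodes if fan-in and fan-out count only calls.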
34. Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary
35. Summary of Evaluations
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES
RQ2: “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” YES
RQ3: Is considering distant nodes meaningful? YES
Hypothesis: A metric that quantifies the scale of change impact can improve the performance of fault prediction. TRUE
36. Threats to Validity
Language
ImpactScale has no language-specific features, but the evaluations were done only on COBOL systems, and COBOL differs considerably from other languages.
Application Domain
The evaluated systems are only from the accounting business domain.
Call Graph Analysis
The impact of dynamic dispatch (e.g., polymorphism and reflection) is not assessed.
37. Conclusion
We defined a new product metric that quantifies change impact, called ImpactScale.
• Probabilistic propagation
• Relation-sensitive propagation
• Practical computation time even for large-scale software systems
We evaluated its predictive performance on enterprise systems.
• Adding ImpactScale improves the detected defect rate in the first 10% of effort (LOC) by over 1.5 times.
Additional Finding
• Considering distant nodes in the dependency graph is meaningful for fault prediction.
38. Future Work
Extending supported languages
• Java, C, C++
Expanding use cases
• Rapid risk assessment
• Monitoring modularity violations
• Measuring software decay