Impact Analysis - ImpactScale: Quantifying Change Impact to Predict Faults in Large Software Systems

ImpactScale:
Quantifying Change Impact to Predict Faults
in Large Software Systems

Kenichi Kobayashi (Fujitsu Laboratories), Akihiko Matsuo (Fujitsu Laboratories), Katsuro Inoue (Osaka University),
Yasuhiro Hayase (University of Tsukuba), Manabu Kamimura (Fujitsu Laboratories), Toshiaki Yoshino (Fujitsu)

Overview
1. Background and Goal
2. Definition of ImpactScale
3. Measuring ImpactScale in Real Systems
4. Fault Prediction and Evaluation
5. Summary

ICSM2011 @ Williamsburg, 2011-09-27, Copyright 2011 FUJITSU LABORATORIES LIMITED

Background
 Fault prediction in maintenance is a difficult task, and product metrics alone do not give sufficient predictive performance.
 Product metrics are metrics extracted from the software product, such as source code.

 Therefore, process metrics, such as code churn and logical coupling, have been combined with product metrics.
 Process metrics are metrics extracted from the software process, such as change histories.

Practitioners’ Point of View
 However, in enterprise maintenance, documents, change histories, bug reports, and specialists’ knowledge are often lost, out of date, or unusable.


Goals
 Problem
 Process metrics cannot always be obtained.

 Motivation
 To achieve high predictive performance with only product metrics extractable from source code

 Goals
 To define a new product metric
 To show the effectiveness of the metric


Basic Idea
 Software dependency is one of the factors that allow faults to survive even after release.
 We assumed that Change Impact Analysis enables us to extract implicit dependencies.

Change Impact Analysis
 A technique to determine the affected areas when some part of the software is changed.
 Weakness: high computational cost.
(Figure: a change propagates through implicit dependencies; several affected parts are fixed, and one is a missed fix.)

 We do not need to determine the affected areas themselves.
 We only need to determine their scale.

Hypothesis
 A metric that quantifies the scale of change impact, ImpactScale (abbrev. IS), can improve the performance of fault prediction.



Overview of ImpactScale Definition
 Dependencies are extracted from the target software, and a Propagation Graph is built.

 Propagation Model
 Probabilistic propagation
 Relation-sensitive propagation

 ImpactScale is the sum of all Quantities of Change Impact.
(Figure: a Propagation Graph of Code Nodes and Data Nodes connected by Dependency edges. A change at node C sends a Quantity of Change Impact from C to A.)


Propagation Graph
① Build a dependency graph extracted from the target software.
 Code Node: module, class, function, source code
 Data Node: DB table, global variable
 Dependency Edge: with relation type CALL, READ, or WRITE
② Add reverse edges to build the Propagation Graph.

Change impact analysis for ImpactScale is performed on the Propagation Graph.
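
The two construction steps can be sketched in Python. This is a minimal illustration, not the authors' implementation: the edge-tuple layout, the `build_propagation_graph` name, and the sample module and table names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical dependency edges: (source, target, relation type).
# Code nodes are modules; data nodes are DB tables or global variables.
DEPENDENCIES = [
    ("ORDER_ENTRY", "TAX_CALC", "CALL"),
    ("ORDER_ENTRY", "ORDERS_TABLE", "WRITE"),
    ("BILLING", "ORDERS_TABLE", "READ"),
]


def build_propagation_graph(dependencies):
    """Return adjacency lists that hold every dependency edge plus its reverse.

    Each neighbour entry is (neighbour, relation, direction), where direction
    is "forward" for an original edge and "reverse" for the added one.
    """
    graph = defaultdict(list)
    for source, target, relation in dependencies:
        graph[source].append((target, relation, "forward"))
        graph[target].append((source, relation, "reverse"))
    return dict(graph)


propagation_graph = build_propagation_graph(DEPENDENCIES)
```

Keeping the relation type and direction on every edge is what later allows propagation to be filtered by relation type.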

Probabilistic Propagation
 We assume that change impact propagates probabilistically from one node to another, as in some Ripple Effect studies [Hanny72] [Tsantalis05] [Sharafat07].
(Figure: a change at a source node. At each propagation step, the quantity of change impact from the source node is multiplied by the propagation probability, ×0.5.)

In this presentation, propagation probability is always 0.5.
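
A small sketch of how the sum of quantities of change impact could be computed with this probability. It assumes, for illustration only, that the quantity reaching a node is the propagation probability raised to that node's shortest-path distance from the changed node, and that the changed node itself is not counted; the slides do not spell out these details. The graph and the function name are hypothetical.

```python
from collections import deque

PROPAGATION_PROBABILITY = 0.5  # the value used throughout the presentation


def impact_scale(graph, changed, probability=PROPAGATION_PROBABILITY):
    """Sum the quantities of change impact that spread out from `changed`.

    Assumption: a node at shortest-path distance d receives probability ** d.
    `graph` maps each node to the neighbours reachable in the propagation graph.
    """
    distance = {changed: 0}
    queue = deque([changed])
    total = 0.0
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                total += probability ** distance[neighbour]
                queue.append(neighbour)
    return total


# Hypothetical tiny propagation graph: C - B - A and C - D.
example = {"C": ["B", "D"], "B": ["C", "A"], "A": ["B"], "D": ["C"]}
print(impact_scale(example, "C"))  # 0.5 + 0.5 + 0.25 = 1.25
```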

Relation-sensitive Propagation
 To avoid overestimation, we use context information to eliminate unlikely propagation.
 To keep computation time low, we use only an edge’s relation type as minimal context information.

 Cut Rules determine whether propagation from a node to its next node is cut, by referring to the relation types of the previous edge and the next edge.
(Figure: previous node, current node, and next node; the Cut Rule refers to the previous relation type and the next relation type.)

 We call such controlled propagation relation-sensitive propagation.

 Computational complexity is practically low.

Example of Cut Rules
(Figure: two examples of cut propagation, one for a change at node “C” and one for a change at node “F”.)
 Cut Rule 1: During finding callees, don’t find callers.
 Cut Rule 2: During finding callers, don’t find callees.
 Cut Rule 3: Don’t find beyond READ edges.
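
The three example rules can be written as a predicate over the previous and next edge. The encoding below is one possible reading, not the authors' rule set: an edge is modelled as a (relation, direction) pair, where "forward" means following a dependency (caller to callee, reader to data) and "reverse" means following it backwards. In particular, reading Cut Rule 3 as "stop after following a READ edge from a reader to its data" is an assumption.

```python
def is_cut(previous_edge, next_edge):
    """Return True when propagation along `next_edge` should be cut.

    Each edge is a (relation, direction) pair such as ("CALL", "forward").
    `previous_edge` is None at the changed node, where nothing is cut.
    """
    if previous_edge is None:
        return False

    # Cut Rule 1: while finding callees, do not turn back to find callers.
    if previous_edge == ("CALL", "forward") and next_edge == ("CALL", "reverse"):
        return True
    # Cut Rule 2: while finding callers, do not turn around to find callees.
    if previous_edge == ("CALL", "reverse") and next_edge == ("CALL", "forward"):
        return True
    # Cut Rule 3 (assumed reading): do not go beyond a READ edge that was
    # followed from a reader to the data it reads.
    if previous_edge == ("READ", "forward"):
        return True
    return False
```

Because the predicate needs only the previous and next relation types, a traversal can carry the last edge alongside each visited node and consult `is_cut` before every step.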




Data Sets for Evaluations
Two enterprise accounting systems in different companies

Data Set Name   #Modules   Total LOC   #Faults   #Faulty Modules   Faulty Module Rate   Fault Collection Term
DS1             5.8k       1.6M        269       215               3.7%                 40 months
DS2             7.6k       3.7M        250       208               2.7%                 40 months

 Common Properties
 Language: COBOL
 Age: Over 20 years

 Collected Metrics
 7 Existing Metrics: LOC, WMC, MaxVG, Sections, Calls, Fan-in, Fan-out
 ImpactScale


Real Example of Calculating ImpactScale
(Figure: ImpactScale calculated for the 5.8k modules of DS1. Each square-shaped group of modules is a sub-system.)


Measurement Results
 Distribution of ImpactScale
(Figure: histogram of Number of Modules, 0 to 4000, against ImpactScale in bins of 50 from ~50 to ~950. The distribution is long-tailed.)

Data Set   Mean IS   Max IS
DS1        86.0      2989.6
DS2        156.5     3338.2

 Spike: a system-wide dispatcher or a symptom of a bad smell
 Calculation Time: practically short
 DS1: about 10 sec.
 DS2: about 30 sec.


ImpactScale and Faults
 The first 20% of modules contain 48.8% of the faults.
 IS highly correlates with faults.
(Figure: map of modules and database tables, colored by ImpactScale decile from High to Low.)



Overview of Evaluations
 Evaluation Procedure
 100 times random sub-sampling validation

 Evaluations
 Fault Prediction
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance?
• Predicting Faulty or Not Faulty
• Effort-aware Fault Prediction
RQ2:
• Comparison between ImpactScale and Network Measures

 Validating ImpactScale Definition (RQ3)
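
Random sub-sampling validation repeatedly splits the modules at random into a training part and a test part and averages the results. A minimal sketch follows; the 2/3 training fraction, the fixed seed, and the `evaluate` callback are assumptions for illustration, since the slides state only the number of repetitions.

```python
import random


def random_subsampling(modules, evaluate, repetitions=100,
                       train_fraction=2 / 3, seed=0):
    """Average `evaluate(train, test)` over repeated random splits.

    `train_fraction` and `seed` are illustrative defaults, not values
    taken from the presentation.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        shuffled = list(modules)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return sum(scores) / len(scores)


# Toy usage: the "score" is just the share of modules held out for testing.
modules = [f"MOD{i:03d}" for i in range(30)]
held_out_share = random_subsampling(
    modules, lambda train, test: len(test) / len(modules))
```

In a real evaluation the callback would fit a prediction model on `train` and return a performance measure computed on `test`.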


Predicting Faulty or Not Faulty
 Faults are predicted using logistic regression.
 MET = Model without ImpactScale / MET+IS = Model with ImpactScale

Performance Measure   DS1 MET   DS1 MET+IS   Improvement by IS   DS2 MET   DS2 MET+IS   Improvement by IS
Precision             0.148     0.168        +0.020              0.139     0.162        +0.023
Recall                0.315     0.392        +0.077              0.253     0.334        +0.081
F1                    0.200     0.234        +0.034              0.177     0.216        +0.039

All improvements are significant in Wilcoxon’s signed rank test.

Adding IS improves all performance measures, which supports answering RQ1 with YES.

 In practice, these Precision/Recall/F1 evaluations are not very useful, because in maintenance the modules estimated as most fault-prone tend to be large.
 In DS2, for example, the top 10% of modules by estimated fault-proneness account for 24% of the LOC, which is not effort-effective.


Effort-aware Fault Prediction Model
 Problem
 In maintenance, modules estimated as faulty tend to be large.
 A large module needs large effort to be reviewed or tested.

 Practitioners’ Opinion
 “Budget and schedule are very demanding. We want to find more faults with less effort.”
 Therefore, effort-effectiveness is our main concern.

 We use the “Effort-aware model” [Arisholm06] [Menzies10] [Mende10].
 It prioritizes modules in the order of relative risk, $\#\mathrm{errors}(x) / \mathrm{Effort}(x)$, to maximize effort-effectiveness.

 Poisson Regression is used to learn relative risk.
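
The prioritization can be sketched as a sort by relative risk. In this illustration effort is measured in LOC, as elsewhere in the talk, and the predicted fault counts are invented numbers standing in for the output of a fitted Poisson regression; the module names are hypothetical.

```python
# (module, predicted number of faults, LOC): all values are invented.
predictions = [
    ("BIG_BATCH", 0.90, 9000),
    ("SMALL_FIX", 0.30, 500),
    ("MID_REPORT", 0.40, 2000),
]


def relative_risk(predicted_faults, effort):
    """Predicted faults per unit of effort, i.e. #errors(x) / Effort(x)."""
    return predicted_faults / effort


def prioritize(rows):
    """Order modules from highest to lowest relative risk."""
    return sorted(rows, key=lambda row: relative_risk(row[1], row[2]),
                  reverse=True)


for name, faults, loc in prioritize(predictions):
    print(f"{name}: {relative_risk(faults, loc):.5f} faults per LOC")
```

In this toy data BIG_BATCH has the most predicted faults but is ranked last, because reviewing it costs far more effort per expected fault.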

Results of Effort-aware Evaluation
《Effort-based Cumulative Lift Chart of DS1》
(Figure: Faults detected against Effort (LOC inspected), with curves for DS1-MET, DS1-MET+IS, and Optimal. At 10% effort the curves reach 0.186 for DS1-MET and 0.296 for DS1-MET+IS.)
 AUC is the Area Under the Curve of the lift chart. AUC shows overall predictive performance. High AUC means high performance.
 ddr10 is the “detected defect rate in the first 10% effort”. ddr10 shows the predictive performance under limited effort. High ddr10 means high performance.
 In maintenance, budget, schedule, and effort are always limited; therefore, ddr10 is more important.

Performance Measure   DS1-MET   DS1-MET+IS   Improvement by IS
AUC                   0.635     0.680        +0.045
ddr10                 0.186     0.296        ×1.60
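
Both measures can be read off an effort-based cumulative lift chart. The sketch below assumes that both axes are normalized to [0, 1], that modules are inspected whole in ranked order, that the curve is linear between modules, and that AUC is computed by the trapezoidal rule; these details and all sample numbers are assumptions, not taken from the study.

```python
def lift_curve(ranked):
    """Cumulative (effort share, fault share) points for ranked modules.

    `ranked` is a list of (loc, actual_faults) in inspection order;
    every module is assumed to have LOC > 0.
    """
    total_loc = sum(loc for loc, _ in ranked)
    total_faults = sum(faults for _, faults in ranked)
    points, loc_seen, faults_seen = [(0.0, 0.0)], 0, 0
    for loc, faults in ranked:
        loc_seen += loc
        faults_seen += faults
        points.append((loc_seen / total_loc, faults_seen / total_faults))
    return points


def auc(points):
    """Area under the lift curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ddr(points, effort_share=0.10):
    """Share of faults detected within the first `effort_share` of effort."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= effort_share <= x1:
            return y0 + (y1 - y0) * (effort_share - x0) / (x1 - x0)
    return 1.0


# Invented example: four modules, already ranked by relative risk.
curve = lift_curve([(100, 2), (100, 1), (300, 1), (500, 0)])
```

For this toy ranking, `ddr(curve)` is the fault share reached after the first 10% of LOC, and `auc(curve)` summarizes the whole curve.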


Results of Effort-aware Evaluation
《Effort-based Cumulative Lift Chart of DS1》《Effort-based Cumulative Lift Chart of DS2》
(Figures: Faults detected against Effort (LOC inspected), with curves for MET, MET+IS, and Optimal for each data set. At 10% effort the curves reach 0.186 (DS1-MET), 0.296 (DS1-MET+IS), 0.225 (DS2-MET), and 0.343 (DS2-MET+IS).)

Performance Measure   DS1-MET   DS1-MET+IS   Improvement by IS   DS2-MET   DS2-MET+IS   Improvement by IS
AUC                   0.635     0.680        +0.045              0.669     0.714        +0.045
ddr10                 0.186     0.296        ×1.60               0.225     0.343        ×1.53

RQ1: “Does adding ImpactScale to existing product metrics improve predictive performance?” is YES.


Comparison with Network Measures
 Network Measures
 Recently, [Zimmermann et al. ICSE08] applied Social Network Analysis (SNA) to a software dependency graph representing relationships between binary modules of software systems.
 Over 50 network measures were used. For example,
• in/out Degrees
• Network Diameter
• Closeness
• Eigenvector Centrality (closely related to PageRank), etc.
 They and some replication studies [Tosun09][Nguyen10] reported that network measures work well in some cases.

RQ2
“Does adding ImpactScale to existing product metrics and network measures improve predictive performance?”


ImpactScale vs. Network Measures
Hierarchical Model Comparison based on Effort-aware Model
(Figure: model hierarchy. The model with existing metrics is extended with +ImpactScale and with +network measures; each of these is then extended with the other, giving +ImpactScale +network measures.)
 Adding ImpactScale improves performance.
 Models are learned by using Principal Component Poisson Regression.
 All improvements and deteriorations are significant in Wilcoxon’s signed rank test. *: P<0.05, **: P<0.01, unmarked: P<0.001

RQ2
“Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” is YES.

Validating ImpactScale
RQ3: Is considering distant nodes meaningful?

《Test method》 Compare models that use ImpactScale variants with a limited maximum distance of path-finding.
(Figures: ddr10 against Limit of Maximum Distance of Path-finding, 1 to 10, one panel for DS1 and one for DS2.)
 The “Limit=1” variant is almost equivalent to fan-in + fan-out.

Answer: YES.
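
Such distance-limited variants can be illustrated with a bounded breadth-first search. Taking the quantity at distance d to be 0.5 ** d is an assumption of this sketch, and the graph is invented. With a limit of 1 the value is half the number of direct neighbours, which is why that variant behaves almost like fan-in + fan-out.

```python
from collections import deque


def limited_impact_scale(graph, changed, limit, probability=0.5):
    """ImpactScale-like sum that ignores nodes farther than `limit` steps."""
    distance = {changed: 0}
    queue = deque([changed])
    total = 0.0
    while queue:
        node = queue.popleft()
        if distance[node] == limit:
            continue  # do not expand beyond the maximum distance
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                total += probability ** distance[neighbour]
                queue.append(neighbour)
    return total


# Invented chain with one branch: A - B - C - D, and B - E.
chain = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"],
         "D": ["C"], "E": ["B"]}
by_limit = [limited_impact_scale(chain, "B", k) for k in (1, 2, 3)]
```

Here `by_limit` grows from limit 1 to limit 2 because node D only becomes visible at distance 2, and stays flat afterwards because no node lies farther away.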



Summary of Evaluations
RQ1: Does adding ImpactScale to existing product metrics improve predictive performance? YES

RQ2: “Does adding ImpactScale to existing product metrics and network measures improve predictive performance?” YES

RQ3: Is considering distant nodes meaningful? YES

Hypothesis: A metric that quantifies the scale of change impact can improve the performance of fault prediction. TRUE

Threats to Validity
 Language
 ImpactScale has no language-specific features, but the evaluations were done only on COBOL systems. COBOL differs from other languages in many ways.

 Application Domain
 The evaluated systems are all in the accounting business domain.

 Call Graph Analysis
 The impact of dynamic dispatching (e.g. polymorphism and reflection) is not assessed.


Conclusion
 We defined a new product metric quantifying change impact, called ImpactScale.
 Probabilistic propagation
 Relation-sensitive propagation
 Practical computation time even for large-scale software systems

 We evaluated its predictive performance on enterprise systems.
 Adding ImpactScale improves the performance.
• ddr10, the detected defect rate in the first 10% of effort (LOC), improves by over 1.5 times.
 Additional Finding
• Considering distant nodes in the dependency graph is meaningful for fault prediction.


Future Work
 Extending supported languages
 Java, C, C++

 Expanding use cases
 Rapid risk assessment
 Watching violations of modularity
 Measuring software decay


Thank you!
Kenichi Kobayashi
Fujitsu Labs


ICSM2011 @ Williamsburg, 2011-09-27
