Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Localizing Failure-Inducing Program Edits Based on Spectrum Information

Lingming Zhang, Miryung Kim, Sarfraz Khurshid
The University of Texas at Austin

ICSM2011, September 27th 2011


Overview
Change impact analysis is effective at finding suspicious edits but lacks precise ranking.
Spectrum-based fault localization is effective at ranking but does not scale well.

Our insight: combine change impact analysis and spectrum-based fault localization.
• Identify suspicious edits using extended call graphs.
• Rank suspicious edits using dynamic program
spectrum information.

L. Zhang: Localizing failure-inducing program edits based on spectrum information

Summary of our results
FaultTracer localizes failure-inducing edits with high precision:
• Identifying suspicious edits: outperforms Chianti by 19.37%.
• Ranking all suspicious edits: ranks real regression faults within the top 3 edits for 14 of the 22 studied real-world failures.
• Ranking method-level suspicious edits: outperforms the existing heuristic by 56.25%.


Outline
FaultTracer Approach
Empirical Evaluation
Related Work
Conclusions


Example

Program P:

    public class A {
        public static int f1=0;
        public static int f2=0;
        ...
    }
    class B {
        int f1=0; int f2=0; int f3=0;
        public int foo(){return f1;}
        ...
    }
    class C extends B{
        ...
    }

Program P’ (P evolves into P’):

    public class A {
        public static int f1=1;
        public static int f2=1;
        ...
    }
    class B {
        int f1=0; int f2=1; int f3=1;
        int f4=1;
        public int foo()
        { if(f1>=0) return f1;
          else return f4;
        }
        ...
    }
    class C extends B{
        public int f1=3;
        public void bar(int f) {f3=f+f1;}
        ...
    }

Regression test suite T:

    public void test1() { A.bar(1); }
    public void test2() { ... }
    public void test3() { ... }
    public void test4() {
        C c = new C();
        int f = c.foo();
    }
    public void test5() { ... }

Slide annotations: evolve (P to P’), Test (T on P), Re-Test (T on P’), Bug!


FaultTracer overview
[Figure: FaultTracer workflow over the old program P, the new program P’, and the regression test suite T. Diagram labels: P, P’, T, T’, ∆, δt, δt’.]
① Detecting changes and dependences.
② Selecting tests based on Extended Call Graph analysis.
③ Identifying suspicious edits based on Extended Call Graph analysis.
④ Ranking suspicious edits based on program spectrum information.

Extended Call Graph representation
public void test1() { A.bar(1); }
public void test4() {
C c = new C();
int f = c.foo();
}

[Figure: two call graphs for test1 and test4.]
Traditional call graph (used by Chianti): nodes test1, test4, A.Clinit(), A.bar(), C.C(), C.foo(), B.B(); edge label <C,C.foo()>.
Extended call graph (used by FaultTracer): the same method nodes plus the field nodes A.f2 and B.f1; edge labels <C,C.foo()>, <SFW,A.f2>, <FR,C.f1>.
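
One way to picture the extended call graph is as a list of labeled edges, where field accesses get their own edges next to ordinary call edges. The sketch below is only an illustration: the class and method names are hypothetical, the edge endpoints are an assumed reading of the figure, and interpreting FR as "field read" and SFW as "static field write" is an assumption the slide does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an extended call graph as a list of labeled edges. */
final class EcgSketch {
    /** One edge; the label is empty for a plain call edge. */
    static final class Edge {
        final String source;
        final String label;
        final String target;

        Edge(String source, String label, String target) {
            this.source = source;
            this.label = label;
            this.target = target;
        }

        /** True when the label marks a field access (<FR,...> or <SFW,...>). */
        boolean isFieldAccess() {
            return label.startsWith("<FR,") || label.startsWith("<SFW,");
        }
    }

    /** Edges for test4; the endpoints are an assumed reading of the slide's figure. */
    static List<Edge> test4Graph() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("test4", "", "C.C()"));
        edges.add(new Edge("C.C()", "", "B.B()"));
        edges.add(new Edge("test4", "<C,C.foo()>", "C.foo()"));
        edges.add(new Edge("C.foo()", "<FR,C.f1>", "B.f1"));
        return edges;
    }

    /** Counts the field-access edges, which a traditional call graph would not record. */
    static int countFieldAccessEdges(List<Edge> edges) {
        int count = 0;
        for (Edge e : edges) {
            if (e.isFieldAccess()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Edge> graph = test4Graph();
        System.out.println(graph.size() + " edges, "
                + countFieldAccessEdges(graph) + " field-access edge(s)");
    }
}
```

Dropping the field-access edge from this list leaves what the traditional graph would hold for test4, which is why field-level edits are invisible to it.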


Step 1. Detecting atomic changes and dependences

Atomic change types:

Change type   Description
CM            Change method
AM            Add method
DM            Delete method
AF            Add field
DF            Delete field
CFI           Change instance field
CSFI          Change static field
LCm           Method look-up change
LCf           Field look-up change

[Figure: change dependences inference rules]
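
To make the field-level change types concrete, here is a minimal sketch that diffs the instance-field initializers of class B from the example (P versus P’) and reports AF, DF, and CFI changes. It is a simplification under stated assumptions: the class and method names are hypothetical, fields are modeled as name-to-initializer strings, and the real tool presumably works on parsed programs and also infers dependences, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive AF, DF, and CFI changes from field initializers. */
final class FieldChangeSketch {
    /** Compares two name-to-initializer maps for the instance fields of one class. */
    static List<String> diffInstanceFields(String className,
                                           Map<String, String> oldInits,
                                           Map<String, String> newInits) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : newInits.entrySet()) {
            String field = className + "." + e.getKey();
            if (!oldInits.containsKey(e.getKey())) {
                changes.add("AF(" + field + ")");   // field exists only in P'
            } else if (!oldInits.get(e.getKey()).equals(e.getValue())) {
                changes.add("CFI(" + field + ")");  // initializer differs
            }
        }
        for (String name : oldInits.keySet()) {
            if (!newInits.containsKey(name)) {
                changes.add("DF(" + className + "." + name + ")"); // field removed in P'
            }
        }
        return changes;
    }

    /** Instance fields of class B in P and P', copied from the example slide. */
    static List<String> classBChanges() {
        Map<String, String> oldB = new LinkedHashMap<>();
        oldB.put("f1", "0");
        oldB.put("f2", "0");
        oldB.put("f3", "0");
        Map<String, String> newB = new LinkedHashMap<>();
        newB.put("f1", "0");
        newB.put("f2", "1");
        newB.put("f3", "1");
        newB.put("f4", "1");
        return diffInstanceFields("B", oldB, newB);
    }

    public static void main(String[] args) {
        System.out.println(classBChanges()); // prints [CFI(B.f2), CFI(B.f3), AF(B.f4)]
    }
}
```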


Step 2. Test selection based on Extended Call Graph (ECG) analysis

FaultTracer directly matches all changes with test ECGs
before edits to select the influenced tests.
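
The matching step can be sketched as a set intersection: a test is selected when its pre-edit ECG contains at least one element touched by a change. This is only an illustration under assumptions: the class name, the flat "set of node and edge-label strings" model of an ECG, and the sample data are all hypothetical, not the tool's actual representation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of selecting influenced tests by matching changes to old ECGs. */
final class TestSelectionSketch {
    /** Returns the tests whose pre-edit ECG shares an element with the changes. */
    static Set<String> selectInfluencedTests(Map<String, Set<String>> oldEcgByTest,
                                             Set<String> changedElements) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : oldEcgByTest.entrySet()) {
            // Collections.disjoint is true when the two sets share no element.
            if (!Collections.disjoint(e.getValue(), changedElements)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    /** Illustrative data loosely based on the ECG slide; the changed element is assumed. */
    static Set<String> exampleSelection() {
        Map<String, Set<String>> oldEcgByTest = new LinkedHashMap<>();
        oldEcgByTest.put("test1", new HashSet<>(Arrays.asList(
                "A.Clinit()", "A.bar()", "<SFW,A.f2>", "A.f2")));
        oldEcgByTest.put("test4", new HashSet<>(Arrays.asList(
                "C.C()", "B.B()", "<C,C.foo()>", "C.foo()", "<FR,C.f1>", "B.f1")));
        Set<String> changedElements = new HashSet<>(Arrays.asList("C.foo()"));
        return selectInfluencedTests(oldEcgByTest, changedElements);
    }

    public static void main(String[] args) {
        System.out.println(exampleSelection()); // prints [test4]
    }
}
```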


Step 3. Suspicious edit identification based on Extended Call Graph analysis
FaultTracer directly selects the non-look-up changes that appear on test ECGs after edits as suspicious edits.
FaultTracer selects method or field edits that have caused look-up changes on test ECGs as suspicious edits.
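
The two rules above can be sketched as a single pass over the changes for one test. Everything here is a hypothetical illustration: the Change class, the flat string model of the post-edit ECG, and the sample data (including which edit causes the look-up change) are assumptions made for the example, not the tool's data model.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the two suspicious-edit rules for a single test. */
final class SuspiciousEditSketch {
    static final class Change {
        final String name;           // e.g. "CM(B.foo)"
        final boolean lookUp;        // true for LCm / LCf changes
        final String ecgElement;     // node or edge label the change maps to
        final List<String> causedBy; // edits behind a look-up change, else empty

        Change(String name, boolean lookUp, String ecgElement, List<String> causedBy) {
            this.name = name;
            this.lookUp = lookUp;
            this.ecgElement = ecgElement;
            this.causedBy = causedBy;
        }
    }

    static Set<String> suspiciousEdits(Set<String> newEcgElements, List<Change> changes) {
        Set<String> result = new LinkedHashSet<>();
        for (Change c : changes) {
            if (!newEcgElements.contains(c.ecgElement)) {
                continue; // the change does not appear on this test's post-edit ECG
            }
            if (c.lookUp) {
                result.addAll(c.causedBy); // rule 2: blame the edits causing the look-up change
            } else {
                result.add(c.name);        // rule 1: the change itself is suspicious
            }
        }
        return result;
    }

    /** Assumed data for one test, for illustration only. */
    static Set<String> exampleSuspiciousEdits() {
        Set<String> newEcg = new HashSet<>(Arrays.asList(
                "C.C()", "B.B()", "<C,C.foo()>", "C.foo()", "<FR,C.f1>"));
        List<Change> changes = Arrays.asList(
                new Change("CM(B.foo)", false, "C.foo()", Collections.<String>emptyList()),
                new Change("LCf", true, "<FR,C.f1>", Arrays.asList("AF(C.f1)")),
                new Change("AM(C.bar)", false, "C.bar()", Collections.<String>emptyList()));
        return suspiciousEdits(newEcg, changes);
    }

    public static void main(String[] args) {
        System.out.println(exampleSuspiciousEdits()); // prints [CM(B.foo), AF(C.f1)]
    }
}
```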


Step 4. Spectrum-based fault localization for program edits

Correlation between suspicious edits and tests

Edits        test2  test3  test4  test5
CSFI(A.f1)
CM(B.foo)
AF(C.f1)
AM(C.bar)
Outcome      Pass   Pass   Pass   Fail

Suspiciousness score computation

Edits        Tarantula  SBI   Jaccard  Ochiai  Tie Break
CSFI(A.f1)   0.00       0.00  0.00     0.00    -
CM(B.foo)    0.75       0.50  0.50     0.71    1
AF(C.f1)     0.75       0.50  0.50     0.71    0
AM(C.bar)    1.00       1.00  1.00     1.00    -
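
The scores in the table are consistent with the commonly used definitions of the four formulas, sketched below. The slide does not state the formulas, so these definitions are an assumption, as are the coverage counts: with one failing and three passing tests, the CM(B.foo) and AF(C.f1) rows match an edit covered by the failing test and one passing test, and the AM(C.bar) row matches an edit covered by the failing test only. The class and parameter names are hypothetical.

```java
import java.util.Locale;

/** Sketch of four common suspiciousness formulas, applied to edits instead of statements. */
final class SuspiciousnessSketch {
    // ef / ep: failing / passing tests that cover the edit.
    // tf / tp: total failing / passing tests.

    static double tarantula(int ef, int ep, int tf, int tp) {
        double failRatio = tf == 0 ? 0.0 : (double) ef / tf;
        double passRatio = tp == 0 ? 0.0 : (double) ep / tp;
        double sum = failRatio + passRatio;
        return sum == 0.0 ? 0.0 : failRatio / sum;
    }

    static double sbi(int ef, int ep) {
        return ef + ep == 0 ? 0.0 : (double) ef / (ef + ep);
    }

    static double jaccard(int ef, int ep, int tf) {
        return tf + ep == 0 ? 0.0 : (double) ef / (tf + ep);
    }

    static double ochiai(int ef, int ep, int tf) {
        double denominator = Math.sqrt((double) tf * (ef + ep));
        return denominator == 0.0 ? 0.0 : ef / denominator;
    }

    static String row(String edit, int ef, int ep, int tf, int tp) {
        return String.format(Locale.ROOT, "%s %.2f %.2f %.2f %.2f", edit,
                tarantula(ef, ep, tf, tp), sbi(ef, ep), jaccard(ef, ep, tf), ochiai(ef, ep, tf));
    }

    public static void main(String[] args) {
        int totalFailing = 1; // test5 fails
        int totalPassing = 3; // test2, test3, test4 pass
        // Assumed coverage counts (ef, ep) that reproduce the slide's scores.
        System.out.println(row("CM(B.foo)", 1, 1, totalFailing, totalPassing)); // CM(B.foo) 0.75 0.50 0.50 0.71
        System.out.println(row("AM(C.bar)", 1, 0, totalFailing, totalPassing)); // AM(C.bar) 1.00 1.00 1.00 1.00
    }
}
```

An edit that no failing test covers, such as CSFI(A.f1) in the table, scores 0.00 under all four formulas because each has the failing-coverage count in its numerator.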


Outline
FaultTracer Approach
Empirical Evaluation
Related Work
Conclusions


Research Questions

RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?

RQ2: How effective is FaultTracer in ranking
suspicious edits?


Subjects: overview

Subjects from the Software-artifact Infrastructure Repository (SIR).

Project        Versions   Program Size (KLoC)   Number of Tests
Jtopas         0.0-3.0    1.83 ~ 5.36           95-209
Xml-Security   0.0-3.0    17.44 ~ 18.99         84-106
JMeter         0.0-5.0    31.01 ~ 41.05         70-97
Ant            0.0-8.0    17.20 ~ 80.44         112-878


Subjects: change statistics
[Figure: stacked bar chart of the number of changes for each version pair (Jtopas0.0-1.0 through Ant7.0-8.0), broken down by change type (AM, DM, CM, AF, DF, CFI, CSFI, LCm, LCf); horizontal axis from 0 to 7000.]


RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?
FaultTracer achieves a 19.37% improvement in the precision of identifying suspicious edits.
[Figure: bar chart comparing Chianti and FaultTracer for each version pair of XmlSec (0.0-1.0 to 2.0-3.0), Ant (0.0-1.0 to 7.0-8.0), Jtopas (0.0-1.0 to 2.0-3.0), and JMeter (0.0-1.0 to 4.0-5.0); vertical axis from 0 to 160.]


RQ2: How effective is FaultTracer in ranking suspicious edits?
Ranks all types of edits:
• Average performance.

                             Tarantula  SBI    Jaccard  Ochiai  Suspicious edit num.  Edit number
Average                      8.50       8.50   10.83    14.66   68.83                 3932
Percentage to edit number    0.22%      0.22%  0.28%    0.37%   1.75%                 --

• Example (Ant5.0-6.0)

Test: ant.taskdefs.optional.EchoPropertiesTest.testEchoToBadFile

Tarantula  SBI  Jaccard  Ochiai  Suspicious edit num.  Edit number
1          1    1        10      182                   5019


RQ2: How effective is FaultTracer in ranking suspicious edits?
Ranks method edits (FaultTracer vs. heuristic)
• Achieves a 56.25% improvement in the precision of localizing method-level failure-inducing edits.


Limitations
Does not currently filter out refactorings (e.g., by using RefFinder [Prete+2010]).
Uses only four spectrum-based fault localization techniques.
The experimental evaluation is limited by the small number of real regression faults.


Related work
Change-impact analysis
• Chianti [Ren+2004]
• Crisp [Chesley+2005]
• Heuristic ranking [Ren+2007]
Fault localization
• Spectrum-based
  • E.g., Tarantula [Jones+2002], SBI [Liblit+2005], Jaccard [Abreu+2007], Ochiai [Abreu+2007].
• Delta debugging [Zeller1999]
• Model-based
  • E.g., Bayesian diagnosis [Kleer+1987]


Conclusion
FaultTracer combines change impact analysis with dynamic spectra.
FaultTracer improves change impact analysis by using extended call graph analysis.
Experimental evaluation shows FaultTracer:
• Performs 19.37% better than Chianti in determining affecting changes.
• Localizes failure-inducing edits within the top 3 edits for 14 of the 22 regression failures.
• Performs 56.25% better than the previous heuristic for localizing failure-inducing program edits.

zhanglm10@gmail.com

