Search-Based Robustness Testing of Data Processing Systems
Daniel Di Nardo, Fabrizio Pastore, Andrea Arcuri, Lionel Briand
University of Luxembourg
Interdisciplinary Centre for Security, Reliability and Trust
Software Verification and Validation Lab
4. Data Processing System
• Essential component of systems that aggregate and analyse real-world data
• Robustness is “the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions”
7. Contributions
• An evolutionary algorithm to automate robustness testing of data processing systems
• Four fitness functions (model-based and code-based) that enable the effective generation of robustness test cases by means of evolutionary algorithms
• An extensive study of the effect of fitness functions and configuration parameters on the effectiveness of the approach, using an industrial data processing system as a case study
8. Testing Automation Problems
• How to automatically generate test inputs?
• Data mutation methodology [ICST’15]
• How to automatically verify test execution results?
• Modelling methodology [ASE’13]
• How to identify the most effective inputs?
• Best size of inputs? Which data types to consider? How many data faults should be present? Which constraints should be broken?
• Meta-heuristic search approach [ASE’15]
13. Generic mutation operators
(reusable across projects)
• Class Instance Duplication
• Class Instance Removal
• Class Instances Swapping
• Attribute Replacement with Random
• Attribute Bit Flipping
• Attribute Replacement with Boundary Condition

14–17. Generic mutation operators — configurations for operators
(fit the fault model)
[UML class diagram: a Transmission contains 1..* Vcdu; each Vcdu has one Header with attributes versionNumber : Integer, spaceCraftId : Integer, checksum : Integer; the stereotype <Identifier> is attached to versionNumber]
• Configuration for mutation operators is provided by UML stereotypes used to select mutation targets
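To make the generic operators concrete, here is a minimal Java sketch of how two of them (Class Instance Duplication and Attribute Replacement with Random) could be applied to an in-memory object model; the DataInstance/MutationOperators classes and their methods are illustrative assumptions, not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, simplified object model of the input data (illustration only).
class DataInstance {
    final String className;                                  // e.g. "Vcdu", "Packet", "Header"
    final List<DataInstance> children = new ArrayList<>();
    final Map<String, Integer> attributes = new HashMap<>(); // integer attributes, e.g. versionNumber

    DataInstance(String className) { this.className = className; }

    DataInstance copy() {                                    // deep copy used by instance duplication
        DataInstance c = new DataInstance(className);
        c.attributes.putAll(attributes);
        for (DataInstance child : children) c.children.add(child.copy());
        return c;
    }
}

// Sketch of two of the generic operators; the fault-model configuration decides
// which class or attribute they target.
class MutationOperators {
    private final Random rnd = new Random();

    // Class Instance Duplication: duplicate one child instance of the target class.
    void duplicateInstance(DataInstance parent, String targetClass) {
        for (DataInstance child : new ArrayList<>(parent.children)) {
            if (child.className.equals(targetClass)) {
                parent.children.add(child.copy());
                return;                                      // apply the operator once
            }
        }
    }

    // Attribute Replacement with Random: overwrite the target attribute with a random value.
    void replaceAttributeWithRandom(DataInstance instance, String attributeName) {
        if (instance.attributes.containsKey(attributeName)) {
            instance.attributes.put(attributeName, rnd.nextInt());
        }
    }
}
```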
18. [Diagram: mutation-based generation turns field data into many candidate test inputs; the goal is a test suite that is both effective and small]
• How to evaluate the effectiveness of a test suite? By measuring specific objectives.
• How to generate an effective and small test suite? By means of a meta-heuristic search algorithm.
19–20. Generic mutation operators (reusable across projects) — configurations for operators (fit the fault model)
[UML class diagram as above, now with the stereotype <Derived> attached to checksum : Integer]
• UML stereotypes to select mutation targets
• UML stereotype to identify the fields to update (e.g., derived fields such as the checksum)
• OCL queries to express complex target selection criteria
22. Test Effectiveness Objectives
• O1: Include input data that covers all the classes of the data model
• Data has a complex structure
• O2: Cover all the data faults of a fault model
• A variety of faults might be present in a system
• O3: Cover all the clauses of the input/output constraints
• Input/output constraints can have multiple conditions under which
a given output is expected
• O4: Maximise code coverage
• Implemented features should be fully executed
23. O1: Cover all the classes of the data model
• Coverage of each class of a data model is tracked
• A test input covers a class if it contains at least one instance of the class

24. O1: Cover all the classes of the data model
[UML data model with classes Transmission, Vcdu, Header (versionNumber, vcFrameCount, checksum : Integer), PacketZone, ActivePacketZone, IdlePacketZone, and Packet: a Transmission contains 1..* Vcdu; each Vcdu has one Header and packet zones that contain Packet instances]

25. O1: Cover all the classes of the data model
[Example coverage matrix (objective targets × test inputs Inp1–Inp3): Vcdu, Header, and Packet are covered by all three inputs; IdlePacketZone and ActivePacketZone are each covered by two]
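As an illustration of the bookkeeping behind O1, the sketch below (a hypothetical Java helper, not the authors' tool) records which data-model classes, e.g. Vcdu, Header, Packet, each generated test input instantiates, and reports the classes that remain uncovered.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of objective O1: a test input covers a class of the data model if it
// contains at least one instance of that class.
class ClassCoverageTracker {
    private final Set<String> allClasses;                 // classes declared in the data model
    private final Set<String> covered = new HashSet<>();

    ClassCoverageTracker(Collection<String> allClasses) {
        this.allClasses = new LinkedHashSet<>(allClasses);
    }

    // Record one test input, given the class names of all instances it contains.
    void record(Collection<String> classesInTestInput) {
        covered.addAll(classesInTestInput);
    }

    // O1 targets that no generated test input has covered yet.
    Set<String> uncoveredClasses() {
        Set<String> u = new LinkedHashSet<>(allClasses);
        u.removeAll(covered);
        return u;
    }
}
```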
26. O2: Cover the fault model
• Attributes and class instances of the input data model can be mutated in different ways by different mutation operators
• Keep track of which mutation operator(s) have been applied to a specific class/attribute instance when generating test data

27–30. O2: Cover the fault model
[Data model excerpt: a Vcdu has a Header (versionNumber, vcFrameCount : Integer) and 1..* Packet. Objective targets pair an attribute with an attribute-level operator (e.g. Header.vcFrameCount::ReplaceWithRandom) or a class with a class-instance operator (e.g. Packet::InstanceDuplication, Packet::InstanceRemoval, Packet::InstanceSwapping)]

31. O2: Cover the fault model
[Example fault-model coverage matrix over test inputs Inp1–Inp3: Header.versionNumber::ReplaceWithRandom is covered by two inputs, Header.vcFrameCount::ReplaceWithRandom and Packet::InstanceSwapping by one each, while Packet::InstanceRemoval and Packet::InstanceDuplication are not yet covered]
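One possible way to track O2 is to log every operator application against its target while test inputs are generated; the helper below is an illustrative Java sketch (names and granularity are assumptions, not the authors' implementation).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of objective O2: remember which mutation operator has been applied to which
// class or attribute while generating test inputs. Target names follow the slide
// notation, e.g. "Header.vcFrameCount::ReplaceWithRandom" or "Packet::InstanceDuplication".
class FaultModelCoverageTracker {
    // target -> identifiers of the test inputs that exercised it
    private final Map<String, Set<String>> coverage = new HashMap<>();

    void recordMutation(String testInputId, String classOrAttribute, String operator) {
        String target = classOrAttribute + "::" + operator;
        coverage.computeIfAbsent(target, k -> new HashSet<>()).add(testInputId);
    }

    boolean isCovered(String target) {
        return coverage.containsKey(target);
    }
}
```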
32. O3: Cover clauses of constraints
• An input/output constraint shows the output expected under a given input condition
• The test suite should stress all the conditions under which a given output is expected

33. O3: Cover clauses of constraints
context Vcdu inv:
  if previousFrameCount < 16777215
  then frameCount <> previousFrameCount + 1
  else
    previousFrameCount = 16777215 and frameCount <> 0
  endif
  implies
    VcduEvent.allInstances()->exists(e | e.eventType = COUNTER_JUMP)

34. O3: Cover clauses of constraints
• For each clause, keep track of whether a test input makes the clause true and/or false

35. O3: Cover clauses of constraints
[Example clause-coverage matrix over test inputs Inp1–Inp3: the outcomes True : previousFrameCount < 16777215, True : frameCount <> 0, False : frameCount <> previousFrameCount + 1, and False : previousFrameCount = 16777215 are covered by all three inputs; True : frameCount <> previousFrameCount + 1 by one input; the remaining outcomes (True : previousFrameCount = 16777215, False : previousFrameCount < 16777215, False : frameCount <> 0) are not yet covered]
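The clause-coverage matrix above can be produced by evaluating every clause of the constraint on the concrete values carried by each test input; the sketch below shows one hedged way to do this in Java for the invariant of slide 33 (how the field values are obtained is an assumption).

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of objective O3: for each clause of an input/output constraint, record whether
// some test input made it true and whether some made it false. The clauses mirror the
// OCL invariant on slide 33.
class ClauseCoverageTracker {
    private static final int MAX_FRAME_COUNT = 16777215;
    private final Set<String> outcomes = new HashSet<>();   // e.g. "True : frameCount <> 0"

    void evaluate(int previousFrameCount, int frameCount) {
        record("previousFrameCount < 16777215", previousFrameCount < MAX_FRAME_COUNT);
        record("frameCount <> previousFrameCount + 1", frameCount != previousFrameCount + 1);
        record("previousFrameCount = 16777215", previousFrameCount == MAX_FRAME_COUNT);
        record("frameCount <> 0", frameCount != 0);
    }

    private void record(String clause, boolean value) {
        outcomes.add((value ? "True : " : "False : ") + clause);
    }

    // Each clause contributes two objective targets (its true and its false outcome).
    int coveredOutcomes() { return outcomes.size(); }
}
```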
36. O4: Maximize code coverage
• Execute JaCoCo to measure the instructions covered by each test case
[Example instruction-coverage matrix over test inputs Inp1–Inp3: SesDaq.java, instruction 10 is covered by all three inputs; instruction 11 by one; …]
• Limitation: requires the execution of the system under test
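The slide only names JaCoCo; one possible way to turn its output into O4 targets is to read the execution data it writes and count covered instructions with the public org.jacoco.core API, as in the hedged sketch below (file locations and per-run aggregation are assumptions, not the authors' setup).

```java
import java.io.File;
import java.io.IOException;

import org.jacoco.core.analysis.Analyzer;
import org.jacoco.core.analysis.CoverageBuilder;
import org.jacoco.core.analysis.IClassCoverage;
import org.jacoco.core.tools.ExecFileLoader;

// Sketch of objective O4: count the instructions covered by one test-case execution,
// given the jacoco.exec file written by the JaCoCo agent and the compiled classes of the SUT.
public class InstructionCoverage {
    public static void main(String[] args) throws IOException {
        File execFile = new File("jacoco.exec");       // assumed output of one test execution
        File classesDir = new File("target/classes");  // assumed location of the SUT bytecode

        ExecFileLoader loader = new ExecFileLoader();
        loader.load(execFile);

        CoverageBuilder builder = new CoverageBuilder();
        Analyzer analyzer = new Analyzer(loader.getExecutionDataStore(), builder);
        analyzer.analyzeAll(classesDir);

        int covered = 0;
        for (IClassCoverage cc : builder.getClasses()) {
            covered += cc.getInstructionCounter().getCoveredCount();
        }
        System.out.println("Covered instructions: " + covered);
    }
}
```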
37. Evolutionary Algorithm with Archive
• How to generate an effective and small test suite?
• A huge number of test inputs can be generated; exhaustive test generation is not feasible
39–40. Sample new chunk: seeding
• Chunks are sampled from field data (a satellite transmission)
• No seeding: packets are randomly selected from the field data, so frequent packet types are selected more often
• With seeding: packet types are randomly selected, and all packet types have the same probability of being picked
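To make the difference concrete, here is a minimal Java sketch of the two sampling strategies, assuming packets are identified by their type name (a simplification for illustration, not the authors' implementation).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

// Sketch of chunk sampling from field data. A packet is reduced to its type name;
// real chunks are, of course, richer.
class PacketSampler {
    private final List<String> fieldDataPackets;             // packets in the order they were captured
    private final Map<String, List<String>> packetsByType;   // the same packets grouped by type
    private final Random rnd = new Random();

    PacketSampler(List<String> fieldDataPackets) {
        this.fieldDataPackets = fieldDataPackets;
        this.packetsByType = fieldDataPackets.stream().collect(Collectors.groupingBy(p -> p));
    }

    // No seeding: pick uniformly among the captured packets,
    // so frequent packet types are selected more often.
    String sampleWithoutSeeding() {
        return fieldDataPackets.get(rnd.nextInt(fieldDataPackets.size()));
    }

    // With seeding: first pick a packet type uniformly (all types have the same
    // probability), then pick one captured packet of that type.
    String sampleWithSeeding() {
        List<String> types = new ArrayList<>(packetsByType.keySet());
        List<String> ofType = packetsByType.get(types.get(rnd.nextInt(types.size())));
        return ofType.get(rnd.nextInt(ofType.size()));
    }
}
```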
41. [Workflow diagram: field data (after filtering) feeds the search; each iteration either samples a new chunk or copies a test input from the archive, applies a mutation, runs the assessment, and puts the result in the archive (with pruning)]
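A minimal sketch of the loop the diagram depicts: each iteration either samples a new chunk from the field data or copies a test input from the archive, applies mutations, assesses which objective targets the candidate covers, and archives it only if it covers something new. Parameter names (pSampling, pMutation, maxMutations) follow the slides; the rest (generic type, archiving policy details) is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Sketch of the evolutionary algorithm with archive. A test input is abstracted to the
// set of objective-target names it covers (O1–O4 targets treated uniformly).
class ArchiveSearch<T> {
    private final double pSampling;      // probability of sampling a new chunk
    private final double pMutation;      // probability of mutating right after sampling
    private final int maxMutations;      // max mutations applied when reusing an archived input
    private final Random rnd = new Random();

    private final List<T> archive = new ArrayList<>();
    private final Set<String> coveredTargets = new HashSet<>();

    ArchiveSearch(double pSampling, double pMutation, int maxMutations) {
        this.pSampling = pSampling;
        this.pMutation = pMutation;
        this.maxMutations = maxMutations;
    }

    // sampleChunk: draws a (possibly seeded) chunk of field data
    // mutate: returns a mutated copy produced by one randomly chosen operator
    // assess: executes/analyses the input and returns the objective targets it covers
    List<T> run(int budget, Supplier<T> sampleChunk, UnaryOperator<T> mutate,
                Function<T, Set<String>> assess) {
        for (int spent = 0; spent < budget; spent++) {
            T candidate;
            if (archive.isEmpty() || rnd.nextDouble() < pSampling) {
                candidate = sampleChunk.get();                        // exploration
                if (rnd.nextDouble() < pMutation) candidate = mutate.apply(candidate);
            } else {
                candidate = archive.get(rnd.nextInt(archive.size())); // exploitation
                int n = 1 + rnd.nextInt(maxMutations);
                for (int i = 0; i < n; i++) candidate = mutate.apply(candidate);
            }
            Set<String> newlyCovered = new HashSet<>(assess.apply(candidate));
            newlyCovered.removeAll(coveredTargets);
            if (!newlyCovered.isEmpty()) {                            // keep only improving inputs
                archive.add(candidate);
                coveredTargets.addAll(newlyCovered);
            }
        }
        return archive;                                               // the generated test suite
    }
}
```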
46. Objective targets × test inputs (Inp1–Inp3)
[Combined coverage matrix grouping the targets of all four objectives:
• Objective 1, data model coverage: Vcdu, Header, ActivePacketZone, and Packet covered by all three inputs; IdlePacketZone by two
• Objective 2, fault model coverage: Header.versionNumber::ReplaceWithRandom covered by two inputs; Header.vcFrameCount::ReplaceWithRandom and Packet::InstanceRemoval by one each; Packet::InstanceDuplication and Packet::InstanceSwapping not yet covered
• Objective 3, constraint clause coverage: the true and false outcomes of each constraint clause, as on slide 35
• Objective 4, code coverage: SesDaq.java, Line 10 covered by all three inputs; Line 11 by one]
47–48. [Workflow diagram: field data (filtering) → sample new chunk or copy from archive → apply a mutation → assessment → put in archive (pruning); the resulting test inputs are executed on the system and the results validated, reporting constraint violations]
49. Research questions
• RQ1: How does the search algorithm compare with random and state-of-the-art approaches?
• RQ2: How does fitness based on code coverage affect performance?
• RQ3: How does seeding affect performance?
• RQ4: What are the configuration parameters that affect performance?
• RQ5: What configuration should be used in practice?
• Case study: Satellite DAQ developed by SES
50. [Workflow diagram repeated: sample new chunk or copy from archive, apply a mutation, assessment, put in archive; test inputs executed on the system and results validated, reporting constraint violations]
Configuration parameters explored:
• p seeding = 0, 0.5
• p mutation = 0, 0.5, 1
• p sampling = 0.3, 0.5, 0.8
• Max mutations = 1, 10, 100
• Stop after: 50k, 100k, 150k, 200k, 250k
• Coverage-fitness: on, off
This leads to 3 × 3 × 3 × 2 × 2 = 108 configurations
108 configurations × 5 search budgets = 540 different configurations of the search algorithm
Each experiment repeated 5 times to account for randomness: 540 × 5 = 2700 runs
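The arithmetic above can be checked mechanically; the snippet below just enumerates the parameter values listed on this slide (a sanity check, not part of the authors' tooling).

```java
public class ConfigurationCount {
    public static void main(String[] args) {
        double[]  pSeeding   = {0, 0.5};
        double[]  pMutation  = {0, 0.5, 1};
        double[]  pSampling  = {0.3, 0.5, 0.8};
        int[]     maxMut     = {1, 10, 100};
        boolean[] covFitness = {true, false};
        int[]     budgets    = {50_000, 100_000, 150_000, 200_000, 250_000};

        int configs = pSeeding.length * pMutation.length * pSampling.length
                * maxMut.length * covFitness.length;             // 3 * 3 * 3 * 2 * 2 = 108
        int withBudgets = configs * budgets.length;               // 108 * 5 = 540
        int runs = withBudgets * 5;                                // 5 repetitions = 2700

        System.out.printf("configurations=%d, with budgets=%d, total runs=%d%n",
                configs, withBudgets, runs);
    }
}
```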
51. RQ1: How does the search algorithm compare with random and state-of-the-art approaches?

Budget (in Cadus) | Configuration            | Coverage | # of Tests
50k               | Best: r=0.5,m=1,n=100    | 23424.4  | 28.4
50k               | BO:   r=0.5,m=1,n=100    | 23424.4  | 28.4
50k               | Rand: r=1,m=1,n=1        | 23386.8  | 43.2
100k              | Best: r=0.5,m=1,n=100    | 23487.8  | 31.6
100k              | BO:   r=0.5,m=1,n=100    | 23487.8  | 31.6
100k              | Rand: r=1,m=1,n=1        | 23436.8  | 52.0
150k              | Best: r=0.5,m=1,n=100    | 23502.0  | 34.0
150k              | BO:   r=0.5,m=1,n=100    | 23502.0  | 34.0
150k              | Rand: r=1,m=1,n=1        | 23453.4  | 57.8
200k              | Best: r=0.5,m=0.5,n=100  | 23519.6  | 34.6
200k              | BO:   r=0.5,m=1,n=100    | 23513.4  | 36.0
200k              | Rand: r=1,m=1,n=1        | 23465.8  | 60.2
250k              | Best: r=0.5,m=1,n=10     | 23538.6  | 38.4
250k              | BO:   r=0.5,m=1,n=100    | 23515.2  | 36.4
250k              | Rand: r=1,m=1,n=1        | 23482.6  | 62.4

r, probability of random sampling; m, probability of applying mutation when sampling;
n, maximum number of allowed mutations in a test (seeding not used)
Best, best configuration for the given search budget; BO, best configuration, on average,
over all the search budgets; Rand, random approach
52. RQ1: How does the search algorithm compare with
random and state-of-the-art approaches?
• Random approach
• Always sample and mutate; do not reuse archived items
• Previous approach (ICST’15)
• Stops test input generation when all attributes have been
mutated at least once by each applicable mutation operator
• Search-based algorithm
• Best overall configuration
• Best configuration for a given budget
53–55. RQ1: How does the search algorithm compare with random and state-of-the-art approaches?
[Same table as slide 51, with one additional baseline row: ICST’15 — Coverage 23283.0, # of Tests 43.0]
• The search algorithm achieves better coverage than both the random and the ICST’15 approaches, and it also generates significantly smaller test suites.
• With higher search budgets, search can achieve greater coverage, at the cost of a larger test suite.
• APT (the ICST’15 approach) achieved an average coverage of 23283 instructions, less than both search and random.
56–57. RQ2: How does fitness based on code coverage affect performance?

Budget | Code | Seeding | Configuration            | Coverage | # of Tests | # of Mut.
50k    | F    | 0.0     | Best: r=0.5,m=1,n=100    | 23361.4  | 17.0       | 4.8
50k    | T    | 0.0     | Best: r=0.5,m=1,n=100    | 23424.4  | 28.4       | 3.6
50k    | F    | 0.5     | Best: r=0.5,m=1,n=10     | 23417.2  | 21.0       | 4.0
50k    | T    | 0.5     | Best: r=0.5,m=1,n=10     | 23428.4  | 34.2       | 3.2
50k    | T    | 0.5     | BO:   r=0.3,m=0,n=10     | 23401.8  | 27.0       | 4.3
100k   | F    | 0.0     | Best: r=0.3,m=1,n=10     | 23404.4  | 16.8       | 8.2
100k   | T    | 0.0     | Best: r=0.5,m=1,n=100    | 23487.8  | 31.6       | 4.9
100k   | F    | 0.5     | Best: r=0.5,m=1,n=10     | 23442.2  | 21.0       | 6.4
100k   | T    | 0.5     | Best: r=0.3,m=0,n=10     | 23487.0  | 33.2       | 5.6
100k   | T    | 0.5     | BO:   r=0.3,m=0,n=10     | 23487.0  | 33.2       | 5.6
150k   | F    | 0.0     | Best: r=0.8,m=1,n=100    | 23418.4  | 28.2       | 4.0
150k   | T    | 0.0     | Best: r=0.5,m=1,n=100    | 23502.0  | 34.0       | 6.0
150k   | F    | 0.5     | Best: r=0.5,m=1,n=100    | 23447.4  | 23.4       | 7.5
150k   | T    | 0.5     | Best: r=0.3,m=0,n=10     | 23528.2  | 35.6       | 6.5
150k   | T    | 0.5     | BO:   r=0.3,m=0,n=10     | 23528.2  | 35.6       | 6.5
200k   | F    | 0.0     | Best: r=0.8,m=1,n=100    | 23426.0  | 28.0       | 4.7
200k   | T    | 0.0     | Best: r=0.5,m=0.5,n=100  | 23519.6  | 34.6       | 6.7
200k   | F    | 0.5     | Best: r=0.5,m=1,n=100    | 23456.0  | 23.2       | 9.2
200k   | T    | 0.5     | Best: r=0.3,m=0,n=10     | 23551.0  | 37.2       | 7.0
200k   | T    | 0.5     | BO:   r=0.3,m=0,n=10     | 23551.0  | 37.2       | 7.0
250k   | F    | 0.0     | Best: r=0.8,m=1,n=100    | 23433.2  | 28.6       | 5.4
250k   | T    | 0.0     | Best: r=0.5,m=1,n=10     | 23538.6  | 38.4       | 7.1
250k   | F    | 0.5     | Best: r=0.5,m=1,n=100    | 23461.8  | 23.6       | 10.3
250k   | T    | 0.5     | Best: r=0.3,m=0,n=10     | 23554.4  | 37.2       | 7.4
250k   | T    | 0.5     | BO:   r=0.3,m=0,n=10     | 23554.4  | 37.2       | 7.4

Code, code-coverage fitness enabled (T) or disabled (F); Seeding, seeding probability
r, probability of random sampling; m, probability of applying mutation when sampling;
n, maximum number of allowed mutations in a test
Best, best configuration for the given search budget; BO, best configuration, on average,
over all the search budgets
58. RQ2: How does fitness based on code coverage affect performance?
• For each search budget, we identified the best configuration with and without the code coverage objective enabled
• The code coverage objective results in test suites with higher code coverage, at the expense of a larger test suite (50% more test cases)
59. RQ3: How does seeding affect performance?
• For each search budget, we identified the best configuration with and without seeding
• Seeding is always part of the configurations that achieve the highest code coverage or the lowest number of test cases (for search budgets above 150k)
60. RQ4: What are the configuration parameters that
affect performance?
61–62. RQ3: How does smart seeding affect performance?
[Same table as slides 56–57, comparing configurations with seeding (0.5) and without (0.0)]
• For search budgets greater than 150k, smart seeding achieves the highest coverage or the lowest number of test cases.
63–74. RQ4: What are the configuration parameters that affect performance?
[Workflow diagram repeated, annotated with the explored parameter values: p sampling = 0.3, 0.5, 0.8; Max mutations = 1, 10, 100; p seeding = 0, 0.5; p mutation = 0, 0.5, 1; Coverage-fitness: on, off; Stop after: 50k, 100k, 150k, 200k, 250k]
• Coverage fitness is applied in the top configurations, never by the worst ones.
• For small search budgets, search achieves better results when more focused on exploitation (using archived inputs).
• For larger search budgets, with no seeding or coverage, putting more emphasis on exploration (new samples) pays off.
• If either seeding or coverage fitness is used, the need to explore the search landscape decreases.
• If both seeding and coverage fitness are used, the need to explore the search landscape decreases further.
• The average number of mutations per test input remains low (~10).
75. RQ5: What configuration should be used in practice?
[Workflow diagram repeated with the same parameter values as above]
76. RQ5: What configuration should be used in practice?
• Small probability of sampling new test data at random (p = 0.3)
• Do not mutate new inputs immediately when sampled
• Limit the maximum number of mutations (max mutations = 10)
• Seeding and code coverage fitness are used
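Collected in one place, the recommendation could look like the hypothetical settings class below; the class and constant names are illustrative, while the values are the ones reported on this slide and in the experiment tables.

```java
// Hypothetical holder for the configuration recommended in practice (RQ5).
public final class RecommendedConfig {
    public static final double  P_SAMPLING       = 0.3;  // small probability of sampling new test data at random
    public static final double  P_MUTATION       = 0.0;  // do not mutate new inputs immediately when sampled
    public static final int     MAX_MUTATIONS    = 10;   // limit the maximum number of mutations per test input
    public static final double  P_SEEDING        = 0.5;  // seeding enabled (the value used in the experiments)
    public static final boolean COVERAGE_FITNESS = true; // code coverage objective enabled

    private RecommendedConfig() { }
}
```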