SSBSE 2020 keynote
1. Search-Based Testing for Formal Software Verification and Vice Versa
Shiva Nejati
snejati@uottawa.ca
@ShivaNejati
School of Electrical Engineering and Computer Science, University of Ottawa and
SnT Centre, University of Luxembourg
2. Search-Based Software Testing
W. Miller and D. L. Spooner, "Automatic Generation of Floating-Point Test Data," IEEE TSE, SE-2(3): 223-226, 1976.
3. Search-Based Software Testing
B. Korel, "Automated Software Test Data Generation," IEEE TSE, 16(8): 870-879, 1990.
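The idea these two papers introduced, casting test-data generation as numerical search over program inputs, can be illustrated with a short sketch. The Python below is a hedged illustration (the program under test, the target branch, and all parameters are invented, not taken from the papers): hill climbing minimizes a branch-distance fitness that is zero exactly when the target branch is taken.

# Hedged sketch of search-based test-data generation: find a floating-point
# input that drives execution down a target branch by minimizing a
# "branch distance" fitness. Everything here is illustrative.
import random

def program_under_test(x: float) -> bool:
    # Target branch: we want an input that makes this condition true.
    return x * x - 4.0 * x + 3.0 < 0.0   # true only for 1 < x < 3

def branch_distance(x: float) -> float:
    # How far the condition "lhs < 0" is from holding; 0 when satisfied.
    lhs = x * x - 4.0 * x + 3.0
    return max(lhs, 0.0)

def hill_climb(start: float, step: float = 1.0, iters: int = 10_000) -> float:
    x, best = start, branch_distance(start)
    for _ in range(iters):
        if best == 0.0:          # branch covered: stop searching
            break
        candidate = x + random.uniform(-step, step)
        d = branch_distance(candidate)
        if d < best:             # accept only fitness-improving moves
            x, best = candidate, d
    return x

x = hill_climb(start=random.uniform(-100.0, 100.0))
print(x, program_under_test(x))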
5. SBST Applications
• Applied to various categories of software testing:
• Unit testing
• System testing
• Regression testing
• Model-based testing
• …
6. SBST Strengths
• Scalable
• Can be parallelized easily
• Versatile
• Make few assumptions about the structure of their inputs
• Flexible and adaptable
• Can be combined with other methods: constraint solving, machine learning, etc.
• Simple!
7. But When Can We Not Use Search?
Verification
• Establishing properties of programs by mathematical proofs (static verification)
• Demonstrating correctness of all system usages
vs.
Testing
• Checking the system for a set of normal and boundary usages
10. “Program testing can be used to show the presence of bugs, but never their absence.”
Edsger W. Dijkstra
11. Dichotomy Between Testing and Verification
• “As a programmer, even if you would like to have correctness, you might find yourself spending most of your time reasoning about incorrectness.” Peter W. O’Hearn
• For practitioners, testing and verification mean almost the same thing
18. The SatEx Case Study
• R1: The angular velocity of the satellite shall always be lower than 1.5 m/s
• R4: The satellite attitude shall reach close to its target value within 2000 s
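Such requirements are typically formalized as logical properties before checking. One possible signal-temporal-logic-style rendering (the signal names ω and θ and the tolerance ε are assumptions for illustration, not from the slides):

R1: □ ( ‖ω(t)‖ < 1.5 )
R4: ◇[0, 2000] ( |θ(t) − θ*| ≤ ε )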
25. Comparing Model Checking and Model Testing
Nejati, S., Gaaloul, K., Menghi, C., Briand, L.C., Foster, S., Wolfe, D., "Evaluating model testing and model checking for finding requirements violations in Simulink models," In: Proceedings of ESEC/SIGSOFT FSE 2019, Estonia, August 26-30, 2019, pp. 1015-1025. ACM (2019).
26. [Figure: the Model Checking and Model Testing pipelines side by side]
Model Checking: Simulink models and natural-language requirements are translated into logical properties; model checking returns one of three outcomes: model proven to be correct, failure found, or no result.
Model Testing: Simulink models and natural-language requirements, together with ranges of the test input variables, are translated into fitness functions; model testing returns one of two outcomes: failure found or no failure found.
27. Simulink Model Checker
• QVTrace from QRA Corp, Canada
• An SMT-based model checker for Simulink, built on Z3 and Mathematica
https://qracorp.com/qvtrace/
28. QVTrace
From the QVtrace User Manual (v0.11.7):
QVtrace has been designed to optimize the workflow for model-based design analysis. Analysis in QVtrace can be approached in two ways:
a) by formally translating sets of requirements specifications and verifying that the model meets these, or
b) as an interactive querying process where the domain expert iteratively queries the model for expected behaviour as the system components are modelled.
Analysis will always be done on all constraints present in the Constraints Window and can be run from any subsystem in the model. Note that the analysis always checks the entire model against all constraints present, not just the subsystem shown in the Design Navigation Window.
When running an analysis, the constraints are first verified to be consistent with the QCT language syntax. For example, writing “param_1 == 5” where param_1 is a boolean variable will return an error message stating that the constraint is inappropriately written, and no analysis will be run on the model.
Possible analysis results:
• No violations exist: the model is consistent with the stated constraints for all possible input values and at all times; the Results tab turns green.
• No violations exist up to a bound: the model has been shown consistent with the constraints within the bound, but there is no guarantee that a violation does not occur at some greater time; in these cases, further analysis (for example, constraints including an explicit time reference) is required to assess validity beyond the bound.
29. Model Testing (Falsification-Based Testing)
• Uses meta-heuristic search
• Search guidance: fitness functions estimating how far a candidate test is from violating a requirement
• Search heuristics: random search, hill climbing, simulated annealing, genetic algorithms, etc.
R. Matinnejad, S. Nejati, L. C. Briand, T. Bruckmann, C. Poull, "Search-based automated testing of continuous controllers: Framework, tool support, and case studies," Information & Software Technology, 2015.
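A minimal sketch of such a falsification loop, under stated assumptions: the simulator and the requirement below are invented stand-ins, and plain random search stands in for the heuristics listed above. The fitness is the margin to the requirement threshold, so a non-positive value means a violating test was found.

# Hedged sketch of model testing (falsification). Everything is illustrative.
import random

def simulate(u: float) -> list[float]:
    # Stand-in for an expensive model simulation returning a signal trace.
    return [u * (1 - 0.9 ** t) + random.gauss(0, 0.01) for t in range(100)]

def fitness(trace: list[float], threshold: float = 1.5) -> float:
    # Requirement "signal always below threshold": minimum margin over the
    # trace; a negative value means the requirement was violated somewhere.
    return min(threshold - v for v in trace)

best_u, best_f = None, float("inf")
for _ in range(200):                  # search budget (number of simulations)
    u = random.uniform(0.0, 2.0)      # candidate test input
    f = fitness(simulate(u))
    if f < best_f:
        best_u, best_f = u, f
    if best_f <= 0:
        print("violation found by test input", best_u)
        break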
31. Results — Fault Finding
       | Testing    | Model Checking
Reqs   | Violations | Proven | Violations
92     | 40         | 41     | 23
• MT and MC together could show that 41 requirements are correct and 40 requirements are violated
• Only 11 requirements remain inconclusive
33. Results — Fault Finding
[Figure: simulation timeline from 0 to 500 s]
• BMC can analyse up to 500 steps (50 s)
• Testing found errors after 2000 steps
35. Results — Time
Testing, Violations:           5.8 min (max = 18.5 min, min = 3 min)
Model Checking, Proven:        0.6 s (max = 1.9 s, min = 0.06 s)
Model Checking, Violations:    2.2 s (max = 10.1 s, min = 0.12 s)
Model Checking, Inconclusive:  15 min to several hours
36. Lessons Learned
• L1: Model Checking fails to analyse some CPS models (Autopilot)
• This is a major obstacle to the adoption of QVTrace by CPS suppliers, as confirmed by QRA
37. Lessons Learned
• L2: Model Checking is less effective than Model Testing in finding requirements failures
• Model Checking found 23 requirements violations
• Model Testing found 40 requirements violations
38. Lessons Learned
• L3: Model Checking executes considerably faster than Model Testing when it succeeds in proving or violating requirements
• Model Checking was able to prove 41 requirements and find violations in 23 requirements within a few seconds
42. Scaling Model Testing to Complex Compute-Intensive Models
Menghi, C., Nejati, S., Briand, L.C., Parache, Y.I., "Approximation-refinement testing of compute-intensive cyber-physical models: An approach based on system identification," In: Proceedings of the International Conference on Software Engineering (ICSE 2020).
45. Challenge
• Industrial models of CPS are often compute-intensive
• Compute-intensive models require hours to complete a single simulation of the model under test (MUT)
• A single simulation of the satellite model requires ~1.5 hours
(Satellite model provided by LuxSpace, https://luxspace.lu/)
47. Scaling Model Checking
E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith, "Counterexample-Guided Abstraction Refinement," CAV 2000: 154-169.
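ARIsTEO carries this abstraction-refinement idea over to testing: falsify a cheap surrogate model fitted by system identification, confirm each candidate violation on the expensive model, and use spurious candidates to refine the surrogate. A minimal sketch of that loop follows; all functions, models, and parameters are illustrative placeholders, not the tool's API.

# Hedged sketch of approximation-refinement testing. Illustrative throughout.
import random

def real_model(u: float) -> float:       # stand-in for an hours-long simulation
    return u ** 2 + 0.5 * u              # "peak response" of the trace

def violates(peak: float) -> bool:       # requirement: peak must stay below 10
    return peak >= 10.0

def fit_surrogate(data):                 # toy system identification: linear fit
    n = len(data)
    mean_u = sum(u for u, _ in data) / n
    mean_p = sum(p for _, p in data) / n
    var = sum((u - mean_u) ** 2 for u, _ in data) or 1e-9
    slope = sum((u - mean_u) * (p - mean_p) for u, p in data) / var
    return lambda u: mean_p + slope * (u - mean_u)

data = [(u, real_model(u)) for u in (0.0, 1.0, 2.0)]   # initial training runs
for _ in range(20):                                     # refinement iterations
    surrogate = fit_surrogate(data)
    # Falsify the cheap surrogate (here: random search over the input range).
    candidate = max((random.uniform(0, 5) for _ in range(1000)), key=surrogate)
    peak = real_model(candidate)         # single expensive confirmation run
    if violates(peak):
        print("confirmed violation with input", candidate)
        break
    data.append((candidate, peak))       # spurious candidate: refine surrogate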
63. Evaluation: Effectiveness and Efficiency
• RQ1. How effective is ARIsTEO in generating tests that reveal requirements violations?
• RQ2. How efficient is ARIsTEO in generating tests revealing requirements violations?
64. RQ1 and RQ2 - Effectiveness and Efficiency
RQ1: On average, ARIsTEO detects 23.9% more requirements violations than S-Taliro (min = -8%, max = 95%).
RQ2: ARIsTEO is on average 31.3% (min = -1.6%, max = 85.2%) more efficient than S-Taliro.
65. RQ3 - Practical Usefulness
• RQ3. How applicable and useful is ARIsTEO in generating tests revealing requirements violations for industrial CI-CPS models?
66. RQ3 - Practical Usefulness
RQ3: ARIsTEO detected, in practical time, requirements violations on an industrial CI-CPS model that S-Taliro could not find.
69. Missing Assumptions on Inputs
[Figure: autopilot model with Roll, Pitch, and Yaw inputs, checked against Req]
Req: When the autopilot is enabled, the aircraft altitude should reach the desired altitude within 500 seconds in calm air.
Assumption: The pilot should apply sufficient throttle force.
With the assumption made explicit as a constraint, the model is checked against Throttle > c & Req rather than Req alone.
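In assume-guarantee terms, the mined assumption A weakens the verification obligation to the inputs where A holds. A hedged formalization (the notation is assumed, not from the slides):

M ⊨ (A → Req), with A ≡ (Throttle > c)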
72. Mining Assumptions Using Search and Decision Trees
Gaaloul, K., Menghi, C., Nejati, S., Briand, L., Wolfe, D., "Mining assumptions for software components using machine learning," In: Proceedings of ESEC/SIGSOFT FSE 2020. ACM (2020).
76. [Figure: assumption mining pipeline]
1. SBST generates a test suite and an oracle for the model (with Roll, Pitch, and Yaw inputs) against Req.
2. The result is a set of test inputs labelled pass/fail, e.g., Throttle = 20, 0.4, -3.6, 100, each marked P or F.
3. Machine learning infers a candidate assumption (e.g., Throttle > c) from the labelled test data.
4. Model checking validates the candidate assumption against the model and Req.
80. [Figure: a decision tree learned from the labelled test inputs, with P/F leaves]
• A decision tree is learned from the pass/fail-labelled test inputs.
• The paths leading to passing (P) leaves yield an assumption of the form C1 ∨ C2 ∨ … ∨ Cn, e.g., Throttle > 0.5 ∧ pitchwheel > 10.
• Decision trees produce simple predicates!
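A hedged sketch of this step using scikit-learn (the labelled data, labels, and feature names are invented for illustration; the paper's implementation may differ): fit a decision tree on pass/fail-labelled test inputs, then read the assumption off the paths that end in passing leaves.

# Learn a candidate assumption from pass/fail-labelled test inputs with a
# decision tree, then extract the conjunction along each path to a pass leaf.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative labelled test data: columns are [Throttle, PitchWheel].
X = np.array([[20.0, 12.0], [0.4, 15.0], [-3.6, 8.0], [100.0, 11.0]])
y = np.array([1, 0, 0, 1])  # 1 = pass, 0 = fail (labels invented)

features = ["Throttle", "PitchWheel"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def pass_paths(t, node=0, conds=()):
    """Yield the conjunction of split conditions along each path to a pass leaf."""
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: majority class decides pass/fail
        if np.argmax(t.value[node][0]) == 1:
            yield " AND ".join(conds) if conds else "true"
        return
    name, thr = features[t.feature[node]], t.threshold[node]
    yield from pass_paths(t, left, conds + (f"{name} <= {thr:.2f}",))
    yield from pass_paths(t, right, conds + (f"{name} > {thr:.2f}",))

# The mined assumption is the disjunction C1 OR C2 OR ... OR Cn of pass paths.
print(" OR ".join(f"({c})" for c in pass_paths(clf.tree_)))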
84. [Figure: genetic programming evolves an expression tree over the labelled test inputs]
• Genetic programming evolves expression trees over the input variables; the tree shown encodes x × y < 5 ∧ (x − z) ≥ 2.
• Genetic programming can learn complex linear and nonlinear formulas.
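A toy sketch of the genetic-programming variant (illustrative throughout; for brevity, random restarts with elitism stand in for full crossover and mutation): evolve predicate trees over the test variables and score each by how well it separates passing from failing tests.

# Toy evolutionary search over predicate trees for assumption mining.
import random
import operator

VARS = ["x", "y", "z"]
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
CMP = {"<": operator.lt, ">=": operator.ge}

def rand_arith(depth=2):
    """Random arithmetic expression tree over VARS and small constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(VARS + [random.randint(-5, 5)])
    op = random.choice(list(ARITH))
    return (op, rand_arith(depth - 1), rand_arith(depth - 1))

def rand_pred():
    """Random predicate: conjunction of one or two comparisons."""
    comps = [(random.choice(list(CMP)), rand_arith(), rand_arith())
             for _ in range(random.randint(1, 2))]
    return ("and",) + tuple(comps)

def eval_node(node, env):
    if isinstance(node, str):
        return env[node]
    if isinstance(node, int):
        return node
    op = node[0]
    if op == "and":
        return all(eval_node(c, env) for c in node[1:])
    if op in CMP:
        return CMP[op](eval_node(node[1], env), eval_node(node[2], env))
    return ARITH[op](eval_node(node[1], env), eval_node(node[2], env))

def fitness(pred, tests):
    """Fraction of labelled tests the predicate classifies correctly."""
    return sum(eval_node(pred, env) == label for env, label in tests) / len(tests)

# Illustrative labelled tests: input valuation -> pass (True) / fail (False).
tests = [({"x": 1, "y": 2, "z": -3}, True), ({"x": 4, "y": 3, "z": 5}, False),
         ({"x": 0, "y": 1, "z": -2}, True), ({"x": 6, "y": 2, "z": 7}, False)]

pop = [rand_pred() for _ in range(200)]
for _ in range(30):  # generations: keep the fittest, refill with fresh trees
    pop.sort(key=lambda p: fitness(p, tests), reverse=True)
    pop = pop[:50] + [rand_pred() for _ in range(150)]
best = max(pop, key=lambda p: fitness(p, tests))
print(best, fitness(best, tests))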
88. Conclusions
• Assumption generation is important for model debugging and compositional verification (a.k.a. assume-guarantee reasoning)
• Current inference techniques rely on automata theory and can generate only Boolean assumptions or assumptions over predicates
• Applying decision tree learning to test data, we can generate assumptions that include arithmetic constraints over numeric variables
• Using genetic programming, we can go even beyond linear arithmetic constraints
89. Summary and Reflections
• Formal verification and testing (including SBST) have a common goal
• For most applications, formal verification fails to prove correctness and (like testing) can only show the presence of bugs
• SBST and ML may improve the scalability and applicability of formal verification
• Systematic frameworks developed in the formal verification community may help improve and enhance SBST