SSBSE 2020 keynote
1. Search-Based Testing for Formal Software Verification and Vice Versa
Shiva Nejati
snejati@uottawa.ca
@ShivaNejati
School of Electrical Engineering and Computer Science, University of Ottawa and
SnT Centre, University of Luxembourg
2. Search-Based Software Testing
W. Miller and D. L. Spooner, "Automatic Generation of Floating-Point Test Data," IEEE TSE, SE-2(3): 223-226, 1976.
3. Search-Based Software Testing
B. Korel, "Automated Software Test Data Generation," IEEE TSE, 16(8): 870-879, 1990.
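The idea these two papers introduced, casting test-data generation as numerical search over program inputs, can be illustrated with a short sketch. The Python below is a hedged illustration (the program under test, the target branch, and all parameters are invented, not taken from the papers): hill climbing minimizes a branch-distance fitness that is zero exactly when the target branch is taken.

# Hedged sketch of search-based test-data generation: find a floating-point
# input that drives execution down a target branch by minimizing a
# "branch distance" fitness. Everything here is illustrative.
import random

def program_under_test(x: float) -> bool:
    # Target branch: we want an input that makes this condition true.
    return x * x - 4.0 * x + 3.0 < 0.0   # true only for 1 < x < 3

def branch_distance(x: float) -> float:
    # How far the condition "lhs < 0" is from holding; 0 when satisfied.
    lhs = x * x - 4.0 * x + 3.0
    return max(lhs, 0.0)

def hill_climb(start: float, step: float = 1.0, iters: int = 10_000) -> float:
    x, best = start, branch_distance(start)
    for _ in range(iters):
        if best == 0.0:          # branch covered: stop searching
            break
        candidate = x + random.uniform(-step, step)
        d = branch_distance(candidate)
        if d < best:             # accept only fitness-improving moves
            x, best = candidate, d
    return x

x = hill_climb(start=random.uniform(-100.0, 100.0))
print(x, program_under_test(x))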
5. SBST Applications
• Applied to various categories of software testing:
• Unit testing
• System testing
• Regression testing
• Model-based testing
• …
6. SBST Strengths
• Scalable
• Can be parallelized easily
• Versatile
• Make few assumptions about the structure of their inputs
• Flexible and adaptable
• Can be combined with other methods: constraint solving, machine learning, etc.
• Simple!
7. But When Can We Not Use Search?
Verification
• Establishing properties of programs by mathematical proofs (static verification)
• Demonstrating correctness of all system usages
vs.
Testing
• Checking the system for a set of normal and boundary usages
10. “Program testing can be used to show the presence of bugs, but never their absence.”
Edsger W. Dijkstra
11. Dichotomy Between Testing and Verification
• “As a programmer, even if you would like to have correctness, you might find yourself spending most of your time reasoning about incorrectness.” Peter W. O’Hearn
• For practitioners, testing and verification mean almost the same thing
18. The SatEx Case Study
• R1: The angular velocity of the satellite shall always be lower than 1.5 m/s
• R4: The satellite attitude shall reach close to its target value within 2000 s
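Such requirements are typically formalized as logical properties before checking. One possible signal-temporal-logic-style rendering (the signal names ω and θ and the tolerance ε are assumptions for illustration, not from the slides):

R1: □ ( ‖ω(t)‖ < 1.5 )
R4: ◇[0, 2000] ( |θ(t) − θ*| ≤ ε )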
25. Comparing Model Checking and Model Testing
Nejati, S., Gaaloul, K., Menghi, C., Briand, L.C., Foster, S., Wolfe, D., "Evaluating model testing and model checking for finding requirements violations in Simulink models," In: Proceedings of ESEC/SIGSOFT FSE 2019, Estonia, August 26-30, 2019, pp. 1015-1025. ACM (2019).
26. [Figure: the Model Checking and Model Testing pipelines side by side]
Model Checking: Simulink models and natural-language requirements are translated into logical properties; model checking returns one of three outcomes: model proven to be correct, failure found, or no result.
Model Testing: Simulink models and natural-language requirements, together with ranges of the test input variables, are translated into fitness functions; model testing returns one of two outcomes: failure found or no failure found.
27. Simulink Model Checker
• QVTrace from QRA Corp, Canada
• An SMT-based model checker for Simulink, built on Z3 and Mathematica
https://qracorp.com/qvtrace/
28. QVTrace
From the QVtrace User Manual (v0.11.7):
QVtrace has been designed to optimize the workflow for model-based design analysis. Analysis in QVtrace can be approached in two ways:
a) by formally translating sets of requirements specifications and verifying that the model meets these, or
b) as an interactive querying process where the domain expert iteratively queries the model for expected behaviour as the system components are modelled.
Analysis will always be done on all constraints present in the Constraints Window and can be run from any subsystem in the model. Note that the analysis always checks the entire model against all constraints present, not just the subsystem shown in the Design Navigation Window.
When running an analysis, the constraints are first verified to be consistent with the QCT language syntax. For example, writing “param_1 == 5” where param_1 is a boolean variable will return an error message stating that the constraint is inappropriately written, and no analysis will be run on the model.
Possible analysis results:
• No violations exist: the model is consistent with the stated constraints for all possible input values and at all times; the Results tab turns green.
• No violations exist up to a bound: the model has been shown consistent with the constraints within the bound, but there is no guarantee that a violation does not occur at some greater time; in these cases, further analysis (for example, constraints including an explicit time reference) is required to assess validity beyond the bound.
29. Model Testing (Falsification-Based Testing)
• Uses meta-heuristic search
• Search guidance: fitness functions estimating how far a candidate test is from violating a requirement
• Search heuristics: random search, hill climbing, simulated annealing, genetic algorithms, etc.
R. Matinnejad, S. Nejati, L. C. Briand, T. Bruckmann, C. Poull, "Search-based automated testing of continuous controllers: Framework, tool support, and case studies," Information & Software Technology, 2015.
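A minimal sketch of such a falsification loop, under stated assumptions: the simulator and the requirement below are invented stand-ins, and plain random search stands in for the heuristics listed above. The fitness is the margin to the requirement threshold, so a non-positive value means a violating test was found.

# Hedged sketch of model testing (falsification). Everything is illustrative.
import random

def simulate(u: float) -> list[float]:
    # Stand-in for an expensive model simulation returning a signal trace.
    return [u * (1 - 0.9 ** t) + random.gauss(0, 0.01) for t in range(100)]

def fitness(trace: list[float], threshold: float = 1.5) -> float:
    # Requirement "signal always below threshold": minimum margin over the
    # trace; a negative value means the requirement was violated somewhere.
    return min(threshold - v for v in trace)

best_u, best_f = None, float("inf")
for _ in range(200):                  # search budget (number of simulations)
    u = random.uniform(0.0, 2.0)      # candidate test input
    f = fitness(simulate(u))
    if f < best_f:
        best_u, best_f = u, f
    if best_f <= 0:
        print("violation found by test input", best_u)
        break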
31. Results — Fault Finding
       | Testing    | Model Checking
Reqs   | Violations | Proven | Violations
92     | 40         | 41     | 23
• MT and MC together could show that 41 requirements are correct and 40 requirements are violated
• Only 11 requirements remain inconclusive
33. Results — Fault Finding
[Figure: simulation timeline from 0 to 500 s]
• BMC can analyse up to 500 steps (50 s)
• Testing found errors after 2000 steps
35. Results — Time
Testing, Violations:           5.8 min (max = 18.5 min, min = 3 min)
Model Checking, Proven:        0.6 s (max = 1.9 s, min = 0.06 s)
Model Checking, Violations:    2.2 s (max = 10.1 s, min = 0.12 s)
Model Checking, Inconclusive:  15 min to several hours
36. Lessons Learned
• L1: Model Checking fails to analyse some CPS models (Autopilot)
• This is a major obstacle to the adoption of QVTrace by CPS suppliers, as confirmed by QRA
37. Lessons Learned
• L2: Model Checking is less effective than Model Testing in finding requirements failures
• Model Checking found 23 requirements violations
• Model Testing found 40 requirements violations
38. Lessons Learned
• L3: Model Checking executes considerably faster than Model Testing when it succeeds in proving or violating requirements
• Model Checking was able to prove 41 requirements and find violations in 23 requirements within a few seconds
42. Scaling Model Testing to Complex Compute-Intensive Models
Menghi, C., Nejati, S., Briand, L.C., Parache, Y.I., "Approximation-refinement testing of compute-intensive cyber-physical models: An approach based on system identification," In: Proceedings of the International Conference on Software Engineering (ICSE 2020).
45. Challenge
• Industrial models of CPS are often compute-intensive
• Compute-intensive models require hours to complete a single simulation of the model under test (MUT)
• A single simulation of the satellite model requires ~1.5 hours
(Satellite model provided by LuxSpace, https://luxspace.lu/)
47. Scaling Model Checking
E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith, "Counterexample-Guided Abstraction Refinement," CAV 2000: 154-169.
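ARIsTEO carries this abstraction-refinement idea over to testing: falsify a cheap surrogate model fitted by system identification, confirm each candidate violation on the expensive model, and use spurious candidates to refine the surrogate. A minimal sketch of that loop follows; all functions, models, and parameters are illustrative placeholders, not the tool's API.

# Hedged sketch of approximation-refinement testing. Illustrative throughout.
import random

def real_model(u: float) -> float:       # stand-in for an hours-long simulation
    return u ** 2 + 0.5 * u              # "peak response" of the trace

def violates(peak: float) -> bool:       # requirement: peak must stay below 10
    return peak >= 10.0

def fit_surrogate(data):                 # toy system identification: linear fit
    n = len(data)
    mean_u = sum(u for u, _ in data) / n
    mean_p = sum(p for _, p in data) / n
    var = sum((u - mean_u) ** 2 for u, _ in data) or 1e-9
    slope = sum((u - mean_u) * (p - mean_p) for u, p in data) / var
    return lambda u: mean_p + slope * (u - mean_u)

data = [(u, real_model(u)) for u in (0.0, 1.0, 2.0)]   # initial training runs
for _ in range(20):                                     # refinement iterations
    surrogate = fit_surrogate(data)
    # Falsify the cheap surrogate (here: random search over the input range).
    candidate = max((random.uniform(0, 5) for _ in range(1000)), key=surrogate)
    peak = real_model(candidate)         # single expensive confirmation run
    if violates(peak):
        print("confirmed violation with input", candidate)
        break
    data.append((candidate, peak))       # spurious candidate: refine surrogate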
63. Evaluation: Effectiveness and Efficiency
• RQ1. How effective is ARIsTEO in generating tests that reveal requirements violations?
• RQ2. How efficient is ARIsTEO in generating tests revealing requirements violations?
64. RQ1 and RQ2 - Effectiveness and Efficiency
RQ1: On average, ARIsTEO detects 23.9% more requirements violations than S-Taliro (min = -8%, max = 95%).
RQ2: ARIsTEO is on average 31.3% (min = -1.6%, max = 85.2%) more efficient than S-Taliro.
65. RQ3 - Practical Usefulness
• RQ3. How applicable and useful is ARIsTEO in generating tests revealing requirements violations for industrial CI-CPS models?
66. RQ3 - Practical Usefulness
RQ3: ARIsTEO detected, in practical time, requirements violations on an industrial CI-CPS model that S-Taliro could not find.
69. Missing Assumptions on Inputs
[Figure: autopilot model with Roll, Pitch, and Yaw inputs, checked against Req]
Req: When the autopilot is enabled, the aircraft altitude should reach the desired altitude within 500 seconds in calm air.
Assumption: The pilot should apply sufficient throttle force.
With the assumption made explicit as a constraint, the model is checked against Throttle > c & Req rather than Req alone.
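In assume-guarantee terms, the mined assumption A weakens the verification obligation to the inputs where A holds. A hedged formalization (the notation is assumed, not from the slides):

M ⊨ (A → Req), with A ≡ (Throttle > c)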
72. Mining Assumptions Using Search and Decision Trees
Gaaloul, K., Menghi, C., Nejati, S., Briand, L., Wolfe, D., "Mining assumptions for software components using machine learning," In: Proceedings of ESEC/SIGSOFT FSE 2020. ACM (2020).
76. [Figure: assumption mining pipeline]
1. SBST generates a test suite and an oracle for the model (with Roll, Pitch, and Yaw inputs) against Req.
2. The result is a set of test inputs labelled pass/fail, e.g., Throttle = 20, 0.4, -3.6, 100, each marked P or F.
3. Machine learning infers a candidate assumption (e.g., Throttle > c) from the labelled test data.
4. Model checking validates the candidate assumption against the model and Req.
80. [Figure: a decision tree learned from the labelled test inputs, with P/F leaves]
• A decision tree is learned from the pass/fail-labelled test inputs.
• The paths leading to passing (P) leaves yield an assumption of the form C1 ∨ C2 ∨ … ∨ Cn, e.g., Throttle > 0.5 ∧ pitchwheel > 10.
• Decision trees produce simple predicates!
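A hedged sketch of this step using scikit-learn (the labelled data, labels, and feature names are invented for illustration; the paper's implementation may differ): fit a decision tree on pass/fail-labelled test inputs, then read the assumption off the paths that end in passing leaves.

# Learn a candidate assumption from pass/fail-labelled test inputs with a
# decision tree, then extract the conjunction along each path to a pass leaf.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative labelled test data: columns are [Throttle, PitchWheel].
X = np.array([[20.0, 12.0], [0.4, 15.0], [-3.6, 8.0], [100.0, 11.0]])
y = np.array([1, 0, 0, 1])  # 1 = pass, 0 = fail (labels invented)

features = ["Throttle", "PitchWheel"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def pass_paths(t, node=0, conds=()):
    """Yield the conjunction of split conditions along each path to a pass leaf."""
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: majority class decides pass/fail
        if np.argmax(t.value[node][0]) == 1:
            yield " AND ".join(conds) if conds else "true"
        return
    name, thr = features[t.feature[node]], t.threshold[node]
    yield from pass_paths(t, left, conds + (f"{name} <= {thr:.2f}",))
    yield from pass_paths(t, right, conds + (f"{name} > {thr:.2f}",))

# The mined assumption is the disjunction C1 OR C2 OR ... OR Cn of pass paths.
print(" OR ".join(f"({c})" for c in pass_paths(clf.tree_)))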
84. [Figure: genetic programming evolves an expression tree over the labelled test inputs]
• Genetic programming evolves expression trees over the input variables; the tree shown encodes x × y < 5 ∧ (x − z) ≥ 2.
• Genetic programming can learn complex linear and nonlinear formulas.
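A toy sketch of the genetic-programming variant (illustrative throughout; for brevity, random restarts with elitism stand in for full crossover and mutation): evolve predicate trees over the test variables and score each by how well it separates passing from failing tests.

# Toy evolutionary search over predicate trees for assumption mining.
import random
import operator

VARS = ["x", "y", "z"]
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
CMP = {"<": operator.lt, ">=": operator.ge}

def rand_arith(depth=2):
    """Random arithmetic expression tree over VARS and small constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(VARS + [random.randint(-5, 5)])
    op = random.choice(list(ARITH))
    return (op, rand_arith(depth - 1), rand_arith(depth - 1))

def rand_pred():
    """Random predicate: conjunction of one or two comparisons."""
    comps = [(random.choice(list(CMP)), rand_arith(), rand_arith())
             for _ in range(random.randint(1, 2))]
    return ("and",) + tuple(comps)

def eval_node(node, env):
    if isinstance(node, str):
        return env[node]
    if isinstance(node, int):
        return node
    op = node[0]
    if op == "and":
        return all(eval_node(c, env) for c in node[1:])
    if op in CMP:
        return CMP[op](eval_node(node[1], env), eval_node(node[2], env))
    return ARITH[op](eval_node(node[1], env), eval_node(node[2], env))

def fitness(pred, tests):
    """Fraction of labelled tests the predicate classifies correctly."""
    return sum(eval_node(pred, env) == label for env, label in tests) / len(tests)

# Illustrative labelled tests: input valuation -> pass (True) / fail (False).
tests = [({"x": 1, "y": 2, "z": -3}, True), ({"x": 4, "y": 3, "z": 5}, False),
         ({"x": 0, "y": 1, "z": -2}, True), ({"x": 6, "y": 2, "z": 7}, False)]

pop = [rand_pred() for _ in range(200)]
for _ in range(30):  # generations: keep the fittest, refill with fresh trees
    pop.sort(key=lambda p: fitness(p, tests), reverse=True)
    pop = pop[:50] + [rand_pred() for _ in range(150)]
best = max(pop, key=lambda p: fitness(p, tests))
print(best, fitness(best, tests))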
88. Conclusions
• Assumption generation is important for model debugging and compositional verification (a.k.a. assume-guarantee reasoning)
• Current inference techniques rely on automata theory and can generate only Boolean assumptions or assumptions over predicates
• Applying decision tree learning to test data, we can generate assumptions that include arithmetic constraints over numeric variables
• Using genetic programming, we can go even beyond linear arithmetic constraints
89. Summary and Reflections
• Formal verification and testing (including SBST) have a common goal
• For most applications, formal verification fails to prove correctness and (like testing) can only show the presence of bugs
• SBST and ML may improve the scalability and applicability of formal verification
• Systematic frameworks developed in the formal verification community may help improve and enhance SBST