Search-Based Testing for Formal
Software Verification and Vice Versa
Shiva Nejati
snejati@uottawa.ca
@ShivaNejati
School of Electrical Engineering and Computer Science, University of Ottawa and
SnT Centre, University of Luxembourg
Search-Based Software Testing
W. Miller and D. L. Spooner, "Automatic
Generation of Floating-Point Test Data," in
IEEE TSE, SE-2(3): 223-226, 1976.
Bogdan Korel: Automated Software Test Data
Generation. IEEE TSE, 16(8): 870-879, 1990.
SBSE and SBST
Problem Domains
http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/
SBST Applications
• Applied to various categories of software testing:
• Unit testing
• System testing
• Regression testing
• Model-based testing
• …
SBST Strengths
• Scalable
• Can be parallelized easily
• Versatile
• Make few assumptions about the structure of their inputs
• Flexible and adaptable
• Can be combined with other methods: Constraint Solving, Machine
Learning, etc.
• Simple!
But When Can We Not Use Search?
Verification
• Establishing properties of programs by mathematical proofs (Static Verification)
• Demonstrating correctness of all system usages
vs
Testing
• Checking the system for a set of normal and boundary usages
Classical versus Stochastic
Classical Optimisation
Building solutions incrementally (or recursively) following a (semi-)deterministic algorithm.
vs
Stochastic Optimisation
Sampling solutions in a randomised way and checking if a desired solution is found.
“Program testing can be used
to show the presence of bugs,
but never their absence.”
Edsger W. Dijkstra
Dichotomy Between Testing and
Verification
• “As a programmer, even if you would like to have
correctness, you might find yourself spending most of your
time reasoning about incorrectness.” Peter W. O’Hearn
• Testing and verification mean almost the same thing to practitioners
PhD in Formal Methods
PostDoc in Empirical SE
"GA! No way! How can a randomized search algorithm solve this problem?"
But is there a way to combine or compare both types of optimisations?
In This Talk, …
Testing and/or Verification of Models of
Cyber Physical Systems
Claudio Menghi, Khouloud Gaaloul, Lionel Briand
The SatEx case study
• R1: The angular velocity of the satellite shall always be lower than 1.5 m/s
• R4: The satellite attitude shall reach close to its target value within 2000 s
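One way to make R1 and R4 checkable on a simulation trace is to turn each into a numeric margin whose sign tells whether the requirement holds. The sketch below is only an illustration: the sample layout, the function names, and the 0.05 tolerance standing in for "close to its target value" are assumptions, not values from the case study.

```python
from typing import List, Tuple

# One trace sample: (time in s, angular velocity, attitude error w.r.t. target).
# This layout is an assumption made for the example.
Sample = Tuple[float, float, float]

def r1_margin(trace: List[Sample], limit: float = 1.5) -> float:
    """R1: smallest distance between the velocity and its limit over the trace.
    A value <= 0 means the velocity reached or exceeded the limit."""
    return min(limit - velocity for _, velocity, _ in trace)

def r4_margin(trace: List[Sample], deadline: float = 2000.0,
              tolerance: float = 0.05) -> float:
    """R4: best closeness to the target reached up to the deadline.
    A value < 0 means the attitude never got within the tolerance in time.
    Assumes at least one sample with time <= deadline."""
    return max(tolerance - abs(error) for t, _, error in trace if t <= deadline)

# Hypothetical traces, not data from the SatEx model.
good = [(0.0, 0.2, 1.0), (1000.0, 1.0, 0.3), (1900.0, 0.4, 0.01)]
bad = [(0.0, 1.6, 1.0), (2500.0, 0.1, 0.0)]
print(r1_margin(good) > 0, r4_margin(good) >= 0)  # both requirements hold
print(r1_margin(bad) > 0, r4_margin(bad) >= 0)    # both requirements are violated
```

A margin of this kind can double as a search objective: the smaller it gets, the closer a test is to exposing a violation.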
The CPS development workflow
• PHASE 1: Modeling (Simulink), which produces the Model
• PHASE 2: Verification/Testing, which checks the Model's inputs and outputs against the Requirements
• PHASE 3: Coding, which turns the Model into Source code
[Figure: Model + Requirements, analysed by Model Checking and Model Testing]
Comparing Model Checking
and Model Testing
Nejati, S., Gaaloul, K., Menghi, C., Briand, L.C., Foster, S., Wolfe, D. “Evaluating model testing and model checking
for finding requirements violations in Simulink models.” In: Proceedings of ESEC/SIGSOFT FSE 2019, Estonia, August
26-30, 2019. pp. 1015–1025. ACM (2019)
Model Checking
• Inputs: Simulink Models, Natural Language Requirements (formalised as Logical Properties)
• Outcomes: Model proven to be correct, Failure Found, or No result
Model Testing
• Inputs: Simulink Models, Natural Language Requirements (formalised as Logical Properties, then turned into Fitness Functions), Ranges of test input variables
• Outcomes: Failure Found or No Failure Found
Simulink Model Checker
• QVTrace from QRA Corp,
Canada
• SMT-based model checker
for Simulink, Z3,
Mathematica
https://qracorp.com/qvtrace/
QVTrace
QVtrace has been designed to optimize the workflow for model-based design analysis. The
interface has three main sections as shown in the image below and described in detail on
the next page.
QVtrace User Manual v0.11.7, qracorp.com, page 4 of 21
Analysis in QVtrace can be approached in two ways:
a) By formally translating sets of requirements specifications and verifying the model
meets these, or
b) As an interactive querying process where the domain expert iteratively queries the
model for expected behaviour as the system components are modelled.
Analysis will always be done on all constraints present in the Constraints Window and can
be run from any subsystem in the model. It is important to note that the analysis will always
check the entire model against all constraints present, and not just the subsystem being
shown in the Design Navigation Window.
When running analysis, the constraints will first be verified to ensure these are consistent
with the QCT language syntax (see Section 5 for a guide to the QCT language syntax). For
example writing “param_1 == 5” where param_1 is a boolean variable will return an error
message stating that the constraint is inappropriately written, and no analysis will be run
on the model.
4. Interpreting QVtrace Analysis Results
4.1.Possible analysis results
No violations exist: This implies that the model is
consistent with the stated constraints for all possible
input values, and at all times. As shown in the left image,
the Results tab will turn green when no violations exist.
[Cropped in the slide] No violations exist up to a m…: … implies that the model has be… with the constraints within th… the system. However, there is no guarantee that at some great… occur. In these cases, the result… required to assess the validity o… including an explicit time referenc…
Model Testing
(Falsification-based Testing)
• Uses meta-heuristic search
• Search guidance: fitness functions estimating how far a
candidate test is from violating a requirement
• Search heuristics: random, hill climbing, simulated
annealing, genetic algorithm, etc.
Search-based automated testing of continuous controllers: Framework, tool support, and case studies
Reza Matinnejad, Shiva Nejati, Lionel C. Briand, Thomas Bruckmann, and Claude Poull
Information & Software Technology, 2015
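The role of the fitness function can be illustrated with a minimal hill-climbing sketch. Everything here is hypothetical: `toy_model_peak` stands in for a Simulink simulation, the requirement "the output shall stay below 10" is invented, and the step size and budget are arbitrary choices.

```python
import random
from typing import Callable, Tuple

def toy_model_peak(gain: float) -> float:
    """Hypothetical stand-in for a simulation: peak output for an input gain."""
    return 12.0 - (gain - 3.0) ** 2  # the peak is highest (12) at gain = 3

def fitness(gain: float, limit: float = 10.0) -> float:
    """How far the test is from violating 'output stays below limit'.
    Lower is closer to a violation; a negative value is a violation."""
    return limit - toy_model_peak(gain)

def hill_climb(f: Callable[[float], float], lo: float, hi: float,
               steps: int = 1000, step_size: float = 0.5,
               seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)          # seeded so that runs are repeatable
    best = rng.uniform(lo, hi)
    best_fit = f(best)
    for _ in range(steps):
        if best_fit < 0:               # a failing test input was found
            break
        # Perturb the current input and clamp it to the allowed range.
        cand = min(hi, max(lo, best + rng.uniform(-step_size, step_size)))
        cand_fit = f(cand)
        if cand_fit < best_fit:        # keep only moves towards a violation
            best, best_fit = cand, cand_fit
    return best, best_fit

best_gain, best_fitness = hill_climb(fitness, -10.0, 10.0)
print(best_fitness < 0)  # whether the search ended on a violating input
```

Swapping the acceptance rule or the perturbation is what separates this from the other heuristics listed, such as simulated annealing or a genetic algorithm; the fitness function stays the same.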
CPS Models
• 11 Models
• Open-loop vs Feedback-loop
• State-machines
• Continuous-Dynamics - Dynamical-Systems
• Non-linear dynamics
• Machine Learning components
Results — Fault Finding
        Testing       Model Checking
Reqs    Violations    Proven    Violations
92      40            41        23
• MT and MC together could show that 41 requirements are correct and 40
requirements are violated
• Only 11 requirements remain inconclusive
[Timeline figure: 0 to 500s, with ticks every 100s]
• BMC can analyse up to 500 steps (50s)
• Testing found errors after 2000 steps
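The effect of the bound can be mimicked with a small sketch. This is not an SMT-based bounded model checker; it only explores a deterministic toy system step by step, and the system (`drift`) and requirement (`within_limit`) are invented so that the first violation appears at step 2000.

```python
from typing import Callable, Optional

def first_violation(step: Callable[[int], int], holds: Callable[[int], bool],
                    x0: int, bound: int) -> Optional[int]:
    """Explore at most `bound` steps from x0. Return the index of the first
    state that violates `holds`, or None if none is seen within the bound."""
    x = x0
    for k in range(bound + 1):
        if not holds(x):
            return k
        x = step(x)
    return None

def drift(x: int) -> int:
    """Hypothetical system: the state grows by 1 at every step."""
    return x + 1

def within_limit(x: int) -> bool:
    """Hypothetical requirement: the state stays below 2000."""
    return x < 2000

print(first_violation(drift, within_limit, 0, 500))   # None
print(first_violation(drift, within_limit, 0, 2500))  # 2000
```

With a bound of 500 the exploration reports nothing, which says only that no violation exists up to that depth; the longer run reaches the violation.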
Results — Time
Technique         Outcome         Time       MAX         MIN
Testing           Violations      5.8 min    18.5 min    3 min
Model Checking    Proven          0.6 s      1.9 s       0.06 s
Model Checking    Violations      2.2 s      10.1 s      0.12 s
Model Checking    Inconclusive    15 min to several hours
Lessons Learned
• L1: Model Checking fails to analyse some CPS models (Autopilot)
• This is a major obstacle to the adoption of QVTrace by CPS suppliers, as confirmed by QRA
• L2: Model Checking is less effective than Model Testing in finding requirements failures
• Model Checking found 23 requirements violations
• Model Testing found 40 requirements violations
• L3: Model Checking executes considerably faster than Model Testing when it can prove or violate requirements
• Model Checking was able to prove 41 requirements and find violations in 23 requirements within a few seconds
[Figure: Model + Requirements, analysed by Model Testing]
Using SBST to automatically generate test inputs that reveal
requirement violations
Scaling Model Testing to Complex
Compute-intensive Models
Menghi, C., Nejati, S., Briand, L.C., Parache, Y.I. “Approximation-refinement testing of compute-intensive cyber-
physical models: An approach based on system identification”. In: International Conference on Software Engineering
(ICSE). arXiv (2020)
Challenge
• Industrial models of CPS are often compute-intensive
• Compute-intensive models require hours to complete a single simulation of the model under test (MUT)
• A simulation of the satellite model* requires ~1.5 hours
* Provided by LuxSpace (https://luxspace.lu/)
Scaling Model Checking
E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith “Counterexample-Guided
Abstraction Refinement.” CAV 2000: 154-169
CEGAR
• Model → Abstraction (Abstract Interpretation) → Abstract Model
• Model Check the Abstract Model
• No Bug: done
• Bug: Simulate
• Real Bug: done
• Spurious Bug: Refinement → Refined Abstract Model, which is model checked again
AppRoxImation-based
TEst generatiOn (ARIsTEO)
• Model → Approximation (Machine Learning, system identification) → Abstract Model
• SBST on the Abstract Model (in place of Model Check)
• Bug: Simulate
• Real Bug: done
• Spurious Bug: Refinement → Refined Abstract Model, which is searched again
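The approximation-refinement idea can be sketched loosely as follows. This is not the ARIsTEO implementation: a piecewise-linear interpolation stands in for system identification, ranking a fixed grid of candidate inputs stands in for SBST, and `toy_model`, the limit of 10, and the seed inputs are all hypothetical. The point is only that the expensive model is run once per iteration, while the cheap surrogate decides which input to try next.

```python
import bisect
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, float]  # (input, output observed on the real model)

def surrogate(samples: List[Sample], u: float) -> float:
    """Cheap approximation: linear interpolation through sorted samples."""
    xs = [x for x, _ in samples]
    i = bisect.bisect_left(xs, u)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (x0, y0), (x1, y1) = samples[i - 1], samples[i]
    return y0 + (y1 - y0) * (u - x0) / (x1 - x0)

def approx_refine(real_model: Callable[[float], float], limit: float,
                  candidates: List[float], seeds: List[float],
                  budget: int = 10) -> Tuple[Optional[float], int]:
    """Return (violating input or None, number of real-model runs)."""
    samples = sorted((u, real_model(u)) for u in seeds)
    runs = len(samples)
    for u, y in samples:
        if y >= limit:                     # a seed already violates
            return u, runs
    for _ in range(budget):
        tried = {x for x, _ in samples}
        untried = [u for u in candidates if u not in tried]
        if not untried:
            break
        # Search on the surrogate: the input predicted to get closest to the limit.
        cand = max(untried, key=lambda u: surrogate(samples, u))
        y = real_model(cand)               # one expensive run of the real model
        runs += 1
        if y >= limit:
            return cand, runs              # confirmed on the real model
        bisect.insort(samples, (cand, y))  # not a violation: refine the surrogate
    return None, runs

def toy_model(u: float) -> float:
    """Hypothetical 'expensive' model: peak output for input u."""
    return 12.0 - (u - 3.0) ** 2

grid = [k * 0.5 for k in range(21)]        # candidate inputs 0.0 .. 10.0
found, runs = approx_refine(toy_model, 10.0, grid, seeds=[0.0, 5.0, 10.0])
print(found, runs)                         # 4.0 5
```

In this toy run the first candidate (4.5) does not violate the limit on the real model, so its result is added to the samples and the refined surrogate points to 4.0, which does.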

Evaluation:
Effectiveness and Efficiency
• RQ1. How effective is ARIsTEO in generating tests that reveal
requirements violations?
• RQ2. How efficient is ARIsTEO in generating tests revealing
requirements violations?
RQ1 and RQ2 -
Effectiveness and Efficiency
RQ1: On average, ARIsTEO detects 23.9% more
requirements violations than S-Taliro (min=-8%, max=95%).
RQ2: ARIsTEO is on average 31.3% (min=−1.6%, max=85.2%)
more efficient than S-Taliro.
RQ3 - Practical Usefulness
• RQ3. How applicable and useful is ARIsTEO in generating
tests revealing requirements violations for industrial CI-CPS
models?
RQ3: ARIsTEO efficiently detected, in practical time, requirement violations
that S-Taliro could not find on an industrial CI-CPS model
[Figure: Model + Requirements, analysed by Model Testing]
Missing Assumptions on Inputs
[Figure: component with inputs Yaw, Roll, Pitch, checked against Req]
Req: When the autopilot is enabled, the aircraft altitude should reach the
desired altitude within 500 seconds in calm air.
Assumption: The pilot should apply sufficient throttle force.
[Figure: the same component, now checked against Throttle > c & Req]
Mining Assumptions using
Search and Decision Trees
Gaaloul, K., Menghi, C., Nejati, S., Briand, L., Wolfe, D. “Mining assumptions for software components using
machine learning.” In: Foundations of Software Engineering ESEC/SIGSOFT FSE 2020. ACM (2020)
[Figure: component with inputs Yaw, Roll, Pitch, and requirement Req]
SBST → Test Suite + Oracle → Test Inputs + pass/fail → Machine Learning → Throttle > c → Model Checking
Test Inputs + pass/fail
Throttle = 20     P
Throttle = 0.4    F
Throttle = -3.6   P
Throttle = 100    F
Decision Tree
[Figure: the Test Inputs + pass/fail table feeds a decision tree with leaves P, F, P, F]
C1 ∨ C2 ∨ … ∨ Cn
Throttle > 0.5 ∧ pitchwheel > 10
Simple predicates!
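How a threshold such as c in "Throttle > c" might be learned from labelled tests can be sketched with a one-split decision tree (a stump). This is a simplification of decision tree learning over a single input, and the test data below is invented for the example; it is not the data shown on the slides.

```python
from typing import List, Tuple

def learn_threshold(tests: List[Tuple[float, bool]]) -> Tuple[float, int]:
    """Find c for the rule 'input > c implies pass' that misclassifies the
    fewest tests. Candidate values of c are the midpoints between consecutive
    distinct inputs, so at least two distinct inputs are required."""
    values = sorted({v for v, _ in tests})
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(c: float) -> int:
        # A test is misclassified when the rule's prediction differs from its verdict.
        return sum((v > c) != passed for v, passed in tests)

    best = min(cuts, key=errors)
    return best, errors(best)

# Hypothetical (throttle, passed) pairs produced by a test suite with an oracle.
tests = [(0.1, False), (0.3, False), (0.4, False),
         (0.6, True), (2.0, True), (20.0, True)]
c, errs = learn_threshold(tests)
print(f"Throttle > {c} (misclassified tests: {errs})")
```

A full decision tree repeats this kind of split over several inputs, and each path to a passing leaf contributes one conjunct Ci to the learned assumption.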
Genetic Programming
[Figure: the Test Inputs + pass/fail table feeds genetic programming, which evolves an expression tree]
∧
  <
    ×
      x
      y
    5
  ≥
    −
      x
      z
    2
x × y < 5 ∧ (x − z) ≥ 2
Complex linear and nonlinear formulas
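The candidate assumption above can be represented as an expression tree, which is the kind of structure a genetic programming search mutates and recombines. The sketch below covers only the representation and its evaluation on a test input, not the evolutionary search; the nested-tuple encoding and the operator names are assumptions made for the example.

```python
import operator
from typing import Dict, Union

# A node is a variable name, a constant, or a tuple (operator, left, right).
Node = Union[str, int, float, tuple]

OPS = {
    "*": operator.mul,
    "-": operator.sub,
    "<": operator.lt,
    ">=": operator.ge,
    "and": lambda a, b: a and b,
}

def evaluate(node: Node, env: Dict[str, float]):
    """Evaluate an expression tree for the variable values in `env`."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]           # variable lookup
    return node                    # numeric constant

def show(node: Node) -> str:
    """Render the tree as a fully parenthesised formula."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({show(left)} {op} {show(right)})"
    return str(node)

# x * y < 5 and (x - z) >= 2
ASSUMPTION = ("and", ("<", ("*", "x", "y"), 5), (">=", ("-", "x", "z"), 2))

print(show(ASSUMPTION))
print(evaluate(ASSUMPTION, {"x": 2, "y": 2, "z": 0}))  # True
print(evaluate(ASSUMPTION, {"x": 3, "y": 2, "z": 0}))  # False
```

Because the tree may contain products of variables, the formulas it expresses are not limited to linear arithmetic.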
Conclusions
• Assumption generation is important for model debugging and
compositional verification (a.k.a assume-guarantee reasoning)
• Current inference techniques rely on automata theory and can generate
only boolean assumptions or assumptions over predicates
• Applying decision tree learning to test data, we can
• Generate assumptions that include arithmetic constraints over
numeric variables
• Using genetic programming, we can even go beyond linear arithmetic
constraints
Summary and Reflections
• Formal verification and testing (including SBST) have a common
goal
• For most applications, formal verification fails to prove
correctness and (like testing) can only show the presence of
bugs
• SBST and ML may improve formal verification in scalability and
applicability
• Systematic frameworks developed in the formal verification
community may help improve and enhance SBST
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 

SSBSE 2020 keynote

  • 1. Search-Based Testing for Formal Software Verification and Vice Versa Shiva Nejati snejati@uottawa.ca @ShivaNejati School of Electrical Engineering and Computer Science, University of Ottawa and SnT Centre, University of Luxembourg 1
  • 2-3. Search-Based Software Testing. W. Miller and D. L. Spooner, "Automatic Generation of Floating-Point Test Data," IEEE TSE, SE-2(3): 223-226, 1976. Bogdan Korel: Automated Software Test Data Generation. IEEE TSE, 16(8): 870-879, 1990.
  • 4. SBSE and SBST 3 Problem Domains http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/
  • 5. SBST Applications • Applied to various categories of software testing: • Unit testing • System testing • Regression testing • Model-based testing • … 4
  • 6. SBST Strengths • Scalable • Can be parallelized easily • Versatile • Make few assumptions about the structure of their inputs • Flexible and adaptable • Can be combined with other methods: Constraint Solving, Machine Learning, etc. • Simple! 5
  • 7-8. But When Can We Not Use Search? Verification: • Establishing properties of programs by mathematical proofs (Static Verification) • Demonstrating correctness of all system usages. vs Testing: Checking the system for a set of normal and boundary usages
  • 9. Classical versus Stochastic. Classical Optimisation: Building solutions incrementally (or recursively) following a (semi-)deterministic algorithm. vs Stochastic Optimisation: Sampling solutions in a randomised way and checking if a desired solution is found.
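The contrast on this slide can be illustrated with a toy problem. The sketch below is not from the talk: it assumes a small subset-sum instance with positive integers, and the function names are hypothetical. The classical solver builds a solution recursively and backtracks, so a `None` result means no solution exists. The stochastic solver only samples and checks, so a `None` result means nothing was found within the budget.

```python
import random
from typing import List, Optional


def classical_subset_sum(items: List[int], target: int) -> Optional[List[int]]:
    """Build a solution item by item (recursively), backtracking on dead ends.

    Assumes all items are positive, which justifies pruning when the
    remaining amount becomes negative.
    """
    def extend(index: int, remaining: int, chosen: List[int]) -> Optional[List[int]]:
        if remaining == 0:
            return list(chosen)
        if index == len(items) or remaining < 0:
            return None
        # Branch 1: include items[index] in the partial solution.
        chosen.append(items[index])
        found = extend(index + 1, remaining - items[index], chosen)
        chosen.pop()
        if found is not None:
            return found
        # Branch 2: skip items[index].
        return extend(index + 1, remaining, chosen)

    return extend(0, target, [])


def stochastic_subset_sum(items: List[int], target: int,
                          budget: int = 10_000, seed: int = 0) -> Optional[List[int]]:
    """Sample random subsets and check each one against the target."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = [x for x in items if rng.random() < 0.5]
        if sum(candidate) == target:
            return candidate
    return None  # budget exhausted: no solution found, which is not a proof


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    print(classical_subset_sum(numbers, 9))
    print(stochastic_subset_sum(numbers, 9))
```

The asymmetry in what `None` means is the point of the comparison: only the exhaustive, deterministic procedure can support a claim of absence.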
  • 10. “Program testing can be used to show the presence of bugs, but never their absence.” Edsger W. Dijkstra
  • 11. Dichotomy Between Testing and Verification • “As a programmer, even if you would like to have correctness, you might find yourself spending most of your time reasoning about incorrectness.” Peter W. O’Hearn • Testing and verification almost mean the same for practitioners 9
  • 13-14. PhD in Formal Methods, PostDoc in Empirical SE: GA! No way! How can a randomized search algorithm solve this problem?
  • 15. PhD in Formal Methods PostDoc in Empirical SE But is there a way to combine or compare both types of optimisations?
  • 16. In This Talk, … 11 Testing and/or Verification of Models of Cyber Physical Systems
  • 17. Claudio Menghi Khouloud Gaaloul Lionel Briand
  • 18. The SatEx case study • R1: The angular velocity of the satellite shall always be lower than 1.5 m/s • R4: The satellite attitude shall reach close to its target value within 2000 s
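Requirements such as R1 and R4 can be read as predicates over simulation traces. The following is a minimal sketch under stated assumptions: a trace is a list of (time in seconds, value) samples, the `Trace` alias and both function names are hypothetical, and the tolerance that stands for "close to" in R4 is an invented parameter, since the slide does not quantify it.

```python
from typing import List, Tuple

# A trace is a list of (time in seconds, signal value) samples (assumed format).
Trace = List[Tuple[float, float]]


def satisfies_r1(angular_velocity: Trace, limit: float = 1.5) -> bool:
    """R1: the angular velocity is lower than the limit at every sample."""
    return all(value < limit for _, value in angular_velocity)


def satisfies_r4(attitude_error: Trace, deadline: float = 2000.0,
                 tolerance: float = 0.01) -> bool:
    """R4: the attitude gets within `tolerance` of its target by the deadline.

    `attitude_error` holds (time, attitude minus target). The tolerance value
    is an assumption made for this sketch.
    """
    return any(t <= deadline and abs(err) <= tolerance for t, err in attitude_error)


if __name__ == "__main__":
    velocity = [(0.0, 0.2), (1.0, 1.4), (2.0, 0.9)]
    error = [(0.0, 1.0), (1500.0, 0.005)]
    print(satisfies_r1(velocity), satisfies_r4(error))
```

R1 is an invariant ("always"), so one bad sample falsifies it, while R4 is a bounded reachability property, so one good sample before the deadline satisfies it.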
  • 19. The CPS development workflow PHASE 1: Modeling (Simulink) Model 14
  • 20. The CPS development workflow PHASE 2: Verification Input Outputs Model Requirements Check 15
  • 21. The CPS development workflow PHASE 3: Coding Model Source code 16
  • 22-23. The CPS development workflow: Modeling (Simulink) → Verification/Testing → Coding (slide 23 highlights the Verification/Testing phase)
  • 25. Comparing Model Checking and Model Testing Nejati, S., Gaaloul, K., Menghi, C., Briand, L.C., Foster, S., Wolfe, D. “Evaluating model testing and model checking for finding requirements violations in Simulink models.” In: Proceedings of ESEC/SIGSOFT FSE 2019, Estonia, August 26-30, 2019. pp. 1015–1025. ACM (2019)
  • 26. Model Checking vs Model Testing (diagram). Model Checking: Natural Language Requirements → Logical Properties; Simulink Models + Logical Properties → Model Checking → Model proven to be correct / Failure Found / No result. Model Testing: Natural Language Requirements → Logical Properties → Fitness Functions; Simulink Models + Ranges of test input variables + Fitness Functions → Model Testing → Failure Found / No Failure Found
  • 27. Simulink Model Checker • QVTrace from QRA Corp, Canada • SMT-based model checker for Simulink, Z3, Mathematica 21 https://qracorp.com/qvtrace/
  • 28. QVTrace (excerpt from the QVtrace User Manual v0.11.7, qracorp.com): QVtrace has been designed to optimize the workflow for model-based design analysis. The interface has three main sections as shown in the image below and described in detail on the next page. Analysis in QVtrace can be approached in two ways: a) By formally translating sets of requirements specifications and verifying the model meets these, or b) As an interactive querying process where the domain expert iteratively queries the model for expected behaviour as the system components are modelled. Analysis will always be done on all constraints present in the Constraints Window and can be run from any subsystem in the model. It is important to note that the analysis will always check the entire model against all constraints present, and not just the subsystem being shown in the Design Navigation Window. When running analysis, the constraints will first be verified to ensure these are consistent with the QCT language syntax (see Section 5 for a guide to the QCT language syntax). For example writing “param_1 == 5” where param_1 is a boolean variable will return an error message stating that the constraint is inappropriately written, and no analysis will be run on the model. 4. Interpreting QVtrace Analysis Results 4.1. Possible analysis results. No violations exist: This implies that the model is consistent with the stated constraints for all possible input values, and at all times. As shown in the left image, the Results tab will turn green when no violations exist. (A second result, cut off in the slide:) No violations exist up to a m[…] implies that the model has be[…] with the constraints within th[…] the system. However, there is no guarantee that at some great[…] occur. In these cases, the result[…] required to assess the validity o[…] including an explicit time referenc[…]
  • 29. Model Testing (Falsification-based Testing) • Uses meta-heuristic search • Search guidance: fitness functions estimating how far a candidate test is from violating a requirement • Search heuristics: random, hill climbing, simulated annealing, genetic algorithm, etc. Search-based automated testing of continuous controllers: Framework, tool support, and case studies Reza Matinnejad, Shiva Nejati, Lionel C. Briand, Thomas Bruckmann, and Claude Poull Information & Software Technology, 2015 23
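As a rough illustration of fitness-guided falsification, the sketch below runs hill climbing against a toy stand-in for a simulated model. Everything in it is an assumption made for illustration: `toy_model`, the single scalar input, the requirement "output always below 1.5", and the step size. It is not the tool or the fitness functions used in the cited work.

```python
import random
from typing import List, Optional, Tuple


def toy_model(u: float) -> List[float]:
    """Hypothetical stand-in for one simulation: output signal for a constant input u."""
    return [0.3 * u * (1.0 - 0.9 ** k) for k in range(50)]


def fitness(u: float, limit: float = 1.5) -> float:
    """Distance from violating 'output is always below limit'.

    Positive values mean the requirement holds with that margin;
    zero or negative values mean the test input violates it.
    """
    return limit - max(toy_model(u))


def hill_climb(lo: float, hi: float, start: float, step: float = 0.5,
               budget: int = 200, seed: int = 1) -> Optional[Tuple[float, float]]:
    """Move to a random neighbour whenever it is closer to a violation."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    for _ in range(budget):
        if current_fit <= 0:
            return current, current_fit  # failure-revealing test input found
        neighbour = min(hi, max(lo, current + rng.uniform(-step, step)))
        neighbour_fit = fitness(neighbour)
        if neighbour_fit < current_fit:
            current, current_fit = neighbour, neighbour_fit
    return (current, current_fit) if current_fit <= 0 else None


if __name__ == "__main__":
    print(hill_climb(0.0, 10.0, start=1.0))
```

Swapping `hill_climb` for random search, simulated annealing, or a genetic algorithm changes only how candidates are proposed; the fitness function stays the same.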
  • 30. CPS Models 24 • 11 Models • Open-loop vs Feedback-loop • State-machines • Continuous-Dynamics - Dynamical-Systems • Non-linear dynamics • Machine Learning components
  • 31. Results: Fault Finding. Requirements: 92. Model Testing: 40 violations. Model Checking: 41 proven, 23 violations. • MT and MC together could show that 41 requirements are correct and 40 requirements are violated • Only 11 requirements remain inconclusive
  • 32-34. Results: Fault Finding (timeline from 0 to 500s, marked every 100s). BMC can analyse up to 500 steps (50s). Testing found errors after 2000 steps
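The effect of a bounded analysis horizon can be shown with a toy signal. This sketch is only an analogy: it checks one hypothetical trace rather than all inputs, as a bounded model checker would, and the drifting signal and its slope are invented for illustration.

```python
from typing import List, Optional


def first_violation(signal: List[float], limit: float,
                    horizon: Optional[int] = None) -> Optional[int]:
    """Return the first step at which signal >= limit, looking at most `horizon` steps."""
    window = signal if horizon is None else signal[:horizon]
    for step, value in enumerate(window):
        if value >= limit:
            return step
    return None


# Hypothetical slowly drifting output that crosses the limit only late in the run.
drifting_signal = [0.0007 * k for k in range(3000)]

if __name__ == "__main__":
    print(first_violation(drifting_signal, 1.5, horizon=500))  # bounded view
    print(first_violation(drifting_signal, 1.5))               # full simulation
```

A check limited to the first 500 steps reports nothing here, while running the full trace exposes the late violation.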
  • 35. Results: Time. Model Testing, violations: 5.8 min (MAX = 18.5 min, MIN = 3 min). Model Checking, proven: 0.6 s (MAX = 1.9 s, MIN = 0.06 s); violations: 2.2 s (MAX = 10.1 s, MIN = 0.12 s); inconclusive: 15 min to several hours
  • 36. Lessons Learned • L1: Model Checking fails to analyse some CPS models (Autopilot) • This is a major obstacle in adoption of QVTrace by CPS suppliers as confirmed by QRA 28
  • 37. Lessons Learned • L2: Model checking is less effective than Model Testing in finding requirements failures • Model Checking found 23 requirements violations • Model Testing found 40 requirements violations 29
  • 38. Lessons Learned • L3: Model Checking executes considerably faster than Model Testing when it can prove or violate requirements • Model Checking was able to prove 41 requirements and find violations in 23 requirements within a few seconds 30
  • 41. Model Requirements 32 Model Testing Using SBST to automatically generate test inputs that reveal requirement violations
  • 42. Scaling Model Testing to Complex Compute-intensive Models Menghi, C., Nejati, S., Briand, L.C., Parache, Y.I. “Approximation-refinement testing of compute-intensive cyber-physical models: An approach based on system identification”. In: International Conference on Software Engineering (ICSE). arXiv (2020)
  • 44-46. Challenge • Industrial models of CPS are often compute-intensive • Compute-intensive models require hours to complete a single simulation of the model under test (MUT) • A simulation of the satellite model requires ~1.5 hours (model provided by LuxSpace, https://luxspace.lu/)
  • 47. Scaling Model Checking 35 E. Clarke, O. Grumberg, S. Jha, Y. Lu, H. Veith “Counterexample-Guided Abstraction Refinement.” CAV 2000: 154-169
  • 49-55. CEGAR (diagram, built up over slides 49 to 55): Model → Abstraction (Abstract Interpretation) → Abstract Model → Model Check → No Bug, or Bug → Simulate → Real Bug, or Spurious Bug → Refinement → Refined Abstract Model → Model Check
  • 56-62. AppRoxImation-based TEst generatiOn (ARIsTEO) (diagram, built up over slides 56 to 62 starting from the CEGAR loop): Model Check is replaced by SBST, the No Bug outcome is dropped, and Abstraction is replaced by Approximation obtained with Machine Learning (system identification). Final loop: Model → Approximation → Abstract Model → SBST → Bug → Simulate → Real Bug, or Spurious Bug → Refinement → Refined Abstract Model
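The approximation-refinement loop can be sketched on a toy problem. The code below is a simplified illustration, not the ARIsTEO implementation, and it rests on several assumptions: `expensive_model` is a hypothetical one-input stand-in for a slow simulation, a least-squares quadratic fit replaces system identification, and random search over the surrogate replaces SBST. The best candidate found on the surrogate is always re-run on the real model; spurious candidates are added to the training data.

```python
import math
import random
from typing import List, Optional, Tuple

LIMIT = 4.8  # hypothetical requirement: the model output must stay below LIMIT


def expensive_model(u: float) -> float:
    """Hypothetical stand-in for one slow simulation of the real model."""
    return 5.0 * math.sin(math.pi * u / 10.0)


def fit_quadratic(samples: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*u^2 + b*u + c via the 3x3 normal equations."""
    rows = [[u * u, u, 1.0] for u, _ in samples]
    ys = [y for _, y in samples]
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def approximation_refinement(max_real_runs: int = 10,
                             seed: int = 0) -> Optional[Tuple[float, int]]:
    rng = random.Random(seed)
    # Initial training data (these three runs are not counted in real_runs).
    samples = [(u, expensive_model(u)) for u in (0.0, 1.0, 2.0)]
    for real_runs in range(1, max_real_runs + 1):
        a, b, c = fit_quadratic(samples)            # approximation step
        candidates = [rng.uniform(0.0, 10.0) for _ in range(200)]
        # Cheap search on the surrogate: smallest predicted margin to LIMIT.
        best = min(candidates, key=lambda u: LIMIT - (a * u * u + b * u + c))
        real_output = expensive_model(best)         # check on the real model
        if real_output >= LIMIT:
            return best, real_runs                  # real violation
        samples.append((best, real_output))         # spurious: refine and retry
    return None


if __name__ == "__main__":
    print(approximation_refinement())
```

The design trades many cheap surrogate evaluations for few real simulations, which is what matters when a single real run takes hours.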

  • 63. Evaluation: Effectiveness and Efficiency • RQ1. How effective is ARIsTEO in generating tests that reveal requirements violations? • RQ2. How efficient is ARIsTEO in generating tests revealing requirements violations? 38
  • 64. RQ1 and RQ2 - Effectiveness and Efficiency 39 RQ1: On average, ARIsTEO detects 23.9% more requirements violations than S-Taliro (min=-8%, max=95%). RQ2: ARIsTEO is on average 31.3% (min=−1.6%, max=85.2%) more efficient than S-Taliro.
  • 65. RQ3 - Practical Usefulness • RQ3. How applicable and useful is ARIsTEO in generating tests revealing requirements violations for industrial CI-CPS models? 40
  • 66. RQ3 - Practical Usefulness 41 RQ3: ARIsTEO efficiently detected requirement violations - in practical time - that S-Taliro could not find, on an industrial CI-CPS model
  • 69-71. Missing Assumptions on Inputs. Inputs: Yaw, Roll, Pitch. Req: When the autopilot is enabled, the aircraft altitude should reach the desired altitude within 500 seconds in calm air. Assumption: The pilot should apply sufficient throttle force. Diagram labels: Yaw, Roll, Pitch; Throttle > c & Req
  • 72. Mining Assumptions using Search and Decision Trees Gaaloul, K., Menghi, C., Nejati, S., Briand, L., Wolfe, D. “Mining assumptions for software components using machine learning.” In: Foundations of Software Engineering ESEC/SIGSOFT FSE 2020. ACM (2020)
  • 75-79. Pipeline (diagram, built up over slides 75 to 79): Model (inputs Yaw, Roll, Pitch) + Req → SBST → Test Suite + Oracle → Test Inputs + pass/fail (Throttle = 20, Throttle = 0.4, Throttle = -3.6, Throttle = 100; verdicts P, F, P, F) → Machine Learning → Throttle > c → Model Checking
  • 80-83. Test Inputs + pass/fail (Throttle = 20, Throttle = 0.4, Throttle = -3.6, Throttle = 100; verdicts P, F, P, F) → Decision Tree (leaves labelled P or F) → assumption of the form C1 ∨ C2 ∨ … ∨ Cn, e.g. Throttle > 0.5 ∧ pitchwheel > 10. Simple predicates!
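The step from labelled tests to a predicate such as Throttle > c can be sketched with a one-level decision tree (a decision stump). This is a minimal illustration, not the learner used in the cited work. The data set is hypothetical and chosen so that tests pass exactly when throttle exceeds a threshold; it is not the data shown on the slides.

```python
from typing import List, Tuple


def learn_threshold(tests: List[Tuple[float, bool]]) -> float:
    """Pick c so that 'value > c' best separates passing from failing tests.

    Candidate thresholds are midpoints between consecutive distinct values,
    as in a one-level decision tree over a single numeric feature.
    """
    values = sorted({value for value, _ in tests})
    candidates = [(lo + hi) / 2.0 for lo, hi in zip(values, values[1:])]

    def errors(c: float) -> int:
        # A test is misclassified when the rule 'value > c' disagrees with its verdict.
        return sum((value > c) != passed for value, passed in tests)

    return min(candidates, key=errors)


# Hypothetical (throttle, passed) pairs produced by a test generator.
labelled_tests = [(0.1, False), (0.3, False), (0.4, False),
                  (0.6, True), (20.0, True), (100.0, True)]

if __name__ == "__main__":
    c = learn_threshold(labelled_tests)
    print(f"candidate assumption: Throttle > {c}")
```

A full decision tree repeats this split recursively over several inputs; each path to a passing leaf then gives one conjunction Ci of the learned assumption.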
  • 84-87. Test Inputs + pass/fail (Throttle = 20, Throttle = 0.4, Throttle = -3.6, Throttle = 100; verdicts P, F, P, F) → Genetic Programming → expression tree with ∧ at the root and subtrees (x × y) < 5 and (x − z) ≥ 2, i.e. x × y < 5 ∧ (x − z) ≥ 2. Complex linear and nonlinear formulas
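The formula on this slide can be represented as the kind of expression tree that genetic programming manipulates. The sketch below shows only the data structure and its evaluation, not a genetic programming run. The nested-tuple encoding and the `evaluate` helper are assumptions made for illustration.

```python
import operator
from typing import Dict, Union

# A node is a variable name, a numeric constant, or (operator, left, right).
Node = Union[str, float, int, tuple]

OPERATORS = {
    "*": operator.mul,
    "-": operator.sub,
    "<": operator.lt,
    ">=": operator.ge,
    "and": lambda left, right: bool(left) and bool(right),
}


def evaluate(node: Node, env: Dict[str, float]):
    """Recursively evaluate an expression tree under variable bindings `env`."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return OPERATORS[symbol](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]
    return node  # numeric constant


# x * y < 5  AND  (x - z) >= 2, as on the slide.
assumption = ("and",
              ("<", ("*", "x", "y"), 5),
              (">=", ("-", "x", "z"), 2))

if __name__ == "__main__":
    print(evaluate(assumption, {"x": 1.0, "y": 2.0, "z": -3.0}))
    print(evaluate(assumption, {"x": 3.0, "y": 2.0, "z": 0.0}))
```

Because subtrees can be swapped or mutated freely, this representation can express nonlinear terms such as x × y that a threshold-per-variable decision tree cannot.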
  • 88. Conclusions • Assumption generation is important for model debugging and compositional verification (a.k.a assume-guarantee reasoning) • Current inference techniques rely on automata theory and can generate only boolean assumptions or assumptions over predicates • Applying decision tree learning to test data, we can • Generate assumptions that include arithmetic constraints over numeric variables • Using genetic programming, we can even go beyond linear arithmetic constraints 48
  • 89. Summary and Reflections • Formal verification and testing (including SBST) have a common goal • For most applications, formal verification fails to prove correctness and (like testing) can only show the presence of bugs • SBST and ML may improve formal verification in scalability and applicability • Systematic frameworks developed in the formal verification community may help improve and enhance SBST 49