Testing, fixing, and proving with contracts
1. Testing, Fixing, and Proving with Contracts
Carlo A. Furia
Chair of Software Engineering, ETH Zurich
bugcounting.net @bugcounting
2. The (AlpTransit) Gotthard tunnel
The tunnel
• 57 km long
• construction at both ends
• underneath the Gotthard massif
Erstfeld
• canton Uri
• German-speaking
• weather probably cloudy
Bodio
• canton Ticino
• Italian-speaking
• weather probably sunny
3. Users with different requirements
Joe the programmer
• little or no background in formal techniques
• weak and simple (incomplete) specifications
• design not optimal for verification
• code has bugs: full verification is unattainable
• looks for the low-hanging fruit of verification
Verification expert
• fluent in formal logic techniques
• strong, often complete, specifications
• design for full verification
• could use automation of simpler steps
• aims at the holy grail of verified software
6. A key ingredient: contracts
Contracts are a form of lightweight specification:
• Assertions (pre- and postconditions, invariants)
• Contract language = Boolean expressions
• Executable: they bring immediate benefits for testing,
debugging, and so on
Verification tools in EVE take advantage of
(simple) functional specifications
in the form of contracts.
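The following minimal sketch shows what "executable Boolean expressions" means in practice. Python stands in for Eiffel, which checks require/ensure clauses natively. The decorator `contract`, the exception `ContractViolation`, and the way `old` is emulated (a shallow copy taken before the call) are all illustrative assumptions, not EVE's implementation.

```python
import copy
from functools import wraps


class ContractViolation(Exception):
    """Raised when a contract (a Boolean expression) evaluates to False."""


def contract(require=None, ensure=None):
    """Hypothetical decorator emulating Eiffel's require/ensure clauses.

    require(self, *args) and ensure(self, old, *args) are Boolean expressions;
    old is a shallow copy of the object taken before the call, playing the
    role of Eiffel's `old` expressions.
    """
    def decorate(method):
        @wraps(method)
        def checked(self, *args):
            if require is not None and not require(self, *args):
                raise ContractViolation(f"precondition of {method.__name__}")
            old = copy.copy(self)  # snapshot of the pre-state
            result = method(self, *args)
            if ensure is not None and not ensure(self, old, *args):
                raise ContractViolation(f"postcondition of {method.__name__}")
            return result
        return checked
    return decorate


class Account:
    def __init__(self):
        self.balance = 0

    @contract(require=lambda self, amount: 0 <= amount,
              ensure=lambda self, old, amount: self.balance == old.balance + amount)
    def deposit(self, amount):
        self.balance += amount
```

Calling `deposit(5)` on a fresh account leaves `balance` at 5 with both contracts satisfied. Calling `deposit(-1)` raises `ContractViolation` before the body runs, because the precondition is evaluated at runtime.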
10. AutoTest in a nutshell
AutoTest is a push-button generator of unit tests
• Test = sequence of method calls on objects
• Contracts as oracles: target call o.m
– Invalid test: o does not satisfy m’s precondition
– Passing test: all contracts evaluate to True
– Failing test: some contract evaluates to False
Similar tools:
• Korat (Java + assertions)
• QuickCheck (Haskell)
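The three-way classification above can be sketched as a small function. In this sketch, contracts are plain Python predicates over an integer balance, using deposit's contract from the ACCOUNT demo. It is a simplification that checks only the target call's pre- and postcondition, not class invariants or the contracts of nested calls. All names are illustrative.

```python
from enum import Enum


class Verdict(Enum):
    INVALID = "invalid test"    # the call's precondition is not satisfied
    PASSING = "passing test"    # all contracts evaluate to True
    FAILING = "failing test"    # some contract evaluates to False: bug found


def classify(pre, body, post, state, arg):
    """Classify one call using contracts as the oracle.

    pre(state, arg) and post(old_state, new_state, arg) are the method's
    contracts; body(state, arg) computes the new state.
    """
    if not pre(state, arg):
        return Verdict.INVALID
    new_state = body(state, arg)
    return Verdict.PASSING if post(state, new_state, arg) else Verdict.FAILING


def deposit_pre(balance, amount):
    return 0 <= amount


def deposit_post(old, new, amount):
    return new == old + amount
```

For example, `classify(deposit_pre, lambda b, a: b + a, deposit_post, 10, 3)` yields `Verdict.PASSING`. The same call with a body that subtracts yields `Verdict.FAILING`, and a negative amount yields `Verdict.INVALID`.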
11. How AutoTest works
1. Pick a random object o, which is either:
   • an existing object from the object pool,
   • a fresh object of primitive type (e.g. a random integer), or
   • a new object of class type (call a constructor).
2. Pick a random method m.
3. Call o.m and classify the outcome, based on runtime contract checking:
   – invalid test,
   – failing test: bug found, or
   – passing test.
4. After a passing test, add any new objects to the object pool.
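The loop above can be sketched as follows. This is a toy model, not AutoTest's implementation. The class under test is a Python `Account` whose `withdraw` contains a bug that was planted for the sketch. Contracts are predicates over the balance, and the 80% pool-reuse probability and the argument range are arbitrary assumptions.

```python
import random


class Account:
    """Class under test; withdraw deliberately contains a bug."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance += amount  # bug: should subtract when funds suffice


# Contracts per method: (precondition, postcondition) over the balance.
CONTRACTS = {
    "deposit": (lambda b, a: 0 <= a,
                lambda old, new, a: new == old + a),
    "withdraw": (lambda b, a: 0 <= a,
                 lambda old, new, a: new == (old - a if a <= old else old)),
}


def random_testing(rounds, seed=0):
    """Random testing loop with an object pool; returns counts per verdict."""
    rng = random.Random(seed)
    pool = [Account()]
    counts = {"invalid": 0, "passing": 0, "failing": 0}
    for _ in range(rounds):
        # Random target o: usually from the pool, sometimes a new object.
        o = rng.choice(pool) if rng.random() < 0.8 else Account()
        m = rng.choice(sorted(CONTRACTS))   # random method m
        arg = rng.randint(-5, 20)           # fresh primitive value
        pre, post = CONTRACTS[m]
        if not pre(o.balance, arg):
            counts["invalid"] += 1
            continue
        old = o.balance
        getattr(o, m)(arg)                  # call o.m
        verdict = "passing" if post(old, o.balance, arg) else "failing"
        counts[verdict] += 1
        if verdict == "passing" and all(o is not p for p in pool):
            pool.append(o)                  # add new objects to the pool
    return counts
```

Running `random_testing(200)` reports some failing tests, each one a call to `withdraw` with a positive amount that exposes the planted bug.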
12. Test generation strategies
AutoTest is a push-button generator of unit tests
• Basic generation strategy: random
• Other strategies as extensions:
– Random+
– Adaptive-random (object distance)
– Precondition satisfaction
– Stateful testing
13. Demo example: Bank Account
class ACCOUNT
    balance: INTEGER
    deposit (amount: INTEGER)
        require 0 <= amount
        ensure balance = old balance + amount
    withdraw (amount: INTEGER)
        require 0 <= amount
        ensure
            balance_set:
                amount <= old balance implies balance = old balance - amount
            balance_not_set:
                amount > old balance implies balance = old balance
invariant
    balance_nonnegative: balance >= 0
14. Demo 1: bug finding
AutoTest finds a bug in the implementation of
withdraw that violates postcondition
balance_not_set.
withdraw (amount: INTEGER)
    require 0 <= amount
    do
        balance := balance + amount
    ensure
        balance_set:
            amount <= old balance implies
                balance = old balance - amount
        balance_not_set:
            amount > old balance implies balance = old balance
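To see which postcondition clause the buggy body breaks, here is a Python rendering of the body and of the two named clauses. Python has no `implies`, so `p implies q` is written `not p or q`. The function names are illustrative.

```python
def withdraw_buggy(balance, amount):
    """Python rendering of the buggy withdraw body above."""
    return balance + amount


def violated_clauses(old_balance, amount, new_balance):
    """Names of withdraw's postcondition clauses that evaluate to False."""
    clauses = {
        # amount <= old balance implies balance = old balance - amount
        "balance_set": not (amount <= old_balance) or new_balance == old_balance - amount,
        # amount > old balance implies balance = old balance
        "balance_not_set": not (amount > old_balance) or new_balance == old_balance,
    }
    return [name for name, holds in clauses.items() if not holds]
```

With a balance of 5, withdrawing 10 gives 15, so only `balance_not_set` is violated. Withdrawing 3 breaks `balance_set` instead, and withdrawing 0 passes both clauses, which is why a random amount is needed to trigger the failure.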
17. AutoFix in a nutshell
AutoFix is a push-button generator of fixes
[Figure: coding produces code + contracts, which AutoFix takes as input; AutoFix outputs bugs + patches]
Similar tools:
• GenProg, Kali (C)
• PAR (Java)
19. AutoFix: Components
Program state abstraction:
• snapshots: location, predicate, value
Fault localization:
• static information: proximity to failing
location/expression
• dynamic information: number of
failing/passing tests
20. AutoFix: Components
Synthesis:
• enumeration of common replacement
expressions and instructions
• conditional execution:
@ location:
if predicate = value then some fix action
21. AutoFix: Components
Validation:
• regression testing with all available tests for the method being fixed
• valid fix: passes all available tests
Ranking:
• based on the suspiciousness scores of snapshots
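The synthesis and validation steps can be sketched on the buggy withdraw. This is a simplified model, not AutoFix itself. The candidate schema is "if predicate = value then action else action". The predicates and replacement actions stand in for snapshots and enumerated instructions and are chosen by hand. Validation keeps candidates whose results satisfy withdraw's postcondition on every test. Ranking is omitted.

```python
from itertools import product

# Candidate replacement instructions for the faulty assignment (hypothetical).
ACTIONS = {
    "balance := balance + amount": lambda b, a: b + a,
    "balance := balance - amount": lambda b, a: b - a,
    "skip": lambda b, a: b,
}

# Snapshot predicates at the faulty location (hypothetical).
PREDICATES = {
    "amount <= balance": lambda b, a: a <= b,
    "amount > balance": lambda b, a: a > b,
    "amount = 0": lambda b, a: a == 0,
}


def postcondition(old, new, amount):
    """withdraw's postcondition: balance_set and balance_not_set."""
    balance_set = not (amount <= old) or new == old - amount
    balance_not_set = not (amount > old) or new == old
    return balance_set and balance_not_set


def candidates():
    """Enumerate fixes: if predicate = value then act1 else act2."""
    for (p, pred), value, (t, then_), (e, else_) in product(
            PREDICATES.items(), (True, False), ACTIONS.items(), ACTIONS.items()):
        if t == e:
            continue  # both branches identical: not a conditional fix
        text = f"if ({p}) = {value} then {t} else {e}"
        yield text, (lambda b, a, pred=pred, value=value, then_=then_, else_=else_:
                     then_(b, a) if pred(b, a) == value else else_(b, a))


def valid_fixes(tests):
    """Keep candidates passing every test, given as (balance, amount) pairs."""
    return [text for text, fix in candidates()
            if all(postcondition(b, fix(b, a), a) for b, a in tests)]
```

With the tests `[(10, 3), (10, 10), (5, 7), (0, 0)]`, four equivalent candidates survive, for instance `if (amount <= balance) = True then balance := balance - amount else skip`. With the single test `(10, 3)`, twelve candidates pass. This illustrates how a weak test suite admits fixes that merely happen to pass all tests.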
22. Demo 1b: bug fixing
AutoFix builds fixes for the bug in the
implementation of withdraw.
A “high-quality” (proper, correct) fix:
23. Demo 1b: bug fixing
A fix that just happens to pass all tests:
24. Experiments with AutoFix
Source programs: standard data-structure
libraries, text library, card game.
                     LOC (source + contracts)   # Unique errors   % Fixed errors   % High-quality fixes   Time: test + fix [minutes]
Fix implementation:  73’000                     204               42%              25%                    17 + 3
Fix contracts:       24’500                     44                95%              25%                    31 + 3
25. Experiments with AutoFix
For comparison, GenProg: < 2% high-quality fixes, according to the analysis by [Qi+, ISSTA’15].
27. Integrating different tools
A verification assistant manages individual tools
– Select tools and program parts to be verified
– Collect results and aggregate them
[Figure: the Verification Assistant connects Classes (C1, C2, …, Cn), a Data pool of results from each tool (AT1 … ATn, AP1 … APn, AF1 … AFn, AI1 … AIn), and Tools (AutoTest, AutoProof, AutoFix, Inspector)]
28. Scores: aggregated verification results
Each method & class receives a correctness score:
• A value in the interval [-1, 1]
• Estimate of evidence for correctness
Score scale:
– -1: conclusive evidence of incorrectness
– between -1 and 0: evidence of incorrectness
– 0: lack of evidence
– between 0 and 1: evidence of correctness
– 1: conclusive evidence of correctness
29. Score for testing
• Failing test case: conclusive evidence of
incorrectness
• Passing test case: increases evidence of correctness
• Absolute value may vary according to other metrics
– used heuristics, coverage, testing time, …
[Figure: score scale from -1 to 1, marking a failing test case and successive passing test cases]
34. Score for correctness proofs
AutoProof is sound but incomplete:
– Timeout: score 0
– Failed proof: score -0.2
[Figure: score scale from -1 to 1; a failed proof by a complete tool would sit at -1, a successful proof by a sound tool at 1]
35. Combining scores of different tools
• Running each tool determines a score for each
method
• Overall score for a class: weighted average
• Weights depend on the relative confidence in
the reliability of each tool
– may be application- and configuration-dependent
• Overall score of modules (packages) may also
weigh components differently according to
their criticality
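Below is a sketch of the weighted average. The AutoProof scores for timeout (0) and failed proof (-0.2) come from the previous slide. The value 1.0 for a successful proof is an assumption based on AutoProof being sound. The tool weights and the AutoTest score of 0.9 are hypothetical.

```python
# AutoProof outcome scores; "verified" = 1.0 is assumed from soundness.
PROOF_SCORES = {"verified": 1.0, "timeout": 0.0, "failed": -0.2}


def weighted_average(scores, weights):
    """Weighted average of scores in [-1, 1]; weights[k] weighs scores[k]."""
    total = sum(weights[k] for k in scores)
    return sum(weights[k] * s for k, s in scores.items()) / total


# Per-method score: combine tool scores weighted by confidence in each tool.
tool_weights = {"AutoProof": 1.0, "AutoTest": 1.0}  # hypothetical weights
deposit_score = weighted_average(
    {"AutoProof": PROOF_SCORES["failed"], "AutoTest": 0.9}, tool_weights)
```

A method that fails to verify but passes its tests ends up with a moderately positive score (0.35 here). Applying `weighted_average` again to method scores, with weights reflecting criticality, gives class or module scores.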
36. Demo 2: combined testing and proving
The verification assistant runs on the version of
ACCOUNT patched by AutoFix:
deposit does not verify, but passes all tests:
reasonable confidence in its correctness.
38. Modular proofs
Verifiers such as AutoProof perform modular
reasoning
• Effects of a call to method m within the caller
= m’s specification (pre, post, frame)
How we wrote it:
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        update_balance (amount)
How AutoProof sees it:
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        assert update_balance.pre
        havoc update_balance.frame
        assume update_balance.post
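The assert/havoc/assume encoding can be illustrated by brute force over a tiny integer domain. AutoProof itself discharges verification conditions with an SMT-based backend, not by enumeration, so this is only a model of the semantics. After the call, `balance` may hold any value (havoc) that satisfies update_balance's postcondition (assume). deposit's own postcondition must then hold for all such values. The domain bounds and names are assumptions.

```python
from itertools import product

DOMAIN = range(-3, 4)  # tiny integer domain standing in for INTEGER


def deposit_verifies_modularly(callee_post):
    """Check deposit's postcondition using only update_balance's contract.

    callee_post(old_balance, new_balance, a) models update_balance's ensure
    clause; None means update_balance has no postcondition.
    """
    for old_balance, amount in product(DOMAIN, DOMAIN):
        if not 0 <= amount:              # deposit's precondition
            continue
        # assert update_balance.pre: trivially true (no precondition)
        for new_balance in DOMAIN:       # havoc update_balance.frame (balance)
            if callee_post is not None and not callee_post(old_balance, new_balance, amount):
                continue                 # assume update_balance.post
            if new_balance != old_balance + amount:
                return False             # deposit's postcondition may fail
    return True
```

Without a postcondition for update_balance, `deposit_verifies_modularly(None)` returns False, because the havocked balance is unconstrained. With the postcondition `balance = old balance + a`, it returns True.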
39. Modular proofs in practice
Verifiers such as AutoProof perform modular
reasoning
• Necessary for scalability
• Consistent with design-by-contract and
information hiding
• But providing the detailed specifications
necessary for verification may be tedious or
overly complex
40. Specification writing fatigue
Providing the specification necessary for
verification may be tedious, especially in the
most straightforward cases.
How we wrote it:
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        update_balance (amount)
    ensure
        balance = old balance + amount
How we thought about it:
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        balance := balance + amount
    ensure
        balance = old balance + amount
41. Debugging failed verification
When verification fails with verifiers such as
AutoProof (modular, sound, incomplete):
• Is there a bug?
• Or is the program correct, but the specification insufficient?
To help debug failed verification attempts,
AutoProof features two-step verification.
42. Two-step verification
Two-step verification improves user feedback,
especially in the presence of little specification.
1. First verification step
– Standard modular verification
2. Second verification step
– Ignores the specifications of called routines and loops
– Uses inlining and unrolling
Feedback: combination of outcomes of 1 & 2
43. Step 1: modular verification
update_balance (a: INTEGER)
    do
        balance := balance + a
    end
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        update_balance (amount)
    ensure
        balance = old balance + amount
Postcondition violated: the callee update_balance has no postcondition, so its effect on balance is undefined.
Modular verification fails.
44. Step 2: verification with inlining
Verification with inlining succeeds.
Attribute balance is
incremented by amount.
Feedback: change (strengthen) the
specification of update_balance.
update_balance (a: INTEGER)
    do
        balance := balance + a
    end
deposit (amount: INTEGER)
    require
        0 <= amount
    do
        balance := balance + amount
    ensure
        balance = old balance + amount
45. Demo 2b: two-step verification
AutoProof with two-step verification runs on
the version of ACCOUNT patched by AutoFix:
deposit verifies after inlining update_balance. Either:
• provide a postcondition for update_balance, or
• direct AutoProof to inline update_balance
Follow this demo at http://bit.do/tap-tutorial
(Switch to tab account2.e)
46. Two-step verification: feedback
r
    require Pr
    do
        s
    ensure Qr
s
    require Ps
    do
        …
    ensure Qs

Step 1 (modular)          Step 2 (inlined)
Verify r     Verify s     Verify r      Suggestion
Ps fails     Succeeds     Succeeds      Weaken Ps or use inlined
Qr fails     Succeeds     Succeeds      Strengthen Qs or use inlined
Succeeds     Qs fails     Succeeds      Strengthen Ps / Weaken Qs
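The feedback table can be read as a lookup from the three verification outcomes to a suggestion. The sketch below encodes exactly the three rows shown. The outcome strings are an illustrative encoding, and any combination not in the table returns None.

```python
# (step 1: verify r, step 1: verify s, step 2: verify r) -> suggestion
SUGGESTIONS = {
    ("Ps fails", "succeeds", "succeeds"): "Weaken Ps or use inlined",
    ("Qr fails", "succeeds", "succeeds"): "Strengthen Qs or use inlined",
    ("succeeds", "Qs fails", "succeeds"): "Strengthen Ps / Weaken Qs",
}


def two_step_feedback(modular_r, modular_s, inlined_r):
    """Suggestion for a combination of outcomes, or None if not tabulated."""
    return SUGGESTIONS.get((modular_r, modular_s, inlined_r))
```

In Demo 2b, r is deposit and s is update_balance. Modular verification of deposit fails on its postcondition Qr, update_balance verifies, and inlined deposit verifies. The lookup therefore returns "Strengthen Qs or use inlined", which matches the two options listed for that demo.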
51. AutoProof in a nutshell
AutoProof is an auto-active verifier for Eiffel
• Prover for functional properties
• Full support for idiomatic object-oriented structures (e.g. design patterns)
– Based on class invariants
• Flexible: incrementality
– Proving simple properties requires few annotations
– Proving complex properties is possible with more effort
52. Demo 3: a taste of AutoProof
AutoProof verifies method transfer with suitable
specification
transfer (amount: INTEGER; other: ACCOUNT)
        -- Transfer `amount' from this account to `other'.
    require
        amount_non_negative: 0 <= amount
        amount_available: amount <= balance
    do
        withdraw (amount)
        other.deposit (amount)
    ensure
        deposit_done: other.balance = old other.balance + amount
        withdrawal_done: balance = old balance - amount
Follow this demo at http://bit.do/tap-tutorial
(Switch to tab account3.e)
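For contrast with proving, the sketch below checks the same contracts at runtime in a Python model of ACCOUNT. This is testing, not verification: it checks only the calls actually executed. The class and its assertion messages are illustrative.

```python
class Account:
    """Python model of ACCOUNT; assertions stand in for Eiffel contracts."""

    def __init__(self, balance=0):
        assert balance >= 0, "balance_nonnegative"
        self.balance = balance

    def deposit(self, amount):
        assert 0 <= amount, "precondition"
        old = self.balance
        self.balance += amount
        assert self.balance == old + amount, "postcondition"

    def withdraw(self, amount):
        assert 0 <= amount, "precondition"
        old = self.balance
        if amount <= self.balance:
            self.balance -= amount
        assert amount > old or self.balance == old - amount, "balance_set"
        assert amount <= old or self.balance == old, "balance_not_set"

    def transfer(self, amount, other):
        """Transfer `amount` from this account to `other`."""
        assert 0 <= amount, "amount_non_negative"
        assert amount <= self.balance, "amount_available"
        old_balance, old_other = self.balance, other.balance
        self.withdraw(amount)
        other.deposit(amount)
        assert other.balance == old_other + amount, "deposit_done"
        assert self.balance == old_balance - amount, "withdrawal_done"
```

Transferring 4 from an account holding 10 to one holding 5 leaves balances of 6 and 9. When `other` is the same object as `self` and `amount > 0`, however, the postcondition checks fail. A full specification that a verifier accepts therefore has to deal with this aliasing case.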
53. Sound program verifiers compared
[Figure: sound program verifiers placed along two axes, more automation and more complex properties: static analysis, interactive (KIV), ESC/Java2, OpenJML, Spec#, VCC, Chalice, Dafny, KeY, VeriFast]
54. Reasoning with class invariants
Class invariants are a natural way to reason
about object-oriented programs:
invariant = consistency of objects
[Figure: class ACCOUNT with invariant balance >= 0]
56. Consistency of multi-object structures
Mutually dependent object structures require
extra care to enforce, and reason about,
consistency (cf. encapsulation)
[Figure: objects ACCOUNT, LIST, and AUDITOR; ACCOUNT refers to LIST through transactions and has invariant
    balance >= 0
    balance = sum (transactions)]
58. Open and closed objects
When (at which program points) must class
invariants hold? To provide flexibility, objects in
AutoProof can be open or closed
             CLOSED        OPEN
Object:      Consistent    Inconsistent
State:       Stable        Transient
Invariant:   Holds         May not hold
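Below is a minimal sketch of the open/closed discipline. The names `unwrap` and `wrap` are used loosely here, and the whole protocol is reduced to a flag plus an invariant check when closing. The invariant may be broken while the object is open (transient state) and is checked only when the object is closed again (stable state).

```python
class Account:
    """Toy model of open/closed objects with invariant balance >= 0."""

    def __init__(self):
        self.balance = 0
        self.is_open = False        # objects start closed (consistent)

    def invariant(self):
        return self.balance >= 0

    def unwrap(self):
        assert not self.is_open, "object must be closed to unwrap"
        self.is_open = True         # transient: invariant may not hold

    def wrap(self):
        assert self.is_open, "object must be open to wrap"
        assert self.invariant(), "invariant must hold when closing"
        self.is_open = False        # stable: invariant holds again

    def rebalance(self, out_amount, in_amount):
        """Withdraw, then deposit; the balance may be negative in between."""
        self.unwrap()
        self.balance -= out_amount  # invariant may be violated here
        self.balance += in_amount
        self.wrap()                 # checked only when closing
```

`rebalance(5, 10)` on a fresh account passes through a balance of -5 yet succeeds, because the invariant holds again at `wrap`. `rebalance(10, 0)` then fails at `wrap`.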
59. Ownership
For hierarchical object structures, AutoProof
offers an ownership protocol
[Figure: ACCOUNT owns the LIST referenced by its attribute transactions; ACCOUNT's invariant:
    balance >= 0
    owns = [ transactions ]
    balance = sum (transactions)]
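The sketch below shows why ownership protects the owner's invariant. It is a simplified model in which an owned list refuses modification unless its owner is open. Only the owner's own methods, which open it first, can then change the list. The classes and the check are illustrative, not AutoProof's actual methodology.

```python
class SimpleList:
    """Owned list: may only be modified while its owner is open."""

    def __init__(self):
        self.items = []
        self.owner = None

    def add(self, x):
        # Simplified ownership check: a closed owner forbids modification.
        assert self.owner is None or self.owner.is_open, "owner is closed"
        self.items.append(x)


class Account:
    """ACCOUNT owning its transactions list (owns = [ transactions ])."""

    def __init__(self):
        self.balance = 0
        self.is_open = True                 # open during construction
        self.transactions = SimpleList()
        self.transactions.owner = self
        self.is_open = False                # closed: invariant holds

    def invariant(self):
        return self.balance >= 0 and self.balance == sum(self.transactions.items)

    def deposit(self, amount):
        assert 0 <= amount, "precondition"
        self.is_open = True                 # unwrap
        self.balance += amount
        self.transactions.add(amount)
        assert self.invariant(), "invariant must hold before wrapping"
        self.is_open = False                # wrap
```

After `deposit(5)`, the invariant `balance = sum (transactions)` holds. A third party, such as an auditor, calling `transactions.add(3)` directly is rejected, so it cannot silently break that invariant.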
66. Demo 4: ownership in AutoProof
AutoProof verifies the ACCOUNT with
an owned list of transactions
transactions: SIMPLE_LIST [INTEGER]
        -- History of transactions:
        -- positive integer = deposited amount
        -- negative integer = withdrawn amount
        -- latest transactions in back of list
Follow this demo at http://bit.do/tap-tutorial
(Switch to tab account4.e)
69. Semantic collaboration
• Subjects = objects my consistency depends on
• Observers = objects whose consistency depends
on me
[Figure: ACCOUNT refers to BANK through attribute bank; the bank is a subject of the account, and the account an observer of the bank. ACCOUNT's invariant:
    subjects = [ bank ]
    Current in bank.observers  -- Implicit in AutoProof
    interest_rate = bank.rate]
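Below is a toy Python model of the subject/observer relationship above. The subject registers its observers, and here it keeps them consistent by updating them whenever its rate changes. That update strategy is a design choice made for this sketch. AutoProof's methodology instead checks statically that subjects' updates preserve observers' invariants.

```python
class Bank:
    """Subject: accounts' consistency depends on its rate."""

    def __init__(self, rate):
        self.rate = rate
        self.observers = []

    def set_rate(self, rate):
        self.rate = rate
        # Keep every observer consistent after the change.
        for account in self.observers:
            account.interest_rate = rate
        assert all(account.invariant() for account in self.observers)


class Account:
    """Observer of its bank (subjects = [ bank ])."""

    def __init__(self, bank):
        self.bank = bank
        self.interest_rate = bank.rate
        bank.observers.append(self)      # Current in bank.observers

    def invariant(self):
        return self.interest_rate == self.bank.rate
```

Changing the rate through `set_rate` keeps every account's invariant intact. Assigning `bank.rate` directly bypasses the observers and breaks `interest_rate = bank.rate`.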
70. Demo 5: collaboration in AutoProof
AutoProof verifies the ACCOUNT with
a BANK that sets a master interest rate
bank: BANK
        -- Provider of this account
invariant
    non_negative_rate: 0 <= interest_rate
    bank_exists: bank /= Void
    consistent_rate: interest_rate = bank.master_rate
Follow this demo at http://bit.do/tap-tutorial
(Switch to tabs account5.e and bank5.e)
71. AutoProof on realistic software
Verification benchmarks:
# programs   LOC    SPEC/CODE (lines / tokens)   Verification time (total / longest method / average method)
25           4400   1.0 / 1.9                    3.4 min / 12 sec / < 1 sec
EiffelBase2 – a realistic container library:
# classes    LOC    SPEC/CODE (lines / tokens)   Verification time (total / longest method / average method)
46           8400   1.4 / 2.7                    7.2 min / 12 sec / < 1 sec
72. Testing, fixing, and proving with contracts: acknowledgements
Julian Tschannen, Nadia Polikarpova, Yu (Max) Pei, Yi (Jason) Wei, Andreas Zeller, Bertrand Meyer, Ilinca Ciupa-Moser, Andreas Leitner
73. Testing, fixing, and proving with contracts (in Eiffel)
1. AutoTest
2. AutoFix
3. Verification assistant
4. Two-step verification
5. AutoProof
http://se.inf.ethz.ch/research/eve/
http://cloudstudio.ethz.ch/comcom/
See TAP 2015’s proceedings for references to technical papers