Testing Concurrent Programs to Achieve High Synchronization Coverage
1. Testing Concurrent Programs to Achieve
High Synchronization Coverage
Shin Hong†, Jaemin Ahn†, Sangmin Park*, Moonzoo Kim†, Mary Jean Harrold*
† Provable SW Lab, KAIST, South Korea
* Aristotle Research Group, Georgia Tech
Shin Hong @ PSWLAB 1 / 19
2. Overview
• A testing framework for concurrent programs
– To achieve high test coverage fast
• Key idea
1. Utilize coverage to test concurrent programs systematically
2. Manipulate thread scheduler to achieve high coverage fast
[Figure: framework overview. Estimation phase: the coverage estimator analyzes the program (thread1: 10: lock(m) ... 15: unlock(m); thread2: 20: lock(m) ... 30: unlock(m)) and produces the estimated target SPs {(10,20), (20,10), …}. Testing phase: the thread scheduling controller runs the test on the threads, measures coverage, and maintains Covered SPs {(10,20), …} and Uncovered SPs {(20,10), …}.]
Testing Concurrent Programs to Achieve High Synchronization Coverage Shin Hong @ PSWLAB 2 / 19
3. Motivation: Concurrent Programs Are Error-Prone
• Multi-core processors make concurrent programs popular
Concurrency at Microsoft – An Exploratory Survey [P. Godefroid et al., EC2 2008]
• 60 % of survey respondents are writing concurrent programs
• Reasons for concurrency in MS products
• However, correctness of concurrent programs is hard to achieve
– Interactions between threads should be carefully performed
– A large # of thread executions due to non-deterministic thread scheduling
– Testing techniques for sequential programs do not work properly
4. Techniques to Test Concurrent Programs
Pattern-based bug detection
• Find predefined patterns of suspicious synchronizations [Eraser, Atomizer, CalFuzzer]
• Limitation: focused on specific bugs
Systematic testing
• Explore all possible thread scheduling cases [CHESS, Fusion]
• Limitation: limited scalability
Random testing
• Generate random thread scheduling [ConTest]
• Limitation: may not investigate new interleavings
Direct thread scheduling for high test coverage
5. Code Coverage for Concurrent Programs
• Test requirements of code coverage for concurrent programs capture different thread interaction cases
• Several metrics
– Synchronization coverage: blocking, blocked, follows, synchronization-pair, etc.
– Statement-based coverage: PSet, all-use, LR-DEF, access-pair, statement-pair, etc.
Example:
01: int data ;
…
10: thread1() {
11: lock(m);
12: if (data …){
13: data = 1 ;
...
18: unlock(m);
...
20: thread2() {
21: lock(m);
22: data = 0;
...
29: unlock(m);
...
Sync.-Pair: {(11, 21), (21, 11), … }
Stmt.-Pair: {(12, 22), (22, 13), … }
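A minimal sketch of how synchronization-pair coverage could be measured, assuming an SP (l1, l2) counts as covered when the lock statement at line l2 acquires a lock whose previous acquisition came from line l1. This definition is a reading of the example above, and the class `SyncPairRecorder` and its method names are hypothetical, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical recorder of covered synchronization pairs (SPs). */
final class SyncPairRecorder {
    // Line of the lock statement that most recently acquired each lock.
    private final Map<String, Integer> lastAcquire = new HashMap<>();
    // Covered SPs, stored as "(l1,l2)" strings in order of first coverage.
    private final Set<String> covered = new LinkedHashSet<>();

    /** Call when the lock statement at `line` acquires the lock `lockId`. */
    void onAcquire(String lockId, int line) {
        Integer previous = lastAcquire.put(lockId, line);
        if (previous != null) {
            // Assumed definition: consecutive acquisitions of one lock form an SP.
            covered.add("(" + previous + "," + line + ")");
        }
    }

    Set<String> covered() {
        return covered;
    }

    public static void main(String[] args) {
        SyncPairRecorder recorder = new SyncPairRecorder();
        recorder.onAcquire("m", 11); // thread1 takes m first
        recorder.onAcquire("m", 21); // thread2 takes m next: covers (11,21)
        recorder.onAcquire("m", 11); // thread1 takes m again: covers (21,11)
        System.out.println(recorder.covered());
    }
}
```

Running the two threads in the opposite order on the first two acquisitions would cover (21,11) first, which is why different schedules yield different SP sets.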
8. Testing Framework for Concurrent Programs
(1) Estimates SP requirements
(2) Generates test scheduling by
– monitoring running thread status and measuring SP coverage
– suspending/resuming threads to cover new coverage requirements
[Figure: same framework diagram as in the Overview. Estimation phase: coverage estimator → estimated target SPs. Testing phase: thread scheduling controller, with Covered SPs and Uncovered SPs, running the test threads and measuring coverage.]
9. Thread Scheduling Controller
• Coordinates thread executions to satisfy new SP requirements
• Invokes an operation
(1) before every lock operation, and
(2) after every unlock operation
• Controls thread scheduling by
(1) suspending a thread before a lock operation
(2) selecting one of the suspended threads to resume using three rules
[Figure: instrumented code (10: synchronized(m) { 11: if (t > 0) { 12: ...) invokes the thread scheduler, which decides whether to suspend or resume the current thread based on Covered SPs {(10,20), …}, Uncovered SPs {(20,10), …}, and other threads' status.]
11. Thread Schedule Decision Algorithm (2/3)
• Rule 2: Choose a thread to cover uncovered SP in next decision
Thread 1: 10: lock(m)
Thread 2: 20: lock(m), 21: unlock(m), 22: lock(m)
- Covered SPs: (20,10), (20,22), (10,22)
- Uncovered SPs: (22,10)
12. Thread Schedule Decision Algorithm (2/3)
• Rule 2: Choose a thread to cover uncovered SP in next decision
Thread 2: 20: lock(m), 21: unlock(m), 22: lock(m), 25: unlock(m)
Thread 1: 10: lock(m) (scheduled after Thread 2's lock at line 22)
- Covered SPs: (20,10), (20,22), (10,22)
- Uncovered SPs: (22,10)
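A minimal sketch of the Rule 2 look-ahead, under one reading of the example: resume the suspended thread whose pending lock statement, once executed, would let another suspended thread cover an uncovered SP in the following decision. The class `Rule2Sketch`, the thread names, and the "(l1,l2)" string encoding are hypothetical, and all pending lock statements are assumed to use the same lock m.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Rule 2 decision; not the authors' code. */
final class Rule2Sketch {
    /**
     * @param pending   suspended thread name -> line of the lock statement it waits before
     * @param uncovered uncovered SPs encoded as "(l1,l2)"
     * @return a thread to resume now so that another suspended thread can cover
     *         an uncovered SP in the next decision, or null if no such thread exists
     */
    static String choose(Map<String, Integer> pending, Set<String> uncovered) {
        for (Map.Entry<String, Integer> first : pending.entrySet()) {
            for (Map.Entry<String, Integer> next : pending.entrySet()) {
                if (first.getKey().equals(next.getKey())) {
                    continue; // a thread cannot follow itself in this look-ahead
                }
                String sp = "(" + first.getValue() + "," + next.getValue() + ")";
                if (uncovered.contains(sp)) {
                    return first.getKey();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        pending.put("Thread1", 10); // waiting before 10: lock(m)
        pending.put("Thread2", 22); // waiting before 22: lock(m)
        // Only (22,10) is uncovered, so line 22 should run first.
        System.out.println(choose(pending, Set.of("(22,10)")));
    }
}
```

With the state shown on the slide, the sketch selects Thread 2, so that resuming Thread 1 at line 10 afterwards covers (22,10).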
13. Thread Schedule Decision Algorithm (3/3)
• Rule 3: Choose a thread that is unlikely to cover uncovered SPs
Thread 1: 10: lock(m)
Thread 2: 20: lock(m), 21: unlock(m), 22: lock(m)
- Covered SPs: (20,10), (10,20), (10,22), (22,10), (20,22)
- Uncovered SPs: (10,60), (70,22), (80,22), (10,50), (22,60)
Schedule Thread 1: since Thread 2 with line 22 remains under control, there is more chance to cover (70,22), (80,22), (22,60)
Schedule Thread 2: since Thread 1 with line 10 remains under control, there is more chance to cover (10,60), (10,50)
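The comparison above can be sketched as a simple count, assuming Rule 3 scores each suspended thread by the number of uncovered SPs that mention its pending lock line and resumes the lowest-scoring thread, so that the thread with more potential stays under control. The class `Rule3Sketch` and this scoring function are hypothetical illustrations, not the authors' heuristic as implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Rule 3 decision; not the authors' code. */
final class Rule3Sketch {
    /** Counts uncovered SPs {l1, l2} that mention the given lock line. */
    static int potential(int line, List<int[]> uncovered) {
        int count = 0;
        for (int[] sp : uncovered) {
            if (sp[0] == line || sp[1] == line) {
                count++;
            }
        }
        return count;
    }

    /** Returns the suspended thread whose pending lock line has the lowest potential. */
    static String choose(Map<String, Integer> pending, List<int[]> uncovered) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            int score = potential(entry.getValue(), uncovered);
            if (score < bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        pending.put("Thread1", 10); // waiting before 10: lock(m)
        pending.put("Thread2", 22); // waiting before 22: lock(m)
        List<int[]> uncovered = List.of(
            new int[] {10, 60}, new int[] {70, 22}, new int[] {80, 22},
            new int[] {10, 50}, new int[] {22, 60});
        // Line 10 appears in 2 uncovered SPs, line 22 in 3.
        System.out.println(choose(pending, uncovered));
    }
}
```

For the slide's state the sketch resumes Thread 1, keeping Thread 2 (line 22, three uncovered SPs) suspended for later decisions.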
14. Empirical Evaluation
• Implementation [Thread Scheduling Algorithm, TSA]
– Used Soot for estimation phase
– Extended CalFuzzer 1.0 for testing phase
– Built in Java (about 2KLOC)
• Subjects
– 7 Java library benchmarks (e.g. Vector, HashTable, etc.) (< 11 KLOC)
– 3 Java server programs (cache4j, pool, VFS) (< 23 KLOC)
15. Empirical Evaluation
• Compared techniques
– We compared TSA to random testing
– We inserted probes at every read, write, and lock operation
– Each probe makes a time-delay d with probability p
• d: sleep(1ms), sleep(1~10ms), sleep(1~100ms)
• p: 0.1, 0.2, 0.3, 0.4, 0.5
– We use 15 (= 3 x 5) different versions of random testing
• Experiment setup
– Executed the program 500 times for each technique
– Measured accumulated coverage and time cost
– Repeated the experiment 30 times for statistical significance in results
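A minimal sketch of one random-testing probe as described on this slide: with probability p it sleeps for a delay d, otherwise it returns immediately. The class `RandomDelayProbe`, the seeded `Random`, and the uniform choice of d in 1..maxDelayMs are assumptions for illustration; the slide does not say how the delay within a range is drawn.

```java
import java.util.Random;

/** Hypothetical random-delay probe for noise-based random testing. */
final class RandomDelayProbe {
    private final double p;       // probability that the probe delays the thread
    private final int maxDelayMs; // upper bound of the delay d in milliseconds
    private final Random rng;     // seeded only to make runs repeatable

    RandomDelayProbe(double p, int maxDelayMs, long seed) {
        this.p = p;
        this.maxDelayMs = maxDelayMs;
        this.rng = new Random(seed);
    }

    /** Called at a read, write, or lock operation; returns the delay applied in ms. */
    int hit() {
        if (rng.nextDouble() >= p) {
            return 0; // probe did not fire
        }
        int delayMs = 1 + rng.nextInt(maxDelayMs); // assumed uniform in 1..maxDelayMs
        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return delayMs;
    }

    public static void main(String[] args) {
        // One of the 15 configurations: d = sleep(1~10ms), p = 0.3.
        RandomDelayProbe probe = new RandomDelayProbe(0.3, 10, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println("delay applied: " + probe.hit() + " ms");
        }
    }
}
```

The three delay ranges would correspond to maxDelayMs values of 1, 10, and 100 in this sketch.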
16. Study 1: Effectiveness
• TSA covers more SPs than random testing
– for accumulated SP coverage after 500 executions
[Plot: ArrayList 1. SP coverage vs. test executions. Series: our technique, RND-sleep(1ms) avg., RND-sleep(<10ms) avg., RND-sleep(<100ms) avg.]
17. Study 2: Efficiency
• TSA reaches its saturation point faster and at higher coverage
– A saturation point is computed by r² (coefficient: 0.1, window size: 120 sec.) [Sherman et al., FSE 2009]
[Plot: ArrayList 1. SP coverage vs. time (sec), with saturation points marked. Series: our technique, RND-sleep(1ms) avg., RND-sleep(<10ms) avg., RND-sleep(<100ms) avg.]
18. Study 3: Impact of Estimation-based Heuristics (Rule 3)
• TSA with Rule 3 reaches higher coverage and an earlier saturation point
– Executed the program for 30 minutes, and computed the saturation points
• > 90% of thread scheduling decisions are made by Rule 3
Program       TSA w/o Rule 3          TSA with Rule 3
              Coverage  time (sec)    Coverage  time (sec)
ArrayList1    177.6     274.4         181.2     184.2
ArrayList2    130.8     246.3         141.4     159.7
HashSet1      151.3     271.5         151.7     172.4
HashSet2      98.0      198.9         120.8     139.3
HashTable1    23.7      120.0         24.0      120.0
HashTable2    539.6     388.8         538.0     165.4
LinkedList1   179.9     278.2         181.2     155.0
LinkedList2   129.9     237.7         141.2     161.2
TreeSet1      151.6     258.4         151.4     191.2
TreeSet2      98.8      237.5         120.5     139.8
cache4j       201.9     205.8         202.2     146.1
pool          T/O       T/O           2950.5    431.1
VFS           246.7     478.2         260.1     493.9
19. Contributions and Future Work
• Contributions
– A thread scheduling technique that quickly achieves high SP coverage for concurrent programs
– Empirical studies showing that the technique is more effective and efficient than random testing at achieving high SP coverage
• Future work
– Study correlation between coverage achievement and fault finding in
testing of multithreaded programs
– Extend the technique to work with other concurrent coverage criteria
Editor's notes
We have developed a testing framework for concurrent Java programs. As you know, a concurrent program has a large number of different behaviors depending on non-deterministic thread scheduling. To check various concurrent program executions systematically, our technique tries to achieve high test coverage through testing. Just as test cases that achieve high branch coverage for sequential programs can detect many bugs, achieving high test coverage for concurrent programs can detect many concurrency bugs. In other words, similar to sequential program testing, where coverage metrics are widely used for generating effective test inputs, our technique systematically generates test schedules to achieve high synchronization coverage. Our testing framework has two key features. First, it explicitly utilizes synchronization coverage for effective testing. Second, it utilizes a coverage-directed thread scheduler, which controls thread scheduling to achieve high synchronization coverage based on estimated target coverage. To our knowledge, our framework is the first testing framework that aims to achieve high synchronization coverage by controlling thread scheduling directly.
As you see, synchronization-pair coverage properly represents different thread scheduling cases. Although many code coverage metrics for concurrent programs seem useful, there are only a few techniques that generate tests to achieve high coverage. So, we have developed a testing framework that maximizes test coverage achievement by manipulating thread scheduling.
To generate high-coverage tests, our technique first estimates, in the estimation phase, which coverage requirements can be achieved in a program. As you can see from the synchronization-pair definition, the technique cannot simply assume that every two synchronized blocks in the code form a coverage requirement. So, our coverage estimator determines the feasible synchronization-pair coverage by checking aliasing and the effects of other synchronizations through dynamic analysis. You can find a detailed description of how the coverage estimator works in our paper.
The testing phase uses the estimation result as the target to achieve. In the testing phase, the thread scheduling controller dynamically decides which thread executes at each moment. The thread scheduling controller manipulates two coverage data structures. One is Covered SPs, which records the set of coverage requirements already achieved in testing. The other is Uncovered SPs, the set of coverage requirements that are reachable but not yet covered in testing. The thread scheduling controller tries to increase Covered SPs and reduce Uncovered SPs with a greedy algorithm.