3. Introduction
Survey of 159 papers on test suite minimization, regression test selection, and test
case prioritization.
Intention is not to undertake a systematic review, but rather to provide a broad
state-of-the-art view on these related fields.
Note: spelling will alternate between British and American forms (e.g.
minimisation/minimization, prioritisation/prioritization).
4. Introduction
Regression Testing: Provide confidence that the newly introduced changes do not
obstruct the behaviors of the existing, unchanged part of the software.
Difficulties Include:
• Black-box development with 3rd party applications
• Agile development
Note: Most straightforward approach is “retest-all”, but may not be viable in all
scenarios
5. Introduction
A number of different approaches have been studied to aid the regression testing
process. Three major branches include:
Test Suite Minimization: Process that seeks to identify and then eliminate the
obsolete or redundant test cases from the test suite.
Test Case Selection: Select a subset of test cases that will be used to test the
changed parts of the software.
Test Case Prioritization: Identify the “ideal” ordering of test cases that maximizes the
desirable properties, such as early fault detection.
6. Overview
1. Motivation
2. Background
3. Test Case Selection
4. Test Suite Minimization
5. Test Case Prioritization
6. Summary and Conclusion
7. Suggestions
8. Lessons Learned
7. Motivation
Why is this the right set of topics for a survey?
• Each topic is related by a common thread of optimization of already existing test
cases.
• All differ from areas that focus on test case generation.
• Intimate relationship between the topics (e.g. minimization could be performed
by prioritizing a set of cases and choosing the first N).
Is there already a recent survey in this area?
• Most similar paper was a survey on Regression Test Selection techniques in 1996.
• No previous survey considers Prioritization, Selection, and Minimization
collectively.
8. Background
Redefine regression testing and further elaborate on the distinction of each
optimization technique defined in the introduction.
Classification of Test Cases
Reusable – Only execute parts of the program that remain unchanged. Not valuable
for new changes, but they assist with future regression checks.
Retestable – Test cases that are still valid after a set of changes and can validate if
any regression has occurred.
Obsolete – Test cases can be rendered obsolete because their input/output is no
longer relevant and/or they no longer test the desired specification (i.e.
requirements changed).
10. Test Case Selection
Compare Test Case Selection vs. Test Suite Minimisation
• Very similar to one another; both revolve around choosing a subset of test cases
from the test suite.
• Test suite minimization often based on metrics (e.g. code coverage) of an entire
application.
• Test case selection based on finding relevant tests to be run.
11. Test Case Selection
Integer Programming Approach
Optimization program in which all of the variables are restricted to integers.
• Heavily relies on two matrices that describe the relation between program
segments and test cases. (program segment can be defined as a single-entry,
single exit block of code)
• Each constraint is represented as a_m1*x_1 + a_m2*x_2 + … + a_mn*x_n >= b_m
(a_ij equal to 1 if the segment–test case relation exists, 0 otherwise)
• Results in a decision vector (subset of selected test cases) < x1, … , xn > where xi is
equal to 1 if the ith test case is included.
Problematic with control-flow changes: the entire test suite has to be run again.
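The selection above can be sketched as a tiny 0/1 program. The coverage matrix below is hypothetical, and a real formulation would hand the constraints to an ILP solver; this sketch simply brute-forces the smallest decision vector satisfying a_i1*x_1 + … + a_in*x_n >= 1 for every segment i.

```python
from itertools import combinations

# Hypothetical coverage matrix: coverage[i][j] = 1 if test case j reaches
# program segment i (a single-entry, single-exit block), 0 otherwise.
coverage = [
    [1, 0, 1],  # segment 0 is reached by tests 0 and 2
    [0, 1, 1],  # segment 1 is reached by tests 1 and 2
    [1, 1, 0],  # segment 2 is reached by tests 0 and 1
]

def smallest_selection(coverage):
    """Brute-force the 0/1 decision vector <x1, ..., xn>: the smallest
    subset of test cases covering every segment at least once."""
    n = len(coverage[0])
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Check every segment row has at least one selected test.
            if all(any(row[j] for j in subset) for row in coverage):
                return [1 if j in subset else 0 for j in range(n)]
    return None

print(smallest_selection(coverage))  # tests 0 and 1 suffice
```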
12. Test Case Selection
Data-flow Analysis Approach
Technique for gathering information about the possible set of values calculated at
various points in a computer program. (i.e. How does input flow through the
application)
Seek to identify new, modified or deleted definition-use pairs in the new version of
the program; then select those cases that exercise these pairs. (Does the new code
impact the test data being used?)
Problematic with modifications that are unrelated to data-flow change. These test
scenarios will not be selected for testing.
13. Test Case Selection
Symbolic Execution Approach
A means of analyzing a program to determine what inputs cause each part of a
program to execute.
def f(x):
    if x == 2:
        return fail()
    return success()
1. Find all input partitions.
2. Produce test cases so that each input partition is executed at least once.
3. Given information on where the code has been modified (e.g. a diff), return
modified code segments and the test cases that execute these segments.
Drawback is the algorithmic complexity of symbolic execution as well as how
expensive it can be to execute.
14. Test Case Selection
Graph-Walk Approach
1. Parse P and P’ into graph data structures.
2. Traverse each graph and compare the corresponding nodes.
3. If a node in P is not the same as the corresponding node in P’, select all the test
cases that execute the code within that node.
Problematic: since data dependence is not considered, the approach could include
test cases that provide little to no value.
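The three steps can be sketched with flat node maps. The graphs and trace data below are hypothetical, and a real graph walk traverses the two control-flow graphs edge by edge in parallel rather than comparing dictionaries.

```python
# Hypothetical CFGs for P and P' as {node_id: code_text}.
P      = {"n1": "x = read()", "n2": "y = x + 1", "n3": "print(y)"}
P_new  = {"n1": "x = read()", "n2": "y = x + 2", "n3": "print(y)"}

# Hypothetical trace data: which tests execute which nodes.
traces = {"t1": {"n1", "n2"}, "t2": {"n1", "n3"}}

def graph_walk_select(P, P_new, traces):
    """Select every test that executes a node whose code differs in P'."""
    changed = {n for n in P if P_new.get(n) != P[n]}
    return sorted(t for t, nodes in traces.items() if nodes & changed)

print(graph_walk_select(P, P_new, traces))  # only t1 touches the changed node
```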
15. Test Case Selection
Textual Difference Approach
A very similar approach to the Graph-Walk approach
• Uses the diff tool provided by Unix.
• Code is sanitized to remove differences that do not represent a change in
behavior (e.g. whitespace)
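A sketch of the sanitize-then-diff idea using Python's standard difflib in place of the Unix diff tool; the two code snippets are made up for illustration.

```python
import difflib

def sanitize(lines):
    """Collapse whitespace so formatting-only edits produce no diff."""
    return [" ".join(line.split()) for line in lines if line.strip()]

old = ["int main() {", "  return 0;", "}"]
new = ["int main()  {", "    return 1;", "}"]  # only the return really changed

# Keep only the added/removed lines, dropping diff headers and context.
diff = [d for d in difflib.unified_diff(sanitize(old), sanitize(new), lineterm="")
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]
print(diff)  # whitespace edits vanish; the return-value change remains
```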
16. Test Case Selection
Path Analysis
• Construct exemplar paths from P and P’
• Paths in P’ are categorized as new, modified, cancelled, or unmodified.
• Since all test cases and the paths they execute in P are known, the test cases that
traverse the modified paths in P’ are selected.
The authors had a poor definition of “modified”: test cases that executed new or
cancelled code were not chosen. However, these paths could lead to regression.
17. Test Case Selection
Modification-based Technique
Yet another similar approach to Graph-Walking
• Introduced a testing framework called TestTube.
• Partitions P and P’ into program entities (nodes), then monitors the test cases to
find out the code that each test case executes.
• Those entities that were different are selected.
Since the entities include not only functions but also variables, any test case that
executes a modified entity will be selected.
This differs from the data-independent Graph-Walking approach described
previously. Modification-based technique encompasses data as well.
18. Test Case Selection
Firewall Approach
Draw a firewall around the modules of the system that need to be retested.
A given module M can be represented as:
• No change: NoCh(M)
• Code change only: CodeCh(M)
• Spec change: SpecCh(M)
Considering integrations between module A and module B
• Ignore the integration if NoCh(A) ^ NoCh(B)
• If A or B is modified in either code or spec (CodeCh or SpecCh respectively),
the integration tests should be rerun.
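The firewall rule reduces to a small predicate over module change statuses. The module statuses and integration list below are hypothetical.

```python
# Change status per module: "NoCh", "CodeCh", or "SpecCh" (made-up data).
status = {"A": "NoCh", "B": "CodeCh", "C": "NoCh", "D": "SpecCh"}
integrations = [("A", "B"), ("A", "C"), ("C", "D")]

def needs_retest(a, b, status):
    """Skip only when neither side changed; rerun otherwise."""
    return not (status[a] == "NoCh" and status[b] == "NoCh")

# The firewall keeps only integrations touching a changed module.
selected = [pair for pair in integrations if needs_retest(*pair, status)]
print(selected)  # (A, C) falls outside the firewall
```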
19. Test Case Selection
Design-based Approach
• Black-box, design level regression test selection that used UML-based designs.
• Requires traceability between design and test cases
• Leveraged the obsolete, retestable, and reusable classification highlighted in the
background
Possible to select test cases that provide no value, as a UML diagram does not
encapsulate all code interactions. (e.g. a method is changed, but the diagram only
shows that the method exists, not that it is ever called)
21. Test Suite Minimization
Heuristics
Essential test cases
• A test case is essential if it is the only test case that satisfies some test
requirement.
Redundant test cases
• A test case is redundant if the test requirements it satisfies are a subset of those
satisfied by another test case.
GE Heuristic
Select the test case that satisfies the maximum number of unsatisfied test
requirements.
GRE Heuristic
Remove all redundant test cases in the test suite (which may make some test cases
essential). Then run the GE heuristic.
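The GE heuristic above can be sketched as a greedy loop. The test-to-requirement mapping is hypothetical; note t3 is redundant (a subset of t2) and t4 is essential (the only test satisfying r5).

```python
# Hypothetical mapping from tests to the requirements they satisfy.
tests = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3", "r4"},
    "t3": {"r4"},          # redundant: subset of t2's requirements
    "t4": {"r5"},          # essential: only test satisfying r5
}

def ge_minimize(tests):
    """GE heuristic: repeatedly pick the test satisfying the most
    still-unsatisfied requirements until all are satisfied."""
    unsatisfied = set().union(*tests.values())
    remaining = dict(tests)
    selected = []
    while unsatisfied:
        best = max(remaining, key=lambda t: len(remaining[t] & unsatisfied))
        selected.append(best)
        unsatisfied -= remaining.pop(best)
    return selected

print(ge_minimize(tests))  # redundant t3 is never picked
```

GRE would first drop t3 outright (its requirements are a subset of t2's) and then run the same loop.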
22. Test Suite Minimization
Heuristics
Empirical evidence suggests no single approach is better.
• The literature is concerned with heuristics more so than precision.
Vast majority of presented literature focused on the minimal hitting set problem.
Most minimization techniques are based on coverage criteria, but there were
exceptions.
• Minimizing the test itself (start with a failed test).
• Black-box approach to program input/output (research in state machines).
Different inputs may not flow through different branches.
24. Test Case Prioritisation
Coverage-based Prioritisation (code)
Structural coverage often used as prioritization criterion. The more code a test
executes, the higher chance of finding a fault.
Approaches include:
• Branch-total (number of branches covered by test cases)
• Branch-additional (number of additional branches a test case would execute)
• Statement-total
• Statement-additional
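The total vs. additional distinction can be sketched with statement coverage (the per-test coverage sets below are hypothetical):

```python
# Hypothetical statement coverage per test case.
cov = {"t1": {1, 2, 3, 4}, "t2": {1, 2}, "t3": {5, 6}}

def total_order(cov):
    """Statement-total: order by raw coverage size."""
    return sorted(cov, key=lambda t: len(cov[t]), reverse=True)

def additional_order(cov):
    """Statement-additional: greedily pick the test covering the most
    not-yet-covered statements."""
    covered, order, remaining = set(), [], dict(cov)
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

print(total_order(cov))
print(additional_order(cov))  # t3 jumps ahead of t2: t2 adds nothing new
```

The additional variant promotes t3 because, after t1, the statements t2 covers are already exercised.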
25. Test Case Prioritisation
Interaction Testing (black box)
Necessary when the system under test involves multiple combinations of different
components. (consider the application environment, Operating System)
Research focused on finding those interactions that impact a larger user base. (e.g.
prioritize Windows testing over Linux).
Additional research done in GUI-based programs.
• Take a sequence of inputs and find the case that executes the most code.
• Consider user interaction data for prioritisation (heat map).
26. Test Case Prioritisation
Distribution-based Approach
Profile test cases based on a dissimilarity metric, a real number representing the
degree of dissimilarity between two inputs.
Cluster test cases according to their similarities which can reveal:
• Similar profiles may indicate a group of redundant test cases
• Isolated clusters may contain test cases in unusual conditions (fault-proneness)
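A minimal sketch of the clustering step, using Hamming distance on hypothetical binary input profiles as the dissimilarity metric; real work would use richer profiles and clustering algorithms.

```python
# Hypothetical binary input profiles per test case.
profiles = {
    "t1": (0, 0, 1),
    "t2": (0, 0, 1),   # identical profile to t1 -> likely redundant
    "t3": (1, 1, 0),   # isolated -> possibly an unusual condition
}

def dissimilarity(p, q):
    """Hamming distance between two profiles."""
    return sum(a != b for a, b in zip(p, q))

def cluster(profiles, threshold=1):
    """Greedy single-pass clustering: join a test to the first cluster
    whose first member is within the threshold, else start a new one."""
    clusters = []
    for t, p in profiles.items():
        for c in clusters:
            if dissimilarity(profiles[c[0]], p) <= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

print(cluster(profiles))  # t1/t2 group together; t3 is isolated
```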
27. Test Case Prioritisation
History-based Approach
• Based on association clusters of software artifacts.
• If two files are often modified together, they will be clustered together.
• Each file is also associated with test cases that impact or execute it.
Non-source file (e.g. media, documentation) defects can be as severe as source
code defects.
28. Test Case Prioritisation
Requirement-based Approach
• Test cases are mapped to software requirements
• Prioritisation mapped by customer-assigned priority and/or implementation
complexity.
Makes the prioritization very subjective (customers will have conflicting priorities)
29. Test Case Prioritisation
Model-based Approach
• Test cases classified into a high-priority set TS_H and a low-priority set TS_L
• Initial prioritization was randomly assigned
• Test case is assigned high priority if it is relevant to the modification made in the
model.
Similar approach to the UML based approach when selecting test cases.
30. Test Case Prioritisation
Session-based Approach
• Recorded user sessions from the previous version of the (web) application.
• Thought to be ideal for testing web applications as it reflects actual use.
• Metrics such as number of HTTP requests, frequency of visits.
• Better than random selection, but no single prioritization criterion is always the
best.
31. Test Case Prioritisation
Cost-Aware Approach
Typical prioritization approaches assume all faults have equal severity and all test
cases have equal cost.
Areas of focus similarly categorized:
• Time based (tests that take a long time, need a way to fit X tests in Y units of time)
• Fault level (prioritize most catastrophic tests first, not necessarily any fault)
32. Meta-Empirical Studies
• Empirical evaluation is post hoc: the faults are already known. Without previous
knowledge of faults it is not possible to perform a controlled experiment.
• Studies regarding seeded vs. real faults concluded that seeded (mutation) faults
can be safely used in place of real faults.
• Frequency of regression testing has a significant impact on the cost-effectiveness
of RTS techniques. The longer the window between tests, the more tests are
selected, lowering the value-add.
• Research efforts attempt to match the RTS technique to the type of program (no
silver bullet: Session-based for web applications, Model-based for programs whose
source was generated from UML)
34. Analysis of Current Global Trends
Read the graph not as a representation of the number of publications, but as a
trend of research popularity (a single publication can count towards two categories)
35. Analysis of Current Global Trends
• 60% of studies used subjects with fewer than 10,000 lines of code.
• 70% of studies used fewer than 1,000 test cases.
37. State of the Art
• Among the class of RTS techniques, the graph-walk approach is predominant.
Intuitive and incredibly generic.
• Two ideas played essential roles: test case classification and safe regression test
selection (if a modification occurred, it will be selected).
• Greedy algorithms are prominent in the selected literature (as much as possible,
as soon as possible).
38. Trends
• Emphasis on models (early adoption was very code focused)
• Increased domains (e.g. web applications)
• Cost-awareness – more and more of the literature is starting to consider test time
and fault severity.
39. Issues / Limitations
Limited subjects (60% from the SIR repository) make it hard to show that the
proposed techniques generalize.
Solutions
• Design a method that will allow a realistic simulation of real software faults.
• Engage with open source and Industry
Technology transfer: observations of the literature suggest the community may have
reached maturity and it is time to transfer.
Out of 159 papers, only 31 of them have an author involved in industry.
Out of 159 papers, only 12 consider industrial software.
40. Future Direction
Orchestrating regression testing techniques with test data generation
• Self healing tests
Multi-Objective Regression Testing
• Group tests requiring a given environment together, reduce cost.
Consideration of Other Domains
• Most were white-box
Tool Support
• No readily available tools means practical adoption will remain limited
41. Conclusion
Trends in literature show:
• The community is focused on prioritization, especially Graph-Walking.
• The community is moving towards assessment of complex trade-offs (cost and
value).
• More researchers are becoming interested in the area; the number of publications
continues to rise.
42. Suggestions
• More literature on Minimization and/or clearer content.
• Briefly describe the references used; many references forced you to read the
cited paper for details.
For a paper meant to give an overview of the state of the art, it did just that.
43. Lessons Learned
• How large an area of research regression testing is
• Symbolic execution
• Consider binaries to be a source of fault