Test-Driven Reuse: Improving the Selection of Semantically Relevant Source Code
1. Test-Driven Reuse
Improving the Selection of Semantically Relevant Code
Mehrdad Nurolahzade
mnurolah@ucalgary.ca
Department of Computer Science
University of Calgary
2 April 2014
4. Research Questions
Q1: Is interface-based retrieval effective in large source code libraries?
Q2: Does including additional test facts improve selection?
Q3: Does including additional test facts improve approximate retrieval?
5. An Assessment of Test-Driven Reuse*
• 10 realistic test-driven reuse tasks
• Solutions were verified to be in the repositories.
• Qualitatively analyzed top 10 results
• Each tool managed to retrieve only one good solution.
*Mehrdad Nurolahzade, Robert J. Walker, Frank Maurer, "An Assessment of Test-Driven Reuse: Promises and Pitfalls". In Proceedings of the 13th International Conference on Software Reuse (ICSR 2013), Pisa, Italy, June 18-20, 2013.
6. An Assessment of Test-Driven Reuse*
• Potential bugs in tool prototypes
• Interface-based retrieval fails in large repositories when keywords are very common or unknown.
*Mehrdad Nurolahzade, Robert J. Walker, Frank Maurer, "An Assessment of Test-Driven Reuse: Promises and Pitfalls". In Proceedings of the 13th International Conference on Software Reuse (ICSR 2013), Pisa, Italy, June 18-20, 2013.
Q1: Is interface-based retrieval effective in large source code repositories?
7. Reviver
Approach: Reviver
[Diagram: the Reviver approach. The developer's test case is processed to extract facts; a similarity search over the reuse library retrieves similar test cases; the system under test of those test cases is recommended as relevant source code, which is then transformed, compiled, run against the test case, and the results are displayed to the developer. For comparison, existing tools perform an interface-based search using the interface of the system under test.]
8. Reviver: Heterogeneous Data Model
[Diagram: the Test Indexer processes test cases (x, y, z) in the reuse library and extracts lexical facts, structural facts, and data flow facts, persisted respectively in a lexical model, a relational model, and a graph model; the data model is extensible with new fact types and new models.]
9. Reviver: Multiple Representations
Lexical Facts: AccountTest, Account, from, to, Bank, bank, getInstance, register, testValidTransfer, fromBalance, getBalance, toBalance, getLastTransaction, Transaction, t, transfer
Structural Facts: [diagram of the types referenced in the test case and the methods invoked on them]
Data Flow Facts: [diagram of the data dependencies between the methods invoked]
10. Reviver: Federated Search
[Diagram: the input test case is sent to four similarity search heuristics, lexical similarity (simLexical), reference similarity (simType), call-set similarity (simCall), and data flow similarity (simDataFlow), each searching one of the representation models (lexical, relational, graph); an aggregator (Σ) merges their outputs into a single result set.]
11. Evaluation: Exact Match Retrieval
• Ad hoc interface-based retrieval prototype
• Repository: seeded with a subset of Merobase
• Tasks: 10 trial tasks from the original study
12. Rank of the correct result for each task

Task | Interface-based Retrieval | Reviver
#1   | 3                         | 1
#2   | 1                         | 1
#3   | 1                         | 1
#4   | 1                         | 1
#5   | 13                        | 1
#6   | 30                        | 1
#7   | 2                         | 1
#8   | 2                         | 1
#9   | 1                         | 1
#10  | 1                         | 1

Q2: Does including additional test facts improve selection?
13. Evaluation: Approximate Match Retrieval
• Transformations generate variations of a query.
• Transformations can be combined (2^4 - 1 = 15).

Original:  Transaction t = from.transfer(100.0, to);
Name:      Transaction a = c.m2(100.0, b);
Type:      C2 t = from.transfer("100", to);
Scenario:  Transaction t = from.transfer(1.0, to);
Protocol:  int id = from.transfer(to, 100.0, true);
14. Q3: Does including additional test facts improve approximate retrieval?
[Chart: number of correct results for each transformation combination, (N)ame, (T)ype, (S)cenario, (P)rotocol, for the interface-based prototype versus Reviver]
15. Conclusion
• Contributions
– An evaluation of interface-based retrieval
– A new paradigm for test-driven reuse
– A multi-representation reuse library
– The Reviver prototype
– A technique for evaluating test-driven reuse
• Implications
– How can semantic similarity in source code be detected in the absence of lexical and type similarity?
– Multi-representation reuse libraries are promising.
Speaker notes
The software community has witnessed the rise of the open source software movement over the past two decades. This has motivated the reuse community, more than ever, to develop tools and techniques that facilitate code reuse. One of these techniques is test-driven reuse, a promising approach that retrieves and verifies source code using test cases written by the developer. With this research we seek to improve the performance of test-driven reuse by suggesting an alternative approach to selecting relevant source code.
To demonstrate how source code selection works in existing test-driven reuse approaches, I am going to use the unit test shown here. This JUnit test case verifies the fund transfer functionality in a bank. The system under test in this scenario is the Account class, and the action being tested, as the name of the test method suggests, is the transfer() method at line 18. Fixture setup and precondition verification take place in lines 6-16, and postcondition verification is performed by the assertions in lines 20-22. What an existing test-driven reuse system does, after realizing that the developer is looking for an Account class, is …
… to extract the interface of the Account class and perform a search to see if there is a class matching this interface in the reuse library. This process is known as interface-based retrieval. As you can see, the search query only includes a subset of the information in the test case. Other essential facts, such as the pre- and postconditions of the transfer action, the presence of other collaborator classes like Bank, and their interactions with the Account class, are not taken into consideration.
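For concreteness, here is a sketch of what such a test case could look like, reconstructed from the identifiers listed on the multiple-representations slide (AccountTest, testValidTransfer, Bank, getInstance, register, transfer, getBalance, getLastTransaction). The concrete signatures, values, and line layout are assumptions and need not match the slide; Account, Bank, and Transaction are the yet-to-be-written classes the developer hopes to reuse.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical reconstruction of the example test case; identifier names come
// from the lexical-facts slide, everything else is assumed for illustration.
public class AccountTest {
    @Test
    public void testValidTransfer() {
        // Fixture setup: obtain the bank and register two accounts.
        Bank bank = Bank.getInstance();
        Account from = bank.register("Alice", 500.0);
        Account to = bank.register("Bob", 100.0);

        // Precondition verification.
        double fromBalance = from.getBalance();
        double toBalance = to.getBalance();
        assertEquals(500.0, fromBalance, 0.001);
        assertEquals(100.0, toBalance, 0.001);

        // Action under test: transfer funds between the two accounts.
        Transaction t = from.transfer(100.0, to);

        // Postcondition verification.
        assertEquals(fromBalance - 100.0, from.getBalance(), 0.001);
        assertEquals(toBalance + 100.0, to.getBalance(), 0.001);
        assertEquals(t, from.getLastTransaction());
    }
}

The query an interface-based tool extracts from this test is, roughly, the signature of the system under test. A minimal sketch follows, with Transaction stubbed and bodies filled in only so the fragment compiles; each tool's actual query format differs.

// Approximate interface extracted from AccountTest for interface-based retrieval:
// only the signature of Account is kept, while pre/postconditions and
// collaborators such as Bank are dropped.
class Transaction {}

class Account {
    Transaction transfer(double amount, Account to) { return null; }
    double getBalance() { return 0.0; }
    Transaction getLastTransaction() { return null; }
}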
After realizing that the selection step in test-driven reuse can be improved, we decided to empirically verify three things. First, we decided to verify whether interface-based retrieval is an effective technique in large source code libraries. Second, we decided to find out whether including those test facts currently ignored can improve the selection. And third, we decided to investigate how these additional facts impact a test-driven reuse system when the input query is an approximation of the solution in the reuse library.
In order to answer the first question we designed an experiment with the three existing test-driven reuse research prototypes. Our experiment consisted of 10 programming tasks, or in fact 10 test cases. The solutions associated with the test cases were verified to be in the tool repositories. For each test case we qualitatively analyzed the first 10 recommendations made by each tool. In the end, each tool managed to retrieve only one good solution.
Aside from the potential bugs in the prototypes, we observed that in large repositories with millions of source files, retrieving a solution at the top of the result list by supplying a set of common or arbitrary keywords is quite unlikely. Therefore, in response to research question one, we concluded that interface-based retrieval is an unreliable technique for finding relevant source code.
In order to answer the next two research questions we needed a new test-driven reuse system that would take advantage of those missing facts. Inspired by the case-based reasoning paradigm, we built a new prototype named Reviver. Here is how existing test-driven reuse systems work; I'll show you how Reviver is different. Reviver extracts facts from the input test case. Then, it uses those facts to find similar test cases in the reuse library. Once it knows which test cases are similar, it recommends the system under test in them as relevant source code. The rest of the process is the same as in existing test-driven reuse systems.
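A minimal, self-contained sketch of this selection idea, assuming a purely lexical fact extractor and a simple overlap score; the real Reviver also uses structural and data flow facts, and every class and method name below is hypothetical.

import java.util.ArrayList;
import java.util.List;

// Illustrative outline of Reviver's selection step; not the actual implementation.
public class ReviverPipelineSketch {

    // A test case stored in the reuse library, paired with the class it tests.
    record LibraryTest(String testSource, String systemUnderTest) {}

    private final List<LibraryTest> library = new ArrayList<>();

    public List<String> select(String developerTestSource) {
        // 1. Extract facts (reduced here to lexical tokens) from the input test.
        List<String> queryFacts = extractFacts(developerTestSource);
        // 2. Rank library test cases by their similarity to those facts.
        List<String> recommendations = new ArrayList<>();
        library.stream()
               .sorted((a, b) -> Double.compare(
                       similarity(queryFacts, extractFacts(b.testSource())),
                       similarity(queryFacts, extractFacts(a.testSource()))))
               .limit(10)
               // 3. Recommend the system under test of the most similar test cases;
               //    transforming, compiling, and running the developer's test against
               //    them proceeds as in existing test-driven reuse systems.
               .forEach(t -> recommendations.add(t.systemUnderTest()));
        return recommendations;
    }

    private List<String> extractFacts(String source) {
        // Placeholder: identifier tokens only; the real tool also extracts
        // structural and data flow facts.
        return List.of(source.split("\\W+"));
    }

    private double similarity(List<String> a, List<String> b) {
        // Placeholder Jaccard-style overlap between the two token lists.
        long common = a.stream().filter(b::contains).distinct().count();
        return common / (double) (a.size() + b.size() - common);
    }
}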
Reviver relies on a heterogeneous data model for finding similar test cases. At index time, we process existing test code and extract lexical, structural, and data flow facts. These facts are persisted using separate representation schemes. A nice feature of this heterogeneous data model is that it is extensible.
Let's go back to our example test case and see how it is represented in Reviver's reuse library. It would have three different representations. Its lexical representation includes the names of classes, objects, and methods in the test case. Its structural representation includes all types and the methods invoked on them. And finally, its data flow representation is a graph of data dependencies between the methods invoked.
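Roughly, the three representations of the example test case could be pictured with plain Java collections as follows; the slide renders the structural and data flow facts as diagrams, so the concrete entries below are an approximation, not the exact facts Reviver stores.

import java.util.List;
import java.util.Map;

// Approximate contents of the three representations of the AccountTest example.
public class ExampleRepresentations {
    // Lexical facts: identifier tokens appearing in the test case.
    static final List<String> LEXICAL = List.of(
            "AccountTest", "Account", "from", "to", "Bank", "bank", "getInstance",
            "register", "testValidTransfer", "fromBalance", "getBalance",
            "toBalance", "getLastTransaction", "Transaction", "t", "transfer");

    // Structural facts: types referenced and the methods invoked on them.
    static final Map<String, List<String>> STRUCTURAL = Map.of(
            "Bank", List.of("getInstance", "register"),
            "Account", List.of("transfer", "getBalance", "getLastTransaction"));

    // Data flow facts: data dependencies between invocations, e.g. the object
    // returned by one call flowing into another (edges are illustrative).
    static final List<List<String>> DATA_FLOW = List.of(
            List.of("Bank.getInstance", "Bank.register"),
            List.of("Bank.register", "Account.transfer"),
            List.of("Account.transfer", "Account.getLastTransaction"));
}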
We built a federated search platform consisting of four similarity search heuristics on top of our multi-representation reuse library. Each similarity search heuristic independently finds similar test cases using one of the representation schemes. An aggregator merges the output of the heuristics into a single result set, ranking test cases returned by multiple search heuristics higher.
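A minimal sketch of such an aggregator, assuming each heuristic (simLexical, simType, simCall, simDataFlow) hands back a map from test case identifier to similarity score; the plain summation used below is an assumption, not Reviver's actual ranking formula.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Σ aggregation step: test cases returned by several heuristics
// accumulate a higher combined score and therefore rank higher.
public class FederatedAggregatorSketch {

    public static List<String> aggregate(List<Map<String, Double>> heuristicScores) {
        Map<String, Double> combined = new HashMap<>();
        for (Map<String, Double> scores : heuristicScores) {
            // Each map comes from one heuristic, e.g. simLexical over the lexical model.
            scores.forEach((testCase, score) -> combined.merge(testCase, score, Double::sum));
        }
        return combined.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}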
We then evaluated Reviver to verify the correctness of our two hypotheses. To test our first hypothesis, that the inclusion of test facts improves the selection of relevant source code, we designed an experiment. Not having access to two of the three existing test-driven reuse prototypes, we built our own interface-based retrieval prototype to simulate these approaches. We then populated a common repository, utilized by both prototypes, with a subset of the source files in the Merobase code search engine. We reused the ten tasks from our first study in this experiment and compared the results of Reviver with those of the other prototype.
This table shows the rank of the correct recommendation made by the two prototypes for the 10 tasks. As you can see, for all 10 tasks, Reviver retrieved the correct result at the top of its result set. The interface-based prototype, in contrast, failed to place the solutions for tasks 5 and 6, both consisting of common programming vocabulary, among its top ten results. The results confirm our hypothesis that utilizing additional test facts improves the selection.
Then, in order to study the effect of the missing facts when approximate input is provided, we designed another experiment. We defined a set of transformations, which are rule sets for generating variations of an input test case. We defined four of these transformations, for the name, type, scenario, and protocol attributes of test cases. Let's see how each transformation is applied to a single line of code from our earlier example. The name transformation modifies the name tokens. The type transformation modifies the types. The scenario transformation changes how the test works and what it tests. And finally, the protocol transformation modifies the design of the system under test. Combining these four transformations, we came up with a total of 15 possible variants for each input query; using the ten tasks from the previous experiments, this expanded our task pool to 150 test cases.
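For illustration, the fifteen non-empty combinations of the four transformations can be enumerated with a bitmask over {Name, Type, Scenario, Protocol}; the sketch below only enumerates the combinations, not the variant-generation rules themselves.

import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Enumerate the 2^4 - 1 = 15 non-empty transformation combinations used to derive
// approximate variants of each input test case (10 tasks x 15 variants = 150 queries).
public class TransformationCombinations {
    enum Transformation { NAME, TYPE, SCENARIO, PROTOCOL }

    public static List<EnumSet<Transformation>> all() {
        Transformation[] ts = Transformation.values();
        List<EnumSet<Transformation>> combinations = new ArrayList<>();
        for (int mask = 1; mask < (1 << ts.length); mask++) {  // mask 0 (empty set) skipped
            EnumSet<Transformation> combo = EnumSet.noneOf(Transformation.class);
            for (int i = 0; i < ts.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    combo.add(ts[i]);
                }
            }
            combinations.add(combo);
        }
        return combinations;  // 15 combinations
    }

    public static void main(String[] args) {
        all().forEach(System.out::println);  // prints [NAME], [TYPE], ..., [NAME, TYPE, SCENARIO, PROTOCOL]
    }
}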
This graph summarizes the results of the experiment. The blue bars represent the interface-based retrieval prototype and the red bars represent our Reviver prototype. The height of the bars indicates the number of tasks for which a prototype retrieved the correct result. Below each pair of bars are the transformations that were applied to the input; hence we have 15 of these pairs. As you can see, the red bars are taller than their blue counterparts, meaning that Reviver retrieved more correct results. Therefore, the result confirms that utilizing additional test facts improves selection when approximate input is provided. As you can see, Reviver is significantly better where the (N)ame transformation is applied. However, both prototypes fail where the (N)ame and (T)ype transformations are applied together.
In conclusion, with this research we have made five main contributions: an evaluation of the interface-based retrieval technique and the test-driven reuse prototypes; a new paradigm for test-driven reuse that utilizes lexical, structural, and data flow facts for selecting relevant source code; an approach for building a multi-representation reuse library; the Reviver prototype; and a new technique for evaluating test-driven reuse using approximate variations. As for the implications of our findings … detecting semantic similarity in source code in the absence of lexical and type similarity still remains an open question. Today, despite being able to build large libraries of open source code, we are constrained in what we can retrieve and reuse in such libraries due to our narrow definition of similarity. We believe that the idea of multi-representation reuse libraries is promising and should be further pursued by the reuse community.