Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which poses threats to the reliability of their application. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stifled, we designed, implemented, and evaluated RefactoringMiner, a technique that overcomes the above limitations. At the heart of RefactoringMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RefactoringMiner, we created the most comprehensive oracle to date that uses triangulation to create a dataset with considerably reduced bias, representing 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RefactoringMiner has a precision of 98% and recall of 87%, which is a significant improvement over the previous state-of-the-art.
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Accurate and Efficient Refactoring Detection in Commit History
1. Accurate and Efficient Refactoring
Detection in Commit History
1May 31, 2018
Danny DigNikolaos Tsantalis Matin Mansouri Laleh Eshkevari Davood Mazinanian
2. Refactoring is noise in evolution analysis
• Bug-inducing analysis (SZZ): flag refactoring edits as bug-introducing
changes
• Tracing requirements to code: miss traceability links due to
refactoring
• Regression testing: unnecessary execution of tests for refactored
code with no behavioral changes
• Code review/merging: refactoring edits tangled with the actual
changes intended by developers
2
3. There are many refactoring detection tools
• Demeyer et al. [OOPSLA’00]
• UMLDiff + JDevAn [Xing & Stroulia ASE’05]
• RefactoringCrawler [Dig et al. ECOOP’06]
• Weißgerber and Diehl [ASE’06]
• Ref-Finder [Kim et al. ICSM’10, FSE’10]
• RefDiff [Silva & Valente, MSR’17]
3
4. Limitations of previous approaches
• Dependence on similarity thresholds
• thresholds need calibration for projects with different characteristics
• Dependence on built versions
• only 38% of the change history can be successfully compiled [Tufano et al., 2017]
• Unreliable oracles for evaluating precision/recall
• Incomplete (refactorings found in release notes or commit messages)
• Biased (applying a single tool with two different similarity thresholds)
• Artificial (seeded refactorings)
4
5. Why do we need better accuracy?
5
Empirical studies
Refactoring
detection
Library adaptation
Framework migration
poor
accuracy
7. Contributions
1. First refactoring detection algorithm operating without any
code similarity thresholds
2. RefactoringMiner open-source tool with an API
3. Oracle comprising 3,188 refactorings found in 538 commits
from 185 open-source projects
4. Evaluation of precision/recall and comparison with previous
state-of-the-art
5. Tool infrastructure for comparing multiple refactoring
detection tools
7
8. Approach in a nutshell
AST-based statement matching algorithm
• Input: code fragments T1 from parent commit and T2 from child commit
• Output:
• M set of matched statement pairs
• UT1 set of unmatched statements from T1
• UT2 set of unmatched statements from T2
• Code changes due to refactoring mechanics: abstraction, argumentization
• Code changes due to overlapping refactorings or bug fixes:
syntax-aware AST node replacements
8
9. 9
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
AfterBefore
10. 10
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(int count) {
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
AfterBefore
11. 11
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
AfterBefore
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
12. 12
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
AfterBefore
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
13. 13
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
}
return addresses;
}
try {
addresses[i] =
new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
AfterBefore
14. 14
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
15. 15
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
AfterBefore
textual similarity 30%
16. 16
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(1) Abstraction
17. 17
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(1) Abstraction
18. 18
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address(host, port);
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(2) Argumentization
19. 19
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(2) Argumentization
20. 20
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", ports.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(3) AST Node Replacements
21. 21
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
(3) AST Node Replacements
22. 22
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
textual similarity = 100%
23. 23
private static Address[] createAddresses(int count) {
Address[] addresses = new Address[count];
for (int i = 0; i < count; i++) {
try {
addresses[i] =
new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
}
return addresses;
}
private static List<Address> createAddresses(AtomicInteger ports, int count){
List<Address> addresses = new ArrayList<Address>(count);
for (int i = 0; i < count; i++) {
addresses.add(createAddress("127.0.0.1", ports.incrementAndGet()));
}
return addresses;
}
protected static Address createAddress(String host, int port) {
try {
return new Address("127.0.0.1", PORTS.incrementAndGet());
}
catch (UnknownHostException e) {
e.printStackTrace();
}
return null;
}
AfterBefore
A
B
C
D
E
F
G
1
2
3
9
4
5
6
7
8
M = {(C, 4) (D, 5) (E, 6) (F, 7)}
UT1 = {A, B, G}
UT2 = {8}
24. Extract Method detection rule
(M, UT1, UT2) = statement-matching(createAddresses, createAddress)
M = {(C, 4) (D, 5) (E, 6) (F, 7)} UT1 ={A, B, G} UT2 = {8}
createAddress is a newly added method in child commit
createAddresses in parent commit does not call createAddress
createAddresses in child commit calls createAddress
|M| > |UT2|
createAddress has been extracted from createAddresses
24
25. Evaluation
RQ1: What is the accuracy of RefactoringMiner and how does it
compare to the state-of-the-art?
RQ2: What is the execution time of RefactoringMiner and how does
it compare to the state-of-the-art?
25
26. Oracle construction
• Public dataset with validated refactoring instances: 538 commits from
185 open-source projects [Silva et al., FSE’2016 Distinguished Artifact]
• We executed two tools:
RefactoringMiner and RefDiff [Silva & Valente, MSR’2017]
• We manually validated all 4,108 detected instances with 3 validators
for a period of 3 months
• 3,188 true positives and 920 false positives
26
27. Comparison with state-of-the-art
RefDiff [Silva & Valente, MSR’2017]
• Commit-based refactoring detection tool
• Evaluation on 448 seeded refactorings in 20 open-source projects
• RefDiff has much higher precision/recall than
Ref-Finder [Kim et al. 2010] and RefactoringCrawler [Dig et al. 2006]
• Ref-Finder and RefactoringCrawler need fully built Eclipse projects
as input
27
28. 28
• RMiner better precision
in all refactoring types
• In half of the types
RefDiff has better recall
• Overall RMiner has +22%
precision and +1.5% recall
29. Advantage of RefDiff
• Treats code fragments as bags of tokens and ignores the structure
29
- private void startScanner() throws Exception + public void startScanner() throws Exception
{ {
- // check if scanning is enabled + if (!isScanningEnabled())
- if (scanIntervalSeconds <= 0) return;
- if ( "manual".equalsIgnoreCase( reload ) )
return; return;
+ public boolean isScanningEnabled ()
+ {
+ if (scanIntervalSeconds <=0 || "manual".equalsIgnoreCase( reload ))
+ return false;
+ return true;
+ }
30. Disadvantage of RefDiff
• Inability to deal with changes in the tokens
30
- if (eventBus != null) { + onRemoteStatusChanged(lastRemoteInstanceStatus, currentRemoteInstanceStatus);
- StatusChangeEvent event = new
StatusChangeEvent(lastRemoteInstanceStatus,
- currentRemoteInstanceStatus);
- eventBus.publish(event);
- }
+ protected void onRemoteStatusChanged(InstanceInfo.InstanceStatus oldStatus,
InstanceInfo.InstanceStatus newStatus) {
+ if (eventBus != null) {
+ StatusChangeEvent event = new StatusChangeEvent(oldStatus, newStatus);
+ eventBus.publish(event);
+ }
+ }
31. Execution time per commit [ms]
• On median, RefactoringMiner is 7 times faster than RefDiff
31
ms
32. Limitations + Future work
• Missing context: Pull Up reported as Move, if a class between the
source and destination is unchanged.
• Nested refactorings: unable to detect Extract Method applied
within an extracted method
• Unsupported refactorings: refactoring types, such as Rename
Variable/Parameter/Field, Extract/Inline Variable can be supported
from the analysis of AST replacements.
• Oracle bias: plan to add more tools for constructing the oracle
(challenge: make tools work without binding information)
32
33. Conclusions
• RefactoringMiner: commit-based refactoring detection
• No similarity thresholds
• High accuracy: 98% precision, 87% recall
• Ultra-fast: 58ms on median per commit
• Better than competitive tools (RefDiff): +22% precision, 7 times faster
• Largest and least biased refactoring oracle up to date
• 3188 true refactoring instances
• 538 commits
• 185 open-source projects
• 3 validators over 3 months (9 person-months)
33
http://refactoring.encs.concordia.ca/oracle/
https://github.com/tsantalis/RefactoringMiner
Notes de l'éditeur
My name is Nikolaos Tsantalis, associate professor at Concordia University, and I will present to you our work on Refactoring detection.The rest of co-authors are Matin (Master student in my group), Laleh (PDF in my group), Davood (PhD in my group), and Danny from Oregon State Univeristy.