FaCoY – A Code-to-Code Search Engine
1. FaCoY – A code-to-code search engine
Kisub Kim1, Dongsun Kim1, Tegawendé F. Bissyandé1,
Eunjong Choi2, Li Li3, Jacques Klein1, and Yves Le Traon1
1 SnT, University of Luxembourg - Luxembourg
2 Nara Institute of Science and Technology (NAIST) - Japan
3 Faculty of Information Technology, Monash University - Australia
01. 06. 2018
SerVal
3. I want to find a loop in a singly linked list
function boolean hasLoop(Node startNode){
    Node currentNode = startNode;
    while (currentNode = currentNode.next());
    return false;
}
Is this correct?
Any better implementation?
How can I improve my code?
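One well-known answer to the questions above is Floyd's cycle-detection algorithm ("tortoise and hare"). The following is a minimal sketch and is not taken from the slides: the `Node` class with a `next` field and the `LoopDetector` class name are assumptions made for illustration.

```java
// Hypothetical minimal node type; the Node class used on the slide is not shown there.
final class Node {
    Node next;
}

final class LoopDetector {
    // Floyd's cycle detection: "slow" advances one node per step, "fast" advances two.
    // If the list contains a loop, fast eventually lands on the same node as slow.
    static boolean hasLoop(Node startNode) {
        Node slow = startNode;
        Node fast = startNode;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        // fast reached the end of the list, so there is no loop
        return false;
    }
}
```

The sketch uses constant extra memory and visits each node a bounded number of times, which is why it is a common replacement for the fragment in the question.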
6. State-of-the-art
Static Approaches
- Code clone detection techniques.
- Mostly focus on textually, structurally or syntactically similar code.
- Leverage various intermediate representations to compute code similarity:
  • Token-based [CCFinder @TSE02]
  • AST-based [Deckard @ICSE07]
  • Graph-based [GPLAG @KDD06]
Static approaches tend to miss fragments that have similar behaviour.
Dynamic Approaches
- Identify programs that retrieve similar results for the same inputs.
- Generate random inputs, rely on symbolic or concolic execution, or check abstract memory states.
- Compare instruction-level execution traces [DyCLINK @FSE16]
Dynamic approaches do not scale to large repositories and are not always operational.
9. Conceptual Steps
Input: a code fragment.
- What functionality is implemented? What are related implementations? (Q&A posts: questions + code snippets)
- What are other representative tokens for the functionality?
- Which code fragments best match these tokens? (target code base)
Output: similar code fragments.
19. How it works (Visually)
Input code:
function boolean hasLoop(Node startNode){
    Node currentNode = startNode;
    while (currentNode = currentNode.next());
    return false;
}
Terms extracted from the input code:
used_classes : Node
used_classes : hasLoop
methods_called : next
class_instance_creation : currentNode
Retrieved with these terms: syntactically similar code, then the natural language description of the input code.
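To make the idea of turning a code fragment into `key : value` terms concrete, here is a rough sketch. It is an assumption-laden stand-in: regular expressions replace real parsing, only two term kinds (`used_classes` and `methods_called`) are approximated, and the class and method names are hypothetical. It does not reproduce every term listed above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates index-term extraction with regular expressions.
final class IndexTermSketch {
    // An identifier that follows a dot and precedes "(" is treated as a method call.
    private static final Pattern METHOD_CALL = Pattern.compile("\\.\\s*([a-z_]\\w*)\\s*\\(");
    // A standalone identifier starting with an upper-case letter is treated as a class name.
    private static final Pattern TYPE_NAME = Pattern.compile("\\b([A-Z]\\w*)\\b");

    static Set<String> extractTerms(String code) {
        Set<String> terms = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        Matcher types = TYPE_NAME.matcher(code);
        while (types.find()) {
            terms.add("used_classes : " + types.group(1));
        }
        Matcher calls = METHOD_CALL.matcher(code);
        while (calls.find()) {
            terms.add("methods_called : " + calls.group(1));
        }
        return terms;
    }
}
```

A parser-based extractor would be more reliable than this heuristic, since regular expressions cannot tell a class name from, say, a constant that happens to start with a capital letter.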
27. Step 0: Dataset Indexing
• Parse the code snippets from answer posts and code files to generate an Abstract Syntax Tree (AST).
• Preprocess the natural language text from posts.
• Store the results as inverted indices.
Answer Posts → Post Analyzer → Snippet Index (Code Snippet with Metadata)
Question Posts → Post Analyzer → Question Index (Full Post Information)
Project Repository Codes → Code Analyzer → Project Code Index
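An inverted index maps each term to the documents that contain it, so a query only touches the postings of its own terms. The sketch below is a minimal in-memory illustration, not the indexing code behind the slides: the class name, the integer document ids, and the ranking by number of matching terms are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory inverted index over "key : value" terms.
final class InvertedIndexSketch {
    // term -> ids of the documents (snippets, questions, or code files) containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, Collection<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents sharing at least one term with the query,
    // ordered by the number of matching terms (ties keep ascending id order).
    List<Integer> search(Collection<String> queryTerms) {
        Map<Integer, Integer> matches = new TreeMap<>();
        for (String term : new HashSet<>(queryTerms)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            for (int docId : docs) {
                matches.merge(docId, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(matches.keySet());
        // List.sort is stable, so equal scores stay in ascending id order.
        ranked.sort((a, b) -> Integer.compare(matches.get(b), matches.get(a)));
        return ranked;
    }
}
```

Counting matching terms is the simplest possible scoring; a production search library would typically weight terms by how rare they are.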
29. Step 2: Syntactic Search in Q&A Answers
[Diagram: from the user input (a code fragment), (1) a code query is generated; (2) the query is used to search Stack Overflow answer snippets for similar code snippets.]
30. Step 3: Collection of Descriptive Natural Language Terms
[Diagram: steps (1) and (2) as before; (3) the question posts of the matching answer snippets are used to search Stack Overflow for similar questions.]
31. Step 4: Query Alternation
[Diagram: steps (1) to (3) as before; (4) alternate code queries are generated from the answer snippets of the similar questions.]
32. Step 5: Retrieving Results from the Codebase
[Diagram: steps (1) to (4) as before; (5) the code queries are used to search the GitHub codebase for code examples, which form the search results.]
33. Overview of the Approach
[Diagram: user input (code fragment) → (1) generating code query → (2) searching for similar code snippets in the Snippet Index → (3) searching for similar questions in the Question Index → (4) generating code queries from the retrieved answer snippets → (5) searching for code examples in the Code Index → search results.]
38. Research Questions
RQ1: How relevant are the code examples found by FaCoY compared to those of other code-to-code search engines?
RQ2: How effective is FaCoY at finding semantic clones, based on a code clone benchmark?
RQ3: Do the semantically similar code fragments yielded by FaCoY exhibit similar runtime behavior?
RQ4: Could FaCoY recommend correct code as an alternative to buggy code?
39. Code Clone Type Definitions
Clone Type | Definition
Type-1 | Identical code fragments, except for white-space, layout, and comments
Type-2 | Identical code fragments, except for identifier names and literal values (in addition to Type-1 differences)
Type-3 | Syntactically similar code fragments in which statements are added, modified and/or removed with respect to each other (in addition to Type-1 and Type-2 differences)
Type-4 | Syntactically dissimilar code fragments that implement the same functionality (semantic clones)
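The distinction between the clone types is easier to see on a small example. The methods below are invented for illustration (they do not come from the slides or from any benchmark): each one is a clone of `sum` at the type named in its comment.

```java
import java.util.Arrays;

// Hypothetical examples: every method computes the sum of an int array.
final class CloneTypeExamples {
    // Reference fragment.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Type-2 clone of sum: same structure, only identifier names differ.
    static int accumulate(int[] data) {
        int acc = 0;
        for (int k = 0; k < data.length; k++) {
            acc += data[k];
        }
        return acc;
    }

    // Type-3 clone of sum: similar structure, but a statement was added (the null guard).
    static int sumOrZero(int[] values) {
        if (values == null) {
            return 0;
        }
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Type-4 clone of sum: syntactically dissimilar, same functionality.
    static int sumWithStream(int[] values) {
        return Arrays.stream(values).sum();
    }
}
```

A token-based or AST-based detector would be expected to pair the first three methods readily, while `sumWithStream` shares almost no syntax with them, which is what makes Type-4 clones hard to find statically.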
• Stefan Bellon, Rainer Koschke, Giuliano Antoniol, Jens Krinke, and Ettore Merlo. 2007. Comparison and evaluation of clone detection tools. IEEE Transactions on Software Engineering 33, 9 (2007), 577-591.
• Chanchal K. Roy, James R. Cordy, and Rainer Koschke. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming 74, 7 (2009), 470-495.
• Fang-Hsiang Su, Jonathan Bell, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser, and Tony Jebara. 2016. Code Relatives: Detecting Similarly Behaving Software. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, 702-714.
40. RQ 1. Comparison with Code-to-Code Search Engines
• Query the tools with the top 10 Stack Overflow snippets
• Manual checking
• Does FaCoY find similar code fragments?
• Syntactically or semantically?
41. RQ 2. Benchmark Assessment
• IJaDataset 2.0: 25,000 projects (> 3 M files)
• BigCloneBench annotates 8,345,104 clone pairs across 43 functionalities
• Can FaCoY find more semantic clones (MT3 or T4) than the state-of-the-art?
[Results table labels: No initial user query; No query alternation; No query alternation & no query structuring]
42. Double Checking FaCoY’s False Positives
Clone Types | T1 | T2 | VST3 | ST3 | MT3 | WT3/T4 | Total
# of samples | 10 | 10 | 10 | 10 | 10 | 10 | 60
Clones that BigCloneBench failed to include: 4, 1, 2, 25 (Total: 32)
• In 26 of the remaining 28 cases, FaCoY pointed to the correct file but failed to locate the clone within the file.
• In only 2 cases did FaCoY fail completely.
FaCoY can detect clones that BigCloneBench missed.
43. FaCoY’s Limitation
Requiring external APIs and libraries:
F18: Play Sound
F19: Take Screenshot to File
F21: XMPP Send Message
Pure computation tasks:
F7: Bubble Sort Array
F14: Binary Search
F41: Transpose a Matrix
FaCoY performs much better on code that requires external APIs and libraries.
44. RQ 3. Validating Semantic Clones
• Use DyCLINK, a dynamic approach that computes the similarity of execution traces to detect code relatives.
• Index the DyCLINK benchmark: 411 pairs labelled as code relatives.
  Benchmark subjects (Google Code Jam):
  - 2011: Irregular Cake
  - 2012: Perfect Game
  - 2013: Cheaters
  - 2014: Magical Tour
• Results:
  • FaCoY’s hit ratio is 68% (278 out of 411 code fragments)
  • FaCoY’s Mean Reciprocal Rank is 0.18 (relevant fragments tend to be retrieved at lower ranks)
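The two metrics reported above can be computed as follows. This is a sketch under stated assumptions: the input is the 1-based rank of the first relevant result for each query, a rank of 0 means nothing relevant was retrieved, and such misses contribute 0 to the Mean Reciprocal Rank. The class and method names are hypothetical.

```java
// Hypothetical helper for the hit ratio and Mean Reciprocal Rank (MRR).
final class RankingMetrics {
    // Fraction of queries for which a relevant result was retrieved at any rank.
    static double hitRatio(int[] firstRelevantRanks) {
        if (firstRelevantRanks.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int rank : firstRelevantRanks) {
            if (rank > 0) {
                hits++;
            }
        }
        return (double) hits / firstRelevantRanks.length;
    }

    // Average of 1/rank over all queries; a miss (rank 0) contributes 0.
    static double meanReciprocalRank(int[] firstRelevantRanks) {
        if (firstRelevantRanks.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int rank : firstRelevantRanks) {
            if (rank > 0) {
                sum += 1.0 / rank;
            }
        }
        return sum / firstRelevantRanks.length;
    }
}
```

For the reported figures, 278 hits out of 411 gives 278/411 ≈ 0.676, which rounds to 68%. An MRR of 0.18 corresponds to an average reciprocal rank of roughly 1/5.6, so a high hit ratio combined with a low MRR indicates that relevant fragments are found but not ranked near the top.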
45. RQ 4. Recommending code for patches
Buggy code
• Bug information
  Project: Apache commons-lang
  File path: projects/Lang/14/org/apache/commons/lang3/StringUtils.java
As maintainers, can we quickly find alternative implementations?
46. RQ 4. Recommending code for patches
Buggy code
Additionally, we consider more cases:
1. Are cs1 and cs2 null?
2. Do cs1 and cs2 have the same value?
3. Are cs1 and cs2 the same object?
4. Are cs1 and cs2 String objects?
5. If they are not pure String objects, check equality character by character.
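The five cases above can be combined into one comparison routine. The following is an illustrative sketch only, not the Apache commons-lang code and not the fragment recommended by FaCoY; the class and method names are hypothetical.

```java
// Hypothetical helper: compares two CharSequence values following the cases listed above.
final class CharSequenceEquality {
    static boolean contentEquals(CharSequence cs1, CharSequence cs2) {
        // Same object, or both null.
        if (cs1 == cs2) {
            return true;
        }
        // Exactly one of them is null.
        if (cs1 == null || cs2 == null) {
            return false;
        }
        // Both are String objects: String.equals has well-defined value semantics.
        if (cs1 instanceof String && cs2 instanceof String) {
            return cs1.equals(cs2);
        }
        // Otherwise compare character by character, since CharSequence
        // does not specify how equals() behaves across implementations.
        if (cs1.length() != cs2.length()) {
            return false;
        }
        for (int i = 0; i < cs1.length(); i++) {
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, a `String` and a `StringBuilder` holding the same characters compare as equal, whereas calling `equals()` directly on them would return false.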
47. RQ 4. Recommending code for patches
Fix
Buggy code
• Commit: cf7211f9 by Matthew Jason Benson, 01/23/2012 06:47 PM
• parent: c8afaa3e
• git-svn-id: https://svn.apache.org/repos/asf/commons/proper/lang/trunk@1234915
• Log: [LANG-786] StringUtils equals() relies on undefined behavior; thanks to Daniel Trebbien
48. RQ 4. Recommending code for patches
• 395 bugs in the Defects4J repair benchmark [@ISSTA14]
• 21 FaCoY recommendations are correct (manual assessment)