MeCC: Memory Comparison-based Code Clone Detector
1. MeCC: Memory Comparison-based Clone Detector
Heejung Kim¹, Yungbum Jung¹, Sunghun Kim², and Kwangkeun Yi¹
¹ Seoul National University
² The Hong Kong University of Science and Technology
http://ropas.snu.ac.kr/mecc/
2. Code Clones
• Similar code fragments (syntactically or semantically)

static PyObject *
float_add(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v,a);
    CONVERT_TO_DOUBLE(w,b);
    PyFPE_START_PROTECT("add",return 0)
    a = a + b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}

static PyObject *
float_mul(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v,a);
    CONVERT_TO_DOUBLE(w,b);
    PyFPE_START_PROTECT("multiply",return 0)
    a = a * b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}
14. MeCC: Our Approach
• A static analyzer estimates the semantics of programs
• Abstract memories are the results of the analysis
• Comparing abstract memories gives a measure of code similarity
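
The idea of comparing abstract memories can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `MemEntry` type, the textual encoding of locations and symbolic values, and the overlap score (a Dice coefficient over matching entries) are hypothetical and are not taken from MeCC itself, whose abstract memories and similarity measure are richer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified "abstract memory": a list of
 * (location, symbolic value) entries that an analyzer might report
 * for one procedure. Locations are assumed unique within a memory. */
typedef struct {
    const char *location;  /* e.g. "return" */
    const char *value;     /* symbolic value, written as text */
} MemEntry;

/* Returns 1 if memory m (n entries) contains an entry equal to e. */
static int has_entry(const MemEntry *m, size_t n, const MemEntry *e)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(m[i].location, e->location) == 0 &&
            strcmp(m[i].value, e->value) == 0)
            return 1;
    }
    return 0;
}

/* Overlap score in [0, 1]: twice the number of shared entries divided
 * by the total number of entries. Two empty memories score 1. */
double memory_similarity(const MemEntry *m1, size_t n1,
                         const MemEntry *m2, size_t n2)
{
    size_t common = 0;

    if (n1 + n2 == 0)
        return 1.0;
    for (size_t i = 0; i < n1; i++)
        common += (size_t)has_entry(m2, n2, &m1[i]);
    return 2.0 * (double)common / (double)(n1 + n2);
}
```

Under this sketch, two procedures would be reported as clone candidates when their score exceeds some chosen threshold; because the score depends only on the memories, it is unaffected by how the source text of each procedure is written.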
32. Detected Clones
Total: 623 code clones
[Pie chart: shares of 6%, 2%, 39%, and 53% across the legend entries Type-1, Type-2, Type-3, and Type-4]
C. K. Roy and J. R. Cordy. A survey on software clone detection research. School of Computing TR 2007-541, Queen's University, 115, 2007.
36. Finding Potential Bugs
• A large portion of semantic clones are due to inconsistent changes
• Inconsistent changes may lead to potential bugs (inconsistent clones)
Two semantic clones with potential bugs
37. #1 Missed Null Check
const char *GetVariable (VariableSpace space, const char *name)
{
    struct_variable *current;
    if (!space)    /* parameter name should also be checked! */
        return NULL;
    for (current=space->next;current;current=current->next)
    {
        if (strcmp(current->name,name) == 0)
        {
            return current->value;
        }
    }
    return NULL;
}

const char *PQparameterStatus (const PGconn *conn, const char *paramName)
{
    const pgParameterStatus *pstatus;
    if (!conn || !paramName)
        return NULL;
    for (pstatus=conn->pstatus; pstatus!=NULL; pstatus = pstatus->next)
    {
        if (strcmp(pstatus->name,paramName)== 0)
            return pstatus->value;
    }
    return NULL;
}
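
A possible repair of the first procedure, sketched by copying the guard from its clone. The definitions of `struct_variable` and `VariableSpace` are not shown on the slide, so the layout below (a linked list reached through a head node) is an assumption made only to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: the slide does not show these definitions. */
typedef struct struct_variable {
    const char *name;
    const char *value;
    struct struct_variable *next;
} struct_variable;

typedef struct_variable *VariableSpace;  /* assumed: pointer to a head node */

const char *GetVariable(VariableSpace space, const char *name)
{
    struct_variable *current;

    /* Added check on name, mirroring the !paramName test in
     * PQparameterStatus, so strcmp() never receives a null pointer. */
    if (!space || !name)
        return NULL;
    for (current = space->next; current; current = current->next) {
        if (strcmp(current->name, name) == 0)
            return current->value;
    }
    return NULL;
}
```

With the extra test, a call such as `GetVariable(space, NULL)` returns `NULL` instead of passing a null pointer to `strcmp`, which is undefined behavior.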
38. #2 A Resource Leak Bug
PyObject *pwd_getpwall (PyObject *self)
{
    PyObject *d;
    struct passwd *p;
    if ((d = PyList_New(0)) == NULL)
        return NULL;
    setpwent();    /* open user database */
    while ((p = getpwent()) != NULL) {
        PyObject *v = mkpwent(p);
        if (v==NULL || PyList_Append(d,v)!=0) {
            Py_XDECREF(v);
            Py_DECREF(d);
            return NULL;    /* a resource leak: returns without the endpwent() procedure call */
        }
        Py_DECREF(v);
    }
    endpwent();    /* close user database */
    return d;
}
Python project revision #20157
41. Procedure A was created
• Revision #20157: procedure A was created with a resource leak
• Revision #38359: procedure B (a code clone of A) was introduced without resource leaks
• Revision #73017: the resource leak bug in procedure A was fixed
• Had MeCC been applied, the resource leak could have been fixed 4 years earlier
44. Study Limitation
• Projects are open source and may not be representative
• All clones are manually inspected
• Default options are used for other tools (CCFinder, Deckard, PDG-based)
45. Conclusion
• MeCC: Memory Comparison-based Clone Detector
• a new clone detector using semantics-based static analysis
• tolerant to syntactic variations
• can be used to find potential bugs
48. Time Spent
Project      KLOC   FP   Total   Time
Python        435   39     264   1h
Apache        343   24     191   5h
PostgreSQL    937   47     278   7h
Ubuntu 64-bit machine with a 2.4 GHz Intel Core 2 Quad CPU and 8 GB RAM.
• False positive ratio is less than 15%
• Slower than other tools (deep semantic analysis)