         Filtering Clones for
       Individual User Based on
       Machine Learning Analysis

           Jiachen Yang, Keisuke Hotta, Yoshiki Higo,
                 Hiroshi Igaki, Shinji Kusumoto
          Graduate School of Information Science and Technology, Osaka University


                                   June 4, 2012

Jiachen Yang (IST, Osaka-U)          Fica@IWSC2012                                        June 4, 2012                          1 / 14
Motivating Example
                              Participants of survey
                                   1 2 3 4 5 6 7 8

              Clonesets
     Red: Un-interesting
      Blue: Interesting
Interesting U:0 vs I:8

    static char *
    history_substring (string, start, end)
         const char *string;
         int start, end;
    {
      register int len;
      register char *result;
      len = end - start;
      result = (char *)xmalloc (len + 1);
      strncpy (result, string + start, len);
      result[len] = '\0';
      return result;
    }

       (a) lib/readline/histexpand.c

    char *
    substring (string, start, end)
         const char *string;
         int start, end;
    {
      register int len;
      register char *result;
      len = end - start;
      result = (char *)xmalloc (len + 1);
      strncpy (result, string + start, len);
      result[len] = '\0';
      return (result);
    }

       (b) stringlib.c

                     Figure: Example of source code in bash-4.2
Un-Interesting U:8 vs I:0



    ... __P((char *, arrayind_t, char *));
    static intmax_t subexpr __P((char *));
    static intmax_t expcomma __P((void));
    static intmax_t expassign __P((void));
    static intmax_t expcond __P((void));
    static intmax_t explor __P((void));
    static intmax_t expland __P((void));

       (a) expr.c

    static int run_one_command __P((char *));
    static int run_wordexp __P((char *));
    static int uidget __P((void));
    static void init_interactive __P((void));
    static void init_noninteractive __P((void));
    static void init_interactive_script __P((void));
    static void set_shell_name __P((char *));

       (b) shell.c

                     Figure: Example of source code in bash-4.2
Disagreed U:4 vs I:4

    static int
    displen (s)
         const char *s;
    {
      wchar_t *wcstr;
      size_t wclen, slen;
      wcstr = 0;
      slen = mbstowcs (wcstr, s, 0);
      if (slen == -1)
        slen = 0;
      wcstr = (wchar_t *)xmalloc (sizeof ....
      mbstowcs (wcstr, s, slen + 1);
      wclen = wcswidth (wcstr, slen);
      free (wcstr);
      return ((int)wclen);
    }

       (a) execute_cmd.c

    else
    {
      if (wcharlist == 0)
        {
          size_t len;
          len = mbstowcs (wcharlist, charlist, 0);
          if (len == -1)
            len = 0;
          wcharlist = (wchar_t *)xmalloc (sizeof ....
          mbstowcs (wcharlist, charlist, len + 1);
        }
      if (wcschr (wcharlist, wc))
        break;
    }

       (b) subst.c

                     Figure: Example of source code in bash-4.2
Fica — the name


 Filter for
 Individual user on code
 Clone
 Analysis
Fica — the website




                          Figure: Snapshot of Fica


Compare Code Clone Similarity

Pi = possibility to be interesting
Pu = possibility to be un-interesting
 Len    Pi       Pi/Pu    Pu       Comp
 50     5.56%    1.18     4.72%    O
 87     2.89%    1.11     2.59%    O
 79     1.97%    0.69     2.87%    X
 63     3.55%    0.64     5.57%    O
 77     2.66%    0.46     5.83%    X
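
The Pi/Pu column can be recomputed from the two percentages, and the rows appear in descending order of that ratio. The following is a minimal Python sketch under that reading; the names `rows`, `ratio` and `ranked` are illustrative and not taken from Fica. Because the printed percentages are already rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
# (Len, Pi %, Pu %, Pi/Pu as printed), copied from the table on this slide.
rows = [
    (50, 5.56, 4.72, 1.18),
    (87, 2.89, 2.59, 1.11),
    (79, 1.97, 2.87, 0.69),
    (63, 3.55, 5.57, 0.64),
    (77, 2.66, 5.83, 0.46),
]


def ratio(pi, pu):
    """Pi / Pu: a value above 1 means Pi exceeds Pu."""
    return pi / pu


# Assumption: clones are listed by descending Pi/Pu.
ranked = sorted(rows, key=lambda r: ratio(r[1], r[2]), reverse=True)

for length, pi, pu, printed in ranked:
    print(f"Len {length}: Pi/Pu = {ratio(pi, pu):.2f} (table: {printed})")
```

Sorting by the ratio rather than by Pi alone keeps a clone near the top only when it resembles the interesting marks more than the un-interesting ones.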


Good Experiment Result
All training 44               Matched 32      un-interesting 1
All evaluation 34             Accuracy 94.12% interesting 1




Bad Experiment Result
All training 47               Matched 14      un-interesting 16
All evaluation 31             Accuracy 45.16% interesting 1
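
The accuracy figures on the two result slides follow from the matched and evaluation counts. The following is a small Python sketch of that arithmetic; the function name `accuracy` is illustrative, and reading the two right-hand counts as unmatched clones split by the user's mark is an assumption, although they do add up to the evaluation size minus the matched count.

```python
def accuracy(matched, evaluated):
    """Percentage of evaluation clones counted as matched."""
    return round(100 * matched / evaluated, 2)


# Counts copied from the "Good" and "Bad" experiment result slides.
good = accuracy(matched=32, evaluated=34)
bad = accuracy(matched=14, evaluated=31)

# Assumption: the remaining two counts are the unmatched clones.
good_unmatched = 1 + 1    # un-interesting + interesting
bad_unmatched = 16 + 1

print(f"good: {good}%  unmatched: {good_unmatched}")
print(f"bad:  {bad}%  unmatched: {bad_unmatched}")
```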




Open Question



 How to improve accuracy?
     By combining metrics like McCabe Cyclomatic
     Complexity?
 Thank you!




Unmatched: User un-interesting




Unmatched: User interesting




Overall Workflow
   1. Submits source code
   2. Detects clones
   3. Marks clones as "interesting" or not
   4. Records marked clones into database
   5. Studies characteristics of marks using machine learning algorithms
   6. Ranks unmarked clones based on machine learning

                                  Figure: Overall Workflow of Fica with CDT
Calc Similarity of Clones


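\begin{align}
  \mathrm{tf}(t, d) &= \frac{|t : t \in d|}{|d|} \tag{1} \\
  \mathrm{idf}(t, D) &= \log \frac{|D|}{1 + |d \in D : t \in d|} \tag{2} \\
  \mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d) \times \mathrm{idf}(t, D) \tag{3} \\
  \overrightarrow{\mathrm{tfidf}}(d, D) &= [\mathrm{tfidf}(t, d, D) \;\; \forall t \in d] \tag{4}
\end{align}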
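
Equations (1) to (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Fica's implementation: each clone is treated as a list of tokens, the logarithm is taken as natural (the slide does not give a base), and the token lists in `corpus` are made up for the example.

```python
import math


def tf(term, doc):
    """Eq. 1: occurrences of `term` in `doc` divided by the length of `doc`."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Eq. 2: log(|D| / (1 + number of documents containing `term`))."""
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / (1 + containing))


def tfidf(term, doc, corpus):
    """Eq. 3: product of tf and idf."""
    return tf(term, doc) * idf(term, corpus)


def tfidf_vector(doc, corpus):
    """Eq. 4: one weight per distinct term of `doc`, kept as a sparse dict."""
    return {t: tfidf(t, doc, corpus) for t in set(doc)}


# Hypothetical token lists standing in for three clones.
corpus = [
    ["len", "=", "end", "-", "start"],
    ["slen", "=", "0"],
    ["return", "result"],
]
vector = tfidf_vector(corpus[0], corpus)
for term, weight in sorted(vector.items()):
    print(f"{term!r}: {weight:.4f}")
```

With the `1 +` in the denominator of Eq. 2, a term that occurs in two of the three token lists (here `=`) gets an idf of log(3/3) = 0 and so contributes nothing to the vector.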


Predicting Category


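\begin{align}
  \mathrm{sim}(a, b, D) &= \overrightarrow{\mathrm{tfidf}}(a, D) \cdot \overrightarrow{\mathrm{tfidf}}(b, D) \tag{5} \\
  \mathrm{nsim}(a, b, D) &=
    \begin{cases}
      0 & , \mathrm{sim}(a, b, D) = 0 \\
      \frac{\mathrm{sim}(a, b, D)}{|\mathrm{sim}(a, b, D)|} & , \text{otherwise}
    \end{cases} \tag{6} \\
  \mathrm{poss}(t, M) &=
    \begin{cases}
      1 & , |M| = 0 \\
      \frac{\sum_{\forall m \in M} \mathrm{nsim}(t, m, M)}{|M|} & , \text{otherwise}
    \end{cases} \tag{7}
\end{align}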
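
Equations (5) to (7) can be sketched as follows in Python. The sketch follows the formulas as they are written on the slide and is not Fica's implementation; in particular, Eq. 6 is read literally, so `nsim` is either 0 or sim divided by its own absolute value. If a different normalization was intended, only the body of `nsim` would change. Clones are assumed to be token lists, and the tokens in `marked` and `candidate` are made up for the example.

```python
import math


def tfidf_vector(doc, corpus):
    """Eq. 1-4: sparse tf-idf vector of a token list, as {term: weight}."""
    vec = {}
    for t in set(doc):
        tf = doc.count(t) / len(doc)
        df = sum(1 for d in corpus if t in d)
        vec[t] = tf * math.log(len(corpus) / (1 + df))
    return vec


def sim(a, b, corpus):
    """Eq. 5: dot product of the two tf-idf vectors (shared terms only)."""
    va, vb = tfidf_vector(a, corpus), tfidf_vector(b, corpus)
    return sum(w * vb[t] for t, w in va.items() if t in vb)


def nsim(a, b, corpus):
    """Eq. 6, read literally: 0 when sim is 0, otherwise sim / |sim|."""
    s = sim(a, b, corpus)
    return 0.0 if s == 0 else s / abs(s)


def poss(t, marked):
    """Eq. 7: 1 when nothing is marked, otherwise the mean nsim against M."""
    if not marked:
        return 1.0
    return sum(nsim(t, m, marked) for m in marked) / len(marked)


# Hypothetical clones already marked by the user, and one unmarked candidate.
marked = [
    ["strncpy", "result"],
    ["mbstowcs", "wcstr"],
    ["__P", "void"],
]
candidate = ["strncpy", "string"]

print(poss(candidate, marked))   # share of marked clones similar to candidate
print(poss(candidate, []))       # no marks yet
```

Calling `poss` once with the clones marked interesting and once with those marked un-interesting would give the two values labelled Pi and Pu on the "Compare Code Clone Similarity" slide, under the assumption that both are computed this way.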


Result — bash
   Figure: Accuracy (%) against Percentage of Training Set (%), 10 to 100, with series A to H
Result — git
   Figure: Accuracy (%) against Percentage of Training Set (%), 10 to 100, with series A to H
Result — xz
   Figure: Accuracy (%) against Percentage of Training Set (%), 10 to 100, with series A to H
Result — e2fsprogs
   Figure: Accuracy (%) against Percentage of Training Set (%), 10 to 100, with series A to H
Result — All Projects
   Figure: Accuracy (%) against Percentage of Training Set (%), 10 to 100, with series A to H

Output fica.beamer.43

  • 1. Filtering Clones for Individual User Based on Machine Learning Analysis. Jiachen Yang, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto. Graduate School of Information Science and Technology, Osaka University. June 4, 2012. Jiachen Yang (IST, Osaka-U), Fica@IWSC2012, June 4, 2012
  • 2. Motivating Example. [Figure: Participants of survey; Clonesets. Red: Un-interesting, Blue: Interesting.]
  • 3. Motivating Example. [Figure: Participants of survey (1 2 3 4 5 6 7 8); Clonesets. Red: Un-interesting, Blue: Interesting.]
  • 5. Interesting (U:0 vs I:8). Figure: Example of source code in bash-4.2

      (a) lib/readline/histexpand.c, lines 1542-1554

          static char *
          history_substring (string, start, end)
               const char *string;
               int start, end;
          {
            register int len;
            register char *result;
            len = end - start;
            result = (char *)xmalloc (len + 1);
            strncpy (result, string + start, len);
            result[len] = '\0';
            return result;
          }

      (b) stringlib.c, lines 126-138

          char *
          substring (string, start, end)
               const char *string;
               int start, end;
          {
            register int len;
            register char *result;
            len = end - start;
            result = (char *)xmalloc (len + 1);
            strncpy (result, string + start, len);
            result[len] = '\0';
            return (result);
          }
  • 6. Un-Interesting (U:8 vs I:0). Figure: Example of source code in bash-4.2

      (a) expr.c, lines 191-197

          ... __P((char *, arrayind_t, char *));
          static intmax_t subexpr __P((char *));
          static intmax_t expcomma __P((void));
          static intmax_t expassign __P((void));
          static intmax_t expcond __P((void));
          static intmax_t explor __P((void));
          static intmax_t expland __P((void));

      (b) shell.c, lines 309-315

          static int run_one_command __P((char *));
          static int run_wordexp __P((char *));
          static int uidget __P((void));
          static void init_interactive __P((void));
          static void init_noninteractive __P((void));
          static void init_interactive_script __P((void));
          static void set_shell_name __P((char *));
  • 7. Disagreed (U:4 vs I:4). Figure: Example of source code in bash-4.2

      (a) execute_cmd.c, lines 710-725

          static int
          displen (s)
               const char *s;
          {
            wchar_t *wcstr;
            size_t wclen, slen;
            wcstr = 0;
            slen = mbstowcs (wcstr, s, 0);
            if (slen == -1)
              slen = 0;
            wcstr = (wchar_t *)xmalloc (sizeof ....
            mbstowcs (wcstr, s, slen + 1);
            wclen = wcswidth (wcstr, slen);
            free (wcstr);
            return ((int)wclen);
          }

      (b) subst.c, lines 1098-1111

          else
            {
              if (wcharlist == 0)
                {
                  size_t len;
                  len = mbstowcs (wcharlist, charlist, 0);
                  if (len == -1)
                    len = 0;
                  wcharlist = (wchar_t *)xmalloc (sizeof ....
                  mbstowcs (wcharlist, charlist, len + 1);
                }
              if (wcschr (wcharlist, wc))
                break;
            }
  • 8. Fica, the name: Filter for Individual user on code Clone Analysis
  • 9. Fica, the website. Figure: Snapshot of Fica
  • 14. Compare Code Clone Similarity
      Pi = possibility to be interesting
      Pu = possibility to be un-interesting

      Len   Pi      Pi/Pu   Pu      Comp
      50    5.56%   1.18    4.72%   O
      87    2.89%   1.11    2.59%   O
      79    1.97%   0.69    2.87%   X
      63    3.55%   0.64    5.57%   O
      77    2.66%   0.46    5.83%   X
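The Pi/Pu column on the slide above can be checked against the two possibility columns. The following is a minimal sketch, not the authors' code: the row values are copied from the slide, while the names `ROWS`, `ratio` and `predict`, and the decision rule "predict interesting when Pi exceeds Pu", are assumptions made for illustration. Because the slide's percentages are already rounded, the recomputed ratios agree with the printed ones only to within about 0.01.

```python
# Rows copied from the slide: (Len, Pi in %, Pi/Pu as printed, Pu in %).
ROWS = [
    (50, 5.56, 1.18, 4.72),
    (87, 2.89, 1.11, 2.59),
    (79, 1.97, 0.69, 2.87),
    (63, 3.55, 0.64, 5.57),
    (77, 2.66, 0.46, 5.83),
]


def ratio(pi: float, pu: float) -> float:
    """Recompute Pi / Pu from the two percentage columns."""
    return pi / pu


def predict(pi: float, pu: float) -> str:
    """ASSUMED rule (not stated on the slide): interesting if Pi > Pu."""
    return "interesting" if pi > pu else "un-interesting"


if __name__ == "__main__":
    for length, pi, shown, pu in ROWS:
        print(f"Len={length:3d}  Pi/Pu={ratio(pi, pu):.3f} "
              f"(slide: {shown:.2f})  predicted={predict(pi, pu)}")
```

Under this assumed rule the first two rows would be predicted interesting and the last three un-interesting; how that relates to the slide's Comp column is not stated in the deck.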
  • 15. Good Experiment Result

      All training     44    Matched    32        un-interesting   1
      All evaluation   34    Accuracy   94.12%    interesting      1

  • 16. Bad Experiment Result

      All training     47    Matched    14        un-interesting   16
      All evaluation   31    Accuracy   45.16%    interesting      1
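The accuracy figures on the two result slides are consistent with accuracy = Matched / All evaluation. The short sketch below only reproduces that arithmetic. The function name `accuracy` is hypothetical, and reading the two remaining counts as the unmatched clones is an inference from 32 + 1 + 1 = 34 and 14 + 16 + 1 = 31, not something the slides state.

```python
def accuracy(matched: int, evaluated: int) -> float:
    """Percentage of evaluation clones whose prediction matched the user's mark."""
    if evaluated == 0:
        raise ValueError("evaluation set is empty")
    return 100.0 * matched / evaluated


GOOD = accuracy(32, 34)  # counts from the "Good Experiment Result" slide
BAD = accuracy(14, 31)   # counts from the "Bad Experiment Result" slide

if __name__ == "__main__":
    print(f"good: {GOOD:.2f}%  bad: {BAD:.2f}%")
```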
  • 17. Open Question: How to improve accuracy? By combining metrics like McCabe Cyclomatic Complexity? Thank you!
  • 18. Unmatched: User un-interesting
  • 19. Unmatched: User interesting
  • 20. Overall Workflow (Figure: Overall Workflow of Fica with CDT)
      1. Submits source code
      2. Detects clones
      3. Marks clones as “interesting” or not
      4. Records marked clones into database
      5. Studies characteristics of marks using machine learning algorithms
      6. Ranks unmarked clones based on machine learning
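Steps 3, 4 and 6 of the workflow above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not Fica's implementation: `MarkStore`, `rank_unmarked` and the `score` callback are hypothetical names, the "database" is an in-memory dictionary, and clone detection (steps 1 and 2) is assumed to have already produced a list of clone identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MarkStore:
    """In-memory stand-in for the database of step 4 (hypothetical)."""
    marks: Dict[str, bool] = field(default_factory=dict)  # clone id -> interesting?

    def record(self, clone_id: str, interesting: bool) -> None:
        """Steps 3 and 4: store the user's mark for one clone."""
        self.marks[clone_id] = interesting


def rank_unmarked(
    clones: List[str],
    store: MarkStore,
    score: Callable[[str, Dict[str, bool]], float],
) -> List[str]:
    """Step 6: order clones the user has not marked, highest score first.

    `score` stands in for whatever the learning step (step 5) produces;
    its exact form is an assumption of this sketch.
    """
    unmarked = [c for c in clones if c not in store.marks]
    return sorted(unmarked, key=lambda c: score(c, store.marks), reverse=True)


if __name__ == "__main__":
    store = MarkStore()
    store.record("c1", True)
    store.record("c2", False)
    toy_scores = {"c3": 0.2, "c4": 0.9}  # made-up scores for illustration
    print(rank_unmarked(["c1", "c2", "c3", "c4"], store, lambda c, _m: toy_scores[c]))
```

Passing the scoring function in as a parameter keeps the ranking step independent of the particular learning method used in step 5.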
  • 21. Calc Similarity of Clones
      $\mathrm{tf}(t, d) = \dfrac{|t : t \in d|}{|d|}$  (1)
      $\mathrm{idf}(t, D) = \log \dfrac{|D|}{1 + |d \in D : t \in d|}$  (2)
      $\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$  (3)
      $\overrightarrow{\mathrm{tfidf}}(d, D) = [\mathrm{tfidf}(t, d, D) \;\; \forall t \in d]$  (4)
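Equations (1) to (4) on the slide above translate fairly directly into code. The sketch below is one possible reading, not the authors' implementation. It assumes a clone is represented as a non-empty list of tokens, that |t : t ∈ d| means the number of occurrences of t in d, and that log is the natural logarithm. The names `tf`, `idf` and `tfidf_vector` simply mirror the slide's symbols.

```python
import math
from collections import Counter
from typing import Dict, List

Document = List[str]  # ASSUMPTION: a clone is a non-empty list of tokens


def tf(term: str, doc: Document) -> float:
    """Eq. (1): occurrences of `term` in `doc` divided by the length of `doc`."""
    return doc.count(term) / len(doc)


def idf(term: str, corpus: List[Document]) -> float:
    """Eq. (2): log(|D| / (1 + number of documents containing `term`))."""
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / (1 + containing))


def tfidf_vector(doc: Document, corpus: List[Document]) -> Dict[str, float]:
    """Eqs. (3) and (4): sparse vector of tf * idf over the terms of `doc`."""
    counts = Counter(doc)
    return {t: (n / len(doc)) * idf(t, corpus) for t, n in counts.items()}


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["b", "c"], ["c", "d"], ["d", "e"]]
    print(tfidf_vector(corpus[0], corpus))
```

Note that the `1 +` in the denominator of equation (2) makes idf negative for a term that occurs in every document, since |D| / (1 + |D|) is less than 1.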
  • 22. Predicting Category
      $\mathrm{sim}(a, b, D) = \overrightarrow{\mathrm{tfidf}}(a, D) \cdot \overrightarrow{\mathrm{tfidf}}(b, D)$  (5)
      $\mathrm{nsim}(a, b, D) = \begin{cases} 0, & \mathrm{sim}(a, b, D) = 0 \\ \dfrac{\mathrm{sim}(a, b, D)}{|\mathrm{sim}(a, b, D)|}, & \text{otherwise} \end{cases}$  (6)
      $\mathrm{poss}(t, M) = \begin{cases} 1, & |M| = 0 \\ \dfrac{\sum_{\forall m \in M} \mathrm{nsim}(t, m, M)}{|M|}, & \text{otherwise} \end{cases}$  (7)
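Equations (5) to (7) on the slide above can be sketched over sparse TF-IDF vectors as follows. Several points are assumptions rather than facts from the deck. Vectors are taken as precomputed dictionaries, so the corpus argument D is dropped. Equation (6) as transcribed divides sim by |sim(a, b, D)|, which would make every non-zero value ±1, so this sketch swaps in cosine normalization (division by the product of the two vector lengths) and that substitution is not confirmed by the slide. Reading Pi and Pu as `poss` evaluated against the clones marked interesting and un-interesting respectively is likewise a guess.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse TF-IDF vector: term -> weight


def sim(a: Vector, b: Vector) -> float:
    """Eq. (5): dot product of two sparse vectors."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def norm(v: Vector) -> float:
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in v.values()))


def nsim(a: Vector, b: Vector) -> float:
    """Normalized similarity; 0 when the dot product is 0 (first case of eq. 6).

    ASSUMPTION: the non-zero case uses cosine normalization, which differs
    from the denominator printed on the slide.
    """
    s = sim(a, b)
    if s == 0:
        return 0.0
    return s / (norm(a) * norm(b))


def poss(target: Vector, marked: List[Vector]) -> float:
    """Eq. (7): mean nsim between `target` and every marked clone; 1 if none."""
    if not marked:
        return 1.0
    return sum(nsim(target, m) for m in marked) / len(marked)


if __name__ == "__main__":
    interesting = [{"x": 1.0, "y": 1.0}]   # made-up vectors for illustration
    uninteresting = [{"z": 2.0}]
    target = {"x": 1.0}
    print("Pi =", poss(target, interesting), "Pu =", poss(target, uninteresting))
```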
  • 23. Result: bash. [Figure: Accuracy (%) versus Percentage of Training Set (%), series A to H.]
  • 24. Result: git. [Figure: Accuracy (%) versus Percentage of Training Set (%), series A to H.]
  • 25. Result: xz. [Figure: Accuracy (%) versus Percentage of Training Set (%), series A to H.]
  • 26. Result: e2fsprogs. [Figure: Accuracy (%) versus Percentage of Training Set (%), series A to H.]
  • 27. Result: All Projects. [Figure: Accuracy (%) versus Percentage of Training Set (%), series A to H.]