Similarity of Source Code in the Presence of Pervasive Modifications [SCAM'16]


Slides accompanying my talk at the 16th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), 2016, North Carolina, USA.


  1. Similarity of Source Code in the Presence of Pervasive Modifications. Chaiyong Ragkhitwetsagul, Jens Krinke, David Clark. Centre for Research on Evolution, Search and Testing (CREST), Dept. of Computer Science, UCL, London, UK
  2. Pervasive Modifications

     /* ORIGINAL */
     private static int partition(Comparable[] a, int lo, int hi) {
         int i = lo;
         int j = hi+1;
         Comparable v = a[lo];
         while (true) {
             while (less(a[++i], v)) {
                 if (i == hi) break;
             }
             while (less(v, a[--j])) {
                 if (j == lo) break;
             }
             if (i >= j) break;
             exch(a, i, j);
         }
         exch(a, lo, j);
         return j;
     }

     /* PERVASIVELY MODIFIED CODE */
     private static int partition(int[] bob, int left, int right) {
         int x = left;
         int y = right+1;
         for (;;) {
             while (less(bob[left], bob[--y]))
                 if (y == left) break;
             while (less(bob[++x], bob[left]))
                 if (x == right) break;
             if (x >= y) break;
             swap(bob, y, x);
         }
         swap(bob, y, left);
         return y;
     }

     From: https://www.princeton.edu/pr/pub/integrity/pages/plagiarism/
  3. Pervasive Modifications: changes affecting many locations in the whole method, file, or project. Examples: layout changes, identifier renaming, API changes, refactoring. Contexts: code cloning, software plagiarism, software evolution. Does not include (strong) code obfuscation.
  4. When source code is pervasively modified, which similarity detection techniques or tools get the most accurate results?
  5. 30 Similarity Analysers
     Clone detectors: CCFinderX, iClones, Simian, NiCad, Deckard
     Plagiarism detectors: JPlag, Plaggie, Sherlock, Sim
     Compression: 7zncd, bzip2ncd, gzipncd, xz-ncd, icd, ncd
     Others: diff, bsdiff, difflib, fuzzywuzzy, jellyfish, ngram, sklearn
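The compression-based analysers in this list build on the normalised compression distance. The following is a minimal sketch of that idea, assuming the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. It uses `java.util.zip.Deflater` as a stand-in compressor rather than the bzip2, gzip, xz, or 7z compressors named on the slide, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper for illustration; not one of the tools evaluated in the slides.
final class NcdSketch {

    // Number of bytes DEFLATE produces for the input (zlib wrapper included).
    static int compressedLength(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // output is discarded, only its size matters
        }
        deflater.end();
        return total;
    }

    // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); values near 0 mean "similar".
    static double ncd(String x, String y) {
        byte[] bx = x.getBytes(StandardCharsets.UTF_8);
        byte[] by = y.getBytes(StandardCharsets.UTF_8);
        byte[] bxy = new byte[bx.length + by.length];
        System.arraycopy(bx, 0, bxy, 0, bx.length);
        System.arraycopy(by, 0, bxy, bx.length, by.length);
        int cx = compressedLength(bx);
        int cy = compressedLength(by);
        int cxy = compressedLength(bxy);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    public static void main(String[] args) {
        String original = "int i = lo; int j = hi+1; while (true) { if (i >= j) break; exch(a, i, j); }";
        String modified = "int x = left; int y = right+1; for (;;) { if (x >= y) break; swap(bob, y, x); }";
        System.out.println("NCD(original, original) = " + ncd(original, original));
        System.out.println("NCD(original, modified) = " + ncd(original, modified));
    }
}
```

A distance like this can be turned into a 0 to 100 similarity score (for example `100 * (1 - ncd)`), though that conversion is an assumption here and not something the slides specify.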
  6. Test Data Generation: original source (InfixConverter.java, SqrtAlgorithm.java, Hanoi.java, Queens.java, MagicSquare.java) → source obfuscator (ARTIFICE) → compiler (javac) → bytecode obfuscator (ProGuard) → decompilers (Krakatau, Procyon) → pervasively modified code, to be used in the detection phase.
  7. Parameter Settings
  8. Similarity Report (pairwise similarity scores; column [k] is the file in row [k], "…" marks rows and columns omitted on the slide)

           File                           [1]   [2]   [3]   [4]   [5]   [6]   [7]   [8]   [9]  [10]  [11]  [12]     … [n-1]   [n]
       [1] InfConv/orig                   100    55    36    63    32    43    34    60    31    43    20    20     …    14    17
       [2] InfConv/artifice                55   100    35    54    33    39    37    56    32    39    19    30     …    14    17
       [3] InfConv/orig_no_krakatau        36    35   100    38    60    26    80    35    59    26    13    14     …    28    17
       [4] InfConv/orig_no_procyon         63    54    38   100    34    58    37    80    34    58    21    20     …    15    21
       [5] InfConv/orig_pg_krakatau        32    33    60    34   100    33    61    33    82    33    17    17     …    29    20
       [6] InfConv/orig_pg_procyon         43    39    26    58    33   100    26    59    33   100    19    20     …    14    21
       [7] InfConv/artifice_no_krakatau    34    37    80    37    61    26   100    36    59    26    14    14     …    28    17
       [8] InfConv/artifice_no_procyon     60    56    35    80    33    59    36   100    32    59    19    20     …    15    19
       [9] InfConv/artifice_pg_krakatau    31    32    59    34    82    33    59    32   100    33    15    16     …    28    17
      [10] InfConv/artifice_pg_procyon     43    39    26    58    33   100    26    59    33   100    19    20     …    14    21
      [11] Sqrt/orig                       20    19    13    21    17    19    14    19    15    19   100    32     …    14    16
      [12] Sqrt/artifice                   20    30    14    20    17    20    14    20    16    20    32   100     …    15    18
         … …                                …     …     …     …     …     …     …     …     …     …     …     …     …     …     …
     [n-1] Square/artifice_pg_krakatau     14    14    28    15    29    14    28    15    28    14    14    15     …   100    32
       [n] Square/artifice_pg_procyon      17    17    17    21    20    21    17    19    17    21    16    18     …    32   100
  9. Similarity Threshold = 50 (the similarity matrix from slide 8, repeated)
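Applying a threshold to the similarity matrix can be sketched as follows. This is only an illustration under stated assumptions: the 4x4 matrix is an excerpt of the slide's values (InfConv and Sqrt, original and ARTIFICE versions), a pair is reported when its score is greater than or equal to the threshold (the slide does not say whether the comparison is strict), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of thresholding a similarity matrix.
final class ThresholdReport {

    // Report each unordered pair (i < j) whose score reaches the threshold.
    // Only the upper triangle is read, which is enough for a symmetric matrix.
    static List<String> pairsAtOrAbove(String[] names, int[][] sim, int threshold) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            for (int j = i + 1; j < names.length; j++) {
                if (sim[i][j] >= threshold) { // assumption: ">=" rather than ">"
                    hits.add(names[i] + " ~ " + names[j]);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] names = {"InfConv/orig", "InfConv/artifice", "Sqrt/orig", "Sqrt/artifice"};
        int[][] sim = {
            {100,  55,  20,  20},
            { 55, 100,  19,  30},
            { 20,  19, 100,  32},
            { 20,  30,  32, 100},
        };
        // prints [InfConv/orig ~ InfConv/artifice]
        System.out.println(pairsAtOrAbove(names, sim, 50));
    }
}
```

In this excerpt a threshold of 50 reports the InfConv pair but misses the Sqrt pair (score 32), which is why the choice of threshold matters.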
  10. Best Threshold. [Plot: F-measure (axis from 0.00 to 0.90) against threshold value T (0 to 100 in steps of 5); the best point is marked at T = 31, F-measure = 0.8282.]
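The search for a best threshold can be sketched as a sweep over T = 0..100 that keeps the value with the highest F-measure. This is a sketch under assumptions, not the authors' implementation: the ground truth treats two files as similar when they derive from the same original program, a pair is predicted similar when its score is at least T, and ties go to the smallest T. The matrix is a four-file excerpt of the slide's values, and all names are hypothetical.

```java
// Hypothetical illustration of choosing a threshold by F-measure.
final class ThresholdSweep {

    // F1 = 2TP / (2TP + FP + FN), counting each unordered pair (i < j) once.
    static double f1At(int[][] sim, boolean[][] truth, int threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < sim.length; i++) {
            for (int j = i + 1; j < sim.length; j++) {
                boolean predicted = sim[i][j] >= threshold;
                if (predicted && truth[i][j]) tp++;
                else if (predicted) fp++;
                else if (truth[i][j]) fn++;
            }
        }
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }

    // Smallest threshold in 0..100 that reaches the highest F1.
    static int bestThreshold(int[][] sim, boolean[][] truth) {
        int best = 0;
        double bestF1 = -1.0;
        for (int t = 0; t <= 100; t++) {
            double f1 = f1At(sim, truth, t);
            if (f1 > bestF1) {
                bestF1 = f1;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Order: InfConv/orig, InfConv/artifice, Sqrt/orig, Sqrt/artifice
        int[][] sim = {
            {100,  55,  20,  20},
            { 55, 100,  19,  30},
            { 20,  19, 100,  32},
            { 20,  30,  32, 100},
        };
        boolean[][] truth = new boolean[4][4];
        truth[0][1] = true; // both InfConv files (assumed ground truth)
        truth[2][3] = true; // both Sqrt files (assumed ground truth)
        int t = bestThreshold(sim, truth);
        System.out.println("best T = " + t + ", F1 = " + f1At(sim, truth, t));
    }
}
```

The numbers this toy excerpt produces are illustrative only; the value on the slide comes from the full matrix.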
  11. Optimal Configuration: best parameter settings and best threshold
  12. Results

      Tool          Settings     T    Acc     Prec    Rec     AUC     Prec@n  F1
      ccfx          b=20,t=1     4    0.9640  0.9145  0.9040  0.9468  0.9040  0.9095
      simjava       r=22         5    0.9568  0.8769  0.9120  0.9490  0.8840  0.8941
      jplag-text    t=8          2    0.9408  0.8235  0.8960  0.9453  0.8440  0.8582
      py-difflib    noautojunk   35   0.9392  0.8901  0.7940  0.9147  0.8080  0.8393
      7zncd-BZip2   mx=1         39   0.9368  0.8977  0.7720  0.9419  0.8180  0.8301
      ncd-bzlib     -            31   0.9336  0.8584  0.8000  0.9482  0.8200  0.8282
      jplag-java    t=3          43   0.9160  0.7526  0.8640  0.9667  0.7860  0.8045
      py-sklearn    -            33   0.8488  0.5894  0.8040  0.9146  0.6200  0.6802
  13. [Chart: F1 (axis from 0.5 to 1) for each of the 30 tools, grouped by category.]
      Clone detectors: ccfx, deckard, iclones, nicad, simian
      Plagiarism detectors: jplag-java, jplag-text, plaggie, sherlock, simjava, simtext
      Compression: 7zncd-BZip2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-Deflate, 7zncd-Deflate64, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd
      Others: bsdiff, diff, py-difflib, py-fuzzywuzzy, py-jellyfish, py-ngram, py-sklearn
  14. Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures.
  15. Normalisation by Decompilation: pervasively modified code → compile (javac) → decompile (Krakatau, Procyon) → normalised code.
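The compile-then-decompile normalisation can be sketched as below. This is a rough sketch with loudly stated assumptions: the compile step uses the JDK's built-in compiler through `javax.tools`, while the decompile step is left as a hypothetical `Decompiler` hook because the command lines of Krakatau and Procyon are not given in the slides. The class, interface, and method names are all invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical illustration of normalisation by compilation and decompilation.
final class NormaliseSketch {

    // Hypothetical hook for the decompile step; a real one would invoke Krakatau or Procyon.
    interface Decompiler {
        String decompile(Path classFile);
    }

    // Compile one top-level class (default package) and return the path of its .class file.
    static Path compile(String className, String source, Path workDir) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        try {
            Path sourceFile = workDir.resolve(className + ".java");
            Files.write(sourceFile, source.getBytes(StandardCharsets.UTF_8));
            int status = javac.run(null, null, null, "-d", workDir.toString(), sourceFile.toString());
            if (status != 0) {
                throw new IllegalArgumentException("compilation failed: " + className);
            }
            return workDir.resolve(className + ".class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Normalised code = decompile(compile(source)).
    static String normalise(String className, String source, Decompiler decompiler) {
        try {
            Path workDir = Files.createTempDirectory("normalise");
            return decompiler.decompile(compile(className, source, workDir));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stub decompiler: only reports which class file it would decompile.
        Decompiler stub = classFile -> "would decompile " + classFile.getFileName();
        System.out.println(normalise("Demo", "public class Demo { int f() { return 1; } }", stub));
    }
}
```

Keeping the decompiler behind an interface lets the same pipeline run with either decompiler, and the normalised output can then be passed to any similarity analyser.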
  16. Code Before Decompilation
  17. Code After Decompilation
  18. [Chart: F1 (axis from 0.5 to 1) for the same 30 tools as on slide 13, grouped into clone detectors, plagiarism detectors, compression, and others, comparing original code (Orig.) with decompiled code (Dec.).]
  19. Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code.
  20. Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code. Highly specialised source code similarity detection techniques and tools can perform better than more general, textual similarity measures. More info: http://crest.cs.ucl.ac.uk/resources/cloplag/
