TRIS: a Fast and Accurate Identifiers Splitting and Expansion Algorithm
WCRE 2012, Kingston, Canada

Latifa Guerrouj, Philippe Galinier, Yann-Gaël Guéhéneuc,
Giuliano Antoniol, and Massimiliano Di Penta
Content
Research Context
Current Practices
TRIS in Essence
Case Study
Threats to Validity
Conclusion
WCRE’12, Kingston
2/25
Research Context
Researchers agree that identifier semantics are important. Identifiers:
Help program comprehension: [CT99, DP05, LFB06]…;
Improve software quality: [MPF08, LMFB06, JH06]…;
Suggest clues: [TGM96, LMFB07]…

Composed identifiers:
Camel Case or explicit separators: MyLocalAccount, User_Address
Contraction based: pntr ctr, usrAdrss, imagEdge
Good and possibly known to the developers: hmmm, ixoth, pqrstuvwxyz
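The composition styles listed above can be illustrated with a minimal separator-based splitter. This is only a sketch: the function name `split_identifier` is hypothetical, and the rule used (cut at underscores and at lowercase-to-uppercase transitions) is one common reading of "Camel Case splitting", not a definition taken from the slides.

```python
import re

def split_identifier(identifier):
    """Split on underscores and on lowercase/digit-to-uppercase transitions."""
    terms = []
    for chunk in identifier.split("_"):
        # Insert a space at every position preceded by a lowercase letter
        # or digit and followed by an uppercase letter, then split on it.
        terms.extend(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk).split())
    return terms

print(split_identifier("MyLocalAccount"))  # ['My', 'Local', 'Account']
print(split_identifier("User_Address"))    # ['User', 'Address']
print(split_identifier("usrAdrss"))        # ['usr', 'Adrss']
print(split_identifier("pqrstuvwxyz"))     # ['pqrstuvwxyz']
```

Such a splitter recovers the terms of MyLocalAccount, but it leaves contractions such as usr and Adrss unexpanded and cannot cut an identifier that has no case change or separator.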
Current Practices
Camel Case-based approaches [LFB06]
Samurai by Enslen et al. [ELK09]:
Mines term frequencies in large source code bases and relies on two
assumptions:
1) A substring composing an identifier is also likely to be used in
other parts of the program or in other programs.
2) Given two possible splits of a given identifier, the split that
most likely represents the developer’s intent partitions the
identifier into terms occurring more often in the program.

Limitations: abbreviations are not treated, and there is no quantification of how
close the match is to the unknown string.
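The second assumption (prefer the split whose terms occur more often) can be sketched as follows. The frequency table and the scoring rule (mean term frequency) are assumptions made for illustration; they are not Samurai's actual scoring function.

```python
from collections import Counter

# Hypothetical term frequencies, as if mined from a large code base.
TERM_FREQ = Counter({"string": 120, "buffer": 80, "str": 15, "ing": 1})

def mean_frequency(split, term_freq=TERM_FREQ):
    """Average frequency of the terms in a candidate split (illustrative score)."""
    return sum(term_freq[term.lower()] for term in split) / len(split)

def pick_split(candidates, term_freq=TERM_FREQ):
    """Return the candidate split whose terms occur most often on average."""
    return max(candidates, key=lambda split: mean_frequency(split, term_freq))

candidates = [["string", "buffer"], ["str", "ing", "buffer"]]
print(pick_split(candidates))  # ['string', 'buffer']
```

Here the first candidate scores 100 and the second 32, so the split into frequent terms wins. Note that nothing in this score expands `str` into a full word, which matches the limitation stated above.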
Current Practices
TIDIER [GMA11]:
Inspired by speech recognition techniques and uses Dynamic
Time Warping to match terms to domain concepts.
Exploits context in the form of specialized dictionaries and
mimics the process of transforming words via contraction
rules.
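For readers unfamiliar with Dynamic Time Warping, the sketch below computes a textbook DTW distance between two character sequences with a 0/1 mismatch cost. It is a generic illustration only; TIDIER's own matching, dictionaries, and contraction rules are not reproduced here, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance with a 0/1 character mismatch cost."""
    inf = float("inf")
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Extend the cheapest of the three admissible predecessor cells.
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

print(dtw_distance("pntr", "pointer"))  # 3.0
print(dtw_distance("pntr", "counter"))  # 4.0
```

The contraction `pntr` is closer to `pointer` than to `counter` under this distance. The table has one cell per pair of characters, so each comparison costs time proportional to the product of the two lengths.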

GenTest/Normalize [LBM10, LB11]:
GenTest generates all possible splittings and evaluates a
scoring function against each proposed splitting.
Normalize uses a machine translation technique, the
maximum coherence model.

TIDIER, GenTest and Normalize are demanding
in terms of computation time.
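To see why generating all possible splittings is demanding, the following sketch enumerates them exhaustively. It only illustrates the size of the search space; it does not implement GenTest's scoring function, and the helper name `all_splittings` is hypothetical.

```python
from itertools import combinations

def all_splittings(identifier):
    """Yield every way to cut the identifier into contiguous, non-empty terms."""
    n = len(identifier)
    for k in range(n):  # k = number of cut points, from 0 to n - 1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [identifier[i:j] for i, j in zip(bounds, bounds[1:])]

print(list(all_splittings("abc")))
# [['abc'], ['a', 'bc'], ['ab', 'c'], ['a', 'b', 'c']]
print(sum(1 for _ in all_splittings("callableint")))  # 1024
```

An identifier of n characters has 2^(n-1) splittings, so an 11-character identifier already yields 1024 candidates to score.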
TRIS in Essence
TRIS assumes that developers compose identifiers:
Using terms and words that reflect domain concepts and the developers'
experience and knowledge.

TRIS uses a set of transformation rules:
Drop vowels, drop prefix, drop suffix, etc.
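A minimal sketch of such rules is given below. The slide only names the rules; their exact definitions here (keep the first letter when dropping vowels, drop a fixed number of leading or trailing characters) and all function names are assumptions made for illustration.

```python
VOWELS = set("aeiou")

def drop_vowels(word):
    """Remove every vowel except a leading one (assumes a non-empty word)."""
    return word[0] + "".join(c for c in word[1:] if c not in VOWELS)

def drop_prefix(word, n):
    """Remove the first n characters."""
    return word[n:]

def drop_suffix(word, n):
    """Remove the last n characters."""
    return word[:-n] if n > 0 else word

def contractions(word, max_cut=2):
    """Map each distinct contraction of `word` to the rule that produced it."""
    out = {drop_vowels(word): "drop vowels"}
    for n in range(1, max_cut + 1):
        if len(word) - n >= 2:  # keep at least two characters
            out.setdefault(drop_prefix(word, n), f"drop prefix ({n})")
            out.setdefault(drop_suffix(word, n), f"drop suffix ({n})")
    return out

print(drop_vowels("pointer"))      # pntr
print(drop_suffix("integer", 4))   # int
print(contractions("user"))
```

Applying the rules to every dictionary word ahead of time yields a lookup table from contraction to original word, which is the kind of information the identifier processing step needs.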

TRIS treats the problem as an optimization problem
where the cost function is:
C(wOrig → w) = α × Freq(wOrig) + C(type(wOrig → w))
Freq(wOrig): frequency of wOrig in the source code
C(type(wOrig → w)): cost of the transformation type
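The cost function can be evaluated as in the sketch below. The value of α, the per-type costs, and the frequency passed in are all hypothetical; the slide does not give them, nor does it say how Freq is normalised, so the formula is taken at face value.

```python
ALPHA = 0.5  # hypothetical weight; the slides do not give a value

# Hypothetical per-type costs C(type(wOrig -> w)).
TYPE_COST = {
    "identity": 0.0,
    "drop vowels": 1.0,
    "drop suffix": 2.0,
    "drop prefix": 3.0,
}

def transformation_cost(freq_w_orig, kind, alpha=ALPHA):
    """C(wOrig -> w) = alpha * Freq(wOrig) + C(type(wOrig -> w))."""
    return alpha * freq_w_orig + TYPE_COST[kind]

# Example: a word with (assumed) frequency 0.25 contracted by dropping vowels.
print(transformation_cost(0.25, "drop vowels"))  # 1.125
```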
TRIS in Essence
TRIS applies a two-phase strategy:
1) Building dictionary transformations
Computation of the frequency of dictionary words
Construction of the set of transformations
Construction of the arborescence

2) Identifier processing algorithm
Construction of the auxiliary graph of the identifier
Search for a shortest path in the auxiliary graph
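The second phase can be sketched as a shortest-path search over character positions. This is an illustration under stated assumptions, not the authors' implementation: nodes are assumed to be positions 0..n in the identifier, an edge i → j exists when identifier[i:j] appears in a precomputed transformation table, and the table entries and costs below are made up.

```python
def split_and_expand(identifier, transformations):
    """Shortest path from position 0 to len(identifier).

    `transformations` maps a substring to (original_word, cost).
    Returns (total_cost, [original words]) or None if no path exists.
    """
    n = len(identifier)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j] = cheapest cost to cover identifier[:j]
    back = [None] * (n + 1)  # back[j] = (i, word) for the last edge used
    best[0] = 0.0
    for j in range(1, n + 1):      # positions in increasing order: the graph is acyclic
        for i in range(j):
            edge = transformations.get(identifier[i:j])
            if edge is not None and best[i] + edge[1] < best[j]:
                best[j] = best[i] + edge[1]
                back[j] = (i, edge[0])
    if best[n] == inf:
        return None
    words, j = [], n
    while j > 0:               # walk the back-pointers to recover the path
        i, word = back[j]
        words.append(word)
        j = i
    return best[n], words[::-1]

# Hypothetical transformation table: substring -> (original word, cost).
TABLE = {
    "callable": ("callable", 1.0),
    "call": ("call", 1.0),
    "able": ("able", 1.0),
    "int": ("integer", 2.0),
    "in": ("in", 1.0),
    "t": ("time", 4.0),
}
print(split_and_expand("callableint", TABLE))  # (3.0, ['callable', 'integer'])
```

The two nested loops examine one candidate edge per pair of positions, that is, a number of edges quadratic in the length of the identifier.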
TRIS in Essence
TRIS phases – Example:

[Figure: Dictionary Transformations Building Information for the Identifier callableint]
TRIS in Essence
[Figure: Arborescence of Transformations for the Dictionary D]
[Figure: Auxiliary Graph for the Identifier callableint]
Case Study - Research Question

Quality focus:

Accuracy of TRIS with respect to oracles, and compared with state-of-the-art approaches: 1) Camel Case Splitter, 2) Samurai, 3)
TIDIER, and 4) GenTest.

Research Question:

What is the accuracy of the TRIS identifier splitting and expansion
approach compared with alternative state-of-the-art approaches?
Case Study – Analyzed Systems
JHotDraw – Java
16 KLOC
155 files
2,348 identifiers (longer than 2 chars)
957 manually segmented identifiers
Lynx – C
174 KLOC
247 files
12,194 identifiers (longer than 2 chars)
3,085 manually segmented identifiers
Lawrie et al. Data Set
186 programs
26 MLOC of C
15 MLOC of C++
7 MLOC of Java
489 C/C++ identifiers sampled from 430 GNU projects
Case Study - Results

Precision, recall and F-measure of TRIS, Camel Case, Samurai,
and TIDIER on JHotDraw
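The slides do not define the three metrics, so the sketch below assumes the usual information-retrieval definitions, computed per identifier over the set of terms produced by an approach and the set of terms in the oracle. The function name `split_scores` is hypothetical.

```python
def split_scores(predicted, oracle):
    """Precision, recall and F-measure of predicted terms against oracle terms."""
    predicted_set, oracle_set = set(predicted), set(oracle)
    hits = len(predicted_set & oracle_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(oracle_set) if oracle_set else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

# One correct term out of three proposed, against a two-term oracle.
print(split_scores(["callable", "in", "time"], ["callable", "integer"]))
# roughly (0.333, 0.5, 0.4)
```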
Case Study - Results

Precision, recall and F-measure of TRIS, Camel Case, Samurai,
and TIDIER on Lynx

Case Study - Results
Approach 1   Approach 2   Adj p-value   Cliff's delta
TRIS         Camel Case   0.431         0.041
TRIS         Samurai      0.894         0.001
TRIS         TIDIER       0.024         0.043

Comparison among approaches: Results of Wilcoxon paired test
and Cliff’s Delta effect size on JHotDraw

Approach 1   Approach 2   Adj p-value   Cliff's delta
TRIS         Camel Case   <0.001        0.743
TRIS         Samurai      <0.001        0.684
TRIS         TIDIER       <0.001        0.204

Comparison among approaches: Results of Wilcoxon paired test
and Cliff’s Delta effect size on Lynx

Cliff’s delta Interpretation:
- small for 0.148 <= d < 0.33
- medium for 0.33 <= d < 0.474
- large for d >= 0.474
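Cliff's delta and the interpretation thresholds above can be computed as in this sketch. The function names are hypothetical, the sample scores are made up, and the label "negligible" for values below 0.148 is an assumption, since the slide lists only the small, medium, and large bands.

```python
def cliffs_delta(xs, ys):
    """(#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(d):
    """Label an effect size using the thresholds listed on the slide."""
    d = abs(d)
    if d >= 0.474:
        return "large"
    if d >= 0.33:
        return "medium"
    if d >= 0.148:
        return "small"
    return "negligible"  # assumed label; not listed on the slide

# Made-up per-identifier scores for two approaches.
delta = cliffs_delta([1.0, 1.0, 0.5], [0.5, 0.5, 1.0])
print(round(delta, 3), magnitude(delta))  # 0.333 medium
print(magnitude(0.743), magnitude(0.204), magnitude(0.041))
# large small negligible
```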
Case Study - Results

Metric      Approach   1Q       Median   Mean     3Q       σ
Precision   TIDIER     0.4000   0.6667   0.6368   1.0000   0.3681
Precision   TRIS       1.0000   1.0000   0.8933   1.0000   0.2471
Recall      TIDIER     0.5000   0.6667   0.6496   1.0000   0.3654
Recall      TRIS       1.0000   1.0000   0.8720   1.0000   0.2606
F-measure   TIDIER     0.4000   0.6667   0.6409   1.0000   0.3650
F-measure   TRIS       1.0000   1.0000   0.8790   1.0000   0.2524

Precision, recall and F-measure of TRIS and TIDIER on the
489 C Sampled Identifiers
Case Study - Results
Metric      Approach   1Q       Median   Mean     3Q       σ
Precision   TRIS       1.0000   1.0000   0.9763   1.0000   0.1184
Recall      TRIS       1.0000   1.0000   0.9439   1.0000   0.1565
F-measure   TRIS       1.0000   1.0000   0.9559   1.0000   0.1358

Precision, recall and F-measure of TRIS and TIDIER on the
data set from Lawrie et al.
Approach   Identifier Splitting Correctness
Samurai    70%
GenTest    82%
TRIS       86%

Correctness of the splitting provided using the data set from
Lawrie et al.
Conclusion
TRIS maps the identifier splitting/expansion problem to a graph
optimization problem: finding the optimal path in a weighted
acyclic graph built for the identifier.
TRIS was applied to Java, C, and C++ samples and compared with
Camel Case, Samurai, TIDIER, and GenTest.
Results show that TRIS outperforms other approaches with
medium to large effect size.
TRIS is also efficient in terms of computation time: quadratic
complexity in the length of the identifier.
Finally… Questions

Thank you for your attention
References
[LB11] D. Lawrie and D. Binkley. “Expanding identifiers to normalize source code vocabulary”. International Conference on Software Maintenance, pages 113–122, 2011.
[LBM10] D. Lawrie, D. Binkley, and C. Morrell. “Normalizing source code vocabulary”. Working Conference on Reverse Engineering, pages 112–122, 2010.
[GMA11] L. Guerrouj, M. Di Penta, G. Antoniol, and Y.-G. Guéhéneuc. “Tidier: An identifier splitting approach using speech recognition techniques”. Journal of Software Maintenance and Evolution: Research and Practice, 2011.
[MGD10] N. Madani, L. Guerrouj, M. Di Penta, Y. Guéhéneuc, and G. Antoniol. “Recognizing words from source code identifiers using speech recognition techniques”. 14th European Conference on Software Maintenance and Reengineering, March 2010.
[ELK09] E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker. “Mining source code to automatically split identifiers for software analysis”. Mining Software Repositories, International Workshop on, vol. 0, pages 71–80, 2009.
[MPF08] A. Marcus, D. Poshyvanyk, and R. Ferenc. “Using the conceptual cohesion of classes for fault prediction in object-oriented systems”. IEEE Transactions on Software Engineering, 34(2):287–300, 2008.
[LMFB07] Dawn Lawrie, Christopher Morrell, Henry Feild, and David Binkley. “Effective identifier names for comprehension and memory”. Innovations in Systems and Software Engineering, 3(4):303–318, 2007.
[LFB06] D. Lawrie, H. Feild, and D. Binkley. “Syntactic identifier conciseness and consistency”. 6th International Workshop on Source Code Analysis and Manipulation, pages 139–148, Sept 27-29, 2006.
[LMFB06] D. Lawrie, C. Morrell, H. Feild, and D. Binkley. “What’s in a name? a study of identifiers”. 14th International Conference on Program Comprehension, pages 3–12, Athens, Greece, 2006.
[JH06] Z. Ming Jiang and Ahmed E. Hassan. “Examining the evolution of code comments in postgresql”. 2006 International Workshop on Mining Software Repositories, pages 179–180, 2006.
[DP05] Florian Deissenbock and Markus Pizka. “Concise and consistent naming”. Proceedings of the International Workshop on Program Comprehension, May 2005.
[Ian00] Ian Sommerville. Software Engineering. Addison-Wesley, sixth edition, 2000.
[TGM96] Armstrong Takang, Penny A. Grubb, and Robert D. Macredie. “The effects of comments and identifier names on program comprehensibility: an experiential study”. Journal of Program Languages, 4(3):143–167, 1996.
[HNey84] H. Ney. “The use of a one-stage dynamic programming algorithm for connected word recognition”. Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 32, no. 2, pages 263–271, Apr 1984.

Contenu connexe

En vedette

Icsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptIcsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptPtidej Team
 
Icsm07 tooldemo.pdf
Icsm07 tooldemo.pdfIcsm07 tooldemo.pdf
Icsm07 tooldemo.pdfPtidej Team
 
Software Design Patterns in Theory
Software Design Patterns in TheorySoftware Design Patterns in Theory
Software Design Patterns in TheoryPtidej Team
 
Quality and Software Design Patterns
Quality and Software Design PatternsQuality and Software Design Patterns
Quality and Software Design PatternsPtidej Team
 
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future Challenges
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future ChallengesAsianPLoP'14: How and Why Design Patterns Impact Quality and Future Challenges
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future ChallengesPtidej Team
 
Software Design Patterns in Practice
Software Design Patterns in PracticeSoftware Design Patterns in Practice
Software Design Patterns in PracticePtidej Team
 

En vedette (12)

Rsse12.ppt
Rsse12.pptRsse12.ppt
Rsse12.ppt
 
Icsoc12 tooldemo.ppt
Icsoc12 tooldemo.pptIcsoc12 tooldemo.ppt
Icsoc12 tooldemo.ppt
 
Ssbse12a.ppt
Ssbse12a.pptSsbse12a.ppt
Ssbse12a.ppt
 
Icsm07 tooldemo.pdf
Icsm07 tooldemo.pdfIcsm07 tooldemo.pdf
Icsm07 tooldemo.pdf
 
Ssbse12b.ppt
Ssbse12b.pptSsbse12b.ppt
Ssbse12b.ppt
 
Ppap13a.ppt
Ppap13a.pptPpap13a.ppt
Ppap13a.ppt
 
Wcre13b.ppt
Wcre13b.pptWcre13b.ppt
Wcre13b.ppt
 
Software Design Patterns in Theory
Software Design Patterns in TheorySoftware Design Patterns in Theory
Software Design Patterns in Theory
 
Quality and Software Design Patterns
Quality and Software Design PatternsQuality and Software Design Patterns
Quality and Software Design Patterns
 
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future Challenges
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future ChallengesAsianPLoP'14: How and Why Design Patterns Impact Quality and Future Challenges
AsianPLoP'14: How and Why Design Patterns Impact Quality and Future Challenges
 
Software Design Patterns in Practice
Software Design Patterns in PracticeSoftware Design Patterns in Practice
Software Design Patterns in Practice
 
Jcom02.ppt
Jcom02.pptJcom02.ppt
Jcom02.ppt
 

Similaire à Wcre12b.ppt

Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...
Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...
Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...eSAT Journals
 
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmA Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmNECST Lab @ Politecnico di Milano
 
Cytoscape ci chapter 1
Cytoscape ci chapter 1Cytoscape ci chapter 1
Cytoscape ci chapter 1bdemchak
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffMartin Pinzger
 
Implementation of hummingbird cryptographic
Implementation of hummingbird cryptographicImplementation of hummingbird cryptographic
Implementation of hummingbird cryptographiceSAT Publishing House
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...Benoit Combemale
 
Towards Modeling the User-perceived Quality of Source Code using Static Analy...
Towards Modeling the User-perceived Quality of Source Code using Static Analy...Towards Modeling the User-perceived Quality of Source Code using Static Analy...
Towards Modeling the User-perceived Quality of Source Code using Static Analy...ISSEL
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemAlejandro Llaves
 
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...CARLOS III UNIVERSITY OF MADRID
 
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...IncQuery Labs
 
April2010cataract audit
April2010cataract auditApril2010cataract audit
April2010cataract auditeyetech
 
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Jonas Traub
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp usRISC-V International
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Dataidescitation
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsEnrico Palumbo
 

Similaire à Wcre12b.ppt (20)

Wcre12b.ppt
Wcre12b.pptWcre12b.ppt
Wcre12b.ppt
 
Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...
Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...
Implementation of hummingbird cryptographic algorithm for low cost rfid tags ...
 
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation AlgorithmA Scalable Dataflow Implementation of Curran's Approximation Algorithm
A Scalable Dataflow Implementation of Curran's Approximation Algorithm
 
Cytoscape ci chapter 1
Cytoscape ci chapter 1Cytoscape ci chapter 1
Cytoscape ci chapter 1
 
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiffAnalyzing Changes in Software Systems From ChangeDistiller to FMDiff
Analyzing Changes in Software Systems From ChangeDistiller to FMDiff
 
Implementation of hummingbird cryptographic
Implementation of hummingbird cryptographicImplementation of hummingbird cryptographic
Implementation of hummingbird cryptographic
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
Towards Modeling the User-perceived Quality of Source Code using Static Analy...
Towards Modeling the User-perceived Quality of Source Code using Static Analy...Towards Modeling the User-perceived Quality of Source Code using Static Analy...
Towards Modeling the User-perceived Quality of Source Code using Static Analy...
 
Trivandrum
TrivandrumTrivandrum
Trivandrum
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing system
 
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
 
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
Towards the Next Generation of Reactive Model Transformations on Low-Code Pla...
 
Shilpa ppt
Shilpa pptShilpa ppt
Shilpa ppt
 
April2010cataract audit
April2010cataract auditApril2010cataract audit
April2010cataract audit
 
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
 
Coco co-desing and co-verification of masked software implementations on cp us
Coco   co-desing and co-verification of masked software implementations on cp usCoco   co-desing and co-verification of masked software implementations on cp us
Coco co-desing and co-verification of masked software implementations on cp us
 
K-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log DataK-means Clustering Method for the Analysis of Log Data
K-means Clustering Method for the Analysis of Log Data
 
Knowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender SystemsKnowledge Graph Embeddings for Recommender Systems
Knowledge Graph Embeddings for Recommender Systems
 

Plus de Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software MiniaturisationPtidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel BriandPtidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel AbdellatifPtidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh KermansaraviPtidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano PolitowskiPtidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisisPtidej Team
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptPtidej Team
 

Plus de Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Wcre12b.ppt

  • 1. TRIS: a Fast and Accurate Identifiers Splitting and Expansion Algorithm. WCRE 2012, Kingston, Canada. Latifa Guerrouj, Philippe Galinier, Yann-Gaël Guéhéneuc, Giuliano Antoniol, and Massimiliano Di Penta
  • 2. Content: Research Context; Current Practices; TRIS in Essence; Case Study; Threats to Validity; Conclusion. WCRE’12, Kingston 2/25
  • 3. Research Context. Researchers agree that identifier semantics are important: they help program comprehension [CT99, DP05, LFB06]…; improve software quality [MPF08, LMFB06, JH06]…; suggest clues [TGM96, LMFB07]… Composed identifiers: Camel Case: MyLocalAccount, User_Address. Contraction based: pntr ctr, usrAdrss, imagEdge. Good and possibly known to the developers: hmmm, ixoth, pqrstuvwxyz.
  • 4. Current Practices. Camel Case-based approaches [LFB06]. Samurai by Enslen et al. [ELK09] mines term frequencies in large source code bases and relies on two assumptions: 1) A substring composing an identifier is also likely to be used in other parts of the program or in other programs. 2) Given two possible splits of a given identifier, the split that most likely represents the developer’s intent partitions the identifier into terms occurring more often in the program. Limitations: abbreviations are not treated, and there is no quantification of how close the match is to the unknown string.
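For reference, the Camel Case baseline can be sketched in a few lines. The regular expression and the function name `camel_case_split` are illustrative assumptions, not the splitter evaluated in [LFB06]; the sketch only splits on underscores, case changes and digits, so it cannot separate same-case or contracted identifiers.

```python
import re

# One token is either an acronym (capitals not followed by a lowercase letter),
# an optionally capitalized lowercase word, or a run of digits.
# Underscores match none of the alternatives, so they act as separators.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")

def camel_case_split(identifier):
    """Split an identifier on underscores and case changes."""
    return _TOKEN.findall(identifier)

print(camel_case_split("MyLocalAccount"))  # ['My', 'Local', 'Account']
print(camel_case_split("User_Address"))    # ['User', 'Address']
print(camel_case_split("callableint"))     # ['callableint'] (no case change to split on)
```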
  • 5. Current Practices. TIDIER [GMA11] is inspired by speech recognition techniques and uses Dynamic Time Warping to match terms to domain concepts. It exploits context in the form of specialized dictionaries and mimics the process of transforming words via contraction rules. GenTest/Normalize [LBM10, LB11]: GenTest generates all possible splittings and evaluates a scoring function against each proposed splitting. Normalize uses a machine translation technique, the maximum coherence model. Limitation: TIDIER, GenTest and Normalize are demanding in terms of computation time.
  • 6. TRIS in Essence. TRIS assumes that developers compose identifiers using terms and words reflecting domain concepts and the developer’s experience and knowledge. TRIS uses a set of transformation rules: drop vowels, drop prefix, drop suffix, etc. TRIS treats the problem as an optimization problem where the cost function is: C(wOrig → w) = α * Freq(wOrig) + C(type(wOrig → w)), where Freq(wOrig) is the frequency of wOrig in the source code and C(type(wOrig → w)) is the cost of the transformation type.
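The transformation rules and the cost function can be illustrated with a minimal sketch. The slide does not give the per-type costs, the value of α, or how Freq is scaled, so `TYPE_COST`, the default `alpha`, and the minimum prefix length of 3 are hypothetical values chosen only for illustration; the helper names are not from the paper either.

```python
VOWELS = set("aeiou")

# Hypothetical per-type costs; the slide only says each type has a cost.
TYPE_COST = {"identity": 0.0, "drop_vowels": 1.0, "drop_suffix": 2.0}

def drop_vowels(word):
    """Remove every vowel except a leading one, e.g. 'pointer' -> 'pntr'."""
    return word[:1] + "".join(c for c in word[1:] if c not in VOWELS)

def transformations(word):
    """Map each abbreviated form of `word` to the rule type that produced it."""
    forms = {word: "identity"}
    forms.setdefault(drop_vowels(word), "drop_vowels")
    for keep in range(3, len(word)):  # assumed: keep at least 3 leading characters
        forms.setdefault(word[:keep], "drop_suffix")
    return forms

def transformation_cost(freq, kind, alpha=1.0):
    """C(wOrig -> w) = alpha * Freq(wOrig) + C(type(wOrig -> w))."""
    return alpha * freq + TYPE_COST[kind]

print(transformations("image"))
# {'image': 'identity', 'img': 'drop_vowels', 'ima': 'drop_suffix', 'imag': 'drop_suffix'}
```

With these rules, `usr` is reachable from `user` and `imag` from `image`, which matches the contraction examples `usrAdrss` and `imagEdge`.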
  • 7. TRIS in Essence. TRIS applies a two-phase strategy: 1) Building dictionary transformations: computation of the frequency of dictionary words; construction of the set of transformations; construction of the arborescence. 2) Identifier processing algorithm: construction of the auxiliary graph of the identifier; search for a shortest path in the auxiliary graph.
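The second phase can be sketched as a shortest-path search over character positions. This is a simplified illustration, not the authors' implementation: a plain dictionary `FORMS` stands in for the arborescence, the auxiliary graph is assumed to have one node per position and an edge i → j whenever `identifier[i:j]` is a known form, and all words and costs below are invented. Because every edge points forward, the graph is acyclic and one left-to-right pass over the O(n²) candidate edges is enough.

```python
import math

def split_and_expand(identifier, forms):
    """Return (cost, words) for the cheapest path covering `identifier`, or None.

    `forms` maps an abbreviated string to (original_word, cost).
    """
    n = len(identifier)
    best = [math.inf] * (n + 1)  # best[j]: cheapest cost to reach position j
    back = [None] * (n + 1)      # back[j]: (previous position, word on that edge)
    best[0] = 0.0
    for i in range(n):
        if best[i] == math.inf:
            continue             # position i cannot be reached
        for j in range(i + 1, n + 1):
            edge = forms.get(identifier[i:j])
            if edge is None:
                continue
            word, cost = edge
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, word)
    if best[n] == math.inf:
        return None
    words, j = [], n
    while j > 0:                 # walk the back-pointers from the end
        i, word = back[j]
        words.append(word)
        j = i
    return best[n], words[::-1]

# Hypothetical transformation dictionary for illustration only.
FORMS = {
    "callable": ("callable", 0.5),
    "call": ("call", 0.4),
    "able": ("able", 0.4),
    "int": ("integer", 1.0),
}

print(split_and_expand("callableint", FORMS))  # (1.5, ['callable', 'integer'])
```

Here the path `callable` + `int` (cost 1.5) beats `call` + `able` + `int` (cost 1.8), so the identifier is split in two and `int` is expanded.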
  • 8. TRIS in Essence TRIS phases – Example: Dictionary Transformations Building Information for the Identifier callableint
  • 9. TRIS in Essence. Figures: Arborescence of Transformations for the Dictionary D; Auxiliary Graph for the Identifier callableint.
  • 10. Case Study - Research Question. Quality focus: accuracy of TRIS with respect to oracles, and compared with state-of-the-art approaches: 1) Camel Case Splitter, 2) Samurai, 3) TIDIER, and 4) GenTest. Research question: What is the accuracy of the TRIS identifier splitting and expansion approach compared with alternative state-of-the-art approaches?
  • 11. Case Study – Analyzed Systems. JHotDraw (Java): 16 KLOC, 155 files, 2,348 identifiers (longer than 2 chars), 957 manually segmented identifiers. Lynx (C): 174 KLOC, 247 files, 12,194 identifiers (longer than 2 chars), 3,085 manually segmented identifiers. Lawrie et al. data set: 186 programs, 26 MLOC of C, 15 MLOC of C++, 7 MLOC of Java. 489 C/C++ identifiers sampled from 430 GNU projects.
  • 12. Case Study - Results: Precision, recall and F-measure of TRIS, Camel Case, Samurai, and TIDIER on JHotDraw.
  • 13. Case Study - Results: Precision, recall and F-measure of TRIS, Camel Case, Samurai, and TIDIER on Lynx.
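The slides do not spell out how precision, recall and F-measure are computed per identifier. A minimal sketch, assuming the usual set-overlap definitions between the terms an approach produces and the terms in the oracle (the function name `split_scores` is hypothetical):

```python
def split_scores(predicted, oracle):
    """Precision, recall and F-measure of one identifier's terms against its oracle."""
    pred, gold = set(predicted), set(oracle)
    hits = len(pred & gold)  # terms that appear in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

# One of two produced terms is in the oracle, and one of two oracle terms is found.
print(split_scores(["callable", "int"], ["callable", "integer"]))  # (0.5, 0.5, 0.5)
```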
  • 14. Case Study - Results. Comparison among approaches: results of the Wilcoxon paired test and Cliff’s delta effect size.
      JHotDraw:
        Approach 1   Approach 2   Adj p-value   Cliff's delta
        TRIS         Camel Case   0.431         0.041
        TRIS         Samurai      0.894         0.001
        TRIS         TIDIER       0.024         0.043
      Lynx:
        Approach 1   Approach 2   Adj p-value   Cliff's delta
        TRIS         Camel Case   <0.001        0.743
        TRIS         Samurai      <0.001        0.684
        TRIS         TIDIER       <0.001        0.204
      Cliff’s delta interpretation: small for 0.148 <= d < 0.33; medium for 0.33 <= d < 0.474; large for d >= 0.474.
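Cliff's delta compares two samples by counting how often a value from one exceeds a value from the other. The sketch below uses the standard definition and the thresholds listed on the slide; the label "negligible" for values below 0.148 is an assumption, since the slide does not name that range, and the sample values in the usage line are invented.

```python
def cliffs_delta(xs, ys):
    """(#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(d):
    """Interpret |d| with the thresholds from the slide."""
    d = abs(d)
    if d >= 0.474:
        return "large"
    if d >= 0.33:
        return "medium"
    if d >= 0.148:
        return "small"
    return "negligible"  # assumed label; not stated on the slide

print(cliffs_delta([2, 3], [1]))            # 1.0
print(magnitude(0.743), magnitude(0.204))   # large small
```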
  • 15. Case Study - Results. Precision, recall and F-measure of TRIS and TIDIER on the 489 C sampled identifiers:
        Metric      Approach   1Q       Median   Mean     3Q       σ
        Precision   TIDIER     0.4000   0.6667   0.6368   1.0000   0.3681
        Precision   TRIS       1.0000   1.0000   0.8933   1.0000   0.2471
        Recall      TIDIER     0.5000   0.6667   0.6496   1.0000   0.3654
        Recall      TRIS       1.0000   1.0000   0.8720   1.0000   0.2606
        F-measure   TIDIER     0.4000   0.6667   0.6409   1.0000   0.3650
        F-measure   TRIS       1.0000   1.0000   0.8790   1.0000   0.2524
  • 16. Case Study - Results. Precision, recall and F-measure of TRIS on the data set from Lawrie et al.:
        Metric      Approach   1Q       Median   Mean     3Q       σ
        Precision   TRIS       1.0000   1.0000   0.9763   1.0000   0.1184
        Recall      TRIS       1.0000   1.0000   0.9439   1.0000   0.1565
        F-measure   TRIS       1.0000   1.0000   0.9559   1.0000   0.1358
      Correctness of the splitting provided using the data set from Lawrie et al.:
        Approach   Identifier Splitting Correctness
        Samurai    70%
        GenTest    82%
        TRIS       86%
  • 17. Conclusion. TRIS maps the identifier splitting/expansion problem to a graph optimization problem: finding the optimal path in an acyclic weighted graph built for the identifier. TRIS was applied to Java, C, and C++ samples and compared to Camel Case, Samurai, TIDIER, and GenTest. Results show that TRIS outperforms other approaches with medium to large effect size. TRIS is also efficient in terms of computation time: quadratic complexity in the length of the identifier.
  • 18. Finally… Questions. Thank you for your attention.
  • 19. References
      [LB11] D. Lawrie and D. Binkley, “Expanding identifiers to normalize source code vocabulary,” International Conference on Software Maintenance, 2011, pp. 113–122.
      [LBM10] D. Lawrie, D. Binkley, and C. Morrell, “Normalizing source code vocabulary,” Working Conference on Reverse Engineering, 2010, pp. 112–122.
      [GMA11] L. Guerrouj, M. Di Penta, G. Antoniol, and Y.-G. Guéhéneuc, “TIDIER: An identifier splitting approach using speech recognition techniques,” Journal of Software Maintenance and Evolution: Research and Practice, 2011.
      [MGD10] N. Madani, L. Guerrouj, M. Di Penta, Y.-G. Guéhéneuc, and G. Antoniol, “Recognizing words from source code identifiers using speech recognition techniques,” 14th European Conference on Software Maintenance and Reengineering, March 2010.
      [ELK09] E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker, “Mining source code to automatically split identifiers for software analysis,” International Workshop on Mining Software Repositories, 2009, pp. 71–80.
  • 20. References
      [MPF08] A. Marcus, D. Poshyvanyk, and R. Ferenc, “Using the conceptual cohesion of classes for fault prediction in object-oriented systems,” IEEE Transactions on Software Engineering, 34(2):287–300, 2008.
      [LMFB07] D. Lawrie, C. Morrell, H. Feild, and D. Binkley, “Effective identifier names for comprehension and memory,” Innovations in Systems and Software Engineering, 3(4):303–318, 2007.
      [LFB06] D. Lawrie, H. Feild, and D. Binkley, “Syntactic identifier conciseness and consistency,” 6th International Workshop on Source Code Analysis and Manipulation, Sept 27-29, 2006, pp. 139–148.
      [LMFB06] D. Lawrie, C. Morrell, H. Feild, and D. Binkley, “What’s in a name? A study of identifiers,” 14th International Conference on Program Comprehension, Athens, Greece, 2006, pp. 3–12.
      [JH06] Z. M. Jiang and A. E. Hassan, “Examining the evolution of code comments in PostgreSQL,” 2006 International Workshop on Mining Software Repositories, 2006, pp. 179–180.
  • 21. References
      [DP05] F. Deissenboeck and M. Pizka, “Concise and consistent naming,” Proceedings of the International Workshop on Program Comprehension, May 2005.
      [Ian00] I. Sommerville, Software Engineering, sixth edition, Addison-Wesley, 2000.
      [TGM96] A. Takang, P. A. Grubb, and R. D. Macredie, “The effects of comments and identifier names on program comprehensibility: an experiential study,” Journal of Programming Languages, 4(3):143–167, 1996.
      [HNey84] H. Ney, “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 32, no. 2, pp. 263–271, Apr 1984.