A Comment Analysis Approach for Program Comprehension

José L. Freitas¹, Daniela da Cruz¹, Pedro R. Henriques¹

¹ Universidade do Minho, Portugal

Software Engineering Workshop, Crete
Oct. 12-13, 2012


             Freitas, Cruz, Henriques     Comment Analysis for Program Comprehension
Context

Program Comprehension (PC) is a vital task of Software Maintenance.
In Software Maintenance, 50% of the time is spent on comprehending the system.
Several source code analysis approaches have been applied to develop PC tools: program slicing, control-flow, data-flow, etc.



Motivation

Most PC tools are based on the extraction of structural information.
Example: Function Y is used by function X n times.

However, they do not extract the meaning of a program or the Problem Domain concepts related to the program.
Example: Function Y calculates the amount of credit of a banking account.




Motivation

Comments can be the biggest source of semantic information on code, alongside identifiers.

    /* This function receives the id number of a banking
       account and returns the available amount of credit */
    int credit(int id) { ... }

Why not use comments to search for the Problem Domain concepts needed to understand a program?


Bad and Good Comments

When is a comment bad or good?
Leaving aside the existing controversy around this subject, a bad comment is, at the very least, one that is inconsistent with the code it describes and therefore misleads the person who reads it.
Brooks states that comments help comprehension if they provide Problem Domain and Program Domain information and the means to establish bridges between those two domains.


Goal

Create a Program Comprehension tool that explores comments to search for Problem Domain concepts: Darius¹.

¹ Named after King Darius I of Persia, the first known man to build a bridge between Europe and Asia, on the Bosphorus strait.

Outline

1  Darius - Comment Evaluator
      Preliminary study
2  Darius - Concept Locator
      Experiment
3  Conclusion


Darius - Comment Evaluator

The first version of Darius analyzes:
    Comment Quantity: number of comments, percentage of comments, etc.
    Comment Content: use of Problem Domain and Program Domain terms.




Darius - Preliminary study

[Figure: Comment Evaluator modules]




Darius GUI (1)




Darius - Comment Extractor module

Darius extracts three types of comments:
  1  Inline Comments, IC for short: // ...
  2  Block Comments, BC for short: /* ... */
  3  JavaDoc Comments, JD for short: /** ... */
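The three comment types can be told apart by their opening delimiter alone. The following is a minimal Python sketch of such an extractor, not the Darius implementation: the names COMMENT_RE and extract_comments are hypothetical, and the regular expression is a simplification that would also match comment markers appearing inside string literals.

```python
import re

# One pattern with three named alternatives. Order matters: the JavaDoc
# alternative must come before the block alternative, because every
# JavaDoc comment also starts with "/*".
COMMENT_RE = re.compile(
    r"(?P<JD>/\*\*(?!/).*?\*/)"   # JavaDoc comment: /** ... */
    r"|(?P<BC>/\*.*?\*/)"         # block comment:   /* ... */
    r"|(?P<IC>//[^\n]*)",         # inline comment:  // ...
    re.DOTALL,
)


def extract_comments(source):
    """Return (type, text) pairs in order of appearance; type is 'IC', 'BC' or 'JD'."""
    return [(m.lastgroup, m.group()) for m in COMMENT_RE.finditer(source)]
```

The negative lookahead in the JavaDoc alternative keeps the empty block comment `/**/` from being counted as JavaDoc.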



Darius - Comment Extractor module

To identify the type of source code entity associated with a comment, the line immediately following the comment is extracted too. Darius associates comments with:
  1  classes
  2  interfaces
  3  methods
  4  conditionals (if)
  5  loops (while and for)
  6  switches
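Classifying the line that follows a comment could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the rules Darius uses: ENTITY_PATTERNS and classify_next_line are hypothetical names, and the method pattern is deliberately crude (it expects the signature on a single line and does not recognize bodiless declarations ending in a semicolon).

```python
import re

# Checked in order; the first matching pattern wins.
ENTITY_PATTERNS = [
    ("interface", re.compile(r"\binterface\s+\w+")),
    ("class", re.compile(r"\bclass\s+\w+")),
    ("conditional", re.compile(r"^\s*(?:\}?\s*else\s+)?if\s*\(")),
    ("loop", re.compile(r"^\s*(?:while|for)\s*\(")),
    ("switch", re.compile(r"^\s*switch\s*\(")),
    # Crude method test: at least one modifier or type, a name, a
    # parameter list, an optional throws clause and an optional "{".
    ("method", re.compile(
        r"^\s*(?:[\w<>\[\].,]+\s+)+\w+\s*\([^()]*\)\s*"
        r"(?:throws\s+[\w.,\s]+?)?\s*\{?\s*$")),
]


def classify_next_line(line):
    """Return the entity type suggested by the line after a comment, or None."""
    for entity, pattern in ENTITY_PATTERNS:
        if pattern.search(line):
            return entity
    return None
```

Keeping the patterns in an ordered list makes the precedence explicit: control structures are tested before the method pattern, which would otherwise accept a line such as `else if (x) {`.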


Darius - Statistics Calculator module

Number of comments in a project (global, per type of comment and per line of source code);
Average number of comment lines per line of code;
Average number of lines of a non-inline comment;
Average number of source code entities of each type that are commented;
Most used type of comment (global and per source code entity).
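Several of these measures follow directly from a list of extracted comments. The Python sketch below shows one possible way to compute them; the function name, the input shape (a list of (type, text) pairs) and the dictionary keys are assumptions made for illustration, and lines_of_code is assumed to be greater than zero.

```python
from collections import Counter


def comment_statistics(comments, lines_of_code):
    """comments: list of (type, text) pairs, with type in {'IC', 'BC', 'JD'}."""
    per_type = Counter(kind for kind, _ in comments)
    # A comment spanning n newline characters occupies n + 1 lines.
    comment_lines = sum(text.count("\n") + 1 for _, text in comments)
    non_inline = [text for kind, text in comments if kind != "IC"]
    return {
        "total": len(comments),
        "per_type": dict(per_type),
        "comments_per_line": len(comments) / lines_of_code,
        "comment_lines_per_line": comment_lines / lines_of_code,
        "avg_lines_non_inline": (
            sum(t.count("\n") + 1 for t in non_inline) / len(non_inline)
            if non_inline else 0.0
        ),
        "most_used_type": per_type.most_common(1)[0][0] if per_type else None,
    }
```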

Darius - Words Analyzer module

Given a list of words extracted from the ontology of the Problem Domain, Darius computes:
    Percentage and frequency of words in the list found in comments;
    Frequency of each type of comment that contains words from the list;
    Frequency of each type of commented source code entity whose comment contains words from the list.
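The first two measures can be sketched in Python as follows. This is an illustration only: the names tokenize and domain_word_usage are hypothetical, words are matched exactly after lowercasing (no stemming or synonym handling, which the real tool may or may not apply), and the example word list is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def domain_word_usage(comments, domain_words):
    """comments: list of (type, text) pairs; domain_words: iterable of terms."""
    domain = {w.lower() for w in domain_words}
    found = Counter()               # occurrences of each domain word
    comments_with_hit = Counter()   # per comment type: comments with a domain word
    for kind, text in comments:
        hits = [t for t in tokenize(text) if t in domain]
        found.update(hits)
        if hits:
            comments_with_hit[kind] += 1
    return {
        # Share of the word list that appears at least once in comments.
        "pct_domain_words_found": 100.0 * len(found) / len(domain) if domain else 0.0,
        "word_frequency": dict(found),
        "comments_with_domain_words": dict(comments_with_hit),
    }
```

For example, with the word list ["account", "credit", "loan", "interest"] and comments that mention only "credit" and "account", half of the list is found.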

Darius - Preliminary study

To perform a preliminary study, 10 open-source software projects written in Java were selected.
Open-source projects were chosen for two reasons:
  1  The source code is freely available;
  2  Open-source software projects are widely used by the community, which changes and manipulates the source code over and over again.
These kinds of projects tend to be constantly updated, and thus comprehension tasks are involved. Commenting can be a proper way of helping with these tasks.


Darius - Preliminary study

Project        Description                  Files       LoC   Classes
iText          PDF Library                    480    145666       403
ganttproject   Project Management Library     530     68945       394
gwt-dev        Google's Web Toolkit           987    192738       803
jEdit          Text Editor                    531    176006       404
vuze           Peer-to-peer client           3284    785935      2463
junit          Tests Framework                154     10926       130
jfreechart     Chart Library                  989    313231       876
antlr          Grammar Framework              221     85867       212
jexcelapi      Excel Library                  438     93876       166
robocode       Programming Game of Robots     571     81519       485
Total                                        8185   1954709      6336
Table: Description and size of each selected project

Darius - Preliminary study results

Comment Quantity: 6 of the 10 test programs have ≥ 19% comments




Darius - Preliminary study results

                                         Type of Comments
Project           #CM   CM/LOSC      #IC      #BC      #JD
iText           13343      0.24     4930     3777     4636
ganttproject     4468      0.11     2925      814      729
gwt-dev         12969      0.16     7219      866     4884
jEdit           18986      0.21      806    14421     3759
vuze            27723      0.08    18245     2319     7159
junit             519      0.21        2       77      440
jfreechart      22516      0.27     6592     2530    13394
antlr            5292      0.14     3903     1380        9
jexcelapi        8354      0.26     2213      775     5366
robocode         5071      0.19     3108      102     1861
Total          119241      0.16    63633    13371    42237
Table: Comment frequency in the projects


Darius - Preliminary study results

Project         If   For   While   Switch   Class   Interf.   Method
iText            5     7       7        7      89        90       76
ganttproject     5     5       3        8      57        41       18
gwt-dev          9    10       7        5      96        97       19
jEdit            9     8       4        2      86        79       61
vuze             6     6       5        7      45        46       24
junit            1     0       0        0      25        71       37
jfreechart       6    10       2       18     100       100      100
antlr           11    16       5        4      61        56       22
jexcelapi       14    18      12        0      99       100       88
robocode         7    11      12        3      76        94       20
Total            7     8       6        5      69        60       45
Table: Percentage of source code entities commented

Darius - Preliminary study results

Project         If   For   While   Switch   Class   Interf.   Method
iText           IC   IC    IC      IC       JD      JD        JD
ganttproject    IC   IC    IC      IC       JD      JD        JD
gwt-dev         IC   IC    IC      IC       JD      JD        JD
jEdit           IC   IC    IC      IC       JD      JD        JD
vuze            IC   IC    IC      IC       JD      JD        JD
junit           IC   NA    NA      NA       JD      JD        JD
jfreechart      IC   IC    IC      IC       JD      JD        JD
antlr           IC   IC    IC      IC       BC      BC        BC
jexcelapi       IC   IC    BC      NA       JD      JD        JD
robocode        IC   IC    IC      IC       JD      JD        JD
Total           IC   IC    IC      IC       JD      JD        JD
Table: Most used type of comment per type of source code entity

Darius - Preliminary study results

Comment Content: all 10 test programs have ≥ 23% Problem and Program Domain terms




Darius - Preliminary study results

Goal: explore the content of comments by checking whether comments contain Problem Domain and Program Domain information.
Information necessary to run these tests:
     a list of Problem Domain terms for each of the software projects;
     a (single) list of Program Domain terms.



Darius - Preliminary study results

Project        Problem Domain   Program Domain
iText                   92.31            86.76
ganttproject            84.31             75.0
gwt-dev                 56.34            86.76
jEdit                   89.74            86.76
vuze                    92.11            88.24
junit                   81.82            67.65
jfreechart              86.36            89.71
antlr                   88.24            83.82
jexcelapi               79.31            85.29
robocode                88.89            83.82
Total                   82.21            83.38
Table: Percentage of domain words found

Darius - Preliminary study results

                      IC                 BC                 JD
Project          Prob.    Prog.     Prob.    Prog.     Prob.    Prog.
iText            13.05    13.99      6.89     14.1      14.7    17.36
ganttproject     13.76    13.52     14.78    11.58     14.67     14.2
gwt-dev           0.96     19.7      2.03    18.31      2.62    22.16
jEdit              5.1    17.15      6.44    24.76      9.28    16.69
vuze               4.6    18.02      5.14    11.38      4.29    18.89
junit                0     20.0     17.14    16.57     22.66    25.77
jfreechart        20.7    20.73     16.74    12.45     15.58    21.41
antlr            13.85    13.81     13.95     10.7      2.13    11.35
jexcelapi        10.38    16.16     17.08    12.97     24.97    17.01
robocode          17.0    14.06     16.52     12.6     25.13     12.5
Total             9.97    16.27      8.33    14.58     13.13    19.13
Table: Frequency (%) of words of each Domain per type of comment

Darius - Preliminary study conclusion

Comments on higher-level source code entities tend to be oriented towards Problem Domain information, while comments on lower-level entities tend to include more Program Domain information.




Darius - Problem Concept Location

Goal: search for Problem Domain concepts to find the mappings of these concepts in the source code.




Darius GUI (2)




Darius - Employed Techniques

Latent Semantic Analysis (LSA)
A natural language processing technique that analyzes the relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
LSA assumes that words that are close in meaning will occur close together in text. It constructs a matrix containing word counts per paragraph (rows represent unique words and columns represent paragraphs).
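The matrix described above can be built in a few lines. The Python sketch below covers only this construction step, with each comment standing in for a paragraph; the dimensionality reduction (singular value decomposition) that LSA applies afterwards is omitted, and the function name term_paragraph_matrix is hypothetical.

```python
import re


def term_paragraph_matrix(paragraphs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in paragraph j."""
    tokenized = [re.findall(r"[a-z]+", p.lower()) for p in paragraphs]
    # One row per unique word, in alphabetical order.
    terms = sorted({t for tokens in tokenized for t in tokens})
    row = {t: i for i, t in enumerate(terms)}
    matrix = [[0] * len(paragraphs) for _ in terms]
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[row[t]][j] += 1
    return terms, matrix
```

Each row is the occurrence profile of one word across paragraphs; LSA then compares words and paragraphs in a reduced space derived from this matrix.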


Darius - Employed Techniques

Vector Space Model (VSM)
An algebraic model for representing text documents as vectors. Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
A weight is used to evaluate how important a word is to a document in a collection. The importance increases proportionally to the number of times a word appears in the document.
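As an illustration of the model, the Python sketch below represents each comment as a vector of raw term frequencies and ranks comments against a query. Ranking by cosine similarity is an assumption here (it is the usual companion of the VSM, but the slides do not say which measure Darius uses), and the names term_vector, cosine and rank are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Raw term frequencies: the weight grows with each occurrence of a term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return indices of documents sharing terms with the query, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), i) for i, d in enumerate(documents)]
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]
```

A query such as "pdf document" then ranks a comment containing both words above one containing only "document", and drops comments that share no term with the query.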



Darius - Experiment with iText

The object of study chosen for this test is iText.
iText contains a sufficient amount of comments, and the content of those comments carries a sufficient amount of Problem Domain and Program Domain information, so this program can be explored for PC purposes using its comments.




Darius - Experiment with iText

How is a PDF document created?




Darius - Experiment with iText

How to write a PDF document into an output stream?




Darius - Experiment with iText

How to add a title to a PDF?




Darius - Experiment with iText

By going deeper in the searches and using the information discovered in each executed query, the programmer can incrementally build knowledge of the software.
The programmer should be able to figure out the implementation of every concept in the source code, and the relations among them, by using the information present in comments.



Conclusion

Do real-world programs actually contain enough meaningful comments to justify the analysis effort and the proposed approach?
    Using simple but effective queries, locating concepts through comment information is faster than the complex task of reading the whole source code of the program.
    Darius shows the potential value for comprehension that comments possess.
    As future work:
         Questionnaires will be conducted to understand how a programmer would work with Darius.
         Darius will be developed as a plugin for an IDE (e.g. Eclipse).


               Freitas, Cruz, Henriques   Comment Analysis for Program Comprehension

Contenu connexe

Similaire à Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete)

Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasMerce Crosas
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesAnnika Eriksson
 
Extra micrometer practices with Quarkus | DevNation Tech Talk
Extra micrometer practices with Quarkus | DevNation Tech TalkExtra micrometer practices with Quarkus | DevNation Tech Talk
Extra micrometer practices with Quarkus | DevNation Tech TalkRed Hat Developers
 
A project report on chat application
A project report on chat applicationA project report on chat application
A project report on chat applicationKumar Gaurav
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor routBikram Rout
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor routBikram Rout
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to pythonMohamed Hegazy
 
Hcplphx920
Hcplphx920Hcplphx920
Hcplphx920Thinkful
 
Hostel management system (5)
Hostel management system (5)Hostel management system (5)
Hostel management system (5)PRIYANKMZN
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify RaisAjay Ohri
 
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...Yandex
 
Systematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionSystematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionYasir Raza Khan
 

Similaire à Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete) (20)

Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
The Knowledgeable Software Engineer
The Knowledgeable Software EngineerThe Knowledgeable Software Engineer
The Knowledgeable Software Engineer
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Of Changes and Their History
Of Changes and Their HistoryOf Changes and Their History
Of Changes and Their History
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
FuzzyDbg_Report.pdf
FuzzyDbg_Report.pdfFuzzyDbg_Report.pdf
FuzzyDbg_Report.pdf
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 
Extra micrometer practices with Quarkus | DevNation Tech Talk
Extra micrometer practices with Quarkus | DevNation Tech TalkExtra micrometer practices with Quarkus | DevNation Tech Talk
Extra micrometer practices with Quarkus | DevNation Tech Talk
 
H1803044651
H1803044651H1803044651
H1803044651
 
A project report on chat application
A project report on chat applicationA project report on chat application
A project report on chat application
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor rout
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor rout
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
 
Introduction
IntroductionIntroduction
Introduction
 
Hcplphx920
Hcplphx920Hcplphx920
Hcplphx920
 
Hostel management system (5)
Hostel management system (5)Hostel management system (5)
Hostel management system (5)
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify Rais
 
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...
Николай Бьернер «Program Analysis and Testing using Efficient Satisfiability ...
 
Systematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionSystematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd edition
 

Plus de Daniela Da Cruz

Introduction to iOS and Objective-C
Introduction to iOS and Objective-CIntroduction to iOS and Objective-C
Introduction to iOS and Objective-CDaniela Da Cruz
 
Game Development with AndEngine
Game Development with AndEngineGame Development with AndEngine
Game Development with AndEngineDaniela Da Cruz
 
Interactive Verification of Safety-Critical Systems
Interactive Verification of Safety-Critical SystemsInteractive Verification of Safety-Critical Systems
Interactive Verification of Safety-Critical SystemsDaniela Da Cruz
 
Android Lesson 3 - Intent
Android Lesson 3 - IntentAndroid Lesson 3 - Intent
Android Lesson 3 - IntentDaniela Da Cruz
 
Android Introduction - Lesson 1
Android Introduction - Lesson 1Android Introduction - Lesson 1
Android Introduction - Lesson 1Daniela Da Cruz
 

Plus de Daniela Da Cruz (9)

Introduction to iOS and Objective-C
Introduction to iOS and Objective-CIntroduction to iOS and Objective-C
Introduction to iOS and Objective-C
 
Games Concepts
Games ConceptsGames Concepts
Games Concepts
 
C basics
C basicsC basics
C basics
 
Game Development with AndEngine
Game Development with AndEngineGame Development with AndEngine
Game Development with AndEngine
 
Interactive Verification of Safety-Critical Systems
Interactive Verification of Safety-Critical SystemsInteractive Verification of Safety-Critical Systems
Interactive Verification of Safety-Critical Systems
 
Android Introduction
Android IntroductionAndroid Introduction
Android Introduction
 
Android Lesson 3 - Intent
Android Lesson 3 - IntentAndroid Lesson 3 - Intent
Android Lesson 3 - Intent
 
Android Lesson 2
Android Lesson 2Android Lesson 2
Android Lesson 2
 
Android Introduction - Lesson 1
Android Introduction - Lesson 1Android Introduction - Lesson 1
Android Introduction - Lesson 1
 

Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete)

  • 1. A Comment Analysis Approach for Program Comprehension José L. Freitas 1 Daniela da Cruz 1 Pedro R. Henriques 1 1 Universidade do Minho, Portugal Software Engineering Workshop, Crete Oct. 12-13, 2012 Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 2. Context Program Comprehension is a vital task of Software Maintenance. In Software Maintenance, 50% of the time is spent on comprehending the system. Several approaches of source code analysis have been applied to develop PC tools: program slicing, control-ow, data-ow, etc. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 3. Motivation Most of PC tools are based on the extraction of structural information. Example: Function Y is used by function X n times. etc. However, they lack the extraction of the meaning of a program or the Problem domain concepts related with the program. Example: Function Y calculates the amount of credit of a banking account. etc. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 4. Motivation Comments can be the biggest source of semantic information on code, alongside with identiers. 1 / ∗ T h i s f u n c t i o n r e c e i v e s t h e i d number o f a b a n k i n g a c c o u n t and r e t u r n s t h e a v a i l a b l e amount o f c r e d i t ∗/ 3 int credit ( int id ){ . . . } Why not use comments to search for Problem Domain concepts, needed to understand a program? Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 5. Bad and Good Comments When a comment is bad or good? Apart from the existing controversy around this subject, a bad comment can start from being a comment which is inconsistent with the code which is commenting, and that leads to the misleading of the person who reads it. states that comments help on the comprehension if they provide Problem and Program Domain information and means to Brooks establish bridges between those two domains. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 6. Goal Create a Program Comprehension tool that explores comments to search for Problem Domain concepts: Darius. 1 1 Relative to King Darius I of Persia, the rst known man to create the rst bridge between Europe and Asia, on the Bosphorus strait. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 7. Outline Darius Comment Evaluator Preliminary study 1 Darius Concept Locator Experiment 2 3 Conclusion Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 8. Darius Comment Evaluator The rst version of Darius analyzes: Comment Quantity: number of comments, percentage of comments, etc. Comment Content: Use of Problem Domain and Program Domain terms. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 9. Darius Preliminary study Comment Evaluator Modules Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 10. Darius GUI (1) Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 11. Darius Comment Extractor module Comment Extractor Darius extract three types of comments: 1 Inline Comments, IC for short: // ... 2 Block Comments, BC for short: /* ... */ 3 JavaDoc Comments, BC for short: /** ... */ Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 12. Darius Comment Extractor module In order to discover and identify what type of source code entity is associated with the comment, the next line after the comment is extracted too. Darius associates comments with: 1 classes 2 interfaces 3 methods 4 conditionals (if) 5 loops (while and for) 6 switches Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 13. Darius Statistics Calculator module Statistic Calculator module Number of comments of a project (global, per type of comment and per line of source code); Average number of comment lines per lines of code; Average number of lines of a non inline comment; Average number of each type of source code entity which is commented; Type of comments most used (global and per source code entity). Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 14. Darius Words Analyzer module Words Analyzer module Given a list of words extracted from the ontology of the Problem Domain, Darius computes: Percentage and frequency of words in the list found in comments; Frequency of each type of comment that contains words from the list; Frequency of each type of source code entity commented that contains words from the list. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 15. Darius Words Analyzer module Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 16. Outline Darius Comment Evaluator Preliminary study 1 Darius Concept Locator Experiment 2 3 Conclusion Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 17. Darius Preliminary study In order to perform a preliminary study, 10 open-source software projects written in Java were selected. The choice for the use of open-source projects has two reasons: 1 The source code is totally free; Open-source software projects are highly used by the community to change and manipulate the source code over 2 and over again These kind of projects tend to be constantly updated and thus comprehension tasks are involved. Commenting can be a proper way of helping on these tasks. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 18. Darius Preliminary study Project Description Files LoC Classes iText PDF Library 480 145666 403 ganttproject Project Management Library 530 68945 394 gwt-dev Google's Web Toolkit 987 192738 803 jEdit Text Editor 531 176006 404 vuze Peer-to-peer client 3284 785935 2463 junit Tests Framework 154 10926 130 jfreechart Chart Library 989 313231 876 antlr Grammar Framework 221 85867 212 jexcelapi Excel Library 438 93876 166 robocode Programming Game of Robots 571 81519 485 Total 8185 1954709 6336 Table : Description and size of each selected project Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 19. Darius Preliminary study results Comment Quantity: 6/10 test programs ≥ 19% comments Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 20. Darius Preliminary study results Type of Comments Project #CM CM/LOSC #IC #BC #JD iText 13343 0.24 4930 3777 4636 ganttproject 4468 0.11 2925 814 729 gwt-dev 12969 0.16 7219 866 4884 jEdit 18986 0.21 806 14421 3759 vuze 27723 0.08 18245 2319 7159 junit 519 0.21 2 77 440 jfreechart 22516 0.27 6592 2530 13394 antlr 5292 0.14 3903 1380 9 jexcelapi 8354 0.26 2213 775 5366 robocode 5071 0.19 3108 102 1861 Total 119241 0.16 63633 13371 42237 Table : Comments Frequency in the projects. Freitas, Cruz, Henriques Comment Analysis for Program Comprehension
  • 21. Darius Preliminary study results. Table: Percentage of source code entities commented.
      Project         If   For   While   Switch   Class   Interf.   Method
      iText            5     7       7        7      89        90       76
      ganttproject     5     5       3        8      57        41       18
      gwt-dev          9    10       7        5      96        97       19
      jEdit            9     8       4        2      86        79       61
      vuze             6     6       5        7      45        46       24
      junit            1     0       0        0      25        71       37
      jfreechart       6    10       2       18     100       100      100
      antlr           11    16       5        4      61        56       22
      jexcelapi       14    18      12        0      99       100       88
      robocode         7    11      12        3      76        94       20
      Total            7     8       6        5      69        60       45
  • 22. Darius Preliminary study results. Table: Most used type of comment per type of source code entity.
      Project         If   For   While   Switch   Class   Interf.   Method
      iText           IC    IC      IC       IC      JD        JD       JD
      ganttproject    IC    IC      IC       IC      JD        JD       JD
      gwt-dev         IC    IC      IC       IC      JD        JD       JD
      jEdit           IC    IC      IC       IC      JD        JD       JD
      vuze            IC    IC      IC       IC      JD        JD       JD
      junit           IC    NA      NA       NA      JD        JD       JD
      jfreechart      IC    IC      IC       IC      JD        JD       JD
      antlr           IC    IC      IC       IC      BC        BC       BC
      jexcelapi       IC    IC      BC       NA      JD        JD       JD
      robocode        IC    IC      IC       IC      JD        JD       JD
      Total           IC    IC      IC       IC      JD        JD       JD
  • 23. Darius Preliminary study results. Comment Content: 10/10 test programs ≥ 23% Problem and Program Domain terms
  • 24. Darius Preliminary study results. Goal: explore the content of comments by checking whether comments contain Problem and Program domain information. Information necessary to run these tests: a list of problem domain terms for each of the software projects; a (single) list of program domain terms.
  • 25. Darius Preliminary study results. Table: Percentage of domain words found.
      Project        Problem Domain   Program Domain
      iText                   92.31            86.76
      ganttproject            84.31            75.0
      gwt-dev                 56.34            86.76
      jEdit                   89.74            86.76
      vuze                    92.11            88.24
      junit                   81.82            67.65
      jfreechart              86.36            89.71
      antlr                   88.24            83.82
      jexcelapi               79.31            85.29
      robocode                88.89            83.82
      Total                   82.21            83.38
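A "percentage of domain words found" figure can be read as the share of a domain term list that occurs at least once in a project's comments. The sketch below illustrates that reading only; the function name `domain_coverage`, the tokenization, and the sample terms are assumptions, and the slides do not say whether Darius applies stemming or other normalization (none is applied here).

```python
import re


def domain_coverage(comments, domain_terms):
    """Return the percentage of domain terms that appear in the comments.

    Matching is exact on lower-cased words; no stemming is applied, so
    "banking" does not count as an occurrence of "bank".
    """
    words = set()
    for comment in comments:
        words.update(re.findall(r"[a-z]+", comment.lower()))
    terms = {term.lower() for term in domain_terms}
    if not terms:
        return 0.0
    return 100.0 * len(terms & words) / len(terms)


# Hypothetical comments and term lists, for illustration only.
comments = [
    "Returns the available credit of a banking account",
    "Loop over the array of accounts",
]
problem_terms = ["credit", "account", "loan", "bank"]
program_terms = ["loop", "array", "pointer"]

print(domain_coverage(comments, problem_terms))  # share of problem domain terms found
print(domain_coverage(comments, program_terms))  # share of program domain terms found
```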
  • 26. Darius Preliminary study results. Table: Frequency (%) of words of each Domain per type of comment.
                           IC               BC               JD
      Project         Prob.   Prog.    Prob.   Prog.    Prob.   Prog.
      iText           13.05   13.99     6.89   14.1     14.7    17.36
      ganttproject    13.76   13.52    14.78   11.58    14.67   14.2
      gwt-dev          0.96   19.7      2.03   18.31     2.62   22.16
      jEdit            5.1    17.15     6.44   24.76     9.28   16.69
      vuze             4.6    18.02     5.14   11.38     4.29   18.89
      junit            0      20.0     17.14   16.57    22.66   25.77
      jfreechart      20.7    20.73    16.74   12.45    15.58   21.41
      antlr           13.85   13.81    13.95   10.7      2.13   11.35
      jexcelapi       10.38   16.16    17.08   12.97    24.97   17.01
      robocode        17.0    14.06    16.52   12.6     25.13   12.5
      Total            9.97   16.27     8.33   14.58    13.13   19.13
  • 27. Darius Preliminary study conclusion. Higher-level source code entities tend to have comments oriented toward Problem Domain information, while comments on lower-level entities tend to include more Program Domain information.
  • 28. Outline: (1) Darius Comment Evaluator: Preliminary study; (2) Darius Concept Locator: Experiment; (3) Conclusion
  • 29. Darius - Problem Concept Location. Goal: search for Problem Domain concepts to find the mappings of these concepts onto the source code.
  • 30. Darius GUI (2)
  • 31. Darius Employed Techniques: Latent Semantic Analysis (LSA). A natural language processing technique for analyzing relationships between a set of documents and the terms they contain, by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur close together in text. It constructs a matrix containing word counts per paragraph (rows represent unique words and columns represent each paragraph).
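The word-count matrix described in the LSA slide can be sketched as follows. This covers only the matrix construction; the dimensionality reduction that LSA performs afterwards (typically a truncated singular value decomposition) is deliberately left out, and the function name `term_paragraph_matrix` and the sample paragraphs are assumptions made for illustration.

```python
import re
from collections import Counter


def term_paragraph_matrix(paragraphs):
    """Build the word-count matrix that LSA starts from.

    Rows correspond to unique words (sorted alphabetically) and columns
    to paragraphs; entry [i][j] is how often word i occurs in paragraph j.
    """
    counts = [Counter(re.findall(r"[a-z]+", p.lower())) for p in paragraphs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


# Hypothetical comment texts standing in for paragraphs.
paragraphs = [
    "Creates a PDF document",
    "Writes the PDF document to a stream",
]
vocabulary, matrix = term_paragraph_matrix(paragraphs)
for word, row in zip(vocabulary, matrix):
    print(f"{word:10s} {row}")
```

In a comment-analysis setting, each "paragraph" would presumably be a comment or the comments attached to one source code entity; the slides do not state which granularity Darius uses.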
  • 32. Darius Employed Techniques: Vector Space Model (VSM). An algebraic model for representing text documents as vectors. Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. A weight is used to evaluate how important a word is to a document in a collection. The importance increases proportionally to the number of times a word appears in the document.
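A minimal sketch of the vector space model applied to comment search is shown below. It assumes TF-IDF as the term weight and cosine similarity as the ranking measure; both are common choices for VSM but are not confirmed by the slides as what Darius uses. The function names and the sample comments are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Represent each document as a sparse {term: weight} vector."""
    term_counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency (an assumed weighting variant).
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in term_counts]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tfidf_vectors(documents)
    counts = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Hypothetical comments, one per source code entity.
comments = [
    "Creates a new PDF document",
    "Adds a title to the document",
    "Writes the document into an output stream",
]
for score, index in rank("output stream", comments):
    print(f"{score:.3f}  {comments[index]}")
```

Terms in the query that never occur in the collection are dropped, so a query with no known terms scores 0.0 against every comment.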
  • 33. Outline: (1) Darius Comment Evaluator: Preliminary study; (2) Darius Concept Locator: Experiment; (3) Conclusion
  • 34. Darius Experiment with iText. The object of study chosen for this test is iText. iText contains a sufficient amount of comments, and the contents of those comments carry enough Problem and Program domain information, so this program can be explored for PC purposes using its comments.
  • 35. Darius Experiment with iText
  • 36. Darius Experiment with iText. How is a PDF document created?
  • 37. Darius Experiment with iText. How to write a PDF document into an output stream?
  • 38. Darius Experiment with iText. How to add a title to a PDF?
  • 39. Darius Experiment with iText. By going deeper in the searches and using the information discovered in each executed query, the programmer can incrementally build knowledge of the software. The programmer should be able to figure out the implementation of every concept in the source code, and the relations among them, by using the information present in comments.
  • 40. Outline: (1) Darius Comment Evaluator: Preliminary study; (2) Darius Concept Locator: Experiment; (3) Conclusion
  • 41. Conclusion. Do real-world programs actually contain enough meaningful comments to justify the analysis effort and the approach proposed? Using simple but effective queries, the process of locating concepts using comment information is faster than the complex task of reading the whole source code of the program. Darius shows the potential value for comprehension that comments possess. As future work: questionnaires will be made to understand how a programmer would deal with Darius; Darius will be developed as a plugin for an IDE (e.g. Eclipse).