SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Delving (Smalltalk) Source Code
   with Formal Concept Analysis



 Pr. Kim Mens           Dr. Tom Tourwé
 INGI / UCL               SEN / CWI


      Monday, September 6, 2004
Overview
     Research goal
 

     A crash course on formal concept analysis
 

     Delving Smalltalk source code with FCA
 

     Experiments
 

     Results
 

     Conclusion
 




September 6, 2004         ESUG 2004 Research Track   2
Research Goal
     A lightweight source-code mining tool
 
      –  get “initial understanding” of structure of software system
      –  detect recurring patterns in the source code
     Formal concept analysis (FCA)
 
      –  A mathematical technique
      –  Known applications in data analysis and knowledge processing
     Can we use FCA to delve code for indications of patterns?
 
      –    Coding conventions
      –    Programming idioms and design patterns
      –    Opportunities for refactoring
      –    Relevant domain concepts




September 6, 2004                  ESUG 2004 Research Track             3
A crash course on FCA — example

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++

    Java
                            Find relevant taxonomy of
                              programming languages
Smalltalk

                         based on their common properties
 Scheme

  Prolog


September 6, 2004                      ESUG 2004 Research Track                      4
A crash course on FCA — theory
A.  Starts from
      –  a set of elements
      –  a set of properties of those elements
      –  incidence table
B.  Determines concepts
      –  Maximal groups of elements and properties
      –  Group:
             •  Every element of the concept has those properties
             •  Every property of the concept holds for those elements
      –  Maximal
             •  No other element (outside the concept) has those same properties
             •  No other property (outside the concept) is shared by all elements
C.  Organizes these concepts in a lattice structure


September 6, 2004                      ESUG 2004 Research Track                 5
A. Incidence table

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                      6
B. Concept 1

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                      7
B. Concept 2

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                      8
B. Concept 3

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                      9
B. More concepts …

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                   10
B. More concepts …

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                   11
B. More concepts …

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                   12
B. Concepts

                    object-                                       static   dynamic
                               functional        logic
                    oriented                                      typing    typing

     C++               X           -               -                X         -

    Java               X           -               -                X         -

Smalltalk              X           -               -                -         X


 Scheme                -           X               -                -         X

  Prolog               -           -               X                -         X


September 6, 2004                      ESUG 2004 Research Track                   13
C. Concept Lattice




September 6, 2004   ESUG 2004 Research Track   14
Delving ST source code with FCA
     Elements : classes, methods, argument names
 

     Properties : substrings of classes, methods, …
 

                                                     The “Foo” concept


                     Foo             Zork
                                 import: aFoo

                    Bar
               asFoo: anObject



September 6, 2004                 ESUG 2004 Research Track               15
Foo            Zork
                                                                             import: aFoo




Delving source code
                                                                Bar
                                                           asFoo: anObject




1.  Generate the formal context
      •  Elements, properties & incidence relation
2.  Concept Analysis
      •  Calculate the formal concepts
      •  Organize them into a concept lattice
3.  Filtering
      •  Remove irrelevant concepts (false positives, noise,
         useless, …)
4.  Classify, combine and annotate concepts
      •  In a way that is more easy for a software engineer to
         interpret



September 6, 2004               ESUG 2004 Research Track                                    16
DelfSTof, our Code Delving Tool




September 6, 2004   ESUG 2004 Research Track   17
Foo            Zork
                                                                                     import: aFoo




1. Generate formal context
                                                                        Bar
                                                                   asFoo: anObject




     We want to group elements that share a substring
 
     As elements we collect
 
      –  all classes, methods and parameters
      –  in some package(s) of interest
     As properties : “relevant” substrings of element names
 
      –  Normalisation :
             •  extract terms based on where uppercases occur
             •  convert to lower case and remove special characters like ‘:’
             •  QuotedCodeConstant                 { quoted, code, constant }
                                          →
      –  Elimination of stopwords : with, do, object
      –  Stemming : reduce words to their root
     Incidence relation : An element has a certain property if
 
      –  It has the substring in its name



September 6, 2004                       ESUG 2004 Research Track                                    18
Foo              Zork
                                                                                                         import: aFoo




 2. Concept Analysis
                                                                                         Bar
                                                                                    asFoo: anObject




                                       unify   index       env     source message   functor           variable          …


  Object>>unifyWithObject: inEnv:
                                        X       X           X        X         -                         -              …
   myIndex: hisIndex: inSource:

Variable>>unifyWithMessageFunctor:
                                        X       X           X        X         X      X                  -              …
 inEnv: myIndex: hisIndex: inSource:

  AbstractTerm>>unifyWith: inEnv:
                                        X       X           X        X         -       -                 -              …
    myIndex: hisIndex: inSource:

  AbstractTerm>>unifyWithVariable:
                                        X       X           X        X         -      X                  X              …
 inEnv: myIndex: hisIndex: inSource:


                 …                      X       X           X        X         …      …                 …               …



 September 6, 2004                                  ESUG 2004 Research Track                                            19
Foo              Zork
                                                                                                         import: aFoo




 2. Concept Analysis
                                                                                         Bar
                                                                                    asFoo: anObject




                                       unify   index       env     source message   functor           variable          …


  Object>>unifyWithObject: inEnv:
                                        X       X           X        X         -                         -              …
   myIndex: hisIndex: inSource:

Variable>>unifyWithMessageFunctor:
                                        X       X           X        X         X      X                  -              …
 inEnv: myIndex: hisIndex: inSource:

  AbstractTerm>>unifyWith: inEnv:
                                        X       X           X        X         -       -                 -              …
    myIndex: hisIndex: inSource:

  AbstractTerm>>unifyWithVariable:
                                        X       X           X        X         -      X                  X              …
 inEnv: myIndex: hisIndex: inSource:


                 …                      X       X           X        X         …      …                 …               …



 September 6, 2004                                  ESUG 2004 Research Track                                            20
Foo            Zork
                                                                                             import: aFoo




3. Filtering
                                                                                Bar
                                                                           asFoo: anObject




     Preprocessing to filter irrelevant properties :
 
      –  with little meaning : “do”, “with”, “for”, “from”, “the”, “ifTrue”, …
      –  too small (< 3 chars)
      –  ignore plurals, uppercase and colons
     Extra filtering
 
      –  Drop top & bottom concept when empty
      –  Drop concepts with two elements are less
      –  Drop concepts that group only classes
     More filtering needed (ongoing work)
 
      –    Recombine substrings belonging together
      –    Require some minimal coverage of element name by properties
      –    Concepts higher in the lattice may be more relevant (more properties)
      –    Avoid redundancy in discovered concepts
             •  Make better use of the lattice structure (Now it is “flattened”)




September 6, 2004                           ESUG 2004 Research Track                                        21
Foo



4. Classification,
                                                                                                Zork
                                                                                             import: aFoo
                                                                                Bar
                                                                           asFoo: anObject



Combination & Annotation
     Annotate concepts with their properties
 
      –  i.e. with the substring(s) shared by their elements
     Classification
 
      –  Single class concepts
             •  Elements are methods (or their parameters) in that class
      –  Hierarchy concepts
             •  Group classes, methods and parameters in same class hierarchy
             •  Annotate concept with root of hierarchy
             •  Annotate methods with implementing class
      –  Crosscutting concepts
             •  When two different class hierarchies are involved
     Combine concepts
 
      –  that belong together (subconcept relationship)
     Group methods
 
      –  belonging to the same class



September 6, 2004                          ESUG 2004 Research Track                                         22
Foo            Zork
                                                                                             import: aFoo




Quantitative results
                                                                                Bar
                                                                           asFoo: anObject




Case study          #elements #properties              #raw          #filtered         time (sec)

DelfSTof             802 (135)                247         650              131                               5
StarBrowser           731 (52)                352         740              115                               7
Soul                1488 (111)                438       1206              284                               22
CodeCrawler          1370 (93)                477        1419             327                               24
Ref.Browser         4834 (271)                736       4228             1243                        447

     Time to compute = a few seconds / minutes
 

      properties  <  elements  is a good sign
 

     Still too much concepts remain after filtering
 
       Upperlimit: theoretical < 2min(#elements, #properties); experimental < #elements


September 6, 2004                         ESUG 2004 Research Track                                          23
Foo



Discovered “indications”
                                                                                   Zork
                                                                                import: aFoo
                                                                   Bar
                                                              asFoo: anObject



of patterns
     Code duplication
 

     Design patterns
 
      –  Visitor, Abstract Factory, Builder, Observer
     Programming idioms
 
      –  Accessing methods, chained messages, delegating methods,
         polymorphism
     Relevant domain concepts
 
      –  Correspond to frequently occurring properties
      –  “Unification”, “Bindings”, “Horn clauses”, “resolution”
     Opportunities for refactoring
 

     Some crosscutting concerns
 




September 6, 2004                  ESUG 2004 Research Track                                    24
Conclusion
     Current status : feasibility study
 
       –  Approach produced relevant results
       –  Efficiency is acceptable
       –  Tool needs refinement
              •  More advanced filtering ; extra checking a posteriori
     Future work : applying FCA to delve source code for
 
       –  aspects and crosscutting concerns
              •  based on “generic parse trees”
              •  by using an incidence relation that represents “message sends”
       –  refactoring opportunities
       –  Both Smalltalk and Java source code




September 6, 2004                        ESUG 2004 Research Track                 25

Contenu connexe

Plus de kim.mens

Software Maintenance and Evolution
Software Maintenance and EvolutionSoftware Maintenance and Evolution
Software Maintenance and Evolutionkim.mens
 
Context-Oriented Programming
Context-Oriented ProgrammingContext-Oriented Programming
Context-Oriented Programmingkim.mens
 
Software Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented ProgrammingSoftware Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented Programmingkim.mens
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smellskim.mens
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristicskim.mens
 
Software Patterns
Software PatternsSoftware Patterns
Software Patternskim.mens
 
Code Refactoring
Code RefactoringCode Refactoring
Code Refactoringkim.mens
 
Domain Modelling
Domain ModellingDomain Modelling
Domain Modellingkim.mens
 
Object-Oriented Application Frameworks
Object-Oriented Application FrameworksObject-Oriented Application Frameworks
Object-Oriented Application Frameworkskim.mens
 
Towards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation FrameworkTowards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation Frameworkkim.mens
 
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty ApproachesTowards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty Approacheskim.mens
 
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software EngineeringBreaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineeringkim.mens
 
Context-oriented programming
Context-oriented programmingContext-oriented programming
Context-oriented programmingkim.mens
 
Basics of reflection
Basics of reflectionBasics of reflection
Basics of reflectionkim.mens
 
Advanced Reflection in Java
Advanced Reflection in JavaAdvanced Reflection in Java
Advanced Reflection in Javakim.mens
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in javakim.mens
 
Reflection in Ruby
Reflection in RubyReflection in Ruby
Reflection in Rubykim.mens
 
Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to Rubykim.mens
 
Introduction to Smalltalk
Introduction to SmalltalkIntroduction to Smalltalk
Introduction to Smalltalkkim.mens
 
A gentle introduction to reflection
A gentle introduction to reflectionA gentle introduction to reflection
A gentle introduction to reflectionkim.mens
 

Plus de kim.mens (20)

Software Maintenance and Evolution
Software Maintenance and EvolutionSoftware Maintenance and Evolution
Software Maintenance and Evolution
 
Context-Oriented Programming
Context-Oriented ProgrammingContext-Oriented Programming
Context-Oriented Programming
 
Software Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented ProgrammingSoftware Reuse and Object-Oriented Programming
Software Reuse and Object-Oriented Programming
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smells
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristics
 
Software Patterns
Software PatternsSoftware Patterns
Software Patterns
 
Code Refactoring
Code RefactoringCode Refactoring
Code Refactoring
 
Domain Modelling
Domain ModellingDomain Modelling
Domain Modelling
 
Object-Oriented Application Frameworks
Object-Oriented Application FrameworksObject-Oriented Application Frameworks
Object-Oriented Application Frameworks
 
Towards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation FrameworkTowards a Context-Oriented Software Implementation Framework
Towards a Context-Oriented Software Implementation Framework
 
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty ApproachesTowards a Taxonomy of Context-Aware Software Variabilty Approaches
Towards a Taxonomy of Context-Aware Software Variabilty Approaches
 
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software EngineeringBreaking the Walls: A Unified Vision on Context-Oriented Software Engineering
Breaking the Walls: A Unified Vision on Context-Oriented Software Engineering
 
Context-oriented programming
Context-oriented programmingContext-oriented programming
Context-oriented programming
 
Basics of reflection
Basics of reflectionBasics of reflection
Basics of reflection
 
Advanced Reflection in Java
Advanced Reflection in JavaAdvanced Reflection in Java
Advanced Reflection in Java
 
Basics of reflection in java
Basics of reflection in javaBasics of reflection in java
Basics of reflection in java
 
Reflection in Ruby
Reflection in RubyReflection in Ruby
Reflection in Ruby
 
Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to Ruby
 
Introduction to Smalltalk
Introduction to SmalltalkIntroduction to Smalltalk
Introduction to Smalltalk
 
A gentle introduction to reflection
A gentle introduction to reflectionA gentle introduction to reflection
A gentle introduction to reflection
 

Dernier

Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 

Dernier (20)

Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 

Delving Source Code with Formal Concept Analysis

  • 1. Delving (Smalltalk) Source Code with Formal Concept Analysis Pr. Kim Mens Dr. Tom Tourwé INGI / UCL SEN / CWI Monday, September 6, 2004
  • 2. Overview Research goal   A crash course on formal concept analysis   Delving Smalltalk source code with FCA   Experiments   Results   Conclusion   September 6, 2004 ESUG 2004 Research Track 2
  • 3. Research Goal A lightweight source-code mining tool   –  get “initial understanding” of structure of software system –  detect recurring patterns in the source code Formal concept analysis (FCA)   –  A mathematical technique –  Known applications in data analysis and knowledge processing Can we use FCA to delve code for indications of patterns?   –  Coding conventions –  Programming idioms and design patterns –  Opportunities for refactoring –  Relevant domain concepts September 6, 2004 ESUG 2004 Research Track 3
  • 4. A crash course on FCA — example object- static dynamic functional logic oriented typing typing C++ Java Find relevant taxonomy of programming languages Smalltalk based on their common properties Scheme Prolog September 6, 2004 ESUG 2004 Research Track 4
  • 5. A crash course on FCA — theory A.  Starts from –  a set of elements –  a set of properties of those elements –  incidence table B.  Determines concepts –  Maximal groups of elements and properties –  Group: •  Every element of the concept has those properties •  Every property of the concept holds for those elements –  Maximal •  No other element (outside the concept) has those same properties •  No other property (outside the concept) is shared by all elements C.  Organizes these concepts in a lattice structure September 6, 2004 ESUG 2004 Research Track 5
  • 6. A. Incidence table object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 6
  • 7. B. Concept 1 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 7
  • 8. B. Concept 2 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 8
  • 9. B. Concept 3 object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 9
  • 10. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 10
  • 11. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 11
  • 12. B. More concepts … object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 12
  • 13. B. Concepts object- static dynamic functional logic oriented typing typing C++ X - - X - Java X - - X - Smalltalk X - - - X Scheme - X - - X Prolog - - X - X September 6, 2004 ESUG 2004 Research Track 13
  • 14. C. Concept Lattice September 6, 2004 ESUG 2004 Research Track 14
  • 15. Delving ST source code with FCA Elements : classes, methods, argument names   Properties : substrings of classes, methods, …   The “Foo” concept Foo Zork import: aFoo Bar asFoo: anObject September 6, 2004 ESUG 2004 Research Track 15
  • 16. Foo Zork import: aFoo Delving source code Bar asFoo: anObject 1.  Generate the formal context •  Elements, properties & incidence relation 2.  Concept Analysis •  Calculate the formal concepts •  Organize them into a concept lattice 3.  Filtering •  Remove irrelevant concepts (false positives, noise, useless, …) 4.  Classify, combine and annotate concepts •  In a way that is more easy for a software engineer to interpret September 6, 2004 ESUG 2004 Research Track 16
  • 17. DelfSTof, our Code Delving Tool September 6, 2004 ESUG 2004 Research Track 17
  • 18. Foo Zork import: aFoo 1. Generate formal context Bar asFoo: anObject We want to group elements that share a substring   As elements we collect   –  all classes, methods and parameters –  in some package(s) of interest As properties : “relevant” substrings of element names   –  Normalisation : •  extract terms based on where uppercases occur •  convert to lower case and remove special characters like ‘:’ •  QuotedCodeConstant { quoted, code, constant } → –  Elimination of stopwords : with, do, object –  Stemming : reduce words to their root Incidence relation : An element has a certain property if   –  It has the substring in its name September 6, 2004 ESUG 2004 Research Track 18
  • 19. Foo Zork import: aFoo 2. Concept Analysis Bar asFoo: anObject unify index env source message functor variable … Object>>unifyWithObject: inEnv: X X X X - - … myIndex: hisIndex: inSource: Variable>>unifyWithMessageFunctor: X X X X X X - … inEnv: myIndex: hisIndex: inSource: AbstractTerm>>unifyWith: inEnv: X X X X - - - … myIndex: hisIndex: inSource: AbstractTerm>>unifyWithVariable: X X X X - X X … inEnv: myIndex: hisIndex: inSource: … X X X X … … … … September 6, 2004 ESUG 2004 Research Track 19
  • 20. Foo Zork import: aFoo 2. Concept Analysis Bar asFoo: anObject unify index env source message functor variable … Object>>unifyWithObject: inEnv: X X X X - - … myIndex: hisIndex: inSource: Variable>>unifyWithMessageFunctor: X X X X X X - … inEnv: myIndex: hisIndex: inSource: AbstractTerm>>unifyWith: inEnv: X X X X - - - … myIndex: hisIndex: inSource: AbstractTerm>>unifyWithVariable: X X X X - X X … inEnv: myIndex: hisIndex: inSource: … X X X X … … … … September 6, 2004 ESUG 2004 Research Track 20
  • 21. Foo Zork import: aFoo 3. Filtering Bar asFoo: anObject Preprocessing to filter irrelevant properties :   –  with little meaning : “do”, “with”, “for”, “from”, “the”, “ifTrue”, … –  too small (< 3 chars) –  ignore plurals, uppercase and colons Extra filtering   –  Drop top & bottom concept when empty –  Drop concepts with two elements are less –  Drop concepts that group only classes More filtering needed (ongoing work)   –  Recombine substrings belonging together –  Require some minimal coverage of element name by properties –  Concepts higher in the lattice may be more relevant (more properties) –  Avoid redundancy in discovered concepts •  Make better use of the lattice structure (Now it is “flattened”) September 6, 2004 ESUG 2004 Research Track 21
  • 22. Foo 4. Classification, Zork import: aFoo Bar asFoo: anObject Combination & Annotation Annotate concepts with their properties   –  i.e. with the substring(s) shared by their elements Classification   –  Single class concepts •  Elements are methods (or their parameters) in that class –  Hierarchy concepts •  Group classes, methods and parameters in same class hierarchy •  Annotate concept with root of hierarchy •  Annotate methods with implementing class –  Crosscutting concepts •  When two different class hierarchies are involved Combine concepts   –  that belong together (subconcept relationship) Group methods   –  belonging to the same class September 6, 2004 ESUG 2004 Research Track 22
  • 23. Foo Zork import: aFoo Quantitative results Bar asFoo: anObject Case study #elements #properties #raw #filtered time (sec) DelfSTof 802 (135) 247 650 131 5 StarBrowser 731 (52) 352 740 115 7 Soul 1488 (111) 438 1206 284 22 CodeCrawler 1370 (93) 477 1419 327 24 Ref.Browser 4834 (271) 736 4228 1243 447 Time to compute = a few seconds / minutes    properties  <  elements  is a good sign   Still too much concepts remain after filtering   Upperlimit: theoretical < 2min(#elements, #properties); experimental < #elements September 6, 2004 ESUG 2004 Research Track 23
  • 24. Foo Discovered “indications” Zork import: aFoo Bar asFoo: anObject of patterns Code duplication   Design patterns   –  Visitor, Abstract Factory, Builder, Observer Programming idioms   –  Accessing methods, chained messages, delegating methods, polymorphism Relevant domain concepts   –  Correspond to frequently occurring properties –  “Unification”, “Bindings”, “Horn clauses”, “resolution” Opportunities for refactoring   Some crosscutting concerns   September 6, 2004 ESUG 2004 Research Track 24
  • 25. Conclusion Current status : feasibility study   –  Approach produced relevant results –  Efficiency is acceptable –  Tool needs refinement •  More advanced filtering ; extra checking a posteriori Future work : applying FCA to delve source code for   –  aspects and crosscutting concerns •  based on “generic parse trees” •  by using an incidence relation that represents “message sends” –  refactoring opportunities –  Both Smalltalk and Java source code September 6, 2004 ESUG 2004 Research Track 25