Aspect Mining - An emerging research domain

Prof. Kim Mens
Département d’Ingénierie Informatique (INGI)
Université catholique de Louvain (UCL)

http://www.info.ucl.ac.be/~km


Aspect Mining
an emerging research domain
• Motivation
• Preliminaries
• formal concept analysis
• fan-in
• Three aspect mining techniques
• identifier analysis
• dynamic analysis
• fan-in analysis
• comparison
• Conclusion


Need for aspect mining

• Aspects offer a better separation of concerns
by solving the problems of “scattering” and “tangling”

• But what if we have (non AO) legacy code
how can we migrate it to an AOP solution?

• Need for aspect mining
• aspect identification : how to find all code relevant to some
crosscutting concern
• aspect refactoring : and turn it into an aspect
• with the help of automated software tools


Three open research
problems in AOP

“Aspect Mining”


Formal Concept Analysis (FCA)
• Starts from
• a set of elements
• a set of properties of those elements

• Determines concepts
• Maximal groups of elements and properties
• Group:
• Every element of the concept has those properties
• Every property of the concept holds for those elements
• Maximal
• No other element (outside the concept) has those same properties
• No other property (outside the concept) is shared by all those elements

FCA : Elements and Properties

            object-oriented   functional   logic   static typing   dynamic typing
C++                X              -          -           X               -
Java               X              -          -           X               -
Smalltalk          X              -          -           -               X
Scheme             -              X          -           -               X
Prolog             -              -          X           -               X

FCA : Concepts

[Figure sequence: the elements-and-properties table, repeated with a different concept highlighted on each slide]
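The definition of a concept can be tried out on the five-language table. The following is a minimal sketch, not the algorithm used by any of the tools in this talk: it closes every subset of elements, which is exponential and only acceptable for toy contexts. The class name `FcaSketch` and its method names are assumptions made for illustration.

```java
import java.util.*;

// Brute-force formal concept analysis on a tiny context (illustrative names only).
class FcaSketch {
    // Properties shared by every element of the set (all properties for the empty set).
    static Set<String> commonProperties(Set<String> elements,
                                        Map<String, Set<String>> context,
                                        Set<String> allProperties) {
        Set<String> shared = new TreeSet<>(allProperties);
        for (String e : elements) shared.retainAll(context.get(e));
        return shared;
    }

    // Elements that have every property of the set.
    static Set<String> elementsWith(Set<String> properties, Map<String, Set<String>> context) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : context.entrySet())
            if (entry.getValue().containsAll(properties)) result.add(entry.getKey());
        return result;
    }

    // Returns every concept as a pair (elements -> properties).
    static Map<Set<String>, Set<String>> concepts(Map<String, Set<String>> context) {
        Set<String> allProperties = new TreeSet<>();
        context.values().forEach(allProperties::addAll);
        List<String> elements = new ArrayList<>(context.keySet());
        Map<Set<String>, Set<String>> result = new LinkedHashMap<>();
        for (int mask = 0; mask < (1 << elements.size()); mask++) {
            Set<String> subset = new TreeSet<>();
            for (int i = 0; i < elements.size(); i++)
                if ((mask & (1 << i)) != 0) subset.add(elements.get(i));
            // Closing the subset yields a maximal group of elements and properties.
            Set<String> intent = commonProperties(subset, context, allProperties);
            Set<String> extent = elementsWith(intent, context);
            result.put(extent, intent);
        }
        return result;
    }

    // The language table from the slides.
    static Map<String, Set<String>> languageContext() {
        Map<String, Set<String>> context = new TreeMap<>();
        context.put("C++", Set.of("object-oriented", "static typing"));
        context.put("Java", Set.of("object-oriented", "static typing"));
        context.put("Smalltalk", Set.of("object-oriented", "dynamic typing"));
        context.put("Scheme", Set.of("functional", "dynamic typing"));
        context.put("Prolog", Set.of("logic", "dynamic typing"));
        return context;
    }

    public static void main(String[] args) {
        concepts(languageContext()).forEach(
            (extent, intent) -> System.out.println(extent + " : " + intent));
    }
}
```

On this context the sketch yields eight concepts, including the top concept (all languages, no shared property) and the bottom concept (no language, all properties).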

FCA : Concept Lattice

Concept hierarchy based on the containment relation between concepts.

“Mined” Concepts
[Figure: concept lattice, listed from top to bottom]
Properties shared by all languages (none)
OO languages ; Languages with dynamic typing
Statically typed OO languages ; Dynamically typed OO languages ; Dynamically typed functional languages ; Dynamically typed logic languages
Languages having all properties (none)

Concept Lattice
with sparse labeling

[Figure: concept lattice with sparse labels]
object-oriented ; dynamic typing
Java, C++ (static typing) ; Smalltalk ; Scheme (functional) ; Prolog (logic)

The labeling algorithm detects for each concept its most specific elements and properties.


Fan-in
• Fan-in metric [Henderson-Sellers]
counts the number of locations from which control is passed into a module (or method)
• Fan-in metric for OOP
• applied to method M
• number of distinct method bodies that can invoke M

• What about polymorphism?
• one call-site can affect the fan-in of several methods
• a call to M contributes to the fan-in of M but also of all its
overriding methods as well as all methods it overrides
• this interpretation corresponds to the standard behavior of the
“search for references” feature of the Eclipse IDE

Fan-In Example
Fan-in of a method m = the number of distinct method bodies that can invoke m

interface A {
    public void m();
}

class B implements A {
    public void m() {}
}

class C1 extends B {
    public void m() {}
}

class C2 extends B {
    public void m() { super.m(); }
}

class D {
    void f1(A a) { a.m(); }
    void f2(B b) { b.m(); }
    void f3(C1 c) { c.m(); }
}

Method   Caller set                  Fan-in
A.m      {D.f1, D.f2, D.f3}          3
B.m      {D.f1, D.f2, D.f3, C2.m}    4
C1.m     {D.f1, D.f2, D.f3}          3
C2.m     {D.f1, D.f2}                2
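The caller sets in the example can be recomputed mechanically. The sketch below hard-codes the hierarchy and call sites of the example. The data model (`Call`, `PARENT`, `TYPES`) is hypothetical, and the rule that a `super` call counts only for its statically bound target is an assumption inferred from the table, where C2.m appears only in the caller set of B.m.

```java
import java.util.*;

// Polymorphism-aware fan-in on the A / B / C1 / C2 example (illustrative data model).
class FanInSketch {
    // One call site: calling method body, static receiver type, and whether it is a super call.
    static final class Call {
        final String caller;
        final String targetType;
        final boolean isSuperCall;
        Call(String caller, String targetType, boolean isSuperCall) {
            this.caller = caller;
            this.targetType = targetType;
            this.isSuperCall = isSuperCall;
        }
    }

    // Direct supertype of each type; A is the root. Every type declares m().
    static final Map<String, String> PARENT = Map.of("B", "A", "C1", "B", "C2", "B");
    static final List<String> TYPES = List.of("A", "B", "C1", "C2");

    static boolean isSubtypeOf(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t))
            if (t.equals(ancestor)) return true;
        return false;
    }

    static Map<String, Set<String>> callerSets(List<Call> calls) {
        Map<String, Set<String>> callers = new TreeMap<>();
        for (String type : TYPES) callers.put(type + ".m", new TreeSet<>());
        for (Call call : calls) {
            for (String type : TYPES) {
                // A call to M counts for M, for the methods overriding M,
                // and for the methods M overrides.
                boolean related = isSubtypeOf(type, call.targetType)
                        || isSubtypeOf(call.targetType, type);
                // Assumption: a super call is statically bound and counts for its target only.
                boolean counts = call.isSuperCall ? type.equals(call.targetType) : related;
                if (counts) callers.get(type + ".m").add(call.caller);
            }
        }
        return callers;
    }

    static List<Call> exampleCalls() {
        return List.of(
            new Call("D.f1", "A", false),
            new Call("D.f2", "B", false),
            new Call("D.f3", "C1", false),
            new Call("C2.m", "B", true));
    }

    public static void main(String[] args) {
        callerSets(exampleCalls()).forEach(
            (method, set) -> System.out.println(method + " " + set + " " + set.size()));
    }
}
```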


Aspect Mining Techniques
• Aspect mining is an emerging research domain
• Several aspect mining techniques are being proposed
• Based on pattern matching, clone detection, logic
reasoning, concept analysis, clustering, fan-in analysis,
program slicing...
• We will focus on three specific techniques
• Two based on formal concept analysis :
• Identifier analysis
• Dynamic analysis
• One based on fan-in metric :
• Fan-in analysis


Some references
• A qualitative comparison of three aspect mining techniques
M. Ceccato, M. Marin, K. Mens, L. Moonen, P. Tonella, T. Tourwé
Int’l Working Conference on Program Comprehension 2005
• Mining aspectual views using formal concept analysis
T. Tourwé, K. Mens
Int’l workshop on Source-Code Analysis and Manipulation 2004
• Aspect mining through the formal concept analysis of execution traces
P. Tonella, M. Ceccato
IEEE Working Conference on Reverse Engineering 2004
• Identifying aspects using fan-in analysis
M. Marin, A. van Deursen, L. Moonen
IEEE Working Conference on Reverse Engineering 2004


3 Aspect Mining Techniques
• Identifier analysis
• Approach : Use FCA to group classes/methods with similar names
• Motivation : In the absence of real AOP support, (OO) developers often
rely on naming conventions

• Dynamic analysis
• Approach : Use FCA to relate methods to the use case scenarios in
which they appear
• Motivation : Methods used in different scenarios may represent a
crosscutting concern

• Fan-in analysis
• Approach : Look for methods with a high fan-in value
• Motivation : Methods that are being invoked from “all over the
place” indicate a kind of scattering


Case study
• Applied each technique to same case study
• JHotDraw
• Framework for 2D graphics, ~ 18,000 NCLOC
• Open source (jhotdraw.org)
• Rather well designed (design patterns)
• shows relevance of aspect mining even for well-designed
cases

• Qualitative comparison of identified aspects


“Identifier analysis”
• Idea: Use FCA to group program entities with
similar names
• Elements : classes and methods
• Properties : substrings of the elements’ names
• Only considering crosscutting groups

• Approach relies on naming conventions
• Primary means to associate related but distant program
entities, in the absence of designated AOP constructs
• Especially for object-oriented software
• polymorphism, intention-revealing names, design patterns, ...

• Joint work with Dr. Tom Tourwé (CWI)


Identifier analysis approach
1. Generate the formal context
Elements, properties & incidence relation
2. Concept Analysis
Calculate the formal concepts
(& organise them into a concept lattice)
3. Filtering
Remove irrelevant concepts
• too small
• not scattered
4. Manually inspect concepts
Are they really an aspect or crosscutting concern?


1. Generate formal context
• Elements
• all classes and methods in analyzed program
• except test classes, accessor methods (produce too much
noise)

• Properties
• all “relevant” substrings of the elements’ names
• Based on where uppercases occur in an element’s name
• createUndoActivity → { create, undo, activity }
• Filter substrings that produce too much noise
• Uses stemming algorithm to map substrings to same ‘stem’

• Incidence relation : an element has a property if it
has the substring in its name
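
The generation of properties from names can be sketched as follows. This is an illustration under simplifying assumptions: it only splits identifiers where an uppercase letter follows a lowercase letter or digit, and it omits the noise filtering and stemming mentioned above. `IdentifierSplitter` and its methods are hypothetical names, not part of DelfSTof.

```java
import java.util.*;

// Derives the "substring" properties of an element from its name (illustrative only).
class IdentifierSplitter {
    // Splits a camel-case identifier where uppercase letters occur and lowercases the parts.
    static List<String> substrings(String identifier) {
        List<String> parts = new ArrayList<>();
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])"))
            parts.add(part.toLowerCase(Locale.ROOT));
        return parts;
    }

    // Incidence relation: an element has a property if the substring occurs in its name.
    static Map<String, Set<String>> formalContext(List<String> names) {
        Map<String, Set<String>> context = new LinkedHashMap<>();
        for (String name : names)
            context.put(name, new LinkedHashSet<>(substrings(name)));
        return context;
    }

    public static void main(String[] args) {
        System.out.println(substrings("createUndoActivity"));
        System.out.println(formalContext(List.of("figureRequestRemove", "figureRequestUpdate")));
    }
}
```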

2. Concept Analysis
groups entities with similar identifiers

                                             figure  drawing  request  remove  update  change  event  …
drawingRequestUpdate(DrawingChangeEvent e)     -        X        X       -       X       -       -    …
figureRequestRemove(FigureChangeEvent e)       X        -        X       X       -       -       -    …
figureRequestUpdate(FigureChangeEvent e)       X        -        X       -       X       -       -    …
figureRequestRemove(FigureChangeEvent e)       X        -        X       X       -       -       -    …
figureRequestUpdate(FigureChangeEvent e)       X        -        X       -       X       -       -    …
…                                              …        …        …       …       …       …       …    …

3. Filtering
• Irrelevant elements and properties already filtered
• substrings with little meaning or that are too small
• test classes and methods, accessor methods

• Extra filtering
• Drop top & bottom concept when empty
• Drop concepts with too few elements (fewer than 4)
• Drop concepts where classes and methods are not ‘scattered’
• should be in at least 2 different unrelated class hierarchies
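
The size and scattering filters can be expressed as a simple predicate. The sketch below assumes that each element of a concept has already been mapped to the root of its class hierarchy; that mapping, the class `ConceptFilter` and the element names in `main` are hypothetical. Dropping empty top and bottom concepts is not shown.

```java
import java.util.*;

// Filter for concepts that are too small or not scattered (illustrative only).
class ConceptFilter {
    static final int MIN_ELEMENTS = 4; // threshold used in the talk

    // Input: each element of the concept mapped to the root of its class hierarchy.
    static boolean keep(Map<String, String> hierarchyRootOfElement) {
        if (hierarchyRootOfElement.size() < MIN_ELEMENTS) return false;
        // Scattered: elements come from at least 2 unrelated class hierarchies.
        Set<String> roots = new HashSet<>(hierarchyRootOfElement.values());
        return roots.size() >= 2;
    }

    public static void main(String[] args) {
        // Hypothetical concept with four elements spread over two hierarchies.
        Map<String, String> concept = Map.of(
            "X1.load", "RootX", "X2.load", "RootX",
            "Y1.load", "RootY", "Y2.load", "RootY");
        System.out.println(keep(concept));
    }
}
```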

4. Manually inspect concepts
• Use DelfSTof, our Conceptual Code Mining tool
• Browse code of concept elements
• Does a discovered concept really represent an aspect or
crosscutting concern?
• Is the code really similar?
• Or do the elements ‘accidentally’ have a similar name?
• Group concepts that seem to address a similar concern
• Persistence : file / storable / load / register

DelfSTof, our Conceptual
Code Mining tool


Case study : JHotDraw
• 2193 elements and 507 properties
• 230 concepts were discovered in 31 seconds
• when using a threshold of 4 for minimum number of elements
• with threshold 10 : 100 concepts ; similar execution time

• 41 crosscutting concerns identified
• after (laborious) manual analysis of the concepts
• three kinds :
• traditional aspects (observer, undo, persistence)
• crosscutting business logic (drawing figures, moving figures)
• Java-specific concerns (iterating over collections)


Selection of results of identifier analysis experiment
Crosscutting concern   Concept(s)                            #elements           Some elements
Observer               change / check / listener / release   67 / 14 / 65 / 12   figureChanged(e) / checkDamage() / createDesktopListener()
Undo                   undo / redo                           53 / 14             createUndoActivity() / redo()
Visitor                visit                                 12                  visit(FigureVisitor)
Persistence            file / storable / load / register     15 / 5 / 8 / 7      registerFileFilters(c) / readStorable() / loadRegisteredImages
Drawing                draw                                  112                 draw(g)
Moving figures         move                                  36                  moveBy(x,y) / moveSelection(dx,dy)
Iterating              iterator                              5                   iterator() / listIterator()


“Dynamic analysis”

Aspect Mining through Formal Concept Analysis of Execution Traces

Based on a presentation by Mariano Ceccato & Paolo Tonella (ITC-irst)

Why execution traces?
• We are interested in mining those crosscutting concerns
that are associated with system requirements that are not well modularized
• Use-cases specify system requirements
• Execution traces are the result of use-case executions

[Figure: running the application on Scenario 1 (feature a), Scenario 2 (feature b), …, Scenario N (feature z) produces Trace 1, Trace 2, …, Trace N]

Rough trace analysis output
• Dynamic analysis produces a relation between :
• Elements = methods (computational units)
• Properties = scenarios (use-cases)
• Relation R = in scenario s the method m is executed

• The table can be too big to be manually inspected or analyzed
• We need a way to extract knowledge from it
• Use FCA (with sparse labeling)

[Table: scenarios (scenario1 to scenario3) against computational units (method1 to method4); an x marks each method executed in a scenario]

Dynamic Analysis
• The concept specific for a given feature is labeled by the
corresponding scenario.
• The most specific methods for a concept are the ones in its label.
• “Dynamic analysis” focuses on those concepts that have both
scenarios and methods in their labels

[Figure: FCA turns the scenario/method table (scen1 to scen3, meth1 to meth4) into a concept lattice with sparse labeling, with nodes Top, C3, C2, C1, C0 and Bottom]

Interpretation of the lattice
• A potential aspect is detected when the existing
modularity fails in dividing requirements
• It happens when:
• The specific methods for a use-case come from different classes
(scattering).
• The same class defines specific methods for more than one
use-case (tangling).

[Figure: the use-cases Draw, Documentation and Algorithm set against the classes DocOption, GraphCanvas, Options and GraphAlgorithm]
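
The two conditions can be checked directly once the use-case specific methods are known. The sketch below assumes they are given as a map from use-case to fully qualified method names of the form "Class.method". The class `LatticeInterpretation` and the data in `main` are hypothetical and not taken from the case study.

```java
import java.util.*;

// Checks the scattering and tangling conditions on use-case specific methods (illustrative only).
class LatticeInterpretation {
    // Assumes methods are written as "Class.method".
    static String classOf(String method) {
        return method.substring(0, method.indexOf('.'));
    }

    // Scattering: the specific methods of a use-case come from different classes.
    static Set<String> scatteredUseCases(Map<String, Set<String>> specificMethods) {
        Set<String> scattered = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : specificMethods.entrySet()) {
            Set<String> classes = new HashSet<>();
            for (String method : entry.getValue()) classes.add(classOf(method));
            if (classes.size() > 1) scattered.add(entry.getKey());
        }
        return scattered;
    }

    // Tangling: the same class defines specific methods for more than one use-case.
    static Set<String> tangledClasses(Map<String, Set<String>> specificMethods) {
        Map<String, Set<String>> useCasesPerClass = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : specificMethods.entrySet())
            for (String method : entry.getValue())
                useCasesPerClass.computeIfAbsent(classOf(method), k -> new HashSet<>())
                                .add(entry.getKey());
        Set<String> tangled = new TreeSet<>();
        useCasesPerClass.forEach((cls, useCases) -> {
            if (useCases.size() > 1) tangled.add(cls);
        });
        return tangled;
    }

    public static void main(String[] args) {
        // Hypothetical input: two use-cases and their specific methods.
        Map<String, Set<String>> specific = Map.of(
            "useCase1", Set.of("X.a", "Y.b"),
            "useCase2", Set.of("X.c"));
        System.out.println("scattered: " + scatteredUseCases(specific));
        System.out.println("tangled: " + tangledClasses(specific));
    }
}
```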

A small case study :
Dijkstra algorithm
• Small size application (1068 LOC) easy to analyze
• Many features: interesting case study




Discovered aspect : “locking”
- tangled
- scattered
- can be associated with a well-defined functionality
- but is not the main functionality (“algorithm”)

Methods listed on the slide:
GraphAlgorithm.unlock()
GraphAlgorithm.lock()
Options.unlock()
Options.lock()
GraphCanvas.lock()
GraphCanvas.unlock()
GraphCanvas.runalg()
GraphCanvas.detailsDijkstra(Graphics,int,int)
GraphCanvas.endstepalg(Graphics)
GraphCanvas.detailsalg(Graphics,int,int)
GraphCanvas.endstepDijkstra(Graphics)
GraphCanvas.stepalg()
GraphCanvas.reset()
GraphCanvas.nextstep()
GraphCanvas.clear()
GraphCanvas.showexample()
GraphCanvas.initalg()
GraphCanvas.run()

• 27 elements (use cases)
draw a rectangle, draw a line with the scribble tool, create a
connector between two figures, ...

• 1262 properties
JHotDraw methods executed by running the scenarios

• concept lattice with 1514 nodes
• 11 were classified as use-case specific aspects
• 56 as generic aspects
• these were revisited manually to determine plausible aspects
• that can be associated to a single well-defined functionality
• that is not the main functionality of the involved classes


Summary of results of dynamic analysis experiment
Aspect                  Concepts   Methods
Undo                    2          36
Bring to front          1          3
Send to back            1          3
Connect text            1          18
Persistence             1          30
Manage Handles          4          60
Move figure             1          7
Command executability   1          25
Connect figures         1          55
Figure observer         4          11
Add text                1          26
Add URL to figure       1          10

Methods listed on the slide:
CH.ifa.draw.figures:
EllipseFigure.basicMoveBy(int,int)
PolyLineFigure.basicMoveBy(int,int)
RectangleFigure.basicMoveBy(int,int)
RoundRectangleFigure.basicMoveBy(int,int)
TextFigure.moveBy(int,int)
CH.ifa.draw.standard:
AbstractFigure.moveBy(int,int)
DecoratorFigure.moveBy(int,int)


“Fan-in analysis”

Identifying Aspects using Fan-in Analysis

Based on a presentation by Marius Marin, Leon Moonen & Arie van Deursen (TUDelft)

Why fan-in analysis?
Fan-in analysis can help us to identify :
• Scattered code relying on some common functionality, e.g., persistence
• Tangled code, needed in various places, e.g. logging
• Some design patterns generate high fan-in values, e.g. Observer, Visitor

[Figure: Write to a storable output in JHotDraw; labels: write(StorableOutput), implementations, calls, StorableOutput.writeStorable(Storable)]

Identification steps
1. Automatic computation of fan-in metric
for all methods in analysed application
2. Filtering the results, removing
• Methods with fan-in < 10
• Getters & setters (name get*/set* and returns/sets a reference)
• Utility methods
3. Largely manual analysis (Eclipse plug-in)
• Call sites
• Naming conventions used
• Implementation
• Comments in source code
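
Step 2 can be sketched as a filter over methods whose fan-in has already been computed. The sketch recognises getters and setters by name only, which is coarser than the rule on the slide, and it takes the utility-method flag as given. `FanInFilter`, `MethodInfo` and the sample methods are hypothetical.

```java
import java.util.*;

// Filtering step of fan-in analysis (illustrative only).
class FanInFilter {
    static final int THRESHOLD = 10; // fan-in threshold used in the talk

    static final class MethodInfo {
        final String name;
        final int fanIn;
        final boolean utility; // assumed to be decided beforehand, e.g. by hand
        MethodInfo(String name, int fanIn, boolean utility) {
            this.name = name;
            this.fanIn = fanIn;
            this.utility = utility;
        }
    }

    // Name-based approximation: does not check that a reference is returned or set.
    static boolean looksLikeAccessor(String name) {
        return name.matches("(get|set)[A-Z].*");
    }

    // Keeps the methods that remain candidates for manual analysis.
    static List<String> candidateSeeds(List<MethodInfo> methods) {
        List<String> kept = new ArrayList<>();
        for (MethodInfo m : methods)
            if (m.fanIn >= THRESHOLD && !looksLikeAccessor(m.name) && !m.utility)
                kept.add(m.name);
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical methods and fan-in values, not results from JHotDraw.
        List<MethodInfo> methods = List.of(
            new MethodInfo("getName", 40, false),
            new MethodInfo("logEvent", 25, false),
            new MethodInfo("rarelyCalled", 3, false),
            new MethodInfo("formatDate", 30, true));
        System.out.println(candidateSeeds(methods));
    }
}
```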

• Threshold fan-in : 10
• 7% of total # methods kept
• other filters removed another 50%
• getters / setters
• utility methods

• 52% of remaining methods were manually classified as aspect seeds

Concern type                 #   Seed’s description
Consistent behavior          4   Methods implementing the consistent behavior shared by different callers, such as checking/refreshing figures/views affected by executing a command.
Contract enforcement         4   Method implementing a contract that needs to be enforced, such as checking the reference to the editor’s active view before executing a command.
Undo                         1   Methods checking whether a command is undoable/redoable + undo method in the superclass, which is invoked from the overriding methods in subclasses.
Persistence & resurrection   1   Methods implementing functionality common to persistent elements, such as read/write operations for primitive types wrappers (like Double, Integer) which are referenced by the scattered implementations of persistence/resurrection.
Command design pattern       1   The execute method in the command classes and command constructors.
Observer design pattern      1   The observers’ manipulation methods and notify methods in classes acting as subject.
Composite design pattern     2   The composite’s methods for manipulating child components, such as adding a new child.
Decorator design pattern     1   Methods in the decorator that pass the calls on to the decorated components.
Adapter design pattern       1   Methods that manipulate the reference from the adapter(Handle) to the adaptee(Figure).


Comparing the techniques
• Case study : JHotDraw
• Qualitative comparison of identified aspects
• Identified aspects : discovered / discarded / missed
• Quality and level of detail of discovered information
• Weaknesses and limitations of each of the techniques
• Complementarity and opportunities for combination

Results of the comparison (1)
• A selection of detected concerns in JHotDraw

Concern                                      Fan-in analysis   Identifier analysis   Dynamic analysis
Observer                                     +                 +                     +
Consistent Behavior / Contract Enforcement   +                 -                     -
Command Execution                            +                 +                     -
Bring to front / Send to back                -                 -                     +
Manage Handles                               -                 +                     +
Move Figures                                 + (discarded)     +                     +

• Observer, Undo, Persistence
• concerns reported by all 3 techniques
• correspond to well-known aspects
or functionality for which AOP is a natural solution
• surprisingly few concerns were detected by all techniques
=> need for a combined technique

• Bring to front / Send to back
• not detected by fan-in analysis (fan-in value too low)
• not detected by identifier analysis (#elements < threshold)
• detected by dynamic analysis
(corresponds to specific use-case scenario)

• Contract Enforcement / Consistent Behavior
• E.g., common functionality for checking preconditions
• Found by fan-in because many calls to ‘check’ methods
• Identifier analysis misses cases where there is no common naming
scheme
• Also missed by dynamic analysis

• Command execution
• could be seen as particular case of Contract Enforcement
• all execute methods need to check that an active view exists
• found by identifier analysis (and fan-in analysis)

• Manage Handles
• Partly detected by identifier analysis
• methods with identifier ‘handle’ appearing in their name
• missed specific methods north(), south(), east(), west()
• Partly detected by dynamic analysis
• detected specific methods
• and some (not all) of the handle methods
• Missed by fan-in analysis: calls too specific
• similar but distinct calls instead of one single called method
with high fan-in

Limitations of the techniques
• Identifier analysis
• fails in the absence of good naming conventions
• too many and too detailed results
=> better grouping / filtering needed

• Dynamic analysis
• fails for functionality present in all execution traces
• completeness depends on coverage by scenarios

• Fan-in analysis
• only finds crosscutting concerns with a large extent
• false negatives due to filtering

• All require quite some manual work

Combining the techniques
• Techniques rely on orthogonal properties
• suggests possibility of useful combinations, to

• Increase coverage
• by taking the union of discovered results (fan-in + dynamic)

• Complete the discovered aspect “seeds”
• with more methods relevant to the aspect (<= identifier
analysis)

• Provide more coarse-grained aspects
• e.g., grouping of identifier analysis concepts (<= fan-in /
dynamic)

• Discard irrelevant concepts

Future work
• More detailed comparison
of quality of discovered aspects

• Comparison with other aspect mining techniques
• Multi-technique tool
• fan-in, FCA, clone detection, slicing, ...
• to obtain a higher degree of automation
• and better quality of results

• JHotDraw as common benchmark
• Aspect refactoring



Aspect mining...
• is a promising new research area
• can be (partly) automated with fairly simple techniques
and combinations thereof

• is only the first step...

But most of all... it’s fun :-)
