2. S.Ducasse LSE
A brief introduction
Co-author of Object-Oriented Reengineering Patterns
Co-developer of Moose (reengineering platform)
10 PhD Theses in reengineering
50+ articles
Grounded in reality
Maintainer of open-source projects
Worked with:
Harman-Becker AG
Bedag AG
Nokia, Daimler
3. S.Ducasse LSE
Roadmap
• Some software development facts
• Our approach
• Supporting maintenance
• Moose: an open platform
• Visual principles in 3 min
• Some visual examples
• Conclusion
4. S.Ducasse LSE
Software is complex.
The Standish Group, 2004
53% Challenged
18% Failed
29% Succeeded
6. S.Ducasse LSE
How large is your project?
1’000’000 lines of code
× 2 seconds per line = 2’000’000 seconds
/ 3600 ≈ 560 hours
/ 8 ≈ 70 days
/ 20 ≈ 3.5 months
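The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name and parameter names are made up for illustration, and the reading of the factors (2 seconds per line, 8 hours per working day, 20 working days per month) is an assumption taken from the arithmetic on the slide.

```python
def reading_time(lines_of_code, seconds_per_line=2,
                 hours_per_day=8, days_per_month=20):
    """Estimate how long it takes to go through a code base once.

    All default factors are assumptions taken from the slide's arithmetic.
    """
    seconds = lines_of_code * seconds_per_line
    hours = seconds / 3600
    days = hours / hours_per_day
    months = days / days_per_month
    return {"seconds": seconds, "hours": hours, "days": days, "months": months}


estimate = reading_time(1_000_000)
for unit, value in estimate.items():
    print(f"{unit}: {value:,.1f}")
```

Changing `seconds_per_line` shows how sensitive the estimate is: even at one second per line, a single pass over a million lines still takes well over a month of full-time work.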
7. S.Ducasse LSE
Maintenance is Continuous Development
Relative Maintenance Effort
Between 50% and 75% of global effort is spent on “maintenance”!
17.4% Corrective
(fixing reported errors)
18.2% Adaptive
(new platforms or OS)
60.3% Perfective
(new functionality)
4.1% Other
The bulk of the maintenance cost is due to new functionality:
even with better requirements, it is hard to predict new functions.
9. S.Ducasse LSE
Software is living…
Early decisions were certainly good at the time
But the context changes
Customers change
Technology changes
People change
10. Software development
is more than forward engineering.
Forward engineering
Actual development
11. Maintenance
is needed to evolve the code.
Reverse engineering
Forward engineering
Actual development
12. S.Ducasse LSE
Roadmap
• Some software development facts
• Our approach
• Supporting maintenance
• Moose: an open platform
• Visual principles in 3 min
• Some visual examples
• Conclusion
13. S.Ducasse
Helping teams maintain large software systems
What is the X-ray for software?
code, people, practices
Which analyses?
How can you monitor your system (dashboards, ...)?
How to present extracted information?
18. S.Ducasse LSE
How do properties spread in large systems?
Properties:
Metrics
People
Symbols/Concepts
Spread = how many packages does it touch?
Focus = do packages and properties match?
Distribution Map:
a generic visualization
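The two questions above (spread and focus) can be turned into numbers. The sketch below is one possible formalization, not necessarily the one used in Moose: `spread` counts the packages a property touches, and `focus` is assumed here to be the sum, over packages, of the share of the property that lives in the package times the share of the package covered by the property. The function names and the toy data are hypothetical.

```python
def spread(packages, property_elements):
    """Number of packages containing at least one element with the property."""
    return sum(1 for members in packages.values() if members & property_elements)


def focus(packages, property_elements):
    """Assumed focus measure in [0, 1]; 1.0 means the property covers whole packages."""
    if not property_elements:
        return 0.0
    total = 0.0
    for members in packages.values():
        shared = len(members & property_elements)
        if shared:
            # share of the property in this package * share of the package touched
            total += (shared / len(property_elements)) * (shared / len(members))
    return total


# Toy system: package name -> set of class names (illustrative data only).
packages = {
    "ui": {"Button", "Window"},
    "core": {"Parser", "Lexer", "Token"},
    "util": {"Log"},
}
well_encapsulated = {"Button", "Window"}
cross_cutting = {"Button", "Parser", "Log"}

print(spread(packages, well_encapsulated), focus(packages, well_encapsulated))
print(spread(packages, cross_cutting), round(focus(packages, cross_cutting), 3))
```

A property with low spread and high focus matches the package structure; a property with high spread and low focus cuts across it.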
20. Moose is designed to be extensible
Figure labels: Method, Class, Inheritance, Author, File, Duplication, Event, Trace, Class Version, Class History
open, meta-described
21. S.Ducasse LSE
Moose has been validated on real-life systems
Several large, industrial case studies (NDA)
Harman-Becker
Nokia
Daimler
Siemens
Different implementation languages (C++, Java, Smalltalk, Cobol)
We use external C++ parsers
Different sizes
Moose is used in several research groups
22. S.Ducasse LSE
Visualization principles in 3 min
• Preattentive visualization (unconscious < 200ms)
• Gestalt principles (from 1912)
• 70% of our sensors are dedicated to vision
23. Tudor Gîrba
How many 5s?
3332123466509000096766689877835367
7866750910919818971746453039821768
34567865860880221167687687789762
345678915116718199101081876616161
61819010180980808097767674333
24. Tudor Gîrba
How many 5s?
3332123466509000096766689877835367
7866750910919818971746453039821768
34567865860880221167687687789762
345678915116718199101081876616161
61819010180980808097767674333
25. Tudor Gîrba
Preattentive attributes
Color intensity
Form: orientation, line length, line width, size, shape, added marks, enclosure
Spatial position (2D location)
Motion (flicker)
44. S.Ducasse LSE
Roadmap
• Some software development facts
• Our approach
• Supporting maintenance
• Moose: an open platform
• Visual principles in 3 min
• Some visual examples
• Conclusion
45. S.Ducasse LSE
Challenges in Visualization
Screen size
Max 12 colors
Edge-crossing
Limited short-term memory (three to nine items)
Extracting semantics
Beauty cannot be a goal
Get some help from
Gestalt principles
pre-attentive visualization
46. S.Ducasse LSE
Understanding large systems
Understanding code is difficult!
Systems are large
Code is abstract
Do I really need to convince you?
Some existing approaches
Metrics: you often get meaningless results once combined
Visualization: often beautiful but with little meaning
48. S.Ducasse LSE
Polymetric views condense information
Classes + Inheritance
W (width): # of added methods
H (height): # of overridden methods
C (color): # of extended methods
To get a feel for the inheritance semantics: adding vs. reusing methods
LOC
# statements
# parameters
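The mapping of metrics onto node width, height and color can be sketched as follows. This is not Moose code: the `ClassMetrics` record, the `polymetric_nodes` function, the scaling constants and the gray-level encoding (darker means more extended methods) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    added: int       # number of added methods      -> node width
    overridden: int  # number of overridden methods -> node height
    extended: int    # number of extended methods   -> node color


def polymetric_nodes(classes, min_size=5, scale=4):
    """Turn class metrics into drawable node attributes (hypothetical encoding)."""
    max_extended = max((c.extended for c in classes), default=0)
    nodes = []
    for c in classes:
        # 255 = white (no extended methods), 0 = black (most extended methods)
        if max_extended == 0:
            gray = 255
        else:
            gray = round(255 * (1 - c.extended / max_extended))
        nodes.append({
            "name": c.name,
            "width": min_size + scale * c.added,
            "height": min_size + scale * c.overridden,
            "gray": gray,
        })
    return nodes


nodes = polymetric_nodes([
    ClassMetrics("Base", added=10, overridden=0, extended=0),
    ClassMetrics("Sub", added=2, overridden=6, extended=3),
])
for node in nodes:
    print(node)
```

In this toy input, `Base` comes out wide and flat (it mostly adds methods), while `Sub` is narrow, tall and dark (it mostly overrides and extends inherited behavior).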
55. S.Ducasse LSE
How do developers develop?
• Is it more efficient to have people work together in the same office?
• How can we optimize software development?
57. S.Ducasse LSE
Line colors show which author owned
which files in which period
Figure labels: File A, File B; Green author: large commit; Green author: ownership; Blue author: small commit
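A file's ownership over time can be derived from the commit log. The sketch below uses a deliberately simplified rule that is an assumption, not the definition used by the authors: the owner of a file at a given moment is the author with the largest cumulative number of changed lines so far. The tuple layout and the function name are hypothetical.

```python
from collections import defaultdict


def ownership_timeline(commits):
    """commits: iterable of (time, author, path, changed_lines) tuples.

    Returns {path: [(time, owner), ...]} with one entry per ownership change.
    Simplifying assumption: owner = author with most cumulative changed lines.
    """
    totals = defaultdict(lambda: defaultdict(int))
    timeline = defaultdict(list)
    for time, author, path, changed_lines in sorted(commits):
        totals[path][author] += changed_lines
        owner = max(totals[path], key=totals[path].get)
        # Record only the moments when ownership actually changes hands.
        if not timeline[path] or timeline[path][-1][1] != owner:
            timeline[path].append((time, owner))
    return dict(timeline)


commits = [
    (1, "green", "File A", 120),  # large commit: green becomes owner
    (2, "blue", "File A", 10),    # small commit: ownership unchanged
    (3, "blue", "File A", 200),   # blue overtakes green
    (1, "green", "File B", 50),
]
timeline = ownership_timeline(commits)
print(timeline)
```

Each `(time, owner)` pair marks the start of a colored line segment for that file; individual commits can then be drawn on top as circles sized by `changed_lines`.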
60. S.Ducasse LSE
Based on similar commit signature
Dialogue
Monologue
Edit
Takeover
Familiarization
61. S.Ducasse LSE
Understanding evolution of large systems
• How old are the hierarchies?
• How did the classes change?
• How did the inheritance change?
62. S.Ducasse LSE
Evolution holds useful information
Figure: versions of a class hierarchy over time (classes A, B, C, D)
B is stable
C was removed
E is newborn
A is persistent
D inherited from C and then from A …
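The characterizations above can be computed from a sequence of versions. In this sketch each version is a dictionary mapping a class name to a `(superclass, number_of_methods)` pair; that representation, the function name and the exact definitions of the labels are assumptions chosen for illustration, and the labels are not mutually exclusive.

```python
def characterize(versions):
    """versions: non-empty list of {class_name: (superclass, n_methods)} dicts.

    Returns {class_name: (labels, superclass_history)}.
    """
    names = set().union(*versions)
    last = versions[-1]
    result = {}
    for name in sorted(names):
        present = [v[name] for v in versions if name in v]
        labels = set()
        if len(present) == len(versions):
            labels.add("persistent")          # exists in every version
            if len(set(present)) == 1:
                labels.add("stable")          # ...and never changed
        if name not in last:
            labels.add("removed")             # gone in the last version
        elif len(present) == 1 and len(versions) > 1:
            labels.add("newborn")             # appears only in the last version
        parents = []
        for superclass, _ in present:
            if not parents or parents[-1] != superclass:
                parents.append(superclass)    # successive distinct superclasses
        result[name] = (labels, parents)
    return result


versions = [
    {"A": (None, 5), "B": ("A", 3)},
    {"A": (None, 6), "B": ("A", 3), "C": ("A", 2)},
    {"A": (None, 7), "B": ("A", 3), "C": ("A", 2), "D": ("C", 1)},
    {"A": (None, 8), "B": ("A", 3), "D": ("A", 1), "E": ("A", 1)},
]
history = characterize(versions)
for name, (labels, parents) in history.items():
    print(name, sorted(labels), parents)
```

With this toy history, `D` gets the superclass history `["C", "A"]`, which mirrors the "inherited from C and then from A" case.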
63. S.Ducasse LSE
Hierarchy Evolution Complexity View
characterizes class hierarchy histories
B is stable
C was removed
E is newborn
A is persistent
D inherited from C and then from A …
Figure labels: classes A, B, C, D, E
Class History: ENOM, ENOS, Age, Removed
Inheritance History: Age, Removed
65. S.Ducasse LSE
Identifying Duplicated Code
“Parsing the program suite of interest requires a parser for the
language dialect of interest. While this is nominally an easy task, in
practice one must acquire a tested grammar for the dialect of the
language at hand. Often for legacy codes, the dialect is unique and the
developing organization will need to build their own parser. Worse,
legacy systems often have a number of languages and a parser is
needed for each. Standard tools such as Lex and Yacc are rather a
disappointment for this purpose, as they deal poorly with lexical
hiccups and language ambiguities.” [Baxter 98]
Problems
Unknown Duplicated Code
Scalability
Understanding
66. S.Ducasse LSE
Language Independent
Language independent, textual [ICSM’99], M. Rieger’s PhD thesis
Duploc handled Pascal, Java, Smalltalk, Python, Cobol, C++, PDP-11, C
Slower than other approaches but...
Max 45 min to adapt our approach to a new language
Between 3% and 10% less identification than parametrized match
Exact Copies: a b c d e f / a b c d e f
Copies with Variations: a b c d e f / a b x y e f
67. S.Ducasse LSE
A Conceptual Matrix
Matrix axes: File A, File B
Exact Copies: a b c d e f / a b c d e f
Copies with Variations: a b c d e f / a b x y e f
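The idea of a line-by-line comparison matrix can be sketched in a few lines. This is not Duploc's implementation: the normalization (dropping all whitespace), the function names and the `min_length` threshold are assumptions. A cell is true when two normalized lines are equal, and runs of true cells along a diagonal indicate copied sequences.

```python
def normalize(line):
    """Remove all whitespace so that formatting differences do not matter."""
    return "".join(line.split())


def comparison_matrix(lines_a, lines_b):
    """matrix[i][j] is True when line i of A equals line j of B (ignoring blanks)."""
    a = [normalize(line) for line in lines_a]
    b = [normalize(line) for line in lines_b]
    return [[bool(x) and x == y for y in b] for x in a]


def diagonals(matrix, min_length=3):
    """Return (row, column, length) for each maximal diagonal run of matches."""
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    found = []
    for i in range(rows):
        for j in range(cols):
            if not matrix[i][j]:
                continue
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue  # not the start of a run
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_length:
                found.append((i, j, length))
    return found


original = ["a", "b", "c", "d", "e", "f"]
exact_copy = ["a", "b", "c", "d", "e", "f"]
varied_copy = ["a", "b", "x", "y", "e", "f"]

print(diagonals(comparison_matrix(original, exact_copy)))
print(diagonals(comparison_matrix(original, varied_copy), min_length=2))
```

An exact copy shows up as one unbroken diagonal, while a copy with variations shows up as a diagonal with a hole where the lines differ. Because only text lines are compared, no parser for the language is needed.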
69. S.Ducasse LSE
We are interested in your problems!
• Remodularization/Repackaging
• SOA - Service Identification
• Architecture Extraction/Validation
• Software Quality
• Cost prediction
• EJB Analysis
• Business rules extraction
• Model transformation
70. S.Ducasse LSE
Evolution is difficult
• We are interested in your problems!
• Moose is open source: you can use it, extend it, change it
• We can collaborate!
NOM > 10 & LOC > 100
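A condition like this reads as a metric-based filter over the classes of a system. The sketch below assumes that NOM stands for number of methods and LOC for lines of code; the dictionary layout, the function name and the sample data are hypothetical.

```python
def large_classes(classes, min_nom=10, min_loc=100):
    """Names of classes with NOM > min_nom and LOC > min_loc."""
    return [c["name"] for c in classes
            if c["NOM"] > min_nom and c["LOC"] > min_loc]


classes = [
    {"name": "OrderManager", "NOM": 42, "LOC": 1300},
    {"name": "Point", "NOM": 4, "LOC": 30},
    {"name": "Config", "NOM": 12, "LOC": 80},
]
selected = large_classes(classes)
print(selected)
```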