Gender Diversity and Inclusion and Software Engineering
ICSM 2011: You Can't Control the Unfamiliar
1. Metrics are usually computed at a low level:
classes, methods, …
/ W&I / MDSE 3-11-2012 PAGE 0
2. Multitude of data values obscures a general
picture of the system maintainability
3. That we are actually interested in!
4. You Can't Control the Unfamiliar:
A Study on the Relations
Between Aggregation
Techniques for Software Metrics
Bogdan Vasilescu
Alexander Serebrenik
Mark van den Brand
5. Two kinds of aggregation
• Same metrics, different artifacts
• Same artifact, different metrics
6. Various techniques can be found in the literature
Same metrics, different artifacts:
• Traditional: mean, median, sum, …
• Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
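As a rough illustration of the inequality indices named above, the sketch below computes each one for a list of metric values (for example, the SLOC of the classes in one package). It uses the textbook definitions; fixing the Atkinson and Kolm parameters to 1 is an assumption, since the slides do not state the parameter values used in the study.

```python
import math

def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def gini(xs):
    """Gini: sum of absolute differences over all ordered pairs / (2 * n^2 * mean)."""
    n, mu = len(xs), _mean(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)

def theil(xs):
    """Theil: mean of (x/mu) * ln(x/mu); zero values contribute 0 by convention."""
    mu = _mean(xs)
    return _mean((x / mu) * math.log(x / mu) if x > 0 else 0.0 for x in xs)

def hoover(xs):
    """Hoover: sum of absolute deviations from the mean / (2 * n * mean)."""
    mu = _mean(xs)
    return sum(abs(x - mu) for x in xs) / (2 * len(xs) * mu)

def atkinson(xs):
    """Atkinson with parameter 1 (assumed): 1 - geometric mean / arithmetic mean.
    Requires strictly positive values."""
    mu = _mean(xs)
    geometric = math.exp(_mean(math.log(x) for x in xs))
    return 1 - geometric / mu

def kolm(xs):
    """Kolm with parameter 1 (assumed): ln of the mean of exp(mu - x).
    exp() may overflow for widely spread values; this sketch does not guard against that."""
    mu = _mean(xs)
    return math.log(_mean(math.exp(mu - x) for x in xs))

# Hypothetical class sizes for one package, not data from the study.
class_sloc = [12, 15, 20, 48]
for index in (gini, theil, hoover, atkinson, kolm):
    print(f"{index.__name__:>8}: {index(class_sloc):.3f}")
```

All five functions return 0 when every value is the same, which is the sense in which they measure inequality rather than size.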
7. Various techniques can be found in the literature
Same metrics, different artifacts:
• Traditional: mean, median, sum, …
• Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
Which aggregation technique should we use?
8. Questions
1. Which aggregation techniques agree, and to what
extent?
2. What is the nature of the relation between the
various aggregation techniques?
3. How does the correlation coefficient change as the
systems evolve?
9. Qualitas Corpus 20101126
• Qualitas Corpus 20101126r, 106 systems
• FitJava v1.1, 2 packages, 2240 SLOC
• NetBeans v6.9.1, 3373 packages, 1890536 SLOC
10. 1) Agreement between different techniques
• Agreement:
• Aggregation: Class SLOC → Package
• Techniques agree if they rank the packages similarly
• We use a rank-based correlation coefficient: Kendall’s τ
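The agreement check can be sketched as follows: aggregate class-level SLOC to one value per package with two techniques, then compare how the two techniques rank the packages using Kendall's rank correlation. The package names and SLOC values are made up for illustration, and the function implements the simple tau-a variant (no correction for ties), which is an assumption; the slides do not say which variant was used.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs / total number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical packages mapped to the SLOC of their classes (not data from the study).
packages = {
    "pkg.a": [10, 20, 30],
    "pkg.b": [100],
    "pkg.c": [5] * 30,
    "pkg.d": [40, 50],
}

# Two aggregation techniques applied to the same class-level metric.
by_mean = [sum(sloc) / len(sloc) for sloc in packages.values()]
by_sum = [sum(sloc) for sloc in packages.values()]

tau = kendall_tau_a(by_mean, by_sum)
print(f"Kendall tau-a between mean and sum: {tau:.2f}")
```

A value near 1 means the two techniques order the packages almost identically, a value near 0 means the orderings are unrelated, and a value near -1 means they are reversed.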
11. 1) Agreement: different inequality indices?
• Gini, Theil, Hoover, Atkinson – agree
• aggregates obtained convey the same information
• Kolm does not!
12. 1) Agreement: traditional and inequality indices?
• mean
• Kolm: strong (0.8) and statistically significant (92%)
• median, standard deviation, and variance
• sum
• does not correlate with any other aggregation technique
13. 2) Nature of the relation: Typical patterns
• Theil is known to be more sensitive to the rich
• Theil increases faster when Gini increases
• Linear relation with a “fat” head
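To make the sensitivity of Theil "to the rich" concrete, the synthetic sketch below grows a single large class in an otherwise uniform package and tracks both indices. The input values are invented for illustration and are not taken from the study; the formulas are the textbook definitions of Gini and Theil.

```python
import math

def gini(xs):
    """Gini: sum of absolute differences over all ordered pairs / (2 * n^2 * mean)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)

def theil(xs):
    """Theil: mean of (x/mu) * ln(x/mu), for strictly positive values."""
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)

def growth_table(largest_sizes=(1, 10, 100), small_classes=3):
    """Return (largest, gini, theil) for packages of `small_classes` unit-size
    classes plus one class of size `largest` (hypothetical data)."""
    rows = []
    for largest in largest_sizes:
        xs = [1] * small_classes + [largest]
        rows.append((largest, gini(xs), theil(xs)))
    return rows

for largest, g, t in growth_table():
    print(f"largest class = {largest:>3}  Gini = {g:.3f}  Theil = {t:.3f}")
```

In this toy setting, Theil gains more than Gini each time the largest class grows, which matches the direction of the pattern described on the slide.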
14. Which aggregation technique? (1)
• Theil, Hoover, Gini and Atkinson agree
• Any can be chosen from the correlation point of view
• Some might be “better” in each specific case
• easy to interpret: Gini [0,1]
• provide additional insights: Theil (explanation)
• negative values: Gini, Hoover
− affects the domain!
• sensitive to high values: Theil, Atkinson
• deviations from uniformity: Gini, Hoover
15. Which aggregation technique? (2)
• Kolm and mean agree
• Kolm is reliable for skewed distributions
− better alternative (“by no means”)
• Not in the paper:
− agreement observed for NOC
− but not for DIT!