Master Thesis presentation

Analysis of Advanced
Aggregation Techniques
for Software Metrics
Final presentation

Bogdan Vasilescu
b.n.vasilescu@student.tue.nl
Supervisor: Dr. Alexander Serebrenik

July 20, 2011

Analysis of advanced aggregation techniques for software metrics 2/32

Most metrics do not have a definition at system level.

/ department of mathematics and computer science


“Designing a sound aggregation of software metrics is not
obvious and it is still an open issue.” [CSS09]



Goal
Derive requirements for aggregation techniques for software
metrics.


Aggregation of software metrics 4/32

Many to one:
Same artifact
Different metrics
Example:
Maintainability Index

One to many:
Same metric
Different artifacts
Example:
Weighted Methods per
Class


Approach 5/32

Derive requirements for one-to-many
aggregation techniques for software metrics


Study existing
aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)

Two kinds of analysis:
- Theoretical analysis
- Empirical analysis



Inequality indices 6/32

Econometrics: measure/explain the inequality of income or wealth.

Software metrics and econometric variables have distributions with
similar shapes.

[Figure: two histograms with similar, strongly skewed shapes. Left: Source Lines of Code in freecol-0.9.4 (x-axis: SLOC per class, y-axis: frequency). Right: household income in Ilocos, Philippines, 1998 (x-axis: income, y-axis: frequency).]


Degree of concentration of functionality 7/32

[Figure: Lorenz curve for SLOC in Hibernate 3.6.0-beta4; x-axis: % classes, y-axis: % SLOC. A is the area between the diagonal and the curve, B the area under the curve.]

- $I_{Gini} = \frac{A}{A+B} = 2A$
- $I_{Hoover}$

Measure inequality between:
- individuals (e.g., classes)
- groups (e.g., components)

When computing the inequality within the entire population, it is
often desirable to assess the contribution of the inequality
between the groups.

Decomposability:

$$I(X) = I^{within} + I^{between} = \sum_{j=1}^{m} \omega_j I(X_j) + I^{between}$$
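A minimal Python sketch of the two ideas on this slide, assuming the usual textbook definitions, which the slides do not spell out: the Gini index computed as twice the area A read off the Lorenz curve, and the additive decomposition for the Theil index, where each weight ω_j is taken to be the group's share of the total. The component names and SLOC values are made up for illustration.

```python
import math

def lorenz_points(values):
    """Cumulative (share of population, share of total) points of the Lorenz curve."""
    xs = sorted(values)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points

def gini(values):
    """Gini = A / (A + B) = 2A, with B the area under the Lorenz curve (trapezoid rule)."""
    pts = lorenz_points(values)
    area_b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 2 * (0.5 - area_b)

def theil(values):
    """Theil index of a list of positive values."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)

def theil_decomposition(groups):
    """Return (within, between) for a dict {group name: list of positive values}."""
    everything = [x for xs in groups.values() for x in xs]
    total = sum(everything)
    mean = total / len(everything)
    within = between = 0.0
    for xs in groups.values():
        share = sum(xs) / total            # assumed weight omega_j
        within += share * theil(xs)
        between += share * math.log((sum(xs) / len(xs)) / mean)
    return within, between

# Hypothetical SLOC per class, grouped by component.
sloc = {"core": [120, 80, 950], "util": [40, 60], "ui": [300, 310, 290]}
all_sloc = [x for xs in sloc.values() for x in xs]
within, between = theil_decomposition(sloc)
print(f"Gini  = {gini(all_sloc):.3f}")
print(f"Theil = {theil(all_sloc):.3f} = {within:.3f} (within) + {between:.3f} (between)")
```

The trapezoid rule is exact here because the empirical Lorenz curve is piecewise linear. The Theil index needs strictly positive values, so classes with zero SLOC would have to be handled separately.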


Traceability via decomposability 8/32

Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$:

$$R(G) = \frac{I^{between}(G)}{I(X)}$$

Which individuals (classes in package) contribute to 80% of the
inequality of SLOC?
Which class contributes the most to the inequality?



Lemma
Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that $x_1 \le x_i \le x_n$.
Then, it is either $x_1$ or $x_n$ that contributes the most to the inequality
measured using $I_{Theil}$, i.e., it is either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or
the partitioning $(\{x_n\}, X \setminus \{x_n\})$ that provides the best explanation for
the inequality measured using $I_{Theil}$.
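The lemma can be illustrated numerically. The sketch below is not the thesis tooling: it assumes the standard Theil between-group term, computes R(G) for every partitioning that isolates a single value, and reports which value explains the largest share. The SLOC values are hypothetical.

```python
import math

def theil(values):
    """Theil index of a list of positive values."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)

def theil_between(groups):
    """Between-group Theil term for a list of groups of positive values."""
    everything = [x for g in groups for x in g]
    total = sum(everything)
    mean = total / len(everything)
    return sum((sum(g) / total) * math.log((sum(g) / len(g)) / mean) for g in groups)

def explained_share(groups):
    """R(G) = I_between(G) / I(X)."""
    everything = [x for g in groups for x in g]
    return theil_between(groups) / theil(everything)

def singleton_shares(values):
    """R for every partitioning ({x_i}, X minus {x_i}), values sorted ascending."""
    xs = sorted(values)
    return [(x, explained_share([[x], xs[:i] + xs[i + 1:]])) for i, x in enumerate(xs)]

sloc_per_class = [10, 20, 30, 40, 400]   # hypothetical SLOC values
shares = singleton_shares(sloc_per_class)
for value, share in shares:
    print(f"class with {value:>3} SLOC explains {share:.1%} of the Theil inequality")

best_value, best_share = max(shares, key=lambda pair: pair[1])
print(f"largest contribution: the class with {best_value} SLOC")
```

For this data the largest class dominates, in line with the lemma's claim that only the smallest or the largest value needs to be inspected.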


Other properties of inequality indices 9/32

Symmetry

Inequality stays the same for any permutation of the population.



Population principle

Inequality does not change if the population is replicated any number of
times.



Transfers principle

A transfer from a rich man to a poor man (without reversing their
position) should decrease inequality.

[Figure: illustration of a transfer; values shown: 20, 36, 45 and 30, 36.]



Scale invariance

Inequality does not change if all values are multiplied by the same
constant.
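The four properties above can be checked numerically for one index. The sketch below uses the Gini index in its mean-absolute-difference form (an assumption, since the slides do not fix a formula) and made-up values.

```python
def gini(values):
    """Gini index: sum of absolute pairwise differences over 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)

def close(a, b, tol=1e-12):
    return abs(a - b) < tol

values = [20, 46, 90]                      # hypothetical values

# Symmetry: any permutation gives the same result.
symmetry = close(gini(values), gini([90, 20, 46]))
# Population principle: replicating the population changes nothing.
population = close(gini(values), gini(values * 3))
# Scale invariance: multiplying every value by a constant changes nothing.
scale_invariance = close(gini(values), gini([1000 * x for x in values]))
# Transfers principle: move 10 units from the 46 to the 20 (order preserved).
transfers = gini([30, 36, 90]) < gini(values)

print(symmetry, population, scale_invariance, transfers)
```

Passing these checks on one data set does not prove the properties, it only shows what each one means operationally.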

Summary 13/32

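Columns: Sym. = symmetry, Inv. = invariance, Dec. = decomposability, Pop. = population principle, Tra. = transfers principle.

Ineq. index            Sym.   Inv.   Dec.   Pop.   Tra.
$I_{Gini}$                    ×
$I_{Theil}$                   ×
$I_{MLD}$                     ×
$I_{Hoover}$                  ×
$I_{Atkinson}^{\alpha}$       ×
$I_{Kolm}^{\beta}$            +

Problems include:
- Domain not always $\mathbb{R}^n$.
- No distinction between all values equal but low, and all values equal but high.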


Threshold-based aggregation techniques 14/32

Two types:
- hard thresholds: improvements in quality are not reflected as long as the metrics stay within certain boundaries (e.g., SIG).
- soft thresholds: do not exhibit staircasing effects (e.g., Squale).
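The difference between the two types can be sketched with two toy mappings from SLOC per method to a mark in [0, 3]. Neither function is the actual SIG or Squale mapping; all thresholds below are hypothetical and chosen only to show the staircasing effect.

```python
def hard_mark(sloc):
    """Staircase mapping: the mark only changes when a threshold is crossed."""
    if sloc <= 30:
        return 3.0
    if sloc <= 70:
        return 2.0
    if sloc <= 110:
        return 1.0
    return 0.0

def soft_mark(sloc):
    """Piecewise-linear mapping: between 30 and 150 SLOC every change moves the mark."""
    if sloc <= 30:
        return 3.0
    if sloc >= 150:
        return 0.0
    return 3.0 * (150 - sloc) / 120

# A method is refactored from 68 down to 35 SLOC.
before, after = 68, 35
print(f"hard: {hard_mark(before):.2f} -> {hard_mark(after):.2f}")
print(f"soft: {soft_mark(before):.2f} -> {soft_mark(after):.2f}")
```

With the hard thresholds the refactoring is invisible because both sizes fall in the same band, while the soft mapping rewards it.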


The Squale Quality Model 15/32

Metrics → Individual Marks in [0,3] → Global Mark in [0,3]

[Figure: Individual Mark (IM), from 0.0 to 3.0, as a function of SLOC per method.]

Properties of Squale aggregation 16/32

- Symmetry
- Population principle
- Anti-transfers principle

[Figure: illustration of a transfer; values shown: 20, 36, 45 and 30, 36.]


Properties of Squale aggregation 17/32

Lemma
$$I_{Kolm}^{\log \lambda}(x_1, \ldots, x_n) + I_{Squale}^{\lambda}(x_1, \ldots, x_n) = \bar{x}$$

Lemma
For all $c \in \mathbb{R}$ it holds that $I_{Squale}^{\lambda}$ is “unit translatable”, i.e.,

$$I_{Squale}^{\lambda}(x_1 + c, \ldots, x_n + c) = I_{Squale}^{\lambda}(x_1, \ldots, x_n) + c$$

Inequality indices are invariant with respect to either multiplication or
addition.
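Both lemmas can be checked numerically. The definitions below are assumptions, since the slides do not state them: the Squale aggregate is taken as $-\log_\lambda$ of the mean of $\lambda^{-x_i}$, and the Kolm index with parameter β as $\frac{1}{\beta} \ln$ of the mean of $e^{\beta(\bar{x} - x_i)}$, with natural logarithm for $\log \lambda$. The marks are hypothetical.

```python
import math

def squale(marks, lam):
    """Assumed Squale aggregate: -log_lam of the mean of lam ** (-mark)."""
    return -math.log(sum(lam ** (-m) for m in marks) / len(marks), lam)

def kolm(values, beta):
    """Assumed Kolm index: (1 / beta) * ln of the mean of exp(beta * (mean - x))."""
    mean = sum(values) / len(values)
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / len(values)) / beta

marks = [0.5, 2.0, 3.0, 3.0]   # hypothetical individual marks in [0, 3]
lam, c = 3.0, 1.5
mean = sum(marks) / len(marks)

# First lemma: Kolm (with beta = ln lam) plus Squale equals the mean.
lemma_1 = kolm(marks, math.log(lam)) + squale(marks, lam)
# Second lemma: shifting every mark by c shifts the Squale aggregate by c.
lemma_2 = squale([m + c for m in marks], lam) - squale(marks, lam)

print(f"Kolm + Squale = {lemma_1:.6f}, mean = {mean:.6f}")
print(f"shift in Squale = {lemma_2:.6f}, c = {c}")
```

Under these definitions the first lemma also shows why the Squale aggregate never exceeds the mean: the Kolm index is non-negative, so low marks pull the global mark below the average.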


Summary 18/32

We distill:
- Highlighting undesirable values in the aggregated result.

However, problems include:
- Thresholds should be derived and validated.
- A high rating is not necessarily an indication of good software engineering practices.
- Not decomposable.


Approach 19/32

Study existing
aggregation techniques:
- traditional (e.g., mean, median)
- inequality indices (e.g., Gini, Theil)
- threshold-based (e.g., SIG, Squale)

Two kinds of analysis:
- Theoretical analysis
- Empirical analysis



Empirical evaluation 20/32


Pilot study 21/32

Aggregate SLOC from class to package level.
Study statistical correlation between
- aggregation techniques and number of defects per package.
- pairs of aggregation techniques.
Case studies: ArgoUML, Adempiere, Mogwai.
Questions:
- Does the aggregation technique influence correlation with bugs?
  Correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.
- Which aggregation techniques convey the same information?
  $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Hoover}$, and $I_{Atkinson}$ convey the same information.
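The pilot-study procedure can be sketched as follows: aggregate SLOC per package with several techniques, then compute rank correlations against defect counts and between the aggregates themselves. This is an illustration, not the study's tooling. The package data is invented, and Kendall's tau-a is assumed because the slides do not say which variant was used.

```python
import math
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def gini(values):
    n, mu = len(values), mean(values)
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)

def theil(values):
    mu = mean(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    pairs = list(combinations(range(len(xs)), 2))
    score = 0
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)

# Hypothetical data: (SLOC per class, number of defects) for each package.
packages = {
    "pkg.a": ([100, 120, 110], 1),
    "pkg.b": ([20, 30, 900], 7),
    "pkg.c": ([50, 60, 400, 45], 4),
    "pkg.d": ([10, 15, 12, 200, 14], 2),
}
defects = [d for _, d in packages.values()]
aggregates = {
    "mean": [mean(s) for s, _ in packages.values()],
    "Gini": [gini(s) for s, _ in packages.values()],
    "Theil": [theil(s) for s, _ in packages.values()],
}

for name, values in aggregates.items():
    print(f"Kendall({name}, defects) = {kendall_tau(values, defects):+.2f}")
print(f"Kendall(Gini, Theil) = {kendall_tau(aggregates['Gini'], aggregates['Theil']):+.2f}")
```

A rank correlation is a natural choice here because the aggregates live on different scales and only their ordering of the packages is compared.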


Threats to validity 22/32

Threat        Pilot                         Subsequent studies
Metric        SLOC                          SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
System        ArgoUML, Adempiere, Mogwai    Qualitas Corpus: 106 Java open-source systems, 430K files, 57 MSLOC
Version       single                        414 from 13/106 systems (> 10 versions)
Technique     traditional, ineq. indices    traditional, ineq. indices, threshold-based
Aggr. level   class–package                 class–package, method–class


Results (1) 23/32

$I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Atkinson}$, and $I_{Hoover}$ always convey the same information.

[Figure: correlation coefficients (y-axis from -1.0 to 1.0) for each pair of inequality indices, one panel per metric.
SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%).
DIT: MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%).]


Results (2) 24/32

$I_{Kolm}$ shows high correlation with mean for size metrics.

[Figure: three panels showing the Kendall correlation coefficient between mean and Kolm for SLOC, DIT, and PLwC; y-axis from -1.0 to 1.0.]


Results (3) 25/32

Superlinear (e.g., $I_{Theil}$–$I_{Gini}$) and chaotic (e.g., $I_{Theil}$–$I_{Kolm}$) patterns can
be observed in the scatter plots.

[Figure: two scatter plots for compiere. Left: Theil (SLOC) against Gini (SLOC), Kendall: 0.94, p-value: 0.00. Right: Theil (SLOC) against Kolm (SLOC), Kendall: 0.25, p-value: 0.01.]


Results (4) 26/32

Changing the aggregation level to class level does not affect the
correlation between various aggregation techniques as measured at
package level.

[Figure: three panels showing the Kendall correlation for Gini - Theil (SLOC) (100%), Theil - Atkinson (SLOC) (100%), and Theil - MLD (SLOC) (100%); y-axis from -1.0 to 1.0.]


Results (5) 27/32

System size does influence the correlation between aggregation
techniques, e.g., $I_{Theil}$–$I_{Kolm}$ increases with system size.

[Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) over 86 releases, from 0.8.1 to 3.6.0-beta4; y-axis: correlation coefficient Theil(SLOC) - Kolm(SLOC), from 0.0 to 1.0.]

Results (6) 28/32

SIG and Squale correlate positively to each other and negatively to all
other aggregation techniques.

[Figure: three panels showing the Kendall correlation for Squale(3) - SIGd (SLOC) (95%), Gini - Squale(3) (SLOC) (95%), and Theil - Squale(3) (SLOC) (95%); y-axis from -1.0 to 1.0.]


Results (7) 29/32

Inequality indices are less appropriate for highlighting undesirable
values unless assumptions about their number can be made.
[Figure: three panels showing the Squale (weight = 3), Theil, and Kolm aggregates for different percentages of perfect IMs. x-axis: percentage of imperfect marks (0 to 100); y-axes: average aggregate and average mean range; legend: range (0, 0.1), range [0.1, 0.5), range [0.5, 1), range [1, 2), range [2, 3).]

Summary 30/32

We distill:
- Correlation with Squale or SIG for aggregation techniques that satisfy the highlight problems requirement.
- Correlation with $I_{Theil}$, $I_{MLD}$, or $I_{Atkinson}$, e.g., for aggregation techniques that satisfy the symmetry and decomposability requirements.


Conclusions 31/32

Existing aggregation techniques

Theoretical analysis
- root-cause analysis using [...]
- mathematical properties of [...]

Empirical analysis
- methodology and tooling
- correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels

Requirements for one-to-many [...]

Social organization
Determine an optimal partitioning of software projects
Extensions:
- other software metrics
- non-software domains
Apply the same techniques to aggregation of combined metrics data
New one-to-many aggregation techniques for software metrics


Publications 32/32

- Comparative Study of Software Metrics’ Aggregation Techniques. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. BeNeVol 2010.
- By No Means: A Study on Aggregating Software Metrics. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. WETSoM 2011.
- You Can’t Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. ICSM 2011.
- Practical Software Quality Metrics Aggregation. Karine Mordal, Nicolas Anquetil, Jannik Laval, Alexander Serebrenik, Bogdan Vasilescu, Stéphane Ducasse. JSME.

